E6885 Network Science Lecture 10: Analysis of Network Flow


E6885 Topics in Signal Processing -- Network Science
Lecture 10: Analysis of Network Flow
Ching-Yung Lin, Dept. of Electrical Engineering, Columbia University
November 21st, 2011

Course Structure

Date       Topics Covered
9/12/11    Overview; Social, Information, and Cognitive Network Analysis
9/19/11    Network Representations and Characteristics
9/26/11    Network Partitioning, Clustering, and Use Case
10/3/11    Network Visualization, Sampling and Estimation
10/10/11   Network Modeling
10/17/11   Network Topology Inference
10/24/11   Dynamic Networks -- I
10/31/11   Dynamic Networks -- II
11/14/11   Final Project Proposal Presentation
11/21/11   Analysis of Network Flow
11/28/11   Graphical Models
12/5/11    Cognitive Networks and Economy Issues in Networks
12/12/11   Large-Scale Network Processing System
12/19/11   Final Project Presentation

Gravity Models

Gravity models are a class of models for describing aggregate levels of interaction among the people of different populations. Traditionally used in:
- Geography
- Economics
- Sociology
- Hydrology
- Analysis of computer network traffic

For instance:
New York <-> Los Angeles = 20,124,377 * 15,781,273 / (2462 miles)^2 = 52.4 million
El Paso (Texas) <-> Tucson (Arizona) = 703,127 * 790,755 / (263 miles)^2 = 8.0 million
El Paso (Texas) <-> Los Angeles = 21.0 million

Such scores can be used to predict migration and traffic flow.

Common Gravity Model

The general gravity model specifies that the traffic flows Z_ij be in the form of counts, with independent Poisson distributions and mean functions of the form:

E(Z_ij) = h_O(i) h_D(j) h_S(c_ij)

where h_O is a positive function of the origin i, h_D is a positive function of the destination j, and c_ij collects separation attributes (distance, cost, etc.). Some commonly used (standard) forms:

h_O(i) = (pi_{O,i})^alpha,  h_D(j) = (pi_{D,j})^beta
h_S(c_ij) = (c_ij)^theta  or  h_S(c_ij) = exp(theta^T c_ij)
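The city-pair numbers above can be reproduced directly. A minimal sketch of the simple (unparameterized) gravity score, population_i * population_j / distance^2, using the populations and distances quoted on the slide:

```python
def gravity_interaction(pop_i, pop_j, dist_miles):
    """Simple gravity score: product of populations over squared distance."""
    return pop_i * pop_j / dist_miles ** 2

# New York <-> Los Angeles, populations and distance from the slide
ny_la = gravity_interaction(20_124_377, 15_781_273, 2462)       # ~52.4 million

# El Paso <-> Tucson
ep_tucson = gravity_interaction(703_127, 790_755, 263)          # ~8.0 million
```

Note that the score is not a count of anything physical; it is a relative index of expected interaction used to rank city pairs.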

Example: Austrian call data

Phone traffic between 32 telecommunication districts in Austria. Call flow volume is plotted versus each of origin Gross Regional Product (GRP), destination GRP, and distance, with a linear regression fit (dotted) and a nonparametric smoother (solid).

Inference for Gravity Models

Focusing on the general gravity model in log-linear form:

log mu_ij = alpha_i + beta_j + theta^T c_ij

a generic iteratively re-weighted least-squares (IRLS) method can be used to fit the parameters.
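The IRLS fitting step can be sketched as follows. This is a hedged illustration of IRLS (Fisher scoring) for a generic Poisson log-linear model log(mu) = X @ beta; the design matrix and counts below are synthetic, not the Austrian call data:

```python
import numpy as np

def irls_poisson(X, z, n_iter=50):
    """Fit beta in a Poisson regression log(mu) = X @ beta by IRLS."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)               # current mean estimates
        w = mu                              # Poisson working weights
        y = X @ beta + (z - mu) / mu        # working response
        # weighted least-squares step: (X^T W X) beta = X^T W y
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return beta

# synthetic check: recover a known coefficient vector
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
beta_true = np.array([1.0, 0.5])
z = rng.poisson(np.exp(X @ beta_true))
beta_hat = irls_poisson(X, z)
```

For the gravity model itself, X would encode the origin indicator (alpha_i), destination indicator (beta_j), and separation attributes c_ij as columns.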

Example: Gravity Model

Accuracy of estimates of traffic volume made by the standard (left, in blue) and general (right, in green) gravity models for the Austrian call data. The standard model tends to over-estimate with somewhat greater frequency than the general model, particularly for medium- and low-volume flows. The relative error decreases with volume.

Relative Prediction Error of the Gravity Models

Traffic Matrix Estimation

Sometimes it is not easy to monitor the flow volumes of origin-destination pairs directly. Instead, sensors are placed at entrances to on- and off-ramps, as in highway road networks. We then face the problem of predicting the Z_ij, or alternatively estimating their means, from the observed link counts:

X = (X_e), e in E

That is, we seek to invert the routing matrix B in the relation X = B Z, where B typically has many fewer rows (i.e., network links) than columns (i.e., origin-destination pairs).

A simple network illustrating the traffic matrix estimation problem.

Static Methods

Methods based on least squares and Gaussian models. A simple but commonly adopted model for the link counts X is one of the form:

X = B mu + epsilon

where mu denotes the expected flow volumes and epsilon the errors. In general, mu is not estimable in this model, but under certain conditions the expected origin and destination volumes mu_{i+} and mu_{+j} are in fact estimable.

Static Methods, cont'd

Robillard (1975) proposed a gravity model for the expected flow volumes. Unfortunately, it has been observed that in practice gravity models often fit too poorly to produce good estimates. However, in some situations an initial set of origin-destination flow volume measurements is available. We might use these measurements, rather than a gravity model, to suitably constrain our estimate. Cascetta (1984) proposed a method for doing so based on generalized least squares.
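The non-estimability of mu can be seen on a toy example. This is a hedged sketch (the network below is invented): a star network with two origins and two destinations through one hub has 4 links but 4 OD pairs, and its routing matrix has rank 3, so mu itself is not identifiable even though the origin/destination totals are:

```python
import numpy as np

# Rows: links o1->hub, o2->hub, hub->d1, hub->d2.
# Columns: OD pairs (o1,d1), (o1,d2), (o2,d1), (o2,d2).
B = np.array([[1., 1., 0., 0.],
              [0., 0., 1., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.]])
mu_true = np.array([3., 1., 2., 4.])
x = B @ mu_true                                 # observed link counts (noise-free)

mu_hat, *_ = np.linalg.lstsq(B, x, rcond=None)  # minimum-norm least squares
```

mu_hat reproduces every link count and every origin/destination total exactly, yet differs from mu_true: any multiple of the null-space direction of B can be added without changing the observations.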

Dynamic Methods

Dynamic methods of traffic matrix estimation are designed to estimate the traffic at all time periods, or sequentially. The dynamic methods proposed to date are predominantly based on principles of least squares. The majority of the methods require that the length of a typical trip, from any given origin to any given destination, be substantially shorter than the length of each time interval during which measurements are taken. This assumption has the advantage of simplifying the routing information that must be encoded in the routing matrices B(t), since it effectively allows us to ignore the possibility that trips beginning in one time period end in a different time period.

Dynamic Methods, cont'd

Sequential methods of traffic matrix estimation can be viewed as variations or extensions of Kalman filtering. In the Kalman filtering approach, the time-varying relationship among the means and the link counts is modeled through a set of equations. The so-called Kalman filter is a sequential, recursive algorithm for determining, at each time t+1, an optimal estimate of the state mu(t+1) based on the observations, where the estimate is optimal in the sense that it is unbiased and has minimum variance among all unbiased estimators.
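A minimal Kalman filter sketch for this setting, under the assumption (not from the slide) that the OD mean vector mu(t) evolves as a random walk and the observation is the link count x(t) = B mu(t) + noise; B, Q, R and the true flows are illustrative:

```python
import numpy as np

def kalman_step(mu, P, x, B, Q, R):
    """One predict/update cycle; returns filtered state and covariance."""
    P_pred = P + Q                          # predict: random-walk state model
    S = B @ P_pred @ B.T + R                # innovation covariance
    K = P_pred @ B.T @ np.linalg.inv(S)     # Kalman gain
    mu_new = mu + K @ (x - B @ mu)          # update with the link counts
    P_new = (np.eye(len(mu)) - K @ B) @ P_pred
    return mu_new, P_new

B = np.eye(2)                               # trivially observable toy routing
Q, R = 0.1 * np.eye(2), 0.01 * np.eye(2)
mu, P = np.zeros(2), np.eye(2)
truth = np.array([5.0, 3.0])
for _ in range(20):                         # feed noise-free observations
    mu, P = kalman_step(mu, P, truth, B, Q, R)
```

With a realistic wide B (fewer links than OD pairs), the same recursion applies; the filter then pools information across time to narrow down the under-determined state.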

Example: Internet Traffic Matrix Estimation

Comparison of bias-corrected tomogravity and Kalman filtering methods.

Traffic volume detection

Traffic flow volume predictions from the bias-corrected tomogravity (left, in blue) and Kalman filtering (right, in red) methods, for four flows with volumes ranging from high (top) to low (bottom). Actual flow volumes are shown in yellow.

Reduced Dimensionality

Eigenvalues of the routing matrix corresponding to the Abilene network, with 110 paths traversing just 30 directed links. The large gap between the second and third eigenvalues, and the resulting knee in the spectrum, are indicative of substantially more linear dependence among the columns of B than suggested by its nominal rank of 30. The overall decay in the spectrum of eigenvalues suggests that measurements on roughly five to ten paths, and perhaps as few as two, may be sufficient to recover useful information about path costs in this network.

Visual representation of the first four eigenvectors.

Predicting Average Path Delay

Modeling and Predicting Personal Information Dissemination Behavior
Xiaodan Song, Ching-Yung Lin, Belle Tseng, and Ming-Ting Sun -- KDD 2005

Utilizing relational and temporal information provides more insight than pure content analysis:
- What is a person's role in events?
- With whom do you discuss what is going on in the company?
- How do behaviors evolve? Interests, tastes?
- In a certain event, who played the most influential roles? Who knew the information?
- How will a person, or a group of persons, respond to a future event?

(Figure: e-mails and publications plotted along a time axis.)

Outline
- Motivation
- The Content-Time-Relation (CTR) model
- Experimental results
- Conclusions and ongoing work

Motivation

Goal: personal information management -- modeling and predicting personal behaviors.

Prior-art systems -- LinkedIn, Orkut, Friendster, Yahoo! 360:
- Share what matters to you; create your own place online; share photos; create a blog; list your favorites; send a blast, and more
- Keep your friends and family close; control who sees what; share as much as you want, with whomever you want
- Tools for visually managing personal social networks

However, in current solutions:
- Users need to manually input, update, and manage these networks
- They do not model or predict personal behaviors

Enron Dataset

A huge collection of real e-mail messages sent and received by employees of the Enron corporation: 493,391 e-mails from 154 users, collected over roughly 1999-2002. Unique messages: 166,653; a subset are intra-Enron messages.

Overview of CTR Model

Input: e-mails, e.g.
  From: sally.beck@enron.com
  To: shona.wilson@enron.com
  Subject: Re: timing of submitting information to Risk Controls
  Good memo - let me know if ...

Information extraction: people (154), role (sender/receiver), content (bag of words), and time. Example CommunityNet output: topic "California Energy", time 2000-2001.

Applications: receiver recommendation system, prediction, filtering.

The CTR model incorporates content, time, and relations in a generative probabilistic way (plate diagram with hyperparameters alpha, beta, gamma; topic z; words w; time t; receivers r; documents D; social network S).

Related Work (I): Social Network Analysis

Static social network analysis:
- Small world: six degrees of separation [Milgram 1967]
- Link analysis in information retrieval: PageRank [Brin and Page 1998], HITS [Kleinberg 1998]
- Mining communities from the web [Flake 2002]
- Mining the network value of customers [Domingos et al. 2001, Kempe et al. 2003]
- Exponential Random Graph Model (ERGM) [Wasserman et al. 1996]

Dynamic social network analysis:
- Link prediction [Liben-Nowell and Kleinberg 2003]
- Tracking network changes [Kubica et al. 2002]
- Dynamic actor-oriented social networks [Snijders 2003]

Related Work (II): Content Analysis

- Latent Semantic Analysis (LSA) [Deerwester et al. 1990]: captures the semantic concepts of documents by mapping words into a latent semantic space that accommodates the possible synonymy and polysemy of words. Based on a truncated SVD of the term-document matrix, X ~ T S D^T, with dimensions (N x M) ~ (N x K)(K x K)(K x M): the optimal least-squares projection for reducing dimensionality.
- Probabilistic LSA [Hofmann 1999]: a statistical view of LSA.
- Latent Dirichlet Allocation (LDA) [Blei et al. 2003]: a generative model that places Dirichlet priors on the class model of PLSA; assumes a document is a mixture of topics.
- Author-Topic model [Rosen-Zvi et al. 2004]: tries to recognize which part of a document is contributed by which co-author. A document with multiple authors is a mixture of the distributions associated with the authors; each author is associated with a multinomial distribution over topics, and each topic with a multinomial distribution over words.
- Author-Recipient-Topic model [McCallum et al. 2005]: given the sender and the set of receivers of an e-mail, finds senders with similar roles in events.

None of the previous models use temporal information together with social/relational information.
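The truncated SVD behind LSA can be sketched in a few lines. The tiny term-document matrix below is invented for illustration; keeping the K largest singular values yields the optimal least-squares rank-K approximation (Eckart-Young):

```python
import numpy as np

X = np.array([[2., 0., 1.],
              [1., 1., 0.],
              [0., 2., 3.],
              [0., 1., 2.]])                  # terms x documents (N x M)

T, s, Dt = np.linalg.svd(X, full_matrices=False)  # X = T diag(s) D^T
K = 2
X_k = T[:, :K] @ np.diag(s[:K]) @ Dt[:K, :]       # rank-K reconstruction
```

The columns of `T[:, :K]` span the K-dimensional latent semantic space; documents are compared by their coordinates `np.diag(s[:K]) @ Dt[:K, :]` rather than by raw term overlap.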

Our Contribution

Assumption: people tend to send e-mails to different groups of people during different time periods.

- Social (who knows who): people influence each other; information flows.
- Content (who knows what).
- Time: networks grow and decay; information diffuses.

Approach:
- Identify context: with whom does a user communicate regarding a given topic?
- Identify temporal evolution: how do relations change over time?

This leads to the Content-Time-Relation (CTR) model.

Content-Time-Relation Algorithm -- I

Content-Relation (CR): content topic classification integrated with a social network model. It combines content and social relation information with Dirichlet allocations, a causal Bayesian network, and an Exponential Random Graph social network model (p* model).

Author-Recipient-Topic [McCallum et al. 2005], given the sender and the set of receivers of an e-mail:
1. Pick a receiver.
2. Get the probability of a topic given the sender and receiver.
3. Get the probability of a word given the topic.

CR model, given the sender of an e-mail:
1. Get the probability of a topic given the sender.
2. Get the probability of the receiver given the sender and the topic.
3. Get the probability of a word given the topic.

(Plate diagram notation: u/a: sender/author, z: topic, r: receivers, w: content words, N: word set, T: topics, D: documents/e-mails, S: social network.)

Content-Time-Relation Algorithm -- II

Content-Time-Relation (CTR): topic + time -> event, capturing evolutionary information, again integrated with the social network model. It combines content, time, and social relation information with Dirichlet allocations, a causal Bayesian network, and an Exponential Random Graph social network model.

Given the sender and the time of an e-mail:
1. Get the probability of a topic given the sender.
2. Get the probability of the receiver given the sender and the topic.
3. Get the probability of a word given the topic.

CTR algorithm:
- Training phase. Input: old e-mails with content, sender and receiver information, and time stamps. Output: P(w | z, t_old), P(z | d, t_old), and P(u, r | z, t_old).
- Testing phase. Input: new e-mails with content and time stamps. Output: P(u, r | d, t_new), P(w | z, t_new), and P(z | d, t_new).

Adaptive CTR

Social networks dynamically change and evolve over time, so updating the model with the newest user behavior information is necessary.

Aggregative updating adds new user behavior information, including the senders and receivers, into the model:

P_hat(u, r | d, t_i) = sum_{k=1}^{K} P(u, r | z_k, t_old) P(z_k | d, t_i) + P(u, r | t_old) sum_{z in z_{t_i} \ z_{t_old}} P(z | d, t_i)

Alternatively, assume the correlation between the current data and the previous data decays over time: the more recent the data, the more important it is. A sliding window of size n chooses the data for building the prediction model, so the prediction depends only on recent data, with the influence of old data ignored.

Personal Social Network

A PSN captures whom a user contacts during a certain time period:

P(r | u) = (number of times u sends e-mails to r) / (total number of e-mails sent out by u)

(Panels: (a) Jan-99 to Dec-99, (b) Jan-00 to Jun-00, (c) Jul-00 to Dec-00.)
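The PSN estimate above is just a normalized send count per sender. A minimal sketch on an invented toy log of (sender, receiver) pairs:

```python
from collections import Counter, defaultdict

# P(r | u) = (# of e-mails u sent to r) / (total # of e-mails u sent)
log = [("alice", "bob"), ("alice", "bob"), ("alice", "carol"),
       ("bob", "alice")]

sent = defaultdict(Counter)
for sender, receiver in log:
    sent[sender][receiver] += 1

def p_receiver(u, r):
    """P(r | u): the fraction of u's sent e-mails that went to r."""
    total = sum(sent[u].values())
    return sent[u][r] / total if total else 0.0
```

Restricting `log` to a time window before counting gives the period-specific PSNs shown in the three panels.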

CommunityNet

Given a query (e.g., "Christmas", "Energy"), apply the CTR model and visualize the personal topic community via CommunityNet.

Topic Analysis -- Hot and Cold Topics

Hot topics (regular issues):
- Meeting: meeting, plan, conference, balance, presentation, discussion
- Deal: deal, desk, book, bill, group, explore
- Petroleum: petroleum, research, dear, photo, Enron, station
- Texas: Houston, Texas, Enron, north America, street
- Document: letter, draft, attach, comment, review, mark

Cold topics (specific or sensitive issues):
- Trade: trade, London, bank, name, Mexico, conserve
- Stock: stock, earn, company, share, price, new
- Network: network, world, user, save, secure, system
- Project: court, state, India, server, project, govern
- Market: call, market, week, trade, description, respond

Topic Trends -- Yearly Repeating Events

Popularity of Topic 45 (y2000, y2001) and Topic 19 (y2000, y2001), plotted Jan through Nov. Topic 45, which concerns a scheduling issue, reaches a peak during June to September. Topic 19 concerns a meeting issue. The trends repeat from year to year.

CTR Model Finds Topic Categories, Key People, and Communities Simultaneously

Topic 61, "California Power", with its topic trend plotted Jan-00 through Oct-01. Key words include power, California, electrical, price, energy, generator, market, until, with weights such as 0.8816, 0.5594, and 0.3681.

Key people: Jeff Dasovich, James Steffes, Richard Shapiro, Mary Hain, Richard Sanders, Steven Kean.

Event: the California Energy Crisis occurred at exactly this time period, and the key people can be identified as active in this event.

Personal Topic Trends of California Power

Popularity of the topic over Jan-00 to Sep-01: the overall trend versus Jeff Dasovich and Vince Kaminski.

Predicting Receivers

- Personal social network (PSN): people tend to send e-mails to the same group of people.
- Latent Dirichlet Allocation plus personal social network (LDA-PSN): topic clusters do not change over time.
- Content-Time-Relation (CTR) model.
- Adaptive CTR model: aggregative, or with a 6-month sliding window.

The methods are compared over Jan-01 to Nov-01 using Breese evaluation metrics.

CTR Model: Predicting Receivers

Is a person's behavior predictable? Jeff Dasovich (Enron government relations executive): with whom should I discuss government issues? Prediction accuracy over Jan-01 to Nov-01 is compared for PSN, LDA-PSN, CTR, and adaptive CTR. Personal behavior and intention are somewhat predictable.

Conclusions and Ongoing Work

Conclusions:
- Automatically model and predict human behavior of receiving and disseminating information.
- Establish personal CommunityNet profiles based on the Content-Time-Relation algorithm, which incorporates contact, content, and time information simultaneously from personal communications.
- Explore many interesting results: finding the most important employees in events; predicting senders or receivers of e-mails.
- Perform better than both the social network-based and the content-based predictions.
- Personal behavior and intention are somewhat predictable.

Ongoing work:
- Incorporate nonparametric Bayesian methods, such as hierarchical LDA, with contact and time information.
- Extend the CTR model to a Content-Time-Context model for personalized retrieval and recommendation.

Personalized Recommendation Driven by Information Flow
Xiaodan Song, Belle Tseng, Ching-Yung Lin and Ming-Ting Sun -- SIGIR 2006

Recommendation by Collaborative Filtering (CF)

Given that user A adopts an item, infer whether user B will adopt it; given that B adopts, infer whether A will. Both inferences rest on "people with similar tastes" -- similarity is symmetric.

Adoptions Follow a Sequence

Number of accessed users over time (Apr. 2004 to Jul. 2005): some users are early adopters, others late adopters.

Rogers' Diffusion of Innovations Theory

Percentage over all adopters: innovators, early adopters, early majority, late majority, laggards. Users' adoption patterns differ: some users tend to adopt items earlier than others.

Recommendation Driven by Information Flow -- An Intuitive Example

If innovators adopt an item, early adopters are most likely to adopt it next; laggards adopting an item says little about whether innovators will. Even among people with similar tastes, influence is not symmetric.

Utilize Information Flow for Personalized Recommendation -- Problem Formulation

The typical CF question: what items will user U like? Our formulation: given that user U adopts item Y, who would be likely to adopt item Y next? Information flows from earlier adopters (innovators) to later adopters (laggards).

Analogy: Information Adoption as a Diffusion Process

Given that user U adopts item Y, who would be likely to adopt item Y next? In physics, a diffusion process is usually related to a random walk [R. Kondor and J.-P. Vert, Diffusion Kernels, 2004]. Information adoption is therefore modeled as a random walk, and users are ranked by the state probabilities.

Scheme Overview (I)

Leverage the asymmetric influence. From a dataset of (user ID, item ID, timestamp) records, build:
- the Information Flow network (IF), modeling the asymmetric influences between users;
- an information propagation model: if a user adopts the information, who will likely be the follower?

Application: personalized recommendation.

Scheme Overview (II)

Adoption patterns are typically category-specific, so a topic detection stage yields a Topic-Sensitive Information Flow network (TIF), modeling the asymmetric influences between users under the same topic.

IF (I)

Objective: model the asymmetric influences between users. The Early Adoption Matrix (EAM) counts, for each ordered pair of users, how many items one user adopted earlier than the other (an N x N pairwise comparison over users 1..N).

IF (II)

The IF is a random walk model over a network in which each user is a node (state), and the value on edge (i -> j) represents how likely user j will follow user i in adopting the information. The EAM is normalized into a transition probability matrix F with entries F_ij.

IF (III)

A random walk over a graph with sinks or isolated cycles does not converge, so F is modified to have a unique stationary distribution (F~ = F + random jump):

1) Make the matrix stochastic:
   F~_{u,v} = F_{u,v} / sum_v F_{u,v}  if sum_v F_{u,v} != 0,  else 1/N

2) Make the matrix irreducible:
   F' = alpha F~ + (1 - alpha) (1/N) e e^T

where N is the number of nodes and e is the all-ones vector.
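The two fixes are the same device used by PageRank. A hedged sketch (the counts and alpha below are illustrative, not from the paper):

```python
import numpy as np

def fix_transition(EAM, alpha=0.85):
    """Row-normalize EAM; give empty rows the uniform distribution
    (stochastic), then mix every row with a uniform jump (irreducible)."""
    F = EAM.astype(float)
    n = F.shape[0]
    row_sums = F.sum(axis=1, keepdims=True)
    safe = np.where(row_sums > 0, row_sums, 1.0)    # avoid 0/0 on sink rows
    F = np.where(row_sums > 0, F / safe, 1.0 / n)
    return alpha * F + (1 - alpha) / n              # alpha*F~ + (1-alpha)/n * ee^T

# toy EAM: user 3 never adopted earlier than anyone (a sink row)
F = fix_transition(np.array([[0, 3, 1],
                             [2, 0, 0],
                             [0, 0, 0]]))
```

After the fix every row sums to one and every entry is positive, so the walk has a unique stationary distribution.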

TIF

Topic detection via Latent Dirichlet Allocation [Blei et al. 2003] splits the adoption data by topic, and a separate information flow network is built for each topic, giving the TIF.

Information Propagation Models (I)

1. Summation of various propagation steps:

F_if(m) = F + F^2 + ... + F^m

As a special case, when m = N - 1 (N the number of nodes), F_if(N - 1) can be evaluated in closed form through the eigendecomposition of F (U the eigenvector matrix).
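The step-summation model can be sketched directly from its definition; the toy row-stochastic matrix below is illustrative:

```python
import numpy as np

def propagate_sum(F, m):
    """F_if(m) = F + F^2 + ... + F^m: influence reachable in up to m steps,
    where F^k weights followers exactly k adoption hops away."""
    total = np.zeros_like(F)
    Fk = np.eye(F.shape[0])
    for _ in range(m):
        Fk = Fk @ F            # F^k
        total += Fk
    return total

F = np.array([[0.0, 1.0],
              [0.5, 0.5]])
F2 = propagate_sum(F, 2)       # F + F^2
```

Row u of the result scores every other user as a potential follower of u; triggering the earliest adopters and reading off their rows yields the recommendation ranking.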

Information Propagation Models (II)

2. Exponentially weighted summation. The longer the path, the less reliable it is:

F_if(exp) = exp(beta F) = U diag(exp(beta lambda_1), exp(beta lambda_2), ..., exp(beta lambda_N)) U^T

where the eigenvalues satisfy lambda_1 = 1 > lambda_2 > ... > lambda_N, and N is the number of nodes.

Personalized Recommendation

Construct the IF or TIF network from the historical data, trigger the earliest adopting users to start the process, and predict who else will be interested in these items via the information propagation models.
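A hedged sketch of the exponentially weighted model, computed here from the matrix-exponential power series (the eigendecomposition on the slide yields the same matrix); beta and F are illustrative:

```python
import numpy as np

def expm_series(F, beta, terms=50):
    """exp(beta * F) via its power series: longer paths F^k receive the
    geometrically shrinking weight beta^k / k!."""
    out = np.eye(F.shape[0])
    Fk = np.eye(F.shape[0])
    fact = 1.0
    for k in range(1, terms):
        Fk = Fk @ F
        fact *= k
        out += (beta ** k / fact) * Fk
    return out

# diagonal toy case, where exp(beta*F) is just an elementwise exponential
F = np.diag([1.0, 0.5])
E = expm_series(F, 2.0)
```

In practice `scipy.linalg.expm` computes the same quantity more robustly; the series form is shown only to make the path-length weighting explicit.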

Experimental Setup

Sales-force dataset: Apr. 2004 to Apr. 2005 as training data, May 2005 to Jul. 2005 as test data; 133 users, 586 documents.

MovieLens dataset: 943 users, 1682 movies, 100,000 actions. The log data for the earliest 80% of disclosed movies serve as training data, the latest 20% as test data.

Evaluation: baseline is Collaborative Filtering (CF); metrics are precision and recall.

Consistency of Early Adoption Patterns

How consistent are users' pairwise adoption behaviors over time? Calculate transition probability matrices (TPMs) of both the training and test data. For each user i, compute the correlation between:
- the ith row of the test-data TPM and the uniform distribution 1/(N-1) (Baseline 1);
- the ith row of the test-data TPM and the uniform distribution 1/M, where M is the number of users used in CF (Baseline 2);
- the ith rows of the two TPMs (IF).

Histograms of the correlation values (number of users versus correlation value) are shown for both the ER and MovieLens datasets.

Experimental Results -- Recommendation Quality

Precision and recall comparison (number of triggered users = 1, propagation steps = 1) over the number of recommended (retrieved) users, for CF, EABIF, and TEABIF/TIF. Compared to Collaborative Filtering (CF):
- Precision: IF is 91% better, TIF is 108% better.
- Recall: IF is 87% better, TIF is 113% better.

Experimental Results -- Propagation Performance

Precision and recall improvement ratios over the CF baseline (number of triggered users = 1) for EABIF and TEABIF/TIF, across propagation settings m = 1, 2, 3, 4, 5, the full summation, and exponential weighting with beta = 1, 1.5, 2, 3, 4, 5, 8, 16. TIF with exponentially weighted summation (beta = 4) achieves the best performance: it improves 136% on precision and 126% on recall compared to CF.

Experimental Results -- Recommendation Quality (cont'd)

Precision and recall comparisons for number of triggered users = 1 and = 2 (propagation steps = 1), over the number of recommended users, for CF, EABIF, and TEABIF/TIF. Compared to Collaborative Filtering (CF), precision: IF is 91% better, TIF is 108% better; recall: IF is 87% better, TIF is 113% better.

Conclusions and Next Steps

Conclusions:
- Utilize sequential adoption patterns.
- Leverage asymmetric influences between users (IF).
- Leverage category-specific patterns (TIF).
- Identify how information flows through the network (information propagation models).

Next steps:
- Leverage the diffusion rate.
- Improve the information propagation models.
- Evaluate with an online user study.


More information

Recommendation System for Location-based Social Network CS224W Project Report

Recommendation System for Location-based Social Network CS224W Project Report Recommendation System for Location-based Social Network CS224W Project Report Group 42, Yiying Cheng, Yangru Fang, Yongqing Yuan 1 Introduction With the rapid development of mobile devices and wireless

More information

Practical Machine Learning Agenda

Practical Machine Learning Agenda Practical Machine Learning Agenda Starting From Log Management Moving To Machine Learning PunchPlatform team Thales Challenges Thanks 1 Starting From Log Management 2 Starting From Log Management Data

More information

Part 11: Collaborative Filtering. Francesco Ricci

Part 11: Collaborative Filtering. Francesco Ricci Part : Collaborative Filtering Francesco Ricci Content An example of a Collaborative Filtering system: MovieLens The collaborative filtering method n Similarity of users n Methods for building the rating

More information

Part 1: Link Analysis & Page Rank

Part 1: Link Analysis & Page Rank Chapter 8: Graph Data Part 1: Link Analysis & Page Rank Based on Leskovec, Rajaraman, Ullman 214: Mining of Massive Datasets 1 Graph Data: Social Networks [Source: 4-degrees of separation, Backstrom-Boldi-Rosa-Ugander-Vigna,

More information

Spatial Latent Dirichlet Allocation

Spatial Latent Dirichlet Allocation Spatial Latent Dirichlet Allocation Xiaogang Wang and Eric Grimson Computer Science and Computer Science and Artificial Intelligence Lab Massachusetts Tnstitute of Technology, Cambridge, MA, 02139, USA

More information

Weighted Alternating Least Squares (WALS) for Movie Recommendations) Drew Hodun SCPD. Abstract

Weighted Alternating Least Squares (WALS) for Movie Recommendations) Drew Hodun SCPD. Abstract Weighted Alternating Least Squares (WALS) for Movie Recommendations) Drew Hodun SCPD Abstract There are two common main approaches to ML recommender systems, feedback-based systems and content-based systems.

More information

CS229 Final Project: Predicting Expected Response Times

CS229 Final Project: Predicting Expected  Response Times CS229 Final Project: Predicting Expected Email Response Times Laura Cruz-Albrecht (lcruzalb), Kevin Khieu (kkhieu) December 15, 2017 1 Introduction Each day, countless emails are sent out, yet the time

More information

K-Means and Gaussian Mixture Models

K-Means and Gaussian Mixture Models K-Means and Gaussian Mixture Models David Rosenberg New York University June 15, 2015 David Rosenberg (New York University) DS-GA 1003 June 15, 2015 1 / 43 K-Means Clustering Example: Old Faithful Geyser

More information

Cybersecurity is a Team Sport

Cybersecurity is a Team Sport Cybersecurity is a Team Sport Cyber Security Summit at Loyola Marymount University - October 22 2016 Dr. Robert Pittman, CISM Chief Information Security Officer National Cyber Security Awareness Month

More information

CS281 Section 9: Graph Models and Practical MCMC

CS281 Section 9: Graph Models and Practical MCMC CS281 Section 9: Graph Models and Practical MCMC Scott Linderman November 11, 213 Now that we have a few MCMC inference algorithms in our toolbox, let s try them out on some random graph models. Graphs

More information

Time Series Analysis by State Space Methods

Time Series Analysis by State Space Methods Time Series Analysis by State Space Methods Second Edition J. Durbin London School of Economics and Political Science and University College London S. J. Koopman Vrije Universiteit Amsterdam OXFORD UNIVERSITY

More information

Using Machine Learning to Optimize Storage Systems

Using Machine Learning to Optimize Storage Systems Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation

More information

Matrix Co-factorization for Recommendation with Rich Side Information HetRec 2011 and Implicit 1 / Feedb 23

Matrix Co-factorization for Recommendation with Rich Side Information HetRec 2011 and Implicit 1 / Feedb 23 Matrix Co-factorization for Recommendation with Rich Side Information and Implicit Feedback Yi Fang and Luo Si Department of Computer Science Purdue University West Lafayette, IN 47906, USA fangy@cs.purdue.edu

More information

MIND THE GOOGLE! Understanding the impact of the. Google Knowledge Graph. on your shopping center website.

MIND THE GOOGLE! Understanding the impact of the. Google Knowledge Graph. on your shopping center website. MIND THE GOOGLE! Understanding the impact of the Google Knowledge Graph on your shopping center website. John Dee, Chief Operating Officer PlaceWise Media Mind the Google! Understanding the Impact of the

More information

General Instructions. Questions

General Instructions. Questions CS246: Mining Massive Data Sets Winter 2018 Problem Set 2 Due 11:59pm February 8, 2018 Only one late period is allowed for this homework (11:59pm 2/13). General Instructions Submission instructions: These

More information

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of

More information

Lecture 27: Learning from relational data

Lecture 27: Learning from relational data Lecture 27: Learning from relational data STATS 202: Data mining and analysis December 2, 2017 1 / 12 Announcements Kaggle deadline is this Thursday (Dec 7) at 4pm. If you haven t already, make a submission

More information

CS224W: Social and Information Network Analysis Project Report: Edge Detection in Review Networks

CS224W: Social and Information Network Analysis Project Report: Edge Detection in Review Networks CS224W: Social and Information Network Analysis Project Report: Edge Detection in Review Networks Archana Sulebele, Usha Prabhu, William Yang (Group 29) Keywords: Link Prediction, Review Networks, Adamic/Adar,

More information

Sensor Tasking and Control

Sensor Tasking and Control Sensor Tasking and Control Outline Task-Driven Sensing Roles of Sensor Nodes and Utilities Information-Based Sensor Tasking Joint Routing and Information Aggregation Summary Introduction To efficiently

More information

CPSC 340: Machine Learning and Data Mining. Probabilistic Classification Fall 2017

CPSC 340: Machine Learning and Data Mining. Probabilistic Classification Fall 2017 CPSC 340: Machine Learning and Data Mining Probabilistic Classification Fall 2017 Admin Assignment 0 is due tonight: you should be almost done. 1 late day to hand it in Monday, 2 late days for Wednesday.

More information

Algorithms and Applications in Social Networks. 2017/2018, Semester B Slava Novgorodov

Algorithms and Applications in Social Networks. 2017/2018, Semester B Slava Novgorodov Algorithms and Applications in Social Networks 2017/2018, Semester B Slava Novgorodov 1 Lesson #1 Administrative questions Course overview Introduction to Social Networks Basic definitions Network properties

More information

Centralities (4) By: Ralucca Gera, NPS. Excellence Through Knowledge

Centralities (4) By: Ralucca Gera, NPS. Excellence Through Knowledge Centralities (4) By: Ralucca Gera, NPS Excellence Through Knowledge Some slide from last week that we didn t talk about in class: 2 PageRank algorithm Eigenvector centrality: i s Rank score is the sum

More information

Link Structure Analysis

Link Structure Analysis Link Structure Analysis Kira Radinsky All of the following slides are courtesy of Ronny Lempel (Yahoo!) Link Analysis In the Lecture HITS: topic-specific algorithm Assigns each page two scores a hub score

More information

10-701/15-781, Fall 2006, Final

10-701/15-781, Fall 2006, Final -7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly

More information

Graph Exploitation Testbed

Graph Exploitation Testbed Graph Exploitation Testbed Peter Jones and Eric Robinson Graph Exploitation Symposium April 18, 2012 This work was sponsored by the Office of Naval Research under Air Force Contract FA8721-05-C-0002. Opinions,

More information

10701 Machine Learning. Clustering

10701 Machine Learning. Clustering 171 Machine Learning Clustering What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally, finding natural groupings among

More information

Diffusion and Clustering on Large Graphs

Diffusion and Clustering on Large Graphs Diffusion and Clustering on Large Graphs Alexander Tsiatas Thesis Proposal / Advancement Exam 8 December 2011 Introduction Graphs are omnipresent in the real world both natural and man-made Examples of

More information

Link Prediction for Social Network

Link Prediction for Social Network Link Prediction for Social Network Ning Lin Computer Science and Engineering University of California, San Diego Email: nil016@eng.ucsd.edu Abstract Friendship recommendation has become an important issue

More information

Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting

Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting Yaguang Li Joint work with Rose Yu, Cyrus Shahabi, Yan Liu Page 1 Introduction Traffic congesting is wasteful of time,

More information

Collaborative Filtering using Weighted BiPartite Graph Projection A Recommendation System for Yelp

Collaborative Filtering using Weighted BiPartite Graph Projection A Recommendation System for Yelp Collaborative Filtering using Weighted BiPartite Graph Projection A Recommendation System for Yelp Sumedh Sawant sumedh@stanford.edu Team 38 December 10, 2013 Abstract We implement a personal recommendation

More information

Airside Congestion. Airside Congestion

Airside Congestion. Airside Congestion Airside Congestion Amedeo R. Odoni T. Wilson Professor Aeronautics and Astronautics Civil and Environmental Engineering Massachusetts Institute of Technology Objectives Airside Congestion _ Introduce fundamental

More information

Lecture 8: Linkage algorithms and web search

Lecture 8: Linkage algorithms and web search Lecture 8: Linkage algorithms and web search Information Retrieval Computer Science Tripos Part II Ronan Cummins 1 Natural Language and Information Processing (NLIP) Group ronan.cummins@cl.cam.ac.uk 2017

More information

Community-Based Recommendations: a Solution to the Cold Start Problem

Community-Based Recommendations: a Solution to the Cold Start Problem Community-Based Recommendations: a Solution to the Cold Start Problem Shaghayegh Sahebi Intelligent Systems Program University of Pittsburgh sahebi@cs.pitt.edu William W. Cohen Machine Learning Department

More information

Recurrent Neural Network (RNN) Industrial AI Lab.

Recurrent Neural Network (RNN) Industrial AI Lab. Recurrent Neural Network (RNN) Industrial AI Lab. For example (Deterministic) Time Series Data Closed- form Linear difference equation (LDE) and initial condition High order LDEs 2 (Stochastic) Time Series

More information

CSE 258 Lecture 8. Web Mining and Recommender Systems. Extensions of latent-factor models, (and more on the Netflix prize)

CSE 258 Lecture 8. Web Mining and Recommender Systems. Extensions of latent-factor models, (and more on the Netflix prize) CSE 258 Lecture 8 Web Mining and Recommender Systems Extensions of latent-factor models, (and more on the Netflix prize) Summary so far Recap 1. Measuring similarity between users/items for binary prediction

More information

Information Networks: PageRank

Information Networks: PageRank Information Networks: PageRank Web Science (VU) (706.716) Elisabeth Lex ISDS, TU Graz June 18, 2018 Elisabeth Lex (ISDS, TU Graz) Links June 18, 2018 1 / 38 Repetition Information Networks Shape of the

More information

Predicting Messaging Response Time in a Long Distance Relationship

Predicting Messaging Response Time in a Long Distance Relationship Predicting Messaging Response Time in a Long Distance Relationship Meng-Chen Shieh m3shieh@ucsd.edu I. Introduction The key to any successful relationship is communication, especially during times when

More information

Spectral Methods for Network Community Detection and Graph Partitioning

Spectral Methods for Network Community Detection and Graph Partitioning Spectral Methods for Network Community Detection and Graph Partitioning M. E. J. Newman Department of Physics, University of Michigan Presenters: Yunqi Guo Xueyin Yu Yuanqi Li 1 Outline: Community Detection

More information

Imputation for missing observation through Artificial Intelligence. A Heuristic & Machine Learning approach

Imputation for missing observation through Artificial Intelligence. A Heuristic & Machine Learning approach Imputation for missing observation through Artificial Intelligence A Heuristic & Machine Learning approach (Test case with macroeconomic time series from the BIS Data Bank) Byeungchun Kwon Bank for International

More information

Probabilistic Modeling of Leach Protocol and Computing Sensor Energy Consumption Rate in Sensor Networks

Probabilistic Modeling of Leach Protocol and Computing Sensor Energy Consumption Rate in Sensor Networks Probabilistic Modeling of Leach Protocol and Computing Sensor Energy Consumption Rate in Sensor Networks Dezhen Song CS Department, Texas A&M University Technical Report: TR 2005-2-2 Email: dzsong@cs.tamu.edu

More information

Text Modeling with the Trace Norm

Text Modeling with the Trace Norm Text Modeling with the Trace Norm Jason D. M. Rennie jrennie@gmail.com April 14, 2006 1 Introduction We have two goals: (1) to find a low-dimensional representation of text that allows generalization to

More information

E6885 Network Science Lecture 10: Graph Database (II)

E6885 Network Science Lecture 10: Graph Database (II) E 6885 Topics in Signal Processing -- Network Science E6885 Network Science Lecture 10: Graph Database (II) Ching-Yung Lin, Dept. of Electrical Engineering, Columbia University November 18th, 2013 Course

More information

ECS289: Scalable Machine Learning

ECS289: Scalable Machine Learning ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 22, 2016 Course Information Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ ECS289G_Fall2016/main.html My office: Mathematical Sciences

More information

Data fusion and multi-cue data matching using diffusion maps

Data fusion and multi-cue data matching using diffusion maps Data fusion and multi-cue data matching using diffusion maps Stéphane Lafon Collaborators: Raphy Coifman, Andreas Glaser, Yosi Keller, Steven Zucker (Yale University) Part of this work was supported by

More information

Supervised Random Walks

Supervised Random Walks Supervised Random Walks Pawan Goyal CSE, IITKGP September 8, 2014 Pawan Goyal (IIT Kharagpur) Supervised Random Walks September 8, 2014 1 / 17 Correlation Discovery by random walk Problem definition Estimate

More information

Ruslan Salakhutdinov and Geoffrey Hinton. University of Toronto, Machine Learning Group IRGM Workshop July 2007

Ruslan Salakhutdinov and Geoffrey Hinton. University of Toronto, Machine Learning Group IRGM Workshop July 2007 SEMANIC HASHING Ruslan Salakhutdinov and Geoffrey Hinton University of oronto, Machine Learning Group IRGM orkshop July 2007 Existing Methods One of the most popular and widely used in practice algorithms

More information

Extracting Information from Complex Networks

Extracting Information from Complex Networks Extracting Information from Complex Networks 1 Complex Networks Networks that arise from modeling complex systems: relationships Social networks Biological networks Distinguish from random networks uniform

More information

Behavioral Data Mining. Lecture 9 Modeling People

Behavioral Data Mining. Lecture 9 Modeling People Behavioral Data Mining Lecture 9 Modeling People Outline Power Laws Big-5 Personality Factors Social Network Structure Power Laws Y-axis = frequency of word, X-axis = rank in decreasing order Power Laws

More information

Lecture 27, April 24, Reading: See class website. Nonparametric regression and kernel smoothing. Structured sparse additive models (GroupSpAM)

Lecture 27, April 24, Reading: See class website. Nonparametric regression and kernel smoothing. Structured sparse additive models (GroupSpAM) School of Computer Science Probabilistic Graphical Models Structured Sparse Additive Models Junming Yin and Eric Xing Lecture 7, April 4, 013 Reading: See class website 1 Outline Nonparametric regression

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

More information

Additive hedonic regression models for the Austrian housing market ERES Conference, Edinburgh, June

Additive hedonic regression models for the Austrian housing market ERES Conference, Edinburgh, June for the Austrian housing market, June 14 2012 Ao. Univ. Prof. Dr. Fachbereich Stadt- und Regionalforschung Technische Universität Wien Dr. Strategic Risk Management Bank Austria UniCredit, Wien Inhalt

More information

Mobility Models. Larissa Marinho Eglem de Oliveira. May 26th CMPE 257 Wireless Networks. (UCSC) May / 50

Mobility Models. Larissa Marinho Eglem de Oliveira. May 26th CMPE 257 Wireless Networks. (UCSC) May / 50 Mobility Models Larissa Marinho Eglem de Oliveira CMPE 257 Wireless Networks May 26th 2015 (UCSC) May 2015 1 / 50 1 Motivation 2 Mobility Models 3 Extracting a Mobility Model from Real User Traces 4 Self-similar

More information

CS 6604: Data Mining Large Networks and Time-Series

CS 6604: Data Mining Large Networks and Time-Series CS 6604: Data Mining Large Networks and Time-Series Soumya Vundekode Lecture #12: Centrality Metrics Prof. B Aditya Prakash Agenda Link Analysis and Web Search Searching the Web: The Problem of Ranking

More information

VisoLink: A User-Centric Social Relationship Mining

VisoLink: A User-Centric Social Relationship Mining VisoLink: A User-Centric Social Relationship Mining Lisa Fan and Botang Li Department of Computer Science, University of Regina Regina, Saskatchewan S4S 0A2 Canada {fan, li269}@cs.uregina.ca Abstract.

More information

One-Shot Learning with a Hierarchical Nonparametric Bayesian Model

One-Shot Learning with a Hierarchical Nonparametric Bayesian Model One-Shot Learning with a Hierarchical Nonparametric Bayesian Model R. Salakhutdinov, J. Tenenbaum and A. Torralba MIT Technical Report, 2010 Presented by Esther Salazar Duke University June 10, 2011 E.

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

Edge-exchangeable graphs and sparsity

Edge-exchangeable graphs and sparsity Edge-exchangeable graphs and sparsity Tamara Broderick Department of EECS Massachusetts Institute of Technology tbroderick@csail.mit.edu Diana Cai Department of Statistics University of Chicago dcai@uchicago.edu

More information

Lecture #3: PageRank Algorithm The Mathematics of Google Search

Lecture #3: PageRank Algorithm The Mathematics of Google Search Lecture #3: PageRank Algorithm The Mathematics of Google Search We live in a computer era. Internet is part of our everyday lives and information is only a click away. Just open your favorite search engine,

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Lecture #7: Recommendation Content based & Collaborative Filtering Seoul National University In This Lecture Understand the motivation and the problem of recommendation Compare

More information

Link Analysis and Web Search

Link Analysis and Web Search Link Analysis and Web Search Moreno Marzolla Dip. di Informatica Scienza e Ingegneria (DISI) Università di Bologna http://www.moreno.marzolla.name/ based on material by prof. Bing Liu http://www.cs.uic.edu/~liub/webminingbook.html

More information

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017 CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2017 Assignment 3: 2 late days to hand in tonight. Admin Assignment 4: Due Friday of next week. Last Time: MAP Estimation MAP

More information

Query Independent Scholarly Article Ranking

Query Independent Scholarly Article Ranking Query Independent Scholarly Article Ranking Shuai Ma, Chen Gong, Renjun Hu, Dongsheng Luo, Chunming Hu, Jinpeng Huai SKLSDE Lab, Beihang University, China Beijing Advanced Innovation Center for Big Data

More information

CSE 446 Bias-Variance & Naïve Bayes

CSE 446 Bias-Variance & Naïve Bayes CSE 446 Bias-Variance & Naïve Bayes Administrative Homework 1 due next week on Friday Good to finish early Homework 2 is out on Monday Check the course calendar Start early (midterm is right before Homework

More information

Characterization and Modeling of Deleted Questions on Stack Overflow

Characterization and Modeling of Deleted Questions on Stack Overflow Characterization and Modeling of Deleted Questions on Stack Overflow Denzil Correa, Ashish Sureka http://correa.in/ February 16, 2014 Denzil Correa, Ashish Sureka (http://correa.in/) ACM WWW-2014 February

More information

CPSC 340: Machine Learning and Data Mining. Recommender Systems Fall 2017

CPSC 340: Machine Learning and Data Mining. Recommender Systems Fall 2017 CPSC 340: Machine Learning and Data Mining Recommender Systems Fall 2017 Assignment 4: Admin Due tonight, 1 late day for Monday, 2 late days for Wednesday. Assignment 5: Posted, due Monday of last week

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Queries on streams

More information

Today. Lecture 4: Last time. The EM algorithm. We examine clustering in a little more detail; we went over it a somewhat quickly last time

Today. Lecture 4: Last time. The EM algorithm. We examine clustering in a little more detail; we went over it a somewhat quickly last time Today Lecture 4: We examine clustering in a little more detail; we went over it a somewhat quickly last time The CAD data will return and give us an opportunity to work with curves (!) We then examine

More information

Desiging and combining kernels: some lessons learned from bioinformatics

Desiging and combining kernels: some lessons learned from bioinformatics Desiging and combining kernels: some lessons learned from bioinformatics Jean-Philippe Vert Jean-Philippe.Vert@mines-paristech.fr Mines ParisTech & Institut Curie NIPS MKL workshop, Dec 12, 2009. Jean-Philippe

More information

GAMs semi-parametric GLMs. Simon Wood Mathematical Sciences, University of Bath, U.K.

GAMs semi-parametric GLMs. Simon Wood Mathematical Sciences, University of Bath, U.K. GAMs semi-parametric GLMs Simon Wood Mathematical Sciences, University of Bath, U.K. Generalized linear models, GLM 1. A GLM models a univariate response, y i as g{e(y i )} = X i β where y i Exponential

More information

2016 Market Update. Gary Keller and Jay Papasan Keller Williams Realty, Inc.

2016 Market Update. Gary Keller and Jay Papasan Keller Williams Realty, Inc. 2016 Market Update Gary Keller and Jay Papasan Housing Market Cycles 1. Home Sales The Numbers That Drive U.S. 2. Home Price 3. Months Supply of Inventory 4. Mortgage Rates Real Estate 1. Home Sales Nationally

More information

Clustering Lecture 5: Mixture Model

Clustering Lecture 5: Mixture Model Clustering Lecture 5: Mixture Model Jing Gao SUNY Buffalo 1 Outline Basics Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Mixture model Spectral methods Advanced topics

More information

Unsupervised Learning

Unsupervised Learning Networks for Pattern Recognition, 2014 Networks for Single Linkage K-Means Soft DBSCAN PCA Networks for Kohonen Maps Linear Vector Quantization Networks for Problems/Approaches in Machine Learning Supervised

More information

Nigerian Telecommunications (Services) Sector Report Q3 2016

Nigerian Telecommunications (Services) Sector Report Q3 2016 Nigerian Telecommunications (Services) Sector Report Q3 2016 24 NOVEMBER 2016 Telecommunications Data The telecommunications data used in this report were obtained from the National Bureau of Statistics

More information

Applied Bayesian Nonparametrics 5. Spatial Models via Gaussian Processes, not MRFs Tutorial at CVPR 2012 Erik Sudderth Brown University

Applied Bayesian Nonparametrics 5. Spatial Models via Gaussian Processes, not MRFs Tutorial at CVPR 2012 Erik Sudderth Brown University Applied Bayesian Nonparametrics 5. Spatial Models via Gaussian Processes, not MRFs Tutorial at CVPR 2012 Erik Sudderth Brown University NIPS 2008: E. Sudderth & M. Jordan, Shared Segmentation of Natural

More information

Chapter 3: Supervised Learning

Chapter 3: Supervised Learning Chapter 3: Supervised Learning Road Map Basic concepts Evaluation of classifiers Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Summary 2 An example

More information

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016 CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2016 A2/Midterm: Admin Grades/solutions will be posted after class. Assignment 4: Posted, due November 14. Extra office hours:

More information

CS 229 Midterm Review

CS 229 Midterm Review CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask

More information

Machine Learning / Jan 27, 2010

Machine Learning / Jan 27, 2010 Revisiting Logistic Regression & Naïve Bayes Aarti Singh Machine Learning 10-701/15-781 Jan 27, 2010 Generative and Discriminative Classifiers Training classifiers involves learning a mapping f: X -> Y,

More information

Mining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams

Mining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams Mining Data Streams Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction Summarization Methods Clustering Data Streams Data Stream Classification Temporal Models CMPT 843, SFU, Martin Ester, 1-06

More information

Proximity Prestige using Incremental Iteration in Page Rank Algorithm

Proximity Prestige using Incremental Iteration in Page Rank Algorithm Indian Journal of Science and Technology, Vol 9(48), DOI: 10.17485/ijst/2016/v9i48/107962, December 2016 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 Proximity Prestige using Incremental Iteration

More information

Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman

Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman http://www.mmds.org Overview of Recommender Systems Content-based Systems Collaborative Filtering J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive

More information

Bayesian Spherical Wavelet Shrinkage: Applications to Shape Analysis

Bayesian Spherical Wavelet Shrinkage: Applications to Shape Analysis Bayesian Spherical Wavelet Shrinkage: Applications to Shape Analysis Xavier Le Faucheur a, Brani Vidakovic b and Allen Tannenbaum a a School of Electrical and Computer Engineering, b Department of Biomedical

More information

Evaluation of Moving Object Tracking Techniques for Video Surveillance Applications

Evaluation of Moving Object Tracking Techniques for Video Surveillance Applications International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347 5161 2015INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article Evaluation

More information

Optimal designs for comparing curves

Optimal designs for comparing curves Optimal designs for comparing curves Holger Dette, Ruhr-Universität Bochum Maria Konstantinou, Ruhr-Universität Bochum Kirsten Schorning, Ruhr-Universität Bochum FP7 HEALTH 2013-602552 Outline 1 Motivation

More information

Text Document Clustering Using DPM with Concept and Feature Analysis

Text Document Clustering Using DPM with Concept and Feature Analysis Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 2, Issue. 10, October 2013,

More information