Learning Convention Propagation in BeerAdvocate Reviews from a Network Perspective


CS 229 Project Final Report

Abstract

We look at the way conventions propagate between reviews in the BeerAdvocate dataset, and try to predict whether a specific convention will be adopted by a user in an upcoming review. Learning and prediction of convention adoption are based on the exposure of a review (that is, of a reviewer at a specific point in time) to the convention. In this project, we define the criteria for exposure of one review to another, which in turn define an implicit network structure over the reviews. We then use features extracted from this network to learn and predict convention adoption.

1. Introduction

BeerAdvocate is a website on which users write reviews of various brands of beer. The BeerAdvocate dataset contains over 1.5 million review records, written by more than 33K users. Some of the users adopt a unique jargon in their review text, and use certain conventions (specific words, phrases or abbreviations) which are shared by multiple reviewers and across multiple beers. In the scope of this project, a convention is an element of a pre-defined set of pieces of text $C$. We do not address the semantic meaning of a convention, or what makes a piece of text become a convention.

In this project, we look at the binary classification problem of learning when a convention $c \in C$ is used ("adopted"), and try to predict whether a new review $r$ would adopt that convention. In addition to its content, a review $r$ is characterized by three components: the reviewer $\mathrm{user}(r)$, the product $\mathrm{beer}(r)$, and the time of the review $\mathrm{time}(r)$. The hypothesis behind this project is that high exposure to a convention $c$ by $\mathrm{user}(r)$ while reviewing $\mathrm{beer}(r)$ at $\mathrm{time}(r)$ increases the likelihood of review $r$ adopting $c$.

One key question is how to define exposure in a review website such as BeerAdvocate. In problems that deal with information propagation in networks (e.g. social networks), the network is given in advance and determines whether a node (which in most cases represents a user) is exposed to another node. In contrast, a review website such as BeerAdvocate does not explicitly define a network structure. Instead, it is up to us to define when two reviews (or two reviewers) are exposed to one another, and at what times. The definition of exposure then defines an underlying exposure network that can be used to reason about information propagation between its nodes.

The established exposure relations between reviews, and the network structure they define, are used to define features for learning and prediction of convention adoption by new reviews that are added to the network. There are three categories of features (attributes) we use for convention adoption learning: the extent of the exposure the review has to the convention, user bias, and convention bias. User bias captures the tendency of the user to adopt conventions, and convention bias captures the tendency of the convention to be adopted. Features related to the embedding of the sub-graph induced by the convention-adopting reviews within the general exposure graph are included under convention bias as an implicit measure of correlation between exposure and convention propagation for that convention.

2. Exposure Network Model

Definition: a review $r$ is exposed to review $r'$ if $r'$ is either an earlier review by $\mathrm{user}(r)$, or one of the $k$ preceding reviews of $\mathrm{beer}(r)$. Review $r$ is exposed to a convention $c$ if one of the reviews $r$ is exposed to uses $c$. Note that there is no requirement for usage of $c$ by $r$ itself. Formally, the set of reviews $r$ is exposed to is $\mathrm{Exp}[r] = \mathrm{Exp}_U[r] \cup \mathrm{Exp}_B[r]$, where:

$$\mathrm{Exp}_U[r] = \{\text{review } x \mid \mathrm{user}(r) = \mathrm{user}(x) \wedge \mathrm{time}(x) < \mathrm{time}(r)\}$$

$$\mathrm{Exp}_B[r] = \{\text{review } x \mid \mathrm{beer}(r) = \mathrm{beer}(x) \wedge \mathrm{rank}(r) - k \le \mathrm{rank}(x) < \mathrm{rank}(r)\}$$

Here $\mathrm{rank}(x)$ is the chronological rank of review $x$ among the reviews of $\mathrm{beer}(x)$. Note that $\mathrm{rank}(x) < \mathrm{rank}(r)$ also implies $\mathrm{time}(x) < \mathrm{time}(r)$.

The reasoning behind this definition is that exposure between reviews originates either from the reviewer's own previous usage of a convention ("user-based exposure") or from contagion from one of the immediately preceding reviews of the same product, which are immediately visible to the reviewer ("product-based exposure"). We set the product exposure parameter $k$ to 25, which is the number of reviews on a page of the BeerAdvocate website.

While the above definition is binary ($r$ is either exposed to $r'$ or not), the extent of exposure of $r$ to the reviews is not uniform. In the case of same-user exposure it is a decreasing function of the time difference between the reviews (reusing a convention from a recent review is more probable than reusing one from an ancient review). In the case of same-product exposure it is a decreasing function of the rank difference between the reviews (the reviews close in rank to the current review are more easily visible to the reviewer, and probably more relevant).

The above definition of exposure induces a directed network structure $G = (N, E)$ over the dataset, where the nodes represent reviews, and an edge $(r' \to r)$ exists in the network if and only if review $r$ is exposed to review $r'$. The extent of exposure of $r$ to $r'$ then defines a weight for the edge $(r' \to r)$, which is a decreasing function of the time or rank difference between $r$ and $r'$. We noticed that the likelihood of convention propagation decreases far more drastically for product-based exposure as the rank difference increases than it does for user-based exposure as the time difference increases.
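The exposure definition above can be sketched in a few lines of Python. This is only an illustrative sketch, not the report's SNAP-based implementation: the `Review` record, its field names and the function `exposure_sets` are hypothetical, and timestamps are assumed to be distinct so that chronological order is unambiguous.

```python
from collections import defaultdict
from typing import NamedTuple


class Review(NamedTuple):
    rid: int    # hypothetical review id
    user: str   # user(r)
    beer: str   # beer(r)
    time: int   # time(r), e.g. a Unix timestamp (assumed distinct)


def exposure_sets(reviews, k=25):
    """Return (exp_u, exp_b): dicts mapping a review id to the ids it is exposed to.

    exp_u[r]: all earlier reviews by the same user (user-based exposure).
    exp_b[r]: the k preceding reviews of the same beer (product-based exposure).
    Assumes k >= 1.
    """
    by_user = defaultdict(list)  # earlier review ids per user, chronological
    by_beer = defaultdict(list)  # earlier review ids per beer; list index = rank
    exp_u, exp_b = {}, {}
    for r in sorted(reviews, key=lambda rv: rv.time):
        exp_u[r.rid] = list(by_user[r.user])
        exp_b[r.rid] = by_beer[r.beer][-k:]
        by_user[r.user].append(r.rid)
        by_beer[r.beer].append(r.rid)
    return exp_u, exp_b


reviews = [
    Review(0, "ann", "ipa", 1),
    Review(1, "bob", "ipa", 2),
    Review(2, "ann", "stout", 3),
    Review(3, "ann", "ipa", 4),
]
exp_u, exp_b = exposure_sets(reviews, k=1)
```

With `k=1`, review 3 is exposed to reviews 0 and 2 through its author and to review 1 through the beer page. Each pair in `exp_u` or `exp_b` corresponds to one directed edge of the exposure network.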
To reflect this difference in decay rates, the edge weights are modeled as follows:

$$w(r' \to r) = \begin{cases} \dfrac{1}{\mathrm{time}(r) - \mathrm{time}(r')} & \text{if } (r' \to r) \text{ is due to user-based exposure} \\[2ex] \exp\bigl(-(\mathrm{rank}(r) - \mathrm{rank}(r'))\bigr) & \text{if } (r' \to r) \text{ is due to product-based exposure} \end{cases}$$

The exposure and network model discussed here deals with reviews as basic entities (i.e. nodes in the network), and not with reviewers. This is important in order to incorporate temporal considerations into the model. A review is an instantaneous event, and exposure for the purpose of convention propagation is only relevant at the moment of the review. Thus, we cannot discuss absolute exposure between reviewers (who write multiple reviews at different times), but only exposure at specific times, which is equivalent to discussing exposure between reviews.

3. Features

The basic features (attributes) we use for learning and predicting adoption of convention $c$ are described in the table below. They are divided into three categories: the extent of exposure of the review $r$ to convention $c$ (features 1, 2), the bias of $\mathrm{user}(r)$ at $\mathrm{time}(r)$ (features 3, 4), and the bias of the convention $c$ at $\mathrm{time}(r)$ (features 5-9). The last category also includes features that capture the embedding of the sub-graph $G_c$, induced by the reviews that adopted $c$, within the general exposure network $G$ (features 7-9). These features capture the continuity and linearity of the spread of the convention within the exposure network, as an implicit measure of correlation between exposure and convention propagation for that convention.

For a more compact formulation, the following sets are defined:

$$T_r = \{x \mid \mathrm{time}(x) < \mathrm{time}(r)\}$$

$$\forall c \in C:\quad S[r, c] = \{x \mid x \in T_r \wedge x \text{ uses } c\}$$

$$E_r = \{(x, y) \in T_r \times T_r \mid (x \to y) \in E\}$$

$$\mathrm{In}[x] = \{y \mid (y \to x) \in E\}, \qquad \mathrm{Out}[x] = \{y \mid (x \to y) \in E\}$$

These sets are in turn used for feature formulation.
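As a rough illustration of the edge-weight model from Section 2 and of how it feeds the exposure-extent features (features 1 and 2 in the table), the sketch below computes a weighted adoption fraction. It assumes the inverse-time form for user-based edges and the exponential-rank form for product-based edges. The function names, the `(uses_convention, weight)` input format and the choice to return 0.0 for a review with no exposures are assumptions made for illustration only.

```python
import math


def edge_weight(kind, diff):
    """Weight of an exposure edge r' -> r.

    kind: "user"    -> diff is time(r) - time(r'), weight 1 / diff
          "product" -> diff is rank(r) - rank(r'), weight exp(-diff)
    """
    if diff <= 0:
        raise ValueError("the exposing review must precede the exposed one")
    if kind == "user":
        return 1.0 / diff
    if kind == "product":
        return math.exp(-diff)
    raise ValueError(f"unknown exposure kind: {kind!r}")


def exposure_score(exposures):
    """Weighted fraction of exposing reviews that use the convention.

    exposures: iterable of (uses_convention, weight) pairs, one per
    incoming exposure edge of the review being scored.
    Returns 0.0 when there are no exposures (an assumption).
    """
    exposures = list(exposures)
    total = sum(w for _, w in exposures)
    if total == 0:
        return 0.0
    return sum(w for uses, w in exposures if uses) / total


# Two earlier reviews by the same user: one a single time unit ago that
# uses the convention, one four time units ago that does not.
user_exposures = [(True, edge_weight("user", 1)), (False, edge_weight("user", 4))]
score = exposure_score(user_exposures)
```

The recent adopting review dominates here because its weight (1.0) is four times that of the older review (0.25). The product-based score is obtained the same way, with rank differences and `kind="product"`.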

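| # | Description | Formulation |
|---|---|---|
| 1 | Extent of user-based exposure of review $r$ to convention $c$ | $\mathrm{score}(r, c) = \dfrac{\sum_{x \in \mathrm{Exp}_U[r]} \mathbb{1}\{x \text{ uses } c\}\, w(x \to r)}{\sum_{x \in \mathrm{Exp}_U[r]} w(x \to r)}$ |
| 2 | Extent of product-based exposure of review $r$ to convention $c$ | $\mathrm{score}(r, c) = \dfrac{\sum_{x \in \mathrm{Exp}_B[r]} \mathbb{1}\{x \text{ uses } c\}\, w(x \to r)}{\sum_{x \in \mathrm{Exp}_B[r]} w(x \to r)}$ |
| 3 | The fraction of conventions adopted by $\mathrm{user}(r)$ up to $\mathrm{time}(r)$ | $\mathrm{score}(r) = \dfrac{\lvert \{c \in C : \exists x \in \mathrm{Exp}_U[r],\ x \text{ uses } c\} \rvert}{\lvert C \rvert}$ |
| 4 | The fraction of reviews by $\mathrm{user}(r)$ to adopt a convention up to $\mathrm{time}(r)$, maximized over all possible conventions | $\mathrm{score}(r) = \max_{c \in C} \dfrac{\lvert \{x \in \mathrm{Exp}_U[r] : x \text{ uses } c\} \rvert}{\lvert \mathrm{Exp}_U[r] \rvert}$ |
| 5 | Likelihood of the convention $c$ to get adopted at $\mathrm{time}(r)$: the fraction of reviews that adopted $c$ by $\mathrm{time}(r)$ | $\mathrm{score}(r, c) = \dfrac{\lvert S[r, c] \rvert}{\lvert T_r \rvert}$ |
| 6 | The likelihood of propagation of $c$ given that a review is exposed to $c$: the weighted fraction at $\mathrm{time}(r)$ of exposures (edges in the network) that represent propagations of $c$ | $\mathrm{score}(r, c) = \dfrac{\sum_{(x, y) \in E_r} \mathbb{1}\{x \text{ uses } c \wedge y \text{ uses } c\}\, w(x \to y)}{\sum_{(x, y) \in E_r} w(x \to y)}$ |
| 7 | The fraction of adoptions at $\mathrm{time}(r)$ that serve as sources in $G_c$ (i.e. start a propagation flow) | $\mathrm{score}(r, c) = \dfrac{1}{\lvert S[r, c] \rvert} \sum_{x \in S[r, c]} \mathbb{1}\{\mathrm{In}[x] \cap S[r, c] = \emptyset\}$ |
| 8 | The fraction of adoptions at $\mathrm{time}(r)$ that serve as sinks in $G_c$ (i.e. end a propagation flow) | $\mathrm{score}(r, c) = \dfrac{1}{\lvert S[r, c] \rvert} \sum_{x \in S[r, c]} \mathbb{1}\{\mathrm{Out}[x] \cap S[r, c] = \emptyset\}$ |
| 9 | Average propagation fan-out at $\mathrm{time}(r)$: the average fraction of adopters among the out-neighbors of an adopter | $\mathrm{score}(r, c) = \dfrac{1}{\lvert S[r, c] \rvert} \sum_{x \in S[r, c]} \dfrac{\lvert \mathrm{Out}[x] \cap S[r, c] \rvert}{\lvert \mathrm{Out}[x] \rvert}$ |

The exposure network $G$ is a massive graph of over 1.5M nodes and over 600M edges, with attribute data on both nodes and edges. Even in a compact binary representation, the object representing $G$ is over 30GB in size. It is therefore crucial that all feature extraction is performed extremely efficiently. Many of the features depend on a sum of the form

$$\sum_{x \,:\, \mathrm{time}(x) < \mathrm{time}(r)} F(x).$$

When computed naively for every review $r$, such sums require $O(\lvert N \rvert^2)$ steps in total, where $\lvert N \rvert$ is the number of reviews, which makes the computation infeasible for such a large graph.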
However, by visiting reviews in their chronological order and using dynamic programming to incrementally compute the features, we were able to extract all features in linear time $O(\lvert N \rvert)$. For programming convenience and computational efficiency, our analysis only addresses reviews for which all feature values are available by traversing only the edges - i.e. reviews that serve both as a source node and as a destination node of some edge (that is, have in-degree and out-degree of at least one). This still results in a massive dataset of 829,066 reviews.
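The chronological, incremental idea can be illustrated on feature 5, the fraction of earlier reviews that adopted a convention. The sketch below is a toy illustration under stated assumptions, not the report's implementation: reviews are hypothetical `(time, set_of_conventions)` pairs with distinct timestamps, a review with no predecessors gets a score of 0.0, and the initial sort (which costs more than linear time) is assumed to be available up front.

```python
def adoption_fraction_naive(reviews, c):
    """|S[r, c]| / |T_r| for every review, recomputed from scratch: O(n^2)."""
    ordered = sorted(reviews, key=lambda rv: rv[0])
    scores = []
    for t_r, _ in ordered:
        earlier = [used for t_x, used in ordered if t_x < t_r]
        adopters = sum(c in used for used in earlier)
        scores.append(adopters / len(earlier) if earlier else 0.0)
    return scores


def adoption_fraction_incremental(reviews, c):
    """Same values from one chronological pass that keeps running counts."""
    seen = 0      # |T_r|: number of reviews strictly earlier than r
    adopters = 0  # |S[r, c]|: how many of those use convention c
    scores = []
    for _, used in sorted(reviews, key=lambda rv: rv[0]):
        scores.append(adopters / seen if seen else 0.0)
        seen += 1
        adopters += c in used  # bool counts as 0 or 1
    return scores


toy_reviews = [(1, {"A"}), (2, set()), (3, {"A", "skunk"}), (4, set())]
incremental = adoption_fraction_incremental(toy_reviews, "A")
naive = adoption_fraction_naive(toy_reviews, "A")
```

Both functions return the same list; the incremental version does constant work per review after sorting, because the counts for $T_r$ and $S[r, c]$ are carried over from the previous review instead of being rebuilt.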

4. Learning and Evaluation

We processed the dataset, constructed the exposure network $G$, and extracted the feature values using SNAP - a high-performance library for analysis of massive networks (http://snap.stanford.edu/snap/).

We learn adoption of the following set of conventions: A, M, S, T, decent, stick, cream, grass, butter, skunk. The first four are conventions for abbreviations in the BeerAdvocate community (they stand for "Aroma", "Mouthfeel", "Smell" and "Taste"). We only counted the appearances of these abbreviations in the text when they were used as conventions, i.e. when capitalized and followed by a colon or a hyphen. The rest are commonly repeated, non-obvious word roots used in beer descriptions. The frequencies of these conventions across the entire dataset are given in the following table:

| Convention | A | M | S | T | decent | stick | cream | grass | butter | skunk |
|---|---|---|---|---|---|---|---|---|---|---|
| Frequency | 30.79% | 8.03% | 30.3% | 9.98% | .34% | .04% | 8.49% | 7.66% | .7% | .03% |

We partitioned the dataset (829,066 reviews) into a training set containing 70% of the data points (580,346 reviews) and a test set containing the remaining 30% (248,720 reviews). We then trained an SVM (using a liblinear SVM with a linear kernel and L regularization) to learn and predict the adoption of the above conventions using the features extracted from the exposure network.

Our baseline for evaluation was naive prediction based on convention frequency: a prediction scheme that considers each review $r$ and convention $c$ independently and predicts that $r$ would use $c$ with probability $p_c$, which is the frequency of convention $c$ across the entire dataset. We compare accuracy, precision and recall for both prediction methods for each of the conventions. The results are described in the following figures:

[Figures: accuracy, precision and recall of the SVM and of the frequency baseline for each convention]
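The counting rule for conventions described in this section can be sketched with Python's standard `re` module. This is a hedged illustration: the function name `conventions_used` is hypothetical, abbreviations are assumed to count only when the capital letter stands alone and is followed (after optional whitespace) by a colon or hyphen, and word roots are assumed to be matched as case-insensitive substrings, which the report does not spell out.

```python
import re

ABBREVIATIONS = ("A", "M", "S", "T")
WORD_ROOTS = ("decent", "stick", "cream", "grass", "butter", "skunk")


def conventions_used(text):
    """Return the set of conventions that appear in a review text."""
    used = set()
    for abbr in ABBREVIATIONS:
        # Capital letter not preceded by another letter, then ':' or '-'.
        if re.search(rf"(?<![A-Za-z]){abbr}\s*[:\-]", text):
            used.add(abbr)
    lowered = text.lower()
    for root in WORD_ROOTS:
        # Substring match so that "skunky" or "creamy" count for the root.
        if root in lowered:
            used.add(root)
    return used


sample = "A: hazy gold. S - grassy, a bit skunky. Tastes creamy."
found = conventions_used(sample)
```

In the sample, "A:" and "S -" are counted as abbreviation conventions, while the "T" in "Tastes" is not, since it is not followed by a colon or hyphen. The lookbehind keeps strings such as "USA:" from being counted as the convention "A".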

Accuracy is a more significant metric for the higher-frequency conventions than for the lower-frequency ones. But even then, the positive and negative classes are highly skewed (the vast majority of reviews do not use a given convention). Thus, precision and recall should be taken into account as more significant performance metrics of the prediction scheme. We can see that the SVM performs surprisingly well when a convention appears frequently enough in the dataset. It also seems that the SVM predicts more conservatively (it is less prone to assign a positive label) than the baseline when the convention has low frequency, which explains the low recall for infrequent conventions. This can be attributed to the fact that the SVM captures dependency between the data points, whereas our baseline predicts for each point independently.

The number of training examples by far exceeds the number of features used, so it makes sense to use more features and construct a richer model by mapping the existing nine features into a higher-dimensional feature space using a kernel. However, trying to do so using the entire dataset (using libsvm SVMs with a higher-degree polynomial kernel) proved to be extremely computationally expensive. We compared the precision and recall scores for the most commonly used conventions using an SVM with a linear kernel vs. one with a 3rd-degree polynomial kernel (both using L regularization) on a sample of 0K reviews. Despite our initial assumption, the results showed that the performance of both kernels is very similar (we didn't try higher-degree kernels due to long runtimes).

[Figure: precision and recall of the linear and 3rd-degree polynomial kernels for the most commonly used conventions]

5. Conclusion and Potential Future Work

We were surprised to see how well classification based on exposure network features performed in comparison to naive frequency-based prediction.
Yet, we believe that exposure-network-based features can be further improved - for instance by using statistical inference to determine values for the network edge weights, or by leveraging network statistics (for instance degree distribution or weakly-connected component decomposition) of the entire graph and of the convention-adopting sub-graphs. Additionally, such prediction could be further improved by incorporating non-network-based features, such as user and product bias, and semantic/linguistic features of the conventions.