Unsupervised Rank Aggregation with Domain-Specific Expertise

Unsupervised Rank Aggregation with Domain-Specific Expertise

Alexandre Klementiev, Dan Roth, Kevin Small, and Ivan Titov
Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, IL 61801

Abstract

Consider the setting where a panel of judges is repeatedly asked to (partially) rank sets of objects according to given criteria, and assume that the judges' expertise depends on the objects' domain. Learning to aggregate their rankings with the goal of producing a better joint ranking is a fundamental problem in many areas of Information Retrieval and Natural Language Processing, amongst others. However, supervised ranking data is generally difficult to obtain, especially if coming from multiple domains. Therefore, we propose a framework for learning to aggregate votes of constituent rankers with domain-specific expertise without supervision. We apply the learning framework to the settings of aggregating full rankings and aggregating top-k lists, demonstrating significant improvements over a domain-agnostic baseline in both cases.

1 Introduction

Consider the setting where judges are repeatedly asked to (partially) rank sets of objects, and assume that each judge tries to reproduce some true underlying ranking to the best of their ability. Rank aggregation aims to combine the rankings of such experts to produce a better joint ranking. Now, imagine that the judges' ability to generate rankings depends on the type of objects being ranked or the criteria they are asked to use while ranking. As a simple example, consider a group of people who are asked to rank a set of conference submissions written in English and another set written in French according to their relevance to the conference theme. One would expect that a bilingual expert in the field would produce reasonable rankings in both cases, while a judge who only speaks French and is unfamiliar with the conference topic may produce mediocre rankings (possibly, only using a set of keywords) for the French submissions and random rankings for the English set. We consider the problem of learning to aggregate these rankings by constituent judges with domain-specific expertise into a joint ranking.

The problem of aggregating rankings is ubiquitous in Information Retrieval (IR) and Natural Language Processing (NLP). In IR, for instance, meta-search aims to combine the outputs of multiple search engines to produce a better ranking. In machine translation (MT), aggregation of multiple systems built on different underlying principles has received considerable recent attention (e.g. [Rosti et al., 2007]). In many such applications, one would expect the expertise of the constituent components to depend on the input domain. In IR, the quality of rankings produced by search engines has been shown to be query-type dependent (e.g. [Geng et al., 2008]): some may specialize in ranking product reviews while others in ranking scientific documents. In MT system aggregation, component systems may be trained on different corpora (e.g. multi-lingual Hansards, or technical manuals), and tend to be fragile when tested on data sampled from a different domain [Koehn and Schroeder, 2007]. Thus, the relative expertise of these systems depends on which distribution the test source language data is sampled from. Moreover, in these and many other aggregation examples, the input domain information relevant to the expertise of each judge is latent.

Supervised learning approaches to rank aggregation (e.g. [Liu et al., 2007]) are impractical for many applications as labeled ranking data is generally very expensive to obtain. In IR, for example, heuristics or indirect methods are often employed (e.g. [Shaw and Fox, 1994; Dwork et al., 2001; Joachims, 2002]) to produce a surrogate for true preference information. Needless to say, supervision for typed ranked data is even harder to obtain.

The principal contribution of this paper is a framework for learning to aggregate votes of constituent rankers with domain-specific expertise without supervision. Given only a set of constituent rankings, we learn an aggregation function that attempts to recreate the true ranking without labeled data. The intuition behind our approach is simple: rankers which are experts in a given domain are better at generating votes close to true rankings for that domain and thus will tend to agree with each other, whereas the non-experts will not. Given rankers' votes for a set of queries, we aim to discover patterns of ranker agreement. Each distinct pattern corresponds to a specific (latent) domain, enabling us to deduce the domain-specific expertise of each judge. The ability to identify the expertise of a ranker for a specific query can potentially greatly improve the accuracy of the model over simpler aggregation techniques. While our framework does not commit to any particular type of (partial) rankings, we show how to apply it to two kinds of rankings: permutations and top-k lists.

The remainder of the paper is organized as follows. Section 2 introduces relevant notation and reviews distance-based ranking models. Section 3 introduces our model for aggregating over rankers with domain-specific expertise, and Section 4 derives an EM-based algorithm for learning the model parameters and describes methods to make the learning efficient. Section 5 applies the learning framework to two types of rankings: permutations (full rankings) and top-k lists. Finally, Section 6 discusses related work, and Section 7 concludes and gives ideas for future directions.

2 Distance-Based Models

While there has been a significant research effort in the statistics community with regard to analyzing and modeling ranking data (e.g. [Marden, 1995]), we are specifically interested in distance-based models beginning with the Mallows model [Mallows, 1957].

2.1 Notation and Definitions

Let us first introduce the notation we will use throughout the paper. The models presented in this work do not commit to a particular type of (partial) ranking. Therefore, we will generally use the same notation for all (partial) ranking types, and the particular type we imply should be clear from context. However, since we apply the model to two particular types of rankings, permutations (full rankings) and top-k lists, let us start with the relevant definitions.

Permutations. Let \{x_1, x_2, \ldots, x_n\} be a set of objects to be ranked by a judge. A permutation \pi is a bijection from the set \{1, 2, \ldots, n\} onto itself; we will denote by \pi(i) the rank assigned to object x_i, and by \pi^{-1}(j) the index of the object assigned to rank j. Let us also define e to be the identity permutation (1, \ldots, n). Finally, let S_n denote the set of all n! permutations over n objects.

Let us define a distance between two permutations d : S_n \times S_n \to \mathbb{R} and assume that, in addition to satisfying the usual metric properties, it is also right-invariant [Diaconis and Graham, 1977]. That is, we assume that the value of d(\cdot, \cdot) does not depend on how the objects are indexed, a property natural to the applications we consider in this work. More specifically, if the objects are re-indexed by \tau, the distance between two permutations over the objects does not change: d(\pi, \sigma) = d(\pi\tau, \sigma\tau) for all \pi, \sigma, \tau \in S_n, where \pi\tau is defined by \pi\tau(i) = \pi(\tau(i)). Note that d(\pi, \sigma) = d(\pi\pi^{-1}, \sigma\pi^{-1}) = d(e, \sigma\pi^{-1}). That is, the value of d does not change if we re-index the objects such that one of the permutations becomes e = (1, \ldots, n) and the other \nu = \sigma\pi^{-1}. Borrowing the notation from [Fligner and Verducci, 1986], we abbreviate d(e, \nu) as D(\nu). In the remainder of the paper, when we define \nu as a random variable, we may treat D(\nu) = D as a random variable as well; whether it denotes a distance function or an r.v. will be clear from the context.

Examples of common right-invariant distance functions over permutations include Kendall's tau distance

d_K(\pi, \sigma) = \sum_{i < j} I\big(\pi\sigma^{-1}(i) - \pi\sigma^{-1}(j)\big)   (1)

where I(x) = 1 if x > 0 and 0 otherwise, which can also be defined as the minimum number of adjacent transpositions required to turn \pi into \sigma, and Spearman's footrule

d_S(\pi, \sigma) = \sum_{i=1}^{n} |\pi(i) - \sigma(i)|   (2)

Top-k Lists. Top-k lists indicate preferences over different (possibly, overlapping) subsets of k \le n objects, where the elements not in the list are implicitly ranked below all of the list elements. They are used extensively in the IR community to represent the output of a retrieval system; a top-10 list, for instance, may represent the first page of a search engine output.
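To make the permutation distances (1) and (2) concrete before turning to distances over top-k lists, here is a minimal Python sketch (our own illustration, not from the paper; rankings are represented as 1-based lists mapping object index to rank):

```python
from itertools import combinations

def kendall_tau(pi, sigma):
    """d_K of Eq. (1): the number of object pairs whose relative order
    differs between the two rankings (equivalently, the minimum number
    of adjacent transpositions needed to turn pi into sigma)."""
    return sum(1 for i, j in combinations(range(len(pi)), 2)
               if (pi[i] - pi[j]) * (sigma[i] - sigma[j]) < 0)

def spearman_footrule(pi, sigma):
    """d_S of Eq. (2): the total absolute rank displacement."""
    return sum(abs(p - s) for p, s in zip(pi, sigma))

# Both are right-invariant: re-indexing the objects leaves the values unchanged.
assert kendall_tau([1, 2, 3], [3, 1, 2]) == 2
assert spearman_footrule([1, 2, 3], [3, 1, 2]) == 4
```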
A number of distance measures over top-k lists have been proposed and studied (e.g. [Fagin et al., 2003]). Of particular relevance to us is the generalization of the Kendall's tau distance (1) to top-k lists proposed in [Klementiev et al., 2008], which is defined as follows. First, two top-k lists \pi and \sigma are augmented: items in \sigma which are not present in \pi are placed in the same position (k + 1) in \pi, and vice versa. The augmented Kendall's tau distance d_A(\pi, \sigma) is then defined as the minimum number of adjacent transpositions needed to turn the augmented \pi into the augmented \sigma. They show that the distance is right-invariant.

2.2 Mallows Models

Distance-based models for ranking data were first introduced in [Mallows, 1957], and generate a judge's rankings according to:

p(\pi \mid \theta, \sigma) = \frac{1}{Z(\theta, \sigma)} \exp\big(\theta \, d(\pi, \sigma)\big)   (3)

where \theta \in \mathbb{R}, \theta \le 0, is the dispersion parameter, the modal ranking \sigma \in S_n is the location parameter, and Z(\theta, \sigma) = \sum_{\pi \in S_n} \exp(\theta \, d(\pi, \sigma)) is a normalizing constant. The probability of ranking \pi decreases exponentially with distance from the mode \sigma. The distribution is uniform when \theta = 0, and becomes sharper as \theta grows more negative.

Let us note two properties of (3) which will be relevant later in the paper. First, under the right-invariance property of d(\cdot, \cdot), the normalizing constant does not depend on \sigma: Z(\theta, \sigma) = Z(\theta) (see, e.g. [Lebanon and Lafferty, 2002]). Second, [Fligner and Verducci, 1986] note that if the distance function can be expressed as D(\pi) = \sum_{i=1}^{m} V_i(\pi), where the random variables V_i(\pi) are independent (with \pi uniformly distributed), then the MLE of \theta under (3), which is the solution to the equation E_\theta(D) = \bar{D}, may be efficient to compute. [Klementiev et al., 2008] refer to such distance functions as decomposable. [Mallows, 1957] first investigated the model with Kendall's and Spearman's metrics on fully ranked data, and the model was later generalized to other distance functions and for use with partially ranked data [Critchlow, 1985].
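As an illustration (our own sketch, not spelled out in the paper), for Kendall's tau the normalizing constant has the well-known closed form Z(\theta) = \prod_{j=1}^{n} (1 - e^{j\theta}) / (1 - e^{\theta}) for \theta < 0, so (3) can be evaluated exactly; kendall_tau is from the sketch in Section 2.1:

```python
import math

def mallows_log_Z(theta, n):
    """log Z(theta) for the Mallows model (3) with Kendall's tau distance,
    via the standard closed form prod_{j=1..n} (1 - e^{j*theta}) / (1 - e^theta).
    Requires theta < 0 (theta = 0 gives the uniform distribution, Z = n!)."""
    assert theta < 0
    return sum(math.log(-math.expm1(j * theta)) - math.log(-math.expm1(theta))
               for j in range(1, n + 1))

def mallows_log_prob(pi, sigma, theta):
    """log p(pi | theta, sigma) under (3); kendall_tau is the sketch above."""
    return theta * kendall_tau(pi, sigma) - mallows_log_Z(theta, len(pi))
```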

2.3 Extended Mallows Models

Since the Mallows model was introduced, various extensions have been proposed in the statistics and machine learning literature (e.g. [Fligner and Verducci, 1986; Murphy and Martin, 2003; Busse et al., 2007]). Of particular interest to us is the multiple input rankings scenario proposed by [Lebanon and Lafferty, 2002] in the context of supervised learning. Assuming that a vector of votes \sigma = (\sigma_1, \ldots, \sigma_K) from K individual judges is available, the generalized model assigns a probability to ranking \pi according to:

p(\pi \mid \theta, \sigma) = \frac{p(\pi)}{Z(\theta, \sigma)} \exp\left( \sum_{i=1}^{K} \theta_i \, d(\pi, \sigma_i) \right)   (4)

where \sigma \in S_n^K, \theta = (\theta_1, \ldots, \theta_K) \in \mathbb{R}^K, \theta \le 0, p(\pi) is a prior, and the normalizing constant is Z(\theta, \sigma) = \sum_{\pi \in S_n} p(\pi) \exp(\sum_{i=1}^{K} \theta_i \, d(\pi, \sigma_i)). The free parameters \theta_i represent the degree of expertise of the individual judges: the closer the value of \theta_i to zero, the less the vote of the i-th judge affects the assignment of probability. Under the right-invariance assumption on d(\cdot, \cdot), the model has the following associated generative story:

p(\pi, \sigma \mid \theta) = p(\pi) \prod_{i=1}^{K} p(\sigma_i \mid \theta_i, \pi)   (5)

That is, \pi is first drawn from the prior p(\pi), and the votes of the K individual judges are produced by drawing independently from K Mallows models p(\sigma_i \mid \theta_i, \pi) with the same location parameter \pi.

[Klementiev et al., 2008] propose an Expectation-Maximization (EM) [Dempster et al., 1977] based algorithm for learning the parameters of the extended Mallows model from Q vectors of votes \{\sigma^{(j)}\}_{j=1}^{Q}. Their intuition is that better rankers tend to exhibit agreement more than poor ones (assumed not to collude), which was supported empirically. In the meta-search setting, for example, the observed data is comprised of the rankings \sigma^{(j)} produced by K search engines for each of the Q queries, and the goal is to infer the joint ranking \pi.

3 Distance-Based Models with Domain Expertise

Up to this point, we have only considered type-agnostic models. These models assume that the expertise of the constituent rankers does not depend on the input data domain (e.g. the types of queries in the meta-search setting, or the distributions of the test sentences in the MT aggregation setting), or equivalently, that all input comes from the same domain. In the following discussion we will correspondingly refer to input data domains as types. As we have argued, in practical applications it is more natural to model the data generation process such that the domain-specific expertise of the rankers is accounted for explicitly. We now proceed to the task we set out to investigate: the case of aggregating votes of rankers with domain-specific expertise. We propose a mixture of the extended distance-based models (4) as a means to formalize and model this setting.

3.1 Mixture Model

We begin by augmenting the generative story (5) to include the notion of types. First, a type t is selected from T types with probability \alpha_t. Then, the location parameter (true ranking) \pi is drawn uniformly, and the votes of the individual experts are drawn independently from K Mallows models p(\sigma_i \mid \theta_{t,i}, \pi) with the same location parameter \pi. That is,

p(\pi, \sigma, t \mid \theta, \alpha) \propto \alpha_t \prod_{i=1}^{K} p(\sigma_i \mid \pi, \theta_{t,i})   (6)

The right-invariance property of the distance function can be used to derive the corresponding conditional model:

p(\pi, t \mid \sigma, \theta, \alpha) = \frac{\alpha_t}{Z(\theta, \sigma)} \exp\left( \sum_{i=1}^{K} \theta_{t,i} \, d(\pi, \sigma_i) \right)   (7)

where \sigma = (\sigma_1, \ldots, \sigma_K) \in S_n^K, \theta \in \mathbb{R}^{T \times K}, \theta \le 0, and the normalizing constant is Z(\theta, \sigma) = \sum_{t=1}^{T} \sum_{\pi \in S_n} \alpha_t \exp(\sum_{i=1}^{K} \theta_{t,i} \, d(\pi, \sigma_i)).
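Since Z(\theta, \sigma) is shared by all pairs (\pi, t), ratios of unnormalized scores suffice for the sampling-based learning and inference of Section 4. A minimal sketch of the unnormalized log-score of (7) (our own illustration):

```python
import math

def log_score(pi, t, votes, theta, alpha, dist):
    """Unnormalized log p(pi, t | votes, theta, alpha) under (7):
    log(alpha_t) + sum_i theta[t][i] * d(pi, sigma_i).
    `dist` is any right-invariant distance, e.g. kendall_tau above;
    `votes` is the list (sigma_1, ..., sigma_K) for one example."""
    return math.log(alpha[t]) + sum(
        th * dist(pi, sigma_i) for th, sigma_i in zip(theta[t], votes))
```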
The free model parameters are a T \times K matrix \theta, where \theta_{t,i} represents the degree of expertise of judge i for type t, and the T mixture weights \alpha. Note that this model is more expressive than the type-agnostic model, which has a single free parameter \theta_i to model the expertise of each judge.

4 Learning and Inference

We now derive an EM-based algorithm for learning the free model parameters \alpha and \theta, and propose methods to make both learning and inference efficient.

4.1 Learning

Let us denote the estimates of the parameters from the previous EM iteration by \bar{\alpha} and \bar{\theta}. In our setting, the examples we observe are vectors of rankings \{\sigma^{(j)}\}_{j=1}^{Q}, where \sigma_i^{(j)} is the ranking produced by the i-th ranker for example j. The unobserved data are the corresponding true rankings with the associated types: \{(t^{(j)}, \pi^{(j)})\}_{j=1}^{Q}. In order to make the learning process more stable, we use a symmetric Dirichlet prior Dir(\beta) on the topic distribution \alpha. Now, following the generative story in Section 3.1 and taking into account the prior defined on \alpha, we derive the M-step (proofs are omitted due to space restrictions):

Proposition 1. The expected value of the complete data log-likelihood under (7) is maximized by \alpha and \theta such that:

\alpha_t = \frac{\beta + \sum_{j=1}^{Q} \sum_{\pi^{(j)} \in S_n} p(\pi^{(j)}, t \mid \sigma^{(j)}, \bar{\theta}, \bar{\alpha})}{T\beta + Q}   (8)

E_{\theta_{t,i}}(D) = \frac{1}{\alpha_t Q} \sum_{j=1}^{Q} \sum_{\pi^{(j)} \in S_n} d(\pi^{(j)}, \sigma_i^{(j)}) \, p(\pi^{(j)}, t \mid \sigma^{(j)}, \bar{\theta}, \bar{\alpha})   (9)
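Once the right-hand side of (9) has been estimated, recovering \theta_{t,i} amounts to one-dimensional root finding in E_\theta(D). For Kendall's tau, E_\theta(D_K) has the standard closed form n e^{\theta}/(1 - e^{\theta}) - \sum_{j=1}^{n} j e^{j\theta}/(1 - e^{j\theta}) [Fligner and Verducci, 1986] and is monotone in \theta, so simple bisection works; a sketch (our own illustration, under these standard assumptions):

```python
import math

def expected_kendall(theta, n):
    """E_theta(D_K) under the Mallows model (3) with Kendall's tau:
    n*e^t/(1-e^t) - sum_j j*e^{j*t}/(1-e^{j*t}). Monotone increasing in
    theta on (-inf, 0), i.e. decreasing in |theta|, approaching n(n-1)/4
    (the uniform-distribution mean) as theta -> 0."""
    return (n * math.exp(theta) / -math.expm1(theta)
            - sum(j * math.exp(j * theta) / -math.expm1(j * theta)
                  for j in range(1, n + 1)))

def solve_theta(target, n, lo=-50.0, hi=-1e-9, iters=80):
    """Bisection for theta such that E_theta(D_K) equals the estimated
    right-hand side of (9); assumes 0 < target < n*(n-1)/4."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected_kendall(mid, n) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```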

Thus, on each iteration of EM, (8) is used to update \alpha (T updates) and (9) is used to update the \theta parameters (T \cdot K updates). Evaluating (8), estimating the right-hand side of (9), and then solving the left-hand side for \theta_{t,i} directly are all computationally intractable. However, learning can be made efficient when the particular properties of the distance function discussed in Section 4.1 are satisfied. The ideas presented so far can be applied to any kind of (partial) rankings. However, in order to propose efficient alternatives to direct estimation of \alpha and \theta, we need to commit to specific ranking types. Let us consider the cases of combining permutations and combining top-k lists as two examples, and address the three problems individually.

Estimating the right-hand side of (9). In order to estimate the right-hand side of (9), we need to obtain samples from the model (7). For high-dimensional multimodal distributions such as (7), standard sampling methods (e.g. Metropolis-Hastings [Hastings, 1970]) do not converge in a reasonable amount of time. Annealing methods [Neal, 1996] are still computationally expensive, and additionally require careful tuning of free parameters (i.e. the annealing schedule). Therefore, we use the fast approximate sampling algorithm described below.

We start by obtaining a sample from each mixture component t. This is done using the Metropolis-Hastings algorithm applied to the extended Mallows model (4). The chain is constructed as follows: denoting the most recent value sampled as \tau^{(t)}, two indices i, j \in \{1, \ldots, n\} are chosen at random and the objects \tau^{(t)}(i) and \tau^{(t)}(j) are transposed, forming a new permutation \tilde{\tau}^{(t)}. If a = p(\tilde{\tau}^{(t)}, t \mid \sigma, \bar{\theta}, \bar{\alpha}) / p(\tau^{(t)}, t \mid \sigma, \bar{\theta}, \bar{\alpha}) \ge 1, the chain moves to \tilde{\tau}^{(t)}. If a < 1, the chain moves to \tilde{\tau}^{(t)} with probability a; otherwise, it stays at \tau^{(t)}. Once the sampling is complete for each chain, we sample a permutation from the obtained set of per-topic permutations (\tau^{(1)}, \tau^{(2)}, \ldots, \tau^{(T)}), choosing \tau^{(t)} with probability proportional to \alpha_t \exp\big(\sum_{i=1}^{K} \theta_{t,i} \, d(\tau^{(t)}, \sigma_i)\big). The underlying assumption for this approximate sampling algorithm is that the probability mass of a mixture component is proportional to the probability of a sample generated from this component. The sampling procedure is repeated for all Q examples, and the average per-type distance is used as the estimate of the right-hand side of (9).

This procedure is easily extended to top-k lists: when forming the next chain element \tilde{\tau}^{(t)}, we may either transpose two elements of \tau^{(t)}, or replace one of its elements with an element not in \tau^{(t)} (i.e. from the pool of all elements in the constituent rankings of the given example).

While convergence results have been presented in the statistics literature (e.g. [Diaconis and Saloff-Coste, 1998]) for some distances when sampling from (3), no results are known for the extended models (4) and (7) we have considered in this work. However, we found experimentally that the chains converge rapidly for the two settings we are considering. Moreover, as the chain proceeds, we only need to compute the incremental change in distance due to a single swap at each step, which results in substantial computational savings.
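A minimal sketch of one such per-component chain (our own illustration; it recomputes distances in full for clarity, whereas the paper exploits the incremental change per swap; kendall_tau and log_score are from the earlier sketches):

```python
import math
import random

def mh_chain(t, votes, theta, alpha, dist, n, steps=2000, rng=random):
    """Metropolis-Hastings chain for mixture component t of model (7):
    propose a random transposition and accept with probability min(1, a),
    where a is the ratio of unnormalized scores (Z cancels in the ratio).
    Returns the last state; in practice samples are kept after burn-in."""
    tau = list(range(1, n + 1))  # arbitrary initial state (object -> rank)
    cur = log_score(tau, t, votes, theta, alpha, dist)
    for _ in range(steps):
        i, j = rng.randrange(n), rng.randrange(n)
        prop = list(tau)
        prop[i], prop[j] = prop[j], prop[i]  # transpose ranks of objects i, j
        new = log_score(prop, t, votes, theta, alpha, dist)
        if math.log(rng.random()) < new - cur:  # accept w.p. min(1, a)
            tau, cur = prop, new
    return tau
```

A final sample is then drawn from the T per-topic states with probability proportional to \alpha_t \exp(\sum_i \theta_{t,i} d(\tau^{(t)}, \sigma_i)), as described above.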
Estimating \alpha. Following (8), we estimate the type mixture coefficients \alpha as the proportions of the types in all of the Q sampled rankings (with the additional pseudocounts \beta).

Solving the left-hand side of (9) for \theta. As we noted in Section 2.2, solving E_\theta(D) = \bar{D} under (3) for \theta may be efficient for decomposable distance functions. Indeed, for the (decomposable) Kendall's tau distance over permutations, E_\theta(D_K) is efficient to compute and is monotone, so a line search for \theta converges quickly [Fligner and Verducci, 1986]. An analogous result was derived by [Klementiev et al., 2008] for the augmented Kendall's tau over top-k lists (Section 2.1). Thus, requiring distance functions to satisfy the decomposability property may enable us to solve the left-hand side of (9) efficiently.

4.2 Inference

We use the sampling procedure described in Section 4.1 during inference to estimate the most likely permutation for a given set of votes.

5 Evaluation

In the absence of available labeled ranked data exhibiting domain variability, we construct our own data sets representative of a realistic application scenario. We evaluate the proposed framework on the two types of rankings we have considered so far: permutations and top-k lists.

5.1 Aggregating Typed Permutations

We first consider the case of rank aggregation for typed permutations using Kendall's tau as the distance function in (7). For this first set of experiments, we considered K = 10 judges producing rankings over n = 30 objects for Q = 100 examples. Each example was associated with one of T = 5 types, according to \alpha = (0.4, 0.2, 0.2, 0.1, 0.1). Roughly half of the judges are chosen to be experts (i.e. produce good rankings) for each of the T types. More precisely, the votes of the individual judges were produced by sampling the models (3), all with the same location parameter \sigma = e (the identity permutation over the n = 30 objects). We chose their dispersion parameters as follows: we first flip a coin to decide whether or not the i-th ranker is an expert for type t. If the ranker is an expert, its parameter \theta_{t,i} is randomly chosen from a small interval close to -1; otherwise, it is chosen to be close to 0.

At the end of each EM iteration, we sampled the current model (7) and computed the Kendall's tau distance between the generated permutation \pi and the true permutation \sigma for each of the Q examples, and we report the performance in terms of the average distance. In addition to the sampling procedure we proposed to estimate the right-hand side of (9), we also tried using the true permutation \sigma along with the corresponding true type t in place of the sampled values, to see how well the learning procedure can do.
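For reference, per-type votes of this kind can be generated exactly for the Kendall's tau Mallows model with the standard repeated-insertion construction (a known sampler, not described in the paper; a sketch under the setup above):

```python
import math
import random

def sample_mallows_kendall(n, theta, rng=random):
    """Exact sample from the Mallows model (3) with Kendall's tau and mode
    e = (1, ..., n). Item j is inserted at position i (1-based from the top)
    with probability proportional to exp(theta * (j - i)), which adds exactly
    j - i inversions; theta = 0 yields a uniform random permutation."""
    order = []  # order[pos] = object at that position, top to bottom
    for j in range(1, n + 1):
        weights = [math.exp(theta * (j - i)) for i in range(1, j + 1)]
        r = rng.random() * sum(weights)
        i = 0
        while i < j - 1 and r > weights[i]:
            r -= weights[i]
            i += 1
        order.insert(i, j)
    rank = [0] * n
    for pos, obj in enumerate(order, start=1):
        rank[obj - 1] = pos  # convert to the object -> rank representation
    return rank

# Example: an expert's vote (theta near -1) vs. a non-expert's (theta near 0).
expert_vote = sample_mallows_kendall(30, -1.0)
noisy_vote = sample_mallows_kendall(30, -0.05)
```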

[Figure 1: Learning performance over permutations when the RHS is estimated using sampling (Sampling) or the true permutation (True); average d_K to the true permutation vs. EM iteration. All results are averaged over 10 runs. Our model (Typed) significantly outperforms the type-agnostic model (4) (UnTyped).]

[Figure 2: Learning performance over top-k lists when the RHS is estimated using sampling (Sampling) or the true top-k list (True); average d_A to the true top-k list vs. EM iteration. All results are averaged over 10 runs. Our model (Typed) significantly outperforms the type-agnostic model (4) (UnTyped).]

Figure 1 shows that our model significantly outperforms the type-agnostic model proposed in [Klementiev et al., 2008] for this setting. While the type-agnostic model (UnTyped Sampling) achieves an average distance of 34, our model (Typed Sampling) achieves an average distance of 19, representing about a 44% reduction in the number of adjacent transpositions at convergence. Moreover, the model converges quickly, and its performance approaches the case when the true permutations and their corresponding types are known.

5.2 Aggregating Typed Top-K Lists

We now consider the case of combining typed top-k lists using the augmented Kendall's tau distance (see Section 2.1) in (7). The setup and the data were produced similarly to the permutations experiments in Section 5.1; however, when generating the top-k lists, k = 30 objects were selected from a pool of n = 100. We compare our top-k list instantiated model against the corresponding type-agnostic model, reporting results in Figure 2 in terms of the average augmented Kendall's tau distance from the true top-k lists. Again, our framework significantly outperforms the type-agnostic model, reducing the average augmented Kendall's tau distance to the true lists by approximately 13, a 47% reduction in the number of adjacent transpositions.

5.3 Discussion

In many practical applications, constituent rankers are likely to specialize in some input domains while performing poorly in others. Both experiments demonstrate that the model we propose can take advantage of the latent type information to learn to aggregate the votes of such rankers effectively. It produces better joint rankings than the type-agnostic model of [Klementiev et al., 2008]. The EM-based learning algorithm we propose is robust, converging quickly with just Q = 100 examples across multiple runs. It is worth emphasizing that the additional expressivity of the model we propose does not prevent it from learning successfully from a small number of examples. Additionally, the computational requirements scale linearly with the number of topics.

6 Relevant Work

Modeling ranked data is an extensively studied problem in the statistics, information retrieval, and machine learning literature. Distance-based models with Kendall's and Spearman's metrics for fully ranked data were introduced and investigated in [Mallows, 1957]. A number of new metrics for partial rankings have since been introduced and analyzed (e.g. [Critchlow, 1985; Estivill-Castro et al., 1993; Fagin et al., 2003]), and various extensions to the model itself have been proposed (e.g. [Fligner and Verducci, 1986]); see [Marden, 1995] for an excellent overview. [Murphy and Martin, 2003] used mixtures of Mallows models (3) to analyze ranking data from heterogeneous populations, and [Busse et al., 2007] propose a method for clustering such data.
A large body of work also exists on mixture models (or LCA, [Lazarsfeld and Henry, 1968]). While a number of heuristic [Shaw and Fox, 1994; Dwork et al., 2001] and supervised learning approaches [Liu et al., 2007] exist for rank aggregation, few learn to combine rankings without supervision. Most directly related to our work is the generalization to multiple input rankings proposed and studied in [Lebanon and Lafferty, 2002]; [Klementiev et al., 2008] derived an EM-based algorithm to estimate its parameters. We extend their model to include the notion of domain expertise.

7 Conclusions

In this work, we propose an unsupervised learning framework for rank aggregation over the votes of rankers with domain-specific expertise. We introduce a model, derive an EM-based algorithm to estimate its parameters, and propose methods to make learning efficient. Finally, we evaluate the framework on combining full rankings and on combining top-k lists, and demonstrate that it significantly and robustly outperforms the domain-agnostic model proposed in [Klementiev et al., 2008]. This approach is potentially applicable to many problems in Information Retrieval and Natural Language Processing, e.g. meta-search or the aggregation of machine translation system outputs, where domain variability presents a major challenge [Koehn and Schroeder, 2007; Geng et al., 2008]. Developing unsupervised techniques is particularly important as annotated data is very difficult to obtain for ranking problems, especially for multiple domains.

Acknowledgments

We thank Ming-Wei Chang, Vivek Srikumar, and the anonymous reviewers for their valuable suggestions. This work is supported by NSF grant ITR IIS, DARPA funding under the Bootstrap Learning Program, Swiss NSF scholarship PBGE, and by MIAS, a DHS-IDS Center for Multimodal Information Access and Synthesis at UIUC.

References

[Busse et al., 2007] Ludwig M. Busse, Peter Orbanz, and Joachim M. Buhmann. Cluster analysis of heterogeneous rank data. In Proc. of the International Conference on Machine Learning (ICML), 2007.

[Critchlow, 1985] Douglas E. Critchlow. Metric Methods for Analyzing Partially Ranked Data, volume 34 of Lecture Notes in Statistics. Springer-Verlag, 1985.

[Dempster et al., 1977] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(1):1–38, 1977.

[Diaconis and Graham, 1977] Persi Diaconis and R. L. Graham. Spearman's footrule as a measure of disarray. Journal of the Royal Statistical Society, 39(2):262–268, 1977.

[Diaconis and Saloff-Coste, 1998] P. Diaconis and L. Saloff-Coste. What do we know about the Metropolis algorithm? Journal of Computer and System Sciences, 57:20–36, 1998.

[Dwork et al., 2001] Cynthia Dwork, Ravi Kumar, Moni Naor, and D. Sivakumar. Rank aggregation methods for the web. In Proc. of the International World Wide Web Conference (WWW), pages 613–622, 2001.

[Estivill-Castro et al., 1993] Vladimir Estivill-Castro, Heikki Mannila, and Derick Wood. Right invariant metrics and measures of presortedness. Discrete Applied Mathematics, 42:1–16, 1993.

[Fagin et al., 2003] Ronald Fagin, Ravi Kumar, and D. Sivakumar. Comparing top k lists. SIAM Journal on Discrete Mathematics, 17(1):134–160, 2003.

[Fligner and Verducci, 1986] M. A. Fligner and J. S. Verducci. Distance based ranking models. Journal of the Royal Statistical Society, 48(3):359–369, 1986.

[Geng et al., 2008] Xiubo Geng, Tie-Yan Liu, Tao Qin, Andrew Arnold, Hang Li, and Heung-Yeung Shum. Query dependent ranking using k-nearest neighbor. In SIGIR, pages 115–122, 2008.

[Hastings, 1970] W. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97–109, April 1970.

[Joachims, 2002] T. Joachims. Unbiased evaluation of retrieval quality using clickthrough data. In SIGIR Workshop on Mathematical/Formal Methods in Information Retrieval, 2002.

[Klementiev et al., 2008] Alexandre Klementiev, Dan Roth, and Kevin Small. Unsupervised rank aggregation with distance-based models. In Proc. of the International Conference on Machine Learning (ICML), 2008.

[Koehn and Schroeder, 2007] Philipp Koehn and Josh Schroeder. Experiments in domain adaptation for statistical machine translation. In ACL 2007, Second Workshop on Statistical Machine Translation, Prague, Czech Republic, 2007.

[Lazarsfeld and Henry, 1968] Paul F. Lazarsfeld and Neil W. Henry. Latent Structure Analysis. Houghton Mifflin, 1968.

[Lebanon and Lafferty, 2002] Guy Lebanon and John Lafferty. Cranking: Combining rankings using conditional probability models on permutations. In Proc. of the International Conference on Machine Learning (ICML), 2002.

[Liu et al., 2007] Yu-Ting Liu, Tie-Yan Liu, Tao Qin, Zhi-Ming Ma, and Hang Li. Supervised rank aggregation. In Proc. of the International World Wide Web Conference (WWW), 2007.

[Mallows, 1957] C. L. Mallows. Non-null ranking models. Biometrika, 44:114–130, 1957.

[Marden, 1995] John I. Marden. Analyzing and Modeling Rank Data. Chapman & Hall, 1995.

[Murphy and Martin, 2003] Thomas Brendan Murphy and Donal Martin. Mixtures of distance-based models for ranking data. Computational Statistics & Data Analysis, 41:645–655, 2003.

[Neal, 1996] Radford M. Neal. Sampling from multimodal distributions using tempered transitions. Statistics and Computing, 6:353–366, 1996.

[Rosti et al., 2007] Antti-Veikko I. Rosti, Necip Fazil Ayan, Bing Xiang, Spyros Matsoukas, Richard Schwartz, and Bonnie J. Dorr. Combining outputs from multiple machine translation systems. In Proc. of the Annual Meeting of the North American Association of Computational Linguistics (NAACL), 2007.

[Shaw and Fox, 1994] Joseph A. Shaw and Edward A. Fox. Combination of multiple searches. In Text REtrieval Conference (TREC), 1994.


More information

A Deep Relevance Matching Model for Ad-hoc Retrieval

A Deep Relevance Matching Model for Ad-hoc Retrieval A Deep Relevance Matching Model for Ad-hoc Retrieval Jiafeng Guo 1, Yixing Fan 1, Qingyao Ai 2, W. Bruce Croft 2 1 CAS Key Lab of Web Data Science and Technology, Institute of Computing Technology, Chinese

More information

Examining the Authority and Ranking Effects as the result list depth used in data fusion is varied

Examining the Authority and Ranking Effects as the result list depth used in data fusion is varied Information Processing and Management 43 (2007) 1044 1058 www.elsevier.com/locate/infoproman Examining the Authority and Ranking Effects as the result list depth used in data fusion is varied Anselm Spoerri

More information

Modified Metropolis-Hastings algorithm with delayed rejection

Modified Metropolis-Hastings algorithm with delayed rejection Modified Metropolis-Hastings algorithm with delayed reection K.M. Zuev & L.S. Katafygiotis Department of Civil Engineering, Hong Kong University of Science and Technology, Hong Kong, China ABSTRACT: The

More information

GiRaF: a toolbox for Gibbs Random Fields analysis

GiRaF: a toolbox for Gibbs Random Fields analysis GiRaF: a toolbox for Gibbs Random Fields analysis Julien Stoehr *1, Pierre Pudlo 2, and Nial Friel 1 1 University College Dublin 2 Aix-Marseille Université February 24, 2016 Abstract GiRaF package offers

More information

Assessing the Quality of the Natural Cubic Spline Approximation

Assessing the Quality of the Natural Cubic Spline Approximation Assessing the Quality of the Natural Cubic Spline Approximation AHMET SEZER ANADOLU UNIVERSITY Department of Statisticss Yunus Emre Kampusu Eskisehir TURKEY ahsst12@yahoo.com Abstract: In large samples,

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION Introduction CHAPTER 1 INTRODUCTION Mplus is a statistical modeling program that provides researchers with a flexible tool to analyze their data. Mplus offers researchers a wide choice of models, estimators,

More information

Proximity Prestige using Incremental Iteration in Page Rank Algorithm

Proximity Prestige using Incremental Iteration in Page Rank Algorithm Indian Journal of Science and Technology, Vol 9(48), DOI: 10.17485/ijst/2016/v9i48/107962, December 2016 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 Proximity Prestige using Incremental Iteration

More information

An Average-Case Analysis of the k-nearest Neighbor Classifier for Noisy Domains

An Average-Case Analysis of the k-nearest Neighbor Classifier for Noisy Domains An Average-Case Analysis of the k-nearest Neighbor Classifier for Noisy Domains Seishi Okamoto Fujitsu Laboratories Ltd. 2-2-1 Momochihama, Sawara-ku Fukuoka 814, Japan seishi@flab.fujitsu.co.jp Yugami

More information

The Plan: Basic statistics: Random and pseudorandom numbers and their generation: Chapter 16.

The Plan: Basic statistics: Random and pseudorandom numbers and their generation: Chapter 16. Scientific Computing with Case Studies SIAM Press, 29 http://www.cs.umd.edu/users/oleary/sccswebpage Lecture Notes for Unit IV Monte Carlo Computations Dianne P. O Leary c 28 What is a Monte-Carlo method?

More information

Opinion Mining by Transformation-Based Domain Adaptation

Opinion Mining by Transformation-Based Domain Adaptation Opinion Mining by Transformation-Based Domain Adaptation Róbert Ormándi, István Hegedűs, and Richárd Farkas University of Szeged, Hungary {ormandi,ihegedus,rfarkas}@inf.u-szeged.hu Abstract. Here we propose

More information

arxiv: v1 [stat.me] 29 May 2015

arxiv: v1 [stat.me] 29 May 2015 MIMCA: Multiple imputation for categorical variables with multiple correspondence analysis Vincent Audigier 1, François Husson 2 and Julie Josse 2 arxiv:1505.08116v1 [stat.me] 29 May 2015 Applied Mathematics

More information

Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011

Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011 Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011 1. Introduction Reddit is one of the most popular online social news websites with millions

More information

Diffusion Wavelets for Natural Image Analysis

Diffusion Wavelets for Natural Image Analysis Diffusion Wavelets for Natural Image Analysis Tyrus Berry December 16, 2011 Contents 1 Project Description 2 2 Introduction to Diffusion Wavelets 2 2.1 Diffusion Multiresolution............................

More information