vector space retrieval many slides courtesy James Amherst

Size: px
Start display at page:

Download "vector space retrieval many slides courtesy James Amherst"

Transcription

1 vector space retrieval many slides courtesy James Amherst 1

2 what is a retrieval model? Model is an idealization or abstraction of an actual process Mathematical models are used to study the properties of the process, draw conclusions, make predictions Conclusions derived from a model depend on whether the model is a good approximation of the actual situation Statistical models represent repetitive processes, make predictions about frequencies of interesting events Retrieval models can describe the computational process e.g. how documents are ranked Note that how documents or indexes are stored is implementation Retrieval models can attempt to describe the human process e.g. the information need, interaction Few do so meaningfully Retrieval models have an explicit or implicit definition of relevance 2

3 retrieval models today boolean vector space latent semnatic indexing statistical language inference network hyperlink based 3

4 outline review: geometry, linear algebra vector space model vector selection similarity weighting schemes latent semantic indexing 4

5 linear algebra 5

6 vectors 6

7 subspaces 7

8 linear independence, base, dimension, rank! 0$)1.&! ",!"#$%& '$($#'$#1.2 0$)1.&, " 3 # " 4 # $$$# " % "2 15$&$ $6",1, &$%! #7-+$&, & 3 # & 4 # $$$# & %,7)5 15%1! 8 & 3 " 3 9 & 4 " 4 9 $$$& % " %! +%,$.2 % 0$)1.&"%!,(%)$ 8 -%6"-%!,$1.2!"#$%& "#'$($#'$#1 0$)1.&,: ;!! +%,$,.2 % <"0$#,(%)$ 5%0$ 15$,%-$ '"--$#,".# ='"--$#,".#.2 15$,(%)$>! &%#/='> 8 -%6"-7- #7-+$&.2 &%?,@).!7-#,!"#$%& "#'$($#'$#1! &%#/='> 8 '"-$#".#.2 15$,7+,(%)$,(%##$' +A ' 8

9 matrix multiplication 9

10 dot product, norm!!"# $%"!&'# "+, -.*/!0*/)-0").%% *$21 #3/ *.#%04 $%"!&'# 50#3 %/-&2#. %/.2 )&*6/%!! 7 8! 9 "!, " ###"!$:; % 7 8% 9 " %, " ###" %$: #3/) &! " % '7!! % ( 7! $ )79! ) % )! *, )"%* < ##!## 7 " &! "! '! )"%*.20=.#0")<! 7! ##!## ; ##!##

11 cosine 11

12 cosine computation cos(θ) = a b a b cos(θ) = cos(β α) = cos(β) cos(α) + sin(β) sin(α) 12

13 orthogonality 13

14 projections 14

15 A=LDU factorization!!"# $%&!'" ($)#*' +, )-.#..'*/)/ $ 0.#(12 )$)*"% ($)#*' #, $ 3"4.# )#*$%513$# ($)#*' $ 4*)- 1%*) 6*$5"%$3 $%6 $%!'".7-.3"% ($)#*' % /17- )-$) # & 8 $%!!"# $%& "'" ($)#*' +, )-.#..'*/)/ $,% 3"4.# $%6 100.# )#*1%5-*13$# 4*)- 1%*) 6*$5"%$3/,' $ 6*$5"%$3 ($)#*' "9 0*:")/ $%6 # $ 0.#(1)$)*"% ($)#*' /17- )-$) # & 8 $'%! <9 & */ /&((.)#*7 =& 8 & ( > )-.% )-.#. */ %" %..6 9"# # $%6 % 8 $ (? & 8 $'$ (

16 eigenvalues, eigenvectors!! ") &$!"#!$%&'(! /-. 0&,."1! "2 "#$3!!!%4 5 6!!%!.7!"#!$%&'(! 8&) & +-..!)9-$*!$, $-$: ;!.-!"#!$%!+,-. &,8&, )&,")<!) 3!!!%4& !& 5!& "$ -,8!. =-.*)!& &$* & 8&%! )&0! *".!+,"-$! )(0 -/!"#!$%&'(!) 5,.&+!3!4 5 )(0 -/ *": &#-$&'! 9.-*(+, -/!"#!$%&'(!) 5 *!,3!4!!"#!$%&'(!) -/ & (99!.>'-=!.,."&$#('&. 0&:,."1 &.!,8! *"&#-$&'!$,."!) 16

17 matrix diagonal form! %,! -". +%*/"$0 %*'/1/*'/*# /%(/*2/3#)$. " 4 # " 5 # $$$# " % "*' & %. #-/!"#$%& -"2%*( #-)./ ". 3)+6!*.7 & 8 9" 4 " 5 $$$" % :7! #-/* & %. %*2/$#%;+/ $ "*'! 4 &!4!!& 8 < 8 " 5 % 7 #-/ '%"()*"+ # $$$ &!%!"#$%& ), /%(/*2"+6/. ),!=!! 8 &<&!4! *) $/1/"#/' /%(/*2"+ " %*'/1= /%(/*2/3#!!.0!/#$%3! ' 8! " & )$#-)()*"+> & ' & 8 4! & %. *)# 6*%?6/!!& 8 &< -)+'. %@ & -". /%(/*2/3# ". 3)+6!*.! *)# "++!"#$%3/. "$/ '%"()*"+%A";+/ 17!

18 singular value decomposition! "1! "! " " #$ " % # (*'&.'0("2 03*# "0,'# 4* +*,-./-!*+ '!! 5 & '( ) 63*(*! & "! " " #7 '$ ( '(* # " #! &$ ( '(* -(03-$-#'& 8 & ) & 5 ( ) ( 5 9 #"#! ' "! +"'$-#'&: "0! *#0("*! '(* 03*!;%(* (--0! -1 *"$*#)'&%*! -1! )! 18

19 outline review: geometry, linear algebra vector space model vector selection similarity weighting schemes latent semantic indexing 19

20 vector space represent documents and queries as vectors in the term space issue : find the right coefficients use a geometric similarity measure, often angle-related issue: normallization 20

21 mapping to vectors terms: an axis for each term vectors corresponding to terms are canonical document = sum of vectors corresponding to terms contained in doc queries treated the same as documents 21

22 coefficients The coefficients (vector lengths, term weights) represent term presence, importance, or aboutness Magnitude along each dimension Model gives no guidance on how to set term weights Some common choices: Binary: 1 = term is present, 0 = term not present in document tf: The frequency of the term in the document tf idf: idf indicates the discriminatory power of the term Tf idf is far and away the most common Numerous variations 22

23 raw tf weights cat cat cat cat cat cat cat lion lion cat cat lion dog cat cat lion dog dog 23

24 tf = term frequency raw-tf (tf)=count of term in document Robertson tf (okapi tf) based on a set of simple criteria loosely connected to 2-Poisson model popular k=0.5; c=1.5 basic formula is tf /(k+tf) document length = verbosity factor many variants tf tf + k + c doclen avg.doclen 24

25 Robertson tf 25

26 IDF weights Inverse Document Frequency Used to weight terms based on frequency in the corpus Fixed, it can be precomputed for each term basic formula IDF (t) = log( N ) N t N= # of docs Nt= #of docs containing term t 26

27 tf-idf tf * idf the weight on every term is tf(t,d)*idf(t) sometimes variants on tf, IDF no satisfactory model behind these combinations 27

28 outline review: geometry, linear algebra vector space model vector selection similarity weighting schemes latent semantic indexing 28

29 common similarity measures 29

30 vector similarity: cosine θ 1 θ 2 30

31 similarity, normalized!"#"$%&"'(!!!"#$%&$'#!("!!&$# "!"!&$# #! same intersection area but different fraction of sets the size of intersection alone is meaningless often divided by sizes of sets same for vectors, using norm by normalizing vectors, cosine does not change 31

32 cosine similarity: example 32

33 cosine example, normalized round-off error, should be the same 33

34 similarity example 34

35 tf-idf base similarity formula!!!!" "#$%&!!"!#$" "#$%&!!""!!!" '()!!"!#$" '()!!"" ""'()""!"""#$%&"" many options for TF query and TF doc raw tf, Robertson tf, Lucene etc try to come up with yours some options for IDF doc IDF query sometimes not considered normalization is critical 35

36 Lucene comparison 36

37 complicated formulas 37

38 outline review: geometry, linear algebra vector space model vector selection similarity weighting schemes latent semantic indexing 38

39 LSI Variant of the vector space model Use Singular Value Decomposition (a dimensionality reduction technique) to identify uncorrelated, significant basis vectors or factors Rather than non-independent terms Replace original words with a subset of the new factors (say 100) in both documents and queries Compute similarities in this new space Computationally expensive, uncertain effectiveness 39

40 dimensionality reduction when the representation space is rich but data is lying in a small subspace that is when some eigenvalues are zero non-exact: ignore smallest eigenvalues, even if they are not zero 40

41 LSI T0, D0 orthogonal with unit length columns T0 * T0 T =1 S0 = diagonal matrix of eigenvalues m = rank of X 41

42 LSI: example 42

43 LSI: example 43

44 LSI!!!"# $%&!$'$(") *(+&,)-('&!.$) /!!!! 0 12! "!"# $%&!$'$(")*(+&,)-('&!.$) /"!"! 0 12! # 3+"'$(") 4"&%+5 $6 -+'-( 7")*-#! $ +# &!- %"(8 $6 %! & 0 9 $6 %$:# +( %! ' 0 9 $6.$)*4(# +( %! ( 0.!$#-( (*4;-% $6 3+4-(#+$(# $6 %-3*.-3 4$3-) 44

45 using LSI 45

46 LSI: example 46

47 original vs LSI 47

48 using LSI D is new doc vectors (k dimensions) T provides term vectors Given Q=q 1 q 2 q t want to compare to docs Convert Q from t dimensions to k!! )! " *"#! " #"$! % "* $"$ Can now compare to doc vectors Same basic approach can be used to add new docs to the database 48

49 LSI : does it work? Decomposes language into basis vectors In a sense, is looking for core concepts In theory, this means that system will retrieve documents using synonyms of your query words The magic that appeals to people From a demo at lsi.research.telcordia.com They hold the patent on LSI 49

50 vector space: summary Standard vector space Each dimension corresponds to a term in the vocabulary Vector elements are real-valued, reflecting term importance Any vector (document,query,...) can be compared to any other Cosine correlation is the similarity metric used most often Latent Semantic Indexing (LSI) Each dimension corresponds to a basic concept Documents and queries mapped into basic concepts Same as standard vector space after that Whether it s good depends on what you want 50

51 vector space : disadvantages Assumed independence relationship among terms Though this is a very common retrieval model assumption Lack of justification for some vector operations e.g. choice of similarity function e.g., choice of term weights Barely a retrieval model Doesn t explicitly model relevance, a person s information need, language models, etc. Assumes a query and a document can be treated the same (symmetric) 51

52 vector space: advantages Simplicity Ability to incorporate term weights Any type of term weights can be added No model that has to justify the use of a weight Ability to handle distributed term representations e.g., LSI Can measure similarities between almost anything: documents and queries documents and documents queries and queries sentences and sentences etc. 52

Information Retrieval: Retrieval Models

Information Retrieval: Retrieval Models CS473: Web Information Retrieval & Management CS-473 Web Information Retrieval & Management Information Retrieval: Retrieval Models Luo Si Department of Computer Science Purdue University Retrieval Models

More information

Information Retrieval. CS630 Representing and Accessing Digital Information. What is a Retrieval Model? Basic IR Processes

Information Retrieval. CS630 Representing and Accessing Digital Information. What is a Retrieval Model? Basic IR Processes CS630 Representing and Accessing Digital Information Information Retrieval: Retrieval Models Information Retrieval Basics Data Structures and Access Indexing and Preprocessing Retrieval Models Thorsten

More information

Instructor: Stefan Savev

Instructor: Stefan Savev LECTURE 2 What is indexing? Indexing is the process of extracting features (such as word counts) from the documents (in other words: preprocessing the documents). The process ends with putting the information

More information

Information Retrieval. (M&S Ch 15)

Information Retrieval. (M&S Ch 15) Information Retrieval (M&S Ch 15) 1 Retrieval Models A retrieval model specifies the details of: Document representation Query representation Retrieval function Determines a notion of relevance. Notion

More information

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and

More information

Boolean Model. Hongning Wang

Boolean Model. Hongning Wang Boolean Model Hongning Wang CS@UVa Abstraction of search engine architecture Indexed corpus Crawler Ranking procedure Doc Analyzer Doc Representation Query Rep Feedback (Query) Evaluation User Indexer

More information

CSCI 599: Applications of Natural Language Processing Information Retrieval Retrieval Models (Part 1)"

CSCI 599: Applications of Natural Language Processing Information Retrieval Retrieval Models (Part 1) CSCI 599: Applications of Natural Language Processing Information Retrieval Retrieval Models (Part 1)" All slides Addison Wesley, Donald Metzler, and Anton Leuski, 2008, 2012! Retrieval Models" Provide

More information

Chapter 6: Information Retrieval and Web Search. An introduction

Chapter 6: Information Retrieval and Web Search. An introduction Chapter 6: Information Retrieval and Web Search An introduction Introduction n Text mining refers to data mining using text documents as data. n Most text mining tasks use Information Retrieval (IR) methods

More information

CS54701: Information Retrieval

CS54701: Information Retrieval CS54701: Information Retrieval Basic Concepts 19 January 2016 Prof. Chris Clifton 1 Text Representation: Process of Indexing Remove Stopword, Stemming, Phrase Extraction etc Document Parser Extract useful

More information

VK Multimedia Information Systems

VK Multimedia Information Systems VK Multimedia Information Systems Mathias Lux, mlux@itec.uni-klu.ac.at This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Information Retrieval Basics: Agenda Vector

More information

In = number of words appearing exactly n times N = number of words in the collection of words A = a constant. For example, if N=100 and the most

In = number of words appearing exactly n times N = number of words in the collection of words A = a constant. For example, if N=100 and the most In = number of words appearing exactly n times N = number of words in the collection of words A = a constant. For example, if N=100 and the most common word appears 10 times then A = rn*n/n = 1*10/100

More information

Information Retrieval and Data Mining Part 1 Information Retrieval

Information Retrieval and Data Mining Part 1 Information Retrieval Information Retrieval and Data Mining Part 1 Information Retrieval 2005/6, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis Information Retrieval - 1 1 Today's Question 1. Information

More information

Information Retrieval. hussein suleman uct cs

Information Retrieval. hussein suleman uct cs Information Management Information Retrieval hussein suleman uct cs 303 2004 Introduction Information retrieval is the process of locating the most relevant information to satisfy a specific information

More information

CS201 Computer Vision Camera Geometry

CS201 Computer Vision Camera Geometry CS201 Computer Vision Camera Geometry John Magee 25 November, 2014 Slides Courtesy of: Diane H. Theriault (deht@bu.edu) Question of the Day: How can we represent the relationships between cameras and the

More information

Vector Space Models: Theory and Applications

Vector Space Models: Theory and Applications Vector Space Models: Theory and Applications Alexander Panchenko Centre de traitement automatique du langage (CENTAL) Université catholique de Louvain FLTR 2620 Introduction au traitement automatique du

More information

Chapter 2 Basic Structure of High-Dimensional Spaces

Chapter 2 Basic Structure of High-Dimensional Spaces Chapter 2 Basic Structure of High-Dimensional Spaces Data is naturally represented geometrically by associating each record with a point in the space spanned by the attributes. This idea, although simple,

More information

Feature selection. LING 572 Fei Xia

Feature selection. LING 572 Fei Xia Feature selection LING 572 Fei Xia 1 Creating attribute-value table x 1 x 2 f 1 f 2 f K y Choose features: Define feature templates Instantiate the feature templates Dimensionality reduction: feature selection

More information

Knowledge Discovery and Data Mining 1 (VO) ( )

Knowledge Discovery and Data Mining 1 (VO) ( ) Knowledge Discovery and Data Mining 1 (VO) (707.003) Data Matrices and Vector Space Model Denis Helic KTI, TU Graz Nov 6, 2014 Denis Helic (KTI, TU Graz) KDDM1 Nov 6, 2014 1 / 55 Big picture: KDDM Probability

More information

Task Description: Finding Similar Documents. Document Retrieval. Case Study 2: Document Retrieval

Task Description: Finding Similar Documents. Document Retrieval. Case Study 2: Document Retrieval Case Study 2: Document Retrieval Task Description: Finding Similar Documents Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 11, 2017 Sham Kakade 2017 1 Document

More information

COMP 558 lecture 19 Nov. 17, 2010

COMP 558 lecture 19 Nov. 17, 2010 COMP 558 lecture 9 Nov. 7, 2 Camera calibration To estimate the geometry of 3D scenes, it helps to know the camera parameters, both external and internal. The problem of finding all these parameters is

More information

Modern Information Retrieval

Modern Information Retrieval Modern Information Retrieval Chapter 3 Modeling Part I: Classic Models Introduction to IR Models Basic Concepts The Boolean Model Term Weighting The Vector Model Probabilistic Model Chap 03: Modeling,

More information

Homework 5: Transformations in geometry

Homework 5: Transformations in geometry Math b: Linear Algebra Spring 08 Homework 5: Transformations in geometry This homework is due on Wednesday, February 7, respectively on Thursday February 8, 08. a) Find the reflection matrix at the line

More information

CHAPTER 3 INFORMATION RETRIEVAL BASED ON QUERY EXPANSION AND LATENT SEMANTIC INDEXING

CHAPTER 3 INFORMATION RETRIEVAL BASED ON QUERY EXPANSION AND LATENT SEMANTIC INDEXING 43 CHAPTER 3 INFORMATION RETRIEVAL BASED ON QUERY EXPANSION AND LATENT SEMANTIC INDEXING 3.1 INTRODUCTION This chapter emphasizes the Information Retrieval based on Query Expansion (QE) and Latent Semantic

More information

Information Retrieval CS Lecture 06. Razvan C. Bunescu School of Electrical Engineering and Computer Science

Information Retrieval CS Lecture 06. Razvan C. Bunescu School of Electrical Engineering and Computer Science Information Retrieval CS 6900 Lecture 06 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Boolean Retrieval vs. Ranked Retrieval Many users (professionals) prefer

More information

Retrieval by Content. Part 3: Text Retrieval Latent Semantic Indexing. Srihari: CSE 626 1

Retrieval by Content. Part 3: Text Retrieval Latent Semantic Indexing. Srihari: CSE 626 1 Retrieval by Content art 3: Text Retrieval Latent Semantic Indexing Srihari: CSE 626 1 Latent Semantic Indexing LSI isadvantage of exclusive use of representing a document as a T-dimensional vector of

More information

ΕΠΛ660. Ανάκτηση µε το µοντέλο διανυσµατικού χώρου

ΕΠΛ660. Ανάκτηση µε το µοντέλο διανυσµατικού χώρου Ανάκτηση µε το µοντέλο διανυσµατικού χώρου Σηµερινό ερώτηµα Typically we want to retrieve the top K docs (in the cosine ranking for the query) not totally order all docs in the corpus can we pick off docs

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction to Information Retrieval (Supplementary Material) Zhou Shuigeng March 23, 2007 Advanced Distributed Computing 1 Text Databases and IR Text databases (document databases) Large collections

More information

modern database systems lecture 4 : information retrieval

modern database systems lecture 4 : information retrieval modern database systems lecture 4 : information retrieval Aristides Gionis Michael Mathioudakis spring 2016 in perspective structured data relational data RDBMS MySQL semi-structured data data-graph representation

More information

Fall CS646: Information Retrieval. Lecture 2 - Introduction to Search Result Ranking. Jiepu Jiang University of Massachusetts Amherst 2016/09/12

Fall CS646: Information Retrieval. Lecture 2 - Introduction to Search Result Ranking. Jiepu Jiang University of Massachusetts Amherst 2016/09/12 Fall 2016 CS646: Information Retrieval Lecture 2 - Introduction to Search Result Ranking Jiepu Jiang University of Massachusetts Amherst 2016/09/12 More course information Programming Prerequisites Proficiency

More information

COMP6237 Data Mining Searching and Ranking

COMP6237 Data Mining Searching and Ranking COMP6237 Data Mining Searching and Ranking Jonathon Hare jsh2@ecs.soton.ac.uk Note: portions of these slides are from those by ChengXiang Cheng Zhai at UIUC https://class.coursera.org/textretrieval-001

More information

Problem 1: Complexity of Update Rules for Logistic Regression

Problem 1: Complexity of Update Rules for Logistic Regression Case Study 1: Estimating Click Probabilities Tackling an Unknown Number of Features with Sketching Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox January 16 th, 2014 1

More information

Information Retrieval (IR) Introduction to Information Retrieval. Lecture Overview. Why do we need IR? Basics of an IR system.

Information Retrieval (IR) Introduction to Information Retrieval. Lecture Overview. Why do we need IR? Basics of an IR system. Introduction to Information Retrieval Ethan Phelps-Goodman Some slides taken from http://www.cs.utexas.edu/users/mooney/ir-course/ Information Retrieval (IR) The indexing and retrieval of textual documents.

More information

Unsupervised learning in Vision

Unsupervised learning in Vision Chapter 7 Unsupervised learning in Vision The fields of Computer Vision and Machine Learning complement each other in a very natural way: the aim of the former is to extract useful information from visual

More information

James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence!

James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence! James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence! (301) 219-4649 james.mayfield@jhuapl.edu What is Information Retrieval? Evaluation

More information

BASIC ELEMENTS. Geometry is the study of the relationships among objects in an n-dimensional space

BASIC ELEMENTS. Geometry is the study of the relationships among objects in an n-dimensional space GEOMETRY 1 OBJECTIVES Introduce the elements of geometry Scalars Vectors Points Look at the mathematical operations among them Define basic primitives Line segments Polygons Look at some uses for these

More information

Digital Libraries: Language Technologies

Digital Libraries: Language Technologies Digital Libraries: Language Technologies RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Recall: Inverted Index..........................................

More information

Image Processing. Image Features

Image Processing. Image Features Image Processing Image Features Preliminaries 2 What are Image Features? Anything. What they are used for? Some statements about image fragments (patches) recognition Search for similar patches matching

More information

Graphics and Interaction Transformation geometry and homogeneous coordinates

Graphics and Interaction Transformation geometry and homogeneous coordinates 433-324 Graphics and Interaction Transformation geometry and homogeneous coordinates Department of Computer Science and Software Engineering The Lecture outline Introduction Vectors and matrices Translation

More information

COMP30019 Graphics and Interaction Transformation geometry and homogeneous coordinates

COMP30019 Graphics and Interaction Transformation geometry and homogeneous coordinates COMP30019 Graphics and Interaction Transformation geometry and homogeneous coordinates Department of Computer Science and Software Engineering The Lecture outline Introduction Vectors and matrices Translation

More information

Text Analytics (Text Mining)

Text Analytics (Text Mining) CSE 6242 / CX 4242 Apr 1, 2014 Text Analytics (Text Mining) Concepts and Algorithms Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer,

More information

1 Document Classification [60 points]

1 Document Classification [60 points] CIS519: Applied Machine Learning Spring 2018 Homework 4 Handed Out: April 3 rd, 2018 Due: April 14 th, 2018, 11:59 PM 1 Document Classification [60 points] In this problem, you will implement several text

More information

Centre for Autonomous Systems

Centre for Autonomous Systems Robot Henrik I Centre for Autonomous Systems Kungl Tekniska Högskolan hic@kth.se 27th April 2005 Outline 1 duction 2 Kinematic and Constraints 3 Mobile Robot 4 Mobile Robot 5 Beyond Basic 6 Kinematic 7

More information

Feature Selection Using Modified-MCA Based Scoring Metric for Classification

Feature Selection Using Modified-MCA Based Scoring Metric for Classification 2011 International Conference on Information Communication and Management IPCSIT vol.16 (2011) (2011) IACSIT Press, Singapore Feature Selection Using Modified-MCA Based Scoring Metric for Classification

More information

Homework 4: Clustering, Recommenders, Dim. Reduction, ML and Graph Mining (due November 19 th, 2014, 2:30pm, in class hard-copy please)

Homework 4: Clustering, Recommenders, Dim. Reduction, ML and Graph Mining (due November 19 th, 2014, 2:30pm, in class hard-copy please) Virginia Tech. Computer Science CS 5614 (Big) Data Management Systems Fall 2014, Prakash Homework 4: Clustering, Recommenders, Dim. Reduction, ML and Graph Mining (due November 19 th, 2014, 2:30pm, in

More information

Text Analytics (Text Mining)

Text Analytics (Text Mining) CSE 6242 / CX 4242 Text Analytics (Text Mining) Concepts and Algorithms Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko,

More information

Behavioral Data Mining. Lecture 18 Clustering

Behavioral Data Mining. Lecture 18 Clustering Behavioral Data Mining Lecture 18 Clustering Outline Why? Cluster quality K-means Spectral clustering Generative Models Rationale Given a set {X i } for i = 1,,n, a clustering is a partition of the X i

More information

Fourth Grade Mathematics Goals

Fourth Grade Mathematics Goals Fourth Grade Mathematics Goals Operations & Algebraic Thinking 4.OA.1 Interpret a multiplication equation as a comparison. Represent verbal statements of multiplicative comparisons as multiplication equations.

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction Inverted index Processing Boolean queries Course overview Introduction to Information Retrieval http://informationretrieval.org IIR 1: Boolean Retrieval Hinrich Schütze Institute for Natural

More information

Favorites-Based Search Result Ordering

Favorites-Based Search Result Ordering Favorites-Based Search Result Ordering Ben Flamm and Georey Schiebinger CS 229 Fall 2009 1 Introduction Search engine rankings can often benet from knowledge of users' interests. The query jaguar, for

More information

CSE 494: Information Retrieval, Mining and Integration on the Internet

CSE 494: Information Retrieval, Mining and Integration on the Internet CSE 494: Information Retrieval, Mining and Integration on the Internet Midterm. 18 th Oct 2011 (Instructor: Subbarao Kambhampati) In-class Duration: Duration of the class 1hr 15min (75min) Total points:

More information

This lecture: IIR Sections Ranked retrieval Scoring documents Term frequency Collection statistics Weighting schemes Vector space scoring

This lecture: IIR Sections Ranked retrieval Scoring documents Term frequency Collection statistics Weighting schemes Vector space scoring This lecture: IIR Sections 6.2 6.4.3 Ranked retrieval Scoring documents Term frequency Collection statistics Weighting schemes Vector space scoring 1 Ch. 6 Ranked retrieval Thus far, our queries have all

More information

Text Analytics (Text Mining)

Text Analytics (Text Mining) http://poloclub.gatech.edu/cse6242 CSE6242 / CX4242: Data & Visual Analytics Text Analytics (Text Mining) Concepts, Algorithms, LSI/SVD Duen Horng (Polo) Chau Assistant Professor Associate Director, MS

More information

Linear Methods for Regression and Shrinkage Methods

Linear Methods for Regression and Shrinkage Methods Linear Methods for Regression and Shrinkage Methods Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 Linear Regression Models Least Squares Input vectors

More information

Effective Latent Space Graph-based Re-ranking Model with Global Consistency

Effective Latent Space Graph-based Re-ranking Model with Global Consistency Effective Latent Space Graph-based Re-ranking Model with Global Consistency Feb. 12, 2009 1 Outline Introduction Related work Methodology Graph-based re-ranking model Learning a latent space graph A case

More information

Information Retrieval

Information Retrieval Natural Language Processing SoSe 2014 Information Retrieval Dr. Mariana Neves June 18th, 2014 (based on the slides of Dr. Saeedeh Momtazi) Outline Introduction Indexing Block 2 Document Crawling Text Processing

More information

Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman

Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman http://www.mmds.org Overview of Recommender Systems Content-based Systems Collaborative Filtering J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive

More information

Dimension Reduction CS534

Dimension Reduction CS534 Dimension Reduction CS534 Why dimension reduction? High dimensionality large number of features E.g., documents represented by thousands of words, millions of bigrams Images represented by thousands of

More information

Geometric Computations for Simulation

Geometric Computations for Simulation 1 Geometric Computations for Simulation David E. Johnson I. INTRODUCTION A static virtual world would be boring and unlikely to draw in a user enough to create a sense of immersion. Simulation allows things

More information

Data Modelling and Multimedia Databases M

Data Modelling and Multimedia Databases M ALMA MATER STUDIORUM - UNIERSITÀ DI BOLOGNA Data Modelling and Multimedia Databases M International Second cycle degree programme (LM) in Digital Humanities and Digital Knoledge (DHDK) University of Bologna

More information

Markscheme May 2017 Mathematical studies Standard level Paper 1

Markscheme May 2017 Mathematical studies Standard level Paper 1 M17/5/MATSD/SP1/ENG/TZ/XX/M Markscheme May 017 Mathematical studies Standard level Paper 1 3 pages M17/5/MATSD/SP1/ENG/TZ/XX/M This markscheme is the property of the International Baccalaureate and must

More information

Linear algebra deals with matrixes: two-dimensional arrays of values. Here s a matrix: [ x + 5y + 7z 9x + 3y + 11z

Linear algebra deals with matrixes: two-dimensional arrays of values. Here s a matrix: [ x + 5y + 7z 9x + 3y + 11z Basic Linear Algebra Linear algebra deals with matrixes: two-dimensional arrays of values. Here s a matrix: [ 1 5 ] 7 9 3 11 Often matrices are used to describe in a simpler way a series of linear equations.

More information

Text Analytics (Text Mining)

Text Analytics (Text Mining) CSE 6242 / CX 4242 Text Analytics (Text Mining) Concepts, Algorithms, LSI/SVD Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John

More information

Towards Understanding Latent Semantic Indexing. Second Reader: Dr. Mario Nascimento

Towards Understanding Latent Semantic Indexing. Second Reader: Dr. Mario Nascimento Towards Understanding Latent Semantic Indexing Bin Cheng Supervisor: Dr. Eleni Stroulia Second Reader: Dr. Mario Nascimento 0 TABLE OF CONTENTS ABSTRACT...3 1 INTRODUCTION...4 2 RELATED WORKS...6 2.1 TRADITIONAL

More information

A Security Model for Multi-User File System Search. in Multi-User Environments

A Security Model for Multi-User File System Search. in Multi-User Environments A Security Model for Full-Text File System Search in Multi-User Environments Stefan Büttcher Charles L. A. Clarke University of Waterloo, Canada December 15, 2005 1 Introduction and Motivation 2 3 4 5

More information

Concept Based Search Using LSI and Automatic Keyphrase Extraction

Concept Based Search Using LSI and Automatic Keyphrase Extraction Concept Based Search Using LSI and Automatic Keyphrase Extraction Ravina Rodrigues, Kavita Asnani Department of Information Technology (M.E.) Padre Conceição College of Engineering Verna, India {ravinarodrigues

More information

A GENTLE INTRODUCTION TO THE BASIC CONCEPTS OF SHAPE SPACE AND SHAPE STATISTICS

A GENTLE INTRODUCTION TO THE BASIC CONCEPTS OF SHAPE SPACE AND SHAPE STATISTICS A GENTLE INTRODUCTION TO THE BASIC CONCEPTS OF SHAPE SPACE AND SHAPE STATISTICS HEMANT D. TAGARE. Introduction. Shape is a prominent visual feature in many images. Unfortunately, the mathematical theory

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Boolean model and Inverted index Processing Boolean queries Why ranked retrieval? Introduction to Information Retrieval http://informationretrieval.org IIR 1: Boolean Retrieval Hinrich Schütze Institute

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

Multi-domain Predictive AI. Correlated Cross-Occurrence with Apache Mahout and GPUs

Multi-domain Predictive AI. Correlated Cross-Occurrence with Apache Mahout and GPUs Multi-domain Predictive AI Correlated Cross-Occurrence with Apache Mahout and GPUs Pat Ferrel ActionML, Chief Consultant Apache Mahout, PMC & Committer Apache PredictionIO, PMC & Committer pat@apache.org

More information

Information Retrieval

Information Retrieval Information Retrieval Suan Lee - Information Retrieval - 06 Scoring, Term Weighting and the Vector Space Model 1 Recap of lecture 5 Collection and vocabulary statistics: Heaps and Zipf s laws Dictionary

More information

Getting to Know Your Data

Getting to Know Your Data Chapter 2 Getting to Know Your Data 2.1 Exercises 1. Give three additional commonly used statistical measures (i.e., not illustrated in this chapter) for the characterization of data dispersion, and discuss

More information

Clustering and Dimensionality Reduction. Stony Brook University CSE545, Fall 2017

Clustering and Dimensionality Reduction. Stony Brook University CSE545, Fall 2017 Clustering and Dimensionality Reduction Stony Brook University CSE545, Fall 2017 Goal: Generalize to new data Model New Data? Original Data Does the model accurately reflect new data? Supervised vs. Unsupervised

More information

Recommender Systems. Techniques of AI

Recommender Systems. Techniques of AI Recommender Systems Techniques of AI Recommender Systems User ratings Collect user preferences (scores, likes, purchases, views...) Find similarities between items and/or users Predict user scores for

More information

Subspace Video Representations. CS 510 Lecture #22 April 14 th, 2014

Subspace Video Representations. CS 510 Lecture #22 April 14 th, 2014 Subspace Video Representations CS 510 Lecture #22 April 14 th, 2014 Standard BoW : Video Edition Research in video analysis is still new BoW is currently the most common method for comparing videos STIPs

More information

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Infinite data. Filtering data streams

Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University  Infinite data. Filtering data streams /9/7 Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them

More information

Introduction to Transformations. In Geometry

Introduction to Transformations. In Geometry + Introduction to Transformations In Geometry + What is a transformation? A transformation is a copy of a geometric figure, where the copy holds certain properties. Example: copy/paste a picture on your

More information

Natural Language Processing

Natural Language Processing Natural Language Processing Information Retrieval Potsdam, 14 June 2012 Saeedeh Momtazi Information Systems Group based on the slides of the course book Outline 2 1 Introduction 2 Indexing Block Document

More information

Today s topic CS347. Results list clustering example. Why cluster documents. Clustering documents. Lecture 8 May 7, 2001 Prabhakar Raghavan

Today s topic CS347. Results list clustering example. Why cluster documents. Clustering documents. Lecture 8 May 7, 2001 Prabhakar Raghavan Today s topic CS347 Clustering documents Lecture 8 May 7, 2001 Prabhakar Raghavan Why cluster documents Given a corpus, partition it into groups of related docs Recursively, can induce a tree of topics

More information

Analysis and Latent Semantic Indexing

Analysis and Latent Semantic Indexing 18 Principal Component Analysis and Latent Semantic Indexing Understand the basics of principal component analysis and latent semantic index- Lab Objective: ing. Principal Component Analysis Understanding

More information

Chapter 27 Introduction to Information Retrieval and Web Search

Chapter 27 Introduction to Information Retrieval and Web Search Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval

More information

Extracting Coactivated Features from Multiple Data Sets

Extracting Coactivated Features from Multiple Data Sets Extracting Coactivated Features from Multiple Data Sets Michael U. Gutmann University of Helsinki michael.gutmann@helsinki.fi Aapo Hyvärinen University of Helsinki aapo.hyvarinen@helsinki.fi Michael U.

More information

SOME CONCEPTS IN DISCRETE COSINE TRANSFORMS ~ Jennie G. Abraham Fall 2009, EE5355

SOME CONCEPTS IN DISCRETE COSINE TRANSFORMS ~ Jennie G. Abraham Fall 2009, EE5355 SOME CONCEPTS IN DISCRETE COSINE TRANSFORMS ~ Jennie G. Abraham Fall 009, EE5355 Under Digital Image and Video Processing files by Dr. Min Wu Please see lecture10 - Unitary Transform lecture11 - Transform

More information

Outline. Possible solutions. The basic problem. How? How? Relevance Feedback, Query Expansion, and Inputs to Ranking Beyond Similarity

Outline. Possible solutions. The basic problem. How? How? Relevance Feedback, Query Expansion, and Inputs to Ranking Beyond Similarity Outline Relevance Feedback, Query Expansion, and Inputs to Ranking Beyond Similarity Lecture 10 CS 410/510 Information Retrieval on the Internet Query reformulation Sources of relevance for feedback Using

More information

CSE 6242 A / CX 4242 DVA. March 6, Dimension Reduction. Guest Lecturer: Jaegul Choo

CSE 6242 A / CX 4242 DVA. March 6, Dimension Reduction. Guest Lecturer: Jaegul Choo CSE 6242 A / CX 4242 DVA March 6, 2014 Dimension Reduction Guest Lecturer: Jaegul Choo Data is Too Big To Analyze! Limited memory size! Data may not be fitted to the memory of your machine! Slow computation!

More information

Middle School Math Course 3

Middle School Math Course 3 Middle School Math Course 3 Correlation of the ALEKS course Middle School Math Course 3 to the Texas Essential Knowledge and Skills (TEKS) for Mathematics Grade 8 (2012) (1) Mathematical process standards.

More information

Latent Semantic Indexing

Latent Semantic Indexing Latent Semantic Indexing Thanks to Ian Soboroff Information Retrieval 1 Issues: Vector Space Model Assumes terms are independent Some terms are likely to appear together synonyms, related words spelling

More information

Information Retrieval and Web Search

Information Retrieval and Web Search Information Retrieval and Web Search Introduction to IR models and methods Rada Mihalcea (Some of the slides in this slide set come from IR courses taught at UT Austin and Stanford) Information Retrieval

More information

Kernel Methods & Support Vector Machines

Kernel Methods & Support Vector Machines & Support Vector Machines & Support Vector Machines Arvind Visvanathan CSCE 970 Pattern Recognition 1 & Support Vector Machines Question? Draw a single line to separate two classes? 2 & Support Vector

More information

The Course Structure for the MCA Programme

The Course Structure for the MCA Programme The Course Structure for the MCA Programme SEMESTER - I MCA 1001 Problem Solving and Program Design with C 3 (3-0-0) MCA 1003 Numerical & Statistical Methods 4 (3-1-0) MCA 1007 Discrete Mathematics 3 (3-0-0)

More information

ECG782: Multidimensional Digital Signal Processing

ECG782: Multidimensional Digital Signal Processing Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu ECG782: Multidimensional Digital Signal Processing Spring 2014 TTh 14:30-15:45 CBC C313 Lecture 06 Image Structures 13/02/06 http://www.ee.unlv.edu/~b1morris/ecg782/

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

CSC 411: Lecture 14: Principal Components Analysis & Autoencoders

CSC 411: Lecture 14: Principal Components Analysis & Autoencoders CSC 411: Lecture 14: Principal Components Analysis & Autoencoders Raquel Urtasun & Rich Zemel University of Toronto Nov 4, 2015 Urtasun & Zemel (UofT) CSC 411: 14-PCA & Autoencoders Nov 4, 2015 1 / 18

More information

Lecture 7: Computational Geometry

Lecture 7: Computational Geometry Lecture 7: Computational Geometry CS 491 CAP Uttam Thakore Friday, October 7 th, 2016 Credit for many of the slides on solving geometry problems goes to the Stanford CS 97SI course lecture on computational

More information

DEGENERACY AND THE FUNDAMENTAL THEOREM

DEGENERACY AND THE FUNDAMENTAL THEOREM DEGENERACY AND THE FUNDAMENTAL THEOREM The Standard Simplex Method in Matrix Notation: we start with the standard form of the linear program in matrix notation: (SLP) m n we assume (SLP) is feasible, and

More information

Homework 5: Transformations in geometry

Homework 5: Transformations in geometry Math 21b: Linear Algebra Spring 2018 Homework 5: Transformations in geometry This homework is due on Wednesday, February 7, respectively on Thursday February 8, 2018. 1 a) Find the reflection matrix at

More information

4th Grade Math Scope & Sequence-June 2017

4th Grade Math Scope & Sequence-June 2017 4th Grade Math Scope & Sequence-June 2017 Topic Strand Concept State Standard 1: Generalize Place Value Understanding * Read and write numbers in expanded form, with number names. * Recognize the relationship

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction to Information Retrieval Mohsen Kamyar چهارمین کارگاه ساالنه آزمایشگاه فناوری و وب بهمن ماه 1391 Outline Outline in classic categorization Information vs. Data Retrieval IR Models Evaluation

More information

CS545 Contents IX. Inverse Kinematics. Reading Assignment for Next Class. Analytical Methods Iterative (Differential) Methods

CS545 Contents IX. Inverse Kinematics. Reading Assignment for Next Class. Analytical Methods Iterative (Differential) Methods CS545 Contents IX Inverse Kinematics Analytical Methods Iterative (Differential) Methods Geometric and Analytical Jacobian Jacobian Transpose Method Pseudo-Inverse Pseudo-Inverse with Optimization Extended

More information

CS612 - Algorithms in Bioinformatics

CS612 - Algorithms in Bioinformatics Fall 2017 Structural Manipulation November 22, 2017 Rapid Structural Analysis Methods Emergence of large structural databases which do not allow manual (visual) analysis and require efficient 3-D search

More information

( ) =cov X Y = W PRINCIPAL COMPONENT ANALYSIS. Eigenvectors of the covariance matrix are the principal components

( ) =cov X Y = W PRINCIPAL COMPONENT ANALYSIS. Eigenvectors of the covariance matrix are the principal components Review Lecture 14 ! PRINCIPAL COMPONENT ANALYSIS Eigenvectors of the covariance matrix are the principal components 1. =cov X Top K principal components are the eigenvectors with K largest eigenvalues

More information