vector space retrieval many slides courtesy James Amherst
- Bertina French
2 what is a retrieval model? A model is an idealization or abstraction of an actual process. Mathematical models are used to study the properties of the process, draw conclusions, and make predictions. Conclusions derived from a model depend on whether the model is a good approximation of the actual situation. Statistical models represent repetitive processes and make predictions about frequencies of interesting events. Retrieval models can describe the computational process, e.g. how documents are ranked; note that how documents or indexes are stored is an implementation matter. Retrieval models can attempt to describe the human process, e.g. the information need and interaction; few do so meaningfully. Retrieval models have an explicit or implicit definition of relevance.
3 retrieval models today: boolean, vector space, latent semantic indexing, statistical language, inference network, hyperlink based
4 outline review: geometry, linear algebra vector space model vector selection similarity weighting schemes latent semantic indexing 4
5 linear algebra 5
6 vectors 6
7 subspaces 7
8 linear independence, base, dimension, rank
A vector $v$ is linearly dependent on vectors $u_1, u_2, \dots, u_k$ if there exist real numbers $r_1, r_2, \dots, r_k$ such that $v = r_1 u_1 + r_2 u_2 + \dots + r_k u_k$.
A base of a vector space is a maximal set of linearly independent vectors. All bases of a given space have the same dimension (the dimension of the space).
$\mathrm{rank}(A)$ = maximum number of linearly independent rows/columns of $A$.
$\mathrm{rank}(A)$ = dimension of the subspace spanned by $A$.
9 matrix multiplication 9
10 dot product, norm
The dot product of two arrays of the same dimension is simply the matrix product whose result is a real number. For $x = (x_1, x_2, \dots, x_n)$ and $y = (y_1, y_2, \dots, y_n)$: $\langle x \cdot y \rangle = x\, y^T = \sum_{i=1}^{n} x_i y_i$.
L2 norm: $\|x\| = \sqrt{\langle x \cdot x \rangle}$.
Normalization: $x \leftarrow x / \|x\|$.
11 cosine 11
12 cosine computation: $\cos(\theta) = \frac{a \cdot b}{\|a\|\,\|b\|}$; $\cos(\theta) = \cos(\beta - \alpha) = \cos(\beta)\cos(\alpha) + \sin(\beta)\sin(\alpha)$
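The cosine of the angle between two vectors follows directly from the dot product and the L2 norm. The sketch below is a minimal pure-Python illustration; the function names and the two test angles are chosen here for the example and do not come from the slides.

```python
import math

def dot(x, y):
    """Dot product of two equal-length sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """L2 norm: square root of the dot product of x with itself."""
    return math.sqrt(dot(x, x))

def cosine(x, y):
    """Cosine of the angle between x and y (undefined for zero vectors)."""
    nx, ny = norm(x), norm(y)
    if nx == 0 or ny == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot(x, y) / (nx * ny)

# Hypothetical 2-D check: unit vectors at angles alpha and beta.
alpha, beta = math.radians(20), math.radians(65)
a = (math.cos(alpha), math.sin(alpha))
b = (math.cos(beta), math.sin(beta))

# The two printed values should agree up to round-off.
print(cosine(a, b), math.cos(beta - alpha))
```

In two dimensions this reproduces the identity cos(β − α) = cos β cos α + sin β sin α, because the dot product of the two unit vectors is exactly the right-hand side.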
13 orthogonality 13
14 projections 14
15 A=LDU factorization
For any $m \times n$ matrix $A$ there exist a permutation matrix $P$, a lower triangular matrix $L$ with unit diagonal, and an $m \times n$ echelon matrix $U$ such that $PA = LU$.
For any $n \times n$ matrix $A$ there exist $L$ and $U$, lower and upper triangular with unit diagonals, a diagonal matrix $D$ of pivots, and a permutation matrix $P$ such that $PA = LDU$.
If $A$ is symmetric ($A = A^T$) then there is no need for $P$, and $U = L^T$: $A = LDL^T$.
16 eigenvalues, eigenvectors
$\lambda$ is an eigenvalue for matrix $A$ if $\det(A - \lambda I) = 0$.
Every eigenvalue has a corresponding non-zero eigenvector $x$ that satisfies $(A - \lambda I)x = 0$, i.e. $Ax = \lambda x$; in other words, $Ax$ and $x$ have the same direction.
Sum of eigenvalues = $\mathrm{trace}(A)$ = sum of the diagonal.
Product of eigenvalues = $\det(A)$.
The eigenvalues of an upper/lower triangular matrix are the diagonal entries.
17 matrix diagonal form
If $A$ has $n$ linearly independent eigenvectors $v_1, v_2, \dots, v_n$ and $S$ is the matrix having those as columns, $S = [v_1\ v_2\ \dots\ v_n]$, then $S$ is invertible and $S^{-1} A S = \Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$, the diagonal matrix of eigenvalues of $A$. Equivalently, $A = S \Lambda S^{-1}$.
No repeated eigenvalues $\Rightarrow$ independent eigenvectors.
$A$ symmetric ($A^T = A$) $\Rightarrow$ $S$ orthogonal: $S^T S = I$.
$S$ is not unique.
$AS = S\Lambda$ holds iff $S$ has eigenvectors as columns.
Not all matrices are diagonalizable.
18 singular value decomposition
If $A$ is an $m$ by $n$ real matrix with $m > n$, then it can be decomposed as $A = U D V^T$, where
$U$ is $m$ by $n$; $D$ and $V$ are $n$ by $n$
$U$ and $V$ are orthogonal: $U^T U = V^T V = I_{n \times n}$
$D$ is diagonal; its entries are the square roots of the eigenvalues of $A^T A$
19 outline review: geometry, linear algebra vector space model vector selection similarity weighting schemes latent semantic indexing 19
20 vector space: represent documents and queries as vectors in the term space (issue: find the right coefficients); use a geometric similarity measure, often angle-related (issue: normalization)
21 mapping to vectors. Terms: an axis for each term; the vectors corresponding to terms are canonical. Document = sum of the vectors corresponding to the terms contained in the doc. Queries are treated the same as documents.
22 coefficients. The coefficients (vector lengths, term weights) represent term presence, importance, or aboutness: the magnitude along each dimension. The model gives no guidance on how to set term weights. Some common choices: binary (1 = term is present, 0 = term not present in document); tf (the frequency of the term in the document); tf idf (idf indicates the discriminatory power of the term). Tf idf is far and away the most common, with numerous variations.
23 raw tf weights cat cat cat cat cat cat cat lion lion cat cat lion dog cat cat lion dog dog 23
24 tf = term frequency. raw-tf (tf) = count of term in document. Robertson tf (okapi tf): based on a set of simple criteria loosely connected to the 2-Poisson model; basic formula is tf/(k+tf); document length = verbosity factor; many variants, e.g. $\frac{tf}{tf + k + c \cdot \frac{doclen}{avg.doclen}}$, with popular values k=0.5, c=1.5.
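The length-normalized Robertson variant can be sketched as a small function. This is an illustrative reading of the formula tf / (tf + k + c · doclen / avg.doclen) with the slide's popular values k=0.5 and c=1.5 as defaults; the function and parameter names are assumptions made for this example.

```python
def raw_tf(term, doc_tokens):
    """Raw tf: number of times `term` occurs in the tokenized document."""
    return doc_tokens.count(term)

def robertson_tf(tf, doclen, avg_doclen, k=0.5, c=1.5):
    """Robertson (okapi) tf with a document-length (verbosity) factor.

    Computes tf / (tf + k + c * doclen / avg_doclen).
    """
    if tf < 0 or doclen < 0 or avg_doclen <= 0:
        raise ValueError("tf and doclen must be >= 0 and avg_doclen > 0")
    return tf / (tf + k + c * doclen / avg_doclen)

# Hypothetical document of average length: the weight saturates below 1
# as the raw count grows.
for count in (1, 2, 5, 20):
    print(count, robertson_tf(count, doclen=100, avg_doclen=100))
```

Unlike raw tf, the value grows ever more slowly with repeated occurrences, and a longer-than-average document receives a smaller weight for the same count.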
25 Robertson tf 25
26 IDF weights. Inverse Document Frequency: used to weight terms based on frequency in the corpus. Fixed for a given corpus, so it can be precomputed for each term. Basic formula: $IDF(t) = \log\left(\frac{N}{N_t}\right)$, where $N$ = # of docs and $N_t$ = # of docs containing term $t$.
27 tf-idf: tf * idf. The weight on every term is tf(t,d)*idf(t). Sometimes variants of tf and IDF are used. There is no satisfactory model behind these combinations.
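The weight tf(t,d)*idf(t) can be computed with a few lines of code. The sketch below assumes raw tf and the basic IDF formula log(N / N_t); the three tiny documents are made up for the example and are not the ones pictured on the slides.

```python
import math
from collections import Counter

def idf(term, docs):
    """Basic IDF: log(N / N_t), N = # of docs, N_t = # of docs containing term."""
    n_t = sum(1 for doc in docs if term in doc)
    if n_t == 0:
        raise ValueError("term does not occur in the corpus")
    return math.log(len(docs) / n_t)

def tfidf_vector(doc, docs):
    """Map each term of `doc` to raw tf * idf (a sparse vector as a dict)."""
    return {term: tf * idf(term, docs) for term, tf in Counter(doc).items()}

# Hypothetical tokenized corpus.
docs = [
    ["cat", "cat", "lion"],
    ["cat", "dog"],
    ["cat", "lion", "dog", "dog"],
]

weights = tfidf_vector(docs[2], docs)
print(weights)
```

Here "cat" occurs in every document, so its IDF is log(3/3) = 0 and it contributes nothing to any vector, while "dog" in the third document gets 2 · log(3/2).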
28 outline review: geometry, linear algebra vector space model vector selection similarity weighting schemes latent semantic indexing 28
29 common similarity measures 29
30 vector similarity: cosine θ 1 θ 2 30
31 similarity, normalized: similarity = $\frac{|\text{intersection}|}{|\text{set}_1| \cdot |\text{set}_2|}$. Same intersection area but different fraction of the sets: the size of the intersection alone is meaningless, so it is often divided by the sizes of the sets. The same holds for vectors, using the norm. By normalizing vectors, cosine does not change.
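A short numeric check illustrates both points: the raw dot product grows with vector length, while the cosine is the same before and after normalizing. The vectors and helper names below are hypothetical, chosen only for this sketch.

```python
import math

def dot(x, y):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))

def normalize(x):
    """Scale x to unit L2 norm (x must not be the zero vector)."""
    length = math.sqrt(dot(x, x))
    return [a / length for a in x]

def cosine(x, y):
    """Dot product divided by the product of the two norms."""
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))

q = [1.0, 1.0, 0.0]
d1 = [3.0, 1.0, 0.0]
d2 = [30.0, 10.0, 0.0]  # same direction as d1, ten times longer

# Raw overlap differs by a factor of ten.
print(dot(q, d1), dot(q, d2))
# Cosine is identical for d1 and d2, up to round-off.
print(cosine(q, d1), cosine(q, d2))
# Cosine of the normalized vectors matches the cosine of the originals.
print(cosine(normalize(q), normalize(d1)))
```

Any remaining difference between the printed cosine values is floating-point round-off.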
32 cosine similarity: example 32
33 cosine example, normalized (differences are round-off error; the values should be the same)
34 similarity example 34
35 tf-idf base similarity formula: $sim(query, doc) = \frac{\sum_{t} TF_{query}(t)\, IDF_{query}(t)\, TF_{doc}(t)\, IDF_{doc}(t)}{\|doc\| \cdot \|query\|}$. Many options for TF query and TF doc: raw tf, Robertson tf, Lucene, etc.; try to come up with your own. Some options for IDF doc; IDF query is sometimes not considered. Normalization is critical.
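One possible instance of such a similarity, offered as a sketch rather than as the slide's exact formula: raw tf on both sides, the basic IDF log(N / N_t) applied to both query and document, and cosine normalization by the two vector norms. The corpus, query, and function names are hypothetical.

```python
import math
from collections import Counter

def idf_table(docs):
    """Precompute IDF(t) = log(N / N_t) for every term in the corpus."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / n_t) for term, n_t in doc_freq.items()}

def weights(tokens, idf):
    """Sparse tf*idf vector; terms unseen in the corpus are ignored."""
    return {t: tf * idf[t] for t, tf in Counter(tokens).items() if t in idf}

def similarity(query, doc, idf):
    """Cosine between the tf*idf vectors of query and doc."""
    wq, wd = weights(query, idf), weights(doc, idf)
    numerator = sum(w * wd.get(t, 0.0) for t, w in wq.items())
    norm_q = math.sqrt(sum(w * w for w in wq.values()))
    norm_d = math.sqrt(sum(w * w for w in wd.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # no usable terms on one side
    return numerator / (norm_q * norm_d)

# Hypothetical tokenized corpus and query.
docs = [
    ["cat", "cat", "lion"],
    ["dog", "dog", "cat"],
    ["lion", "lion", "lion"],
    ["bird"],
]
idf = idf_table(docs)
query = ["lion"]

ranked = sorted(range(len(docs)),
                key=lambda i: similarity(query, docs[i], idf),
                reverse=True)
print(ranked)
```

The document made only of "lion" scores 1 and ranks first, and the mixed cat/lion document follows. Swapping in Robertson tf or dropping the query-side IDF only changes the `weights` function.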
36 Lucene comparison 36
37 complicated formulas 37
38 outline review: geometry, linear algebra vector space model vector selection similarity weighting schemes latent semantic indexing 38
39 LSI. Variant of the vector space model. Uses Singular Value Decomposition (a dimensionality reduction technique) to identify uncorrelated, significant basis vectors or factors, rather than non-independent terms. Replace the original words with a subset of the new factors (say 100) in both documents and queries. Compute similarities in this new space. Computationally expensive, uncertain effectiveness.
40 dimensionality reduction: useful when the representation space is rich but the data lies in a small subspace, that is, when some eigenvalues are zero. Non-exact: ignore the smallest eigenvalues, even if they are not zero.
41 LSI: $T_0$ and $D_0$ are orthogonal with unit-length columns ($T_0^T T_0 = I$); $S_0$ = diagonal matrix of eigenvalues; $m$ = rank of $X$
42 LSI: example 42
43 LSI: example 43
44 LSI
$T$ has orthogonal unit-length columns ($T^T T = I$)
$D$ has orthogonal unit-length columns ($D^T D = I$)
$S$ = diagonal matrix of eigenvalues
$m$ is the rank of $X$
$t$ = # of rows in $X$
$d$ = # of columns in $X$
$k$ = chosen number of dimensions of the reduced model
45 using LSI 45
46 LSI: example 46
47 original vs LSI 47
48 using LSI. D is the new doc vectors (k dimensions); T provides term vectors. Given $Q = q_1 q_2 \dots q_t$, we want to compare it to the docs. Convert Q from t dimensions to k: $Q'_{1 \times k} = Q^T_{1 \times t}\, T_{t \times k}\, S^{-1}_{k \times k}$. Can now compare to doc vectors. The same basic approach can be used to add new docs to the database.
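The conversion of a query from t term dimensions to k concept dimensions is commonly written as Q' = Qᵀ T S⁻¹. The sketch below assumes that form, with the diagonal of S passed as a list. The small matrices are hypothetical values picked so that T has orthonormal columns; they are not taken from the slides' example.

```python
def fold_in(q, T, s):
    """Project a t-dimensional term vector q into k dimensions.

    T is a t-by-k list of rows (term vectors); s holds the k diagonal
    entries of S. Returns q^T * T * S^{-1} as a list of length k.
    """
    t, k = len(T), len(s)
    if len(q) != t or any(len(row) != k for row in T):
        raise ValueError("dimension mismatch between q, T and s")
    if any(value == 0 for value in s):
        raise ValueError("S must be invertible (no zero diagonal entries)")
    # q^T * T: one dot product per retained dimension.
    projected = [sum(q[i] * T[i][j] for i in range(t)) for j in range(k)]
    # Multiplying by S^{-1} divides each coordinate by its diagonal entry.
    return [projected[j] / s[j] for j in range(k)]

# Hypothetical reduced model with t = 3 terms and k = 2 dimensions.
T = [
    [0.6, 0.0],
    [0.8, 0.0],
    [0.0, 1.0],
]
s = [2.0, 0.5]

q = [1.0, 0.0, 1.0]  # query containing terms 1 and 3
q_k = fold_in(q, T, s)
print(q_k)
```

The resulting k-dimensional vector can then be compared with the rows of D, for instance with the cosine. A new document's term vector can be folded in the same way.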
49 LSI: does it work? Decomposes language into basis vectors; in a sense, it is looking for core concepts. In theory, this means that the system will retrieve documents using synonyms of your query words: the magic that appeals to people. From a demo at lsi.research.telcordia.com; they hold the patent on LSI.
50 vector space: summary. Standard vector space: each dimension corresponds to a term in the vocabulary; vector elements are real-valued, reflecting term importance; any vector (document, query, ...) can be compared to any other; cosine correlation is the similarity metric used most often. Latent Semantic Indexing (LSI): each dimension corresponds to a basic concept; documents and queries are mapped into basic concepts; same as standard vector space after that. Whether it's good depends on what you want.
51 vector space: disadvantages. Assumed independence relationship among terms, though this is a very common retrieval model assumption. Lack of justification for some vector operations, e.g. choice of similarity function, choice of term weights. Barely a retrieval model: doesn't explicitly model relevance, a person's information need, language models, etc. Assumes a query and a document can be treated the same (symmetric).
52 vector space: advantages. Simplicity. Ability to incorporate term weights: any type of term weights can be added; there is no model that has to justify the use of a weight. Ability to handle distributed term representations, e.g. LSI. Can measure similarities between almost anything: documents and queries, documents and documents, queries and queries, sentences and sentences, etc.
More informationThe Course Structure for the MCA Programme
The Course Structure for the MCA Programme SEMESTER - I MCA 1001 Problem Solving and Program Design with C 3 (3-0-0) MCA 1003 Numerical & Statistical Methods 4 (3-1-0) MCA 1007 Discrete Mathematics 3 (3-0-0)
More informationECG782: Multidimensional Digital Signal Processing
Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu ECG782: Multidimensional Digital Signal Processing Spring 2014 TTh 14:30-15:45 CBC C313 Lecture 06 Image Structures 13/02/06 http://www.ee.unlv.edu/~b1morris/ecg782/
More informationMining Web Data. Lijun Zhang
Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems
More informationCSC 411: Lecture 14: Principal Components Analysis & Autoencoders
CSC 411: Lecture 14: Principal Components Analysis & Autoencoders Raquel Urtasun & Rich Zemel University of Toronto Nov 4, 2015 Urtasun & Zemel (UofT) CSC 411: 14-PCA & Autoencoders Nov 4, 2015 1 / 18
More informationLecture 7: Computational Geometry
Lecture 7: Computational Geometry CS 491 CAP Uttam Thakore Friday, October 7 th, 2016 Credit for many of the slides on solving geometry problems goes to the Stanford CS 97SI course lecture on computational
More informationDEGENERACY AND THE FUNDAMENTAL THEOREM
DEGENERACY AND THE FUNDAMENTAL THEOREM The Standard Simplex Method in Matrix Notation: we start with the standard form of the linear program in matrix notation: (SLP) m n we assume (SLP) is feasible, and
More informationHomework 5: Transformations in geometry
Math 21b: Linear Algebra Spring 2018 Homework 5: Transformations in geometry This homework is due on Wednesday, February 7, respectively on Thursday February 8, 2018. 1 a) Find the reflection matrix at
More information4th Grade Math Scope & Sequence-June 2017
4th Grade Math Scope & Sequence-June 2017 Topic Strand Concept State Standard 1: Generalize Place Value Understanding * Read and write numbers in expanded form, with number names. * Recognize the relationship
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval Mohsen Kamyar چهارمین کارگاه ساالنه آزمایشگاه فناوری و وب بهمن ماه 1391 Outline Outline in classic categorization Information vs. Data Retrieval IR Models Evaluation
More informationCS545 Contents IX. Inverse Kinematics. Reading Assignment for Next Class. Analytical Methods Iterative (Differential) Methods
CS545 Contents IX Inverse Kinematics Analytical Methods Iterative (Differential) Methods Geometric and Analytical Jacobian Jacobian Transpose Method Pseudo-Inverse Pseudo-Inverse with Optimization Extended
More informationCS612 - Algorithms in Bioinformatics
Fall 2017 Structural Manipulation November 22, 2017 Rapid Structural Analysis Methods Emergence of large structural databases which do not allow manual (visual) analysis and require efficient 3-D search
More information( ) =cov X Y = W PRINCIPAL COMPONENT ANALYSIS. Eigenvectors of the covariance matrix are the principal components
Review Lecture 14 ! PRINCIPAL COMPONENT ANALYSIS Eigenvectors of the covariance matrix are the principal components 1. =cov X Top K principal components are the eigenvectors with K largest eigenvalues
More information