Part 15: Knowledge-Based Recommender Systems. Francesco Ricci
1 Part 15: Knowledge-Based Recommender Systems Francesco Ricci
2 Content
- Knowledge-based recommenders: definition and examples
- Case-Based Reasoning
- Instance-Based Learning
- A recommender system exploiting a simple case model (the product is a case)
- A more complex CBR recommender system for travel planning
3 Core Recommendation Techniques
U is a set of users; I is a set of items/products [Burke, 2007]
4 Knowledge-Based Recommender
- Suggests products based on inferences about a user's needs and preferences
- Functional knowledge: about how a particular item meets a particular user need
- The user model can be any knowledge structure that supports this inference:
  - A query, i.e., the set of preferred features for a product
  - A case (in a case-based reasoning system)
  - An adapted similarity metric (for matching)
  - A part of an ontology
- Domain knowledge, encoded in a knowledge representation language/approach, is used extensively.
5 ActiveBuyersGuide 5
6 Wizard: My Product Advisor
[Screenshot: possible user requests; the system decides what the wizard says.]
7 7
8 8
9 9
10 10
11 Trip.com 11
12 12
13 13
14 14
15 Matching in TripleHop
[Figure: the user model case C-UM:00341 (activities: relaxing, lying on a beach, shopping, sitting in cafes; constraints: meat = beef, budget = 200) is matched against the TripleHop catalogue of destinations.] [Delgado and Davidson, 2002]
16 TripleHop and Content-Based RS
- The content (the destination description) is exploited in the recommendation process
- A classical content-based method would have used a simpler content model, e.g., keywords or TF-IDF
- Here a more complex knowledge structure, a tree of concepts, is used to model the product (and the query)
- The query is the user model, and it is acquired every time the user asks for a new recommendation (not exactly; more details later)
  - Stress on ephemeral needs rather than building a persistent user model
- This is typical of knowledge-based RSs: they focus on ephemeral users, because collaborative filtering and content-based methods cannot cope with such users.
17 Learning User Profile: query mining
[Figure: the old query user model C-UM:00341 (activities: relaxing, lying on a beach, shopping, sitting in cafes; constraints: meat = beef, budget = 200) is combined with the new user request C-UM:00357 (activities: relaxing, adventure, lying on a beach, hiking; constraint: meat = pork) to produce the new user model C-UM:00357bis, as computed by the system; features are shown shadowed when less important.]
18 Query Augmentation
- Personalization in search is not only information filtering
- Query augmentation: when a query is entered, it can be compared against contextual and individual information to refine the query
  - Ex1: If the user is searching for a restaurant and enters the keyword "Thai", the query can be augmented to "Thai food" (see Part 8, query expansion based on co-occurrence analysis in the corpus of documents)
  - Ex2: If the query "Thai food" does not retrieve any restaurant, the query can be relaxed to "Asian food"
  - Ex3: If the query "Asian food" retrieves too many restaurants, and the user searched in the past for "Chinese food", the query can be refined to "Chinese food".
19 Query Augmentation in TripleHop
1. The current query is compared with previous queries of the same user
2. Preferences expressed in past (similar) queries are identified
3. A new query is built by combining the short-term preferences contained in the query with the inferred preferences extracted from the persistent user model (past queries)
4. When the query is matched against an item (destination), if two destinations have the same degree of matching for the explicit preferences, the inferred preferences are used to break the tie
- This is another example of the cascade approach:
  - the two combined RSs are based on the same knowledge, but with two definitions of the user model.
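The tie-breaking cascade above can be sketched as follows. This is a minimal illustration with invented destination data and feature names, not TripleHop's actual code: explicit preferences rank the candidates first, and inferred preferences from past queries break ties.

```python
def rank_destinations(candidates, explicit_prefs, inferred_prefs):
    """Sort candidates by explicit-preference match; tie-break on inferred ones."""
    def match(features, prefs):
        # Fraction of preferences satisfied by the destination's features.
        return sum(1 for p in prefs if p in features) / max(len(prefs), 1)
    return sorted(
        candidates,
        key=lambda d: (match(d["features"], explicit_prefs),
                       match(d["features"], inferred_prefs)),
        reverse=True,
    )

destinations = [
    {"name": "Ibiza", "features": {"beach", "cafes", "nightlife"}},
    {"name": "Crete", "features": {"beach", "cafes", "hiking"}},
]
# Both destinations match the explicit preferences equally; "hiking",
# inferred from past queries, promotes Crete.
ranked = rank_destinations(destinations, {"beach", "cafes"}, {"hiking"})
```

The tuple key implements the cascade: the second component only matters when the first component ties.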
20 What is Case-Based Reasoning?
- "A case-based reasoner solves new problems by adapting solutions that were used to solve old problems" (Riesbeck & Schank, 1989)
- CBR problem-solving process:
  - store previous experiences (cases) in memory
  - to solve new problems:
    - Retrieve from memory experience about similar situations
    - Reuse the experience in the context of the new situation: complete or partial reuse, or adapt according to differences
    - Store the new experience in memory (learning)
[Aamodt and Plaza, 1994]
21 Case-Based Reasoning [Aha, 1998] 21
22 CBR Assumption
- A new problem can be solved by
  - retrieving similar problems
  - adapting retrieved solutions
- Similar problems have similar solutions
[Figure: problems P mapped to solutions S; for the new problem "P?", the solution of the closest stored problem is reused.]
23 Examples of CBR
- Classification: "The patient's ear problems are like this prototypical case of otitis media"
- Compiling solutions: "Patient N's heart symptoms can be explained in the same way as previous patient D's"
- Assessing values: "My house is like the one that sold down the street for $250,000, but has a better view"
- Justifying with precedents: "This Missouri case should be decided just like Roe v. Wade, where the court held that a state's limitations on abortion are illegal"
- Evaluating options: "If we attack Cuban/Russian missile installations, it would be just like Pearl Harbor"
24 Instance-Based Learning (Lazy Learning)
- One way of approximating discrete- or real-valued target functions
- We have training examples: (x_n, f(x_n)), n = 1, ..., N
- Key idea:
  - just store the training examples
  - when a test example is given, find the closest matches
  - use the closest matches to guess the value of the target function on the test example.
25 The Distance Between Examples
- We need a measure of distance (or similarity) in order to know which examples are the neighbours
- Assume that we have T attributes for the learning problem; then one example point x has components x_t ∈ R, t = 1, ..., T
- The distance between two points x and y is often defined as the Euclidean distance:

  d(x, y) = sqrt( Σ_{t=1}^{T} (x_t − y_t)² )
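The nearest-neighbour idea from the last two slides can be sketched in a few lines. This is a toy illustration with made-up training points, not tied to any particular system: store the examples, compute Euclidean distances to the query, and let the k closest vote.

```python
import math
from collections import Counter

def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((xt - yt) ** 2 for xt, yt in zip(x, y)))

def knn_predict(train, query, k=3):
    """train: list of (vector, label) pairs; return the majority label
    among the k training examples nearest to the query."""
    nearest = sorted(train, key=lambda ex: euclidean(ex[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((1.0, 1.0), "yes"), ((1.2, 0.9), "yes"), ((0.8, 1.1), "yes"),
         ((4.0, 4.2), "no"), ((4.1, 3.9), "no")]
prediction = knn_predict(train, (1.0, 1.2), k=3)  # the three nearest are "yes"
```

Note that no model is built in advance: all the work happens at query time, which is exactly what "lazy learning" means.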
26 [Figure: eight example paintings, the first seven labelled as Mondrian ("yes") or not ("no"); the eighth is the unknown test instance.]
27 Training Data
[Table: training examples described by the features Number, Lines, Line types, Rectangles, Colours, with class Mondrian? (No, No, Yes, Yes, No, Yes, No), plus a test instance with the same features and unknown class.]
28 Distances With and Without Normalization
Feature values not normalized (distance to the test instance):
  Train1 no 3.32 | Train2 yes 2.83 | Train3 yes 2.45 | Train4 no 2.65 | Train5 yes 2.65 | Train6 no 5.20

Feature values normalized with x' = (x − avg(X)) / (4 · stdev(X)), where x is a value of feature X:
          Lines  LinesT  Rect   Colors  Class  Distance to test
  Train1  -0.32   0.32   -0.11   0.06   no     0.80
  Train2  -0.08   0.32   -0.21  -0.28   yes    0.52
  Train3  -0.08  -0.16   -0.11  -0.28   yes    0.69
  Train4  -0.08  -0.16    0.08   0.06   no     0.77
  Train5   0.16  -0.16   -0.11   0.39   yes    0.86
  Train6   0.40  -0.16    0.47   0.06   no     0.76
  test     0.40   0.32   -0.02  -0.28

What is the difference between this feature-value normalization and vector normalization in IR?
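The two normalizations the slide asks about can be contrasted directly. A minimal sketch: the 4·stdev scaling follows the slide's formula (using the population standard deviation, an assumption the slide does not pin down), while the IR-style version is plain L2 unit-length scaling.

```python
import math

def feature_normalize(column):
    """Per-feature normalization from the slide: x' = (x - avg) / (4 * stdev).
    Rescales one attribute across all examples, so no feature dominates
    the distance computation."""
    avg = sum(column) / len(column)
    stdev = math.sqrt(sum((x - avg) ** 2 for x in column) / len(column))
    return [(x - avg) / (4 * stdev) for x in column]

def vector_normalize(vector):
    """IR-style L2 normalization: rescales one example (document) across
    all its features to unit length, so document length does not matter."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

normalized_column = feature_normalize([1.0, 2.0, 3.0])  # per-attribute
unit_doc = vector_normalize([3.0, 4.0])                 # per-example
```

The key difference: feature normalization operates down a column of the data matrix, vector normalization operates across a row.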
29 Example of a CBR Recommender System
- Entree is a restaurant recommender system: it finds restaurants
  1. matching some user goals (case features)
  2. or similar to restaurants the user knows and likes
30 The Product is the Case
- In Entree a case is a restaurant: the case is the product
- The problem component is the description of the restaurant given by the user
- The user will input only a partial description of it; this is the main difficulty
- The solution part of the case is the restaurant itself, i.e., the name of the restaurant
- The assumption is that the needs of the user can be modeled as the features of the product description.
31 Partial Match
- In general, only a subset of the preferences will be matched in the recommended restaurant.
32 Nearest Neighbor 32
33 Recommendation in Entree
- The system first selects from the database the set of all restaurants that satisfy the largest number of logical constraints, generated by considering the input features' type and value
- If necessary, it implicitly relaxes the least important constraints until some restaurants can be retrieved
  - Typically the relaxation of constraints will produce many restaurants in the result set
- It sorts the retrieved cases using a similarity metric that takes into account all the input features.
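The retrieve-then-relax loop can be sketched as follows. This is an illustrative reading of the slide, with invented restaurant data and an assumed importance ordering of constraints, not Entree's actual code: retrieve items satisfying all constraints, and drop the least important constraint until the result set is non-empty.

```python
def retrieve(items, constraints):
    """constraints: list of predicates, most important first.
    Returns (matching items, number of constraints relaxed)."""
    active = list(constraints)
    while active:
        result = [it for it in items if all(c(it) for c in active)]
        if result:
            return result, len(constraints) - len(active)
        active.pop()  # relax the least important remaining constraint
    return list(items), len(constraints)

restaurants = [{"cuisine": "thai", "price": 40},
               {"cuisine": "thai", "price": 15}]
constraints = [lambda r: r["cuisine"] == "thai",  # most important
               lambda r: r["price"] <= 10]        # least important
# No restaurant satisfies both constraints, so the price constraint
# is relaxed and both Thai restaurants are retrieved.
matches, relaxed = retrieve(restaurants, constraints)
```

As the slide notes, relaxation typically leaves many candidates in the result set, which is why a similarity-based sort is still needed afterwards.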
34 Similarity in Entree
- The similarity metric assumes that the user goals, corresponding to the input features (or the features of the source case), can be sorted to reflect the importance of those goals from the user's point of view
- Hence the global similarity metric (algorithm) sorts the products first with respect to the most important goal and then, iteratively, with respect to the remaining goals (multi-level sort)
- Attention: it does not work as a maximization of a utility/similarity defined as the sum of local utilities.
35 Example

  Restaurant  Price  Cuisine  Atmosphere
  Dolce       10     A        A
  Gabbana     12     B        B

- If the user query q is: Price = 9 AND Cuisine = B AND Atmosphere = B
- And the weights (importance) of the features are: 0.5 Price, 0.3 Cuisine, and 0.2 Atmosphere
- Then Entree will suggest Dolce first (and then Gabbana)
- A more traditional CBR system will suggest Gabbana, because the similarities are (30 is the price range):
  - Sim(q, Dolce) = 0.5 * (1 − 1/30) + 0.3 * 0 + 0.2 * 0 = 0.48
  - Sim(q, Gabbana) = 0.5 * (1 − 3/30) + 0.3 * 1 + 0.2 * 1 = 0.45 + 0.5 = 0.95
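The two rankings can be checked in code, reproducing the slide's numbers. The lexicographic tuple sort is an assumed, simplified reading of Entree's multi-level sort (price first, then cuisine, then atmosphere); the price range of 30 is taken from the slide.

```python
query = {"price": 9, "cuisine": "B", "atmosphere": "B"}
restaurants = [
    {"name": "Dolce",   "price": 10, "cuisine": "A", "atmosphere": "A"},
    {"name": "Gabbana", "price": 12, "cuisine": "B", "atmosphere": "B"},
]

def local_sim(r, feature):
    """Local similarity of one restaurant feature to the query."""
    if feature == "price":
        return 1 - abs(r["price"] - query["price"]) / 30
    return 1.0 if r[feature] == query[feature] else 0.0

# Entree-style multi-level sort: a lexicographic order over the goals,
# most important first -- not a weighted sum.
entree = sorted(restaurants,
                key=lambda r: (local_sim(r, "price"),
                               local_sim(r, "cuisine"),
                               local_sim(r, "atmosphere")),
                reverse=True)

# Classical weighted similarity: 0.5*price + 0.3*cuisine + 0.2*atmosphere.
def weighted(r):
    return (0.5 * local_sim(r, "price")
            + 0.3 * local_sim(r, "cuisine")
            + 0.2 * local_sim(r, "atmosphere"))

classical = sorted(restaurants, key=weighted, reverse=True)
```

Dolce wins the multi-level sort on price alone (similarity 29/30 vs 27/30), while Gabbana wins the weighted sum (0.95 vs 0.48), matching the slide.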
36 36
37 37
38 Query Tightening 38
39 39
40 [Ricci et al., 2002] 40
41 NutKing as a CBR System
- Problem = recommend a set of tourism-related products and build a travel plan
- Cases = all the recommended travel plans that users have built using the system (how they were built and what they contain)
- Retrieval = search in the memory for travel plans built during similar recommendation sessions
- Reuse =
  1. extract from previous travel plans elementary components (items) and use them to build a new plan
  2. rank items found in the catalogues
42 Travel Plan Model and Interaction Session
[Figure: a case comprises (1) a first collaborative component, the travel wish, with collaborative features clf(family, bdg_medium, 7, hotel); (2) queries on content attributes, e.g., cnq(Golfing=True AND Nightlife=True), cnq(category=3 AND Health=True); (3) a second collaborative component, the selected products in the travel bag with ratings, e.g., (Kitzbühel, True, True, ...), (Hotel Schwarzer, 3, True, ...).]
43 Item Ranking
[Figure: input = the current case (travel wish twc, travel bag tb, user u, ratings r) and a query Q. 1. The catalogue is searched via interactive query management, which may suggest changes to Q, retrieving locations loc1, loc2, loc3. 2. Similar cases are searched in the case base. 3. The retrieved reference set of cases is output. 4. The locations loc_i are sorted by similarity to the locations in the reference cases, producing the ranked items.]
44 Two-Fold Similarity
[Figure: the target session of the target user is compared with stored sessions s1, ..., s6 (sessions similarity), and the products contained in those sessions are compared with the candidate products (product similarity).]
45 Rank Using Two-Fold Similarity
- Given the current session case c and a set of retrieved products R (obtained with the interactive query management facility, IQM):
  1. retrieve the 10 cases (c_1, ..., c_10) from the repository of stored cases (recommendation sessions managed by the system) that are most similar to c with respect to the collaborative features
  2. extract the products (p_1, ..., p_10) from the cases (c_1, ..., c_10) that are of the same type as those in R
  3. for each product r in R, compute Score(r) as the maximum, over i, of the product of (a) the similarity of r with p_i and (b) the similarity of the current case c with the retrieved case c_i containing p_i
  4. sort and display the products in R according to Score(r).
46 Example: Scoring Two Destinations
- D1 and D2 are the destinations matching the user's query; C1 and C2 are similar cases in the case base, containing the destinations CD1 and CD2; CC is the current case
- Score(D_i) = Max_j { Sim(CC, C_j) * Sim(D_i, CD_j) }
- Sim(CC, C1) = 0.2, Sim(CC, C2) = 0.6
- Sim(D1, CD1) = 0.4, Sim(D1, CD2) = 0.7, Sim(D2, CD1) = 0.5, Sim(D2, CD2) = 0.3
- Score(D1) = Max{0.2 * 0.4, 0.6 * 0.7} = 0.42
- Score(D2) = Max{0.2 * 0.5, 0.6 * 0.3} = 0.18
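The slide's scores can be reproduced directly; the dictionaries below simply encode the similarity values given in the example.

```python
# Two-fold similarity: Score(D) = max_j Sim(CC, C_j) * Sim(D, CD_j).
session_sim = {"C1": 0.2, "C2": 0.6}           # Sim(CC, C_j)
product_sim = {                                 # Sim(D_i, CD_j)
    "D1": {"C1": 0.4, "C2": 0.7},
    "D2": {"C1": 0.5, "C2": 0.3},
}

def score(dest):
    """Best session-weighted product similarity over all retrieved cases."""
    return max(session_sim[c] * product_sim[dest][c] for c in session_sim)

ranking = sorted(product_sim, key=score, reverse=True)  # best destination first
```

Taking the maximum (rather than, say, the average) means a destination is promoted as soon as one very similar session recommends a very similar product.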
47 Tree-Based Case Representation
- A case is a rooted tree, and each node has a:
  - node-type: similarity between two nodes in two cases is defined only for nodes with the same node-type
  - metric-type: the node content structure, i.e., how to measure the node's similarity with a node in a second case
[Figure: the case root c1 (nt: case, mt: vector) has children clf1, cnq1 and cart1 (nt: cart, mt: vector); the cart contains dests1 (nt: destinations, mt: set) with items dest1 and dest2, plus accs1 and acts1; each destination item (nt: destination, mt: vector) has features X_1 (nt: location, mt: hierarchical), X_2, X_3, X_4.]
48 Item Representation
TRAVELDESTINATION = (X_1, X_2, X_3, X_4). Example: Canazei
- X_1, node type LOCATION, metric type: set of hierarchically related symbols; e.g., Country=ITALY, Region=TRENTINO, TouristArea=FASSA, Village=CANAZEI
- X_2, node type INTERESTS, metric type: array of Booleans; e.g., Hiking=1, Trekking=1, Biking=1
- X_3, node type ALTITUDE, metric type: numeric; e.g., 1400
- X_4, node type LOCTYPE, metric type: array of Booleans; e.g., Urban=0, Mountain=1, Riverside=0
So dest1 = (X_1, X_2, X_3, X_4) with X_1 = (Italy, Trentino, Fassa, Canazei), X_2 = (1, 1, 1), X_3 = 1400, X_4 = (0, 1, 0)
49 Item Query Language
- For querying purposes, items are represented as simple feature vectors x = (x_1, ..., x_n); e.g., dest1 becomes (Italy, Trentino, Fassa, Canazei, 1, 1, 1, 1400, 0, 1, 0)
- A query is a conjunction of constraints over features: q = c_1 ∧ c_2 ∧ ... ∧ c_m, where m ≤ n and each constraint c_k has the form:
  - x_{i_k} = true, if x_{i_k} is boolean
  - x_{i_k} = v, if x_{i_k} is nominal
  - l ≤ x_{i_k} ≤ u, if x_{i_k} is numerical
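A query as a conjunction of per-feature constraints can be sketched as below. The representation (tagged tuples for the three constraint forms) and the feature names are illustrative choices, not NutKing's actual implementation.

```python
def satisfies(item, query):
    """item: dict feature -> value; query: dict feature -> constraint.
    A constraint is ('bool',), ('nominal', v), or ('numeric', low, high);
    the query is the conjunction of all its constraints."""
    for feature, constraint in query.items():
        value = item[feature]
        kind = constraint[0]
        if kind == "bool" and value is not True:
            return False
        if kind == "nominal" and value != constraint[1]:
            return False
        if kind == "numeric" and not (constraint[1] <= value <= constraint[2]):
            return False
    return True

canazei = {"region": "Trentino", "hiking": True, "altitude": 1400}
query = {"hiking":   ("bool",),                  # boolean: must be true
         "region":   ("nominal", "Trentino"),    # nominal: must equal a value
         "altitude": ("numeric", 1000, 2000)}    # numeric: must lie in [l, u]
```

Here `satisfies(canazei, query)` holds because all three constraint forms are met.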
50 Item Similarity
If X and Y are two items with the same node-type (e.g., dest1 with X_1 = (Italy, Trentino, Fassa, Canazei), X_2 = (1,1,1), X_3 = 1400, X_4 = (0, 1, 0)):

  d(X, Y) = [ (1 / Σ_i w_i) Σ_i w_i d_i(X_i, Y_i)² ]^(1/2)

where 0 ≤ w_i ≤ 1 and i = 1..n (the number of features). The local distance d_i(X_i, Y_i) is:
  - 1, if X_i or Y_i is unknown
  - overlap(X_i, Y_i), if X_i is symbolic
  - |X_i − Y_i| / range_i, if X_i is a finite integer or real
  - Jaccard(X_i, Y_i), if X_i is an array of Booleans
  - Hierarchical(X_i, Y_i), if X_i is a hierarchy
  - Modulo(X_i, Y_i), if X_i is a circular feature (e.g., month)
  - Date(X_i, Y_i), if X_i is a date
Then Sim(X, Y) = 1 − d(X, Y), or Sim(X, Y) = exp(−d(X, Y)).
51 Item Similarity Example
dest1: X_1 = (I, TN, Fassa, Canazei), X_2 = (1,1,1), X_3 = 1400, X_4 = (0, 1, 0)
dest2: Y_1 = (I, TN, Fassa, ?),       Y_2 = (1,0,1), Y_3 = 1200, Y_4 = (1, 1, 0)

  Sim(dest1, dest2) = exp( −sqrt( (1/4) (d_1(X_1,Y_1)² + ... + d_4(X_4,Y_4)²) ) )
                    = exp( −sqrt( (1/4) ((0.3)² + (1 − 2/3)² + ((1400 − 1200)/2000)² + (1 − 1/2)²) ) )
                    = exp( −sqrt( (1/4) · 0.461 ) ) = exp(−0.339) ≈ 0.71

(For X_2 vs Y_2, 2 of the 3 features in the union match; for X_4 vs Y_4, 1 of the 2 features in the union matches.)
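The example can be verified in code. This is a sketch of the metric with equal weights; the hierarchical distance 0.3 and the altitude range 2000 are taken as given from the slide.

```python
import math

def jaccard_distance(x, y):
    """x, y: boolean arrays; distance = 1 - |intersection| / |union|."""
    inter = sum(1 for a, b in zip(x, y) if a and b)
    union = sum(1 for a, b in zip(x, y) if a or b)
    return 1 - inter / union

def item_distance(local_dists, weights):
    """Weighted root-mean of squared local distances, normalized by the
    sum of the weights (the global metric from the previous slide)."""
    s = sum(w * d * d for w, d in zip(weights, local_dists))
    return math.sqrt(s / sum(weights))

d1 = 0.3                                      # hierarchical distance (given)
d2 = jaccard_distance([1, 1, 1], [1, 0, 1])   # interests: 1 - 2/3
d3 = abs(1400 - 1200) / 2000                  # altitude, range 2000
d4 = jaccard_distance([0, 1, 0], [1, 1, 0])   # location type: 1 - 1/2
sim = math.exp(-item_distance([d1, d2, d3, d4], [1, 1, 1, 1]))
```

With equal weights the normalization reduces to the slide's factor 1/4, and the result matches the slide's value of about 0.71.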
52 Case Distance
[Figure: two case trees c1 and c2 with the same structure (case root with clf, cnq and cart; the cart with destinations, accommodations and activities; each destination with features X_i / Y_i); the distance between two cases is computed by comparing nodes with the same node-type.]
53 Case Distance
At the case root (nt: case, mt: vector), the distance is a weighted combination of the children's distances:

  d(c1, c2) = (1 / Σ_{i=1}^{3} W_i) [ W_1 d(cart1, cart2) + W_2 d(clf1, clf2) + W_3 d(cnq1, cnq2) ]
54 At the cart node (nt: cart, mt: vector), the distance is the sum of the children's distances:

  d(cart1, cart2) = d(dests1, dests2) + d(accs1, accs2) + d(acts1, acts2)
55 At a set node (nt: destinations, mt: set), the distance is the average over all cross pairs; with dests1 = {dest1, dest2} and dests2 = {dest3, dest4, dest5}:

  d(dests1, dests2) = (1 / (2·3)) ( d(dest1, dest3) + d(dest1, dest4) + d(dest1, dest5) + d(dest2, dest3) + d(dest2, dest4) + d(dest2, dest5) )
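The set-level distance can be sketched generically; the scalar "items" and the absolute-difference metric below are toy stand-ins for destinations and the item distance of the earlier slides.

```python
def set_distance(set1, set2, dist):
    """Average of dist(a, b) over all cross pairs (a in set1, b in set2),
    i.e. the slide's (1 / (|set1| * |set2|)) * sum of pairwise distances."""
    return sum(dist(a, b) for a in set1 for b in set2) / (len(set1) * len(set2))

# Toy check with 2 x 3 scalar "destinations", mirroring the slide's shapes:
d = set_distance([0.0, 1.0], [0.0, 1.0, 2.0], lambda a, b: abs(a - b))
```

Averaging over all pairs makes the measure symmetric in spirit and independent of how many items each set contains.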
56 CBR Knowledge Containers
- CBR is a knowledge-based approach to problem solving
- The knowledge is contained in four containers:
  - Cases: the instances belonging to our case base
  - Case representation language: the representation language that we decided to use to represent cases
  - Retrieval knowledge: the knowledge encoded in the similarity metric and in the retrieval algorithm
  - Adaptation knowledge: how to reuse a retrieved solution to solve the current problem.
57 Conclusions
- Knowledge-based systems exploit knowledge to map a user to the products she likes
- KB systems use a variety of techniques
- Knowledge-based systems require a large effort in terms of knowledge extraction, representation and system design
- Many KB recommender systems are rooted in Case-Based Reasoning
- Similarity of complex data objects is often required in KB RSs
- NutKing is a hybrid case-based recommender system: the case is the recommendation session.
58 Questions
- What are the main differences between a CF recommender system and a KB RS (such as activebuyers.com or Entree)?
- What is the role of query augmentation?
- What is the basic rationale of a CBR recommender system?
- What is a case in a CBR recommender system such as Entree?
- How does a CBR recommender system learn to recommend?
- What are the knowledge containers in a CBR RS?
- What are the main differences between a classical CBR recommender system such as Entree and NutKing?
- What are the motivations for the introduction of the double-similarity ranking method?
- What are the types of local similarity metrics used in NutKing?
More informationData Mining and Data Warehousing Classification-Lazy Learners
Motivation Data Mining and Data Warehousing Classification-Lazy Learners Lazy Learners are the most intuitive type of learners and are used in many practical scenarios. The reason of their popularity is
More informationCSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 12 Google Bigtable
CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2009 Lecture 12 Google Bigtable References Bigtable: A Distributed Storage System for Structured Data. Fay Chang et. al. OSDI
More informationSpatial Index Keyword Search in Multi- Dimensional Database
Spatial Index Keyword Search in Multi- Dimensional Database Sushma Ahirrao M. E Student, Department of Computer Engineering, GHRIEM, Jalgaon, India ABSTRACT: Nearest neighbor search in multimedia databases
More informationData Cleaning and Prototyping Using K-Means to Enhance Classification Accuracy
Data Cleaning and Prototyping Using K-Means to Enhance Classification Accuracy Lutfi Fanani 1 and Nurizal Dwi Priandani 2 1 Department of Computer Science, Brawijaya University, Malang, Indonesia. 2 Department
More informationA hybrid method to categorize HTML documents
Data Mining VI 331 A hybrid method to categorize HTML documents M. Khordad, M. Shamsfard & F. Kazemeyni Electrical & Computer Engineering Department, Shahid Beheshti University, Iran Abstract In this paper
More informationBrowser-Oriented Universal Cross-Site Recommendation and Explanation based on User Browsing Logs
Browser-Oriented Universal Cross-Site Recommendation and Explanation based on User Browsing Logs Yongfeng Zhang, Tsinghua University zhangyf07@gmail.com Outline Research Background Research Topic Current
More informationInformation Retrieval and Web Search
Information Retrieval and Web Search Introduction to IR models and methods Rada Mihalcea (Some of the slides in this slide set come from IR courses taught at UT Austin and Stanford) Information Retrieval
More information1) Give decision trees to represent the following Boolean functions:
1) Give decision trees to represent the following Boolean functions: 1) A B 2) A [B C] 3) A XOR B 4) [A B] [C Dl Answer: 1) A B 2) A [B C] 1 3) A XOR B = (A B) ( A B) 4) [A B] [C D] 2 2) Consider the following
More informationApproaches to Mining the Web
Approaches to Mining the Web Olfa Nasraoui University of Louisville Web Mining: Mining Web Data (3 Types) Structure Mining: extracting info from topology of the Web (links among pages) Hubs: pages pointing
More informationK- Nearest Neighbors(KNN) And Predictive Accuracy
Contact: mailto: Ammar@cu.edu.eg Drammarcu@gmail.com K- Nearest Neighbors(KNN) And Predictive Accuracy Dr. Ammar Mohammed Associate Professor of Computer Science ISSR, Cairo University PhD of CS ( Uni.
More informationPart 11: Collaborative Filtering. Francesco Ricci
Part : Collaborative Filtering Francesco Ricci Content An example of a Collaborative Filtering system: MovieLens The collaborative filtering method n Similarity of users n Methods for building the rating
More informationJarek Szlichta
Jarek Szlichta http://data.science.uoit.ca/ Approximate terminology, though there is some overlap: Data(base) operations Executing specific operations or queries over data Data mining Looking for patterns
More informationInformation Retrieval. Wesley Mathew
Information Retrieval Wesley Mathew 30-11-2012 Introduction and motivation Indexing methods B-Tree and the B+ Tree R-Tree IR- Tree Location-aware top-k text query 2 An increasing amount of trajectory data
More informationDialog System & Technology Challenge 6 Overview of Track 1 - End-to-End Goal-Oriented Dialog learning
Dialog System & Technology Challenge 6 Overview of Track 1 - End-to-End Goal-Oriented Dialog learning Julien Perez 1 and Y-Lan Boureau 2 and Antoine Bordes 2 1 Naver Labs Europe, Grenoble, France 2 Facebook
More informationStudying the Impact of Text Summarization on Contextual Advertising
Studying the Impact of Text Summarization on Contextual Advertising G. Armano, A. Giuliani, and E. Vargiu Intelligent Agents and Soft-Computing Group Dept. of Electrical and Electronic Engineering University
More informationAssignment 4 (Sol.) Introduction to Data Analytics Prof. Nandan Sudarsanam & Prof. B. Ravindran
Assignment 4 (Sol.) Introduction to Data Analytics Prof. andan Sudarsanam & Prof. B. Ravindran 1. Which among the following techniques can be used to aid decision making when those decisions depend upon
More informationInformation Retrieval. CS630 Representing and Accessing Digital Information. What is a Retrieval Model? Basic IR Processes
CS630 Representing and Accessing Digital Information Information Retrieval: Retrieval Models Information Retrieval Basics Data Structures and Access Indexing and Preprocessing Retrieval Models Thorsten
More informationROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015
ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 http://intelligentoptimization.org/lionbook Roberto Battiti
More information7. Nearest neighbors. Learning objectives. Centre for Computational Biology, Mines ParisTech
Foundations of Machine Learning CentraleSupélec Paris Fall 2016 7. Nearest neighbors Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe-agathe.azencott@mines-paristech.fr Learning
More informationUninformed Search Methods. Informed Search Methods. Midterm Exam 3/13/18. Thursday, March 15, 7:30 9:30 p.m. room 125 Ag Hall
Midterm Exam Thursday, March 15, 7:30 9:30 p.m. room 125 Ag Hall Covers topics through Decision Trees and Random Forests (does not include constraint satisfaction) Closed book 8.5 x 11 sheet with notes
More informationInternational Journal of Computer Engineering and Applications, Volume IX, Issue X, Oct. 15 ISSN
DIVERSIFIED DATASET EXPLORATION BASED ON USEFULNESS SCORE Geetanjali Mohite 1, Prof. Gauri Rao 2 1 Student, Department of Computer Engineering, B.V.D.U.C.O.E, Pune, Maharashtra, India 2 Associate Professor,
More informationMath 3336 Section 6.1 The Basics of Counting The Product Rule The Sum Rule The Subtraction Rule The Division Rule
Math 3336 Section 6.1 The Basics of Counting The Product Rule The Sum Rule The Subtraction Rule The Division Rule Examples, Examples, and Examples Tree Diagrams Basic Counting Principles: The Product Rule
More informationWeighting and selection of features.
Intelligent Information Systems VIII Proceedings of the Workshop held in Ustroń, Poland, June 14-18, 1999 Weighting and selection of features. Włodzisław Duch and Karol Grudziński Department of Computer
More informationCase-Based Reasoning
0/0/ Case-Based Reasoning In this lecture, we turn to another popular form of reasoning system: case based reasoning (CBR) Unlike Rule-based systems and Fuzzy Logic, CBR does not use any rules or logical
More informationLarge Scale Chinese News Categorization. Peng Wang. Joint work with H. Zhang, B. Xu, H.W. Hao
Large Scale Chinese News Categorization --based on Improved Feature Selection Method Peng Wang Joint work with H. Zhang, B. Xu, H.W. Hao Computational-Brain Research Center Institute of Automation, Chinese
More informationKnowledge Retrieval. Franz J. Kurfess. Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A.
Knowledge Retrieval Franz J. Kurfess Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A. 1 Acknowledgements This lecture series has been sponsored by the European
More informationClassification and Regression
Classification and Regression Announcements Study guide for exam is on the LMS Sample exam will be posted by Monday Reminder that phase 3 oral presentations are being held next week during workshops Plan
More informationAnalysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009
Analysis: TextonBoost and Semantic Texton Forests Daniel Munoz 16-721 Februrary 9, 2009 Papers [shotton-eccv-06] J. Shotton, J. Winn, C. Rother, A. Criminisi, TextonBoost: Joint Appearance, Shape and Context
More informationClustering Results. Result List Example. Clustering Results. Information Retrieval
Information Retrieval INFO 4300 / CS 4300! Presenting Results Clustering Clustering Results! Result lists often contain documents related to different aspects of the query topic! Clustering is used to
More informationModule 6 NP-Complete Problems and Heuristics
Module 6 NP-Complete Problems and Heuristics Dr. Natarajan Meghanathan Professor of Computer Science Jackson State University Jackson, MS 97 E-mail: natarajan.meghanathan@jsums.edu Optimization vs. Decision
More informationCluster Analysis. Ying Shen, SSE, Tongji University
Cluster Analysis Ying Shen, SSE, Tongji University Cluster analysis Cluster analysis groups data objects based only on the attributes in the data. The main objective is that The objects within a group
More informationTopics in Machine Learning
Topics in Machine Learning Gilad Lerman School of Mathematics University of Minnesota Text/slides stolen from G. James, D. Witten, T. Hastie, R. Tibshirani and A. Ng Machine Learning - Motivation Arthur
More informationClassification: Feature Vectors
Classification: Feature Vectors Hello, Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just # free YOUR_NAME MISSPELLED FROM_FRIEND... : : : : 2 0 2 0 PIXEL 7,12
More informationMultimedia Information Extraction and Retrieval Term Frequency Inverse Document Frequency
Multimedia Information Extraction and Retrieval Term Frequency Inverse Document Frequency Ralf Moeller Hamburg Univ. of Technology Acknowledgement Slides taken from presentation material for the following
More informationAutomated Online News Classification with Personalization
Automated Online News Classification with Personalization Chee-Hong Chan Aixin Sun Ee-Peng Lim Center for Advanced Information Systems, Nanyang Technological University Nanyang Avenue, Singapore, 639798
More informationUsing Machine Learning to Optimize Storage Systems
Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation
More informationQuery Answering Using Inverted Indexes
Query Answering Using Inverted Indexes Inverted Indexes Query Brutus AND Calpurnia J. Pei: Information Retrieval and Web Search -- Query Answering Using Inverted Indexes 2 Document-at-a-time Evaluation
More informationData Preprocessing. Slides by: Shree Jaswal
Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data
More informationChapter 4: Algorithms CS 795
Chapter 4: Algorithms CS 795 Inferring Rudimentary Rules 1R Single rule one level decision tree Pick each attribute and form a single level tree without overfitting and with minimal branches Pick that
More information