Part 15: Knowledge-Based Recommender Systems. Francesco Ricci

1 Part 15: Knowledge-Based Recommender Systems Francesco Ricci

2 Content
- Knowledge-based recommenders: definition and examples
- Case-Based Reasoning
- Instance-Based Learning
- A recommender system exploiting a simple case model (the product is a case)
- A more complex CBR recommender system for travel planning

3 Core Recommendation Techniques (U is a set of users, I is a set of items/products) [Burke, 2007]

4 Knowledge-Based Recommender
- Suggests products based on inferences about a user's needs and preferences
- Functional knowledge: about how a particular item meets a particular user need
- The user model can be any knowledge structure that supports this inference:
  - A query, i.e., the set of preferred features for a product
  - A case (in a case-based reasoning system)
  - An adapted similarity metric (for matching)
  - A part of an ontology
- There is extensive use of domain knowledge encoded in a knowledge representation language/approach.

5 ActiveBuyersGuide 5

6 Wizard: My Product Advisor (screenshot: possible user's requests; the system decides what the wizard says)

7-10 (screenshot slides, no textual content)

11 Trip.com (screenshot)

12-14 (screenshot slides, no textual content)

15 Matching in TripleHop [Delgado and Davidson, 2002]
Example: a query/user model (C-UM:00341) with activities (relaxing, lying on a beach, shopping, sitting in cafes) and constraints (meat = beef, budget = 200) is matched against the TripleHop catalogue of destinations.

16 TripleHop and Content-Based RS
- The content (destination description) is exploited in the recommendation process
- A classical content-based method would have used a simpler content model, e.g., keywords or TF-IDF
- Here a more complex knowledge structure, a tree of concepts, is used to model the product (and the query)
- The query is the user model and it is acquired every time the user asks for a new recommendation (not exactly, more details later)
  - Stress on ephemeral needs rather than building a persistent user model
- This is typical of knowledge-based RSs: they focus on ephemeral users, because collaborative filtering and content-based methods cannot cope with such users.

17 Learning User Profile: query mining
Example (Crete): the old query/user model (C-UM:00341) contains activities (relaxing, shopping, lying on a beach, sitting in cafes) and constraints (meat = beef, budget = 200). The new user request (C-UM:00357) contains activities (relaxing, adventure, lying on a beach, hiking) and the constraint meat = pork. The new query computed by the system (C-UM:00357bis) merges them: activities (lying on a beach, relaxing, sitting in cafes, shopping, hiking, adventure) and constraints (meat = pork, budget = 200), where the preferences inherited from the old query are shadowed, i.e., marked as less important.

18 Query Augmentation
- Personalization in search is not only information filtering
- Query augmentation: when a query is entered it can be compared against contextual and individual information to refine the query
  - Ex1: If the user is searching for a restaurant and enters the keyword "Thai", the query can be augmented to "Thai food" (see Part 8, query expansion based on co-occurrence analysis in the corpus of documents)
  - Ex2: If the query "Thai food" does not retrieve any restaurant, the query can be relaxed to "Asian food"
  - Ex3: If the query "Asian food" retrieves too many restaurants, and the user searched in the past for "Chinese food", the query can be refined to "Chinese food".

19 Query Augmentation in TripleHop
1. The current query is compared with previous queries of the same user
2. Preferences expressed in past (similar) queries are identified
3. A new query is built by combining the short-term preferences contained in the query with the inferred preferences extracted from the persistent user model (past queries)
4. When the query is matched against an item (destination), if two destinations have the same degree of matching for the explicit preferences, then the inferred preferences are used to break the tie
- This is another example of the cascade approach: the two combined RSs are based on the same knowledge but use two definitions of the user model. A sketch of steps 3-4 follows.
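A minimal Python sketch of steps 3-4 (not the actual TripleHop code; the function names, the representation of queries as feature-value dictionaries and the similarity threshold are assumptions made here for illustration):

def augment_query(current_query, past_queries, query_similarity, threshold=0.5):
    """Steps 1-3: collect preferences expressed in similar past queries of the same user."""
    inferred = {}
    for past in past_queries:
        if query_similarity(current_query, past) >= threshold:
            for feature, value in past.items():
                if feature not in current_query:   # keep only preferences not stated explicitly
                    inferred[feature] = value
    return inferred

def rank_destinations(items, explicit, inferred, match):
    """Step 4: rank by explicit match; inferred preferences are only used to break ties."""
    return sorted(items,
                  key=lambda item: (match(item, explicit), match(item, inferred)),
                  reverse=True)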

20 What is Case-Based Reasoning?
- "A case-based reasoner solves new problems by adapting solutions that were used to solve old problems" (Riesbeck & Schank, 1989)
- CBR problem-solving process [Aamodt and Plaza, 1994]:
  - store previous experiences (cases) in memory
  - to solve new problems:
    - Retrieve from memory the experience about similar situations
    - Reuse the experience in the context of the new situation: complete or partial reuse, or adapt according to differences
    - Store the new experience in memory (learning)
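A minimal Python sketch of this retrieve/reuse/retain loop, assuming cases are simple (problem, solution) records and that the similarity and adaptation functions are supplied by the domain:

class CaseBase:
    """Minimal CBR cycle: retrieve / reuse / retain (all details are placeholders)."""

    def __init__(self, similarity):
        self.cases = []                # stored experiences: (problem, solution) records
        self.similarity = similarity   # domain-specific similarity over problem descriptions

    def retrieve(self, problem, k=1):
        # Retrieve the k stored cases whose problems are most similar to the new one
        ranked = sorted(self.cases,
                        key=lambda c: self.similarity(problem, c["problem"]),
                        reverse=True)
        return ranked[:k]

    def reuse(self, problem, retrieved, adapt):
        # Reuse the retrieved experience, adapting it to the new situation if needed
        return adapt(problem, retrieved)

    def retain(self, problem, solution):
        # Store the new experience in memory (learning)
        self.cases.append({"problem": problem, "solution": solution})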

21 Case-Based Reasoning [Aha, 1998] 21

22 CBR Assumption
- A new problem can be solved by:
  - retrieving similar problems
  - adapting retrieved solutions
- Similar problems have similar solutions
(Figure: problems P in the problem space are mapped to solutions S in the solution space; the new problem "?" is solved by reusing the solutions of its nearest neighbours.)

23 Examples of CBR
- Classification: "The patient's ear problems are like this prototypical case of otitis media"
- Compiling solutions: "Patient N's heart symptoms can be explained in the same way as previous patient D's"
- Assessing values: "My house is like the one that sold down the street for $250,000 but has a better view"
- Justifying with precedents: "This Missouri case should be decided just like Roe v. Wade, where the court held that a state's limitations on abortion are illegal"
- Evaluating options: "If we attack Cuban/Russian missile installations, it would be just like Pearl Harbor"

24 Instance-Based Learning (Lazy Learning)
- One way of solving tasks of approximating discrete- or real-valued target functions
- We have training examples: (x_n, f(x_n)), n = 1, ..., N
- Key idea:
  - just store the training examples
  - when a test example is given, find the closest matches
  - use the closest matches to guess the value of the target function on the test example.

25 The Distance Between Examples
- We need a measure of distance (or similarity) in order to know which examples are the neighbours
- Assume that we have T attributes for the learning problem. Then one example point x has elements x_t in R, t = 1, ..., T
- The distance between two points x and y is often defined as the Euclidean distance:

  d(x, y) = [ Σ_{t=1..T} (x_t - y_t)^2 ]^{1/2}
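A short Python sketch of this distance and of the resulting nearest-neighbour prediction (plain 1-NN over stored examples; the data at the end is an invented toy example):

import math

def euclidean(x, y):
    """d(x, y) = sqrt of the sum over the T attributes of (x_t - y_t)^2."""
    return math.sqrt(sum((xt - yt) ** 2 for xt, yt in zip(x, y)))

def nearest_neighbour_predict(train, query):
    """train is a list of (x, f(x)) pairs; return f(x) of the stored example closest to query."""
    closest_x, closest_fx = min(train, key=lambda pair: euclidean(pair[0], query))
    return closest_fx

# Toy example (invented data)
train = [((1.0, 2.0), "yes"), ((5.0, 1.0), "no")]
print(nearest_neighbour_predict(train, (1.2, 1.8)))   # -> yes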

26 (Figure: example pictures labelled one to eight; each of the first seven is marked yes/no as being a Mondrian, and the eighth is the unlabelled test instance, marked "??".)

27 Training data: seven examples described by the attributes Number of Lines, Line types, Rectangles, Colours and the class Mondrian? (classes: No, No, Yes, Yes, No, Yes, No); the numeric attribute values were lost in transcription. The test instance is described by the same attributes, with Mondrian? unknown.

28 Distances to the test instance (features Lines, LinesT, Rect, Colors; class Mondrian?)

Feature values not normalized (raw values lost in transcription); distance to test:
Train1 no 3.32, Train2 yes 2.83, Train3 yes 2.45, Train4 no 2.65, Train5 yes 2.65, Train6 no 5.20

Feature values normalized with x' = (x - avg(X)) / (4 * stdev(X)), where x is a value of feature X:

         Lines   LinesT  Rect    Colors  Class  Distance to test
Train1   -0.32    0.32   -0.11    0.06   no     0.80
Train2   -0.08    0.32   -0.21   -0.28   yes    0.52
Train3   -0.08   -0.16   -0.11   -0.28   yes    0.69
Train4   -0.08   -0.16    0.08    0.06   no     0.77
Train5    0.16   -0.16   -0.11    0.39   yes    0.86
Train6    0.40   -0.16    0.47    0.06   no     0.76
test      0.40    0.32   -0.02   -0.28

Question: what is the difference between this feature-value normalization and vector normalization in IR?
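A small Python sketch contrasting the two normalizations: the per-feature rescaling used on the slide, x' = (x - avg(X)) / (4 * stdev(X)), applied column by column across the training examples, versus the IR-style vector normalization, which rescales each single example to unit length. Function names are invented here:

import statistics

def normalize_features(rows):
    """Per-feature rescaling, column by column: x' = (x - avg(X)) / (4 * stdev(X))."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.stdev(col) or 1.0 for col in columns]   # guard against constant columns
    return [[(x - m) / (4 * s) for x, m, s in zip(row, means, stdevs)] for row in rows]

def normalize_vector(row):
    """IR-style vector normalization: rescale one example to unit Euclidean length."""
    norm = sum(x * x for x in row) ** 0.5
    return [x / norm for x in row] if norm else list(row)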

29 Example of CBR Recommender System
- Entree is a restaurant recommender system; it finds restaurants:
  1. matching some user goals (case features)
  2. or similar to restaurants the user knows and likes

30 The Product is the Case
- In Entree a case is a restaurant: the case is the product
- The problem component is the description of the restaurant given by the user
- The user will input only a partial description of it; this is the only difficulty
- The solution part of the case is the restaurant itself, i.e., the name of the restaurant
- The assumption is that the needs of the user can be modeled as the features of the product description.

31 Partial Match
- In general, only a subset of the preferences will be matched in the recommended restaurant.

32 Nearest Neighbor 32

33 Recommendation in Entree
- The system first selects from the database the set of all restaurants that satisfy the largest number of logical constraints generated by considering the input features (type and value)
- If necessary, it implicitly relaxes the least important constraints until some restaurants can be retrieved
  - Typically the relaxation of constraints will produce many restaurants in the result set
- It then sorts the retrieved cases using a similarity metric that takes into account all the input features (a retrieval sketch follows).
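A hedged Python sketch of this retrieve-then-relax strategy (not Entree's actual implementation; it assumes the constraints are given as predicates ordered from most to least important):

def retrieve_with_relaxation(items, constraints):
    """constraints: list of predicates over an item, ordered from most to least important."""
    active = list(constraints)
    while active:
        results = [item for item in items if all(c(item) for c in active)]
        if results:
            return results
        active.pop()           # relax the least important remaining constraint
    return list(items)         # nothing could be satisfied: return the whole catalogue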

34 Similarity in Entree
- The similarity metric assumes that the user goals, corresponding to the input features (or the features of the source case), can be sorted to reflect the importance of such goals from the user's point of view
- Hence the global similarity metric (algorithm) sorts the products first with respect to the most important goal and then iteratively with respect to the remaining goals (multi-level sort)
- Attention: it does not work as a maximization of a utility/similarity defined as the sum of local utilities.

35 Example

Restaurant  Price  Cuisine  Atmosphere
Dolce       10     A        A
Gabbana     12     B        B

- The user query q is: price = 9 AND cuisine = B AND atmosphere = B
- The weights (importance) of the features are: 0.5 price, 0.3 cuisine, 0.2 atmosphere
- Entree will suggest Dolce first (and then Gabbana), because price is the most important goal and Dolce matches it better
- A more traditional CBR system would suggest Gabbana, because the similarities are (30 is the price range):
  - Sim(q, Dolce)   = 0.5 * (1 - 1/30) + 0.3 * 0 + 0.2 * 0 = 0.48
  - Sim(q, Gabbana) = 0.5 * (1 - 3/30) + 0.3 * 1 + 0.2 * 1 = 0.45 + 0.3 + 0.2 = 0.95
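The following Python sketch reproduces this example, contrasting the Entree-style multi-level sort with the weighted-sum similarity of a traditional CBR system; the numbers are those of the slide, the code itself is only an illustration:

restaurants = [
    {"name": "Dolce",   "price": 10, "cuisine": "A", "atmosphere": "A"},
    {"name": "Gabbana", "price": 12, "cuisine": "B", "atmosphere": "B"},
]
query = {"price": 9, "cuisine": "B", "atmosphere": "B"}
price_range = 30

def local_sim(r, q, feature):
    if feature == "price":
        return 1 - abs(r["price"] - q["price"]) / price_range
    return 1.0 if r[feature] == q[feature] else 0.0

# Entree-style multi-level sort: order by the most important goal first, then the next, ...
entree_order = sorted(restaurants,
                      key=lambda r: (local_sim(r, query, "price"),
                                     local_sim(r, query, "cuisine"),
                                     local_sim(r, query, "atmosphere")),
                      reverse=True)
print([r["name"] for r in entree_order])                      # ['Dolce', 'Gabbana']

# Traditional CBR ranking: weighted sum of the local similarities
weights = {"price": 0.5, "cuisine": 0.3, "atmosphere": 0.2}
cbr_order = sorted(restaurants,
                   key=lambda r: sum(w * local_sim(r, query, f) for f, w in weights.items()),
                   reverse=True)
print([r["name"] for r in cbr_order])                         # ['Gabbana', 'Dolce']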

36-37 (screenshot slides, no textual content)

38-39 Query Tightening (screenshot slides)

40 [Ricci et al., 2002] 40

41 NutKing as a CBR System
- Problem = recommend a set of tourism-related products and build a travel plan
- Cases = all the recommended travel plans that users have built using the system (how they were built and what they contain)
- Retrieval = search in the memory for travel plans built during similar recommendation sessions
- Reuse =
  1. extract from previous travel plans elementary components (items) and use them to build a new plan
  2. rank items found in the catalogues

42 Travel Plan Model and Interaction Session
(Case structure: collaborative component 1 is the travel wish, e.g. clf = (family, bdg_medium, 7, hotel); queries on content attributes, e.g. (Golfing=True AND Nightlife=True) and cnq = (category=3 AND Health=True); collaborative component 2 is the set of selected products in the travel bag, with ratings, e.g. (Kitzbühel, True, True, ...) and (Hotel Schwarzer, 3, True, ...).)

43 Item Ranking
(Figure: the input is the current case (twc, tb, u, r) and a query Q. 1. The catalogue is searched through interactive query management, which may suggest query changes, and returns candidate locations (loc1, loc2, loc3). 2. Similar cases are searched in the case base. 3. The retrieved cases form the reference set. 4. The candidate locations loc_i are sorted by similarity to the locations contained in the reference cases, producing the ranked items as output.)

44 Two-Fold Similarity
(Figure: the target user's target session is compared with past sessions s1 ... s6 using a session similarity; the products contained in the most similar sessions are compared with the candidate products using a product similarity.)

45 Rank Using Two-Fold Similarity
Given the current session case c and a set of retrieved products R (obtained with the interactive query management facility, IQM):
1. retrieve the 10 cases (c_1, ..., c_10) from the repository of stored cases (recommendation sessions managed by the system) that are most similar to c with respect to the collaborative features
2. extract from the cases (c_1, ..., c_10) the products (p_1, ..., p_10) of the same type as those in R
3. for each product r in R compute Score(r) as the maximum, over the retrieved cases, of the product of (a) the similarity of r to p_i and (b) the similarity of the current case c to the retrieved case c_i containing p_i
4. sort and display the products in R according to Score(r). A sketch follows.
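A Python sketch of this double-similarity scoring (the data structures are hypothetical: the case base is a list of stored sessions, each with its products; case_sim and product_sim stand for the similarity functions used in steps 1 and 3):

def score_products(current_case, candidates, case_base, case_sim, product_sim, k=10):
    """Rank candidate products with the two-fold similarity score."""
    # 1. retrieve the k stored sessions most similar to the current one (collaborative features)
    retrieved = sorted(case_base, key=lambda c: case_sim(current_case, c), reverse=True)[:k]

    # 2.-3. Score(r) = max over retrieved cases of Sim(current, c_i) * Sim(r, p_i)
    def score(r):
        return max((case_sim(current_case, c) * product_sim(r, p)
                    for c in retrieved
                    for p in c["products"] if p["type"] == r["type"]),
                   default=0.0)

    # 4. sort the candidates by their score
    return sorted(candidates, key=score, reverse=True)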

46 Example: Scoring Two Destinations
Destinations D1 and D2 match the user's query; C1 and C2 are similar cases in the case base, containing the destinations CD1 and CD2 respectively; CC is the current case.

Score(D_i) = Max_j { Sim(CC, C_j) * Sim(D_i, CD_j) }

Sim(CC, C1) = 0.2, Sim(CC, C2) = 0.6
Sim(D1, CD1) = 0.4, Sim(D1, CD2) = 0.7
Sim(D2, CD1) = 0.5, Sim(D2, CD2) = 0.3

Score(D1) = Max{0.2*0.4, 0.6*0.7} = 0.42
Score(D2) = Max{0.2*0.5, 0.6*0.3} = 0.18

47 Tree-Based Case Representation
- A case is a rooted tree and each node has:
  - a node-type: similarity between two nodes in two cases is defined only for nodes with the same node-type
  - a metric-type: the node content structure, i.e., how to measure the node's similarity with another node in a second case
(Figure: the case c1 (node-type case, metric-type vector) has children clf1, cnq1 and cart1 (node-type cart, metric-type vector); cart1 contains dests1 (node-type destinations, metric-type set), accs1 and acts1; dests1 contains the items dest1 and dest2 (node-type destination, metric-type vector); each destination has features X1 (node-type location, metric-type hierarchical), X2, X3, X4.)

48 Item Representation
TRAVELDESTINATION = (X1, X2, X3, X4)

Feature  Node Type   Metric Type                          Example: Canazei
X1       LOCATION    Set of hierarchically related symbols  Country=ITALY, Region=TRENTINO, TouristArea=FASSA, Village=CANAZEI
X2       INTERESTS   Array of Booleans                      Hiking=1, Trekking=1, Biking=1
X3       ALTITUDE    Numeric                                1400
X4       LOCTYPE     Array of Booleans                      Urban=0, Mountain=1, Riverside=0

Example item dest1: X1 = (Italy, Trentino, Fassa, Canazei), X2 = (1,1,1), X3 = 1400, X4 = (0, 1, 0)

49 Item Query Language
- For querying purposes items are represented as simple feature vectors x = (x_1, ..., x_n)
  Example dest1: X1 = (Italy, Trentino, Fassa, Canazei), X2 = (1,1,1), X3 = 1400, X4 = (0, 1, 0), flattened as (Italy, Trentino, Fassa, Canazei, 1, 1, 1, 1400, 0, 1, 0)
- A query is a conjunction of constraints over features: q = c_1 AND c_2 AND ... AND c_m, where m <= n and

  c_k =  x_{i_k} = true        if x_{i_k} is boolean
         x_{i_k} = v           if x_{i_k} is nominal
         l <= x_{i_k} <= u     if x_{i_k} is numerical
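A small Python sketch of evaluating such a conjunctive query against an item; the dictionary-based constraint representation is invented here for illustration:

def satisfies(item, constraints):
    """item: feature -> value; constraints: feature -> (kind, spec), kind in {bool, nominal, range}."""
    for feature, (kind, spec) in constraints.items():
        value = item[feature]
        if kind == "bool" and value is not True:
            return False
        if kind == "nominal" and value != spec:
            return False
        if kind == "range" and not (spec[0] <= value <= spec[1]):
            return False
    return True

query = {"Hiking": ("bool", None),
         "Region": ("nominal", "Trentino"),
         "Altitude": ("range", (1000, 2000))}
item = {"Hiking": True, "Region": "Trentino", "Altitude": 1400}
print(satisfies(item, query))   # -> True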

50 Item Similarity
If X and Y are two items with the same node-type (e.g. dest1 with X1 = (Italy, Trentino, Fassa, Canazei), X2 = (1,1,1), X3 = 1400, X4 = (0, 1, 0)):

d(X,Y) = (1 / Σ_i w_i)^{1/2} * [ Σ_i w_i d_i(X_i,Y_i)^2 ]^{1/2}, where 0 <= w_i <= 1 and i = 1..n (number of features)

d_i(X_i,Y_i) =
  1                         if X_i or Y_i are unknown
  overlap(X_i,Y_i)          if X_i is symbolic
  |X_i - Y_i| / range_i     if X_i is a finite integer or real
  Jaccard(X_i,Y_i)          if X_i is an array of Booleans
  Hierarchical(X_i,Y_i)     if X_i is a hierarchy
  Modulo(X_i,Y_i)           if X_i is a circular feature (month)
  Date(X_i,Y_i)             if X_i is a date

Sim(X,Y) = 1 - d(X,Y)   or   Sim(X,Y) = exp(-d(X,Y))

51 Item Similarity Example
dest1: X1 = (I, TN, Fassa, Canazei), X2 = (1,1,1), X3 = 1400, X4 = (0, 1, 0)
dest2: Y1 = (I, TN, Fassa, ?),       Y2 = (1,0,1), Y3 = 1200, Y4 = (1, 1, 0)

Sim(dest1, dest2) = exp( -sqrt( (1/4) [ d_1(X_1,Y_1)^2 + ... + d_4(X_4,Y_4)^2 ] ) )
                  = exp( -sqrt( (1/4) [ (0.3)^2 + (1 - 2/3)^2 + ((1400 - 1200)/2000)^2 + (1 - 1/2)^2 ] ) )
                  = exp( -sqrt( (1/4) * 0.461 ) ) = exp( -0.339 ) = 0.71

(For X2/Y2 the Jaccard distance is 1 - 2/3: 2 common interests out of 3 in the union; for X4/Y4 it is 1 - 1/2: 1 common location type out of 2 in the union.)
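A Python sketch of this heterogeneous similarity: a few of the local distance functions (overlap, numeric range, Jaccard), the weighted global distance and the exponential similarity, reproducing the numbers of the example above (the 0.3 hierarchical distance between the two locations is taken as given from the slide):

import math

def overlap(x, y):                      # symbolic features
    return 0.0 if x == y else 1.0

def numeric(x, y, feature_range):       # finite integer / real features
    return abs(x - y) / feature_range

def jaccard_dist(x, y):                 # arrays of Booleans
    inter = sum(1 for a, b in zip(x, y) if a and b)
    union = sum(1 for a, b in zip(x, y) if a or b)
    return 1 - inter / union if union else 0.0

def item_distance(local_dists, weights):
    """d(X,Y) = sqrt((1 / sum_i w_i) * sum_i w_i * d_i(X_i,Y_i)^2)."""
    return math.sqrt(sum(w * d * d for w, d in zip(weights, local_dists)) / sum(weights))

def item_similarity(local_dists, weights):
    return math.exp(-item_distance(local_dists, weights))

# Numbers from the example above
d_locals = [0.3,                                    # hierarchical distance between the two locations
            jaccard_dist([1, 1, 1], [1, 0, 1]),     # interests: 1 - 2/3
            numeric(1400, 1200, 2000),              # altitude: 200/2000
            jaccard_dist([0, 1, 0], [1, 1, 0])]     # location type: 1 - 1/2
print(round(item_similarity(d_locals, [1, 1, 1, 1]), 2))   # -> 0.71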

52 Case Distance
(Figure: two cases c1 and c2 with the same tree structure. c1 has children clf1, cnq1 and cart1, where cart1 contains dests1 (with dest1, dest2 and their features X1 ... X4), accs1 and acts1; c2 has children clf2, cnq2 and cart2, where cart2 contains dests2 (with dest3, dest4, dest5 and their features Y1 ... Y4), accs2 and acts2. The case distance is computed by recursively comparing nodes with the same node-type.)

53 Case Distance (case node, metric-type vector)

d(c1, c2) = (1 / Σ_{i=1..3} W_i) * [ W_1 d(cart1, cart2) + W_2 d(clf1, clf2) + W_3 d(cnq1, cnq2) ]

54 Case Distance (cart node, metric-type vector)

d(cart1, cart2) = d(dests1, dests2) + d(accs1, accs2) + d(acts1, acts2)

55 Case Distance (destinations node, metric-type set)

d(dests1, dests2) = (1 / (2*3)) * [ d(dest1, dest3) + d(dest1, dest4) + d(dest1, dest5)
                                  + d(dest2, dest3) + d(dest2, dest4) + d(dest2, dest5) ]

(dests1 = {dest1, dest2} in case c1; dests2 = {dest3, dest4, dest5} in case c2.)
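A short Python sketch of the set-node distance used here: the average of all pairwise item distances between the two sets (2 x 3 = 6 pairs in the example). The handling of empty sets is an assumption added here:

def set_distance(set_a, set_b, item_distance):
    """Average of all pairwise item distances between the two sets."""
    if not set_a or not set_b:
        return 1.0 if (set_a or set_b) else 0.0    # convention for empty sets (assumption made here)
    pairs = [(a, b) for a in set_a for b in set_b]
    return sum(item_distance(a, b) for a, b in pairs) / len(pairs)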

56 CBR Knowledge Containers
- CBR is a knowledge-based approach to problem solving
- The knowledge is contained in four containers:
  - Cases: the instances belonging to our case base
  - Case representation language: the representation language that we decided to use to represent cases
  - Retrieval knowledge: the knowledge encoded in the similarity metric and in the retrieval algorithm
  - Adaptation knowledge: how to reuse a retrieved solution to solve the current problem.

57 Conclusions
- Knowledge-based systems exploit knowledge to map a user to the products she likes
- KB systems use a variety of techniques
- Knowledge-based systems require a big effort in terms of knowledge extraction, representation and system design
- Many KB recommender systems are rooted in Case-Based Reasoning
- Similarity of complex data objects is often required in KB RSs
- NutKing is a hybrid case-based recommender system: the case is the recommendation session.

58 Questions
- What are the main differences between a CF recommender system and a KB RS (such as activebuyers.com or Entree)?
- What is the role of query augmentation?
- What is the basic rationale of a CBR recommender system?
- What is a case in a CBR recommender system such as Entree?
- How does a CBR recommender system learn to recommend?
- What are the knowledge containers in a CBR RS?
- What are the main differences between a classical CBR recommender system such as Entree and NutKing?
- What are the motivations for the introduction of the double-similarity ranking method?
- What are the types of local similarity metrics used in NutKing?
