Opportunities and challenges in personalization of online hotel search


Opportunities and challenges in personalization of online hotel search
David Zibriczky, Data Science & Analytics Lead, User Profiling

Introduction

Introduction: About
Mission: Helping travelers find their ideal hotel at the best price
Main product: Hotel metasearch
  - Aggregates hotels and advertisers
  - Availability and price comparison
  - User interface for hotel search
  - Redirects users to advertisers
Company facts:
  - 1.8M+ hotels
  - 400+ advertisers (booking sites)
  - 190 countries
  - HQ in Düsseldorf, Germany

Introduction: Hotel Metasearch vs. OTA
Metasearch != Online Travel Agency (OTA):
  - Aggregator of OTAs
  - Redirects visitors to OTAs
  - Price comparison
  - No direct feedback about hotels
Common:
  - Booking is the ultimate goal
  - Traditional booking -> online booking
  - Helps users in hotel search
Source: http://www.otrams.com/

Introduction: CPC
Referral revenue model: CPC (cost per click)
  - Advertisers bid a CPC for each hotel
Features:
  - Simple model to calculate revenue
  - No influence on measurement by advertisers
  - CPC ~ expected value after clicking
Difficulties:
  - Effect on CPC bidding takes time to measure
  - Easy to cheat on short-term revenue
  - Indirect measure of performance
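The CPC referral model can be illustrated with a small calculation: revenue is the sum of the advertiser's bid over every click-out. This is a minimal sketch, assuming a hypothetical bid table keyed by (hotel, advertiser) and a hypothetical click log; the names `cpc_bids`, `clicks`, and `referral_revenue` and all values are invented for illustration and do not come from the talk.

```python
from collections import defaultdict

# Hypothetical CPC bids in EUR, keyed by (hotel_id, advertiser_id).
cpc_bids = {
    ("hotel_a", "adv_1"): 0.50,
    ("hotel_a", "adv_2"): 0.25,
    ("hotel_b", "adv_1"): 1.00,
}

# Hypothetical click-outs: each entry is one redirect to an advertiser.
clicks = [
    ("hotel_a", "adv_1"),
    ("hotel_a", "adv_2"),
    ("hotel_b", "adv_1"),
    ("hotel_a", "adv_1"),
]


def referral_revenue(clicks, cpc_bids):
    """Sum the CPC bid of every click-out, grouped by hotel."""
    revenue = defaultdict(float)
    for hotel_id, advertiser_id in clicks:
        revenue[hotel_id] += cpc_bids[(hotel_id, advertiser_id)]
    return dict(revenue)


revenue_by_hotel = referral_revenue(clicks, cpc_bids)
total_revenue = sum(revenue_by_hotel.values())
```

In this toy data a single click on the hotel with the higher bid is worth as much as two clicks at half the bid, which is one way to read the tension between CPC and click volume.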

Introduction: Business Goals
One of the main goals is to increase revenue by increasing the following factors:
  - CPC bidding (per hotel)
  - Price of clicked hotels
  - Number of clicks (bookings)
  - Number of visitors
Challenges:
  - CPC vs. number of clicks: click boosters don't help
  - Hotel price vs. number of bookings
  - Short-term vs. long-term optimization
  - Multi-criteria optimization
  - Bouncers

Introduction: User Value
From the user's perspective:
  1. Utility function
  2. Effort to find the hotel
Assumption: Increasing user value -> KPI improvement
Goals:
  - Maximize the utility of the hotel
      - Higher likelihood to book, higher CPC
      - Better representation of value for price
      - Increase in clicked price
  - Minimize the effort of finding the hotel
      - Better churn rate
      - Better user experience
      - Higher user retention

Introduction: Personalization
Personalization: Tailoring the hotel search process to individual preferences
Goal: Adding value to a non-personalized solution
  - Usability, serendipity, decision support
Techniques:
  - Personalized recommendations
  - User interface
  - Improving the search process
  - Visualizing relevant features
  - Personal campaigns, targeting
Product: A personalization service that learns in real time and adapts to context

Potential Use Cases for Personalization

Use Cases: Destinations
Function: Type/select a destination
Goal of personalization:
  1. Best destination to travel to
  2. List of best cities/destinations to travel to
Challenges:
  - User cold start
  - Difficult to predict the next best destination
  - One suggestion for use case #1

Use Cases: Search Suggestions
Function: Autocomplete of search terms
Goal of personalization: Suggesting cities, POIs or keywords
Challenges:
  - User cold start
  - Diversity of suggestions

Use Cases: Hotel Listing
Function: List of hotels
Goal of personalization:
  - Personalized sorting of hotels
  - Matching the search criteria
Challenges:
  - Depends on the deals and CPC
  - Positional bias
  - Ranking method for any filtering
  - Influence of context
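Positional bias means that hotels shown near the top collect more clicks regardless of relevance. One common normalization is clicks over expected clicks (COEC), which divides a hotel's clicks by the clicks an average item would have received at the same positions. The sketch below is an assumption-laden illustration, not the method used in the talk: the examination prior per position and the impression log are invented.

```python
# Assumed examination prior: probability that a user looks at each position.
# These numbers are illustrative, not measured.
position_prior = {1: 0.5, 2: 0.25, 3: 0.125}

# Hypothetical impression log: (hotel_id, position shown, clicked?).
impressions = [
    ("hotel_a", 1, True),
    ("hotel_a", 1, False),
    ("hotel_b", 3, True),
    ("hotel_b", 3, False),
    ("hotel_b", 3, False),
    ("hotel_b", 3, False),
]


def clicks_over_expected_clicks(impressions, position_prior):
    """Return clicks divided by position-expected clicks for each hotel."""
    clicks = {}
    expected = {}
    for hotel_id, position, clicked in impressions:
        clicks[hotel_id] = clicks.get(hotel_id, 0) + int(clicked)
        expected[hotel_id] = expected.get(hotel_id, 0.0) + position_prior[position]
    return {hotel_id: clicks[hotel_id] / expected[hotel_id] for hotel_id in clicks}


coec = clicks_over_expected_clicks(impressions, position_prior)
```

Here `hotel_a` has the higher raw click-through rate (1 of 2 against 1 of 4), but after accounting for its top position `hotel_b` scores higher, so the two signals would rank the hotels differently.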

Use Cases: Advertisers
Function: List of advertisers
Goal of personalization:
  1. Best advertiser at the View Deal button
  2. Ranking all advertisers
Challenges:
  - Brand awareness
  - Influence of CPC
  - Price vs. brand?

Use Cases: Images
Function: Images about the hotel
Goal of personalization: Image recommendation
  1. Top image
  2. Images in details
Challenges:
  - Labeling of images
  - Diversity of topics
  - Redundancy
  - Positional bias

Use Cases: Hotel Listing on Map
Function: Showing hotels on a map
Goal of personalization:
  - Most relevant hotels
  - Visualization of relevance
Challenges:
  - Geospatial dependence
  - Influence of POIs
  - Ranking is not trivial
  - Number of hotels to show

Use Cases: Explanation
Function: Explanation of recommendations
Goal of personalization:
  - Distance from a POI
  - Amenities or other keywords
Challenges:
  - A trivial explanation doesn't add value
  - Optimal number of explanations
  - No feedback on explanations

Use Cases: Search Criteria
Function: Filter boxes
Goal of personalization: Search criteria suggestions
Challenges:
  - Personalization vs. default settings
  - How to visualize suggestions?

Common challenges in the hotel industry
  1. Episodic interactions (next travel, in-session modeling)
  2. Unstable preferences (seasonality, context, lack of domain knowledge)
  3. Tracking (limited tracking, less registration, lack of feedback, cold start)
  4. User engagement (redirection, bouncers)
  5. Price sensitivity
  6. Online booking vs. real world

How? A quick overview

How: Data
User interactions:
  - Identification: cookie, members
  - Actions: search criteria, hotel interactions, booking, navigation (frontend/backend logging)
Hotel inventory:
  - Static: metadata, amenities, images, ratings (partner API, crawler)
  - Dynamic: availability, price, advertiser deals, CPC
Other entities: destinations, POIs, filters
Context: time, seasonality, device, platform, referrer, location, parameter box

How: Application of ML, Overview
  - Classification: Visitor classification, churn, filter usage prediction (XGB, GBDT, NN, RF, SVM, LOGR)
  - Regression: Price preference, expected LTV, value for price, CPC bidding (GBRT, RFR, SVR, LR)
  - Clustering: User/hotel segmenting, discriminative features (K-Means, K-Medoid, DBSCAN)
  - Association rule mining: Next best hotels, filters, destinations (Apriori)
  - Feature engineering: Image features (CNN), entity embedding (PCA, t-SNE, MF)
  - Natural language processing: Sentiment analysis, topic modeling (Word2vec, LDA)
  - Ensemble learning: Combining multiple algorithms (boosting, stacking, linear combination)
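To make the association rule mining entry more concrete, the sketch below computes support and confidence for pairs of filters, which are the quantities an Apriori-style miner thresholds on. It is deliberately limited to item pairs and is not a full Apriori implementation; the session data, the `pair_rules` helper, and the 0.5 support threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sessions: the set of filters each user applied.
sessions = [
    {"wifi", "breakfast", "pool"},
    {"wifi", "breakfast"},
    {"wifi", "pool"},
    {"breakfast"},
]


def pair_rules(sessions, min_support=0.5):
    """Return confidence(a -> b) for every filter pair with enough support."""
    n = len(sessions)
    item_counts = Counter(item for s in sessions for item in s)
    pair_counts = Counter(
        pair for s in sessions for pair in combinations(sorted(s), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        if count / n < min_support:
            continue  # pair is too rare to be a reliable rule
        # confidence(a -> b) estimates P(b applied | a applied)
        rules[(a, b)] = count / item_counts[a]
        rules[(b, a)] = count / item_counts[b]
    return rules


rules = pair_rules(sessions)
```

In this toy data every session that used the pool filter also used the wifi filter, so the rule pool -> wifi has confidence 1.0, and wifi would be a natural "next best filter" suggestion for a user who has just selected pool.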

How: Application of ML, RecSys
  - Segment-based popularity: Most popular destinations/hotels/filters in a specific user segment
  - Case-based reasoning: Actions of other users in the same/similar contexts
  - Content-based filtering: Similar hotels, most preferred hotel features
  - Collaborative filtering:
      - K-nearest neighbors: Next best destinations, similar hotels, clustering (Item-KNN, User-KNN)
      - Matrix factorization: Personalized recommendations, user/hotel modeling, tensors (SGD, IALS, SVD, ...)
  - Deep learning: Next best hotels or actions (RNN, GRU4Rec)
  - Knowledge-based RS: Conversational recommenders, domain knowledge representation
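As an illustration of the Item-KNN entry, the following sketch finds hotels similar to a target hotel using cosine similarity over the sets of users who clicked each hotel. It assumes binary implicit feedback and uses invented data; `user_clicks`, `similar_hotels`, and the choice of cosine similarity are assumptions for this example rather than details from the talk.

```python
import math

# Hypothetical implicit feedback: user -> set of hotels clicked.
user_clicks = {
    "u1": {"h1", "h2"},
    "u2": {"h1", "h2", "h3"},
    "u3": {"h2", "h3"},
    "u4": {"h4"},
}


def item_users(user_clicks):
    """Invert the log into hotel -> set of users who clicked it."""
    inverted = {}
    for user, hotels in user_clicks.items():
        for hotel in hotels:
            inverted.setdefault(hotel, set()).add(user)
    return inverted


def cosine(users_a, users_b):
    """Cosine similarity between two binary user sets."""
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / math.sqrt(len(users_a) * len(users_b))


def similar_hotels(target, user_clicks, k=2):
    """Return up to k hotels most similar to target, with their scores."""
    inverted = item_users(user_clicks)
    scores = [
        (other, cosine(inverted[target], users))
        for other, users in inverted.items()
        if other != target
    ]
    scores = [(hotel, score) for hotel, score in scores if score > 0]
    # Highest similarity first; ties broken by hotel id for determinism.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


neighbors = similar_hotels("h1", user_clicks)
```

Hotels with no co-clicking users, such as `h4` here, never appear as neighbors, which is a small-scale view of the item cold start problem.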

How: Evaluation
Offline evaluation
  Goal:
    - Reducing the cost of experiments
    - Prototyping
    - Evidence
    - Good-enough candidates
  Techniques:
    - Data analysis and insights
    - Finding offline metrics
    - Finding a ground truth
    - Avoiding over-optimization (offline != online)
Online evaluation
  Goal:
    - Testing the feature in production
    - KPI optimization
    - Monitoring
    - Accept/reject
  Techniques:
    - Real-time manual testing
    - A/B testing
    - Surveys
    - Parameter tuning
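For the accept/reject step of an A/B test, one standard option is a two-proportion z-test on a rate such as click-outs per visitor. The sketch below is a generic statistical illustration under the usual large-sample assumptions; the function name and the traffic numbers are hypothetical, and the talk does not state which test was used.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants are equal.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: control (A) vs. personalized ranking (B).
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
```

A small p-value only says the difference is unlikely to be noise; whether the variant is accepted still depends on the KPI being optimized and on the short-term vs. long-term trade-offs raised earlier.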

How: Limitations/Risks of Personalization
  1. Data-driven solution (quality of the data)
  2. User cold start
  3. Misprediction
  4. Self-reinforcement loop
  5. Over-personalization
  6. Cost of experiments

Thank You! Questions?

Appendix: Abbreviations
  XGB: Extreme Gradient Boosting
  GBDT: Gradient Boosted Decision Trees
  NN: Neural Network
  RF: Random Forest
  SVM: Support Vector Machine
  LOGR: Logistic Regression
  GBRT: Gradient Boosted Regression Trees
  RFR: Random Forest Regressor
  SVR: Support Vector Regression
  LR: Linear Regression
  K-Means: K-Means Clustering
  K-Medoid: K-Medoid Clustering
  DBSCAN: Density-Based Spatial Clustering of Applications with Noise
  PCA: Principal Component Analysis
  t-SNE: t-distributed Stochastic Neighbor Embedding
  CNN: Convolutional Neural Network
  MF: Matrix Factorization
  Apriori: Apriori algorithm
  Word2vec: Word2vec embedding
  LDA: Latent Dirichlet Allocation
  Item-KNN: Item-based k-nearest-neighbor algorithm
  User-KNN: User-based k-nearest-neighbor algorithm
  SGD: Stochastic Gradient Descent
  IALS: Implicit Alternating Least Squares
  SVD: Singular Value Decomposition
  RNN: Recurrent Neural Network
  GRU4Rec: Recurrent Neural Network with Gated Recurrent Units for Recommender Systems