The News Recommendation Evaluation Lab (NewsREEL) Online evaluation of recommender systems

Similar documents
Optimizing a Scalable News Recommender System

Real-time News Recommendations using Apache Spark

Development of a News Recommender System based on Apache Flink

Development and Evaluation of a Highly Scalable News Recommender System

News Recommender System based on Association CLEF NewsREEL 2017

Optimizing and Evaluating Stream-based News Recommendation Algorithms

Stream-Based Recommendations: Online and Offline Evaluation as a Service

InBeat: News Recommender System as a CLEF-NEWSREEL 14

PYTHIA: Improving Datacenter Utilization via Precise Contention Prediction for Multiple Co-located Workloads

VoltDB vs. Redis Benchmark

A News Recommender Engine with a Killer Sequence

COS 333: Advanced Programming Techniques. Copyright 2017 by Robert M. Dondero, Ph.D. Princeton University

An Analysis of Recommender Algorithms for Online News

KNIME for the life sciences Cambridge Meetup

Diversity in Recommender Systems Week 2: The Problems. Toni Mikkola, Andy Valjakka, Heng Gui, Wilson Poon

The SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Dublin Apache Kafka Meetup, 30 August 2017.

The Google File System

CSE 701: LARGE-SCALE GRAPH MINING. A. Erdem Sariyuce

Distributed systems for stream processing

Using the SDACK Architecture to Build a Big Data Product. Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver

Identifying the Skills and Team Members Needed to Support Synchronous Online Sessions and Webinars

@unterstein #bedcon. Operating microservices with Apache Mesos and DC/OS

Benchmarking Apache Flink and Apache Spark DataFlow Systems on Large-Scale Distributed Machine Learning Algorithms

A System for ecommerce Recommender Research with Context and Feedback

Performance Tools for Technical Computing

Feature List. I Feature List

Prediction Systems. Dan Crankshaw UCB RISE Lab Seminar 10/3/2015

BIG DATA TESTING: A UNIFIED VIEW

A+B. Approaches Get Mature. How to evaluate? Self-*, Autonomic Communication. Virtualization f 3. f 4 f 1 f 2 CCN. Loc/ID Split Functional Composition

Play2SDG: Bridging the Gap between Serving and Analytics in Scalable Web Applications

An Interdisciplinary Collaboration Platform for Smart Grid Research

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context

The Road to 200,000 Downloads: The Cesium Story. Sarah

Flash Storage Complementing a Data Lake for Real-Time Insight

RELEASE NOTES. 1.5 December F. Polycom RealAccess

Creating a Recommender System. An Elasticsearch & Apache Spark approach

Evolution of an Apache Spark Architecture for Processing Game Data

Webinar Benchmarks Report

Review, Adapt, Improve. April 3, 2018

Extending the Lifetime of SSD Controller

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)

Activator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success.

Social Network Analytics on Cray Urika-XA

MULTITHERMAN: Out-of-band High-Resolution HPC Power and Performance Monitoring Support for Big-Data Analysis

MULTITHERMAN: Out-of-band High-Resolution HPC Power and Performance Monitoring Support for Big-Data Analysis

Active Endpoints. ActiveVOS Platform Architecture Active Endpoints

Scalable Tools - Part I Introduction to Scalable Tools

Databases 2 (VU) ( / )

Call for Participation in AIP-6

CS535 Big Data Fall 2017 Colorado State University 10/10/2017 Sangmi Lee Pallickara Week 8- A.

NoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu

Dr Markus Hagenbuchner CSCI319. Distributed Systems Chapter 3 - Processes

Club admins are able to perform the following actions via the Participant Login Management screen:

Router Virtualization as an Enabler for Future Internet Multimedia Applications

Welcome. Orientation to online CPS102 Computer Science 2 (Java 2)

Volley: Automated Data Placement for Geo-Distributed Cloud Services

Adaptive Server Allocation for Peer-assisted VoD

Realtime Recommendations

Developing Microsoft Azure Solutions (70-532) Syllabus

When does RDBMS representation make sense When do other representations make sense. Prerequisites: CS 450/550 Database Concepts

CSE 124: Networked Services Fall 2009 Lecture-19

Multi-tenancy version of BigDataBench

ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 4. Prof. James She

Portfolio. Mihai Marin

Product Release Notes

Practical Big Data Processing An Overview of Apache Flink

Chapter 11 - Data Replication Middleware

CERN openlab & IBM Research Workshop Trip Report

grpc - A solution for RPCs by Google Distributed Systems Seminar at Charles University in Prague, Nov 2016 Jan Tattermusch - grpc Software Engineer

Apache Giraph. for applications in Machine Learning & Recommendation Systems. Maria Novartis

Container 2.0. Container: check! But what about persistent data, big data or fast data?!

Competitive Intelligence and Web Mining:

Bridging the standardization gap

Parallel learning of content recommendations using map- reduce

First Reference Framework Release and Evaluation Report

TempWeb rd Temporal Web Analytics Workshop

Why AI Frameworks Need (not only) RDMA?

Yuval Carmel Tel-Aviv University "Advanced Topics in Storage Systems" - Spring 2013

High-Performance Event Processing Bridging the Gap between Low Latency and High Throughput Bernhard Seeger University of Marburg

COS 333: Advanced Programming Techniques

Polycom RealAccess. Cloud Edition. Contents. Software 2.4 January P1

MicroProfile: Optimizing Java EE For a Microservices Architecture

Apache Bahir Writing Applications using Apache Bahir

BEST BIG DATA CERTIFICATIONS

Blurring the Line Between Developer and Data Scientist

AMAZON.COM RECOMMENDATIONS ITEM-TO-ITEM COLLABORATIVE FILTERING PAPER BY GREG LINDEN, BRENT SMITH, AND JEREMY YORK

Deep Learning Frameworks with Spark and GPUs

BigDataBench-MT: Multi-tenancy version of BigDataBench

IRQA General Information:

Executive Committee Meeting

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

Software Architecture Review

Flexible Network Analytics in the Cloud. Jon Dugan & Peter Murphy ESnet Software Engineering Group October 18, 2017 TechEx 2017, San Francisco

The Stream Processor as a Database. Ufuk

An Introduction to The Beam Model

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

Advanced Photoshop Elements 5 0 For Digital Photographers

Techniques to improve the scalability of Checkpoint-Restart

Lab 1 MonarchPress Product Description. Robert O Donnell CS411. Janet Brunelle. September 20, Version #2

Atlassian Confluence 5 Essentials

Transcription:

The News Recommendation Evaluation Lab (NewsREEL) Online evaluation of recommender systems Andreas Lommatzsch TU Berlin, TEL-14, Ernst-Reuter-Platz 7, 10587 Berlin @Recommender Meetup Amsterdam (September 28 th, 2016)

About Me Background: Computer Science Focus: Artificial Intelligence / Business management applications PhD: Algorithms for Distributed Information Retrieval Systems Research Project with Industry Partners at the TUB Information Retrieval / Information Management Recommender Systems Machine Learning The CrowdRec Project Development of recommender algorithms focusing on heterogeneous data, real-time, context, streams 2

Objective Provide Recommendations for Online News Portals Live Evaluation based on #clicked recommendations (CTR) The TUB presents NewsREEL at ECIR 2016 Living Labs and Stream-based Evaluation are on the rise. A recent inquiry of recommendation experts reveals shockingly low levels of confidence in offline evaluation. 9 out of 10 experts agree that existing protocols can provide spurious results. They suggest to rely on socalled real user feedback. Such feedback can be obtained in living labs such as clef Newsreel. Researchers connect their servers with the Open Recommendation Platform (http://orp.plista.com). They Recommendation #1 Recommendation #2 Recommendation #3 Recommendation #4 Recommendation A: 3

Objective Provide Recommendations for Online News Portals Challenges: Continuous changes in the set of news items (new items, corrected items) Continuous stream describing user-item interactions Noisy user tracking (users do not login explicitly) Several different news portals (local news, sports, tech news, ) Time-dependent user preferences Recommendation requests must be answered within 100ms Scalability concurrent requests Evaluation Live user feedback O n l i n e N e w s P o r t a l The TUB presents NewsREEL at ECIR 2016 Living Labs and Stream-based Evaluation are on the rise. A recent inquiry of recommendation experts reveals shockingly low levels of confidence in offline evaluation. 9 out of 10 experts agree that existing protocols can provide spurious results. They suggest to rely on socalled real user feedback. Such feedback can be obtained in living labs such as clef Newsreel. Researchers connect their servers with the Open Recommendation Platform (http://orp.plista.com). They Recommendation #1 Recommendation #2 Recommendation #3 Recommendation #4 Recommendation A: 4

Outline Introduction to Online Real-time Recommendations 0.40 0.35 0.30 0.25 0.20 0.15 0.10 0.05 0.00 The TUB presents NewsREEL at ECIR 2016 Living Labs and Stream-based Evaluation are on the rise. A recent inquiry of recommendation experts reveals shockingly low levels of confidence in offline evaluation. 9 out of 10 experts agree that existing protocols can provide spurious results. They suggest to rely on socalled real user feedback. Such feedback can be obtained in living labs such as clef Newsreel. Researchers connect their servers with the Open number of requested suggestions Recommendation #1 Recommendation #2 Recommendation #3 Recommendation #4 Recommendation Platform (http://orp.plista.com). They Recommendation A: domain: ksta.de (418) The News Recommendation Lab Challenge O n l i n e N e w s P o r t a l Traditional recommender systems Stream-based recommender algorithms How does the number of requested suggestions influence the performance? Evaluation estimated precision 1 6 user-based CF most popular sequence item-based CF content-based most popular Technical Aspects Organizing a Challenge Conclusion & Future Plans 5

The News Recommendation Evaluation Lab Challenge in Detail

The TUB presents NewsREEL at ECIR 2016 Living Labs and Stream-based Evaluation are on the rise. A recent inquiry of recommendation experts reveals shockingly low levels of confidence in offline evaluation. 9 out of 10 experts agree that existing protocols can provide spurious results. They suggest to rely on socalled real user feedback. Such feedback can be obtained in living labs such as clef Newsreel. Researchers connect their servers with the Open Recommendation #1 Recommendation #2 Recommendation #3 Recommendation #4 Recommendation Platform (http://orp.plista.com). They Recommendation A: The NewsREEL Challenge Architecture O n l i n e N e w s P o r t a l NewsREEL Challenge Architecture Different portals and recommender teams Online evaluation based on user clicks users news portal name publisher PLISTA server recommender team 1 recommender team 2 recommender team N 7

ORP Interface The TUB presents NewsREEL at ECIR 2016 Living Labs and Stream-based Evaluation are on the rise. A recent inquiry of recommendation experts reveals shockingly low levels of confidence in offline evaluation. 9 out of 10 experts agree that existing protocols can provide spurious results. They suggest to rely on socalled real user feedback. Such feedback can be obtained in living labs such as clef Newsreel. Researchers connect their servers with the Open Recommendation #1 Recommendation #2 Recommendation #3 Recommendation #4 Recommendation Platform (http://orp.plista.com). They Recommendation A: The NewsREEL Challenge Architecture O n l i n e N e w s P o r t a l NewsREEL Challenge Architecture Different portals and recommender teams Online evaluation based on user clicks News Portal Algorithm #1 User Log data offline Evaluator PLISTA Server offline Test- Server Algorithm #2 Algorithm #n 8

The NewsREEL Challenge Provided Data 10GB log data / day JSON format Provided Information: Item data User-item interactions Recommendation Requests Error messages {"recs": {"ints": {"3": [188752630, 186443351, 188900490, 188718683, 188619479, 188823393]}}, "event_type": "impression", "context": {"simple": {"82": 0, "62": 1979832, "63": 1840689, "49": 48, "67": 1928642, "68": 1851453, "69": 1851422, "80": 1982218, "81": 0, "24": 55, "25": 188932307, "27": 1677, "22": 1453961, "23": 23, "47": 504183, "44": 1851485, "42": 0, "29": 17332, "40": 1618697, "5": 0, "4": 68150, "7": 14684, "6": 573577, "9": 26888, "13": 2, "76": 1, "75": 1919991, "74": 1920088, "39": 748, "59": 1275566, "14": 33331, "17": 48985, "16": 48811, "19": 78119, "18": 5, "57": 354840161, "56": 1138207, "37": 1991332, "35": 315003, "52": 1, "31": 0}, "clusters": {"46": {"761805": 100, "472857": 100, "761809": 100}, "51": {"1": 255}, "1": {"7": 255}, "33": {"19840": 3, "513275": 10, "2180636": 5, "2934618": 4, "5518": 1, "2767420": 1, "26519": 2, "129348": 3, "234": 2, "9441716": 1, "9422": 5, "10666434": 2, "10388533": 7, "4050": 1}, "3": [55, 28, 34, 91, 23, 21], "2": [11, 11, 61, 60, 61, 26, 21], "64": {"3": 255}, "65": {"1": 255}, "66": {"12": 255}}, "lists": {"11": [13840], "8": [18841, 18842, 48511], "10": [10, 13, 1768, 1769, 1770]}}, "timestamp": 1405893599691} 9 {"recs": {"ints": {"3": [188675659, 177785300, 188684239, 188898567,

The TUB presents NewsREEL at ECIR 2016 Living Labs and Stream-based Evaluation are on the rise. A recent inquiry of recommendation experts reveals shockingly low levels of confidence in offline evaluation. 9 out of 10 experts agree that existing protocols can provide spurious results. They suggest to rely on socalled real user feedback. Such feedback can be obtained in living labs such as clef Newsreel. Researchers connect their servers with the Open Recommendation #1 Recommendation #2 Recommendation #3 Recommendation #4 Recommendation Platform (http://orp.plista.com). They Recommendation A: The NewsREEL Challenge Impressions and Clicks for one Week (per domain) O n l i n e N e w s P o r t a l 10

impressions per minute The TUB presents NewsREEL at ECIR 2016 Living Labs and Stream-based Evaluation are on the rise. A recent inquiry of recommendation experts reveals shockingly low levels of confidence in offline evaluation. 9 out of 10 experts agree that existing protocols can provide spurious results. They suggest to rely on socalled real user feedback. Such feedback can be obtained in living labs such as clef Newsreel. Researchers connect their servers with the Open Recommendation #1 Recommendation #2 Recommendation #3 Recommendation #4 Recommendation Platform (http://orp.plista.com). They Recommendation A: The NewsREEL Challenge Impressions (Wednesday / Sunday) O n l i n e N e w s P o r t a l 7500 30000 6250 25000 Wednesday, April 24th Mittwoch, 24.04.2013 Sunday, April 28th Sonntag, 28.04.2013 [publisher: ksta.de] 5000 20000 3750 15000 2500 10000 1250 5000 0 0 0:00 2:00 4:00 6:00 8:00 10:00 12:00 14:00 16:00 18:00 20:00 22:00 0:00 0:00 4:00 8:00 12:00 16:00 20:00 24:00 11

CTR [%] The TUB presents NewsREEL at ECIR 2016 Living Labs and Stream-based Evaluation are on the rise. A recent inquiry of recommendation experts reveals shockingly low levels of confidence in offline evaluation. 9 out of 10 experts agree that existing protocols can provide spurious results. They suggest to rely on socalled real user feedback. Such feedback can be obtained in living labs such as clef Newsreel. Researchers connect their servers with the Open Recommendation #1 Recommendation #2 Recommendation #3 Recommendation #4 Recommendation Platform (http://orp.plista.com). They Recommendation A: The NewsREEL Challenge Time dependent Click-Through-Rate O n l i n e N e w s P o r t a l The Click-Through-Rate dependent from the hour of the day 2.5 ksta.de sport1.de 2.0 gulli.com tagesspiegel.de 1.5 cio.de motor-talk.de 1.0 0.5 0.0 0 2 4 6 8 10 12 14 16 18 20 22 hour of the day 12

The TUB presents NewsREEL at ECIR 2016 Living Labs and Stream-based Evaluation are on the rise. A recent inquiry of recommendation experts reveals shockingly low levels of confidence in offline evaluation. 9 out of 10 experts agree that existing protocols can provide spurious results. They suggest to rely on socalled real user feedback. Such feedback can be obtained in living labs such as clef Newsreel. Researchers connect their servers with the Open Recommendation #1 Recommendation #2 Recommendation #3 Recommendation #4 Recommendation Platform (http://orp.plista.com). They Recommendation A: The NewsREEL Challenge Device usage O n l i n e N e w s P o r t a l 100% 90% 80% 70% 60% 50% 40% 30% other phone tablet desktop 20% 10% 0% Used interface devices and plugins give important information about the user s usage context 13 domain: ksta.de

Traditional Recommender Systems 14

Motivation State of the Art: Recommender Systems 15 Collaborative Filtering Recommender based on a User-Item-Matrix Typical scenarios: Recommending Books Songs, movies Products Friends 2 22 2 22 2 22 2 1 1 1 1 5 1 5 2 5 2 3 5 1 2 5 3 3 3 2 4 4 5 5 3 4 3 5 5 5 5 5 4 5 4

Motivation State of the Art: Recommender Systems Characteristic Properties 2 Huge user-item-matrix 22 Sparsity 2 Almost static matrix 22 2 22 State-Of-The-Art Approach 2 Matrix-factorization methods Heuristics and distributed processing Neighborhood models 1 1 1 1 5 1 5 2 5 2 3 5 1 2 5 3 3 3 2 4 4 5 5 3 4 3 5 5 5 5 5 4 5 4 = x x 16 M = USV T

Recommendations based on Streams 17

Approach - Algorithms Overview Optimize recommender algorithms for the streaming scenario Adapt Recommender Algorithms for Streams Continuously adapt the model used by the recommender; integrate new data, remove old data Start with simple algorithm! sliding window-based model 18

Approach - Algorithms Most Recently Requested 1/2 Idea Recommend the items most recently viewed by other users Implementation: Store the last impression events in a ring buffer Provide the most recent items as suggestions 19

Approach - Algorithms Most Recently Requested 2/2 Strengths Fast, simplified synchronization Biased towards popular items Randomized recommendations Weaknesses Not personalized Does not consider the context 20

Approach - Algorithms Adapt algorithms for stream-based scenarios Time-Frame-based Algorithms Most-recently requested Most-popular recommender Most-popular sequence User-based Collaborative Filtering Item-based Collaborative Filtering Content-based Filtering POPULAR MOST POPULAR MOST POPULAR MOST 2 Implement the Algorithms based on streams (fifo) or micro-batches 21

Evaluation??? 22

estimated precision 0.40 0.35 0.30 0.25 0.20 0.15 0.10 0.05 0.00 number of requested suggestions domain: ksta.de (418 Evaluation Results Recommendation precision dependent from the time of day estimated precision How does the number of requested suggestions influence the performance? user-based CF most popular seque item-based CF content-based most popular 1 6 How does the hour of the day influence the performance? 0.45 0.40 0.35 0.30 0.25 0.20 0.15 user-based CF most popular sequence item-based CF content-based most popular domain: sport1 (596) 0.10 0.05 0.00 houroftheday 23

estimated precision 0.40 0.35 0.30 0.25 0.20 0.15 0.10 0.05 number of requested suggestions domain: ksta.de (418 Evaluation Results Recommendation precision dependent from # requested results 0.00 estimated precision How does the number of requested suggestions influence the performance? user-based CF most popular seque item-based CF content-based most popular 1 6 0.40 How does the number of requested suggestions influence the performance? 0.35 0.30 0.25 0.20 0.15 0.10 0.05 user-based CF most popular sequence item-based CF content-based most popular domain: ksta.de (418) 0.00 1 6 number of requested suggestions 24

estimated precision 0.40 0.35 0.30 0.25 0.20 0.15 0.10 0.05 number of requested suggestions domain: ksta.de (418 Evaluation Results Recommendation precision dependent on the publisher 0.00 estimated precision How does the number of requested suggestions influence the performance? user-based CF most popular seque item-based CF content-based most popular 1 6 0.7 How does the domain influence the performance? 0.6 0.5 0.4 0.3 0.2 user-based CF most popular sequence item-based CF content-based most popular 0.1 0.0 domain 25

0.40 0.35 0.30 0.25 0.20 0.15 0.10 0.05 0.00 number of requested suggestions domain: ksta.de (418 Evaluation Results Discussion estimated precision How does the number of requested suggestions influence the performance? user-based CF most popular seque item-based CF content-based most popular 1 6 Timeframe-based Algorithms Results The portal and the time have a big influence on the results Hidden parameters (e.g. the placement of recommendations) should be considered Parameter optimization (e.g. size of the time window) is very important Ensemble Algorithms combining several algorithms based on the context outperform single algorithms 26

Technical Requirements How to ensure real-time recommendations? 27

Technical Requirements Approach Ensure that messages are processed concurrently Higher Priority for requests Avoid global locks / synchronized blocks Choose a framework optimized for the expected number of messages Java Concurrent Collections, Google Guava, Redis Akka, Apache Flink, Apache Spark 28

Technical Requirements Results In Memory-Approaches work well in NewsREEL as long as there are no extreme load peaks Little overhead Limited maintenance effort Apache Flink / Apache Spark are well-suited for building models, not for ensuring real-time responses Delays in model updates are not critical in NewsREEL Use distributed environments for learning the recommender models 29

Experiences from the NewsREEL challenge

Running a Challenge based on Live Data Experiences from the NewsREEL challenge Our Experience The scenario of online stream-based recommendations is new to most participants Live-Evaluation requires that the algorithm handles unexpected events Variance in the user behavior is a problem for many participants Tight time constraints make the challenge difficult to debug Provided example recommender component reduce the barrier for new participants, but participants tend to only slightly modify the templates instead of implementing new ideas A context-dependent bias in the average CTR makes it difficult to compare different strategies 31

Cube4U's Gigaminx, black body., Work by Tetracube, CC BY-SA 3.0, http://upload.wikimedia.org/wikipedia/commons/0/0d/c4u-gigaminx-01.jpg Conclusion Providing News Recommendation is challenging Incorporate the time-dependent relevance in the recommender models The performance of the algorithms highly depends on the context. There is no overall best algorithm A parameter optimization is very important The continuous changes and the variance in the user behavior make the parameter optimization an complex task The optimal technical solution depends from the number of messages 32

The Future NewsREEL in 2017 The NewsREEL challenge will continue in 2017 as a CLEF challenge (conferences and labs of the evaluation forum) New Research Topics in 2017 Multi-Media Content Multi-Lingual Content Correlation between online and offline evaluation Combine academic and business metrics 33

Conclusion Participate in NewsREEL 2017! Heterogeneous data + Streamed data + Real-life Online Evaluation + Multiple Metrics Challenging Research Topics 34

Conclusion Participate in NewsREEL 2017! Sign-up for NewsREEL 2017 Labs registration opens: 4 Nov 2016 Tutorials, code examples, and the baseline recommender implementation are provided on GitHub Papers describing already implemented recommender algorithms API-Documentation: orp.plista.com Web-Page for configuring the message stream (*under construction*) 35

Thank you for your attention Questions? More Information: Visit the NewsREEL challenge web site: http://www.clef-newsreel.org/ Benchmarking News Recommendations: The CLEF NewsREEL Use Case In SIGIR Forum December 2015, Volume 49 Number 2 http://sigir.org/files/forum/2015d/p129.pdf 36 The CrowdRec Project: http://crowdrec.eu/

Contact Dr.-Ing. Andreas Lommatzsch TU Berlin, Sekr. TEL-14, AOT Application Center Data Analytics http://www.dai-labor.de/de/kompetenzzentren/irml/mitarbeiter/andreas.lommatzsch andreas.lommatzsch@dai-labor.de Fon +49 (0) 30 / 314 74 071 Fax +49 (0) 30 / 314 74 003 DAI-Labor Competence Center Information Retrieval & Machine Learning