Real-time Recommendations on Spark. Jan Neumann, Sridhar Alla (Comcast Labs) DC Spark Interactive Meetup East May
Transcription
1 Real-time Recommendations on Spark Jan Neumann, Sridhar Alla (Comcast Labs) DC Spark Interactive Meetup East May
2 Who am I? Jan Neumann, Lead of Big Data and Content Analysis Research Teams. This is joint work with Sridhar Alla, Director/Big Data Architect, Enterprise BI. Comcast Labs DC responsibilities: develop content discovery and metadata back-end services for Comcast; innovation: voice interface for TV; machine learning/data science expertise for all of Comcast.
3 Comcast Labs DC powers all content discovery for X1: Search, Algorithmic Menus, Poster Art, Video On-Demand, Live TV, Personalized Recommendations.
4 Who are we similar to? Metadata like [logo], search like [logo], recommendations like [logo]. Powering millions of devices. Taking into account your TV channels, subscriptions and tastes. Including live programming.
5 How it all works. Formed in 2011, we now operate one of the largest and most sophisticated metadata and content discovery platforms in the industry. [Diagram labels: content information, metadata providers, content providers, content images, logos, billing systems, customer usage, subscriber information, catalogs, entitlements, channel lineups, deep metadata, purchases; discovery: menu, recommend, browse, personalize, voice control, search; millions of devices.]
8 What are recommendation systems? Recommendation systems (RS) are everywhere: video (Comcast, Netflix), music (Apple Genius, Spotify, Pandora), products (Amazon, eBay), targeted advertisements. RS help to match users with items: ease information overload (long tail); sales assistance (guidance, advisory, persuasion, ...). Different system designs/paradigms: based on availability of exploitable data; implicit and explicit user feedback; domain characteristics. [Diagram labels: Search, Recommendations, Items.]
9 Importance of recommending from the long tail
10 Content-based Recommendations. Main idea: recommend items to customer x similar to previous items rated highly by x. Example, movie recommendations: recommend movies with same actor(s), director, genre, ... Example, websites, blogs, news: recommend other sites with similar content. From J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets,
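The main idea on this slide can be sketched in a few lines of plain Scala. This is only an illustration under assumptions that are not in the talk: items are described by hypothetical sparse feature maps (for example "actor:A" or "genre:action"), the user profile is the sum of the feature vectors of highly rated items, and candidates are ranked by cosine similarity to that profile.

```scala
object ContentBased {
  // Hypothetical sparse item description: feature name -> weight.
  type Features = Map[String, Double]

  // Cosine similarity between two sparse feature vectors.
  def cosine(a: Features, b: Features): Double = {
    val dot = a.keySet.intersect(b.keySet).toSeq.map(k => a(k) * b(k)).sum
    val normA = math.sqrt(a.values.map(v => v * v).sum)
    val normB = math.sqrt(b.values.map(v => v * v).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  // User profile: sum of the feature vectors of the items the user rated highly.
  def profile(liked: Seq[Features]): Features =
    liked.flatten.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).sum }

  // Rank candidate items by similarity to the profile, best first, and keep n.
  def recommend(liked: Seq[Features], candidates: Map[String, Features], n: Int): Seq[String] = {
    val p = profile(liked)
    candidates.toSeq.sortBy { case (_, f) => -cosine(p, f) }.take(n).map(_._1)
  }
}
```

Because the score depends only on the item's own features, a brand-new item can be recommended as soon as its metadata exists, which is the "no first-rater problem" point made on the next slide.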
11 Pros: Content-based Approach. +: No need for data on other users (no cold-start or sparsity problems). +: Able to recommend to users with unique tastes. +: Able to recommend new & unpopular items (no first-rater problem). +: Able to provide explanations: can explain recommended items by listing the content features that caused an item to be recommended. From J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets,
12 Cons: Content-based Approach. -: Finding the appropriate features is hard (e.g., images, movies, music). -: Recommendations for new users: how to build a user profile? -: Overspecialization: never recommends items outside the user's content profile; people might have multiple interests; unable to exploit quality judgments of other users. From J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets,
13 Collaborative Filtering: Example. Customer X buys a Metallica CD and buys a Megadeth CD. Customer Y does a search on Metallica; the RS suggests Megadeth from data collected about customer X. From J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets,
14 Collaborative Filtering From J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets,
15 Pros/Cons of Collaborative Filtering. + Works for any kind of item: no feature selection needed. - Cold start: need enough users in the system to find a match. - Sparsity: the user/ratings matrix is sparse; hard to find users that have rated the same items. - First rater: cannot recommend an item that has not been previously rated (new items, esoteric items). - Popularity bias: cannot recommend items to someone with unique taste; tends to recommend popular items. From J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets,
16 This Talk: Real-time TV Recommendations + = Trending For You
17 How can we do it? Challenges: we have millions of users and thousands of programs; programs on live TV are constantly changing (cold start). Approach inspired by "Google News Personalization: Scalable Online Collaborative Filtering", Das et al., 2007: recommend what people in your geographic area with a taste similar to yours are currently watching, and do it in Spark.
18 Real-time Recommendations Algorithm. 1. Cluster users by taste profiles and geographic proximity. 2. Calculate Top K trending programs for each cluster. 3. Look up cluster for user and return trending programs. [Diagram: User Request; User -> Cluster; Cluster -> Trending Programs; User -> Trending Programs.]
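At request time the algorithm reduces to composing two lookups. A minimal sketch in plain Scala, assuming both mappings are available as in-memory maps (in the talk the trending lists are stored in Redis; the names `userToCluster` and `clusterToTrending` are hypothetical):

```scala
object TrendingLookup {
  // userToCluster would come from the batch clustering job,
  // clusterToTrending from the streaming job; both are stand-ins here.
  def recommend(
      userId: String,
      userToCluster: Map[String, Int],
      clusterToTrending: Map[Int, Seq[String]]): Seq[String] =
    userToCluster.get(userId)
      .flatMap(clusterToTrending.get)
      .getOrElse(Seq.empty) // unknown user or no list for the cluster yet
}
```

Keeping the two maps separate means the expensive user clustering can be refreshed rarely while the trending lists change every micro-batch.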
19 Real-time Recommendations in Spark. Thanks to Spark we can implement this quickly. Inputs: user history from HDFS; live tune activity via Kafka. Batch: user clustering with MLlib. Real-time: Top K trending programs per cluster with Spark Streaming. Output: real-time program recommendations per user.
20 Collaborative Filtering Implementation: Matrix Factorization
22 No Ratings: Implicit Matrix Factorization. ALS.trainImplicit(view_count, k, numIter, lambda, alpha). For more info see Music Recommendations with Spark, Chris Johnson (Spotify), Spark Summit. View count (in % watched, min); Confidence = f(view count). [Diagram: Preference Matrix (users U1..U5 by movies M1..M3) = User Matrix (#users x k) times Movie Matrix (k x #movies).]
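The slide leaves f unspecified. One common choice, from the implicit-feedback ALS formulation that MLlib's trainImplicit follows, is a binary preference plus a confidence that grows linearly with the observed count. The sketch below assumes that choice; it is not necessarily the f used in the talk.

```scala
object ImplicitFeedback {
  // Preference: did the user watch the program at all?
  def preference(viewCount: Double): Double =
    if (viewCount > 0) 1.0 else 0.0

  // Confidence in that preference: 1 + alpha * viewCount, with alpha a tunable weight.
  def confidence(viewCount: Double, alpha: Double): Double =
    1.0 + alpha * viewCount
}
```

With this form an unwatched program still contributes to the loss with confidence 1, while heavily watched programs dominate it.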
23 Math Detour: Cluster Normalized User Vectors. Spark KMeans can only cluster points in Euclidean space. Cannot cluster preference vectors directly: $\|P_1 - P_2\|^2 = \|U_1 M - U_2 M\|^2 = (U_1 - U_2) M M^T (U_1 - U_2)^T \neq \|U_1 - U_2\|^2$. [Diagram: SVD of the movie matrix, $M^T = X D Y^T$.]
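A small numeric check makes the mismatch concrete: the distance between two users' preference vectors P = U M generally differs from the distance between their factor vectors U. The toy factors below are hypothetical values chosen for illustration only.

```scala
object PreferenceDistance {
  type Vec = Vector[Double]

  // Row vector u (length k) times matrix m (k rows, one column per program).
  def times(u: Vec, m: Vector[Vec]): Vec =
    m.head.indices.map(j => u.indices.map(i => u(i) * m(i)(j)).sum).toVector

  // Squared Euclidean distance.
  def sqDist(a: Vec, b: Vec): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Hypothetical toy factors: k = 2 latent dimensions, 3 programs.
  val m: Vector[Vec] = Vector(Vector(2.0, 0.0, 0.0), Vector(0.0, 1.0, 0.0))
  val u1: Vec = Vector(1.0, 0.0)
  val u2: Vec = Vector(0.0, 1.0)

  val inFactorSpace: Double = sqDist(u1, u2)                         // 1 + 1 = 2
  val inPreferenceSpace: Double = sqDist(times(u1, m), times(u2, m)) // 4 + 1 = 5
}
```

Clustering the raw factor vectors would therefore group users by a different notion of closeness than the one implied by their predicted preferences.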
24 Batch: Cluster Users based on their Tastes. Group users by geographic area. Compute user taste vector from viewing history for each geographic area. Cluster users to find groups with similar tastes. Pipeline: usage history aggregation -> implicit matrix factorization -> SVD for dimensionality reduction -> KMeans clustering of users -> User -> Cluster mapping.
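The final clustering stage can be illustrated without Spark. The sketch below is a plain-Scala version of Lloyd's k-means iteration on a handful of points; it stands in for MLlib's KMeans only to show the mechanics, and the caller supplies the initial centers (MLlib picks its own).

```scala
object TinyKMeans {
  type Vec = Vector[Double]

  def sqDist(a: Vec, b: Vec): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the closest center.
  def assign(p: Vec, centers: Seq[Vec]): Int =
    centers.indices.minBy(i => sqDist(p, centers(i)))

  // Component-wise mean of a non-empty group of points.
  def mean(points: Seq[Vec]): Vec =
    points.transpose.map(col => col.sum / points.size).toVector

  // Lloyd's algorithm: alternate assignment and center update for numIter rounds.
  // A center that attracts no points keeps its previous position.
  def train(points: Seq[Vec], initial: Seq[Vec], numIter: Int): Seq[Vec] =
    (1 to numIter).foldLeft(initial) { (centers, _) =>
      val groups = points.groupBy(assign(_, centers))
      centers.indices.map(i => groups.get(i).map(mean).getOrElse(centers(i)))
    }
}
```

The User -> Cluster mapping is then `assign(userVector, centers)` for each user's taste vector.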
26 Batch Implementation
import org.apache.spark.mllib.recommendation.ALS
// for each geographic region
// convert user viewing history to ratings (hash user_id to int)
val user_history = sc.textFile("user_history.dat")
val ratings = user_history.flatMap(parse_ratings)
// build matrix factorization model (n = number of iterations)
val mf_model = ALS.trainImplicit(ratings, rank, n, lambda, alpha)
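28 Batch Implementation
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.{Matrices, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
// transform the movie feature matrix
val productRows = mf_model.productFeatures.map(s => Vectors.dense(s._2))
val productRowMatrix = new RowMatrix(productRows)
val productSVD = productRowMatrix.computeSVD(svdRank)
// userRowMatrix is not defined on the slide; presumably a RowMatrix built from mf_model.userFeatures
val userFeatures = userRowMatrix.multiply(productSVD.V).multiply(Matrices.diag(productSVD.s))
// use latent taste space to cluster users
val cluster_model = KMeans.train(userFeatures.rows, numClusters, numIter)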
29 Real-time Data Ingest: how data flows into the system. Back-end servers log user interactions with the STB (set-top box). Data is processed and formatted using Flume. Data is then passed on to: real-time processing using Storm; long-term storage in HDFS; external consumption via access-controlled Kafka/REST. We connect to Kafka (real-time) or access data directly from HDFS (batch).
30 Real-time: Compute TopK TV programs per cluster. What is each viewer watching? Aggregate popular programs across users for each cluster. Keep Top K (using Twitter Algebird TopK Monoid). Pipeline: update viewer state -> count programs being viewed -> map program counts to user clusters -> create TopK program list for each cluster -> Cluster -> Trending Programs.
33 Real-Time Implementation
import org.apache.spark.streaming.kafka.KafkaUtils
// format: event_time device_id program_id station_id dma_title tune_type
// get data from Kafka
val tuneEventsPerUser = KafkaUtils.createStream(ssc, zkQuorum, groupId, topics, storageLevel).flatMap(parseTuneEventByUser)
// what is being watched by each user
val userState = tuneEventsPerUser.updateStateByKey(updateUserHistory).cache()
// aggregate tunes per program per cluster
val tvTunes = userState.map { case (userId, tuneInfo) => ((tuneInfo.programId, user2cluster(userId)), 1) }.reduceByKey(_ + _)
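The slide does not show updateUserHistory or the per-cluster count in isolation. The plain-Scala sketch below shows one plausible shape for both, using ordinary collections instead of DStreams. The `TuneEvent` type and the "keep the latest event" rule are assumptions for illustration; only the `(Seq[V], Option[S]) => Option[S]` shape is what updateStateByKey expects.

```scala
object TuneCounts {
  // Hypothetical, reduced tune event: when it happened and which program it tuned to.
  case class TuneEvent(eventTime: Long, programId: Long)

  // Same shape as the function passed to updateStateByKey: the new events for one
  // user in this micro-batch plus the previous state give the new state.
  // Assumed rule: the most recent event is what the user is watching now.
  def updateUserHistory(events: Seq[TuneEvent], state: Option[TuneEvent]): Option[TuneEvent] =
    (state.toSeq ++ events).sortBy(_.eventTime).lastOption

  // Count viewers per (program, cluster), the collection analogue of
  // map { ... -> 1 } followed by reduceByKey(_ + _).
  def countPerProgramAndCluster(
      userState: Map[String, TuneEvent],
      user2cluster: String => Int): Map[(Long, Int), Int] =
    userState.toSeq
      .map { case (userId, tune) => (tune.programId, user2cluster(userId)) }
      .groupBy(identity)
      .map { case (key, hits) => key -> hits.size }
}
```

Returning `None` from the update function would drop a user's state, which is one way to expire viewers who have stopped watching.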
36 Compute Top K Programs per Cluster
import com.twitter.algebird.TopKMonoid
case class ProgramCount(programId: Long, count: Int) extends Ordered[ProgramCount] { def compare(that: ProgramCount): Int = {... } }
val topKMonoid = new TopKMonoid[ProgramCount](topK)
val tvTopK = tvTunes.map { case ((programId, clusterId), cnt) => (clusterId, topKMonoid.build(ProgramCount(programId, cnt))) }.reduceByKey(topKMonoid.plus)
// export top tunes to Redis for lookup by web service
tvTopK.foreachRDD(rdd => rdd.foreachPartition(p => saveToRedis(p)))
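Why a monoid fits here: merging two partial top-K lists and truncating to K again is associative, so reduceByKey can combine per-partition results in any order. The sketch below is a plain-Scala stand-in for that idea, not Algebird's implementation; the ordering (highest count first, ties broken by program id) is an assumption, since the slide elides the body of compare.

```scala
object TopKSketch {
  case class ProgramCount(programId: Long, count: Int)

  // Highest count first; ties broken by program id so the order is deterministic.
  implicit val byCountDesc: Ordering[ProgramCount] =
    Ordering.by((p: ProgramCount) => (-p.count, p.programId))

  // A top-K list holding a single element.
  def build(p: ProgramCount): List[ProgramCount] = List(p)

  // Merge two top-K lists and keep only the best k entries.
  def plus(k: Int)(a: List[ProgramCount], b: List[ProgramCount]): List[ProgramCount] =
    (a ++ b).sorted.take(k)
}
```

Each partial list never grows beyond K entries, so the amount of data shuffled per cluster stays bounded regardless of how many programs are on air.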
37 Results. Leverage existing Hadoop infrastructure and data. Compute 10 user clusters for 100k users in less than 10 minutes using 100 cores and 128 GB RAM. Consume STB events in real time directly from Kafka. Calculate Top K trending programs for each cluster in 20-second micro-batches, storing the results in Redis. Service requests for personalized trending shows = happy customer.
38 Trending for You Web Service Morning
39 Trending for You Web Service Evening
40 Final Words. Thanks to Spark we implemented the first version in 2-3 weeks. The example accelerated adoption of Spark in dev & research. Many further improvements are possible: do time-dependent clustering of user tastes; gather feedback from real users. We are hiring! Contact us at jobs.comcast.com
45 Math Detour: Cluster Normalized User Vectors. Spark KMeans can only cluster elements in Euclidean space. Problem: $\|P_1 - P_2\|^2 = \|U_1 M - U_2 M\|^2 = (U_1 - U_2) M M^T (U_1 - U_2)^T \neq \|U_1 - U_2\|^2$. M.computeSVD: $M^T = X D Y^T$ where $X^T X = I$, $Y^T Y = I$, $D$ diagonal. Then $(U_1 - U_2) M M^T (U_1 - U_2)^T = (U_1 - U_2) Y D D Y^T (U_1 - U_2)^T = \|\tilde{U}_1 - \tilde{U}_2\|^2$ with $\tilde{U} = U Y D$. KMeans.train($\tilde{U}$, numClusters, numIter)
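The slide skips the step that turns $M M^T$ into $Y D D Y^T$. Written out, using only the definitions on the slide ($M^T = X D Y^T$, $X^T X = I$, $D$ diagonal), the derivation reads:

```latex
\begin{align*}
M &= (X D Y^T)^T = Y D X^T \\
M M^T &= Y D X^T X D Y^T = Y D^2 Y^T \\
\|P_1 - P_2\|^2 &= (U_1 - U_2)\, M M^T (U_1 - U_2)^T \\
                &= (U_1 - U_2)\, Y D \,(Y D)^T (U_1 - U_2)^T \\
                &= \|(U_1 - U_2)\, Y D\|^2 = \|\tilde{U}_1 - \tilde{U}_2\|^2,
                \qquad \tilde{U} = U Y D
\end{align*}
```

So Euclidean k-means on the transformed rows $\tilde{U}$ measures exactly the distance between the users' preference vectors, which matches the multiplication by `V` and `diag(s)` in the batch implementation.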
46 Motivation behind Matrix Factorizations (Latent Space Models). [Figure: movies placed in a two-dimensional latent space, one axis running from serious to funny, the other from geared towards females to geared towards males. Movies shown: The Color Purple, Amadeus, Braveheart, Sense and Sensibility, Ocean's 11, Lethal Weapon, The Lion King, The Princess Diaries, Independence Day, Dumb and Dumber.] From J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets,
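47 The Effect of Regularization. $\min_{P,Q} \sum_{(x,i) \in \text{training}} (r_{xi} - q_i p_x)^2 + \lambda \left[ \sum_x \|p_x\|^2 + \sum_i \|q_i\|^2 \right]$, i.e. $\min_{\text{factors}}$ "error" $+\ \lambda$ "length". [Figure: the same movies plotted against Factor 1 (geared towards females to geared towards males) and Factor 2 (serious to funny): The Color Purple, Amadeus, Braveheart, Sense and Sensibility, Ocean's 11, Lethal Weapon, The Princess Diaries, The Lion King, Independence Day, Dumb and Dumber.]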
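The objective on this slide is a squared prediction error over the training ratings plus a penalty on the length of the factor vectors. A minimal plain-Scala sketch that evaluates it on toy data follows; the weight `lambda` and the map-based factor storage are assumptions made for illustration.

```scala
object RegularizedLoss {
  type Vec = Vector[Double]

  def dot(a: Vec, b: Vec): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // ratings: (user x, item i, rating r_xi); p: user factors; q: item factors.
  def loss(
      ratings: Seq[(Int, Int, Double)],
      p: Map[Int, Vec],
      q: Map[Int, Vec],
      lambda: Double): Double = {
    // "error": squared difference between observed and predicted ratings.
    val error = ratings.map { case (x, i, r) => math.pow(r - dot(q(i), p(x)), 2) }.sum
    // "length": squared norms of all user and item factor vectors.
    val length = p.values.map(v => dot(v, v)).sum + q.values.map(v => dot(v, v)).sum
    error + lambda * length
  }
}
```

Raising `lambda` pulls the factors of rarely rated items and users towards the origin, which is the shrinking effect the figure illustrates.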
More informationDistributed systems for stream processing
Distributed systems for stream processing Apache Kafka and Spark Structured Streaming Alena Hall Alena Hall Large-scale data processing Distributed Systems Functional Programming Data Science & Machine
More informationTopics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples
Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?
More informationLambda Architecture with Apache Spark
Lambda Architecture with Apache Spark Michael Hausenblas, Chief Data Engineer MapR First Galway Data Meetup, 2015-02-03 2015 MapR Technologies 2015 MapR Technologies 1 Polyglot Processing 2015 2014 MapR
More informationBig Data Infrastructure at Spotify
Big Data Infrastructure at Spotify Wouter de Bie Team Lead Data Infrastructure September 26, 2013 2 Who am I? According to ZDNet: "The work they have done to improve the Apache Hive data warehouse system
More informationDATA SCIENCE USING SPARK: AN INTRODUCTION
DATA SCIENCE USING SPARK: AN INTRODUCTION TOPICS COVERED Introduction to Spark Getting Started with Spark Programming in Spark Data Science with Spark What next? 2 DATA SCIENCE PROCESS Exploratory Data
More informationPay TV solution from ADB
Pay TV solution from ADB Complete solution for broadcast and broadband environment Integrated with personalised recommendations Consistent content discovery across multiple devices Entire functionality
More informationRecommender Systems. Nivio Ziviani. Junho de Departamento de Ciência da Computação da UFMG
Recommender Systems Nivio Ziviani Departamento de Ciência da Computação da UFMG Junho de 2012 1 Introduction Chapter 1 of Recommender Systems Handbook Ricci, Rokach, Shapira and Kantor (editors), 2011.
More informationNowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype?
Big data hype? Big Data: Hype or Hallelujah? Data Base and Data Mining Group of 2 Google Flu trends On the Internet February 2010 detected flu outbreak two weeks ahead of CDC data Nowcasting http://www.internetlivestats.com/
More informationMEDIAMPLIFY : A Cloud to Cable TV Platform for Music, TV, and Video Dr. Edwin A. Hernandez Chief Technology Officer EGLA COMMUNICATIONS
MEDIAMPLIFY : Amplify your reach A Cloud to Cable TV Platform for Music, TV, and Video Dr. Edwin A. Hernandez Chief Technology Officer Mediamplify is the one stop shop multi-platform media distribution
More informationBig Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018
Big Data com Hadoop Impala, Hive e Spark VIII Sessão - SQL Bahia 03/03/2018 Diógenes Pires Connect with PASS Sign up for a free membership today at: pass.org #sqlpass Internet Live http://www.internetlivestats.com/
More informationTI2736-B Big Data Processing. Claudia Hauff
TI2736-B Big Data Processing Claudia Hauff ti2736b-ewi@tudelft.nl Intro Streams Streams Map Reduce HDFS Pig Pig Design Patterns Hadoop Ctd. Graphs Giraph Spark Zoo Keeper Spark Learning objectives Implement
More informationELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 4. Prof. James She
ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 4 Prof. James She james.she@ust.hk 1 Selected Works of Activity 4 2 Selected Works of Activity 4 3 Last lecture 4 Mid-term
More informationNew Challenges in Big Data: Technical Perspectives. Hwanjo Yu POSTECH
New Challenges in Big Data: Technical Perspectives Hwanjo Yu POSTECH http:/hwanjoyu.org Over 1 Billion SNS users!! Viral Marketing Word-of-Mouth Effect > TV advertising......... Influence Maximization
More informationSystem For Product Recommendation In E-Commerce Applications
International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 11, Issue 05 (May 2015), PP.52-56 System For Product Recommendation In E-Commerce
More informationDiversity in Recommender Systems Week 2: The Problems. Toni Mikkola, Andy Valjakka, Heng Gui, Wilson Poon
Diversity in Recommender Systems Week 2: The Problems Toni Mikkola, Andy Valjakka, Heng Gui, Wilson Poon Review diversification happens by searching from further away balancing diversity and relevance
More informationRecommender Systems New Approaches with Netflix Dataset
Recommender Systems New Approaches with Netflix Dataset Robert Bell Yehuda Koren AT&T Labs ICDM 2007 Presented by Matt Rodriguez Outline Overview of Recommender System Approaches which are Content based
More informationThe OTT Co-Viewing Experience: 2017 November 2017
The OTT Co-Viewing Experience: 2017 November 2017 Sponsored by Objectives IAB Digital Video Center of Excellence has identified OTT/Connected TV as one of its research priorities in 2017. During the first
More informationAll it takes is One to experience it all.
All it takes is One to experience it all. Welcome to Suddenlink All it takes is One to start connecting to everything you love. We ve created this guide to help you get to know this all-in-one connected
More informationDocument Information
Horizon 2020 Framework Programme Grant Agreement: 732328 FashionBrain Document Information Deliverable number: D5.3 Deliverable title: Early Demo for Trend Prediction Deliverable description: This early
More informationMetadata, Chief technicolor
Metadata, the future of home entertainment Christophe Diot Christophe Diot Chief Scientist @ technicolor 2 2011-09-26 What is a metadata? Metadata taxonomy Usage metadata Consumption (number of views,
More informationJure Leskovec Including joint work with Y. Perez, R. Sosič, A. Banarjee, M. Raison, R. Puttagunta, P. Shah
Jure Leskovec (@jure) Including joint work with Y. Perez, R. Sosič, A. Banarjee, M. Raison, R. Puttagunta, P. Shah 2 My research group at Stanford: Mining and modeling large social and information networks