Realtime Recommendations
|
|
- Daisy Elliott
- 5 years ago
- Views:
Transcription
1 Realtime Recommendations with Redis Torben Brodt plista GmbH April 25th, 2013 NoSQL Search Roadshow
2 Introduction Torben Brodt, Head of Data Engineering computer science studies 5 years plista publication collaborative filtering evangelist for "power of algorithms plista GmbH recommendations & advertising founded in 2008, Berlin [DE] ~5k recommendations/ second
3
4 Contents 1. How to feed a recommender? 2. How to build a recommendation? 3. How to scale a recommender?
5 How to feed a recommender?
6 How to feed a recommender? to show recommendations we are integrated on the website we have URL + HTTP Headers user agent IP address -> geolocation
7 How to feed a recommender? push the data away quickly make use of data quickly RULE: be quick src
8 How to feed a recommender?
9 How to feed a recommender?
10 Technology overview Apache Lucene for Content MySQL for relational data Machine Learning Hadoop? No! It's batch + slow In Memory? Yes, stream computing Redis for Statistics Live Backup
11 How to build a recommendation?
12 How to build a recommendation? different recommender families Behavioral based on interaction between user and article Most Popular Collaborative Filtering Item to Item Content based on the articles Content Similarity Latest Item Classification
13 Most popular with welt.de/football/.html ZINCR "p:welt.de" ZREVRANGEBYSCORE p:welt.de summer_is_coming 420 plista_company Live Read + Live Write = Real Time Recommendations 135
14 p:welt.de Recap Data types 689 summer_is_coming 420 plista_company String, Lists, Set,.. Hash map between string fields and string values, very fast HINCR complexity O(1) Sorted Set ZINCR complexity: O(log(N)) where N is the number of elements in the sorted set. Allows to limit number of result: ZREVRANGEBYSCORE UNION + INTERSECT
15 Most popular with timeseries welt.de/football/.html ZINCR "p:welt.de: " ZUNION "p:welt.de: " "p:welt.de: " "p:welt.de: " ZREVRANGEBYSCORE p:welt.de: p:welt.de: p:welt.de: summer_is_coming 135 summer_is_coming plista_best_company 689 summer_is_coming plista_best_company plista_best_company 135
16 Most popular with timeseries welt.de/football/.html ZINCR "p:welt.de: " ZUNION... WEIGHTS "p:welt.de: ".. "p:welt.de: ".. "p:welt.de: ".. ZREVRANGEBYSCORE p:welt.de: p:welt.de: p:welt.de: summer_is_coming 135 summer_is_coming plista_best_company 689 summer_is_coming plista_best_company plista_best_company 135
17 Most popular with timeseries 4 2 : : : h -2h -3h -4h -5h 1-6h -7h -8h
18 Most popular to any context it's not only publisher, we use ~50 context attributes publisher = welt.de weekday = sunday summer_is_coming 420 dortmund_wins 200 plista_company context attributes: publisher weekday geolocation demographics... geolocation = dortmund dortmund_wins
19 Most popular to any context how it looks like in Redis ZUNION... WEIGHTS p:welt.de: p:welt.de: p:welt.de: w:sunday: w:sunday: w:sunday: publisher = welt.de weekday = sunday summer_is_coming 420 dortmund_wins 200 plista_company g:dortmund: g:dortmund: g:dortmund: geolocation = dortmund dortmund_wins
20 Most popular with Effect size which context has an influence? ZUNION... WEIGHTS p:welt.de: * 70% p:welt.de: * 70% p:welt.de: * 70% w:sunday: w:sunday: w:sunday: * 10% 2 * 10% 1 * 10% g:dortmund: * 30% g:dortmund: * 30% g:dortmund: * 30% Effect Size Examples: small effect: weather big effect: publisher Data with small effect should not been taken into account, otherwise we get avg results
21 SUM over.. timeseries different context previous hits of the user similar publisher knowledge Σ publisher = welt.de 689 summer_is_coming 420 plista_company 135 ZUNION... WEIGHTS p:welt.de: p:welt.de: p:welt.de: w:sunday: w:sunday: w:sunday: g:dortmund: g:dortmund: g:dortmund: redis can do it ;)
22 Even more Matrix Operations ;) Similarity Matrix Human Control Matrix Meta-learning Matrix cooperation with aided from Σ
23 More recommenders possible this was only about most popular other algorithms using redis incremental collaborative filtering article to article paths (~graph).. using external data sources
24 How to scale a recommender?
25 How to scale a recommender? Distribution to many servers 1 client to access n servers partitioning of data using hashing
26 How to scale a recommender? Distribution to many servers 1 client to access n servers partitioning of data using hashing for UNION we run into problems combined keys need to be on same server NO consistent hashing possible workaround: prefix hashing
27 How to scale a recommender? src Low Latency master/slave replication should be close to edge servers e.g. 1 redis instance per 1 webserver
28 How to scale a recommender? Application in Database LUA Support is shipped but single core process a long read blocks all writes concurrency issue src
29 How to scale a recommender? in spite of all those disadvantages Redis fits perfect for simple operations SUM + AGGREGATE + MIN + MAX In-Memory operations are pretty fast real-time features feel better in a real-time database (e.g. time series) we don't need batch
30 What else in Redis? message bus many recommenders live statistics caching "One technology to rule them all"
31 Questions? xing.com/profile/torben_brodt
SEARCHING BILLIONS OF PRODUCT LOGS IN REAL TIME. Ryan Tabora - Think Big Analytics NoSQL Search Roadshow - June 6, 2013
SEARCHING BILLIONS OF PRODUCT LOGS IN REAL TIME Ryan Tabora - Think Big Analytics NoSQL Search Roadshow - June 6, 2013 1 WHO AM I? Ryan Tabora Think Big Analytics - Senior Data Engineer Lover of dachshunds,
More informationCIB Session 12th NoSQL Databases Structures
CIB Session 12th NoSQL Databases Structures By: Shahab Safaee & Morteza Zahedi Software Engineering PhD Email: safaee.shx@gmail.com, morteza.zahedi.a@gmail.com cibtrc.ir cibtrc cibtrc 2 Agenda What is
More informationRethinkDB. Niharika Vithala, Deepan Sekar, Aidan Pace, and Chang Xu
RethinkDB Niharika Vithala, Deepan Sekar, Aidan Pace, and Chang Xu Content Introduction System Features Data Model ReQL Applications Introduction Niharika Vithala What is a NoSQL Database Databases that
More informationDistributed computing: index building and use
Distributed computing: index building and use Distributed computing Goals Distributing computation across several machines to Do one computation faster - latency Do more computations in given time - throughput
More informationRedis to the Rescue? O Reilly MySQL Conference
Redis to the Rescue? O Reilly MySQL Conference 2011-04-13 Who? Tim Lossen / @tlossen Berlin, Germany backend developer at wooga Redis Intro Case 1: Monster World Case 2: Happy Hospital Discussion Redis
More informationApache Lucene 4. Robert Muir
Apache Lucene 4 Robert Muir Agenda Overview of Lucene Conclusion Resources Q & A Download of Lucene: core/ analysis/ queryparser/ highlighter/ suggest/ expressions/ join/ memory/ codecs/... core/ Lucene
More informationManaging IoT and Time Series Data with Amazon ElastiCache for Redis
Managing IoT and Time Series Data with ElastiCache for Redis Darin Briskman, ElastiCache Developer Outreach Michael Labib, Specialist Solutions Architect 2016, Web Services, Inc. or its Affiliates. All
More informationTechnical Deep Dive: Cassandra + Solr. Copyright 2012, Think Big Analy7cs, All Rights Reserved
Technical Deep Dive: Cassandra + Solr Confiden7al Business case 2 Super scalable realtime analytics Hadoop is fantastic at performing batch analytics Cassandra is an advanced column family oriented system
More informationMultimedia Streaming. Mike Zink
Multimedia Streaming Mike Zink Technical Challenges Servers (and proxy caches) storage continuous media streams, e.g.: 4000 movies * 90 minutes * 10 Mbps (DVD) = 27.0 TB 15 Mbps = 40.5 TB 36 Mbps (BluRay)=
More informationICALEPS 2013 Exploring No-SQL Alternatives for ALMA Monitoring System ADC
ICALEPS 2013 Exploring No-SQL Alternatives for ALMA Monitoring System Overview The current paradigm (CCL and Relational DataBase) Propose of a new monitor data system using NoSQL Monitoring Storage Requirements
More informationData Acquisition. The reference Big Data stack
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Data Acquisition Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini The reference
More informationMapReduce. Stony Brook University CSE545, Fall 2016
MapReduce Stony Brook University CSE545, Fall 2016 Classical Data Mining CPU Memory Disk Classical Data Mining CPU Memory (64 GB) Disk Classical Data Mining CPU Memory (64 GB) Disk Classical Data Mining
More informationData Partitioning and MapReduce
Data Partitioning and MapReduce Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Intelligent Decision Support Systems Master studies,
More informationHadoop. copyright 2011 Trainologic LTD
Hadoop Hadoop is a framework for processing large amounts of data in a distributed manner. It can scale up to thousands of machines. It provides high-availability. Provides map-reduce functionality. Hides
More informationIntroduction to BigData, Hadoop:-
Introduction to BigData, Hadoop:- Big Data Introduction: Hadoop Introduction What is Hadoop? Why Hadoop? Hadoop History. Different types of Components in Hadoop? HDFS, MapReduce, PIG, Hive, SQOOP, HBASE,
More informationOleksandr Kuzomin, Bohdan Tkachenko
International Journal "Information Technologies Knowledge" Volume 9, Number 2, 2015 131 INTELLECTUAL SEARCH ENGINE OF ADEQUATE INFORMATION IN INTERNET FOR CREATING DATABASES AND KNOWLEDGE BASES Oleksandr
More informationHuge market -- essentially all high performance databases work this way
11/5/2017 Lecture 16 -- Parallel & Distributed Databases Parallel/distributed databases: goal provide exactly the same API (SQL) and abstractions (relational tables), but partition data across a bunch
More informationPerformance Enhancement of Data Processing using Multiple Intelligent Cache in Hadoop
Performance Enhancement of Data Processing using Multiple Intelligent Cache in Hadoop K. Senthilkumar PG Scholar Department of Computer Science and Engineering SRM University, Chennai, Tamilnadu, India
More informationand data combined) is equal to 7% of the number of instructions. Miss Rate with Second- Level Cache, Direct- Mapped Speed
5.3 By convention, a cache is named according to the amount of data it contains (i.e., a 4 KiB cache can hold 4 KiB of data); however, caches also require SRAM to store metadata such as tags and valid
More informationThe Road to a Complete Tweet Index
The Road to a Complete Tweet Index Yi Zhuang Staff Software Engineer @ Twitter Outline 1. Current Scale of Twitter Search 2. The History of Twitter Search Infra 3. Complete Tweet Index 4. Search Engine
More informationTHE FLEXIBLE DATA-STRUCTURE SERVER THAT COULD.
REDIS THE FLEXIBLE DATA-STRUCTURE SERVER THAT COULD. @_chriswhitten_ REDIS REDIS April 10, 2009; 6 years old Founding Author: Salvatore Sanfilippo Stable release: 3.0.3 / June 4, 2015; 3 months ago Fundamental
More informationAccelerator Design for Big Data Processing Frameworks
Accelerator Design for Big Data Processing Frameworks Hiroki Matsutani Dept. of ICS, Keio University http://www.arc.ics.keio.ac.jp/~matutani July 5th, 2017 International Forum on MPSoC for Software-Defined
More informationUsing space-filling curves for multidimensional
Using space-filling curves for multidimensional indexing Dr. Bisztray Dénes Senior Research Engineer 1 Nokia Solutions and Networks 2014 In medias res Performance problems with RDBMS Switch to NoSQL store
More informationRealtime visitor analysis with Couchbase and Elasticsearch
Realtime visitor analysis with Couchbase and Elasticsearch Jeroen Reijn @jreijn #nosql13 About me Jeroen Reijn Software engineer Hippo @jreijn http://blog.jeroenreijn.com About Hippo Visitor Analysis OneHippo
More informationBig Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara
Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case
More informationTALK 1: CONVINCE YOUR BOSS: CHOOSE THE "RIGHT" DATABASE. Prof. Dr. Stefan Edlich Beuth University of Technology Berlin (App.Sc.)
TALK 1: CONVINCE YOUR BOSS: CHOOSE THE "RIGHT" DATABASE Prof. Dr. Stefan Edlich Beuth University of Technology Berlin (App.Sc.) nosqlfrankfurt.de nosql powerdays 2 years of NoSQL Consulting http://nosql-database.org
More informationREDIS: NOSQL DATA STORAGE
REDIS: NOSQL DATA STORAGE 1 Presented By- Shalini Somani (13MCEC22) CATEGORIES OF NOSQL STORAGES Key-Value memcached Redis Column Family Cassandra HBase Document MongoDB Tabular BigTable Graph, XML, Object,
More informationImplementation of Relational Operations: Other Operations
Implementation of Relational Operations: Other Operations Module 4, Lecture 2 Database Management Systems, R. Ramakrishnan 1 Simple Selections SELECT * FROM Reserves R WHERE R.rname < C% Of the form σ
More informationSpyglass: Fast, Scalable Metadata Search for Large-Scale Storage Systems
Spyglass: Fast, Scalable Metadata Search for Large-Scale Storage Systems Andrew W Leung Ethan L Miller University of California, Santa Cruz Minglong Shao Timothy Bisson Shankar Pasupathy NetApp 7th USENIX
More informationBIG DATA COURSE CONTENT
BIG DATA COURSE CONTENT [I] Get Started with Big Data Microsoft Professional Orientation: Big Data Duration: 12 hrs Course Content: Introduction Course Introduction Data Fundamentals Introduction to Data
More informationGoal of the presentation is to give an introduction of NoSQL databases, why they are there.
1 Goal of the presentation is to give an introduction of NoSQL databases, why they are there. We want to present "Why?" first to explain the need of something like "NoSQL" and then in "What?" we go in
More informationVoldemort. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation
Voldemort Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/29 Outline 1 2 3 Smruti R. Sarangi Leader Election 2/29 Data
More informationFrom the event loop to the distributed system. Martyn 3rd November, 2011
From the event loop to the distributed system Martyn Loughran martyn@pusher.com @mloughran 3rd November, 2011 From the event loop to the distributed system From the event loop to the distributed system
More informationThe News Recommendation Evaluation Lab (NewsREEL) Online evaluation of recommender systems
The News Recommendation Evaluation Lab (NewsREEL) Online evaluation of recommender systems Andreas Lommatzsch TU Berlin, TEL-14, Ernst-Reuter-Platz 7, 10587 Berlin @Recommender Meetup Amsterdam (September
More informationParametric Search using In-memory Auxiliary Index
Parametric Search using In-memory Auxiliary Index Nishant Verman and Jaideep Ravela Stanford University, Stanford, CA {nishant, ravela}@stanford.edu Abstract In this paper we analyze the performance of
More informationTHE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES
1 THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES Vincent Garonne, Mario Lassnig, Martin Barisits, Thomas Beermann, Ralph Vigne, Cedric Serfon Vincent.Garonne@cern.ch ph-adp-ddm-lab@cern.ch XLDB
More informationGoals. Facebook s Scaling Problem. Scaling Strategy. Facebook Three Layer Architecture. Workload. Memcache as a Service.
Goals Memcache as a Service Tom Anderson Rapid application development - Speed of adding new features is paramount Scale Billions of users Every user on FB all the time Performance Low latency for every
More informationEvaluation of Relational Operations: Other Techniques
Evaluation of Relational Operations: Other Techniques Chapter 14, Part B Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke 1 Using an Index for Selections Cost depends on #qualifying
More informationLecture 21 11/27/2017 Next Lecture: Quiz review & project meetings Streaming & Apache Kafka
Lecture 21 11/27/2017 Next Lecture: Quiz review & project meetings Streaming & Apache Kafka What problem does Kafka solve? Provides a way to deliver updates about changes in state from one service to another
More informationEvaluation of Relational Operations: Other Techniques
Evaluation of Relational Operations: Other Techniques [R&G] Chapter 14, Part B CS4320 1 Using an Index for Selections Cost depends on #qualifying tuples, and clustering. Cost of finding qualifying data
More information/ Cloud Computing. Recitation 6 October 2 nd, 2018
15-319 / 15-619 Cloud Computing Recitation 6 October 2 nd, 2018 1 Overview Announcements for administrative issues Last week s reflection OLI unit 3 module 7, 8 and 9 Quiz 4 Project 2.3 This week s schedule
More informationCapabilities of Cloudant NoSQL Database IBM Corporation
Capabilities of Cloudant NoSQL Database After you complete this section, you should understand: The features of the Cloudant NoSQL Database: HTTP RESTfulAPI Secondary indexes and MapReduce Cloudant Query
More information8/24/2017 Week 1-B Instructor: Sangmi Lee Pallickara
Week 1-B-0 Week 1-B-1 CS535 BIG DATA FAQs Slides are available on the course web Wait list Term project topics PART 0. INTRODUCTION 2. DATA PROCESSING PARADIGMS FOR BIG DATA Sangmi Lee Pallickara Computer
More informationLarge-Scale Web Applications
Large-Scale Web Applications Mendel Rosenblum Web Application Architecture Web Browser Web Server / Application server Storage System HTTP Internet CS142 Lecture Notes - Intro LAN 2 Large-Scale: Scale-Out
More informationIntroduction to MapReduce (cont.)
Introduction to MapReduce (cont.) Rafael Ferreira da Silva rafsilva@isi.edu http://rafaelsilva.com USC INF 553 Foundations and Applications of Data Mining (Fall 2018) 2 MapReduce: Summary USC INF 553 Foundations
More informationApache Hadoop Goes Realtime at Facebook. Himanshu Sharma
Apache Hadoop Goes Realtime at Facebook Guide - Dr. Sunny S. Chung Presented By- Anand K Singh Himanshu Sharma Index Problem with Current Stack Apache Hadoop and Hbase Zookeeper Applications of HBase at
More informationDeveloping Microsoft Azure Solutions (70-532) Syllabus
Developing Microsoft Azure Solutions (70-532) Syllabus Cloud Computing Introduction What is Cloud Computing Cloud Characteristics Cloud Computing Service Models Deployment Models in Cloud Computing Advantages
More informationChallenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Data Centric Systems and Networking Emergence of Big Data Shift of Communication Paradigm From end-to-end to data
More informationVirtual views. Incremental View Maintenance. View maintenance. Materialized views. Review of bag algebra. Bag algebra operators (slide 1)
Virtual views Incremental View Maintenance CPS 296.1 Topics in Database Systems A view is defined by a query over base tables Example: CREATE VIEW V AS SELECT FROM R, S WHERE ; A view can be queried just
More informationApache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context
1 Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes
More informationAnnouncements. Reading Material. Map Reduce. The Map-Reduce Framework 10/3/17. Big Data. CompSci 516: Database Systems
Announcements CompSci 516 Database Systems Lecture 12 - and Spark Practice midterm posted on sakai First prepare and then attempt! Midterm next Wednesday 10/11 in class Closed book/notes, no electronic
More informationThe NoSQL Ecosystem. Adam Marcus MIT CSAIL
The NoSQL Ecosystem Adam Marcus MIT CSAIL marcua@csail.mit.edu / @marcua About Me Social Computing + Database Systems Easily Distracted: Wrote The NoSQL Ecosystem in The Architecture of Open Source Applications
More informationThe Stream Processor as a Database. Ufuk
The Stream Processor as a Database Ufuk Celebi @iamuce Realtime Counts and Aggregates The (Classic) Use Case 2 (Real-)Time Series Statistics Stream of Events Real-time Statistics 3 The Architecture collect
More informationFlash Storage Complementing a Data Lake for Real-Time Insight
Flash Storage Complementing a Data Lake for Real-Time Insight Dr. Sanhita Sarkar Global Director, Analytics Software Development August 7, 2018 Agenda 1 2 3 4 5 Delivering insight along the entire spectrum
More informationEvaluation of Relational Operations: Other Techniques
Evaluation of Relational Operations: Other Techniques Chapter 12, Part B Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke 1 Using an Index for Selections v Cost depends on #qualifying
More informationData Acquisition. The reference Big Data stack
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Data Acquisition Corso di Sistemi e Architetture per Big Data A.A. 2017/18 Valeria Cardellini The reference
More informationPNDA.io: when BGP meets Big-Data
PNDA.io: when BGP meets Big-Data Let s go back in time 26 th April 2017 The Internet is very much alive Millions of BGP events occurring every day 15 Routers Monitored 410 active peers (both IPv4 and IPv6)
More informationChapter 12: Query Processing. Chapter 12: Query Processing
Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join
More informationDevelopment and Evaluation of a Highly Scalable News Recommender System
Development and Evaluation of a Highly Scalable News Recommender System Ilya Verbitskiy 1, Patrick Probst 2, and Andreas Lommatzsch 3 1 Complex and Distributed IT Systems, CIT Technische Universität Berlin,
More informationDistributed Computation Models
Distributed Computation Models SWE 622, Spring 2017 Distributed Software Engineering Some slides ack: Jeff Dean HW4 Recap https://b.socrative.com/ Class: SWE622 2 Review Replicating state machines Case
More informationAccelerate MySQL for Demanding OLAP and OLTP Use Cases with Apache Ignite. Peter Zaitsev, Denis Magda Santa Clara, California April 25th, 2017
Accelerate MySQL for Demanding OLAP and OLTP Use Cases with Apache Ignite Peter Zaitsev, Denis Magda Santa Clara, California April 25th, 2017 About the Presentation Problems Existing Solutions Denis Magda
More informationGridGain and Apache Ignite In-Memory Performance with Durability of Disk
GridGain and Apache Ignite In-Memory Performance with Durability of Disk Dmitriy Setrakyan Apache Ignite PMC GridGain Founder & CPO http://ignite.apache.org #apacheignite Agenda What is GridGain and Ignite
More informationFiltering 100M objects What can go wrong?
Filtering 100M objects What can go wrong? Alexey Ragozin alexey.ragozin@gmail.com Dec 2013 Problem description 100M object (50M was tested) ~ 100 fields per object ~ 1kb per object (ProtoBuf binary format)
More informationAdvanced Database Systems
Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed
More informationOracle Big Data. A NA LYT ICS A ND MA NAG E MENT.
Oracle Big Data. A NALYTICS A ND MANAG E MENT. Oracle Big Data: Redundância. Compatível com ecossistema Hadoop, HIVE, HBASE, SPARK. Integração com Cloudera Manager. Possibilidade de Utilização da Linguagem
More informationIndexing Web pages. Web Search: Indexing Web Pages. Indexing the link structure. checkpoint URL s. Connectivity Server: Node table
Indexing Web pages Web Search: Indexing Web Pages CPS 296.1 Topics in Database Systems Indexing the link structure AltaVista Connectivity Server case study Bharat et al., The Fast Access to Linkage Information
More informationData pipelines with PostgreSQL & Kafka
Data pipelines with PostgreSQL & Kafka Oskari Saarenmaa PostgresConf US 2018 - Jersey City Agenda 1. Introduction 2. Data pipelines, old and new 3. Apache Kafka 4. Sample data pipeline with Kafka & PostgreSQL
More informationDevelopment of a News Recommender System based on Apache Flink
Development of a News Recommender System based on Apache Flink Alexandru Ciobanu 1 and Andreas Lommatzsch 2 1 Technische Universität Berlin Straße des 17. Juni, D-10625 Berlin, Germany alexandru.ciobanu@campus.tu-berlin.de
More informationDesktop Crawls. Document Feeds. Document Feeds. Information Retrieval
Information Retrieval INFO 4300 / CS 4300! Web crawlers Retrieving web pages Crawling the web» Desktop crawlers» Document feeds File conversion Storing the documents Removing noise Desktop Crawls! Used
More informationDeveloping Microsoft Azure Solutions (70-532) Syllabus
Developing Microsoft Azure Solutions (70-532) Syllabus Cloud Computing Introduction What is Cloud Computing Cloud Characteristics Cloud Computing Service Models Deployment Models in Cloud Computing Advantages
More informationIntroduction Storage Processing Monitoring Review. Scaling at Showyou. Operations. September 26, 2011
Scaling at Showyou Operations September 26, 2011 I m Kyle Kingsbury Handle aphyr Code http://github.com/aphyr Email kyle@remixation.com Focus Backend, API, ops What the hell is Showyou? Nontrivial complexity
More informationDatabases and Big Data Today. CS634 Class 22
Databases and Big Data Today CS634 Class 22 Current types of Databases SQL using relational tables: still very important! NoSQL, i.e., not using relational tables: term NoSQL popular since about 2007.
More informationCISC 7610 Lecture 2b The beginnings of NoSQL
CISC 7610 Lecture 2b The beginnings of NoSQL Topics: Big Data Google s infrastructure Hadoop: open google infrastructure Scaling through sharding CAP theorem Amazon s Dynamo 5 V s of big data Everyone
More informationA Technical Discussion about Search Solutions
A Technical Discussion about Search Solutions By Alejandro Perez, Software Architect, Flaptor, Inc. In this paper we analyze several important aspects involved in defining search requirements, and discuss
More informationArchitectural challenges for building a low latency, scalable multi-tenant data warehouse
Architectural challenges for building a low latency, scalable multi-tenant data warehouse Mataprasad Agrawal Solutions Architect, Services CTO 2017 Persistent Systems Ltd. All rights reserved. Our analytics
More informationCertified Big Data Hadoop and Spark Scala Course Curriculum
Certified Big Data Hadoop and Spark Scala Course Curriculum The Certified Big Data Hadoop and Spark Scala course by DataFlair is a perfect blend of indepth theoretical knowledge and strong practical skills
More informationBIS Database Management Systems.
BIS 512 - Database Management Systems http://www.mis.boun.edu.tr/durahim/ Ahmet Onur Durahim Learning Objectives Database systems concepts Designing and implementing a database application Life of a Query
More informationMIS Database Systems.
MIS 335 - Database Systems http://www.mis.boun.edu.tr/durahim/ Ahmet Onur Durahim Learning Objectives Database systems concepts Designing and implementing a database application Life of a Query in a Database
More informationBig Data on AWS. Peter-Mark Verwoerd Solutions Architect
Big Data on AWS Peter-Mark Verwoerd Solutions Architect What to get out of this talk Non-technical: Big Data processing stages: ingest, store, process, visualize Hot vs. Cold data Low latency processing
More informationVirtual IMS user group: Newsletter 57
: Newsletter 57 Welcome to the newsletter. The at www.fundi.com/virtualims is an independently-operated vendor-neutral site run by and for the IMS user community. presentation The latest webinar from the
More informationBoolean Retrieval. Manning, Raghavan and Schütze, Chapter 1. Daniël de Kok
Boolean Retrieval Manning, Raghavan and Schütze, Chapter 1 Daniël de Kok Boolean query model Pose a query as a boolean query: Terms Operations: AND, OR, NOT Example: Brutus AND Caesar AND NOT Calpuria
More informationCompSci 516: Database Systems
CompSci 516 Database Systems Lecture 12 Map-Reduce and Spark Instructor: Sudeepa Roy Duke CS, Fall 2017 CompSci 516: Database Systems 1 Announcements Practice midterm posted on sakai First prepare and
More informationQuestion 1. Question 2
CS144 Practice Problems For Final Fall 2010 Note: These problems cover a subset of the material that we expect you to be familiar with for the final. In particular, these problems primarily cover material
More informationVroom: Accelerating the Mobile Web with Server-Aided Dependency Resolution
Vroom: Accelerating the Mobile Web with Server-Aided Dependency Resolution Vaspol Ruamviboonsuk1, Ravi Netravali2, Muhammed Uluyol1, Harsha V. Madhyastha1 1 University of Michigan, 2MIT 1 Mobile Web Dominant...
More informationA Fast and High Throughput SQL Query System for Big Data
A Fast and High Throughput SQL Query System for Big Data Feng Zhu, Jie Liu, and Lijie Xu Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing, China 100190
More informationUS Patent 6,658,423. William Pugh
US Patent 6,658,423 William Pugh Detecting duplicate and near - duplicate files Worked on this problem at Google in summer of 2000 I have no information whether this is currently being used I know that
More informationCS 347 Parallel and Distributed Data Processing
CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 12: Distributed Information Retrieval CS 347 Notes 12 2 CS 347 Notes 12 3 CS 347 Notes 12 4 CS 347 Notes 12 5 Web Search Engine Crawling
More informationCS 347 Parallel and Distributed Data Processing
CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 12: Distributed Information Retrieval CS 347 Notes 12 2 CS 347 Notes 12 3 CS 347 Notes 12 4 Web Search Engine Crawling Indexing Computing
More informationDATABASE SYSTEMS. Database programming in a web environment. Database System Course, 2016
DATABASE SYSTEMS Database programming in a web environment Database System Course, 2016 AGENDA FOR TODAY Advanced Mysql More than just SELECT Creating tables MySQL optimizations: Storage engines, indexing.
More informationTime Series Storage with Apache Kudu (incubating)
Time Series Storage with Apache Kudu (incubating) Dan Burkert (Committer) dan@cloudera.com @danburkert Tweet about this talk: @getkudu or #kudu 1 Time Series machine metrics event logs sensor telemetry
More informationCMSC 461 Final Exam Study Guide
CMSC 461 Final Exam Study Guide Study Guide Key Symbol Significance * High likelihood it will be on the final + Expected to have deep knowledge of can convey knowledge by working through an example problem
More informationCarnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Administrivia. Administrivia. Faloutsos/Pavlo CMU /615
Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications C. Faloutsos A. Pavlo Lecture#14(b): Implementation of Relational Operations Administrivia HW4 is due today. HW5 is out. Faloutsos/Pavlo
More informationMining Data Streams. From Data-Streams Management System Queries to Knowledge Discovery from continuous and fast-evolving Data Records.
DATA STREAMS MINING Mining Data Streams From Data-Streams Management System Queries to Knowledge Discovery from continuous and fast-evolving Data Records. Hammad Haleem Xavier Plantaz APPLICATIONS Sensors
More informationInforma)on Retrieval and Map- Reduce Implementa)ons. Mohammad Amir Sharif PhD Student Center for Advanced Computer Studies
Informa)on Retrieval and Map- Reduce Implementa)ons Mohammad Amir Sharif PhD Student Center for Advanced Computer Studies mas4108@louisiana.edu Map-Reduce: Why? Need to process 100TB datasets On 1 node:
More informationMapReduce for Data Warehouses 1/25
MapReduce for Data Warehouses 1/25 Data Warehouses: Hadoop and Relational Databases In an enterprise setting, a data warehouse serves as a vast repository of data, holding everything from sales transactions
More informationScaling Without Sharding. Baron Schwartz Percona Inc Surge 2010
Scaling Without Sharding Baron Schwartz Percona Inc Surge 2010 Web Scale!!!! http://www.xtranormal.com/watch/6995033/ A Sharding Thought Experiment 64 shards per proxy [1] 1 TB of data storage per node
More informationS Y B Voc Software Development Syllabus
S Y B Voc Software Development Syllabus Course Level Job Roles Course Name: Pattern: Examination Pattern: Eligibility: Medium of Instruction: NSQF Level-VI 1. Jr. Software Developer 2. Trainer Software
More informationDatabase System Concepts
Chapter 13: Query Processing s Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2008/2009 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth
More informationPresented by Nanditha Thinderu
Presented by Nanditha Thinderu Enterprise systems are highly distributed and heterogeneous which makes administration a complex task Application Performance Management tools developed to retrieve information
More informationWe are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info
We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info START DATE : TIMINGS : DURATION : TYPE OF BATCH : FEE : FACULTY NAME : LAB TIMINGS : PH NO: 9963799240, 040-40025423
More information