Combining Solr and Elasticsearch to Improve Autosuggestion on Mobile Local Search. Toan Vinh Luu, PhD, Senior Search Engineer, local.ch AG

1 Combining Solr and Elasticsearch to Improve Autosuggestion on Mobile Local Search Toan Vinh Luu, PhD Senior Search Engineer local.ch AG

2 In this talk
- Autosuggestion feature
- Autosuggestion architecture
- Evaluation

3 local.ch
- Local search engine in Switzerland (web, mobile)
- Each month: > 4 million unique users, > 8 million queries on mobile (iOS, Android, ...)
- Users search for: services (e.g. "restaurant zurich"), resident information (e.g. "peter meier"), phone numbers (e.g. ...), addresses, points of interest...

4 Why is autosuggestion important? A user taps the phone 8 times instead of 34 times to get to the result list when searching for "Electric installation Wallisellen".

5 What should we suggest to the user?

6 Popular data suggestion

7 Popular queries suggestion
- "mc donalds" has fewer entries than "muller" but is queried more than 10x as often
- "cablecom" has only 1 entry but receives > 2000 queries/month

8 Query history suggestion
- 9% of mobile queries are historical queries
- 38% of users search with a query they have used in the past

9 Spellchecker suggestion > mistakes per month on mobile (9%)

10 Detail entry suggestion

11 Special information suggestion

12 Autosuggestion Architecture (diagram)
- Autosuggest API/Search API
- SuggestData component, Spellchecker component, Popular query component, Query history component, each with an index
- Local.ch Database
- Popular query processor
- Query log index

13 How do we process popular queries? Popular is not just high frequency!
- User's language: 4 languages are used in Switzerland. We fail if we suggest "bäckerei" to a French-speaking user.
- Location: we fail if we suggest a hospital in Zurich to a user in Geneva.
- Misspellings: we fail if we suggest both "zürich" and "züruch".
- Unique users: we fail if we suggest "toan" just because I searched my name thousands of times.
- Blacklist: we fail if we suggest "f**k", "pe**is".

14 Popular query processor
Preprocessing the query log: text normalization, stopword removal, blacklist filtering, keeping only queries that return results.
A query log item in the Elasticsearch index:
    {
      "q": "restaurant",
      "language": "de",
      "lon": ...,
      "lat": ...,
      "datetime": "... :10:07",
      "user": "eeaad0c09abc41676c1c"
    }
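
The preprocessing steps listed on this slide could look roughly like the Python sketch below. It is only an illustration: the stopword list, the blacklist and the "num_results" field are hypothetical, since the talk does not show the actual lists or how result counts are stored.

```python
import re
import unicodedata

# Hypothetical word lists; the talk does not give the real ones.
STOPWORDS = {"in", "the", "der", "die", "das", "le", "la"}
BLACKLIST = {"badword"}


def normalize(query: str) -> str:
    """Lowercase, apply Unicode NFC normalization and collapse whitespace."""
    text = unicodedata.normalize("NFC", query).lower()
    return re.sub(r"\s+", " ", text).strip()


def preprocess(log_items):
    """Yield cleaned log items.

    Drops queries without results (hypothetical "num_results" field),
    removes stopwords, and drops queries containing a blacklisted token.
    """
    for item in log_items:
        if item.get("num_results", 0) <= 0:
            continue
        tokens = [t for t in normalize(item["q"]).split(" ")
                  if t and t not in STOPWORDS]
        if not tokens or any(t in BLACKLIST for t in tokens):
            continue
        yield {**item, "q": " ".join(tokens)}
```

For example, an item with the query "  Restaurant  in Zürich " and at least one result would be kept with "q" set to "restaurant zürich".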

15 Find candidate popular queries for each language
    {
      "query": {
        "query_string": {
          "query": "language:%s AND date:[%s TO %s] AND -q.untouched:/[0].*/" % (language, fromdate, todate)
        }
      },
      "aggs": {
        "q": {
          "terms": {
            "field": "q.untouched",
            "size": TOP_POPULAR
          }
        }
      }
    }
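
As a rough sketch, the request on this slide can be assembled in Python as a plain dict and the terms buckets read back from the response. The value of TOP_POPULAR and the added "size": 0 (to skip returning hits) are assumptions, not taken from the talk; the helper names are hypothetical.

```python
TOP_POPULAR = 1000  # hypothetical value; the talk does not state the real one


def candidate_query(language, fromdate, todate, size=TOP_POPULAR):
    """Build the search body: filter by language and date, aggregate on the raw query."""
    query = ("language:%s AND date:[%s TO %s] AND -q.untouched:/[0].*/"
             % (language, fromdate, todate))
    return {
        "size": 0,  # assumption: only the aggregation is needed, not the hits
        "query": {"query_string": {"query": query}},
        "aggs": {"q": {"terms": {"field": "q.untouched", "size": size}}},
    }


def top_queries(response):
    """Extract (query, frequency) pairs from a terms aggregation response."""
    buckets = response["aggregations"]["q"]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]
```

The body returned by candidate_query would be sent to the search endpoint of the query log index with whichever client is in use; top_queries then yields the candidate list for that language.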

16 Find number of unique users given a query
    {
      "query": {
        "query_string": {
          "query": "q.untouched:%s AND date:[%s TO %s]" % (query, fromdate, todate)
        }
      },
      "aggs": {
        "num_users": {
          "cardinality": {
            "field": "user"
          }
        }
      }
    }
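
A minimal Python sketch of the unique-user check follows. The phrase quoting around the query (so multi-word queries stay one term), the "size": 0 setting and the threshold of 100 users are assumptions; the threshold is borrowed from the Solr request shown later in the talk, and the helper names are hypothetical.

```python
def unique_users_query(query, fromdate, todate):
    """Build the search body that counts distinct users for one query."""
    # Assumption: quote the query so that multi-word queries match as one value.
    query_string = 'q.untouched:"%s" AND date:[%s TO %s]' % (query, fromdate, todate)
    return {
        "size": 0,  # assumption: only the aggregation is needed
        "query": {"query_string": {"query": query_string}},
        "aggs": {"num_users": {"cardinality": {"field": "user"}}},
    }


def has_enough_users(response, min_users=100):
    """Return True if the (approximate) distinct user count reaches the threshold."""
    return response["aggregations"]["num_users"]["value"] >= min_users
```

Note that the cardinality aggregation returns an approximate count, which is usually acceptable for a popularity threshold like this one.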

17 Bounding box to limit popular queries by location. Popular query: "chuv" (Centre Hospitalier Universitaire Vaudois)

18 Histogram of the query "chuv" by frequency, longitude and latitude

19 (Map figure; surviving coordinate labels: 46.52, ..., ..., 6.64)

20 Percentiles aggregation to find the min and max values of the querying location
    {
      "query": {
        "match": {
          "q": {
            "query": "chuv"
          }
        }
      },
      "aggs": {
        "lat_outlier": {
          "percentiles": {
            "field": "lat",
            "percents": [5, 95]
          }
        },
        "lon_outlier": {
          "percentiles": {
            "field": "lon",
            "percents": [5, 95]
          }
        }
      }
    }
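
The 5th and 95th percentiles returned by this aggregation can be turned into a bounding box along the following lines. This is a sketch that assumes the default keyed response layout of the percentiles aggregation (a "values" object keyed by percentile); the helper names are hypothetical.

```python
def percentile_bounds(agg):
    """Return (lowest-percentile value, highest-percentile value) of one aggregation."""
    # Sort by the numeric percentile so the result does not depend on key order.
    values = sorted((float(k), v) for k, v in agg["values"].items())
    return values[0][1], values[-1][1]


def bounding_box(response):
    """Build the min/max latitude and longitude fields from the two aggregations."""
    aggs = response["aggregations"]
    min_lat, max_lat = percentile_bounds(aggs["lat_outlier"])
    min_lon, max_lon = percentile_bounds(aggs["lon_outlier"])
    return {
        "min_lat": min_lat,
        "max_lat": max_lat,
        "min_lon": min_lon,
        "max_lon": max_lon,
    }
```

Cutting at the 5th and 95th percentiles discards the outermost 10% of query locations on each axis, so a few users querying from far away do not stretch the box.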

21 Popular query stored in Solr index
    {
      "q": "chuv",
      "lang": ["de", "fr", "en"],
      "users": 7435,
      "min_lat": ...,
      "max_lon": ...,
      "max_lat": ...,
      "min_lon": ...,
      "freq": 9524
    }

22 Solr request to suggest popular query
    q:ch*
    lang:en
    users:[100 TO *]
    min_lat:[* TO " + user_lat + "]
    min_lon:[* TO " + user_lon + "]
    max_lat:[" + user_lat + " TO *]
    max_lon:[" + user_lon + " TO *]
    &sort=freq desc
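
The request above might be assembled in Python as sketched below. Joining the clauses with an explicit AND is an assumption (the slide does not show the default operator), the helper names are hypothetical, and the prefix is assumed to contain no characters that need escaping in Solr query syntax.

```python
from urllib.parse import urlencode


def popular_query_params(prefix, lang, user_lat, user_lon, min_users=100):
    """Build Solr parameters for popular queries matching a typed prefix.

    A stored query matches when the user's position lies inside its bounding box.
    """
    clauses = [
        f"q:{prefix}*",
        f"lang:{lang}",
        f"users:[{min_users} TO *]",
        f"min_lat:[* TO {user_lat}]",
        f"min_lon:[* TO {user_lon}]",
        f"max_lat:[{user_lat} TO *]",
        f"max_lon:[{user_lon} TO *]",
    ]
    # Assumption: all clauses must hold, so they are joined with AND.
    return {"q": " AND ".join(clauses), "sort": "freq desc"}


def in_bounding_box(doc, user_lat, user_lon):
    """Same containment test as the four range clauses, applied to one document."""
    return (doc["min_lat"] <= user_lat <= doc["max_lat"]
            and doc["min_lon"] <= user_lon <= doc["max_lon"])
```

urlencode(popular_query_params("ch", "en", 46.52, 6.64)) gives a query string that can be appended to the select handler URL of the popular query core.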

23 Evaluation
Several metrics are used to evaluate the autosuggestion feature:
- Number of typed characters to get to the result list (average length of input: 10.0 chars; average length of clicked suggestion: 15.4 chars)
- Number of clicks on suggested items
- Average rank of clicked item
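
These metrics could be computed from a click log roughly as follows. The event layout ("typed", "clicked_suggestion", "rank") is hypothetical, and averaging the input length only over events that ended in a click is an assumption; the talk does not describe how its figures were derived.

```python
from statistics import mean


def suggestion_metrics(events):
    """Compute click count, average lengths and average rank of clicked suggestions."""
    clicked = [e for e in events if e.get("clicked_suggestion")]
    if not clicked:
        return {"clicks": 0, "avg_input_len": None,
                "avg_clicked_len": None, "avg_rank": None}
    return {
        "clicks": len(clicked),
        "avg_input_len": mean(len(e["typed"]) for e in clicked),
        "avg_clicked_len": mean(len(e["clicked_suggestion"]) for e in clicked),
        "avg_rank": mean(e["rank"] for e in clicked),
    }
```

With the averages reported on the slide, a clicked suggestion is 15.4 - 10.0 = 5.4 characters longer than what the user had typed.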

24 Number of clicks on suggested items since the new feature release (chart; release date marked)

25 Average rank of clicked item (chart; axis labels 0 and 2.5; release of the new query suggestion marked)

26 Conclusion
We can combine 2 search frameworks to bring a better search experience to the user:
- Solr is efficient for querying, faceting and caching
- Elasticsearch is efficient for big data aggregation and query log storage

27 Contact information: Search team at local.ch. We are hiring a search engineer! Contact: yannick.suter@localsearch.ch


The Billion Object Platform (BOP): a system to lower barriers to support big, streaming, spatio-temporal data sources FOSS4G 2017 Boston The Billion Object Platform (BOP): a system to lower barriers to support big, streaming, spatio-temporal data sources Devika Kakkar and Ben Lewis Harvard Center for Geographic Analysis

More information

The Importance of Performance & Security and Simple Steps to Achieve Them CHRIS WELLS CEO NEXCESS.NET LLC

The Importance of Performance & Security and Simple Steps to Achieve Them CHRIS WELLS CEO NEXCESS.NET LLC The Importance of Performance & Security and Simple Steps to Achieve Them CHRIS WELLS CEO NEXCESS.NET LLC Detroit, MI USA NORTH? NORTH-CENTRAL? MIDDLE? NORTH-EAST WEST MID-WEST??? SOUTH Quick Facts About

More information

MySQL Worst Practices. Introduction. by Jonathan Baldie

MySQL Worst Practices. Introduction. by Jonathan Baldie MySQL Worst Practices by Jonathan Baldie Introduction MySQL and MariaDB are two of the most popular database engines in the world. They re rightly chosen for their speed potential, portability, and the

More information

Learn Relational Database from Scratch. Dan Li, Ph.D. Associate Professor Computer Science Eastern Washington University

Learn Relational Database from Scratch. Dan Li, Ph.D. Associate Professor Computer Science Eastern Washington University Learn Relational Database from Scratch Dan Li, Ph.D. Associate Professor Computer Science Eastern Washington University Self-Introduction Associate professor of Computer Science at EWU Area of expertise

More information

Alexey Grigorev Team ololobhi (Abhishek & ololo)

Alexey Grigorev Team ololobhi (Abhishek & ololo) Alexey Grigorev Team ololobhi (Abhishek & ololo) Data set ~3 mln train pairs, ~1 mln test pairs ~10.8 mln images (~45 gb) Target Evaluation metric: AUC Category_ID Title Pictures Price No seller data locationid

More information

Search Engine Architecture II

Search Engine Architecture II Search Engine Architecture II Primary Goals of Search Engines Effectiveness (quality): to retrieve the most relevant set of documents for a query Process text and store text statistics to improve relevance

More information

Search Engines and Time Series Databases

Search Engines and Time Series Databases Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Search Engines and Time Series Databases Corso di Sistemi e Architetture per Big Data A.A. 2017/18

More information

Package opencage. January 16, 2018

Package opencage. January 16, 2018 Package opencage January 16, 2018 Type Package Title Interface to the OpenCage API Version 0.1.4 Tool for accessing the OpenCage API, which provides forward geocoding (from placename to longitude and latitude)

More information

Building the News Search Engine

Building the News Search Engine Building the News Search Engine Ramkumar Aiyengar Team Leader, R&D News Search, Bloomberg L.P. andyetitmoves@apache.org A technology company Our strength and focus is data The Terminal, vertical portals

More information

I/O Efficient Algorithms for Exact Distance Queries on Disk- Resident Dynamic Graphs

I/O Efficient Algorithms for Exact Distance Queries on Disk- Resident Dynamic Graphs I/O Efficient Algorithms for Exact Distance Queries on Disk- Resident Dynamic Graphs Yishi Lin, Xiaowei Chen, John C.S. Lui The Chinese University of Hong Kong 9/4/15 EXACT DISTANCE QUERIES ON DYNAMIC

More information

Database Group Research Overview. Immanuel Trummer

Database Group Research Overview. Immanuel Trummer Database Group Research Overview Immanuel Trummer Talk Overview User Query Data Analysis Result Processing Talk Overview Fact Checking Query User Data Vocalization Data Analysis Result Processing Query

More information

Introduction to Databases

Introduction to Databases Introduction to Databases Abou Bakar Kaleem 1 Overview - Database - Relational Databases - Introduction to SQL Introduction to Databases 2 1 Database (1) Database : - is a collection of related data -

More information

Integrating Advanced Analytics with Big Data

Integrating Advanced Analytics with Big Data Integrating Advanced Analytics with Big Data Ian McKenna, Ph.D. Senior Financial Engineer 2017 The MathWorks, Inc. 1 The Goal SCALE! 2 The Solution tall 3 Agenda Introduction to tall data Case Study: Predicting

More information

OKKAM-based instance level integration

OKKAM-based instance level integration OKKAM-based instance level integration Paolo Bouquet W3C RDF2RDB This work is co-funded by the European Commission in the context of the Large-scale Integrated project OKKAM (GA 215032) RoadMap Using the

More information

Automated Fixing of Programs with Contracts

Automated Fixing of Programs with Contracts Automated Fixing of Programs with Contracts Yi Wei, Yu Pei, Carlo A. Furia, Lucas S. Silva, Stefan Buchholz, Bertrand Meyer and Andreas Zeller Chair of Software Engineering, ETH Zürich Software Engineering

More information

Place Recommendation Using Location-Based Services and Real-time Social Network Data

Place Recommendation Using Location-Based Services and Real-time Social Network Data Place Recommendation Using Location-Based Services and Real-time Social Network Data Kanda Runapongsa Saikaew, Patcharaporn Jiranuwattanawong, Patinya Taearak Abstract Currently, there is excessively growing

More information