Technical Deep Dive: Cassandra + Solr. Copyright 2012, Think Big Analytics, All Rights Reserved
- Wilfred Reed
- 6 years ago
Transcription
1 Technical Deep Dive: Cassandra + Solr (Confidential)
2 Business case
3 Super scalable realtime analytics: Hadoop is fantastic at performing batch analytics. Cassandra is an advanced column-family-oriented system. Solr offers realtime analytics like a traditional RDBMS (except joins).
4 What is Solr?
5 Lucene: High-performance inverted index. Java based. Embeddable library...
6 Solr: Distributed search. Facets. Schemas. Dismax queries.
7 Terms: A posting list maps a term to a list of integer document ids, e.g. dog = [0,3,6,7,9], cat = [1,2,3,5,9]. Terms are stored in sorted order.
8 Query Execution: The query is parsed into terms. Each term is looked up in the terms dictionary. For each term, the posting list is iterated and conjoined or disjoined with the other terms' posting lists.
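The conjoin / disjoin step above can be sketched in a few lines of Python. This is an illustrative merge over sorted posting lists using the dog / cat lists from the Terms slide; the function names are made up for this sketch and this is not Lucene's actual implementation.

```python
def intersect(a, b):
    """AND: doc ids present in both sorted posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union(a, b):
    """OR: sorted merge of two posting lists without duplicates."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


# Terms dictionary kept in sorted term order, as on the Terms slide.
postings = {"cat": [1, 2, 3, 5, 9], "dog": [0, 3, 6, 7, 9]}

both = intersect(postings["dog"], postings["cat"])   # [3, 9]
either = union(postings["dog"], postings["cat"])     # [0, 1, 2, 3, 5, 6, 7, 9]
```

Because both lists are sorted by document id, each operation is a single linear pass.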
9 Datastax Enterprise (DSE)
10 DSE Cluster
11 Datastax Enterprise: Combines Cassandra with Solr. Best of both worlds. Distributed Dynamo-based data distribution. Reliable, proven scalability. Lucene and Solr.
12 DSE Solr Features: Near realtime search. Multiple data centers. Reindex directly from Cassandra. Fast transaction log. Run MapReduce on Solr data. Realtime analytics.
13 DSE Solr Architecture: Extends the Cassandra secondary index API. Distributes queries using the ring topology over HTTP. Data is stored in Cassandra. The Lucene index is stored on each node directly on the OS filesystem (the index is not stored in Cassandra). Index per column family only.
14 DSE Solr Architecture: Schema and configuration are stored in Cassandra. Updates can hit any server and are routed to the correct node(s) automatically. RandomPartitioner MD5-hashes documents / rows to the correct node(s).
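A rough Python sketch of the routing idea above: hash the row key with MD5, then walk a sorted token ring to find the owning node and its replicas. The node names, the evenly spaced tokens, and the way the digest is reduced to a token are assumptions made for illustration; Cassandra's real token encoding and replica placement strategies are more involved.

```python
import hashlib
from bisect import bisect_left

TOKEN_SPACE = 2 ** 127  # assumed token range for this sketch


def md5_token(row_key):
    """Map a row key (bytes) to an integer token via MD5 (illustrative encoding)."""
    return int.from_bytes(hashlib.md5(row_key).digest(), "big") % TOKEN_SPACE


def replicas(token, ring, replication_factor=1):
    """Return the nodes holding `token`.

    `ring` is a list of (node_token, node_name) sorted by token. The primary
    owner is the first node whose token is >= the key token, wrapping around;
    additional replicas are the next nodes clockwise.
    """
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, token) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + k) % len(ring)][1] for k in range(count)]


# Hypothetical four-node ring with evenly spaced tokens.
ring = [(i * TOKEN_SPACE // 4, f"node{i}") for i in range(4)]

# Any node can accept the update and forward it to these owners.
owners = replicas(md5_token(b"wiki:Apache_Solr"), ring, replication_factor=2)
```

With a replication factor above 1, the same row (and its Lucene index entry) lives on several nodes, which is what allows queries to be load balanced.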
15 Architecture: How Solr is integrated into Cassandra
16 DSE Solr Search: Queries are automatically distributed to online nodes in the cluster. When the replication factor is > 1, queries are load balanced.
17 DSE Solr Commit Log: The commit log is kept in sync with Solr. If a node crashes, no data is lost; the commit log is replayed on restart.
18 DSE Solr Data Model
19 DSE Best Practices
20 Production: Increase the replication factor for more queries per second. As with Cassandra, allocate enough RAM; the system IO cache determines queries per second and query latency.
21 Heap Space: Field caches used by sorting and facets. Terms dictionary index. The index is not loaded into heap. Rely on the system IO cache.
22 Loading Configuration Files into DSE: DSE stores the configuration files in Cassandra. The same configuration files are used for each node. Use curl to HTTP POST the schema.xml and solrconfig.xml files into DSE.
23 Near Realtime Search: Use DSENRTCachingDirectoryFactory. Small segments are flushed to RAM. Once large enough, the small segments are flushed to disk. Set autoSoftCommit to 1-5 seconds. Reduce or eliminate the auto-warming in caches.
24 Validation Log: DSE Search stores Solr analyzing errors in the validation log, /var/log/cassandra/solrvalidation.log
25 DSENRTCachingDirectoryFactory: maxMergeSizeMB - the threshold (MB) for writing a merge segment to a RAMDirectory or to the file system. maxCachedMB - the maximum size (MB) of the RAMDirectory.
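The interaction of the two thresholds above can be pictured with a small Python sketch. The function name and the default values are assumptions chosen for illustration, and the rule shown (a segment stays in RAM only if it is small enough and the RAM cache still has room) is a simplified reading of the slide, not DSE source code.

```python
def write_to_ram(segment_mb, cached_mb, max_merge_size_mb=4.0, max_cached_mb=48.0):
    """Decide whether a new segment goes to the RAM directory or to disk.

    segment_mb        -- estimated size of the segment being written
    cached_mb         -- size currently held in the RAM directory
    max_merge_size_mb -- threshold above which segments go straight to disk
    max_cached_mb     -- upper bound on the RAM directory size
    """
    small_enough = segment_mb <= max_merge_size_mb
    fits_in_cache = cached_mb + segment_mb <= max_cached_mb
    return small_enough and fits_in_cache


write_to_ram(1.5, cached_mb=10.0)    # small segment, cache has room: RAM
write_to_ram(64.0, cached_mb=10.0)   # large merged segment: disk
write_to_ram(3.0, cached_mb=47.0)    # cache nearly full: disk
```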
26 Using DSE: Comes with a Wikipedia demonstration application. Here is a quick example.
27 Query using CQL: Solr queries may be executed via CQL. Here is a quick example: SELECT title FROM solr WHERE solr_query='title:b*';
28 Resource URL: Configuration files are stored in Cassandra. Same configuration per column family. <keyspace>.<columnfamily>/<filename>.<ext>
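As a sketch of how the resource path above combines with the HTTP POST upload mentioned on the configuration slide, the Python below builds (but does not send) the request. The base URL, including host, port, and the `/solr/resource` prefix, is an assumption for illustration, as are the helper name and the placeholder file body; check your DSE installation for the actual endpoint.

```python
import urllib.request

# Assumed base URL; adjust host, port, and prefix to your installation.
BASE_URL = "http://localhost:8983/solr/resource"


def build_upload_request(base_url, keyspace, column_family, filename, body):
    """Build an HTTP POST for <keyspace>.<columnfamily>/<filename>.<ext>."""
    url = f"{base_url}/{keyspace}.{column_family}/{filename}"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )


# Placeholder body; in practice this would be the bytes of schema.xml.
req = build_upload_request(BASE_URL, "wiki", "solr", "schema.xml", b"<schema/>")

# Sending is left commented out because it needs a running node:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

The same helper would be called a second time for solrconfig.xml, since both files are stored per column family.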
29 Solr Admin Console
30 Rebuilding an Index: Indexes can be rebuilt. Rebuilding is useful when the schema changes or the index has become corrupted. ./bin/dsetool rebuild_indexes wiki solr
31 Turn on Compression: Text can usually be compressed by a large factor. Turning on compression lets more data fit in the system IO cache. UPDATE COLUMN FAMILY solr WITH compression_options={sstable_compression:SnappyCompressor, chunk_length_kb:64};
32 General Solr
33 Important Ideas: Queries. Documents and Fields. Analyzers. Segments. Schema.
34 Documents and Fields: Lucene indexes documents. Documents consist of fields. Fields consist of a name and one or more values.
35 Analyzers: Convert text into tokens / terms. Record the position of each token. Transform tokens as configured, for example by stemming.
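A toy Python analyzer makes the three steps above concrete: tokenize, record positions, and transform tokens. The suffix-stripping "stemmer" here is deliberately naive and only stands in for a real stemming filter; the function names are made up for this sketch.

```python
import re

# Naive suffix list standing in for a real stemmer.
SUFFIXES = ("ing", "es", "s")


def stem(token):
    """Strip one known suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Return (term, position) pairs for the alphanumeric tokens in text."""
    tokens = re.finditer(r"[A-Za-z0-9]+", text)
    return [(stem(m.group().lower()), pos) for pos, m in enumerate(tokens)]


terms = analyze("Dogs chasing cats")
# [('dog', 0), ('chas', 1), ('cat', 2)]
```

The recorded positions are what later make phrase and proximity queries possible.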
36 Segments: Lucene stores the index in discrete units called segments. A merge policy determines how and when segments are merged (similar to compaction). At query time, segments are accessed.
37 Schema Structure: Field types are defined first, starting with primitives, followed by text fields and their analyzers.
38 Schema Type Mapping: Solr field types are mapped to native Cassandra types.
   Solr Type     Cassandra Type
   TextField     UTF8Type
   LongField     LongType
   IntField      Int32Type
   StringField   UTF8Type
39 Query Overview: Solr queries offer many of the same features as SQL (except joins). Powerful, expressive, and fast.
40 Query Types: Search on any number of fields with boolean logic (AND, OR, +, -). Sort results per field, similar to SQL. Range queries. Phrase queries. Regular expression queries. Query boosting (DisMax).
41 Filter Queries: Cached bit sets. No score is calculated. Good for queries with many results that are reused, such as types or access controls.
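The "cached bit sets" idea above can be sketched in Python with an integer used as a bit set: the filter is computed once, cached under its query string, and then intersected with the results of any later query without scoring. The class and function names are hypothetical and the cache is far simpler than Solr's filter cache.

```python
class FilterCache:
    """Cache one bit set per filter term; bit d is set if doc d matches."""

    def __init__(self, index):
        self.index = index      # filter term -> list of matching doc ids
        self.cache = {}
        self.misses = 0         # how many times a bit set had to be built

    def bitset(self, term):
        if term not in self.cache:
            self.misses += 1
            bits = 0
            for doc_id in self.index.get(term, []):
                bits |= 1 << doc_id
            self.cache[term] = bits
        return self.cache[term]


def apply_filter(result_ids, bits):
    """Keep only the result doc ids whose bit is set in the filter."""
    return [d for d in result_ids if (bits >> d) & 1]


cache = FilterCache({"type:book": [0, 2, 3, 9]})
first = apply_filter([3, 9, 1], cache.bitset("type:book"))   # [3, 9]
second = apply_filter([0, 5], cache.bitset("type:book"))     # [0], served from cache
```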
42 Debug Queries: Pass in debug=true. Provides timing information for each component, debug info about the query, and debug info about the result scoring.
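To show how the query, filter query, sort, and debug options discussed above fit together in one request, here is a small Python helper that assembles the query string. The base URL and the example field values are assumptions for illustration; `q`, `fq`, and `sort` are standard Solr request parameters and `debug=true` is the flag named on the slide.

```python
from urllib.parse import urlencode


def build_select_url(base, q, fq=(), sort=None, debug=False):
    """Assemble a Solr select URL from the main query and optional extras."""
    params = [("q", q)]
    params += [("fq", f) for f in fq]      # one fq parameter per filter
    if sort:
        params.append(("sort", sort))
    if debug:
        params.append(("debug", "true"))
    return f"{base}/select?{urlencode(params)}"


# Hypothetical core path for the Wikipedia demo column family.
url = build_select_url(
    "http://localhost:8983/solr/wiki.solr",
    q="title:cassandra",
    fq=["type:article"],
    sort="createdate desc",
    debug=True,
)
```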
43 Sort By
44 Range Queries: createdate: [ T23:59:59.999Z TO *]  field:[* TO 100]  -field:[* TO *] finds all documents without a value for field.
45 Phrase Query: "data stax"~4 searches for "data" and "stax" within 4 words of each other.
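The proximity idea above can be approximated in Python using token positions. This sketch implements the slide's plain-language description (the two terms occur within N positions of each other); Lucene's actual slop calculation is based on how far terms must move to form the phrase, so treat this as a simplification with made-up function names.

```python
def positions(tokens, term):
    """Positions at which `term` occurs in a token list."""
    return [i for i, tok in enumerate(tokens) if tok == term]


def within(tokens, term_a, term_b, distance):
    """True if some occurrence of term_a is within `distance` positions of term_b."""
    return any(
        abs(pa - pb) <= distance
        for pa in positions(tokens, term_a)
        for pb in positions(tokens, term_b)
    )


near = "data from the new stax cluster".split()
far = "data a b c d e stax".split()

within(near, "data", "stax", 4)   # True: positions 0 and 4
within(far, "data", "stax", 4)    # False: positions 0 and 6
```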
46 Prefix Queries: myfield:foo* Queries cannot begin with an asterisk.
47 Regular Expressions: Use forward slashes to demarcate a regular expression query. Match a five-digit zip code: body:/[0-9]{5}/
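A short Python sketch of what the zip code query above selects. It assumes the regular expression is tested against each whole indexed term (so a six-digit number does not match), and it uses whitespace splitting as a stand-in for a real analyzer; the document texts are invented for illustration.

```python
import re


def regex_match_docs(docs, pattern):
    """Return ids of documents containing a term that fully matches pattern."""
    rx = re.compile(pattern)
    return [
        doc_id
        for doc_id, body in docs.items()
        if any(rx.fullmatch(term) for term in body.split())
    ]


docs = {
    0: "ship to 94105 today",
    1: "order 1234 pending",
    2: "code 123456",
}

matches = regex_match_docs(docs, r"[0-9]{5}")   # [0]
```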
48 Spatial Queries: Bounding box. Distance. Filtering based on distance.
49 Auto Suggest: Uses SpellCheckComponent. The spellcheck / suggest index is built from an existing index. Can be set to automatically rebuild the suggest index on commit.
50 Prefix Auto Suggest: FSTLookup or WFSTLookup is recommended; they are more memory efficient.
51 Auto Suggest Parameters:
   spellcheck                   true
   spellcheck.dictionary        suggest
   spellcheck.onlyMorePopular   true
   spellcheck.count             5 (number of suggestions returned)
52 Auto Suggest by Popular Queries: Prefix-based auto-suggest can be limiting. Use EdgeNGramFilterFactory to query within terms. Sort results by a hit count field.
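The edge n-gram approach above can be sketched in Python: every word of a stored query is expanded into its leading substrings at index time, so a typed fragment matches words inside the query and not only its beginning, and hits are ordered by a popularity count. The sample queries, counts, and function names are invented for this sketch; in Solr the expansion would be done by the EdgeNGramFilterFactory in the field's analyzer.

```python
from collections import defaultdict


def edge_ngrams(term, min_gram=1, max_gram=10):
    """Leading substrings of term with lengths min_gram..max_gram."""
    return [term[:n] for n in range(min_gram, min(max_gram, len(term)) + 1)]


def build_suggest_index(query_counts, min_gram=2, max_gram=10):
    """Map each edge n-gram of each word to the queries containing that word."""
    index = defaultdict(set)
    for query in query_counts:
        for word in query.split():
            for gram in edge_ngrams(word, min_gram, max_gram):
                index[gram].add(query)
    return index


def suggest(fragment, index, query_counts, limit=5):
    """Suggestions for a typed fragment, most popular first."""
    hits = index.get(fragment, set())
    return sorted(hits, key=lambda q: (-query_counts[q], q))[:limit]


# Hypothetical popular queries with hit counts.
query_counts = {"apache solr": 40, "cassandra ring": 30, "solr facets": 25}
index = build_suggest_index(query_counts)

suggest("so", index, query_counts)    # ['apache solr', 'solr facets']
suggest("fac", index, query_counts)   # ['solr facets']
```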
53 Dismax Query Parser: The Dismax query parser provides query-time, field-level boosting granularity with less special syntax. Dismax is generally the best first-choice query parser for user-facing Solr applications.
54 Facets: Intersection count of another query. Commonly seen on shopping and other web sites. Solr supports multi-select faceting. Range faceting.
55 Facets Parameters:
   facet          true
   facet.field    field to facet on (repeat the parameter for multiple fields)
   facet.query    query to facet on
   facet.method   enum, fc, fcs (near realtime search)
56 Facet Example
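Since the example on this slide did not survive transcription, here is a minimal Python sketch of what field and query facets compute, based on the "intersection count" description on the Facets slide. The documents, field names, and function names are invented for illustration and say nothing about how Solr implements the enum, fc, or fcs methods.

```python
from collections import Counter


def facet_field(docs, result_ids, field):
    """Count values of `field` across the documents in the result set."""
    counts = Counter()
    for doc_id in result_ids:
        for value in docs[doc_id].get(field, []):   # fields may be multi-valued
            counts[value] += 1
    return counts.most_common()


def facet_query(result_ids, other_ids):
    """Intersection count of the result set with another query's matches."""
    return len(set(result_ids) & set(other_ids))


docs = {
    0: {"category": ["books"]},
    1: {"category": ["books"]},
    2: {"category": ["music"]},
    3: {"category": ["video"]},
}
results = [0, 1, 2]                       # doc ids matching the main query

facet_field(docs, results, "category")    # [('books', 2), ('music', 1)]
facet_query(results, [1, 2, 3])           # 2
```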
57 Group By: Much like SQL GROUP BY. Sort group values. Many options are available: sort documents within a group, scroll results per group. No aggregations.
58 Highlighting: Highlighting re-analyzes each document. The fast vector highlighter is faster but requires more storage.
59 Highlighting Parameters:
   hl                            true
   hl.fl                         fields, comma separated
   hl.useFastVectorHighlighter   true/false
60 The End
More informationCisco ParStream Cisco ParStream DSA Link Guide
Cisco ParStream Cisco ParStream DSA Link Guide 2017 Cisco and/or its affiliates. Document Information: Title: Cisco ParStream DSA Link Guide Version: 3.3.0 Date Published:
More informationCSE 544 Principles of Database Management Systems
CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 5 - DBMS Architecture and Indexing 1 Announcements HW1 is due next Thursday How is it going? Projects: Proposals are due
More informationCSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 12 Google Bigtable
CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2009 Lecture 12 Google Bigtable References Bigtable: A Distributed Storage System for Structured Data. Fay Chang et. al. OSDI
More informationCS November 2017
Bigtable Highly available distributed storage Distributed Systems 18. Bigtable Built with semi-structured data in mind URLs: content, metadata, links, anchors, page rank User data: preferences, account
More informationApache Lucene Eurocon: Preview
Apache Lucene Eurocon: Preview www.lucene-eurocon.org Overview Introduction Near Real Time Search: Yonik Seeley A link to download these slides will be available after the webcast is complete. An on-demand
More informationTALK 1: CONVINCE YOUR BOSS: CHOOSE THE "RIGHT" DATABASE. Prof. Dr. Stefan Edlich Beuth University of Technology Berlin (App.Sc.)
TALK 1: CONVINCE YOUR BOSS: CHOOSE THE "RIGHT" DATABASE Prof. Dr. Stefan Edlich Beuth University of Technology Berlin (App.Sc.) nosqlfrankfurt.de nosql powerdays 2 years of NoSQL Consulting http://nosql-database.org
More informationHBASE INTERVIEW QUESTIONS
HBASE INTERVIEW QUESTIONS http://www.tutorialspoint.com/hbase/hbase_interview_questions.htm Copyright tutorialspoint.com Dear readers, these HBase Interview Questions have been designed specially to get
More informationMigrating to Cassandra in the Cloud, the Netflix Way
Migrating to Cassandra in the Cloud, the Netflix Way Jason Brown - @jasobrown Senior Software Engineer, Netflix Tech History, 1998-2008 In the beginning, there was the webapp and a single database in a
More information