Our Index. Searching in Infinispan. Infinispan Query engine Clustering a Lucene index Cloud deployed applications Future. Map/Reduce Fulltext indexing

Similar documents
Introduction to Infinispan

Common JBoss Data Grid Architectures

<Add your title> Name Title Red Hat, Inc. Date

Hibernate OGM. Michal Linhard Quality Assurance Engineer, Red Hat. Red Hat Developer Conference February 17 th, 2012

Monday, November 21, 2011

Realtime visitor analysis with Couchbase and Elasticsearch

A memcached implementation in Java. Bela Ban JBoss 2340

Infinispan for Ninja Developers

Red Hat JBoss Data Grid 6.5

Red Hat JBoss Data Grid 6.3

Red Hat JBoss Data Grid 6.6

Hibernate Search in Action

Monday, April 25, 2011

Red Hat JBoss Data Grid 7.1 Migration Guide

Text Search With Lucene

Copyright 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13

Polyglot Persistence. EclipseLink JPA for NoSQL, Relational, and Beyond. Shaun Smith Gunnar Wagenknecht

Pragmatic Clustering. Mike Cannon-Brookes CEO, Atlassian Software Systems

Developing Applications with Java EE 6 on WebLogic Server 12c

foreword to the first edition preface xxi acknowledgments xxiii about this book xxv about the cover illustration

Setting Schema Name For Native Queries In. Hibernate >>>CLICK HERE<<<

Red Hat JBoss Data Grid 7.2

Hibernate Search. Apache Lucene Integration. Reference Guide

Apache Lucene Integration. Reference Guide

DISTRIBUTED DATABASE OPTIMIZATIONS WITH NoSQL MEMBERS

Spring-beans-2.5.xsd' Must Have Even Number Of Uri's

Hibernate Search Googling your persistence domain model. Emmanuel Bernard Doer JBoss, a division of Red Hat

Java Enterprise Edition

Full-Text Search: Human Heaven and Database Savior in the Cloud

Technical Deep Dive: Cassandra + Solr. Copyright 2012, Think Big Analy7cs, All Rights Reserved

Red Hat JBoss Data Grid 7.0

Read & Download (PDF Kindle) Microsoft SQL Server 2008 Administrator's Pocket Consultant

Monday, October 31, 11

Creating Ultra-fast Realtime Apps and Microservices with Java. Markus Kett, CEO Jetstream Technologies

Spring Persistence. with Hibernate PAUL TEPPER FISHER BRIAN D. MURPHY

Apache Lucene Integration. Reference Guide

OData server endpoint for Infinispan

Taming Text. How to Find, Organize, and Manipulate It MANNING GRANT S. INGERSOLL THOMAS S. MORTON ANDREW L. KARRIS. Shelter Island

Enterprise Java in 2012 and Beyond From Java EE 6 To Cloud Computing

Java EE 7: Back-End Server Application Development

The dialog boxes Import Database Schema, Import Hibernate Mappings and Import Entity EJBs are used to create annotated Java classes and persistence.

Generating A Hibernate Mapping File And Java Classes From The Sql Schema

Open Source. in the Corporate World. JBoss. Application Server. State of the Art: Aaron Mulder

Thursday, March 8, 12. Architecture at Scale

Tuesday, June 22, JBoss Users & Developers Conference. Boston:2010

Azure Certification BootCamp for Exam (Developer)

Soir 1.4 Enterprise Search Server

Index. setmaxresults() method, 169 sorting, 170 SQL DISTINCT query, 171 uniqueresult() method, 169

TopLink Grid: Scaling JPA applications with Coherence

SQL 2005 BACKUP AND RESTORE PRODUCT CATALOG EBOOK

Standardize caching in Java. Introduction to JCache and In-Memory data grid solutions

Databases and Big Data Today. CS634 Class 22

Migrating traditional Java EE applications to mobile

<Insert Picture Here> MySQL Cluster What are we working on

Sunday, May 1,

Type of Classes Nested Classes Inner Classes Local and Anonymous Inner Classes

@joerg_schad Nightmares of a Container Orchestration System

Schema Management In Hibernate Interview. Questions >>>CLICK HERE<<<

Read & Download (PDF Kindle) Pro Apache Hadoop

Cassandra, MongoDB, and HBase. Cassandra, MongoDB, and HBase. I have chosen these three due to their recent

1 Markus Eisele, Insurance - Strategic IT-Architecture

rpaf ktl Pen Apache Solr 3 Enterprise Search Server J community exp<= highlighting, relevancy ranked sorting, and more source publishing""

NoSQL Databases An efficient way to store and query heterogeneous astronomical data in DACE. Nicolas Buchschacher - University of Geneva - ADASS 2018

Informa)on Retrieval and Map- Reduce Implementa)ons. Mohammad Amir Sharif PhD Student Center for Advanced Computer Studies

Manual Trigger Sql Server 2008 Examples Insert Update

Writing Portable Applications for J2EE. Pete Heist Compoze Software, Inc.

Database code in PL-SQL PL-SQL was used for the database code. It is ready to use on any Oracle platform, running under Linux, Windows or Solaris.

Jakarta EE Meets NoSQL at the Cloud Age

The Next Generation. Prabhat Jha Principal Engineer

How To Get Database Schema In Java Using >>>CLICK HERE<<<

SCALABLE DATABASES. Sergio Bossa. From Relational Databases To Polyglot Persistence.

Hibernate Search: A Successful Search, a Happy User Make it Happen!

Building Search Applications

Red Hat? Linux? 6 Server Download Free (EPUB, PDF)

Redis based In-Memory Data Grid and Distributed Locks in Java. Nikita Koksharov, Founder of Redisson

10 ways to reduce your tax bill. Amit Nithianandan Senior Search Engineer Zvents Inc.

The age of Big Data Big Data for Oracle Database Professionals

Scientific Cluster Deployment and Recovery Using puppet to simplify cluster management

object/relational persistence What is persistence? 5

Goal of the presentation is to give an introduction of NoSQL databases, why they are there.

Java- EE Web Application Development with Enterprise JavaBeans and Web Services

Specialized - Mastering Spring 4.2

Shale and the Java Persistence Architecture. Craig McClanahan Gary Van Matre. ApacheCon US 2006 Austin, TX

Module 8 The Java Persistence API

The Evolution of Java Persistence

Coherence An Introduction. Shaun Smith Principal Product Manager

Distributed Systems 16. Distributed File Systems II

Information Retrieval

CIB Session 12th NoSQL Databases Structures

Java Performance: The Definitive Guide

Persistent Storage with Docker in production - Which solution and why?

Architect Exam Guide. OCM EE 6 Enterprise. (Exams IZO-807,1ZO-865 & IZO-866) Oracle Press ORACLG. Paul R* Allen and Joseph J.

Eclipse Ignore Xml Schema Problems >>>CLICK HERE<<<

Ray Tsang Developer Advocate Google Cloud Platform

GETTING STARTED WITH SPRING FRAMEWORK, SECOND EDITION BY ASHISH SARIN, J SHARMA

<Insert Picture Here> Future<JavaEE>

EMBEDDED MESSAGING USING ACTIVEMQ

Model Driven Architecture with Java

Win32 Multithreaded Programming Epub Gratuit

Developing Microsoft Azure Solutions

Transcription:

Who am I? Sanne Grinovero Software Engineer at Red Hat Hibernate, especially Search Infinispan, focus on Query and Lucene Hibernate OGM Apache Lucene JGroups

Our Index Searching in Infinispan Map/Reduce Fulltext indexing Infinispan Query engine Clustering a Lucene index Cloud deployed applications Future

Infinispan An advanced multi-node cache A transactional scalable datagrid targeting high performance and cloud A NoSQL database, a key-value store How do you query a key value store? SELECT * FROM GRID

To Query a Grid What's in C7? Object v = cache.get( c7 );

If you don't know the key, no way to find the value

Some services have no chance:

Let's test my bookshelf Where's Hibernate Search in Action? Could you hand me ISBN 978-1-93398817-7? How many books about Gaudí?

A real-world example

A real-world example

A real-world example

A real-world example

Bookshelves don't scale

How to implement the bookshelf features on a k/v? Where's Hibernate Search in Action? Can you hand me ISBN 978-1-933988-17-7? How many books about Gaudí?

Most document based NoSQLs support Map/Reduce Infinispan does not focus on documents That won't stop you from using any format JSON, XML, YAML, Java: public class Book implements Serializable { final String title; final String author; final String editor; public Book(String title, String author, String editor) { this.title = title; this.author = author; this.editor = editor; } }

Iterate & collect class TitleBookSearcher implements Mapper<String, Book, String, Book> { final String title; public TitleBookSearcher(String t) { title = t; } public void map(string key, Book value, Collector collector){ if ( title.equals( value.title ) ) collector.emit( key, value ); } class BookReducer implements Reducer<String, Book> { public Book reduce(string reducedkey, Iterator<Book> iter) { return iter.next(); } }

How to implement the bookshelf features on a k/v? Where's Hibernate Search in Action? Can you hand me ISBN 978-1-933988-17-7? How many books about Shakespeare? To properly score fulltext results we need to consider relative term frequencies on the whole corpus Pre-tagging is a poor choice

Apache Lucene Open source Apache top level project Countless products and sites use it Integrates in Hibernate via Hibernate Search Clusterable via Infinispan

What does Lucene get us? Similarity scoring searches Advanced text analysis Sinonyms, Stopwords, Stemming,... Reusable declarative Filters TermVectors MoreLikeThis Faceted Search Speed!

Lucene: Stopwords a, able, about, across, after, all, almost, also, am, among, an, and, any, are, as, at, be, because, been, but, by, can, cannot, could, dear, did, do, does, either, else, ever, every, for, from, get, got, had, has, have, he, her, hers, him, his, how, however, i, if, in, into, is, it, its, just, least, let, like, likely, may, me, might, most, must, my, neither, no, nor, not, of, off, often, on, only, or, other, our, own, rather, said, say, says, she, should, since, so, some, than, that, the, their, them, then, there, these, they, this, tis, to, too, twas, us, wants, was, we, were, what, when, where, which, while, who, whom, why, will, with, would, yet, you, your

Filters

Faceted Search

Would you want a web Search engine to return hits in alphabetical order?

The downsides Requires an Index in memory on filesystem in Infinispan Made of immutable segments Optimized for search speed, not for updates A world of strings and frequencies

Infinispan Query quickstart Enable it in configuration Have infinispan-query.jar in your classpath Annotate your POJO values to specify what to index <dependency> <groupid>org.infinispan</groupid> <artifactid>infinispan-query</artifactid> <version>5.1.0.beta3</version> </dependency>

Enable Infinispan Query, programmatically Configuration c = new Configuration().fluent().indexing().addProperty( "hibernate.search.default.directory_provider", "ram").build(); CacheManager manager = new DefaultCacheManager(c);

Enable Query in Infinispan XML configurations <?xml version="1.0" encoding="utf-8"?> <infinispan xmlns:xsi="http://www.w3.org/2001/xmlschema-instance" xsi:schemalocation="urn:infinispan:config:5.0 http://www.infinispan.org/schemas/infinispan-config-5.0.xsd" xmlns="urn:infinispan:config:5.0"> <default> <indexing enabled="true" indexlocalonly="true"> <properties> <property name="hibernate.search.option1" value="..." /> <property name="hibernate.search.option2" value="..." /> </properties> </indexing> </default>

Annotate your model @ProvidedId @Indexed public class Book implements Serializable { @Field String title; @Field String author; @Field String editor; public Book(String title, String author, String editor) { this.title = title; this.author = author; this.editor = editor; } }

Run a Query SearchManager qf = Search.getSearchManager(cache); Query query = qf.buildquerybuilderforclass(book.class).get().phrase().onfield("title").sentence("in action").createquery(); List<Object> list = qf.getquery(query).list();

The code Integrates Hibernate Search Listen to Hibernate events & transactions Infinispan events & transactions Maps Java types and model graphs to Lucene Documents Thin-layer design

Index mapping

declarative analyzers @Entity @Indexed @AnalyzerDef(name = "frenchanalyzer", tokenizer = @TokenizerDef(factory=StandardTokenizerFactory.class),filters = { @TokenFilterDef(factory = LowerCaseFilterFactory.class), @TokenFilterDef(factory = SnowballPorterFilterFactory.class, params = {@Parameter(name = "language", value = "French")}) }) public class Book { @Field(index=Index.TOKENIZED, store=store.no) @Analyzer(definition = "frenchanalyzer")

Query test https://github.com/infinispan/infinispan

I'm not going in details.. org.apache.lucene.search.query lucenequery = querybuilder.phrase().onfield( "description" ).andfield( "title" ).sentence( "a book on highly scalable query engines" ).enablefulltextfilter( ready-for-shipping ).createquery(); CacheQuery cachequery = searchmanager.getquery( lucenequery, Book.class); List<Book> objectlist = cachequery.list();

Architecture: simplest approach

Scalability issues Global writer locks NFS based index sharing very tricky

Queue-based clustering (via filesystem)

Index stored in Infinispan

Clustering native Lucene access Using org.apache.lucene directly Distributed on multiple nodes On any cloud

Single node performance idea Write ops/sec Queries/sec RAMDirectory Infinispan 0 Infinispan 0 queries per second RAMDirectory Infinispan D4 Infinispan D40 Infinispan D4 Infinispan D40 FSDirectory FSDirectory Infinispan Local Infinispan Local 0 50 100 150 200 250 300 350 400 0 5000 10000 15000 20000 25000

multi-node performance idea Write ops/sec Queries/sec RAMDirectory Infinispan 0 Infinispan 0 queries per second RAMDirectory Infinispan D4 Infinispan D40 Infinispan D4 Infinispan D40 FSDirectory FSDirectory Infinispan Local Infinispan Local 0 50 100 150 200 250 300 350 400 0 5000 10000 15000 20000 25000

Why does writing not scale?

Performance hints Setting Lucene's maximum segment size to fit in LuceneDirectory chunk_size will avoid readlocks Verify blob sizes fit in JGroups network packets, tune JGroups Check for CacheStores sweet spot size

Memory requirements RAMDirectory: all must fit in a single VM's memory FSDirectory: OS does a great caching job but if it doesn't fit in memory Infinispan: comparable to FSDirectory Flexible Fast Network vs. disk

Ingredients for a cloud One Infinispan to rule them all Store Lucene indexes Hibernate second level cache Application managed cache Datagrid EJB, session replication in AS7 As a JPA store via Hibernate OGM

Ingredients for a cloud JGroups discovery protocol MPING TCP_PING JDBC_PING S3_PING Choose a CacheLoader Database based Jclouds (S3,...) Cassandra

What's next Facilitate writing scalability Ease configuration aspects for clustering ergonomics! Parallel searching A component of http://www.cloudtm.eu

Related talks at JUDCon 15:15 JPA applications in the era of NoSQL and Clouds: Introducing OGM

Q&A http://infinispan.org http://in.relation.to http://jboss.org @Infinispan @Hibernate @SanneGrinovero