Improving Drupal search experience with Apache Solr and Elasticsearch

Similar documents
Search Engines and Time Series Databases

Search and Time Series Databases

Goal of this document: A simple yet effective

Who are we anyway? Adam Erickson. Jeff Tomlinson. aether. Senior Drupal Engineer - Hockey fanatic - Youth hockey coach

Battle of the Giants Apache Solr 4.0 vs ElasticSearch 0.20 Rafał Kuć sematext.com

EPL660: Information Retrieval and Search Engines Lab 3

ADVANCED DATABASES CIS 6930 Dr. Markus Schneider. Group 5 Ajantha Ramineni, Sahil Tiwari, Rishabh Jain, Shivang Gupta

A Scotas white paper September Scotas Push Connector

Amazon Elasticsearch Service

Search Autocomplete Magento Extension

E l a s t i c s e a r c h F e a t u r e s. Contents

Realtime visitor analysis with Couchbase and Elasticsearch

Open Source Search. Andreas Pesenhofer. max.recall information systems GmbH Künstlergasse 11/1 A-1150 Wien Austria

rpaf ktl Pen Apache Solr 3 Enterprise Search Server J community exp<= highlighting, relevancy ranked sorting, and more source publishing""

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper.

NoSQL Databases An efficient way to store and query heterogeneous astronomical data in DACE. Nicolas Buchschacher - University of Geneva - ADASS 2018

Soir 1.4 Enterprise Search Server

Monitor your containers with the Elastic Stack. Monica Sarbu

Qualys Cloud Suite 2.30

Amazon Search Services. Christoph Schmitter

Working with Islandora

Monitor your infrastructure with the Elastic Beats. Monica Sarbu

Using Elastic with Magento

Parallel SQL and Streaming Expressions in Apache Solr 6. Shalin Shekhar Lucidworks Inc.

An Application for Monitoring Solr

1 Preface and overview Functional enhancements Improvements, enhancements and cancellation System support...

SOLUTION TRACK Finding the Needle in a Big Data Innovator & Problem Solver Cloudera

ElasticSearch in Production

Red Hat JBoss Data Grid 7.1 Migration Guide

SEARCHING BILLIONS OF PRODUCT LOGS IN REAL TIME. Ryan Tabora - Think Big Analytics NoSQL Search Roadshow - June 6, 2013

Technical Deep Dive: Cassandra + Solr. Copyright 2012, Think Big Analy7cs, All Rights Reserved

Create-A-Page Design Documentation

Oracle APEX 18.1 New Features

Using ElasticSearch to Enable Stronger Query Support in Cassandra

Wednesday, September 19, 12 THE LEADER IN DRUPAL PLATFORM DESIGN AND DEVELOPMENT

CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION

Elasticsearch Search made easy

Jamcracker, Inc. CMS Dashboard Widget Creation

Dynatrace FastPack for Liferay DXP

Smart Bulk SMS & Voice SMS Marketing Script with 2-Way Messaging. Quick-Start Manual

The main differences with other open source reporting solutions such as JasperReports or mondrian are:

D, E I, J, K, L O, P, Q

Distributed CI: Scaling Jenkins on Mesos and Marathon. Roger Ignazio Puppet Labs, Inc. MesosCon 2015 Seattle, WA

Blog site (cont.) theme, 202 view creations, 205 Browser tools, 196 Buytaert, Dries, 185

by Cisco Intercloud Fabric and the Cisco

Scaling DreamFactory

Europeana Core Service Platform

FAST& SCALABLE SYSTEMS WITH APACHESOLR. Arnon Yogev IBM Research

Cumulus 11.0 Release Notes

Homework 9: Stock Search Android App with Facebook Post A Mobile Phone Exercise

MIRO DIETIKER Founder

Whooo s calling Whooo?

KKDK Project: Solution description. KKDK Project. Solution description. Version 1.0

Release Highlights

NYC Apache Lucene/Solr Meetup

Nexis User Guide.

Unifying logs and metrics data with Elastic Beats. Monica Sarbu Team lead, Elastic Beats

Building a Scalable Recommender System with Apache Spark, Apache Kafka and Elasticsearch

The Future of the Realtime Web BETTER APIS WITH GRAPHQL. Josh

Application monitoring with BELK. Nishant Sahay, Sr. Architect Bhavani Ananth, Architect

Are you visualizing your logfiles? Bastian Widmer

MQ Monitoring on Cloud

Introduction to NoSQL Databases

Hadoop/MapReduce Computing Paradigm

Road to Auto Scaling

Red Hat JBoss Data Grid 7.0

run your own search engine. today: Cablecar

Hortonworks Cybersecurity Platform

Monitoring MySQL with Prometheus & Grafana

MAPR DATA GOVERNANCE WITHOUT COMPROMISE

CLOUDLENS PUBLIC, PRIVATE, AND HYBRID CLOUD VISIBILITY

Microservices log gathering, processing and storing

About the Tutorial. Audience. Prerequisites. Copyright and Disclaimer. Logstash

VERINT EFM 8.0 Release Overview

The TDAQ Analytics Dashboard: a real-time web application for the ATLAS TDAQ control infrastructure

Oracle Fusion Middleware 11g: Build Applications with ADF Accel

Advance Search With Solr

The Observatory Tool Dashboard Guide

GUIDE. LiveNX Semantic Tagging: Best-Practices Guide

Islandora and Fedora 4; The Atonement v3: The Atonermenter

Google Tag Manager. Google Tag Manager Custom Module for Magento

mimacom & LiferayDXP Campaign

Drupal 7 Sql Schema Api Datetime

Log Analysis When CLI get's complex. ITNOG3 Octavio Melendres Network admin - Fastnet Spa

GraphQL: Mind Your Ps and QLs

DevOps + Infrastructure TRACK SUPPORTED BY

Coveo Platform 7.0. Atlassian Confluence V2 Connector Guide

RavenDB & document stores

Modern features with Plex and Websydian ExtJS. Olaf Flemming Project Manager AXSOS AG

Using vrealize Log Insight

Neos Google Analytics Integration

Using vrealize Log Insight

Using vrealize Log Insight. 08-SEP-2017 vrealize Log Insight 4.5

Using vrealize Log Insight. Update 1 Modified on 03 SEP 2017 vrealize Log Insight 4.0

Prophet Mobile User Guide

Develop Mobile Front Ends Using Mobile Application Framework A - 2

Building Search Based Applications

Entering Finding Aid Data in ArchivesSpace

Using vrealize Log Insight

Kafka Connect the Dots

Transcription:

Improving Drupal search experience with Apache Solr and Elasticsearch Milos Pumpalovic Web Front-end Developer Gene Mohr Web Back-end Developer

About Us Milos Pumpalovic Front End Developer Drupal theming and module development mip2047@med.cornell.edu Gene Mohr Back End Developer Full stack and Drupal module development bas2019@med.cornell.edu 2

What is search engine and why we use it? Search engine in Drupal What are the differences from Drupal Search API? Why we use it? 3

What is Solr? Solr is a super fast, open source, standalone search server built on Apache Lucene. Some of the features are: Advanced Full-Text Search Designed for High Volume traffic Faceted Search and Filtering Scalable, flexible, extensible Rich Document Parsing built-in ability to index PDF s, Word documents and more Multiple search indexes Query Suggestions, Spelling and More advanced capability for auto-complete, spell checking, highlighting and more 4

Apache Solr Configuration Install module Create an index on Solr engine Create a view Attach facet filters 5

Install the Drupal modules In Drupal 8, Composer is used to manage PHP libraries. The Search API module doesn t include Solarium, so dependencies need to be set via Composer with the following commands: $ composer config repositories.drupal composer https://packages.drupal.org/8 Download the module Search API Solr Search. $ composer require drupal/search_api_solr Install the following modules: Optional: install Solr Search Defaults module 6

Add Solr server Go to Configuration -> Search and metadata -> Search API. Click Add Server and configure the Server. Server name example: Local Solr Server 7

Confirm Solr server is connected At this stage, the server s status page should show message: "The Solr server could be reached" and "The Solr core could be accessed" If Solr Search Defaults module was installed, it can be uninstalled as it s no longer required. 8

Add search index Go to Configuration -> Search and metadata -> Search API. Click Add Index Selecting the Content data source, options are presented to select which bundles are to be indexed 9

Add fields to index Before search can be performed, select all the fields that should be available to search. That is configured in the Fields tab. 10

Add processors to the index Last step is to add additional processors. This includes items such as: Content access Ignore case (case-insensitive search) Tokenizer (split into individual words) 11

Verify search index is working Once fields and processors are setup, going back to the View tab, will show the status of the index, and at this point, the content is ready to be indexed if not already set to index immediately when the index is created. Indexing of content is done via cron and any new content will get indexed then. 12

Create a Faceted Solr Search page Final step is to create a search page. This is done by creating a View and create a Page. General settings to configure: View Settings: Show > Index (created earlier) Page Settings: Page Title Path: /search/content Display Format > Unformatted list of Rendered Entity The click Save and edit to configure the view. 1. Under Format > Show, click Settings for Rendered Entity and change view mode to Teaser 2. Under Filter Criteria, add Fulltext search field and expose the field for filtering Further solr configuration can be done in the Solr schema, such as: Searching parts of works, using Solr s EdgeNGramFilterFactory, for example, info will match information Lower Case Tokenizer remove whitespace and non-letters and convert all letters to lowercase 13

Add search facets With the search page setup now, we want to add facets to let users filter down content. Navigate to Configuration > Search and metadata > Facets then click Add facet Facets have a number of settings to configure: - Widget - Show the amount of results - Sorting (by count, display value ) - Operator (OR and AND - AND filters are exclusive and narrow the result set. OR filters are inclusive and widen the result set.) - 14

Add search facets Last step, add the newly created Facet blocks on the Block Layout page Note: Facet blocks don t need to have the page visibility set, it will automatically detect if the page a search page, otherwise, they will not be displayed 15

Search Page Finally, view the search page created 16

What is Elasticsearch? Also capable of providing advanced full-text search Easier to install REST based APIs, a simple HTTP interface, and uses schema-free JSON Distributed Document Store - easy to use JSON document-oriented storage platform Logging data and analysis Visualization and real-time data monitoring when couple of Kibana 17

Elasticsearch Configuration Install module Create an index on Elasticsearch engine Create a view Attach facet filters 18

Install the Drupal modules The Search API module needs Elasticsearch PHP library which provides the abstract layer of Elasticsearch Connector module in Drupal. This can be installed through composer. $ composer require nodespark/des-connector:5.x-dev $ composer update Install the following modules: NOTE: Elasticsearch doesn t have Solr Search Defaults-like module. Setting up and creating a view page has to be done manually. 19

Add Elasticsearch Go to Configuration > Search and metadata > Elasticsearch Connector. Click Add Cluster and configure the Server. 20

Add Elasticsearch (continue) Go to Configuration > Search and metadata > Elasticsearch Connector. Click Add Cluster and configure the server. As default, it is elasticsearch. If you want to edit the cluster/node information, edit elasticsearch.yml file. 21

Add search index (same as Solr) Go to Configuration > Search and metadata > Search API. Click Add Index Selecting the Content data source, options are presented to select which bundles are to be indexed 22

Add fields to index (same as Solr) Before search can be performed, select all the fields that should be available to search. That is configured in the Fields tab. 23

Add processors to the index (same as Solr) Last step is to add additional processors. This includes items such as: Content access Ignore case (case-insensitive search) Tokenizer (split into individual words) 24

Verify search index is working (same as Solr) Once fields and processors are set up, go back to the View tab. It will show the status of the index, and at this point, the content is ready to be indexed if not already set to index immediately when the index is created. Indexing of content is done via cron and any new content will get indexed then. 25

Create a view page 1. Go to Structure > Add view 2. Provide a view name and select your index name as the view source 26

Create a view page (continue) 3. Under Format > Show, select Rendered Entity Or, you can select Fields and add each field you would like to display in the Fields section. 4. Under Filter Criteria, add Fulltext search field and expose the field for filtering 5. Add Sort Criteria: The best one to use is Relevance (desc) 27

Add search facets (same as Solr) With the search page setup now, we want to add facets to let users filter down content. Navigate to Configuration > Search and metadata > Facets then click Add facet Facets have a number of settings to configure: Widget Show the amount of results Sorting (by count, display value ) Operator (OR and AND - AND filters are exclusive and narrow the result set. OR filters are inclusive and widen the result set.) 28

Place facet blocks (same as Solr) Last step, place the newly created Facet blocks on the Block Layout page Note: Facet blocks don t need to have the page visibility set, it will automatically detect Each of the page a search page, otherwise, they will not be displayed. 29

Search Page Finally, view the search page created. 30

Apache Solr or Elasticsearch? Source: A scene from Annie Get Your Gun (1950) 31

When to use Apache Solr If there are few thousand nodes of content (or more, depending on server performance and whether hosted locally or externally) You do not want to put stress on the database If searching is a major feature of the site If you need to search multiple fields 32

When to use Elasticsearch The Elastic Stack (Elasticsearch, Logstash, and Kibana) can interactively search, discover, and analyze to gain insights that improve the analysis of time-series data. No need for upfront schema definition. Schema can be defined per type for customization of indexing process. Has an edge in the cloud environment - this is depend upon SolrCloud advancement. Has advantages of search for enterprise or higher-ed level where analytics plays a bigger role. 33

Additional resources Solr: Configuring the Search API server (D8) https://www.drupal.org/node/2763137 Drupal 8 and Solr: Google-fast search on your own website Why and how https://blog.openlucius.com/en/blog/drupal-8-and-solr-google-fast-search-your-own-website-why-andhow Set up faceted Apache Solr search on Drupal 8 https://www.jeffgeerling.com/blog/2016/set-faceted-apache-solr-search-on-drupal-8-2016-deprecated When to consider Apache Solr https://lastcallmedia.com/blog/when-consider-apache-solr Elasticsearch: Indexing content from Drupal 8 using Elasticsearch https://www.lullabot.com/articles/indexing-content-from-drupal-8-to-elasticsearch How to index Attachments and Files to Elasticsearch https://qbox.io/blog/index-attachments-files-elasticsearch-mapper --------------- Solr vs. Elasticsearch: 5 Factors to Consider https://www.cmswire.com/information-management/solr-vs-elasticsearch-5-factors-to-consider/ 34

Questions? 35