EPL660: Information Retrieval and Search Engines Lab 3

1 EPL660: Information Retrieval and Search Engines Lab 3
Pavlos Antoniou
Office: B109, ΘΕΕ01
University of Cyprus, Department of Computer Science

2 Apache Solr
- Popular, fast, open-source search platform built on Apache Lucene, from the Apache Lucene project
- Written in Java; runs as a standalone full-text search server, in standalone or distributed (SolrCloud) operation
- Solr uses the Lucene Java search library at its core for full-text indexing and search

3 Apache Solr Features
- XML/HTTP and JSON APIs
- Hit highlighting
- Faceted search and filtering
- Near real-time indexing
- Database integration
- Rich document (e.g., Word, PDF) handling
- Geospatial search
- Fast incremental updates and index replication
- Caching
- Replication
- Web administration interface
- etc.

4 Apache Solr vs Apache Lucene
- The relationship between Solr and Lucene is that of a car and its engine: you can't drive an engine, but you can drive a car.
- Lucene is a library that you can't use as-is, whereas Solr is a complete application that you can use out of the box.
- Unlike Lucene, Solr is a server application. Releases before Solr 5 shipped as a web application (WAR) that could be deployed in any servlet container (e.g. Jetty, Tomcat, Resin), with a single file needed to deploy the application on a server. From Solr 5 onward it runs as a standalone server, started with bin/solr as in the rest of this lab.
- Solr can be installed and used easily by non-programmers. Lucene needs programming skills.

5 When to use Lucene?
- When you need to embed search functionality into, for example, a desktop application
- When you have very customized requirements that need low-level access to the Lucene API classes
- In these cases Solr may be more of a hindrance than a help, since it is an extra layer of indirection.

6 SolrCloud
- Apache Solr includes the ability to set up a cluster of Solr servers that combines fault tolerance and high availability: SolrCloud
- SolrCloud allows for distributed search and indexing
- SolrCloud features:
  - Central configuration for the entire cluster
  - Automatic load balancing and fail-over for queries
  - ZooKeeper integration for cluster coordination and configuration

7 SolrCloud Concepts
- A Cluster is made up of one or more Solr Nodes, which are running instances of the Solr server process

8 SolrCloud Concepts
- A Cluster can host multiple Collections of Solr Documents
- A Collection can be partitioned into multiple Shards (pieces), each of which contains a subset of the Documents in the Collection
- Each Shard can be replicated (Leader & Replicas)

9 SolrCloud Concepts
- The number of Shards that a Collection has determines:
  - The theoretical limit to the number of Documents that the Collection can reasonably contain
  - The amount of parallelization that is possible for an individual search request
- The number of Replicas that each Shard has determines:
  - The level of redundancy built into the Collection, and how fault tolerant the Cluster can be in the event that some Nodes become unavailable
  - The theoretical limit on the number of concurrent search requests that can be processed under heavy load
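As a rough illustration of how shards split a collection and how replicas multiply the stored copies, the sketch below assigns document ids to shards by hashing. The function names and the use of zlib.crc32 are assumptions made for this example; Solr's own document router is not reproduced here.

```python
import zlib

def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard index for a document id (illustrative hash, not Solr's router)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

def cores_needed(num_shards: int, copies_per_shard: int) -> int:
    """Total stored copies when every shard is kept copies_per_shard times (leader included)."""
    return num_shards * copies_per_shard

for doc_id in ["SOLR1000", "doc-a", "doc-b"]:  # "doc-a" and "doc-b" are made-up ids
    print(doc_id, "-> shard", shard_for(doc_id, 2))
print("stored copies:", cores_needed(2, 2))
```

The same id always lands on the same shard, which is what lets a later update or delete find the document it targets.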

10 Getting Started
- Download Apache Solr (tgz, or zip for Windows)
- Extract the archive and go to the solr directory
- Open a terminal and type:
    bin/solr start -e cloud -noprompt
- This starts a SolrCloud cluster with embedded ZooKeeper (cloud management service) on the local workstation, with 2 nodes
- The first node listens on port 8983 and the second on port 7574
- You can check that Solr is running by loading the Solr web interface in your web browser.

11 Solr web interface

12 SolrCloud
- Preview collections on tab
- One collection is created automatically: gettingstarted
- The collection is partitioned into 2 shards
- The first node stores the 2 leader shards; the second stores the 2 replicas
- The Solr server is up and running, with one collection but no data indexed
- Important configuration files: solrconfig.xml, managed-schema
    solr-dir/server/solr/configsets/_default/conf/solrconfig.xml
    solr-dir/server/solr/configsets/_default/conf/managed-schema

13 How Solr Sees the World
- Document: the basic unit of information, a set of data that describes something
  - E.g. a document about a person might contain the person's name, biography, favorite color, and shoe size
- Documents are expected to be composed of fields, which are more specific pieces of information
  - E.g. "first_name":"pavlos", "shoe_size":42
- Fields can contain different types of data
  - first_name is text, shoe_size is a number
- The user defines the type of each field
  - The field type tells Solr how to interpret the field and how it can be queried
- When a document is added to a collection, Solr takes the values from the document's fields and adds them to the index
- Queries consult the index and return the matching documents
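The flow described above (documents made of fields, values added to an index, queries consulting the index) can be sketched with a toy in-memory inverted index. This is only an illustration under simple assumptions (whitespace tokens, lower-casing); it is not how Solr or Lucene store data, and the second document is made up.

```python
from collections import defaultdict

def build_index(docs):
    """Map each (field, token) pair to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc in docs:
        for field, value in doc.items():
            if field == "id":
                continue
            for token in str(value).lower().split():
                index[(field, token)].add(doc["id"])
    return index

def search(index, field, term):
    """Return the sorted ids of documents whose field contains the term."""
    return sorted(index.get((field, term.lower()), set()))

docs = [
    {"id": "1", "first_name": "pavlos", "shoe_size": 42},
    {"id": "2", "first_name": "maria", "shoe_size": 38},  # hypothetical second person
]
index = build_index(docs)
print(search(index, "first_name", "Pavlos"))
print(search(index, "shoe_size", "42"))
```

Note that the query never scans the documents themselves; it only looks up the (field, token) key, which is the point of building the index at add time.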

14 Field Analysis Process
- How does Solr process document fields when building an index?
- Example: the biography field in a person document
    "biography": "He received his Ph.D. from Department of Computer Science of the University of Cyprus, in 2012"
- Index every word of the biography in order to quickly find people whose lives have had anything to do with university, or computer.
- Any issues?
  - What if the biography contains a lot of common words you don't really care about, like "he", "the", "a", "to", "for", "is" (stop words)?
  - What if the biography contains the word "University" and a user makes a query for "university"?
- Solution: field analysis

15 Field Analysis Process
- For each field, you can tell Solr:
  - how to break the text apart into words (tokenization), e.g. split at whitespace, commas, etc.
  - to remove stop words (filtering)
  - to normalize to lower case
  - to remove accent marks
- Read more here: Understanding Analyzers, Tokenizers, and Filters
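The four steps listed above can be sketched as a small analysis chain in plain Python. This is a minimal sketch, not Solr's analyzers: the splitting rule, the stop-word set (taken from the words mentioned on these slides) and the function names are assumptions for illustration.

```python
import re
import unicodedata

# Stop words mentioned on the slides; a real stop list is longer.
STOP_WORDS = {"he", "the", "a", "to", "for", "is", "of"}

def strip_accents(text):
    """Remove combining accent marks, e.g. 'café' becomes 'cafe'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def analyze(text):
    """Tokenize on whitespace and commas, lower-case, strip accents, drop stop words."""
    raw_tokens = [t for t in re.split(r"[\s,]+", text.strip()) if t]
    normalized = [strip_accents(t.lower()) for t in raw_tokens]
    return [t for t in normalized if t not in STOP_WORDS]

biography = ("He received his Ph.D. from Department of Computer Science "
             "of the University of Cyprus, in 2012")
print(analyze(biography))
```

Because the same chain is applied to the query text, a search for "University" and a search for "university" end up looking for the same token.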

16 Schema files and manipulation
- Solr stores details about the field types and fields it is expected to understand in a schema file:
  - managed-schema is the name of the schema file Solr uses by default. It supports schema changes at runtime via the Schema API (over HTTP) or Schemaless Mode, so hand editing of the managed schema file should be avoided.
  - schema.xml is the traditional name for a schema file that can be edited manually by users who use the ClassicIndexSchemaFactory.
- If you are using SolrCloud you may not be able to find any file by these names on the local filesystem. You will only be able to see the schema through the Schema API (if enabled) or through the Solr Admin UI's Cloud Screens.

17 Field Analysis
- The schema defines:
  - The kinds of fields available for indexing
  - The type of analysis to be applied when indexing or querying each field
  - The available field types, such as float, long, double, date, text
- Explore the schema using the Schema tab (see next slide)
- Example: choose the *_txt field to see how Solr treats field names ending in _txt

18 Field Analysis
- Indexed fields are fields that pass through the analysis phase and are added to the index so that queries can search and sort on them
- Stored fields are fields whose original text is kept in the index so that queries can retrieve it
- (Screenshot: Schema tab)

19 Field Analysis
- Go to the Analysis tab (see next slide) to see how a text value is broken down into words by index-time and query-time analysis
  - Field Value (Index): He received his Ph.D. from Department of Computer Science of the University of Cyprus, in 2012
  - Analyse Fieldname / FieldType: text_en

20 Field Analysis Insert text to Analyze Analysis tab

21 Field Analysis
- The word "of" has been removed as a stop word

22 Indexing XML Data
- Solr includes a simple command line tool for POSTing various types of content to a Solr server: bin/post in UNIX, with a different usage in Windows
- Let's first index two XML files
  - UNIX: stay in the solr directory
      bin/post -c gettingstarted example/exampledocs/solr.xml example/exampledocs/monitor.xml
  - Windows: go to the example/exampledocs directory
      java -Dc=gettingstarted -jar post.jar solr.xml monitor.xml
- You have now indexed two documents in Solr
- Browse the documents indexed at
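Files such as solr.xml wrap each document in an add/doc/field structure. As a sketch of what such a payload looks like, the snippet below builds one with Python's standard library; the document id and name are hypothetical, and nothing is sent to a server.

```python
import xml.etree.ElementTree as ET

def build_add_xml(doc):
    """Serialize one document as <add><doc><field name="...">value</field>...</doc></add>."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")

# Hypothetical document, used only to show the shape of the payload.
payload = build_add_xml({"id": "DEMO1", "name": "A demo document"})
print(payload)
```

Saving the output to a file would give something the post tool could be pointed at, in the same way as the example documents above.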

23 Collection browsing

24 Collection querying

25 Querying Data via Solr Admin UI
- Solr can be queried via REST clients, curl, wget, Chrome POSTMAN, etc., as well as via native clients available for many programming languages.
- The Solr Admin UI includes a query builder interface:
  - In the Admin interface choose the gettingstarted collection
  - In the "Query" tab click the button to display results (search for anything)
- RequestHandlers are specified in solrconfig.xml:

    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <int name="rows">10</int>
      </lst>
    </requestHandler>
    <initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse">
      <lst name="defaults">
        <str name="df">_text_</str>
      </lst>
    </initParams>

- Default search field: _text_
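One way to read the "defaults" sections above: a default applies only when the request does not supply that parameter itself. The sketch below models this with a plain dictionary merge. It is a simplified model under that assumption, and the function and variable names are made up for illustration.

```python
def effective_params(defaults, request_params):
    """Request parameters win; handler defaults fill in whatever the request leaves out."""
    merged = dict(defaults)
    merged.update(request_params)
    return merged

# Defaults taken from the configuration shown on this slide.
SELECT_DEFAULTS = {"echoParams": "explicit", "rows": 10, "df": "_text_"}

print(effective_params(SELECT_DEFAULTS, {"q": "solr"}))
print(effective_params(SELECT_DEFAULTS, {"q": "solr", "df": "name"}))
```

The second call corresponds to changing df in the query form: the default search field is replaced for that request only, and the configured default is left untouched.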

26 Querying Data via Solr Admin UI
- Enter "solr" in the "q" text box to search for "solr" in the index
- Why are no results returned?
  - The default field for searching the word solr is _text_, and no _text_ field includes solr
- Change df to name and press the button again
- Results can also be previewed in the browser by requesting the query URL: the URL ending in name returns the response in JSON format, and the one ending in name&wt=xml returns it in XML format

27 Querying Data via Solr Admin UI
- RESTful URL to query Solr; it can be used when querying Solr from custom apps.
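A custom app can assemble such a query URL itself. The sketch below only builds the URL string and does not contact a server. The base address is an assumption (the first node on port 8983 from the Getting Started slide, running locally), and select_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: the first node from the Getting Started slide, on the local machine.
BASE_URL = "http://localhost:8983/solr"

def select_url(collection, **params):
    """Build a select URL for a collection, URL-encoding every parameter."""
    return f"{BASE_URL}/{collection}/select?{urlencode(params)}"

print(select_url("gettingstarted", q="solr", df="name"))
print(select_url("gettingstarted", q="solr", df="name", wt="xml"))
print(select_url("gettingstarted", q="video", sort="price desc"))
```

Letting urlencode handle the escaping matters for queries that contain spaces, colons or brackets, which would otherwise produce a malformed URL.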

28 Querying Data
- Index all .xml documents in example/exampledocs
    UNIX: bin/post -c gettingstarted example/exampledocs/*.xml
    Windows: java -Dc=gettingstarted -jar post.jar *.xml
- ...and now you can search for all sorts of things using the default Solr Query Syntax (a superset of the Lucene query syntax):
    video
    name:*video*
    address_s:*ist*
    +video +price:[* TO 400]    (docs having video in searchable fields and price up to 400)
    -address_s:*                (docs that do not have an address_s field)
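To make the meaning of the last two queries concrete, the sketch below expresses them as plain Python predicates over a few made-up documents. It mirrors the semantics only (required term, inclusive upper bound on price, absence of a field); it does not parse Solr query syntax, and the documents are hypothetical.

```python
import re

def tokens(doc):
    """Lower-cased word tokens taken from every field value of a document."""
    text = " ".join(str(value) for value in doc.values())
    return set(re.findall(r"\w+", text.lower()))

def video_up_to_400(doc):
    """Mirrors the meaning of: +video +price:[* TO 400]"""
    return "video" in tokens(doc) and "price" in doc and doc["price"] <= 400

def has_no_address(doc):
    """Mirrors the meaning of: -address_s:*"""
    return "address_s" not in doc

# Hypothetical documents for illustration only.
docs = [
    {"id": "d1", "name": "video card", "price": 350.0},
    {"id": "d2", "name": "video card deluxe", "price": 480.0},
    {"id": "d3", "name": "hard drive", "price": 90.0, "address_s": "Nicosia"},
]
print([d["id"] for d in docs if video_up_to_400(d)])
print([d["id"] for d in docs if has_no_address(d)])
```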

29 Updating Data
- Although solr.xml has been POSTed to the server twice, the query q=solr still returns a single document:
    "numFound": 1, "start": 0, "docs": [ { "id": "SOLR1000", ...
- Why?
  - This is because the example schema.xml specifies a "uniqueKey" field called "id". Whenever you POST commands to Solr to add a document with the same value for the uniqueKey as an existing document, Solr automatically replaces the existing document for you.

30 Updating Data
- You can see that this has happened by looking at the values for numDocs and maxDoc in the "CORE"/searcher section of the statistics page... d/plugins?entry=searcher&type=core
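The replace-on-same-key behaviour, and why the two counters can differ, can be sketched with a toy index in which a re-added document supersedes the old copy instead of erasing it immediately. This is a simplified model with hypothetical names, not Solr's or Lucene's implementation; in a real index the superseded copies are eventually cleaned up.

```python
class TinyIndex:
    """Toy index: adding a document with an existing id supersedes the old copy."""

    def __init__(self):
        self.slots = []   # every document ever written, newest last
        self.live = {}    # unique key value -> position of the live document

    def add(self, doc):
        self.slots.append(doc)
        # Any earlier slot with the same id is no longer reachable through self.live.
        self.live[doc["id"]] = len(self.slots) - 1

    @property
    def num_docs(self):
        """Number of live (searchable) documents."""
        return len(self.live)

    @property
    def max_doc(self):
        """Number of written slots, including superseded copies."""
        return len(self.slots)

index = TinyIndex()
index.add({"id": "SOLR1000", "name": "first post"})
index.add({"id": "SOLR1000", "name": "second post"})
print(index.num_docs, index.max_doc)
```

In this model a gap between the two counters is the trace left by a replaced document.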

31 Deleting Data
- You can delete data by POSTing a delete command to the update URL and specifying the value of the document's unique key field, or a query that matches multiple documents
    java -Dc=gettingstarted -Ddata=args -Dcommit=false -jar post.jar "<delete><id>sp2514n</id></delete>"
- Delete documents that match a specific query
    java -Dc=gettingstarted -Dcommit=false -Ddata=args -jar post.jar "<delete><query>name:*ddr*</query></delete>"
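The two delete commands above differ only in the element they wrap. A small sketch that builds both payloads with the standard library, so that special characters in an id or query are escaped correctly; the helper names are hypothetical and nothing is sent to a server.

```python
import xml.etree.ElementTree as ET

def delete_by_id(doc_id):
    """Build <delete><id>...</id></delete> for one unique key value."""
    root = ET.Element("delete")
    ET.SubElement(root, "id").text = doc_id
    return ET.tostring(root, encoding="unicode")

def delete_by_query(query):
    """Build <delete><query>...</query></delete> for every document matching the query."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")

print(delete_by_id("sp2514n"))
print(delete_by_query("name:*ddr*"))
```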

32 Querying Data via REST API
- Searches are done via HTTP GET on the select URL, with the query string in the q parameter.
- You can pass a number of optional request parameters to the request handler to control what information is returned.
- Use the "fl" parameter to control which stored fields are returned, and whether the relevancy score is returned:
    q=video&fl=name,id                            (return only the name and id fields)
    q=video&fl=name,id,score                      (return the relevancy score as well)
    q=video&fl=*,score                            (return all stored fields, as well as the relevancy score)
    q=video&sort=address_s desc&fl=name,id,price  (add a sort specification: sort by address_s descending)
    q=video&wt=json                               (return the response in JSON format)

33 Sorting
- Solr provides a simple method to sort on one or more indexed fields.
- Use the "sort" parameter to specify "field direction" pairs, separated by commas if there is more than one sort field:
    q=video&sort=price desc
    q=video&sort=price asc
    q=video&sort=inStock asc, price desc
- "score" can also be used as a field name when specifying a sort:
    q=video&sort=score desc
    q=video&sort=inStock asc, score desc
- Complex functions may also be used to sort results:
    q=video&sort=div(popularity,add(price,1)) desc
- If no sort is specified, the default is score desc, which returns the matches with the highest relevancy first
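The effect of a multi-field sort specification such as "inStock asc, price desc" can be sketched on in-memory data: the first pair is the primary key and later pairs break ties. The parser below is a hypothetical helper that handles only plain "field direction" pairs, not function sorts such as div(...), and the documents are made up.

```python
def parse_sort(spec):
    """Turn 'inStock asc, price desc' into [('inStock', False), ('price', True)]."""
    clauses = []
    for clause in spec.split(","):
        field, direction = clause.split()
        clauses.append((field, direction.lower() == "desc"))
    return clauses

def sort_docs(docs, spec):
    """Sort by several keys; the first clause in the spec is the most significant."""
    result = list(docs)
    # Apply the least significant key first; Python's sort is stable.
    for field, descending in reversed(parse_sort(spec)):
        result.sort(key=lambda d: d[field], reverse=descending)
    return result

# Hypothetical documents for illustration only.
docs = [
    {"id": "a", "inStock": True, "price": 100},
    {"id": "b", "inStock": False, "price": 50},
    {"id": "c", "inStock": True, "price": 300},
    {"id": "d", "inStock": False, "price": 70},
]
print([d["id"] for d in sort_docs(docs, "inStock asc, price desc")])
```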

34 Indexing Rich Data
- Index local "rich" files, including HTML, PDF, Microsoft Office formats (such as MS Word), plain text and many other formats, found in /docs
    UNIX: bin/post -c gettingstarted docs/

35 Index Data
- There are many other ways to import your data into Solr. One can:
  - Import records from a database using the Data Import Handler (DIH); see tutorial here for MySQL or SQL Server database import
  - Load a CSV file (comma separated values), including those exported by Excel or MySQL
  - POST JSON documents
  - Index binary documents such as Word and PDF with Solr Cell (ExtractingRequestHandler)
  - Use SolrJ for Java or other Solr clients to programmatically create documents to send to Solr

36 Stopping SolrCloud
- Stop the SolrCloud nodes:
    bin/solr stop -all
- Delete the Solr home for the nodes (if needed):
    rm -rf example/cloud/node1
    rm -rf example/cloud/node2

37 Useful Links Next Week: ElasticSearch


C-JDBC Tutorial A quick start

C-JDBC Tutorial A quick start C-JDBC Tutorial A quick start Authors: Nicolas Modrzyk (Nicolas.Modrzyk@inrialpes.fr) Emmanuel Cecchet (Emmanuel.Cecchet@inrialpes.fr) Version Date 0.4 04/11/05 Table of Contents Introduction...3 Getting

More information

Web-based File Upload and Download System

Web-based File Upload and Download System COMP4905 Honor Project Web-based File Upload and Download System Author: Yongmei Liu Student number: 100292721 Supervisor: Dr. Tony White 1 Abstract This project gives solutions of how to upload documents

More information

Fusion Registry 9 SDMX Data and Metadata Management System

Fusion Registry 9 SDMX Data and Metadata Management System Registry 9 Data and Management System Registry 9 is a complete and fully integrated statistical data and metadata management system using. Whether you require a metadata repository supporting a highperformance

More information

MPLEMENTATION OF DIGITAL LIBRARY USING HDFS AND SOLR

MPLEMENTATION OF DIGITAL LIBRARY USING HDFS AND SOLR MPLEMENTATION OF DIGITAL LIBRARY USING HDFS AND SOLR H. K. Khanuja, Amruta Mujumdar, Manashree Waghmare, Mrudula Kulkarni, Mrunal Bajaj Department of Computer Engineering Marathwada Mitra Mandal's College

More information

Active Endpoints. ActiveVOS Platform Architecture Active Endpoints

Active Endpoints. ActiveVOS Platform Architecture Active Endpoints Active Endpoints ActiveVOS Platform Architecture ActiveVOS Unique process automation platforms to develop, integrate, and deploy business process applications quickly User Experience Easy to learn, use

More information

Developing Applications with Business Intelligence Beans and Oracle9i JDeveloper: Our Experience. IOUG 2003 Paper 406

Developing Applications with Business Intelligence Beans and Oracle9i JDeveloper: Our Experience. IOUG 2003 Paper 406 Developing Applications with Business Intelligence Beans and Oracle9i JDeveloper: Our Experience IOUG 2003 Paper 406 Chris Claterbos claterbos@vlamis.com Vlamis Software Solutions, Inc. (816) 781-2880

More information

Building your own BMC Remedy AR System v7 Applications. Maruthi Dogiparthi

Building your own BMC Remedy AR System v7 Applications. Maruthi Dogiparthi Building your own BMC Remedy AR System v7 Applications Maruthi Dogiparthi Agenda Introduction New Goodies Navigation, tree widgets Data Visualization Plug-in framework Development Guidelines Tools BMC

More information

Test On Line: reusing SAS code in WEB applications Author: Carlo Ramella TXT e-solutions

Test On Line: reusing SAS code in WEB applications Author: Carlo Ramella TXT e-solutions Test On Line: reusing SAS code in WEB applications Author: Carlo Ramella TXT e-solutions Chapter 1: Abstract The Proway System is a powerful complete system for Process and Testing Data Analysis in IC

More information

Globalbrain Administration Guide. Version 5.4

Globalbrain Administration Guide. Version 5.4 Globalbrain Administration Guide Version 5.4 Copyright 2012 by Brainware, Inc. All rights reserved. No part of this publication may be reproduced, transmitted, transcribed, stored in a retrieval system,

More information

Azure-persistence MARTIN MUDRA

Azure-persistence MARTIN MUDRA Azure-persistence MARTIN MUDRA Storage service access Blobs Queues Tables Storage service Horizontally scalable Zone Redundancy Accounts Based on Uri Pricing Calculator Azure table storage Storage Account

More information

Accelerate MySQL for Demanding OLAP and OLTP Use Cases with Apache Ignite. Peter Zaitsev, Denis Magda Santa Clara, California April 25th, 2017

Accelerate MySQL for Demanding OLAP and OLTP Use Cases with Apache Ignite. Peter Zaitsev, Denis Magda Santa Clara, California April 25th, 2017 Accelerate MySQL for Demanding OLAP and OLTP Use Cases with Apache Ignite Peter Zaitsev, Denis Magda Santa Clara, California April 25th, 2017 About the Presentation Problems Existing Solutions Denis Magda

More information

Comparing SQL and NOSQL databases

Comparing SQL and NOSQL databases COSC 6397 Big Data Analytics Data Formats (II) HBase Edgar Gabriel Spring 2014 Comparing SQL and NOSQL databases Types Development History Data Storage Model SQL One type (SQL database) with minor variations

More information

Application Architecture

Application Architecture Application Architecture Compatibility Flexibility Scalability Web Technologies Author: KM Newnham Edited by: SA Jost Last Update Date: 11/28/2016 Tel. 303.741.5711 Email. sales@adginc.net Web. www.adginc.net

More information

Intellicus Getting Started

Intellicus Getting Started Intellicus Getting Started Intellicus Web-based Reporting Suite Version 4.5 Enterprise Professional Smart Developer Smart Viewer Intellicus Technologies info@intellicus.com www.intellicus.com Copyright

More information

ADVANCED DATABASES CIS 6930 Dr. Markus Schneider. Group 5 Ajantha Ramineni, Sahil Tiwari, Rishabh Jain, Shivang Gupta

ADVANCED DATABASES CIS 6930 Dr. Markus Schneider. Group 5 Ajantha Ramineni, Sahil Tiwari, Rishabh Jain, Shivang Gupta ADVANCED DATABASES CIS 6930 Dr. Markus Schneider Group 5 Ajantha Ramineni, Sahil Tiwari, Rishabh Jain, Shivang Gupta WHAT IS ELASTIC SEARCH? Elastic Search Elasticsearch is a search engine based on Lucene.

More information

In this brief tutorial, we will be explaining the basics of Elasticsearch and its features.

In this brief tutorial, we will be explaining the basics of Elasticsearch and its features. About the Tutorial is a real-time distributed and open source full-text search and analytics engine. It is used in Single Page Application (SPA) projects. is open source developed in Java and used by many

More information

This lab will introduce you to MySQL. Begin by logging into the class web server via SSH Secure Shell Client

This lab will introduce you to MySQL. Begin by logging into the class web server via SSH Secure Shell Client Lab 2.0 - MySQL CISC3140, Fall 2011 DUE: Oct. 6th (Part 1 only) Part 1 1. Getting started This lab will introduce you to MySQL. Begin by logging into the class web server via SSH Secure Shell Client host

More information

Real Life Web Development. Joseph Paul Cohen

Real Life Web Development. Joseph Paul Cohen Real Life Web Development Joseph Paul Cohen joecohen@cs.umb.edu Index 201 - The code 404 - How to run it? 500 - Your code is broken? 200 - Someone broke into your server? 400 - How are people using your

More information

DatabaseRESTAPI

DatabaseRESTAPI ORDS DatabaseRESTAPI https://oracle.com/rest Jeff Smith Senior Principal Product Manager Jeff.d.smith@oracle.com @thatjeffsmith Database Tools, Oracle Corp Not just THAT SQLDev Guy I GET ORDS, too! Blogs

More information

Implementation Architecture

Implementation Architecture Implementation Architecture Software Architecture VO/KU (707023/707024) Roman Kern ISDS, TU Graz 2017-11-15 Roman Kern (ISDS, TU Graz) Implementation Architecture 2017-11-15 1 / 54 Outline 1 Definition

More information

Design Patterns for Large- Scale Data Management. Robert Hodges OSCON 2013

Design Patterns for Large- Scale Data Management. Robert Hodges OSCON 2013 Design Patterns for Large- Scale Data Management Robert Hodges OSCON 2013 The Start-Up Dilemma 1. You are releasing Online Storefront V 1.0 2. It could be a complete bust 3. But it could be *really* big

More information

Important Notice Cloudera, Inc. All rights reserved.

Important Notice Cloudera, Inc. All rights reserved. Cloudera Search Important Notice 2010-2018 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, and any other product or service names or slogans contained in this document are trademarks of

More information

SEARCHING BILLIONS OF PRODUCT LOGS IN REAL TIME. Ryan Tabora - Think Big Analytics NoSQL Search Roadshow - June 6, 2013

SEARCHING BILLIONS OF PRODUCT LOGS IN REAL TIME. Ryan Tabora - Think Big Analytics NoSQL Search Roadshow - June 6, 2013 SEARCHING BILLIONS OF PRODUCT LOGS IN REAL TIME Ryan Tabora - Think Big Analytics NoSQL Search Roadshow - June 6, 2013 1 WHO AM I? Ryan Tabora Think Big Analytics - Senior Data Engineer Lover of dachshunds,

More information

LucidWorks: Searching with curl October 1, 2012

LucidWorks: Searching with curl October 1, 2012 LucidWorks: Searching with curl October 1, 2012 1. Module name: LucidWorks: Searching with curl 2. Scope: Utilizing curl and the Query admin to search documents 3. Learning objectives Students will be

More information

Introduction to Web Application Development Using JEE, Frameworks, Web Services and AJAX

Introduction to Web Application Development Using JEE, Frameworks, Web Services and AJAX Introduction to Web Application Development Using JEE, Frameworks, Web Services and AJAX Duration: 5 Days US Price: $2795 UK Price: 1,995 *Prices are subject to VAT CA Price: CDN$3,275 *Prices are subject

More information

REAL TIME BOM EXPLOSIONS WITH APACHE SOLR AND SPARK. Andreas Zitzelsberger

REAL TIME BOM EXPLOSIONS WITH APACHE SOLR AND SPARK. Andreas Zitzelsberger REAL TIME BOM EXPLOSIONS WITH APACHE SOLR AND SPARK Andreas Zitzelsberger BILLS OF MATERIAL (BOMS) EXPLAINED BOMS ARE NEEDED FOR Production Planning Forecasting Demand Scenario-Based Planning Running Simulations

More information

Oracle NoSQL Database 3.0

Oracle NoSQL Database 3.0 Oracle NoSQL Database 3.0 Installation, Cluster Topology Deployment, HA and more Seth Miller, Oracle ACE Robert Greene, Product Management / Strategy Oracle Server Technologies July 09, 2014 Safe Harbor

More information

DATABASE SYSTEMS. Database programming in a web environment. Database System Course, 2016

DATABASE SYSTEMS. Database programming in a web environment. Database System Course, 2016 DATABASE SYSTEMS Database programming in a web environment Database System Course, 2016 AGENDA FOR TODAY Advanced Mysql More than just SELECT Creating tables MySQL optimizations: Storage engines, indexing.

More information

Developing Web Sites with Free Software

Developing Web Sites with Free Software Developing Web Sites with Free Software Tom Wheeler Software Engineer, Object Computing Inc. (OCI) About This Presentation What this presentation is: An explanation of free software, aimed at people who

More information

Excel4apps Wands 5 Architecture Excel4apps Inc.

Excel4apps Wands 5 Architecture Excel4apps Inc. Excel4apps Wands 5 Architecture 2014 Excel4apps Inc. Table of Contents 1 Introduction... 3 2 Overview... 3 3 Client... 3 4 Server... 3 4.1 Java Servlet... 4 4.2 OAF Page... 4 4.3 Menu and Function... 4

More information

Documenting APIs with Swagger. TC Camp. Peter Gruenbaum

Documenting APIs with Swagger. TC Camp. Peter Gruenbaum Documenting APIs with Swagger TC Camp Peter Gruenbaum Introduction } Covers } What is an API Definition? } YAML } Open API Specification } Writing Documentation } Generating Documentation } Alternatives

More information

TipsandTricks. Jeff Smith Senior Principal Product Database Tools, Oracle Corp

TipsandTricks. Jeff Smith Senior Principal Product Database Tools, Oracle Corp SQLDev TipsandTricks Jeff Smith Senior Principal Product Manager Jeff.d.smith@oracle.com @thatjeffsmith Database Tools, Oracle Corp Safe Harbor Statement The preceding is intended to outline our general

More information

IBM Maximo Anywhere Version 7 Release 6. Planning, installation, and deployment IBM

IBM Maximo Anywhere Version 7 Release 6. Planning, installation, and deployment IBM IBM Maximo Anywhere Version 7 Release 6 Planning, installation, and deployment IBM Note Before using this information and the product it supports, read the information in Notices on page 65. This edition

More information

How to install and configure Solr v4.3.1 on IBM WebSphere Application Server v8.0

How to install and configure Solr v4.3.1 on IBM WebSphere Application Server v8.0 How to install and configure Solr v4.3.1 on IBM WebSphere Application Server v8.0 About This post describe how to install and configure Apache Solr 4 under IBM WebSphere Application Server v8. Resume about

More information

Configuring Artifactory

Configuring Artifactory Configuring Artifactory 1 Configuration Files 2 Understanding Repositories 2.1 Local Repositories 2.2 Remote Repositories 2.3 Virtual Repositories 3 Common Repositories Configuration 3.1 Snapshots and

More information

APIs - what are they, really? Web API, Programming libraries, third party APIs etc

APIs - what are they, really? Web API, Programming libraries, third party APIs etc APIs - what are they, really? Web API, Programming libraries, third party APIs etc Different kinds of APIs Let s consider a Java application. It uses Java interfaces and classes. Classes and interfaces

More information

Developing with Google App Engine

Developing with Google App Engine Developing with Google App Engine Dan Morrill, Developer Advocate Dan Morrill Google App Engine Slide 1 Developing with Google App Engine Introduction Dan Morrill Google App Engine Slide 2 Google App Engine

More information

Session V-STON Stonefield Query: The Next Generation of Reporting

Session V-STON Stonefield Query: The Next Generation of Reporting Session V-STON Stonefield Query: The Next Generation of Reporting Doug Hennig Overview Are you being inundated with requests from the users of your applications to create new reports or tweak existing

More information

<Insert Picture Here> MySQL Cluster What are we working on

<Insert Picture Here> MySQL Cluster What are we working on MySQL Cluster What are we working on Mario Beck Principal Consultant The following is intended to outline our general product direction. It is intended for information purposes only,

More information