Language Support, Linguistics, and Text Analytics in Solr

1 Boston Apache Lucene and Solr Meetup
Language Support, Linguistics, and Text Analytics in Solr
Steve Kearns, Product Manager, Basis Technology
Carl W. Hoffman, Founder & CEO, Basis Technology

2 Agenda
About Basis Technology
Language Identification
Linguistics for Search
Entity Extraction in Solr
Demonstration Application

3 About Basis Technology
This is Headquarters; offices in: Tokyo, San Francisco, Washington DC
Specialists in natural language processing for:
  Web/enterprise search
  Document/OSINT/media exploitation
  E-discovery
  Digital forensics
Developer of a mature and widely used platform for multilingual text analytics and information retrieval
Solutions for commercial enterprises and government agencies

4 How well will it work for me?
Define! The definition of success varies widely:
  Log file search: return only things that exactly match "exception"
  Product search: return similar results organized by category
Measure! Create examples, track performance.

5 Language Identification
Detect dominant language
Find language regions

6 Language Identification
Why?
  Faceting
  Language-specific indexing
  Entity extraction

7 Language Identification in Solr
Preprocessor to Solr
  Custom OpenPipeline
Solr UpdateRequestProcessor (URP)
  A chain of URPs may be defined in solrconfig.xml
  Has full access to the document: add, edit, remove fields
[Diagram: a Solr document passes through the UpdateRequestProcessor chain (solrconfig.xml): the Language Identification URP adds a language field and renames text to text_<lang>, followed by any other custom URP; field analysis (schema.xml) then applies to Solr fields such as language, text_swedish, and text_english]
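
The document-rewriting step described on this slide can be sketched outside Solr. The Python below is only an illustration under stated assumptions: the stopword-counting detector is a toy stand-in for a real language identifier, and the stopword lists, the `language` field name, and the function names are hypothetical rather than taken from the talk. A real UpdateRequestProcessor would be written in Java against Solr's API.

```python
# Toy stand-in for a real language identifier: counts stopword hits per language.
# The stopword lists and field names are illustrative assumptions.
STOPWORDS = {
    "english": {"the", "and", "is", "of", "to", "in"},
    "swedish": {"och", "det", "att", "en", "är", "i"},
}


def detect_language(text, default="unknown"):
    """Return the language whose stopwords occur most often in text."""
    tokens = text.lower().split()
    scores = {lang: sum(tok in words for tok in tokens)
              for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


def language_id_processor(doc, source_field="text"):
    """Mimic the URP step: add a language field, rename text to text_<lang>."""
    out = dict(doc)  # leave the caller's document untouched
    lang = detect_language(out.get(source_field, ""))
    out["language"] = lang
    if lang != "unknown" and source_field in out:
        out["text_" + lang] = out.pop(source_field)
    return out


doc = {"id": "1", "text": "En person skadades lindrigt i en trafikolycka i Pernå"}
processed = language_id_processor(doc)
print(processed)
```

Documents whose language cannot be determined keep their original `text` field in this sketch, so they can still be indexed with a generic analyzer.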

8 Language Identification Challenges
Identifying query language is hard
How do you query multiple fields at the same time?
Use the DisMax parser:
/solr/select?qt=dismax&qf=text_english%20text_swedish%20detext&q=hello%20world
  qt selects the request handler named dismax (this assumes such a handler is configured in solrconfig.xml; defType=dismax selects the parser directly)
  qf specifies the fields to search
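
Building such a request programmatically avoids hand-encoding the spaces in `qf` and `q`. The following is a minimal sketch using only the Python standard library; the helper name `dismax_url` is hypothetical, and the base path and field names are copied from the slide.

```python
from urllib.parse import quote, urlencode


def dismax_url(query, fields, base="/solr/select"):
    """Build a select URL that searches several fields through dismax."""
    params = [
        ("qt", "dismax"),          # request handler, as on the slide
        ("qf", " ".join(fields)),  # space-separated list of fields to search
        ("q", query),
    ]
    # quote_via=quote encodes spaces as %20 instead of '+'
    return base + "?" + urlencode(params, quote_via=quote)


url = dismax_url("hello world", ["text_english", "text_swedish"])
print(url)
```

Adding another language then only means appending its field name to the list passed as `fields`.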

9 Linguistics for Search
Why? Improve recall!
Every language has a unique set of challenges:
  Tokenization: Chinese, Japanese, Korean, Thai
    Morphological analysis vs. n-gram
  Stemming vs. lemmatization: all European and Middle Eastern languages
  Compound words: Swedish, Danish, Norwegian, Dutch, German, Korean, Japanese

10 Morphological Analysis vs. N-Gram
Search term: 東京
N-gram: 東京ルパン上映時間
Morphological:
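
To make the n-gram side of this comparison concrete, the sketch below splits the slide's example string into overlapping character bigrams. It is an illustration only: the function name is hypothetical, and it does not reproduce the output of any particular Solr tokenizer.

```python
def char_ngrams(text, n=2):
    """Return all overlapping character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


indexed = "東京ルパン上映時間"
bigrams = char_ngrams(indexed)
print(bigrams)
print("東京" in bigrams)  # the search term from the slide is found
print("パン" in bigrams)  # so is this fragment of ルパン
```

The last line hints at the precision cost of n-grams: パン ("bread") is indexed as a term even though it appears here only as part of ルパン ("Lupin"). A morphological analyzer that segments the text into words would not be expected to emit that term.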

11 Stemming vs. Lemmatization
Stemming:
  Set of language-specific rules for removing leading and trailing characters from words
  Intended to increase recall at the expense of precision
  Example EN rule: remove trailing "ing"
Lemmatization:
  Complex set of language-specific approaches for producing the dictionary form of a given word
  Intended to increase recall without hurting precision
  Uses context to disambiguate when multiple dictionary forms exist

12 Stemming vs. Lemmatization
English: I have spoken at several conferences
Stemming:
Lemmatization:
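
The difference can be illustrated on the English sentence from this slide. This is a toy sketch, not the output of any real stemmer or lemmatizer: `naive_stem` applies only the single "remove trailing ing" rule mentioned earlier, and `LEMMAS` is a hypothetical three-entry dictionary. A real lemmatizer would also use context, which this lookup does not.

```python
def naive_stem(word):
    """Apply the single example rule: remove a trailing 'ing'."""
    if word.endswith("ing") and len(word) > 3:
        return word[:-3]
    return word


# Hypothetical mini-dictionary mapping inflected forms to dictionary forms.
LEMMAS = {"have": "have", "spoken": "speak", "conferences": "conference"}


def lemmatize(word):
    """Look up the dictionary form; fall back to the lowercased word."""
    return LEMMAS.get(word.lower(), word.lower())


sentence = "I have spoken at several conferences"
stems = [naive_stem(w.lower()) for w in sentence.split()]
lemmas = [lemmatize(w) for w in sentence.split()]
print(stems)
print(lemmas)
print(naive_stem("speaking"), naive_stem("thing"))
```

The suffix rule leaves "spoken" untouched, so it never meets "speak", while it truncates "thing" to "th". The dictionary lookup maps "spoken" to "speak" without touching unrelated words, which is the recall-without-precision-loss trade-off the slides describe.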

13 Stemming vs. Lemmatization
French: Je n'étais pas là
Stemming:
Lemmatization:

14 Stemming vs. Lemmatization + Decompounding
German: Am Samstagmorgen fliege ich zurueck nach Boston.
Stemming:
Lemmatization (and decompounding!):
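
Decompounding can be sketched as a dictionary lookup over possible split points. The code below is a deliberately simple illustration: the lexicon is a hypothetical four-word set, only two-part compounds are handled, and linking elements between compound parts (common in German) are ignored. Production decompounders are considerably more involved.

```python
# Hypothetical lexicon; a real system would use a full dictionary.
LEXICON = {"samstag", "morgen", "fliege", "boston"}


def decompound(word, lexicon=LEXICON, min_len=3):
    """Split word into two lexicon entries if possible, else return [word]."""
    lower = word.lower()
    for i in range(min_len, len(lower) - min_len + 1):
        head, tail = lower[:i], lower[i:]
        if head in lexicon and tail in lexicon:
            return [head, tail]
    return [lower]


print(decompound("Samstagmorgen"))
print(decompound("Boston"))
```

Indexing the parts alongside the compound would let a query for "Samstag" or "Morgen" match the sentence on this slide.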

15 Stemming vs. Lemmatization
Swedish: En person skadades lindrigt i en trafikolycka i Pernå
Stemming:
Lemmatization:

16 Linguistics in Solr
Easy to customize as Analyzer/Tokenizer/TokenFilter
[Diagram: the UpdateRequestProcessor chain from slide 7, feeding field analysis (schema.xml) on fields such as language, text_swedish, and text_english]

17 Related Challenges
Can I index text from many languages into the same field?
  Yes, but it's not always a good idea, because query language ID is not accurate.
  You need a custom query analyzer that does stemming/lemmatization in many languages for the same query.
How do I query text in multiple fields?
  DisMax parser!

18 Text Analytics in Solr: Entity Extraction
Process of identifying people, places, organizations, dates, times, etc. in unstructured text.
Methods:
  Lists
  Rules
  Statistical
Define your goals upfront! Some extraction methods work better for certain entity types:
  Rules work well for dates, addresses, and URLs, but not people
  Lists work well for titles, but not locations
  Statistical extractors work well for ambiguous entities: people, locations, organizations
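
The rule-based and list-based methods are easy to sketch. The patterns and the title list below are illustrative assumptions (ISO-style dates only, a simple URL pattern, a two-entry gazetteer), and the sample sentence is invented for the example. Statistical extraction needs a trained model and is not shown.

```python
import re

URL_RE = re.compile(r"https?://[^\s,]+")        # rule: simple URL pattern
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")  # rule: ISO dates only
TITLES = ["Product Manager", "Founder & CEO"]   # list: hypothetical gazetteer


def extract_entities(text):
    """Return (type, value) pairs found by the rules and the list."""
    entities = [("url", m.group()) for m in URL_RE.finditer(text)]
    entities += [("date", m.group()) for m in DATE_RE.finditer(text)]
    entities += [("title", t) for t in TITLES if t in text]
    return entities


# Invented sample text, used only to exercise the extractors.
sample = ("Jane Doe, Product Manager, spoke on 2024-01-15; "
          "slides at https://example.com/slides are online.")
found = extract_entities(sample)
print(found)
```

Neither method finds the person name in the sample, which matches the slide's point that rules and lists are a poor fit for people.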

19 Entity Extraction

20 Entity Extraction in Solr
Pre-processor to Solr
  Custom OpenPipeline
UpdateRequestProcessor
Store entities in new fields per entity type:
<field name="person" type="string" indexed="true" multiValued="true" stored="false" />
[Diagram: the UpdateRequestProcessor chain from slide 7, with an Entity Extraction URP after the Language Identification URP]
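
Storing entities in one multi-valued field per type amounts to grouping extractor output by type. The sketch below shows that grouping on a plain dictionary. It assumes the extractor yields (type, value) pairs and that the field name equals the entity type; both are assumptions for illustration, and the function name is hypothetical.

```python
from collections import defaultdict


def add_entity_fields(doc, entities):
    """Copy doc and add one multi-valued field per entity type."""
    fields = defaultdict(list)
    for entity_type, value in entities:
        if value not in fields[entity_type]:  # skip repeated mentions
            fields[entity_type].append(value)
    out = dict(doc)
    out.update(fields)
    return out


entities = [
    ("person", "Steve Kearns"),
    ("location", "Stockholm"),
    ("location", "Boston"),
    ("location", "Stockholm"),
]
enriched = add_entity_fields({"id": "42"}, entities)
print(enriched)
```

Each resulting list corresponds to a field declared with multiValued="true", such as the person field shown on this slide.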

21 Entity Extraction Challenges
How do you use extracted entities as facets?
  For retrieving counts: &facet=true&facet.field=person&facet.field=location
  For filtering results: &fq=person:"Steve Kearns"&fq=location:Stockholm (facet.query only returns a count for its query; fq restricts the result set)
How else can entities be used?
  Improve relevance by searching the entity fields with a boost
  Entity-specific search: phonetic matching and other name-specific search approaches
Measure accuracy!
  F-score is a measurement that combines precision and recall
  Vendors should provide this, but evaluate on your own data!
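
Evaluating on your own data can start with a small hand-labelled sample. The sketch below computes precision, recall, and the balanced F-score (F1, the harmonic mean of the two) by exact match between labelled and extracted entities. The gold and predicted sets are invented for the example, and exact matching is a simplifying assumption; evaluations often also credit partial matches.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall, and F1 over exact-match entity sets."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented labels, used only to exercise the function.
gold = {
    ("person", "Steve Kearns"),
    ("location", "Stockholm"),
    ("location", "Boston"),
    ("organization", "Basis Technology"),
}
predicted = {
    ("person", "Steve Kearns"),
    ("location", "Stockholm"),
    ("location", "Boston"),
    ("person", "Boston"),
    ("organization", "Solr"),
}
p, r, f1 = precision_recall_f1(gold, predicted)
print(round(p, 3), round(r, 3), round(f1, 3))
```

Three of the five predictions are correct and three of the four gold entities are found, so precision is 0.6, recall is 0.75, and F1 is about 0.667.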

22 Demo: Odyssey Information Navigator
Example search application built on Solr
I personally built this in < 2 months using Solr and products from Basis Technology
I spent more time on the UI than on integration of text analytics components
I would be happy to show you the Solr config and let you try it out


More information

introduction to using Watson Services with Java on Bluemix

introduction to using Watson Services with Java on Bluemix introduction to using Watson Services with Java on Bluemix Patrick Mueller @pmuellr, muellerware.org developer advocate for IBM's Bluemix PaaS http://pmuellr.github.io/slides/2015/02-java-intro-with-watson

More information

Technical Information

Technical Information Building Technologies Division Security Products Technical Information SPC Series SPC Support CD Release Note CD V3.6.6 04/08/2015 Updates since: CD V3.4.5 Release V3.6.6 for SPC versions SPC42xx/43xx/52xx/53xx/63xx.

More information

HP Insight Remote Support Advanced HP StorageWorks P4000 Storage System

HP Insight Remote Support Advanced HP StorageWorks P4000 Storage System HP Insight Remote Support Advanced HP StorageWorks P4000 Storage System Migration Guide HP Part Number: 5900-1089 Published: August 2010, Edition 1 Copyright 2010 Hewlett-Packard Development Company, L.P.

More information

USB Link Adapter. User s Manual

USB Link Adapter. User s Manual USB Link Adapter User s Manual Safety Instructions Always read the safety instructions carefully Keep this User s Manual for future reference Keep this equipment away from humidity If any of the following

More information

Transfer Manual Norman Endpoint Protection Transfer to Avast Business Antivirus Pro Plus

Transfer Manual Norman Endpoint Protection Transfer to Avast Business Antivirus Pro Plus Transfer Manual Norman Endpoint Protection Transfer to Avast Business Antivirus Pro Plus Summary This document outlines the necessary steps for transferring your Norman Endpoint Protection product to Avast

More information

Configuring an SAP Business Warehouse Resouce in Metadata Manager 9.5.0

Configuring an SAP Business Warehouse Resouce in Metadata Manager 9.5.0 Configuring an SAP Business Warehouse Resouce in Metadata Manager 9.5.0 2012 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying,

More information

checkterm User Manual Document version 1.0

checkterm User Manual Document version 1.0 checkterm 6.0.1 User Manual Table of contents Table of contents 1 Introduction... 4 1.1 General... 4 1.2 Stemming... 4 1.3 Languages without spaces... 5 2 Step-by-step guide... 6 2.1 New installation...

More information

Key topics when. Migratng from FAST to Solr. By Jan Høydahl. cominvent as. Apache Lucene EuroCon 05/21/10

Key topics when. Migratng from FAST to Solr. By Jan Høydahl. cominvent as. Apache Lucene EuroCon 05/21/10 Key topics when Migratng from FAST to Solr By Jan Høydahl cominvent as Agenda About Cominvent & Jan Høydahl Quick overview of FAST ESP The migraton step by step Pain points Q&A Jan Høydahl: BIO Enterprise

More information

LAB 3: Text processing + Apache OpenNLP

LAB 3: Text processing + Apache OpenNLP LAB 3: Text processing + Apache OpenNLP 1. Motivation: The text that was derived (e.g., crawling + using Apache Tika) must be processed before being used in an information retrieval system. Text processing

More information

Enterprise Search with ColdFusion Solr. Dan Sirucek cf.objective 2012 May 2012

Enterprise Search with ColdFusion Solr. Dan Sirucek cf.objective 2012 May 2012 Enterprise Search with ColdFusion Solr Dan Sirucek cf.objective 2012 May 2012 About Me Senior Learning Technologist at WellPoint, Inc Developer for 14 years Developing in ColdFusion for 8 years Started

More information

Xapity Current Activity Administration Guide.

Xapity Current Activity Administration Guide. Xapity Current Activity Administration Guide www.xapity.com Document Version 1.0 October 2016 This document contains information that may change without notice. While every effort has been made to ensure

More information

ATI Radeon HD 2400XT (256MB DH) PCIe Graphics Card Overview

ATI Radeon HD 2400XT (256MB DH) PCIe Graphics Card Overview Overview Models KD060AA Introduction The provides a Low Profile, PCI Express x16 graphics add-in card based on the ATI RV610 Graphics Processor. It supports Dual Display video output through its DMS-59

More information

TEXT CHAPTER 5. W. Bruce Croft BACKGROUND

TEXT CHAPTER 5. W. Bruce Croft BACKGROUND 41 CHAPTER 5 TEXT W. Bruce Croft BACKGROUND Much of the information in digital library or digital information organization applications is in the form of text. Even when the application focuses on multimedia

More information

iknow Use Cases Michael Br ands Senior Product Manager

iknow Use Cases Michael Br ands Senior Product Manager iknow Use Cases Michael Br ands Senior Product Manager Agenda iknow in the InterSystems offering Breakthrough Characteristics Steps in Deployment New Features Use Cases The InterSystems Technology Unlock

More information

HP Enterprise Collaboration

HP Enterprise Collaboration HP Enterprise Collaboration For the Windows operating system Software Version: 1.1 Support Matrix Document Release Date: August 2012 Software Release Date: August 2012 Support Matrix Legal Notices Warranty

More information

UAIC: Participation in task

UAIC: Participation in task UAIC: Participation in TEL@CLEF task Adrian Iftene, Alina-Elena Mihăilă, Ingride-Paula Epure UAIC: Faculty of Computer Science, Alexandru Ioan Cuza University, Romania {adiftene, elena.mihaila, paula.epure}@info.uaic.ro

More information

Harnessing Publicly Available Factual Data in the Analytical Process

Harnessing Publicly Available Factual Data in the Analytical Process June 14, 2012 Harnessing Publicly Available Factual Data in the Analytical Process by Benson Margulies, CTO We put the World in the World Wide Web ABOUT BASIS TECHNOLOGY Basis Technology provides so ware

More information

Annotating Spatio-Temporal Information in Documents

Annotating Spatio-Temporal Information in Documents Annotating Spatio-Temporal Information in Documents Jannik Strötgen University of Heidelberg Institute of Computer Science Database Systems Research Group http://dbs.ifi.uni-heidelberg.de stroetgen@uni-hd.de

More information

Entity Extraction Enables Discovery

Entity Extraction Enables Discovery www.basistech.com info@basistech.com 617-386-2090 Entity Extraction Enables Discovery A discovery search is one in which you don t know or can t know all relevant search terms. Automated entity extraction

More information

Working with Image Files (IMG and ISO) Version 3.2

Working with Image Files (IMG and ISO) Version 3.2 Working with Image Files (IMG and ISO) 2015-03-03 Version 3.2 TABLE OF CONTENTS ISO and IMG Files... 2 What is an IMG or ISO File?... 2 32-Bit vs 64-Bit... 2 Bootable Installation Packages... 3 Package?

More information