Language Support, Linguistics, and Text Analytics in Solr
1 Boston Apache Lucene and Solr Meetup
Language Support, Linguistics, and Text Analytics in Solr
Carl W. Hoffman, Founder & CEO, and Steve Kearns, Product Manager, Basis Technology
2 Agenda
- About Basis Technology
- Language Identification
- Linguistics for Search
- Entity Extraction in Solr
- Demonstration Application
3 About Basis Technology
- Headquarters plus offices in Tokyo, San Francisco, and Washington DC
- Specialists in natural language processing for:
  - Web/enterprise search
  - Document/OSINT/media exploitation
  - E-discovery
  - Digital forensics
- Developer of a mature and widely used platform for multilingual text analytics and information retrieval
- Solutions for commercial enterprises and government agencies
4 How well will it work for me?
- Define! The definition of success varies widely:
  - Log file search: return only things that match exactly (e.g., "exception")
  - Product search: return similar results organized by category
- Measure! Create examples and track performance.
5 Language Identification
- Detect the dominant language
- Find language regions
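A minimal sketch of both tasks, assuming a toy detector that counts hits against small, hand-picked stopword lists. The `STOPWORDS` table and the function names are hypothetical, and production language identifiers typically rely on character n-gram statistics rather than stopwords. The dominant language is the best-scoring language for the whole text; regions come from classifying each sentence and merging neighbors that agree.

```python
import re

# Toy stopword lists (assumption: illustrative only, not a real language model).
STOPWORDS = {
    "english": {"the", "and", "is", "of", "to", "in", "it", "that", "with", "was"},
    "swedish": {"och", "att", "det", "som", "en", "på", "är", "av", "för", "med"},
    "german": {"der", "die", "und", "das", "ist", "nicht", "ich", "zu", "mit", "nach"},
}


def tokenize(text):
    """Lowercased word tokens; \\w matches Unicode letters such as å and ä."""
    return re.findall(r"\w+", text.lower())


def detect_language(text):
    """Return the language whose stopwords occur most often, or 'unknown'."""
    tokens = tokenize(text)
    scores = {lang: sum(tok in words for tok in tokens)
              for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def language_regions(text):
    """Classify each sentence, then merge runs of same-language sentences."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    regions = []
    for sentence in sentences:
        lang = detect_language(sentence)
        if regions and regions[-1][0] == lang:
            regions[-1] = (lang, regions[-1][1] + " " + sentence)
        else:
            regions.append((lang, sentence))
    return regions
```

For a mixed text such as "The cat is on the mat. Det är en bil och den är röd. Det är bra.", `language_regions` yields one English region followed by one Swedish region, while `detect_language` on the whole text reports Swedish because it has more stopword hits overall.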
6 Language Identification: Why?
- Faceting
- Language-specific indexing
- Entity extraction
7 Language Identification in Solr
- Options: a preprocessor in front of Solr (custom code or OpenPipeline), or a Solr UpdateRequestProcessor (URP)
- A chain of URPs can be defined in solrconfig.xml
- A URP has full access to the document and can add, edit, and remove fields
[Diagram: a Solr document passes through the UpdateRequestProcessor chain (solrconfig.xml). A Language Identification URP adds a language field and renames the text field to text_<lang> (e.g., text_swedish, text_english), alongside other custom URPs; the resulting fields then go through field analysis defined in schema.xml.]
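In Solr itself a URP is a Java class registered in solrconfig.xml; the sketch below only mirrors the data flow in Python, treating a document as a dict. `toy_detect`, `language_id_processor`, and `run_chain` are hypothetical names, and the detector is a placeholder heuristic standing in for a real language identifier.

```python
from typing import Callable, Dict, List

Doc = Dict[str, object]
Processor = Callable[[Doc], Doc]

SWEDISH_HINTS = {"och", "att", "det", "är", "en", "på"}  # toy heuristic only


def toy_detect(text: str) -> str:
    """Hypothetical stand-in for a real language identifier."""
    words = set(text.lower().split())
    return "swedish" if words & SWEDISH_HINTS else "english"


def language_id_processor(doc: Doc,
                          detect: Callable[[str], str] = toy_detect,
                          source_field: str = "text",
                          lang_field: str = "language") -> Doc:
    """Add a language field and move the text into text_<lang>."""
    if source_field not in doc:
        return doc
    out = dict(doc)  # leave the caller's document untouched
    text = str(out.pop(source_field))
    lang = detect(text)
    out[lang_field] = lang
    out[f"{source_field}_{lang}"] = text
    return out


def run_chain(doc: Doc, chain: List[Processor]) -> Doc:
    """Apply processors in order, like a URP chain in solrconfig.xml."""
    for processor in chain:
        doc = processor(doc)
    return doc
```

Because each processor takes and returns a document, further steps (for example an entity extraction step) can be appended to the chain list without changing the language step.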
8 Language Identification Challenges
- Identifying the language of a query is hard, because queries are short
- How do you query multiple fields at the same time? Use the dismax parser:
  /solr/select?qt=dismax&qf=text_english%20text_swedish%20detext&q=hello%20world
- qt (query type) selects the dismax request handler
- qf specifies the fields to search
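The URL on the slide uses `qt=dismax`, which selects a request handler named `dismax` configured in solrconfig.xml; in later Solr versions the parser is usually chosen directly with `defType=dismax` (or `edismax`). The sketch below assumes that style and builds the query string with the standard library. The function name is hypothetical, and the `boosts` argument is an illustrative convenience for the `field^boost` syntax that `qf` accepts.

```python
from urllib.parse import urlencode


def dismax_query_string(query, fields, boosts=None):
    """Build request parameters that search several language-specific
    fields at once with the dismax parser."""
    boosts = boosts or {}
    qf = " ".join(f"{f}^{boosts[f]}" if f in boosts else f for f in fields)
    params = [
        ("defType", "dismax"),  # select the dismax query parser
        ("qf", qf),             # space-separated fields, optional ^boost
        ("q", query),           # the user's query, analyzed per field
    ]
    return urlencode(params)   # spaces become '+', '^' becomes %5E
```

Appending the result to `/solr/select?` gives a request in which each field in `qf` analyzes the query with its own language-specific analyzer, so one query can match English and Swedish stems at once.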
9 Linguistics for Search
Why? Improve recall! Every language has a unique set of challenges:
- Tokenization (Chinese, Japanese, Korean, Thai): morphological analysis vs. n-grams
- Stemming vs. lemmatization: all European and Middle Eastern languages
- Compound words: Swedish, Danish, Norwegian, Dutch, German, Korean, Japanese
10 Morphological Analysis vs. N-Gram
- Search term: 東京 (Tokyo)
- N-gram: 東京ルパン上映時間
- Morphological: [segmentation shown as an image; not preserved]
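A minimal sketch contrasting the two approaches on the slide's text. N-gram indexing emits every overlapping character pair, including pairs such as 京ル that straddle a word boundary; a morphological approach uses a dictionary to find word boundaries. The greedy longest-match segmenter and its four-word dictionary are assumptions for illustration; real morphological analyzers use much larger dictionaries and statistical models.

```python
def char_bigrams(text):
    """Overlapping character bigrams, as an n-gram indexer would produce."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def max_match(text, dictionary):
    """Greedy longest-match segmentation against a word list.
    Characters not covered by any dictionary word become single-char tokens."""
    longest = max(map(len, dictionary))
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


TOY_DICTIONARY = {"東京", "ルパン", "上映", "時間"}  # hypothetical word list
```

Both token streams contain 東京, so the query matches either way; the difference is that the bigram index also stores cross-boundary tokens like 京ル and ン上, which can produce false matches for other queries.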
11 Stemming vs. Lemmatization
- Stemming: a set of language-specific rules for removing leading and trailing characters from words
  - Intended to increase recall at the expense of precision
  - Example English rule: remove trailing "-ing"
- Lemmatization: a complex set of language-specific approaches for producing the dictionary form of a given word
  - Intended to increase recall without hurting precision
  - Uses context to disambiguate when multiple dictionary forms exist
12 Stemming vs. Lemmatization
English: "I have spoken at several conferences"
Stemming and lemmatization results: [shown as images; not preserved]
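A minimal sketch of the contrast on the English example, assuming toy suffix rules for the stemmer and a small lookup table for the lemmatizer. Both are hypothetical; real lemmatizers use full dictionaries and context to choose between candidate lemmas, which this table does not do.

```python
SUFFIX_RULES = ("ing", "es", "s")  # toy rules, tried in order


def toy_stem(word, min_stem=3):
    """Strip the first matching suffix if at least min_stem chars remain."""
    w = word.lower()
    for suffix in SUFFIX_RULES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w


# Toy lemma table: maps inflected forms to dictionary forms.
LEMMAS = {"spoken": "speak", "spoke": "speak", "speaking": "speak",
          "conferences": "conference", "has": "have", "had": "have"}


def toy_lemmatize(word):
    """Look up the dictionary form; unknown words pass through lowercased."""
    w = word.lower()
    return LEMMAS.get(w, w)
```

On "I have spoken at several conferences" the rules leave "spoken" unrelated to "speak" and reduce "conferences" to the non-word "conferenc", while the lookup returns "speak" and "conference". The "-ing" rule also turns "during" into "dur", the kind of over-stemming that costs precision.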
13 Stemming vs. Lemmatization
French: "Je n'étais pas là" (I wasn't there)
Stemming and lemmatization results: [shown as images; not preserved]
14 Stemming vs. Lemmatization + Decompounding
German: "Am Samstagmorgen fliege ich zurueck nach Boston." (On Saturday morning I fly back to Boston.)
Stemming and lemmatization (and decompounding!) results: [shown as images; not preserved]
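A minimal sketch of dictionary-based decompounding, assuming a toy word list and German "s" as the only linking element between parts. Among all complete splits it returns the one with the fewest parts. The function name, dictionaries, and linker list are hypothetical; Lucene ships its own dictionary- and hyphenation-based compound word filters, which work differently.

```python
from functools import lru_cache


def decompound(word, dictionary, linkers=("", "s")):
    """Split a compound into dictionary words, allowing an optional linking
    element (such as German "s") between parts. Returns the split with the
    fewest parts, or [word] when no complete split exists."""
    w = word.lower()

    @lru_cache(maxsize=None)
    def best(i):
        # Fewest-part split of w[i:], or None if w[i:] cannot be split.
        if i == len(w):
            return ()
        result = None
        for j in range(i + 1, len(w) + 1):
            part = w[i:j]
            if part not in dictionary:
                continue
            for link in linkers:
                k = j + len(link)
                if w[j:k] != link or (link and k == len(w)):
                    continue  # linker absent, or it would end the word
                rest = best(k)
                if rest is not None and (result is None or len(rest) + 1 < len(result)):
                    result = (part,) + rest
        return result

    split = best(0)
    return list(split) if split else [w]
```

Indexing the parts (samstag, morgen) alongside the original token lets a query for "Morgen" match "Samstagmorgen"; the same approach splits the Swedish "trafikolycka" from the next slide into trafik and olycka, given those words in the dictionary.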
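15 Stemming vs. Lemmatization
Swedish: "En person skadades lindrigt i en trafikolycka i Pernå" (One person was slightly injured in a traffic accident in Pernå)
Stemming and lemmatization results: [shown as images; not preserved]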
16 Linguistics in Solr
- Easy to customize as an Analyzer/Tokenizer/TokenFilter
[Diagram: the UpdateRequestProcessor chain from slide 7, repeated; linguistic processing happens in the field analysis stage, configured per field type in schema.xml.]
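In Solr, each field type in schema.xml declares an analyzer made of one tokenizer followed by any number of token filters, so a language-specific field such as text_swedish gets its own chain. The Python sketch below only mirrors that composition; the function names are hypothetical and are not the Lucene API, which is Java.

```python
import re


def word_tokenizer(text):
    """Split text into word tokens (a stand-in for a Lucene Tokenizer)."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    """Lowercase every token."""
    return [t.lower() for t in tokens]


def stop_filter(stopwords):
    """Return a filter that drops the given stopwords."""
    def apply(tokens):
        return [t for t in tokens if t not in stopwords]
    return apply


def make_analyzer(tokenizer, *filters):
    """Compose one tokenizer with token filters, applied in order."""
    def analyze(text):
        tokens = tokenizer(text)
        for token_filter in filters:
            tokens = token_filter(tokens)
        return tokens
    return analyze
```

For example, `make_analyzer(word_tokenizer, lowercase_filter, stop_filter({"am", "ich", "nach"}))` turns the German example sentence into samstagmorgen, fliege, zurueck, boston; a German field could add a decompounding or stemming filter to the same chain.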
17 Related Challenges
- Can I index text from many languages into the same field?
  - Yes, but it's not always a good idea, because query language identification is not accurate. You need a custom query analyzer that applies stemming/lemmatization for several languages to the same query.
- How do I query text in multiple fields? The dismax parser!
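A minimal sketch of the "custom query analyzer" idea: since the query language is uncertain, analyze the query with every candidate language's stemmer and OR together the distinct forms of each word. The suffix-stripping stemmers are hypothetical toys ("-ar" is a common Swedish plural ending, but this is not a real Swedish stemmer), and the output assumes the indexed text was stemmed with the same rules.

```python
def strip_suffix(suffix, min_stem=3):
    """Toy stemmer factory: remove one suffix if enough of the word remains."""
    def stem(word):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
        return word
    return stem


def multilingual_query(query, stemmers):
    """Analyze one query with every language's stemmer and OR together
    the distinct forms of each word, since the query language is unknown."""
    clauses = []
    for word in query.lower().split():
        forms = sorted({word} | {stem(word) for stem in stemmers.values()})
        clauses.append("(" + " OR ".join(forms) + ")")
    return " AND ".join(clauses)


TOY_STEMMERS = {"english": strip_suffix("s"), "swedish": strip_suffix("ar")}
```

With these toy stemmers, the query "cars bilar" expands to (car OR cars) AND (bil OR bilar). The trade-off is that each extra language adds candidate forms, which can lower precision.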
18 Text Analytics in Solr: Entity Extraction
- The process of identifying people, places, organizations, dates, times, etc. in unstructured text
- Methods: lists, rules, statistical
- Define your goals upfront! Some extraction methods work better for certain entity types:
  - Rules work well for dates, addresses, and URLs, but not for people
  - Lists work well for titles, but not for locations
  - Statistical extractors work well for ambiguous entities: people, locations, organizations
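A minimal sketch of the two non-statistical methods: regular-expression rules for entity types with regular surface forms, and a list (gazetteer) lookup for a closed set such as job titles. The patterns, the title list, and the function name are assumptions for illustration; statistical extraction of people, locations, and organizations needs a trained model and is not shown.

```python
import re

# Rules: regular expressions for entity types with regular surface forms.
RULES = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),  # ISO-style dates only
    "URL": re.compile(r"\bhttps?://\S+"),
}
# List (gazetteer): exact phrases for a closed set such as job titles.
TITLES = ("product manager", "founder & ceo")


def extract_entities(text):
    """Return (type, surface text) pairs found by rules and list lookup."""
    entities = []
    for etype, pattern in RULES.items():
        entities.extend((etype, m.group()) for m in pattern.finditer(text))
    for title in TITLES:
        pattern = r"\b" + re.escape(title) + r"\b"
        entities.extend(("TITLE", m.group())
                        for m in re.finditer(pattern, text, re.IGNORECASE))
    return entities
```

Neither method can tell that "Stockholm" is a place or that "Steve Kearns" is a person without an exhaustive list, which is why the slide recommends statistical extractors for those types.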
19 Entity Extraction
20 Entity Extraction in Solr
- Options: a preprocessor in front of Solr (custom code or OpenPipeline), or an UpdateRequestProcessor
- Store entities in new fields, one per entity type:
  <field name="person" type="string" indexed="true" multiValued="true" stored="false" />
[Diagram: the UpdateRequestProcessor chain from slide 7, with an Entity Extraction URP added to the chain; solrconfig.xml and schema.xml.]
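A minimal sketch of the "one field per entity type" step, written as a document-to-document function in the same spirit as a URP. It assumes entities arrive as (type, text) pairs from some extractor; the function name is hypothetical. Type names are lowercased because Solr field names are case-sensitive, so PERSON must map to the schema's person field.

```python
def add_entity_fields(doc, entities):
    """Copy (type, text) pairs into one multivalued field per entity type,
    e.g. person or location, keeping each value once per document."""
    out = dict(doc)  # shallow copy; lists are rebuilt below, not mutated
    for etype, value in entities:
        field = etype.lower()
        values = list(out.get(field, []))
        if value not in values:
            values.append(value)
        out[field] = values
    return out
```

Deduplicating per document keeps facet counts meaningful: a person mentioned five times in one document still counts once toward that person's facet.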
21 Entity Extraction Challenges
- How do you use extracted entities as facets?
  - For retrieving counts: &facet=true&facet.field=person&facet.field=location
  - For filtering results: &fq=person:"Steve Kearns"&fq=location:Stockholm (facet.query returns a count for a query but does not filter the results)
- How else can entities be used?
  - Improve relevance by searching the entity fields with a boost
  - Entity-specific search: phonetic matching and other name-specific search approaches
- Measure accuracy! The F-score is a measurement that combines precision and recall. Vendors should provide this, but evaluate on your own data!
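A minimal sketch of scoring an extractor against hand-labeled data. Precision is the share of extracted entities that are correct, recall is the share of labeled entities that were found, and F_β = (1 + β²)·P·R / (β²·P + R), which for β = 1 is the harmonic mean of the two. The sketch assumes exact matching on (type, text) pairs; many evaluations match on character offsets instead. The function name is hypothetical.

```python
def precision_recall_f(gold, predicted, beta=1.0):
    """Score extracted entities against hand-labeled ones.
    Entities are hashable tuples such as (type, text); exact match only."""
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f
```

A β above 1 weights recall more heavily, which may suit discovery-style search where missing an entity costs more than an extra false match.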
22 Demo: Odyssey Information Navigator
- Example search application built on Solr
- I personally built this in less than 2 months using Solr and products from Basis Technology
- I spent more time on the UI than on integrating the text analytics components
- I would be happy to show you the Solr config and let you try it out