Natural Language Processing as Key Component to Successful Information Products

1 Natural Language Processing as Key Component to Successful Information Products
Yves Schabes, Teragram Corporation
Tera: "monster" in Greek; 2^40 (= 1,099,511,627,776), about one trillion
Gram: something written down, drawn, or recorded
Yves Schabes. Teragram. JEITA 10/15/2003

2 Natural Language Processing and Document Management
Information Overflow Problem. The amount of information is growing at an exponential rate.
Internet Search Engines:
» Year 1998: 10 million pages
» Year 2003: 2 billion pages, with more than 100 million queries/day
Personal Computing:
» Year 1998: hard drives of 100 Megabytes to several Gigabytes
» Year 2003: hard drives of 100s of Gigabytes
Enterprise Computing:
» Year 1998: storage of 100 Gigabytes to 1 Terabyte
» Year 2003: storage of 1 Terabyte to 100s of Terabytes
New information is being published continuously.
Scalability of NLP techniques is key.
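As a rough check on the "exponential rate" claim, the search-engine figures on this slide (10 million pages in 1998, 2 billion in 2003) can be turned into growth factors. This is a back-of-the-envelope sketch: the function name is illustrative, and the assumption of steady compound growth is ours, not the slide's.

```python
def annual_growth_factor(start: float, end: float, years: int) -> float:
    """Constant yearly multiplier that turns `start` into `end` after `years` years."""
    return (end / start) ** (1 / years)


# Figures taken from the slide.
pages_1998 = 10_000_000
pages_2003 = 2_000_000_000

total_factor = pages_2003 / pages_1998
yearly = annual_growth_factor(pages_1998, pages_2003, 2003 - 1998)

print(f"total growth: {total_factor:.0f}x, per year: {yearly:.2f}x")
# prints "total growth: 200x, per year: 2.89x"
```

Under that assumption the index nearly triples every year, which is why any per-page processing has to be cheap.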

3 Examples of use of NLP for Business Applications in the U.S.
Search: Internet Search; Question-Answering; Enterprise Search
Customer Relationship Management: Call Center; Knowledgebase
Document Management
Speech Recognition: Automated Call Center (e.g. airline reservation); Dictation
Translation: Software Localization; User Manual Localization; Machine Translation; Translation Memory
Government applications
Interfaces

4 Examples of Market Size
Internet Search: ~$3 billion of 2003 revenues (projected); $5-$8 billion expected for
Examples of revenues:
» Google: $700,000, revenue (projected)
» Overture: $1,000,000, revenue (projected)
» Ask Jeeves: $100,000, revenue (projected)
Natural Language Technologies are key components of Internet Search Engines.
Microsoft is building a web search engine.
Internet Search is generating significant revenues and attracting big players.

5 Information Overflow
Information Retrieval: look into existing documents
Traditional solution: retrieval of documents
Objective: retrieval of information
» Need for NLP techniques.
Information Navigation: go from one document to another
Traditional solution: search for documents
Objective: present documents and information based on user profiles
» Need for NLP techniques.
Information Awareness: what is newly published?
Traditional solution: alerts based on keywords
Objective: alerts based on user interests
» Need for NLP techniques.

6 Why Scalability?
Example: Altavista Prisma
Topics related to the search query, to refine the search

7 Example: Altavista Prisma
In order to perform the computation needed for Prisma:
Preprocessing: every week, process 1 billion web pages and apply the following NLP techniques:
1. Automatic Language Identification
2. Tokenization and Segmentation
3. Morphological Analysis
4. Disambiguation and Part-of-Speech Tagging
5. Concepts and phrase extraction
Run-time speed: more than 100 million queries/s
Scalability is key!
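The preprocessing steps on this slide can be sketched as a small pipeline. The code below is a toy illustration only, not the Prisma or Teragram implementation: the stopword lists, the suffix-stripping stemmer, and the capitalized-run phrase extractor are all simplifying assumptions, and part-of-speech tagging (step 4) is left out because it needs a trained model.

```python
import re

# Tiny stopword lists standing in for a real language-identification model (assumption).
STOPWORDS = {
    "en": {"the", "and", "of", "is", "in", "to"},
    "fr": {"le", "la", "et", "de", "est", "les"},
}


def identify_language(text: str) -> str:
    """Step 1: pick the language whose stopwords occur most often."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    return max(counts, key=counts.get)


def tokenize(text: str) -> list[str]:
    """Step 2: split text into word tokens, dropping punctuation and digits."""
    return re.findall(r"[^\W\d_]+", text)


def stem(token: str) -> str:
    """Step 3: crude suffix stripping, a stand-in for morphological analysis."""
    word = token.lower()
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_phrases(tokens: list[str]) -> list[str]:
    """Step 5: return runs of two or more consecutive capitalized tokens."""
    phrases, run = [], []
    for tok in tokens + [""]:  # the empty sentinel flushes the final run
        if tok[:1].isupper():
            run.append(tok)
        else:
            if len(run) >= 2:
                phrases.append(" ".join(run))
            run = []
    return phrases


def preprocess(text: str) -> dict:
    """Run the steps in order and collect their outputs."""
    tokens = tokenize(text)
    return {
        "language": identify_language(text),
        "stems": [stem(t) for t in tokens],
        "phrases": extract_phrases(tokens),
    }


sample = "Readers of the New York Times search for Christina Aguilera."
result = preprocess(sample)
print(result["language"], result["phrases"])
```

Each step is a single linear pass over the text, which is the property that matters when the same pipeline has to run over a billion pages a week.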

8 Spelling Correction. Example: MSN.com
Word Processor:
» Speed: 1 user
» Small dictionary (~300,000 words)
Internet Search Engine requirements:
» Hundreds of millions of queries/day
» Use of context
» Use of multiple dictionaries (names, )
» Dictionary size: all words used on the Internet

9 Spelling Correction. The user mistypes Christina Aguilera's first name as "cristina" in the search query box and then clicks the Search button. Before the search engine begins its search, Spelling Correction automatically corrects the first name to "christina" in order to return the correct Web pages.

10 Spelling Correction. The user has mistyped the name in the search query. The search is run for pages containing the mistyped name, but Spelling Correction suggests a search with the correct spelling of the name.
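The "cristina" to "christina" correction can be illustrated with a standard dictionary-based technique: generate every string one edit away from the query word and keep the most frequent one found in the dictionary. This is a minimal sketch of that general approach, not the method used by MSN.com or Teragram; the word counts are invented for the example.

```python
import string

# Hypothetical word frequencies; a real system would derive them from query logs or web text.
WORD_COUNTS = {"christina": 900, "cristiano": 300, "aguilera": 800, "search": 5000}


def edits1(word: str) -> set[str]:
    """All strings one edit (delete, transpose, replace, insert) away from `word`."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word: str) -> str:
    """Return the word itself if known, else the most frequent known word one edit away."""
    word = word.lower()
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    return max(candidates, key=WORD_COUNTS.get) if candidates else word


print(correct("cristina"))  # prints "christina"
```

Whether the engine silently applies the correction (slide 9) or only suggests it (slide 10) is a product decision layered on top of the same candidate search.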

11 Contextual Spelling Correction

12 Contextual Spelling Correction
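The transcription keeps only the titles of these two slides, so the following is an illustration of what "use of context" can mean rather than a description of the system shown. One common approach, assumed here, is to keep sets of easily confused words and pick the member that most often follows the previous word; the bigram counts and the confusion set are invented for the example.

```python
# Hypothetical bigram counts: how often the second word follows the first (assumption).
BIGRAM_COUNTS = {
    ("world", "peace"): 150,
    ("world", "piece"): 2,
    ("a", "piece"): 90,
    ("a", "peace"): 5,
}

# Hypothetical sets of words that are valid on their own but easily confused.
CONFUSION_SETS = [{"peace", "piece"}]


def correct_in_context(prev_word: str, word: str) -> str:
    """Replace `word` with a confusable alternative only if the context clearly prefers it."""
    for group in CONFUSION_SETS:
        if word in group:
            def score(candidate: str) -> int:
                return BIGRAM_COUNTS.get((prev_word, candidate), 0)

            best = max(group, key=score)
            # Keep the user's word unless another candidate scores strictly higher.
            return best if score(best) > score(word) else word
    return word


print(correct_in_context("world", "piece"))  # prints "peace"
print(correct_in_context("a", "piece"))      # prints "piece"
```

A plain dictionary check cannot catch this kind of error, because both spellings are real words.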

13 New York Times Examples
Functionalities:
Retrieval and Browsing (find existing documents)
» Search by keywords
» Browse documents by topics
» Search by keyword within Topic
» Related News/Documents
Information Awareness (search in the future for newly available documents)
» Alert based on Topic and Keywords

14 Screen labels: Search; Search within Category; Browse by Categories; Alerts

15 Screen labels: Search Query; Search Query Categorized

16 Screen labels: Alerts; Related Articles; Related Categories

17 Search Query Categorization

18 Use of Categorization and Extraction for Alerts
Screen labels: Category; Concept
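An alert of this kind can be thought of as a standing query: each newly published article is categorized, its concepts are extracted, and the result is matched against stored user subscriptions. The sketch below assumes those two NLP steps have already run; the class names, fields, and sample data are hypothetical and are not taken from the New York Times system.

```python
from dataclasses import dataclass, field


@dataclass
class Subscription:
    user: str
    category: str
    # An empty keyword set means "alert me on everything in this category".
    keywords: set[str] = field(default_factory=set)


@dataclass
class Article:
    title: str
    category: str       # output of document categorization
    concepts: set[str]  # output of concept extraction


def matching_users(article: Article, subscriptions: list[Subscription]) -> list[str]:
    """Users whose category matches and whose keywords (if any) overlap the article's concepts."""
    return [
        sub.user
        for sub in subscriptions
        if sub.category == article.category
        and (not sub.keywords or sub.keywords & article.concepts)
    ]


subscriptions = [
    Subscription("alice", "Technology", {"search engine"}),
    Subscription("bob", "Technology"),
    Subscription("carol", "Sports", {"baseball"}),
]
article = Article(
    title="Microsoft is building a web search engine",
    category="Technology",
    concepts={"search engine", "microsoft"},
)
print(matching_users(article, subscriptions))  # prints "['alice', 'bob']"
```

Matching on extracted concepts rather than raw keywords is what lets an alert fire on the user's interest even when the article words it differently.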

20 New York Times. NLP Techniques.
NLP Techniques:
» Document Categorization
» Concepts Extraction
» Phrase Extraction
» Part-of-speech tagging
» Stemming
» Tokenization
Scalability in Precision.
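Document categorization, the first technique listed on this slide, can be illustrated in its simplest form as scoring a text against a term list per category. This is a toy sketch under stated assumptions: the categories and term lists are invented, and production categorizers typically use hand-written rules or statistical models rather than raw term counts.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical category definitions (assumption for illustration).
CATEGORY_TERMS = {
    "Sports": {"game", "team", "season", "coach"},
    "Business": {"market", "shares", "revenue", "company"},
}


def categorize(text: str) -> Optional[str]:
    """Return the category whose terms occur most often in the text, or None if none occur."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        category: sum(tokens[term] for term in terms)
        for category, terms in CATEGORY_TERMS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(categorize("The company reported higher revenue as shares rose"))  # prints "Business"
```

The same function can be applied to a short search query as well as to a full article, which is the idea behind categorizing queries and documents into one shared set of topics.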

21 Example: CNN
Functionality: alerts based on concepts and topics
Scalability: millions of s per day

23 Natural Language Question-Answering
Scalability in terms of documents is required. Answers are computed across all documents.
Techniques:
» Semantic Analysis
» Parsing
» Events Extraction
» Concepts Extraction
» Part-of-Speech Tagging
» Stemming

24 Screen labels: Question; Documents where answer is found; Answer 1, Answer 2, Answer 3; Excerpts

27 Conclusion
Scalability is a requirement of NLP techniques.
Scalability makes NLP techniques useful for:
» Information Retrieval
» Information Browsing
» Information Awareness


More information

Figure 4. Digital Specialist Browse Page.

Figure 4. Digital Specialist Browse Page. Guide to Using the Digital Specialist The Digital Specialist Archive features the entire run of the Society s journal, The United States Specialist, from 1930 through the last full year published. While

More information

Open Source Search. Andreas Pesenhofer. max.recall information systems GmbH Künstlergasse 11/1 A-1150 Wien Austria

Open Source Search. Andreas Pesenhofer. max.recall information systems GmbH Künstlergasse 11/1 A-1150 Wien Austria Open Source Search Andreas Pesenhofer max.recall information systems GmbH Künstlergasse 11/1 A-1150 Wien Austria max.recall information systems max.recall is a software and consulting company enabling

More information

Internet. Telephone Line

Internet. Telephone Line Internet The Internet (International Network) is a network of computers from all over the world linked together by telephone lines, fibre optic cables and satellite. Millions of users from all around the

More information

You need to start your research and most people just start typing words into Google, but that s not the best way to start.

You need to start your research and most people just start typing words into Google, but that s not the best way to start. Academic Research Using Google Worksheet This worksheet is designed to have you examine using various Google search products for research. The exercise is not extensive but introduces you to things that

More information

Natural Language Processing with PoolParty

Natural Language Processing with PoolParty Natural Language Processing with PoolParty Table of Content Introduction to PoolParty 2 Resolving Language Problems 4 Key Features 5 Entity Extraction and Term Extraction 5 Shadow Concepts 6 Word Sense

More information

Viewpoint Review & Analytics

Viewpoint Review & Analytics The Viewpoint all-in-one e-discovery platform enables law firms, corporations and service providers to manage every phase of the e-discovery lifecycle with the power of a single product. The Viewpoint

More information

Integrating Spanish Linguistic Resources in a Web Site Assistant

Integrating Spanish Linguistic Resources in a Web Site Assistant Integrating Spanish Linguistic Resources in a Web Site Assistant Paloma Martínez*, Ana García-Serrano, Alberto Ruiz-Cristina * Universidad Carlos III de Madrid Avd. Universidad 30, 28911 Leganés, Madrid,

More information

CCH China Law Express & China Law for Foreign Business. Participant Training Guide

CCH China Law Express & China Law for Foreign Business. Participant Training Guide CCH China Law Express & China Law for Foreign Business July, 2007 Table of Contents INTRODUCTION...2 COURSE OBJECTIVES...2 LOGGING IN...3 Library Layout and Subscription Content...4 CHINA LAW EXPRESS...5

More information

Derwent Innovations Index

Derwent Innovations Index Derwent Innovations Index DERWENT INNOVATIONS INDEX Quick reference card ISI Web of Knowledge SM Derwent Innovations Index is a powerful patent research tool, combining Derwent World Patents Index, Patents

More information

International Journal of Scientific & Engineering Research Volume 7, Issue 12, December-2016 ISSN

International Journal of Scientific & Engineering Research Volume 7, Issue 12, December-2016 ISSN 55 The Answering System Using NLP and AI Shivani Singh Student, SCS, Lingaya s University,Faridabad Nishtha Das Student, SCS, Lingaya s University,Faridabad Abstract: The Paper aims at an intelligent learning

More information

From Boolean Towards Semantic Retrieval Models. Speakers : Arpan Gupta, Seinjuti Chatterjee

From Boolean Towards Semantic Retrieval Models. Speakers : Arpan Gupta, Seinjuti Chatterjee From Boolean Towards Semantic Retrieval Models Speakers : Arpan Gupta, Seinjuti Chatterjee 1 About us Leading Machine Learning Platform For Ecommerce Search 120+Customers & Brands 1200+ Global Websites

More information

SEO Case Study: How We Increased Traffic from the Conversion Pages by 60% Client: Turkeyhomes.com

SEO Case Study: How We Increased Traffic from the Conversion Pages by 60% Client: Turkeyhomes.com SEO Case Study: How We Increased Traffic from the Conversion Pages by 60% Client: Turkeyhomes.com Client Turkeyhomes.com sells extremely high quality real estate in Turkey to a global audience. This company

More information

Information Retrieval. (M&S Ch 15)

Information Retrieval. (M&S Ch 15) Information Retrieval (M&S Ch 15) 1 Retrieval Models A retrieval model specifies the details of: Document representation Query representation Retrieval function Determines a notion of relevance. Notion

More information

TEXT CHAPTER 5. W. Bruce Croft BACKGROUND

TEXT CHAPTER 5. W. Bruce Croft BACKGROUND 41 CHAPTER 5 TEXT W. Bruce Croft BACKGROUND Much of the information in digital library or digital information organization applications is in the form of text. Even when the application focuses on multimedia

More information

What Is Voice SEO and Why Should My Site Be Optimized For Voice Search?

What Is Voice SEO and Why Should My Site Be Optimized For Voice Search? What Is Voice SEO and Why Should My Site Be Optimized For Voice Search? Voice search is a speech recognition technology that allows users to search by saying terms aloud rather than typing them into a

More information

Search Engines. Information Retrieval in Practice

Search Engines. Information Retrieval in Practice Search Engines Information Retrieval in Practice All slides Addison Wesley, 2008 Web Crawler Finds and downloads web pages automatically provides the collection for searching Web is huge and constantly

More information

Information retrieval

Information retrieval Information retrieval Lecture 8 Special thanks to Andrei Broder, IBM Krishna Bharat, Google for sharing some of the slides to follow. Top Online Activities (Jupiter Communications, 2000) Email 96% Web

More information

August 2012 Daejeon, South Korea

August 2012 Daejeon, South Korea Building a Web of Linked Entities (Part I: Overview) Pablo N. Mendes Free University of Berlin August 2012 Daejeon, South Korea Outline Part I A Web of Linked Entities Challenges Progress towards solutions

More information

A Guide to Business and Economics Databases How to use Factiva Page 1

A Guide to Business and Economics Databases How to use Factiva Page 1 How to use Factiva Page 1 How to use Factiva Factiva contains newspaper articles from the UK and around the world, including the Financial Times (FT). Connect to Factiva through the moodle database course

More information

Web version of SciFinder : new interface and features

Web version of SciFinder : new interface and features Web version of SciFinder : new interface and features Arisara Rodmui, CAS representative www.cas.org User enters the information 1. Registration Enters personal Information w/ strictly ID & PSW only authorized

More information

<is web> Information Systems & Semantic Web University of Koblenz Landau, Germany

<is web> Information Systems & Semantic Web University of Koblenz Landau, Germany Information Systems & University of Koblenz Landau, Germany Semantic Search examples: Swoogle and Watson Steffen Staad credit: Tim Finin (swoogle), Mathieu d Aquin (watson) and their groups 2009-07-17

More information