Entity Extraction Enables Discovery
- Cody McCormick
- 6 years ago
A discovery search is one in which you don't know, or can't know, all relevant search terms. Automated entity extraction lets you discover what you don't know.

by Steve Cohen, EVP/COO
ABOUT BASIS TECHNOLOGY

Basis Technology provides software solutions for text analytics, information retrieval, digital forensics, and identity resolution in over forty languages. Our Rosette linguistics platform is a widely used suite of interoperable components that power search, business intelligence, e-discovery, social media monitoring, financial compliance, and other enterprise applications.

Our linguistics team is at the forefront of applied natural language processing using a combination of statistical modeling, expert rules, and corpus-derived data. Our forensics team pioneers better, faster, and cheaper techniques to extract forensic evidence, keeping government and law enforcement ahead of the exponential growth of data storage volumes.

Software vendors, content providers, financial institutions, and government agencies worldwide rely on Basis Technology's solutions for Unicode compliance, language identification, multilingual search, entity extraction, name indexing, and name translation. Our products and services are used by over 250 major firms, including Cisco, EMC, Exalead/Dassault Systems, Hewlett-Packard, Microsoft, Oracle, and Symantec. Our text analysis products are widely used in the U.S. defense and intelligence industry by such firms as CACI, Lockheed Martin, Northrop Grumman, SAIC, and SRI. We are the top provider of multilingual technology to web and e-commerce search engines, including Amazon.com, Bing, Google, and Yahoo!.

Company headquarters are in Cambridge, Massachusetts, with branch offices in San Francisco, Washington, London, and Tokyo. For more information, visit Basis Technology Corporation.

Basis Technology Corporation, Rosette and Highlight are registered trademarks of Basis Technology Corporation. Big Text Analytics is a trademark of Basis Technology Corporation. All other trademarks, service marks, and logos used in this document are the property of their respective owners.
Consider three scenarios:

1. A real estate agent wants to search Craigslist for the prices of all two-bedroom/two-bathroom condominiums in Boston with over 1200 square feet and indoor parking.
2. An intelligence analyst wants to search global newspapers for the names, locations, and dates associated with a certain diplomat's visit to the U.S.
3. An industrial designer for a leading automaker wants to search through thousands of customer emails received last year for all references to the color of the company's hot-selling new model.

The advantage of such searches is clear. Yet each is impossible or impractical with today's web and enterprise search engines, because conventional search technology does not enable discovery search.

What is discovery search? It is a search in which you are looking for something you cannot fully specify until you find it. You don't know all the search terms that will return all instances of the desired information. Even if you did know all of these terms, you probably would not have time to type them all into a search box.

Take the real estate scenario. There are countless ways sellers describe the attributes of properties, yet a person reading these listings usually has no problem parsing the sentences and extracting the key entities of interest: price, square footage, amenities, and so on. Whether "bathroom" is spelled out, abbreviated, shortened to "bath," or preceded by a number or a word, the meaning is obvious to the buyer. But it's impossible for anyone to know, never mind manually type, all of the possible search-term combinations that would find every qualified apartment regardless of how the classified ads were written.

The same difficulty applies to the other two scenarios. How would you specify a search for all the places a diplomat might have visited? Or how could you search for every possible occurrence of a color, especially since new words for colors are invented all the time?
You can conduct a discovery search by using a technology called entity extraction. Entity extraction automatically locates important search terms for you based on the same contextual cues people would use. The ability to locate these entities, or units of meaning, in any text is what enables discovery search to find all the information that fits a definition, even when you don't know all of the search terms that might fit the definition. The current generation of search technologies does not offer this ability.
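To make the real estate scenario concrete, here is a minimal sketch of what hand-enumerating written forms of "bathroom" looks like. The pattern, the `NUMBER_WORDS` table, and the function name `extract_bathrooms` are illustrative assumptions, not anything from a real product. The pattern covers only a handful of variants, which is exactly the limitation that motivates automated entity extraction.

```python
import re

# Hypothetical, deliberately incomplete list of number words.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4}

# A count (digits or a number word), optional space or hyphen,
# then one of a few written forms of "bathroom".
BATH_PATTERN = re.compile(
    r"\b(\d+(?:\.\d+)?|one|two|three|four)[\s-]*(?:bathrooms?|baths?|ba)\b",
    re.IGNORECASE,
)


def extract_bathrooms(listing: str):
    """Return the bathroom count found in a listing as a float, or None."""
    match = BATH_PATTERN.search(listing)
    if match is None:
        return None
    raw = match.group(1).lower()
    # Number words map through the table; digit strings convert directly.
    return float(NUMBER_WORDS.get(raw, raw))


if __name__ == "__main__":
    for ad in ["Sunny 2BR/2BA condo", "two bath, indoor parking", "studio with balcony"]:
        print(ad, "->", extract_bathrooms(ad))
```

Every new abbreviation or phrasing a seller invents would need another alternative in the pattern, so a rule like this finds only the variants its author already thought of.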
TWO-DIMENSIONAL SEARCH

First-generation search tools look for information in two ways: by word matching and by link analysis.

Mid 1990s: Alta Vista & Keyword Lookup

The earliest search tools, of which Alta Vista is the model example, looked for keywords in documents that match the search terms the user typed into the search box. In these systems, relevance is based on the document's internal content attributes, including the document title, the number of times the sought-after string appears in the document, how early in the document the terms appear, and the proximity of words to each other in a multi-word search term.

Late 1990s: Google & Social Context

Rather than just look at a document in isolation to determine search relevance, Google and most other web search engines today also look at how many other documents link to it. If more people link to a document, the document must be more important, and hence more relevant to most searches.

Late 1990s-Early 2000s: Ask Jeeves & Topic Communities

Ask.com (formerly Ask Jeeves) was the first to apply machine language processing to semantic content, specifically when users pose questions. Where other search engines ignored words like "Why" or "What" because they appear too often to have relevance anywhere, Ask regards such words as question markers. Another technique Ask uses is dynamic topic clustering. This technique weighs both the raw number of inbound links, like Google, and the number of links from sites with a high degree of relevant subject content that also link to each other. If a site is highly referenced within a topic community, it is probably better suited to answer a question on that topic.

DISCOVERY: THE NEXT GENERATION IN SEARCH

None of these methods, however, let you search on meaning rather than on words. If you were looking for information on red cars, for example, even the technology that powers Ask.com would only return answers from sources explicitly including the word "red."
It would not return answers about "burgundy" or "maroon" unless those sources included the word "red" as well. That might pose a problem for, say, a car designer looking for the most popular color combinations mentioned in a year's worth of customer emails.
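The keyword-lookup behavior described above can be sketched in a few lines. This is an assumed toy scoring function (the weights and the name `keyword_score` are made up for illustration, not taken from any real engine); it rewards title matches, term frequency, and early position, and it shows why a query for "red" scores a "burgundy" document at zero.

```python
import re


def words(text: str):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_score(query: str, title: str, body: str) -> float:
    """Toy first-generation relevance: exact word matches only."""
    title_words = words(title)
    body_words = words(body)
    score = 0.0
    for term in words(query):
        if term in title_words:
            score += 2.0                      # assumed bonus for a title match
        count = body_words.count(term)
        score += count                        # term frequency
        if count:
            # Earlier first occurrence contributes a larger bonus.
            score += 1.0 / (1 + body_words.index(term))
    return score


if __name__ == "__main__":
    print(keyword_score("red", "", "The red sedan sold well"))
    print(keyword_score("red", "", "Customers love the burgundy coupe"))
```

Because the function compares strings rather than meanings, no amount of weight tuning lets it connect "burgundy" to "red."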
Adding the word "color" to the search string as a workaround might improve results, but again, not all red-related content would include the word "color" either. Another issue arises in the fact that the word "red" has meanings outside the context of design, especially as a metaphor for danger, as in "red zone." A designer might want to exclude those.

"Bush" is an even richer example of a word whose meaning is different depending on context. A search on that word might return articles about President Bush as well as articles about landscaping. A journalist looking for documents about President Bush would want to specify the "person" attribute in the search.

What is missing in these scenarios is the ability to refine searches to target the desired meaning (semantics) of a term. The computer cannot automatically discover all the entities, i.e., concepts, expressed as words or phrases that have those attributes. With first-generation search, you must already know all the right keywords to type into the search box. You must also have taken into account all the semantic misinterpretations (as in other kinds of bushes). And you would also need the skill to express terms so as to exclude those misinterpretations.

This ability to automatically discover semantic matches and exclude semantic non-matches is what entity extraction is all about.

HOW ENTITY EXTRACTION WORKS

Words derive much of their meanings from the context in which they appear on a page. The same word will often have very different meanings depending on context. Similarly, the same meaning will often be expressed by very different words, which is also context-sensitive. These contextual features include:

- Proximity to other words
- Written forms of the word (e.g., abbreviations, capitalization)
- Parts of speech (e.g., is the word used as a subject, predicate, object, etc.)
- Punctuation
- And many more

A person grasps a concept from a word or phrase by choosing from one of several possible word definitions.
That choice is heavily dependent on the context of the word or phrase. Entities are the words on a page that represent a concept in people's heads.
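As a rough illustration of what "contextual features" might look like to a program, the sketch below turns one token and its neighbors into a small feature dictionary. The feature names and the helper functions `tokenize` and `context_features` are assumptions chosen for this example; a real system would use many more features, including parts of speech, which are omitted here.

```python
import re


def tokenize(text: str):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def context_features(tokens, i):
    """Describe the token at position i using a few surface cues."""
    word = tokens[i]
    prev_tok = tokens[i - 1] if i > 0 else "<START>"
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else "<END>"
    return {
        "word": word.lower(),
        "is_capitalized": word[:1].isupper(),            # written form
        "is_abbreviation": next_tok == "." and len(word) <= 3,
        "prev_word": prev_tok.lower(),                   # proximity
        "next_word": next_tok.lower(),
        # True only when the next token is a single punctuation mark.
        "next_is_punctuation": re.fullmatch(r"[^\w\s]", next_tok) is not None,
    }


if __name__ == "__main__":
    toks = tokenize("Mr. John Hillinhuetter spoke")
    print(context_features(toks, 3))
```

Each dictionary is one observation of a word in context; collections of such observations are the raw material that rule-based or statistical extractors work from.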
Suppose you wanted to find all human names in a document: all names, including those of people you had not previously considered. Suppose also that the document contains the quote:

    Mr. John Hillinhuetter spoke

Even though you may never have seen the name Hillinhuetter before, you would infer that Hillinhuetter is a person's name. You would base that inference on the fact that the word is capitalized and otherwise matches a pattern:

    Mr. [capitalized noun] [capitalized noun] [verb]

Entity extraction recreates the process of applying and recognizing context in a computer. Take the word "date." When that word appears in a block of text, is it an entity with one set of contextual features, as in a point in time, or does it carry a completely different set of contextual features that add up to food? There are literally hundreds of different features that may indicate when the word "date" is about time or about food.

[Figure: sample English and French passages with extracted entities highlighted by type. Legend: Person, Location, Organization, Date, Nationality, Title (French: Personne, Lieu, Organisation, Date, Nationalité, Titre).]

English sample: Kofi Atta Annan is a Ghanaian diplomat who served as the seventh Secretary General of the United Nations from January 1, 1997, to January 1, 2007, serving two five-year terms. Annan was the co-recipient of the Nobel Peace Prize in October [...]. Kofi Annan was born on April 8, 1938, to Victoria and Henry Reginald Annan in Kumasi, Ghana. He is a twin, an occurrence that is regarded as special in Ghanaian culture. Efua Atta, his twin sister, shares the same middle name, which means "twin." As with most Akan names, his first name indicates the day of the week he was born: Kofi denotes a boy born on a Friday. The name Annan can indicate that a child was the fourth in the family, but in his family it was simply a name which Annan inherited from his parents. In 1962, Annan started working as a Budget Officer for the World Health Organization, an agency of the United Nations. From 1974 to 1976, he worked as the Director of Tourism in Ghana. Annan then returned to work for the United Nations as an Assistant Secretary General in three consecutive positions.

French sample: Kofi Annan (né le 8 avril 1938) est l'ancien secrétaire général des Nations unies. Kofi Annan fut le septième Secrétaire général de l'Organisation des Nations Unies et le premier à sortir des rangs du personnel. Il a entamé son premier mandat le 1er janvier [...]. Le 29 juin 2001, sur recommandation du Conseil de sécurité, l'Assemblée générale l'a réélu par acclamation pour un second mandat, commençant le 1er janvier 2002 et s'achevant au 31 décembre [...]. Kofi Annan est né à Kumasi (Ghana) le 8 avril [...]. Il a étudié à l'université scientifique et technologique, à Kumasi, et a achevé sa licence d'économie au Macalester College, à Saint Paul (Minnesota) aux États-Unis, en [...]. En 1961 et 1962, il a fait des études de troisième cycle en économie à l'Institut universitaire des hautes études internationales, à Genève. En 1971 et 1972, en qualité de Sloan Fellow au Massachusetts Institute of Technology, M. Annan a obtenu son diplôme de maîtrise en sciences de gestion.

For decades, linguists and computational linguists have examined word context to create natural language processing software. Natural language refers to normally written or spoken language.

The barrier to automating entity extraction has been the question of how to write software to correctly recognize entities. Manually programming the rules to do this is impractical; there are just too many different possible contexts that determine a word's meaning. Some features are obvious, but many are not. It would be virtually impossible to specify all of them explicitly as rules for a computer for every possible context a computer might encounter in real-world applications.

Rather than programming computers to recognize contexts, an alternative approach has been to essentially install the language itself into computers.
In other words, the goal is to teach computers the grammatical and morphological rules of English, French, Chinese, or whatever human language is to be parsed. Once the computer understands a language, it could presumably recognize entities within a text based on the context in which the entity appears.
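For comparison with these approaches, the title-plus-capitalization cue from the Hillinhuetter example can be written as a single hand-coded rule. The sketch below is an assumption-laden simplification: the list of titles is invented for illustration, and the trailing [verb] check is dropped because a regular expression alone cannot identify verbs.

```python
import re

# A courtesy title, a period, then one or more capitalized words.
# The set of titles here is a small hypothetical sample.
TITLE_NAME = re.compile(
    r"\b(?:Mr|Mrs|Ms|Dr)\.\s+((?:[A-Z][a-z]+\s+)*[A-Z][a-z]+)"
)


def find_titled_names(text: str):
    """Return capitalized word sequences that directly follow a title."""
    return TITLE_NAME.findall(text)


if __name__ == "__main__":
    print(find_titled_names("Mr. John Hillinhuetter spoke"))
    print(find_titled_names("Yesterday Kofi Annan spoke"))
```

The rule finds a never-before-seen name when a title precedes it, but it misses the same kind of name as soon as the title is absent, which illustrates why enumerating rules for every context is impractical.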
A natural language parser, however, has two main drawbacks:

- Natural languages are enormously difficult to express as explicit rule sets, so a parser would take years to construct
- Every language has a different set of rules, so the parser built for one language would not work for another

A third approach, different from either programming computers to match contexts explicitly or constructing natural language parsers, is statistical machine learning. Whereas other approaches might take years just to write the code to begin to extract even one entity (color, for example), a statistical approach uses an efficient three-step process that can be completed in a few weeks.

1. A computational linguist specifies an entity's contextual features
2. Native speakers tag a statistically sufficient number of examples of the entity (typically thousands) in the target text genre (e.g., news, medical, financial, etc.)
3. The features and tagged examples are fed into a computer, which then generates a model that recognizes entity matches in the text genre.

Statistical machine learning makes entity extraction practical and offers several key benefits:

- Language Independence: The software algorithms are the same regardless of language or script; the only difference is the entity-specific context features and language-specific, tagged training samples used.
- Entity Extensibility: Adding new features and examples to an existing model is straightforward, so adding new entities to accommodate linguistic changes is relatively easy.
- Automated Discovery: The model will return new context-appropriate instances of an entity even if they didn't exist when the entity was first modeled. Users don't have to search for terms they don't know (or don't know yet) to define an inclusive search.
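The three-step process above can be sketched with a very small classifier. The source does not say which statistical model is used, so this example substitutes a plain naive Bayes classifier with add-one smoothing, with neighboring words as the only features, and a made-up handful of tagged examples for the two senses of "date." Real training sets would contain thousands of examples, as the text notes.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Build counts from (context_words, label) pairs (step 3)."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for context, label in examples:
        label_counts[label] += 1
        for w in context:
            word_counts[label][w] += 1
            vocab.add(w)
    return label_counts, word_counts, vocab


def classify(model, context):
    """Pick the label with the highest smoothed log-probability."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in label_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in context:
            if w in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Steps 1 and 2, in miniature: features are the words around "date",
# and these six hand-tagged examples are purely hypothetical.
TAGGED = [
    (["meeting", "on", "the", "of", "january"], "TIME"),
    (["set", "a", "for", "the", "hearing"], "TIME"),
    (["deadline", "is", "friday"], "TIME"),
    (["ate", "a", "sweet", "dried"], "FOOD"),
    (["palm", "fruit", "sweet"], "FOOD"),
    (["stuffed", "with", "almonds", "dried"], "FOOD"),
]

if __name__ == "__main__":
    model = train(TAGGED)
    print(classify(model, ["dried", "fruit"]))
    print(classify(model, ["hearing", "friday"]))
```

Nothing in `train` or `classify` depends on English; swapping in tagged examples from another language changes the model but not the code, which is the language-independence benefit described above.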
THE IMPACT

Statistically powered entity extraction distinguishes the newest generation of search from earlier generations. People will no longer have to rely on personal knowledge (or simple brute force) to find all occurrences of the information they wish to find in unstructured content. But people are not the only ones capable of using this technology. Adding entity extraction to XML tagging tools, for example, provides an efficient way to bring the mountains of unstructured data, both enterprise and web, into a database where it can be manipulated and processed just like structured data.

In fact, in the very near future the idea of unstructured data as a separate category may itself seem quaint. We are looking at a new kind of convergence, one just as important as the convergence between data and communications. When that happens, it will be because computers can identify what words mean, not just what they look like.

BASIS TECHNOLOGY'S APPROACH TO ENTITY EXTRACTION

The Rosette Entity Extractor is a hybrid mechanism that integrates the results from three techniques: list-based, pattern matching, and statistical. The target text is fed to all three modules, and then a fourth module called the redactor balances the results and acts as judge when answers conflict. Rosette uses a weighted set of criteria to merge results and identify people, places, and other entities.

Rosette Entity Extractor Features

Foreign language capabilities: Rosette extracts entities from text in many languages, including English, Arabic, Pashto, Persian, Urdu, Chinese, Japanese, Korean, and major European languages.

Context-sensitive extraction: Rosette's statistical models consider context when extracting key entities such as person, place, and organization (including company names).

Seamless integration: Rosette is a software development kit (SDK) accessible via a single C, C++, .NET, or Java application programming interface (API).
It has been designed for simple integration with Apache Lucene, Apache Solr, dtSearch, and other search engines.

Easy customization: Users can add custom entities via regular expressions or lists, or enhance the statistical model with additional training data relevant to the user's problem domain.

High accuracy and throughput: Rosette's accuracy and speed are industry-tested by customers such as Microsoft Bing, which handle a high volume of transactions and require high quality from every system component.
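The hybrid design described in this section (three modules proposing entities, with a redactor judging conflicts by weighted criteria) might be pictured along the following lines. This is not Rosette's API or its actual merging logic, which the source does not publish; the module weights, the tuple layout, and the function name `redact` are all hypothetical, and the weighted vote is just one simple way such a judge could work.

```python
from collections import defaultdict

# Hypothetical trust weights for the three techniques named in the text.
WEIGHTS = {"list": 1.0, "pattern": 0.8, "statistical": 1.2}


def redact(candidates):
    """Merge (module, span_text, entity_type) proposals.

    Each module's vote counts by its weight; for every span the entity
    type with the highest total wins.
    """
    votes = defaultdict(lambda: defaultdict(float))
    for module, span, entity_type in candidates:
        votes[span][entity_type] += WEIGHTS[module]
    return {span: max(types, key=types.get) for span, types in votes.items()}


if __name__ == "__main__":
    proposals = [
        ("list", "Washington", "LOCATION"),
        ("pattern", "Washington", "PERSON"),
        ("statistical", "Washington", "PERSON"),
        ("statistical", "Cambridge", "LOCATION"),
    ]
    print(redact(proposals))
```

In this toy run the gazetteer-style list module calls "Washington" a location, but the pattern and statistical modules together outweigh it, so the merged answer is a person.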
ABOUT THE AUTHOR

Steve Cohen is Executive Vice President and Chief Operating Officer of Basis Technology, where he is responsible for worldwide sales and the planning and operations of the company's linguistic product research and development. Before starting Basis Technology with Carl Hoffman, Steve was engineering manager for Cognex Corporation's Tokyo office and development manager for SMT device inspection. He has also consulted on software internationalization engineering and developed software for embedded systems and electronic test equipment. Steve earned a bachelor's degree in electrical engineering from MIT and studied at Waseda University in Tokyo.
More informationWebsite Design Article by Michele Jennings, Netmajic, Inc.
It s not just a website; it s a digital employee. TM Website Design Article by Michele Jennings, Netmajic, Inc. As with all employees, there is a job description, training, constant communication of current
More informationFRENCH WEEE REGISTER FOR PRODUCERS OF ELECTRICAL AND ELECTRONIC EQUIPMENT
FRENCH WEEE REGISTER FOR PRODUCERS OF ELECTRICAL AND ELECTRONIC EQUIPMENT USER GUIDE FOR AUTHORISED REPRESENTATIVES January 2018 ADEME Angers Direction Économie Circulaire et Déchets Service Produits et
More informationData and Information Integration: Information Extraction
International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Data and Information Integration: Information Extraction Varnica Verma 1 1 (Department of Computer Science Engineering, Guru Nanak
More informationBuilding a Scalable, Service-Centric Sender Policy Framework (SPF) System
Valimail White Paper February 2018 Building a Scalable, Service-Centric Sender Policy Framework (SPF) System Introduction Sender Policy Framework (SPF) is the protocol by which the owners of a domain can
More informationMs. Cingal, Mr. Huet, Mr. Jendoubi, Ms. Regen. Les documents et les appareils électroniques ne sont pas autorisés.
Université PANTHÉON - ASSAS (PARIS II) MELUN U.E.C.1 7296 Droit - Economie - Sciences Sociales Melun Session : Septembre 2018 Année d'étude : Discipline : Troisième année de Licence Droit Anglais juridique
More informationNatural Language Processing SoSe Question Answering. (based on the slides of Dr. Saeedeh Momtazi) )
Natural Language Processing SoSe 2014 Question Answering Dr. Mariana Neves June 25th, 2014 (based on the slides of Dr. Saeedeh Momtazi) ) Outline 2 Introduction History QA Architecture Natural Language
More informationReadme File. Purpose. Hyperion Financial Data Quality Management for Enterprise Release Readme
Hyperion Financial Data Quality Management for Enterprise Release 7.2.1 Readme Readme File This file contains the following sections: Purpose... 1 Restrictions... 2 New Features... 2 Multi-Language Support...
More informationThe Next Evolution of Enterprise Public Cloud. Bring the Oracle Cloud to Your Data Center
The Next Evolution of Enterprise Public Cloud Bring the Oracle Cloud to Your Data Center The Next Stage of Cloud Evolution Over the past decade, cloud has matured from a fringe technology option that offered
More informationCHAPTER 2: DATA MODELS
CHAPTER 2: DATA MODELS 1. A data model is usually graphical. PTS: 1 DIF: Difficulty: Easy REF: p.36 2. An implementation-ready data model needn't necessarily contain enforceable rules to guarantee the
More informationThe DITA business case
The DITA business case Maximizing content value Sarah O'Keefe Bill Swallow September 17, 2018 Executive summary Executive summary Companies require content to support ever-increasing requirements, including:
More informationTen Innovative Financial Services Applications Powered by Data Virtualization
Ten Innovative Financial Services Applications Powered by Data Virtualization DATA IS THE NEW ALPHA In an industry driven to deliver alpha, where might financial services firms find opportunities when
More informationQuery-Time JOIN for Active Intelligence Engine (AIE)
Query-Time JOIN for Active Intelligence Engine (AIE) Ad hoc JOINing of Structured Data and Unstructured Content: An Attivio-Patented Breakthrough in Information- Centered Business Agility An Attivio Technology
More informationA Technical Overview: Voiyager Dynamic Application Discovery
A Technical Overview: Voiyager Dynamic Application Discovery A brief look at the Voiyager architecture and how it provides the most comprehensive VoiceXML application testing and validation method available.
More informationNetwrix Virtual. Customer Summit 2016
Netwrix Virtual Customer Summit 2016 Welcome Michael Fimin Chief Executive Officer Phone: 1.949.407.5125 x1057 Email: Michael.Fimin@netwrix.com linkedin.com/in/michaelfimin Agenda Michael Fimin Chief Executive
More informationWHITE PAPER. Operationalizing Threat Intelligence Data: The Problems of Relevance and Scale
WHITE PAPER Operationalizing Threat Intelligence Data: The Problems of Relevance and Scale Operationalizing Threat Intelligence Data: The Problems of Relevance and Scale One key number that is generally
More informationInformation Retrieval
Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,
More informationSearch Engine Marketing Guide 5 Ways to Optimize Your Business Online
Search Engine Marketing Guide 5 Ways to Optimize Your Business Online Table of Contents Introduction....................................................... 3 Quiz: How Does Your Website Rank?.............................4
More informationOracle Big Data SQL brings SQL and Performance to Hadoop
Oracle Big Data SQL brings SQL and Performance to Hadoop Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data SQL, Hadoop, Big Data Appliance, SQL, Oracle, Performance, Smart Scan Introduction
More informationAchieving Network Storage Optimization, Security, and Compliance Using File Reporter
Information Management & Governance Achieving Network Storage Optimization, Security, and Compliance Using File Reporter Table of Contents page Detailed Network Storage File Reporting and Analysis...2
More informationSemantic Parsing for Location Intelligence
Semantic Parsing for Location Intelligence Voicebox s advanced system for helping you navigate the world Voicebox s Location Intelligence system combines cutting-edge Semantic Parsing for natural language
More informationEnabling Data Governance Leveraging Critical Data Elements
Adaptive Presentation at DAMA-NYC October 19 th, 2017 Enabling Data Governance Leveraging Critical Data Elements Jeff Goins, President, Jeff.goins@adaptive.com James Cerrato, Chief, Product Evangelist,
More informationInformatica Enterprise Information Catalog
Data Sheet Informatica Enterprise Information Catalog Benefits Automatically catalog and classify all types of data across the enterprise using an AI-powered catalog Identify domains and entities with
More informationMaximizing Website Return on Investment: The Crucial Role of High-Quality Search
Maximizing Website Return on Investment: The Crucial Role of High-Quality Search Driving conversions: the crucial role of search........................... 4 Rapid payback on investments......................................
More informationOracle Enterprise Data Quality for Product Data
Oracle Enterprise Data Quality for Product Data Glossary Release 5.6.2 E24157-01 July 2011 Oracle Enterprise Data Quality for Product Data Glossary, Release 5.6.2 E24157-01 Copyright 2001, 2011 Oracle
More informationHarmonizing Multi-Model at the World Bank Group
Harmonizing Multi-Model at the World Bank Group Valentin Prudius, World Bank Group Dagobert Soergel, University at Buffalo & World Bank Group Denisa Popescu, World Bank Group Tyler Replogle, MarkLogic
More informationKnowledge Engineering Models and Tools for the Digital Scholarly Publishing of Manuscripts
Knowledge Engineering Models and Tools for the Digital Scholarly Publishing of Manuscripts Semantic Web for the Digital Humanities Sahar Aljalbout, Giuseppe Cosenza, Luka Nerima, Gilles Falquet 1 Cultural
More informationInformation Extraction Techniques in Terrorism Surveillance
Information Extraction Techniques in Terrorism Surveillance Roman Tekhov Abstract. The article gives a brief overview of what information extraction is and how it might be used for the purposes of counter-terrorism
More informationEllie Bushhousen, Health Science Center Libraries, University of Florida, Gainesville, Florida
Cloud Computing Ellie Bushhousen, Health Science Center Libraries, University of Florida, Gainesville, Florida In the virtual services era the term cloud computing has worked its way into the lexicon.
More informationNatural Language Processing SoSe Question Answering. (based on the slides of Dr. Saeedeh Momtazi)
Natural Language Processing SoSe 2015 Question Answering Dr. Mariana Neves July 6th, 2015 (based on the slides of Dr. Saeedeh Momtazi) Outline 2 Introduction History QA Architecture Outline 3 Introduction
More informationRETRACTED ARTICLE. Web-Based Data Mining in System Design and Implementation. Open Access. Jianhu Gong 1* and Jianzhi Gong 2
Send Orders for Reprints to reprints@benthamscience.ae The Open Automation and Control Systems Journal, 2014, 6, 1907-1911 1907 Web-Based Data Mining in System Design and Implementation Open Access Jianhu
More informationEleven+ Views of Semantic Search
Eleven+ Views of Semantic Search Denise A. D. Bedford, Ph.d. Goodyear Professor of Knowledge Management Information Architecture and Knowledge Management Kent State University Presentation Focus Long-Term
More informationData Privacy in Your Own Backyard
White paper Data Privacy in Your Own Backyard Staying Secure Under New GDPR Employee Internet Monitoring Rules www.proofpoint.com TABLE OF CONTENTS INTRODUCTION... 3 KEY GDPR PROVISIONS... 4 GDPR AND EMPLOYEE
More informationUNIVERSITY OF EDINBURGH COLLEGE OF SCIENCE AND ENGINEERING SCHOOL OF INFORMATICS INFR08008 INFORMATICS 2A: PROCESSING FORMAL AND NATURAL LANGUAGES
UNIVERSITY OF EDINBURGH COLLEGE OF SCIENCE AND ENGINEERING SCHOOL OF INFORMATICS INFR08008 INFORMATICS 2A: PROCESSING FORMAL AND NATURAL LANGUAGES Saturday 10 th December 2016 09:30 to 11:30 INSTRUCTIONS
More informationMULTINATIONALIZATION FOR GLOBAL LIMS DEPLOYMENT LABVANTAGE Solutions, Inc. All Rights Reserved.
FOR GLOBAL LIMS DEPLOYMENT 2011 LABVANTAGE Solutions, Inc. All Rights Reserved. OVERVIEW Successful companies leverage their assets to achieve operational efficiencies. By streamlining work processes and
More informationSEO Case Study: How We Increased Traffic from the Conversion Pages by 60% Client: Turkeyhomes.com
SEO Case Study: How We Increased Traffic from the Conversion Pages by 60% Client: Turkeyhomes.com Client Turkeyhomes.com sells extremely high quality real estate in Turkey to a global audience. This company
More informationUniversal Model Framework -- An Introduction
Universal Model Framework -- An Introduction By Visible Systems Corporation www.visible.com This document provides an introductory description of the Universal Model Framework an overview of its construct
More informationNatural Language Processing. SoSe Question Answering
Natural Language Processing SoSe 2017 Question Answering Dr. Mariana Neves July 5th, 2017 Motivation Find small segments of text which answer users questions (http://start.csail.mit.edu/) 2 3 Motivation
More informationCHAPTER4 CONSTRAINTS
CHAPTER4 CONSTRAINTS LEARNING OBJECTIVES After completing this chapter, you should be able to do the following: Explain the purpose of constraints in a table Distinguish among PRIMARY KEY, FOREIGN KEY,
More informationBENEFITS of MEMBERSHIP FOR YOUR INSTITUTION
PROFILE The Fiduciary and Investment Risk Management Association, Inc. (FIRMA ) is the leading provider of fiduciary and investment risk management education and networking to the fiduciary and investment
More informationMapping the library future: Subject navigation for today's and tomorrow's library catalogs
University of Pennsylvania ScholarlyCommons Scholarship at Penn Libraries Penn Libraries January 2008 Mapping the library future: Subject navigation for today's and tomorrow's library catalogs John Mark
More informationER/Studio Enterprise Portal 1.1 New Features Guide
ER/Studio Enterprise Portal 1.1 New Features Guide 2nd Edition, April 16/2009 Copyright 1994-2009 Embarcadero Technologies, Inc. Embarcadero Technologies, Inc. 100 California Street, 12th Floor San Francisco,
More informationThis session will provide an overview of the research resources and strategies that can be used when conducting business research.
Welcome! This session will provide an overview of the research resources and strategies that can be used when conducting business research. Many of these research tips will also be applicable to courses
More informationMAPR DATA GOVERNANCE WITHOUT COMPROMISE
MAPR TECHNOLOGIES, INC. WHITE PAPER JANUARY 2018 MAPR DATA GOVERNANCE TABLE OF CONTENTS EXECUTIVE SUMMARY 3 BACKGROUND 4 MAPR DATA GOVERNANCE 5 CONCLUSION 7 EXECUTIVE SUMMARY The MapR DataOps Governance
More informationFusion Apps Administration: Case Study Utilizing Administration Groups and Target Properties for Efficient Administration
An Oracle White Paper April, 2014 Fusion Apps Administration: Case Study Utilizing Administration Groups and Target Properties for Efficient Administration Executive Overview... 2 Caveats... 3 Customer
More informationCPRE Question Bank. 200 questions with answers and explanations. LN Mishra, CPRE, CBAP, CSM, PMP
CPRE Question Bank 200 questions with answers and explanations LN Mishra, CPRE, CBAP, CSM, PMP Copyright notice All rights reserved. CPRE is registered Trademarks of International Requirements Engineering
More informationMicrosoft SharePoint Server 2013 Plan, Configure & Manage
Microsoft SharePoint Server 2013 Plan, Configure & Manage Course 20331-20332B 5 Days Instructor-led, Hands on Course Information This five day instructor-led course omits the overlap and redundancy that
More information