Catching the wave: Tools and Technology for Taxonomists. Taxonomy Bootcamp, Washington DC, November 6, 2018

Dave Clarke, CEO, Synaptica

Agenda
Three questions:
Q1. What are taxonomists doing today?
Q2. How can new technology help?
Q3. What tools do you need to succeed?
Technology topics: AI, Linked Data, Ontology, Semantic Web, Knowledge Graphs, Blockchain, Big Data

Preface
What is it we taxonomists do?

1. Organize
Taxonomies are Knowledge Organization Systems. When we do taxonomy we use industry-standard data models to centralize and standardize the terminology used in our enterprise. We define and unambiguously label enterprise terminology, and then we organize it into hierarchical and associative concept schemes, which we call taxonomies or ontologies. These schemes help us to understand how concepts, people, places, products, processes and organizations all relate to one another.

2. Categorize
Categorization enables an enterprise to retrieve, sort and rank content based on what it is about. Concepts in taxonomies provide the metadata values for tagging documents and database records. When we categorize content we use the semantics of our taxonomies plus contextualization rules to determine meaning and rank the relevance of content. Taxonomy builds a bridge between the language people use to search and browse and the language found in documents.
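As a rough illustration of how concept labels can bridge search language and document language, here is a minimal Python sketch. The mini-taxonomy, its labels and the function names are all hypothetical, and counting label hits is only a crude stand-in for real relevance ranking.

```python
import re

# Hypothetical mini-taxonomy: each concept (preferred label) maps to its
# alternative labels. All names here are illustrative assumptions.
TAXONOMY = {
    "Automobiles": ["car", "cars", "automobile", "motor vehicle"],
    "Electric vehicles": ["ev", "evs", "electric car", "electric cars"],
}


def tag_document(text, taxonomy):
    """Return {concept: hit_count} for concepts whose labels occur in text."""
    lowered = text.lower()
    hits = {}
    for concept, alt_labels in taxonomy.items():
        labels = [concept.lower()] + [label.lower() for label in alt_labels]
        count = 0
        for label in labels:
            # Whole-word match so "ev" does not fire inside "every".
            pattern = r"\b" + re.escape(label) + r"\b"
            count += len(re.findall(pattern, lowered))
        if count:
            hits[concept] = count
    return hits


def rank_tags(hits):
    """Order concepts by hit count (ties broken alphabetically)."""
    return sorted(hits, key=lambda concept: (-hits[concept], concept))
```

Calling tag_document("Cars and motor vehicle sales", TAXONOMY) tags the text with the concept Automobiles even though the preferred label itself never appears in the text.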

3. Discover
The end goal of what we do is to help people retrieve more complete and accurate information and to discover latent knowledge. Our work isn't finished if we stop at categorization. We need to work with search teams and information architects to design and deliver semantic search and rich end-user discovery experiences, including browsable navigation, faceted query refinement, and the ability to recommend related content.

Taxonomy Practitioners Survey
Q1. What are taxonomists doing today?
The online survey takes 10 minutes; follow the link from the pinned tweet at https://twitter.com/davidclarkeblog
Participate in this survey during the conference and we will email you the revised results.

Taxonomy Practitioners Survey
Q1. How does your enterprise currently make use of taxonomies?
Faceted query refinement: 66%
Improve search accuracy, Browsable navigation, Lookup lists & glossaries, Content classification: 42%, 44%, 60%, 66% (values as listed on the slide)
Recommend related content: 36%
Product Information Management: 28%
Machine reasoning: 18%
Sentiment analysis: 6%
Chatbots and conversational search: 6%
Slide annotations: What we would expect (core taxonomy applications). Under-exploited (room for greater adoption).
Participate today using the link from the pinned tweet at https://twitter.com/davidclarkeblog
Synaptica LLC, 2018, www.synaptica.com

Taxonomy Practitioners Survey
Q2. Do your taxonomies contain general categories or specific concepts and names?
Highly specific categories and subjects, Broad categories and topics, Individual named entities: 48%, 58%, 70% (values as listed on the slide)
Slide annotation: Even mix of people doing broad-level categorization and highly specific subject indexing.
Q3. How general or specific is your content tagging?
Document tagging: 78%
In-line tagging: 22%
Slide annotation: Document retrieval more prevalent than page-level access.
Q4. Where do you store your tagging data?
Metadata in content systems, Metadata in search engines, Full text in-line document mark-up: 70%, 48%, 6% (values as listed on the slide)
Slide annotation: Most taxonomy ends up as metadata in CMS / search.

Taxonomy Practitioners Survey
Q5. How are your lexicons / taxonomies / thesauri / ontologies structured?
Hierarchy, Preferred and alternative labels, Associative relationships, Disambiguators: 38%, 44%, 50%, 86% (values as listed on the slide)
Specialized ontology relationships: 22%
Slide annotations: What we would expect (core taxonomy). Under-exploited (room for greater adoption).
Q6. How public or private are your taxonomies?
Within a company intranet, Public-facing, Shared with trusted partners, Available for adoption and reuse, Linked Open Data: 12%, 14%, 20%, 42%, 52% (values as listed on the slide)
Personal prediction: semantic web and automation will drive greater sharing of taxonomies, even between commercial enterprises.

Taxonomy Practitioners Survey
Q7. How do you create your taxonomies and/or use third-party taxonomies?
Create our own in-house: 96%
Reference third-party taxonomies, Map to third-party taxonomies, Public-domain taxonomies, Linked Open Data taxonomies, Tag using third-party taxonomies, Commercial licenses, Outsource to consultants: 4%, 4%, 14%, 10%, 10%, 16%, 26% (values as listed on the slide)
Slide annotation: Most taxonomy still done within and specifically for the enterprise.
Personal prediction: much greater adoption of LOD taxonomies.
Q8. Which industry standards do you use for your taxonomies and/or for tagging?
ANSI-NISO Z39.19 / ISO 25964: 44%
Dublin Core, W3C Linked Data, W3C SKOS, Schema.org, W3C OWL, W3C SKOS-XL: 2%, 10%, 14%, 18%, 20%, 32% (values as listed on the slide)
Slide annotations: Helps your metadata drive SEO. Minuscule adoption, but this spec. means a lot to academic publishers and life sciences. Enabler for machine reasoning, but low adoption because of its complexity.

Taxonomy Practitioners Survey
Q9. What processes do you use to tag content?
In-house professional indexers, Self-tag content using taxonomies: 42%, 44% (values as listed on the slide)
Self-tag using free-text keywords: 34%
Tagging images, audio, and video: 30%
Editable auto-categorization rules, Human-supervised auto-categorization, Fully automated categorization, Training-sets and machine learning, Outsourced content tagging: 6%, 10%, 10%, 18%, 20% (values as listed on the slide)
Slide annotations: Human tagging dominant. Rules-based, human-supervised auto-categorization more prevalent than fully automated or machine learning processes.

Taxonomy Practitioners Survey
Q10. Which emerging technologies do you think are already or will soon impact your enterprise?
AI / ML / Deep Learning: 72%
Linked Data: 58%
Big data and data science, Semantic Web, Graph databases: 36%, 54%, 52% (values as listed on the slide)
Chatbots and conversational search: 20%
Internet of Things (IoT): 18%
Blockchain: 18%
Slide annotations: No surprises given the hype around AI, but is the hype justified? Linked Data identified as second highest impact on the enterprise: are you prepared?

Taxonomy Practitioners Survey
Survey Insights
70% of enterprises are building traditional taxonomies and thesauri
18% are doing machine reasoning
Even mix doing broad-bucket categorization versus highly granular subject indexing
Most (80%) tagging at the document level, fewer (20%) indexing to page/paragraph
Over 40% of enterprise taxonomies are public facing
What tech do people think will impact them most: AI followed by Linked Data
But, currently there is low adoption (10%) of ontologies and graphs that enable AI
Low adoption of schema.org, despite it being key to SEO for public-facing content
Big surprise: most tagging still being done by humans
Most common aspiration is for better quality auto-categorization

New Technology
Q2. How can new technology help?
AI, Ontologies, Linked Data
Decentralized web, Metadata management, Records management, Provenance & Governance, IP rights management, Supply chain management
https://medium.com/@jimmysong/why-blockchain-is-hard-60416ea4c5c

AI & Human Intelligence
AI: an Anecdotal Review
Source: http://www.taxonomybootcamp.com/london/2018/latestnews.aspx?id=1632
How might the development of new technologies such as AI affect how we do our work as information professionals?
Primarily it should make some of the leg work easier. Being able to process a large amount of data in a shorter time than a human could is going to be very helpful when it comes to the day-to-day work that we do. Machine learning and AI should also help us spot patterns in information that we may not otherwise notice - and it can help us simulate what the consequences of particular decisions might be.
However, machine learning and AI won't be a silver bullet - we've already seen examples of algorithms being applied with unfortunate consequences, or machine learning classifying images in a way that is problematic when we consider culture, race and politics. So, we'll need to be aware of the limits of what is possible, manage expectations, and take on new responsibilities for helping machines understand our world, our biases, and our morality.
What's the most exciting change you've seen in the industry in the last few years?
The deployment of knowledge graphs and non-visual interfaces have really brought the value of structured data to the fore. Voice and conversational interfaces are obviously very in vogue at the moment, and I think an under-appreciated aspect of these has to be the importance of structured data in powering these. I think more and more people are seeing the benefits of structured data in things like Google's Knowledge Graph, Amazon's ability to tell you which actors are on screen during a show you're watching, and of course the perennial joy of doing a deep dive on Wikipedia.
Whilst big data and machine learning might be hogging the limelight at the moment, I think that the work of taxonomists and those who architect and develop structured data is quietly, gradually, revolutionising the kinds of things we do with computers and the Internet.

AI & Human Intelligence
"We've increasingly come across organizations that have been promised Artificial Intelligence (AI) capabilities, but have not realized them. The message we consistently hear, however, is that these [AI] tools haven't lived up to the promise. Though the demos are impressive, the reality is deflating." Zach Wahl, Enterprise Knowledge
"A fully automated generation of ontologies from text corpora is not possible and won't be possible in the next couple of decades. This is the same kind of A.I. promise that has failed many times before. Generation and maintenance of taxonomies and ontologies will always remain to the realm of human beings and their knowledge of the world." Andreas Blumauer, PoolParty
"Hi Dave, Smarter AI requires even smarter data, but how does that data stay sharp in a rapidly changing industry? Sometimes it takes a human touch to bridge the gap, and curated crowds are the perfect source for that bit of human tuning. Behind all this AI, humans still need to touch the data. We still need human power to teach computers the subtleties of identifying data, language, dialects, and so on." Kerri Reynolds, SVP Human Resources & Crowdsourcing, Appen
"Synaptica: What do you think are the biggest challenges in the future? Hedden: Growing interest in AI and machine learning and its impact on indexing and tagging. I feel there are limits in automatic technology and taxonomy creation." Heather Hedden, author of The Accidental Taxonomist

AI Interactive Primer
There are many different types of AI, and different types are more or less suited to different business applications; some are simply not relevant to taxonomy and semantics. Learn about the different use cases with this interactive online guide from McKinsey: https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/an-executives-guide-to-ai

AI take away
AI is very relevant to what we do, but it will not replace the need for human-curated taxonomies or ontologies. On the contrary, it is taxonomies and ontologies that will empower AI with the semantics and logic to improve search, categorization and machine reasoning. Make sure management in your organization understands this.

Taxonomy & Ontology Analogy
Do you need ontologies? It depends on the type of problem you are trying to solve. An analogy: taxonomies have a singular top-down way of organizing information, which is great for classification or guided browse experiences. Now try modelling the Washington Metro as a taxonomy: it won't work, because it is a graph. Ontologies are graphs; they support many points of entry and many alternative pathways.
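The tree-versus-graph distinction in this analogy can be sketched in a few lines of Python. The concepts and the station names "A" to "D" below are hypothetical placeholders, not real Metro stations, and the function names are assumptions made for illustration.

```python
# A taxonomy is a tree: every concept has exactly one parent, so there is
# exactly one top-down path to it. (Illustrative concepts only.)
TAXONOMY_PARENT = {
    "Vehicles": None,
    "Cars": "Vehicles",
    "Electric cars": "Cars",
}

# A transit network is a graph: stations connect in several directions and
# loops are possible. Station names are hypothetical placeholders.
METRO_LINKS = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C"},
}


def path_to_root(concept, parent_of):
    """Walk the single chain of broader concepts up to the root."""
    path = [concept]
    while parent_of[path[-1]] is not None:
        path.append(parent_of[path[-1]])
    return path


def count_simple_paths(graph, start, goal):
    """Count routes from start to goal that never revisit a station."""
    count = 0
    stack = [(start, {start})]
    while stack:
        node, seen = stack.pop()
        if node == goal:
            count += 1
            continue
        for neighbour in graph[node]:
            if neighbour not in seen:
                stack.append((neighbour, seen | {neighbour}))
    return count
```

In the tree there is only ever one route from a concept to the root, while the small network already offers two routes from "A" to "D"; forcing it into a single-parent hierarchy would discard one of them.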

Taxonomy & Ontology
The boundary between taxonomy and ontology can be a confusing grey area; distinguishing property vocabularies (ontology) from value vocabularies (taxonomy) may help. We take a holistic approach and use ontology to design and build taxonomy.

Linked Data
Linked Open Data (LOD), shared openly: a huge and growing set of reusable public domain ontologies and taxonomies.
Linked Enterprise Data (LED), behind the firewall: enterprises can benefit from the data model even when their data is not open.

Ontology & Linked Data take aways
Ontologies and Linked Data are highly relevant and offer many benefits, including:
+ Build smarter search and discovery applications by leveraging the logical dependencies defined by ontologies.
+ Reduce costs and speed up project deliverables by reusing a vast and rapidly growing library of public domain property vocabularies (ontologies) and value vocabularies (taxonomies).
+ Simplify systems integrations by adopting open industry standard data models and portable data interchange formats.

Tools
Q3. What tools do you need to succeed?
Designing and building standards-compliant taxonomies and ontologies and developing good categorization rules isn't easy; good software tools will simplify the complexity.
Taxonomy, Ontology and Linked Data Management Systems
Text Analytics, Auto-Categorization & Human Tagging Systems
Semantic Search, Recommenders, Visualization & Chatbots

Taxonomy Tools
What to look for in Taxonomy Management Systems:
Drag-and-drop editability and the ability to switch seamlessly between taxonomy editing and categorization rule editing
Collaboration & governance
Automated management reports
See also Synaptica's Top 100 Features Checklist at https://www.synaptica.com/resources/

Ontology & Linked Data Tools
What to look for in Ontology & Linked Data Management Systems:
An extensible library of public domain ontologies
Design your own taxonomy schemes using plug-and-play ontologies
Easy-to-use UI/UX to simplify the complexities of generating standards-compliant RDF triples and knowledge graphs
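To give a feel for what such tools generate behind the UI, here is a minimal sketch that writes one concept as RDF triples in N-Triples syntax using the W3C SKOS vocabulary. The example.org namespace, the concept IDs and the function names are hypothetical; a real system would use an RDF library and handle many more cases.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/taxonomy/"  # hypothetical namespace (assumption)


def concept_triples(concept_id, pref_label, alt_labels=(), broader=None):
    """Describe one concept as (subject, predicate, object) triples."""
    subject = BASE + concept_id
    triples = [
        (subject, RDF_TYPE, ("uri", SKOS + "Concept")),
        (subject, SKOS + "prefLabel", ("literal", pref_label)),
    ]
    for label in alt_labels:
        triples.append((subject, SKOS + "altLabel", ("literal", label)))
    if broader is not None:
        # skos:broader points at the parent concept in the hierarchy.
        triples.append((subject, SKOS + "broader", ("uri", BASE + broader)))
    return triples


def to_ntriples(triples, lang="en"):
    """Serialize triples as N-Triples lines (escapes only \\ and \")."""
    lines = []
    for subject, predicate, (kind, value) in triples:
        if kind == "uri":
            obj = "<" + value + ">"
        else:
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = '"' + escaped + '"@' + lang
        lines.append("<%s> <%s> %s ." % (subject, predicate, obj))
    return "\n".join(lines)
```

For example, to_ntriples(concept_triples("cars", "Cars", ["Automobiles"], broader="vehicles")) yields four lines: a type statement, a preferred label, an alternative label and a broader-concept link.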

Categorization Tools
What to look for from Categorization & Text Analytics Tools:
Categorization rules that are transparent and easy to edit without having to learn esoteric syntax: a no-black-box principle allowing users to understand how rules work and quickly refine them.
Seamless integration between taxonomy management and categorization management workflows: a no-silo approach to reduce complexity and improve productivity.
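One way to picture the no-black-box principle is a rule format that is plain data plus an explanation for every match. The sketch below is an assumption made for illustration, not any vendor's rule syntax; the rule fields all_of and none_of and the example concepts are hypothetical.

```python
import re

# Each rule is plain data a taxonomist can read and edit: a target concept,
# terms that must all appear, and terms that veto the match.
RULES = [
    {"concept": "Jaguar (animal)", "all_of": ["jaguar"],
     "none_of": ["engine", "dealership"]},
    {"concept": "Jaguar (car brand)", "all_of": ["jaguar", "engine"],
     "none_of": []},
]


def categorize(text, rules):
    """Return (concept, explanation) pairs for every rule that fires."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    results = []
    for rule in rules:
        missing = [term for term in rule["all_of"] if term not in words]
        vetoes = [term for term in rule["none_of"] if term in words]
        if not missing and not vetoes:
            # The explanation shows the user why the rule matched.
            reason = "matched " + ", ".join(rule["all_of"])
            results.append((rule["concept"], reason))
    return results
```

Because each result carries its reason, a user who disagrees with a tag can see which terms triggered it and edit the rule directly.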

Tools
What to look for from Search and Discovery Tools:
NLP to handle natural language queries.
Semantic search to improve precision and recommend related content.
Taxonomy-driven IA including faceted navigation and query refinement.
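Faceted query refinement can be sketched as filtering tagged records and recounting the remaining facet values. The documents, facet names and function names below are hypothetical; a real search engine would compute these counts inside its index.

```python
from collections import Counter

# Hypothetical tagged documents; "topic" and "format" act as facets.
DOCS = [
    {"title": "Doc 1", "topic": "Taxonomy", "format": "Slides"},
    {"title": "Doc 2", "topic": "Taxonomy", "format": "Article"},
    {"title": "Doc 3", "topic": "Ontology", "format": "Slides"},
]


def refine(docs, **selected):
    """Keep only documents matching every selected facet value."""
    return [
        doc for doc in docs
        if all(doc.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(docs, facet):
    """Count how many documents carry each value of one facet."""
    return Counter(doc[facet] for doc in docs)
```

After refine(DOCS, topic="Taxonomy"), calling facet_counts on the result for "format" gives the counts a user would see next to each remaining refinement option.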

Tools take away
Software tools exist to help taxonomists and indexers be more effective and productive; they must simplify the complex. They must use open industry standards for data portability and interoperability. They should help taxonomists to become experts without having to learn esoteric code. Search needs to take advantage of the semantics of taxonomy and ontologies to deliver smarter applications and a richer knowledge discovery experience.

Thank You! And remember to take the survey.
Dave Clarke, CEO, Synaptica