Taxonomy Tools: Collaboration, Creation & Integration Dave Clarke Global Taxonomy Director dave.clarke@dowjones.com Dow Jones & Company
Introduction Software Tools for Taxonomy 1. Collaboration 2. Creation 3. Integration
Dow Jones Handles massive volumes of data 24x7, every day: over 500,000 documents per day; 10,000+ sources; 22 languages. Expertise to create and maintain a robust taxonomy, including: 310,000+ company codes; 820+ industries; 520+ subjects; 340+ regions. 3.6 million documents/month; 700 feeds; 152 countries; 60-terabyte content server.
Taxonomy Tools
Collaboration WHO needs to get involved? WHAT do they need to do? HOW do they work together?
WHO Cross-functional Team: Categorization, Content Management, Information Technology, Knowledge Management, Knowledge Workers, Library, Metadata, Ontology, Search, Subject Matter Expertise, Taxonomy
WHAT Assess, Design, Build, Maintain: Business Goals; Content; IT; Metadata; Taxonomy; Standards & Best Practices; Audience Segmentation & Definition; Facet Analysis; Information Architecture; Editorial Guidelines & Workflow; Entity Extraction (machine and/or human); Content Tagging Rules (machine and/or human); Taxonomy Construction & Mapping; Continuous Work-in-progress; Engage end users (query log analysis, focus groups, folksonomy); Governance Process; Users
HOW Web workspace Task-oriented Role-based Workflow Governance alerts
HOW Web workspace: Location-independent access for in-house stakeholders and, very often, external consultants and SMEs
HOW Task-oriented: Work-oriented views for teams of people performing different tasks. The flip side of the collaboration coin is compartmentalization.
HOW Role-based: Multiple levels of functional permission for fine-tuning what users can do to particular sets of terms
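One way to picture functional permissions scoped to particular sets of terms is the minimal sketch below. The permission names, role names, users and term sets are all hypothetical illustrations, not features of any specific taxonomy tool.

```python
from enum import Flag, auto


class Permission(Flag):
    """Hypothetical functional permissions on taxonomy terms."""
    VIEW = auto()
    PROPOSE = auto()
    EDIT = auto()
    APPROVE = auto()
    PUBLISH = auto()


# Hypothetical roles, each a bundle of permissions.
ROLE_PERMISSIONS = {
    "sme": Permission.VIEW | Permission.PROPOSE,
    "editor": Permission.VIEW | Permission.PROPOSE | Permission.EDIT,
    "taxonomy_manager": (
        Permission.VIEW | Permission.PROPOSE | Permission.EDIT
        | Permission.APPROVE | Permission.PUBLISH
    ),
}

# A role is granted per user AND per set of terms (assumed here to be a
# top-level branch such as "industries" or "regions").
GRANTS = {
    ("alice", "industries"): "editor",
    ("alice", "regions"): "sme",
    ("bob", "industries"): "taxonomy_manager",
}


def can(user, action, term_set):
    """Return True if the user may perform the action on that set of terms."""
    role = GRANTS.get((user, term_set))
    if role is None:
        return False
    return action in ROLE_PERMISSIONS[role]
```

In this sketch the same person can edit one branch of the taxonomy while only proposing terms in another, which is the fine-tuning the slide describes.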
HOW Workflow: New Candidates; Primary Review; Rejected for Rework; Secondary QC; Deactivated / Deleted; Approved & Published; Withdrawn & Replaced By
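The workflow states listed on this slide can be modeled as a small state machine. The sketch below uses the slide's state names, but the allowed transitions between them are an assumption made for illustration; the slide does not specify them.

```python
from enum import Enum


class State(Enum):
    NEW_CANDIDATE = "New Candidates"
    PRIMARY_REVIEW = "Primary Review"
    REJECTED_FOR_REWORK = "Rejected for Rework"
    SECONDARY_QC = "Secondary QC"
    DEACTIVATED_DELETED = "Deactivated / Deleted"
    APPROVED_PUBLISHED = "Approved & Published"
    WITHDRAWN_REPLACED = "Withdrawn & Replaced By"


# Assumed transitions (not taken from the slide): which states a term
# may move to from each state.
TRANSITIONS = {
    State.NEW_CANDIDATE: {State.PRIMARY_REVIEW},
    State.PRIMARY_REVIEW: {State.SECONDARY_QC, State.REJECTED_FOR_REWORK},
    State.REJECTED_FOR_REWORK: {State.PRIMARY_REVIEW, State.DEACTIVATED_DELETED},
    State.SECONDARY_QC: {State.APPROVED_PUBLISHED, State.REJECTED_FOR_REWORK},
    State.APPROVED_PUBLISHED: {State.WITHDRAWN_REPLACED, State.DEACTIVATED_DELETED},
    State.DEACTIVATED_DELETED: set(),
    State.WITHDRAWN_REPLACED: set(),
}


def advance(current, target):
    """Move a term to the target state, or raise if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    return target


# Walk one candidate term through to publication.
state = State.NEW_CANDIDATE
for step in (State.PRIMARY_REVIEW, State.SECONDARY_QC, State.APPROVED_PUBLISHED):
    state = advance(state, step)
```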
HOW Governance: Design need-to-know reports for each stakeholder group / stage in the workflow
HOW Alerts: Schedule the reports to be generated automatically and to email alerts to designated recipients
Creation (models, methods and trends for building taxonomies) Folksonomies Taxonomies Semantic webs
Classic Taxonomy Classification based Web portals Navigation aids File-folder metaphor Ad-hoc groupings 2-dimensional
Faceted Taxonomy Separate taxonomies for individual attributes Content tagged to facets separately, not pre-coordinated Used as orthogonal search filters n-dimensional
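A minimal sketch of facets acting as orthogonal search filters follows. The documents, facet names and values are made up for illustration. Each document is tagged to each facet separately, and a query simply combines whichever facet filters the user picks.

```python
# Hypothetical documents, each tagged independently on three facets.
DOCS = [
    {"id": 1, "industry": {"Banking"}, "region": {"Europe"}, "subject": {"Mergers"}},
    {"id": 2, "industry": {"Banking"}, "region": {"Asia"}, "subject": {"Earnings"}},
    {"id": 3, "industry": {"Energy"}, "region": {"Europe"}, "subject": {"Mergers"}},
]


def facet_search(docs, **filters):
    """Return ids of documents matching every requested facet value.

    Facets are combined at query time (post-coordination), so any
    combination of filters works without a pre-built compound category.
    """
    return [
        doc["id"]
        for doc in docs
        if all(value in doc.get(facet, set()) for facet, value in filters.items())
    ]


banking = facet_search(DOCS, industry="Banking")
banking_in_europe = facet_search(DOCS, industry="Banking", region="Europe")
european_mergers = facet_search(DOCS, subject="Mergers", region="Europe")
```

No category such as "European banking mergers" has to exist in advance; it emerges from intersecting three independent facets.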
Faceted Taxonomy in Action
Folksonomy Tag Clouds Web 2.0 Folksonomy Un-controlled Un-structured Social tagging User participation Wikis Collaboration Blogs
Pros & Cons Folksonomy lets users create (and adopt) terminology that is meaningful to themselves, but does so at the expense of precision and recall for the general user (meta noise). Controlled vocabularies solve the precision-recall trade-off, but their insistence on preferred terminology imposes a one-size-fits-all order on a heterogeneous user community.
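To make "precision and recall" concrete, here is a small worked sketch using the standard definitions: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that are retrieved. The document ids and the two search scenarios are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query from two sets of doc ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical: four documents are truly about New York City.
relevant_docs = {1, 2, 3, 4}

# Searching a user-created tag such as "nyc" finds one of them plus an
# unrelated document, because other users tagged with different words.
folksonomy_result = precision_recall({1, 9}, relevant_docs)

# Searching the single preferred term finds exactly the four documents.
controlled_result = precision_recall({1, 2, 3, 4}, relevant_docs)
```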
A Middle Path Audience-Centric Taxonomy 1. Segment a user community into audiences. 2. Develop a core taxonomy, but append extensions to it that store the terminology and hierarchy preferences of each audience. 3. Leverage folksonomy and social tagging systems to help inform the evolution of the audience-centric taxonomies.
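A core taxonomy with per-audience extensions could be sketched as below. The concept ids, labels and audience names are hypothetical. The assumption is that an extension stores only what an audience wants to differ from the core, here a preferred label and optionally a different parent.

```python
# Core taxonomy: one shared definition per concept (hypothetical data).
CORE = {
    "s100": {"label": "Corporate Actions", "parent": None},
    "s110": {"label": "Mergers & Acquisitions", "parent": "s100"},
}

# Extensions: per-audience overrides of terminology and hierarchy.
EXTENSIONS = {
    "bankers": {"s110": {"label": "M&A"}},
    "general_readers": {"s110": {"label": "Company Takeovers", "parent": None}},
}


def audience_view(concept_id, audience):
    """Return the concept as a given audience sees it.

    Starts from the core record and applies that audience's overrides,
    if any; unknown audiences simply get the core view.
    """
    concept = dict(CORE[concept_id])
    concept.update(EXTENSIONS.get(audience, {}).get(concept_id, {}))
    return concept
```

Content stays tagged to the core concept id, so every audience retrieves the same documents even though each sees its own label and position in the hierarchy.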
Audience-Centric Views The world of your content Audience-centric views provide access and navigation orientated for different user perspectives Conceptual representation of the content as a semantic web
Semantic Webs & Ontologies Concept-oriented rather than terminology-oriented semantic web Formally defined relationships Extensible concept types & extensible relationship types Resource Description Framework (RDF)
Integration Components Talking to each other RDF Files & Web Service Calls
Components Taxonomy Categorization Search Content
Talking to Each Other n Components (Taxonomy, Content, Categorization, Search), 1 Common integration. W3C RDF-based Open Standards (SKOS & OWL). Web Services: ad hoc transactions and small data sets. XML File Libraries: published versions and large data sets.
EXAMPLE From Idea to Published Output
Whiteboard your entities and relationships. Entities: John Doe, Widgets, ABC Corporation, PQR Corporation, XYZ Corporation, New York. Relationships: Employer Of / Employed By, Manufacturer Of / Manufactured By, Located In / Location Of, Client Of / Vendor To.
Step 1 Design the conceptual structure Concept types Data elements Relationship types Semantic rules
Step 2 Input entities and build relationships Key data via GUI Import Excel files Import XML files
Step 3 Publish: HTML Browser, CSV Download, XML/RDF Export. From whiteboard to published RDF in 30 minutes.
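The three steps can be sketched end to end in a few lines: define relationship types with their inverses, input statements, and publish a CSV download. The entity and relationship names come from the whiteboard slide, but which entity connects to which is an assumption, since the flattened slide does not preserve the diagram's arrows.

```python
import csv
import io

# Step 1: relationship types, each with its inverse.
INVERSE = {
    "Employer Of": "Employed By",
    "Manufacturer Of": "Manufactured By",
    "Located In": "Location Of",
    "Client Of": "Vendor To",
}
INVERSE.update({inverse: forward for forward, inverse in list(INVERSE.items())})

# Step 2: input statements (the pairings below are assumed, not sourced).
STATEMENTS = [
    ("ABC Corporation", "Employer Of", "John Doe"),
    ("ABC Corporation", "Manufacturer Of", "Widgets"),
    ("ABC Corporation", "Located In", "New York"),
    ("PQR Corporation", "Client Of", "ABC Corporation"),
]


def build_graph(statements):
    """Return sorted triples, adding the inverse of every statement."""
    triples = set()
    for subject, relationship, obj in statements:
        triples.add((subject, relationship, obj))
        triples.add((obj, INVERSE[relationship], subject))
    return sorted(triples)


# Step 3: publish as CSV text.
def to_csv(triples):
    """Serialize triples as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["subject", "relationship", "object"])
    writer.writerows(triples)
    return buffer.getvalue()


graph = build_graph(STATEMENTS)
csv_text = to_csv(graph)
```

Entering each relationship once and deriving its inverse automatically is one example of the "semantic rules" mentioned in Step 1.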
Thank You Questions Comments dave.clarke@dowjones.com