Metadata. Week 4 LBSC 671 Creating Information Infrastructures

Muddiest Points
  Memory madness: hard drives, DVDs, solid state disks, tape, ...
  Digitization: images, audio, video, compression, file names, ...
  Where LOCKSS fits in all this
  Digital preservation vs. digitization for preservation

Tonight
  Finishing Preservation
  Metadata

Preserving Behavior
  Word processors: formatting, track changes, undo deleted text
  Spreadsheets: formulas, visualizations
  Databases: queries, forms, derived values
  Computer-Assisted Design (CAD): display, modification, manufacturing
  Software: simulation, games, embedded systems, ...

Behavior Preservation Strategies
  Format migration: for example, convert WordPerfect to PDF
  Emulation: allows running old software on newer systems

Apollo Guidance Computer Emulation http://www.ibiblio.org/apollo/

An Integrated Strategy
  Delay decay of organic materials
    But balance costs and benefits
  Balance quality and scale
    Preservation: rescue at-risk collections
    Access: quantity has a quality all its own
  Design in diversity
    Technologies, risk exposure, institutions
  Adequately resource the process

Tonight
  Finishing Preservation
  Metadata

Two Ways of Searching
  Free-text searcher: construct query from terms that may appear in documents
  Author: write the document using terms to convey meaning
  Indexer: choose appropriate concept descriptors
  Controlled vocabulary searcher: construct query from available concept descriptors
  Content-based query-document matching: query terms against document terms
  Metadata-based query-document matching: query descriptors against document descriptors
  Either kind of matching yields a retrieval status value
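A minimal Python sketch of the two matching paths may help here. The two sample documents, their descriptors, and the use of a simple overlap count as the retrieval status value are illustrative assumptions, not something taken from the lecture.

```python
# Hypothetical mini-collection: free text written by authors, plus
# descriptors an indexer assigned from a controlled vocabulary.
DOCS = {
    "d1": {"text": "growing roses in clay soil",
           "descriptors": {"Gardening", "Roses"}},
    "d2": {"text": "a history of the rose garden",
           "descriptors": {"Gardens--History"}},
}


def content_based(query):
    """Score each document by how many query terms occur in its text."""
    terms = set(query.lower().split())
    return {doc_id: len(terms & set(doc["text"].lower().split()))
            for doc_id, doc in DOCS.items()}


def metadata_based(query_descriptors):
    """Score each document by how many query descriptors were assigned to it."""
    wanted = set(query_descriptors)
    return {doc_id: len(wanted & doc["descriptors"])
            for doc_id, doc in DOCS.items()}
```

In this toy collection the free-text query "roses soil" misses d2 because its text says "rose", which is the kind of vocabulary mismatch that controlled descriptors are meant to avoid.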

Functional Requirements for Bibliographic Records (FRBR)
  Midsummer Night's Dream
  2005 Free for All
  August 23 Performance
  Seat 23G

FRBR Bibliographic User Tasks
  Find it
    Search ("to find")
    Recognize ("to identify")
    Choose ("to select")
  Serve it
    Location ("to obtain")

FRBR Entity Types
  Subject-Only Entities
    (abstract) Concepts
    (tangible) Objects
    (any kind of) Places
    Events
  Subject or Responsibility Entities
    Persons
    Corporate Bodies (~any kind of organization)
    Families (technically, only in FRAD)
  Product Entities
    Works, Expressions, Manifestations, Items
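The four product entities form a chain, which can be sketched as linked Python dataclasses. The class and field names are hypothetical, and assigning the theater example's four lines to the four levels is an assumption made for illustration; the slide text does not label them.

```python
from dataclasses import dataclass


@dataclass
class Work:
    title: str


@dataclass
class Expression:
    work: Work
    realization: str


@dataclass
class Manifestation:
    expression: Expression
    embodiment: str


@dataclass
class Item:
    manifestation: Manifestation
    exemplar: str


def describe(item):
    """Walk from an Item back up to its Work, returning one label per level."""
    m = item.manifestation
    e = m.expression
    return [e.work.title, e.realization, m.embodiment, item.exemplar]


# Assumed mapping of the theater example onto the four levels.
seat = Item(
    Manifestation(
        Expression(Work("Midsummer Night's Dream"), "2005 Free for All"),
        "August 23 Performance"),
    "Seat 23G")
```

Each lower-level entity holds a reference to the one above it, so a description of a single item can always be traced back to the work.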

Functional Requirements for Authority Data IFLA, 2013

FRBR in OCLC's FictionFinder

Dublin Core
  Goals:
    Easily understood, implemented and used
    Broadly applicable to many applications
  Approach:
    Intersect several standards (e.g., MARC)
    Suggest only best practices for element content
  Implementation:
    Initially 15 optional and repeatable elements
    Refined using a growing set of qualifiers
    Now extended to 22 elements

Dublin Core Elements (version 1.1)
  Content
    Title
    Subject [LCSH, MeSH, ...]
    Description
    Type
    Coverage [spatial, temporal, ...]
    Related resource
    Rights
  Instantiation
    Date [Created, Modified, Copyright, ...]
    Format
    Language
    Identifier [URI, Citation, ...]
  Responsibility
    Creator
    Contributor
    Source
    Publisher

Resource Description Framework
  XML schema for describing resources
  Can integrate multiple metadata standards
    Dublin Core, P3P, PICS, vCard, ...
  Dublin Core provides an XML namespace
    DC Elements are XML properties
    DC Refinements are RDF subproperties
    Values are XML content

Dublin Core in RDF/XML
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rdf:Description rdf:about="http://media.example.com/audio/guide.ra">
      <dc:creator>Rose Bush</dc:creator>
      <dc:title>A Guide to Growing Roses</dc:title>
      <dc:description>Describes process for planting and nurturing different kinds of rose bushes.</dc:description>
      <dc:date>2001-01-20</dc:date>
    </rdf:Description>
  </rdf:RDF>
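As a rough illustration of how a program might read such a record, the Python sketch below pulls the Dublin Core elements out with the standard library's XML parser. It is not a general RDF parser: it handles only this flat RDF/XML shape, and the function name and the shortened copy of the record are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Shortened copy of the record, with the case-sensitive names RDF/XML requires.
RECORD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://media.example.com/audio/guide.ra">
    <dc:creator>Rose Bush</dc:creator>
    <dc:title>A Guide to Growing Roses</dc:title>
    <dc:date>2001-01-20</dc:date>
  </rdf:Description>
</rdf:RDF>"""


def dublin_core_fields(rdf_xml):
    """Return {resource URI: [(element name, value), ...]} for dc:* children."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about")
        pairs = []
        for child in desc:
            # ElementTree spells qualified names as "{namespace}local".
            if child.tag.startswith(f"{{{DC_NS}}}"):
                pairs.append((child.tag[len(DC_NS) + 2:], child.text))
        result[about] = pairs
    return result


fields = dublin_core_fields(RECORD)
```

Because every element is optional and repeatable, the sketch keeps a list of pairs per resource instead of a one-value-per-element dictionary.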

Aspects of Metadata
  Framework
    Functional Requirements for Bibliographic Records (FRBR)
  Schema ("Data Fields and Structure")
    Dublin Core
  Guidelines ("Data Content and Values")
    Resource Description and Access (RDA)
    Library of Congress Subject Headings (LCSH)
  Representation (abstract "Data Format")
    Resource Description Framework (RDF)
  Serialization ("Data Format")
    RDF in Extensible Markup Language (RDF/XML)
  Adapted from Elings and Waibel, First Monday, 12(3), 2007

Some Types of Metadata
  Not in Taylor & Joudrey
  Descriptive: content, creation process, relationships
  Technical: format, system requirements
  Administrative: acquisition, authentication, access rights
  Preservation: media migration
  Usage: display, derivative works
  Adapted from Introduction to Metadata, Getty Information Institute (2000)

Metadata Encoding and Transmission Standard (METS)
  Descriptive metadata (e.g., subject, author)
  Administrative metadata (e.g., rights, provenance)
  Technical metadata (e.g., resolution, color space)
  Behavior (which program can render this?)
  Structural map (e.g., page order)
  Structural links (e.g., Web site navigation links)
  Files (the raw data)
  Root (meta-metadata)

Aspects of Metadata
  What kinds of objects can we describe?
    MARC, Dublin Core, FRBR, ...
  How can we convey it?
    MODS, RDF, OAI-PMH, METS
  What can we say?
    LCSH, MeSH, PREMIS, ...
  What can we do with it?
    Discovery, description, reasoning

FRBR Bibliographic User Tasks
  Find it
    Search ("to find")
    Recognize ("to identify")
    Choose ("to select")
  Serve it
    Location ("to obtain")

Broader View of Metadata Uses
  Have it
    Preservation (e.g., PREMIS)
    Validation
    Disposition
  Find it
    Search/Recognize/Choose
    Browse ("Navigation")
  Serve it
    Persistent location
    Structure
    Surrogates
  Use it
    Context
    Rights management
    User behavior capture
    Reasoning ("Semantic Web")

Metadata Sources
  Automated
    Capture
    Extraction
    Classification
  Manual
    Professional
    Community
    Personal

Metadata Capture: Exchangeable Image File Format (Exif)
  Time
  Location
  Camera manufacturer and model
  Camera orientation
  Exposure information (shutter speed, f-stop)
  Thumbnail versions
    Altering the image may not change the thumbnail!

Inconsistent Metadata http://www.umiacs.umd.edu/~oard/rtw/

Metadata Capture: Email
  Message metadata
    Times: sent, resent, received
    Route
    In-reply-to
    Attachment file type
  System metadata
    Folder
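Most of the message metadata listed above sits in standard headers, so it can be captured without reading the body. The following Python sketch uses the standard library's email package; the sample message, addresses, and function name are invented for illustration.

```python
from email import message_from_string

# Invented sample message; only the header names are standard.
RAW = """\
From: alice@example.org
To: bob@example.org
Subject: Re: budget
Date: Tue, 01 Mar 2016 09:30:00 -0500
Message-ID: <2@example.org>
In-Reply-To: <1@example.org>
Received: from mail.example.org by mx.example.net; Tue, 01 Mar 2016 09:30:05 -0500
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XX"

--XX
Content-Type: text/plain

See attached.
--XX
Content-Type: application/vnd.ms-excel
Content-Disposition: attachment; filename="budget.xls"

(placeholder bytes)
--XX--
"""


def email_metadata(raw):
    """Collect time, route, threading, and attachment-type metadata."""
    msg = message_from_string(raw)
    return {
        "sent": msg["Date"],
        # One Received header is added by each server along the route.
        "route": msg.get_all("Received", []),
        "in_reply_to": msg["In-Reply-To"],
        "attachment_types": [part.get_content_type()
                             for part in msg.walk()
                             if part.get_content_disposition() == "attachment"],
    }


meta = email_metadata(RAW)
```

The folder a message is filed in does not appear here because it is system metadata kept by the mail store, not part of the message itself.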

Metadata Capture: Windows File System (NTFS)
  Time file created (or copied)
    Most recent one; optionally journaled
  Time file content changed (or made changeable)
    Most recent one; optionally journaled
  Time file renamed (or moved)
    Most recent one
  Time file metadata created or changed
    Most recent one
  Time file accessed (content or metadata)
    Most recent one; optionally disabled
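A portable subset of these timestamps can be read from Python with os.stat, as in the hedged sketch below. It does not expose every NTFS time on the slide, the dictionary keys are names chosen for this example, and the meaning of st_ctime depends on the platform, as the comment notes.

```python
import os
import tempfile
from datetime import datetime, timezone


def _utc(seconds):
    """Convert a POSIX timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def file_times(path):
    """Return the timestamps the operating system records for a file."""
    st = os.stat(path)
    return {
        "content_modified": _utc(st.st_mtime),
        "accessed": _utc(st.st_atime),
        # Platform-dependent: creation time on Windows, but the time of the
        # last metadata change on Unix-like systems.
        "created_or_metadata_changed": _utc(st.st_ctime),
    }


# Demonstrate on a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"example")
times = file_times(f.name)
os.remove(f.name)
```

Only the most recent value of each time is available this way, which matches the "most recent one" caveat on the slide.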

Metadata Capture: Microsoft Word
  Author
  Title
  Dates (may not agree with file system)
    Created
    Modified
    Accessed
    Printed
  Each tracked change

Metadata Capture: User Behavior
  Behavior      Minimum Scope
  Category      Segment             Object                             Class
  Examine       View, Listen        Select
  Retain        Print               Bookmark, Save, Purchase, Delete   Subscribe
  Reference     Copy/paste, Quote   Forward, Reply, Link, Cite
  Annotate      Mark up             Tag, Publish                       Organize
  Create        Type, Edit

Exploiting Behavioral Metadata http://wsj.com/wtk

Metadata Extraction: Named Entity Tagging
  Machine learning techniques can find:
    Location
    Extent
    Type
  Two types of features are useful
    Orthography
      e.g., paired or non-initial capitalization
    Trigger words
      e.g., Mr., Professor, said, ...
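The two feature types can be made concrete with a small Python sketch. The feature names, the reading of "paired" capitalization as two adjacent capitalized tokens, and the three-word trigger list are assumptions for illustration; a real tagger would feed features like these to a trained classifier.

```python
# Trigger words taken from the slide; a practical list would be much longer.
TRIGGER_WORDS = {"mr.", "professor", "said"}


def token_features(tokens, i):
    """Orthographic and trigger-word features for tokens[i]."""
    tok = tokens[i]
    prev_tok = tokens[i - 1] if i > 0 else ""
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
    capitalized = tok[:1].isupper()
    return {
        # Capitalized somewhere other than the start of the sentence.
        "non_initial_cap": capitalized and i > 0,
        # Capitalized with a capitalized neighbor (assumed meaning of "paired").
        "paired_cap": capitalized and (prev_tok[:1].isupper()
                                       or next_tok[:1].isupper()),
        "after_trigger": prev_tok.lower() in TRIGGER_WORDS,
        "before_trigger": next_tok.lower() in TRIGGER_WORDS,
    }


tokens = ["Yesterday", "Professor", "Jane", "Smith", "said", "hello"]
jane = token_features(tokens, 2)
smith = token_features(tokens, 3)
```

Here "Jane" follows a trigger word and "Smith" precedes one, which together hint at where the name starts and ends (its location and extent).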

Metadata Sources
  Automated
    Capture
    Extraction
    Classification
  Manual
    Professional
    Community
    Personal

Community Metadata: Folksonomies

Community Metadata: Games With a Purpose
  von Ahn and Dabbish, CHI 2004

Community Metadata: Crowdsourcing

Sources of File Type Metadata
  Capture:
    MyDocument.xls
    Attachment MIME type
  Extraction
    Magic bytes
  Classification
    Machine learning on byte sequences
  Manual
    Mechanical Turk
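The first two sources can be contrasted in a few lines of Python: capture trusts the file name, while extraction inspects the leading "magic" bytes of the content. The signature table below is a tiny hand-picked sample with descriptive labels chosen for this sketch, not an authoritative registry.

```python
import mimetypes

# A few well-known file signatures; real tools use far larger tables.
MAGIC = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"PK\x03\x04", "ZIP container (also .docx/.xlsx)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy .doc/.xls)"),
]


def type_from_name(filename):
    """Capture: guess a MIME type from the file name extension alone."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime


def type_from_magic(data):
    """Extraction: identify the format from the first bytes of the content."""
    for signature, label in MAGIC:
        if data.startswith(signature):
            return label
    return None
```

The two can disagree: a PDF renamed to MyDocument.xls still starts with %PDF-, so the extension-based guess and the magic-byte result point to different formats.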

Metadata Challenges
  Balancing cost and benefit
  Accommodating dynamic factors
    Content
    Location
  Reuse for unanticipated purposes
  Remaining interpretable in the far future

Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
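An OAI-PMH request is an ordinary HTTP request whose query string carries a verb and its arguments. The Python sketch below only builds such a URL; the repository address and set name are hypothetical, while the ListRecords verb and the oai_dc metadata prefix (unqualified Dublin Core) come from the protocol itself.

```python
from urllib.parse import urlencode


def oai_request(base_url, verb, **arguments):
    """Build an OAI-PMH request URL; nothing is sent over the network."""
    return base_url + "?" + urlencode({"verb": verb, **arguments})


BASE = "https://repository.example.org/oai"  # hypothetical endpoint
url = oai_request(BASE, "ListRecords", metadataPrefix="oai_dc", set="theses")
```

A harvester would fetch this URL and parse the XML response, repeating the request with a resumption token when the repository splits a long list into parts.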

Linked Open Data

Web Ontology Language (OWL)
  <owl:Class rdf:about="http://dbpedia.org/ontology/Astronaut">
    <rdfs:label xml:lang="en">astronaut</rdfs:label>
    <rdfs:label xml:lang="de">Astronaut</rdfs:label>
    <rdfs:label xml:lang="fr">astronaute</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://dbpedia.org/ontology/Person">
    </rdfs:subClassOf>
  </owl:Class>
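The subclass statement is what makes simple reasoning possible: anything typed as an astronaut can also be treated as a person. The Python sketch below follows subclass links by hand under the assumption that each class has at most one parent; the dbo: and ex: prefixes are shorthand, and the Person-to-Agent link is hypothetical, added only to show that the relation is transitive.

```python
SUBCLASS_OF = {
    "dbo:Astronaut": "dbo:Person",  # stated in the OWL example
    "dbo:Person": "ex:Agent",       # hypothetical, for illustration only
}


def superclasses(cls):
    """Follow subclass links upward, stopping if a class would repeat."""
    found = []
    while cls in SUBCLASS_OF and SUBCLASS_OF[cls] not in found:
        cls = SUBCLASS_OF[cls]
        found.append(cls)
    return found


def is_a(declared_type, cls):
    """True if a resource of declared_type also belongs to cls."""
    return cls == declared_type or cls in superclasses(declared_type)
```

A search for persons could use is_a to include resources that were only ever described as astronauts, which is the kind of inference a Semantic Web reasoner automates.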

Semantic Web Search

Putting It All Together
  Adapted from Elings and Waibel, First Monday, 12(3), 2007

Before You Go!
  On a sheet of paper (no names), answer the following question:
  What was the muddiest point in today's class?