Metadata at Statistics Canada. Canadian Metadata Forum September 19-20, 2003


Corporate metadata at Statistics Canada: the Integrated Metadatabase (IMDB). A collection of facts about each of Statistics Canada's 400+ surveys, aimed at helping human users interpret statistical data. It covers survey description, methodology, concepts and variables measured, and data quality.

Project status: The database, implemented in November 2000, covers survey description, methodology and data quality. It is published on the STC website, with daily updates. Extensive efforts are being made to improve metadata quality.

What is the IMDB based on? ISO 11179, Specification and Standardization of Data Elements, a six-part standard:
1. Framework for specification and standardization
2. Classification
3. Basic attributes
4. Rules and guidelines for formulation of definitions
5. Naming and identification
6. Registration
Attributes include: identifying, definitional, relational (classification schemes, keywords), representational and administrative.

What is the IMDB based on? The model for the Corporate Metadata Repository (CMR) of the US Census Bureau, by Dan Gilman. Earlier work by Bo Sundgren of Statistics Sweden. ANSI 3.285, which shows the structure and relationship among parts 1 to 6 of ISO 11179. What is ANSI 3.285? ANSI 3.285 consists of six regions. I will describe each region.

Administered Component: Any item that is defined and may be reused or shared. Any item requiring registration. Administered component = any item that is managed, tracked and organised.

Administered Component (diagram: Organization, Contact, Documentation, Identification, Time Frame, Keyword, Theme). These are the items required to administer a component, that is, to manage, track and organise it.

Stewardship (diagram: Organization, Contact, Documentation): the administration aspect of the component. The stewardship region supports the administration aspect of the components: the organisations that interact with the component, the contacts within each organisation, and the supporting documentation of the component. The organisation roles are:
responsible organisation = the owner;
submitting organisation = the organisation that submits the component for registration;
registrar = the organisation that registers the component.

Identification (diagram adds: Identification, Time Frame): naming and identification. The identification region manages the names of the component. Time Frame was added here to give a time context to the component identified. The Time Frame can also be part of the Stewardship region.

Classification (diagram adds: Keyword, Theme): management of classification schemes. The classification region covers the management of classification schemes and the management of components that are themselves classification schemes. Keywords are a list of preferred terms from the Library's thesaurus. The Themes list also comes from the Library.

ISO 11179 (diagram adds: Data Element, Data Element Concept, Object Class, Property, Conceptual Domain, Value Domain). With respect to ISO 11179, this region is the management, organisation and tracking of the Data Element (DE) and each of the components that form it. The shadowed components in the diagram are Administered Components (AC). They are defined, registered and can be reused or shared. They are sub-types of the AC and therefore share all the relationships of the AC: Organization, Contact, Documentation, etc. For example, the Data Element Concept (DEC), in addition to its relationships with Object Class and Property, has relationships with Organization, Contact, Documentation and the remaining items in the AC box. Note that Documentation, which is part of the Stewardship region, is also an AC. Earlier I mentioned that ANSI 3.285 was extended to include the statistical activities. What does this mean?
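The sub-typing described in these notes can be sketched in code. The following is a minimal illustration only, not the IMDB schema: the class and field names, and the organisation names "Registrar A" and "Division X", are hypothetical. It shows that a Data Element Concept, being a sub-type of Administered Component, carries the stewardship relationships in addition to its own links to Object Class and Property.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AdministeredComponent:
    """Any registered item that is managed, tracked and organised."""
    name: str
    # Stewardship relationships (hypothetical field names).
    responsible_organization: Optional[str] = None  # the owner
    submitting_organization: Optional[str] = None   # submits the item for registration
    registrar: Optional[str] = None                 # registers the item
    contacts: List[str] = field(default_factory=list)
    documentation: List[str] = field(default_factory=list)


@dataclass
class ObjectClass(AdministeredComponent):
    """Sub-type of AC: inherits every stewardship relationship."""


@dataclass
class Property(AdministeredComponent):
    """Sub-type of AC: inherits every stewardship relationship."""


@dataclass
class DataElementConcept(AdministeredComponent):
    """Sub-type of AC that also links to an Object Class and a Property."""
    object_class: Optional[ObjectClass] = None
    prop: Optional[Property] = None


# Illustrative values only.
establishment = ObjectClass("establishment", registrar="Registrar A")
sales = Property("sales", registrar="Registrar A")
dec = DataElementConcept(
    "sales of establishment",
    responsible_organization="Division X",
    object_class=establishment,
    prop=sales,
)
```

Because every sub-type inherits from the same base class, a registry could list, search or audit all components uniformly, whatever their specific type.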

Data Element Administration (diagram adds: definition, representation, permissible values). The Data Element Administration region covers the management of a DE. A DE is a unit of data with definition attributes, identification attributes, representation attributes and permissible-values attributes. DEC = Object Class + Property (no representation). DE = DEC + CD + VD (that is, DEC + representation), where CD is the Conceptual Domain and VD is the Value Domain. These are the components required to fully define a shareable DE.
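The two compositions in these notes (DEC = Object Class + Property; DE = DEC + CD + VD) can be written down as a small sketch. This assumes plain strings for each component; the type names, the `is_permissible` helper and the example values are hypothetical and are not taken from the IMDB.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataElementConcept:
    """DEC = Object Class + Property, with no representation."""
    object_class: str
    prop: str


@dataclass(frozen=True)
class ValueDomain:
    """VD: the permissible values, tied to one conceptual domain (CD)."""
    name: str
    conceptual_domain: str
    permissible_values: Tuple[str, ...] = ()  # empty tuple = non-enumerated


@dataclass(frozen=True)
class DataElement:
    """DE = DEC + CD + VD; the CD is reached through the VD here."""
    concept: DataElementConcept
    value_domain: ValueDomain

    @property
    def conceptual_domain(self) -> str:
        return self.value_domain.conceptual_domain


def is_permissible(element: DataElement, value: str) -> bool:
    """Check a value against an enumerated VD; non-enumerated VDs accept anything."""
    allowed = element.value_domain.permissible_values
    return not allowed or value in allowed


# Illustrative values only.
dec = DataElementConcept(object_class="person", prop="occupation")
vd = ValueDomain("occupation groups", "occupations", ("management", "health", "sales"))
de = DataElement(dec, vd)
```

Keeping the DEC free of any representation is what allows the same concept to be reused with different value domains.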

Statistical activities (diagram adds: Statistical Activity: Survey, Universe, Frame, Survey Instance, Instrument, Data File; Methodology: Sample Design, Instrument Design, Collection, Error Detection, Imputation, Estimation, Time Series, Quality Evaluation, Quality Measures, Confidentiality). It means that all the components of the statistical activities are also managed, organised and tracked as administered components. In addition to the relationships among the components themselves, they also have a relationship with the AC. Statistical Activity = Survey, Universe, etc. Methodology = Sample Design, Instrument Design, etc. Each of these is an AC, but because of space constraints they have been grouped together. Some of the relationships are displayed as green arrows, e.g. for Survey. Examples of Data Element links are:
Questions are asked for Data Elements.
Data Files contain Data Elements.
Universes are defined by DECs.
Frames are defined by DECs.
This concludes the overview of ANSI 3.285 and ISO 11179. (ANSI 3.285 shows the structure and relationship among parts 1 to 6 of ISO 11179.)

Next phase of IMDB: Extending the content of the IMDB database to include the concepts, variables and classifications published for every STC survey. The focus is first on data published through CANSIM.

Expected benefits: Provides the metadata most frequently cited as missing in recent market research. Fulfils the requirements of the Policy on Informing Users of Data Quality and Methodology.

Additions to STC web site: For every survey, a list of the variables published, with hyperlinks to definitions, the classification used and the source of on-line data (CANSIM, Daily, Canadian Statistics table). On the Statistical Methods page, a searchable list of all variables, with links to definitions, classifications and the source of on-line data. New hyperlinks in CANSIM to definitions stored in the IMDB.

Implications: To be stored in the IMDB, information on variables must be consistently structured. To be listed in web pages, variables must be meaningfully named. To be most effectively searchable, variables must be consistently named.

Structure of information on variables in IMDB: Statistical unit + property + representation = variable. The statistical unit is the agent, event or item about which data are produced. The property is the characteristic of the statistical unit being measured. The representation is the form given to the resulting data, e.g. Quantity, Value, Type.

Naming convention: All three elements are used to create the name of the variable. Examples:
Value of sales of establishment
Type of assets of establishment
Name of geographic location of person
Type of occupation of person
Value of GDP of economy
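The convention above is mechanical enough to sketch as a function. This is an illustration under stated assumptions: the function name `variable_name` and its argument order are hypothetical, and the IMDB may well build names differently in practice.

```python
def variable_name(representation: str, prop: str, statistical_unit: str) -> str:
    """Build a name of the form '<Representation> of <property> of <statistical unit>'."""
    parts = [part.strip() for part in (representation, prop, statistical_unit)]
    if not all(parts):
        raise ValueError("representation, property and statistical unit are all required")
    rep, prop_text, unit = parts
    # Capitalise only the first letter so acronyms such as GDP are left untouched.
    return f"{rep[0].upper()}{rep[1:]} of {prop_text} of {unit}"


print(variable_name("value", "sales", "establishment"))
print(variable_name("type", "occupation", "person"))
```

Generating names from the three elements, rather than typing them freehand, is one way to keep variables consistently named and therefore searchable.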

IMDB Phase III Data Element Model: Object Class. A set of ideas, abstractions, or things in the real world that can be identified with explicit boundaries and meaning, and whose properties and behavior follow the same rules. At STC, the object class is the statistical unit. It can be an agent, event or item.

Macro Statistical Units: In order to comprehensively cover the data and information published by Statistics Canada, different views are accommodated within the framework. Four Macro Statistical Units were chosen to provide four different views within the framework.

The four views: The Macro Statistical Units are People, Economy, Environment and The State. The four views divide the framework into four different sections.

Fundamental Statistical Units: Definition. Fundamental Statistical Units are defined as those that are not types of any other unit and cannot be derived as a grouping of any other unit. Fundamental Statistical Units keep the model simple and robust by limiting and organising the number of Statistical Units.

Fundamental Statistical Unit: Types.
Agents: Statistical Units that operate and whose operations are reported on by Statistics Canada.
Events: Statistical Units that represent the actions of (or by) Agents as reported by Statistics Canada. Events are defined as occurrences that are discrete in time (they occur in a time period) and finite (they can be counted).
Items: Other Statistical Units reported on by Statistics Canada that are generally created by Agents.

Commonly used derivations of Fundamental Statistical Units: subclasses based on inherent characteristics, roles, and aggregations.

IMDB Phase III Data Element Model: Property. A characteristic or attribute common to all members of an object class. Occurrence is a special property for cases where the variable is simply a count (or measure) of the object class.

IMDB Phase III Data Element Model: Data Element Concept. A concept that can be represented in the form of a data element, described independently of any particular representation. The Data Element Concept is the amalgamation of the object class and the property.

IMDB Phase III Data Element Model: Representation. The representation describes how the data are represented. To represent a DEC logically, a representation class must be added. ISO 11179 guidelines and examples for representation terms include Name, Type, Amount, Quantity, Number, etc.

IMDB Phase III Data Element Model: Data Element. A unit of data for which the definition, identification, representation, and permissible values are specified by means of a set of attributes. The data element is a data element concept with a representation (object class + property + representation). At STC, the data element is the variable. Our naming convention is the natural-language form: representation of property of object class.

Data elements in a statistical agency: Data are most often presented in tabular form. Data elements are the dimensions of statistical tables. A data element thus defined can have many value domains. A data element has one and only one value domain in the context of a given datafile; this gives a data element / value domain / table map.
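The rule that a data element may have many value domains overall, but exactly one within a given datafile, can be sketched as a small lookup keyed by (data element, datafile). The class `ValueDomainMap`, its method names, and the datafile and value-domain labels are hypothetical; this is not how the IMDB stores the map.

```python
from typing import Dict, Set, Tuple


class ValueDomainMap:
    """Maps (data element, datafile) to the single value domain used in that context."""

    def __init__(self) -> None:
        self._by_context: Dict[Tuple[str, str], str] = {}

    def assign(self, data_element: str, datafile: str, value_domain: str) -> None:
        key = (data_element, datafile)
        existing = self._by_context.get(key)
        if existing is not None and existing != value_domain:
            # Enforce "one and only one value domain in the context of a given datafile".
            raise ValueError(
                f"{data_element!r} already uses {existing!r} in datafile {datafile!r}"
            )
        self._by_context[key] = value_domain

    def value_domain_for(self, data_element: str, datafile: str) -> str:
        return self._by_context[(data_element, datafile)]

    def value_domains_of(self, data_element: str) -> Set[str]:
        """All value domains the data element takes across datafiles."""
        return {vd for (de, _), vd in self._by_context.items() if de == data_element}


# Illustrative labels only.
vd_map = ValueDomainMap()
vd_map.assign("Type of occupation of person", "table A", "occupation groups, level 1")
vd_map.assign("Type of occupation of person", "table B", "occupation groups, level 2")
```

Keying on the pair rather than on the data element alone lets the same variable appear at different levels of detail in different tables.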

IMDB Phase III Data Element Model: Value Domain. The set of permissible values and their associated meanings. At STC, the value domain is the classification. There can be several value domains per DE. A value domain can be enumerated or non-enumerated.

Value domains and classifications: Hierarchically related value domains are structured as classifications or taxonomies. Levels are assigned to value domains, and parent-child relationships are assigned between levels and between permissible values.

Value Domain: Standard Classification. Table 1, Current Account (Standard Classification):
Goods
Services: Travel; Transportation; Commercial Services; Government Services
Investment Income: Direct; Portfolio; Other
Current Transfers: Private; Official
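A classification with levels and parent-child relationships can be sketched as a parent-to-children mapping. The nesting below is an assumed reading of the flattened Current Account table on this slide, and the helper names `parent_of`, `level_of` and `values_at_level` are hypothetical.

```python
from typing import Dict, List, Optional

# Assumed nesting of the Current Account table; each key is a parent value.
CURRENT_ACCOUNT: Dict[str, List[str]] = {
    "Current Account": ["Goods", "Services", "Investment Income", "Current Transfers"],
    "Services": ["Travel", "Transportation", "Commercial Services", "Government Services"],
    "Investment Income": ["Direct", "Portfolio", "Other"],
    "Current Transfers": ["Private", "Official"],
}
ROOT = "Current Account"


def parent_of(children: Dict[str, List[str]], value: str) -> Optional[str]:
    """Return the parent of a permissible value, or None for the root or an unknown value."""
    for parent, kids in children.items():
        if value in kids:
            return parent
    return None


def level_of(children: Dict[str, List[str]], root: str, value: str) -> int:
    """Number of parent-child steps from the root (the root itself is level 0)."""
    level = 0
    node: Optional[str] = value
    while node != root:
        node = parent_of(children, node)
        if node is None:
            raise KeyError(value)
        level += 1
    return level


def values_at_level(children: Dict[str, List[str]], root: str, level: int) -> List[str]:
    """All permissible values at a given level; each level acts as one value domain."""
    current = [root]
    for _ in range(level):
        current = [kid for parent in current for kid in children.get(parent, [])]
    return current
```

This flat mapping relies on value names being unique within the classification, which holds for the table above but would need qualified codes in a larger one.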

IMDB Phase III Data Element Model (diagram: Users and Stewards; Object Class, Property, Data Element Concept, Representation, Data Element, Value Domain; CANSIM Array).

IMDB Phase III Data Element Model (diagram: Users and Stewards; Object Class, Property, Data Element Concept, Representation, Data Element, Value Domain; CANSIM Array; Other Entry Points). These points of entry include the IMDB survey page, Concepts and variables, and the search engine.

Next steps: Produce lists of variables and definitions from the IMDB test environment for review and discussion with subject-matter areas. Activate on the STC Intranet for a trial period. Roll out into production.

Conclusion: The IMDB is a significant corporate infrastructure for information management. Its comprehensiveness and quality are continuously improving, with strong management support. It will provide an additional tool to help users find and interpret data published by Statistics Canada.