Metadata and Infrastructure for Researchers from the perspective of an NSI (National Statistical Institute)
C-G Hjelm, Research and Development, Statistics Sweden

The coming minutes
- Metadata and Microdata from the perspective of an NSI
- Metadata and Microdata from the perspective of a researcher
- Legal basis/legislation on Microdata in the Nordic countries
- Nordic smorgasbord Data Warehouse
- Solutions of today
  - MONA
  - The lifecycle for a research project
  - Cooperation with the Swedish Research Council
  - An example within the Public Domain
- Solutions of tomorrow
  - Federation

Metadata and Microdata from the perspective of a Nordic NSI
How an NSI sees Microdata and Metadata
- Production of statistics
- Periodical statistics (year, month, etc.)
- Metadata for statistical production
- No longitudinal Microdata or Metadata
- Documentation primarily for production and archiving
- Microdata is (only) the base for the aggregated statistics

Metadata and Microdata from a researcher's perspective
- Microdata!!!
- Longitudinal perspective on both Microdata and Metadata
- Documentation of process data / paradata is essential
- Searchable Metadata
- Linkable to other materials
- Metadata standard ISO 11179

Legal basis/legislation
Swedish legislation regarding Microdata
- Statistics Acts
- Personal Data Acts
- Secrecy Act

Statistics Act
Data collected for statistical purposes may be used for
- the production of statistics
- research purposes
- public planning

Confidentiality - main rule
- Data obtained for statistical purposes are confidential
- Access may be granted in forms which do not allow direct or indirect identification of persons or of other data subjects, such as enterprises

Confidentiality - exception
- Confidential data may be released to a third party for the purpose of statistical surveys or research

Confidentiality
- The obligation of confidentiality also applies to the recipient of the data, either by law or through an imposed duty of non-disclosure

Nordic smorgasbord Data Warehouse
Data collection
- Administrative registers
- Postal questionnaires?!
- Other electronic transfers
- Interviews
  - Telephone
  - Face-to-face

The Swedish Register system - a goldmine for researchers
[Figure: diagram of the register system. Labels: Swedish for Immigrants; Causes of Death; Adults Education; Supplementary Benefits; Upper Secondary School; Vehicles - privately owned; Form 9; Persons nominated and elected; Teachers; Education; Higher education; Personal Income and Assets; Population Register; Activity Register; Persons Enrolled in Education; Population and Housing Census; Income Verifications; Employment; Private sector - wages; Longitudinal Income; County Councils - wages; Longitudinal Welfare; Ecclesiastical Districts - wages; Persons entering Labour Market; Municipalities - wages; Second Generation Register; Civil Servants - wages; Fertility; Occupational Register; Geographical Database; Standardised Accounts; Real Estate Price; Monthly Tax Returns; Restoration of Buildings; VAT Register; New Construction of Buildings; Foreign Trade; One or Two Dwelling Buildings; Real Estate Register; Business Register; Vehicles - company owned; Multi Dwelling Buildings; Agricultural Register; Industrial Real Estate; Register of Business Statistics; Agricultural Real Estate; Register of Schools; Dwellings; Municipality Register]

Data Warehousing vision
[Diagram: data warehouse flow. Labels: Direct collection; Admin data; Raw data; Preprocessing; Input data; Observation data; Base register; Target data; Micro data; Macro data; Presentation data; Publication data; Publishing; Dissemination; Metadata; Process data]

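The step from micro data to macro data in the vision above can be illustrated with a minimal sketch. The records, the field names (`person_id`, `year`, `region`, `income`) and the `aggregate` helper are hypothetical and invented for illustration; they do not describe actual Statistics Sweden registers or production code.

```python
from collections import defaultdict

# Hypothetical micro data: one record per person and year (values invented).
MICRODATA = [
    {"person_id": 1, "year": 2008, "region": "North", "income": 200},
    {"person_id": 2, "year": 2008, "region": "North", "income": 300},
    {"person_id": 3, "year": 2008, "region": "South", "income": 250},
    {"person_id": 1, "year": 2009, "region": "North", "income": 220},
]


def aggregate(records, keys, value):
    """Aggregate micro data into macro data: count and mean of `value` per cell."""
    totals = defaultdict(lambda: [0, 0.0])  # cell -> [count, sum]
    for rec in records:
        cell = tuple(rec[k] for k in keys)
        totals[cell][0] += 1
        totals[cell][1] += rec[value]
    return {cell: {"n": n, "mean": s / n} for cell, (n, s) in totals.items()}


# Macro data: one row per (year, region) cell instead of one row per person.
macro = aggregate(MICRODATA, keys=("year", "region"), value="income")
for cell, stats in sorted(macro.items()):
    print(cell, stats)
```

Note that person 1 appears in both 2008 and 2009: the micro data keeps that longitudinal link, while the aggregated cells no longer do.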
Today
The MONA system (Microdata ON-line Access)
Advantages and basis of MONA
- Statistics Sweden and other authorities have distributed Microdata to research institutions in Sweden for many years on CD-ROM and other media
- Today there are new needs for distributing Microdata, primarily in larger volumes
- We want control over distributed data
- Secure system
- User-friendly system
- Users don't need their own infrastructure

Advantages and basis of MONA - continued
- Extended and better documentation suitable for researchers
- Faster and cheaper
- Increased level of service
- Increased choice of place to perform the research: research is no longer local
- Increased possibilities to build commonly used databases for different disciplines
- Increased secondary use of Microdata

User statistics of MONA
- Started winter 2005
- 2010: around 650 registered users (500 active), demanding and creative users
- Growth of around 25-30% per year
- The users are researchers (75%) and public offices (25%)

MONA - some technical buzzwords
Server-based design
- Microsoft Terminal Services / Windows OS
- Users have their own desktops
- Only screen updates are transferred
- No cut and paste between sessions
- PGP-encrypted email for result files
Security
- SSL tunnel
- https://mikrodata.scb.se
- RSA SecurID
- Domain logon and reauthentication with security token
Applications (present)
- SAS, SPSS, GAUSS, R, TSP, Super Cross, STATA, LISREL, GAMS, WinRats, SQL Query Analyzer and Microsoft Office
[Diagram labels, translated from Swedish: Users; Internet; Firewall; Microdata databases; Own data; Excel/Word; SAS; SPSS; SQL; Super Cross]

Public certificate
MONA domain
[Diagram: Internet; https://mikrodata.scb.se; https SSL Reverse Proxy (PortWise); Domain Controller and Terminal Servers; File, SQL, Application and RSA Servers; SAN]

Structure of data in MONA
- Structured data layer (Data Warehouse)
- Views (filter) in SQL Server
- User-1, User-2, ..., User-n

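The layering above (one structured data layer, with a filtering view per user) can be sketched as follows. This is a minimal illustration only: it uses SQLite through Python's `sqlite3` module as a stand-in for SQL Server, and the table `income_register`, its columns, and the view names `user_1` and `user_2` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Structured data layer: one shared table (hypothetical columns).
    CREATE TABLE income_register (
        person_id INTEGER, year INTEGER, region TEXT, income INTEGER
    );
    INSERT INTO income_register VALUES
        (1, 2008, 'North', 200),
        (2, 2008, 'South', 300),
        (3, 2009, 'North', 250);

    -- One view per user: a filter on both rows and columns.
    CREATE VIEW user_1 AS
        SELECT person_id, year, income FROM income_register
        WHERE region = 'North';
    CREATE VIEW user_2 AS
        SELECT person_id, year, region FROM income_register
        WHERE year = 2008;
    """
)


def rows(view_name):
    """Return all rows visible through one user's view."""
    # view_name is one of the fixed names created above, never user input.
    query = f"SELECT * FROM {view_name} ORDER BY person_id"
    return conn.execute(query).fetchall()


print(rows("user_1"))
print(rows("user_2"))
```

Each user sees only the rows and columns selected by their own view, while the data is stored once. In a server database, permissions would typically be granted on the views rather than on the underlying table; SQLite has no user permissions, so that part is not shown.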
The lifecycle for a research project
- Often ends with the archiving of the project; this includes the data
  - For normal archiving
  - For academic replication
- This part is often forgotten in the research community and in the NSIs

Cooperation with the Swedish Research Council
Cooperation with the Research Community
- Funding to Statistics Sweden from the Swedish Research Council (the public authority that handles most of the research funds in Sweden)
- 2006-2008, with evaluation in summer 2008
- 10.000.000 SEK (approx. 1.000.000) per year
- Evaluated in April 2009

Activities and projects
- Access to the MONA system
- Service to researchers
- Selection of structured registers available for researchers
- Suitable metadata, documentation and classifications for researchers on the selected registers, available via the Internet
- Quality indicators on register data
- Routines for handling researchers' data after the end of the project

Multimode Interviewing Capability in the Public Domain - an example of open solutions in Metadata and Interviewing
MMIC (Multimode Interviewing Capability)
Source: https://mmic.rand.org/mmic

Participating studies
Metadata (SHARE)
Tomorrow
Solutions of tomorrow - federation?
Examples of reports
International reports
DISC reports
Questions, and thank you for your time
claus-goran.hjelm@scb.se