A Centralised System for Administrative Data Collection at Statistics Finland

EFFICIENT STATISTICAL PRODUCTION, SESSION B

Johanna Sisto (1), Janne Eskelinen (2), Sinikka Laurila (3)

(1) Statistics Finland, Senior Statistician, Helsinki, Finland, Email: johanna.sisto@stat.fi
(2) Statistics Finland, Senior Statistician, Helsinki, Finland, Email: janne.eskelinen@stat.fi
(3) Statistics Finland, Chief Analyst, Helsinki, Finland, Email: sinikka.laurila@stat.fi

Abstract. Recently, Statistics Finland developed a centralised system for collecting and receiving administrative data. That work was well in line with Statistics Finland's strategic plans to harmonise processes, develop generic solutions and centralise data collection activities and know-how. The work was organised as a project. One of the results of the project was that Statistics Finland now has a new system that utilises process metadata to control and monitor the received administrative data. Project results also included the implementation of a generic tool for technical validation and distribution of administrative data. During the project, a dedicated team took responsibility for the processes linked to administrative data. In addition to handling the data receiving process, technical validation and distribution of administrative data, these processes include contract management, as well as co-operation with both users and suppliers of administrative data.

Key words: administrative data, centralised data collection, standardised processes

1. Introduction

This paper explains the background behind the centralised system for collecting and receiving administrative data at Statistics Finland (see Section 2) as well as the principles of the administrative data collection process (see Section 3) and the basic ideas of the generic tool developed for technical validation and distribution of administrative data (see Section 6).

The paper also discusses the need for standardising certain phases of the centralised administrative data collection process (see Section 4), describes the benefits of centralised administrative data collection (see Section 5) and gives some examples of how the project and the Havas team have tried to overcome the challenges of changing disharmonised practices related to administrative data collection (see Section 7).

Although the centralised administrative data collection system developed in the Havas project has already been in use for a while, centralising administrative data collection is still an ongoing development at Statistics Finland. Section 8 of this paper discusses the next steps for further development.

2. Background

Exploiting administrative data in statistical production is not a new idea at Statistics Finland. Recent strategic developments, however, have led towards a more process-oriented approach, and Statistics Finland has tried to centralise data collection activities and know-how (Tilastokeskus 2013 and 2016). Part of the ICT strategy is to develop generic solutions, and the new centralised system for administrative data collection falls well within this area.

2.1 Using administrative data in statistical production

About 95 per cent of all the data that are used in statistical production at Statistics Finland come from administrative data or registers. One of the main successes in collecting and using administrative data is the population census data, which have been compiled from various official registers since 1990. Statistics Finland has collaborated with official data providers in collecting data from their administrative data sources for different statistical needs since the first register-based population census. Today about 150 administrative datasets or registers are collected and used in producing statistics.

2.2 ICT strategy implementation

Statistics Finland updated its current ICT strategy (2015 to 2019) during 2014 (Tilastokeskus 2014). The mission of ICT activities is to develop and generate efficient ICT solutions for statistics production. In addition, the target is to offer data suppliers and statistics users a reliable and easy-to-use interface to Statistics Finland's services.

The critical success factors of ICT activities include:
- Efficient service production and reliable infrastructure
- Proactive and innovative service approach
- Systematic, professional and efficient development activity
- Unified processes, methods and tools

The previous ICT strategy from 2008 set out the following guidelines and principles, which had to be followed when developing new statistical information systems (Tilastokeskus 2008):
- To harmonise the statistical processes by a common process model (Generic Statistical Business Process Model, GSBPM)
- To use common tools and programs when developing new systems
- To build centralised statistical systems that can be used broadly in various statistics
- To promote the integration of different statistical systems, which includes the effective use of metadata in statistical processes

As a result of the 2008 strategy guidelines, Statistics Finland decided to centralise the collection of administrative data from various official data providers and started a project whose main task was to build an information system and a technical solution for centralised data collection.

2.3 Havas project

The project started in April 2013 and lasted two years, ending in March 2015. The main goals of the project were:
- To find and record all the administrative data that are collected and used at Statistics Finland
- To define and build a centralised IT system for receiving the administrative data by line transfer
- To define and build a SAS system for pre-checking the data received
- To describe all the data and variables in a centralised metadata system. The data descriptions were to be used in pre-checking the data before they are handed over to statistical production
- To organise a team to which centralised administrative data collection would be handed over after the project

The Havas team was formed and started its work alongside the project in autumn 2014. One of the most important tasks of the team is collaboration with the official organisations that are the data providers. The goal of the collaboration is that Statistics Finland will have data contracts with all the data providers by the end of 2017. This is also a task that the Eurostat peer review assigned to Statistics Finland in 2014 (Durr et al. 2014).

Today, nearly 65 per cent of the administrative data (files) are collected and received using the centralised information system.

3. Administrative data collection process

The centralised administrative data collection system is built on the following notions:
- Process metadata information controls data transfer and directs the arriving administrative data files into the right target directory
- Data files arriving in flat file format are transformed into SAS file format using the generic application and metadata information (file descriptions)
- Data files are validated technically utilising metadata information (file descriptions)
- Accepted data files are saved in predetermined client directories

However, running the generic tool for technical validation and distribution of administrative data is only part of the Havas team's daily work. Alongside handling the data receiving process, the Havas team also manages and updates contracts, takes care of co-operation with data users and data providers, organises meetings, looks after service point duties, solves problems that users have with administrative data, maintains contact information, writes guidelines and develops methods and practices.

3.1 Data transfer

Contracts with data providers determine how administrative data travel between the external transfer servers of the data provider organisation and Statistics Finland. Basically, there are two possibilities (see Figure 1 below):
- Either Statistics Finland retrieves the data from the external transfer server of the data provider organisation, or
- The data provider organisation delivers the data to Statistics Finland's external transfer server.

Once the data have arrived at Statistics Finland's external transfer server (DMZ area), they are transferred to the internal transfer server. At this point, the Havas team receives an automatic email stating that the data have arrived in the directory dedicated to the data provider.

In the next phase, the process metadata information saved in the eXist database directs the data files into the right target folder on the TKSAS server based on the filename and provider ID of the data. For this phase to succeed, it is essential that data providers follow the contracts and name the data files as agreed. Otherwise, the file cannot be directed to its target folder and the Havas team receives an automatic error message about the orphan file.
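The routing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the production implementation: the `RoutingRule` structure, the glob-style filename patterns, the provider IDs and the folder paths are all hypothetical stand-ins for the process metadata that the paper says is stored in the database.

```python
import fnmatch
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Iterable, Optional


@dataclass(frozen=True)
class RoutingRule:
    """One hypothetical process-metadata entry for an agreed delivery."""
    provider_id: str       # provider identifier agreed in the contract
    filename_pattern: str  # glob-style pattern for the agreed filename
    target_dir: str        # import folder on the processing server


def route_file(provider_id: str, filename: str,
               rules: Iterable[RoutingRule]) -> Optional[PurePosixPath]:
    """Return the target path for an arriving file, or None for an orphan."""
    for rule in rules:
        if (rule.provider_id == provider_id
                and fnmatch.fnmatchcase(filename, rule.filename_pattern)):
            return PurePosixPath(rule.target_dir) / filename
    # No rule matched: the file was not named as agreed, so it is an orphan
    # and the team would need to be notified.
    return None


rules = [RoutingRule("P001", "income_*.csv", "/data/havas/income/import")]
print(route_file("P001", "income_2016.csv", rules))
print(route_file("P001", "incomes2016.csv", rules))
```

The second call returns `None` because the filename does not follow the agreed pattern, which corresponds to the orphan file case in the text.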

[Figure 1. Servers and process metadata information. The administrative data file passes from the data provider's (external) transfer server to Statistics Finland's external transfer server (DMZ), then to Statistics Finland's internal transfer server and on to the TKSAS server. Process metadata information: the file is recognised based on provider ID and filename and saved in the target directory.]

3.2 Technical validation and data transformation

Once the administrative data have been saved in the predetermined import folder, a Havas team member transforms the data into a SAS dataset and validates the data technically. This step uses the data validation application together with metadata, i.e. file descriptions that have been saved in the eXist database using the metadata editor application (see Figure 2 below). The basic ideas behind the metadata editor application and the data validation application are described in Sections 6.1 and 6.2 of this document, respectively.

The transformed and validated administrative data are saved in a predetermined export folder as SAS datasets. As an output from the technical validation phase, the Havas team receives an email including a data validation report in HTML format. If the validation report shows that the data meet expectations, the Havas team sends the validation report to the end user(s) and informs them that the data are now available in the export folder.
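As a rough illustration of the reporting step, the following Python sketch renders a validation summary as a small HTML page. The function name, the layout of the report and the ACCEPTED/REJECTED wording are assumptions made for this example; the paper does not describe the actual report format beyond its being HTML.

```python
from html import escape
from typing import Dict, List


def render_validation_report(dataset: str, n_obs: int,
                             missing: Dict[str, int],
                             errors: List[str]) -> str:
    """Return a small HTML page summarising one technical validation run."""
    # One table row per variable that has missing values.
    missing_rows = "".join(
        f"<tr><td>{escape(name)}</td><td>{count}</td></tr>"
        for name, count in sorted(missing.items())
    )
    error_items = "".join(f"<li>{escape(message)}</li>" for message in errors)
    # Assumption: any error means the delivery is not accepted as such.
    status = "REJECTED" if errors else "ACCEPTED"
    return (
        "<html><body>"
        f"<h1>Validation report: {escape(dataset)}</h1>"
        f"<p>Status: {status}</p>"
        f"<p>Observations: {n_obs}</p>"
        "<table><tr><th>Variable</th><th>Missing values</th></tr>"
        f"{missing_rows}</table>"
        f"<ul>{error_items}</ul>"
        "</body></html>"
    )


report = render_validation_report(
    "income_2016", 3, {"age": 1}, ["row 2: age out of range"]
)
print(report)
```

Escaping every value before it is placed in the page keeps unexpected characters in dataset or variable names from breaking the report.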

[Figure 2. Technical validation, data transformation and routing data to their predetermined target folder. On the TKSAS server, data in the predetermined import folder are transformed into SAS datasets and validated technically by the data validation application (SAS/EG stored process), using metadata saved in the eXist database with the metadata editor application. The outputs are SAS data in the predetermined export folder and a data validation report (HTML report). The data user has access to the export folder and to the validation report.]

4. Need for standardising certain phases of the centralised administrative data collection process

The centralised administrative data collection process works properly only if certain phases are standardised. Standardisation makes the administrative data collection process much more efficient than before.

First of all, Havas team members make sure that metadata and process metadata are stored and in order. Havas team members also make sure that every data delivery has its own data collection ID code. That ID code forms the basis for the consistent data folder structure on the TKSAS server.

It is also very important that the suppliers of administrative data and the Havas team members work in close collaboration and follow common specifications and objectives. New harmonised contracts with all administrative data suppliers form the basis for a smooth and efficient data receiving process. The main content of these contracts is information on all the administrative data that are sent to Statistics Finland. To be more precise, the key information includes the name of every single dataset, the time of delivery, the data formats, how the data are delivered to Statistics Finland and under which filename. In practice, this information makes it possible to automate the entire data collection process, which includes automatic technical validation, data transformation into a SAS dataset and routing data to their predetermined target folder.

5. Benefits of centralised administrative data collection

Successful centralised administrative data collection comes with many benefits compared to a situation where many separate statistical production units are in charge of their own administrative data collection. Centralised data collection and consistent data collection procedures bring, for example, the following benefits:

1) More efficient and systematic administrative data receiving processes
This means standardising our data receiving processes, which enables process automation. It also means standardising our operating environment, including a consistent folder structure for the data received.

2) Better transparency of the utilised administrative data
In future, it will be possible to establish a much better overall picture of the administrative data utilised at Statistics Finland. In other words, we can increase our knowledge of collecting and receiving administrative data, including information on what data we are receiving, when, and of what sort. Earlier, forming a general view was difficult because all that know-how was fragmented.

3) Rationalised data requisitions and advanced data usage
In future, one team will be responsible for all new administrative data requisitions, so we can avoid overlapping requisitions. A common objective is that the joint use of administrative data increases and that the statistical production units are henceforth more aware of the existence and possibilities of the administrative data gathered.

6. Principles of tools developed for technical validation and distribution of administrative data

The everyday toolbox of the Havas team includes:
- SAS/EG: the application that transforms flat files into SAS datasets operates under SAS/EG
- eXist: file descriptions and process metadata are saved in the eXist database
- eXide: the application that is used to update process metadata
- Variable editor (called "Muuttujaeditori"): the application that is used to update file descriptions
- Specific directory structure (O:\yhteishankinta\havas): for saving (and updating), e.g. contact information, technical validation reports, transformation programmes and documents relating to contract management
- WinSCP: to monitor and manage files or transfer files between local and remote servers
- Shared email address: to monitor the data receiving process and co-operation with both users and suppliers of administrative data
- Shared Outlook calendar: to plan timetables and duties, but also to disseminate information between Havas team members on how to transform specific flat files to SAS datasets (e.g. priority/urgency, contact information on the users and suppliers of the dataset and information on the right file description)

6.1 File descriptions: Data and Variable Metadata database

The administrative data which are received and collected by the centralised system are checked before they are handed over for statistical production. The checking is mainly technical and covers basic information on distributions, possible duplicate observations, missing values and variables, and classification values. The technical validation check is done in the SAS/EG environment and the process uses the data and variable metadata descriptions that are stored in the metadata system (see Annex 1).

Data and variable metadata descriptions are created and maintained with the Variable editor, which is a metadata application made at Statistics Finland at the end of 2000. In the Variable editor it is possible to describe data in all statistical process phases from data collection to data dissemination (statistical table descriptions). The Variable editor stores the metadata of data and variables in the metadata database (eXist). The database is in XML format and the underlying metadata model is the CoSSI metadata model (Common System for Statistical Information), which was created at Statistics Finland in the early 2000s. One of the reasons why the CoSSI model for statistical information was created was that the common metadata models of the time did not include suitable tools for describing variable metadata.

Every file of the administrative data that are received is linked to its own metadata description. This makes it possible to check many different kinds of data with the same checking program in the SAS environment. The variable-level information required in data transformation and technical validation includes technical name, data type, information, length, start position, minimum and/or maximum value (for numeric variables) and either the list of permissible values or the classification (for character variables). Besides variable information, file descriptions include information on the search path, i.e. the import directory where the flat file was saved, and the name of the flat file as well as the file format and the delimiter character.

6.2 Havas application

The heart of the application that transforms flat files (e.g. .dat, .csv, .txt) into SAS datasets is a SAS macro that utilises file descriptions (metadata) saved in the Variable editor. The application performs technical validation of the data against the file description, saves the SAS file to the target directory, produces a validation report and sends the report to the predefined email address. The report includes information on the number of observations, distribution of numeric variables, number of missing values by variable, duplicate key values and error lists.

The application operates under SAS/EG. The user needs to select (see Figure 3 below):
- Metadata folder, where the file description has been saved
- Name of the file description
- Target folder, where the application saves the SAS file
- Decimal separator that has been used in the flat file
- The first data row of the flat file

The application finds the right flat file based on the search path and the filename given in the metadata file description.
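To make the idea of metadata-driven validation concrete, here is a minimal Python sketch. It is not the SAS macro described above: the `VariableSpec` structure, its field names and the example variables are hypothetical, and only a subset of the checks named in the text (missing values, numeric range, permissible values, duplicate key values) is shown for a fixed-width flat file.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class VariableSpec:
    """Hypothetical variable-level entry of a file description."""
    name: str
    dtype: str                       # "num" or "char"
    start: int                       # 1-based start position in the record
    length: int
    minimum: Optional[float] = None  # numeric variables only
    maximum: Optional[float] = None  # numeric variables only
    allowed: Optional[FrozenSet[str]] = None  # character variables only


def parse_record(line: str, specs: List[VariableSpec]) -> Dict[str, str]:
    """Cut one fixed-width record into variables by start position and length."""
    return {s.name: line[s.start - 1:s.start - 1 + s.length].strip() for s in specs}


def validate(lines: Iterable[str], specs: List[VariableSpec], key: str,
             decimal_sep: str = ".") -> dict:
    """Check records against the file description and return a summary."""
    missing: Counter = Counter()
    keys: Counter = Counter()
    errors: List[str] = []
    n_obs = 0
    for row_no, line in enumerate(lines, start=1):
        record = parse_record(line, specs)
        n_obs += 1
        keys[record[key]] += 1
        for s in specs:
            value = record[s.name]
            if value == "":
                missing[s.name] += 1
                continue
            if s.dtype == "num":
                try:
                    number = float(value.replace(decimal_sep, "."))
                except ValueError:
                    errors.append(f"row {row_no}: {s.name}={value!r} is not numeric")
                    continue
                too_low = s.minimum is not None and number < s.minimum
                too_high = s.maximum is not None and number > s.maximum
                if too_low or too_high:
                    errors.append(f"row {row_no}: {s.name}={value!r} out of range")
            elif s.allowed is not None and value not in s.allowed:
                errors.append(f"row {row_no}: {s.name}={value!r} not permitted")
    duplicates = sorted(k for k, n in keys.items() if n > 1)
    return {"n_obs": n_obs, "missing": dict(missing),
            "duplicates": duplicates, "errors": errors}


specs = [
    VariableSpec("id", "char", start=1, length=3),
    VariableSpec("sex", "char", start=4, length=1, allowed=frozenset({"1", "2"})),
    VariableSpec("age", "num", start=5, length=3, minimum=0, maximum=120),
]
summary = validate(["0011 34", "0022150", "0019   "], specs, key="id")
print(summary)
```

Because the checks are driven entirely by the list of `VariableSpec` entries, the same function can be applied to any file for which a description exists, which is the point the text makes about using one checking program for many different datasets.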

[Figure 3. Havas application. The user interface has controls for choosing the metadata folder, the file description, the target folder for SAS data, the first data row and the decimal separator, and for running the application.]

7. How to overcome the challenges of changing disharmonised practices related to administrative data collection

The Havas team, data users, data providers and ICT experts are all important operators in the process of administrative data collection. It is therefore vital that all the operators concerned have clear roles, know their own responsibilities and follow the agreed practices. In order to keep everybody in the picture, information must reach all interested parties.

The road towards harmonised processes and centralised collection of administrative data has not been completely free of challenges or obstacles. Generic and efficient solutions that serve most user needs well may not serve every single user need quite as well as the old user-specific solutions did. Therefore, it is important that the management recognises ongoing development such as centralising administrative data collection as an important strategic plan and is ready to commit to the project.

Informing all interest groups regularly and repeatedly is extremely important. Data users need to hear about the benefits of the new system, but also about its limitations. This helps them to form a realistic impression. The information provided should also be targeted and easy to understand.

Development projects that change processes and old practices bring out the importance of coordination and collaboration as well as carefully considered and clearly expressed responsibilities and entitlements. It is also important to follow the newly agreed practices faithfully, because otherwise the number of exceptions will increase and the benefits will never be realised in full. One way to help achieve these targets is to write down (official) guidelines and instructions.

In order for the development project to run as smoothly as possible, it is vital to guarantee that competent human resources are at the project's disposal when needed. Otherwise, the project is at risk of being delayed. It is also important to recognise other risks that may cause delays or otherwise hamper the project, such as an unstable ICT environment, and to try to tackle them, for example by developing secondary or alternative solutions.

8. Next steps for further development

Centralising administrative data collection is still an ongoing development at Statistics Finland. At present, the Havas team handles the data receiving process, technical validation and distribution of about 145 administrative datasets (separate files), which represents about 65 per cent of all administrative data (files) suitable for centralised data collection.

Increasing the level of automation

There are some important development steps to be taken before the remaining datasets can be included in the centralised data collection. One of these steps is to increase the level of automation. In practice, this means that the Havas application would work without the user selecting the right metadata folder, file description, target folder and other required parameters. This development is crucial especially for timely data that Statistics Finland receives weekly.

Widening the range of file formats

Another step is to develop the centralised data collection system so that the range of file formats included in the system can be widened from flat files and SAS files to XML files and maybe also to Excel files. This is important, because an increasing number of administrative data providers want to send the data in XML format.

Developing a user interface for controlling the process

There is also a clear need to develop a user interface that the Havas team could use to control and manage the whole process of centralised administrative data collection. The user interface would offer tools to update and maintain process metadata, contracts and contact information, for example.

At present, there is neither enough up-to-date information nor a suitable system available to follow the process of administrative data collection systematically. However, the bigger problem by far is the availability and usability of the information rather than its absence. The target is that information systems would enable printing weekly or monthly reports on exact timetables, arriving and received/accepted datasets and the timeliness or costs of administrative data. This would considerably improve the Havas team's ability to plan and develop its own work. A better overall picture would also benefit the data users.

9. Conclusion

In terms of the number of datasets included in the centralised system for administrative data collection, we have now crossed the halfway mark. Even though the actual development work was organised as a project and had a strong strategic background, the process of creating something new and changing disharmonised practices also meant learning by doing. Despite good planning, at the beginning it was not always easy to estimate how much time it would actually take to handle the data receiving process, technical validation and distribution of the data, contract management or email communication.

We are still working out the best way to organise the actions, process management and responsibilities in order to serve the users and providers of administrative data as well and as efficiently as possible. There is some important technical development ahead as well, since we want to increase the level of automation and widen the range of file formats included in the centralised administrative data collection. Developing a user interface for controlling and managing the whole process and timetable of data collection is also important.

Acknowledgments

The writers of this paper want to thank Tiina Vesala for participating in writing the abstract.

10. References

Tilastokeskus (2016). Strategy document 2016 to 2019 (in Finnish). Available at: http://www.stat.fi/static/media/uploads/org/tilastokeskus/strategia_2016-2019.pdf (Accessed 12 May 2016).

Durr, J.-M., de Pourbaix, I., Redmond, A. (2014). Peer Review Report Finland. Available at: http://ec.europa.eu/eurostat/documents/64157/4372828/2015-fi-Report/688a9a11-741e-4311-b2f8-a0943104c933 (Accessed 12 May 2016).

Tilastokeskus (2014). ICT-strategia 2015-2018 (TK-04-531-14).

Tilastokeskus (2013). Strategy document 2012 to 2015 and Performance target document 2014. Available at: http://tilastokeskus.fi/org/tilastokeskus/tulossopimus_2014_en.pdf (Accessed 12 May 2016).

Tilastokeskus (2008). Tilastokeskuksen ICT-strategia (TK-04-307-07).

Annex 1: Statistics Finland's metadata system

[Figure: diagram of Statistics Finland's metadata system. Labels in the diagram: Metadata editors (Variable editor, Concept editor, Classification editor); Metadata repository and metadata management (Classification database (SQL); Dataset and variable description database (eXist); Concepts and definitions database (SQL)); Metadata utilised in: administrative data collection, statistical data systems, SAS, web pages and services, Eurostat quality reports, customer assignments, archiving process.]