CoE CENTRE of EXCELLENCE ON DATA WAREHOUSING


Overall handbook to set up a S-DWH
CoE: Centre of Excellence on Data Warehousing
Deliverable: 4.6
Version: 3.1
Date: 3 November 2017

Handbook to set up a S-DWH, version 2.1 / 4 September 2017

Content

1. Introduction
2. The Statistical Data Warehouse
3. The main phases for setting up a S-DWH
4. The 3 tracks within the S-DWH process
   1.1 Metadata
   1.2 Methodological aspects
   1.3 Technological aspects
5. The Road Map for setting up a S-DWH
   5.1 Roadmap S-DWH: General overview
   5.2 Roadmap S-DWH: Approved Business Case
   5.2 Roadmap S-DWH: Design phase
   5.3 Roadmap S-DWH: Build phase
   5.4 Roadmap S-DWH: Finalize phase

1. Introduction

In October 2010, the ESSnet on 'micro data linking and data warehousing in statistical production' was established to provide assistance in the development of more integrated databases and data production systems for (business) statistics. In October 2013 the ESSnet evolved into a Centre of Excellence (CoE).

In order to improve and optimise statistical production, ESS Member States are searching for ways to make optimal use of all available data sources, existing and new. In daily statistical practice this means supporting and assisting statistical institutes to increase the efficiency of data processing in statistical production systems and to maximise the reuse of already collected data in the statistical system.

Recently, the CoE Member States have started to evaluate the impact of Big Data infrastructures on statistical data warehouse (S-DWH) systems. The result will be included in the S-DWH Manual, available as a CoE deliverable on the S-DWH web page of the ESS CROS Portal.

This modernisation has an important organisational impact. First, there is the need to develop and implement a completely new way of organising and operating the statistical production processes. Second, it comes with higher and stricter demands on data and metadata management. Both activities are often decentralised and implemented in various ways, depending on the needs of specific statistical systems, whereas realising maximum re-use of available statistical data demands just the opposite: a centralised and standardised set of (generic) systems with a flexible and transparent metadata catalogue that gives insight into and easy access to all available statistical data.

To reach these goals, building a S-DWH is considered a crucial instrument. The S-DWH approach enables NSIs to identify the particular phases and data elements in the various statistical production processes that need to be common and reusable.
The main focus of the ESSnet was on issues that are common to the majority of the NSIs within the ESS when applying a data warehousing approach for statistics. A thorough enquiry among the ESS Member States resulted in a set of deliverables, now reorganised and updated in the Manual, organised around 3 main topics:

1. Metadata
2. Methodological aspects
3. Technological aspects

In the various workshops, held to interactively exchange information and receive feedback, Member States (MS) expressed great demand for a practical handbook that helps and guides in the process of developing and implementing a S-DWH.

This handbook answers the following questions:

- What is a Statistical Data Warehouse (S-DWH)?
- How does a S-DWH differ from a traditional ('commercial') DWH?
- Why should we build a S-DWH?
- Who are the envisaged users of a S-DWH?
- What is the road map for designing, building and finalizing the S-DWH?
  - What are the prerequisites for implementing a S-DWH?
  - What are the phases/steps to take?
  - How to prepare for an implementation?

The handbook is set up as a lean quick reference guide around the S-DWH roadmap. The goal is to guide users through the process of setting up and implementing a S-DWH by indicating which deliverables of the ESSnet (recommendations, guidelines etc.) to use at which phase in the development process.

2. The Statistical Data Warehouse

This chapter gives a short explanation of the most common terminology used to describe the statistical data warehouse. Chapter 5.1 of the Manual, on Fundamental principles, gives a more detailed explanation of the terminology used in the project.

Data Warehouse

The generic definition of a Data Warehouse (DWH) says that it is a central repository of data which is created by integrating data from one or more disparate sources¹. In the DWH, current and historical data are stored and organised in ways that facilitate combining data, e.g. to perform analyses and to create reports. According to broader and perhaps more useful definitions, the term DWH should not only be understood as a way of storing data: it must also include all the functions and tools necessary to extract, transform and load data (ETL tools), to maintain the data structure, and to make data available to end users in ways that suit their tools.

In terms of role and function, a commercial (or traditional) DWH is mostly set up as a supportive system to the primary process of an organisation, with the main goal of producing and delivering management information that is used to manage and improve the primary process.

Statistical Data Warehouse

This project uses the term Statistical Data Warehouse (S-DWH) to refer to a DWH that is purpose-built to support the production of national and international statistics. Thus the S-DWH is defined as a central store of statistical data, regardless of their sources, for managing all available data of interest, thereby improving the NSI's ability to:

- use and reuse data in order to create new data or new outputs;
- create reports;
- execute analyses;
- produce any required information.

In terms of role and function, a statistical data warehouse is developed as a crucial element in the primary process itself, which simply is: to produce statistics.
Business Intelligence (BI)

The general DWH is often considered part of a Business Intelligence (BI) system. BI technology can handle large amounts of historical and current data stored in a DWH. Specialised BI tools let the users analyse the information in the DWH and even make predictions in order to make better business decisions. Many BI tasks, such as decision support, include quick creation and immediate analysis of statistics based on data from the DWH. Supporting creation and analysis of statistics is the main purpose of the S-DWH, but the demands for quality are generally higher, while the analysis may follow immediately on creation or later.

¹ Wikipedia: http://en.wikipedia.org/wiki/data_warehouse

Metadata

The S-DWH contains only statistical data and is dedicated to supporting efficient production of statistics. Data in the S-DWH may be atomic (micro data) or aggregated (macro data). All data must always be defined and described in accompanying metadata. Since the data warehouse is not a single data store but consists of several parts, or layers, metadata must also describe the processes that move the data through the layers from source to presentation and dissemination (process metadata).

Standards

There are several formal and industry standards that should be considered when building a DWH. The architecture should be supported by well-established data modelling standards. In addition to the standards and rules that support the design of any DWH, the S-DWH should also be designed and built in accordance with the standards that are used in the statistical community. The process model GSBPM, the information model GSIM, the metadata registry standard ISO/IEC 11179 and the classification model (Neuchâtel model) are examples of important and widely accepted standards that should be taken into account when designing a S-DWH.

Why build and use a S-DWH?

There are several alternative models that can be used to describe and build statistics production systems, e.g. the traditional stovepipe model and several versions of integrated models. The S-DWH model is generally considered the most advantageous of these. Arguments in favour of using a S-DWH include:

- Easier to reuse data: 'collect once, use many times';
- Facilitates cross-domain analysis;
- Well suited for process-oriented production systems (even though its data model is not specifically designed for that purpose);
- Supports standardisation of tools and methods;
- Enables efficient governance and maintenance.

3. The main phases for setting up a S-DWH

From a project management view, the process of setting up and implementing a S-DWH does not essentially differ from other major projects that involve organisational changes in combination with new processes and (IT) systems. Basically, 5 more or less generic phases can be distinguished: Business Case, Design, Build, Finalize, and Use & Maintain.

Business Case

As for all projects, it is an essential and required precondition to compose a solid business case that needs to be approved by the responsible authority/management/sponsors. The business case must clearly state the intended goals, describe and explain the expected benefits and of course give a sound cost-benefit analysis. The Introduction and the first chapter of the Manual² can be used as a good foundation when writing the business case.

Design

The first phase in the actual development is the design of the S-DWH, with all its elements and aspects. This should cover:

- What type of S-DWH, active or passive?
- The architectural framework for the S-DWH.
- A clear description of the functions of the S-DWH.
- The necessary metadata designs (metadata model, meta system etc.).
- Methodological concepts (role of the BR etc.).

All designs must be approved by the responsible managerial body (e.g. steering group, programme management).

Build

In the build phase the various elements of the S-DWH need to be realised. For the most part these are strongly IT-related components: databases, repository, ETL processes etc. The main milestones in this phase are tool selection, translating the design into business rules, testing and documentation. As a S-DWH mostly consists of a complex set of systems, it is recommended to work in small incremental steps.

Finalize

The finalization phase means actually putting the S-DWH to work. After defining a sound implementation strategy, the most important milestones in this phase are setting up the governance of the (meta)data management, ensuring confidentiality and training users.
Use & Maintain

After finalizing the S-DWH, the phase of operational use starts. The feedback from daily statistical use also requires a steady process of maintaining 2 main aspects of the S-DWH:

1. The content of the S-DWH (metadata and statistical data)
2. The functional and technical systems

The focus of the ESSnet was on the elements of the design, build and finalization phases. Therefore the roadmap of this handbook concentrates on and describes these 3 phases in connection with the S-DWH Manual, in which the main information is described and explained.

² https://ec.europa.eu/eurostat/cros/content/general-introduction_en

4. The 3 tracks within the S-DWH process:

The goal of the statistical data warehouse is to enable NSIs to produce flexible outputs, in an efficient way, with maximum re-use of data that is already available in the statistical system. Therefore the ESSnet needed to focus on issues that are common to the majority of the NSIs when applying a data warehousing approach for statistics, resulting in 3 main tracks:

1.1 Metadata

One of the key factors and drivers in a S-DWH is the information about one or more aspects of the data itself, usually referred to as "metadata". Metadata is the DNA of the data warehouse, defining its elements and how they work together. [...] Metadata plays such a critical role in the architecture that it makes sense to describe the architecture as being metadata driven.

The metadata provides the access to the data and must enable a clear and unambiguous description of the data and its elements. All data in the S-DWH must have corresponding metadata: no data without metadata. Users must be able to search the entire metadata layer and, if permitted, to access the physical statistical data via the metadata. Thus, metadata plays a vital role in the S-DWH, satisfying 2 essential needs:

1. to guide statisticians in processing and controlling the statistical production
2. to inform end users by giving them insight into the exact meaning of statistical data

In order to fulfil these 2 essential functions, the statistical metadata must be:

- correct and reliable (the metadata must give a correct picture of the statistical data),
- consistent and coherent (the metadata driving the statistical processes and the reporting metadata presented to the end users must be compatible with each other),
- standardised and coordinated (the data of different statistics are described and documented in the same standardised way).
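The "no data without metadata" rule and the requirement that users can search the metadata layer can be illustrated with a minimal Python sketch. It is not taken from the Manual: the class names (`VariableMetadata`, `MetadataCatalogue`), the function `load_records` and the example variables are hypothetical, and a real metadata system would follow a metadata model such as GSIM rather than three free-text fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableMetadata:
    """Hypothetical minimal description of one statistical variable."""
    name: str
    definition: str
    unit: str


class MetadataCatalogue:
    """Toy metadata layer: a searchable register of variable descriptions."""

    def __init__(self):
        self._entries = {}

    def register(self, meta):
        self._entries[meta.name] = meta

    def search(self, term):
        """Return names of variables whose name or definition contains term."""
        term = term.lower()
        return sorted(
            name
            for name, meta in self._entries.items()
            if term in name.lower() or term in meta.definition.lower()
        )

    def undescribed(self, variables):
        """Return the variables that have no metadata entry."""
        return sorted(v for v in set(variables) if v not in self._entries)


def load_records(catalogue, records):
    """Accept records only if every variable in them is described."""
    variables = {name for record in records for name in record}
    missing = catalogue.undescribed(variables)
    if missing:
        raise ValueError(f"no data without metadata: undescribed variables {missing}")
    return list(records)
```

In this sketch a load is rejected as a whole when any variable lacks a description; an NSI could equally choose to quarantine such records, which is a design decision the handbook leaves open.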
Finally, since the different users of the (meta)data have diverse needs, it is essential to ensure effective management of the statistical metadata in the S-DWH.

In the metadata track, the first focus was on the identification of the various kinds of essential metadata and on recommendations and guidelines for their use. Further focus was on the use of metadata models, the required functions of a metadata system and the governance of metadata in the S-DWH. In this context the Manual addresses the following items:

- Framework of metadata requirements and roles in the S-DWH. This item gives definitions and background information on the roles and purposes of metadata in the S-DWH in generic terms. It is intended to provide a common language.
- Recommendations on the impact of (meta)data quality in the S-DWH. This item is about monitoring the quality of (meta)data in a S-DWH. For data exchange, it is more or less common to use indicators to measure data quality. The advice is to also define a set of indicators for metadata quality, following and using the data quality systems.

- Overview of and recommendations on the use of metadata models. This item gives an overview of metadata models and recommendations on their use. The use of a metadata model is a key element in structuring and standardising the statistical metadata within a NSI in a generic way. In the context of the S-DWH, a metadata model is a standardised representation used to define all necessary metadata elements of statistical information systems.
- Definition of the functionalities of a metadata system to facilitate and support the operation of the S-DWH. This item gives a detailed description of the functionalities that are necessary to facilitate and support the operation of the S-DWH. In order to meet the diverse needs of the different users of the (meta)data, the statistical metadata must be managed and maintained in a metadata system that covers these functionalities.
- Recommendations and guidelines on governance of metadata management in the S-DWH. This item explains the importance of reliable governance of metadata management in a statistical organisation when operating a S-DWH. It focuses on the main issues to consider when establishing, running and maintaining metadata management in a S-DWH. Implementing good governance for metadata management is highly important for a S-DWH.

The detailed metadata system functionalities are mapped onto the layered S-DWH architecture and the GSBPM workflow.

1.2 Methodological aspects

A key challenge in the process of designing and implementing a Statistical Data Warehouse is to meet the various statistical requirements set by the statistical users of the S-DWH. The methodological challenges that need to be covered concern the impact on statistical methods:

- What are the methodological advantages and drawbacks?
- Which considerations as to statistical methods are needed?
- How to handle confidentiality issues?
- How to deal with data linking?
This work package also provided input to actions/deliverables of the other 2 tracks, by reviewing deliverables and advising from the methodological perspective. Items addressed in this context are:

- Guidelines (including options) on how the BR interacts with the S-DWH. This item covers an essential part of the S-DWH: the role and position of the statistical business register (BR). The Business Register holds a central role in the S-DWH in order to link different units from different data sources and to act as a population frame.
- Guidelines/recommendations for application within the S-DWH of the data linking aspects. This item is addressed in the Manual and gives an overview of data linking aspects in a S-DWH. It provides information about data linking methods and about useful links, and it mentions possible problems that can occur when linking data from multiple sources. Finally, it presents guidelines on the methodological challenges of data linking.
- Guidelines/recommendations for application in the S-DWH of the confidentiality aspects.

This outlines the options for understanding and dealing with the confidentiality aspects of combining and re-using data from a Statistical Data Warehouse, which comes with an increased risk of compromising the confidentiality of the data.

- Guidelines on editing for the S-DWH. This examines options for efficient editing in a Statistical Data Warehouse, specifically exploring how selective editing may be used in this context. The focus is on two widely available selective editing tools, to consider whether they could be used for efficient editing in a S-DWH.
- Guidelines on detecting and treating outliers for the S-DWH. This explains the distinction between outliers and errors and the three possible types of outliers in a S-DWH, and gives recommendations on how to deal with them.

1.3 Technological aspects

This track covers all essential architectural and technical elements for designing and building the statistical data warehouse and provides a generic model of the statistical data warehouse.

Management processes to govern S-DWH operations

In the S-DWH, fourteen over-arching processes are needed to support the statistics production processes. Nine of them are those found in the GSBPM, while the remaining five are a consequence of a fully active S-DWH approach. These five are:

1. S-DWH Management
2. Data Capturing Management
3. Output Management
4. Web Communication Management. This includes, for example, management of a thematic web portal.
5. (Business) Register Management (or for institutions or civil registers)

Models & Tools

There is a great variety of models and tools that can be used to support the creation of a S-DWH:

Generic Statistical Business Process Model (GSBPM)

In order to treat and manage all stages of a generic production process, it is useful to identify and locate the different phases of a generic statistics production process by using the Generic Statistical Business Process Model (GSBPM).
Generic Statistical Information Model (GSIM)

Another model used for describing statistical processes is the Generic Statistical Information Model (GSIM), a reference framework providing a set of standardized, consistently described information objects, which are the inputs and outputs in the design and production of statistics. GSIM is intended to support a common representation of information concepts at a conceptual level.

CORE

There are many software models and approaches available to build modular flows between layers. One of the approaches is CORE (Common Reference Environment), which is an environment supporting the definition of statistical processes and their automated execution. CORE services can be used to move data between S-DWH layers and also inside the layers between different sub-tasks.

The Integrated Warehouse model

The Integrated Warehouse model combines technical and process integration with the warehouse approach into one model. To have an integrated, warehouse-centric statistical production system, different statistical domains should use a common methodology, share common tools and have a distributed architecture. Decisions in the design phase, like questionnaire design, sample selection, imputation method, etc., are made globally. This way, integration of processes provides reusable data in the warehouse. The warehouse contains each variable only once, making it easier to reuse and manage valuable data.

There is also a wide variety of software tools used for statistics production. Which tool to choose mainly depends on the NSI's possibilities to adopt a particular technology, what tools are already used, which skills and experiences are available, as well as other considerations and available resources. In the interpretation and source layers standard tools can be used out-of-the-box, even though they are generally not very customizable to statistical processes. In the integration layer, where all operational activities needed for the statistical elaboration processes are carried out, mainly in-house developed software is used. This is because the needs are very specific and cannot be covered by standard applications. In these cases sharing of experience between NSIs is very desirable, as it avoids unwanted duplication of work and allows reuse of the experience already acquired.

The S-DWH business architecture
A corporate S-DWH specialized in supporting production must support multiple-purpose statistical information. Different statistical information on different topics should not be produced independently from each other but as integrated parts of a comprehensive information system where statistical concepts, micro data, macro data and metadata are shared.

The S-DWH data model must sustain the ability to realise data integration at micro and macro data granularity levels. Instead of focusing on a process-oriented design, the model should focus on the data inter-relationships that are fundamental for different processes of different statistical domains.

We identify four functional layers, defined as:

- IV - access layer, for the access to the data: selected operational views, final presentation, dissemination and delivery of the information sought;
- III - interpretation and data analysis layer, which enables the data analysis or data mining needed to support statistical design;
- II - integration layer, where all operational activities needed for any statistical production process are carried out; in this layer data are transformed;

- I - source layer, the level in which we locate all the activities related to storing and managing data sources, and where the reconciliation (the mapping) of statistical definitions from the external to the internal DWH dictionary is realised.

[Figure: layered S-DWH architecture and operational GSBPM-phases interaction. Access layer: Disseminate; interpretation and analysis layer: Analyse, Build; integration layer: Process; sources layer: Collect. The sources and integration layers are grouped as operational data, the interpretation and access layers as data warehouse.]

The layers can be viewed as grouped in two sub-groups: the first two layers are for statistical operational activities, i.e. where the data are acquired, stored, coded, checked, imputed, edited and validated; the last two layers are the effective data warehouse, i.e. the levels in which data are organized for analysis, evaluation, design and data visualization.

Easy and flexible access to the data is a basic requirement for any production based on a large, changeable amount of data. The S-DWH architecture could support a conceptual organization in which we consider the first two levels as pure statistical operational infrastructures, while the core repository of the S-DWH system is the interpretation and analysis layer, which is the effective data warehouse, and the final access layer allows the use of specialized statistical tools.

Layers II and III are reciprocally functional to each other. Layer II supports the uploading of raw data or of any base-phase elaboration output of a production process. Layer III is optimized for integrated and effective work on micro/macro data at any stage of the elaboration process. This is because in layer III methodologists may organize and retrieve the data for analysis or for creating the input of each base-phase elaboration.
This means that layer II supplies elaborated data for analytical activities, while layer III supplies concepts usable for the engineering of ETL functions or of new production processes, in a continuous cyclical interaction. Through the interpretation layer, methodologists or data experts can easily access all data before, during and after the elaboration of a production line in order to re-design or correct a process.
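As a rough illustration of how data pass through the four layers, the following Python sketch chains one function per layer. It is an assumption-laden toy, not the Manual's design: the function names, the external field names (`omzet`, `nace`), the internal dictionary and the mean-imputation rule are all hypothetical placeholders for the much richer processing an NSI would perform.

```python
# Hypothetical mapping from an external supplier's field names to the
# internal DWH dictionary (the reconciliation done in the source layer).
EXTERNAL_TO_INTERNAL = {"omzet": "turnover", "nace": "activity"}


def source_layer(raw_records):
    """Layer I: store incoming records and map external names to internal ones."""
    return [
        {EXTERNAL_TO_INTERNAL.get(key, key): value for key, value in record.items()}
        for record in raw_records
    ]


def integration_layer(records):
    """Layer II: transform the data; here, impute missing turnover with the
    mean of the observed values and flag the imputed records."""
    observed = [r["turnover"] for r in records if r["turnover"] is not None]
    if not observed:
        raise ValueError("cannot impute: no observed turnover values")
    mean = sum(observed) / len(observed)
    return [
        {
            **r,
            "turnover": mean if r["turnover"] is None else r["turnover"],
            "imputed": r["turnover"] is None,
        }
        for r in records
    ]


def interpretation_layer(micro_data):
    """Layer III: organise micro data for analysis; here, aggregate to macro
    data (total turnover per activity)."""
    totals = {}
    for r in micro_data:
        totals[r["activity"]] = totals.get(r["activity"], 0.0) + r["turnover"]
    return totals


def access_layer(macro_data):
    """Layer IV: present the information sought as a simple output view."""
    return [f"{activity}: {total:.1f}" for activity, total in sorted(macro_data.items())]


raw = [
    {"omzet": 100.0, "nace": "A"},
    {"omzet": None, "nace": "A"},
    {"omzet": 300.0, "nace": "B"},
]
micro = integration_layer(source_layer(raw))
print(access_layer(interpretation_layer(micro)))
```

Because the imputed micro data stay available between layers II and III, a methodologist could inspect the `imputed` flags and revise the layer II rule, which is the cyclical interaction described above.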

5. The Road Map for setting up a S-DWH

After illustrating and explaining the 5 phases and the 3 tracks for setting up a S-DWH, this chapter gives a roadmap, explaining which general steps to take and which chapters of the Manual on S-DWH to use in which step(s). The (approved) business case is seen as a required precondition for even starting the actual process, whereas the use and maintain phase is the actual operational phase.

For this purpose we use a graphical representation comparable to an underground map. The first map gives a general overview from start to end. The S-DWH development process is represented by 1 single line with the most essential 'stops':

1. The approved business case, the official GO to start the S-DWH project;
2. The approved designs of the various components of the S-DWH (business architecture, meta model, etc.);
3. A set of tested and approved systems, representing the working S-DWH (but not yet implemented);
4. The operational S-DWH, in use to produce statistics.

These phases are then worked out in detailed maps that show the essential milestones/steps, each represented as a station or stop. The white stops are specific to the S-DWH development process. The grey stops are generic stops, like testing, training users etc. Each specific S-DWH stop is linked to the part of the Manual to be used in that stage of the S-DWH development process.

In these detailed sub maps the 3 tracks are represented by coloured lines:

- the green line represents WP1 Metadata
- the blue line represents WP2 Methodology
- the red line represents WP3 Technological aspects

5.1 Roadmap S-DWH: General overview

[Roadmap figure. Stops, in order: Start Project; Business Requirements; Establish Project Target; Define Project Strategy; Cost-benefit Analysis; Approved Business Case; Building Blocks; Business Architecture; Information Architecture; Metadata Model; Data Linking; Data Cleaning; Estimation; Approved Design; Technology Architecture; Workflow System; Metadata System; Test; Working S-DWH; Metadata Governance; Confidentiality; Analysts; Revisions; Training; Operational S-DWH.]

5.2 Roadmap S-DWH: Approved Business Case

5.2 Roadmap S-DWH: Design phase

5.3 Roadmap S-DWH: Build phase

5.4 Roadmap S-DWH: Finalize phase