Avancier Methods (AM) Data Architecture

Similar documents
Avancier Methods (AM)

Basics in good research data management (RDM) for reviewing DMPs

RECOMMENDED FILE FORMATS

Where to store research data during and after a project. Dr. Chris Emmerson Research Data Manager

GUIDELINES FOR CREATION AND PRESERVATION OF DIGITAL FILES

PROCESSING AND CATALOGUING DATA AND DOCUMENTATION - QUALITATIVE

Why do architects need more than TOGAF?

JSON as an XML Alternative. JSON is a light-weight alternative to XML for datainterchange

JSON is a light-weight alternative to XML for data-interchange JSON = JavaScript Object Notation

PROCESSING AND CATALOGUING DATA AND DOCUMENTATION: QUALITATIVE

CSC Web Technologies, Spring Web Data Exchange Formats

How to Build a Digital Library

Introduction to JSON. Roger Lacroix MQ Technical Conference v

Sustainable File Formats for Electronic Records A Guide for Government Agencies

Characterisation. Digital Preservation Planning: Principles, Examples and the Future with Planets. July 29 th, 2008

This tutorial will help you understand JSON and its use within various programming languages such as PHP, PERL, Python, Ruby, Java, etc.

Assigns a persistent identifier that will always point to the object and/or its metadata.

Different File Types and their Use

HR-XML Schema Extension Recommendation, 2003 February 26

Geodatabases. Dr. Zhang SPRING 2016 GISC /03/2016

MEDIA RELATED FILE TYPES

DTD MIGRATION TO W3C SCHEMA

Dexterity: Data Exchange Tools and Standards for Social Sciences

JME Language Reference Manual

ENTSO-E ACKNOWLEDGEMENT DOCUMENT (EAD) IMPLEMENTATION GUIDE

(Frequently Asked Questions)

CountryData Technologies for Data Exchange. Introduction to XML

Section 9 Linking & Importing

Sticky and Proximity XML Schema Files

... Open Data Policy ... Agreed on 28 September 2015

Transform Presentation

1 Delivery of data to DNB using Logius Digipoort Introduction Logius documentation Logius requirements Logius validations 3

Flat triples approach to RDF graphs in JSON

REST. Web-based APIs

PRACTICE DIRECTION GUIDELINES FOR THE USE OF TECHNOLOGY IN CIVIL LITIGATION

Module 12 Tree-Structured data

Google Docs Tipsheet. ABEL Summer Institute 2009

Assigns a number to 110,000 letters/glyphs U+0041 is an A U+0062 is an a. U+00A9 is a copyright symbol U+0F03 is an

CRD - Crystal Reports Scheduler. Software Features. This document only outlines the main features of CRD

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 27-1

EMERGING TECHNOLOGIES. XML Documents and Schemas for XML documents

Structure and document your research data

- XML. - DTDs - XML Schema - XSLT. Web Services. - Well-formedness is a REQUIRED check on XML documents

QDA Miner. Addendum v2.0

Chapter 3 Brief Overview of XML

REPORTING Copyright Framework Private Equity Investment Data Management Ltd

REDCap Overview. REDCap questions to Last updated 12/12/2018 for REDCap v6.13.1

2 Spreadsheet Considerations 3 Zip Code and... Tax ID Issues 4 Using The Format... Cells Dialog 5 Creating The Source... File

Data Formats. Course NDBI040: Big Data Management and NoSQL Databases. Lecture 06: Martin Svoboda

JSON Support for Junos OS

[Rosa Say on flickr] Module 12 Tree-Structured data CS 106 Winter 2018

A Case Study: A Distributed Web Application

2/24/2015. What are File Formats? Types of File Formats. Sustainable Formats. File formats can be grouped into three categories:

DATABASE SCHEMA DESIGN ENTITY-RELATIONSHIP MODEL. CS121: Relational Databases Fall 2017 Lecture 14

Supported File Types

Ascent 6.1 Release Script for FileNet Content Manager 3.0. Release Notes

Data Services API Guide SuccessMaker 9

Part III: Survey of Internet technologies

Draft Digital Preservation Policy for IGNCA. Dr. Aditya Tripathi Banaras Hindu University Varanasi

SuccessMaker Data Services API Guide

Teiid Designer User Guide 7.5.0

Grading for Assignment #1

access to reformatted and born digital content regardless of the challenges of media failure and technological

Electronic Mail (SMTP)

Field Types and Import/Export Formats

ATLAS.ti: The Qualitative Data Analysis Workbench

OnDemand Discovery Quickstart Guide

Data Services API Guide SuccessMaker 10

HTM, HTML, MHT, MHTML Web document Brightspace Learning Environment strips the <title> tag and text within the tag from user created web documents

The Swedish National Archives digital preservation. Mats Berggren, IT-department,

Guide to Supplemental Materials

PVB CONTACT FORM 7 CALCULATOR PRO DOCUMENTATION

Automating the Capture of Data Transformation Metadata

Queens Library API Requirements Document For e-content Partners

Summary of Bird and Simons Best Practices

Creating Coverage Zone Files

Week 2: Lecture Notes. DTDs and XML Schemas

Assignment Submission - Student Work Product

HTML vs. XML In the case of HTML, browsers have been taught how to ignore invalid HTML such as the <mymadeuptag> element and generally do their best

WORKING GROUP ON PASSENGER MOBILITY STATISTICS

GiftWorks Import Guide Page 2

Notify Metering Point Characteristics

Delivery Context in MPEG-21

Search Page Basic Search Advanced Search Exploring search results Showing Properties Showing Details...

Chapter 2. Information System Building Blocks. McGraw-Hill/Irwin. Copyright 2007 by The McGraw-Hill Companies, Inc. All rights reserved.

Data Collection and Online Access to Map Collections

JSON - Overview JSon Terminology

XML. Part II DTD (cont.) and XML Schema

Spreadsheet Procedures

Need help? Call: / DOCMAIL: ADVANCED USER GUIDE

Your leads. Your way. Lead delivery options for BuyerZone clients and partners.

See Types of Data Supported for information about the types of files that you can import into Datameer.

Example 1: Denary = 1. Answer: Binary = (1 * 1) = 1. Example 2: Denary = 3. Answer: Binary = (1 * 1) + (2 * 1) = 3

CSE 344 APRIL 16 TH SEMI-STRUCTURED DATA

1. Information Systems for Design Support

Media Types. Web Architecture and Information Management [./] Spring 2009 INFO (CCN 42509) Contents. Erik Wilde, UC Berkeley School of

Enterprise Knowledge Platform

Creation of the adaptive graphic Web interfaces for input and editing data for the heterogeneous information systems on the bases of XML technology

Session [2] Information Modeling with XSD and DTD

Mastering phpmyadmiri 3.4 for

Transcription:

Methods (AM) Data Architecture Design data stores: document stores It is illegal to copy, share or show this document (or other document published at http://avancier.co.uk) without the written permission of the copyright holder Copyright Limited 2008-2106

The data/information architecture domain/layer Passive Structure Required Behaviour Logical Structure Physical Structure Business Business Service Business Process Function Role Org Unit Actor Data / Information Data Entity Data Flow Log Data Model Data Store Applications IS Service Application Interface Application Infrastructure Technology Platform Service Platform Interface Platform Applicat n Copyright Limited 2007-2016

Digitisation of document creation and use In theory, data architects can design non-digital data flows. In practice, data architects mostly focus on business systems in which business information is to be digitised. Business facts are created and used during business operations e.g. customer address, order id, product id, product amount, order receipt, order rejection These logical facts must be moved in flows from senders to receivers Transport mechanisms include Keyboard data entry User interface display Transaction message Message in queue Serial file Copyright Limited 2008-2106

Design the target (AM level 2) Initiate Establish capability Establish the context Scope the endeavour Get vision approved Govern Respond to oper'l change Monitor the portfolio(s) Govern delivery Hand over to delivery Manage Manage stakeholders Manage requirements Manage business case Manage readiness & risks Architect Understand the baseline Review initiation products Clarify NFRs Design the target Plan Select & manage suppliers Plot migration path Review business case Plan delivery Copyright Limited 2017

Canonical data model Conceptual Defines the ideal or common types for data items in data flows Canonical Data Model Logical XML XML Schema Schema Data Flow Structure Physical XML XML Schema Schema CSV, JSON, XML Real Data Flow Copyright Limited 2008-2106

Design target data architecture (AM level 3 and 4) 1. Define the business context for data creation and use 2. Define data flows (I/O messages, displays, forms and reports) 3. Define data dictionary or canonical data model 4. Define data store(s): document stores 1. Define each input document as a logical structure (a regular expression) 2. Define additional context data to be stored (provenance information etc.) 3. Code input data structures using chosen data format standard (XML, JSON ) 4. Refine design to meet output requirements, keeping input documents intact. 5. Design to ensure the database satisfies CIA and scalability requirements. 5. Address data quality issues Copyright Limited 2017

Canonical data model Conceptual Defines the ideal or common types for data items in data flows Canonical Data Model Logical The proper form to define the logical structure of a data flow is a regular expression XML XML Schema Schema Data Flow Structure Physical XML XML Schema Schema CSV, JSON, XML Real Data Flow Copyright Limited 2008-2106

Document as a simple sequence of fields First Full Middle Last Full name SEQUENCE First Middle Last Full name END Copyright Limited 2008-2106

Document with optional elements Single First o Full Middle o Last SELECT Single OR Full name SEQUENCE First Middle Last Full name END END Copyright Limited 2008-2106

Document with iterated element Single First o Full Middle s Middle o * Last SELECT Single OR Full name SEQUENCE First Middle s ITERATION Middle Middle s END Last Full name END END Copyright Limited 2008-2106

Data flow structure diagram (regular expression) A document in a data flow structure can be defined as A hierarchical structure a regular expression A Jackson structure like this, or in the form of an XML schema Person first last age address phone Numbers streetaddress city state postalcode phone Number * type number Copyright Limited 2008-2106

Code the input data structures using the data format standard Conceptual Defines the ideal or common types for data items in data flows Canonical Data Model Logical The proper form to define the logical structure of a data flow is a regular expression XML XML Schema Schema Data Flow Structure Physical A data flow s content can be defined in various forms XML XML Schema Schema CSV, JSON, XML Real Data Flow Copyright Limited 2008-2106

There are many standard data flow formats, Digital image data TIFF version 6 uncompressed (.tif) JPEG (.jpeg,.jpg) PDF (.pdf) Digital video data: MPEG-4 High Profile (.mp4) JPEG 2000 (.mj2) Digital audio data Free Lossless Audio Codec (FLAC) (.flac) Waveform Audio Format (WAV) (.wav) MPEG-1 Audio Layer 3 (.mp3) Documentation and scripts Open Document Text (.odt) Rich Text Format (.rtf) HTML (.htm,.html) plain text (.txt) widely-used proprietary formats e.g. MS Word (.doc/.docx) or MS Excel (.xls/.xlsx) XML marked-up text (.xml) to a DTD or schema, e.g. XHMTL 1.0 PDF (.pdf) Geospatial data; vector and raster data ESRI Shapefile (essential --.shp,.shx,.dbf; optional --.prj,.sbx,.sbn) geo-referenced TIFF (.tif,.tfw) CAD data (.dwg) tabular GIS attribute data Qualitative data, textual extensible Mark-up Language (XML) text according to a Document Type Definition (DTD) or schema (.xml) Rich Text Format (.rtf) plain text data, ASCII (.txt) Hypertext Mark-up Language (HTML) (.html) widely-used proprietary formats, e.g. MS Word (.doc/.docx) Quantitative tabular data with extensive metadata SPSS portable format (.por) delimited text and command ('setup') file (SPSS, Stata, SAS, etc.) containing metadata information structured text or mark-up file containing metadata information, e.g. DDI XML file MS Access (.mdb/.accdb) Quantitative tabular data with minimal metadata: comma-separated values (CSV) file (.csv) tab-delimited file (.tab) including delimited text of given character set with SQL data definition statements where appropriate delimited text of given character set -- only characters not present in the data should be used as delimiters (.txt) widely-used formats, e.g. MS Excel (.xls/.xlsx), MS Access (.mdb/.accdb), dbase (.dbf) and OpenDocument Spreadsheet (.ods) Copyright Limited 2008-2106

Simple data flow format CSV Comma-separated values For simple data flow structures John, 3 South Street, Big Town Mary, 44 North Street, Small Town Spreadsheet import/export Meaning is not usually contained in the data flow People Person * House Town The raw data can appear meaningless A, 0001, 23 B, 9888, 10 So, sender and receiver must know the meaning of each data item Copyright Limited 2008-2106

Self-describing data flow formats JSON A JSON data flow contains not only Data (data values) but also Data descriptors (data types) Number String Boolean Array So the data flow can be sent to recipients who do not already know the data types, but can find them in the flow itself. Copyright Limited 2008-2106

http://en.wikipedia.org/wiki/json Self-describing data flow formats JSON Number (double precision floating-point format in JavaScript, generally depends on implementation) String (double-quoted Unicode, with backslash escaping) Boolean (true or false) Array (an ordered sequence of values, comma-separated and enclosed in square brackets; the values do not need to be of the same type) Object (an unordered collection of key:value pairs with the ':' character separating the key and the value, comma-separated and enclosed in curly braces; the keys must be strings and should be distinct from each other) null (empty) Non-significant white space may be added freely around the "structural characters" (i.e. brackets "{ } [ ]", colons ":" and commas ","). Copyright Limited 2008-2106

JSON data structure { } "first": "last": "age": "address": "streetaddress": "city": "state": "postalcode": "phonenumbers": [ { "type": "number": }, { "type": "number": } ] Person first last age address streetaddress city state postalcode type Copyright Limited 2008-2106 phone Numbers phone Number * number

http://en.wikipedia.org/wiki/json JSON representation of an object that describes a person. { } "first": "John", "last": "Smith", "age": 25, "address": { "streetaddress": "21 2nd Street", "city": "New York", "state": "NY", "postalcode": 10021 }, "phonenumbers": [ { "type": "home", "number": "212 555-1234" }, { "type": "fax", "number": "646 555-4567" } ] Copyright Limited 2008-2106 The object has string fields for first and last name, a number field for age, an object representing the person's address an array of phone number objects.

Self-describing data flow format XML XML (extensible Mark up Language) Like JSON, an XML data flow contains not only the data (values) but also descriptors of the data (types) How does XML differ from JSON? Con: Clunkier Pro: Can be supported by a separate XML Schema Definition (XSD) Copyright Limited 2008-2106

XSD: XML Schema Definition language used to define the data structure of an XML document enables verification of a document s data integrity loosens coupling between sender and receiver E.g. a simple type - a subtype of string with a range of values?xml version="1.0"? xsd:schema xmlns:xsd=http://www.w3.org/1999/xmlschema xsd:simpletype name="persontitle" base="xsd:string" xsd:enumeration value="mr." / xsd:enumeration value="ms." / xsd:enumeration value="dr." / xsd:enumeration value="rev." / /xsd:simpletype xsd:complextype name="text" content="textonly" base="xsd:string" derivedby="restriction" / Copyright Limited 2008-2106

Document as XML schema Single First o Full Middle s Middle o * Last SELECT Single OR Full name SEQUENCE First Middle s ITERATION Middle Middle s END Last Full name END END xsd:choice /xsd:choice xsd:element name="single" type="text" minoccurs="1" maxoccurs="1" / xsd:sequence xsd:element name="first" type="text" minoccurs="1" maxoccurs="1" / xsd:element name="middle" type="text" minoccurs="0" maxoccurs="unbounded"/ xsd:element name="last" type="text" minoccurs="1" maxoccurs="1" / /xsd:sequence Copyright Limited 2008-2106

Refine the database design to fulfil output requirements Document-oriented database An output requirement Order History SEQUENCE Customer number Data analysis Customer name Customer address Set of Orders ITERATION Order SEQUENCE Order number Order amount Product type Order END Set of Orders END Order History END Serialisation Copyright Limited 2008-2106

Partitioning a network perhaps for scalability How would you partition this data model functionally? Sales Partner Hotel Chain Town Customer Sales Agent Hotel Facility Type Booking Room Facility Email Address Room Reservation Copyright Limited 2007-2016

Partitioning a network for scalability A natural division Sales Partner Hotel Chain Town Customer Sales Agent Hotel Facility Type Booking Room Facility Email Address Room Reservation Room Reservation Copyright Limited 2007-2016

Horizontal partitioning of network into hierarchical structures Each hierarchical data structure is describable in an XML schema Sales Partner Hotel Chain Town Customer Sales Agent Hotel Hotel Facility Type Email Address Booking Booking Room Facility Facility Facility Room Reservation Room Reservation Room Reservation Room Reservation Copyright Limited 2007-2016

Design target data architecture (AM level 3 and 4) 1. Define the business context for data creation and use 2. Define data flows (I/O messages, displays, forms and reports) 3. Define data dictionary or canonical data model 4. Define data store(s): document stores 1. Define each input document as a logical structure (a regular expression) 2. Define additional context data to be stored (provenance information etc.) 3. Code input data structures using chosen data format standard (XML, JSON ) 4. Refine design to meet output requirements, keeping input documents intact. 5. Design to ensure the database satisfies CIA and scalability requirements. 5. Address data quality issues Copyright Limited 2017

The physical reality Conceptual Defines the ideal or common types for data items in data flows Canonical Data Model Logical The proper form to define the logical structure of a data flow is a regular expression XML XML Schema Schema Data Flow Structure Physical A data flow s content can be defined in various forms XML XML Schema Schema CSV, JSON, XML Real At the bottom-level of a communication stack is the physical medium wires, Microwaves, Sound waves Data Flow Copyright Limited 2008-2106