DIPARTIMENTO DI INGEGNERIA INFORMATICA AUTOMATICA E GESTIONALE ANTONIO RUBERTI
Master of Science in Engineering in Computer Science (MSE-CS)
Seminars in Software and Services for the Information Society
Data Warehousing: Design Issues
Data Warehouse Design
- Setting targets and planning
  - feasibility (boundaries, size, sources, ...)
  - team
  - operating plan
- Design of the infrastructure
  - choice of architecture
  - choice of technologies
- Design of Data Marts
  - analysis with domain experts
Lifecycle (Kimball, 1998)
- planning
- definition of requirements
- project management
- technology
  - design of architecture
  - selection and installation of products
- data
  - dimensional modelling
  - physical design
  - feeding (ETL) design and implementation
- application
  - specification of applications
  - applications development
- release
- maintenance
Data Flow & Project evolution
[Figure labels: DW, Data flow, Design]
Design of a Data Mart: phases
1. Analysis and reconciliation of data sources
   input: schema of sources
   output: reconciled schema
2. Requirements analysis
   input: reconciled schema
   output: facts, work load
3. Conceptual design
   input: reconciled schema, facts, work load
   output: fact schema
4. Logical design
   input: fact schema, work load
   output: logical schema of Data Marts
5. Feeding (ETL) design
   input: schema of sources, reconciled schema, logical schema of DM
   output: ETL procedures
6. Physical design
   input: logical schema of DM, work load, DBMS
   output: physical schema of DM
Callouts on the slide: Fact Schema; Star-schema, Snowflakes; entity-relationship.
Reconciling data sources
Schema integration:
- one step
- in steps: balanced, iterative
Operational data source: ER schema
[Figure: ER diagram. Entities: DATE, INVOICE, INV-ROW, ARTICLE, SHOP, CITY. Relationships: issued-on, contains, refers-to, at, in. Attributes: date, number, amount, position, units, p_iva, shop, article, code.]
Operational data source: logical schema
[Figure: logical schema of the same source. Tables: DATE, INVOICE, INV-ROW, ARTICLE, SHOP, CITY. Links: issued-on, contains, refers-to, at, in. Columns: date, number, amount, position, units, p_iva, shop, article, code.]
Operational data source: logical schema (rev.)
- simplification (DROP ATTRIBUTES): delete uninteresting attributes
- denormalization (JOIN TABLES): e.g., if nobody is interested in market basket analysis, only product sales are relevant
[Figure: revised schema. DATE, SHOP (in CITY), ARTICLE, and SALE (id, quantity, sales). Relationships: issued at, in, refers-to.]
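The two revision steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: the sample rows, the dictionary keys, and the helper name denormalize are hypothetical, chosen to mirror the INVOICE and INV-ROW entities of the source schema.

```python
# Hypothetical operational rows; keys loosely follow the source ER schema.
invoices = [
    {"number": 1, "date": "2024-01-05", "shop": "S1", "amount": 30.0},
    {"number": 2, "date": "2024-01-05", "shop": "S2", "amount": 12.0},
]
invoice_rows = [
    {"invoice": 1, "position": 1, "article": "A1", "units": 2, "amount": 20.0},
    {"invoice": 1, "position": 2, "article": "A2", "units": 1, "amount": 10.0},
    {"invoice": 2, "position": 1, "article": "A1", "units": 1, "amount": 12.0},
]


def denormalize(invoices, invoice_rows):
    """Join each invoice row with its invoice and keep only sale-level data."""
    by_number = {inv["number"]: inv for inv in invoices}
    sales = []
    for row in invoice_rows:
        inv = by_number[row["invoice"]]
        sales.append({
            # JOIN TABLES: date and shop are copied down from the invoice.
            "date": inv["date"],
            "shop": inv["shop"],
            "article": row["article"],
            "units": row["units"],
            "amount": row["amount"],
            # DROP ATTRIBUTES: invoice number, position and invoice total
            # are not carried over.
        })
    return sales


sales = denormalize(invoices, invoice_rows)
```

Once the invoice is folded into each row, the grouping of rows by invoice is lost, which is acceptable only because market basket analysis is out of scope.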
Fact Schema (preliminary)
FACT: SALES
MEASURES:
- units
- amount
DIMENSIONS:
- date (TIME)
- shop (SPACE)
- article (PRODUCT)
Dimensional Hierarchies
(details based on user requirements)
- TIME dimension: date, week, month, quarter, year
- SPACE dimension: shop, district, manager, region, zone
- PRODUCT dimension: article, subtype, type, brand, brand-
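As an illustration of the TIME dimension, the attributes above can all be derived from the date itself. The sketch below is an assumption-laden example: the function name time_hierarchy and the label formats are hypothetical, and the week is taken to be the ISO 8601 week, which the slides do not specify.

```python
from datetime import date


def time_hierarchy(d):
    """Return the TIME attributes for one date (ISO weeks assumed)."""
    iso_year, iso_week, _ = d.isocalendar()
    return {
        "date": d.isoformat(),
        # A week can span two months, so it is a separate branch from month.
        "week": f"{iso_year}-W{iso_week:02d}",
        "month": f"{d.year}-{d.month:02d}",
        "quarter": f"{d.year}-Q{(d.month - 1) // 3 + 1}",
        "year": d.year,
    }


h = time_hierarchy(date(2024, 5, 15))
```

The SPACE and PRODUCT attributes cannot be computed this way; they have to be looked up in source data.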
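Fact Schema: DFM (Dimensional Fact Model)
[Figure: DFM fact schema. Fact: SALES (measures: units, amount). TIME attributes: date, week, month, quarter, year. SPACE attributes: shop, district, manager, region, zone. PRODUCT attributes: article, subtype, type, brand, brand-.]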
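One possible logical translation of this fact schema is a star schema. The sketch below uses Python's sqlite3 module; the table names (dim_time, dim_shop, dim_article, fact_sales), the surrogate keys, and the sample row are assumptions made for illustration, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time (
    time_id INTEGER PRIMARY KEY,
    date TEXT, week TEXT, month TEXT, quarter TEXT, year INTEGER
);
CREATE TABLE dim_shop (
    shop_id INTEGER PRIMARY KEY,
    shop TEXT, district TEXT, region TEXT, zone TEXT, manager TEXT
);
CREATE TABLE dim_article (
    article_id INTEGER PRIMARY KEY,
    article TEXT, subtype TEXT, type TEXT, brand TEXT
);
-- One row per (date, shop, article) combination, carrying the measures.
CREATE TABLE fact_sales (
    time_id INTEGER REFERENCES dim_time(time_id),
    shop_id INTEGER REFERENCES dim_shop(shop_id),
    article_id INTEGER REFERENCES dim_article(article_id),
    units INTEGER,
    amount REAL,
    PRIMARY KEY (time_id, shop_id, article_id)
);
""")

# Hypothetical sample data, one row per table.
conn.execute("INSERT INTO dim_time VALUES (1, '2024-01-05', '2024-W01', '2024-01', '2024-Q1', 2024)")
conn.execute("INSERT INTO dim_shop VALUES (1, 'S1', 'D1', 'R1', 'Z1', 'Rossi')")
conn.execute("INSERT INTO dim_article VALUES (1, 'A1', 'ST1', 'T1', 'B1')")
conn.execute("INSERT INTO fact_sales VALUES (1, 1, 1, 2, 20.0)")

# A typical star join: measures grouped by attributes of two dimensions.
rows = conn.execute("""
    SELECT t.month, a.brand, SUM(f.units), SUM(f.amount)
    FROM fact_sales f
    JOIN dim_time t ON t.time_id = f.time_id
    JOIN dim_article a ON a.article_id = f.article_id
    GROUP BY t.month, a.brand
""").fetchall()
```

Each dimension table here stores its whole hierarchy in one denormalized row; a snowflake variant would split the hierarchy levels into separate tables.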
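Fact Schema (an interpretation for OLAP)
E.g.: monthly sales by ... and brand
[Figure: the DFM fact schema for SALES (units, amount), with an ALL level added on top of each of the three hierarchies. TIME attributes: date, week, month, quarter, year. SPACE attributes: shop, district, manager, region, zone. PRODUCT attributes: article, subtype, type, brand, brand-.]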
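The OLAP reading of the schema amounts to choosing one level per hierarchy and aggregating the measures. The sketch below groups by month and brand only, since the slide's other grouping attribute is not recoverable from the text; a dimension that does not appear in the grouping key is implicitly aggregated to ALL. The sample rows and the function name roll_up are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales events, already carrying the brand of each article.
sales = [
    {"date": "2024-01-05", "shop": "S1", "brand": "B1", "units": 2, "amount": 20.0},
    {"date": "2024-01-20", "shop": "S2", "brand": "B1", "units": 1, "amount": 10.0},
    {"date": "2024-01-20", "shop": "S1", "brand": "B2", "units": 3, "amount": 45.0},
    {"date": "2024-02-02", "shop": "S1", "brand": "B1", "units": 4, "amount": 40.0},
]


def roll_up(sales):
    """Sum units and amount per (month, brand); shop is rolled up to ALL."""
    totals = defaultdict(lambda: {"units": 0, "amount": 0.0})
    for s in sales:
        month = s["date"][:7]  # 'YYYY-MM' prefix of an ISO date string
        key = (month, s["brand"])
        totals[key]["units"] += s["units"]
        totals[key]["amount"] += s["amount"]
    return dict(totals)


monthly_by_brand = roll_up(sales)
```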
ER schema
[Figure: ER diagram. Entities: WEEK, YEAR, QUARTER, MONTH, DATE, ZONE, SALES-MANAGER, REGION, CITY, SHOP, INVOICE-ROW, DISTRICT, CITTA_MARCA, BRAND, ARTICLE, TYPE, SUBTYPE. Relationships: on, in, at, refers-to, belongs-to. Attributes: week, year, quarter, month, date, manager, zone, quantity, amount, negozio, region, district, citta_marca, brand, articolo, type, subtype.]
Classify the information
Legend:
- easy to build
- operational data
- somewhere in our organization
- hard to find, possibly to buy
[Figure: the ER diagram with each element marked according to the legend. Entities: WEEK, YEAR, QUARTER, MONTH, DATE, ZONE, SALES-MANAGER, REGION, CITY, SHOP, INVOICE-ROW, DISTRICT, BRAND_CITY, BRAND, TYPE, SUBTYPE, ARTICLE. Relationships: on, in, at, refers-to, belongs-to. Attributes: week, year, quarter, month, date, manager, zone, quantity, amount, negozio, region, district, citta_marca, type, brand, subtype, articolo.]
ER schema (rev.)
[Figure: revised ER diagram. Entities: WEEK, YEAR, QUARTER, MONTH, DATE, ZONE, SALES-MANAGER, REGION, CITY, SHOP, INVOICE-ROW, DISTRICT, COMMISSION, BRAND, ARTICLE, TYPE, SUBTYPE. Relationships: on, in, at, refers-to, belongs-to. Attributes: week, year, quarter, month, data, manager, zone, quantity, amount, negozio, region, percent, district, home, brand, articolo, type, subtype.]
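Multiple arcs (n:n relation)
A single shop may have (or have had) more than one sales manager.
[Figure: fact schema for sales (units, amount) with a multiple arc from shop to sales-manager. Attributes: date, week, month, quarter, year; shop, sales-manager, district, shop_, area, region; article, subtype, type, brand, brand_.]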
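A common way to handle such an n:n arc at the logical level is a bridge structure between shop and sales manager. The sketch below goes one step further and attaches a weight to each pair so that a shop's sales are not counted twice when grouped by manager. The weights, the names, and the function amount_by_manager are assumptions for illustration; the slides do not prescribe this technique.

```python
# Hypothetical bridge: each shop maps to (manager, weight) pairs.
# The weights of one shop sum to 1.0 so totals are preserved.
shop_managers = {
    "S1": [("Rossi", 0.5), ("Bianchi", 0.5)],
    "S2": [("Verdi", 1.0)],
}

sales = [
    {"shop": "S1", "amount": 100.0},
    {"shop": "S2", "amount": 40.0},
]


def amount_by_manager(sales, shop_managers):
    """Allocate each sale to the managers of its shop, scaled by weight."""
    totals = {}
    for s in sales:
        for manager, weight in shop_managers[s["shop"]]:
            totals[manager] = totals.get(manager, 0.0) + s["amount"] * weight
    return totals


by_manager = amount_by_manager(sales, shop_managers)
```

Without the weights, the sale of shop S1 would be attributed in full to both of its managers, and the per-manager amounts would add up to more than the actual total.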
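Cross-dimensional Attributes
A commission percentage may depend on both the brand and the shop.
[Figure: fact schema for sales (units, amount) with the cross-dimensional attribute commission_perc attached to both brand and shop. Attributes: date, week, month, quarter, year; shop, sales-manager, district, shop_, area, region; article, subtype, type, brand, brand_.]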
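Because commission_perc is determined by a pair of attributes from two different hierarchies, it cannot be stored with brand alone or with shop alone. A minimal sketch, assuming a lookup keyed by (brand, shop); the sample percentages and the function name commission are hypothetical.

```python
# Hypothetical commission percentages, keyed by the (brand, shop) pair.
commission_perc = {
    ("B1", "S1"): 5.0,
    ("B1", "S2"): 7.5,
    ("B2", "S1"): 4.0,
}


def commission(sale):
    """Return the commission due on one sale, using the pair as the key."""
    perc = commission_perc[(sale["brand"], sale["shop"])]
    return sale["amount"] * perc / 100.0


due = commission({"brand": "B1", "shop": "S2", "amount": 200.0})
```

In a relational design the same idea would be a separate table whose key combines brand and shop.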