
Introduction to Data Warehousing and Business Intelligence

Overview
  Why Business Intelligence?
  Data analysis problems
  Data Warehouse (DW) introduction
  A tour of the coming DW lectures
  DW Applications
  Loosely covers [Jarke et al.] chapter 1
  Original slides were written by Torben Bach Pedersen

Aalborg University 2007 - DWML course

What is Business Intelligence (BI)?
  BI is different from Artificial Intelligence (AI)
    AI systems make decisions for the users
    BI systems help the users make the right decisions, based on available data
  Combination of technologies
    Data Warehousing (DW)
    On-Line Analytical Processing (OLAP)
    Data Mining
    Data Visualization (VIS)
    Decision Analysis (what-if)
    Customer Relationship Management (CRM)

BI Is Important
  Worldwide BI revenue in 2005 = US$ 5.7 billion
    10% growth each year
  The Web makes BI more necessary
    Customers do not appear physically in the store
    Customers can change to other stores more easily
  Thus: Know your customers using data and BI!
    Utilize Web logs, analyze customer behavior in more detail than before (e.g., what was not bought?)
    Combine web data with traditional customer data

Data Analysis Problems
  The same data found in many different systems
    Example: customer data across different departments
    The same concept is defined differently
  Heterogeneous sources
    Relational DBMS, On-Line Transaction Processing (OLTP)
    Unstructured data in files (e.g., MS Excel) and documents (e.g., MS Word)
  Data is suited for operational systems
    Accounting, billing, etc.
    Do not support analysis across business functions
  Data quality is bad
    Missing data, imprecise data, different use of systems
  Data are volatile
    Data deleted in operational systems (6 months)
    Data change over time: no historical information

Data Warehousing
  Solution: new analysis environment (DW) where data are
    Subject oriented (versus function oriented)
    Integrated (logically and physically)
    Time variant (data can always be related to time)
    Stable (data not deleted, several versions)
    Supporting management decisions (different organization)
  A good DW is a prerequisite for successful BI
  Getting multidimensional data into the DW
    Data from the operational systems are
      Extracted
      Cleansed
      Transformed
      Aggregated?
      Loaded into DW

DW: Purpose and Definition
  DW is a store of information organized in a unified data model
  Data collected from a number of different sources
    Finance, billing, web logs, personnel, ...
  The purpose of a data warehouse (DW) is to support decision making
  Easy to perform advanced analyses
    Ad-hoc analyses and reports
    Data mining: discovery of hidden patterns and trends

DW Architecture: Data as Materialized Views
  [Figure labels: Existing databases and systems (OLTP); (Global) Data Warehouse; (Local) Data Marts; New databases and systems (OLAP): OLAP, data mining, visualization]
  Analogy: suppliers -> supermarket -> customers
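The extract, cleanse, transform, aggregate, load path listed on the Data Warehousing slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source lists, their field names, the default value of 0.0 for a missing amount, and the dictionary standing in for the warehouse are all hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical rows from two operational sources with different layouts.
billing_rows = [
    {"cust": "3302", "amount": "6.00", "date": "2007-02-05"},
    {"cust": "3302", "amount": "4.00", "date": "2007-02-05"},
    {"cust": "3303", "amount": "", "date": "2007-02-07"},  # missing amount
]
web_rows = [
    {"customer_id": 3303, "amount": 5.0, "date": "2007-02-07"},
]

def extract():
    """Extract: read raw rows from every source, tagged with their origin."""
    for row in billing_rows:
        yield "billing", row
    for row in web_rows:
        yield "web", row

def cleanse_and_transform(source, row):
    """Cleanse missing values and map both layouts to one unified format."""
    cust = row["cust"] if source == "billing" else row["customer_id"]
    raw_amount = row["amount"]
    # Assumed cleansing rule: a missing amount gets the default value 0.0.
    amount = float(raw_amount) if raw_amount not in ("", None) else 0.0
    return {"customer_id": int(cust), "amount": amount, "date": row["date"]}

def aggregate(rows):
    """Aggregate: total amount per (customer, date)."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["customer_id"], r["date"])] += r["amount"]
    return dict(totals)

def load(warehouse, totals):
    """Load: one bulk update instead of row-by-row inserts."""
    warehouse.update(totals)

warehouse = {}
clean_rows = [cleanse_and_transform(s, r) for s, r in extract()]
load(warehouse, aggregate(clean_rows))
print(warehouse)
```

Whether to aggregate before loading is exactly the open question the slide marks with "Aggregated?": this sketch discards the individual transactions, which a real DW may need to keep.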

Quick review of normalized database

  Original table:
    Customer ID  Product  Category  Price  Date
    3302         Beer     Beverage  6.00   05-02-2007
                 Rice               4.00
    3303         Beer     Beverage  6.00   07-02-2007
                 Wheat              5.00

  Normalized tables:
    Customer ID  ProductID  Date
    3302         013        05-02-2007
                 052
    3303         013        07-02-2007
                 067

    ProductID  Product  Category  Price
    013        Beer     Beverage  6.00
    052        Rice               4.00
    067        Wheat              5.00

  Normalized database avoids
    Redundant data
    Modification anomalies
  How to get the original table? (join them)
  No redundancy in OLTP, controlled redundancy in OLAP

OLTP vs. OLAP
                            OLTP                 OLAP
    Target                  operational needs    business analysis
    Data                    small, operational   large, historical
    Model                   normalized           denormalized/multidimensional
    Query language          SQL                  not unified
    Queries                 small                large
    Updates                 frequent and small   infrequent and batch
    Transactional recovery  necessary            not necessary
    Optimized for           update operations    query operations

Queries hard or infeasible for OLTP
  Business analysis
    In the past five years, which product is the most profitable?
    On which public holiday do we have the largest sales?
    Do the sales of dairy products increase over time?
    Is there any pattern (correlation) between the sales of beer and the sales of diapers?

Function- vs. Subject Orientation
  [Figure labels: Function-oriented systems; DW (all subjects, integrated); Subject-oriented systems (selected subjects); Bus architecture]
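The "join them" answer from the normalized-database review can be tried out with Python's built-in sqlite3 module. A minimal sketch, with assumptions: the table name Sale is made up, the pairing of customers and dates with products follows the order in which the values appear on the slide, and categories that are not legible in the transcript are stored as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (
    ProductID TEXT PRIMARY KEY,
    Product   TEXT,
    Category  TEXT,   -- NULL where the transcript shows no category
    Price     REAL
);
CREATE TABLE Sale (   -- hypothetical name for the customer/product/date table
    CustomerID INTEGER,
    ProductID  TEXT REFERENCES Product(ProductID),
    Date       TEXT
);
INSERT INTO Product VALUES
    ('013', 'Beer',  'Beverage', 6.00),
    ('052', 'Rice',  NULL,       4.00),
    ('067', 'Wheat', NULL,       5.00);
INSERT INTO Sale VALUES
    (3302, '013', '05-02-2007'),
    (3302, '052', '05-02-2007'),
    (3303, '013', '07-02-2007'),
    (3303, '067', '07-02-2007');
""")

# Joining on ProductID rebuilds the wide table without storing product
# name, category, and price once per sale.
rows = conn.execute("""
    SELECT s.CustomerID, p.Product, p.Category, p.Price, s.Date
    FROM Sale s
    JOIN Product p ON p.ProductID = s.ProductID
    ORDER BY s.rowid
""").fetchall()

for row in rows:
    print(row)
```

Changing the price of beer now touches one Product row instead of every sale of beer, which is the modification anomaly the normalized design avoids.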

n x m versus n + m
  [Figure: applications (App) connected to systems; label: inflexible, expensive]

Top-down vs. Bottom-up
  Top-down:
    1. Design of DW
    2. Design of data marts (DMs)
  In-between:
    1. Design of DW for DM 1
    2. Design of DM 2 and integration with DW
    3. Design of DM 3 and integration with DW
    4. ...
  Bottom-up:
    1. Design of DMs
    2. Maybe integration of DMs in DW
    3. Maybe no DW

Multidimensional database design
  Motivation: Why not use ER model?
  Cubes: Dimensions, Facts, Measures
  OLAP queries
  Advanced multidimensional modeling
    Mainly handling changes in dimensions
  MS SQL server and Analysis Services

Cube Example
  [Figure: bar chart of total Sales by Year (2000, 2001) and City (Aalborg, Copenhagen)]
  Text-based results difficult for managers to understand
  Why Cube?
    Good for visualization
    Multidimensional, intuitive
    Support OLAP operations
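A cube with the two dimensions of the Cube Example (Year, City) and one measure (Sales) can be modelled as a mapping from coordinates to values. The sketch below is only an illustration: the sales figures are made up, since the chart values are not recoverable from the transcript, and the helper name aggregate is hypothetical.

```python
from collections import defaultdict

# Fact cells of a two-dimensional cube: (year, city) -> sales.
# All figures are invented for illustration.
cube = {
    (2000, "Aalborg"): 100,
    (2000, "Copenhagen"): 150,
    (2001, "Aalborg"): 120,
    (2001, "Copenhagen"): 200,
}

def aggregate(cube, keep):
    """Sum the measure over every dimension not listed in keep.

    keep holds dimension positions: 0 = year, 1 = city.
    """
    totals = defaultdict(int)
    for coords, sales in cube.items():
        key = tuple(coords[i] for i in keep)
        totals[key] += sales
    return dict(totals)

by_year = aggregate(cube, keep=[0])      # city aggregated away
by_city = aggregate(cube, keep=[1])      # year aggregated away
grand_total = aggregate(cube, keep=[])   # both dimensions aggregated away

print(by_year)
print(by_city)
print(grand_total)
```

The same function serves every combination of dimensions, which is one reason the multidimensional view is convenient for OLAP operations.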

On-Line Analytical Processing (OLAP)
  On-Line Analytical Processing
    Interactive analysis
    Explorative discovery
    Fast response times required
  OLAP operations
    Aggregation, e.g., SUM
    Starting level, (Year, City)
    Roll Up: Less detail
    Drill Down: More detail
    Slice/Dice: Selection, Year=2000
  [Figure: example cube with cell values and an All Time aggregate level]

Performance Optimization
  Performance optimization
    Fine tune performance for important queries
    Aggregates, indexing, other optimizations (environment, partitioning)
  Using aggregates
    How can aggregates improve performance?
  Choosing aggregates
    Which aggregates should we materialize?
  Maintaining views
    How do we keep the (aggregate) views up to date?
  Bitmapped indices

Materialization Example
  Imagine 1 billion sales rows, 1000 products, 100 locations

    CREATE VIEW TotalSales (pid, locid, total) AS
      SELECT s.pid, s.locid, SUM(s.sales)
      FROM Sales s
      GROUP BY s.pid, s.locid

  The materialized view has 100,000 rows (1000 products x 100 locations)
  Rewrite the query to use the view

    SELECT p.category, SUM(s.sales)
    FROM Products p, Sales s
    WHERE p.pid = s.pid
    GROUP BY p.category

  Rewritten to

    SELECT p.category, SUM(t.total)
    FROM Products p, TotalSales t
    WHERE p.pid = t.pid
    GROUP BY p.category

  Query becomes 10,000 times faster! (the view holds 10,000 times fewer rows than Sales)

Extract, Transform, Load (ETL)
  Getting multidimensional data into the DW
    Extract
    Transformations / cleansing
    Load
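The rewrite in the Materialization Example can be checked on a toy data set. The sketch below uses Python's sqlite3 module; SQLite has no materialized views, so a table built with CREATE TABLE ... AS SELECT stands in for the materialized TotalSales view. That substitution and the sample rows are assumptions made for illustration, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (pid INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE Sales (pid INTEGER, locid INTEGER, sales REAL);

-- Made-up sample rows.
INSERT INTO Products VALUES (1, 'Beverage'), (2, 'Beverage'), (3, 'Grain');
INSERT INTO Sales VALUES
    (1, 10, 6.0), (1, 10, 6.0), (1, 20, 6.0),
    (2, 10, 3.0),
    (3, 20, 5.0), (3, 20, 5.0);

-- Stand-in for the materialized view: the aggregate is stored as a table.
CREATE TABLE TotalSales AS
    SELECT s.pid AS pid, s.locid AS locid, SUM(s.sales) AS total
    FROM Sales s
    GROUP BY s.pid, s.locid;
""")

original = conn.execute("""
    SELECT p.category, SUM(s.sales)
    FROM Products p, Sales s
    WHERE p.pid = s.pid
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

rewritten = conn.execute("""
    SELECT p.category, SUM(t.total)
    FROM Products p, TotalSales t
    WHERE p.pid = t.pid
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

view_rows = conn.execute("SELECT COUNT(*) FROM TotalSales").fetchone()[0]

# Row-count arithmetic from the slide: 1 billion sales rows versus
# 1000 products x 100 locations in the aggregate.
reduction = 1_000_000_000 // (1000 * 100)

print(original, rewritten, view_rows, reduction)
```

Both queries return the same totals because SUM can be computed from partial sums; the speed-up comes from scanning the much smaller aggregate. Unlike a real materialized view, the stand-in table is not refreshed when Sales changes, which is the view-maintenance question raised on the Performance Optimization slide.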

Data's Way To The DW
  Extraction
    Extract from many heterogeneous systems
  Staging area
    Large, sequential bulk operations => flat files best?
  Cleansing
    Data checked for missing parts and erroneous values
    Default values provided and out-of-range values marked
  Transformation
    Data transformed to decision-oriented format
    Data from several sources merged, optimized for querying
  Aggregation?
    Are individual business transactions needed in the DW?
  Loading into DW
    Large bulk loads rather than SQL INSERTs
    Fast indexing (and pre-aggregation) required

DW Applications: Visualization
  Graphical presentation of complex results
  Color, size, and form help to give a better overview

DW Applications: Data Mining
  Data mining is automatic knowledge discovery
    Roots in AI and statistics
  Classification
    Partition data into pre-defined classes
  Prediction
    Predict/estimate unknown value based on similar cases
  Clustering
    Partition data into groups so that the similarity within individual groups is greatest and the similarity between groups is smallest
  Association rule
    Find associations/dependencies between data that occur together
    Rules: A -> B (c%, s%): if A occurs, B occurs with confidence c and support s
  Important to choose the granularity for mining
    No useful results at too small granularity (shirt brand, ...)

Data Mining Examples
  Wal-Mart: USA's largest supermarket chain
    Has DW with all ticket item sales for the last 5 years (huge!)
    Uses DW and mining intensively to gain business advantages
    Analysis of association within sales tickets
    Discovery: Beer and diapers on the same ticket
      Men buy diapers, and must just have a beer
      Put the expensive beers next to the diapers
  Wal-Mart's suppliers use the DW to optimize delivery
    The supplier puts the product on the shelf
    The supplier only gets paid when the product is sold
  Web log mining
    What is the association between time of day and requests?
    What user groups use my site?
    How many requests does my site get in a month? (Yahoo)
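The association rule notation A -> B (c%, s%) can be made concrete with a small computation. The sketch uses the usual definitions, which the slides do not spell out: support is the share of all tickets that contain both A and B, and confidence is the share of tickets containing A that also contain B. The tickets and the function name rule_stats are hypothetical.

```python
# Hypothetical sales tickets; each ticket is the set of items bought together.
tickets = [
    {"beer", "diapers"},
    {"beer", "diapers", "rice"},
    {"beer"},
    {"diapers"},
    {"rice", "wheat"},
]

def rule_stats(tickets, a, b):
    """Return (confidence, support) of the rule a -> b as fractions."""
    n_total = len(tickets)
    n_a = sum(1 for t in tickets if a <= t)          # tickets containing A
    n_ab = sum(1 for t in tickets if (a | b) <= t)   # tickets containing A and B
    support = n_ab / n_total
    confidence = n_ab / n_a if n_a else 0.0
    return confidence, support

confidence, support = rule_stats(tickets, {"diapers"}, {"beer"})
print(f"diapers -> beer: confidence {confidence:.0%}, support {support:.0%}")
```

Here three tickets contain diapers and two of those also contain beer, so the rule has confidence 2/3 and support 2/5. Mining at item level rather than brand level, as in this sketch, reflects the granularity point on the Data Mining slide.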

Common DW Issues
  Metadata management
    Need to understand data = metadata needed
    Greater need than in OLTP applications as raw data is used
    Need to know about: data definitions, dataflow, transformations, versions, usage, security
  DW project management
    DW projects are large and different from ordinary SW projects
      12-36 months and US$ 1+ million per project
    Data marts are smaller and safer (bottom-up approach)
  Reasons for failure
    Lack of proper design methodologies
    High HW+SW cost (not so much anymore)
    Deployment problems (lack of training)
    Organizational change is hard (new processes, data ownership, ...)
    Ethical issues (security, privacy, ...)

Summary
  Why Business Intelligence?
  Data analysis problems
  Data Warehouse (DW) introduction
  Analysis technologies that use the DW
    OLAP
    Data mining
    Visualization
  BI can provide many advantages to your organization
    A good DW is a prerequisite for BI
    But a DW is a means rather than a goal: success is achieved only when it is heavily used

DWML Mini Project and Exam
  Performed in groups of ~4 persons
  Documented in a report of 20 pages
  Deadline: April 20
    But every part should be done when indicated on the home page
  Basis for discussion at the oral exam (20 mins per person)
    Maximum 4 persons at a time in the exam
  Exam also covers literature
    Not just the mini project
    Questions on theoretical background, too

DWML Software
  Groups to be formed today!
    Inform MLY about the groups at 16.00
  MS software via MSDNAA
    Talk to msdnaa@cs.aau.dk about accounts
  DW software
    MS SQL Server 2005 RDBMS
    MS Analysis Services, Integration Services, Reporting Services
    Read the mini-project webpage (part 1c) for installation details
  Data mining software
    Presented by Thomas D. Nielsen