Data Warehouse and Data Mining

Data Warehouse and Data Mining
Lecture No. 02: Introduction to Data Warehouse
Naeem Ahmed
Email: naeemmahoto@gmail.com
Department of Software Engineering
Mehran University of Engineering and Technology, Jamshoro

Outline
Introduction to Data Warehouse
Data Warehouse versus Operational Database
Techniques and Tools
Terms and Definitions
Source: www.stonebridgegroup.com

Data Warehouse
Purpose of the data warehouse: realize the value of the data.
Data / information is an asset, and data / information can be sold.
Methods to realize the value: reporting, analysis, data mining, etc.
Make better decisions: turn data into information and create competitive advantages.
Methods to support the decision-making process: DSS, etc.

Data Warehouse
Bill Inmon, the "Father of Data Warehousing"
The Bill Inmon definition (1993): A data warehouse is a subject-oriented, integrated, non-volatile, and time-variant collection of data in support of management's decisions.

Data Warehouse
Subject oriented: Data is arranged by subject area rather than by application, which makes it more intuitive for users to navigate.
Integrated: Data is collected from multiple, diverse sources and stored in one consistent form.
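
A minimal sketch of the "integrated" property, assuming two hypothetical source systems (billing and CRM) that encode the same customer facts with different field names, gender codes, and date formats. Every name and format here is illustrative; the point is only that each source is mapped onto one consistent warehouse format before loading.

```python
from datetime import datetime

# Hypothetical extracts from two operational systems that describe the same
# kind of facts with different field names and encodings.
billing_rows = [{"cust_id": "C001", "gender": "M", "joined": "2021-03-15"}]
crm_rows = [{"customer": "c002", "sex": "female", "signup": "15/04/2022"}]

# One agreed warehouse encoding for gender.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def conform_billing(row: dict) -> dict:
    """Map a billing row (ISO dates, one-letter gender) onto the warehouse format."""
    return {
        "customer_id": row["cust_id"].upper(),
        "gender": GENDER_CODES[row["gender"].lower()],
        "joined": datetime.strptime(row["joined"], "%Y-%m-%d").date(),
        "source": "billing",
    }


def conform_crm(row: dict) -> dict:
    """Map a CRM row (day/month/year dates, spelled-out gender) onto the same format."""
    return {
        "customer_id": row["customer"].upper(),
        "gender": GENDER_CODES[row["sex"].lower()],
        "joined": datetime.strptime(row["signup"], "%d/%m/%Y").date(),
        "source": "crm",
    }


def integrate(billing: list, crm: list) -> list:
    """Return rows from both sources in one uniform shape, ready to load."""
    return [conform_billing(r) for r in billing] + [conform_crm(r) for r in crm]


if __name__ == "__main__":
    for record in integrate(billing_rows, crm_rows):
        print(record)
```

Keeping a `source` column is a design choice of this sketch: it lets analysts trace each conformed row back to the system it came from.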

Data Warehouse
Non-volatile: Once loaded, data is not updated or deleted, so the warehouse gives one version of the truth regardless of when the question is asked.
Time-variant: Data can be accessed and analyzed over time, whereas typical operational systems generally provide only detailed current information.
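
A minimal sketch of "non-volatile" and "time-variant" working together, assuming a hypothetical customer-city history: each change arrives as a new row stamped with the date it became valid, nothing is updated in place, and an as-of lookup answers "what was true on date X". The `Snapshot` and `AppendOnlyHistory` names are illustrative, not a standard API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a loaded row cannot be modified afterwards
class Snapshot:
    customer_id: str
    city: str
    valid_from: date  # date from which this version of the row applies


class AppendOnlyHistory:
    """Toy non-volatile, time-variant store: rows are only ever appended."""

    def __init__(self) -> None:
        self._rows: list = []

    def load(self, row: Snapshot) -> None:
        # A change in the source system becomes a new row; earlier rows are
        # never updated or deleted, so past answers remain reproducible.
        self._rows.append(row)

    def as_of(self, customer_id: str, when: date):
        """Return the version of the customer's row that was current on `when`."""
        candidates = [r for r in self._rows
                      if r.customer_id == customer_id and r.valid_from <= when]
        return max(candidates, key=lambda r: r.valid_from, default=None)
```

Because old rows are kept, repeating the same as-of question later returns the same answer, which is the "one version of the truth" idea in the slide.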

Data Warehousing
A paradigm specifically designed for strategic business information and decision making.
Data warehousing is a data-driven decision-support system.

Data Warehouse (definitions)
Used for decision making; duplicates existing data; a combination of hardware, specialized software, and data. (Dyche)
A copy of transaction data specifically structured for query and analysis. (Kimball)
A single, complete, and consistent store of data obtained from a variety of different sources, made available to end users in a way that can be understood and used in a business context. (Barry Devlin)

Data Warehouse (definitions)
A data warehouse is a database where data is collected for the purpose of being analyzed.
A data warehouse is used to help people make better decisions.
A data warehouse is defined by the use to which it is put, not by its underlying architecture.

What can be warehoused?
Customer records
Customer purchases
Click stream, web traffic
Product records
Product purchase records
Inventory movement

Technical Architecture Tactics
Data warehouses use specific architectures (star and snowflake schemas) for performance reasons, usually to address weaknesses in database server technology or to provide generic analysis access.
Vendors are quickly improving their architectures to address performance problems.
In the future, operational data store and data warehouse designs may be merged into one technical architecture.

Multi-Dimensional Modeling
Dimension and fact tables
Techniques: star schema / snowflake schema
Specialized tools and knowledge
DBMS
OLAP (On-Line Analytical Processing): ROLAP, MOLAP, DOLAP
Data movement
ROLAP - Relational OLAP
MOLAP - Multi-dimensional OLAP
DOLAP - Desktop OLAP
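
As an illustration of dimension and fact tables, here is a small star schema in SQLite (Python's standard `sqlite3` module) with one fact table and three dimension tables, followed by the kind of join-and-aggregate SQL a ROLAP tool would issue against it. Table names, columns, and data are hypothetical.

```python
import sqlite3


def build_star(conn: sqlite3.Connection) -> None:
    """Create and populate a tiny star schema (hypothetical data)."""
    conn.executescript("""
        CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT);
        -- Fact table: a foreign key to every dimension plus numeric measures.
        CREATE TABLE fact_sales (
            date_key    INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            store_key   INTEGER REFERENCES dim_store(store_key),
            quantity    INTEGER,
            amount      REAL
        );
    """)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(1, 2023, 1), (2, 2023, 2)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Laptop", "Electronics"), (2, "Desk", "Furniture")])
    conn.executemany("INSERT INTO dim_store VALUES (?, ?)",
                     [(1, "Karachi"), (2, "Lahore")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                     [(1, 1, 1, 2, 2000.0), (1, 2, 2, 1, 150.0),
                      (2, 1, 2, 1, 1000.0), (2, 2, 1, 3, 450.0)])


def sales_by_category_and_month(conn: sqlite3.Connection) -> list:
    # ROLAP-style query: join the fact table to its dimensions, then aggregate.
    return conn.execute("""
        SELECT p.category, d.month, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON f.product_key = p.product_key
        JOIN dim_date    d ON f.date_key    = d.date_key
        GROUP BY p.category, d.month
        ORDER BY p.category, d.month
    """).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star(conn)
    print(sales_by_category_and_month(conn))
```

Every query reaches a descriptive attribute through a single join from the fact table, which is what keeps star-schema SQL simple to generate.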

Data Warehouse vs. Operational Database

Data Warehouse     | Operational Database
-------------------|-------------------------
Subject oriented   | Application oriented
Integrated         | Multiple diverse sources
Non-volatile       | Updateable
Time-variant       | Real-time, current

Terms and Definitions
Bitmapped Indexing - A family of advanced indexing algorithms that optimize RDBMS query performance by maximizing the search capability of the index per unit of memory and per CPU instruction. Properly implemented, bitmapped indices can eliminate many full table scans in query and join processing, particularly on low-cardinality columns.
Business Model - An object-oriented model that captures the kinds of things in a business or a business area and the relationships associated with those things (and sometimes associated business rules, too). Note that a business model exists independently of any data or database. A data warehouse should be designed to match the underlying business models; otherwise, no tool will fully unlock the data in the warehouse.
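
A toy bitmap index, sketched in Python under simplifying assumptions: each distinct value of a column gets a bitmap (a Python integer whose bit i is set when row i holds that value), and a conjunctive WHERE clause becomes a bitwise AND. Real RDBMS implementations add compression and on-disk layouts that this sketch ignores; the `region` and `status` columns are hypothetical.

```python
from collections import defaultdict


def build_bitmap_index(values: list) -> dict:
    """Map each distinct column value to a bitmap (an int) of row positions."""
    index = defaultdict(int)
    for row_id, value in enumerate(values):
        index[value] |= 1 << row_id  # set bit `row_id` in this value's bitmap
    return dict(index)


def rows_from_bitmap(bitmap: int) -> list:
    """Decode a bitmap back into an ascending list of row ids."""
    rows, row_id = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_id)
        bitmap >>= 1
        row_id += 1
    return rows


# Two low-cardinality columns of a hypothetical six-row table.
region = ["North", "South", "North", "East", "South", "North"]
status = ["paid", "paid", "open", "paid", "open", "paid"]

region_idx = build_bitmap_index(region)
status_idx = build_bitmap_index(status)

# WHERE region = 'North' AND status = 'paid' becomes one bitwise AND;
# OR conditions would use | in the same way.
match = region_idx["North"] & status_idx["paid"]
```

Decoding `match` with `rows_from_bitmap` yields the matching row ids without examining any row that fails the predicate.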

Terms and Definitions
Corporate Data - All the databases of the company. This includes legacy systems, old and new transaction systems, general business systems, client/server databases, data warehouses, and data marts.
Data Dictionary - A collection of Meta Data. Many kinds of products in the data warehouse arena use a data dictionary, including database management systems, modeling tools, middleware, and query tools.

Terms and Definitions
Data Mart - A subset of a data warehouse that focuses on one or more specific subject areas. The data usually is extracted from the data warehouse and further denormalized and indexed to support intense usage by targeted customers.
Data Mining - Techniques for finding patterns and trends in large data sets.
Data Model - The road map to the data in a database. This includes the source of tables and columns, the meanings of the keys, and the relationships between the tables.
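
The data mart steps in the definition (extract, denormalize, index) can be sketched with SQLite, assuming hypothetical normalized warehouse tables `wh_customers` and `wh_orders` and a mart scoped to one subject area, sales in the South region.

```python
import sqlite3


def load_sample_warehouse(conn: sqlite3.Connection) -> None:
    # Hypothetical, normalized warehouse tables covering every region.
    conn.executescript("""
        CREATE TABLE wh_customers (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        CREATE TABLE wh_orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO wh_customers VALUES (1, 'Ali', 'South'), (2, 'Sara', 'North'), (3, 'Omar', 'South');
        INSERT INTO wh_orders VALUES (10, 1, 120.0), (11, 2, 80.0), (12, 3, 45.5), (13, 1, 30.0);
    """)


def build_south_sales_mart(conn: sqlite3.Connection) -> None:
    # Extract one subject area, denormalize it by copying customer attributes
    # onto each order row, then index it for the mart's typical lookups.
    conn.executescript("""
        CREATE TABLE mart_south_sales AS
            SELECT o.order_id, c.name AS customer_name, c.region, o.amount
            FROM wh_orders o
            JOIN wh_customers c ON o.customer_id = c.customer_id
            WHERE c.region = 'South';
        CREATE INDEX idx_mart_south_customer ON mart_south_sales(customer_name);
    """)


conn = sqlite3.connect(":memory:")
load_sample_warehouse(conn)
build_south_sales_mart(conn)

# Mart users query a single flat table; no join is needed any more.
rows = conn.execute(
    "SELECT customer_name, SUM(amount) FROM mart_south_sales "
    "GROUP BY customer_name ORDER BY customer_name"
).fetchall()
```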

Terms and Definitions
Data Visualization - Techniques for turning data into information by using the high capacity of the human brain to visually recognize patterns and trends. There are many specialized techniques designed to make particular kinds of visualization easy.
Data Warehouse - A database built to support information access. Typically, a data warehouse is fed from one or more transaction databases. The data needs to be cleaned and restructured to support queries, summaries, and analyses.

Terms and Definitions
Decision Support - Data access targeted to provide the information needed by business decision makers. Examples include pricing, purchasing, human resources, management, manufacturing, etc.
Decision Support System (DSS) - Database(s), warehouse(s), and/or mart(s) in conjunction with reporting and analysis software optimized to support timely business decision making.
Meta Data - Literally, "data about data." More usefully, descriptions of what kind of information is stored where, how it is encoded, how it is related to other information, where it comes from, and how it is related to your business.
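
A small sketch of one data dictionary entry, assuming a hypothetical `ColumnMetadata` record whose fields mirror the Meta Data description: where data is stored, how it is encoded, where it comes from, how it relates to other data, and what it means to the business. The table, column, and source names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMetadata:
    table: str             # where the information is stored
    column: str
    data_type: str         # how it is encoded
    source: str            # where it comes from
    related_to: str        # how it is related to other information
    business_meaning: str  # how it is related to the business


# A data dictionary is then simply a collection of such entries.
data_dictionary = {
    ("fact_sales", "amount"): ColumnMetadata(
        table="fact_sales",
        column="amount",
        data_type="REAL",
        source="billing.invoice_lines.total",
        related_to="dim_product via product_key",
        business_meaning="Net sales value of one invoice line",
    ),
}


def describe(table: str, column: str) -> str:
    """Render a one-line, human-readable description of a column."""
    m = data_dictionary[(table, column)]
    return f"{m.table}.{m.column} ({m.data_type}) from {m.source}: {m.business_meaning}"
```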

Terms and Definitions
Methodology - The steps followed to guarantee repeatability of success. A good methodology is built on top of real-world experience.
Middleware - Hardware and software used to connect clients and servers, to move and structure data, and/or to pre-summarize data for use by queries and reports.
Multidimensional database (MDD) - A DBMS optimized to support multidimensional data. The best systems support standard RDBMS functionality and add high-bandwidth support for multidimensional data and queries. Users who need to slice and dice data extensively may benefit from a multidimensional database.
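
To make "slice and dice" concrete, here is a minimal in-memory cube in Python, assuming a hypothetical sales cube keyed by (product, region, quarter). A slice fixes one dimension to a single value; a dice keeps a sub-cube by restricting several dimensions to sets of values. A real multidimensional database stores and indexes cubes far more efficiently than a dictionary.

```python
# Hypothetical sales cube: (product, region, quarter) -> units sold.
DIMS = ("product", "region", "quarter")
cube = {
    ("Laptop", "North", "Q1"): 10,
    ("Laptop", "South", "Q1"): 7,
    ("Laptop", "North", "Q2"): 12,
    ("Desk", "North", "Q1"): 4,
    ("Desk", "South", "Q2"): 6,
}


def slice_cube(cube: dict, dim: str, value: str) -> dict:
    """Fix one dimension to `value`; the result drops that dimension."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube: dict, **allowed: set) -> dict:
    """Keep cells whose coordinates lie in the given value sets (a sub-cube)."""
    return {
        k: v for k, v in cube.items()
        if all(k[DIMS.index(d)] in values for d, values in allowed.items())
    }


if __name__ == "__main__":
    print(slice_cube(cube, "quarter", "Q1"))
    print(dice_cube(cube, region={"North"}, quarter={"Q1", "Q2"}))
```

The slice keeps only (product, region) coordinates, so it is a two-dimensional view; the dice keeps all three dimensions but fewer cells.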

Terms and Definitions
Object Oriented Analysis (OOA) - A process of abstracting a problem by identifying the kinds of entities in the problem domain, the is-a relationships between the kinds (kinds are known as classes; is-a relationships as subtype/supertype, subclass/superclass, or, less commonly, specialization/generalization), and the has-a relationships between the classes. Also identified for each class are its attributes (e.g. class Person has attribute Hair Color) and its conventional relationships to other classes (e.g. class Order has a relationship Customer to class Customer).
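
The Person and Order examples from the OOA definition map naturally onto Python dataclasses. In this sketch, `Customer` being a subclass of `Person` models an is-a relationship, `hair_color` is an attribute, and `Order.customer` is the conventional relationship to class `Customer`; the `OrderLine` class and the `lines` has-a relationship are hypothetical additions for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    name: str
    hair_color: str  # attribute: "class Person has attribute Hair Color"


@dataclass
class Customer(Person):  # is-a: every Customer is a Person (subclass/superclass)
    customer_id: str = ""


@dataclass
class OrderLine:  # hypothetical class used to show a has-a relationship
    product: str
    quantity: int


@dataclass
class Order:
    order_id: str
    customer: Customer  # relationship "Customer" to class Customer
    lines: List[OrderLine] = field(default_factory=list)  # has-a: Order has lines
```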

Terms and Definitions
Object Oriented Design (OOD) - A design methodology that uses Object Oriented Analysis to promote object reusability and interface clarity.
OLAP - An acronym for On-Line Analytical Processing. A common use of a data warehouse that involves real-time access to, and analysis of, multidimensional data such as order information.
Performance - Data, summaries, and analyses need to be delivered in a timely fashion. Performance is often a key issue with data warehouses: the right answer isn't worth much if it arrives after the decisions have been made.

Terms and Definitions
Query - A specific atomic request for information from a database.
Relational On-Line Analytical Processing (ROLAP) - OLAP based on conventional relational databases rather than specialized multidimensional databases.
Replication - A standard technique in data warehousing. For performance and reliability, several independent copies of each data warehouse are often created. Even data marts can require replication on multiple servers to meet performance and reliability standards.

Terms and Definitions
Replicator - Any of a class of products that support replication. These tools often use special load and unload database APIs and have scripting languages that support automation.
Report - A repeatable, formatted, non-atomic request for information from a database. A report usually formats and combines several related queries.
Security - The right data for the right person.
Snowflake Schema - A variant of the Star Schema in which dimension tables are normalized into additional related tables, extending the technique to larger and more complex warehouses.
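
In the usual textbook form, a snowflake schema normalizes a star schema's dimension tables into further related tables. This SQLite sketch, with hypothetical tables and data, moves product categories into their own `dim_category` table, so reaching a category from the fact table takes two joins instead of one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Category attributes move out of the product dimension into their own table.
    CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, name TEXT,
                               category_key INTEGER REFERENCES dim_category(category_key));
    CREATE TABLE fact_sales   (product_key INTEGER REFERENCES dim_product(product_key),
                               amount REAL);
    INSERT INTO dim_category VALUES (1, 'Electronics'), (2, 'Furniture');
    INSERT INTO dim_product  VALUES (1, 'Laptop', 1), (2, 'Phone', 1), (3, 'Desk', 2);
    INSERT INTO fact_sales   VALUES (1, 1000.0), (2, 500.0), (3, 200.0), (1, 900.0);
""")

# Fact -> product -> category: one extra join compared with a star schema,
# in exchange for storing each category name only once.
totals = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product  p ON f.product_key  = p.product_key
    JOIN dim_category c ON p.category_key = c.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
```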

Terms and Definitions
Star Schema - A standard technique for designing the summary tables of a data warehouse. Each "fact" table joins to several independent "dimension" tables. The tables may be partially denormalized for performance, but most queries will still need to join in one or more of the star tables.
OLAP refers to querying and accessing on-line data.
Data Warehouse refers to specific technical architectures for storing and accessing large amounts of data.