Data Warehousing. Ritham Vashisht, Sukhdeep Kaur and Shobti Saini


Advance in Electronic and Electric Engineering. ISSN 2231-1297, Volume 3, Number 6 (2013), pp. 669-674
Research India Publications
http://www.ripublication.com/aeee.htm

Data Warehousing

Ritham Vashisht, Sukhdeep Kaur and Shobti Saini
M.Tech (CSE), Department of Computer Science and Engineering, Sri Sai College of Engineering and Technology, Pathankot, Punjab, India.

Abstract

Data warehousing and Online Analytical Processing (OLAP) are essential elements of decision support, which has increasingly become a focus of the database industry. A data warehouse provides an effective way to analyze and compute statistics over large volumes of data, and so supports decision making. In recent years data warehousing has become a prominent buzzword in the database industry, but attention from the database research community has been limited. Data warehousing is a booming industry with many interesting research problems, yet the database research community has concentrated on only a few aspects. In this paper we motivate the concept of a data warehouse, we outline a general data warehousing architecture, and we present a number of technical issues arising from the architecture that we believe are suitable topics for exploratory research. We describe back-end tools for extracting, cleaning and loading data into the data warehouse, tools for metadata management, and tools for managing the warehouse.

Keywords: Data warehouse design; multidimensional modeling; OLAP; ETL.

1. Introduction

Databases are developed on the idea that data is one of the critical materials of the information age. Information, which is derived from data, becomes the basis for decision making. Data warehousing encompasses architectures, algorithms, and tools for bringing together selected data from multiple databases or other information sources into a single repository, called a data warehouse, suitable for direct querying or analysis.

It is well known that data warehouses (DWs) are focused on decision support rather than on transaction support and that they are prevalently characterized by an OLAP workload. A data warehouse is a database used for reporting and analysis. It is maintained separately from an organization's operational databases, and the data it stores is uploaded from the operational systems.

Data warehousing is a phenomenon that grew from the huge amount of electronic data stored in recent years and from the urgent need to use that data to accomplish goals that go beyond the routine tasks linked to daily processing. Many years ago, database designers realized that running such analyses directly on operational databases is hardly feasible, because it is very demanding in terms of time and resources and does not always achieve the desired results. Moreover, mixing analytical queries with routine transactional queries inevitably slows down the system, which meets the needs of neither type of user. Today's advanced data warehousing processes separate online analytical processing (OLAP) from online transactional processing (OLTP) by creating a new information repository that integrates basic data from various sources, properly arranges data formats, and then makes data available for analysis and evaluation aimed at planning and decision-making processes.

Traditionally, OLAP applications are based on multidimensional modeling, which intuitively represents data under the metaphor of a cube whose cells store events that occurred in the business domain. Adopting the multidimensional model for DWs has a two-fold benefit. On the one hand, it is close to the way data analysts think and therefore helps users understand data; on the other hand, it supports performance improvement, as its simple structure allows designers to predict users' intentions. Multidimensional modeling requires specialized design techniques.
Though a lot has been written about how a DW should be designed, there is no consensus on a design method yet. Most methods agree on the value of distinguishing between a phase of conceptual design and one of logical design. Several methods also support a phase of physical design that addresses the issues specifically related to the suite of tools chosen for implementation, such as indexing and allocation. In some cases, a phase of requirement analysis is considered separately.

Fig. 1: Core phases in DW design

1.1 Data Warehouse

A data warehouse is a collection of integrated, subject-oriented databases designed to support the DSS function, where each unit of data is non-volatile and relevant to some moment in time. A DW is a repository of integrated information from operational (OLTP) and legacy systems that provides data for analytical processing and decision making. The data in a data warehouse is cleansed, temporal (historic), summarized and non-volatile.

1.2 Characteristics of data warehouse

Subject-Oriented: Information is presented according to specific subjects or areas of interest, not simply as computer files. Data is manipulated to provide information about a particular subject.
Integrated: All inconsistencies regarding naming conventions and value representations are removed.
Non-Volatile: Stable information that doesn't change each time an operational process is executed. Information is consistent regardless of when the warehouse is accessed.
Time-Variant: Containing a history of the subject, as well as current information. Historical information is an important component of a data warehouse.
Accessible: The primary purpose of a data warehouse is to provide readily accessible information to end-users.
Process-Oriented: It is important to view data warehousing as a process for delivery of information. The maintenance of a data warehouse is ongoing and iterative in nature.

2. Multidimensional Modeling

The multidimensional data model views data in the form of a data cube. A data cube allows data to be modeled and viewed in multiple dimensions. Dimensions are the perspectives or entities with respect to which an organization wants to keep records. Each dimension has a table associated with it, called a dimension table, which further describes the dimension. A multidimensional data model is typically organized around a fact table. The fact table contains the names of the facts, or measures, as well as keys to each of the related dimension tables.
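The relationship between a fact table and its dimension tables can be illustrated with a small sketch. The example below is only an assumption-laden illustration using Python's standard sqlite3 module: the table names (dim_time, dim_item, fact_sales), the columns and the sample rows are all hypothetical and do not come from the paper. It builds two dimension tables and one fact table that stores a single measure together with a key to each dimension, then totals the measure along one dimension attribute at a time, which corresponds to viewing the same cube from different dimensions.

```python
import sqlite3


def build_star_schema(conn):
    """Create two dimension tables and one fact table (all names are illustrative)."""
    conn.executescript("""
        CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, quarter TEXT);
        CREATE TABLE dim_item (item_id INTEGER PRIMARY KEY, item_name TEXT);
        -- The fact table holds the measure plus a key to each dimension table.
        CREATE TABLE fact_sales (
            time_id INTEGER REFERENCES dim_time(time_id),
            item_id INTEGER REFERENCES dim_item(item_id),
            amount  REAL
        );
    """)
    conn.executemany("INSERT INTO dim_time VALUES (?, ?)", [(1, "Q1"), (2, "Q2")])
    conn.executemany("INSERT INTO dim_item VALUES (?, ?)", [(1, "phone"), (2, "laptop")])
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(1, 1, 100.0), (1, 2, 250.0), (2, 1, 120.0), (2, 2, 300.0), (2, 2, 50.0)],
    )


def sales_by(conn, dimension_column):
    """Total the measure along one dimension attribute (one view of the cube)."""
    # Column names cannot be bound as SQL parameters, so check against a whitelist.
    allowed = {"quarter", "item_name"}
    if dimension_column not in allowed:
        raise ValueError(f"unknown dimension column: {dimension_column}")
    rows = conn.execute(f"""
        SELECT {dimension_column}, SUM(amount)
        FROM fact_sales
        JOIN dim_time USING (time_id)
        JOIN dim_item USING (item_id)
        GROUP BY {dimension_column}
        ORDER BY {dimension_column}
    """).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
print(sales_by(conn, "quarter"))    # totals per quarter
print(sales_by(conn, "item_name"))  # totals per item
```

Keeping descriptive attributes in the dimension tables and only keys and measures in the fact table is what lets the same query shape serve any dimension; a real warehouse would of course have many more dimensions and attributes than this sketch.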
Conceptual modeling: Conceptual modeling provides a high level of abstraction in describing the warehousing process and architecture in all its aspects, with the aim of achieving independence from implementation issues. Conceptual modeling for DWs has so far been tackled mainly from two points of view:
Multidimensional modeling
Modeling of ETL

Logical modeling: Once the conceptual modeling phase is completed, the overall task of logical modeling is the transformation of conceptual schemata into logical schemata that can be optimized for and implemented on a chosen target system.

OLAP (On-Line Analytical Processing)

It enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.

OLAP operations:
Drill down
Drill up
Slice and dice
Pivot

OLAP Servers

An OLAP server is a high-capacity, multi-user data manipulation engine specifically designed to support and operate on multidimensional data structures. The available kinds of OLAP server are:
1. ROLAP server - These are intermediate servers that stand between a relational back-end server and client front-end tools. ROLAP servers include optimization for each DBMS back end, implementation of aggregation navigation logic, and additional tools and services.
2. MOLAP server - These servers support multidimensional views of data through array-based multidimensional storage engines.
3. HOLAP server - The hybrid OLAP approach combines ROLAP and MOLAP technology, benefiting from the greater scalability of ROLAP and the faster computation of MOLAP.

OLAP Tools

A set of software products that attempts to facilitate multidimensional analysis. It can incorporate data acquisition, data access, data manipulation, or any combination thereof.

3. Data Warehouse Maintenance Issues

DW maintenance issues include data cleansing, transformation, loading, subsequent loading (refreshing), and data purging. ECTL (extract, cleanse, transform, and load) refers to the methods involved in accessing and manipulating source data and loading it into the target database.

1. Extract: Some of the data elements in the operational database can reasonably be expected to be useful in decision making, but others are of less value for that purpose. For this reason, it is necessary to extract the relevant data from the operational database before bringing it into the data warehouse.
2. Transform: Operational databases can be developed around any set of priorities, and these priorities keep changing with requirements. Those who develop a data warehouse from such databases are therefore typically faced with inconsistencies among their data sources. The transform step addresses these inconsistencies.
3. Cleansing: Information quality is the key consideration in determining the value of the information. The developer of the data warehouse makes the data as error-free as possible before it enters the warehouse. This process is known as data cleansing. It must deal with many types of possible errors. These include missing and incorrect data at a single source, and inconsistent or conflicting data when two or more sources are involved.
4. Loading: Loading often implies physical movement of data from the computer storing the source database to the computer that will store the data warehouse database, assuming the two are different.
5. Data refreshing and data purging: After the initial loading, updates at the source database should be propagated to the data warehouse. This propagation is called data refreshing.

4. Applications of Data Warehouse

Operational and business intelligence applications
Knowledge discovery
Agriculture
Biological data analysis
Call record analysis
Churn prediction for telecom subscribers, credit card users, etc.
Decision support
Financial forecasting
Insurance fraud analysis
Logistics and inventory management
Trend analysis
Health care providers
Security agencies

5. Conclusion

In this paper, we have discussed the construction and design of data warehouses. The construction of data warehouses involves data cleaning and data integration. A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process. Data warehousing is very useful from the point of view of heterogeneous database integration. It provides an interesting alternative to the traditional approach of heterogeneous database integration. It employs an update-driven approach in which information from multiple, heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis.
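As an illustration of this update-driven integration, the ECTL steps of Section 3 can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the two source layouts, the field names, the gender codes and the cleansing rule (reject missing or negative amounts) are all hypothetical choices made for the example.

```python
from datetime import date

# Hypothetical operational records from two sources with inconsistent conventions.
SOURCE_A = [
    {"cust": "C1", "gender": "M", "amount": "120.50", "day": "2013-01-05", "clerk": "x"},
    {"cust": "C2", "gender": "F", "amount": None, "day": "2013-01-06", "clerk": "y"},
]
SOURCE_B = [
    {"customer_id": "C3", "sex": "female", "total": "80", "date": "2013-01-07"},
    {"customer_id": "C4", "sex": "male", "total": "-5", "date": "2013-01-08"},
]

# Source field -> warehouse field; fields not listed (e.g. "clerk") are not extracted.
MAP_A = {"cust": "customer", "gender": "gender", "amount": "amount", "day": "day"}
MAP_B = {"customer_id": "customer", "sex": "gender", "total": "amount", "date": "day"}


def extract(record, mapping):
    """Keep only the decision-relevant fields, renamed to the warehouse's names."""
    return {target: record.get(source) for source, target in mapping.items()}


def transform(row):
    """Unify value representations across sources (gender codes, numeric and date types)."""
    gender_codes = {"m": "M", "male": "M", "f": "F", "female": "F"}
    out = dict(row)
    out["gender"] = gender_codes.get(str(row["gender"]).lower())
    out["amount"] = float(row["amount"]) if row["amount"] is not None else None
    out["day"] = date.fromisoformat(row["day"])
    return out


def is_clean(row):
    """Reject rows with missing or obviously incorrect values (assumed rule)."""
    return (row["gender"] is not None
            and row["amount"] is not None
            and row["amount"] >= 0)


def load(warehouse, rows):
    """Append rows; rows already in the warehouse are never modified."""
    warehouse.extend(rows)
    return warehouse


def run_ectl(warehouse, source_a, source_b):
    """Extract, transform, cleanse and load one batch from both sources."""
    extracted = ([extract(r, MAP_A) for r in source_a]
                 + [extract(r, MAP_B) for r in source_b])
    transformed = [transform(r) for r in extracted]
    return load(warehouse, [r for r in transformed if is_clean(r)])


warehouse = run_ectl([], SOURCE_A, SOURCE_B)
print([r["customer"] for r in warehouse])
```

Calling run_ectl again later with only the new source records appends them to the same warehouse list, which is a simple stand-in for the data refreshing step; purging old data is not modeled here.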
Many people feel that, with competition mounting in every industry, data warehousing is the latest must-have marketing weapon: a way to keep customers by learning more about their needs. Ad-hoc techniques are required for dealing with the emerging applications of data warehousing and with advanced architectures for business intelligence. Besides, the need for real-time data processing raises original issues that were not addressed within traditional, periodically refreshed DWs. Overall, we believe that research on DW modeling and design is far from being dead, partly because more sophisticated techniques are needed for solving known problems, and partly because of the new problems raised during the adaptation of DWs to the peculiar requirements of today's business.

Data warehouses do not contain the most current information. However, a data warehouse brings high performance to an integrated heterogeneous database system. It can store and integrate historical information and support complex multidimensional queries. As a result, data warehousing has become very popular in industry.