Database Technologies for E-Business
Dongmei CUI


Abstract
In today's fast-paced business environment, business processes such as designing products, obtaining suppliers, selling, fulfilling orders, and providing services are performed through the extensive use of computer and communication technologies and computerized data. In this "e" environment, many companies have already been collecting and refining vast amounts of data. By analyzing the data, they understand customer expectations and optimize operations to unprecedented degrees, which leads to success. Such companies, called analytics competitors, compete on analytics, and three key attributes they share have been identified. This paper investigates database technologies from a data perspective, focusing on the requirements for exploiting analytic techniques in today's e-business.

Key words: Relational data model, Multidimensional data model, Data warehouse, ETL, Online analytical processing, Data mining, Analytics competitor

1. INTRODUCTION
In today's competitive business environment, characterized by globalization, short product life cycles, short spans of distribution, and diverse customer needs, many innovations have appeared on both the business side and the technology side of organizations (Hammer 2001; Crainer 2000). On the business side, we have seen business process reengineering (BPR), the balanced scorecard (BSC), the management philosophies of customer relationship and supply chain management, electronic commerce, and business-to-business trading exchanges. On the technology side, there has been a move from standalone information systems to large-scale information systems, such as enterprise resource planning (ERP) systems, knowledge management systems (KMS), various information technology solutions for enterprise application integration (e.g., customer relationship management (CRM) systems), and interorganizational systems (IOS), as well as standards such as the internet and electronic data interchange (EDI). These technologies are indispensable for business processes such as designing products, obtaining suppliers, selling, fulfilling orders, and providing services. We call this e-business (Alter 2002): business activities conducted using computer and communication technologies and computerized data.

In this "e" environment, many companies have already been collecting and refining vast amounts of data in databases and data warehouses. By analyzing the data, they understand customer expectations and optimize operations to unprecedented degrees. Several well-known cases, such as electronic reservations at American Airlines, online ordering at American Hospital Supply, and online book selling at Amazon, have been studied intensively. Such companies are called analytics competitors, and recent research (Davenport 2006) has identified three key attributes they share: they make wide use of data modeling and optimization technologies; they apply analytic techniques across the whole enterprise; and their senior executives champion analytics as a way of competing. The exploitation of analytic techniques has clearly brought these companies a competitive advantage, not just because they can perform analysis, but because they keep competing on analytics.

This paper gives an overview of database technologies from a data perspective, focusing on the requirements for applying analytic techniques in e-business. It is organized as follows. Section 2 examines how data is designed in a database or data warehouse. Section 3 discusses in detail data warehousing and online analytical processing, within the data warehouse architecture, as the means of analyzing multidimensional data. Section 4 concludes the paper.

2. RELATIONAL AND MULTIDIMENSIONAL DATA MODEL
In database terminology, a data model is defined as a set of constructs that describe the structure of data, together with a set of operations used to manipulate the data. In this section, we use the term data model to refer to the structure of data as designed, not to a structure discovered within the data, which is usually captured by entity-relationship (ER) modeling as a semantic data model of entities and their relationships.

2.1 Relational Data Model
Generally, the data stored in a database is organized in tabular form, and a database can contain multiple tables. The schema of a table consists of the table name and a set of named columns; the column names are also referred to as attributes, and a row of a table is called a tuple. An instance of the schema (namely, the actual table) is called a relation. Each entry in the column of an attribute is a value from the domain of that attribute.
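The notions above (schema, attribute, domain, row, relation) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the original paper: a domain is modeled as a membership test, a row as a dictionary from attribute names to values, and the `PART_SCHEMA` attributes and sample values are hypothetical rather than taken from Figure 1.

```python
from typing import Any, Callable, Dict, List

# Assumption: a domain is modeled as a membership test on values.
Domain = Callable[[Any], bool]
# Assumption: a row (tuple) is modeled as a mapping from attribute names to values.
Row = Dict[str, Any]

# Hypothetical relation schema {PNO, PNAME, WEIGHT}, with one domain per attribute.
PART_SCHEMA: Dict[str, Domain] = {
    "PNO": lambda v: isinstance(v, str) and v.startswith("P"),
    "PNAME": lambda v: isinstance(v, str) and v != "",
    "WEIGHT": lambda v: isinstance(v, int) and v > 0,
}


def is_row_over(schema: Dict[str, Domain], t: Row) -> bool:
    """True iff t maps exactly the schema's attributes, each to a value in its domain."""
    return set(t) == set(schema) and all(schema[a](t[a]) for a in schema)


def make_relation(schema: Dict[str, Domain], rows: List[Row]) -> List[Row]:
    """A relation (table) over the schema: a collection of rows, each a row over it."""
    for t in rows:
        if not is_row_over(schema, t):
            raise ValueError(f"not a row over the schema: {t}")
    return list(rows)


# A database: one relation per relation schema (here just one, named "part").
database = {
    "part": make_relation(PART_SCHEMA, [
        {"PNO": "P1", "PNAME": "Nut", "WEIGHT": 12},
        {"PNO": "P2", "PNAME": "Bolt", "WEIGHT": 17},
    ]),
}
```

Because a Python list allows duplicates, this models the text's "collection of rows" literally; a strict set-based reading of relations would deduplicate rows.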
For example, suppose we have a database of five tables storing data about the supply process, as shown below.

Figure 1. Example of a database containing five relations: supplier, part, project, inventory and supply. QOH and QTY stand for quantity on hand and quantity supplied, respectively.

We now describe relational data more formally. Let $R$ be a relation schema, i.e., a set of attributes $\{A_1, \ldots, A_i, \ldots, A_n\}$, where each attribute $A_i$ has an associated domain $\mathrm{Dom}(A_i)$. A row over $R$ is a mapping $t : R \to \bigcup_{i} \mathrm{Dom}(A_i)$ with $t(A_i) \in \mathrm{Dom}(A_i)$. A table (relation) over $R$ is a collection of rows over $R$. A database schema $\mathbf{R}$ is a collection $\{R_1, \ldots, R_i, \ldots, R_m\}$ of relation schemas, and a database $r$ over $\mathbf{R}$ consists of a relation over $R_i$ for each $i = 1, \ldots, m$. A database consisting of such tables is called a relational database (Codd 1970). Each instance of $R_i$, $i = 1, \ldots, m$, can be seen as a basic relation of the database, and new relations can be derived from the basic relations or from previously derived relations.

As Figure 2 shows, users can operate on a database in two ways: by accessing the basic relations (tables) directly, or by operating on them indirectly through derived relations. Two kinds of relations can be derived: views and snapshots. A view is a set of data satisfying a constraint, a selection of some attributes (columns), or the result of a relational operation. It is dynamic and is also called a virtual relation, because it does not store data: accessing a view means accessing the basic relations (tables) according to the view's definition. A snapshot, on the other hand, is a copy of the data in the database at a particular time; it does not vary over time, i.e., it is a static relation. For example, we can create a snapshot on the final day of every month for the monthly sales report.
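The difference between a view and a snapshot can be observed with SQLite through Python's standard `sqlite3` module. In this sketch the `sales` table, its columns, and the dates are hypothetical. The snapshot is emulated with `CREATE TABLE ... AS SELECT`, which copies the query result at creation time, since SQLite has no dedicated snapshot statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Basic relation (hypothetical daily sales).
conn.execute("CREATE TABLE sales (sale_date TEXT, item TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("2006-09-03", "TV", 300),
    ("2006-09-17", "PC", 500),
])

# View: a virtual relation; only its definition is stored,
# so every query against it re-reads the basic relation.
conn.execute("""CREATE VIEW monthly_sales_view AS
                SELECT substr(sale_date, 1, 7) AS month, SUM(amount) AS total
                FROM sales GROUP BY month""")

# Snapshot (emulated): a static copy taken now, e.g. on the last day of the month.
conn.execute("""CREATE TABLE monthly_sales_snapshot AS
                SELECT substr(sale_date, 1, 7) AS month, SUM(amount) AS total
                FROM sales GROUP BY month""")

# A change to the basic relation after the snapshot was taken.
conn.execute("INSERT INTO sales VALUES ('2006-09-28', 'TV', 200)")

view_total = conn.execute(
    "SELECT total FROM monthly_sales_view WHERE month = '2006-09'").fetchone()[0]
snapshot_total = conn.execute(
    "SELECT total FROM monthly_sales_snapshot WHERE month = '2006-09'").fetchone()[0]
print(view_total, snapshot_total)  # 1000 800
```

The view reflects the late insert while the snapshot keeps the figure as of the moment it was taken, which is what makes snapshots suitable for fixed period-end reports.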
As illustrated in Figure 2, users use SQL (Structured Query Language) as the interface to access R1 directly, and they can also obtain the data in R1 through V3. Further, to obtain data from R3, they can operate on V3 and V1, since V3 is defined over V1 and V1 is derived from R3. Here R1, R2, and R3 denote basic relations, and V1, V2, and V3 denote derived relations, each of which could be a view or a snapshot.

Figure 2. Basic relations and derived relations

In the relational data model, each relation is normalized according to the concept of functional dependency, and entries of a relation can cross-reference other entries of the same relation or entries of a different relation through primary key/foreign key relationships (Codd 1970). In other words, the primary key (a column or combination of columns) identifies the rows of a relation and can be referenced by the relation itself or by other relations. A foreign key, in turn, holds values of the primary key of another (or the same) relation, and it is through this pairing that relations are connected.

2.2 Multidimensional Data Model
Although data could be modeled relationally, it is more intuitive to think of it in terms of dimensions and facts when aggregating or summarizing data, for example, the sales of a product in September over all stores. Information about the dimension values is kept in dimension tables, usually one table per dimension, and information about the facts is organized in a fact table. Each row of the fact table contains one fact, represented by references to the dimensions and by the measures (e.g., sales). Technically, each dimension table has a primary key that is also included in the fact table as a foreign key, and the combination of all these foreign keys forms the primary key of the fact table.

We now consider the schema that expresses facts across multiple dimensions. Most commonly, the multidimensional data model is mapped onto a star schema, which consists of a fact table and several dimension tables. The fact table has measure attributes that record the facts and dimension attributes that form foreign keys to the dimension tables.
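A star schema along the lines of the Han & Kamber example can be declared in SQL. The following sketch uses SQLite via Python's `sqlite3`; all table names, column names, and rows are hypothetical. It shows the points made above: each dimension table has a primary key, the fact table carries those keys as foreign keys alongside the measures, and the combination of the foreign keys is the fact table's primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

conn.executescript("""
-- Dimension tables: one per dimension, each with its own primary key.
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, day TEXT, month TEXT, quarter TEXT);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, type TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);

-- Fact table: a foreign key to each dimension plus the measures.
-- The combination of all foreign keys is the fact table's primary key.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    dollars_sold REAL,
    units_sold   INTEGER,
    PRIMARY KEY (time_key, item_key, location_key, branch_key)
);

INSERT INTO time_dim VALUES (1, '2006-09-01', '2006-09', '2006-Q3'),
                            (2, '2006-10-02', '2006-10', '2006-Q4');
INSERT INTO item_dim VALUES (1, 'TV-32', 'TV');
INSERT INTO location_dim VALUES (1, 'Tokyo', 'Japan'), (2, 'Osaka', 'Japan');
INSERT INTO branch_dim VALUES (1, 'B1'), (2, 'B2');
INSERT INTO sales_fact VALUES (1, 1, 1, 1, 300.0, 1),
                              (1, 1, 2, 2, 600.0, 2),
                              (2, 1, 1, 1, 900.0, 3);
""")

# "Sales of TVs in September over all stores": join the fact table to its dimensions.
september_tv_sales = conn.execute("""
    SELECT SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN time_dim t ON f.time_key = t.time_key
    JOIN item_dim i ON f.item_key = i.item_key
    WHERE t.month = '2006-09' AND i.type = 'TV'
""").fetchone()[0]

# Referential integrity: a fact must reference existing dimension rows.
try:
    conn.execute("INSERT INTO sales_fact VALUES (99, 1, 1, 1, 10.0, 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(september_tv_sales, orphan_rejected)  # 900.0 True
```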
Picture the fact table in the middle, encircled by the dimension tables. Figure 3 gives a concrete example from (Han & Kamber 2006), with a star schema on the left. As shown on the right of Figure 3, the values of a dimension can be grouped into a hierarchical tree structure, so that an analyst can view data at different levels along the hierarchies of the time and location dimensions.

Figure 3. A star schema and the hierarchy of dimension

Different aggregation functions can be applied to lower-level data to obtain data at higher levels, in the direction of the arrows in Figure 3. In business analysis, sum, count, min, max, and average are commonly used aggregation functions. For instance, applying sum to the sales values within each month gives the total sales at the monthly level.

To reduce the number of dimension tables (that is, the number of joins) during query processing, the dimension tables in a star schema are de-normalized. An alternative that keeps each dimension table normalized is the snowflake schema, in which the dimension tables are normalized by splitting their information into several tables. Picture again a large fact table in the middle surrounded by dimension tables; unlike in the star schema, however, each dimension table may in turn be surrounded by a number of smaller tables. As shown in Figure 4, the item dimension table is normalized by splitting off a supplier dimension table, and the location dimension table by splitting off a city dimension table. After normalization, the attribute supplier_key in the item dimension table is a foreign key, while supplier_key in the supplier dimension table is the primary key; the same change happens between the location dimension table and the city dimension table.

Figure 4. A snowflake schema

2.3 Representation of Multidimensional Data Model
Just as the n-tuples of the tables described in section 2.1 are represented as arrays in the computer (Codd 1970), facts with n dimensions can be organized as an n-dimensional cube stored as an n-dimensional array (Gray, Chaudhuri, Bosworth et al. 1997) (see Figure 5).
All the dimensions together are assumed to determine the measure uniquely; in our example from (Han & Kamber 2006), a particular time, item, location, and branch give a unique sale. Thus time t, item i, location j, and branch k identify a cell [t][i][j][k] that records the measures for that sale. Each non-empty cell of the cube therefore corresponds to a row of the fact table. This causes a serious sparseness problem in the n-dimensional cube representation, because most combinations of dimension values have no associated measure (see Figure 5): for example, not every item is sold at every branch at every time. Compressed variants such as the iceberg cube (Fang, Shivakumar, Garcia-Molina et al. 1998) are used to avoid storing large, mostly empty arrays. The multidimensional data cube technology was influenced by the success of spreadsheet programs in business analysis. Nowadays, however, models based on relational technology are mostly used, since they allow leveraging the know-how and software already existing in relational database systems.
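The array representation and its sparseness problem can be made concrete with a small 4-D example. The dimension members and sales figures below are hypothetical, and a nested Python list stands in for an n-dimensional array; a real MOLAP store would use a specialized, compressed layout.

```python
from itertools import product

# Hypothetical members of the four dimensions in the running example.
times = ["Q1", "Q2", "Q3", "Q4"]
items = ["TV", "PC", "Phone"]
locations = ["Tokyo", "Osaka"]
branches = ["B1", "B2", "B3"]
dims = [times, items, locations, branches]

# The few (time, item, location, branch) combinations that actually had a sale;
# each entry corresponds to one row of the fact table.
facts = {
    ("Q1", "TV", "Tokyo", "B1"): 300,
    ("Q1", "PC", "Osaka", "B2"): 500,
    ("Q3", "TV", "Tokyo", "B1"): 200,
}

# Map each dimension member to its array index along that dimension.
index = [{member: pos for pos, member in enumerate(d)} for d in dims]

# Dense representation: a 4-D array addressed as cube[t][i][j][k].
# 0 stands in for "no associated measure" in this sketch.
cube = [[[[0 for _ in branches] for _ in locations] for _ in items] for _ in times]
for (t, i, j, k), amount in facts.items():
    cube[index[0][t]][index[1][i]][index[2][j]][index[3][k]] = amount

# Sparseness: count how many cells have no associated measure.
total_cells = len(times) * len(items) * len(locations) * len(branches)
empty_cells = sum(1 for coords in product(*dims) if coords not in facts)
print(total_cells, empty_cells)  # 72 69
```

Only 3 of the 72 cells hold a measure. The `facts` dictionary, which keeps only non-empty cells keyed by their coordinates, is already a simple sparse alternative to the dense array; it is not the iceberg cube, which goes further by computing only those aggregate cells whose measure reaches a minimum threshold.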

Figure 5. Lattice of cuboids, making up a 4-D data cube for the dimensions time, item, location, and branch. Each cuboid represents a different degree of summarization (Han & Kamber 2006)

3. DATA WAREHOUSING AND ONLINE ANALYTICAL PROCESSING
Data warehousing and online analytical processing have become increasingly important for the comprehensive analysis of current and historical data, in order to extract key insights from the vast amounts of data being collected. The purpose of these systems is to give users fast analysis, so that they can analyze the data interactively to understand business patterns.

Generally, the relational database described in section 2.1 is designed mainly to maintain data for everyday operations. A bank database, for instance, is a typical example: it contains information about accounts and runs every day behind a network of ATMs. Such a database mainly supports transactions, i.e., operations that access and change (e.g., insert or update) the data in the database; this is called OnLine Transaction Processing (OLTP). An OLTP system uses primary key/foreign key relationships to relate tables to each other and is usually built for a specific use, such as banking, order processing, ticket tracking, or personnel files. However, as the emphasis shifted towards comprehensive analysis of current and historical data to understand customers' expectations and business patterns, data processing came to require summarizing large amounts of low-level data (see Figure 3) and relating different aspects of the business to find interesting correlations. Database access therefore has to be based on complex queries; this is called OnLine Analytical Processing (OLAP) (Kimball & Strehlo 1995).
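The lattice of Figure 5 can be generated mechanically: each cuboid is the result of grouping the base facts by one subset of the dimensions, so n dimensions yield 2^n cuboids (16 for the 4-D example), from the base cuboid (all four dimensions) up to the apex cuboid (the grand total). The sketch below computes every cuboid naively with SUM over hypothetical data. It illustrates the kind of summarization OLAP queries require, in the spirit of the data cube operator of Gray et al. (1997), and is not an efficient cube-computation algorithm.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("time", "item", "location", "branch")

# Hypothetical base facts: (time, item, location, branch) -> sales.
base = {
    ("Q1", "TV", "Tokyo", "B1"): 300,
    ("Q1", "PC", "Osaka", "B2"): 500,
    ("Q3", "TV", "Tokyo", "B1"): 200,
}


def cuboid(group_by):
    """SUM the base facts over every dimension that is not in group_by."""
    positions = [DIMENSIONS.index(d) for d in group_by]
    result = defaultdict(int)
    for key, sales in base.items():
        result[tuple(key[p] for p in positions)] += sales
    return dict(result)


# One cuboid per subset of the dimensions: from the 0-D apex cuboid
# (grand total) to the 4-D base cuboid.
lattice = {
    group_by: cuboid(group_by)
    for r in range(len(DIMENSIONS) + 1)
    for group_by in combinations(DIMENSIONS, r)
}

print(len(lattice), lattice[()])  # 16 {(): 1000}
```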
To process such complex queries efficiently over vast amounts of data from multiple dimensions, the data has to be collected in advance of the queries. That is, data is extracted from many sources and collected either into a database holding information about subjects spanning the entire organization, called a data warehouse, or into smaller databases each holding a subset of the corporate-wide data (e.g., marketing data), called data marts. Figure 6 shows a multi-tier data warehouse architecture (McFadden 1996; Han & Kamber 2006). Usually, the data stored in a data warehouse (or data mart) is copied from multiple OLTP databases in order to keep a history of the data (sets of snapshots). To obtain this data while letting the source systems continue working normally, it is necessary to watch out for redundant, missing, or heterogeneous data. Data warehousing systems exploit a variety of data extraction and cleaning tools, together with load and refresh utilities, to populate warehouses during the extraction, transformation, and loading process (Inmon 2002; Kimball & Ross 2002). Data extraction from "foreign" sources is usually implemented via gateways and standard interfaces.
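Keeping history by copying snapshots out of an OLTP system can be sketched with two in-memory SQLite databases, one standing in for the operational source and one for the warehouse. The `account` and `account_history` tables, the dates, and the `extract_snapshot` helper are hypothetical; a production ETL process would also clean and transform the rows before loading them.

```python
import sqlite3
from datetime import date

source = sqlite3.connect(":memory:")     # stands in for an OLTP database
warehouse = sqlite3.connect(":memory:")  # stands in for the data warehouse

source.execute("CREATE TABLE account (acct_id INTEGER PRIMARY KEY, balance REAL)")
source.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 250.0)])
source.commit()

# The warehouse keeps every extracted snapshot, tagged with its snapshot date.
warehouse.execute("""CREATE TABLE account_history (
    snapshot_date TEXT, acct_id INTEGER, balance REAL,
    PRIMARY KEY (snapshot_date, acct_id))""")


def extract_snapshot(as_of: date) -> int:
    """Copy the current state of the OLTP table into the warehouse history (hypothetical helper)."""
    rows = source.execute("SELECT acct_id, balance FROM account").fetchall()
    warehouse.executemany(
        "INSERT INTO account_history VALUES (?, ?, ?)",
        [(as_of.isoformat(), acct_id, balance) for acct_id, balance in rows])
    warehouse.commit()
    return len(rows)


extract_snapshot(date(2006, 9, 30))
# The OLTP system keeps running and overwrites the current balance ...
source.execute("UPDATE account SET balance = 80.0 WHERE acct_id = 1")
source.commit()
# ... but the warehouse still holds the earlier state after the next extraction.
extract_snapshot(date(2006, 10, 31))

history = warehouse.execute(
    "SELECT snapshot_date, balance FROM account_history "
    "WHERE acct_id = 1 ORDER BY snapshot_date").fetchall()
print(history)  # [('2006-09-30', 100.0), ('2006-10-31', 80.0)]
```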

Figure 6. Data warehouse: a multi-tier architecture

Not surprisingly, since large volumes of data from multiple sources are involved, there is a high probability of errors and anomalies in the data. Data cleaning therefore involves several tasks: filling in missing entries; identifying outliers and smoothing out noisy data (e.g., incorrect attribute values caused by random error or variance in a measured variable); correcting inconsistent data (e.g., inconsistent value assignments, field lengths, or descriptions); and resolving redundancy caused by data integration (e.g., schema integration: A.cust-id = B.cust-#). Discrepancies in the data are usually detected by checking for field overloading; by checking the uniqueness rule, the consecutive rule, and the null rule; by using metadata such as domain, range, or dependency; and by applying tools. Data scrubbing tools use simple domain knowledge (e.g., postal addresses, spell-checking) to detect and correct errors, and parsing and fuzzy matching techniques are often exploited to scrub data from multiple sources. Data auditing tools scan the data and discover rules and relationships in order to detect violators, for example by analyzing correlations and clustering to find outliers. Such tools may thus be considered variants of data mining tools; for instance, a tool may discover through statistical analysis the suspicious pattern that a certain car dealer has never received any complaints. For data migration and integration, data migration tools and ETL (extraction/transformation/loading) tools are used: data migration tools allow simple transformation rules to be specified, for example, replace the string "gender" by "sex", while ETL tools let users specify transformations through a graphical user interface. After the data has been extracted, cleaned, and transformed, batch load utilities are typically used to populate the warehouse.
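Some of the checks and transformations named above can be sketched in plain Python. The rule definitions follow their usual descriptions (uniqueness rule: no two values of an attribute are equal; consecutive rule: no values are missing between the lowest and highest value; null rule: agreed markers such as blanks or question marks denote null and are handled uniformly). The records, the `NULL_MARKERS` set, and all function names are hypothetical; real scrubbing and ETL tools are far richer.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical customer records after schema integration (A.cust-id = B.cust-#).
records: List[Record] = [
    {"cust_id": "101", "gender": "F", "city": "Tokyo"},
    {"cust_id": "102", "gender": "?", "city": "Osaka"},
    {"cust_id": "102", "gender": "M", "city": ""},
    {"cust_id": "105", "gender": "M", "city": "Kobe"},
]

NULL_MARKERS = {"", "?", "N/A"}  # assumed markers that the null rule treats as null


def uniqueness_violations(rows: List[Record], attr: str) -> List[str]:
    """Uniqueness rule: return the values of attr that occur more than once."""
    seen, dups = set(), set()
    for r in rows:
        if r[attr] in seen:
            dups.add(r[attr])
        seen.add(r[attr])
    return sorted(dups)


def consecutive_gaps(rows: List[Record], attr: str) -> List[int]:
    """Consecutive rule: return integer values missing between the lowest and highest."""
    values = {int(r[attr]) for r in rows}
    return [v for v in range(min(values), max(values) + 1) if v not in values]


def apply_null_rule(rows: List[Record], default: str = "unknown") -> List[Record]:
    """Null rule: replace every agreed null marker with one uniform default."""
    return [{k: (default if v in NULL_MARKERS else v) for k, v in r.items()} for r in rows]


def rename_attribute(rows: List[Record], old: str, new: str) -> List[Record]:
    """Simple transformation rule, e.g. replace the string "gender" by "sex"."""
    return [{(new if k == old else k): v for k, v in r.items()} for r in rows]


cleaned = rename_attribute(apply_null_rule(records), "gender", "sex")
print(uniqueness_violations(records, "cust_id"), consecutive_gaps(records, "cust_id"))
# ['102'] [103, 104]
```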
Several processes are required: checking integrity constraints; sorting; creating the derived tables stored in the warehouse by summarization, aggregation, and other computation; building indices and other access paths; and partitioning data across multiple target storage areas. Furthermore, a load utility must allow the system administrator to monitor status and to cancel, suspend, and resume a load. By taking periodic checkpoints, the loading process can be restarted from the last checkpoint if a failure occurs. In practice, pipelined and partitioned parallelism are typically exploited to keep loads from taking a very long time; sequential loads, for example, may take weeks or months for a terabyte of data. Even with parallelism, however, loading may still take too long. Incremental loading can therefore be used during refresh to reduce the volume of data that has to be incorporated into the warehouse: only the updated tuples are inserted. Refreshing a warehouse consists in propagating updates on source data so as to correspondingly update the data stored in the warehouse. The warehouse is usually refreshed periodically (e.g., daily or weekly), according to a refresh policy that the warehouse administrator sets depending on user needs, traffic, and similar factors. Most contemporary database systems provide replication servers that support incremental techniques for propagating updates from a primary database to one or more replicas, and such replication servers can be used to refresh a warehouse incrementally when the sources change.

The data stored in a data warehouse is modeled with the multidimensional data model described in section 2.2, and computing and organizing the data cube is an important process in OLAP. The multidimensional data can be stored and organized in different ways. In the OLAP engine tier shown in Figure 6, there are two contrasting approaches: relational OLAP (ROLAP) and multidimensional OLAP (MOLAP). In a ROLAP system, the data is stored in relational tables, commonly mapped onto a star schema or a snowflake schema, and the analytical engine is built on top of a relational database system, accessing the multidimensional data through the standard SQL interface. In a MOLAP system, by contrast, the data is stored in a specialized form such as the multidimensional arrays described in section 2.3. Because ROLAP uses well-developed relational database technology (e.g., query processing and optimization), it can coexist with other relational data sources and does not need any specialized storage mechanism, whereas MOLAP computes and organizes the data cube in an n-dimensional array, which can enable fast multidimensional analysis.
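The ROLAP idea, an analytical engine issuing standard SQL against a star schema, can be sketched with SQLite. The `rolap_query` helper and the schema below are hypothetical. The helper aggregates to a chosen level of the time hierarchy (month or quarter), so switching from month to quarter is a roll-up, and an optional predicate restricts the cube: a slice fixes one dimension value, a dice selects values on several dimensions. Column names are whitelisted, but the predicate is concatenated into the SQL, so it must come from trusted code rather than from user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, month TEXT, quarter TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE sales_fact (time_key INTEGER, location_key INTEGER, dollars_sold REAL,
                         PRIMARY KEY (time_key, location_key));
INSERT INTO time_dim VALUES (1, '2006-07', '2006-Q3'), (2, '2006-08', '2006-Q3'),
                            (3, '2006-10', '2006-Q4');
INSERT INTO location_dim VALUES (1, 'Tokyo', 'Japan'), (2, 'Osaka', 'Japan');
INSERT INTO sales_fact VALUES (1, 1, 100), (2, 1, 150), (2, 2, 50), (3, 2, 300);
""")


def rolap_query(time_level, where=None, params=()):
    """Translate a cube request into SQL over the star schema (hypothetical helper).

    time_level: 'month' or 'quarter', the level of the time hierarchy to aggregate to.
    where: optional trusted SQL predicate implementing a slice or dice.
    """
    if time_level not in ("month", "quarter"):
        raise ValueError("unknown time level")
    sql = (f"SELECT t.{time_level}, l.city, SUM(f.dollars_sold) "
           "FROM sales_fact f "
           "JOIN time_dim t ON f.time_key = t.time_key "
           "JOIN location_dim l ON f.location_key = l.location_key")
    if where:
        sql += f" WHERE {where}"
    sql += f" GROUP BY t.{time_level}, l.city ORDER BY t.{time_level}, l.city"
    return conn.execute(sql, params).fetchall()


by_month = rolap_query("month")
by_quarter = rolap_query("quarter")                              # roll-up: month -> quarter
tokyo_slice = rolap_query("quarter", "l.city = ?", ("Tokyo",))  # slice: location = Tokyo
q3_dice = rolap_query("quarter", "t.quarter = ? AND l.city IN (?, ?)",
                      ("2006-Q3", "Tokyo", "Osaka"))             # dice on two dimensions
print(by_quarter)
# [('2006-Q3', 'Osaka', 50.0), ('2006-Q3', 'Tokyo', 250.0), ('2006-Q4', 'Osaka', 300.0)]
```

Both the monthly and the quarterly results add up to the same grand total (600), as aggregation along a hierarchy with SUM should. A MOLAP engine would typically answer the same requests from precomputed arrays instead of generating SQL.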
The benefits of both can be combined in hybrid OLAP (HOLAP); for example, Microsoft SQL Server 2000 supports a HOLAP server that can store large volumes of detail data in a relational database while keeping aggregations in a separate MOLAP store. According to recent vendor reports (Ault 2003; Oracle Presentation 2005), vendors are moving in two main directions: building specialized multidimensional engines, and pushing OLAP functionality into relational databases.

At the front end of the data warehouse architecture illustrated in Figure 6, users employ front-end tools to make complex queries, modify information in a report, switch between aggregated and detailed data, select part of the data, and so forth. These tools rely on OLAP operations that explore the multidimensional data cube: moving up a dimension hierarchy (roll-up), moving down it (drill-down), restricting the cube to one dimension value (slice), selecting a sub-cube (dice), and cross-tabulation (pivot).

Like online analytical processing, data mining is one of the most important approaches to multidimensional analysis in data warehouses. Data mining is the extraction of interesting (nontrivial, implicit, previously unknown, and potentially useful) information or patterns from data in large databases (Fayyad, Piatetsky-Shapiro, Smyth, & Uthurusamy 1996; Han & Kamber 2006); it tries to generate hypotheses by uncovering hidden patterns. Motivated by the popularity of OLAP technology, Han developed an online analytical mining (OLAM) mechanism that integrates OLAP with multidimensional data mining (Han 1997; Han & Kamber 2006). OLAM provides facilities for data mining on different subsets of data and at different levels of abstraction by drilling, pivoting, filtering, dicing, and slicing on a data cube. Together with visualization tools, this can greatly enhance the power and flexibility of exploratory data mining (Aggarwal 2002).

4. CONCLUSION
Since the advent of information technology, businesses have been collecting vast amounts of data about their daily transactions, refining the systems that produce transaction data, making data from multiple sources available in warehouses, selecting and implementing analytic tools, and assembling the hardware and communication environment. From a data perspective, we discussed the database technologies associated with exploiting analytic techniques: multidimensional data modeling, data warehousing, and online analytical processing, which are indispensable technologies for becoming an analytics competitor. The purpose of these systems is to give users fast analysis, so that they can analyze data interactively to understand business patterns such as customer behavior, product movement, employee performance, and financial reactions. Building such a system involves many challenges, including data modeling, schema design, loading, maintenance, and query processing. For ease of use, simpler deployment, and optimal value, a trend has emerged toward incorporating data collection, storage, processing, and other analytics-specific issues into overall system design.

REFERENCES
Aggarwal, C. C. (2002) Towards Effective and Interpretable Data Mining by Visual Interaction, SIGKDD Explorations, Vol. 3, No. 2, pp. 11-22.
Alter, S. (2002) Information Systems: The Foundation of E-Business (4th ed.), Prentice Hall, pp. 3-35.
Ault, M. (2003) Oracle Data Warehouse Management: Secrets of Oracle Data Warehousing, Rampant TechPress.
Codd, E. F. (1970) A Relational Model of Data for Large Shared Data Banks, Communications of the ACM, Vol. 13, No. 6, June.
Crainer, S. (2000) The Management Century: A Critical Review of 20th Century Thought & Practice, Booz Allen & Hamilton Inc. (Japanese translation, pp. 240-296).
Davenport, T. H. (2006) Competing on Analytics, Harvard Business Review, Jan.
Fang, M., Shivakumar, N., Garcia-Molina, H., Motwani, R., and Ullman, J. D. (1998) Computing Iceberg Queries Efficiently, Proceedings of Very Large Data Bases, pp. 299-310, New York, Aug.
Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R. (1996) Advances in Knowledge Discovery and Data Mining, Menlo Park, CA: AAAI Press.
Giudici, P. (2003) Applied Data Mining: Statistical Methods for Business and Industry, England, Wiley & Sons.
Gray, J., Chaudhuri, S., Bosworth, A., Layman, A., Reichart, D., Venkatrao, M., Pellow, F., and Pirahesh, H. (1997) Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals, Data Mining and Knowledge Discovery, No. 1, pp. 29-54.
Hammer, M. (2001) The Agenda: What Every Business Must Do to Dominate the Decade, Three Rivers Press.
Han, J. W. and Kamber, M. (2006) Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers.
Han, J. (1997) OLAP Mining: An Integration of OLAP with Data Mining, Proceedings of the 1997 IFIP Conference on Data Semantics, Oct.
IDG Japan (2004) Business Innovation Powered by Oracle E-Business Suite, ISBN
Inmon, W. H. (2002) Building the Data Warehouse (3rd ed.), New York, Wiley & Sons.
Kimball, R. and Ross, M. (2002) The Data Warehouse Toolkit (2nd ed.), New York, Wiley & Sons.
Kimball, R. and Strehlo, K. (1995) Why Decision Support Fails and How to Fix It, SIGMOD Record, Vol. 24, No. 3, pp. 92-97.
Knightsbridge (2005) Top 10 Trends in Business Intelligence and Data Warehousing for 2005, White Paper, Knightsbridge Solutions LLC, Jan.
McFadden, F. R. (1996) Data Warehouse for EIS: Some Issues and Impacts, Proceedings of the Hawaii International Conference on System Sciences.
Oracle Presentation (2005) Oracle Database 10g Release 2: The Exploitation of Data Warehouse, Oracle Corporation.


More information

Data Mining Concepts & Techniques

Data Mining Concepts & Techniques Data Mining Concepts & Techniques Lecture No. 01 Databases, Data warehouse Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro

More information

Evolution of Database Systems

Evolution of Database Systems Evolution of Database Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Intelligent Decision Support Systems Master studies, second

More information

Novel Materialized View Selection in a Multidimensional Database

Novel Materialized View Selection in a Multidimensional Database Graphic Era University From the SelectedWorks of vijay singh Winter February 10, 2009 Novel Materialized View Selection in a Multidimensional Database vijay singh Available at: https://works.bepress.com/vijaysingh/5/

More information

Data Warehousing and OLAP

Data Warehousing and OLAP Data Warehousing and OLAP INFO 330 Slides courtesy of Mirek Riedewald Motivation Large retailer Several databases: inventory, personnel, sales etc. High volume of updates Management requirements Efficient

More information

Data Warehousing and OLAP Technologies for Decision-Making Process

Data Warehousing and OLAP Technologies for Decision-Making Process Data Warehousing and OLAP Technologies for Decision-Making Process Hiren H Darji Asst. Prof in Anand Institute of Information Science,Anand Abstract Data warehousing and on-line analytical processing (OLAP)

More information

Adnan YAZICI Computer Engineering Department

Adnan YAZICI Computer Engineering Department Data Warehouse Adnan YAZICI Computer Engineering Department Middle East Technical University, A.Yazici, 2010 Definition A data warehouse is a subject-oriented integrated time-variant nonvolatile collection

More information

CS614 - Data Warehousing - Midterm Papers Solved MCQ(S) (1 TO 22 Lectures)

CS614 - Data Warehousing - Midterm Papers Solved MCQ(S) (1 TO 22 Lectures) CS614- Data Warehousing Solved MCQ(S) From Midterm Papers (1 TO 22 Lectures) BY Arslan Arshad Nov 21,2016 BS110401050 BS110401050@vu.edu.pk Arslan.arshad01@gmail.com AKMP01 CS614 - Data Warehousing - Midterm

More information

An Overview of Data Warehousing and OLAP Technology

An Overview of Data Warehousing and OLAP Technology An Overview of Data Warehousing and OLAP Technology CMPT 843 Karanjit Singh Tiwana 1 Intro and Architecture 2 What is Data Warehouse? Subject-oriented, integrated, time varying, non-volatile collection

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 07 : 06/11/2012 Data Mining: Concepts and Techniques (3 rd ed.) Chapter

More information

Data Warehousing. Ritham Vashisht, Sukhdeep Kaur and Shobti Saini

Data Warehousing. Ritham Vashisht, Sukhdeep Kaur and Shobti Saini Advance in Electronic and Electric Engineering. ISSN 2231-1297, Volume 3, Number 6 (2013), pp. 669-674 Research India Publications http://www.ripublication.com/aeee.htm Data Warehousing Ritham Vashisht,

More information

What is a Data Warehouse?

What is a Data Warehouse? What is a Data Warehouse? COMP 465 Data Mining Data Warehousing Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Defined in many different ways,

More information

Data warehouses Decision support The multidimensional model OLAP queries

Data warehouses Decision support The multidimensional model OLAP queries Data warehouses Decision support The multidimensional model OLAP queries Traditional DBMSs are used by organizations for maintaining data to record day to day operations On-line Transaction Processing

More information

CHAPTER 3 Implementation of Data warehouse in Data Mining

CHAPTER 3 Implementation of Data warehouse in Data Mining CHAPTER 3 Implementation of Data warehouse in Data Mining 3.1 Introduction to Data Warehousing A data warehouse is storage of convenient, consistent, complete and consolidated data, which is collected

More information

A Multi-Dimensional Data Model

A Multi-Dimensional Data Model A Multi-Dimensional Data Model A Data Warehouse is based on a Multidimensional data model which views data in the form of a data cube A data cube, such as sales, allows data to be modeled and viewed in

More information

Rocky Mountain Technology Ventures

Rocky Mountain Technology Ventures Rocky Mountain Technology Ventures Comparing and Contrasting Online Analytical Processing (OLAP) and Online Transactional Processing (OLTP) Architectures 3/19/2006 Introduction One of the most important

More information

Decision Support. Chapter 25. CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 1

Decision Support. Chapter 25. CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 1 Decision Support Chapter 25 CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 1 Introduction Increasingly, organizations are analyzing current and historical data to identify useful patterns and support

More information

Data Warehousing 2. ICS 421 Spring Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Data Warehousing 2. ICS 421 Spring Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa ICS 421 Spring 2010 Data Warehousing 2 Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 3/30/2010 Lipyeow Lim -- University of Hawaii at Manoa 1 Data Warehousing

More information

Summary of Last Chapter. Course Content. Chapter 2 Objectives. Data Warehouse and OLAP Outline. Incentive for a Data Warehouse

Summary of Last Chapter. Course Content. Chapter 2 Objectives. Data Warehouse and OLAP Outline. Incentive for a Data Warehouse Principles of Knowledge Discovery in bases Fall 1999 Chapter 2: Warehousing and Dr. Osmar R. Zaïane University of Alberta Dr. Osmar R. Zaïane, 1999 Principles of Knowledge Discovery in bases University

More information

Basics of Dimensional Modeling

Basics of Dimensional Modeling Basics of Dimensional Modeling Data warehouse and OLAP tools are based on a dimensional data model. A dimensional model is based on dimensions, facts, cubes, and schemas such as star and snowflake. Dimension

More information

Data Warehouses. Yanlei Diao. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Data Warehouses. Yanlei Diao. Slides Courtesy of R. Ramakrishnan and J. Gehrke Data Warehouses Yanlei Diao Slides Courtesy of R. Ramakrishnan and J. Gehrke Introduction v In the late 80s and early 90s, companies began to use their DBMSs for complex, interactive, exploratory analysis

More information

OLAP Introduction and Overview

OLAP Introduction and Overview 1 CHAPTER 1 OLAP Introduction and Overview What Is OLAP? 1 Data Storage and Access 1 Benefits of OLAP 2 What Is a Cube? 2 Understanding the Cube Structure 3 What Is SAS OLAP Server? 3 About Cube Metadata

More information

WKU-MIS-B10 Data Management: Warehousing, Analyzing, Mining, and Visualization. Management Information Systems

WKU-MIS-B10 Data Management: Warehousing, Analyzing, Mining, and Visualization. Management Information Systems Management Information Systems Management Information Systems B10. Data Management: Warehousing, Analyzing, Mining, and Visualization Code: 166137-01+02 Course: Management Information Systems Period: Spring

More information

Data Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1396

Data Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1396 Data Mining Data warehousing Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1396 1 / 31 Table of contents 1 Introduction 2 Data warehousing

More information

Data Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1394

Data Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1394 Data Mining Data warehousing Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 31 Table of contents 1 Introduction 2 Data warehousing

More information

Chapter 4, Data Warehouse and OLAP Operations

Chapter 4, Data Warehouse and OLAP Operations CSI 4352, Introduction to Data Mining Chapter 4, Data Warehouse and OLAP Operations Young-Rae Cho Associate Professor Department of Computer Science Baylor University CSI 4352, Introduction to Data Mining

More information

Data warehousing in telecom Industry

Data warehousing in telecom Industry Data warehousing in telecom Industry Dr. Sanjay Srivastava, Kaushal Srivastava, Avinash Pandey, Akhil Sharma Abstract: Data Warehouse is termed as the storage for the large heterogeneous data collected

More information

Question Bank. 4) It is the source of information later delivered to data marts.

Question Bank. 4) It is the source of information later delivered to data marts. Question Bank Year: 2016-2017 Subject Dept: CS Semester: First Subject Name: Data Mining. Q1) What is data warehouse? ANS. A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile

More information

Decision Support Systems aka Analytical Systems

Decision Support Systems aka Analytical Systems Decision Support Systems aka Analytical Systems Decision Support Systems Systems that are used to transform data into information, to manage the organization: OLAP vs OLTP OLTP vs OLAP Transactions Analysis

More information

Q1) Describe business intelligence system development phases? (6 marks)

Q1) Describe business intelligence system development phases? (6 marks) BUISINESS ANALYTICS AND INTELLIGENCE SOLVED QUESTIONS Q1) Describe business intelligence system development phases? (6 marks) The 4 phases of BI system development are as follow: Analysis phase Design

More information

CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP)

CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP) CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP) INTRODUCTION A dimension is an attribute within a multidimensional model consisting of a list of values (called members). A fact is defined by a combination

More information

Teradata Aggregate Designer

Teradata Aggregate Designer Data Warehousing Teradata Aggregate Designer By: Sam Tawfik Product Marketing Manager Teradata Corporation Table of Contents Executive Summary 2 Introduction 3 Problem Statement 3 Implications of MOLAP

More information

Introduction to Data Warehousing

Introduction to Data Warehousing ICS 321 Spring 2012 Introduction to Data Warehousing Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 4/23/2012 Lipyeow Lim -- University of Hawaii at Manoa

More information

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining. About the Tutorial Data Mining is defined as the procedure of extracting information from huge sets of data. In other words, we can say that data mining is mining knowledge from data. The tutorial starts

More information

Acknowledgment. MTAT Data Mining. Week 7: Online Analytical Processing and Data Warehouses. Typical Data Analysis Process.

Acknowledgment. MTAT Data Mining. Week 7: Online Analytical Processing and Data Warehouses. Typical Data Analysis Process. MTAT.03.183 Data Mining Week 7: Online Analytical Processing and Data Warehouses Marlon Dumas marlon.dumas ät ut. ee Acknowledgment This slide deck is a mashup of the following publicly available slide

More information

Constructing Object Oriented Class for extracting and using data from data cube

Constructing Object Oriented Class for extracting and using data from data cube Constructing Object Oriented Class for extracting and using data from data cube Antoaneta Ivanova Abstract: The goal of this article is to depict Object Oriented Conceptual Model Data Cube using it as

More information

Full file at

Full file at Chapter 2 Data Warehousing True-False Questions 1. A real-time, enterprise-level data warehouse combined with a strategy for its use in decision support can leverage data to provide massive financial benefits

More information

Decision Support, Data Warehousing, and OLAP

Decision Support, Data Warehousing, and OLAP Decision Support, Data Warehousing, and OLAP : Contents Terminology : OLAP vs. OLTP Data Warehousing Architecture Technologies References 1 Decision Support and OLAP Information technology to help knowledge

More information

GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV

GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV Subject Name: Elective I Data Warehousing & Data Mining (DWDM) Subject Code: 2640005 Learning Objectives: To understand

More information

Lectures for the course: Data Warehousing and Data Mining (IT 60107)

Lectures for the course: Data Warehousing and Data Mining (IT 60107) Lectures for the course: Data Warehousing and Data Mining (IT 60107) Week 1 Lecture 1 21/07/2011 Introduction to the course Pre-requisite Expectations Evaluation Guideline Term Paper and Term Project Guideline

More information

The University of Iowa Intelligent Systems Laboratory The University of Iowa Intelligent Systems Laboratory

The University of Iowa Intelligent Systems Laboratory The University of Iowa Intelligent Systems Laboratory Warehousing Outline Andrew Kusiak 2139 Seamans Center Iowa City, IA 52242-1527 andrew-kusiak@uiowa.edu http://www.icaen.uiowa.edu/~ankusiak Tel. 319-335 5934 Introduction warehousing concepts Relationship

More information

DATA WAREHOUSING IN LIBRARIES FOR MANAGING DATABASE

DATA WAREHOUSING IN LIBRARIES FOR MANAGING DATABASE DATA WAREHOUSING IN LIBRARIES FOR MANAGING DATABASE Dr. Kirti Singh, Librarian, SSD Women s Institute of Technology, Bathinda Abstract: Major libraries have large collections and circulation. Managing

More information

Chapter 3. The Multidimensional Model: Basic Concepts. Introduction. The multidimensional model. The multidimensional model

Chapter 3. The Multidimensional Model: Basic Concepts. Introduction. The multidimensional model. The multidimensional model Chapter 3 The Multidimensional Model: Basic Concepts Introduction Multidimensional Model Multidimensional concepts Star Schema Representation Conceptual modeling using ER, UML Conceptual modeling using

More information

Dta Mining and Data Warehousing

Dta Mining and Data Warehousing CSCI6405 Fall 2003 Dta Mining and Data Warehousing Instructor: Qigang Gao, Office: CS219, Tel:494-3356, Email: q.gao@dal.ca Teaching Assistant: Christopher Jordan, Email: cjordan@cs.dal.ca Office Hours:

More information

The strategic advantage of OLAP and multidimensional analysis

The strategic advantage of OLAP and multidimensional analysis IBM Software Business Analytics Cognos Enterprise The strategic advantage of OLAP and multidimensional analysis 2 The strategic advantage of OLAP and multidimensional analysis Overview Online analytical

More information

MICROSOFT BUSINESS INTELLIGENCE

MICROSOFT BUSINESS INTELLIGENCE SSIS MICROSOFT BUSINESS INTELLIGENCE 1) Introduction to Integration Services Defining sql server integration services Exploring the need for migrating diverse Data the role of business intelligence (bi)

More information

Data Warehouse and Data Mining

Data Warehouse and Data Mining Data Warehouse and Data Mining Lecture No. 03 Architecture of DW Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro Basic

More information

UNIT -1 UNIT -II. Q. 4 Why is entity-relationship modeling technique not suitable for the data warehouse? How is dimensional modeling different?

UNIT -1 UNIT -II. Q. 4 Why is entity-relationship modeling technique not suitable for the data warehouse? How is dimensional modeling different? (Please write your Roll No. immediately) End-Term Examination Fourth Semester [MCA] MAY-JUNE 2006 Roll No. Paper Code: MCA-202 (ID -44202) Subject: Data Warehousing & Data Mining Note: Question no. 1 is

More information

This proposed research is inspired by the work of Mr Jagdish Sadhave 2009, who used

This proposed research is inspired by the work of Mr Jagdish Sadhave 2009, who used Literature Review This proposed research is inspired by the work of Mr Jagdish Sadhave 2009, who used the technology of Data Mining and Knowledge Discovery in Databases to build Examination Data Warehouse

More information

Deccansoft Software Services Microsoft Silver Learning Partner. SSAS Syllabus

Deccansoft Software Services Microsoft Silver Learning Partner. SSAS Syllabus Overview: Analysis Services enables you to analyze large quantities of data. With it, you can design, create, and manage multidimensional structures that contain detail and aggregated data from multiple

More information

Table of Contents. Knowledge Management Data Warehouses and Data Mining. Introduction and Motivation

Table of Contents. Knowledge Management Data Warehouses and Data Mining. Introduction and Motivation Table of Contents Knowledge Management Data Warehouses and Data Mining Dr. Michael Hahsler Dept. of Information Processing Vienna Univ. of Economics and BA 11. December 2001

More information

Knowledge Management Data Warehouses and Data Mining

Knowledge Management Data Warehouses and Data Mining Knowledge Management Data Warehouses and Data Mining Dr. Michael Hahsler Dept. of Information Processing Vienna Univ. of Economics and BA 11. December 2001 1 Table of Contents

More information

1. Inroduction to Data Mininig

1. Inroduction to Data Mininig 1. Inroduction to Data Mininig 1.1 Introduction Universe of Data Information Technology has grown in various directions in the recent years. One natural evolutionary path has been the development of the

More information

Chapter 6. Foundations of Business Intelligence: Databases and Information Management VIDEO CASES

Chapter 6. Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 05(b) : 23/10/2012 Data Mining: Concepts and Techniques (3 rd ed.) Chapter

More information

The Evolution of Data Warehousing. Data Warehousing Concepts. The Evolution of Data Warehousing. The Evolution of Data Warehousing

The Evolution of Data Warehousing. Data Warehousing Concepts. The Evolution of Data Warehousing. The Evolution of Data Warehousing The Evolution of Data Warehousing Data Warehousing Concepts Since 1970s, organizations gained competitive advantage through systems that automate business processes to offer more efficient and cost-effective

More information

CS 1655 / Spring 2013! Secure Data Management and Web Applications

CS 1655 / Spring 2013! Secure Data Management and Web Applications CS 1655 / Spring 2013 Secure Data Management and Web Applications 03 Data Warehousing Alexandros Labrinidis University of Pittsburgh What is a Data Warehouse A data warehouse: archives information gathered

More information

Overview. Introduction to Data Warehousing and Business Intelligence. BI Is Important. What is Business Intelligence (BI)?

Overview. Introduction to Data Warehousing and Business Intelligence. BI Is Important. What is Business Intelligence (BI)? Introduction to Data Warehousing and Business Intelligence Overview Why Business Intelligence? Data analysis problems Data Warehouse (DW) introduction A tour of the coming DW lectures DW Applications Loosely

More information

Call: SAS BI Course Content:35-40hours

Call: SAS BI Course Content:35-40hours SAS BI Course Content:35-40hours Course Outline SAS Data Integration Studio 4.2 Introduction * to SAS DIS Studio Features of SAS DIS Studio Tasks performed by SAS DIS Studio Navigation to SAS DIS Studio

More information

Data warehouse architecture consists of the following interconnected layers:

Data warehouse architecture consists of the following interconnected layers: Architecture, in the Data warehousing world, is the concept and design of the data base and technologies that are used to load the data. A good architecture will enable scalability, high performance and

More information

Data Mining: Exploring Data. Lecture Notes for Chapter 3

Data Mining: Exploring Data. Lecture Notes for Chapter 3 Data Mining: Exploring Data Lecture Notes for Chapter 3 1 What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include

More information

Data Analysis and Data Science

Data Analysis and Data Science Data Analysis and Data Science CPS352: Database Systems Simon Miner Gordon College Last Revised: 4/29/15 Agenda Check-in Online Analytical Processing Data Science Homework 8 Check-in Online Analytical

More information

Data-Driven Driven Business Intelligence Systems: Parts I. Lecture Outline. Learning Objectives

Data-Driven Driven Business Intelligence Systems: Parts I. Lecture Outline. Learning Objectives Data-Driven Driven Business Intelligence Systems: Parts I Week 5 Dr. Jocelyn San Pedro School of Information Management & Systems Monash University IMS3001 BUSINESS INTELLIGENCE SYSTEMS SEM 1, 2004 Lecture

More information

DATA WAREHOUING UNIT I

DATA WAREHOUING UNIT I BHARATHIDASAN ENGINEERING COLLEGE NATTRAMAPALLI DEPARTMENT OF COMPUTER SCIENCE SUB CODE & NAME: IT6702/DWDM DEPT: IT Staff Name : N.RAMESH DATA WAREHOUING UNIT I 1. Define data warehouse? NOV/DEC 2009

More information

Data Preprocessing. Slides by: Shree Jaswal

Data Preprocessing. Slides by: Shree Jaswal Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data

More information

UNIT -1 UNIT -II. Q. 4 Why is entity-relationship modeling technique not suitable for the data warehouse? How is dimensional modeling different?

UNIT -1 UNIT -II. Q. 4 Why is entity-relationship modeling technique not suitable for the data warehouse? How is dimensional modeling different? (Please write your Roll No. immediately) End-Term Examination Fourth Semester [MCA] MAY-JUNE 2006 Roll No. Paper Code: MCA-202 (ID -44202) Subject: Data Warehousing & Data Mining Time: 3 Hours Maximum

More information

Data Warehouses Chapter 12. Class 10: Data Warehouses 1

Data Warehouses Chapter 12. Class 10: Data Warehouses 1 Data Warehouses Chapter 12 Class 10: Data Warehouses 1 OLTP vs OLAP Operational Database: a database designed to support the day today transactions of an organization Data Warehouse: historical data is

More information

Improving the Performance of OLAP Queries Using Families of Statistics Trees

Improving the Performance of OLAP Queries Using Families of Statistics Trees Improving the Performance of OLAP Queries Using Families of Statistics Trees Joachim Hammer Dept. of Computer and Information Science University of Florida Lixin Fu Dept. of Mathematical Sciences University

More information

Data Warehousing and Decision Support (mostly using Relational Databases) CS634 Class 20

Data Warehousing and Decision Support (mostly using Relational Databases) CS634 Class 20 Data Warehousing and Decision Support (mostly using Relational Databases) CS634 Class 20 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke, Chapter 25 Introduction Increasingly,

More information

IDU0010 ERP,CRM ja DW süsteemid Loeng 5 DW concepts. Enn Õunapuu

IDU0010 ERP,CRM ja DW süsteemid Loeng 5 DW concepts. Enn Õunapuu IDU0010 ERP,CRM ja DW süsteemid Loeng 5 DW concepts Enn Õunapuu enn.ounapuu@ttu.ee Content Oveall approach Dimensional model Tabular model Overall approach Data modeling is a discipline that has been practiced

More information

Data Warehousing. Overview

Data Warehousing. Overview Data Warehousing Overview Basic Definitions Normalization Entity Relationship Diagrams (ERDs) Normal Forms Many to Many relationships Warehouse Considerations Dimension Tables Fact Tables Star Schema Snowflake

More information

A Systems Approach to Dimensional Modeling in Data Marts. Joseph M. Firestone, Ph.D. White Paper No. One. March 12, 1997

A Systems Approach to Dimensional Modeling in Data Marts. Joseph M. Firestone, Ph.D. White Paper No. One. March 12, 1997 1 of 8 5/24/02 4:43 PM A Systems Approach to Dimensional Modeling in Data Marts By Joseph M. Firestone, Ph.D. White Paper No. One March 12, 1997 OLAP s Purposes And Dimensional Data Modeling Dimensional

More information

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar What is data exploration? A preliminary exploration of the data to better understand its characteristics.

More information

Data Warehouse and Data Mining

Data Warehouse and Data Mining Data Warehouse and Data Mining Lecture No. 07 Terminologies Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro Database

More information

Guide Users along Information Pathways and Surf through the Data

Guide Users along Information Pathways and Surf through the Data Guide Users along Information Pathways and Surf through the Data Stephen Overton, Overton Technologies, LLC, Raleigh, NC ABSTRACT Business information can be consumed many ways using the SAS Enterprise

More information

Data Preprocessing. Data Mining 1

Data Preprocessing. Data Mining 1 Data Preprocessing Today s real-world databases are highly susceptible to noisy, missing, and inconsistent data due to their typically huge size and their likely origin from multiple, heterogenous sources.

More information

1. Attempt any two of the following: 10 a. State and justify the characteristics of a Data Warehouse with suitable examples.

1. Attempt any two of the following: 10 a. State and justify the characteristics of a Data Warehouse with suitable examples. Instructions to the Examiners: 1. May the Examiners not look for exact words from the text book in the Answers. 2. May any valid example be accepted - example may or may not be from the text book 1. Attempt

More information

Data Mining: Exploring Data. Lecture Notes for Data Exploration Chapter. Introduction to Data Mining

Data Mining: Exploring Data. Lecture Notes for Data Exploration Chapter. Introduction to Data Mining Data Mining: Exploring Data Lecture Notes for Data Exploration Chapter Introduction to Data Mining by Tan, Steinbach, Karpatne, Kumar 02/03/2018 Introduction to Data Mining 1 What is data exploration?

More information

The Data Organization

The Data Organization C V I T F E P A O TM The Data Organization Best Practices Metadata Dictionary Application Architecture Prepared by Rainer Schoenrank January 2017 Table of Contents 1. INTRODUCTION... 3 1.1 PURPOSE OF THE

More information