Database Technologies for E-Business. Dongmei CUI
Abstract

In today's fast-paced business environment, business processes such as designing products, obtaining suppliers, selling, fulfilling orders, and providing services are performed through the extensive use of computer and communication technologies and computerized data. In this "e" environment, many companies have already been collecting and refining vast amounts of data. By analyzing the data, they understand customer expectations and optimize operations to unprecedented degrees, which leads to success. These companies, called analytics competitors, compete on analytics, and three key attributes they share have been identified. This paper investigates database technologies from the data perspective, focusing on the requirements for exploiting analytic techniques in today's e-business.

Key words: Relational data model, Multidimensional data model, Data warehouse, ETL, Online analytical processing, Data mining, Analytics competitor

1. INTRODUCTION

In today's competitive business environment, characterized by globalization, short product life cycles, short spans of distribution and diversity of customers' needs, many innovations have appeared on both the business side and the technology side of organizations (Hammer 2001; Crainer 2000). On the business side, we have seen business process reengineering (BPR) and the balanced scorecard (BSC), the management philosophies of customer relationship management and supply chain management, electronic commerce, and business-to-business trading exchanges. On the technology side, there is a move from standalone information systems to large-scale information systems, such as enterprise resource planning (ERP) systems, knowledge management systems (KMS), and different information technology solutions for enterprise application integration (e.g.
the customer relationship management (CRM) systems), interorganizational systems (IOS), as well as standards such as the internet, electronic data interchange (EDI) and so forth. These technologies are indispensable for business processes such as designing products, obtaining suppliers, selling, fulfilling orders, and providing services. We call this e-business (Alter 2002): business activities conducted using computer and communication technologies and computerized data.

In this "e" environment, many companies have already been collecting and refining vast amounts of data in databases and data warehouses. By analyzing the data, they understand customer expectations and optimize operations to unprecedented degrees. Several well-known cases, such as electronic reservations at American Airlines, online ordering at American Hospital, and online book selling at Amazon, have been intensively studied. Such companies are called analytics competitors, and recent research (Davenport 2006) identifies their key attributes: they make wide use of data modeling and optimization technologies; they apply analytic techniques across the whole enterprise; and their senior executives promote analytics for
competing. Needless to say, exploiting analytic techniques has brought these companies competitive advantage, not only because they are able to analyze, but also because they must keep competing on analytics.

This paper gives an overview of database technologies from the data perspective, focusing on the requirements for applying analytic techniques in e-business. The paper is organized as follows. First, section 2 investigates how data is designed in a database or data warehouse. Second, for the purpose of analyzing multidimensional data, section 3 discusses data warehousing and online analytical processing in detail within the data warehouse architecture. Finally, section 4 gives a conclusion.

2. RELATIONAL AND MULTIDIMENSIONAL DATA MODEL

In database terminology, a data model is defined as a set of constructs that describe the structure of data, together with a set of operations for manipulating the data. In this section, we use the term data model to refer to a structure imposed on the data by design, not a structure discovered within the data, which is usually captured by the entity-relationship (ER) modeling method to represent the semantic data model of entities and their relationships.

2.1 Relational Data Model

Generally, the data stored in a database is organized in tabular form. A database can contain multiple tables. The schema of a table consists of the table name and a set of named columns; the column names are also referred to as attributes, and a row of a table is called a tuple. An instance of the schema (namely, the actual table) is called a relation. Each table entry in the column for an attribute is a value from the domain of that attribute.
For example, suppose we have a database containing five tables that store data about the supply process, as shown in Figure 1.

Figure 1. Example of a database containing five relations: supplier, part, project, inventory and supply. QOH and QTY stand for quantity on hand and quantity of supply, respectively.

We now describe relational data more formally. Let $R$ be a relation schema, that is, a set of attributes $\{A_1, \ldots, A_i, \ldots, A_n\}$, where each attribute $A_i$ has an associated domain $\mathrm{Dom}(A_i)$. A row over the schema $R$ is a mapping $t : R \to \bigcup_i \mathrm{Dom}(A_i)$, where $t(A_i) \in \mathrm{Dom}(A_i)$. A table (relation) over the schema $R$ is a collection of rows over $R$. A database schema $\mathbf{R}$ is a collection $\{R_1, \ldots, R_i, \ldots, R_m\}$ of relation schemas, and a database $\mathbf{r}$ over the schema $\mathbf{R}$ consists of a relation over $R_i$ for each $i = 1, \ldots, m$. A database consisting of such tables is called a relational database (Codd 1970).

Here, each instance of $R_i$, $i = 1, \ldots, m$, can be seen as a basic relation of the database. Moreover, new relations can be derived from the basic relations or from previously derived relations. Figure 2 shows that users can operate on a database in two ways: accessing basic relations (tables) directly, or operating on the basic relations indirectly through derived relations. Two kinds of relations can be derived: views and snapshots. A view is a set of data satisfying a constraint, a collection of some attributes (columns), or a result obtained by a relational operation. It is dynamic and is also called a virtual relation because it does not store data, so accesses through a view are evaluated against the basic relations (tables) of the database according to the view definition. A snapshot, on the other hand, is a copy of the data in the database as of some period that does not vary over time; in other words, it is a static relation. For example, we can create a snapshot on the final day of every month for producing the monthly sales report.
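The difference between a view and a snapshot can be sketched with Python's standard `sqlite3` module. This is only an illustration under our own assumptions: the `sales` table, its columns, and the sample rows are hypothetical, and the snapshot is emulated with `CREATE TABLE ... AS SELECT`, which is one simple way to take a static copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount INTEGER)")  # basic relation
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2006-09-29", 100), ("2006-09-30", 250)],
)

# View: a virtual relation. It stores no rows and is evaluated against
# the basic relation each time it is queried.
conn.execute(
    "CREATE VIEW v_sept_sales AS "
    "SELECT sale_date, amount FROM sales WHERE sale_date LIKE '2006-09-%'"
)

# Snapshot (emulated): a static copy of the data at the moment of creation.
conn.execute(
    "CREATE TABLE snap_sept_sales AS "
    "SELECT sale_date, amount FROM sales WHERE sale_date LIKE '2006-09-%'"
)

# A later change to the basic relation...
conn.execute("INSERT INTO sales VALUES ('2006-09-30', 50)")

# ...is visible through the view but not in the snapshot.
view_total = conn.execute("SELECT SUM(amount) FROM v_sept_sales").fetchone()[0]
snapshot_total = conn.execute("SELECT SUM(amount) FROM snap_sept_sales").fetchone()[0]
print(view_total, snapshot_total)
```

The view reflects the row inserted after its definition, while the snapshot keeps the totals as they were when the copy was taken, which is the behavior wanted for a month-end report.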
As illustrated in Figure 2, users use the SQL (Structured Query Language) interface to access R1 directly, and they can also get the data in R1 through V3. Further, to obtain the data from R3, they can operate on V3 and V1, since V3 is defined
over V1 and V1 is derived from R3. Here R1, R2, and R3 denote basic relations, and V1, V2, and V3 denote derived relations, each of which could be a view or a snapshot.

Figure 2. Basic relations and derived relations

In the relational data model, each relation is normalized according to the concept of functional dependency, and entries of a given relation can cross-reference other entries of the same relation or entries of a different relation through the primary-foreign key relationship (Codd 1970). In other words, the primary key (a column or combination of columns) identifies the rows of a relation, so that it can be referenced from within the relation itself or by other relations. A foreign key, in contrast, holds values of the primary key of another relation and thereby connects its relation to that relation.

2.2 Multidimensional Data Model

Although data can be modeled relationally, it is more intuitive to think in terms of dimensions and facts when aggregating or summarizing data, such as the sales of a product in the month of September over all stores. Information about dimension values is maintained in dimension tables; usually, one dimension table is created per dimension. Information about the facts is organized in a fact table. Each row contains one fact, which is represented by references to the dimensions and by the measures (e.g. sales). Technically, each dimension table holds a primary key, which is also included in the fact table as a foreign key. The combination of all foreign keys forms the primary key of the fact table.

We now consider what schema expresses facts over multiple dimensions. Most commonly, the multidimensional data model is mapped onto a star schema, which consists of a fact table and several dimension tables. The fact table has measure attributes that record the facts and dimension attributes that form foreign keys to the dimension tables.
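A very small star schema can be sketched as follows, again with `sqlite3`. The table and column names (`time_dim`, `item_dim`, `sales_fact`, `dollars_sold`) and all rows are assumptions made for illustration, and only two dimensions are used to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),  -- foreign key to a dimension
    item_key INTEGER REFERENCES item_dim(item_key),  -- foreign key to a dimension
    dollars_sold INTEGER,                            -- measure
    PRIMARY KEY (time_key, item_key)                 -- combination of all foreign keys
);
INSERT INTO time_dim VALUES (1, 'August'), (2, 'September');
INSERT INTO item_dim VALUES (10, 'pen'), (20, 'ink');
INSERT INTO sales_fact VALUES (1, 10, 5), (2, 10, 7), (2, 20, 3);
""")

# Total sales in September over all items: join the fact table to one dimension.
september_sales = conn.execute(
    "SELECT SUM(f.dollars_sold) FROM sales_fact f "
    "JOIN time_dim t ON f.time_key = t.time_key "
    "WHERE t.month = 'September'"
).fetchone()[0]

# A fact that references a non-existent dimension row is rejected.
try:
    conn.execute("INSERT INTO sales_fact VALUES (99, 10, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(september_sales, rejected)
```

Each fact row is identified by its combination of dimension keys, so a second fact for the same month and item would violate the composite primary key.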
Picture the fact table in the middle, encircled by several dimension tables. The left side of Figure 3 gives a concrete example of a star schema from (Han & Kamber 2006).

Figure 3. A star schema and the hierarchy of dimension

As shown on the right side of Figure 3, the values of a dimension can be grouped in a hierarchical tree structure, so that the analyst can view data at different levels along the hierarchy of the time and
location dimension. Different aggregation functions can be applied to the lower-level data to obtain data at higher levels, along the direction of the arrows (see Figure 3). In business analysis, sum, count, min, max, and average are the commonly used aggregation functions. For instance, the aggregation function "sum" can be applied to the sales values within each month to obtain the total sales figure at the monthly level.

To reduce the number of dimension tables (that is, joins) during query processing, the dimension tables in a star schema are de-normalized. The alternative, in which each dimension table is normalized, is the snowflake schema. In the snowflake schema, the dimension tables are normalized by splitting their information across several tables. As with the star schema described above, picture a large fact table in the middle surrounded by dimension tables; now, however, each dimension table may in turn be surrounded by a number of smaller tables. As shown in Figure 4, the supplier information of the item dimension table is split off into a supplier dimension table, and the city information of the location dimension table into a city dimension table. That is, after normalization the attribute supplier_key is a foreign key in the item dimension table and the primary key in the supplier dimension table. The same change occurs between the location dimension table and the city dimension table.

Figure 4. A snowflake schema

2.3 Representation of Multidimensional Data Model

In addition to the n-tuples of the tables described in section 2.1, which are represented as arrays in the computer (Codd 1970), facts with n dimensions can be organized as an n-dimensional cube stored as an n-dimensional array (Gray, Chaudhuri, Bosworth et al. 1997) (see Figure 5).
All the dimensions together are assumed to uniquely determine the measure; in our example from (Han & Kamber 2006), a particular time, item, location and branch give a unique sale. Thus, time t, item i, location j and branch k identify a cell [t][i][j][k] that records the measures for that sale. Clearly, each non-empty cell in the cube corresponds to a row in the fact table. This brings a serious sparseness problem in the n-dimensional cube representation, because most combinations of dimension values have no associated measure (see Figure 5). For example, not all items are sold at all branches at all times. Several compressed variants such as the iceberg cube (Fang, Shivakumar, Garcia-Molina et al. 1998) are used to avoid storing large, mostly empty arrays.

Multidimensional data cube technology was influenced by the success of spreadsheet programs in business analysis. Nowadays, however, the model based on relational technology is mostly used, since it leverages the know-how and software that already exist in relational database systems.
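The sparseness problem, and the idea of keeping only aggregates above a threshold, can be illustrated with a short Python sketch. The cell values and the helper `iceberg_by_item` are hypothetical, and the thresholding shown is a simplified illustration in the spirit of iceberg queries, not the algorithm of the cited paper.

```python
from collections import defaultdict

# Sparse cube: only non-empty cells are stored, keyed by
# (time, item, location, branch). All values are made up.
cube = {
    ("Q1", "pen", "Tokyo", "B1"): 40,
    ("Q1", "ink", "Tokyo", "B1"): 5,
    ("Q2", "pen", "Osaka", "B2"): 60,
    ("Q2", "pen", "Tokyo", "B1"): 30,
}

# Size of the equivalent dense 4-D array: product of the dimension cardinalities.
dims = [{key[i] for key in cube} for i in range(4)]
dense_cells = 1
for values in dims:
    dense_cells *= len(values)


def iceberg_by_item(cells, min_total):
    """Sum sales per item and keep only items whose total reaches min_total."""
    totals = defaultdict(int)
    for (_time, item, _location, _branch), sales in cells.items():
        totals[item] += sales
    return {item: total for item, total in totals.items() if total >= min_total}


frequent = iceberg_by_item(cube, min_total=50)
print(len(cube), dense_cells, frequent)
```

Even in this tiny example the dense array would hold four times as many cells as there are facts; with realistic dimension sizes the ratio is far larger.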
Figure 5. Lattice of cuboids making up a 4-D data cube for the dimensions time, item, location, and branch. Each cuboid represents a different degree of summarization (Han & Kamber 2006)

3. DATA WAREHOUSING AND ONLINE ANALYTICAL PROCESSING

Data warehousing and online analytical processing have become increasingly important for comprehensive analysis of current and historical data, in order to extract key insights from the vast amounts of data being collected. The purpose of these systems is to provide users with fast analysis so that they can interactively analyze the data to understand business patterns.

Generally, a relational database as described in section 2.1 is designed mainly to maintain data for everyday operations. A bank database, for instance, is a typical example: it contains information about accounts and serves a network of ATMs every day. Such a database mainly supports transactions, which are operations that access and change (e.g. insert or update) the data in the database; this is called OnLine Transaction Processing (OLTP). An OLTP system uses the primary-foreign key relationship to relate tables to each other, and is usually created for a specific use, such as the bank example, order processing, ticket tracking, or personnel file systems.

However, as emphasis has shifted towards comprehensive analysis of current and historical data in order to understand customers' expectations and business patterns, data processing is required to summarize large amounts of low-level data (see Figure 3) and to relate different aspects of the business to find interesting correlations. Database access therefore needs to be based on complex queries, which is called OnLine Analytical Processing (OLAP) (Kimball & Strehlo 1995).
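The contrast between the two access patterns can be sketched with the bank example. The `account` table, its rows, and the `transfer` helper are hypothetical; the sketch only shows a short transaction that changes a few rows next to a query that summarizes many rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (acct_id INTEGER PRIMARY KEY, branch TEXT, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(1, "north", 500), (2, "north", 300), (3, "south", 900)],
)
conn.commit()


def transfer(conn, src, dst, amount):
    """OLTP-style transaction: updates two rows and commits or rolls back as a unit."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE acct_id = ?", (amount, src)
        )
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE acct_id = ?", (amount, dst)
        )


transfer(conn, 1, 3, 200)

# OLAP-style query: reads many rows and summarizes them along a dimension.
balance_by_branch = dict(
    conn.execute("SELECT branch, SUM(balance) FROM account GROUP BY branch")
)
print(balance_by_branch)
```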
To process complex queries efficiently when analyzing vast amounts of data along multiple dimensions, the data has to be collected in advance of the queries. That is, data needs to be extracted from many sources and collected either in a database holding information about subjects spanning the entire organization, called a data warehouse, or in multiple small databases each holding a subset of the corporation-wide data (e.g. marketing data), called data marts. Figure 6 shows a multi-tier data warehouse architecture (McFadden 1996; Han & Kamber 2006).

Usually, the data stored in a data warehouse (data mart) is copied from multiple OLTP databases so as to keep the history of the data (as sets of snapshots). To obtain the data while allowing the source systems to continue working normally, it is necessary to watch out for redundant, missing, or heterogeneous data. In data warehousing systems, a variety of data extraction and cleaning tools, and load and refresh utilities, are used to populate warehouses during the extracting, transforming, and loading process (Inmon 2002; Kimball & Ross 2002). Data extraction from "foreign" sources is usually implemented via gateways and standard interfaces.
Figure 6. Data warehouse: a multi-tier architecture

Not surprisingly, there is a high probability of errors and anomalies in the data, since large volumes of data from multiple sources are involved. Data cleaning comprises several tasks: filling in missing entries; identifying outliers and smoothing out noisy data (e.g. incorrect attribute values caused by random error or variance in a measured variable); correcting inconsistent data (e.g. inconsistent value assignments, field lengths, or descriptions); and resolving redundancy caused by data integration (e.g. schema integration: A.cust-id = B.cust-#). Discrepancies in the data are usually detected by checking for field overloading, checking the uniqueness rule, consecutive rule and null rule, using metadata such as domain, range, or dependency, and applying tools such as the following:

- Data scrubbing tools use simple domain knowledge (e.g. postal addresses, spell-check) to detect and correct errors in the data. Parsing and fuzzy matching techniques are often exploited to scrub data from multiple sources.
- Data auditing tools scan the data to discover rules and relationships and to detect violators, for example by analyzing correlation and clustering to find outliers. Such tools may thus be considered variants of data mining tools; for instance, a tool may discover, based on statistical analysis, the suspicious pattern that a certain car dealer has never received any complaints.
- Data migration tools and ETL (extraction/transformation/loading) tools handle the migration and integration of data. Data migration tools allow simple transformation rules to be specified, for example, replace the string "gender" by "sex", while ETL tools provide a graphical user interface for specifying transformations.

After the data has been extracted, cleaned and transformed, batch load utilities are typically used to populate the warehouse.
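Some of the cleaning tasks described above can be sketched in a few lines of Python. The records, the unified key name `cust_id`, and the choice of the median for filling missing ages are all assumptions made for illustration; real cleaning tools apply far richer rules.

```python
from statistics import median

# Hypothetical customer records from two sources that name the key differently.
source_a = [
    {"cust-id": 1, "gender": "F", "age": 34},
    {"cust-id": 2, "gender": "M", "age": None},
]
source_b = [
    {"cust-#": 2, "gender": "M", "age": 41},
    {"cust-#": 3, "gender": "F", "age": 29},
    {"cust-#": 4, "gender": "F", "age": None},
]


def unify(record):
    """Schema integration (A.cust-id = B.cust-#) plus the rename of 'gender' to 'sex'."""
    cust_id = record["cust-id"] if "cust-id" in record else record["cust-#"]
    return {"cust_id": cust_id, "sex": record["gender"], "age": record["age"]}


def clean(records):
    # Resolve redundancy: keep one record per customer, preferring a known age.
    merged = {}
    for rec in (unify(r) for r in records):
        kept = merged.get(rec["cust_id"])
        if kept is None or kept["age"] is None:
            merged[rec["cust_id"]] = rec
    # Fill in the remaining missing entries with the median of the known ages.
    known = [r["age"] for r in merged.values() if r["age"] is not None]
    fill = median(known)
    for rec in merged.values():
        if rec["age"] is None:
            rec["age"] = fill
    return sorted(merged.values(), key=lambda r: r["cust_id"])


cleaned = clean(source_a + source_b)
for row in cleaned:
    print(row)
```

Customer 2 appears in both sources and is merged into one record with the known age, while customer 4 has no age in any source and receives the median of the known ages.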
Loading requires several processes: checking integrity constraints; sorting; creating the derived tables stored in the warehouse by summarization, aggregation and other computation; building indices and other access paths; and partitioning data across multiple target storage areas. Furthermore, a load utility must allow the system administrator to monitor status and to cancel, suspend and resume a load. With periodic checkpoints, if a failure occurs during the load, the loading process can be restarted from the last checkpoint. In practice, pipelined and partitioned parallelism are typically exploited to keep loads from taking a very long time; sequential loading of a terabyte of data, for example, may take weeks or months. However, even with parallelism, the loading process may still take too long. Therefore, incremental loading can be used during refresh to reduce the volume of data that has to be incorporated into the warehouse; in incremental loading, only the updated tuples are
inserted. Refreshing a warehouse consists of propagating updates on the source data to the corresponding data stored in the warehouse. Usually, the warehouse is refreshed periodically (e.g., daily or weekly). The refresh policy is set by the warehouse administrator depending on user needs, traffic, and so on. Most contemporary database systems provide replication servers that support incremental techniques for propagating updates from a primary database to one or more replicas. Such replication servers can be used to incrementally refresh a warehouse when the sources change.

The data stored in a data warehouse is modeled with the multidimensional data model described in section 2.2. How to compute and organize the data cube is an important issue in OLAP. The multidimensional data can be stored and organized in different ways. In the OLAP engine tier shown in Figure 6, there are two contrasting approaches, called relational OLAP (ROLAP) and multidimensional OLAP (MOLAP). In a ROLAP system, the data is stored in relational tables, commonly mapped onto a star schema or a snowflake schema, and the analytical engine is built on top of a relational database system, accessing the multidimensional data in the tables through the standard SQL interface. In a MOLAP system, on the other hand, the data is stored in a specialized form such as the multidimensional arrays described in section 2.3. Since ROLAP uses well-developed relational database technology (e.g. query processing and optimization), it can coexist with other data sources based on relational database technology and does not need any specialized storage mechanism, whereas MOLAP computes and organizes the data cube in an n-dimensional array, which can lead to fast multidimensional analysis.
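The two storage approaches can be contrasted on the same small set of facts. In this sketch the facts, the table name `sales_fact`, and the two-dimensional array layout are assumptions for illustration; it shows the same monthly total computed once through SQL over a relational table and once by summing along an axis of an array.

```python
import sqlite3

facts = [  # (month, item, sales); values are made up
    ("Aug", "pen", 5),
    ("Sep", "pen", 7),
    ("Sep", "ink", 3),
]

# ROLAP style: facts stay in a relational table and are aggregated through SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (month TEXT, item TEXT, sales INTEGER)")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", facts)
rolap_by_month = dict(
    conn.execute("SELECT month, SUM(sales) FROM sales_fact GROUP BY month")
)

# MOLAP style: facts are placed in an array indexed by dimension position,
# and aggregation is a sum along one axis of the array.
months = ["Aug", "Sep"]
items = ["pen", "ink"]
cube = [[0 for _ in items] for _ in months]
for month, item, sales in facts:
    cube[months.index(month)][items.index(item)] = sales
molap_by_month = {month: sum(cube[m]) for m, month in enumerate(months)}

print(rolap_by_month, molap_by_month)
```

Both computations give the same totals; the difference lies in where the data lives and how a cell is addressed, by key lookup in a table or by position in an array.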
The benefits of both can be combined in hybrid OLAP (HOLAP). For example, Microsoft SQL Server 2000 supports a HOLAP server, which can store large volumes of detail data in a relational database while aggregations are kept in a separate MOLAP store. According to recent reports from vendors (Ault 2003; Oracle Presentation 2005), there are two main moves among them: building a specialized multidimensional engine, and attempting to push OLAP functionality into relational databases.

At the front end of the data warehouse architecture illustrated in Figure 6, users use front-end tools to make complex queries, modify information in a report, switch between aggregated and detail data, select part of the data, and so forth through OLAP operations. They explore the multidimensional data cube by moving up the dimension hierarchy (roll up), moving down (drill down), restricting to a dimension value (slice), selecting an aggregated sub-space (dice), and cross tabulation (pivot).

Like online analytical processing, data mining is one of the most important approaches to multidimensional analysis in data warehouses. Data mining is the extraction of interesting (nontrivial, implicit, previously unknown, and potentially useful) information or patterns from data in large databases (Fayyad, Piatetsky-Shapiro, Smyth, & Uthurusamy 1996; Han & Kamber 2006); it tries to generate hypotheses by uncovering hidden patterns. Motivated by the popularity of OLAP technology, Han developed an online analytical mining (OLAM) mechanism to integrate OLAP with multidimensional data mining (Han 1997; Han & Kamber 2006). OLAM provides facilities for data mining on different subsets of data and at different levels of abstraction by drilling, pivoting, filtering, dicing, and slicing on a data cube. Together with visualization tools, this can greatly enhance the power and flexibility of exploratory data mining (Aggarwal 2002).

4. CONCLUSION

Since the advent of information technology, businesses have been collecting vast amounts of data about their daily transactions, refining the systems that produce transaction data, making data from multiple sources available in warehouses, selecting and implementing analytic tools, and assembling the hardware and communication environment. From the data perspective, we discussed the database technologies associated with the exploitation of analytic techniques:
multidimensional data modeling, data warehousing and online analytical processing, which are indispensable technological requirements for being an analytics competitor. The purpose of these systems is to provide users with fast analysis so that they can interactively analyze the data to understand business patterns such as customer behavior, product movement, employee performance, and financial reactions. Building such a system poses many challenges, including data modeling, schema design, loading, maintenance, query processing and so on. For ease of use, simpler deployment, and optimal value, a trend has appeared in which data collection, storage, processing, and other issues specific to analytics are incorporated into the overall system design.

REFERENCES

Aggarwal, Charu C. (2002) Towards Effective and Interpretable Data Mining by Visual Interaction, SIGKDD Explorations, Vol. 3, Issue 2, pp. 11-22
Alter, Steven (2002) Information Systems: The Foundation of E-Business, Fourth Edition, Prentice Hall, pp. 3-35
Ault, Mike (2003) Oracle Data Warehouse Management: Secrets of Oracle Data Warehousing, Rampant TechPress
Codd, E. F. (1970) A Relational Model of Data for Large Shared Data Banks, Communications of the ACM, Vol. 13, No. 6, June
Crainer, Stuart (2000) The Management Century: A Critical Review of 20th Century Thought & Practice, Booz Allen & Hamilton Inc., Japanese translation, pp. 240-296
Davenport, Thomas H. (2006) Competing on Analytics, Harvard Business Review, Jan.
Fang, M., Shivakumar, N., Garcia-Molina, H., Motwani, R., and Ullman, J. D. (1998) Computing Iceberg Queries Efficiently, Proceedings of Very Large Data Bases, pp. 299-310, New York, Aug.
Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R. (1996) Advances in Knowledge Discovery and Data Mining, Menlo Park, CA: AAAI Press
Giudici, P. (2003) Applied Data Mining: Statistical Methods for Business and Industry, England, Wiley & Sons
Gray, J., Chaudhuri, S., Bosworth, A., Layman, A., Reichart, D., Venkatrao, M., Pellow, F., and Pirahesh, H. (1997) Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab and Sub-Total, Data Mining and Knowledge Discovery, No. 1, pp. 29-54
Hammer, Michael (2001) The Agenda: What Every Business Must Do to Dominate the Decade, Three Rivers Press
Han, J. W. and Kamber, M. (2006) Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers
Han, J. (1997) OLAP Mining: An Integration of OLAP with Data Mining, Proceedings of the 1997 IFIP Conference on Data Semantics, Oct.
IDG Japan (2004) Business Innovation Powered by Oracle E-Business Suite, ISBN
Inmon, W. H. (2002) Building the Data Warehouse (3rd Ed.), New York, Wiley & Sons
Kimball, R. and Ross, M. (2002) The Data Warehouse Toolkit (2nd Ed.), New York, Wiley & Sons
Kimball, R. and Strehlo, K. (1995) Why Decision Support Fails and How to Fix It, SIGMOD Record, 24(3), pp. 92-97
Knightsbridge (2005) Top 10 Trends in Business Intelligence and Data Warehousing for 2005, White Paper, Knightsbridge Solutions LLC, Jan.
McFadden, Fred R. (1996) Data Warehouse for EIS: Some Issues and Impacts, Proceedings of the Hawaii International Conference on Systems Sciences
Oracle Presentation (2005) Oracle Database 10g Release 2: The Exploitation of Data Warehouse, Oracle Corporation
More informationCS614 - Data Warehousing - Midterm Papers Solved MCQ(S) (1 TO 22 Lectures)
CS614- Data Warehousing Solved MCQ(S) From Midterm Papers (1 TO 22 Lectures) BY Arslan Arshad Nov 21,2016 BS110401050 BS110401050@vu.edu.pk Arslan.arshad01@gmail.com AKMP01 CS614 - Data Warehousing - Midterm
More informationAn Overview of Data Warehousing and OLAP Technology
An Overview of Data Warehousing and OLAP Technology CMPT 843 Karanjit Singh Tiwana 1 Intro and Architecture 2 What is Data Warehouse? Subject-oriented, integrated, time varying, non-volatile collection
More informationInformation Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 07 : 06/11/2012 Data Mining: Concepts and Techniques (3 rd ed.) Chapter
More informationData Warehousing. Ritham Vashisht, Sukhdeep Kaur and Shobti Saini
Advance in Electronic and Electric Engineering. ISSN 2231-1297, Volume 3, Number 6 (2013), pp. 669-674 Research India Publications http://www.ripublication.com/aeee.htm Data Warehousing Ritham Vashisht,
More informationWhat is a Data Warehouse?
What is a Data Warehouse? COMP 465 Data Mining Data Warehousing Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Defined in many different ways,
More informationData warehouses Decision support The multidimensional model OLAP queries
Data warehouses Decision support The multidimensional model OLAP queries Traditional DBMSs are used by organizations for maintaining data to record day to day operations On-line Transaction Processing
More informationCHAPTER 3 Implementation of Data warehouse in Data Mining
CHAPTER 3 Implementation of Data warehouse in Data Mining 3.1 Introduction to Data Warehousing A data warehouse is storage of convenient, consistent, complete and consolidated data, which is collected
More informationA Multi-Dimensional Data Model
A Multi-Dimensional Data Model A Data Warehouse is based on a Multidimensional data model which views data in the form of a data cube A data cube, such as sales, allows data to be modeled and viewed in
More informationRocky Mountain Technology Ventures
Rocky Mountain Technology Ventures Comparing and Contrasting Online Analytical Processing (OLAP) and Online Transactional Processing (OLTP) Architectures 3/19/2006 Introduction One of the most important
More informationDecision Support. Chapter 25. CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 1
Decision Support Chapter 25 CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 1 Introduction Increasingly, organizations are analyzing current and historical data to identify useful patterns and support