Data Warehousing

Andrew Kusiak
2139 Seamans Center
Iowa City, IA 52242-1527
andrew-kusiak@uiowa.edu
http://www.icaen.uiowa.edu/~ankusiak
Tel. 319-335 5934

Outline
Introduction
Data warehousing concepts
Relationship to data mining
Applications

Definition (1/3)
Data warehouse = An enabled database designed to support a large volume of data at a high level of performance, usability, and manageability.

Definition (2/3)
A data warehouse is a copy of transaction data specifically structured for querying and reporting.
The form of the stored data has nothing to do with whether something is a data warehouse.
Data warehousing is not necessarily for the needs of "decision makers" or used in the process of decision making.
Ralph Kimball

Definition (3/3)
Easy data access
Quick data access
Low cost data access
Accurate data access

Data Warehousing (1/2)
Data warehousing systems, for the most part, store historical data that have been generated in internal transaction processing systems. This is a small part of the universe of data available to manage a business. Sometimes this part has limited value.
Data warehousing systems can complicate business processes.
Data warehousing can have a learning curve that may be too long for impatient firms.
DW Architecture
[Figure: data warehouse architecture. Recoverable labels: Trans DB, Legacy Applications, Integration/Transformation Layer, Data Marts, Exploration.]

Automotive Analogy
[Figure: automotive analogy. Recoverable labels: Trans DB, Data Mart; Individual parts; Subassemblies, e.g., gear box; Car.]

Data Warehousing (2/2)
Data warehousing can become an exercise in data for the sake of the data.
Data warehousing systems can require a great deal of "maintenance", which many organizations cannot or will not support.
Sometimes the cost to capture data, clean it up, and deliver it in a format and time frame that is useful for the end users is too high to bear.
http://www.dwinfocenter.org/

Successful (1/3)
From day one, establish that warehousing is a joint user/builder project.
Establish that maintaining data quality will be an ongoing joint user/builder responsibility.
Train the users one step at a time.
Train the users about the data stored in the data warehouse.
Consider doing a high level corporate data model / data warehouse architecture "exercise" in three weeks.
Implement a user accessible automated directory to information stored in the warehouse.

Successful (2/3)
Once you know what raw data you want to feed into the data warehouse, request that data.
Determine a plan to test the integrity of the data in the warehouse.
From the start, get warehouse users in the habit of testing complex queries.
Coordinate system roll-out with network administration personnel.
Have a good grasp of desktop databases and spreadsheets.

Successful (3/3)
Be prepared to support beginning users immediately and at any time.
Maintain the audit trail to the feeder systems.
Market and sell your data warehousing systems.
Decision Support System
A decision support system or tool is one specifically designed to allow business end users to perform computer generated analyses of data on their own.

Data Warehouse
Designed for performing analytical tasks using a variety of data.
Supports a relatively small number of users with relatively long interaction loads.
Its usage is read-intensive.
Its content is periodically updated, mostly through additions.
It contains relatively few large tables.
Each query normally produces a large result set.

Current detail data = Data acquired directly from the transactional database, frequently representing an entire application (e.g., enterprise).
Old detail data = Previously stored current detail data, allowing for analysis of trends.
Data mart = An implementation of the data warehouse with a limited scope of data. A data warehouse may be a collection of gradually constructed data marts.
Summarized data = Data aggregated for executive reporting, trend analysis, and decision-making.
Drill-down = A capability of performing data analysis in a top-down fashion. The summary data can be decomposed into current and old detail data.
Metadata = (Data about data) A description of all data items, their location, sources, structure, content, end-user views, and so on.

Tabular form reporting.
Information mapping, e.g., mapping spatial data.
Complex queries and sophisticated criteria search.
Ranking.
Multivariable analysis.
Time series analysis.
Data visualization, graphing, charting, and pivoting.
Complex textual search.
Advanced statistical analysis.
Trend analysis.
Pattern and associations discovery.
OLAP tools
Data mining tools
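The relationship between summarized data and drill-down can be illustrated with a minimal sketch. The records, the names `DETAIL`, `summarize`, and `drill_down`, and the choice of market as the summary level are all hypothetical and are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical detail records (market, quarter, units); not from the slides.
DETAIL = [
    ("Chicago", "Q1", 100),
    ("Chicago", "Q2", 150),
    ("Atlanta", "Q1", 120),
    ("Atlanta", "Q2", 80),
]


def summarize(rows):
    """Aggregate detail rows into summarized data: total units per market."""
    totals = defaultdict(int)
    for market, _quarter, units in rows:
        totals[market] += units
    return dict(totals)


def drill_down(rows, market):
    """Decompose one summary figure into the detail rows behind it."""
    return [row for row in rows if row[0] == market]


summary = summarize(DETAIL)
chicago_detail = drill_down(DETAIL, "Chicago")
print(summary)
print(chicago_detail)
```

The detail rows returned by the drill-down add up to the corresponding summary figure, which is what lets an analyst move top-down from an aggregate to the data behind it.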
OLAP
OLTP (On-line transaction processing) = Data processing in traditional databases that are also called transactional databases.
OLAP (On-line analytical processing) = Data analysis for maximum data usability.

Data Mining
Data mining = In-depth processing of data leading to the discovery of non-obvious relationships.

Data Warehousing
Quick location of the right information.
Presentation of information in the needed form.
Testing of hypotheses.
Knowledge discovery.
Sharing the analysis results.

Data Warehousing
Improved product inventory turnover.
Improved selection of targeted markets reduces the product introduction cost.
More effective decision-making.
More effective business intelligence.
Enhanced asset and liability management due to the big picture view provided by the data warehouse.

Data Warehousing
Improved productivity due to the single source of information.
Reduced redundancy in information processing.
Enhanced customer relationship.
Enabler of business process reengineering and breakthrough idea generation by providing useful insights into the processes.

Relational Table
Product   Market    Time   No. of Units
P1        Chicago   Q1     1000
P2        Chicago   Q2     1200
P3        Chicago   Q3     1500
P4        Chicago   Q4     2000
P5        Atlanta   Q1     1400
P6        Atlanta   Q2     1600
P7        Atlanta   Q3     1100
P8        Atlanta   Q4     1900
P9        Paris     Q1     1300
P10       Paris     Q2     1000
P11       Paris     Q3     1900
P12       Paris     Q3     1400
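As a rough sketch of the kind of analysis OLAP performs on such a table, the rows can be rolled up along one dimension at a time. The rows below are copied from the relational table as printed on the slide; the column name "time" for the quarter, and the names `ROWS` and `roll_up`, are assumptions made for illustration.

```python
from collections import defaultdict

# Rows copied from the relational table: (product, market, time, units).
ROWS = [
    ("P1", "Chicago", "Q1", 1000), ("P2", "Chicago", "Q2", 1200),
    ("P3", "Chicago", "Q3", 1500), ("P4", "Chicago", "Q4", 2000),
    ("P5", "Atlanta", "Q1", 1400), ("P6", "Atlanta", "Q2", 1600),
    ("P7", "Atlanta", "Q3", 1100), ("P8", "Atlanta", "Q4", 1900),
    ("P9", "Paris", "Q1", 1300), ("P10", "Paris", "Q2", 1000),
    ("P11", "Paris", "Q3", 1900), ("P12", "Paris", "Q3", 1400),
]

COLUMNS = ("product", "market", "time")


def roll_up(rows, column):
    """Total the units for each distinct value of one column."""
    index = COLUMNS.index(column)
    totals = defaultdict(int)
    for row in rows:
        totals[row[index]] += row[-1]
    return dict(totals)


by_market = roll_up(ROWS, "market")
by_time = roll_up(ROWS, "time")
print(by_market)
print(by_time)
```

Each roll-up answers a different question from the same rows (units per market, units per quarter), which is the flexibility a transactional layout alone does not offer directly.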
Data Cube
[Figure: data cube. Recoverable labels: axes Products (P1), Markets (Paris, Atlanta, Chicago), and Time (Q1, Q2, Q3, Q4); visible cell values 1000, 1200, 1500, 2000.]
Cuboids [1D cuboid, 2D cuboid, 3D cuboid, etc.]
[3D cuboid = data cube]

Lattice of cuboids
All: 0D (apex) cuboid
Time; Market; Product: 1D cuboids
Time, Market; Time, Product; Market, Product: 2D cuboids
Time, Market, Product: 3D cuboid
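The lattice of cuboids can be sketched as every subset of the dimensions, each aggregated as a group-by. This is an illustrative sketch only: the fact rows are hypothetical, the dimension names follow the cube's three axes, and the names `lattice` and `cuboid` are not from the slides.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("time", "product", "market")

# Hypothetical fact rows: (time, product, market, units).
FACTS = [
    ("Q1", "P1", "Chicago", 10),
    ("Q1", "P2", "Chicago", 20),
    ("Q2", "P1", "Atlanta", 30),
    ("Q2", "P1", "Chicago", 40),
]


def lattice(dimensions):
    """List every cuboid (subset of dimensions), from the 0D apex to the full cube."""
    return [dims
            for k in range(len(dimensions) + 1)
            for dims in combinations(dimensions, k)]


def cuboid(facts, dims):
    """Aggregate units over the chosen dimensions (a group-by on dims)."""
    indexes = [DIMENSIONS.index(d) for d in dims]
    cells = defaultdict(int)
    for row in facts:
        cells[tuple(row[i] for i in indexes)] += row[-1]
    return dict(cells)


cuboids = lattice(DIMENSIONS)
for dims in cuboids:
    print(f"{len(dims)}D", dims or ("all",), cuboid(FACTS, dims))
```

With three dimensions the lattice has 2^3 = 8 cuboids: one apex, three 1D, three 2D, and one 3D cuboid, which is the data cube itself.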