ETL and OLAP Systems
|
|
- Phebe Nash
- 6 years ago
- Views:
Transcription
1 ETL and OLAP Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first semester Academic year 2017/18 (winter course) 1 / 40
2 Review of the Previous Lecture Mining of massive datasets. Evolution of database systems. Dimensional modeling: Three goals of the logical design of data warehouse: simplicity, expressiveness and performance. The most popular conceptual schema: star schema. Designing data warehouses is not an easy task... 2 / 40
3 Outline 1 Motivation 2 ETL 3 OLAP Systems 4 Summary 3 / 40
4 Outline 1 Motivation 2 ETL 3 OLAP Systems 4 Summary 4 / 40
5 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. 5 / 40
6 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. Transferring data to data warehouse: 5 / 40
7 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. Transferring data to data warehouse: Data warehouses combine data from multiple sources. 5 / 40
8 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. Transferring data to data warehouse: Data warehouses combine data from multiple sources. Data must be translated into a consistent format. 5 / 40
9 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. Transferring data to data warehouse: Data warehouses combine data from multiple sources. Data must be translated into a consistent format. Data integration represents 80% of effort for a typical data warehouse project! 5 / 40
10 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. Transferring data to data warehouse: Data warehouses combine data from multiple sources. Data must be translated into a consistent format. Data integration represents 80% of effort for a typical data warehouse project! Optimization of data warehouse: 5 / 40
11 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. Transferring data to data warehouse: Data warehouses combine data from multiple sources. Data must be translated into a consistent format. Data integration represents 80% of effort for a typical data warehouse project! Optimization of data warehouse: Data storage: relational or multi-dimensional. 5 / 40
12 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. Transferring data to data warehouse: Data warehouses combine data from multiple sources. Data must be translated into a consistent format. Data integration represents 80% of effort for a typical data warehouse project! Optimization of data warehouse: Data storage: relational or multi-dimensional. Additional data structures: sorting, indexing, summarizing, cubes. 5 / 40
13 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. Transferring data to data warehouse: Data warehouses combine data from multiple sources. Data must be translated into a consistent format. Data integration represents 80% of effort for a typical data warehouse project! Optimization of data warehouse: Data storage: relational or multi-dimensional. Additional data structures: sorting, indexing, summarizing, cubes. Refreshing of data structures. 5 / 40
14 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. Transferring data to data warehouse: Data warehouses combine data from multiple sources. Data must be translated into a consistent format. Data integration represents 80% of effort for a typical data warehouse project! Optimization of data warehouse: Data storage: relational or multi-dimensional. Additional data structures: sorting, indexing, summarizing, cubes. Refreshing of data structures. Querying multidimensional data: 5 / 40
15 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. Transferring data to data warehouse: Data warehouses combine data from multiple sources. Data must be translated into a consistent format. Data integration represents 80% of effort for a typical data warehouse project! Optimization of data warehouse: Data storage: relational or multi-dimensional. Additional data structures: sorting, indexing, summarizing, cubes. Refreshing of data structures. Querying multidimensional data: SQL extensions, 5 / 40
16 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. Transferring data to data warehouse: Data warehouses combine data from multiple sources. Data must be translated into a consistent format. Data integration represents 80% of effort for a typical data warehouse project! Optimization of data warehouse: Data storage: relational or multi-dimensional. Additional data structures: sorting, indexing, summarizing, cubes. Refreshing of data structures. Querying multidimensional data: SQL extensions, Multidimensional expressions (MDX), 5 / 40
17 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. Transferring data to data warehouse: Data warehouses combine data from multiple sources. Data must be translated into a consistent format. Data integration represents 80% of effort for a typical data warehouse project! Optimization of data warehouse: Data storage: relational or multi-dimensional. Additional data structures: sorting, indexing, summarizing, cubes. Refreshing of data structures. Querying multidimensional data: SQL extensions, Multidimensional expressions (MDX), Map-reduce-based language. 5 / 40
18 Outline 1 Motivation 2 ETL 3 OLAP Systems 4 Summary 6 / 40
19 ETL ETL = Extraction, Transformation, and Load Extraction of data from source systems, Transformation and integration of data into a useful format for analysis, Load of data into the warehouse and build of additional structures. Refreshment of data warehouse is closely related to ETL process. The ETL process is described by metadata stored in data warehouse. Architecture of data warehousing: Data sources Data staging area Data warehouse 7 / 40
20 ETL 8 / 40
21 Tools for ETL Data extraction from heterogeneous data sources. Data transformation, integration, and cleansing. Data quality analysis and control. Data loading. High-speed data transfer. Data refreshment. Managing and analyzing metadata. Examples of ETL tools: MS SQL Server Integration Services(SSIS), IBM Infosphere DataStage, SAS ETL Studio, Oracle Warehouse Builder, Oracle Data Integrator, Business Objects Data Integrator, Pentaho Data Integration. 9 / 40
22 Tools for ETL MS SQL Server Integration Services(SSIS) 10 / 40
23 Tools for ETL MS SQL Server Integration Services(SSIS) 11 / 40
24 Data extraction Data warehouse needs extraction of data from different external data sources: 12 / 40
25 Data extraction Data warehouse needs extraction of data from different external data sources: operational databases (relational, hierarchical, network, itp.), 12 / 40
26 Data extraction Data warehouse needs extraction of data from different external data sources: operational databases (relational, hierarchical, network, itp.), files of standard applications (Excel, COBOL applications), 12 / 40
27 Data extraction Data warehouse needs extraction of data from different external data sources: operational databases (relational, hierarchical, network, itp.), files of standard applications (Excel, COBOL applications), additional databases (direct marketing databases) and data services (stock data), 12 / 40
28 Data extraction Data warehouse needs extraction of data from different external data sources: operational databases (relational, hierarchical, network, itp.), files of standard applications (Excel, COBOL applications), additional databases (direct marketing databases) and data services (stock data), various log files, 12 / 40
29 Data extraction Data warehouse needs extraction of data from different external data sources: operational databases (relational, hierarchical, network, itp.), files of standard applications (Excel, COBOL applications), additional databases (direct marketing databases) and data services (stock data), various log files, and other documents (.txt,.doc, XML, WWW). 12 / 40
30 Data extraction Data warehouse needs extraction of data from different external data sources: operational databases (relational, hierarchical, network, itp.), files of standard applications (Excel, COBOL applications), additional databases (direct marketing databases) and data services (stock data), various log files, and other documents (.txt,.doc, XML, WWW). Access to data sources can be difficult: 12 / 40
31 Data extraction Data warehouse needs extraction of data from different external data sources: operational databases (relational, hierarchical, network, itp.), files of standard applications (Excel, COBOL applications), additional databases (direct marketing databases) and data services (stock data), various log files, and other documents (.txt,.doc, XML, WWW). Access to data sources can be difficult: Data sources are often operational systems, providing the lowest level of data. 12 / 40
32 Data extraction Data warehouse needs extraction of data from different external data sources: operational databases (relational, hierarchical, network, itp.), files of standard applications (Excel, COBOL applications), additional databases (direct marketing databases) and data services (stock data), various log files, and other documents (.txt,.doc, XML, WWW). Access to data sources can be difficult: Data sources are often operational systems, providing the lowest level of data. Data sources are designed for operational use, not for decision support, and the data reflect this fact. 12 / 40
33 Data extraction Data warehouse needs extraction of data from different external data sources: operational databases (relational, hierarchical, network, itp.), files of standard applications (Excel, COBOL applications), additional databases (direct marketing databases) and data services (stock data), various log files, and other documents (.txt,.doc, XML, WWW). Access to data sources can be difficult: Data sources are often operational systems, providing the lowest level of data. Data sources are designed for operational use, not for decision support, and the data reflect this fact. Multiple data sources are often from different systems, run on a wide range of hardware and much of the software is built in-house or highly customized. 12 / 40
34 Data extraction Data warehouse needs extraction of data from different external data sources: operational databases (relational, hierarchical, network, itp.), files of standard applications (Excel, COBOL applications), additional databases (direct marketing databases) and data services (stock data), various log files, and other documents (.txt,.doc, XML, WWW). Access to data sources can be difficult: Data sources are often operational systems, providing the lowest level of data. Data sources are designed for operational use, not for decision support, and the data reflect this fact. Multiple data sources are often from different systems, run on a wide range of hardware and much of the software is built in-house or highly customized. Data sources can be designed using different logical structures. 12 / 40
35 Data extraction Identification of concepts and objects does not have to be easy. 13 / 40
36 Data extraction Identification of concepts and objects does not have to be easy. Example: Extract information about sales from the source system. 13 / 40
37 Data extraction Identification of concepts and objects does not have to be easy. Example: Extract information about sales from the source system. What is meant by the term sale? A sale has occurred when 13 / 40
38 Data extraction Identification of concepts and objects does not have to be easy. Example: Extract information about sales from the source system. What is meant by the term sale? A sale has occurred when 1 the order has been received by a customer, 13 / 40
39 Data extraction Identification of concepts and objects does not have to be easy. Example: Extract information about sales from the source system. What is meant by the term sale? A sale has occurred when 1 the order has been received by a customer, 2 the order is sent to the customer, 13 / 40
40 Data extraction Identification of concepts and objects does not have to be easy. Example: Extract information about sales from the source system. What is meant by the term sale? A sale has occurred when 1 the order has been received by a customer, 2 the order is sent to the customer, 3 the invoice has been raised against the order. 13 / 40
41 Data extraction Identification of concepts and objects does not have to be easy. Example: Extract information about sales from the source system. What is meant by the term sale? A sale has occurred when 1 the order has been received by a customer, 2 the order is sent to the customer, 3 the invoice has been raised against the order. It is a common problem that there is no table SALES in the operational databases; some other tables can exist like ORDER with an attribute ORDER STATUS. 13 / 40
42 Conflicts and dirty data Different logical models of operational sources, 14 / 40
43 Conflicts and dirty data Different logical models of operational sources, Different data types (account number stored as String or Numeric), 14 / 40
44 Conflicts and dirty data Different logical models of operational sources, Different data types (account number stored as String or Numeric), Different data domains (gender: M, F, male, female, 1, 0), 14 / 40
45 Conflicts and dirty data Different logical models of operational sources, Different data types (account number stored as String or Numeric), Different data domains (gender: M, F, male, female, 1, 0), Different date formats (dd-mm-yyyy or mm-dd-yyyy), 14 / 40
46 Conflicts and dirty data Different logical models of operational sources, Different data types (account number stored as String or Numeric), Different data domains (gender: M, F, male, female, 1, 0), Different date formats (dd-mm-yyyy or mm-dd-yyyy), Different field lengths (address stored by using 20 or 50 chars), 14 / 40
47 Conflicts and dirty data Different logical models of operational sources, Different data types (account number stored as String or Numeric), Different data domains (gender: M, F, male, female, 1, 0), Different date formats (dd-mm-yyyy or mm-dd-yyyy), Different field lengths (address stored by using 20 or 50 chars), Different naming conventions: homonyms and synonyms, 14 / 40
48 Conflicts and dirty data Different logical models of operational sources, Different data types (account number stored as String or Numeric), Different data domains (gender: M, F, male, female, 1, 0), Different date formats (dd-mm-yyyy or mm-dd-yyyy), Different field lengths (address stored by using 20 or 50 chars), Different naming conventions: homonyms and synonyms, Inconsistent information concerning the same object, 14 / 40
49 Conflicts and dirty data Different logical models of operational sources, Different data types (account number stored as String or Numeric), Different data domains (gender: M, F, male, female, 1, 0), Different date formats (dd-mm-yyyy or mm-dd-yyyy), Different field lengths (address stored by using 20 or 50 chars), Different naming conventions: homonyms and synonyms, Inconsistent information concerning the same object, Information concerning the same object, but indicated by different keys, 14 / 40
50 Conflicts and dirty data Different logical models of operational sources, Different data types (account number stored as String or Numeric), Different data domains (gender: M, F, male, female, 1, 0), Different date formats (dd-mm-yyyy or mm-dd-yyyy), Different field lengths (address stored by using 20 or 50 chars), Different naming conventions: homonyms and synonyms, Inconsistent information concerning the same object, Information concerning the same object, but indicated by different keys, Missing values, 14 / 40
51 Conflicts and dirty data Different logical models of operational sources, Different data types (account number stored as String or Numeric), Different data domains (gender: M, F, male, female, 1, 0), Different date formats (dd-mm-yyyy or mm-dd-yyyy), Different field lengths (address stored by using 20 or 50 chars), Different naming conventions: homonyms and synonyms, Inconsistent information concerning the same object, Information concerning the same object, but indicated by different keys, Missing values, / 40
52 Deduplication and householding Deduplication ensures that one accurate record exists for each business entity represented in a database, 15 / 40
53 Deduplication and householding Deduplication ensures that one accurate record exists for each business entity represented in a database, Householding is the technique of grouping individual customers by the household or organization of which they are a member; this technique has some interesting marketing implications, and can also support cost-saving measures of direct advertising. 15 / 40
54 Deduplication and householding Deduplication ensures that one accurate record exists for each business entity represented in a database, Householding is the technique of grouping individual customers by the household or organization of which they are a member; this technique has some interesting marketing implications, and can also support cost-saving measures of direct advertising. Example: 15 / 40
55 Deduplication and householding Deduplication ensures that one accurate record exists for each business entity represented in a database, Householding is the technique of grouping individual customers by the household or organization of which they are a member; this technique has some interesting marketing implications, and can also support cost-saving measures of direct advertising. Example: Consider the following rows in a database: Tim Jones 123 Main Street Marlboro MA T. Jones 123 Main St. Marlborogh MA Timothy Jones 321 Maine Street Marlborog AM Jones, Timothy 123 Maine Ave Marlborough MA / 40
56 Deduplication and householding Deduplication ensures that one accurate record exists for each business entity represented in a database, Householding is the technique of grouping individual customers by the household or organization of which they are a member; this technique has some interesting marketing implications, and can also support cost-saving measures of direct advertising. Example: Consider the following rows in a database: Tim Jones 123 Main Street Marlboro MA T. Jones 123 Main St. Marlborogh MA Timothy Jones 321 Maine Street Marlborog AM Jones, Timothy 123 Maine Ave Marlborough MA The sales for around $500 are counted for each tuple. 15 / 40
57 Deduplication and householding Deduplication ensures that one accurate record exists for each business entity represented in a database, Householding is the technique of grouping individual customers by the household or organization of which they are a member; this technique has some interesting marketing implications, and can also support cost-saving measures of direct advertising. Example: Consider the following rows in a database: Tim Jones 123 Main Street Marlboro MA T. Jones 123 Main St. Marlborogh MA Timothy Jones 321 Maine Street Marlborog AM Jones, Timothy 123 Maine Ave Marlborough MA The sales for around $500 are counted for each tuple. Is it the same person? 15 / 40
58 Load of data After extracting, cleaning and transforming, data must be loaded into the warehouse. 16 / 40
59 Load of data After extracting, cleaning and transforming, data must be loaded into the warehouse. Loading the warehouse includes some other processing tasks: checking integrity constraints, sorting, summarizing, creating indexes, etc. 16 / 40
60 Load of data After extracting, cleaning and transforming, data must be loaded into the warehouse. Loading the warehouse includes some other processing tasks: checking integrity constraints, sorting, summarizing, creating indexes, etc. Batch (bulk) load utilities are used for loading. 16 / 40
61 Load of data After extracting, cleaning and transforming, data must be loaded into the warehouse. Loading the warehouse includes some other processing tasks: checking integrity constraints, sorting, summarizing, creating indexes, etc. Batch (bulk) load utilities are used for loading. A load utility must allow the administrator to monitor status, to cancel, suspend, and resume a load, and to restart after failure with no loss of data integrity. 16 / 40
62 Data warehouse refreshment Refreshing a warehouse means propagating updates on source data to the data stored in the warehouse. 17 / 40
63 Data warehouse refreshment Refreshing a warehouse means propagating updates on source data to the data stored in the warehouse. Follows the same structure as ETL process. 17 / 40
64 Data warehouse refreshment Refreshing a warehouse means propagating updates on source data to the data stored in the warehouse. Follows the same structure as ETL process. Several constraints: accessibility of data sources, size of data, size of data warehouse, frequency of data refreshing, degradation of performance of operational systems. 17 / 40
65 Data warehouse refreshment Detect changes in external data sources: 18 / 40
66 Data warehouse refreshment Detect changes in external data sources: Different monitoring techniques: external and intrusive techniques. 18 / 40
67 Data warehouse refreshment Detect changes in external data sources: Different monitoring techniques: external and intrusive techniques. Snapshot vs. timestamped sources 18 / 40
68 Data warehouse refreshment Detect changes in external data sources: Different monitoring techniques: external and intrusive techniques. Snapshot vs. timestamped sources Queryable, logged, and replicated sources 18 / 40
69 Data warehouse refreshment Detect changes in external data sources: Different monitoring techniques: external and intrusive techniques. Snapshot vs. timestamped sources Queryable, logged, and replicated sources Callback and internal action sources 18 / 40
70 Data warehouse refreshment Detect changes in external data sources: Different monitoring techniques: external and intrusive techniques. Snapshot vs. timestamped sources Queryable, logged, and replicated sources Callback and internal action sources Extract the changes and integrate into the warehouse. 18 / 40
71 Data warehouse refreshment Detect changes in external data sources: Different monitoring techniques: external and intrusive techniques. Snapshot vs. timestamped sources Queryable, logged, and replicated sources Callback and internal action sources Extract the changes and integrate into the warehouse. Update indexes, subaggregates and any other additional data structures. 18 / 40
72 Data warehouse refreshment Detect changes in external data sources: Different monitoring techniques: external and intrusive techniques. Snapshot vs. timestamped sources Queryable, logged, and replicated sources Callback and internal action sources Extract the changes and integrate into the warehouse. Update indexes, subaggregates and any other additional data structures. Types of refreshments: 18 / 40
73 Data warehouse refreshment Detect changes in external data sources: Different monitoring techniques: external and intrusive techniques. Snapshot vs. timestamped sources Queryable, logged, and replicated sources Callback and internal action sources Extract the changes and integrate into the warehouse. Update indexes, subaggregates and any other additional data structures. Types of refreshments: Periodical refreshment (daily or weekly). 18 / 40
74 Data warehouse refreshment Detect changes in external data sources: Different monitoring techniques: external and intrusive techniques. Snapshot vs. timestamped sources Queryable, logged, and replicated sources Callback and internal action sources Extract the changes and integrate into the warehouse. Update indexes, subaggregates and any other additional data structures. Types of refreshments: Periodical refreshment (daily or weekly). Immediate refreshment. 18 / 40
75 Data warehouse refreshment Detect changes in external data sources: Different monitoring techniques: external and intrusive techniques. Snapshot vs. timestamped sources Queryable, logged, and replicated sources Callback and internal action sources Extract the changes and integrate into the warehouse. Update indexes, subaggregates and any other additional data structures. Types of refreshments: Periodical refreshment (daily or weekly). Immediate refreshment. Determined by usage, types of data source, etc. 18 / 40
76 Outline 1 Motivation 2 ETL 3 OLAP Systems 4 Summary 19 / 40
77 OLAP systems The next step is to provide solutions for querying and reporting multidimensional analytical data. 20 / 40
78 Multidimensional cube The proper data model for multidimensional reporting is the multidimensional one. 21 / 40
79 Operations in multidimensional data model Roll up summarize data along a dimension hierarchy. Drill down go from higher level summary to lower level summary or detailed data. Slice and dice corresponds to selection and projection. Pivot reorient cube. Raking, Time functions, etc. 22 / 40
80 Lattice of cuboids Different degrees of summarizations are presented as a lattice of cuboids. Example for dimensions: time, product, location, supplier {all} {time} {product} {location} {supplier} {time, product} {time, location} {time, supplier} {product, location} {product, supplier} {location, supplier} {time, product, location} {time, product, supplier} {time, location, supplier} location, {product, supplier} {time,product,location,supplier} Using this structure, one can easily show roll up and drill down operations. 23 / 40
81 Total number of cuboids For an n-dimensional data cube, the total number of cuboids that can be generated is: n T = (L i + 1), i=1 where L i is the number of levels associated with dimension i (excluding the virtual top level all since generalizing to all is equivalent to the removal of a dimension). For example, if the cube has 10 dimensions and each dimension has 4 levels, the total number of cuboids that can be generated will be: T = 5 10 = 9, / 40
82 Total number of cuboids Example: Consider a simple database with two dimensions: 25 / 40
83 Total number of cuboids Example: Consider a simple database with two dimensions: Columns in Date dimension: day, month, year Columns in Localization dimension: street, city, country. Without any information about hierarchies, the number of all possible group-bys is 25 / 40
84 Total number of cuboids Example: Consider a simple database with two dimensions: Columns in Date dimension: day, month, year Columns in Localization dimension: street, city, country. Without any information about hierarchies, the number of all possible group-bys is 2 6 : 25 / 40
85 Total number of cuboids Example: Consider a simple database with two dimensions: Columns in Date dimension: day, month, year Columns in Localization dimension: street, city, country. Without any information about hierarchies, the number of all possible group-bys is 2 6 : day street month city year country day, month street, city day, year street, country month, year city, country day, month, year street, city, country 25 / 40
86 Total number of cuboids Example: Consider the same relations but with defined hierarchies: 26 / 40
87 Total number of cuboids Example: Consider the same relations but with defined hierarchies: day month year street city country 26 / 40
88 Total number of cuboids Example: Consider the same relations but with defined hierarchies: day month year street city country Many combinations of columns can be excluded, e.g., group by day, year, street, country. The number of group-bys is then 26 / 40
89 Total number of cuboids Example: Consider the same relations but with defined hierarchies: day month year street city country Many combinations of columns can be excluded, e.g., group by day, year, street, country. The number of group-bys is then 4 2 : 26 / 40
90 Total number of cuboids Example: Consider the same relations but with defined hierarchies: day month year street city country Many combinations of columns can be excluded, e.g., group by day, year, street, country. The number of group-bys is then 4 2 : year country month, year city, country day, month, year street, city, country 26 / 40
91 Three types of aggregate functions distributive: count(), sum(), max(), min(), algebraic: ave(), stddev(), holistic: median(), mode(), rank(). 27 / 40
92 OLAP servers Relational OLAP (ROLAP), Multidimensional OLAP (MOLAP), Hybrid OLAP (HOLAP). 28 / 40
93 ROLAP ROLAP servers use a relational or post-relational database management system to store and manage warehouse data. 29 / 40
94 ROLAP ROLAP servers use a relational or post-relational database management system to store and manage warehouse data. ROLAP systems use SQL and its OLAP extensions. 29 / 40
95 ROLAP ROLAP servers use a relational or post-relational database management system to store and manage warehouse data. ROLAP systems use SQL and its OLAP extensions. Optimization techniques: 29 / 40
96 ROLAP ROLAP servers use a relational or post-relational database management system to store and manage warehouse data. ROLAP systems use SQL and its OLAP extensions. Optimization techniques: Denormalization, 29 / 40
97 ROLAP ROLAP servers use a relational or post-relational database management system to store and manage warehouse data. ROLAP systems use SQL and its OLAP extensions. Optimization techniques: Denormalization, Materialized views, 29 / 40
98 ROLAP ROLAP servers use a relational or post-relational database management system to store and manage warehouse data. ROLAP systems use SQL and its OLAP extensions. Optimization techniques: Denormalization, Materialized views, Partitioning, 29 / 40
99 ROLAP ROLAP servers use a relational or post-relational database management system to store and manage warehouse data. ROLAP systems use SQL and its OLAP extensions. Optimization techniques: Denormalization, Materialized views, Partitioning, Joins, 29 / 40
100 ROLAP ROLAP servers use a relational or post-relational database management system to store and manage warehouse data. ROLAP systems use SQL and its OLAP extensions. Optimization techniques: Denormalization, Materialized views, Partitioning, Joins, Indexes (join index, bitmaps), 29 / 40
101 ROLAP ROLAP servers use a relational or post-relational database management system to store and manage warehouse data. ROLAP systems use SQL and its OLAP extensions. Optimization techniques: Denormalization, Materialized views, Partitioning, Joins, Indexes (join index, bitmaps), Query processing. 29 / 40
102 ROLAP Advantages of ROLAP Servers: 30 / 40
103 ROLAP Advantages of ROLAP Servers: Scalable with respect to the number of dimensions, 30 / 40
104 ROLAP Advantages of ROLAP Servers: Scalable with respect to the number of dimensions, Scalable with respect to the size of data, 30 / 40
105 ROLAP Advantages of ROLAP Servers: Scalable with respect to the number of dimensions, Scalable with respect to the size of data, Sparsity is not a problem (fact tables contain only facts), 30 / 40
106 ROLAP Advantages of ROLAP Servers: Scalable with respect to the number of dimensions, Scalable with respect to the size of data, Sparsity is not a problem (fact tables contain only facts), Mature and well-developed technology. 30 / 40
107 ROLAP Advantages of ROLAP Servers: Scalable with respect to the number of dimensions, Scalable with respect to the size of data, Sparsity is not a problem (fact tables contain only facts), Mature and well-developed technology. Disadvantage of ROLAP Servers: 30 / 40
108 ROLAP Advantages of ROLAP Servers: Scalable with respect to the number of dimensions, Scalable with respect to the size of data, Sparsity is not a problem (fact tables contain only facts), Mature and well-developed technology. Disadvantage of ROLAP Servers: Worse performance than MOLAP, 30 / 40
109 ROLAP Advantages of ROLAP Servers: Scalable with respect to the number of dimensions, Scalable with respect to the size of data, Sparsity is not a problem (fact tables contain only facts), Mature and well-developed technology. Disadvantage of ROLAP Servers: Worse performance than MOLAP, Additional data structures and optimization techniques used to improve the performance. 30 / 40
110 MOLAP MOLAP Servers use array-based multidimensional storage engines. 31 / 40
111 MOLAP MOLAP Servers use array-based multidimensional storage engines. Optimization techniques: 31 / 40
112 MOLAP MOLAP Servers use array-based multidimensional storage engines. Optimization techniques: Two-level storage representation: dense cubes are identified and stored as array structures, sparse cubes employ compression techniques, 31 / 40
113 MOLAP MOLAP Servers use array-based multidimensional storage engines. Optimization techniques: Two-level storage representation: dense cubes are identified and stored as array structures, sparse cubes employ compression techniques, Materialized cubes. 31 / 40
114 MOLAP Advantages of MOLAP Servers: 32 / 40
115 MOLAP Advantages of MOLAP Servers: Multidimensional views are directly mapped to data cube array structures efficient access to data, 32 / 40
116 MOLAP Advantages of MOLAP Servers: Multidimensional views are directly mapped to data cube array structures efficient access to data, Can easily store subaggregates. 32 / 40
117 MOLAP Advantages of MOLAP Servers: Multidimensional views are directly mapped to data cube array structures efficient access to data, Can easily store subaggregates. Disadvantages of MOLAP Servers: 32 / 40
118 MOLAP Advantages of MOLAP Servers: Multidimensional views are directly mapped to data cube array structures efficient access to data, Can easily store subaggregates. Disadvantages of MOLAP Servers: Scalability problem in the case of larger number of dimensions, 32 / 40
119 MOLAP Advantages of MOLAP Servers: Multidimensional views are directly mapped to data cube array structures efficient access to data, Can easily store subaggregates. Disadvantages of MOLAP Servers: Scalability problem in the case of larger number of dimensions, Not tailored for sparse data. 32 / 40
120 MOLAP Advantages of MOLAP Servers: Multidimensional views are directly mapped to data cube array structures efficient access to data, Can easily store subaggregates. Disadvantages of MOLAP Servers: Scalability problem in the case of larger number of dimensions, Not tailored for sparse data. Example: 32 / 40
121 MOLAP Advantages of MOLAP Servers: Multidimensional views are directly mapped to data cube array structures efficient access to data, Can easily store subaggregates. Disadvantages of MOLAP Servers: Scalability problem in the case of larger number of dimensions, Not tailored for sparse data. Example: Logical model consists of four dimensions: customer, product, location, and day 32 / 40
122 MOLAP Advantages of MOLAP Servers: Multidimensional views are directly mapped to data cube array structures efficient access to data, Can easily store subaggregates. Disadvantages of MOLAP Servers: Scalability problem in the case of larger number of dimensions, Not tailored for sparse data. Example: Logical model consists of four dimensions: customer, product, location, and day In case of customers, products, locations and days, the data cube will contain cells! 32 / 40
123 MOLAP Advantages of MOLAP Servers: Multidimensional views are directly mapped to data cube array structures efficient access to data, Can easily store subaggregates. Disadvantages of MOLAP Servers: Scalability problem in the case of larger number of dimensions, Not tailored for sparse data. Example: Logical model consists of four dimensions: customer, product, location, and day In case of customers, products, locations and days, the data cube will contain cells! A huge number of cells is empty: a customer is not able to buy all products in all locations / 40
124 HOLAP HOLAP servers are a hybrid approach that combines ROLAP and MOLAP technology. HOLAP benefits from the greater scalability of ROLAP and the faster computation of MOLAP. 33 / 40
125 Multidimensional queries We need an intuitive way of expressing analytical (multidimensional) queries: 34 / 40
126 Multidimensional queries We need an intuitive way of expressing analytical (multidimensional) queries: Operations like roll up, drill down, slice and dice, pivoting, ranking, time and window functions, etc. 34 / 40
127 Multidimensional queries We need an intuitive way of expressing analytical (multidimensional) queries: Operations like roll up, drill down, slice and dice, pivoting, ranking, time and window functions, etc. Two solutions: 34 / 40
128 Multidimensional queries We need an intuitive way of expressing analytical (multidimensional) queries: Operations like roll up, drill down, slice and dice, pivoting, ranking, time and window functions, etc. Two solutions: Extending SQL, or 34 / 40
129 Multidimensional queries We need an intuitive way of expressing analytical (multidimensional) queries: Operations like roll up, drill down, slice and dice, pivoting, ranking, time and window functions, etc. Two solutions: Extending SQL, or Inventing a new language ( MDX), 34 / 40
130 SQL OLAP extensions in SQL: GROUP BY CUBE, GROUP BY ROLLUP, GROUP BY GROUPING SETS, GROUPING and DECODE/CASE OVER and PARTITION BY, RANK. 35 / 40
131 MDX MDX Multidimensional expressions. 36 / 40
132 MDX MDX Multidimensional expressions. For OLAP queries, MDX is an alternative to SQL: 36 / 40
133 MDX MDX Multidimensional expressions. For OLAP queries, MDX is an alternative to SQL: 36 / 40
134 MDX MDX Multidimensional expressions. For OLAP queries, MDX is an alternative to SQL: Academic year Instructor AVG(Grade) 2011/2 Stefanowski /2 S lowiński /3 Stefanowski /3 S lowiński /4 Stefanowski /4 S lowiński /4 Dembczyński / 40
135 MDX MDX Multidimensional expressions. For OLAP queries, MDX is an alternative to SQL: Academic year Instructor AVG(Grade) 2011/2 Stefanowski /2 S lowiński /3 Stefanowski /3 S lowiński /4 Stefanowski /4 S lowiński /4 Dembczyński 4.8 AVG(Grade) Academic year Name 2011/2 2012/3 2013/4 Stefanowski S lowiński Dembczyński / 40
136 MDX MDX query: SELECT {[Time].[1997],[Time].[1998]} ON COLUMNS, {[Measures].[Sales],[Measures].[Cost]} ON ROWS FROM Warehouse WHERE ([Store].[All].[USA]) Seems to be similar to SQL, but in fact it is quite different! 37 / 40
137 Outline 1 Motivation 2 ETL 3 OLAP Systems 4 Summary 38 / 40
138 Summary ETL process is a strategic element of data warehousing. Main concepts: extraction, transformation and integration, load, data warehouse refreshment and metadata. New emerging technology... OLAP systems: ROLAP, MOLAP and HOLAP. 39 / 40
139 Bibliography J. Han and M. Kamber. Data Mining: Concepts and Techniques (second edition). Morgan Kaufmann Publishers, / 40
Multidimensional Queries
Multidimensional Queries Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first semester
More informationOLAP Systems and Multidimensional Expressions
OLAP Systems and Multidimensional Expressions Krzysztof Dembczyński Institute of Computing Science Laboratory of Intelligent Decision Support Systems Politechnika Poznańska (Poznań University of Technology)
More informationEvolution of Database Systems
Evolution of Database Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Intelligent Decision Support Systems Master studies, second
More informationProcessing of Very Large Data
Processing of Very Large Data Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first
More informationData Warehouse and Data Mining
Data Warehouse and Data Mining Lecture No. 04-06 Data Warehouse Architecture Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology
More informationOLAP Introduction and Overview
1 CHAPTER 1 OLAP Introduction and Overview What Is OLAP? 1 Data Storage and Access 1 Benefits of OLAP 2 What Is a Cube? 2 Understanding the Cube Structure 3 What Is SAS OLAP Server? 3 About Cube Metadata
More informationDatabase design View Access patterns Need for separate data warehouse:- A multidimensional data model:-
UNIT III: Data Warehouse and OLAP Technology: An Overview : What Is a Data Warehouse? A Multidimensional Data Model, Data Warehouse Architecture, Data Warehouse Implementation, From Data Warehousing to
More informationDATA WAREHOUSE EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY
DATA WAREHOUSE EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY CHARACTERISTICS Data warehouse is a central repository for summarized and integrated data
More informationCall: SAS BI Course Content:35-40hours
SAS BI Course Content:35-40hours Course Outline SAS Data Integration Studio 4.2 Introduction * to SAS DIS Studio Features of SAS DIS Studio Tasks performed by SAS DIS Studio Navigation to SAS DIS Studio
More informationAn Overview of Data Warehousing and OLAP Technology
An Overview of Data Warehousing and OLAP Technology CMPT 843 Karanjit Singh Tiwana 1 Intro and Architecture 2 What is Data Warehouse? Subject-oriented, integrated, time varying, non-volatile collection
More informationLectures for the course: Data Warehousing and Data Mining (IT 60107)
Lectures for the course: Data Warehousing and Data Mining (IT 60107) Week 1 Lecture 1 21/07/2011 Introduction to the course Pre-requisite Expectations Evaluation Guideline Term Paper and Term Project Guideline
More informationcollection of data that is used primarily in organizational decision making.
Data Warehousing A data warehouse is a special purpose database. Classic databases are generally used to model some enterprise. Most often they are used to support transactions, a process that is referred
More informationData Warehouse and Data Mining
Data Warehouse and Data Mining Lecture No. 03 Architecture of DW Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro Basic
More informationChapter 18: Data Analysis and Mining
Chapter 18: Data Analysis and Mining Database System Concepts See www.db-book.com for conditions on re-use Chapter 18: Data Analysis and Mining Decision Support Systems Data Analysis and OLAP 18.2 Decision
More informationData Warehousing Conclusion. Esteban Zimányi Slides by Toon Calders
Data Warehousing Conclusion Esteban Zimányi ezimanyi@ulb.ac.be Slides by Toon Calders Motivation for the Course Database = a piece of software to handle data: Store, maintain, and query Most ideal system
More informationREPORTING AND QUERY TOOLS AND APPLICATIONS
Tool Categories: REPORTING AND QUERY TOOLS AND APPLICATIONS There are five categories of decision support tools Reporting Managed query Executive information system OLAP Data Mining Reporting Tools Production
More informationCT75 DATA WAREHOUSING AND DATA MINING DEC 2015
Q.1 a. Briefly explain data granularity with the help of example Data Granularity: The single most important aspect and issue of the design of the data warehouse is the issue of granularity. It refers
More informationChapter 13 Business Intelligence and Data Warehouses The Need for Data Analysis Business Intelligence. Objectives
Chapter 13 Business Intelligence and Data Warehouses Objectives In this chapter, you will learn: How business intelligence is a comprehensive framework to support business decision making How operational
More informationIT DATA WAREHOUSING AND DATA MINING UNIT-2 BUSINESS ANALYSIS
PART A 1. What are production reporting tools? Give examples. (May/June 2013) Production reporting tools will let companies generate regular operational reports or support high-volume batch jobs. Such
More informationWhat is a Data Warehouse?
What is a Data Warehouse? COMP 465 Data Mining Data Warehousing Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Defined in many different ways,
More informationCHAPTER 8 DECISION SUPPORT V2 ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI
CHAPTER 8 DECISION SUPPORT V2 ADVANCED DATABASE SYSTEMS Assist. Prof. Dr. Volkan TUNALI Topics 2 Business Intelligence (BI) Decision Support System (DSS) Data Warehouse Online Analytical Processing (OLAP)
More informationDeccansoft Software Services Microsoft Silver Learning Partner. SSAS Syllabus
Overview: Analysis Services enables you to analyze large quantities of data. With it, you can design, create, and manage multidimensional structures that contain detail and aggregated data from multiple
More informationData Mining Concepts & Techniques
Data Mining Concepts & Techniques Lecture No. 01 Databases, Data warehouse Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro
More informationChapter 4, Data Warehouse and OLAP Operations
CSI 4352, Introduction to Data Mining Chapter 4, Data Warehouse and OLAP Operations Young-Rae Cho Associate Professor Department of Computer Science Baylor University CSI 4352, Introduction to Data Mining
More informationData Warehousing and Decision Support. Introduction. Three Complementary Trends. [R&G] Chapter 23, Part A
Data Warehousing and Decision Support [R&G] Chapter 23, Part A CS 432 1 Introduction Increasingly, organizations are analyzing current and historical data to identify useful patterns and support business
More informationInformation Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 07 : 06/11/2012 Data Mining: Concepts and Techniques (3 rd ed.) Chapter
More informationBUSINESS INTELLIGENCE. SSAS - SQL Server Analysis Services. Business Informatics Degree
BUSINESS INTELLIGENCE SSAS - SQL Server Analysis Services Business Informatics Degree 2 BI Architecture SSAS: SQL Server Analysis Services 3 It is both an OLAP Server and a Data Mining Server Distinct
More informationOBIEE Performance Improvement Tips and Techniques
OBIEE Performance Improvement Tips and Techniques Vivek Jain, Manager Deloitte Speaker Bio Manager with Deloitte Consulting, Information Management (BI/DW) Skills in OBIEE, OLAP, RTD, Spatial / MapViewer,
More informationData Warehousing and Decision Support
Data Warehousing and Decision Support Chapter 23, Part A Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1 Introduction Increasingly, organizations are analyzing current and historical
More informationCHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP)
CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP) INTRODUCTION A dimension is an attribute within a multidimensional model consisting of a list of values (called members). A fact is defined by a combination
More informationData Warehousing. Ritham Vashisht, Sukhdeep Kaur and Shobti Saini
Advance in Electronic and Electric Engineering. ISSN 2231-1297, Volume 3, Number 6 (2013), pp. 669-674 Research India Publications http://www.ripublication.com/aeee.htm Data Warehousing Ritham Vashisht,
More informationUnit 7: Basics in MS Power BI for Excel 2013 M7-5: OLAP
Unit 7: Basics in MS Power BI for Excel M7-5: OLAP Outline: Introduction Learning Objectives Content Exercise What is an OLAP Table Operations: Drill Down Operations: Roll Up Operations: Slice Operations:
More informationCSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores
CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores Announcements Shumo office hours change See website for details HW2 due next Thurs
More informationData Warehousing and Decision Support
Data Warehousing and Decision Support [R&G] Chapter 23, Part A CS 4320 1 Introduction Increasingly, organizations are analyzing current and historical data to identify useful patterns and support business
More informationData Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1394
Data Mining Data warehousing Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 31 Table of contents 1 Introduction 2 Data warehousing
More informationData Warehouse. Asst.Prof.Dr. Pattarachai Lalitrojwong
Data Warehouse Asst.Prof.Dr. Pattarachai Lalitrojwong Faculty of Information Technology King Mongkut s Institute of Technology Ladkrabang Bangkok 10520 pattarachai@it.kmitl.ac.th The Evolution of Data
More informationData Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1396
Data Mining Data warehousing Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1396 1 / 31 Table of contents 1 Introduction 2 Data warehousing
More informationSql Fact Constellation Schema In Data Warehouse With Example
Sql Fact Constellation Schema In Data Warehouse With Example Data Warehouse OLAP - Learn Data Warehouse in simple and easy steps using Multidimensional OLAP (MOLAP), Hybrid OLAP (HOLAP), Specialized SQL
More informationSQL Server Analysis Services
DataBase and Data Mining Group of DataBase and Data Mining Group of Database and data mining group, SQL Server 2005 Analysis Services SQL Server 2005 Analysis Services - 1 Analysis Services Database and
More informationImproving the Performance of OLAP Queries Using Families of Statistics Trees
Improving the Performance of OLAP Queries Using Families of Statistics Trees Joachim Hammer Dept. of Computer and Information Science University of Florida Lixin Fu Dept. of Mathematical Sciences University
More informationDATA WAREHOUING UNIT I
BHARATHIDASAN ENGINEERING COLLEGE NATTRAMAPALLI DEPARTMENT OF COMPUTER SCIENCE SUB CODE & NAME: IT6702/DWDM DEPT: IT Staff Name : N.RAMESH DATA WAREHOUING UNIT I 1. Define data warehouse? NOV/DEC 2009
More informationAdnan YAZICI Computer Engineering Department
Data Warehouse Adnan YAZICI Computer Engineering Department Middle East Technical University, A.Yazici, 2010 Definition A data warehouse is a subject-oriented integrated time-variant nonvolatile collection
More informationData Warehousing ETL. Esteban Zimányi Slides by Toon Calders
Data Warehousing ETL Esteban Zimányi ezimanyi@ulb.ac.be Slides by Toon Calders 1 Overview Picture other sources Metadata Monitor & Integrator OLAP Server Analysis Operational DBs Extract Transform Load
More informationTribhuvan University Institute of Science and Technology MODEL QUESTION
MODEL QUESTION 1. Suppose that a data warehouse for Big University consists of four dimensions: student, course, semester, and instructor, and two measures count and avg-grade. When at the lowest conceptual
More informationData Partitioning and MapReduce
Data Partitioning and MapReduce Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Intelligent Decision Support Systems Master studies,
More informationProceedings of the IE 2014 International Conference AGILE DATA MODELS
AGILE DATA MODELS Mihaela MUNTEAN Academy of Economic Studies, Bucharest mun61mih@yahoo.co.uk, Mihaela.Muntean@ie.ase.ro Abstract. In last years, one of the most popular subjects related to the field of
More informationData Warehouse and Data Mining
Data Warehouse and Data Mining Lecture No. 02 Lifecycle of Data warehouse Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro
More informationCS614 - Data Warehousing - Midterm Papers Solved MCQ(S) (1 TO 22 Lectures)
CS614- Data Warehousing Solved MCQ(S) From Midterm Papers (1 TO 22 Lectures) BY Arslan Arshad Nov 21,2016 BS110401050 BS110401050@vu.edu.pk Arslan.arshad01@gmail.com AKMP01 CS614 - Data Warehousing - Midterm
More informationAcknowledgment. MTAT Data Mining. Week 7: Online Analytical Processing and Data Warehouses. Typical Data Analysis Process.
MTAT.03.183 Data Mining Week 7: Online Analytical Processing and Data Warehouses Marlon Dumas marlon.dumas ät ut. ee Acknowledgment This slide deck is a mashup of the following publicly available slide
More informationData Warehousing & OLAP
Data Warehousing & OLAP Data Mining: Concepts and Techniques Chapter 3 Jiawei Han and An Introduction to Database Systems C.J.Date, Eighth Eddition, Addidon Wesley, 4 1 What is Data Warehousing? What is
More informationCS 1655 / Spring 2013! Secure Data Management and Web Applications
CS 1655 / Spring 2013 Secure Data Management and Web Applications 03 Data Warehousing Alexandros Labrinidis University of Pittsburgh What is a Data Warehouse A data warehouse: archives information gathered
More informationIntroduction to DWH / BI Concepts
SAS INTELLIGENCE PLATFORM CURRICULUM SAS INTELLIGENCE PLATFORM BI TOOLS 4.2 VERSION SAS BUSINESS INTELLIGENCE TOOLS - COURSE OUTLINE Practical Project Based Training & Implementation on all the BI Tools
More informationInterdepartmental Programme of Postgraduate Studies in Information Systems, University of Macedonia MASTER THESIS
Interdepartmental Programme of Postgraduate Studies in Information Systems, University of Macedonia MASTER THESIS ANALYSIS AND PROCESS OF HIERARCHICAL SEMI-STRUCTURED DATA George N. Tsilinikos Submitted
More informationCSPP 53017: Data Warehousing Winter 2013! Lecture 7! Svetlozar Nestorov! Class News!
CSPP 53017: Data Warehousing Winter 2013! Lecture 7! Svetlozar Nestorov! Class News! Make-up class on Saturday, Mar 9 in Gleacher 203 10:30am 1:30pm.! Last 15 minute in-class quiz (6:30pm) on Mar 5.! Covers
More informationInformation Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 14 : 18/11/2014 Data Mining: Concepts and Techniques (3 rd ed.) Chapter
More informationOLAP2 outline. Multi Dimensional Data Model. A Sample Data Cube
OLAP2 outline Multi Dimensional Data Model Need for Multi Dimensional Analysis OLAP Operators Data Cube Demonstration Using SQL Multi Dimensional Data Model Multi dimensional analysis is a popular approach
More informationBasics of Dimensional Modeling
Basics of Dimensional Modeling Data warehouse and OLAP tools are based on a dimensional data model. A dimensional model is based on dimensions, facts, cubes, and schemas such as star and snowflake. Dimension
More informationData Warehouses. Yanlei Diao. Slides Courtesy of R. Ramakrishnan and J. Gehrke
Data Warehouses Yanlei Diao Slides Courtesy of R. Ramakrishnan and J. Gehrke Introduction v In the late 80s and early 90s, companies began to use their DBMSs for complex, interactive, exploratory analysis
More informationData Warehousing 2. ICS 421 Spring Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa
ICS 421 Spring 2010 Data Warehousing 2 Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 3/30/2010 Lipyeow Lim -- University of Hawaii at Manoa 1 Data Warehousing
More informationIDU0010 ERP,CRM ja DW süsteemid Loeng 5 DW concepts. Enn Õunapuu
IDU0010 ERP,CRM ja DW süsteemid Loeng 5 DW concepts Enn Õunapuu enn.ounapuu@ttu.ee Content Oveall approach Dimensional model Tabular model Overall approach Data modeling is a discipline that has been practiced
More informationDecision Support Systems
Decision Support Systems 2011/2012 Week 3. Lecture 6 Previous Class Dimensions & Measures Dimensions: Item Time Loca0on Measures: Quan0ty Sales TransID ItemName ItemID Date Store Qty T0001 Computer I23
More informationAdvanced Data Management Technologies Written Exam
Advanced Data Management Technologies Written Exam 02.02.2016 First name Student number Last name Signature Instructions for Students Write your name, student number, and signature on the exam sheet. This
More informationDecision Support Systems aka Analytical Systems
Decision Support Systems aka Analytical Systems Decision Support Systems Systems that are used to transform data into information, to manage the organization: OLAP vs OLTP OLTP vs OLAP Transactions Analysis
More informationTeradata Aggregate Designer
Data Warehousing Teradata Aggregate Designer By: Sam Tawfik Product Marketing Manager Teradata Corporation Table of Contents Executive Summary 2 Introduction 3 Problem Statement 3 Implications of MOLAP
More informationData Warehouses and OLAP. Database and Information Systems. Data Warehouses and OLAP. Data Warehouses and OLAP
Database and Information Systems 11. Deductive Databases 12. Data Warehouses and OLAP 13. Index Structures for Similarity Queries 14. Data Mining 15. Semi-Structured Data 16. Document Retrieval 17. Web
More informationA Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective
A Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective B.Manivannan Research Scholar, Dept. Computer Science, Dravidian University, Kuppam, Andhra Pradesh, India
More informationCS 245: Database System Principles. Warehousing. Outline. What is a Warehouse? What is a Warehouse? Notes 13: Data Warehousing
Recall : Database System Principles Notes 3: Data Warehousing Three approaches to information integration: Federated databases did teaser Data warehousing next Mediation Hector Garcia-Molina (Some modifications
More informationData Warehousing & OLAP
CMPUT 391 Database Management Systems Data Warehousing & OLAP Textbook: 17.1 17.5 (first edition: 19.1 19.5) Based on slides by Lewis, Bernstein and Kifer and other sources University of Alberta 1 Why
More informationCreate Cube From Star Schema Grouping Framework Manager
Create Cube From Star Schema Grouping Framework Manager Create star schema groupings to provide authors with logical groupings of query Connect to an OLAP data source (cube) in a Framework Manager project
More informationDecision Support. Chapter 25. CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 1
Decision Support Chapter 25 CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 1 Introduction Increasingly, organizations are analyzing current and historical data to identify useful patterns and support
More informationData Warehousing Introduction. Toon Calders
Data Warehousing Introduction Toon Calders toon.calders@ulb.ac.be Course Organization Lectures on Tuesday 14:00 and Friday 16:00 Check http://gehol.ulb.ac.be/ for room Most exercises in computer class
More informationThis tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing.
About the Tutorial A data warehouse is constructed by integrating data from multiple heterogeneous sources. It supports analytical reporting, structured and/or ad hoc queries and decision making. This
More informationEnterprise Informatization LECTURE
Enterprise Informatization LECTURE Piotr Zabawa, PhD. Eng. IBM/Rational Certified Consultant e-mail: pzabawa@pk.edu.pl www: http://www.pk.edu.pl/~pzabawa/en 07.10.2011 Lecture 5 Analytical tools in business
More informationCSE 544 Principles of Database Management Systems. Fall 2016 Lecture 14 - Data Warehousing and Column Stores
CSE 544 Principles of Database Management Systems Fall 2016 Lecture 14 - Data Warehousing and Column Stores References Data Cube: A Relational Aggregation Operator Generalizing Group By, Cross-Tab, and
More informationInfoSphere Warehouse V9.5 Exam.
IBM 000-719 InfoSphere Warehouse V9.5 Exam TYPE: DEMO http://www.examskey.com/000-719.html Examskey IBM 000-719 exam demo product is here for you to test the quality of the product. This IBM 000-719 demo
More informationSummary of Last Chapter. Course Content. Chapter 2 Objectives. Data Warehouse and OLAP Outline. Incentive for a Data Warehouse
Principles of Knowledge Discovery in bases Fall 1999 Chapter 2: Warehousing and Dr. Osmar R. Zaïane University of Alberta Dr. Osmar R. Zaïane, 1999 Principles of Knowledge Discovery in bases University
More informationData warehouses Decision support The multidimensional model OLAP queries
Data warehouses Decision support The multidimensional model OLAP queries Traditional DBMSs are used by organizations for maintaining data to record day to day operations On-line Transaction Processing
More informationIntroduction to Data Warehousing
ICS 321 Spring 2012 Introduction to Data Warehousing Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 4/23/2012 Lipyeow Lim -- University of Hawaii at Manoa
More informationGUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV
GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV Subject Name: Elective I Data Warehousing & Data Mining (DWDM) Subject Code: 2640005 Learning Objectives: To understand
More informationA Multi-Dimensional Data Model
A Multi-Dimensional Data Model A Data Warehouse is based on a Multidimensional data model which views data in the form of a data cube A data cube, such as sales, allows data to be modeled and viewed in
More informationDATA MINING AND WAREHOUSING
DATA MINING AND WAREHOUSING Qno Question Answer 1 Define data warehouse? Data warehouse is a subject oriented, integrated, time-variant, and nonvolatile collection of data that supports management's decision-making
More informationOn-Line Analytical Processing (OLAP) Traditional OLTP
On-Line Analytical Processing (OLAP) CSE 6331 / CSE 6362 Data Mining Fall 1999 Diane J. Cook Traditional OLTP DBMS used for on-line transaction processing (OLTP) order entry: pull up order xx-yy-zz and
More informationData Warehousing Introduction. Esteban Zimanyi Slides from Toon Calders
Data Warehousing Introduction Esteban Zimanyi ezimanyi@ulb.ac.be Slides from Toon Calders Course Organization Lectures on Tuesday 14:00 and Thursday 16:00 Check http://gehol.ulb.ac.be/ for room Most exercises
More informationAggregating Knowledge in a Data Warehouse and Multidimensional Analysis
Aggregating Knowledge in a Data Warehouse and Multidimensional Analysis Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.com Objectives Explain the basics of: 1. Data
More informationSyllabus. Syllabus. Motivation Decision Support. Syllabus
Presentation: Sophia Discussion: Tianyu Metadata Requirements and Conclusion 3 4 Decision Support Decision Making: Everyday, Everywhere Decision Support System: a class of computerized information systems
More information~ Ian Hunneybell: DWDM Revision Notes (31/05/2006) ~
Useful reference: Microsoft Developer Network Library (http://msdn.microsoft.com/library). Drill down to Servers and Enterprise Development SQL Server SQL Server 2000 SDK Documentation Creating and Using
More informationData Warehousing and Decision Support (mostly using Relational Databases) CS634 Class 20
Data Warehousing and Decision Support (mostly using Relational Databases) CS634 Class 20 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke, Chapter 25 Introduction Increasingly,
More informationImplementing and Maintaining Microsoft SQL Server 2008 Analysis Services
Course 6234A: Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services Course Details Course Outline Module 1: Introduction to Microsoft SQL Server Analysis Services This module introduces
More informationSAS 9.2 OLAP Server. User s Guide
SAS 9.2 OLAP Server User s Guide The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2009. SAS 9.2 OLAP Server: User s Guide. Cary, NC: SAS Institute Inc. SAS 9.2 OLAP
More informationDr.G.R.Damodaran College of Science
1 of 20 8/28/2017 2:13 PM Dr.G.R.Damodaran College of Science (Autonomous, affiliated to the Bharathiar University, recognized by the UGC)Reaccredited at the 'A' Grade Level by the NAAC and ISO 9001:2008
More informationData Warehousing & On-Line Analytical Processing
Data Warehousing & On-Line Analytical Processing Erwin M. Bakker & Stefan Manegold https://homepages.cwi.nl/~manegold/dbdm/ http://liacs.leidenuniv.nl/~bakkerem2/dbdm/ s.manegold@liacs.leidenuniv.nl e.m.bakker@liacs.leidenuniv.nl
More informationUNIT -1 UNIT -II. Q. 4 Why is entity-relationship modeling technique not suitable for the data warehouse? How is dimensional modeling different?
(Please write your Roll No. immediately) End-Term Examination Fourth Semester [MCA] MAY-JUNE 2006 Roll No. Paper Code: MCA-202 (ID -44202) Subject: Data Warehousing & Data Mining Note: Question no. 1 is
More informationInformation Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 05(b) : 23/10/2012 Data Mining: Concepts and Techniques (3 rd ed.) Chapter
More informationData Warehouse. Concepts and Techniques. Chapter 3. SS Chung. April 5, 2013 Data Mining: Concepts and Techniques 1
Data Warehouse Concepts and Techniques Chapter 3 SS Chung April 5, 2013 Data Mining: Concepts and Techniques 1 Chapter 3: Data Warehousing and OLAP Technology: An Overview What is a data warehouse? A multi-dimensional
More informationCOURSE 20466D: IMPLEMENTING DATA MODELS AND REPORTS WITH MICROSOFT SQL SERVER
ABOUT THIS COURSE The focus of this five-day instructor-led course is on creating managed enterprise BI solutions. It describes how to implement multidimensional and tabular data models, deliver reports
More informationData Warehousing and OLAP
Data Warehousing and OLAP INFO 330 Slides courtesy of Mirek Riedewald Motivation Large retailer Several databases: inventory, personnel, sales etc. High volume of updates Management requirements Efficient
More informationAnalytical data bases Database lectures for math
Analytical data bases Database lectures for mathematics students May 14, 2017 Decision support systems From the perspective of the time span all decisions in the organization could be divided into three
More informationAfter completing this course, participants will be able to:
Designing a Business Intelligence Solution by Using Microsoft SQL Server 2008 T h i s f i v e - d a y i n s t r u c t o r - l e d c o u r s e p r o v i d e s i n - d e p t h k n o w l e d g e o n d e s
More informationBy Mahesh R. Sanghavi Associate professor, SNJB s KBJ CoE, Chandwad
By Mahesh R. Sanghavi Associate professor, SNJB s KBJ CoE, Chandwad All the content of these PPTs were taken from PPTS of renown author via internet. These PPTs are only mean to share the knowledge among
More informationData Warehousing & On-line Analytical Processing
Data Warehousing & On-line Analytical Processing Erwin M. Bakker & Stefan Manegold https://homepages.cwi.nl/~manegold/dbdm/ http://liacs.leidenuniv.nl/~bakkerem2/dbdm/ Chapter 4: Data Warehousing and On-line
More information