ETL and OLAP Systems

Size: px
Start display at page:

Download "ETL and OLAP Systems"

Transcription

1 ETL and OLAP Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first semester Academic year 2017/18 (winter course) 1 / 40

2 Review of the Previous Lecture Mining of massive datasets. Evolution of database systems. Dimensional modeling: Three goals of the logical design of data warehouse: simplicity, expressiveness and performance. The most popular conceptual schema: star schema. Designing data warehouses is not an easy task... 2 / 40

3 Outline 1 Motivation 2 ETL 3 OLAP Systems 4 Summary 3 / 40

4 Outline 1 Motivation 2 ETL 3 OLAP Systems 4 Summary 4 / 40

5 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. 5 / 40

6 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. Transferring data to data warehouse: 5 / 40

7 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. Transferring data to data warehouse: Data warehouses combine data from multiple sources. 5 / 40

8 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. Transferring data to data warehouse: Data warehouses combine data from multiple sources. Data must be translated into a consistent format. 5 / 40

9 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. Transferring data to data warehouse: Data warehouses combine data from multiple sources. Data must be translated into a consistent format. Data integration represents 80% of effort for a typical data warehouse project! 5 / 40

10 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. Transferring data to data warehouse: Data warehouses combine data from multiple sources. Data must be translated into a consistent format. Data integration represents 80% of effort for a typical data warehouse project! Optimization of data warehouse: 5 / 40

11 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. Transferring data to data warehouse: Data warehouses combine data from multiple sources. Data must be translated into a consistent format. Data integration represents 80% of effort for a typical data warehouse project! Optimization of data warehouse: Data storage: relational or multi-dimensional. 5 / 40

12 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. Transferring data to data warehouse: Data warehouses combine data from multiple sources. Data must be translated into a consistent format. Data integration represents 80% of effort for a typical data warehouse project! Optimization of data warehouse: Data storage: relational or multi-dimensional. Additional data structures: sorting, indexing, summarizing, cubes. 5 / 40

13 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. Transferring data to data warehouse: Data warehouses combine data from multiple sources. Data must be translated into a consistent format. Data integration represents 80% of effort for a typical data warehouse project! Optimization of data warehouse: Data storage: relational or multi-dimensional. Additional data structures: sorting, indexing, summarizing, cubes. Refreshing of data structures. 5 / 40

14 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. Transferring data to data warehouse: Data warehouses combine data from multiple sources. Data must be translated into a consistent format. Data integration represents 80% of effort for a typical data warehouse project! Optimization of data warehouse: Data storage: relational or multi-dimensional. Additional data structures: sorting, indexing, summarizing, cubes. Refreshing of data structures. Querying multidimensional data: 5 / 40

15 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. Transferring data to data warehouse: Data warehouses combine data from multiple sources. Data must be translated into a consistent format. Data integration represents 80% of effort for a typical data warehouse project! Optimization of data warehouse: Data storage: relational or multi-dimensional. Additional data structures: sorting, indexing, summarizing, cubes. Refreshing of data structures. Querying multidimensional data: SQL extensions, 5 / 40

16 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. Transferring data to data warehouse: Data warehouses combine data from multiple sources. Data must be translated into a consistent format. Data integration represents 80% of effort for a typical data warehouse project! Optimization of data warehouse: Data storage: relational or multi-dimensional. Additional data structures: sorting, indexing, summarizing, cubes. Refreshing of data structures. Querying multidimensional data: SQL extensions, Multidimensional expressions (MDX), 5 / 40

17 Motivation OLAP queries are usually performed in a separate system, i.e., a data warehouse. Transferring data to data warehouse: Data warehouses combine data from multiple sources. Data must be translated into a consistent format. Data integration represents 80% of effort for a typical data warehouse project! Optimization of data warehouse: Data storage: relational or multi-dimensional. Additional data structures: sorting, indexing, summarizing, cubes. Refreshing of data structures. Querying multidimensional data: SQL extensions, Multidimensional expressions (MDX), Map-reduce-based language. 5 / 40

18 Outline 1 Motivation 2 ETL 3 OLAP Systems 4 Summary 6 / 40

19 ETL ETL = Extraction, Transformation, and Load Extraction of data from source systems, Transformation and integration of data into a useful format for analysis, Load of data into the warehouse and build of additional structures. Refreshment of data warehouse is closely related to ETL process. The ETL process is described by metadata stored in data warehouse. Architecture of data warehousing: Data sources Data staging area Data warehouse 7 / 40

20 ETL 8 / 40

21 Tools for ETL Data extraction from heterogeneous data sources. Data transformation, integration, and cleansing. Data quality analysis and control. Data loading. High-speed data transfer. Data refreshment. Managing and analyzing metadata. Examples of ETL tools: MS SQL Server Integration Services(SSIS), IBM Infosphere DataStage, SAS ETL Studio, Oracle Warehouse Builder, Oracle Data Integrator, Business Objects Data Integrator, Pentaho Data Integration. 9 / 40

22 Tools for ETL MS SQL Server Integration Services(SSIS) 10 / 40

23 Tools for ETL MS SQL Server Integration Services(SSIS) 11 / 40

24 Data extraction Data warehouse needs extraction of data from different external data sources: 12 / 40

25 Data extraction Data warehouse needs extraction of data from different external data sources: operational databases (relational, hierarchical, network, itp.), 12 / 40

26 Data extraction Data warehouse needs extraction of data from different external data sources: operational databases (relational, hierarchical, network, itp.), files of standard applications (Excel, COBOL applications), 12 / 40

27 Data extraction Data warehouse needs extraction of data from different external data sources: operational databases (relational, hierarchical, network, itp.), files of standard applications (Excel, COBOL applications), additional databases (direct marketing databases) and data services (stock data), 12 / 40

28 Data extraction Data warehouse needs extraction of data from different external data sources: operational databases (relational, hierarchical, network, itp.), files of standard applications (Excel, COBOL applications), additional databases (direct marketing databases) and data services (stock data), various log files, 12 / 40

29 Data extraction Data warehouse needs extraction of data from different external data sources: operational databases (relational, hierarchical, network, itp.), files of standard applications (Excel, COBOL applications), additional databases (direct marketing databases) and data services (stock data), various log files, and other documents (.txt,.doc, XML, WWW). 12 / 40

30 Data extraction Data warehouse needs extraction of data from different external data sources: operational databases (relational, hierarchical, network, itp.), files of standard applications (Excel, COBOL applications), additional databases (direct marketing databases) and data services (stock data), various log files, and other documents (.txt,.doc, XML, WWW). Access to data sources can be difficult: 12 / 40

31 Data extraction Data warehouse needs extraction of data from different external data sources: operational databases (relational, hierarchical, network, itp.), files of standard applications (Excel, COBOL applications), additional databases (direct marketing databases) and data services (stock data), various log files, and other documents (.txt,.doc, XML, WWW). Access to data sources can be difficult: Data sources are often operational systems, providing the lowest level of data. 12 / 40

32 Data extraction Data warehouse needs extraction of data from different external data sources: operational databases (relational, hierarchical, network, itp.), files of standard applications (Excel, COBOL applications), additional databases (direct marketing databases) and data services (stock data), various log files, and other documents (.txt,.doc, XML, WWW). Access to data sources can be difficult: Data sources are often operational systems, providing the lowest level of data. Data sources are designed for operational use, not for decision support, and the data reflect this fact. 12 / 40

33 Data extraction Data warehouse needs extraction of data from different external data sources: operational databases (relational, hierarchical, network, itp.), files of standard applications (Excel, COBOL applications), additional databases (direct marketing databases) and data services (stock data), various log files, and other documents (.txt,.doc, XML, WWW). Access to data sources can be difficult: Data sources are often operational systems, providing the lowest level of data. Data sources are designed for operational use, not for decision support, and the data reflect this fact. Multiple data sources are often from different systems, run on a wide range of hardware and much of the software is built in-house or highly customized. 12 / 40

34 Data extraction Data warehouse needs extraction of data from different external data sources: operational databases (relational, hierarchical, network, itp.), files of standard applications (Excel, COBOL applications), additional databases (direct marketing databases) and data services (stock data), various log files, and other documents (.txt,.doc, XML, WWW). Access to data sources can be difficult: Data sources are often operational systems, providing the lowest level of data. Data sources are designed for operational use, not for decision support, and the data reflect this fact. Multiple data sources are often from different systems, run on a wide range of hardware and much of the software is built in-house or highly customized. Data sources can be designed using different logical structures. 12 / 40

35 Data extraction Identification of concepts and objects does not have to be easy. 13 / 40

36 Data extraction Identification of concepts and objects does not have to be easy. Example: Extract information about sales from the source system. 13 / 40

37 Data extraction Identification of concepts and objects does not have to be easy. Example: Extract information about sales from the source system. What is meant by the term sale? A sale has occurred when 13 / 40

38 Data extraction Identification of concepts and objects does not have to be easy. Example: Extract information about sales from the source system. What is meant by the term sale? A sale has occurred when 1 the order has been received by a customer, 13 / 40

39 Data extraction Identification of concepts and objects does not have to be easy. Example: Extract information about sales from the source system. What is meant by the term sale? A sale has occurred when 1 the order has been received by a customer, 2 the order is sent to the customer, 13 / 40

40 Data extraction Identification of concepts and objects does not have to be easy. Example: Extract information about sales from the source system. What is meant by the term sale? A sale has occurred when 1 the order has been received by a customer, 2 the order is sent to the customer, 3 the invoice has been raised against the order. 13 / 40

41 Data extraction Identification of concepts and objects does not have to be easy. Example: Extract information about sales from the source system. What is meant by the term sale? A sale has occurred when 1 the order has been received by a customer, 2 the order is sent to the customer, 3 the invoice has been raised against the order. It is a common problem that there is no table SALES in the operational databases; some other tables can exist like ORDER with an attribute ORDER STATUS. 13 / 40

42 Conflicts and dirty data Different logical models of operational sources, 14 / 40

43 Conflicts and dirty data Different logical models of operational sources, Different data types (account number stored as String or Numeric), 14 / 40

44 Conflicts and dirty data Different logical models of operational sources, Different data types (account number stored as String or Numeric), Different data domains (gender: M, F, male, female, 1, 0), 14 / 40

45 Conflicts and dirty data Different logical models of operational sources, Different data types (account number stored as String or Numeric), Different data domains (gender: M, F, male, female, 1, 0), Different date formats (dd-mm-yyyy or mm-dd-yyyy), 14 / 40

46 Conflicts and dirty data Different logical models of operational sources, Different data types (account number stored as String or Numeric), Different data domains (gender: M, F, male, female, 1, 0), Different date formats (dd-mm-yyyy or mm-dd-yyyy), Different field lengths (address stored by using 20 or 50 chars), 14 / 40

47 Conflicts and dirty data Different logical models of operational sources, Different data types (account number stored as String or Numeric), Different data domains (gender: M, F, male, female, 1, 0), Different date formats (dd-mm-yyyy or mm-dd-yyyy), Different field lengths (address stored by using 20 or 50 chars), Different naming conventions: homonyms and synonyms, 14 / 40

48 Conflicts and dirty data Different logical models of operational sources, Different data types (account number stored as String or Numeric), Different data domains (gender: M, F, male, female, 1, 0), Different date formats (dd-mm-yyyy or mm-dd-yyyy), Different field lengths (address stored by using 20 or 50 chars), Different naming conventions: homonyms and synonyms, Inconsistent information concerning the same object, 14 / 40

49 Conflicts and dirty data Different logical models of operational sources, Different data types (account number stored as String or Numeric), Different data domains (gender: M, F, male, female, 1, 0), Different date formats (dd-mm-yyyy or mm-dd-yyyy), Different field lengths (address stored by using 20 or 50 chars), Different naming conventions: homonyms and synonyms, Inconsistent information concerning the same object, Information concerning the same object, but indicated by different keys, 14 / 40

50 Conflicts and dirty data Different logical models of operational sources, Different data types (account number stored as String or Numeric), Different data domains (gender: M, F, male, female, 1, 0), Different date formats (dd-mm-yyyy or mm-dd-yyyy), Different field lengths (address stored by using 20 or 50 chars), Different naming conventions: homonyms and synonyms, Inconsistent information concerning the same object, Information concerning the same object, but indicated by different keys, Missing values, 14 / 40

51 Conflicts and dirty data Different logical models of operational sources, Different data types (account number stored as String or Numeric), Different data domains (gender: M, F, male, female, 1, 0), Different date formats (dd-mm-yyyy or mm-dd-yyyy), Different field lengths (address stored by using 20 or 50 chars), Different naming conventions: homonyms and synonyms, Inconsistent information concerning the same object, Information concerning the same object, but indicated by different keys, Missing values, / 40

52 Deduplication and householding Deduplication ensures that one accurate record exists for each business entity represented in a database, 15 / 40

53 Deduplication and householding Deduplication ensures that one accurate record exists for each business entity represented in a database, Householding is the technique of grouping individual customers by the household or organization of which they are a member; this technique has some interesting marketing implications, and can also support cost-saving measures of direct advertising. 15 / 40

54 Deduplication and householding Deduplication ensures that one accurate record exists for each business entity represented in a database, Householding is the technique of grouping individual customers by the household or organization of which they are a member; this technique has some interesting marketing implications, and can also support cost-saving measures of direct advertising. Example: 15 / 40

55 Deduplication and householding Deduplication ensures that one accurate record exists for each business entity represented in a database, Householding is the technique of grouping individual customers by the household or organization of which they are a member; this technique has some interesting marketing implications, and can also support cost-saving measures of direct advertising. Example: Consider the following rows in a database: Tim Jones 123 Main Street Marlboro MA T. Jones 123 Main St. Marlborogh MA Timothy Jones 321 Maine Street Marlborog AM Jones, Timothy 123 Maine Ave Marlborough MA / 40

56 Deduplication and householding Deduplication ensures that one accurate record exists for each business entity represented in a database, Householding is the technique of grouping individual customers by the household or organization of which they are a member; this technique has some interesting marketing implications, and can also support cost-saving measures of direct advertising. Example: Consider the following rows in a database: Tim Jones 123 Main Street Marlboro MA T. Jones 123 Main St. Marlborogh MA Timothy Jones 321 Maine Street Marlborog AM Jones, Timothy 123 Maine Ave Marlborough MA The sales for around $500 are counted for each tuple. 15 / 40

57 Deduplication and householding Deduplication ensures that one accurate record exists for each business entity represented in a database, Householding is the technique of grouping individual customers by the household or organization of which they are a member; this technique has some interesting marketing implications, and can also support cost-saving measures of direct advertising. Example: Consider the following rows in a database: Tim Jones 123 Main Street Marlboro MA T. Jones 123 Main St. Marlborogh MA Timothy Jones 321 Maine Street Marlborog AM Jones, Timothy 123 Maine Ave Marlborough MA The sales for around $500 are counted for each tuple. Is it the same person? 15 / 40

58 Load of data After extracting, cleaning and transforming, data must be loaded into the warehouse. 16 / 40

59 Load of data After extracting, cleaning and transforming, data must be loaded into the warehouse. Loading the warehouse includes some other processing tasks: checking integrity constraints, sorting, summarizing, creating indexes, etc. 16 / 40

60 Load of data After extracting, cleaning and transforming, data must be loaded into the warehouse. Loading the warehouse includes some other processing tasks: checking integrity constraints, sorting, summarizing, creating indexes, etc. Batch (bulk) load utilities are used for loading. 16 / 40

61 Load of data After extracting, cleaning and transforming, data must be loaded into the warehouse. Loading the warehouse includes some other processing tasks: checking integrity constraints, sorting, summarizing, creating indexes, etc. Batch (bulk) load utilities are used for loading. A load utility must allow the administrator to monitor status, to cancel, suspend, and resume a load, and to restart after failure with no loss of data integrity. 16 / 40

62 Data warehouse refreshment Refreshing a warehouse means propagating updates on source data to the data stored in the warehouse. 17 / 40

63 Data warehouse refreshment Refreshing a warehouse means propagating updates on source data to the data stored in the warehouse. Follows the same structure as ETL process. 17 / 40

64 Data warehouse refreshment Refreshing a warehouse means propagating updates on source data to the data stored in the warehouse. Follows the same structure as ETL process. Several constraints: accessibility of data sources, size of data, size of data warehouse, frequency of data refreshing, degradation of performance of operational systems. 17 / 40

65 Data warehouse refreshment Detect changes in external data sources: 18 / 40

66 Data warehouse refreshment Detect changes in external data sources: Different monitoring techniques: external and intrusive techniques. 18 / 40

67 Data warehouse refreshment Detect changes in external data sources: Different monitoring techniques: external and intrusive techniques. Snapshot vs. timestamped sources 18 / 40

68 Data warehouse refreshment Detect changes in external data sources: Different monitoring techniques: external and intrusive techniques. Snapshot vs. timestamped sources Queryable, logged, and replicated sources 18 / 40

69 Data warehouse refreshment Detect changes in external data sources: Different monitoring techniques: external and intrusive techniques. Snapshot vs. timestamped sources Queryable, logged, and replicated sources Callback and internal action sources 18 / 40

70 Data warehouse refreshment Detect changes in external data sources: Different monitoring techniques: external and intrusive techniques. Snapshot vs. timestamped sources Queryable, logged, and replicated sources Callback and internal action sources Extract the changes and integrate into the warehouse. 18 / 40

71 Data warehouse refreshment Detect changes in external data sources: Different monitoring techniques: external and intrusive techniques. Snapshot vs. timestamped sources Queryable, logged, and replicated sources Callback and internal action sources Extract the changes and integrate into the warehouse. Update indexes, subaggregates and any other additional data structures. 18 / 40

72 Data warehouse refreshment Detect changes in external data sources: Different monitoring techniques: external and intrusive techniques. Snapshot vs. timestamped sources Queryable, logged, and replicated sources Callback and internal action sources Extract the changes and integrate into the warehouse. Update indexes, subaggregates and any other additional data structures. Types of refreshments: 18 / 40

73 Data warehouse refreshment Detect changes in external data sources: Different monitoring techniques: external and intrusive techniques. Snapshot vs. timestamped sources Queryable, logged, and replicated sources Callback and internal action sources Extract the changes and integrate into the warehouse. Update indexes, subaggregates and any other additional data structures. Types of refreshments: Periodical refreshment (daily or weekly). 18 / 40

74 Data warehouse refreshment Detect changes in external data sources: Different monitoring techniques: external and intrusive techniques. Snapshot vs. timestamped sources Queryable, logged, and replicated sources Callback and internal action sources Extract the changes and integrate into the warehouse. Update indexes, subaggregates and any other additional data structures. Types of refreshments: Periodical refreshment (daily or weekly). Immediate refreshment. 18 / 40

75 Data warehouse refreshment Detect changes in external data sources: Different monitoring techniques: external and intrusive techniques. Snapshot vs. timestamped sources Queryable, logged, and replicated sources Callback and internal action sources Extract the changes and integrate into the warehouse. Update indexes, subaggregates and any other additional data structures. Types of refreshments: Periodical refreshment (daily or weekly). Immediate refreshment. Determined by usage, types of data source, etc. 18 / 40

76 Outline 1 Motivation 2 ETL 3 OLAP Systems 4 Summary 19 / 40

77 OLAP systems The next step is to provide solutions for querying and reporting multidimensional analytical data. 20 / 40

78 Multidimensional cube The proper data model for multidimensional reporting is the multidimensional one. 21 / 40

79 Operations in multidimensional data model Roll up summarize data along a dimension hierarchy. Drill down go from higher level summary to lower level summary or detailed data. Slice and dice corresponds to selection and projection. Pivot reorient cube. Raking, Time functions, etc. 22 / 40

80 Lattice of cuboids Different degrees of summarizations are presented as a lattice of cuboids. Example for dimensions: time, product, location, supplier {all} {time} {product} {location} {supplier} {time, product} {time, location} {time, supplier} {product, location} {product, supplier} {location, supplier} {time, product, location} {time, product, supplier} {time, location, supplier} location, {product, supplier} {time,product,location,supplier} Using this structure, one can easily show roll up and drill down operations. 23 / 40

81 Total number of cuboids For an n-dimensional data cube, the total number of cuboids that can be generated is: n T = (L i + 1), i=1 where L i is the number of levels associated with dimension i (excluding the virtual top level all since generalizing to all is equivalent to the removal of a dimension). For example, if the cube has 10 dimensions and each dimension has 4 levels, the total number of cuboids that can be generated will be: T = 5 10 = 9, / 40

82 Total number of cuboids Example: Consider a simple database with two dimensions: 25 / 40

83 Total number of cuboids Example: Consider a simple database with two dimensions: Columns in Date dimension: day, month, year Columns in Localization dimension: street, city, country. Without any information about hierarchies, the number of all possible group-bys is 25 / 40

84 Total number of cuboids Example: Consider a simple database with two dimensions: Columns in Date dimension: day, month, year Columns in Localization dimension: street, city, country. Without any information about hierarchies, the number of all possible group-bys is 2 6 : 25 / 40

85 Total number of cuboids Example: Consider a simple database with two dimensions: Columns in Date dimension: day, month, year Columns in Localization dimension: street, city, country. Without any information about hierarchies, the number of all possible group-bys is 2 6 : day street month city year country day, month street, city day, year street, country month, year city, country day, month, year street, city, country 25 / 40

86 Total number of cuboids Example: Consider the same relations but with defined hierarchies: 26 / 40

87 Total number of cuboids Example: Consider the same relations but with defined hierarchies: day month year street city country 26 / 40

88 Total number of cuboids Example: Consider the same relations but with defined hierarchies: day month year street city country Many combinations of columns can be excluded, e.g., group by day, year, street, country. The number of group-bys is then 26 / 40

89 Total number of cuboids Example: Consider the same relations but with defined hierarchies: day month year street city country Many combinations of columns can be excluded, e.g., group by day, year, street, country. The number of group-bys is then 4 2 : 26 / 40

90 Total number of cuboids Example: Consider the same relations but with defined hierarchies: day month year street city country Many combinations of columns can be excluded, e.g., group by day, year, street, country. The number of group-bys is then 4 2 : year country month, year city, country day, month, year street, city, country 26 / 40

91 Three types of aggregate functions distributive: count(), sum(), max(), min(), algebraic: ave(), stddev(), holistic: median(), mode(), rank(). 27 / 40

92 OLAP servers Relational OLAP (ROLAP), Multidimensional OLAP (MOLAP), Hybrid OLAP (HOLAP). 28 / 40

93 ROLAP ROLAP servers use a relational or post-relational database management system to store and manage warehouse data. 29 / 40

94 ROLAP ROLAP servers use a relational or post-relational database management system to store and manage warehouse data. ROLAP systems use SQL and its OLAP extensions. 29 / 40

95 ROLAP ROLAP servers use a relational or post-relational database management system to store and manage warehouse data. ROLAP systems use SQL and its OLAP extensions. Optimization techniques: 29 / 40

96 ROLAP ROLAP servers use a relational or post-relational database management system to store and manage warehouse data. ROLAP systems use SQL and its OLAP extensions. Optimization techniques: Denormalization, 29 / 40

97 ROLAP ROLAP servers use a relational or post-relational database management system to store and manage warehouse data. ROLAP systems use SQL and its OLAP extensions. Optimization techniques: Denormalization, Materialized views, 29 / 40

98 ROLAP ROLAP servers use a relational or post-relational database management system to store and manage warehouse data. ROLAP systems use SQL and its OLAP extensions. Optimization techniques: Denormalization, Materialized views, Partitioning, 29 / 40

99 ROLAP ROLAP servers use a relational or post-relational database management system to store and manage warehouse data. ROLAP systems use SQL and its OLAP extensions. Optimization techniques: Denormalization, Materialized views, Partitioning, Joins, 29 / 40

100 ROLAP ROLAP servers use a relational or post-relational database management system to store and manage warehouse data. ROLAP systems use SQL and its OLAP extensions. Optimization techniques: Denormalization, Materialized views, Partitioning, Joins, Indexes (join index, bitmaps), 29 / 40

101 ROLAP ROLAP servers use a relational or post-relational database management system to store and manage warehouse data. ROLAP systems use SQL and its OLAP extensions. Optimization techniques: Denormalization, Materialized views, Partitioning, Joins, Indexes (join index, bitmaps), Query processing. 29 / 40

102 ROLAP Advantages of ROLAP Servers: 30 / 40

103 ROLAP Advantages of ROLAP Servers: Scalable with respect to the number of dimensions, 30 / 40

104 ROLAP Advantages of ROLAP Servers: Scalable with respect to the number of dimensions, Scalable with respect to the size of data, 30 / 40

105 ROLAP Advantages of ROLAP Servers: Scalable with respect to the number of dimensions, Scalable with respect to the size of data, Sparsity is not a problem (fact tables contain only facts), 30 / 40

106 ROLAP Advantages of ROLAP Servers: Scalable with respect to the number of dimensions, Scalable with respect to the size of data, Sparsity is not a problem (fact tables contain only facts), Mature and well-developed technology. 30 / 40

107 ROLAP Advantages of ROLAP Servers: Scalable with respect to the number of dimensions, Scalable with respect to the size of data, Sparsity is not a problem (fact tables contain only facts), Mature and well-developed technology. Disadvantage of ROLAP Servers: 30 / 40

108 ROLAP Advantages of ROLAP Servers: Scalable with respect to the number of dimensions, Scalable with respect to the size of data, Sparsity is not a problem (fact tables contain only facts), Mature and well-developed technology. Disadvantage of ROLAP Servers: Worse performance than MOLAP, 30 / 40

109 ROLAP Advantages of ROLAP Servers: Scalable with respect to the number of dimensions, Scalable with respect to the size of data, Sparsity is not a problem (fact tables contain only facts), Mature and well-developed technology. Disadvantage of ROLAP Servers: Worse performance than MOLAP, Additional data structures and optimization techniques used to improve the performance. 30 / 40

110 MOLAP MOLAP Servers use array-based multidimensional storage engines. 31 / 40

111 MOLAP MOLAP Servers use array-based multidimensional storage engines. Optimization techniques: 31 / 40

112 MOLAP MOLAP Servers use array-based multidimensional storage engines. Optimization techniques: Two-level storage representation: dense cubes are identified and stored as array structures, sparse cubes employ compression techniques, 31 / 40

113 MOLAP MOLAP Servers use array-based multidimensional storage engines. Optimization techniques: Two-level storage representation: dense cubes are identified and stored as array structures, sparse cubes employ compression techniques, Materialized cubes. 31 / 40

114 MOLAP Advantages of MOLAP Servers: 32 / 40

115 MOLAP Advantages of MOLAP Servers: Multidimensional views are directly mapped to data cube array structures efficient access to data, 32 / 40

116 MOLAP Advantages of MOLAP Servers: Multidimensional views are directly mapped to data cube array structures efficient access to data, Can easily store subaggregates. 32 / 40

117 MOLAP Advantages of MOLAP Servers: Multidimensional views are directly mapped to data cube array structures efficient access to data, Can easily store subaggregates. Disadvantages of MOLAP Servers: 32 / 40

118 MOLAP Advantages of MOLAP Servers: Multidimensional views are directly mapped to data cube array structures efficient access to data, Can easily store subaggregates. Disadvantages of MOLAP Servers: Scalability problem in the case of larger number of dimensions, 32 / 40

119 MOLAP Advantages of MOLAP Servers: Multidimensional views are directly mapped to data cube array structures efficient access to data, Can easily store subaggregates. Disadvantages of MOLAP Servers: Scalability problem in the case of larger number of dimensions, Not tailored for sparse data. 32 / 40

120 MOLAP Advantages of MOLAP Servers: Multidimensional views are directly mapped to data cube array structures efficient access to data, Can easily store subaggregates. Disadvantages of MOLAP Servers: Scalability problem in the case of larger number of dimensions, Not tailored for sparse data. Example: 32 / 40

121 MOLAP Advantages of MOLAP Servers: Multidimensional views are directly mapped to data cube array structures efficient access to data, Can easily store subaggregates. Disadvantages of MOLAP Servers: Scalability problem in the case of larger number of dimensions, Not tailored for sparse data. Example: Logical model consists of four dimensions: customer, product, location, and day 32 / 40

122 MOLAP Advantages of MOLAP Servers: Multidimensional views are directly mapped to data cube array structures efficient access to data, Can easily store subaggregates. Disadvantages of MOLAP Servers: Scalability problem in the case of larger number of dimensions, Not tailored for sparse data. Example: Logical model consists of four dimensions: customer, product, location, and day In case of customers, products, locations and days, the data cube will contain cells! 32 / 40

123 MOLAP Advantages of MOLAP Servers: Multidimensional views are directly mapped to data cube array structures efficient access to data, Can easily store subaggregates. Disadvantages of MOLAP Servers: Scalability problem in the case of larger number of dimensions, Not tailored for sparse data. Example: Logical model consists of four dimensions: customer, product, location, and day In case of customers, products, locations and days, the data cube will contain cells! A huge number of cells is empty: a customer is not able to buy all products in all locations / 40

124 HOLAP HOLAP servers are a hybrid approach that combines ROLAP and MOLAP technology. HOLAP benefits from the greater scalability of ROLAP and the faster computation of MOLAP. 33 / 40

125 Multidimensional queries We need an intuitive way of expressing analytical (multidimensional) queries: 34 / 40

126 Multidimensional queries We need an intuitive way of expressing analytical (multidimensional) queries: Operations like roll up, drill down, slice and dice, pivoting, ranking, time and window functions, etc. 34 / 40

127 Multidimensional queries We need an intuitive way of expressing analytical (multidimensional) queries: Operations like roll up, drill down, slice and dice, pivoting, ranking, time and window functions, etc. Two solutions: 34 / 40

128 Multidimensional queries We need an intuitive way of expressing analytical (multidimensional) queries: Operations like roll up, drill down, slice and dice, pivoting, ranking, time and window functions, etc. Two solutions: Extending SQL, or 34 / 40

129 Multidimensional queries We need an intuitive way of expressing analytical (multidimensional) queries: Operations like roll up, drill down, slice and dice, pivoting, ranking, time and window functions, etc. Two solutions: Extending SQL, or Inventing a new language ( MDX), 34 / 40

130 SQL OLAP extensions in SQL: GROUP BY CUBE, GROUP BY ROLLUP, GROUP BY GROUPING SETS, GROUPING and DECODE/CASE OVER and PARTITION BY, RANK. 35 / 40

131 MDX MDX Multidimensional expressions. 36 / 40

132 MDX MDX Multidimensional expressions. For OLAP queries, MDX is an alternative to SQL: 36 / 40

133 MDX MDX Multidimensional expressions. For OLAP queries, MDX is an alternative to SQL: 36 / 40

134 MDX MDX Multidimensional expressions. For OLAP queries, MDX is an alternative to SQL: Academic year Instructor AVG(Grade) 2011/2 Stefanowski /2 S lowiński /3 Stefanowski /3 S lowiński /4 Stefanowski /4 S lowiński /4 Dembczyński / 40

135 MDX MDX Multidimensional expressions. For OLAP queries, MDX is an alternative to SQL: Academic year Instructor AVG(Grade) 2011/2 Stefanowski /2 S lowiński /3 Stefanowski /3 S lowiński /4 Stefanowski /4 S lowiński /4 Dembczyński 4.8 AVG(Grade) Academic year Name 2011/2 2012/3 2013/4 Stefanowski S lowiński Dembczyński / 40

136 MDX MDX query: SELECT {[Time].[1997],[Time].[1998]} ON COLUMNS, {[Measures].[Sales],[Measures].[Cost]} ON ROWS FROM Warehouse WHERE ([Store].[All].[USA]) Seems to be similar to SQL, but in fact it is quite different! 37 / 40

137 Outline 1 Motivation 2 ETL 3 OLAP Systems 4 Summary 38 / 40

138 Summary ETL process is a strategic element of data warehousing. Main concepts: extraction, transformation and integration, load, data warehouse refreshment and metadata. New emerging technology... OLAP systems: ROLAP, MOLAP and HOLAP. 39 / 40

139 Bibliography J. Han and M. Kamber. Data Mining: Concepts and Techniques (second edition). Morgan Kaufmann Publishers, / 40

Multidimensional Queries

Multidimensional Queries Multidimensional Queries Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first semester

More information

OLAP Systems and Multidimensional Expressions

OLAP Systems and Multidimensional Expressions OLAP Systems and Multidimensional Expressions Krzysztof Dembczyński Institute of Computing Science Laboratory of Intelligent Decision Support Systems Politechnika Poznańska (Poznań University of Technology)

More information

Evolution of Database Systems

Evolution of Database Systems Evolution of Database Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Intelligent Decision Support Systems Master studies, second

More information

Processing of Very Large Data

Processing of Very Large Data Processing of Very Large Data Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first

More information

Data Warehouse and Data Mining

Data Warehouse and Data Mining Data Warehouse and Data Mining Lecture No. 04-06 Data Warehouse Architecture Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology

More information

OLAP Introduction and Overview

OLAP Introduction and Overview 1 CHAPTER 1 OLAP Introduction and Overview What Is OLAP? 1 Data Storage and Access 1 Benefits of OLAP 2 What Is a Cube? 2 Understanding the Cube Structure 3 What Is SAS OLAP Server? 3 About Cube Metadata

More information

Database design View Access patterns Need for separate data warehouse:- A multidimensional data model:-

Database design View Access patterns Need for separate data warehouse:- A multidimensional data model:- UNIT III: Data Warehouse and OLAP Technology: An Overview : What Is a Data Warehouse? A Multidimensional Data Model, Data Warehouse Architecture, Data Warehouse Implementation, From Data Warehousing to

More information

DATA WAREHOUSE EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY

DATA WAREHOUSE EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY DATA WAREHOUSE EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY CHARACTERISTICS Data warehouse is a central repository for summarized and integrated data

More information

Call: SAS BI Course Content:35-40hours

Call: SAS BI Course Content:35-40hours SAS BI Course Content:35-40hours Course Outline SAS Data Integration Studio 4.2 Introduction * to SAS DIS Studio Features of SAS DIS Studio Tasks performed by SAS DIS Studio Navigation to SAS DIS Studio

More information

An Overview of Data Warehousing and OLAP Technology

An Overview of Data Warehousing and OLAP Technology An Overview of Data Warehousing and OLAP Technology CMPT 843 Karanjit Singh Tiwana 1 Intro and Architecture 2 What is Data Warehouse? Subject-oriented, integrated, time varying, non-volatile collection

More information

Lectures for the course: Data Warehousing and Data Mining (IT 60107)

Lectures for the course: Data Warehousing and Data Mining (IT 60107) Lectures for the course: Data Warehousing and Data Mining (IT 60107) Week 1 Lecture 1 21/07/2011 Introduction to the course Pre-requisite Expectations Evaluation Guideline Term Paper and Term Project Guideline

More information

collection of data that is used primarily in organizational decision making.

collection of data that is used primarily in organizational decision making. Data Warehousing A data warehouse is a special purpose database. Classic databases are generally used to model some enterprise. Most often they are used to support transactions, a process that is referred

More information

Data Warehouse and Data Mining

Data Warehouse and Data Mining Data Warehouse and Data Mining Lecture No. 03 Architecture of DW Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro Basic

More information

Chapter 18: Data Analysis and Mining

Chapter 18: Data Analysis and Mining Chapter 18: Data Analysis and Mining Database System Concepts See www.db-book.com for conditions on re-use Chapter 18: Data Analysis and Mining Decision Support Systems Data Analysis and OLAP 18.2 Decision

More information

Data Warehousing Conclusion. Esteban Zimányi Slides by Toon Calders

Data Warehousing Conclusion. Esteban Zimányi Slides by Toon Calders Data Warehousing Conclusion Esteban Zimányi ezimanyi@ulb.ac.be Slides by Toon Calders Motivation for the Course Database = a piece of software to handle data: Store, maintain, and query Most ideal system

More information

REPORTING AND QUERY TOOLS AND APPLICATIONS

REPORTING AND QUERY TOOLS AND APPLICATIONS Tool Categories: REPORTING AND QUERY TOOLS AND APPLICATIONS There are five categories of decision support tools Reporting Managed query Executive information system OLAP Data Mining Reporting Tools Production

More information

CT75 DATA WAREHOUSING AND DATA MINING DEC 2015

CT75 DATA WAREHOUSING AND DATA MINING DEC 2015 Q.1 a. Briefly explain data granularity with the help of example Data Granularity: The single most important aspect and issue of the design of the data warehouse is the issue of granularity. It refers

More information

Chapter 13 Business Intelligence and Data Warehouses The Need for Data Analysis Business Intelligence. Objectives

Chapter 13 Business Intelligence and Data Warehouses The Need for Data Analysis Business Intelligence. Objectives Chapter 13 Business Intelligence and Data Warehouses Objectives In this chapter, you will learn: How business intelligence is a comprehensive framework to support business decision making How operational

More information

IT DATA WAREHOUSING AND DATA MINING UNIT-2 BUSINESS ANALYSIS

IT DATA WAREHOUSING AND DATA MINING UNIT-2 BUSINESS ANALYSIS PART A 1. What are production reporting tools? Give examples. (May/June 2013) Production reporting tools will let companies generate regular operational reports or support high-volume batch jobs. Such

More information

What is a Data Warehouse?

What is a Data Warehouse? What is a Data Warehouse? COMP 465 Data Mining Data Warehousing Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Defined in many different ways,

More information

CHAPTER 8 DECISION SUPPORT V2 ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI

CHAPTER 8 DECISION SUPPORT V2 ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI CHAPTER 8 DECISION SUPPORT V2 ADVANCED DATABASE SYSTEMS Assist. Prof. Dr. Volkan TUNALI Topics 2 Business Intelligence (BI) Decision Support System (DSS) Data Warehouse Online Analytical Processing (OLAP)

More information

Deccansoft Software Services Microsoft Silver Learning Partner. SSAS Syllabus

Deccansoft Software Services Microsoft Silver Learning Partner. SSAS Syllabus Overview: Analysis Services enables you to analyze large quantities of data. With it, you can design, create, and manage multidimensional structures that contain detail and aggregated data from multiple

More information

Data Mining Concepts & Techniques

Data Mining Concepts & Techniques Data Mining Concepts & Techniques Lecture No. 01 Databases, Data warehouse Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro

More information

Chapter 4, Data Warehouse and OLAP Operations

Chapter 4, Data Warehouse and OLAP Operations CSI 4352, Introduction to Data Mining Chapter 4, Data Warehouse and OLAP Operations Young-Rae Cho Associate Professor Department of Computer Science Baylor University CSI 4352, Introduction to Data Mining

More information

Data Warehousing and Decision Support. Introduction. Three Complementary Trends. [R&G] Chapter 23, Part A

Data Warehousing and Decision Support. Introduction. Three Complementary Trends. [R&G] Chapter 23, Part A Data Warehousing and Decision Support [R&G] Chapter 23, Part A CS 432 1 Introduction Increasingly, organizations are analyzing current and historical data to identify useful patterns and support business

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 07 : 06/11/2012 Data Mining: Concepts and Techniques (3 rd ed.) Chapter

More information

BUSINESS INTELLIGENCE. SSAS - SQL Server Analysis Services. Business Informatics Degree

BUSINESS INTELLIGENCE. SSAS - SQL Server Analysis Services. Business Informatics Degree BUSINESS INTELLIGENCE SSAS - SQL Server Analysis Services Business Informatics Degree 2 BI Architecture SSAS: SQL Server Analysis Services 3 It is both an OLAP Server and a Data Mining Server Distinct

More information

OBIEE Performance Improvement Tips and Techniques

OBIEE Performance Improvement Tips and Techniques OBIEE Performance Improvement Tips and Techniques Vivek Jain, Manager Deloitte Speaker Bio Manager with Deloitte Consulting, Information Management (BI/DW) Skills in OBIEE, OLAP, RTD, Spatial / MapViewer,

More information

Data Warehousing and Decision Support

Data Warehousing and Decision Support Data Warehousing and Decision Support Chapter 23, Part A Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1 Introduction Increasingly, organizations are analyzing current and historical

More information

CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP)

CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP) CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP) INTRODUCTION A dimension is an attribute within a multidimensional model consisting of a list of values (called members). A fact is defined by a combination

More information

Data Warehousing. Ritham Vashisht, Sukhdeep Kaur and Shobti Saini

Data Warehousing. Ritham Vashisht, Sukhdeep Kaur and Shobti Saini Advance in Electronic and Electric Engineering. ISSN 2231-1297, Volume 3, Number 6 (2013), pp. 669-674 Research India Publications http://www.ripublication.com/aeee.htm Data Warehousing Ritham Vashisht,

More information

Unit 7: Basics in MS Power BI for Excel 2013 M7-5: OLAP

Unit 7: Basics in MS Power BI for Excel 2013 M7-5: OLAP Unit 7: Basics in MS Power BI for Excel M7-5: OLAP Outline: Introduction Learning Objectives Content Exercise What is an OLAP Table Operations: Drill Down Operations: Roll Up Operations: Slice Operations:

More information

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores Announcements Shumo office hours change See website for details HW2 due next Thurs

More information

Data Warehousing and Decision Support

Data Warehousing and Decision Support Data Warehousing and Decision Support [R&G] Chapter 23, Part A CS 4320 1 Introduction Increasingly, organizations are analyzing current and historical data to identify useful patterns and support business

More information

Data Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1394

Data Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1394 Data Mining Data warehousing Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 31 Table of contents 1 Introduction 2 Data warehousing

More information

Data Warehouse. Asst.Prof.Dr. Pattarachai Lalitrojwong

Data Warehouse. Asst.Prof.Dr. Pattarachai Lalitrojwong Data Warehouse Asst.Prof.Dr. Pattarachai Lalitrojwong Faculty of Information Technology King Mongkut s Institute of Technology Ladkrabang Bangkok 10520 pattarachai@it.kmitl.ac.th The Evolution of Data

More information

Data Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1396

Data Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1396 Data Mining Data warehousing Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1396 1 / 31 Table of contents 1 Introduction 2 Data warehousing

More information

Sql Fact Constellation Schema In Data Warehouse With Example

Sql Fact Constellation Schema In Data Warehouse With Example Sql Fact Constellation Schema In Data Warehouse With Example Data Warehouse OLAP - Learn Data Warehouse in simple and easy steps using Multidimensional OLAP (MOLAP), Hybrid OLAP (HOLAP), Specialized SQL

More information

SQL Server Analysis Services

SQL Server Analysis Services DataBase and Data Mining Group of DataBase and Data Mining Group of Database and data mining group, SQL Server 2005 Analysis Services SQL Server 2005 Analysis Services - 1 Analysis Services Database and

More information

Improving the Performance of OLAP Queries Using Families of Statistics Trees

Improving the Performance of OLAP Queries Using Families of Statistics Trees Improving the Performance of OLAP Queries Using Families of Statistics Trees Joachim Hammer Dept. of Computer and Information Science University of Florida Lixin Fu Dept. of Mathematical Sciences University

More information

DATA WAREHOUING UNIT I

DATA WAREHOUING UNIT I BHARATHIDASAN ENGINEERING COLLEGE NATTRAMAPALLI DEPARTMENT OF COMPUTER SCIENCE SUB CODE & NAME: IT6702/DWDM DEPT: IT Staff Name : N.RAMESH DATA WAREHOUING UNIT I 1. Define data warehouse? NOV/DEC 2009

More information

Adnan YAZICI Computer Engineering Department

Adnan YAZICI Computer Engineering Department Data Warehouse Adnan YAZICI Computer Engineering Department Middle East Technical University, A.Yazici, 2010 Definition A data warehouse is a subject-oriented integrated time-variant nonvolatile collection

More information

Data Warehousing ETL. Esteban Zimányi Slides by Toon Calders

Data Warehousing ETL. Esteban Zimányi Slides by Toon Calders Data Warehousing ETL Esteban Zimányi ezimanyi@ulb.ac.be Slides by Toon Calders 1 Overview Picture other sources Metadata Monitor & Integrator OLAP Server Analysis Operational DBs Extract Transform Load

More information

Tribhuvan University Institute of Science and Technology MODEL QUESTION

Tribhuvan University Institute of Science and Technology MODEL QUESTION MODEL QUESTION 1. Suppose that a data warehouse for Big University consists of four dimensions: student, course, semester, and instructor, and two measures count and avg-grade. When at the lowest conceptual

More information

Data Partitioning and MapReduce

Data Partitioning and MapReduce Data Partitioning and MapReduce Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Intelligent Decision Support Systems Master studies,

More information

Proceedings of the IE 2014 International Conference AGILE DATA MODELS

Proceedings of the IE 2014 International Conference  AGILE DATA MODELS AGILE DATA MODELS Mihaela MUNTEAN Academy of Economic Studies, Bucharest mun61mih@yahoo.co.uk, Mihaela.Muntean@ie.ase.ro Abstract. In last years, one of the most popular subjects related to the field of

More information

Data Warehouse and Data Mining

Data Warehouse and Data Mining Data Warehouse and Data Mining Lecture No. 02 Lifecycle of Data warehouse Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro

More information

CS614 - Data Warehousing - Midterm Papers Solved MCQ(S) (1 TO 22 Lectures)

CS614 - Data Warehousing - Midterm Papers Solved MCQ(S) (1 TO 22 Lectures) CS614- Data Warehousing Solved MCQ(S) From Midterm Papers (1 TO 22 Lectures) BY Arslan Arshad Nov 21,2016 BS110401050 BS110401050@vu.edu.pk Arslan.arshad01@gmail.com AKMP01 CS614 - Data Warehousing - Midterm

More information

Acknowledgment. MTAT Data Mining. Week 7: Online Analytical Processing and Data Warehouses. Typical Data Analysis Process.

Acknowledgment. MTAT Data Mining. Week 7: Online Analytical Processing and Data Warehouses. Typical Data Analysis Process. MTAT.03.183 Data Mining Week 7: Online Analytical Processing and Data Warehouses Marlon Dumas marlon.dumas ät ut. ee Acknowledgment This slide deck is a mashup of the following publicly available slide

More information

Data Warehousing & OLAP

Data Warehousing & OLAP Data Warehousing & OLAP Data Mining: Concepts and Techniques Chapter 3 Jiawei Han and An Introduction to Database Systems C.J.Date, Eighth Eddition, Addidon Wesley, 4 1 What is Data Warehousing? What is

More information

CS 1655 / Spring 2013! Secure Data Management and Web Applications

CS 1655 / Spring 2013! Secure Data Management and Web Applications CS 1655 / Spring 2013 Secure Data Management and Web Applications 03 Data Warehousing Alexandros Labrinidis University of Pittsburgh What is a Data Warehouse A data warehouse: archives information gathered

More information

Introduction to DWH / BI Concepts

Introduction to DWH / BI Concepts SAS INTELLIGENCE PLATFORM CURRICULUM SAS INTELLIGENCE PLATFORM BI TOOLS 4.2 VERSION SAS BUSINESS INTELLIGENCE TOOLS - COURSE OUTLINE Practical Project Based Training & Implementation on all the BI Tools

More information

Interdepartmental Programme of Postgraduate Studies in Information Systems, University of Macedonia MASTER THESIS

Interdepartmental Programme of Postgraduate Studies in Information Systems, University of Macedonia MASTER THESIS Interdepartmental Programme of Postgraduate Studies in Information Systems, University of Macedonia MASTER THESIS ANALYSIS AND PROCESS OF HIERARCHICAL SEMI-STRUCTURED DATA George N. Tsilinikos Submitted

More information

CSPP 53017: Data Warehousing Winter 2013! Lecture 7! Svetlozar Nestorov! Class News!

CSPP 53017: Data Warehousing Winter 2013! Lecture 7! Svetlozar Nestorov! Class News! CSPP 53017: Data Warehousing Winter 2013! Lecture 7! Svetlozar Nestorov! Class News! Make-up class on Saturday, Mar 9 in Gleacher 203 10:30am 1:30pm.! Last 15 minute in-class quiz (6:30pm) on Mar 5.! Covers

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 14 : 18/11/2014 Data Mining: Concepts and Techniques (3 rd ed.) Chapter

More information

OLAP2 outline. Multi Dimensional Data Model. A Sample Data Cube

OLAP2 outline. Multi Dimensional Data Model. A Sample Data Cube OLAP2 outline Multi Dimensional Data Model Need for Multi Dimensional Analysis OLAP Operators Data Cube Demonstration Using SQL Multi Dimensional Data Model Multi dimensional analysis is a popular approach

More information

Basics of Dimensional Modeling

Basics of Dimensional Modeling Basics of Dimensional Modeling Data warehouse and OLAP tools are based on a dimensional data model. A dimensional model is based on dimensions, facts, cubes, and schemas such as star and snowflake. Dimension

More information

Data Warehouses. Yanlei Diao. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Data Warehouses. Yanlei Diao. Slides Courtesy of R. Ramakrishnan and J. Gehrke Data Warehouses Yanlei Diao Slides Courtesy of R. Ramakrishnan and J. Gehrke Introduction v In the late 80s and early 90s, companies began to use their DBMSs for complex, interactive, exploratory analysis

More information

Data Warehousing 2. ICS 421 Spring Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Data Warehousing 2. ICS 421 Spring Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa ICS 421 Spring 2010 Data Warehousing 2 Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 3/30/2010 Lipyeow Lim -- University of Hawaii at Manoa 1 Data Warehousing

More information

IDU0010 ERP,CRM ja DW süsteemid Loeng 5 DW concepts. Enn Õunapuu

IDU0010 ERP,CRM ja DW süsteemid Loeng 5 DW concepts. Enn Õunapuu IDU0010 ERP,CRM ja DW süsteemid Loeng 5 DW concepts Enn Õunapuu enn.ounapuu@ttu.ee Content Oveall approach Dimensional model Tabular model Overall approach Data modeling is a discipline that has been practiced

More information

Decision Support Systems

Decision Support Systems Decision Support Systems 2011/2012 Week 3. Lecture 6 Previous Class Dimensions & Measures Dimensions: Item Time Loca0on Measures: Quan0ty Sales TransID ItemName ItemID Date Store Qty T0001 Computer I23

More information

Advanced Data Management Technologies Written Exam

Advanced Data Management Technologies Written Exam Advanced Data Management Technologies Written Exam 02.02.2016 First name Student number Last name Signature Instructions for Students Write your name, student number, and signature on the exam sheet. This

More information

Decision Support Systems aka Analytical Systems

Decision Support Systems aka Analytical Systems Decision Support Systems aka Analytical Systems Decision Support Systems Systems that are used to transform data into information, to manage the organization: OLAP vs OLTP OLTP vs OLAP Transactions Analysis

More information

Teradata Aggregate Designer

Teradata Aggregate Designer Data Warehousing Teradata Aggregate Designer By: Sam Tawfik Product Marketing Manager Teradata Corporation Table of Contents Executive Summary 2 Introduction 3 Problem Statement 3 Implications of MOLAP

More information

Data Warehouses and OLAP. Database and Information Systems. Data Warehouses and OLAP. Data Warehouses and OLAP

Data Warehouses and OLAP. Database and Information Systems. Data Warehouses and OLAP. Data Warehouses and OLAP Database and Information Systems 11. Deductive Databases 12. Data Warehouses and OLAP 13. Index Structures for Similarity Queries 14. Data Mining 15. Semi-Structured Data 16. Document Retrieval 17. Web

More information

A Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective

A Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective A Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective B.Manivannan Research Scholar, Dept. Computer Science, Dravidian University, Kuppam, Andhra Pradesh, India

More information

CS 245: Database System Principles. Warehousing. Outline. What is a Warehouse? What is a Warehouse? Notes 13: Data Warehousing

CS 245: Database System Principles. Warehousing. Outline. What is a Warehouse? What is a Warehouse? Notes 13: Data Warehousing Recall : Database System Principles Notes 3: Data Warehousing Three approaches to information integration: Federated databases did teaser Data warehousing next Mediation Hector Garcia-Molina (Some modifications

More information

Data Warehousing & OLAP

Data Warehousing & OLAP CMPUT 391 Database Management Systems Data Warehousing & OLAP Textbook: 17.1 17.5 (first edition: 19.1 19.5) Based on slides by Lewis, Bernstein and Kifer and other sources University of Alberta 1 Why

More information

Create Cube From Star Schema Grouping Framework Manager

Create Cube From Star Schema Grouping Framework Manager Create Cube From Star Schema Grouping Framework Manager Create star schema groupings to provide authors with logical groupings of query Connect to an OLAP data source (cube) in a Framework Manager project

More information

Decision Support. Chapter 25. CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 1

Decision Support. Chapter 25. CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 1 Decision Support Chapter 25 CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 1 Introduction Increasingly, organizations are analyzing current and historical data to identify useful patterns and support

More information

Data Warehousing Introduction. Toon Calders

Data Warehousing Introduction. Toon Calders Data Warehousing Introduction Toon Calders toon.calders@ulb.ac.be Course Organization Lectures on Tuesday 14:00 and Friday 16:00 Check http://gehol.ulb.ac.be/ for room Most exercises in computer class

More information

This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing.

This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing. About the Tutorial A data warehouse is constructed by integrating data from multiple heterogeneous sources. It supports analytical reporting, structured and/or ad hoc queries and decision making. This

More information

Enterprise Informatization LECTURE

Enterprise Informatization LECTURE Enterprise Informatization LECTURE Piotr Zabawa, PhD. Eng. IBM/Rational Certified Consultant e-mail: pzabawa@pk.edu.pl www: http://www.pk.edu.pl/~pzabawa/en 07.10.2011 Lecture 5 Analytical tools in business

More information

CSE 544 Principles of Database Management Systems. Fall 2016 Lecture 14 - Data Warehousing and Column Stores

CSE 544 Principles of Database Management Systems. Fall 2016 Lecture 14 - Data Warehousing and Column Stores CSE 544 Principles of Database Management Systems Fall 2016 Lecture 14 - Data Warehousing and Column Stores References Data Cube: A Relational Aggregation Operator Generalizing Group By, Cross-Tab, and

More information

InfoSphere Warehouse V9.5 Exam.

InfoSphere Warehouse V9.5 Exam. IBM 000-719 InfoSphere Warehouse V9.5 Exam TYPE: DEMO http://www.examskey.com/000-719.html Examskey IBM 000-719 exam demo product is here for you to test the quality of the product. This IBM 000-719 demo

More information

Summary of Last Chapter. Course Content. Chapter 2 Objectives. Data Warehouse and OLAP Outline. Incentive for a Data Warehouse

Summary of Last Chapter. Course Content. Chapter 2 Objectives. Data Warehouse and OLAP Outline. Incentive for a Data Warehouse Principles of Knowledge Discovery in bases Fall 1999 Chapter 2: Warehousing and Dr. Osmar R. Zaïane University of Alberta Dr. Osmar R. Zaïane, 1999 Principles of Knowledge Discovery in bases University

More information

Data warehouses Decision support The multidimensional model OLAP queries

Data warehouses Decision support The multidimensional model OLAP queries Data warehouses Decision support The multidimensional model OLAP queries Traditional DBMSs are used by organizations for maintaining data to record day to day operations On-line Transaction Processing

More information

Introduction to Data Warehousing

Introduction to Data Warehousing ICS 321 Spring 2012 Introduction to Data Warehousing Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 4/23/2012 Lipyeow Lim -- University of Hawaii at Manoa

More information

GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV

GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV Subject Name: Elective I Data Warehousing & Data Mining (DWDM) Subject Code: 2640005 Learning Objectives: To understand

More information

A Multi-Dimensional Data Model

A Multi-Dimensional Data Model A Multi-Dimensional Data Model A Data Warehouse is based on a Multidimensional data model which views data in the form of a data cube A data cube, such as sales, allows data to be modeled and viewed in

More information

DATA MINING AND WAREHOUSING

DATA MINING AND WAREHOUSING DATA MINING AND WAREHOUSING Qno Question Answer 1 Define data warehouse? Data warehouse is a subject oriented, integrated, time-variant, and nonvolatile collection of data that supports management's decision-making

More information

On-Line Analytical Processing (OLAP) Traditional OLTP

On-Line Analytical Processing (OLAP) Traditional OLTP On-Line Analytical Processing (OLAP) CSE 6331 / CSE 6362 Data Mining Fall 1999 Diane J. Cook Traditional OLTP DBMS used for on-line transaction processing (OLTP) order entry: pull up order xx-yy-zz and

More information

Data Warehousing Introduction. Esteban Zimanyi Slides from Toon Calders

Data Warehousing Introduction. Esteban Zimanyi Slides from Toon Calders Data Warehousing Introduction Esteban Zimanyi ezimanyi@ulb.ac.be Slides from Toon Calders Course Organization Lectures on Tuesday 14:00 and Thursday 16:00 Check http://gehol.ulb.ac.be/ for room Most exercises

More information

Aggregating Knowledge in a Data Warehouse and Multidimensional Analysis

Aggregating Knowledge in a Data Warehouse and Multidimensional Analysis Aggregating Knowledge in a Data Warehouse and Multidimensional Analysis Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.com Objectives Explain the basics of: 1. Data

More information

Syllabus. Syllabus. Motivation Decision Support. Syllabus

Syllabus. Syllabus. Motivation Decision Support. Syllabus Presentation: Sophia Discussion: Tianyu Metadata Requirements and Conclusion 3 4 Decision Support Decision Making: Everyday, Everywhere Decision Support System: a class of computerized information systems

More information

~ Ian Hunneybell: DWDM Revision Notes (31/05/2006) ~

~ Ian Hunneybell: DWDM Revision Notes (31/05/2006) ~ Useful reference: Microsoft Developer Network Library (http://msdn.microsoft.com/library). Drill down to Servers and Enterprise Development SQL Server SQL Server 2000 SDK Documentation Creating and Using

More information

Data Warehousing and Decision Support (mostly using Relational Databases) CS634 Class 20

Data Warehousing and Decision Support (mostly using Relational Databases) CS634 Class 20 Data Warehousing and Decision Support (mostly using Relational Databases) CS634 Class 20 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke, Chapter 25 Introduction Increasingly,

More information

Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services

Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services Course 6234A: Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services Course Details Course Outline Module 1: Introduction to Microsoft SQL Server Analysis Services This module introduces

More information

SAS 9.2 OLAP Server. User s Guide

SAS 9.2 OLAP Server. User s Guide SAS 9.2 OLAP Server User s Guide The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2009. SAS 9.2 OLAP Server: User s Guide. Cary, NC: SAS Institute Inc. SAS 9.2 OLAP

More information

Dr.G.R.Damodaran College of Science

Dr.G.R.Damodaran College of Science 1 of 20 8/28/2017 2:13 PM Dr.G.R.Damodaran College of Science (Autonomous, affiliated to the Bharathiar University, recognized by the UGC)Reaccredited at the 'A' Grade Level by the NAAC and ISO 9001:2008

More information

Data Warehousing & On-Line Analytical Processing

Data Warehousing & On-Line Analytical Processing Data Warehousing & On-Line Analytical Processing Erwin M. Bakker & Stefan Manegold https://homepages.cwi.nl/~manegold/dbdm/ http://liacs.leidenuniv.nl/~bakkerem2/dbdm/ s.manegold@liacs.leidenuniv.nl e.m.bakker@liacs.leidenuniv.nl

More information

UNIT -1 UNIT -II. Q. 4 Why is entity-relationship modeling technique not suitable for the data warehouse? How is dimensional modeling different?

UNIT -1 UNIT -II. Q. 4 Why is entity-relationship modeling technique not suitable for the data warehouse? How is dimensional modeling different? (Please write your Roll No. immediately) End-Term Examination Fourth Semester [MCA] MAY-JUNE 2006 Roll No. Paper Code: MCA-202 (ID -44202) Subject: Data Warehousing & Data Mining Note: Question no. 1 is

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 05(b) : 23/10/2012 Data Mining: Concepts and Techniques (3 rd ed.) Chapter

More information

Data Warehouse. Concepts and Techniques. Chapter 3. SS Chung. April 5, 2013 Data Mining: Concepts and Techniques 1

Data Warehouse. Concepts and Techniques. Chapter 3. SS Chung. April 5, 2013 Data Mining: Concepts and Techniques 1 Data Warehouse Concepts and Techniques Chapter 3 SS Chung April 5, 2013 Data Mining: Concepts and Techniques 1 Chapter 3: Data Warehousing and OLAP Technology: An Overview What is a data warehouse? A multi-dimensional

More information

COURSE 20466D: IMPLEMENTING DATA MODELS AND REPORTS WITH MICROSOFT SQL SERVER

COURSE 20466D: IMPLEMENTING DATA MODELS AND REPORTS WITH MICROSOFT SQL SERVER ABOUT THIS COURSE The focus of this five-day instructor-led course is on creating managed enterprise BI solutions. It describes how to implement multidimensional and tabular data models, deliver reports

More information

Data Warehousing and OLAP

Data Warehousing and OLAP Data Warehousing and OLAP INFO 330 Slides courtesy of Mirek Riedewald Motivation Large retailer Several databases: inventory, personnel, sales etc. High volume of updates Management requirements Efficient

More information

Analytical data bases Database lectures for math

Analytical data bases Database lectures for math Analytical data bases Database lectures for mathematics students May 14, 2017 Decision support systems From the perspective of the time span all decisions in the organization could be divided into three

More information

After completing this course, participants will be able to:

After completing this course, participants will be able to: Designing a Business Intelligence Solution by Using Microsoft SQL Server 2008 T h i s f i v e - d a y i n s t r u c t o r - l e d c o u r s e p r o v i d e s i n - d e p t h k n o w l e d g e o n d e s

More information

By Mahesh R. Sanghavi Associate professor, SNJB s KBJ CoE, Chandwad

By Mahesh R. Sanghavi Associate professor, SNJB s KBJ CoE, Chandwad By Mahesh R. Sanghavi Associate professor, SNJB s KBJ CoE, Chandwad All the content of these PPTs were taken from PPTS of renown author via internet. These PPTs are only mean to share the knowledge among

More information

Data Warehousing & On-line Analytical Processing

Data Warehousing & On-line Analytical Processing Data Warehousing & On-line Analytical Processing Erwin M. Bakker & Stefan Manegold https://homepages.cwi.nl/~manegold/dbdm/ http://liacs.leidenuniv.nl/~bakkerem2/dbdm/ Chapter 4: Data Warehousing and On-line

More information