Logical Design A logical design is conceptual and abstract. It is not necessary to deal with the physical implementation details at this stage.

2 Logical Design A logical design is conceptual and abstract. It is not necessary to deal with the physical implementation details at this stage. You need only define the types of information specified by your requirements. One technique you can use to model your logical information requirements is entity-relationship (ER) modeling. ER modeling involves identifying important data (entities), the properties of these entities (attributes), and how they are related to one another (relationships). For modeling purposes, an entity represents a chunk of information. In relational databases, an entity often maps to a table. An attribute is a component of an entity that helps define the uniqueness of the entity. In relational databases, an attribute maps to a column. To ensure that your data is consistent, you should use unique identifiers. A unique identifier is added to tables so that you can differentiate between occurrences of the same item when it appears in different places. In practice, this is usually a primary key. Although entity-relationship diagramming has traditionally been associated with highly normalized models such as OLTP applications, the technique is still useful for data warehouse design in the form of dimensional modeling. Every dimensional model (DM) is composed of one table with a composite primary key, called the fact table, and a set of smaller tables called dimension tables. In dimensional modeling, you identify which information belongs to a central fact table and which information belongs to its associated dimension tables.
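
The following is a minimal sketch, in Python, of how these logical pieces — one fact table whose composite primary key is made up of foreign keys, plus dimension tables with descriptive attributes — might be captured as plain data structures before any physical design work. The table and column names are illustrative assumptions, not taken from any particular schema.

```python
# A minimal sketch of a logical dimensional model, expressed as plain Python
# data structures. The entity and attribute names (sales, customers, products,
# times) are illustrative only.

fact_table = {
    "name": "sales",
    # Composite primary key made up of foreign keys to the dimensions.
    "keys": ["cust_id", "prod_id", "time_id"],
    # Numeric measures to be analyzed.
    "measures": ["quantity_sold", "amount"],
}

dimension_tables = {
    "customers": {"key": "cust_id", "attributes": ["name", "city", "segment"]},
    "products":  {"key": "prod_id", "attributes": ["name", "category"]},
    "times":     {"key": "time_id", "attributes": ["day", "month", "quarter", "year"]},
}

def describe(fact, dims):
    """Print the logical design: one fact table plus its dimension tables."""
    print(f"Fact table {fact['name']}: key={fact['keys']}, measures={fact['measures']}")
    for name, dim in dims.items():
        print(f"  Dimension {name}: key={dim['key']}, attributes={dim['attributes']}")

describe(fact_table, dimension_tables)
```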

3 Logical Design (continued) You identify business subjects or fields of data, define relationships between business subjects, and name the attributes for each subject. Your logical design should include: a set of entities and attributes corresponding to fact tables and dimension tables, and a model showing how operational data from your source systems maps into subject-oriented information in your target data warehouse schema. You can create the logical design using a pen and paper, or you can use a design tool such as Oracle Warehouse Builder (specifically designed for modeling the ETL process) or Oracle Designer (a general purpose modeling tool).

4 Data Warehousing Schemas A schema is a collection of database objects that includes tables, views, indexes, and synonyms. You can arrange schema objects in the schema models designed for data warehousing in a variety of ways. Most data warehouses use a dimensional model. The model of your source data and the requirements of your users help you design the data warehouse schema. You can sometimes get the source model from your enterprise data model and reverse-engineer the logical data model for the data warehouse from this. The physical implementation of the logical data warehouse model may require some changes to adapt it to your system parameters: size of machine, number of users, storage capacity, type of network, and software. A common data warehouse schema model is the star schema. However, there are other schema models that are commonly used for data warehouses. The most prevalent of these schema models is the third normal form (3NF) schema. The snowflake schema is a type of star schema, but slightly more complex. Additionally, some data warehouse schemas are neither star schemas nor 3NF schemas, but share characteristics of both; these are referred to as hybrid schema models. The important thing to remember when designing your schema is not to get lost in theory and academic comparisons. These days, most successful data warehouses employ a hybrid approach to schemas.

5 Schema Characteristics The star schema is perhaps the simplest data warehouse schema. It is called a star schema because the entity-relationship diagram of this schema resembles a star, with points radiating from a central table. A star schema is characterized by one or more very large fact tables that contain the primary information in the data warehouse, and a number of much smaller dimension tables (or lookup tables), each of which contains information about the entries for a particular attribute in the fact table. A star query is a join between a fact table and a number of dimension tables. Each dimension table is joined to the fact table using a primary key to foreign key join, but the dimension tables are not joined to each other. The cost-based optimizer recognizes star queries and generates efficient execution plans (star transformation) for them. Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data is grouped into multiple tables instead of one large table. For example, a product dimension table in a star schema may be normalized into a products table, a product_category table, and a product_manufacturer table in a snowflake schema. Although this saves space, it increases the number of dimension tables and requires more foreign key joins. This results in more complex queries and reduced query performance.

6 Schema Characteristics (continued) A third normal form (3NF) schema is a classical relational database modeling technique that minimizes data redundancy through normalization. A relation can be considered to be in 3NF if none of its non-primary-key attributes are duplicated in other tables. When compared to a star schema, a 3NF schema typically has a larger number of tables due to this normalization process. 3NF schemas are typically chosen for large data warehouses, especially environments with significant data-loading requirements that are used to feed data marts and execute long-running queries.

7 Star Schema Model Normalization is not always a good thing when dealing with large amounts of data. Although it is ideal for data updates, inserts, deletes, and integrity, it can slow down processing. To speed up processing, you can denormalize data into a star schema. It is called a star schema because the entity-relationship diagram of this schema resembles a star, with points radiating from a central fact table. The center of the star consists of one or more fact tables and the points of the star are the dimension tables. This kind of schema can be more natural to nontechnical end users who are more familiar with logical entities rather than entities and relationships. A star schema is characterized by one or more very large fact tables that contain the primary information in the data warehouse and a number of much smaller dimension tables, each of which contains information about the entries for a particular attribute in the fact table. A star query is a join between a fact table and several dimension tables. Each dimension table is joined to the fact table using a primary key to foreign key join, but the dimension tables are not joined to each other. A typical fact table contains keys and measures. For example, in the Sales History schema, the fact table, sales, contains the measures quantity_sold, amount, and cost, and the keys cust_id, time_id, prod_id, channel_id, and promo_id. The dimension tables are customers, times, products, channels, and promotions.
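
To make the shape of a star query concrete, here is a hedged, runnable sketch that uses Python's sqlite3 module as a stand-in for an Oracle database. It builds a tiny version of the sales star described above (only the customer, product, and time dimensions, with invented sample rows and column names) and runs a star query that joins the fact table to each dimension but never joins the dimensions to each other.

```python
import sqlite3

# A runnable sketch of a tiny star schema and a star query, using SQLite as a
# stand-in for Oracle. Only three of the five dimension keys mentioned above
# (cust_id, prod_id, time_id) are used, and all names and data are illustrative.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, cust_name TEXT, cust_city TEXT);
CREATE TABLE products  (prod_id INTEGER PRIMARY KEY, prod_name TEXT, prod_category TEXT);
CREATE TABLE times     (time_id INTEGER PRIMARY KEY, calendar_month TEXT, calendar_year INTEGER);
-- Fact table: foreign keys to each dimension plus numeric measures.
CREATE TABLE sales (
    cust_id INTEGER REFERENCES customers(cust_id),
    prod_id INTEGER REFERENCES products(prod_id),
    time_id INTEGER REFERENCES times(time_id),
    quantity_sold INTEGER,
    amount REAL,
    PRIMARY KEY (cust_id, prod_id, time_id)   -- composite key made of foreign keys
);
INSERT INTO customers VALUES (1, 'Acme Ltd', 'London'), (2, 'Bravo Inc', 'New York');
INSERT INTO products  VALUES (10, 'Optical Disk', 'Storage'), (11, 'Marking Pen', 'Stationery');
INSERT INTO times     VALUES (100, '2024-01', 2024), (101, '2024-02', 2024);
INSERT INTO sales     VALUES (1, 10, 100, 5, 50.0), (2, 11, 101, 3, 9.0), (1, 11, 100, 2, 6.0);
""")

# A star query: the fact table is joined to each dimension; the dimensions are
# not joined to each other.
rows = con.execute("""
    SELECT c.cust_city, p.prod_category, t.calendar_month, SUM(s.amount)
    FROM sales s
    JOIN customers c ON s.cust_id = c.cust_id
    JOIN products  p ON s.prod_id = p.prod_id
    JOIN times     t ON s.time_id = t.time_id
    GROUP BY c.cust_city, p.prod_category, t.calendar_month
""").fetchall()
for row in rows:
    print(row)
```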

8 Star Schema Model (continued) The products dimension table, for example, contains information about each product number that appears in the fact table. The main advantages of star schemas are that they: Provide a direct and intuitive mapping between the business entities being analyzed by end users and the schema design Provide highly optimized performance for typical star queries Are widely supported by a large number of business intelligence tools, which may anticipate or even require that the data warehouse schema contain dimension tables. The most natural way to model a data warehouse is as a star schema, where only one join establishes the relationship between the fact table and any one of the dimension tables. A star schema optimizes performance by keeping queries simple and providing fast response time. All the information about each level is stored in one row. Star schemas do have some inherent difficulties. It is possible for the central fact table to grow very large, with an upper limit of the product of the number of rows in each dimension table. Also, the dimension tables are no longer normalized, so they are larger and harder to maintain with lots of duplicate data.

9 Snowflake Schema Model If your business needs require more normalization, you can employ a snowflake schema, which is a star schema with some of the features of third normal form (3NF) data. The snowflake schema is a more complex data warehouse model than a star schema. It is called a snowflake schema because the diagram of the schema resembles a snowflake. Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data has been normalized into multiple smaller tables instead of one large table. For example, a product dimension table in a star schema may be normalized into a products table, a product_category table, and a product_manufacturer table in a snowflake schema. Although this saves space, it increases the number of dimension tables and requires more foreign key joins. This results in more complex queries and reduced query performance. The main advantage of the snowflake schema is the improvement in query performance due to minimized disk storage requirements and the joining of smaller lookup tables. The main disadvantage of the snowflake schema is the additional maintenance efforts needed due to the increased number of lookup tables. Although snowflake schemas are unnecessary when the dimension tables are small, a business having large dimension tables containing millions of rows can use snowflake schemas to significantly improve performance.
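
The sketch below, again using sqlite3 as a stand-in and invented sample data, shows the snowflaked product dimension mentioned above — products, product_category, and product_manufacturer — and the extra foreign-key joins that the snowflake introduces compared with a single denormalized dimension table.

```python
import sqlite3

# A sketch of snowflaking the product dimension: the single star-schema
# dimension is split into products, product_category, and product_manufacturer.
# All names and data are illustrative.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product_category     (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE product_manufacturer (manufacturer_id INTEGER PRIMARY KEY, manufacturer_name TEXT);
CREATE TABLE products (
    prod_id INTEGER PRIMARY KEY,
    prod_name TEXT,
    category_id INTEGER REFERENCES product_category(category_id),
    manufacturer_id INTEGER REFERENCES product_manufacturer(manufacturer_id)
);
INSERT INTO product_category     VALUES (1, 'Storage'), (2, 'Stationery');
INSERT INTO product_manufacturer VALUES (1, 'Acme Media'), (2, 'PenCo');
INSERT INTO products VALUES (10, 'Optical Disk', 1, 1), (11, 'Marking Pen', 2, 2);
""")

# Resolving a product's category and manufacturer now takes extra foreign-key
# joins that a denormalized star-schema dimension would not need.
for row in con.execute("""
    SELECT p.prod_name, c.category_name, m.manufacturer_name
    FROM products p
    JOIN product_category     c ON p.category_id = c.category_id
    JOIN product_manufacturer m ON p.manufacturer_id = m.manufacturer_id
"""):
    print(row)
```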

10 Snowflake Schema Model (continued) However, a potential problem that you may encounter with snowflake schemas is that they may start to show signs of the performance problems of 3NF queries. Note: It is suggested that you choose a star schema over a snowflake schema unless you have a clear business reason to choose the snowflake schema.

14 Data Warehousing Objects Fact tables and dimension tables are the two types of objects commonly found in dimensional data warehouse schemas. A fact table is large and typically has two types of columns: those that contain numeric facts (often called measurements) and those that are foreign keys to dimension tables. Measures are the data that you want to analyze, such as total_sales or unit_cost. Fact tables typically contain facts and foreign keys to the dimension tables. Fact tables represent data that can be analyzed and examined. Examples of fact tables include SALES, COST, and PROFIT. Facts are generated by events that occurred in the past and are unlikely to change, regardless of how they are analyzed. A fact table contains either detail-level facts or facts that have been aggregated. Fact tables that contain aggregated facts are often called summary tables. A fact table usually contains facts with the same level of aggregation. Fact tables are usually deep but not wide. You must decide the granularity of the fact table: what level of detail do you want? A transactional grain is the finest level; an aggregated grain holds more summarized data. The choice of grain can also impact the dimension table attributes. Finer grains are better for market-basket analysis, where you want to identify affinity between products; however, a finer grain means more dimension tables and more rows in the fact table.
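
The following is a small sketch of the grain decision: the same sales events kept at a transactional grain versus rolled up to a coarser (product, month) grain. The field names and figures are invented for illustration.

```python
from collections import defaultdict

# Transactional grain: one row per sale (maximum detail, most rows).
transactions = [
    {"prod": "Optical Disk", "month": "2024-01", "qty": 5, "amount": 50.0},
    {"prod": "Optical Disk", "month": "2024-01", "qty": 2, "amount": 20.0},
    {"prod": "Marking Pen",  "month": "2024-02", "qty": 3, "amount": 9.0},
]

# Aggregated grain: one row per (product, month); fewer rows, less detail.
aggregated = defaultdict(lambda: {"qty": 0, "amount": 0.0})
for t in transactions:
    key = (t["prod"], t["month"])
    aggregated[key]["qty"] += t["qty"]
    aggregated[key]["amount"] += t["amount"]

print(len(transactions), "transaction-grain rows")
print(len(aggregated), "aggregated-grain rows")
for key, measures in aggregated.items():
    print(key, measures)
```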

15 Data Warehousing Objects (continued) Though most facts are additive, they can also be semi-additive or non-additive. Additive facts can be aggregated by simple arithmetic addition; a common example is sales. Non-additive facts cannot be added at all; an example is averages. Semi-additive facts can be aggregated along some of the dimensions and not along others; an example is inventory levels, where you cannot tell what a level means simply by looking at it. You must define a fact table for each star schema. From a modeling standpoint, the primary key of the fact table is usually a composite key that is made up of all its foreign keys. However, it is common to add a surrogate key, particularly to deal with slowly changing dimensions. For example, the product Floppy Disk may have a natural key of x001, which is held in the fact table as a foreign key relationship to the product dimension in the data warehouse. What impact is there on the data warehouse if, because Floppy Disk is no longer a product produced by the company, the product description for the same natural key x001 is changed to Optical Disk in the production system? The fact data relating to the old product Floppy Disk is broken. By introducing a surrogate key for the product dimension table, we can preserve the relationship with the old product by inserting a new row for Optical Disk with the next surrogate key value in the dimension table. By doing this, we have preserved the fact data relating to the old product name.

16 Data Warehousing Objects (continued) Dimension tables, also known as lookup or reference tables, contain the relatively static data in the data warehouse. Dimension tables store the information that you normally use to constrain queries. Dimension tables are usually textual and descriptive, and you can use them as the row headers of the result set. Examples are CUSTOMERS and PRODUCTS. A dimension table is wide but not typically deep: it may have more than 50 attributes but relatively few rows (although in some cases it could have thousands to millions). For a star schema, the dimension data is not normalised. Dimensions can support drill-downs and roll-ups (also known as drill-ups) where they exhibit natural hierarchies, e.g. drilling down from total sales by year to quarter, month, week, and day. Relationships guarantee the integrity of business information. An example is that if a business sells something, there is obviously a customer and a product. Designing a relationship between the sales information in the fact table and the dimension tables PRODUCTS and CUSTOMERS enforces the business rules in databases. Commonly, a surrogate key is used as the primary key rather than the natural key because of slowly changing dimensions (SCDs). SCD Type 1: overwrite the data in the dimension. This does not maintain history but is good for correcting errors; most data warehouses start out with Type 1 as the default. SCD Type 2: add a new dimension record to preserve history. You must generalize the primary key by replacing it with a surrogate key and adding start and end effective-date columns. SCD Type 3: some user-assigned attributes can legitimately have more than one assigned value depending on the observer's viewpoint. For example, in a stationery shop a marking pen could be assigned to the household goods category or the art supplies category. A new alternate-category column is added to the dimension to facilitate this; however, this approach does not scale gracefully beyond a few choices.
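
The following is a sketch of SCD Type 2 handling applied to the Floppy Disk/Optical Disk example above: instead of overwriting the description for natural key x001, a new row with the next surrogate key and effective dates is inserted. The data structures and dates are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProductRow:
    surrogate_key: int
    natural_key: str
    description: str
    effective_from: str
    effective_to: Optional[str]  # None means this is the current version

# Initial state of the product dimension.
product_dim = [ProductRow(1, "x001", "Floppy Disk", "2000-01-01", None)]

def apply_scd_type2(dim, natural_key, new_description, change_date):
    """Close off the current row for the natural key and insert a new version."""
    current = next(r for r in dim if r.natural_key == natural_key and r.effective_to is None)
    current.effective_to = change_date
    dim.append(ProductRow(max(r.surrogate_key for r in dim) + 1,
                          natural_key, new_description, change_date, None))

apply_scd_type2(product_dim, "x001", "Optical Disk", "2024-01-01")
for row in product_dim:
    print(row)
# Old fact rows keep pointing at surrogate key 1 (Floppy Disk); new fact rows
# use surrogate key 2 (Optical Disk), so history is preserved.
```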

17 Dimensions and Hierarchies A dimension is a structure composed of one or more hierarchies that categorizes data. Dimensions are descriptive labels that provide supplemental information about facts and are stored in dimension tables. They are normally descriptive, textual values. Several distinct dimensions, combined with facts, enable you to answer business questions. Commonly used dimensions are customers, products, and time. Dimension data is typically collected at the lowest level of detail and then aggregated into higher-level totals that are more useful for analysis. These natural rollups or aggregations within a dimension table are called hierarchies. Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation. For example, in a TIME dimension, a hierarchy may aggregate data from the month level to the quarter level to the year level. Within a hierarchy, each level is logically connected to the levels above and below it. Data values at lower levels aggregate into the data values at higher levels. A dimension can be composed of more than one hierarchy. For example, in the PRODUCT dimension, there may be two hierarchies: one for product categories and the other for product suppliers.
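
The sketch below illustrates aggregating along the time hierarchy described above (month to quarter to year). The sales figures are invented for illustration.

```python
from collections import defaultdict

# Lower-level (monthly) totals; values are illustrative only.
monthly_sales = {"2024-01": 120.0, "2024-02": 80.0, "2024-04": 50.0, "2024-11": 30.0}

def month_to_quarter(month: str) -> str:
    """Map a 'YYYY-MM' label to its quarter, e.g. '2024-02' -> '2024-Q1'."""
    year, mm = month.split("-")
    return f"{year}-Q{(int(mm) - 1) // 3 + 1}"

def roll_up(values: dict, key_fn) -> dict:
    """Aggregate lower-level values into higher-level totals."""
    totals = defaultdict(float)
    for key, value in values.items():
        totals[key_fn(key)] += value
    return dict(totals)

quarterly = roll_up(monthly_sales, month_to_quarter)       # month -> quarter
yearly    = roll_up(quarterly, lambda q: q.split("-")[0])  # quarter -> year
print(quarterly)
print(yearly)
```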

18 Defining Dimensions and Hierarchies The example in the slide describes a single hierarchy within the time dimension, but it is possible to have multiple hierarchies. For example, another hierarchy can be created to link sales date with week or season. Note that by creating a dimension, you just create metadata that the Oracle server can use afterward, for example, during query rewrite. It does not mean that you enforce any of the relationships described in the newly created dimension. That is why constraints can still be used, whenever possible, to maintain dimension validity. After defining the dimension, you can validate it by using the DBMS_DIMENSION.VALIDATE_DIMENSION procedure.

19 Current approaches to dimensional design mostly take a top-down approach. From the demand side, the models are developed from user query and analysis requirements. However, from the supply side, to a large extent a data warehouse is simply a repackaging of operational data in a more accessible form. Therefore, dimensional modelling design is highly constrained and limited by what data is available from the operational systems. The transformation of an ER model to dimensional form takes place in four steps: Step 1 Classify Entities; Step 2 Design High-Level Star Schemas (identify the star schemas required, define the level of summarisation, and identify the relevant dimensions); Step 3 Detailed Fact Table Design; Step 4 Detailed Dimension Table Design.

20 Step 1 Classify Entities. Transaction Entities record details of business events (e.g. orders, shipments, airline reservations); most BI applications focus on these events to identify patterns, trends, and potential problems. Component Entities are directly related to a transaction entity by a one-to-many relationship. They are involved in the business event and answer the who, what, where, how, and why questions about the event. Classification Entities are related to a component entity by a chain of one-to-many relationships. These define embedded hierarchies in the data model and are used to classify component entities. Step 2 Design High-Level Star Schemas. Identify Star Schemas Required: each transaction entity is a candidate for a star schema. The process of creating star schemas is one of sub-setting, i.e. dividing a large and complex model into manageable-sized chunks. Note that there is not always a one-to-one correspondence between transaction entities and star schemas. For instance, not all transactions will be important for decision making, and user input is required to identify the transaction entities that are important. When transaction entities are connected in a master-detail structure, they should be combined into a single star schema, e.g. Order and OrderItem. Define Level of Summarisation: deciding on the level of granularity is one of the most critical decisions in star schema design. A fine (or transaction-level) grain is unsummarised: each fact table row corresponds to a single transaction. This grain provides maximum flexibility but has storage implications. A coarse grain is summarised, perhaps by a subset of dimensions or dimensional attributes; each row in the fact table corresponds to multiple transactions. Less storage is required here, but summarisation loses information and can limit the types of analyses. Most data warehouse environments have a combination of unsummarised and summarised data. An important design decision with respect to integration of the star schemas is that all the star schemas should share common dimensions, called conformed dimensions. This ensures that users can drill across from one star schema to another. Identify Relevant Dimensions: component entities associated with each transaction entity represent candidate dimensions. There is not always a one-to-one mapping; all component entities may not be relevant for the purposes of analysis or for the level of granularity. Date and/or time appear as explicit dimensions in most star schemas but are not normally represented as entities in operational systems. If a non-transaction granularity is chosen, the dimensions required will be determined by how the transactions are summarized (often a subset of the dimensions used in the transaction-level star schema).
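
The sketch below applies the classification rules above to a small, invented ER model. The entity names, the list of one-to-many relationships, and the choice of transaction entities are all assumptions made for illustration; only the classification logic follows the definitions given here.

```python
# Each relationship is (one side, many side) of a one-to-many relationship.
entities = {"Order", "OrderItem", "Customer", "Product", "ProductCategory", "Region"}
one_to_many = [
    ("Customer", "Order"),
    ("Order", "OrderItem"),
    ("Product", "OrderItem"),
    ("ProductCategory", "Product"),
    ("Region", "Customer"),
]
transaction_entities = {"Order", "OrderItem"}   # record business events (given)

# Component entities: directly related to a transaction entity on the one side.
component = {one for one, many in one_to_many
             if many in transaction_entities and one not in transaction_entities}

# Classification entities: reachable from a component entity via further
# one-to-many links (they define embedded hierarchies).
classification = set()
frontier = set(component)
while frontier:
    frontier = {one for one, many in one_to_many
                if many in frontier and one not in transaction_entities}
    classification |= frontier

print("Component entities:", component)            # candidate dimensions
print("Classification entities:", classification)  # later collapsed into dimensions
# Order and OrderItem form a master-detail pair, so they would be combined into
# a single candidate star schema.
```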

21 Step 3 Detailed Fact Table Design. Define the Key: for a fact table it will be a composite key made up of the dimension keys, which act as foreign keys to the dimension tables. Define the Facts: the facts are determined by what is available in the operational data. These non-key attributes of the fact table are popularly called measures. Facts should be numeric and at the same grain. For performance reasons, it is common to store derived values (pre-calculations) in the fact table, e.g. gross pay minus deductions = net pay. While we can derive net pay from the gross pay and the deductions, it makes sense to store net pay to improve query performance. There are three types of measures: 1) fully additive, 2) semi-additive, and 3) non-additive. Where possible, one should convert non-additive and semi-additive facts to fully additive facts. In some cases fact tables can be factless, where one simply wants to capture that a particular event has occurred, e.g. a crime or a student registration. Step 4 Detailed Dimension Table Design. Define the Dimension Primary Key: this should be a simple, generalized numeric key. This facilitates preservation of history, especially in regard to slowly changing dimensions (natural keys may be reused over time). One should still retain the natural key as part of the table definition. The key should be generalised to preserve historical data, as the primary keys in the operational data may be reused in the OLTP system; e.g. for a student dimension, introduce a generalized numeric key but retain the natural key X. Collapse Hierarchies: dimension tables are formed by collapsing or denormalising the hierarchies defined by classification entities. Dimension tables are wide and can have 100+ attributes. This introduces redundancy in the data in the form of transitive dependencies, i.e. breaking the 3NF rule. Replace Codes and Abbreviations with Descriptive Text: for understandability and readability, codes and abbreviations in the source data should be removed and replaced by descriptive text. This is also called rounding out the dimension tables.
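
The following is a sketch of storing a pre-calculated (derived) measure in the fact table, as suggested above for the gross pay/net pay example: net_pay is computed once at load time rather than at query time. The payroll figures and key values are invented for illustration.

```python
def build_payroll_fact_row(employee_sk: int, period_sk: int,
                           gross_pay: float, deductions: float) -> dict:
    """Build one fact row; all keys are surrogate keys into dimension tables."""
    return {
        "employee_sk": employee_sk,         # surrogate key, not the natural key
        "period_sk": period_sk,
        "gross_pay": gross_pay,             # fully additive measure
        "deductions": deductions,           # fully additive measure
        "net_pay": gross_pay - deductions,  # derived, stored for query speed
    }

fact_rows = [
    build_payroll_fact_row(1, 202401, 3000.0, 750.0),
    build_payroll_fact_row(2, 202401, 2500.0, 500.0),
]
# Queries can now sum net_pay directly instead of recomputing it row by row.
print(sum(r["net_pay"] for r in fact_rows))
```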

22 Examples Transaction entities: Order, Order Item, and Stock Level. Component entities: Customer, Employee, RetailOutlet, Product, Warehouse, and Delivery Method. There are also 31 classification entities.

23 I/O Performance in Data Warehouses Input/output (I/O) performance should always be a key consideration for data warehouse designers and administrators. The typical workload in a data warehouse is especially I/O intensive, with operations such as large data loads and index builds, creation of materialized views, and queries over large volumes of data. The underlying I/O system for a data warehouse should be designed to meet these heavy requirements. One of the major causes of data warehouse performance issues is poor I/O configuration. Database administrators who have previously managed other systems need to pay more attention to the I/O configuration for a data warehouse than they may have previously done for other environments. Although data warehouses usually require large storage systems, storage configurations should be chosen on the basis of I/O bandwidth. Every component of the I/O system should provide enough bandwidth, including the physical disks, the I/O channels, and the I/O adapters. (As a rule, at least 200 MB per second of I/O bandwidth per gigahertz of processing power will be needed.) When considering I/O in high-performance OLTP environments, the critical factor is often random I/Os per second; however, in data warehouses, the critical factor is often sequential I/O throughput. The sequential throughput is usually bounded by the number of active channels between the hosts and the disk arrays.

25 Performance of Sequential I/Os Unlike many OLTP databases whose throughput comprises many small I/Os, data warehouse drive arrays generally see random large I/Os spread across the devices. This type of throughput is known as multiuser sequential workload. Acceptable multiuser sequential throughput requires that large I/Os up to 1 megabyte in size be issued to disks. However, it is common for the host operating system, device drivers, or storage array to fracture these large I/Os into smaller I/Os. For example, default Linux configurations often fracture I/Os into smaller ones (up to 32 KB). This level of I/O fracturing can have a disastrous effect on the total throughput. Therefore, it is important that you use a version of Linux or UNIX with host bus adapters and drives capable of handling 128 KB I/Os or larger. A lot of attention is paid to file system disk fragmentation, but you must remember that in a database environment, I/O fracturing is at least as important.

26 Minimizing I/O Requests Intelligent partitioning can help you tune SQL statements to avoid unnecessary index and table scans (using partition pruning). In the query example in the slide, only the data for March, April, and May needs to be accessed. The unnecessary partitions are pruned, so only the partitions corresponding to March, April, and May are accessed. In this example, the partition pruning results in a two-times gain in performance, because three partitions are being scanned instead of six. In many cases, the actual gains from partition pruning can be much more dramatic. Consider a business query that examines data from one month in a partitioned table containing 36 months of historical data. Partition pruning works in conjunction with all other performance features. A query can take advantage of partition pruning while taking advantage of other features such as parallelism and indexing. You can also improve the performance of massive join operations when large amounts of data (for example, several million rows) are joined together by using partition-wise joins. Finally, partitioning data greatly improves the manageability of very large databases and dramatically reduces the time required for administrative tasks such as backup and restore. Granularity in a partitioning scheme can be easily changed by splitting or merging partitions. Thus, if a table's data is skewed to fill some partitions more than others, the ones that contain more data can be split to achieve a more even distribution.
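
The following is a sketch of the partition-pruning idea above: sales rows are stored in one partition per month, and a query for March through May scans only those partitions. The month labels and amounts are invented for illustration; a real database prunes partitions transparently inside the optimizer.

```python
# One partition per month; each partition holds (product, amount) rows.
partitions = {
    "2024-01": [("prodA", 10.0)], "2024-02": [("prodA", 12.0)],
    "2024-03": [("prodB", 7.0)],  "2024-04": [("prodB", 9.0)],
    "2024-05": [("prodA", 4.0)],  "2024-06": [("prodC", 3.0)],
}

def total_sales(partitioned_data, months_wanted):
    """Scan only the partitions that can contain matching rows."""
    pruned = {m: rows for m, rows in partitioned_data.items() if m in months_wanted}
    print(f"Scanning {len(pruned)} of {len(partitioned_data)} partitions")
    return sum(amount for rows in pruned.values() for _, amount in rows)

# Only the March, April, and May partitions are read; the rest are pruned.
print(total_sales(partitions, {"2024-03", "2024-04", "2024-05"}))
```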

27 Minimizing I/O Requests (continued) Bitmap indexes are ideally suited for data warehousing. In fact, bitmap indexes should be the most common type of index within a data warehouse. Most people who have used any sort of relational database are familiar with B-tree indexes. However, B-tree indexes rarely provide significant performance benefits for data warehouse queries and require large amounts of disk space; bitmap indexes, meanwhile, are often an order of magnitude smaller than B-tree indexes and are also much more effective for data warehouse queries. The advantages of using bitmap indexes are greatest for low-cardinality columns, that is, columns in which the number of distinct values is small compared to the number of rows in the table. A gender column, which has only two distinct values (male and female), is ideal for a bitmap index. However, data warehouse administrators can also choose to build bitmap indexes on columns with much higher cardinalities. Bitmap indexes specifically provide a mechanism for efficiently doing set-based logic. For example, consider the simple data warehouse query: How many of my customers live in New York, are between the ages of 30 and 40, and bill more than $100 per month? One way to process this query is to scan the entire table and examine each row against all three conditions.

28 Minimizing I/O Requests (continued) When base tables contain a large amount of data, it is an expensive and time-consuming process to compute the required aggregates or to compute joins between these tables. In such cases, queries can take minutes or even hours to return the answer. Because materialized views contain already precomputed aggregates and joins, the Oracle Database server employs an extremely powerful process called query rewrite to quickly answer the query using materialized views. One of the major benefits of creating and maintaining materialized views is the ability to take advantage of query rewrite, which transforms a SQL statement expressed in terms of tables or views into a statement accessing one or more materialized views that are defined on the detail tables. The transformation is transparent to the end user or application, requiring no intervention and no reference to the materialized view in the SQL statement. Because query rewrite is transparent, materialized views can be added or dropped just like indexes without invalidating the SQL in the application code.
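
The sketch below illustrates the query-rewrite idea using sqlite3 as a stand-in: a summary table plays the role of a materialized view, and the aggregate query is answered from it instead of from the detail table. In Oracle the rewrite is transparent to the application; here it is shown explicitly, and all table names and data are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales (prod_id INTEGER, amount REAL);
INSERT INTO sales VALUES (10, 50.0), (10, 20.0), (11, 9.0), (11, 6.0);
-- Precomputed aggregate, refreshed when the detail data is loaded; this plays
-- the role of a materialized view in this sketch.
CREATE TABLE mv_sales_by_product AS
    SELECT prod_id, SUM(amount) AS total_amount FROM sales GROUP BY prod_id;
""")

detail_query = "SELECT prod_id, SUM(amount) FROM sales GROUP BY prod_id"
rewritten    = "SELECT prod_id, total_amount FROM mv_sales_by_product"

# Both statements return the same answer; the rewritten one avoids re-scanning
# and re-aggregating the (potentially huge) detail table.
print(con.execute(detail_query).fetchall())
print(con.execute(rewritten).fetchall())
```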

29 Minimizing I/O Requests (continued) A better way to evaluate this query is to apply set-based logic: Find the set of customers who live in New York, the set of customers between 30 and 40, and the set of customers who bill more than $100, and then do an intersection of those three sets. This is exactly the functionality provided by bitmap indexes, an efficient mechanism for doing set-based manipulations of data. Bitmap indexes are ideal for a wide range of data warehouse queries. The star transformation is a powerful optimization technique that relies upon implicitly rewriting (or transforming) the SQL of the original star query. The end user never needs to know any of the details about the star transformation. Oracle's query optimizer automatically chooses the star transformation where appropriate. A prerequisite of the star transformation is that there be a single-column bitmap index on every join column of the fact table. These join columns include all foreign key columns. The star transformation is a query transformation aimed at executing star queries efficiently. Oracle processes a star query using two basic phases. The first phase retrieves exactly the necessary rows from the fact table (the result set). Because this retrieval utilizes bitmap indexes, it is very efficient. The second phase joins this result set to the dimension tables.
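
The following is a sketch of the set-based logic that bitmap indexes provide for the query above (customers in New York, aged 30 to 40, billing over $100), using Python integers as bitmaps: bit i is set when customer row i satisfies a predicate, and the intersection of the three sets is a single bitwise AND. The customer rows are invented for illustration.

```python
customers = [
    {"city": "New York", "age": 35, "bill": 150},   # row 0
    {"city": "Boston",   "age": 32, "bill": 200},   # row 1
    {"city": "New York", "age": 51, "bill": 120},   # row 2
    {"city": "New York", "age": 30, "bill": 180},   # row 3
]

def build_bitmap(rows, predicate):
    """One bitmap per predicate: set bit i if row i matches."""
    bitmap = 0
    for i, row in enumerate(rows):
        if predicate(row):
            bitmap |= 1 << i
    return bitmap

in_new_york = build_bitmap(customers, lambda r: r["city"] == "New York")
aged_30_40  = build_bitmap(customers, lambda r: 30 <= r["age"] <= 40)
bills_100   = build_bitmap(customers, lambda r: r["bill"] > 100)

# Intersecting the three sets is a single bitwise AND over the bitmaps.
result = in_new_york & aged_30_40 & bills_100
matching_rows = [i for i in range(len(customers)) if result & (1 << i)]
print("Matching customer rows:", matching_rows)   # rows 0 and 3
print("Count:", bin(result).count("1"))
```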

30 We have seen some of these steps when deriving dimensional models from an ER model earlier. Step 1: Choosing the process. The process (function) refers to the subject matter of a particular data mart. The first data mart built should be the one that is most likely to be delivered on time, within budget, and to answer the most commercially important business questions. Step 2: Choosing the grain. Deciding what a record of the fact table is to represent is critical. The dimensions of the fact table must be identified; the grain decision for the fact table also determines the grain of each dimension table. Also include time as a core dimension, which is always present in star schemas. Step 3: Identifying and conforming the dimensions. Dimensions set the context for asking questions about the facts in the fact table. If any dimension occurs in two data marts, they must be exactly the same dimension, or one must be a mathematical subset of the other, if drilling across the data marts is desired. A dimension used in more than one data mart is referred to as being conformed. To provide fast access and intuitive "drill down" capabilities to data originating from multiple operational systems, it is often necessary to replicate dimensional data in data warehouses and in data marts. Examples of obvious conformed dimensions include Customer, Location, Organization, Time, and Product. Step 4: Choosing the facts. The grain of the fact table determines which facts can be used in the data mart. Facts should be numeric and additive. Unusable facts include (1) non-numeric facts, (2) non-additive facts, and (3) facts at a different granularity from the other facts in the table.
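
The sketch below illustrates the conformed-dimension rule from Step 3: a dimension shared by two data marts must be exactly the same or one must be a subset of the other. The dimension contents are invented for illustration.

```python
def is_conformed(dim_a: set, dim_b: set) -> bool:
    """True if the two dimension versions are identical or one is a subset of the other."""
    return dim_a <= dim_b or dim_b <= dim_a

# Each dimension version is a set of (natural key, description) rows.
sales_mart_product     = {("x001", "Optical Disk"), ("x002", "Marking Pen")}
returns_mart_product   = {("x001", "Optical Disk")}   # subset: conformed
inventory_mart_product = {("x001", "Floppy Disk")}    # conflicting description: not conformed

print(is_conformed(sales_mart_product, returns_mart_product))    # True
print(is_conformed(sales_mart_product, inventory_mart_product))  # False
```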

31 Step 5: Storing pre-calculations in the fact table. Once the facts have been selected, each should be re-examined to determine whether there are opportunities to use pre-calculations, i.e. summary tables and materialized views. Step 6: Rounding out the dimension tables. Text descriptions are added to the dimension tables and should be as intuitive and understandable to the users as possible. The usefulness of a data mart is determined by the scope and nature of the attributes of the dimension tables. Step 7: Choosing the duration of the database. Duration measures how far back in time the fact table goes. Very large fact tables raise at least two very significant data warehouse design issues: it is often difficult to source increasingly old data from OLTP systems, and it is mandatory that the old versions of the important dimensions be used, not the most current versions (remember the slowly changing dimensions problem). Step 8: Tracking slowly changing dimensions. The slowly changing dimension problem means that the proper description of the old dimension data must be used with the old fact data. Often, a generalized key must be assigned to important dimensions in order to distinguish multiple snapshots of dimensions over a period of time. Step 9: Deciding the query priorities and the query modes. The most critical physical design issues affecting the end user's perception include the physical sort order of the fact table on disk and the presence of pre-stored summaries or aggregations. Additional physical design issues include administration, backup, indexing performance, and security.


More information

SOFTWARE ENGINEERING Prof.N.L.Sarda Computer Science & Engineering IIT Bombay. Lecture #10 Process Modelling DFD, Function Decomp (Part 2)

SOFTWARE ENGINEERING Prof.N.L.Sarda Computer Science & Engineering IIT Bombay. Lecture #10 Process Modelling DFD, Function Decomp (Part 2) SOFTWARE ENGINEERING Prof.N.L.Sarda Computer Science & Engineering IIT Bombay Lecture #10 Process Modelling DFD, Function Decomp (Part 2) Let us continue with the data modeling topic. So far we have seen

More information

Processing of Very Large Data

Processing of Very Large Data Processing of Very Large Data Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first

More information

Overview. DW Performance Optimization. Aggregates. Aggregate Use Example

Overview. DW Performance Optimization. Aggregates. Aggregate Use Example Overview DW Performance Optimization Choosing aggregates Maintaining views Bitmapped indices Other optimization issues Original slides were written by Torben Bach Pedersen Aalborg University 07 - DWML

More information

CHAPTER 3 Implementation of Data warehouse in Data Mining

CHAPTER 3 Implementation of Data warehouse in Data Mining CHAPTER 3 Implementation of Data warehouse in Data Mining 3.1 Introduction to Data Warehousing A data warehouse is storage of convenient, consistent, complete and consolidated data, which is collected

More information

Teradata Analyst Pack More Power to Analyze and Tune Your Data Warehouse for Optimal Performance

Teradata Analyst Pack More Power to Analyze and Tune Your Data Warehouse for Optimal Performance Data Warehousing > Tools & Utilities Teradata Analyst Pack More Power to Analyze and Tune Your Data Warehouse for Optimal Performance By: Rod Vandervort, Jeff Shelton, and Louis Burger Table of Contents

More information

The Evolution of Data Warehousing. Data Warehousing Concepts. The Evolution of Data Warehousing. The Evolution of Data Warehousing

The Evolution of Data Warehousing. Data Warehousing Concepts. The Evolution of Data Warehousing. The Evolution of Data Warehousing The Evolution of Data Warehousing Data Warehousing Concepts Since 1970s, organizations gained competitive advantage through systems that automate business processes to offer more efficient and cost-effective

More information

Part 1: Indexes for Big Data

Part 1: Indexes for Big Data JethroData Making Interactive BI for Big Data a Reality Technical White Paper This white paper explains how JethroData can help you achieve a truly interactive interactive response time for BI on big data,

More information

Data Warehousing and Decision Support

Data Warehousing and Decision Support Data Warehousing and Decision Support Chapter 23, Part A Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1 Introduction Increasingly, organizations are analyzing current and historical

More information

A Data Warehouse Implementation Using the Star Schema. For an outpatient hospital information system

A Data Warehouse Implementation Using the Star Schema. For an outpatient hospital information system A Data Warehouse Implementation Using the Star Schema For an outpatient hospital information system GurvinderKaurJosan Master of Computer Application,YMT College of Management Kharghar, Navi Mumbai ---------------------------------------------------------------------***----------------------------------------------------------------

More information

Two Success Stories - Optimised Real-Time Reporting with BI Apps

Two Success Stories - Optimised Real-Time Reporting with BI Apps Oracle Business Intelligence 11g Two Success Stories - Optimised Real-Time Reporting with BI Apps Antony Heljula October 2013 Peak Indicators Limited 2 Two Success Stories - Optimised Real-Time Reporting

More information

Oracle Database 10g: Introduction to SQL

Oracle Database 10g: Introduction to SQL ORACLE UNIVERSITY CONTACT US: 00 9714 390 9000 Oracle Database 10g: Introduction to SQL Duration: 5 Days What you will learn This course offers students an introduction to Oracle Database 10g database

More information

Data Warehousing and Decision Support

Data Warehousing and Decision Support Data Warehousing and Decision Support [R&G] Chapter 23, Part A CS 4320 1 Introduction Increasingly, organizations are analyzing current and historical data to identify useful patterns and support business

More information

Data Vault Partitioning Strategies WHITE PAPER

Data Vault Partitioning Strategies WHITE PAPER Dani Schnider Data Vault ing Strategies WHITE PAPER Page 1 of 18 www.trivadis.com Date 09.02.2018 CONTENTS 1 Introduction... 3 2 Data Vault Modeling... 4 2.1 What is Data Vault Modeling? 4 2.2 Hubs, Links

More information

Data Warehousing. Seminar report. Submitted in partial fulfillment of the requirement for the award of degree Of Computer Science

Data Warehousing. Seminar report.  Submitted in partial fulfillment of the requirement for the award of degree Of Computer Science A Seminar report On Data Warehousing Submitted in partial fulfillment of the requirement for the award of degree Of Computer Science SUBMITTED TO: SUBMITTED BY: www.studymafia.org www.studymafia.org Preface

More information

DHANALAKSHMI COLLEGE OF ENGINEERING, CHENNAI

DHANALAKSHMI COLLEGE OF ENGINEERING, CHENNAI DHANALAKSHMI COLLEGE OF ENGINEERING, CHENNAI Department of Information Technology IT6702 Data Warehousing & Data Mining Anna University 2 & 16 Mark Questions & Answers Year / Semester: IV / VII Regulation:

More information

Data Warehouse and Data Mining

Data Warehouse and Data Mining Data Warehouse and Data Mining Lecture No. 02 Introduction to Data Warehouse Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology

More information

CHAPTER 6 DATABASE MANAGEMENT SYSTEMS

CHAPTER 6 DATABASE MANAGEMENT SYSTEMS CHAPTER 6 DATABASE MANAGEMENT SYSTEMS Management Information Systems, 10 th edition, By Raymond McLeod, Jr. and George P. Schell 2007, Prentice Hall, Inc. 1 Learning Objectives Understand the hierarchy

More information

Analytics: Server Architect (Siebel 7.7)

Analytics: Server Architect (Siebel 7.7) Analytics: Server Architect (Siebel 7.7) Student Guide June 2005 Part # 10PO2-ASAS-07710 D44608GC10 Edition 1.0 D44917 Copyright 2005, 2006, Oracle. All rights reserved. Disclaimer This document contains

More information

Application software office packets, databases and data warehouses.

Application software office packets, databases and data warehouses. Introduction to Computer Systems (9) Application software office packets, databases and data warehouses. Piotr Mielecki Ph. D. http://www.wssk.wroc.pl/~mielecki piotr.mielecki@pwr.edu.pl pmielecki@gmail.com

More information

SMD149 - Operating Systems - File systems

SMD149 - Operating Systems - File systems SMD149 - Operating Systems - File systems Roland Parviainen November 21, 2005 1 / 59 Outline Overview Files, directories Data integrity Transaction based file systems 2 / 59 Files Overview Named collection

More information

Product Documentation SAP Business ByDesign August Analytics

Product Documentation SAP Business ByDesign August Analytics Product Documentation PUBLIC Analytics Table Of Contents 1 Analytics.... 5 2 Business Background... 6 2.1 Overview of Analytics... 6 2.2 Overview of Reports in SAP Business ByDesign... 12 2.3 Reports

More information

Essentials of Database Management

Essentials of Database Management Essentials of Database Management Jeffrey A. Hoffer University of Dayton Heikki Topi Bentley University V. Ramesh Indiana University PEARSON Boston Columbus Indianapolis New York San Francisco Upper Saddle

More information

MICROSOFT BUSINESS INTELLIGENCE

MICROSOFT BUSINESS INTELLIGENCE SSIS MICROSOFT BUSINESS INTELLIGENCE 1) Introduction to Integration Services Defining sql server integration services Exploring the need for migrating diverse Data the role of business intelligence (bi)

More information

MOC 20463C: Implementing a Data Warehouse with Microsoft SQL Server

MOC 20463C: Implementing a Data Warehouse with Microsoft SQL Server MOC 20463C: Implementing a Data Warehouse with Microsoft SQL Server Course Overview This course provides students with the knowledge and skills to implement a data warehouse with Microsoft SQL Server.

More information

Hierarchies in a multidimensional model: From conceptual modeling to logical representation

Hierarchies in a multidimensional model: From conceptual modeling to logical representation Data & Knowledge Engineering 59 (2006) 348 377 www.elsevier.com/locate/datak Hierarchies in a multidimensional model: From conceptual modeling to logical representation E. Malinowski *, E. Zimányi Department

More information

DATA STRUCTURES USING C

DATA STRUCTURES USING C DATA STRUCTURES USING C File Management Chapter 9 2 File Concept Contiguous logical address space Types: Data numeric character binary Program 3 File Attributes Name the only information kept in human-readable

More information

HANA Performance. Efficient Speed and Scale-out for Real-time BI

HANA Performance. Efficient Speed and Scale-out for Real-time BI HANA Performance Efficient Speed and Scale-out for Real-time BI 1 HANA Performance: Efficient Speed and Scale-out for Real-time BI Introduction SAP HANA enables organizations to optimize their business

More information

Data Vault Brisbane User Group

Data Vault Brisbane User Group Data Vault Brisbane User Group 26-02-2013 Agenda Introductions A brief introduction to Data Vault Creating a Data Vault based Data Warehouse Comparisons with 3NF/Kimball When is it good for you? Examples

More information

Complete. The. Reference. Christopher Adamson. Mc Grauu. LlLIJBB. New York Chicago. San Francisco Lisbon London Madrid Mexico City

Complete. The. Reference. Christopher Adamson. Mc Grauu. LlLIJBB. New York Chicago. San Francisco Lisbon London Madrid Mexico City The Complete Reference Christopher Adamson Mc Grauu LlLIJBB New York Chicago San Francisco Lisbon London Madrid Mexico City Milan New Delhi San Juan Seoul Singapore Sydney Toronto Contents Acknowledgments

More information

Data Warehouse and Mining

Data Warehouse and Mining Data Warehouse and Mining 1. is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management decisions. A. Data Mining. B. Data Warehousing. C. Web Mining. D. Text

More information

The strategic advantage of OLAP and multidimensional analysis

The strategic advantage of OLAP and multidimensional analysis IBM Software Business Analytics Cognos Enterprise The strategic advantage of OLAP and multidimensional analysis 2 The strategic advantage of OLAP and multidimensional analysis Overview Online analytical

More information

Physical Design. Elena Baralis, Silvia Chiusano Politecnico di Torino. Phases of database design D B M G. Database Management Systems. Pag.

Physical Design. Elena Baralis, Silvia Chiusano Politecnico di Torino. Phases of database design D B M G. Database Management Systems. Pag. Physical Design D B M G 1 Phases of database design Application requirements Conceptual design Conceptual schema Logical design ER or UML Relational tables Logical schema Physical design Physical schema

More information

CTL.SC4x Technology and Systems

CTL.SC4x Technology and Systems in Supply Chain Management CTL.SC4x Technology and Systems Key Concepts Document This document contains the Key Concepts for the SC4x course, Weeks 1 and 2. These are meant to complement, not replace,

More information