Data Modelling for Data Warehousing

Table of Contents

DATA WAREHOUSING CONCEPTS
    DEFINITION
DATA WAREHOUSING ARCHITECTURE
    GLOBAL WAREHOUSE ARCHITECTURE
    INDEPENDENT DATA MART ARCHITECTURE
    INTER-CONNECTED DATA MART ARCHITECTURE
COMPONENTS OF DATA WAREHOUSING
DATA ANALYSIS TECHNIQUES
    QUERY AND REPORTING
    MULTIDIMENSIONAL ANALYSIS
    DATA MINING
IMPLEMENTATION OF A WAREHOUSE
    TOP DOWN IMPLEMENTATION
    BOTTOM UP IMPLEMENTATION
    A COMBINED APPROACH
ARCHITECTING DATA
    GRANULARITY OF DATA IN DATA WAREHOUSE
    LOGICAL DATA PARTITIONING
    IDENTIFYING METADATA
DATA MODELLING FOR THE DATA WAREHOUSE
    VISUALISATION OF THE BUSINESS WORLD
    DATA MODELLING TECHNIQUES
        ER MODELLING
        DIMENSIONAL MODELLING
    VISUALIZATION OF DIMENSIONAL MODEL
    DATA MODELLING FOR OLAP
        DRILL DOWN AND ROLL UP
        SLICE AND DICE
SCHEMAS IN DATA WAREHOUSING
    STAR SCHEMA
    SNOWFLAKE SCHEMA
    CREATING A DIMENSIONAL MODEL
MANAGING THE WAREHOUSING ENVIRONMENT
    EXTRACTION
    TRANSFORM
        STRUCTURAL TRANSFORMATION
        CONTENT TRANSFORMATION
        FUNCTIONAL TRANSFORMATION
    LOADING
ROLE OF OLAP SERVER IN A DATA WAREHOUSING ENVIRONMENT
COMPARISONS BETWEEN OLTP, DATA WAREHOUSE RDBMS AND OLAP SERVER
DATA WAREHOUSING TOOLS
    HYPERION
    MICROSTRATEGY
ASSOCIATED REFERENCES

1 DATA WAREHOUSING CONCEPTS

Data warehousing is an approach for leveraging information. It addresses the need to consolidate and store data in information systems that give end users timely access to critical information, so that they can make decisions that improve their organisation's performance. Data warehousing also provides a base for powerful data analysis techniques such as data mining and multidimensional analysis, as well as the more traditional query and reporting. The primary concept of data warehousing is that the data stored for business analysis can most effectively be accessed by separating it from the data in the operational systems.

Organisations have vast amounts of data but have found it increasingly difficult to access it and make use of it. This is because it is in many different formats, exists on many different platforms, and resides in many different file and database structures developed by different vendors. Thus organisations have had to write and maintain perhaps hundreds of programs that extract, prepare, and consolidate data for use by many different applications for analysis and reporting. Also, decision-makers often want to dig deeper into the data once initial findings are made. This would typically require modification of the extract programs or development of new ones. This process is costly, inefficient, and very time consuming. Data warehousing offers a better approach.

1.1 DEFINITION

A data warehouse is a structured, extensible environment designed for the analysis of non-volatile data, logically and physically transformed from multiple source applications to align with business structure, updated and maintained for a long time period, expressed in simple business terms, and summarised for quick analysis.

A data warehouse is a
- Subject Oriented
- Integrated
- Time Variant
- Non Volatile
collection of data in support of management decision processes.
SUBJECT ORIENTATION
- Data is organised around the major subjects of the enterprise.
- Applications are typically designed around processes and functions; the data warehouse has a subject- and data-driven orientation.
- It only includes data used for decision making.
- Operational data relates to immediate needs and is based on current business rules.
- Data warehouse data spans time and allows more complex relations.

DATA INTEGRATION
- Legacy systems have many ways of referring to the same data. When data is brought into the data warehouse it is integrated so that it is referred to in only one way, has the same format, and uses the same units of measurement for its attributes.
- Thus, in the data warehouse, the data is stored in a single globally acceptable fashion even though the sources may differ.

TIME VARIANCE
- In the operational environment, data is accurate at the moment of access. In the data warehouse, data is accurate at some moment in time but not necessarily "right now".
- The data warehouse has a 5-10 year horizon, whereas the horizon of operational data is measured in days.
- In the data warehouse, the key always contains a unit of time (day, week, etc.).
- Correctly recorded data warehouse data cannot be updated.

NON-VOLATILITY
- In the operational environment, updates (inserts, deletes, changes) are done regularly on a record-by-record basis.
- In the data warehouse, data is not updated; only data loading and data access are done. This results in a much simpler technical environment.
- There is little or no redundancy between the operational and data warehouse environments.
- Data is filtered and transformed as it passes to the data warehouse. Only data needed for DSS/EIS is stored.
- The data warehouse contains summary data not found in operational data.
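The data integration point can be illustrated with a minimal sketch. It assumes two hypothetical legacy systems that encode gender and length differently; the field names, code values, and target units are illustrative assumptions, not taken from any real system.

```python
# Hypothetical encodings used by different legacy systems (assumptions).
GENDER_MAP = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}
# Conversion factors to the single unit chosen for the warehouse (centimetres).
TO_CM = {"cm": 1.0, "in": 2.54, "m": 100.0}


def integrate(record):
    """Map one source record to the single warehouse representation."""
    gender = GENDER_MAP[str(record["gender"]).strip().lower()]
    length_cm = round(record["length"] * TO_CM[record["unit"]], 2)
    return {"gender": gender, "length_cm": length_cm}


source_records = [
    {"gender": "male", "length": 1.5, "unit": "m"},   # system A
    {"gender": 0, "length": 10, "unit": "in"},        # system B
]
integrated = [integrate(r) for r in source_records]
```

After integration every record uses the same gender code and the same unit, whichever system it came from.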

2 DATA WAREHOUSING ARCHITECTURE

The selection of an architecture determines where the data warehouses and/or data marts themselves will reside and where the control resides. For example, the data can reside in a central location that is managed centrally. Or, the data can reside in distributed local and/or remote locations that are either managed centrally or independently.

2.1 GLOBAL WAREHOUSE ARCHITECTURE

A global data warehouse is one that supports all, or a large part, of a corporation that requires a more fully integrated data warehouse with a high degree of data access and usage across departments or lines of business. It is designed and constructed based on the needs of the enterprise as a whole. A global warehouse can be considered a common repository for decision support data that is available across the entire organisation, or a large subset thereof. A distributed global warehouse is also used by the entire organisation, but its data is distributed across multiple physical locations within the organisation; it is managed by a central department.

2.2 INDEPENDENT DATA MART ARCHITECTURE

Data marts are smaller data warehouses that can function independently or can be interconnected to form a global integrated data warehouse. An independent data mart architecture implies stand-alone data marts that are controlled by a particular workgroup, department, or line of business and are built solely to meet its needs. There may not even be any connectivity with data marts in other workgroups, departments, or lines of business. The data for these data marts may be generated internally. The data may also be extracted from operational systems, but this would require the support of Information Systems (IS). IS would not control the implementation but would simply help manage the environment.
2.3 INTER-CONNECTED DATA MART ARCHITECTURE

Although separate data marts are implemented in a particular workgroup, department, or line of business, they can be integrated, or interconnected, to provide a more enterprise-wide or corporate-wide view of the data. At the highest level of integration, they can become the global data warehouse. End users in one department can therefore access and use the data on a data mart in another department. Data in an inter-connected data mart can come from operational or external data sources and also from another global data warehouse.

Interconnected data marts can be independently controlled by a workgroup, department, or line of business. The owners decide what source data to load into the data mart, when to update it, who can access it, and where it resides. They may also elect to provide the tools and skills necessary to implement the data mart themselves.

3 COMPONENTS OF DATA WAREHOUSING

Data warehousing components are identified in the figure above. Each component is discussed in more detail in the sections that follow.

Data Mart: A data mart is a focused subset of a data warehouse that deals with a single area of data and is organised for quick analysis.

Metadata: Literally, "data about data." It is the description of what kind of information is stored where, how it is encoded, how it is related to other information, where it comes from, and how it is related to your business.

Query: A specific atomic request for information from a database.

Data Mining: Data mining is the running of automated routines that search through data organised in a warehouse. They look for patterns in order to point you to areas that you should be addressing.

OLAP (On-Line Analytical Processing): Tools that extract data from data warehouses go by a variety of names, the most common being OLAP, ROLAP (Relational On-Line Analytical Processing), multidimensional analysis tools, and decision support systems. All provide the ability to do rapid analysis of multiple simultaneous factors, something that plain relational queries are not well suited to.

Data Visualization: Techniques for turning data into information by using the high capacity of the human brain to visually recognise patterns and trends. There are many specialised techniques designed to make particular kinds of visualization easy.

4 DATA ANALYSIS TECHNIQUES

4.1 QUERY AND REPORTING

Query definition is the process of taking a business question or hypothesis and translating it into a query format that a particular decision support tool can use. When the query is executed, the tool generates the appropriate language commands to access and retrieve the requested data, which is returned in what is typically called an answer set. The data analyst then performs the required calculations and manipulations on the answer set to achieve the desired results. Those results are then formatted to fit into a display or report template that has been selected for ease of understanding by the end user. This template could consist of combinations of text, graphic images, video, and audio. Finally, the report is delivered to the end user on the desired output medium, which could be printed on paper, visualised on a computer display device, or presented audibly. The process of query and reporting thus starts with query definition and ends with report delivery.

4.2 MULTIDIMENSIONAL ANALYSIS

Data is stored in the form of a multidimensional cube. A major advantage of multidimensional databases (MDD) over relational databases is that they can be optimised for speed and ease of query response. Rather than requiring multiple queries to be submitted, the data is structured to enable fast and easy access to answers to the questions that are typically asked. For example, the data would be structured to include answers to the question, "How much of each of our products was sold on a particular day, by a particular salesperson, in a particular store?" Each separate part of that query is called a dimension. By pre-calculating answers to each sub-query within the larger context, many answers are readily available because the results are not recalculated with each query; they are simply accessed and displayed. By having the results to the above query, one would automatically have the answer to any of the sub-queries.
That is, we would already know the answer to the sub-query, "How much of a particular product was sold by a particular salesperson?" Having the data categorised by these different factors, or dimensions, makes it easier to understand, particularly for business-oriented users of the data. Dimensions can have individual entities or a hierarchy of entities, such as region, store, and department.

Multidimensional analysis enables users to look at a large number of interdependent factors involved in a business problem and to view the data in complex relationships. The complex relationships can be analysed through an iterative process that includes drilling down to lower levels of detail or rolling up to higher levels of summarisation and aggregation. The figure below demonstrates that the user can start by viewing the total sales for the organisation and drill down to view the sales by continent, region, country, and finally by customer. Or, the user could start at customer and roll up through the different levels to finally reach total sales.
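The idea of pre-calculating answers to every sub-query can be sketched in a few lines of Python. This is an illustrative sketch only, not how any particular multidimensional database is implemented; the sales records and dimension names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("product", "salesperson", "store", "day")

# Hypothetical detail records; "qty" is the measure.
sales = [
    {"product": "pen", "salesperson": "ann", "store": "north", "day": "mon", "qty": 5},
    {"product": "pen", "salesperson": "ann", "store": "north", "day": "tue", "qty": 3},
    {"product": "pen", "salesperson": "bob", "store": "south", "day": "mon", "qty": 2},
    {"product": "ink", "salesperson": "ann", "store": "north", "day": "mon", "qty": 4},
]


def build_cube(rows):
    """Pre-calculate the total quantity for every combination of dimensions."""
    cube = defaultdict(int)
    for row in rows:
        for n in range(len(DIMENSIONS) + 1):
            for subset in combinations(DIMENSIONS, n):
                key = tuple((d, row[d]) for d in subset)
                cube[key] += row["qty"]
    return cube


def lookup(cube, **members):
    """Answer a sub-query by reading a stored total; nothing is recalculated."""
    key = tuple((d, members[d]) for d in DIMENSIONS if d in members)
    return cube.get(key, 0)


cube = build_cube(sales)
pens_by_ann = lookup(cube, product="pen", salesperson="ann")
total_sales = lookup(cube)  # the fully rolled-up total
```

Storing every combination trades storage space for query speed, which is why the number of dimensions and members matters when a cube is designed.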

4.3 DATA MINING

Data mining, also known as Knowledge Discovery in Databases (KDD), refers to finding answers about a business from the data warehouse that executives or analysts had not thought to ask. It allows organisations to obtain managerial information from the legacy systems they have long been paying for. KDD applies techniques, mostly from artificial intelligence, to discover new information. That is, it is designed to find information that queries and reports do not reveal effectively. KDD seeks to find patterns in data and to infer rules.

KDD techniques include:
- statistical analysis of data
- neural networks, expert systems
- fuzzy logic
- intelligent agents
- multidimensional analysis
- data visualization
- decision trees

Data mining can be:
- bottom up (explore raw facts to find connections)
- top down (search to test hypotheses)

Data mining deals with the following kinds of information:
- associations (things done together)
- sequences (events over time)
- classifications
- pattern recognition
- clusters (define new groups)
- forecasting (predictions from time series)
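As a small illustration of bottom-up mining for associations (things done together), the sketch below counts how often pairs of items occur in the same transaction. The baskets are hypothetical, and real data mining tools use far more scalable algorithms than this exhaustive pair count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase transactions.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
    {"bread", "butter", "tea"},
]

# Count every pair of items bought together.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1


def support(item_a, item_b):
    """Fraction of all baskets that contain both items."""
    return pair_counts[tuple(sorted((item_a, item_b)))] / len(baskets)


strongest_pair, strongest_count = pair_counts.most_common(1)[0]
```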

5 IMPLEMENTATION OF A WAREHOUSE

The choice of an implementation approach is influenced by such factors as the current IS infrastructure, resources available, the architecture selected, scope of the implementation, the need for more global data access across the organisation, return-on-investment requirements, and speed of implementation.

5.1 TOP DOWN IMPLEMENTATION

A top down implementation requires more planning and design work to be completed at the beginning of the project. This brings with it the need to involve people from each of the workgroups, departments, or lines of business that will be participating in the data warehouse implementation. Decisions concerning data sources to be used, security, data structure, data quality, data standards, and an overall data model will typically need to be completed before actual implementation begins. The top down implementation can also imply more of a need for an enterprise-wide or corporate-wide data warehouse with a higher degree of cross-workgroup, department, or line-of-business access to the data. The top down implementation approach can work well when there is a good centralised organisation that is responsible for all hardware and other computer resources.

5.2 BOTTOM UP IMPLEMENTATION

A bottom up implementation involves the planning and designing of data marts without waiting for a more global infrastructure to be put in place. This approach is more widely accepted today than the top down approach because immediate results from the data marts can be realised and used as justification for expanding to a more global implementation. The advantages of the bottom up approach are faster payback and a less complex design than a global data warehouse. It should be kept in mind, however, that as more data marts are created, data redundancy and inconsistency between the data marts can occur. Multiple data marts may also bring with them an increased load on operational systems because more data extract operations are required.
Integration of the data marts into a more global environment, if that is the desire, can be difficult unless some degree of planning has been done.

5.3 A COMBINED APPROACH

Using both of the above approaches hand in hand means developing a base-level infrastructure definition for the global data warehouse, initially staying at a business level. For example, as a first step simply identify the lines of business that will be participating. A high-level view of the business processes and data areas of interest to them will provide the elements for a plan for implementation of the data marts. As data marts are implemented, develop a plan for how to handle the data elements that are needed by multiple data marts. This could be the start of a more global data warehouse structure or simply a common data store accessible by all the data marts. In some cases it may be appropriate to duplicate the data across multiple data marts. This is a trade-off decision between storage space, ease of access, and the impact of data redundancy, along with the requirement to keep the data in the multiple data marts at the same level of consistency.

6 ARCHITECTING DATA

One of the most basic concepts of data warehousing is to clean, filter, transform, summarise, and aggregate the data, and then put it in a structure for easy access and analysis by users. In architecting the data, it is structured and located according to its characteristics.

A warehouse contains:
- Real time data
- Old detail data
- Derived data
- Reconciled data
- Metadata

REAL TIME DATA
- Reflects the most recent happenings
- Voluminous if stored at the lowest level of granularity
- Usually stored on disk

OLDER DETAIL DATA
- Stored on a mass storage medium

DERIVED DATA
- Uses a unit of time for summarisation
- Uses attributes to summarise

RECONCILED DATA
- Compact and easily accessible

METADATA
- Data about the data
- Contains a directory of the warehouse
- Guide to the mapping of data from operational to warehouse form
- Describes the rules used for summarisation

6.1 GRANULARITY OF DATA IN DATA WAREHOUSE

Granularity of data in the data warehouse is concerned with the level of summarisation of the data elements. It refers to the level of detail available in the data elements. Granularity is important in data warehouse modelling because it offers the opportunity for trade-offs between important issues in data warehousing. For example, one trade-off could be performance versus volume of data. Another might be a trade-off between the ability to access data at a very detailed level versus performance and the cost of storing and accessing large volumes of data. Selecting the appropriate level of granularity significantly affects the volume of data in the data warehouse. It also determines the capability of the data warehouse to answer different types of queries.

In organisations that have large volumes of data, multiple levels of granularity could be considered to overcome the trade-offs. For example, we could divide the data in a data warehouse into detailed raw data and summarised data.
Detailed raw data is the lowest level of detailed transaction data without any aggregation and summarisation. Summarised data is transaction data aggregated at the level required for the most typically used queries. A much lower volume of data is required for the summarised data source as compared to the detailed raw data.
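The trade-off between the two levels of granularity can be made concrete with a small sketch. The transactions below are hypothetical; the point is only that the summarised level holds fewer rows but can no longer answer questions about individual transactions.

```python
from collections import defaultdict

# Hypothetical detailed raw data: (day, time, product, quantity).
detail = [
    ("2024-01-01", "09:15", "pen", 2),
    ("2024-01-01", "11:40", "pen", 1),
    ("2024-01-01", "14:05", "ink", 3),
    ("2024-01-02", "10:30", "pen", 4),
]

# Summarised data: one row per (day, product), the assumed typical query level.
summary = defaultdict(int)
for day, _time, product, qty in detail:
    summary[(day, product)] += qty

# Both levels answer "how many pens were sold on 1 January?"
pens_from_detail = sum(q for d, _t, p, q in detail if (d, p) == ("2024-01-01", "pen"))
pens_from_summary = summary[("2024-01-01", "pen")]

# Only the detailed level answers "how many sales took place before noon?"
morning_sales = sum(1 for _d, t, _p, _q in detail if t < "12:00")
```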

6.2 LOGICAL DATA PARTITIONING

Partitioning the data in the data warehouse enables the accomplishment of several critical goals. For example, it can:
- Provide flexible access to data
- Provide easy and efficient data management services
- Ensure scalability of the data warehouse
- Enable elements of the data warehouse to be portable. That is, certain elements of the data warehouse can be shared with other physical warehouses or archived on other storage media.

We usually partition large volumes of current detail data by splitting it into smaller pieces. Doing that helps make the data easier to:
- Restructure
- Index
- Sequentially scan
- Reorganise
- Recover
- Monitor

Data can be partitioned according to one or more of the following criteria:
- Time period (date, month, or quarter)
- Geography (location)
- Product (more generically, by line of business)
- Organisational unit
- A combination of the above

6.3 IDENTIFYING METADATA

Metadata contains descriptions of what kind of information is stored where, how it is encoded, how it is related to other information, where it comes from, and how it is related to the business. Metadata is information kept about the data. It includes:
- Any form of auxiliary data that is maintained by an application about its data
- The description of the structure, content, keys, indexes, etc. of the data
- Information about the business terms to be used

Metadata about a model should include the name, definition, and purpose of the model. The name simply gives users something to focus on when they are searching; usually it is the same as the name of the fact. The definition identifies what is modelled, and the purpose describes what it is modelled for. The metadata for the model should also contain a list of the dimensions, facts, and measures associated with it, as well as the name of a contact person so that users can get additional information when they have questions about the model.

Metadata about a dimension should also include the hierarchy, change rules, load frequency, and the attributes, facts, and measures associated with the dimension. For attributes that contain derived values, the rules for determining the value must be documented.

Metadata about a fact should include the load frequency, the measures and dimensions associated with the fact, and the grain of time for the fact.

Metadata about a measure should include its data type, domain, derivation rules, and the facts and dimensions associated with the measure.

Subsidiary targets are targets derived from the originally designed fact and dimension tables. Metadata for subsidiary targets should be the same as for the original facts and dimensions, with only the aggregates themselves being different.

The figure below shows the complete metadata diagram for the data warehouse.
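The metadata items listed in this section could be recorded in a structure along the following lines. This is only a sketch: the class and field names are assumptions chosen to mirror the text, and the sample values are invented.

```python
from dataclasses import dataclass, field


@dataclass
class MeasureMeta:
    name: str
    data_type: str
    domain: str
    derivation_rule: str = ""


@dataclass
class DimensionMeta:
    name: str
    hierarchy: list          # levels from most summarised to most detailed
    change_rules: str
    load_frequency: str
    attributes: list = field(default_factory=list)


@dataclass
class FactMeta:
    name: str
    load_frequency: str
    time_grain: str
    measures: list = field(default_factory=list)
    dimensions: list = field(default_factory=list)


@dataclass
class ModelMeta:
    name: str
    definition: str
    purpose: str
    contact: str
    facts: list = field(default_factory=list)


# Invented example values for illustration only.
amount = MeasureMeta("amount", "decimal", "non-negative", "quantity * unit price")
time_dim = DimensionMeta("time", ["year", "quarter", "month", "day"], "never changes", "daily")
sales_fact = FactMeta("sales", "daily", "day", measures=[amount], dimensions=[time_dim])
model = ModelMeta("sales", "Sales transactions", "Analyse revenue", "warehouse administrator",
                  facts=[sales_fact])

dimension_names = [d.name for f in model.facts for d in f.dimensions]
```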

7 DATA MODELLING FOR THE DATA WAREHOUSE

7.1 VISUALISATION OF THE BUSINESS WORLD

Data modelling gives us the ability to visualise what we cannot yet realise. Two data modelling techniques relevant in a data warehousing environment are ER modelling and dimensional modelling. The ER diagram is a tool that can help in the analysis of business requirements and in the design of the resulting data structure, while dimensional modelling gives us an improved capability to visualise the very abstract questions that business end users are required to answer.

7.2 DATA MODELLING TECHNIQUES

7.2.1 ER MODELLING

An ER model is represented by an ER diagram, which uses three basic graphic symbols to conceptualise the data: entity, relationship, and attribute.

Entity: An entity is defined to be a person, place, thing, or event of interest to the business or the organisation. An entity represents a class of objects, which are things in the real world that can be observed and classified by their properties and characteristics.

Relationship: A relationship is represented with lines drawn between entities. It depicts the structural interaction and association among the entities in a model. The relationship between two entities can be defined in terms of cardinality. This is the maximum number of instances of one entity that are related to a single instance of another entity, and vice versa. The possible cardinalities are one-to-one (1:1), one-to-many (1:M), and many-to-many (M:M). In a detailed (normalised) ER model, an M:M relationship is not shown because it is resolved into an associative entity.

Attribute: Attributes describe the characteristics or properties of the entities. An attribute name should be unique within an entity and should be self-explanatory.

The figure above is an example of an ER model.
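The resolution of a many-to-many relationship into an associative entity can be sketched with Python's built-in sqlite3 module. The entities used here (customer_order, product, and the associative entity order_line) are hypothetical examples and do not come from the figure.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer_order (order_id INTEGER PRIMARY KEY, order_date TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);

-- Associative entity: turns the M:M relationship between orders and
-- products into two 1:M relationships.
CREATE TABLE order_line (
    order_id   INTEGER REFERENCES customer_order(order_id),
    product_id INTEGER REFERENCES product(product_id),
    quantity   INTEGER,
    PRIMARY KEY (order_id, product_id)
);

INSERT INTO customer_order VALUES (1, '2024-01-01'), (2, '2024-01-02');
INSERT INTO product VALUES (10, 'cell phone'), (20, 'pager');
INSERT INTO order_line VALUES (1, 10, 2), (1, 20, 1), (2, 10, 5);
""")

# One order holds many products, and one product appears on many orders.
products_on_order_1 = con.execute(
    "SELECT p.name FROM order_line l JOIN product p ON p.product_id = l.product_id "
    "WHERE l.order_id = 1 ORDER BY p.name"
).fetchall()
orders_with_cell_phone = con.execute(
    "SELECT order_id FROM order_line WHERE product_id = 10 ORDER BY order_id"
).fetchall()
```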

7.2.2 DIMENSIONAL MODELLING

Dimensional modelling is a technique for conceptualising and visualising data models as a set of measures that are described by common aspects of the business. It is especially useful for summarising and rearranging the data and presenting views of the data to support data analysis. Dimensional modelling focuses on numeric data, such as values, counts, weights, balances, and occurrences.

Dimensional modelling has the following basic concepts:
- Facts
- Dimensions
- Measures (variables)

Fact
A fact is a collection of related data items, consisting of measures and context data. Each fact typically represents a business item, a business transaction, or an event that can be used in analysing the business or business processes. In a data warehouse, facts are implemented in the core tables in which all of the numeric data is stored.

Dimension
A dimension is a collection of members or units of the same type of views. In a dimensional model, every data point in the fact table is associated with one and only one member from each of the multiple dimensions. Dimensions determine the contextual background for the facts. Dimensions are the parameters over which we want to perform Online Analytical Processing (OLAP). For example, in a database for analysing all sales of products, common dimensions could be:
- Time
- Location/region
- Customers
- Salesperson
- Scenarios such as actual, budgeted, or estimated numbers

Dimensions can usually be mapped to nonnumeric, informative entities such as branch or employee.

Dimension Members: A dimension contains many dimension members. A dimension member is a distinct name or identifier used to determine a data item's position. For example, all months, quarters, and years make up a time dimension, and all cities, regions, and countries make up a geography dimension.

Dimension Hierarchies: We can arrange the members of a dimension into one or more hierarchies. Each hierarchy can also have multiple hierarchy levels. Not every member of a dimension need be located on a single hierarchy structure.

Measure
A measure is a numeric attribute of a fact, representing the performance or behaviour of the business relative to the dimensions. For example, measures are the sales in money, the sales volume, the quantity supplied, the supply cost, the transaction amount, and so forth. A measure is determined by combinations of the members of the dimensions and is located on facts.

7.3 VISUALIZATION OF DIMENSIONAL MODEL

The most popular way of visualizing a dimensional model is to draw a cube. We can represent a three-dimensional model using a cube. For example, suppose the measure is the volume of production, which is determined by the combination of three dimensions: location, product, and time. The location dimension and the product dimension may each have their own two levels of hierarchy. Say the location dimension has the region level and the plant level. In each dimension there are members, such as the east region and west region of the location dimension. This information can thus be represented by a three-dimensional cube.

7.4 DATA MODELLING FOR OLAP

Dimensional modelling exists primarily to support OLAP and decision making. Four types of operations are used in OLAP to analyse data. To change granularity, the operations of drill down and roll up are performed. To browse along the dimensions, the slice and dice operations are used.

7.4.1 DRILL DOWN AND ROLL UP

Drill down and roll up are the operations for moving the view down and up along the dimensional hierarchy levels. With drill-down capability, users can navigate to higher levels of detail. With roll-up capability, users can zoom out to see a summarised level of data. The navigation path is determined by the hierarchies within dimensions.

7.4.2 SLICE AND DICE

Slice and dice are the operations for browsing the data through the visualised cube. Slicing cuts through the cube so that users can focus on some specific perspectives. Dicing rotates the cube to another perspective so that users can be more specific with the data analysis.

For example, suppose we are considering a company's production of two products, cell phones and pagers. The dimensions would be location, time, and product. You start by analysing the production report of a specific month by plant and product. You can then change the dimension from product to time, which is dicing, and get the quarterly view of gross production by plant. Now suppose you want to focus on cell phones only, rather than gross production. To do this, you can cut the cube down to the cell phone member only, keeping the same dimensions, which is slicing.
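The four operations can be sketched over a tiny in-memory data set. The production figures, plant names, and helper functions below are hypothetical; slicing and dicing follow the definitions used in this section (dicing changes the perspective, slicing fixes one member).

```python
from collections import defaultdict

FIELDS = ("region", "plant", "product", "quarter", "month", "units")
# Hypothetical production records.
production = [
    ("east", "plant_a", "cell phone", "Q1", "jan", 100),
    ("east", "plant_a", "pager", "Q1", "jan", 40),
    ("east", "plant_b", "cell phone", "Q1", "feb", 70),
    ("west", "plant_c", "cell phone", "Q1", "jan", 90),
    ("west", "plant_c", "pager", "Q2", "apr", 30),
]
rows = [dict(zip(FIELDS, record)) for record in production]


def view(data, *dims):
    """Total units for each combination of the chosen dimension levels."""
    totals = defaultdict(int)
    for r in data:
        totals[tuple(r[d] for d in dims)] += r["units"]
    return dict(totals)


def slice_cube(data, **fixed):
    """Keep only the records that match the fixed dimension members."""
    return [r for r in data if all(r[k] == v for k, v in fixed.items())]


by_plant = view(rows, "plant")                       # detailed level
by_region = view(rows, "region")                     # roll up: plant -> region
plant_by_product = view(rows, "plant", "product")    # starting perspective
plant_by_quarter = view(rows, "plant", "quarter")    # dice: product -> time
phones = slice_cube(rows, product="cell phone")      # slice: cell phones only
phones_by_plant_quarter = view(phones, "plant", "quarter")
```

Drilling down is the reverse of the roll-up shown here: moving from by_region back to by_plant.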

8 SCHEMAS IN DATA WAREHOUSING

There are two basic models that can be used in dimensional modelling:
- Star model
- Snowflake model

8.1 STAR SCHEMA

The star schema is perhaps the simplest data warehouse schema. It is called a star schema because the entity-relationship diagram of this schema resembles a star, with points radiating from a central table. The centre of the star consists of a large fact table and the points of the star are the dimension tables.

A star schema is characterised by one or more very large fact tables that contain the primary information in the data warehouse, and a number of much smaller dimension tables (or lookup tables), each of which contains information about the entries for a particular attribute in the fact table.

A star query is a join between a fact table and a number of dimension tables. Each dimension table is joined to the fact table using a primary key to foreign key join, but the dimension tables are not joined to each other.

A typical fact table contains keys and measures. In the example given below, the fact table, sales, contains the measures quantity_sold, amount, and cost, and the keys cust_id, time_id, prod_id, channel_id, and promo_id. The dimension tables are customers, times, products, channels, and promotions. The products dimension table contains information about each product number that appears in the fact table.

A star join is a primary key to foreign key join of the dimension tables to a fact table.

The main advantages of star schemas are that they:
- Provide a direct and intuitive mapping between the business entities being analysed by end users and the schema design.
- Provide highly optimised performance for typical star queries.
- Are widely supported by a large number of business intelligence tools, which may anticipate or even require that the data warehouse schema contain dimension tables.

Star schemas are used for both simple data marts and very large data warehouses.
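A reduced version of the sales star schema can be sketched with Python's built-in sqlite3 module. The table names, keys, and measures follow the description above, but only three of the five dimensions are included to keep the sketch short, and the descriptive columns (prod_name, calendar_month, cust_city) and all data values are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Dimension tables (descriptive columns are hypothetical).
CREATE TABLE products  (prod_id INTEGER PRIMARY KEY, prod_name TEXT);
CREATE TABLE times     (time_id INTEGER PRIMARY KEY, calendar_month TEXT);
CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, cust_city TEXT);

-- Fact table: foreign keys to the dimensions plus the measures.
CREATE TABLE sales (
    prod_id INTEGER REFERENCES products(prod_id),
    time_id INTEGER REFERENCES times(time_id),
    cust_id INTEGER REFERENCES customers(cust_id),
    quantity_sold INTEGER,
    amount REAL,
    cost REAL
);

INSERT INTO products VALUES (1, 'phone'), (2, 'pager');
INSERT INTO times VALUES (1, '2024-01'), (2, '2024-02');
INSERT INTO customers VALUES (1, 'Pune'), (2, 'Delhi');
INSERT INTO sales VALUES
    (1, 1, 1, 2, 200.0, 150.0),
    (1, 2, 2, 1, 100.0, 75.0),
    (2, 1, 1, 3, 90.0, 60.0);
""")

# Star query: the fact table joins to each dimension table,
# and the dimension tables are not joined to each other.
star_query = """
SELECT p.prod_name, t.calendar_month, SUM(s.amount)
FROM sales s
JOIN products p ON p.prod_id = s.prod_id
JOIN times t ON t.time_id = s.time_id
GROUP BY p.prod_name, t.calendar_month
ORDER BY p.prod_name, t.calendar_month
"""
result = con.execute(star_query).fetchall()
```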

8.2 SNOWFLAKE SCHEMA

Snowflake schemas normalise dimensions to eliminate redundancy. That is, the dimension data is grouped into multiple tables instead of one large table. The snowflake model is the result of decomposing one or more of the dimensions, which sometimes have hierarchies themselves. For example, a product dimension table in a star schema might be normalised into a products table, a product_category table, and a product_manufacturer table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance. The figure below presents a graphical representation of a snowflake schema.

8.3 CREATING A DIMENSIONAL MODEL

The creation of a data model can be summarised in the following steps:
- Creating a model to identify the measures and dimensions within our requirements
- Adding a time dimension (to evaluate data in its proper context, the model should always contain a dimension of time)
- Creating facts
- Determining the granularity, additivity, and merging of facts
- Integrating with existing models
- Identifying metadata
- Validating the model
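Returning to the snowflake decomposition of section 8.2, the sketch below normalises a product dimension into the products, product_category, and product_manufacturer tables named there, again using sqlite3. The column names and data are assumptions; the point is that a query by category now needs one more join than it would against a single product dimension table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE product_manufacturer (manufacturer_id INTEGER PRIMARY KEY, manufacturer_name TEXT);
CREATE TABLE products (
    prod_id INTEGER PRIMARY KEY,
    prod_name TEXT,
    category_id INTEGER REFERENCES product_category(category_id),
    manufacturer_id INTEGER REFERENCES product_manufacturer(manufacturer_id)
);
CREATE TABLE sales (prod_id INTEGER REFERENCES products(prod_id), amount REAL);

INSERT INTO product_category VALUES (1, 'handset'), (2, 'accessory');
INSERT INTO product_manufacturer VALUES (1, 'Acme');
INSERT INTO products VALUES (1, 'phone', 1, 1), (2, 'pager', 1, 1), (3, 'case', 2, 1);
INSERT INTO sales VALUES (1, 200.0), (2, 90.0), (3, 15.0);
""")

# Fact -> products -> product_category: two joins to reach the category name.
sales_by_category = con.execute("""
SELECT c.category_name, SUM(s.amount)
FROM sales s
JOIN products p ON p.prod_id = s.prod_id
JOIN product_category c ON c.category_id = p.category_id
GROUP BY c.category_name
ORDER BY c.category_name
""").fetchall()
```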

9 MANAGING THE WAREHOUSING ENVIRONMENT

A data warehouse needs to be loaded regularly so that it can serve its purpose of facilitating business analysis. To do this, data from one or more operational systems needs to be extracted and copied into the warehouse. The process of extracting data from source systems and bringing it into the data warehouse is commonly called ETL, which stands for extraction, transformation, and loading.

9.1 EXTRACTION

Extraction is the process of collecting the source data from the operational systems and other external sources. The sources of data for the capture process include file formats and both relational and nonrelational database management systems. The data can be captured from many types of files, including extract files or tables, image copies, changed data files or tables, DBMS logs or journals, message files, and event logs. The type of capture file depends on the technique used for capturing the data. Data extraction techniques include source data extraction, DBMS log capture, triggered capture, application-assisted capture, time-stamp-based capture, and file comparison capture.

The different ways of extracting data are described below.

Source data extraction provides a static snapshot of source data as of a specific point in time.

Log capture enables the data to be captured from the DBMS logging system.

Triggers are procedures, supported by most database management systems, that provide for the execution of SQL or complex applications on the basis of recognition of a specific event in the database.

Application-assisted capture involves programming logic in existing operational system applications. This implies total control by the application programmer, along with all the responsibilities for testing and maintenance.

DBMS log capture, triggered capture, and application-assisted capture can produce an incremental record of source changes, to enable use of a continuous history model.
Time-stamp-based capture is a simple technique that involves checking a time stamp value to determine whether the record has changed since the last capture. If a record has changed, or a new record has been added, it is captured to a file or table for subsequent processing.

Extraction can also be done by file comparison. Although it may not be as efficient, it is an easy technique to understand and implement. It involves saving a snapshot of the data source at the point in time of data capture. At a later point in time, the current file is compared with the previous snapshot. Any changes and additions that are detected are captured to a separate file for subsequent processing and addition to the data warehouse databases.

Time-stamp-based capture and file comparison capture each produce a record of the incremental changes that supports a continuous history model.

9.2 TRANSFORM

The transform process converts the captured source data into a format and structure suitable for loading into the data warehouse. The mapping characteristics used to transform the source data are captured and stored as metadata. This metadata defines any changes that are required prior to loading the data into the data warehouse. The transform process helps to resolve anomalies in the source data and to produce a high-quality data source for the target data warehouse. Transformation of data can occur at the record level or at the attribute level. The basic techniques include structural transformation, content transformation, and functional transformation.

9.2.1 STRUCTURAL TRANSFORMATION

Structural transformation changes the structure of the source records to that of the target database. This technique transforms data at the record level. These transformations occur by selecting only a subset of records from the source records, by selecting a subset of records from the source records and mapping to different target records, by selecting a subset of different records from the source records and mapping to the same target record, or by some combination of each. If a fact table in the model holds data based on events, records should be created only when the event occurs. However, if a fact table holds data based on the state of the data, a record should be created for the target table each time the data is captured.

9.2.2 CONTENT TRANSFORMATION

This technique transforms data at the attribute level. Content transformation converts values by use of algorithms or by use of data transformation tables.

9.2.3 FUNCTIONAL TRANSFORMATION

Functional transformation creates new data values in the target records based on data in the source records. This technique transforms data at the attribute level. These transformations occur either through data aggregation or enrichment. Aggregation is the calculation of derived values such as totals and averages based on multiple attributes in different records. Enrichment combines two or more data values and creates one or more new attributes from a single source record or multiple source records that can be from the same or different sources.

9.3 LOADING

The loading process uses the files or tables created in the transform process and applies them to the relevant data warehouse or data mart.
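The files or tables that the loading process receives are the output of the transformations in section 9.2. A small sketch in plain Python shows the three transformation techniques applied in sequence to a handful of records. The record layout, the region lookup table and the field names are assumptions chosen for illustration; real transformations would be driven by the mapping metadata.

```python
from collections import defaultdict

# Hypothetical captured source records.
source_records = [
    {"order_id": 1, "region_code": "N", "status": "shipped", "amount": 100.0},
    {"order_id": 2, "region_code": "S", "status": "cancelled", "amount": 40.0},
    {"order_id": 3, "region_code": "N", "status": "shipped", "amount": 60.0},
]

# Hypothetical data transformation table used for content transformation.
REGION_NAMES = {"N": "North", "S": "South"}

# Structural transformation (record level): keep only a subset of the records.
selected = [r for r in source_records if r["status"] == "shipped"]

# Content transformation (attribute level): convert codes through a lookup table
# and map each source record to the target record layout.
converted = [
    {
        "order_id": r["order_id"],
        "region": REGION_NAMES[r["region_code"]],
        "amount": r["amount"],
    }
    for r in selected
]

# Functional transformation (aggregation): derive a total per region
# from multiple records.
totals = defaultdict(float)
for r in converted:
    totals[r["region"]] += r["amount"]
region_totals = dict(totals)

print(converted)
print(region_totals)
```

Keeping the three steps separate makes it easy to record, for each one, the mapping rule that was applied, which is the information the text says should be stored as metadata.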
There are four basic techniques for applying data: load, append, constructive merge, and destructive merge.
- Load replaces the existing data in the target data warehouse tables with that created in the transform process. If the target tables do not exist, the load process can create the table.
- Append loads new data from the transform file or table to an already existing table by appending the new data to the end of the existing data.
- Constructive merge appends the new records to the existing target table and updates an end time value in the record whose state is being superseded.
- Destructive merge overwrites existing records with new data.
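The difference between the four techniques is easiest to see on a tiny in-memory table. The sketch below is one possible reading of them in plain Python; the row layout (key, start_time, end_time) and the function names are assumptions for illustration, not part of any particular loading tool.

```python
def load(target, new_rows):
    """Load: replace the existing contents of the target table."""
    return [dict(r) for r in new_rows]


def append(target, new_rows):
    """Append: add the new rows to the end of the existing data."""
    return [dict(r) for r in target] + [dict(r) for r in new_rows]


def constructive_merge(target, new_rows, load_time):
    """Constructive merge: close the superseded row, then append the new one."""
    new_keys = {r["key"] for r in new_rows}
    merged = []
    for row in target:
        row = dict(row)
        # Only the currently open row for a key is superseded.
        if row["key"] in new_keys and row["end_time"] is None:
            row["end_time"] = load_time
        merged.append(row)
    for r in new_rows:
        merged.append({**r, "start_time": load_time, "end_time": None})
    return merged


def destructive_merge(target, new_rows):
    """Destructive merge: overwrite existing rows that share a key."""
    by_key = {row["key"]: dict(row) for row in target}
    for r in new_rows:
        by_key[r["key"]] = dict(r)
    return list(by_key.values())


# Hypothetical target table and incoming transformed data.
existing = [{"key": "C1", "city": "Delhi", "start_time": 1, "end_time": None}]
incoming = [{"key": "C1", "city": "Mumbai"}]

history = constructive_merge(existing, incoming, load_time=2)
overwritten = destructive_merge(existing, incoming)
print(history)
print(overwritten)
```

In this sketch the constructive merge keeps both states of customer C1, with the old row closed at the load time, while the destructive merge leaves only the latest state.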

10 ROLE OF OLAP SERVER IN A DATA WAREHOUSING ENVIRONMENT

OLAP servers deliver warehouse applications such as performance reporting, sales forecasting, product line and customer profitability, sales analysis, marketing analysis, what-if analysis and manufacturing mix analysis: applications that require historical, projected and derived data. With OLAP servers' robust calculation engines, historical data is made vastly more useful by transforming it into derived and projected data. Users gain broader insights by combining standard access tools with a powerful analytic engine. An OLAP server provides functionality and performance that leverages the data warehouse for reporting, analysis, modelling and planning requirements. It is essential to create operational scenarios that are shaped by the past yet also include planned and potential changes that will impact tomorrow's corporate performance.

Requirements for the OLAP component of a data warehouse or data mart strategy include:
- The ability to scale to large volumes of data and large numbers of concurrent users
- Consistent, fast query response times that allow for iterative speed-of-thought analysis
- Integrated metadata that seamlessly links the OLAP server and the data warehouse relational database
- The ability to automatically drill from summary and calculated data, which is managed by the OLAP server, to detail data stored in the data warehouse relational database
- A calculation engine that includes robust mathematical functions for computing derived data (aggregations, matrix calculations, cross-dimensional calculations, OLAP-aware formulas and procedural calculations)
- Seamless integration of historical, projected and derived data
- A multi-user read/write environment to support users' what-if analysis, modelling and planning requirements
- The ability to be deployed quickly, adopted easily and maintained cost-effectively
- Robust data-access security and user management
- Availability of a wide variety of viewing and analysis tools to support different user communities

10.1 COMPARISONS BETWEEN OLTP, DATA WAREHOUSE RDBMS AND OLAP SERVER

| Category | Characteristic | OLTP | Data Warehouse RDBMS | OLAP Server |
| Purpose | System charter | Operational | Historical and detail data | Analytic |
| Access | Access type | Read/write | Read-only | Read/write |
| Access | Access mode | Atomic, singular, simple update | Singular, list-oriented queries and reports | Iterative, comparative analytic investigation |
| Access | Access process | IT-supported queries | IT-assisted or pre-planned queries and reports | IT-independent, ad hoc navigation and investigation drilldown |
| Access | Response characteristics | Fast update, varied query response | Varied, potentially very slow query response | Fast, consistent query response |
| Data storage | Content scope | Application-specific; actual/vertical; limited historical | Warehouse: cross-subject data. Data mart: single subject area. Historical data | Many cubes. Each cube is a single subject area: historical, calculated, projected, what-if, derived data |
| Data storage | Data detail level | Transaction detail | Cleansed and lightly summarized | Summarized, aggregated and calculated using sophisticated analytics |
| Data storage | Data structure | Normalized | Normalized or denormalized | Dimensional, hierarchical |
| Data storage | Data structure design goal | Update | List-oriented query | Analysis |
| Data storage | Data volumes | | Gigabytes/terabytes | Gigabytes |
| Implementation | Deployability | Slow (multi-month/year) | Slow (multi-month/year) | Fast (days/weeks) |
| Implementation | Adaptability | Limited; requires significant resource | Low; requires significant resource | High; easily modified |
| Implementation | Computer hardware investment required | Moderate to expensive | Moderate to extremely expensive | Minimal to moderately expensive |

11 DATA WAREHOUSING TOOLS

There are many data warehousing tools available in the market. IBM, Intersolv, Powersoft, Sterling and Hyperion are some of the companies delivering data warehousing solutions. Below are the features of some of these tools.

11.1 HYPERION

Hyperion, which uses both analytic applications and online analytical processing (OLAP) technology, has a unique opportunity to influence the course of the build and buy markets for analytic solutions. Features of Hyperion include:
- The ability to scale to large volumes of data and large numbers of concurrent users
- Consistent, fast query response times that allow for iterative speed-of-thought analysis
- Integrated metadata that seamlessly links the OLAP server and the data warehouse relational database
- The ability to automatically drill from summary and calculated data, which is managed by the OLAP server, to detail data stored in the data warehouse relational database
- A calculation engine that includes robust mathematical functions for computing derived data (aggregations, matrix calculations, cross-dimensional calculations, OLAP-aware formulas and procedural calculations)

11.2 MICROSTRATEGY

Another tool is MicroStrategy 7i. End users can add or remove report objects, add derived metrics and modify the filter, all with speed-of-thought response time against Intelligent Cubes. 7i OLAP Services enables full multi-dimensional OLAP analysis within an Intelligent Cube, while retaining users' ability to seamlessly drill through to the full breadth and depth of the data warehouse. Features of MicroStrategy include:
- Add or Remove Attributes and Metrics: With 7i OLAP Services, users can create unique report views by adding or removing attributes and metrics contained within the Intelligent Cube. This allows speed-of-thought report creation and modification with no need to extract data from the data warehouse.
- Derived Metrics: Users can create new on-the-fly metric calculations from existing metrics in an Intelligent Cube. The new calculation is performed without submitting a new request to the data warehouse.
- Filter Data within an Intelligent Cube: Users can easily filter their view of the data within an Intelligent Cube. The filtering is performed on MicroStrategy Intelligence Server within the Intelligent Cube.
- Transparent to the End-User: Users do not need to know the name and location of an Intelligent Cube, or even if it exists.

7i OLAP Services works with MicroStrategy Intelligence Server to automatically use the appropriate Intelligent Cube or create a new one to satisfy the end user request.

Written By: Bani Trehan, Tata Consultancy Services. Mail: Bani_Trehan@delhi.tcs.co.in


Data Warehousing. Overview

Data Warehousing. Overview Data Warehousing Overview Basic Definitions Normalization Entity Relationship Diagrams (ERDs) Normal Forms Many to Many relationships Warehouse Considerations Dimension Tables Fact Tables Star Schema Snowflake

More information

A Multi-Dimensional Data Model

A Multi-Dimensional Data Model A Multi-Dimensional Data Model A Data Warehouse is based on a Multidimensional data model which views data in the form of a data cube A data cube, such as sales, allows data to be modeled and viewed in

More information

Data Warehouse. Asst.Prof.Dr. Pattarachai Lalitrojwong

Data Warehouse. Asst.Prof.Dr. Pattarachai Lalitrojwong Data Warehouse Asst.Prof.Dr. Pattarachai Lalitrojwong Faculty of Information Technology King Mongkut s Institute of Technology Ladkrabang Bangkok 10520 pattarachai@it.kmitl.ac.th The Evolution of Data

More information

CS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University

CS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University CS377: Database Systems Data Warehouse and Data Mining Li Xiong Department of Mathematics and Computer Science Emory University 1 1960s: Evolution of Database Technology Data collection, database creation,

More information

Data Warehouses. Yanlei Diao. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Data Warehouses. Yanlei Diao. Slides Courtesy of R. Ramakrishnan and J. Gehrke Data Warehouses Yanlei Diao Slides Courtesy of R. Ramakrishnan and J. Gehrke Introduction v In the late 80s and early 90s, companies began to use their DBMSs for complex, interactive, exploratory analysis

More information

DATABASE DEVELOPMENT (H4)

DATABASE DEVELOPMENT (H4) IMIS HIGHER DIPLOMA QUALIFICATIONS DATABASE DEVELOPMENT (H4) December 2017 10:00hrs 13:00hrs DURATION: 3 HOURS Candidates should answer ALL the questions in Part A and THREE of the five questions in Part

More information

Chapter 3. Databases and Data Warehouses: Building Business Intelligence

Chapter 3. Databases and Data Warehouses: Building Business Intelligence Chapter 3 Databases and Data Warehouses: Building Business Intelligence How Can a Business Increase its Intelligence? Summary Overview of Main Concepts Details/Design of a Relational Database Creating

More information

Data Warehousing and Decision Support

Data Warehousing and Decision Support Data Warehousing and Decision Support [R&G] Chapter 23, Part A CS 4320 1 Introduction Increasingly, organizations are analyzing current and historical data to identify useful patterns and support business

More information

TIM 50 - Business Information Systems

TIM 50 - Business Information Systems TIM 50 - Business Information Systems Lecture 15 UC Santa Cruz Nov 10, 2016 Class Announcements n Database Assignment 2 posted n Due 11/22 The Database Approach to Data Management The Final Database Design

More information

TDWI strives to provide course books that are contentrich and that serve as useful reference documents after a class has ended.

TDWI strives to provide course books that are contentrich and that serve as useful reference documents after a class has ended. Previews of TDWI course books offer an opportunity to see the quality of our material and help you to select the courses that best fit your needs. The previews cannot be printed. TDWI strives to provide

More information

Management Information Systems MANAGING THE DIGITAL FIRM, 12 TH EDITION FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT

Management Information Systems MANAGING THE DIGITAL FIRM, 12 TH EDITION FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT MANAGING THE DIGITAL FIRM, 12 TH EDITION Chapter 6 FOUNDATIONS OF BUSINESS INTELLIGENCE: DATABASES AND INFORMATION MANAGEMENT VIDEO CASES Case 1: Maruti Suzuki Business Intelligence and Enterprise Databases

More information
