Acknowledgment. Sales data example. Outline. Excel pivot table. Example: Sales

Size: px
Start display at page:

Download "Acknowledgment. Sales data example. Outline. Excel pivot table. Example: Sales"

Transcription

1 Acknowledgment Data Mining MTAT.3.83 Online Analy4cal Processing and Data Warehouses Jaak Vilo 2 Fall This slide deck is a mashup of the following publicly available slide decks: DataCube/CubeNotesKerschberg.ppt Hector Garcia-Molina, Stanford University Marlon Dumas, Univ. of Tartu, Sulev Reisberg, Quretec & STACC Outline The data cube abstraction Multidimensional data models Data warehouses Sales data example!" #$%&'( )*'+$,-*$%'+. /+'2* -*$ 3-4$ &(( 74$8&3*$ 69 )-83(% 5:;5<;=<55 5<<< = 6-+* >?(-@$3@3 69 )-83(% 5=;5<;=<55 AB< : 6-44&(( C3*&@- #-&' )'(. 5<;A;=<55 5=<< D 6-+* >?(-@$3@3 #-&' )'(. 55;55;=<55 55E< E 6-44&(( 74$8&3*$ 69 )-83(% 55;55;=<55 AA< F 6-+* >?(-@$3@3 69 GH&4&/3 5=;55;=<55 5E<< I #-@J$+$ >?(-@$3@3 69 )-83(% 5:;A;=<5< :<< B 6-+* >?(-@$3@3 69 )'(. 5=;A;=<55 5=<< A 6-44&(( C3*&@- #-&' GH&4&/3 55;55;=<55 :E< 5< 6-+* >?(-@$3@3 69 )'(. 55;55;=<55 55E< Jaak Vilo and other authors UT: Data Mining 29 4 Excel pivot table Example: Sales!"#$%&'()*+#,!"$&-'".'/ $%'".',)#+ 2"-)#'!"$&-'".'/ 2"-)#'$%'".',)#+ 3"4'()*+#, 3)56" 27 3)56" 27 3)89+:+ ; <== ; <== 2)##6&& > > ;??= ;@@= A <?A= 2):-$ ; A ;;?= AB<=??@B=!"#$%&'()#* +, -,..,/-. /. -. Jaak Vilo and other authors UT: Data Mining 29 5

2 Multidimensional View of Sales Multidimensional analysis involves viewing data simultaneously categorized along potentially many dimensions Pivoting Typical Data Analysis Process Formulate a query to extract relevant information Extract aggregated data from the database Visualize the result to look for patterns. Analyze the result and formulate new queries. Online Analytical Processing (OLAP) is about supporting such processes OLAP characteristics: No updates, lots of aggregation, need to visualize and to interact Let s first talk about aggregation Relational Aggregation Operators SQL has several aggregate operators: SUM(), MIN(), MAX(), COUNT(), AVG() The basic idea is: Combine all values in a column 5 into a single scalar value 7 2 Syntax SELECT AVG(Temp) FROM Weather; 3... AVG()? IDSLab. The Relational GROUP BY Operator GROUP BY allows aggregates over table sub-groups SELECT Time, Altitude, AVG(Temp) FROM Weather GROUP BY Time, Altitude; Altitude Time Latitude Longitude Temp (m) Altitude 7/9/5: Time AVG(Temp) (m) 2 7/9/5: /9/5:5 23 7/9/5:5 7 7/9/5:5 7 7/9/9: /9/9: /9/9:5 5 2 Limitations of the GROUP BY Group-by is one-dimensional: one group per combination of the selected attribute values Model Year Color Sales à Does not give sub-totals Chevy 994 Black 5 Chevy 995 Black 85 Chevy 994 White 4 Chevy 995 White 5. Calculate total sales per year 2. Compute total sales per year and per color 3. Calculate sales per year, per color and per model IDSLab. 2

3 Grouping with Sub-Totals (Pivot table) Sales by Model by Year by Color Grouping with sub-totals (crosstab) Note that sub-totals by color are missing, if added it becomes a cross-tabulation Grouping with Sub-Totals (Relational version) SQL Query Sub-totals by color are still missing IDSLab. Adding the colors CUBE and Roll Up Operators Aggregate Sum RED WHITE BLUE Group By (with total) By Color Sum RED WHITE BLUE By Make Cross Tab Chevy Ford By Color Sum By Make & Year By Year CHEVY By Color & Year The Data Cube and The Sub-Space Aggregates FORD By Make RED WHITE BLUE By Make & Color Sum By Color 7 3

4 The Cube An Example of 3D Data Cube By Year Chevy By Color & Year Ford By Make & Year 99 Sum By Color IDSLab. 9 Red White Blue By Make & Color By Make Cube: Each ADribute is a Dimension N-dimensional Aggregate (sum(), max(),...) Fits relational model exactly: a, a 2,..., a N, f() Super-aggregate over N- Dimensional subcubes ALL, a 2,..., a N, f() a 3, ALL, a 3,..., a N, f()... a, a 2,..., ALL, f() This is the N- Dimensional cross-tab. Super-aggregate over N-2 Dimensional subcubes The Data Cube Concept Sub-cube Derivation W B <M,Y,C> MAKE Ford B W <M,Y,*> <M,*,C> <*,Y,C> F White C 994 COLOR 995 Black Chevy F C F YEAR C B W <M,*,*> <*,Y,*> <*,*,C> <*,*,*> Dimension collapse, * denotes ALL CUBE Operator Possible syntax Proposed syntax example: SELECT Model, Make, Year, SUM(Sales) FROM Sales WHERE Model IN { Chevy, Ford } AND Year BETWEEN 99 AND 994 GROUP BY CUBE Model, Make, Year HAVING SUM(Sales) > ; Note: GROUP BY operator repeats aggregate list in select list in group by list Rollup Operator ROLLUP Operator: special case of CUBE Operator Return Sales Roll Up by Store by Quarter in 994.: SELECT Store, quarter, SUM(Sales) FROM Sales WHERE nation= Korea AND Year=994 GROUP BY ROLLUP Store, Quarter(Date) AS quarter; IDSLab IDSLab. 24 4

5 Cube Operator Example Summary SALES Model Year Color Sales Chevy 99 red 5 Chevy 99 white 87 Chevy 99 blue 62 Chevy 99 red 54 Chevy 99 white 95 Chevy 99 blue 49 Chevy 992 red 3 Chevy 992 white 54 Chevy 992 blue 7 Ford 99 red 64 Ford 99 white 62 Ford 99 blue 63 Ford 99 red 52 Ford 99 white 9 Ford 99 blue 55 Ford 992 red 27 Ford 992 white 62 Ford 992 blue 39 CUBE DATA CUBE Model Year Color Sales ALL ALL ALL 942 chevy ALL ALL 5 ford ALL ALL 432 ALL 99 ALL 343 ALL 99 ALL 34 ALL 992 ALL 285 ALL ALL red 65 ALL ALL white 273 ALL ALL blue 339 chevy 99 ALL 54 chevy 99 ALL 99 chevy 992 ALL 57 ford 99 ALL 89 ford 99 ALL 6 ford 992 ALL 28 chevy ALL red 9 chevy ALL white 236 chevy ALL blue 83 ford ALL red 44 ford ALL white 33 ford ALL blue 56 ALL 99 red 69 ALL 99 white 49 ALL 99 blue 25 ALL 99 red 7 ALL 99 white 4 ALL 99 blue 4 ALL 992 red 59 ALL 992 white 6 ALL 992 blue 25 Problems with GROUP BY GROUP BY cannot directly construct Pivot tables / roll-up reports Cross-Tabs CUBE Operator Generalizes GROUP BY and Roll-Up and Cross-Tabs!! IDSLab Now let s have a look at one OLAP Screen Example NASA Workforce cubes Btell demo reports Follow the demo link and start a demo, the go to reports 27 OLAP Screen Example Warehouse Architecture Client Query & Analysis Client Metadata Warehouse Integration Source Source Source 3 5

6 Why a Warehouse? Multidimensional Data Two Approaches: Query-Driven (Lazy) Warehouse (Eager) Source? Source Sales volume as a function of product, month, and region Product Region Dimensions: Product, Location, Time Hierarchical summarization paths Industry Region Year Category Country Quarter Product City Month Week Office Day 3 J. Han: Data Mining: Concepts and Techniques 32 Star Star Schema product prodid name price p bolt p2 nut 5 store storeid c nyc c2 sfo c3 la sale oderid date custid prodid storeid qty amt o /7/97 53 p c 2 o2 2/7/97 53 p2 c 2 5 3/8/97 p c3 5 5 product prodid name price sale orderid date custid prodid storeid qty amt customer custid name address customer custid name address 53 joe main sfo 8 fred 2 main sfo sally 8 willow la store storeid Terms Dimension Hierarchies Fact table Dimension tables Measures product prodid name price sale orderid date custid prodid storeid qty amt customer custid name address store stype store storeid Id tid mgr s5 sfo t joe s7 sfo t2 fred s9 la t nancy region stype tid size location t small downtown t2 large suburbs Id pop regid sfo M north la 5M south store storeid è snowflake schema è constellations region regid name north cold region south warm region

7 Cube 3-D Cube Fact table view: sale prodid storeid amt p c 2 p2 c p c3 5 p2 c2 8 Multi-dimensional cube: c c2 c3 p 2 5 p2 8 Fact table view: sale prodid storeid date amt p c 2 p2 c p c3 5 p2 c2 8 p c 2 44 p c2 2 4 day 2 day Multi-dimensional cube: c c2 c3 p 44 4 p2 c c2 c3 p 2 5 p2 8 dimensions = 2 dimensions = Star Schema Snowflake Schema time time_key day day_of_the_week month quarter year branch branch_key branch_name branch_type Measures Sales Fact Table J. Han: Data Mining: Concepts and Techniques time_key item_key branch_key location_key units_sold dollars_sold avg_sales item item_key item_name brand type supplier_type location location_key street state_or_province country 39 time time_key day day_of_the_week month quarter year branch branch_key branch_name branch_type Measures Sales Fact Table time_key item_key branch_key location_key units_sold dollars_sold avg_sales J. Han: Data Mining: Concepts and Techniques item item_key item_name brand type supplier_key location location_key street _key supplier supplier_key supplier_type _key state_or_province country 4 OLTP vs. OLAP OLTP Online Transaction Processing Traditional database technology Many small transactions (point queries: UPDATE or INSERT) Avoid redundancy, normalize schemas Access to consistent, up-to-date database OLTP Examples: Flight reservation Banking and financial transactions Order Management, Procurement,... Extremely fast response times... Carsten Binnig, ETH Zürich 4 OLTP vs. OLAP OLAP Online Analytical Processing Big aggerate queries, no Updates Redundancy a necessity (Materialized Views, specialpurpose indexes, de-normalized schemas) Periodic refresh of data (daily or weekly) OLAP Examples Decision support (sales per employee) Marketing (purchases per customer) Biomedical databases Goal: Response Time of seconds / few minutes Carsten Binnig, ETH Zürich 42 7

8 OLTP vs. OLAP (Water and Oil) Lock Conflicts: OLAP blocks OLTP Database design: OLTP normalized, OLAP de-normalized Tuning, Optimization OLTP: inter-query parallelism, heuristic optimization OLAP: intra-query parallelism, full-fledged optimization Freshness of Data: OLTP: serializability OLAP: reproducibility Integrity: OLTP: ACID OLAP: Sampling, Confidence Intervals Atomi Consistency Isolation Durability Solution: Data Warehouse Special Sandbox for OLAP Data input using OLTP systems Data Warehouse aggregates and replicates data (special schema) New Data is periodically uploaded to Warehouse Carsten Binnig, ETH Zürich 43 Carsten Binnig, ETH Zürich 44 DW Architecture What is data warehouse OLTP OLTP Applications DB DB2 DB3 Extract Transform Load (ETL) OLAP GUI, Spreadsheets Data Warehouse InformaKon system for reporkng purposes The goal is to fulfill reporkng needs which are unsaksfied in operakonal system It is easy to modify old and design new reports No write spec to sorware developer to get the report anymore Reports can be filled with data quickly No start the report generakon at night to prevent system load anymore The data comes from operakonal system (s) Carsten Binnig, ETH Zürich 45 Goal of the work package Partners in this work package Work out the main concepts for building data warehouse for hospital IS What are the reporkng needs? What are the data cubes that cover most reporkng needs for universal hospital? How to get the data into these cubes? Ida- Tallinna Keskhaigla (ITK) One of the biggest hospitals in Estonia Huge amount of data in operakonal system (system called ESTER) Has difficulkes in generakng reports on operakonal system Interested in improving the report managment Quretec Provides data management sorware for different clients in Europe, especially in healthcare area Interested in increasing the knowledge of data warehousing area 8

9 So far... () We have analyzed the data and data structures in operakonal system So far...(2) We have designed the interface for ge`ng the data from ESTER We have built 2 data cubes Data in operakonal IS Interface for building data cubes SQL view Data cubes Reports OperaKonal IS SQL view So far... (3) We have designed reports on the data cubes So far... (4) Showed that report generakon Kme has reduced from tens of minutes to few seconds Selected period Number of pa4ents Seconds for genera4ng report in opera4onal system Seconds for genera4ng the same report in data warehouse day month year So far... (5) We showed that data warehouse offers addikonal benefits: MulKple output formats Reports can be redesigned easily New combined reports - > new value from the data Implementing a Warehouse Monitoring: Sending data from sources Integrating: Loading, cleansing,... Processing: Query processing, indexing,... Managing: Metadata, Design,

10 Monitoring Monitoring Techniques 55 Source Types: relational, flat file, IMS, VSAM, IDMS, WWW, news-wire, Incremental vs. Refresh customer id name address 53 joe main sfo 8 fred 2 main sfo sally 8 willow la new 56 Periodic snapshots Database triggers Log shipping Data shipping (replication service) Transaction shipping Polling (queries to source) Screen scraping Application level monitoring è Advantages & Disadvantages!! Monitoring Issues Integration Frequency periodic: daily, weekly, triggered: on big change, lots of changes,... Data transformation Data Cleaning Data Loading Derived Data Client Query & Analysis Client convert data to uniform format Metadata Warehouse remove & add fields (e.g., add date to get history) Integration Standards (e.g., ODBC) Gateways Source Source Source Data Cleaning Loading Data Migration (e.g., yen ð dollars) Scrubbing: use domain-specific knowledge (e.g., social security numbers) Fusion (e.g., mail list, customer merging) billing DB customer(joe) merged_customer(joe) Incremental vs. refresh Off-line vs. on-line Frequency of loading At night, x a week/month, continuously Parallel/Partitioned load service DB customer2(joe) Auditing: discover rules & relationships (like data mining) 59 6

11 Derived Data Materialized Views Derived Warehouse Data indexes aggregates materialized views (next slide) When to update derived data? Incremental vs. refresh Define new warehouse relations using SQL expressions sale prodid storeid date amt p c 2 p2 c p c3 5 p2 c2 8 p c 2 44 p c2 2 4 product id name price p bolt p2 nut 5 jointb prodid name price storeid date amt p bolt c 2 p2 nut 5 c p bolt c3 5 p2 nut 5 c2 8 p bolt c 2 44 p bolt c2 2 4 does not exist at any source 6 62 Processing ROLAP Server ROLAP servers vs. MOLAP servers Index Structures Relational OLAP Server sale prodid date sum p 62 p2 9 p 2 48 What to Materialize? tools Algorithms Client Metadata Query & Analysis Warehouse Client utilities ROLAP server Special indices, tuning; Schema is denormalized Integration Source Source Source relational DBMS MOLAP Server Index Structures Multi-Dimensional OLAP Server Traditional Access Methods utilities M.D. tools multidimensional server Product milk soda eggs soap City A B Date could also sit on relational DBMS Sales B-trees, hash tables, R-trees, grids, Popular in Warehouses inverted lists bit map indexes join indexes text indexes 65 66

12 Inverted Lists Using Inverted Lists r4 r8 r34 r35 r5 r9 r37 r4 rid name age r4 joe 2 r8 fred 2 r9 sally 2 r34 nancy 2 r35 tom 2 r36 pat 25 r5 dave 2 r4 jeff Query: Get people with age = 2 and name = fred List for age = 2: r4, r8, r34, r35 List for name = fred : r8, r52 Answer is intersection: r8 age index inverted lists data records Bit Maps Using Bit Maps 2 23 age index bit maps id name age joe 2 2 fred 2 3 sally 2 4 nancy 2 5 tom 2 6 pat 25 7 dave 2 8 jeff data records Query: Get people with age = 2 and name = fred List for age = 2: List for name = fred : Answer is intersection: Good if domain cardinality small Bit vectors can be compressed 69 7 Join Join Indexes Combine SALE, PRODUCT relations In SQL: SELECT * FROM SALE, PRODUCT sale prodid storeid date amt p c 2 p2 c p c3 5 p2 c2 8 p c 2 44 p c2 2 4 product id name price p bolt p2 nut 5 jointb prodid name price storeid date amt p bolt c 2 p2 nut 5 c p bolt c3 5 p2 nut 5 c2 8 p bolt c 2 44 p bolt c2 2 4 product id name price jindex p bolt r,r3,r5,r6 p2 nut 5 r2,r4 join index sale rid prodid storeid date amt r p c 2 r2 p2 c r3 p c3 5 r4 p2 c2 8 r5 p c 2 44 r6 p c

13 What to Materialize? Store in warehouse results useful for common queries Example: day 2 day c c2 c p p2 c c2 c3 p 2 5 p total sales Materialization Factors Type/frequency of queries Query response time Storage cost Update cost c c2 c3 c c2 c3 p p2 8 p materialize c p p Cube Aggregates Lattice Dimension Hierarchies 29 all all c c2 c3 p c c2 c3 p p2 8 product date, product, date product, date state cities state c CA c2 NY c c2 c3 day 2 p 44 4 p2 c c2 c3 day p 2 5 p2 8, product, date use greedy algorithm to decide what to materialize Dimension Hierarchies Interesting Hierarchy all product date, product, date product, date, product, date state state, product state, date all weeks years quarters months time day week month quarter year conceptual dimension table not all arcs shown... state, product, date days

14 Design Tools What data is needed? Where does it come from? How to clean data? How to represent in warehouse (schema)? What to summarize? What to materialize? What to index? Development design & edit: schemas, views, scripts, rules, queries, reports Planning & Analysis what-if scenarios (schema changes, refresh rates), capa planning Warehouse Management performance monitoring, usage patterns, exception reporting System & Network Management measure traffic (sources, warehouse, clients) Workflow Management reliable scripts for cleaning & analyzing data 79 8 DW Products and Tools Oracle g, IBM DB2, Microsoft SQL Server,... All provide OLAP extensions SAP Business Information Warehouse ERP vendors MicroStrategy, Cognos (now IBM) Specialized vendors Kind of Web-based EXCEL Niche Players (e.g., Btell) Vertical application domain MDX (Multi-Dimensional expressions) " MDX is a Microsoft implementation of query language for OLAP " Example SELECT {[Dim Date].[Time Year].[Time Year]} ON COLUMNS, {[Dim Location].[Region].[Region]} ON ROWS FROM [Mini DW] WHERE ([Measures].[Sales Amount]) 82 Chapter 2: Data Preprocessing Discretization Why preprocess the data? Data cleaning Data integration and transformation Data reduction Discretization and concept hierarchy generation Summary Three types of attributes: Nominal values from an unordered set, e.g., color, profession Ordinal values from an ordered set, e.g., military or academic rank Continuous real numbers, e.g., integer or real numbers Discretization: Divide the range of a continuous attribute into intervals Some classification algorithms only accept categorical attributes. Reduce data size by discretization Prepare for further analysis October 26, 2 Data Mining: Concepts and Techniques 83 October 26, 2 Data Mining: Concepts and Techniques 84 4

15 Discretization and Concept Hierarchy Segmentation by Natural Partitioning Discretization Reduce the number of values for a given continuous attribute by dividing the range of the attribute into intervals Interval labels can then be used to replace actual data values Supervised vs. unsupervised Split (top-down) vs. merge (bottom-up) A simply rule can be used to segment numeric data into relatively uniform, natural intervals. If an interval covers 3, 6, 7 or 9 distinct values at the most significant digit, partition the range into 3 equiwidth intervals Discretization can be performed recursively on an attribute Concept hierarchy formation Recursively reduce the data by collecting and replacing low level concepts (such as numeric values for age) by higher level concepts (such as young, middle-aged, or senior) If it covers 2, 4, or 8 distinct values at the most significant digit, partition the range into 4 intervals If it covers, 5, or distinct values at the most significant digit, partition the range into 5 intervals October 26, 2 Data Mining: Concepts and Techniques 85 October 26, 2 Data Mining: Concepts and Techniques 86 Example of Rule -35, ,7,896.5 Example count Step : -$35 -$59 profit $,838 $4,7 Min Low (i.e, 5%-tile) High(i.e, 95%- tile) Max Step 2: msd=, Low=-$, High=$2, (-$, - $2,) Step 3: (-$, - ) ( -$,) ($, - $2,) (-$4 -$5,) Step 4: (-$4 - ) ($2, - $5, ) ( - $,) ($, - $2, ) ( - (-$4 - ($, - $2) ($2, - -$3) $,2) ($2 - $3,) ($,2 - (-$3 - $4) $,4) -$2) ($3, - ($4 - ($,4 - $4,) (-$2 - $6) $,6) ($4, - -$) ($6 - ($,6 - $5,) $8) ($8 - ($,8 - $,8) (-$ - $,) $2,) ) Data Mining: Concepts and October 26, 2 Techniques 87 MIN=-35,976. MAX=4,7,896.5 LOW = 5th percentile -59,876 HIGH = 95th percentile,838,76 msd =,, (most significant digit) LOW = -,, (round down) HIGH = 2,, (round up) 3 value ranges. (-,,.. ] 2. (..,,] 3. (,,.. 2,,] Adjust with real MIN and MAX. (-4,.. ] 2. (..,,] 3. (,,.. 2,,] 4. (2,,.. 5,,] October 26, 2 Data Mining: Concepts and Techniques 88 Recursive.. (-4,.. -3, ].2. (-3,.. -2, ].3. (-2,.. -, ].4. (-,.. ] 2.. (.. 2, ] 2.2. (2,.. 4, ] 2.3. (4,.. 6, ] 2.4. (6,.. 8, ] 2.5. (8,..,, ] 3.. (,,..,2, ] 3.2. (,2,..,4, ] 3.3. (,4,..,6, ] 3.4. (,6,..,8, ] 3.5. (,8,.. 2,, ] 4.. (2,,.. 3,, ] 4.2. (3,,.. 4,, ] 4.3. (4,,.. 5,, ] Jaak Vilo and other authors UT: Data Mining Concept Hierarchy Generation for Categorical Data Specification of a partial/total ordering of attributes explicitly at the schema level by users or experts street < < state < country Specification of a hierarchy for a set of values by explicit data grouping {Urbana, Champaign, Chicago} < Illinois Specification of only a partial set of attributes E.g., only street <, not others Automatic generation of hierarchies (or attribute levels) by the analysis of the number of distinct values E.g., for a set of attributes: {street,, state, country} October 26, 2 Data Mining: Concepts and Techniques 9 5

16 Automatic Concept Hierarchy Generation Some hierarchies can be automatically generated based on the analysis of the number of distinct values per attribute in the data set The attribute with the most distinct values is placed at the lowest level of the hierarchy Exceptions, e.g., weekday, month, quarter, year country 5 distinct values Summary OLAP and DW a way to summarise data Prepare data for further data mining and visualisakon province_or_ state 365 distinct values Fact table, aggregakon, queries&indeces, October 26, 2 street 3567 distinct values 674,339 distinct values Data Mining: Concepts and Techniques 9 Jaak Vilo and other authors UT: Data Mining Reference (highly recommended) Jim Gray et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. Data Mining and Knowledge Discovery (), Data Warehousing chapter of Jianwei Han s textbook (chapter 3) Homework Exercises and 4 at: data-warehousing/ex2.pdf Multidimensional data modeling exercise in course Wiki pages

Acknowledgment. MTAT Data Mining. Week 7: Online Analytical Processing and Data Warehouses. Typical Data Analysis Process.

Acknowledgment. MTAT Data Mining. Week 7: Online Analytical Processing and Data Warehouses. Typical Data Analysis Process. MTAT.03.183 Data Mining Week 7: Online Analytical Processing and Data Warehouses Marlon Dumas marlon.dumas ät ut. ee Acknowledgment This slide deck is a mashup of the following publicly available slide

More information

CS 245: Database System Principles. Warehousing. Outline. What is a Warehouse? What is a Warehouse? Notes 13: Data Warehousing

CS 245: Database System Principles. Warehousing. Outline. What is a Warehouse? What is a Warehouse? Notes 13: Data Warehousing Recall : Database System Principles Notes 3: Data Warehousing Three approaches to information integration: Federated databases did teaser Data warehousing next Mediation Hector Garcia-Molina (Some modifications

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 07 : 06/11/2012 Data Mining: Concepts and Techniques (3 rd ed.) Chapter

More information

Data Warehousing 2. ICS 421 Spring Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa

Data Warehousing 2. ICS 421 Spring Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa ICS 421 Spring 2010 Data Warehousing 2 Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 3/30/2010 Lipyeow Lim -- University of Hawaii at Manoa 1 Data Warehousing

More information

Dta Mining and Data Warehousing

Dta Mining and Data Warehousing CSCI6405 Fall 2003 Dta Mining and Data Warehousing Instructor: Qigang Gao, Office: CS219, Tel:494-3356, Email: q.gao@dal.ca Teaching Assistant: Christopher Jordan, Email: cjordan@cs.dal.ca Office Hours:

More information

Data Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1394

Data Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1394 Data Mining Data warehousing Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 22 Table of contents 1 Introduction 2 Data warehousing

More information

Introduction to Data Warehousing

Introduction to Data Warehousing ICS 321 Spring 2012 Introduction to Data Warehousing Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 4/23/2012 Lipyeow Lim -- University of Hawaii at Manoa

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 14 : 18/11/2014 Data Mining: Concepts and Techniques (3 rd ed.) Chapter

More information

A Multi-Dimensional Data Model

A Multi-Dimensional Data Model A Multi-Dimensional Data Model A Data Warehouse is based on a Multidimensional data model which views data in the form of a data cube A data cube, such as sales, allows data to be modeled and viewed in

More information

Chapter 4, Data Warehouse and OLAP Operations

Chapter 4, Data Warehouse and OLAP Operations CSI 4352, Introduction to Data Mining Chapter 4, Data Warehouse and OLAP Operations Young-Rae Cho Associate Professor Department of Computer Science Baylor University CSI 4352, Introduction to Data Mining

More information

Data Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1396

Data Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1396 Data Mining Data warehousing Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1396 1 / 31 Table of contents 1 Introduction 2 Data warehousing

More information

What is a Data Warehouse?

What is a Data Warehouse? What is a Data Warehouse? COMP 465 Data Mining Data Warehousing Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Defined in many different ways,

More information

Data Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1394

Data Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1394 Data Mining Data warehousing Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 31 Table of contents 1 Introduction 2 Data warehousing

More information

ETL and OLAP Systems

ETL and OLAP Systems ETL and OLAP Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first semester

More information

Data Warehousing & OLAP

Data Warehousing & OLAP Data Warehousing & OLAP Data Mining: Concepts and Techniques Chapter 3 Jiawei Han and An Introduction to Database Systems C.J.Date, Eighth Eddition, Addidon Wesley, 4 1 What is Data Warehousing? What is

More information

Analyse des Données. Master 2 IMAFA. Andrea G. B. Tettamanzi

Analyse des Données. Master 2 IMAFA. Andrea G. B. Tettamanzi Analyse des Données Master 2 IMAFA Andrea G. B. Tettamanzi Université Nice Sophia Antipolis UFR Sciences - Département Informatique andrea.tettamanzi@unice.fr Andrea G. B. Tettamanzi, 2016 1 CM - Séance

More information

Cognos also provides you an option to export the report in XML or PDF format or you can view the reports in XML format.

Cognos also provides you an option to export the report in XML or PDF format or you can view the reports in XML format. About the Tutorial IBM Cognos Business intelligence is a web based reporting and analytic tool. It is used to perform data aggregation and create user friendly detailed reports. IBM Cognos provides a wide

More information

Data Mining Concepts & Techniques

Data Mining Concepts & Techniques Data Mining Concepts & Techniques Lecture No. 01 Databases, Data warehouse Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro

More information

Jarek Szlichta Acknowledgments: Jiawei Han, Micheline Kamber and Jian Pei, Data Mining - Concepts and Techniques

Jarek Szlichta  Acknowledgments: Jiawei Han, Micheline Kamber and Jian Pei, Data Mining - Concepts and Techniques Jarek Szlichta http://data.science.uoit.ca/ Acknowledgments: Jiawei Han, Micheline Kamber and Jian Pei, Data Mining - Concepts and Techniques Frequent Itemset Mining Methods Apriori Which Patterns Are

More information

Data Warehousing & On-Line Analytical Processing

Data Warehousing & On-Line Analytical Processing Data Warehousing & On-Line Analytical Processing Erwin M. Bakker & Stefan Manegold https://homepages.cwi.nl/~manegold/dbdm/ http://liacs.leidenuniv.nl/~bakkerem2/dbdm/ s.manegold@liacs.leidenuniv.nl e.m.bakker@liacs.leidenuniv.nl

More information

Data Preprocessing. Slides by: Shree Jaswal

Data Preprocessing. Slides by: Shree Jaswal Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data

More information

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores Announcements Shumo office hours change See website for details HW2 due next Thurs

More information

CS 412 Intro. to Data Mining

CS 412 Intro. to Data Mining CS 412 Intro. to Data Mining Chapter 4. Data Warehousing and On-line Analytical Processing Jiawei Han, Computer Science, Univ. Illinois at Urbana -Champaign, 2017 1 2 3 Chapter 4: Data Warehousing and

More information

By Mahesh R. Sanghavi Associate professor, SNJB s KBJ CoE, Chandwad

By Mahesh R. Sanghavi Associate professor, SNJB s KBJ CoE, Chandwad By Mahesh R. Sanghavi Associate professor, SNJB s KBJ CoE, Chandwad All the content of these PPTs were taken from PPTS of renown author via internet. These PPTs are only mean to share the knowledge among

More information

Data Warehouses. Yanlei Diao. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Data Warehouses. Yanlei Diao. Slides Courtesy of R. Ramakrishnan and J. Gehrke Data Warehouses Yanlei Diao Slides Courtesy of R. Ramakrishnan and J. Gehrke Introduction v In the late 80s and early 90s, companies began to use their DBMSs for complex, interactive, exploratory analysis

More information

collection of data that is used primarily in organizational decision making.

collection of data that is used primarily in organizational decision making. Data Warehousing A data warehouse is a special purpose database. Classic databases are generally used to model some enterprise. Most often they are used to support transactions, a process that is referred

More information

Preprocessing Short Lecture Notes cse352. Professor Anita Wasilewska

Preprocessing Short Lecture Notes cse352. Professor Anita Wasilewska Preprocessing Short Lecture Notes cse352 Professor Anita Wasilewska Data Preprocessing Why preprocess the data? Data cleaning Data integration and transformation Data reduction Discretization and concept

More information

Data Warehouse. Concepts and Techniques. Chapter 3. SS Chung. April 5, 2013 Data Mining: Concepts and Techniques 1

Data Warehouse. Concepts and Techniques. Chapter 3. SS Chung. April 5, 2013 Data Mining: Concepts and Techniques 1 Data Warehouse Concepts and Techniques Chapter 3 SS Chung April 5, 2013 Data Mining: Concepts and Techniques 1 Chapter 3: Data Warehousing and OLAP Technology: An Overview What is a data warehouse? A multi-dimensional

More information

Database design View Access patterns Need for separate data warehouse:- A multidimensional data model:-

Database design View Access patterns Need for separate data warehouse:- A multidimensional data model:- UNIT III: Data Warehouse and OLAP Technology: An Overview : What Is a Data Warehouse? A Multidimensional Data Model, Data Warehouse Architecture, Data Warehouse Implementation, From Data Warehousing to

More information

Summary of Last Chapter. Course Content. Chapter 2 Objectives. Data Warehouse and OLAP Outline. Incentive for a Data Warehouse

Summary of Last Chapter. Course Content. Chapter 2 Objectives. Data Warehouse and OLAP Outline. Incentive for a Data Warehouse Principles of Knowledge Discovery in bases Fall 1999 Chapter 2: Warehousing and Dr. Osmar R. Zaïane University of Alberta Dr. Osmar R. Zaïane, 1999 Principles of Knowledge Discovery in bases University

More information

By Mahesh R. Sanghavi Associate professor, SNJB s KBJ CoE, Chandwad

By Mahesh R. Sanghavi Associate professor, SNJB s KBJ CoE, Chandwad By Mahesh R. Sanghavi Associate professor, SNJB s KBJ CoE, Chandwad Data Analytics life cycle Discovery Data preparation Preprocessing requirements data cleaning, data integration, data reduction, data

More information

Decision Support Systems aka Analytical Systems

Decision Support Systems aka Analytical Systems Decision Support Systems aka Analytical Systems Decision Support Systems Systems that are used to transform data into information, to manage the organization: OLAP vs OLTP OLTP vs OLAP Transactions Analysis

More information

Reminds on Data Warehousing

Reminds on Data Warehousing BUSINESS INTELLIGENCE Reminds on Data Warehousing (details at the Decision Support Database course) Business Informatics Degree BI Architecture 2 Business Intelligence Lab Star-schema datawarehouse 3 time

More information

CS490D: Introduction to Data Mining Chris Clifton

CS490D: Introduction to Data Mining Chris Clifton CS490D: Introduction to Data Mining Chris Clifton January 16, 2004 Data Warehousing Data Warehousing and OLAP Technology for Data Mining What is a data warehouse? A multi-dimensional data model Data warehouse

More information

Decision Support Systems

Decision Support Systems Decision Support Systems 2011/2012 Week 3. Lecture 6 Previous Class Dimensions & Measures Dimensions: Item Time Loca0on Measures: Quan0ty Sales TransID ItemName ItemID Date Store Qty T0001 Computer I23

More information

Evolution of Database Systems

Evolution of Database Systems Evolution of Database Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Intelligent Decision Support Systems Master studies, second

More information

CT75 DATA WAREHOUSING AND DATA MINING DEC 2015

CT75 DATA WAREHOUSING AND DATA MINING DEC 2015 Q.1 a. Briefly explain data granularity with the help of example Data Granularity: The single most important aspect and issue of the design of the data warehouse is the issue of granularity. It refers

More information

DATA WAREHOUING UNIT I

DATA WAREHOUING UNIT I BHARATHIDASAN ENGINEERING COLLEGE NATTRAMAPALLI DEPARTMENT OF COMPUTER SCIENCE SUB CODE & NAME: IT6702/DWDM DEPT: IT Staff Name : N.RAMESH DATA WAREHOUING UNIT I 1. Define data warehouse? NOV/DEC 2009

More information

CS 1655 / Spring 2013! Secure Data Management and Web Applications

CS 1655 / Spring 2013! Secure Data Management and Web Applications CS 1655 / Spring 2013 Secure Data Management and Web Applications 03 Data Warehousing Alexandros Labrinidis University of Pittsburgh What is a Data Warehouse A data warehouse: archives information gathered

More information

Sql Fact Constellation Schema In Data Warehouse With Example

Sql Fact Constellation Schema In Data Warehouse With Example Sql Fact Constellation Schema In Data Warehouse With Example Data Warehouse OLAP - Learn Data Warehouse in simple and easy steps using Multidimensional OLAP (MOLAP), Hybrid OLAP (HOLAP), Specialized SQL

More information

DATA CUBE : A RELATIONAL AGGREGATION OPERATOR GENERALIZING GROUP-BY, CROSS-TAB AND SUB-TOTALS SNEHA REDDY BEZAWADA CMPT 843

DATA CUBE : A RELATIONAL AGGREGATION OPERATOR GENERALIZING GROUP-BY, CROSS-TAB AND SUB-TOTALS SNEHA REDDY BEZAWADA CMPT 843 DATA CUBE : A RELATIONAL AGGREGATION OPERATOR GENERALIZING GROUP-BY, CROSS-TAB AND SUB-TOTALS SNEHA REDDY BEZAWADA CMPT 843 WHAT IS A DATA CUBE? The Data Cube or Cube operator produces N-dimensional answers

More information

Data Preprocessing Yudho Giri Sucahyo y, Ph.D , CISA

Data Preprocessing Yudho Giri Sucahyo y, Ph.D , CISA Obj ti Objectives Motivation: Why preprocess the Data? Data Preprocessing Techniques Data Cleaning Data Integration and Transformation Data Reduction Data Preprocessing Lecture 3/DMBI/IKI83403T/MTI/UI

More information

UNIT 2 Data Preprocessing

UNIT 2 Data Preprocessing UNIT 2 Data Preprocessing Lecture Topic ********************************************** Lecture 13 Why preprocess the data? Lecture 14 Lecture 15 Lecture 16 Lecture 17 Data cleaning Data integration and

More information

Data Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation

Data Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation Data Mining Part 2. Data Understanding and Preparation 2.4 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Normalization Attribute Construction Aggregation Attribute Subset Selection Discretization

More information

Business Analytics and Big Data: the process and the tools

Business Analytics and Big Data: the process and the tools Business Analytics and Big Data: the process and the tools Mehmet Gençer Assoc.Prof., Organization Studies & Computer Engineering mehmetgencer@yahoo.com mehmet.gencer@ieu.edu.tr https://mgencer.com How

More information

Data Warehouse and Data Mining

Data Warehouse and Data Mining Data Warehouse and Data Mining Lecture No. 04-06 Data Warehouse Architecture Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology

More information

Data Warehousing and Decision Support. Introduction. Three Complementary Trends. [R&G] Chapter 23, Part A

Data Warehousing and Decision Support. Introduction. Three Complementary Trends. [R&G] Chapter 23, Part A Data Warehousing and Decision Support [R&G] Chapter 23, Part A CS 432 1 Introduction Increasingly, organizations are analyzing current and historical data to identify useful patterns and support business

More information

An Overview of Data Warehousing and OLAP Technology

An Overview of Data Warehousing and OLAP Technology An Overview of Data Warehousing and OLAP Technology CMPT 843 Karanjit Singh Tiwana 1 Intro and Architecture 2 What is Data Warehouse? Subject-oriented, integrated, time varying, non-volatile collection

More information

Data Warehousing & On-line Analytical Processing

Data Warehousing & On-line Analytical Processing Data Warehousing & On-line Analytical Processing Erwin M. Bakker & Stefan Manegold https://homepages.cwi.nl/~manegold/dbdm/ http://liacs.leidenuniv.nl/~bakkerem2/dbdm/ Chapter 4: Data Warehousing and On-line

More information

Big Data 13. Data Warehousing

Big Data 13. Data Warehousing Ghislain Fourny Big Data 13. Data Warehousing fotoreactor / 123RF Stock Photo 2 The road to analytics Aurelio Scetta / 123RF Stock Photo 3 Another history of data management (T. Hofmann) 1970s 2000s Age

More information

Proceedings of the IE 2014 International Conference AGILE DATA MODELS

Proceedings of the IE 2014 International Conference  AGILE DATA MODELS AGILE DATA MODELS Mihaela MUNTEAN Academy of Economic Studies, Bucharest mun61mih@yahoo.co.uk, Mihaela.Muntean@ie.ase.ro Abstract. In last years, one of the most popular subjects related to the field of

More information

Big Data 13. Data Warehousing

Big Data 13. Data Warehousing Ghislain Fourny Big Data 13. Data Warehousing fotoreactor / 123RF Stock Photo The road to analytics Aurelio Scetta / 123RF Stock Photo Another history of data management (T. Hofmann) 1970s 2000s Age of

More information

Data Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems

Data Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems Data Warehousing and Data Mining CPS 116 Introduction to Database Systems Announcements (December 1) 2 Homework #4 due today Sample solution available Thursday Course project demo period has begun! Check

More information

Data Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha

Data Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha Data Preprocessing S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha 1 Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking

More information

DATA WAREHOUSING & DATA MINING. by: Prof. Asha Ambhaikar

DATA WAREHOUSING & DATA MINING. by: Prof. Asha Ambhaikar DATA WAREHOUSING & DATA MINING by: Prof. Asha Ambhaikar 1 UNIT-I Overview and Concepts 2 Contents of Unit-I Need for data warehousing, Basic elements of data warehousing, Trends in data warehousing. Planning

More information

Overview. DW Performance Optimization. Aggregates. Aggregate Use Example

Overview. DW Performance Optimization. Aggregates. Aggregate Use Example Overview DW Performance Optimization Choosing aggregates Maintaining views Bitmapped indices Other optimization issues Original slides were written by Torben Bach Pedersen Aalborg University 07 - DWML

More information

Data Warehousing and Decision Support

Data Warehousing and Decision Support Data Warehousing and Decision Support [R&G] Chapter 23, Part A CS 4320 1 Introduction Increasingly, organizations are analyzing current and historical data to identify useful patterns and support business

More information

Data warehouses Decision support The multidimensional model OLAP queries

Data warehouses Decision support The multidimensional model OLAP queries Data warehouses Decision support The multidimensional model OLAP queries Traditional DBMSs are used by organizations for maintaining data to record day to day operations On-line Transaction Processing

More information

Data Warehouse and Data Mining

Data Warehouse and Data Mining Data Warehouse and Data Mining Lecture No. 03 Architecture of DW Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro Basic

More information

Multidimensional Queries

Multidimensional Queries Multidimensional Queries Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first semester

More information

Data Warehousing & Mining. Data integration. OLTP versus OLAP. CPS 116 Introduction to Database Systems

Data Warehousing & Mining. Data integration. OLTP versus OLAP. CPS 116 Introduction to Database Systems Data Warehousing & Mining CPS 116 Introduction to Database Systems Data integration 2 Data resides in many distributed, heterogeneous OLTP (On-Line Transaction Processing) sources Sales, inventory, customer,

More information

DATA WAREHOUSE EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY

DATA WAREHOUSE EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY DATA WAREHOUSE EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY CHARACTERISTICS Data warehouse is a central repository for summarized and integrated data

More information

Data Warehousing and Decision Support

Data Warehousing and Decision Support Data Warehousing and Decision Support Chapter 23, Part A Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke 1 Introduction Increasingly, organizations are analyzing current and historical

More information

Advanced Data Management Technologies Written Exam

Advanced Data Management Technologies Written Exam Advanced Data Management Technologies Written Exam 02.02.2016 First name Student number Last name Signature Instructions for Students Write your name, student number, and signature on the exam sheet. This

More information

CS614 - Data Warehousing - Midterm Papers Solved MCQ(S) (1 TO 22 Lectures)

CS614 - Data Warehousing - Midterm Papers Solved MCQ(S) (1 TO 22 Lectures) CS614- Data Warehousing Solved MCQ(S) From Midterm Papers (1 TO 22 Lectures) BY Arslan Arshad Nov 21,2016 BS110401050 BS110401050@vu.edu.pk Arslan.arshad01@gmail.com AKMP01 CS614 - Data Warehousing - Midterm

More information

Data Warehousing and Decision Support (mostly using Relational Databases) CS634 Class 20

Data Warehousing and Decision Support (mostly using Relational Databases) CS634 Class 20 Data Warehousing and Decision Support (mostly using Relational Databases) CS634 Class 20 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke, Chapter 25 Introduction Increasingly,

More information

Data Mining and Analytics. Introduction

Data Mining and Analytics. Introduction Data Mining and Analytics Introduction Data Mining Data mining refers to extracting or mining knowledge from large amounts of data It is also termed as Knowledge Discovery from Data (KDD) Mostly, data

More information

BUSINESS INTELLIGENCE. SSAS - SQL Server Analysis Services. Business Informatics Degree

BUSINESS INTELLIGENCE. SSAS - SQL Server Analysis Services. Business Informatics Degree BUSINESS INTELLIGENCE SSAS - SQL Server Analysis Services Business Informatics Degree 2 BI Architecture SSAS: SQL Server Analysis Services 3 It is both an OLAP Server and a Data Mining Server Distinct

More information

Data Warehousing Introduction. Toon Calders

Data Warehousing Introduction. Toon Calders Data Warehousing Introduction Toon Calders toon.calders@ulb.ac.be Course Organization Lectures on Tuesday 14:00 and Friday 16:00 Check http://gehol.ulb.ac.be/ for room Most exercises in computer class

More information

DATA MINING AND WAREHOUSING

DATA MINING AND WAREHOUSING DATA MINING AND WAREHOUSING Qno Question Answer 1 Define data warehouse? Data warehouse is a subject oriented, integrated, time-variant, and nonvolatile collection of data that supports management's decision-making

More information

A Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective

A Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective A Novel Approach of Data Warehouse OLTP and OLAP Technology for Supporting Management prospective B.Manivannan Research Scholar, Dept. Computer Science, Dravidian University, Kuppam, Andhra Pradesh, India

More information

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda Agenda Oracle9i Warehouse Review Dulcian, Inc. Oracle9i Server OLAP Server Analytical SQL Mining ETL Infrastructure 9i Warehouse Builder Oracle 9i Server Overview E-Business Intelligence Platform 9i Server:

More information

Advanced Data Management Technologies

Advanced Data Management Technologies ADMT 2017/18 Unit 10 J. Gamper 1/37 Advanced Data Management Technologies Unit 10 SQL GROUP BY Extensions J. Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE Acknowledgements: I

More information

Data Warehousing ETL. Esteban Zimányi Slides by Toon Calders

Data Warehousing ETL. Esteban Zimányi Slides by Toon Calders Data Warehousing ETL Esteban Zimányi ezimanyi@ulb.ac.be Slides by Toon Calders 1 Overview Picture other sources Metadata Monitor & Integrator OLAP Server Analysis Operational DBs Extract Transform Load

More information

Decision Support, Data Warehousing, and OLAP

Decision Support, Data Warehousing, and OLAP Decision Support, Data Warehousing, and OLAP : Contents Terminology : OLAP vs. OLTP Data Warehousing Architecture Technologies References 1 Decision Support and OLAP Information technology to help knowledge

More information

This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing.

This tutorial will help computer science graduates to understand the basic-to-advanced concepts related to data warehousing. About the Tutorial A data warehouse is constructed by integrating data from multiple heterogeneous sources. It supports analytical reporting, structured and/or ad hoc queries and decision making. This

More information

Chapter 3. The Multidimensional Model: Basic Concepts. Introduction. The multidimensional model. The multidimensional model

Chapter 3. The Multidimensional Model: Basic Concepts. Introduction. The multidimensional model. The multidimensional model Chapter 3 The Multidimensional Model: Basic Concepts Introduction Multidimensional Model Multidimensional concepts Star Schema Representation Conceptual modeling using ER, UML Conceptual modeling using

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 05(b) : 23/10/2012 Data Mining: Concepts and Techniques (3 rd ed.) Chapter

More information

Table of Contents. Rajesh Pandey Page 1

Table of Contents. Rajesh Pandey Page 1 Table of Contents Chapter 1: Introduction to Data Mining and Data Warehousing... 4 1.1 Review of Basic Concepts of Data Mining and Data Warehousing... 4 1.2 Data Mining... 5 1.2.1 Why Data Mining?... 5

More information

Call: SAS BI Course Content:35-40hours

Call: SAS BI Course Content:35-40hours SAS BI Course Content:35-40hours Course Outline SAS Data Integration Studio 4.2 Introduction * to SAS DIS Studio Features of SAS DIS Studio Tasks performed by SAS DIS Studio Navigation to SAS DIS Studio

More information

Data Preprocessing. Why Data Preprocessing? MIT-652 Data Mining Applications. Chapter 3: Data Preprocessing. Multi-Dimensional Measure of Data Quality

Data Preprocessing. Why Data Preprocessing? MIT-652 Data Mining Applications. Chapter 3: Data Preprocessing. Multi-Dimensional Measure of Data Quality Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data e.g., occupation = noisy: containing

More information

CS570: Introduction to Data Mining

CS570: Introduction to Data Mining CS570: Introduction to Data Mining Fall 2013 Reading: Chapter 3 Han, Chapter 2 Tan Anca Doloc-Mihu, Ph.D. Some slides courtesy of Li Xiong, Ph.D. and 2011 Han, Kamber & Pei. Data Mining. Morgan Kaufmann.

More information

CSE 544 Principles of Database Management Systems. Fall 2016 Lecture 14 - Data Warehousing and Column Stores

CSE 544 Principles of Database Management Systems. Fall 2016 Lecture 14 - Data Warehousing and Column Stores CSE 544 Principles of Database Management Systems Fall 2016 Lecture 14 - Data Warehousing and Column Stores References Data Cube: A Relational Aggregation Operator Generalizing Group By, Cross-Tab, and

More information

Aggregating Knowledge in a Data Warehouse and Multidimensional Analysis

Aggregating Knowledge in a Data Warehouse and Multidimensional Analysis Aggregating Knowledge in a Data Warehouse and Multidimensional Analysis Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.com Objectives Explain the basics of: 1. Data

More information

Data Warehousing and OLAP

Data Warehousing and OLAP Data Warehousing and OLAP INFO 330 Slides courtesy of Mirek Riedewald Motivation Large retailer Several databases: inventory, personnel, sales etc. High volume of updates Management requirements Efficient

More information

Syllabus. Syllabus. Motivation Decision Support. Syllabus

Syllabus. Syllabus. Motivation Decision Support. Syllabus Presentation: Sophia Discussion: Tianyu Metadata Requirements and Conclusion 3 4 Decision Support Decision Making: Everyday, Everywhere Decision Support System: a class of computerized information systems

More information

Data Warehousing Conclusion. Esteban Zimányi Slides by Toon Calders

Data Warehousing Conclusion. Esteban Zimányi Slides by Toon Calders Data Warehousing Conclusion Esteban Zimányi ezimanyi@ulb.ac.be Slides by Toon Calders Motivation for the Course Database = a piece of software to handle data: Store, maintain, and query Most ideal system

More information

DATA MINING TRANSACTION

DATA MINING TRANSACTION DATA MINING Data Mining is the process of extracting patterns from data. Data mining is seen as an increasingly important tool by modern business to transform data into an informational advantage. It is

More information

Data Warehouse Logical Design. Letizia Tanca Politecnico di Milano (with the kind support of Rosalba Rossato)

Data Warehouse Logical Design. Letizia Tanca Politecnico di Milano (with the kind support of Rosalba Rossato) Data Warehouse Logical Design Letizia Tanca Politecnico di Milano (with the kind support of Rosalba Rossato) Data Mart logical models MOLAP (Multidimensional On-Line Analytical Processing) stores data

More information

One Size Fits All: An Idea Whose Time Has Come and Gone

One Size Fits All: An Idea Whose Time Has Come and Gone ICS 624 Spring 2013 One Size Fits All: An Idea Whose Time Has Come and Gone Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 1/9/2013 Lipyeow Lim -- University

More information

CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP)

CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP) CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP) INTRODUCTION A dimension is an attribute within a multidimensional model consisting of a list of values (called members). A fact is defined by a combination

More information

Teradata Aggregate Designer

Teradata Aggregate Designer Data Warehousing Teradata Aggregate Designer By: Sam Tawfik Product Marketing Manager Teradata Corporation Table of Contents Executive Summary 2 Introduction 3 Problem Statement 3 Implications of MOLAP

More information

Fig 1.2: Relationship between DW, ODS and OLTP Systems

Fig 1.2: Relationship between DW, ODS and OLTP Systems 1.4 DATA WAREHOUSES Data warehousing is a process for assembling and managing data from various sources for the purpose of gaining a single detailed view of an enterprise. Although there are several definitions

More information

Advanced Data Management Technologies

Advanced Data Management Technologies ADMT 2017/18 Unit 13 J. Gamper 1/42 Advanced Data Management Technologies Unit 13 DW Pre-aggregation and View Maintenance J. Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE Acknowledgements:

More information

ALTERNATE SCHEMA DIAGRAMMING METHODS DECISION SUPPORT SYSTEMS. CS121: Relational Databases Fall 2017 Lecture 22

ALTERNATE SCHEMA DIAGRAMMING METHODS DECISION SUPPORT SYSTEMS. CS121: Relational Databases Fall 2017 Lecture 22 ALTERNATE SCHEMA DIAGRAMMING METHODS DECISION SUPPORT SYSTEMS CS121: Relational Databases Fall 2017 Lecture 22 E-R Diagramming 2 E-R diagramming techniques used in book are similar to ones used in industry

More information

Reporting and Query tools and Applications

Reporting and Query tools and Applications BUSINESS ANALYSIS Reporting and Query tools and Applications Five categories of tools Reporting Managed Query Executive information systems On-line analytical processing Data mining Reporting tools Production

More information

Data Warehousing & OLAP

Data Warehousing & OLAP CMPUT 391 Database Management Systems Data Warehousing & OLAP Textbook: 17.1 17.5 (first edition: 19.1 19.5) Based on slides by Lewis, Bernstein and Kifer and other sources University of Alberta 1 Why

More information

Lectures for the course: Data Warehousing and Data Mining (IT 60107)

Lectures for the course: Data Warehousing and Data Mining (IT 60107) Lectures for the course: Data Warehousing and Data Mining (IT 60107) Week 1 Lecture 1 21/07/2011 Introduction to the course Pre-requisite Expectations Evaluation Guideline Term Paper and Term Project Guideline

More information

SQL Server Analysis Services

SQL Server Analysis Services DataBase and Data Mining Group of DataBase and Data Mining Group of Database and data mining group, SQL Server 2005 Analysis Services SQL Server 2005 Analysis Services - 1 Analysis Services Database and

More information

Create Cube From Star Schema Grouping Framework Manager

Create Cube From Star Schema Grouping Framework Manager Create Cube From Star Schema Grouping Framework Manager Create star schema groupings to provide authors with logical groupings of query Connect to an OLAP data source (cube) in a Framework Manager project

More information