Data Warehousing & Data Mining


1 Data Warehousing & Data Mining Wolf-Tilo Balke Kinda El Maarry Institut für Informationssysteme Technische Universität Braunschweig

2 7. Building the DW
7.1 The DW Project
7.2 Data Extract/Transform/Load (ETL)
7.3 Metadata

3 7.1 The DW Project
Building a DW is a complex IT project; even a medium-sized DW project involves many activities
DW project organization: project roles and corresponding tasks, e.g.:
- DW-PM: project management
- DW-Architect: methods, concepts, modeling
- DW-Miner: concepts, analysis (non-standard)
- Domain Expert: domain knowledge
- DW-System Developer: system and metadata management, ETL
- DW User: analysis (standard)

4 7.1 The DW Project
Usual tasks in a DW project:
- Communication, as the process of information exchange between team members
- Conflict management: the magical triangle, a compromise between time, costs, and quality
- Quality assurance: performance, reliability, scalability, robustness, etc.
- Documentation

5 7.1 The DW Project
Software choice:
- Database system for the DW: usually the choice is the same technology provider as for the operational data; MDB vs. RDB
- ETL tools: differentiated by the data cleansing needs
- Analysis tools: varying from data mining to OLAP products, with a focus on reporting functionality
- Repository: not often used, but helpful for metadata management

6 7.1 The DW Project
Hardware choice:
- Data storage: RAID systems, SANs, NAS devices
- Processing: multi-CPU systems, SMP, clusters
- Failure tolerance: data replication, mirroring, RAID, backup strategies
- Other factors: data access times, transfer rates, memory bandwidth, network throughput and latency

7 7.1 The DW Project
The project timeline depends on the development methodology, but usually:
- Phase I, Proof of Concept: establish the technical infrastructure, prototype data extraction/transformation/load, prototype analysis & reporting
- Phase II, Controlled Release: iterative process of building subject-oriented data marts
- Phase III, General Availability: ongoing operations, support and training, maintenance and growth
The most important part of the DW building project is defining the ETL process

8 7.2 ETL
What is ETL? Short for extract, transform, and load:
- Three database functions combined into one tool to pull data out of production databases and place it into the DW
- Migrate data from one database to another, to form data marts and data warehouses
- Convert databases from one format or type to another

9 7.2 ETL
When should we ETL?
- Periodically (e.g., every night, every week) or after significant events
- Refresh policy set by the administrator based on user needs and traffic; possibly different policies for different sources
- Rarely, on every update (real-time DW): not warranted unless the warehouse requires current data (e.g., up-to-the-minute stock quotes)

10 7.2 ETL
ETL is used to integrate heterogeneous systems with different DBMSs, operating systems, hardware, and communication protocols
ETL challenges:
- Getting the data from source to target as fast as possible
- Allowing recovery from failure without restarting the whole process
- This leads to a balance between writing data to staging tables and keeping it in memory

11 7.2 ETL
Staging area, basic rules:
- Data in the staging area is owned by the ETL team
- Users are not allowed in the staging area at any time
- Reports cannot access data from the staging area
- Only ETL processes can write to and read from the staging area

12 7.2 Staging Area
Staging area structures for holding data:
- Flat files
- XML data sets
- Relational tables

13 7.2 Data Structures
Flat files: ETL tools based on scripts, such as Perl, VBScript, or JavaScript
Advantages:
- No overhead of maintaining metadata as a DBMS does
- Sorting, merging, deleting, replacing, and other data-migration functions are much faster outside the DBMS
Disadvantages:
- No concept of updating
- Queries and random-access lookups are not well supported by the operating system
- Flat files cannot be indexed for fast lookups

14 7.2 Data Structures
When should flat files be used?
- Staging source data for safekeeping and recovery: the best approach to restart a failed process is to have the data dumped in a flat file
- Sorting data: sorting in the file system may be more efficient than performing it in the DBMS with an ORDER BY clause; sorting is important, since a huge portion of the ETL processing cycles goes to sorting

15 7.2 Data Structures
When should flat files be used? (continued)
- Filtering, using grep-like functionality
- Replacing text strings
Sequential file processing is much faster at the system level than using a database
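To make the grep-like filtering and text replacement on staged flat files concrete, here is a minimal Python sketch; the file names and the city pattern are illustrative assumptions, not part of the lecture setup.

```python
import re

# Minimal sketch: grep-like filtering and string replacement on a staged flat file.
# File names and the pattern are hypothetical.
pattern = re.compile(r"Braunschweig|Brunswick")   # rows we want to keep

with open("orders_extract.csv", encoding="utf-8") as src, \
     open("orders_filtered.csv", "w", encoding="utf-8") as dst:
    for line in src:
        if pattern.search(line):                  # filter: keep matching rows only
            # replace: normalize the city spelling before further staging
            dst.write(line.replace("Brunswick", "Braunschweig"))
```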

16 7.2 Data Structures
XML data sets:
- Used as a common format for both input to and output from the ETL system
- Generally not used for persistent staging
- Useful mechanisms: XML Schema (successor of DTD), XQuery, XPath, XSLT
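A minimal sketch of querying a staged XML data set with the XPath subset in Python's standard library; the products/product document structure is an assumed example.

```python
import xml.etree.ElementTree as ET

# Minimal sketch: reading an XML data set in the staging area with XPath-style
# queries. The document structure is an assumption.
doc = ET.fromstring("""
<products>
  <product id="1"><name>Notebook</name><group>Hardware</group></product>
  <product id="2"><name>Office Suite</name><group>Software</group></product>
</products>
""")

# XPath subset supported by ElementTree: select all product elements
for product in doc.findall(".//product"):
    print(product.get("id"), product.findtext("name"), product.findtext("group"))
```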

17 7.2 Data Structures
Relational tables: using tables is most appropriate, especially when there are no dedicated ETL tools
Advantages:
- Apparent metadata: column names, data types and lengths, cardinality, etc.
- Relational abilities: data integrity as well as normalized staging
- Open repository/SQL interface: easy to access by any SQL-compliant tool
Disadvantages:
- Sometimes slower than the operating system's file system

18 7.2 Staging Area - Storage
How is the staging area designed? The staging database, file system, and directory structures are set up by the DB and OS administrators based on the ETL architect's estimations, e.g., a table volumetric worksheet
Worksheet columns: table name, update strategy, load frequency, ETL job, initial row count, average row length, grows with, expected rows/month, expected bytes/month, initial table size, table size after 6 months (MB)
Example rows: S_ACC (truncate/reload, daily, ETL job SAcc, grows with new accounts), S_ASSETS (insert/delete, daily, ETL job SAssets, grows with new assets), S_DEFECT (truncate/reload, on demand, ETL job SDefect, grows with new defects)

19 7.2 ETL
ETL steps:
- Data extraction
- Data transformation
- Data loading

20 7.2 Data Extraction
Data extraction: data needs to be taken from a data source so that it can be put into the DW
- Internal scripts/tools at the data source, which export the data to be used
- External programs, which extract the data from the source
If the data is exported, it is typically exported into a text file that can then be brought into an intermediary database
If the data is extracted from the source, it is typically transferred directly into an intermediary database

21 7.2 Data Extraction
Steps in data extraction:
- Initial extraction: preparing the logical map, first-time data extraction
- Ongoing extraction: just new data, changed data, or even deleted data

22 7.2 Data Extraction
The logical map connects the original source data to the final data
Building the logical map: first identify the data sources
- Data discovery phase: collecting and documenting source systems: databases, tables, relations, cardinality, keys, data types, etc.
- Anomaly detection phase: NULL values can destroy any ETL process; e.g., if a foreign key is NULL, joining tables on a NULL column results in data loss, because in an RDB NULL <> NULL

23 7.2 Data Extraction
Examples of solving NULL anomalies:
- If NULL occurs on a foreign key, use outer joins
- If NULL occurs on other columns, create a business rule to replace NULLs while loading data into the DW
Ongoing extraction:
- Data needs to be maintained in the DW also after the initial load
- Extraction is performed on a regular basis; only changes are extracted after the first time
- How do we detect changes?
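A minimal sketch of both NULL-handling rules, using an in-memory SQLite database; the order/customer tables and the surrogate value -1 for the unknown customer are assumptions.

```python
import sqlite3

# Minimal sketch of the two NULL-handling rules; table and column names are illustrative.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders(order_id INTEGER, customer_id INTEGER);   -- FK may be NULL
    CREATE TABLE customers(customer_id INTEGER, name TEXT);
    INSERT INTO orders VALUES (1, 10), (2, NULL);
    INSERT INTO customers VALUES (10, 'ACME');
""")

# Rule 1: outer join so rows with a NULL foreign key are not silently dropped
rows = con.execute("""
    SELECT o.order_id, c.name
    FROM orders o LEFT OUTER JOIN customers c ON o.customer_id = c.customer_id
""").fetchall()
print(rows)      # order 2 survives with name = None

# Rule 2: business rule replacing NULLs before loading into the DW
cleaned = con.execute("""
    SELECT order_id, COALESCE(customer_id, -1) AS customer_key FROM orders
""").fetchall()
print(cleaned)   # NULL FK mapped to the assumed 'unknown customer' key -1
```

The LEFT OUTER JOIN keeps order 2 even though its foreign key is NULL, which a plain inner join would silently drop.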

24 7.2 Ongoing Extraction
Detecting changes (new/changed data):
- Using audit columns
- Database log scraping or sniffing
- Process of elimination
Using audit columns:
- Store the date and time a record was added or modified
- Detect changes based on date stamps later than the last extraction date
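A minimal sketch of audit-column based change detection; the customer table layout and the way the last extraction timestamp is obtained are assumptions (it would normally come from the ETL metadata).

```python
import sqlite3

# Minimal sketch: extract only records whose audit timestamp is newer than the last run.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer(
        customer_id INTEGER,
        name        TEXT,
        updated_at  TEXT      -- audit column maintained by the source system
    );
    INSERT INTO customer VALUES
        (1, 'ACME',   '2010-12-01 08:00:00'),
        (2, 'Globex', '2010-12-09 23:30:00');
""")

last_extraction = '2010-12-05 00:00:00'   # assumed to be read from ETL metadata

changed = con.execute(
    "SELECT customer_id, name, updated_at FROM customer WHERE updated_at > ?",
    (last_extraction,),
).fetchall()
print(changed)   # only the record modified after the last extraction qualifies
```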

25 7.2 Detecting Changes
Log scraping:
- Takes a snapshot of the database redo log at a certain time (e.g., midnight) and finds the transactions affecting the tables the ETL is interested in
- It can be problematic when the redo log gets full and is emptied by the DBA
Log sniffing:
- Polling the redo log at a small time granularity, capturing the transactions on the fly
- The better choice: also suitable for real-time ETL

26 7.2 Detecting Changes
Process of elimination:
- Preserves exactly one copy of each previous extraction
- During the next run, it compares the entire source tables against the extraction copy
- Only differences are sent to the DW
Advantages:
- Because the process makes row-by-row comparisons, it is impossible to miss data
- It can also detect deleted rows
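A minimal sketch of the process of elimination, diffing the current source rows against the previous extraction copy keyed by primary key; the in-memory dictionaries stand in for the staged snapshot files.

```python
# Minimal sketch: row-by-row comparison of the previous extraction copy with the
# current source, detecting inserts, updates, and deletes.
previous = {1: ("ACME", "BS"), 2: ("Globex", "WOB"), 3: ("Initech", "HAN")}
current  = {1: ("ACME", "BS"), 2: ("Globex", "Berlin"), 4: ("Umbrella", "BS")}

inserts = {k: v for k, v in current.items() if k not in previous}
deletes = {k: v for k, v in previous.items() if k not in current}
updates = {k: v for k, v in current.items()
           if k in previous and previous[k] != v}

print("insert:", inserts)   # key 4 is new
print("delete:", deletes)   # key 3 disappeared from the source
print("update:", updates)   # key 2 changed its city
```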

27 7.2 Data Transformation
Data transformation uses rules, lookup tables, or combinations with other data to convert source data to the desired state
Two major steps:
- Data cleaning: may involve manual work, assisted by artificial intelligence algorithms and pattern recognition
- Data integration: may also involve manual work

28 7.2 Data Cleaning
Extracted data can be dirty. What does clean data look like? Data quality characteristics:
- Correct: values and descriptions in the data represent their associated objects truthfully; e.g., if the city in which store A is located is Braunschweig, then the address should not report Paris
- Unambiguous: the values and descriptions in the data can be taken to have only one meaning

29 7.2 Data Cleaning
- Consistent: values and descriptions in the data use one constant notational convention; e.g., Braunschweig can be expressed as BS, or as Brunswick by our employees in the USA; consistency means using just BS in all our data
- Complete: individual values and descriptors in the data have a value (not null)

30 7.2 Data Cleaning
The data cleaning engine produces 3 main deliverables:
- Data-profiling results: a metadata repository describing schema definitions, business objects, domains, data sources, table definitions, data rules, value rules, etc.; represents a quantitative assessment of the original data sources

31 7.2 Cleaning Deliverables
- Error event table: structured as a dimensional star schema; each data quality error identified by the cleaning subsystem is inserted as a row in the error event fact table
- Data quality screen: a status report on data quality and a gateway which lets only clean data go through

32 7.2 Cleaning Deliverables
- Audit dimension: describes the data-quality context of a fact table record being loaded into the DW; attached to each fact record; aggregates the information from the error event table on a per-record basis
Typical attributes: audit key (PK), completeness category (text), completeness score (integer), number of screens failed, max severity score, extract timestamp, clean timestamp
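A minimal sketch of what such an audit dimension could look like as a table, using the attributes listed above; the concrete data types and the audit_key reference in the fact table are assumptions.

```python
import sqlite3

# Minimal sketch of an audit dimension and a fact table that references it.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE audit_dimension(
        audit_key             INTEGER PRIMARY KEY,
        completeness_category TEXT,
        completeness_score    INTEGER,
        screens_failed        INTEGER,
        max_severity_score    INTEGER,
        extract_timestamp     TEXT,
        clean_timestamp       TEXT
    );

    -- every fact row carries the audit key of the ETL run that produced it
    CREATE TABLE sales_fact(
        product_key  INTEGER,
        date_key     INTEGER,
        sales_value  REAL,
        audit_key    INTEGER REFERENCES audit_dimension(audit_key)
    );
""")
print("tables:", [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")])
```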

33 7.2 Data Cleaning
(Figure: overall process flow) Assessing whether the data is really dirty, the data transformation itself, and the resulting deliverables

34 7.2 Data Quality
Sometimes data is just garbage; we shouldn't load garbage into the DW
Cleaning data manually simply takes forever

35 7.2 Data Quality
Use tools to clean data semi-automatically
Commercial software:
- SAP BusinessObjects
- IBM InfoSphere DataStage
- Oracle Data Quality and Oracle Data Profiling
Open source tools:
- e.g., eobjects DataCleaner, Talend Open Profiler

36 7.2 Data Quality
Data cleaning process: use of a thesaurus, regular expressions, geographical information, etc.

37 7.2 Data Quality
Regular expressions for date/time data (figure)
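A minimal sketch of regular-expression checks for date/time fields during cleaning; the accepted formats (ISO dates, 24-hour times) are illustrative choices, not the ones used in the lecture.

```python
import re

# Minimal sketch: validate date/time fields with regular expressions.
DATE_RE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")  # YYYY-MM-DD
TIME_RE = re.compile(r"^([01]\d|2[0-3]):[0-5]\d(:[0-5]\d)?$")           # HH:MM[:SS]

for value in ["2010-12-10", "2010-13-01", "23:59:59", "7:65"]:
    ok = bool(DATE_RE.match(value) or TIME_RE.match(value))
    print(f"{value!r:15} -> {'clean' if ok else 'error event'}")
```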

38 7.2 Cleaning Engine
The core of the data cleaning engine is the anomaly detection phase
A data anomaly is a piece of data which doesn't fit into the domain of the rest of the data it is stored with, e.g., a male customer named Mary

39 7.2 Anomaly Detection
Anomaly detection: count the rows in a table while grouping on the column in question, e.g.,
SELECT city, count(*) FROM order_detail GROUP BY city
City    Count(*)
Bremen         2
Berlin         3
WOB        4,500
BS        12,000
HAN       46,000

40 7.2 Anomaly Detection
What if our table has 100 million rows with 250,000 distinct values? Use data sampling, e.g.,
- Divide the whole data set into 1,000 pieces and choose 1 record from each
- Add a random number column to the data, sort on it, and take the first 1,000 records
- Etc.
A common mistake is to select a range of dates, since most anomalies happen temporarily
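A minimal sketch of the two sampling strategies; the generated row list stands in for the staged table.

```python
import random

# Minimal sketch of the two sampling strategies from the slide.
rows = [{"id": i, "city": random.choice(["BS", "WOB", "HAN"])} for i in range(100_000)]

# Strategy 1: split the data into 1,000 pieces and take one record from each
piece = len(rows) // 1000
systematic_sample = [rows[i] for i in range(0, piece * 1000, piece)]

# Strategy 2: attach a random number, sort on it, take the first 1,000 records
random_sample = sorted(rows, key=lambda r: random.random())[:1000]

print(len(systematic_sample), len(random_sample))
```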

41 7.2 Data Quality
Data profiling:
- Take a closer look at strange values
- Observe the data distribution pattern
Gaussian distribution, e.g.,
SELECT AVERAGE(sales_value) - 3 * STDDEV(sales_value),
       AVERAGE(sales_value) + 3 * STDDEV(sales_value)
INTO Min_reasonable, Max_reasonable
FROM ...
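A minimal Python sketch of the same 3-sigma reasonability bounds, useful when profiling outside the DBMS; the generated sales_value list is an assumption.

```python
import statistics

# Minimal sketch of the 3-sigma reasonability bounds computed by the slide's SQL.
sales_value = [10.0 + 0.05 * i for i in range(99)] + [950.0]   # one suspicious value

mean = statistics.mean(sales_value)
stddev = statistics.stdev(sales_value)
min_reasonable = mean - 3 * stddev
max_reasonable = mean + 3 * stddev

outliers = [v for v in sales_value if not (min_reasonable <= v <= max_reasonable)]
print(f"bounds: [{min_reasonable:.2f}, {max_reasonable:.2f}]")
print("values outside the reasonable range:", outliers)
```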

42 7.2 Data Quality
Data distributions:
- Flat distribution: identifier distributions (keys)
- Zipfian distribution: some values appear more often than others; in sales, more cheap goods are sold than expensive ones

43 7.2 Data Quality
Data distributions (continued): Pareto, Poisson, S distribution
Distribution discovery: statistical software such as SPSS, StatSoft, R, etc.

44 7.2 Data Integration
Data integration: several conceptual schemas need to be combined into a unified global schema
- All differences in perspective and terminology have to be resolved
- All redundancy has to be removed
(Figure: schema 1, schema 2, schema 3, schema 4 being integrated)

45 7.2 Schema Integration
There are four basic steps needed for conceptual schema integration:
1. Preintegration analysis
2. Comparison of schemas
3. Conformation of schemas
4. Merging and restructuring of schemas
The integration process needs continual refinement and reevaluation

46 7.2 Schema Integration
(Figure: integration workflow) Preintegration analysis -> Comparison of schemas: identify conflicts, yielding schemas with conflicts and a list of conflicts -> Conformation of schemas: resolve conflicts, yielding modified schemas -> Merging and restructuring: integrate schemas, yielding the integrated schema

47 7.2 Schema Integration
Preintegration analysis needs a close look at the individual conceptual schemas to decide on an adequate integration strategy
- The larger the number of constructs, the more important modularization becomes
- Is it really sensible/possible to integrate all schemas?

48 7.2 Schema Integration
Integration strategies:
- Binary approach: sequential integration, balanced integration
- n-ary approach: one-shot integration, iterative integration

49 7.2 Schema Integration
Schema conflicts:
- Table/table conflicts: table naming (e.g., synonyms, homonyms), structure conflicts (e.g., missing attributes), conflicting integrity conditions
- Attribute/attribute conflicts: naming (e.g., synonyms, homonyms), default value conflicts, conflicting integrity conditions (e.g., different data types or boundary limitations)
- Table/attribute conflicts

50 7.2 Schema Integration
The basic goal is to make schemas compatible for integration
Conformation usually needs manual interaction; conflicts need to be resolved semantically:
- Rename entities/attributes
- Convert differing types, e.g., convert an entity to an attribute or a relationship
- Align cardinalities/functionalities
- Align different data types

51 7.2 Schema Integration
Schema integration is a semantic process; this usually means a lot of manual work
Computers can support the process by matching some (parts of) schemas
- There have been some approaches towards (semi-)automatic matching of schemas
- Matching is a complex process and usually focuses only on simple questions like: are two entities semantically equivalent?
- The result is still rather error-prone

52 7.2 Schema Integration
Schema matching:
- Label-based matching: for each label in one schema, consider all labels of the other schema and gauge their semantic similarity each time (e.g., Price vs. Cost)
- Instance-based matching: e.g., find correlations between attributes: are there duplicate tuples? Are the data distributions over their respective domains similar?
- Structure-based matching: abstracting from the actual labels, only the structure of the schema is evaluated, e.g., regarding element types, depths in hierarchies, number and type of relationships, etc.
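A minimal sketch of label-based matching, scoring every label of one schema against every label of the other; the small synonym table, the string-similarity measure, and the 0.6 threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher

# Minimal sketch of label-based schema matching with a tiny synonym table.
SYNONYMS = {("price", "cost"), ("customer", "client")}

def similarity(a: str, b: str) -> float:
    a, b = a.lower(), b.lower()
    if a == b or (a, b) in SYNONYMS or (b, a) in SYNONYMS:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()   # plain string similarity

schema_a = ["ProdID", "Price", "CustomerName"]
schema_b = ["ProductID", "Cost", "ClientName", "Category"]

for label_a in schema_a:
    best = max(schema_b, key=lambda label_b: similarity(label_a, label_b))
    score = similarity(label_a, best)
    if score >= 0.6:
        print(f"{label_a:13} ~ {best:10} (score {score:.2f})")
```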

53 7.2 Schema Integration
If integration is query-driven, only schema mapping is needed: a mapping from one or more source schemas to a target schema
(Figure: a high-level mapping between source schema S and target schema T is defined via correspondences and compiled by a mapping compiler into a low-level mapping over the data)

54 7.2 Schema Integration
Schema mapping example, abstracting from the actual labels and regarding element types, depths in hierarchies, number and type of relationships, etc.
(Figure: source attributes ProductID: Decimal, Product: VARCHAR(50), GroupID: Decimal mapped to a target table Product with ProdID: Decimal, Product: VARCHAR(50), Group: VARCHAR(50), Categ: VARCHAR(50))

55 7.2 Schema Integration
Schema integration in practice: BEA AquaLogic Data Services
Special feature, easy-to-use modeling: mappings and transformations can be designed in an easy-to-use GUI tool using a library of over 200 functions; for complex mappings and transformations, architects and developers can bypass the GUI tool and use an XQuery source code editor to define or edit services

56 7.2 Schema Integration
What tools are actually provided to support integration?
- Data translation tool: transforms binary data into XML and XML back into binary data
- Data transformation tool: transforms one XML document into another XML document
Base idea: transform the data to application-specific XML, transform it to XML specific to the other application (or to a general schema), then transform it back to binary
Note: the integration work still has to be done manually

57 7.2 Schema Integration
I can't afford expensive BEA consultants and the AquaLogic Integration Suite, what now?
- Do it completely yourself: most of the technologies used can be found as open source projects (data mappers, XSL engines, XSL editors, etc.)
- Do it yourself with specialized tools: many companies and open source projects specialize in developing data integration and transformation tools, e.g., CloverETL, Altova MapForce, BusinessObjects Data Integrator, Informatica PowerCenter, etc.

58 7.2 Schema Integration
Altova MapForce: same idea as the BEA integrator
- Also based on XSLT and a data description language
- Editors for binary/DB-to-XML mapping and for XSL transformations
- Automatic generation of data sources, web services, and transformation modules in Java, C#, C++

59 7.2 Schema Integration
Google Refine (screenshot)

60 7.2 Loading
The loading process can be broken down into 2 different types:
- Initial load
- Continuous load (loading over time)

61 7.2 Loading
Issues:
- Huge volumes of data to be loaded
- Small time window available when the warehouse can be taken offline (usually nights)
- When to build indexes and aggregated tables
- Allow system administrators to monitor, cancel, resume, and change load rates
- Recover gracefully: restart after failure from where you were, without loss of data integrity

62 7.2 Loading
Initial load:
- Deliver dimension tables: create and assign surrogate keys each time a new cleaned and conformed dimension record has to be loaded; write dimensions to disk as physical tables, in the proper dimensional format
- Deliver fact tables: utilize bulk-load utilities, load in parallel
- Tools: DTS (Data Transformation Services, a set of tools), the bcp (bulk copy) utility, SQL*Loader
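A minimal sketch of surrogate key assignment during the initial dimension load; in a real ETL job the natural-key-to-surrogate-key map would be persisted with the dimension, not kept in memory.

```python
# Minimal sketch: each new natural (source) key gets the next integer surrogate key.
key_map = {}          # natural key -> surrogate key
next_surrogate = 1

def surrogate_key(natural_key):
    global next_surrogate
    if natural_key not in key_map:
        key_map[natural_key] = next_surrogate
        next_surrogate += 1
    return key_map[natural_key]

cleaned_dimension_records = [
    {"product_no": "P-100", "name": "Notebook"},
    {"product_no": "P-200", "name": "Office Suite"},
    {"product_no": "P-100", "name": "Notebook"},      # same product, same key
]
for record in cleaned_dimension_records:
    record["product_key"] = surrogate_key(record["product_no"])
    print(record)
```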

63 7.2 Loading
Continuous load (loading over time):
- Must be scheduled and processed in a specific order to maintain integrity, completeness, and a satisfactory level of trust (if done only once a year, the data is obsolete)
- Should be the most carefully planned step in data warehousing, or it can lead to error duplication and exaggeration of inconsistencies in the data

64 7.2 Loading
Continuous load of facts:
1. Separate updates from inserts
2. Drop any indexes not required to support updates
3. Load the updates
4. Drop all remaining indexes
5. Load the inserts through bulk loaders
6. Rebuild the indexes
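A minimal sketch of this load sequence against an in-memory SQLite database; the table, index, and staging names are assumptions, and the bulk loader of step 5 is emulated with INSERT ... SELECT since real bulk-load utilities are DBMS-specific.

```python
import sqlite3

# Minimal sketch of the continuous fact-load sequence.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales_fact(sale_id INTEGER PRIMARY KEY, product_key INT, amount REAL);
    CREATE INDEX idx_fact_product ON sales_fact(product_key);
    -- step 1: updates and inserts arrive pre-separated in two staging tables
    CREATE TABLE stage_updates(sale_id INT, amount REAL);
    CREATE TABLE stage_inserts(sale_id INT, product_key INT, amount REAL);
""")

con.executescript("""
    DROP INDEX idx_fact_product;                              -- step 2: drop indexes not needed for updates
    UPDATE sales_fact SET amount = (SELECT u.amount FROM stage_updates u
                                    WHERE u.sale_id = sales_fact.sale_id)
        WHERE sale_id IN (SELECT sale_id FROM stage_updates); -- step 3: load updates
                                                              -- step 4: drop remaining indexes (none left here)
    INSERT INTO sales_fact SELECT * FROM stage_inserts;       -- step 5: 'bulk load' the inserts
    CREATE INDEX idx_fact_product ON sales_fact(product_key); -- step 6: rebuild indexes
""")
print(con.execute("SELECT COUNT(*) FROM sales_fact").fetchone())
```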

65 7.3 Metadata
Metadata: data about data
In a DW, metadata describes the contents of the data warehouse and how to use it:
- What information exists in the data warehouse, what the information means, how it was derived, from which source systems it comes, when it was created, what pre-built reports and analyses exist for manipulating the information, etc.

66 7.3 Metadata
Types of metadata in a DW:
- Source system metadata
- Data staging metadata
- DBMS metadata

67 7.3 Metadata
Source system metadata:
- Source specifications, e.g., repositories and source logical schemas
- Source descriptive information, e.g., ownership descriptions, update frequencies, and access methods
- Process information, e.g., job schedules and extraction code

68 7.3 Metadata
Data staging metadata:
- Data acquisition information, such as data transmission scheduling and results, and file usage
- Dimension table management, such as definitions of dimensions and surrogate key assignments
- Transformation and aggregation, such as data enhancement and mapping, DBMS load scripts, and aggregate definitions
- Audit, job logs, and documentation, such as data lineage records and data transform logs

69 7.3 Metadata
DW schema metadata, e.g., cube description metadata (figure)

70 Summary
How to build a DW:
- The DW project: usual tasks, hardware, software, timeline (phases)
- Data Extract/Transform/Load (ETL): data storage structures, extraction strategies (e.g., scraping, sniffing)
- Transformation: data quality, integration
- Loading: issues and strategies (bulk loading for fact data is a must)
- Metadata: describes the contents of a DW, comprises all the intermediate products of ETL, and helps in understanding how to use the DW

71 Next lecture
Real-Time Data Warehouses:
- Real-time requirements
- Real-time ETL
- In-memory data warehouses
