MEDICAL INFORMATICS & DATABASE MANAGEMENT MODULE 5: BIG DATA MANAGEMENT AND ANALYSIS DR.ORALUCK PATTANAPRATEEP

Doctor of Philosophy Program in Clinical Epidemiology
Section for Clinical Epidemiology & Biostatistics
Faculty of Medicine Ramathibodi Hospital, Mahidol University
Semester I Academic year

RACE 614 Medical Informatics & Database Management
Module 5: Big data management and analysis

Contents
Objectives
References
I. Big data and data science
   Why big data
   Data science
II. Data warehouse and visualization
   What is a data warehouse
   Basic data warehousing environment
   Data mart and its components
   Big data and data lake
   Data visualization
   Infographic
III. Machine learning algorithm and big data analytic
   What are machine learning algorithms
   Classification model
   Regression model
   Cluster analysis
   Association analysis

Objectives
Students should be able to:
1. Understand the concepts of big data, data science, and data warehousing
2. Apply the data science process to big data problems
3. Select appropriate data visualizations to clearly communicate analytic insights to audiences
4. Apply appropriate machine learning algorithms to analyze big data

References
1. Lantz B. Machine learning with R (2nd edition). Packt Publishing.
2. Provost F, Fawcett T. Data science for business. O'Reilly Media, Inc.
3. Reeves LL. A manager's guide to data warehousing. Wiley Publishing, Inc.
4. Berka P, Rauch J, Zighed DA. Data mining and medical knowledge management: cases and applications. Information Science Reference.
5. Han J, Kamber M. Data mining: concepts and techniques (2nd edition). Morgan Kaufmann Publishers, CA, USA.
6. Kimball R, Ross M. The data warehouse toolkit: the complete guide to dimensional modeling (2nd edition). Wiley Computer Publishing.

In previous modules, we explored the management of data from primary sources, starting with designing record forms and managing the database in Epidata. In the real world, however, data also come from secondary sources, especially in electronic formats, which have recently grown much larger. Such data are increasingly gathered by high-performance, convenient devices and are called big data. In this final module, we cover three domains of big data:
1. Data science and big data: introduces the concept of adding value to data, and introduces data science, the new era of data management.
2. Data warehouse and visualization: presents the concept of building a data warehouse and demonstrates how to communicate the findings.
3. Machine learning algorithm and big data analytic: explores how to mine data with 4 main machine learning algorithms.

I. Big data and data science
How do we find information and knowledge in data or big data? Figure 1 shows how value is added, from meaningless raw data at the base of the pyramid to meaningful information, knowledge, and wisdom. For example, 2 numbers at the raw data level, 115 and 90, have no meaning without any clue. By adding meaning to the numbers, we find the relationship between them: they are fasting blood sugar (FBS) values, which decrease from 115 to 90. The next question is whether a lower FBS is good or bad.
Figure 1: from data to wisdom (Data, raw: "115, 90"; Information, understanding relations through meaning: "FBS decreases from 115 to 90"; Knowledge, understanding patterns through context: "FBS should be less than 100 by dietary control"; Wisdom, understanding principles through application: "Control of the diet will improve patient's health")
Adding the context that FBS should be less than 100 tells us that the information "FBS decreases from 115 to 90" is good. Moreover, tools and techniques called machine learning algorithms may be applied to understand patterns and predict the future. In this example, we may find patterns among patients who control their diet well and predict their FBS levels. Finally, at the top of the pyramid, we may conclude that controlling one's diet will improve the patient's health.

Why big data
Big data refers to datasets that are too large, too varied, and arrive too rapidly for traditional data processing systems. In the past, we processed only small volumes of data, of few data types, in overnight batch runs to find information and knowledge. With today's high-performance hardware and technology, data are generated in many forms and in high volume, and they can be stored, retrieved, and analyzed much more rapidly. Big data is therefore mainly described by the following 3 characteristics:
- Volume: the amount of data, from a few to millions of records, and from one to hundreds of tables.
- Variety: the range of data types and sources, from structured to unstructured and from text to images.
- Velocity: the speed at which data flow in and out, from batch to real time.

Data science
Once massive data in flexible forms can be processed in minutes, there are more opportunities to find information and discover knowledge. The next questions are who will perform the analysis and what special skills they need.

Figure 2: data science Venn diagram 2.0 (circles: computer science, maths and statistics, subject matter expertise; overlap labels: machine learning, traditional software, traditional research, and data science as the "unicorn")
The three essential skills of a data scientist (figure 2) are 1. computer science, 2. maths and statistics, and 3. subject matter expertise (Venn diagram 2.0). Since it is hard to find one person who is strong in all 3 areas, most data scientists work in teams.
The data science process (figure 3) starts with collecting raw data from real-world situations, e.g. human behavior, financial matters, or medicine utilization. Next, hypotheses are formulated, and the data are processed and cleaned for exploratory data analysis, which may take the form of summary statistics or graphs. If the data are insufficient or cannot answer the question, more data should be collected, processed, and cleaned. The next step is building models with machine learning algorithms. The final outcome of the data science process is a data product, or value-added data. Throughout the process, it is important to communicate with the audience, for example through reports or dashboards, so that they can make decisions.

Figure 3: data science process (raw data is collected → data is processed → clean data → exploratory data analysis → models & algorithms → data product; findings are communicated so that the audience can make decisions)

II. Data warehouse and visualization
Within the data science process, we have discussed collecting, processing, and cleaning data (called ETL in data warehousing), as well as the importance of communicating with the audience for decision making. We begin by comparing the 2 main forms in which information is kept, each with a different purpose: the operational systems of record and the data warehouse. Operational systems are where data are entered, and they almost always deal with one record at a time; the data warehouse is where data from different operational systems are integrated, and it almost never deals with one row at a time. Table 1 compares operational systems and the data warehouse.

Table 1: comparison of operational systems and data warehouse
Area of comparison | Operational systems | Data warehouse
Purpose of data | Daily business tasks | Analysis, planning, decision support
Function | Day-to-day operation, detailed data | Long-term information, summarized data
Design | Application oriented, real time | Subject oriented; currency depends on the length of the cycle for loading data into the warehouse
Access | Read and write | Mostly read
Size | 100 MB to GB | 100 GB to TB

What is a data warehouse
A data warehouse is a central repository of integrated data from one or more disparate transactional data sources, such as relational databases and enterprise resource planning (ERP) systems. Figure 4 shows a basic data warehousing environment. Starting from the transactional data sources, a process called ETL:
- Extracts data from the transactional data sources and normally keeps it temporarily in staging tables,
- Transforms the data into the proper format for querying and analysis, and
- Loads it into the final target, which is designed and modeled in dimensional format.
Then, on the client side, users retrieve data that are already in the data warehouse to create their own dashboards or reports for exploration or analysis.
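As a rough illustration of the three ETL steps, the following Python sketch uses the standard-library sqlite3 module with an in-memory database. The table names (his_visit, stg_visit, fact_visit) and the cleaning rules are hypothetical assumptions for this example, not features of any tool in figure 4; production ETL tools add scheduling, incremental loads, and error handling.

```python
import sqlite3


def run_etl(conn: sqlite3.Connection) -> None:
    """Extract -> staging, transform, load into a fact table (full reload)."""
    cur = conn.cursor()

    # Extract: copy raw rows from the hypothetical transactional table
    # his_visit into a temporary staging table.
    cur.execute("DROP TABLE IF EXISTS stg_visit")
    cur.execute("CREATE TABLE stg_visit AS SELECT * FROM his_visit")

    # Load target: rebuilt from scratch on each run, so rerunning never double-counts.
    cur.execute("DROP TABLE IF EXISTS fact_visit")
    cur.execute(
        "CREATE TABLE fact_visit ("
        "visit_date TEXT, clinic TEXT, scheme TEXT, n_visits INTEGER)"
    )

    # Transform + load: drop rows without a clinic, standardise spelling,
    # and aggregate to one row per (date, clinic, scheme) with a visit count.
    cur.execute(
        """
        INSERT INTO fact_visit (visit_date, clinic, scheme, n_visits)
        SELECT visit_date, lower(trim(clinic)), upper(trim(scheme)), COUNT(*)
        FROM stg_visit
        WHERE clinic IS NOT NULL
        GROUP BY visit_date, lower(trim(clinic)), upper(trim(scheme))
        """
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE his_visit (visit_id INTEGER, visit_date TEXT, clinic TEXT, scheme TEXT)"
    )
    conn.executemany(
        "INSERT INTO his_visit VALUES (?, ?, ?, ?)",
        [
            (1, "2016-01-01", "Medicine ", "nhso"),
            (2, "2016-01-01", "medicine", "NHSO"),
            (3, "2016-01-01", "Surgery", "other"),
            (4, "2016-01-02", None, "nhso"),  # incomplete record, filtered out
        ],
    )
    run_etl(conn)
    for row in conn.execute("SELECT * FROM fact_visit ORDER BY clinic"):
        print(row)
```

Running the demo prints ('2016-01-01', 'medicine', 'NHSO', 2) and ('2016-01-01', 'surgery', 'OTHER', 1): the two differently spelled Medicine rows are merged during transformation, and the record without a clinic never reaches the fact table.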

- Basic data warehousing environment
Figure 4 explains the basic data warehousing environment. From left to right, data in various forms are extracted, transformed, and loaded into a data warehouse that consists of many data marts. The ETL process, or metadata management, deals with the ODS (operational data store, a mirror or backup of the transactional database), staging tables (temporary databases), and master tables (which mainly keep the master data warehouse dimensions).
Figure 4: basic data warehouse environment
Stage | Components | Skills | Tools
Transactional data sources | Relational DB/HIS, ERP, other sources, flat files | DB design and administration, SQL | MS SQL Server, MySQL, Oracle 11g, MS Access, DB2
Metadata management | ODS, staging tables, master tables, meta data | Extract, transform, load (ETL) | IBM Data Manager, Informatica, Oracle ODI, SAS DI Studio, SQL Server Integration Services
Dimensional modeling | Data warehouse made up of data marts | Dimensional modeling | IBM Framework Manager, Oracle Warehouse Builder, MS Analysis Services
Query/Report/Visualization | Pivot in MS Excel, dashboard in BI tools | Multidimensional queries, data mining, predictive analysis | IBM Cognos, Business Objects, etc. (MS Power BI, MS PowerPivot, QlikSense, Tableau)
DB = database, HIS = hospital information system, ERP = enterprise resource planning, ODS = operational data store

- Data mart and its components
A simple form of data warehouse that focuses on a single functional area is called a data mart or cube. A data mart is designed in dimensional format, with a fact table comprising measures and dimensions. Typically, measures are values that can be aggregated, and dimensions are groups of hierarchies that define the facts. For example, in figure 5, number of visits is a measure, while date, clinic, and health scheme are dimensions. A dimension may have no, one, or more hierarchies. Health scheme has no hierarchy. Clinic has one hierarchy with one level, meaning that clinic can be drilled up to building. Date has 2 hierarchies, calendar year and fiscal year, and each hierarchy has several levels, i.e., week, month, and year. In addition, the date dimension has one attribute, the day of the week (Monday to Sunday).
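A minimal star schema for a data mart like the one in figure 5 might look like the sketch below, again using Python's sqlite3 module. All table and column names and the sample rows are hypothetical; the week level of the date hierarchy is omitted for brevity, and the fiscal year value for 1/1/16 is an assumption. The query shows drilling up from clinic to building by joining the fact table to its dimensions.

```python
import sqlite3

# Hypothetical star schema for the "number of visits" data mart.
SCHEMA = """
CREATE TABLE dim_date (
    date_key    INTEGER PRIMARY KEY,
    full_date   TEXT,
    day_name    TEXT,     -- attribute (Monday..Sunday)
    month       INTEGER,  -- calendar hierarchy level (week level omitted)
    cal_year    INTEGER,  -- calendar hierarchy level
    fiscal_year INTEGER   -- fiscal hierarchy level
);
CREATE TABLE dim_clinic (
    clinic_key  INTEGER PRIMARY KEY,
    clinic_name TEXT,
    building    TEXT      -- one-level hierarchy: clinic -> building
);
CREATE TABLE dim_scheme (
    scheme_key  INTEGER PRIMARY KEY,
    scheme_name TEXT      -- no hierarchy
);
CREATE TABLE fact_visit (
    date_key    INTEGER REFERENCES dim_date(date_key),
    clinic_key  INTEGER REFERENCES dim_clinic(clinic_key),
    scheme_key  INTEGER REFERENCES dim_scheme(scheme_key),
    n_visits    INTEGER   -- the measure
);
"""


def build_demo() -> sqlite3.Connection:
    """Create the schema in memory and insert a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO dim_date VALUES (1, '2016-01-01', 'Friday', 1, 2016, 2016)")
    conn.executemany("INSERT INTO dim_clinic VALUES (?, ?, ?)",
                     [(1, "Medicine", "Building 1"), (2, "Surgery", "Building 1"),
                      (3, "Eye", "Building 2")])
    conn.executemany("INSERT INTO dim_scheme VALUES (?, ?)", [(1, "NHSO"), (2, "Other")])
    conn.executemany("INSERT INTO fact_visit VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 12), (1, 1, 2, 10), (1, 2, 1, 5), (1, 3, 2, 7)])
    return conn


def visits_by_building(conn: sqlite3.Connection, full_date: str):
    """Drill up from clinic to building for one day by summing the measure."""
    sql = """
        SELECT c.building, SUM(f.n_visits)
        FROM fact_visit AS f
        JOIN dim_date   AS d ON f.date_key = d.date_key
        JOIN dim_clinic AS c ON f.clinic_key = c.clinic_key
        WHERE d.full_date = ?
        GROUP BY c.building
        ORDER BY c.building
    """
    return conn.execute(sql, (full_date,)).fetchall()


if __name__ == "__main__":
    print(visits_by_building(build_demo(), "2016-01-01"))
```

The fact table holds only keys and the measure, while descriptive attributes and hierarchy levels live in the dimension tables; this separation is what lets the same facts be summarized along any dimension.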

Figure 5: a data mart in star schema
Figure 6 shows a data mart as a cube for reporting the number of visits in 3 dimensions: date on the X axis, clinic on the Y axis, and health scheme on the Z axis. Each box contains a number of visits, e.g., on 1/1/16, 22 NHSO patients visited the medicine clinic (3 dimensions: date, clinic, and health scheme). A cube can accommodate data for any dimensions that define a business problem. When a dimension is collapsed, the measure is summed as boxes are combined, e.g., 119 patients visited the medicine clinic on 1/1/16 (2 dimensions: clinic and date); or, when drilling up from clinic to building, 198 patients visited building 1 on 1/1/16.
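The summing behaviour of the cube can be sketched in plain Python by storing each box under a (date, clinic, health scheme) key. The counts below are hypothetical and do not reproduce figure 6, and the clinic-to-building mapping is also assumed. Collapsing a dimension sums the boxes that merge, and drilling up (often called rolling up) replaces clinic with its building before summing.

```python
from collections import defaultdict

# Hypothetical cube cells: (date, clinic, health scheme) -> number of visits.
CUBE = {
    ("2016-01-01", "Medicine", "NHSO"): 12,
    ("2016-01-01", "Medicine", "Other"): 10,
    ("2016-01-01", "Surgery", "NHSO"): 5,
    ("2016-01-01", "Eye", "Other"): 7,
}
DIMS = ("date", "clinic", "scheme")

# Hypothetical clinic -> building hierarchy used for drilling up.
BUILDING = {"Medicine": "Building 1", "Surgery": "Building 1", "Eye": "Building 2"}


def roll_up(cube, keep):
    """Keep only the dimensions named in `keep`, summing boxes that merge."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, visits in cube.items():
        out[tuple(key[i] for i in idx)] += visits
    return dict(out)


def drill_up_clinic(cube):
    """Replace clinic by its building and drop health scheme, then sum."""
    out = defaultdict(int)
    for (date, clinic, _scheme), visits in cube.items():
        out[(date, BUILDING[clinic])] += visits
    return dict(out)


if __name__ == "__main__":
    print(roll_up(CUBE, ("date", "clinic")))   # 3 dimensions -> 2
    print(drill_up_clinic(CUBE))               # clinic -> building
```

With these made-up counts, collapsing health scheme merges the two Medicine boxes into 22 visits, and drilling up gives 27 visits for Building 1 on 1/1/16.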

Figure 6: a data mart as a cube

Big data and data lake
With the growth of data over the last decade, a new term for data management systems has emerged: the data lake. Table 2 compares the key differences between a data warehouse and a data lake.
Table 2: comparison of data warehouse and data lake
Area of comparison | Data warehouse | Data lake
Data structure | Structured | Structured and unstructured
Data type | Cleansed/aggregated | Raw
Data volume | Large (terabytes) | Extremely large (petabytes)
Access methods | SQL | NoSQL

A data lake can, however, be added to a data system alongside a data warehouse to maximize the use of data. In figure 7, Hadoop is added to retrieve data from unstructured data sources.
Figure 7: basic data warehouse environment plus data lake architecture (transactional data sources: relational DB/HIS, ERP, other sources, flat files, and unstructured data files; data system: staging tables, master tables, meta data, and data marts; query/report/visualization: pivot in MS Excel and dashboard in BI tools)

Data visualization
Data visualization, shown in the last column of figures 4 and 7, turns transformed data into information and knowledge. Both an art and a science, it is the step of the data science process (figure 3) that communicates information, knowledge, or even data products clearly and efficiently to the audience. Effective visualization helps users analyze data and draw evidence from it. To visualize data, we need to understand the data we are trying to visualize and know what the audience wants to know, and then convey the information in the simplest effective visual form. There are many data visualization tools, from simple tools such as MS Excel, to small BI (business intelligence) tools such as MS Power BI, QlikSense, and Tableau, to large BI tools such as IBM Cognos. Figures 8 and 9 are examples of using the pivot tools in MS Excel and a dashboard in MS Power BI to present data from a data mart.
Figure 8: pivot table and chart in MS Excel

Figure 9: a dashboard designed in MS Power BI
- Infographic
An infographic, a blend of the 2 words information and graphic, is a kind of data visualization composed of three parts: visual, content, and knowledge. The visual is about making an attractive and memorable graphic, since vision is the sense through which humans receive significantly more information than through any of the other four (touch, hearing, smell, and taste). The content must be statistically proven fact and must be able to transfer knowledge to the audience. Figure 10 is a sample infographic from WHO in

Figure 10: a sample of infographic from WHO

III. Machine learning algorithm and big data analytic
What are machine learning algorithms
In the previous sections, we discussed how to manage big data to obtain information. In this section, we move on to how to transform data or information into knowledge with a set of algorithms called machine learning.
Figure 11: machine learning and its combination (AI = artificial intelligence, KDD = knowledge discovery and data mining)
As statisticians, we may ask what the difference is between statistical modelling and machine learning. Statistical modelling formalizes relationships between variables in the form of mathematical equations, whereas machine learning uses algorithms that can learn from data without relying on rules-based programming.
Machine learning algorithms are generally divided into 2 major categories (descriptive and predictive) and applied to 2 major types of data (continuous and categorical), as shown with sample techniques in table 3. The objective of descriptive tasks is to derive patterns that summarize the underlying relationships in the data; they are often exploratory in nature and are used to validate and explain results. Predictive tasks aim to predict the value of a particular attribute based on the values of other attributes.
Table 3: machine learning techniques/algorithms
Type of data | Descriptive tasks (unsupervised) | Predictive tasks (supervised)
Continuous | Clustering | Regression
Categorical | Association | Classification
Choosing the best algorithm for a specific analytical task can be a challenge. Although different algorithms can perform the same business task, each produces a different result, and some algorithms can produce more than one type of result. For example, the Microsoft Decision Trees algorithm can be used not only for prediction but also to reduce the number of columns in a dataset, because the decision tree can identify columns that do not affect the final mining model.
The machine learning process has 6 steps, as shown in figure 12: 1. understanding the business and the type of problem; 2. understanding the data (which may come from different sources); 3. preparing the data (ETL); 4. creating the model; 5. evaluating the model; and 6. deploying the model.

Figure 12: CRISP-DM model
- Classification model
Classification consists of predicting a certain outcome based on a given input. To predict the outcome, the algorithm processes a training set containing a set of attributes and the respective outcomes, and tries to discover relationships between the attributes that make it possible to predict the outcome. Next, a prediction set, which contains the same attributes but not the outcome, is used to test the classification model. Many algorithms are used for classification, such as k-NN, Naïve Bayes, and decision trees. The k-NN (k-nearest neighbors) algorithm uses information about an example's k nearest neighbors to classify an unknown outcome. Figure 13 is an example of diagnosing breast cancer with the k-NN algorithm. With 2 attributes, texture and radius, each dot represents a malignant (m) or benign (b) case. To classify a new case x as m or b, k-NN calculates the distance from x to each labelled case and assigns x the majority outcome among its k nearest neighbors.
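The k-NN decision described above can be sketched in a few lines of Python. The six training cases below are hypothetical (texture, radius) values, not the data behind figure 13; Euclidean distance and k = 3 are assumed choices.

```python
import math
from collections import Counter


def knn_classify(train, x, k=3):
    """Classify x by majority vote among its k nearest labelled neighbours.

    train: list of ((texture, radius), label) pairs, label "m" or "b".
    x:     (texture, radius) of the case to classify.
    """
    # Euclidean distance from x to every labelled case, nearest first
    by_distance = sorted((math.dist(point, x), label) for point, label in train)
    # Count labels among the k nearest and return the most common one
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training cases (texture, radius) -> malignant (m) / benign (b)
TRAIN = [
    ((12.0, 11.0), "b"), ((13.0, 12.0), "b"), ((14.0, 11.5), "b"),
    ((22.0, 18.0), "m"), ((24.0, 19.0), "m"), ((23.0, 20.5), "m"),
]

if __name__ == "__main__":
    print(knn_classify(TRAIN, (13.5, 12.0)))  # nearest neighbours are all benign
    print(knn_classify(TRAIN, (21.0, 18.5)))  # nearest neighbours are all malignant
```

Because distances depend on the units of each attribute, attributes are usually normalized (for example, min-max scaling) before k-NN is applied, and an odd k avoids ties between two classes.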

Figure 13: an example of k-nn algorithm
Of the other 2 algorithms, Naïve Bayes (the Bayesian method) uses training data to calculate the probability of an unknown outcome with Bayes' theorem, $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$. A decision tree uses a tree structure to model the relationships among attributes and outcomes.
- Regression model
Whereas classification algorithms apply to categorical outcomes, regression algorithms are the supervised models for continuous ones. Regression works the same way as in statistics classes: independent variables are used to predict a dependent variable.
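The Naïve Bayes calculation mentioned above can be sketched for categorical attributes as follows. Here A is the outcome class and B the observed attribute values; the "naïve" part is the assumption that attributes are independent given the class, so P(B | A) becomes a product of per-attribute probabilities. The attributes (high FBS, overweight), the 7 training rows, and the Laplace smoothing parameter alpha are all hypothetical choices for illustration.

```python
from collections import Counter, defaultdict


def train_nb(rows):
    """Count class frequencies and attribute-value frequencies per class.

    rows: list of (attribute_tuple, label).
    """
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)   # (attribute index, label) -> value counts
    levels = defaultdict(set)             # attribute index -> distinct values seen
    for attrs, label in rows:
        for i, value in enumerate(attrs):
            value_counts[(i, label)][value] += 1
            levels[i].add(value)
    return class_counts, value_counts, levels


def predict_nb(model, attrs, alpha=1.0):
    """Return P(class | attrs) for each class via Bayes' theorem.

    alpha is Laplace smoothing, so an unseen value does not zero a class out.
    """
    class_counts, value_counts, levels = model
    n = sum(class_counts.values())
    joint = {}
    for label, count in class_counts.items():
        p = count / n                                   # prior P(A)
        for i, value in enumerate(attrs):               # P(B | A), naive independence
            p *= (value_counts[(i, label)][value] + alpha) / (count + alpha * len(levels[i]))
        joint[label] = p                                # P(B | A) P(A)
    evidence = sum(joint.values())                      # P(B)
    return {label: p / evidence for label, p in joint.items()}


# Hypothetical training rows: (high FBS, overweight) -> outcome
ROWS = [
    (("yes", "yes"), "diabetes"), (("yes", "no"), "diabetes"), (("yes", "yes"), "diabetes"),
    (("no", "no"), "no diabetes"), (("no", "yes"), "no diabetes"),
    (("no", "no"), "no diabetes"), (("yes", "no"), "no diabetes"),
]

if __name__ == "__main__":
    model = train_nb(ROWS)
    print(predict_nb(model, ("yes", "yes")))
```

For a case with high FBS who is overweight, these made-up counts give a posterior of 81/106 (about 0.76) for diabetes. Dividing by P(B) simply rescales the class scores so that they sum to 1.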

- Cluster analysis
Clustering is an unsupervised model that divides data into clusters. In classification, we have training data with known outcomes and predict the outcomes of testing data; in clustering, no outcome is known in advance. For example (figure 14), as medical staff, we may want to help diabetes patients learn how to control their diet and exercise by dividing them into 3 groups based on their age and blood sugar level.
Figure 14: an example of diabetes patients
The most common algorithm for cluster analysis is k-means. The k-means algorithm first assigns each of the n examples to one of k clusters; it then tries to minimize the differences within each cluster and maximize the differences between clusters by repeatedly reassigning each example to the nearest cluster centre and recomputing the centres. Figure 15 shows the result of a k-means cluster analysis in which patients are divided into 3 groups based on the similarity of their age and blood sugar level.
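The assign-then-update loop of k-means can be sketched in plain Python as below. The nine (age, blood sugar) points are hypothetical, not the patients in figures 14 and 15. The sketch uses a simple farthest-first initialisation, which is an assumed choice; k-means++ or several random restarts are common alternatives, since a poor start can trap k-means in a worse solution.

```python
import math


def init_centres(points, k):
    """Farthest-first initialisation: start from the first point, then
    repeatedly add the point farthest from all centres chosen so far."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(math.dist(p, c) for c in centres)))
    return centres


def kmeans(points, k, max_iter=100):
    """Return (centres, clusters) for k clusters of points (tuples of numbers)."""
    centres = init_centres(points, k)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each example joins the cluster of its nearest centre
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centres[j]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its members
        new_centres = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centres[j]
            for j, members in enumerate(clusters)
        ]
        if new_centres == centres:   # no centre moved, so assignments are stable
            break
        centres = new_centres
    return centres, clusters


# Hypothetical patients: (age in years, blood sugar level)
PATIENTS = [(25, 90), (30, 95), (28, 88),
            (45, 160), (50, 170), (48, 155),
            (70, 250), (65, 240), (72, 260)]

if __name__ == "__main__":
    centres, clusters = kmeans(PATIENTS, k=3)
    for centre, members in zip(centres, clusters):
        print(centre, members)
```

As with k-NN, age and blood sugar are on different scales, so in practice they are usually standardized before clustering; otherwise the attribute with the larger range dominates the distances.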

Figure 15: an example of diabetes patients
- Association analysis
Association, or market basket, analysis is another unsupervised model; it finds relationships among categorical variables in a dataset. Table 4 is an example of 5 prescriptions from one clinic.
Table 4: an example of drugs in prescriptions
Rx no. | Drug items
1 | {PPI, NSAIDs, Calcium}
2 | {Antidepressant, NSAIDs, Antianxiety, Muscle relaxant}
3 | {NSAIDs, Muscle relaxant, PPI}
4 | {Antidepressant, Antianxiety, Calcium}
5 | {NSAIDs, PPI, Calcium}

By looking at this dataset of only 5 prescriptions, we may guess some patterns: Rx no. 1, 3, and 5 are for orthopedic patients, while Rx no. 2 and 4 are for psychiatric patients. Applying the same idea to large transaction databases, association analysis uses statistical measures (support and confidence) to locate associations between items and group them into the same basket. The most common method is the Apriori approach, an association rule mining method based on the principle of frequent pattern mining. Performing Apriori analysis involves 2 steps:
1. Generate the candidate set: the first step finds items in the dataset that occur with a frequency exceeding a specified threshold (the support measure), where
$$\text{Support}(A \Rightarrow B) = \frac{\text{number of observations containing both } A \text{ and } B}{\text{total number of observations}}$$
2. Derive the association rules: the second step analyzes the items in the candidate set to mine association rules, which indicate conditional probabilities between pairs of item groups. Rules are generated from pairs whose conditional probability exceeds a user-defined threshold (the confidence measure), where
$$\text{Confidence}(A \Rightarrow B) = \frac{\text{number of observations containing both } A \text{ and } B}{\text{number of observations containing } A}$$
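Applied to the 5 prescriptions in Table 4, the two Apriori steps can be sketched in Python as follows. The thresholds (minimum support 0.4, minimum confidence 0.8) are arbitrary choices for this example, "exceeds" is treated as "at least", and the function names are hypothetical; real analyses would typically use a dedicated library on much larger transaction databases.

```python
from itertools import combinations

# The 5 prescriptions from Table 4
PRESCRIPTIONS = [
    {"PPI", "NSAIDs", "Calcium"},
    {"Antidepressant", "NSAIDs", "Antianxiety", "Muscle relaxant"},
    {"NSAIDs", "Muscle relaxant", "PPI"},
    {"Antidepressant", "Antianxiety", "Calcium"},
    {"NSAIDs", "PPI", "Calcium"},
]


def support(itemset, transactions):
    """Share of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(a, b, transactions):
    """Conditional probability of B given A: support(A and B) / support(A)."""
    return support(a | b, transactions) / support(a, transactions)


def frequent_itemsets(transactions, min_support):
    """Step 1: level-wise search for itemsets with support >= min_support."""
    items = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in items if support(s, transactions) >= min_support}
    frequent = set(level)
    size = 2
    while level:
        # Join: combine frequent (size-1)-itemsets into size-item candidates
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune (Apriori principle): every subset of a frequent itemset is frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, size - 1))}
        level = {c for c in candidates if support(c, transactions) >= min_support}
        frequent |= level
        size += 1
    return frequent


def rules(frequent, transactions, min_confidence):
    """Step 2: split each frequent itemset into A => B and keep confident rules."""
    out = []
    for itemset in frequent:
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                a = frozenset(lhs)
                b = itemset - a
                conf = confidence(a, b, transactions)
                if conf >= min_confidence:
                    out.append((set(a), set(b), support(itemset, transactions), conf))
    return out


if __name__ == "__main__":
    frequent = frequent_itemsets(PRESCRIPTIONS, min_support=0.4)
    for lhs, rhs, supp, conf in rules(frequent, PRESCRIPTIONS, min_confidence=0.8):
        print(sorted(lhs), "->", sorted(rhs), f"support={supp:.2f} confidence={conf:.2f}")
```

With these thresholds, PPI → NSAIDs has support 0.6 and confidence 1.0, whereas NSAIDs → PPI has confidence only 0.75 and is filtered out, which shows that confidence is not symmetric.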


More information

Data warehouse architecture consists of the following interconnected layers:

Data warehouse architecture consists of the following interconnected layers: Architecture, in the Data warehousing world, is the concept and design of the data base and technologies that are used to load the data. A good architecture will enable scalability, high performance and

More information

ETL and OLAP Systems

ETL and OLAP Systems ETL and OLAP Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first semester

More information

After completing this course, participants will be able to:

After completing this course, participants will be able to: Designing a Business Intelligence Solution by Using Microsoft SQL Server 2008 T h i s f i v e - d a y i n s t r u c t o r - l e d c o u r s e p r o v i d e s i n - d e p t h k n o w l e d g e o n d e s

More information

Business Intelligence An Overview. Zahra Mansoori

Business Intelligence An Overview. Zahra Mansoori Business Intelligence An Overview Zahra Mansoori Contents 1. Preference 2. History 3. Inmon Model - Inmonities 4. Kimball Model - Kimballities 5. Inmon vs. Kimball 6. Reporting 7. BI Algorithms 8. Summary

More information

Benefits of Automating Data Warehousing

Benefits of Automating Data Warehousing Benefits of Automating Data Warehousing Introduction Data warehousing can be defined as: A copy of data specifically structured for querying and reporting. In most cases, the data is transactional data

More information

Evolution of Database Systems

Evolution of Database Systems Evolution of Database Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Intelligent Decision Support Systems Master studies, second

More information

Create Cube From Star Schema Grouping Framework Manager

Create Cube From Star Schema Grouping Framework Manager Create Cube From Star Schema Grouping Framework Manager Create star schema groupings to provide authors with logical groupings of query Connect to an OLAP data source (cube) in a Framework Manager project

More information

Intro to BI Architecture Warren Sifre

Intro to BI Architecture Warren Sifre Intro to BI Architecture Warren Sifre introduction Warren Sifre Principal Consultant Email: wa_sifre@hotmail.com Website: www.linkedin.com/in/wsifre Twitter: @WAS_SQL Professional History 20 years in the

More information

Data Analysis and Data Science

Data Analysis and Data Science Data Analysis and Data Science CPS352: Database Systems Simon Miner Gordon College Last Revised: 4/29/15 Agenda Check-in Online Analytical Processing Data Science Homework 8 Check-in Online Analytical

More information

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case

More information

Chapter 6. Foundations of Business Intelligence: Databases and Information Management VIDEO CASES

Chapter 6. Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

Sql Fact Constellation Schema In Data Warehouse With Example

Sql Fact Constellation Schema In Data Warehouse With Example Sql Fact Constellation Schema In Data Warehouse With Example Data Warehouse OLAP - Learn Data Warehouse in simple and easy steps using Multidimensional OLAP (MOLAP), Hybrid OLAP (HOLAP), Specialized SQL

More information

ETL is No Longer King, Long Live SDD

ETL is No Longer King, Long Live SDD ETL is No Longer King, Long Live SDD How to Close the Loop from Discovery to Information () to Insights (Analytics) to Outcomes (Business Processes) A presentation by Brian McCalley of DXC Technology,

More information

Data warehousing in telecom Industry

Data warehousing in telecom Industry Data warehousing in telecom Industry Dr. Sanjay Srivastava, Kaushal Srivastava, Avinash Pandey, Akhil Sharma Abstract: Data Warehouse is termed as the storage for the large heterogeneous data collected

More information

Table Of Contents: xix Foreword to Second Edition

Table Of Contents: xix Foreword to Second Edition Data Mining : Concepts and Techniques Table Of Contents: Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments xxxi About the Authors xxxv Chapter 1 Introduction 1 (38) 1.1 Why Data

More information

An Introduction to Data Mining in Institutional Research. Dr. Thulasi Kumar Director of Institutional Research University of Northern Iowa

An Introduction to Data Mining in Institutional Research. Dr. Thulasi Kumar Director of Institutional Research University of Northern Iowa An Introduction to Data Mining in Institutional Research Dr. Thulasi Kumar Director of Institutional Research University of Northern Iowa AIR/SPSS Professional Development Series Background Covering variety

More information

Data warehouses Decision support The multidimensional model OLAP queries

Data warehouses Decision support The multidimensional model OLAP queries Data warehouses Decision support The multidimensional model OLAP queries Traditional DBMSs are used by organizations for maintaining data to record day to day operations On-line Transaction Processing

More information

PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore

PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore Data Warehousing Data Mining (17MCA442) 1. GENERAL INFORMATION: PESIT- Bangalore South Campus Hosur Road (1km Before Electronic city) Bangalore 560 100 Department of MCA COURSE INFORMATION SHEET Academic

More information

Big Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data. Fall 2012

Big Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data. Fall 2012 Big Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data Fall 2012 Data Warehousing and OLAP Introduction Decision Support Technology On Line Analytical Processing Star Schema

More information

Code No: R Set No. 1

Code No: R Set No. 1 Code No: R05321204 Set No. 1 1. (a) Draw and explain the architecture for on-line analytical mining. (b) Briefly discuss the data warehouse applications. [8+8] 2. Briefly discuss the role of data cube

More information

Cognos Dynamic Cubes

Cognos Dynamic Cubes Cognos Dynamic Cubes Amit Desai Cognos Support Engineer Open Mic Facilitator Reena Nagrale Cognos Support Engineer Presenter Gracy Mendonca Cognos Support Engineer Technical Panel Member Shashwat Dhyani

More information

Data Warehousing & Mining. Data integration. OLTP versus OLAP. CPS 116 Introduction to Database Systems

Data Warehousing & Mining. Data integration. OLTP versus OLAP. CPS 116 Introduction to Database Systems Data Warehousing & Mining CPS 116 Introduction to Database Systems Data integration 2 Data resides in many distributed, heterogeneous OLTP (On-Line Transaction Processing) sources Sales, inventory, customer,

More information

20466C - Version: 1. Implementing Data Models and Reports with Microsoft SQL Server

20466C - Version: 1. Implementing Data Models and Reports with Microsoft SQL Server 20466C - Version: 1 Implementing Data Models and Reports with Microsoft SQL Server Implementing Data Models and Reports with Microsoft SQL Server 20466C - Version: 1 5 days Course Description: The focus

More information

Oracle 1Z0-515 Exam Questions & Answers

Oracle 1Z0-515 Exam Questions & Answers Oracle 1Z0-515 Exam Questions & Answers Number: 1Z0-515 Passing Score: 800 Time Limit: 120 min File Version: 38.7 http://www.gratisexam.com/ Oracle 1Z0-515 Exam Questions & Answers Exam Name: Data Warehousing

More information

Chapter 13 Business Intelligence and Data Warehouses The Need for Data Analysis Business Intelligence. Objectives

Chapter 13 Business Intelligence and Data Warehouses The Need for Data Analysis Business Intelligence. Objectives Chapter 13 Business Intelligence and Data Warehouses Objectives In this chapter, you will learn: How business intelligence is a comprehensive framework to support business decision making How operational

More information

Q1) Describe business intelligence system development phases? (6 marks)

Q1) Describe business intelligence system development phases? (6 marks) BUISINESS ANALYTICS AND INTELLIGENCE SOLVED QUESTIONS Q1) Describe business intelligence system development phases? (6 marks) The 4 phases of BI system development are as follow: Analysis phase Design

More information

CHAPTER 8 DECISION SUPPORT V2 ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI

CHAPTER 8 DECISION SUPPORT V2 ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI CHAPTER 8 DECISION SUPPORT V2 ADVANCED DATABASE SYSTEMS Assist. Prof. Dr. Volkan TUNALI Topics 2 Business Intelligence (BI) Decision Support System (DSS) Data Warehouse Online Analytical Processing (OLAP)

More information

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts Chapter 28 Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms

More information

Fig 1.2: Relationship between DW, ODS and OLTP Systems

Fig 1.2: Relationship between DW, ODS and OLTP Systems 1.4 DATA WAREHOUSES Data warehousing is a process for assembling and managing data from various sources for the purpose of gaining a single detailed view of an enterprise. Although there are several definitions

More information

Creating a target user and module

Creating a target user and module The Warehouse Builder contains a number of objects, which we can use in designing our data warehouse, that are either relational or dimensional. OWB currently supports designing a target schema only in

More information

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395 Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 21 Table of contents 1 Introduction 2 Data mining

More information

Enterprise Data Warehousing

Enterprise Data Warehousing Enterprise Data Warehousing SQL Server 2005 Ron Dunn Data Platform Technology Specialist Integrated BI Platform Integrated BI Platform Agenda Can SQL Server cope? Do I need Enterprise Edition? Will I avoid

More information

COMP 465 Special Topics: Data Mining

COMP 465 Special Topics: Data Mining COMP 465 Special Topics: Data Mining Introduction & Course Overview 1 Course Page & Class Schedule http://cs.rhodes.edu/welshc/comp465_s15/ What s there? Course info Course schedule Lecture media (slides,

More information

FIVE BEST PRACTICES FOR ENSURING A SUCCESSFUL SQL SERVER MIGRATION

FIVE BEST PRACTICES FOR ENSURING A SUCCESSFUL SQL SERVER MIGRATION FIVE BEST PRACTICES FOR ENSURING A SUCCESSFUL SQL SERVER MIGRATION The process of planning and executing SQL Server migrations can be complex and risk-prone. This is a case where the right approach and

More information

Data Mining. ❸Chapter 3 Data warehouse, ETL and OLAP. Asso.Prof.Dr. Xiao-dong Zhu. Business School, University of Shanghai for Science & Technology

Data Mining. ❸Chapter 3 Data warehouse, ETL and OLAP. Asso.Prof.Dr. Xiao-dong Zhu. Business School, University of Shanghai for Science & Technology ❸Chapter 3 Data warehouse, and Business School, University of Shanghai for Science & Technology 2016-2017 2nd Semester, Spring2017 Contents of chapter 2 1 KDD Process 2 3 4 5 What is KDD? KDD Process the

More information

Contents. Foreword to Second Edition. Acknowledgments About the Authors

Contents. Foreword to Second Edition. Acknowledgments About the Authors Contents Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments About the Authors xxxi xxxv Chapter 1 Introduction 1 1.1 Why Data Mining? 1 1.1.1 Moving toward the Information Age 1

More information

OLAP Introduction and Overview

OLAP Introduction and Overview 1 CHAPTER 1 OLAP Introduction and Overview What Is OLAP? 1 Data Storage and Access 1 Benefits of OLAP 2 What Is a Cube? 2 Understanding the Cube Structure 3 What Is SAS OLAP Server? 3 About Cube Metadata

More information

Chapter 3: Data Mining:

Chapter 3: Data Mining: Chapter 3: Data Mining: 3.1 What is Data Mining? Data Mining is the process of automatically discovering useful information in large repository. Why do we need Data mining? Conventional database systems

More information

Analysis of a Population of Diabetic Patients Databases in Weka Tool P.Yasodha, M. Kannan

Analysis of a Population of Diabetic Patients Databases in Weka Tool P.Yasodha, M. Kannan International Journal of Scientific & Engineering Research Volume 2, Issue 5, May-2011 1 Analysis of a Population of Diabetic Patients Databases in Weka Tool P.Yasodha, M. Kannan Abstract - Data mining

More information

Call: SAS BI Course Content:35-40hours

Call: SAS BI Course Content:35-40hours SAS BI Course Content:35-40hours Course Outline SAS Data Integration Studio 4.2 Introduction * to SAS DIS Studio Features of SAS DIS Studio Tasks performed by SAS DIS Studio Navigation to SAS DIS Studio

More information

Winter Semester 2009/10 Free University of Bozen, Bolzano

Winter Semester 2009/10 Free University of Bozen, Bolzano Data Warehousing and Data Mining Winter Semester 2009/10 Free University of Bozen, Bolzano DW Lecturer: Johann Gamper gamper@inf.unibz.it DM Lecturer: Mouna Kacimi mouna.kacimi@unibz.it http://www.inf.unibz.it/dis/teaching/dwdm/index.html

More information

Microsoft Developer Day

Microsoft Developer Day Microsoft Developer Day Pradeep Menon Microsoft Developer Day Solutions Architect Agenda Microsoft Developer Day Traditional Business Intelligence Architecture Structured Sources Extract Transform Structurize

More information

CHAPTER 3 Implementation of Data warehouse in Data Mining

CHAPTER 3 Implementation of Data warehouse in Data Mining CHAPTER 3 Implementation of Data warehouse in Data Mining 3.1 Introduction to Data Warehousing A data warehouse is storage of convenient, consistent, complete and consolidated data, which is collected

More information

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394 Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 20 Table of contents 1 Introduction 2 Data mining

More information

2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data.

2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data. Code No: M0502/R05 Set No. 1 1. (a) Explain data mining as a step in the process of knowledge discovery. (b) Differentiate operational database systems and data warehousing. [8+8] 2. (a) Briefly discuss

More information

Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools

Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools SAP Technical Brief Data Warehousing SAP HANA Data Warehousing Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools A data warehouse for the modern age Data warehouses have been

More information

Oracle Database 11g for Data Warehousing & Big Data: Strategy, Roadmap Jean-Pierre Dijcks, Hermann Baer Oracle Redwood City, CA, USA

Oracle Database 11g for Data Warehousing & Big Data: Strategy, Roadmap Jean-Pierre Dijcks, Hermann Baer Oracle Redwood City, CA, USA Oracle Database 11g for Data Warehousing & Big Data: Strategy, Roadmap Jean-Pierre Dijcks, Hermann Baer Oracle Redwood City, CA, USA Keywords: Big Data, Oracle Big Data Appliance, Hadoop, NoSQL, Oracle

More information

Management Information Systems Review Questions. Chapter 6 Foundations of Business Intelligence: Databases and Information Management

Management Information Systems Review Questions. Chapter 6 Foundations of Business Intelligence: Databases and Information Management Management Information Systems Review Questions Chapter 6 Foundations of Business Intelligence: Databases and Information Management 1) The traditional file environment does not typically have a problem

More information

Table of Contents. Knowledge Management Data Warehouses and Data Mining. Introduction and Motivation

Table of Contents. Knowledge Management Data Warehouses and Data Mining. Introduction and Motivation Table of Contents Knowledge Management Data Warehouses and Data Mining Dr. Michael Hahsler Dept. of Information Processing Vienna Univ. of Economics and BA 11. December 2001

More information

Knowledge Management Data Warehouses and Data Mining

Knowledge Management Data Warehouses and Data Mining Knowledge Management Data Warehouses and Data Mining Dr. Michael Hahsler Dept. of Information Processing Vienna Univ. of Economics and BA 11. December 2001 1 Table of Contents

More information

Guide Users along Information Pathways and Surf through the Data

Guide Users along Information Pathways and Surf through the Data Guide Users along Information Pathways and Surf through the Data Stephen Overton, Overton Technologies, LLC, Raleigh, NC ABSTRACT Business information can be consumed many ways using the SAS Enterprise

More information

Data Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems

Data Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems Data Warehousing and Data Mining CPS 116 Introduction to Database Systems Announcements (December 1) 2 Homework #4 due today Sample solution available Thursday Course project demo period has begun! Check

More information

Embedded Technosolutions

Embedded Technosolutions Hadoop Big Data An Important technology in IT Sector Hadoop - Big Data Oerie 90% of the worlds data was generated in the last few years. Due to the advent of new technologies, devices, and communication

More information

CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP)

CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP) CHAPTER 8: ONLINE ANALYTICAL PROCESSING(OLAP) INTRODUCTION A dimension is an attribute within a multidimensional model consisting of a list of values (called members). A fact is defined by a combination

More information

Teradata Aggregate Designer

Teradata Aggregate Designer Data Warehousing Teradata Aggregate Designer By: Sam Tawfik Product Marketing Manager Teradata Corporation Table of Contents Executive Summary 2 Introduction 3 Problem Statement 3 Implications of MOLAP

More information

INDEPTH Network. Introduction to ETL. Tathagata Bhattacharjee ishare2 Support Team

INDEPTH Network. Introduction to ETL. Tathagata Bhattacharjee ishare2 Support Team INDEPTH Network Introduction to ETL Tathagata Bhattacharjee ishare2 Support Team Data Warehouse A data warehouse is a system used for reporting and data analysis. Integrating data from one or more different

More information

How to integrate data into Tableau

How to integrate data into Tableau 1 How to integrate data into Tableau a comparison of 3 approaches: ETL, Tableau self-service and WHITE PAPER WHITE PAPER 2 data How to integrate data into Tableau a comparison of 3 es: ETL, Tableau self-service

More information

What is Data Mining? Data Mining. Data Mining Architecture. Illustrative Applications. Pharmaceutical Industry. Pharmaceutical Industry

What is Data Mining? Data Mining. Data Mining Architecture. Illustrative Applications. Pharmaceutical Industry. Pharmaceutical Industry Data Mining Andrew Kusiak Intelligent Systems Laboratory 2139 Seamans Center The University of Iowa Iowa City, IA 52242-1527 andrew-kusiak@uiowa.edu http://www.icaen.uiowa.edu/~ankusiak Tel. 319-335 5934

More information

Chapter 3. The Multidimensional Model: Basic Concepts. Introduction. The multidimensional model. The multidimensional model

Chapter 3. The Multidimensional Model: Basic Concepts. Introduction. The multidimensional model. The multidimensional model Chapter 3 The Multidimensional Model: Basic Concepts Introduction Multidimensional Model Multidimensional concepts Star Schema Representation Conceptual modeling using ER, UML Conceptual modeling using

More information

Information Management Fundamentals by Dave Wells

Information Management Fundamentals by Dave Wells Information Management Fundamentals by Dave Wells All rights reserved. Reproduction in whole or part prohibited except by written permission. Product and company names mentioned herein may be trademarks

More information

Step-by-step data transformation

Step-by-step data transformation Step-by-step data transformation Explanation of what BI4Dynamics does in a process of delivering business intelligence Contents 1. Introduction... 3 Before we start... 3 1 st. STEP: CREATING A STAGING

More information