Graph Cube: On Warehousing and OLAP Multidimensional Networks

Size: px
Start display at page:

Download "Graph Cube: On Warehousing and OLAP Multidimensional Networks"

Transcription

1 Graph Cube: On Warehousing and OLAP Multidimensional Networks Peixiang Zhao, Xiaolei Li, Dong Xin, Jiawei Han Department of Computer Science, UIUC Groupon Inc. Google Cooperation June 16th, 2011 SIGMOD 2011 Athens, Greece 1 / 24

2 Outline 1 Introduction 2 The Graph Cube Model 3 OLAP on Graph Cube Cuboid Query Crossboid Query 4 Implementing Graph Cube 5 Experiment 6 Conclusion SIGMOD 2011 Athens, Greece 2 / 24

3 Introduction Recent years have seen an astounding growth of networks in a wide spectrum of application domains Communication networks Social networks Biological networks The Web Multidimensional networks 1 An underlying graph structure comprising entities and relationships 2 Multidimensional attributes are specified and associated with entities of the network There exist considerable technology gaps in managing, querying and summarizing multidimensional networks effectively SIGMOD 2011 Athens, Greece 3 / 24

4 A Sample Multidimensional Network (a) Graph ID Gender Location Profession Income 1 Male CA Teacher $70, Female WA Teacher $65, Female CA Engineer $80, Female NY Teacher $90, Male IL Lawyer $80, Female WA Teacher $90, Male NY Lawyer $100, Male IL Engineer $75, Female CA Lawyer $120, Male IL Engineer $95, 000 (b) Vertex Attribute Table Figure: A Multidimensional Network Comprising a Graph Structure and a Multidimensional Vertex Attribute Table SIGMOD 2011 Athens, Greece 4 / 24

5 Introduction Motivation: Can we extend decision support facilities on multidimensional networks? Data warehouses and OLAP are advantageous in the multidimensional network scenario Summarizing the massive networks into different levels of granularity for more effective analysis and exploration Business Intelligence: in Facebook and Twitter, advertisers and marketers take advantage of social networks within different multidimensional spaces to better promote their products via social targeting or viral marketing However, in multidimensional networks, much of the valuation and interest lies in the network itself! Simple numeric value based group-by s in traditional data warehouses are no longer insightful and of limited usage, because the structural information of the networks is simply ignored SIGMOD 2011 Athens, Greece 5 / 24

6 Network Aggregation v.s. Traditional Group-by Male Female (a) Aggregate Network 3 Gender COUNT(*) Male 5 Female 5 (b) Aggregate Table Figure: Multidimensional Network Aggregation v.s. Traditional RDB Aggregation (Group by Gender) (Female, CA) (Male, IL) (Male, CA) 1 (Female, WA) (Male, NY) (Female, NY) Gender Location COUNT(*) Male CA 1 Female CA 2 Female WA 2 Male IL 3 Male NY 1 Female NY 1 (a) Aggregate Network (b) Aggregate Table Figure: Multidimensional Network Aggregation v.s. Traditional RDB Aggregation (Group by Gender and Location) SIGMOD 2011 Athens, Greece 6 / 24

7 Introduction Graph Cube A multidimensional network can be summarized to aggregate networks in coarser levels of granularity within different multidimensional spaces Vertex coalescence Structure summarization Different query models and OLAP solutions are proposed for multidimensional networks Cuboid Queries Crossboid Queries Efficient implementation is based on a combination of Well-studied data cube implementation techniques Special characteristics of multidimensional networks The first to systematically address warehousing and OLAP issues on large multidimensional networks SIGMOD 2011 Athens, Greece 7 / 24

8 The Graph Cube Model Multidimensional Network A multidimensional network, N, is a graph denoted as N = (V, E, A), where V is a set of vertices, E V V is a set of edges and A = {A 1, A 2,..., A n } is a set of n vertex-specific attributes, i.e., u V, there is a tuple A(u) of u, denoted as A(u) = (A 1 (u), A 2 (u),..., A n (u)), where A i (u) is the value of u on i-th attribute, 1 i n. A is called the dimensions of the network N. Some (or all) dimension A i could be (ALL), representing a super-aggregation along A i Given a set of n dimensions of a network, there exist 2 n multidimensional spaces (aggregations) The measure within each possible space is no longer a simple numeric value, but an aggregate network SIGMOD 2011 Athens, Greece 8 / 24

9 The Graph Cube Model Graph Cube Given a multidimensional network N = (V, E, A), the graph cube is obtained by restructuring N in all possible aggregations of A. For each possible aggregation A of A, the grouping measure is an aggregate network G w.r.t. A. Apex 2 (Gender) (Location) (Profession) (Gender, Location) (Gender, Profession) (Location, Profession) Base Figure: The Graph Cube Lattice SIGMOD 2011 Athens, Greece 9 / 24

10 OLAP on Graph Cubes Cuboid Query: return as output the aggregate network corresponding to a specific aggregation of the dimensions of the multidimensional network What is the network structure between various genders? What is the network structure between the various gender and location combinations? (Male, CA) 1 (Female, CA) 2 2 (Female, WA) Male Female 3 (Male, IL) (Female, NY) (Male, NY) SIGMOD 2011 Athens, Greece 10 / 24

11 OLAP on Graph Cubes A cuboid query is within a single multidimensional space, which follows the traditional OLAP model A crossboid query crosses multiple multidimensional spaces of the network, i.e., more than one cuboid is involved in a query What is the network structure between the user with ID = 3 and various locations? What is the network structure between users grouped by gender v.s. users grouped by location?. Male Female WA IL ID: CA 1 1 NY CA IL WA NY SIGMOD 2011 Athens, Greece 11 / 24

12 Cuboid Queries v.s. Crossboid Queries (Gender) (Location) (Profession) (Gender, Location, Profession) (Location) "What is the network structure between users and the locations?" (Gender) (Gender, Profession) (Gender, Location) Apex (a) Traditional Cuboid Queries "What is the network structure between users grouped by gender and users grouped by location?" (Gender, Location, Profession) (b) Crossboid Queries Straddling Multiple Cuboids SIGMOD 2011 Athens, Greece 12 / 24

13 Graph Cube Implementation Objective: compute the aggregate networks of different cuboids grouping on all possible dimension combinations of a multidimensional network 1 Full materialization: Best query response time, worst space cost 2 No materialization: Best space cost, worst query response time 3 Partial materialization: A small portion of cuboids is materialized in order to balance the tradeoff between query response time and cube resource requirement SIGMOD 2011 Athens, Greece 13 / 24

14 Graph Cube Implementation: Partial Materialization Problem: To select a set S of k cuboids in the graph cube for materialization, such that the average time taken to evaluate the queries can be minimized The partial materialization problem is NP-complete, reduced from set-cover Greedy Algorithm: Selecting k cuboids with the highest size-reduction benefit Theorem Let B greedy be the benefit of k cuboids chosen by the greedy algorithm and let B opt be the benefit of any optimal set of k cuboids. Then B greedy (1 1/e) B opt and this bound is tight MinLevel Algorithm: Materializing cuboids c, where dim(c) = l 0 indicating the level in the cube lattice at which we start materializing cuboids SIGMOD 2011 Athens, Greece 14 / 24

15 Experimental Evaluation DBLP data set A co-authorship graph with 28, 702 authors as vertices and 66, 832 coauthor relationships as edges Three dimensions: name, area, productivity area: DB, DM, AI, IR productivity: Excellent, Good, Fair, Poor IMDB data set A movie rating network with 116, 164 vertices and 5, 452, 350 edges Seven dimensions: Title, Year, Length, Budget, Rating, MPAA and Type MPAA: G, PG, PG-13, R, NC-17, NR Type: action, animation, comedy, drama, documentary, romance, short SIGMOD 2011 Athens, Greece 15 / 24

16 Effectiveness Evaluation DB DM Poor Fair AI (c) (Area) IR Good Excellent (d) (Productivity) Figure: Cuboid Queries of the Graph Cube on DBLP Data Set SIGMOD 2011 Athens, Greece 16 / 24

17 Effectiveness Evaluation 4182 (DM, Poor) 252 (DB, Excellent) (DM, Fair) (DB, Good) (DM, Good) (DB, Fair) (DM, Excellent) (DB, Poor) (AI, Poor) (IR, Excellent) (AI, Fair) (AI, Good) (IR, Good) (AI, Excellent) (IR, Fair) 4590 (IR, Poor) (a) (Area, Productivity) Figure: Cuboid Queries of the Graph Cube on DBLP Data Set SIGMOD 2011 Athens, Greece 17 / 24

18 Effectiveness Evaluation DB DM AI IR Poor Fair Good Excellent (a) Area Productivity Figure: Crossboid Queries of the Graph Cube on DBLP Data Set SIGMOD 2011 Athens, Greece 18 / 24

19 Effectiveness Evaluation DB DM AI IR DB DM AI IR Hector Garcia-Molina Philip S. Yu Poor 33 Fair 24 Good 6 Excellent (a) Area Base Productivity for Hector Garcia-Molina 71 Poor 52 Fair 12 Good 13 Excellent (b) Area Base Productivity for Philip S. Yu Figure: Crossboid Queries of the Graph Cube on DBLP Data Set SIGMOD 2011 Athens, Greece 19 / 24

20 Efficiency Evaluation Runtime (seconds) Raw Table Graph Cube Number of Dimensions (a) Time v.s. # Dimensions Runtime (seconds) Raw Table Graph Cube Number of Edges (*10K) (b) Time v.s. # Edges Figure: Full Materialization of Graph Cube for DBLP Data Set SIGMOD 2011 Athens, Greece 20 / 24

21 Efficiency Evaluation Runtime (seconds) Graph Cube Raw Table Number of Dimensions (a) Time v.s. # Dimensions Runtime (seconds) Graph Cube Raw Table Number of Edges (*1M) (b) Time v.s. # Edges Figure: Full Materialization of Graph Cube for IMDB Data Set SIGMOD 2011 Athens, Greece 21 / 24

22 Efficiency Evaluation Runtime (seconds) Greedy MinLevel Number of Materialized Cuboids (a) Cuboid Queries Runtime (seconds) Greedy MinLevel Number of Materialized Cuboids (b) Crossboid Queries Figure: Average Query Respond Time w.r.t. Different Partial Materialization Algorithms SIGMOD 2011 Athens, Greece 22 / 24

23 Conclusion 1 This work seeks to enhance decision-support functionality on large multidimensional networks 2 Graph cube: A new data warehousing model is designed specifically for efficient aggregation on multidimensional networks 3 Different query models and OLAP solutions for Graph Cube are proposed and studied Crossboid queries break the boundary of the traditional OLAP model by straddling multiple cuboids of the Graph Cube 4 The implementation of Graph Cube is discussed and the experimental results have demonstrated the power and efficacy of Graph Cube as the first, to the best of our knowledge, tool for warehousing and OLAP large multidimensional networks SIGMOD 2011 Athens, Greece 23 / 24

24 Thank you SIGMOD 2011 Athens, Greece 24 / 24

Discovering Interesting Patterns in Large Graph Cubes

Discovering Interesting Patterns in Large Graph Cubes Discovering Interesting Patterns in Large Graph Cubes 07 BigGraphs Workshop at IEEE BigData'7 Florian Demesmaeker, Consultant @EURA NOVA Discovering Interesting Patterns in Large Graph Cubes Florian Demesmaeker,

More information

On Graph Query Optimization in Large Networks

On Graph Query Optimization in Large Networks On Graph Query Optimization in Large Networks Peixiang Zhao, Jiawei Han Department of omputer Science University of Illinois at Urbana-hampaign pzhao4@illinois.edu, hanj@cs.uiuc.edu September 14th, 2010

More information

Effective Keyword Search over (Semi)-Structured Big Data Mehdi Kargar

Effective Keyword Search over (Semi)-Structured Big Data Mehdi Kargar Effective Keyword Search over (Semi)-Structured Big Data Mehdi Kargar School of Computer Science Faculty of Science University of Windsor How Big is this Big Data? 40 Billion Instagram Photos 300 Hours

More information

Novel Materialized View Selection in a Multidimensional Database

Novel Materialized View Selection in a Multidimensional Database Graphic Era University From the SelectedWorks of vijay singh Winter February 10, 2009 Novel Materialized View Selection in a Multidimensional Database vijay singh Available at: https://works.bepress.com/vijaysingh/5/

More information

ETL and OLAP Systems

ETL and OLAP Systems ETL and OLAP Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first semester

More information

Social Data Exploration

Social Data Exploration Social Data Exploration Sihem Amer-Yahia DR CNRS @ LIG Sihem.Amer-Yahia@imag.fr Big Data & Optimization Workshop 12ème Séminaire POC LIP6 Dec 5 th, 2014 Collaborative data model User space (with attributes)

More information

Decision Support Systems aka Analytical Systems

Decision Support Systems aka Analytical Systems Decision Support Systems aka Analytical Systems Decision Support Systems Systems that are used to transform data into information, to manage the organization: OLAP vs OLTP OLTP vs OLAP Transactions Analysis

More information

Quotient Cube: How to Summarize the Semantics of a Data Cube

Quotient Cube: How to Summarize the Semantics of a Data Cube Quotient Cube: How to Summarize the Semantics of a Data Cube Laks V.S. Lakshmanan (Univ. of British Columbia) * Jian Pei (State Univ. of New York at Buffalo) * Jiawei Han (Univ. of Illinois at Urbana-Champaign)

More information

SQL Server Analysis Services

SQL Server Analysis Services DataBase and Data Mining Group of DataBase and Data Mining Group of Database and data mining group, SQL Server 2005 Analysis Services SQL Server 2005 Analysis Services - 1 Analysis Services Database and

More information

Basics of Dimensional Modeling

Basics of Dimensional Modeling Basics of Dimensional Modeling Data warehouse and OLAP tools are based on a dimensional data model. A dimensional model is based on dimensions, facts, cubes, and schemas such as star and snowflake. Dimension

More information

Discovering Interesting Patterns in Large Graph Cubes

Discovering Interesting Patterns in Large Graph Cubes Discovering Interesting Patterns in Large Graph Cubes Florian Demesmaeker, Amine Ghrab, Siegfried Nijssen and Sabri Skhiri Université Catholique de Louvain, Louvain-La-Neuve, Belgium Email: siegfried.nijssen@uclouvain.be

More information

International Journal of Computer Sciences and Engineering. Research Paper Volume-6, Issue-1 E-ISSN:

International Journal of Computer Sciences and Engineering. Research Paper Volume-6, Issue-1 E-ISSN: International Journal of Computer Sciences and Engineering Open Access Research Paper Volume-6, Issue-1 E-ISSN: 2347-2693 Precomputing Shell Fragments for OLAP using Inverted Index Data Structure D. Datta

More information

DATA WAREHOUING UNIT I

DATA WAREHOUING UNIT I BHARATHIDASAN ENGINEERING COLLEGE NATTRAMAPALLI DEPARTMENT OF COMPUTER SCIENCE SUB CODE & NAME: IT6702/DWDM DEPT: IT Staff Name : N.RAMESH DATA WAREHOUING UNIT I 1. Define data warehouse? NOV/DEC 2009

More information

Subspace Discovery for Promotion: A Cell Clustering Approach

Subspace Discovery for Promotion: A Cell Clustering Approach Subspace Discovery for Promotion: A Cell Clustering Approach Tianyi Wu and Jiawei Han University of Illinois at Urbana-Champaign, USA {twu5,hanj}@illinois.edu Abstract. The promotion analysis problem has

More information

The University of Iowa Intelligent Systems Laboratory The University of Iowa Intelligent Systems Laboratory

The University of Iowa Intelligent Systems Laboratory The University of Iowa Intelligent Systems Laboratory Warehousing Outline Andrew Kusiak 2139 Seamans Center Iowa City, IA 52242-1527 andrew-kusiak@uiowa.edu http://www.icaen.uiowa.edu/~ankusiak Tel. 319-335 5934 Introduction warehousing concepts Relationship

More information

Data Preprocessing. Why Data Preprocessing? MIT-652 Data Mining Applications. Chapter 3: Data Preprocessing. Multi-Dimensional Measure of Data Quality

Data Preprocessing. Why Data Preprocessing? MIT-652 Data Mining Applications. Chapter 3: Data Preprocessing. Multi-Dimensional Measure of Data Quality Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data e.g., occupation = noisy: containing

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 07 : 06/11/2012 Data Mining: Concepts and Techniques (3 rd ed.) Chapter

More information

Efficient Computation of Data Cubes. Network Database Lab

Efficient Computation of Data Cubes. Network Database Lab Efficient Computation of Data Cubes Network Database Lab Outlines Introduction Some CUBE Algorithms ArrayCube PartitionedCube and MemoryCube Bottom-Up Cube (BUC) Conclusions References Network Database

More information

Efficient Aggregation for Graph Summarization

Efficient Aggregation for Graph Summarization Efficient Aggregation for Graph Summarization Yuanyuan Tian (University of Michigan) Richard A. Hankins (Nokia Research Center) Jignesh M. Patel (University of Michigan) Motivation Graphs are everywhere

More information

SQL Server 2005 Analysis Services

SQL Server 2005 Analysis Services atabase and ata Mining Group of atabase and ata Mining Group of atabase and ata Mining Group of atabase and ata Mining Group of atabase and ata Mining Group of atabase and ata Mining Group of SQL Server

More information

Incognito: Efficient Full Domain K Anonymity

Incognito: Efficient Full Domain K Anonymity Incognito: Efficient Full Domain K Anonymity Kristen LeFevre David J. DeWitt Raghu Ramakrishnan University of Wisconsin Madison 1210 West Dayton St. Madison, WI 53706 Talk Prepared By Parul Halwe(05305002)

More information

Data Warehouse and Data Mining

Data Warehouse and Data Mining Data Warehouse and Data Mining Lecture No. 03 Architecture of DW Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro Basic

More information

Graph-Based Synopses for Relational Data. Alkis Polyzotis (UC Santa Cruz)

Graph-Based Synopses for Relational Data. Alkis Polyzotis (UC Santa Cruz) Graph-Based Synopses for Relational Data Alkis Polyzotis (UC Santa Cruz) Data Synopses Data Query Result Data Synopsis Query Approximate Result Problem: exact answer may be too costly to compute Examples:

More information

IMPLEMENTING STATISTICAL DOMAIN DATABASES IN POLAND. OPPORTUNITIES AND THREATS. Central Statistical Office in Poland

IMPLEMENTING STATISTICAL DOMAIN DATABASES IN POLAND. OPPORTUNITIES AND THREATS. Central Statistical Office in Poland IMPLEMENTING STATISTICAL DOMAIN DATABASES IN POLAND. OPPORTUNITIES AND THREATS. Central Statistical Office in Poland Agenda 2 Background Current state The goal of the SDD Architecture Technologies Data

More information

OLAP on Information Networks: a new Framework for Dealing with Bibliographic Data

OLAP on Information Networks: a new Framework for Dealing with Bibliographic Data OLAP on Information Networks: a new Framework for Dealing with Bibliographic Data Wararat Jakawat, C. Favre, Sabine Loudcher To cite this version: Wararat Jakawat, C. Favre, Sabine Loudcher. OLAP on Information

More information

Computing Complex Iceberg Cubes by Multiway Aggregation and Bounding

Computing Complex Iceberg Cubes by Multiway Aggregation and Bounding Computing Complex Iceberg Cubes by Multiway Aggregation and Bounding LienHua Pauline Chou and Xiuzhen Zhang School of Computer Science and Information Technology RMIT University, Melbourne, VIC., Australia,

More information

Analyzing a Greedy Approximation of an MDL Summarization

Analyzing a Greedy Approximation of an MDL Summarization Analyzing a Greedy Approximation of an MDL Summarization Peter Fontana fontanap@seas.upenn.edu Faculty Advisor: Dr. Sudipto Guha April 10, 2007 Abstract Many OLAP (On-line Analytical Processing) applications

More information

OLAP2 outline. Multi Dimensional Data Model. A Sample Data Cube

OLAP2 outline. Multi Dimensional Data Model. A Sample Data Cube OLAP2 outline Multi Dimensional Data Model Need for Multi Dimensional Analysis OLAP Operators Data Cube Demonstration Using SQL Multi Dimensional Data Model Multi dimensional analysis is a popular approach

More information

Data Warehouse and Data Mining

Data Warehouse and Data Mining Data Warehouse and Data Mining Lecture No. 04-06 Data Warehouse Architecture Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology

More information

Compressed Aggregations for mobile OLAP Dissemination

Compressed Aggregations for mobile OLAP Dissemination INTERNATIONAL WORKSHOP ON MOBILE INFORMATION SYSTEMS (WMIS 2007) Compressed Aggregations for mobile OLAP Dissemination Ilias Michalarias Arkadiy Omelchenko {ilmich,omelark}@wiwiss.fu-berlin.de 1/25 Overview

More information

2. Data Preprocessing

2. Data Preprocessing 2. Data Preprocessing Contents of this Chapter 2.1 Introduction 2.2 Data cleaning 2.3 Data integration 2.4 Data transformation 2.5 Data reduction Reference: [Han and Kamber 2006, Chapter 2] SFU, CMPT 459

More information

Map-Reduce for Cube Computation

Map-Reduce for Cube Computation 299 Map-Reduce for Cube Computation Prof. Pramod Patil 1, Prini Kotian 2, Aishwarya Gaonkar 3, Sachin Wani 4, Pramod Gaikwad 5 Department of Computer Science, Dr.D.Y.Patil Institute of Engineering and

More information

Data Mining Concepts & Techniques

Data Mining Concepts & Techniques Data Mining Concepts & Techniques Lecture No. 01 Databases, Data warehouse Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro

More information

Fast Contextual Preference Scoring of Database Tuples

Fast Contextual Preference Scoring of Database Tuples Fast Contextual Preference Scoring of Database Tuples Kostas Stefanidis Department of Computer Science, University of Ioannina, Greece Joint work with Evaggelia Pitoura http://dmod.cs.uoi.gr 2 Motivation

More information

Data Cube Materialization Using Map Reduce

Data Cube Materialization Using Map Reduce Data Cube Materialization Using Map Reduce Kawhale Rohitkumar 1, Sarita Patil 2 Student, Dept. of Computer Engineering, G.H Raisoni College of Engineering and Management, Pune, SavitribaiPhule Pune University,

More information

Mining Trusted Information in Medical Science: An Information Network Approach

Mining Trusted Information in Medical Science: An Information Network Approach Mining Trusted Information in Medical Science: An Information Network Approach Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign Collaborated with many, especially Yizhou

More information

Chapter 18: Data Analysis and Mining

Chapter 18: Data Analysis and Mining Chapter 18: Data Analysis and Mining Database System Concepts See www.db-book.com for conditions on re-use Chapter 18: Data Analysis and Mining Decision Support Systems Data Analysis and OLAP 18.2 Decision

More information

Advanced Data Management Technologies

Advanced Data Management Technologies ADMT 2017/18 Unit 13 J. Gamper 1/42 Advanced Data Management Technologies Unit 13 DW Pre-aggregation and View Maintenance J. Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE Acknowledgements:

More information

DW Performance Optimization (II)

DW Performance Optimization (II) DW Performance Optimization (II) Overview Data Cube in ROLAP and MOLAP ROLAP Technique(s) Efficient Data Cube Computation MOLAP Technique(s) Prefix Sum Array Multiway Augmented Tree Aalborg University

More information

Big Data Analytics. Rasoul Karimi

Big Data Analytics. Rasoul Karimi Big Data Analytics Rasoul Karimi Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 1 Outline

More information

Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services

Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services Course 6234A: Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services Course Details Course Outline Module 1: Introduction to Microsoft SQL Server Analysis Services This module introduces

More information

Exam Datawarehousing INFOH419 July 2013

Exam Datawarehousing INFOH419 July 2013 Exam Datawarehousing INFOH419 July 2013 Lecturer: Toon Calders Student name:... The exam is open book, so all books and notes can be used. The use of a basic calculator is allowed. The use of a laptop

More information

ECT7110 Introduction to Data Warehousing

ECT7110 Introduction to Data Warehousing ECT7110 Introduction to Data Warehousing Prof. Wai Lam ECT7110 Introduction to Data Warehousing 1 What is Data Warehouse? Defined in many different ways, but not rigorously. A decision support database

More information

AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks

AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks Yu Shi, Huan Gui, Qi Zhu, Lance Kaplan, Jiawei Han University of Illinois at Urbana-Champaign (UIUC) Facebook Inc. U.S. Army Research

More information

Database design View Access patterns Need for separate data warehouse:- A multidimensional data model:-

Database design View Access patterns Need for separate data warehouse:- A multidimensional data model:- UNIT III: Data Warehouse and OLAP Technology: An Overview : What Is a Data Warehouse? A Multidimensional Data Model, Data Warehouse Architecture, Data Warehouse Implementation, From Data Warehousing to

More information

CSPP 53017: Data Warehousing Winter 2013! Lecture 7! Svetlozar Nestorov! Class News!

CSPP 53017: Data Warehousing Winter 2013! Lecture 7! Svetlozar Nestorov! Class News! CSPP 53017: Data Warehousing Winter 2013! Lecture 7! Svetlozar Nestorov! Class News! Make-up class on Saturday, Mar 9 in Gleacher 203 10:30am 1:30pm.! Last 15 minute in-class quiz (6:30pm) on Mar 5.! Covers

More information

Data Mining. Yi-Cheng Chen ( 陳以錚 ) Dept. of Computer Science & Information Engineering, Tamkang University

Data Mining. Yi-Cheng Chen ( 陳以錚 ) Dept. of Computer Science & Information Engineering, Tamkang University Data Mining Yi-Cheng Chen ( 陳以錚 ) Dept. of Computer Science & Information Engineering, Tamkang University Why Mine Data? Commercial Viewpoint Lots of data is being collected and warehoused Web data, e-commerce

More information

Data Warehousing and Decision Support (mostly using Relational Databases) CS634 Class 20

Data Warehousing and Decision Support (mostly using Relational Databases) CS634 Class 20 Data Warehousing and Decision Support (mostly using Relational Databases) CS634 Class 20 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke, Chapter 25 Introduction Increasingly,

More information

Lectures for the course: Data Warehousing and Data Mining (IT 60107)

Lectures for the course: Data Warehousing and Data Mining (IT 60107) Lectures for the course: Data Warehousing and Data Mining (IT 60107) Week 1 Lecture 1 21/07/2011 Introduction to the course Pre-requisite Expectations Evaluation Guideline Term Paper and Term Project Guideline

More information

ECLT 5810 Introduction to Data Warehousing

ECLT 5810 Introduction to Data Warehousing ECLT 5810 Introduction to Data Warehousing Prof. Wai Lam ECLT 5810 Introduction to Data Warehousing 1 What is Data Warehouse? Provides tools for business executives Systematically organize and understand

More information

3. Data Preprocessing. 3.1 Introduction

3. Data Preprocessing. 3.1 Introduction 3. Data Preprocessing Contents of this Chapter 3.1 Introduction 3.2 Data cleaning 3.3 Data integration 3.4 Data transformation 3.5 Data reduction SFU, CMPT 740, 03-3, Martin Ester 84 3.1 Introduction Motivation

More information

StreamOLAP. Salman Ahmed SHAIKH. Cost-based Optimization of Stream OLAP. DBSJ Japanese Journal Vol. 14-J, Article No.

StreamOLAP. Salman Ahmed SHAIKH. Cost-based Optimization of Stream OLAP. DBSJ Japanese Journal Vol. 14-J, Article No. StreamOLAP Cost-based Optimization of Stream OLAP Salman Ahmed SHAIKH Kosuke NAKABASAMI Hiroyuki KITAGAWA Salman Ahmed SHAIKH Toshiyuki AMAGASA (SPE) OLAP OLAP SPE SPE OLAP OLAP OLAP Due to the increase

More information

OLAP and Data Warehousing

OLAP and Data Warehousing OLAP and Data Warehousing Lab Exercises Part I OLAP Purpose: The purpose of this practical guide to data warehousing is to learn how online analytical processing (OLAP) methods and tools can be used to

More information

Chapter 5, Data Cube Computation

Chapter 5, Data Cube Computation CSI 4352, Introduction to Data Mining Chapter 5, Data Cube Computation Young-Rae Cho Associate Professor Department of Computer Science Baylor University A Roadmap for Data Cube Computation Full Cube Full

More information

IT DATA WAREHOUSING AND DATA MINING UNIT-2 BUSINESS ANALYSIS

IT DATA WAREHOUSING AND DATA MINING UNIT-2 BUSINESS ANALYSIS PART A 1. What are production reporting tools? Give examples. (May/June 2013) Production reporting tools will let companies generate regular operational reports or support high-volume batch jobs. Such

More information

UNIT 2. DATA PREPROCESSING AND ASSOCIATION RULES

UNIT 2. DATA PREPROCESSING AND ASSOCIATION RULES UNIT 2. DATA PREPROCESSING AND ASSOCIATION RULES Data Pre-processing-Data Cleaning, Integration, Transformation, Reduction, Discretization Concept Hierarchies-Concept Description: Data Generalization And

More information

Introduction to MDDBs

Introduction to MDDBs 3 CHAPTER 2 Introduction to MDDBs What Is OLAP? 3 What Is SAS/MDDB Server Software? 4 What Is an MDDB? 4 Understanding the MDDB Structure 5 How Can I Use MDDBs? 7 Why Should I Use MDDBs? 8 What Is OLAP?

More information

Data Preprocessing. Slides by: Shree Jaswal

Data Preprocessing. Slides by: Shree Jaswal Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data

More information

Data Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems

Data Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems Data Warehousing and Data Mining CPS 116 Introduction to Database Systems Announcements (December 1) 2 Homework #4 due today Sample solution available Thursday Course project demo period has begun! Check

More information

Data Warehousing and Data Mining

Data Warehousing and Data Mining Data Warehousing and Data Mining Lecture 3 Efficient Cube Computation CITS3401 CITS5504 Wei Liu School of Computer Science and Software Engineering Faculty of Engineering, Computing and Mathematics Acknowledgement:

More information

Table Of Contents: xix Foreword to Second Edition

Table Of Contents: xix Foreword to Second Edition Data Mining : Concepts and Techniques Table Of Contents: Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments xxxi About the Authors xxxv Chapter 1 Introduction 1 (38) 1.1 Why Data

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 05(b) : 23/10/2012 Data Mining: Concepts and Techniques (3 rd ed.) Chapter

More information

Data Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1394

Data Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1394 Data Mining Data warehousing Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 22 Table of contents 1 Introduction 2 Data warehousing

More information

Privacy Overview and Data Mining CSC 301 Spring 2018 Howard Rosenthal

Privacy Overview and Data Mining CSC 301 Spring 2018 Howard Rosenthal Privacy Overview and Data Mining CSC 301 Spring 2018 Howard Rosenthal Course Notes: Much of the material in the slides comes from the books and their associated support materials, below as well as many

More information

A. Papadopoulos, G. Pallis, M. D. Dikaiakos. Identifying Clusters with Attribute Homogeneity and Similar Connectivity in Information Networks

A. Papadopoulos, G. Pallis, M. D. Dikaiakos. Identifying Clusters with Attribute Homogeneity and Similar Connectivity in Information Networks A. Papadopoulos, G. Pallis, M. D. Dikaiakos Identifying Clusters with Attribute Homogeneity and Similar Connectivity in Information Networks IEEE/WIC/ACM International Conference on Web Intelligence Nov.

More information

New Perspectives in Social Data Management

New Perspectives in Social Data Management New Perspectives in Social Data Management Sihem Amer-Yahia Research Director CNRS @ LIG Sihem.Amer-Yahia@imag.fr TSUKUBA University May 20 th, 2014 Traditional data management stack High-level specification

More information

Performing OLAP over Graph Data: Query Language, Implementation, and a Case Study

Performing OLAP over Graph Data: Query Language, Implementation, and a Case Study Performing OLAP over Graph Data: Query Language, Implementation, and a Case Study Leticia Gómez, Alejandro Vaisman Instituto Tecnológico de Buenos Aires, Argentina Bart Kuijpers Hasselt University and

More information

Optimal Query Mapping in Mobile OLAP

Optimal Query Mapping in Mobile OLAP Optimal Query Mapping in Mobile OLAP Ilias Michalarias and Hans -J. Lenz Freie Universität Berlin {ilmich,hjlenz}@wiwiss.fu-berlin.de Overview I. mobile OLAP Research Context II. Query Mapping in molap

More information

Computing Appropriate Representations for Multidimensional Data

Computing Appropriate Representations for Multidimensional Data Computing Appropriate Representations for Multidimensional Data Yeow Wei Choong LI - Université F Rabelais HELP Institute - Malaysia choong yw@helpedumy Dominique Laurent LI - Université F Rabelais Tours

More information

Data Cube Technology

Data Cube Technology Data Cube Technology Erwin M. Bakker & Stefan Manegold https://homepages.cwi.nl/~manegold/dbdm/ http://liacs.leidenuniv.nl/~bakkerem2/dbdm/ s.manegold@liacs.leidenuniv.nl e.m.bakker@liacs.leidenuniv.nl

More information

Data Warehousing & Mining. Data integration. OLTP versus OLAP. CPS 116 Introduction to Database Systems

Data Warehousing & Mining. Data integration. OLTP versus OLAP. CPS 116 Introduction to Database Systems Data Warehousing & Mining CPS 116 Introduction to Database Systems Data integration 2 Data resides in many distributed, heterogeneous OLTP (On-Line Transaction Processing) sources Sales, inventory, customer,

More information

Data and Text Mining

Data and Text Mining Data representation and manipulation I prof. dr. Bojan Cestnik Temida d.o.o. & Jozef Stefan Institute Ljubljana bojan.cestnik@temida.si prof. dr. Bojan Cestnik 1 Contents Introduction Basic Data Mining

More information

Decision Support, Data Warehousing, and OLAP

Decision Support, Data Warehousing, and OLAP Decision Support, Data Warehousing, and OLAP : Contents Terminology : OLAP vs. OLTP Data Warehousing Architecture Technologies References 1 Decision Support and OLAP Information technology to help knowledge

More information

Question Bank. 4) It is the source of information later delivered to data marts.

Question Bank. 4) It is the source of information later delivered to data marts. Question Bank Year: 2016-2017 Subject Dept: CS Semester: First Subject Name: Data Mining. Q1) What is data warehouse? ANS. A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile

More information

Data Preprocessing Yudho Giri Sucahyo y, Ph.D , CISA

Data Preprocessing Yudho Giri Sucahyo y, Ph.D , CISA Obj ti Objectives Motivation: Why preprocess the Data? Data Preprocessing Techniques Data Cleaning Data Integration and Transformation Data Reduction Data Preprocessing Lecture 3/DMBI/IKI83403T/MTI/UI

More information

Summary of Last Chapter. Course Content. Chapter 3 Objectives. Chapter 3: Data Preprocessing. Dr. Osmar R. Zaïane. University of Alberta 4

Summary of Last Chapter. Course Content. Chapter 3 Objectives. Chapter 3: Data Preprocessing. Dr. Osmar R. Zaïane. University of Alberta 4 Principles of Knowledge Discovery in Data Fall 2004 Chapter 3: Data Preprocessing Dr. Osmar R. Zaïane University of Alberta Summary of Last Chapter What is a data warehouse and what is it for? What is

More information

Data Cube Technology. Chapter 5: Data Cube Technology. Data Cube: A Lattice of Cuboids. Data Cube: A Lattice of Cuboids

Data Cube Technology. Chapter 5: Data Cube Technology. Data Cube: A Lattice of Cuboids. Data Cube: A Lattice of Cuboids Chapter 5: Data Cube Technology Data Cube Technology Data Cube Computation: Basic Concepts Data Cube Computation Methods Erwin M. Bakker & Stefan Manegold https://homepages.cwi.nl/~manegold/dbdm/ http://liacs.leidenuniv.nl/~bakkerem2/dbdm/

More information

Speeding Up Data Science: From a Data Management Perspective

Speeding Up Data Science: From a Data Management Perspective Speeding Up Data Science: From a Data Management Perspective Jiannan Wang Database System Lab (DSL) Simon Fraser University NWDS Meeting, Jan 5, 2018 1 Simon Fraser University 2 SFU DB/DM Group Ke Wang

More information

Carnegie Mellon Univ. Dept. of Computer Science /615 DB Applications. Data mining - detailed outline. Problem

Carnegie Mellon Univ. Dept. of Computer Science /615 DB Applications. Data mining - detailed outline. Problem Faloutsos & Pavlo 15415/615 Carnegie Mellon Univ. Dept. of Computer Science 15415/615 DB Applications Lecture # 24: Data Warehousing / Data Mining (R&G, ch 25 and 26) Data mining detailed outline Problem

More information

Essentials for Modern Data Analysis Systems

Essentials for Modern Data Analysis Systems Essentials for Modern Data Analysis Systems Mehrdad Jahangiri, Cyrus Shahabi University of Southern California Los Angeles, CA 90089-0781 {jahangir, shahabi}@usc.edu Abstract Earth scientists need to perform

More information

Call: SAS BI Course Content:35-40hours

Call: SAS BI Course Content:35-40hours SAS BI Course Content:35-40hours Course Outline SAS Data Integration Studio 4.2 Introduction * to SAS DIS Studio Features of SAS DIS Studio Tasks performed by SAS DIS Studio Navigation to SAS DIS Studio

More information

Schema-Agnostic Indexing with Azure Document DB

Schema-Agnostic Indexing with Azure Document DB Schema-Agnostic Indexing with Azure Document DB Introduction Azure DocumentDB is Microsoft s multi-tenant distributed database service for managing JSON documents at Internet scale Multi-tenancy is an

More information

PREGEL: A SYSTEM FOR LARGE- SCALE GRAPH PROCESSING

PREGEL: A SYSTEM FOR LARGE- SCALE GRAPH PROCESSING PREGEL: A SYSTEM FOR LARGE- SCALE GRAPH PROCESSING G. Malewicz, M. Austern, A. Bik, J. Dehnert, I. Horn, N. Leiser, G. Czajkowski Google, Inc. SIGMOD 2010 Presented by Ke Hong (some figures borrowed from

More information

The Near Greedy Algorithm for Views Selection in Data Warehouses and Its Performance Guarantees

The Near Greedy Algorithm for Views Selection in Data Warehouses and Its Performance Guarantees The Near Greedy Algorithm for Views Selection in Data Warehouses and Its Performance Guarantees Omar H. Karam Faculty of Informatics and Computer Science, The British University in Egypt and Faculty of

More information

Data mining - detailed outline. Carnegie Mellon Univ. Dept. of Computer Science /615 DB Applications. Problem.

Data mining - detailed outline. Carnegie Mellon Univ. Dept. of Computer Science /615 DB Applications. Problem. Faloutsos & Pavlo 15415/615 Carnegie Mellon Univ. Dept. of Computer Science 15415/615 DB Applications Data Warehousing / Data Mining (R&G, ch 25 and 26) C. Faloutsos and A. Pavlo Data mining detailed outline

More information

Probabilistic Graph Summarization

Probabilistic Graph Summarization Probabilistic Graph Summarization Nasrin Hassanlou, Maryam Shoaran, and Alex Thomo University of Victoria, Victoria, Canada {hassanlou,maryam,thomo}@cs.uvic.ca 1 Abstract We study group-summarization of

More information

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining. About the Tutorial Data Mining is defined as the procedure of extracting information from huge sets of data. In other words, we can say that data mining is mining knowledge from data. The tutorial starts

More information

Big Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data. Fall 2012

Big Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data. Fall 2012 Big Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data Fall 2012 Data Warehousing and OLAP Introduction Decision Support Technology On Line Analytical Processing Star Schema

More information

OLAP Introduction and Overview

OLAP Introduction and Overview 1 CHAPTER 1 OLAP Introduction and Overview What Is OLAP? 1 Data Storage and Access 1 Benefits of OLAP 2 What Is a Cube? 2 Understanding the Cube Structure 3 What Is SAS OLAP Server? 3 About Cube Metadata

More information

ECLT 5810 Data Preprocessing. Prof. Wai Lam

ECLT 5810 Data Preprocessing. Prof. Wai Lam ECLT 5810 Data Preprocessing Prof. Wai Lam Why Data Preprocessing? Data in the real world is imperfect incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate

More information

What is a Data Warehouse?

What is a Data Warehouse? What is a Data Warehouse? COMP 465 Data Mining Data Warehousing Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Defined in many different ways,

More information

Acknowledgment. MTAT Data Mining. Week 7: Online Analytical Processing and Data Warehouses. Typical Data Analysis Process.

Acknowledgment. MTAT Data Mining. Week 7: Online Analytical Processing and Data Warehouses. Typical Data Analysis Process. MTAT.03.183 Data Mining Week 7: Online Analytical Processing and Data Warehouses Marlon Dumas marlon.dumas ät ut. ee Acknowledgment This slide deck is a mashup of the following publicly available slide

More information

Adnan YAZICI Computer Engineering Department

Adnan YAZICI Computer Engineering Department Data Warehouse Adnan YAZICI Computer Engineering Department Middle East Technical University, A.Yazici, 2010 Definition A data warehouse is a subject-oriented integrated time-variant nonvolatile collection

More information

BREAKING CHRISTIAN NEWS MEDIA KIT

BREAKING CHRISTIAN NEWS MEDIA KIT BREAKING CHRISTIAN NEWS MEDIA KIT Peter Spencer Advertising Specialist Peter@ELPublication.com Direct: 541-905-7687 Hours: 8am - 3pm Pacific Time We will generally respond within 24 hours of inquiry BREAKING

More information

CS 1655 / Spring 2013! Secure Data Management and Web Applications

CS 1655 / Spring 2013! Secure Data Management and Web Applications CS 1655 / Spring 2013 Secure Data Management and Web Applications 03 Data Warehousing Alexandros Labrinidis University of Pittsburgh What is a Data Warehouse A data warehouse: archives information gathered

More information

Improving the Performance of OLAP Queries Using Families of Statistics Trees

Improving the Performance of OLAP Queries Using Families of Statistics Trees Improving the Performance of OLAP Queries Using Families of Statistics Trees Joachim Hammer Dept. of Computer and Information Science University of Florida Lixin Fu Dept. of Mathematical Sciences University

More information

C-Cubing: Efficient Computation of Closed Cubes by Aggregation-Based Checking

C-Cubing: Efficient Computation of Closed Cubes by Aggregation-Based Checking C-Cubing: Efficient Computation of Closed Cubes by Aggregation-Based Checking Dong Xin Zheng Shao Jiawei Han Hongyan Liu University of Illinois at Urbana-Champaign, Urbana, IL 6, USA Tsinghua University,

More information

Course Book Academic Year

Course Book Academic Year Nawroz University College of Computer and IT Department of Computer Science Stage: Third Course Book Academic Year 2015-2016 Subject Advanced Database No. of Hours No. of Units 6 Distribution of Marks

More information

Step-by-step data transformation

Step-by-step data transformation Step-by-step data transformation Explanation of what BI4Dynamics does in a process of delivering business intelligence Contents 1. Introduction... 3 Before we start... 3 1 st. STEP: CREATING A STAGING

More information

Chapter 3. Foundations of Business Intelligence: Databases and Information Management

Chapter 3. Foundations of Business Intelligence: Databases and Information Management Chapter 3 Foundations of Business Intelligence: Databases and Information Management THE DATA HIERARCHY TRADITIONAL FILE PROCESSING Organizing Data in a Traditional File Environment Problems with the traditional

More information