Graph Cube: On Warehousing and OLAP Multidimensional Networks
|
|
- Kristopher Gibson
- 6 years ago
- Views:
Transcription
1 Graph Cube: On Warehousing and OLAP Multidimensional Networks Peixiang Zhao, Xiaolei Li, Dong Xin, Jiawei Han Department of Computer Science, UIUC Groupon Inc. Google Cooperation June 16th, 2011 SIGMOD 2011 Athens, Greece 1 / 24
2 Outline 1 Introduction 2 The Graph Cube Model 3 OLAP on Graph Cube Cuboid Query Crossboid Query 4 Implementing Graph Cube 5 Experiment 6 Conclusion SIGMOD 2011 Athens, Greece 2 / 24
3 Introduction Recent years have seen an astounding growth of networks in a wide spectrum of application domains Communication networks Social networks Biological networks The Web Multidimensional networks 1 An underlying graph structure comprising entities and relationships 2 Multidimensional attributes are specified and associated with entities of the network There exist considerable technology gaps in managing, querying and summarizing multidimensional networks effectively SIGMOD 2011 Athens, Greece 3 / 24
4 A Sample Multidimensional Network (a) Graph ID Gender Location Profession Income 1 Male CA Teacher $70, Female WA Teacher $65, Female CA Engineer $80, Female NY Teacher $90, Male IL Lawyer $80, Female WA Teacher $90, Male NY Lawyer $100, Male IL Engineer $75, Female CA Lawyer $120, Male IL Engineer $95, 000 (b) Vertex Attribute Table Figure: A Multidimensional Network Comprising a Graph Structure and a Multidimensional Vertex Attribute Table SIGMOD 2011 Athens, Greece 4 / 24
5 Introduction Motivation: Can we extend decision support facilities on multidimensional networks? Data warehouses and OLAP are advantageous in the multidimensional network scenario Summarizing the massive networks into different levels of granularity for more effective analysis and exploration Business Intelligence: in Facebook and Twitter, advertisers and marketers take advantage of social networks within different multidimensional spaces to better promote their products via social targeting or viral marketing However, in multidimensional networks, much of the valuation and interest lies in the network itself! Simple numeric value based group-by s in traditional data warehouses are no longer insightful and of limited usage, because the structural information of the networks is simply ignored SIGMOD 2011 Athens, Greece 5 / 24
6 Network Aggregation v.s. Traditional Group-by Male Female (a) Aggregate Network 3 Gender COUNT(*) Male 5 Female 5 (b) Aggregate Table Figure: Multidimensional Network Aggregation v.s. Traditional RDB Aggregation (Group by Gender) (Female, CA) (Male, IL) (Male, CA) 1 (Female, WA) (Male, NY) (Female, NY) Gender Location COUNT(*) Male CA 1 Female CA 2 Female WA 2 Male IL 3 Male NY 1 Female NY 1 (a) Aggregate Network (b) Aggregate Table Figure: Multidimensional Network Aggregation v.s. Traditional RDB Aggregation (Group by Gender and Location) SIGMOD 2011 Athens, Greece 6 / 24
7 Introduction Graph Cube A multidimensional network can be summarized to aggregate networks in coarser levels of granularity within different multidimensional spaces Vertex coalescence Structure summarization Different query models and OLAP solutions are proposed for multidimensional networks Cuboid Queries Crossboid Queries Efficient implementation is based on a combination of Well-studied data cube implementation techniques Special characteristics of multidimensional networks The first to systematically address warehousing and OLAP issues on large multidimensional networks SIGMOD 2011 Athens, Greece 7 / 24
8 The Graph Cube Model Multidimensional Network A multidimensional network, N, is a graph denoted as N = (V, E, A), where V is a set of vertices, E V V is a set of edges and A = {A 1, A 2,..., A n } is a set of n vertex-specific attributes, i.e., u V, there is a tuple A(u) of u, denoted as A(u) = (A 1 (u), A 2 (u),..., A n (u)), where A i (u) is the value of u on i-th attribute, 1 i n. A is called the dimensions of the network N. Some (or all) dimension A i could be (ALL), representing a super-aggregation along A i Given a set of n dimensions of a network, there exist 2 n multidimensional spaces (aggregations) The measure within each possible space is no longer a simple numeric value, but an aggregate network SIGMOD 2011 Athens, Greece 8 / 24
9 The Graph Cube Model Graph Cube Given a multidimensional network N = (V, E, A), the graph cube is obtained by restructuring N in all possible aggregations of A. For each possible aggregation A of A, the grouping measure is an aggregate network G w.r.t. A. Apex 2 (Gender) (Location) (Profession) (Gender, Location) (Gender, Profession) (Location, Profession) Base Figure: The Graph Cube Lattice SIGMOD 2011 Athens, Greece 9 / 24
10 OLAP on Graph Cubes Cuboid Query: return as output the aggregate network corresponding to a specific aggregation of the dimensions of the multidimensional network What is the network structure between various genders? What is the network structure between the various gender and location combinations? (Male, CA) 1 (Female, CA) 2 2 (Female, WA) Male Female 3 (Male, IL) (Female, NY) (Male, NY) SIGMOD 2011 Athens, Greece 10 / 24
11 OLAP on Graph Cubes A cuboid query is within a single multidimensional space, which follows the traditional OLAP model A crossboid query crosses multiple multidimensional spaces of the network, i.e., more than one cuboid is involved in a query What is the network structure between the user with ID = 3 and various locations? What is the network structure between users grouped by gender v.s. users grouped by location?. Male Female WA IL ID: CA 1 1 NY CA IL WA NY SIGMOD 2011 Athens, Greece 11 / 24
12 Cuboid Queries v.s. Crossboid Queries (Gender) (Location) (Profession) (Gender, Location, Profession) (Location) "What is the network structure between users and the locations?" (Gender) (Gender, Profession) (Gender, Location) Apex (a) Traditional Cuboid Queries "What is the network structure between users grouped by gender and users grouped by location?" (Gender, Location, Profession) (b) Crossboid Queries Straddling Multiple Cuboids SIGMOD 2011 Athens, Greece 12 / 24
13 Graph Cube Implementation Objective: compute the aggregate networks of different cuboids grouping on all possible dimension combinations of a multidimensional network 1 Full materialization: Best query response time, worst space cost 2 No materialization: Best space cost, worst query response time 3 Partial materialization: A small portion of cuboids is materialized in order to balance the tradeoff between query response time and cube resource requirement SIGMOD 2011 Athens, Greece 13 / 24
14 Graph Cube Implementation: Partial Materialization Problem: To select a set S of k cuboids in the graph cube for materialization, such that the average time taken to evaluate the queries can be minimized The partial materialization problem is NP-complete, reduced from set-cover Greedy Algorithm: Selecting k cuboids with the highest size-reduction benefit Theorem Let B greedy be the benefit of k cuboids chosen by the greedy algorithm and let B opt be the benefit of any optimal set of k cuboids. Then B greedy (1 1/e) B opt and this bound is tight MinLevel Algorithm: Materializing cuboids c, where dim(c) = l 0 indicating the level in the cube lattice at which we start materializing cuboids SIGMOD 2011 Athens, Greece 14 / 24
15 Experimental Evaluation DBLP data set A co-authorship graph with 28, 702 authors as vertices and 66, 832 coauthor relationships as edges Three dimensions: name, area, productivity area: DB, DM, AI, IR productivity: Excellent, Good, Fair, Poor IMDB data set A movie rating network with 116, 164 vertices and 5, 452, 350 edges Seven dimensions: Title, Year, Length, Budget, Rating, MPAA and Type MPAA: G, PG, PG-13, R, NC-17, NR Type: action, animation, comedy, drama, documentary, romance, short SIGMOD 2011 Athens, Greece 15 / 24
16 Effectiveness Evaluation DB DM Poor Fair AI (c) (Area) IR Good Excellent (d) (Productivity) Figure: Cuboid Queries of the Graph Cube on DBLP Data Set SIGMOD 2011 Athens, Greece 16 / 24
17 Effectiveness Evaluation 4182 (DM, Poor) 252 (DB, Excellent) (DM, Fair) (DB, Good) (DM, Good) (DB, Fair) (DM, Excellent) (DB, Poor) (AI, Poor) (IR, Excellent) (AI, Fair) (AI, Good) (IR, Good) (AI, Excellent) (IR, Fair) 4590 (IR, Poor) (a) (Area, Productivity) Figure: Cuboid Queries of the Graph Cube on DBLP Data Set SIGMOD 2011 Athens, Greece 17 / 24
18 Effectiveness Evaluation DB DM AI IR Poor Fair Good Excellent (a) Area Productivity Figure: Crossboid Queries of the Graph Cube on DBLP Data Set SIGMOD 2011 Athens, Greece 18 / 24
19 Effectiveness Evaluation DB DM AI IR DB DM AI IR Hector Garcia-Molina Philip S. Yu Poor 33 Fair 24 Good 6 Excellent (a) Area Base Productivity for Hector Garcia-Molina 71 Poor 52 Fair 12 Good 13 Excellent (b) Area Base Productivity for Philip S. Yu Figure: Crossboid Queries of the Graph Cube on DBLP Data Set SIGMOD 2011 Athens, Greece 19 / 24
20 Efficiency Evaluation Runtime (seconds) Raw Table Graph Cube Number of Dimensions (a) Time v.s. # Dimensions Runtime (seconds) Raw Table Graph Cube Number of Edges (*10K) (b) Time v.s. # Edges Figure: Full Materialization of Graph Cube for DBLP Data Set SIGMOD 2011 Athens, Greece 20 / 24
21 Efficiency Evaluation Runtime (seconds) Graph Cube Raw Table Number of Dimensions (a) Time v.s. # Dimensions Runtime (seconds) Graph Cube Raw Table Number of Edges (*1M) (b) Time v.s. # Edges Figure: Full Materialization of Graph Cube for IMDB Data Set SIGMOD 2011 Athens, Greece 21 / 24
22 Efficiency Evaluation Runtime (seconds) Greedy MinLevel Number of Materialized Cuboids (a) Cuboid Queries Runtime (seconds) Greedy MinLevel Number of Materialized Cuboids (b) Crossboid Queries Figure: Average Query Respond Time w.r.t. Different Partial Materialization Algorithms SIGMOD 2011 Athens, Greece 22 / 24
23 Conclusion 1 This work seeks to enhance decision-support functionality on large multidimensional networks 2 Graph cube: A new data warehousing model is designed specifically for efficient aggregation on multidimensional networks 3 Different query models and OLAP solutions for Graph Cube are proposed and studied Crossboid queries break the boundary of the traditional OLAP model by straddling multiple cuboids of the Graph Cube 4 The implementation of Graph Cube is discussed and the experimental results have demonstrated the power and efficacy of Graph Cube as the first, to the best of our knowledge, tool for warehousing and OLAP large multidimensional networks SIGMOD 2011 Athens, Greece 23 / 24
24 Thank you SIGMOD 2011 Athens, Greece 24 / 24
Discovering Interesting Patterns in Large Graph Cubes
Discovering Interesting Patterns in Large Graph Cubes 07 BigGraphs Workshop at IEEE BigData'7 Florian Demesmaeker, Consultant @EURA NOVA Discovering Interesting Patterns in Large Graph Cubes Florian Demesmaeker,
More informationOn Graph Query Optimization in Large Networks
On Graph Query Optimization in Large Networks Peixiang Zhao, Jiawei Han Department of omputer Science University of Illinois at Urbana-hampaign pzhao4@illinois.edu, hanj@cs.uiuc.edu September 14th, 2010
More informationEffective Keyword Search over (Semi)-Structured Big Data Mehdi Kargar
Effective Keyword Search over (Semi)-Structured Big Data Mehdi Kargar School of Computer Science Faculty of Science University of Windsor How Big is this Big Data? 40 Billion Instagram Photos 300 Hours
More informationNovel Materialized View Selection in a Multidimensional Database
Graphic Era University From the SelectedWorks of vijay singh Winter February 10, 2009 Novel Materialized View Selection in a Multidimensional Database vijay singh Available at: https://works.bepress.com/vijaysingh/5/
More informationETL and OLAP Systems
ETL and OLAP Systems Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first semester
More informationSocial Data Exploration
Social Data Exploration Sihem Amer-Yahia DR CNRS @ LIG Sihem.Amer-Yahia@imag.fr Big Data & Optimization Workshop 12ème Séminaire POC LIP6 Dec 5 th, 2014 Collaborative data model User space (with attributes)
More informationDecision Support Systems aka Analytical Systems
Decision Support Systems aka Analytical Systems Decision Support Systems Systems that are used to transform data into information, to manage the organization: OLAP vs OLTP OLTP vs OLAP Transactions Analysis
More informationQuotient Cube: How to Summarize the Semantics of a Data Cube
Quotient Cube: How to Summarize the Semantics of a Data Cube Laks V.S. Lakshmanan (Univ. of British Columbia) * Jian Pei (State Univ. of New York at Buffalo) * Jiawei Han (Univ. of Illinois at Urbana-Champaign)
More informationSQL Server Analysis Services
DataBase and Data Mining Group of DataBase and Data Mining Group of Database and data mining group, SQL Server 2005 Analysis Services SQL Server 2005 Analysis Services - 1 Analysis Services Database and
More informationBasics of Dimensional Modeling
Basics of Dimensional Modeling Data warehouse and OLAP tools are based on a dimensional data model. A dimensional model is based on dimensions, facts, cubes, and schemas such as star and snowflake. Dimension
More informationDiscovering Interesting Patterns in Large Graph Cubes
Discovering Interesting Patterns in Large Graph Cubes Florian Demesmaeker, Amine Ghrab, Siegfried Nijssen and Sabri Skhiri Université Catholique de Louvain, Louvain-La-Neuve, Belgium Email: siegfried.nijssen@uclouvain.be
More informationInternational Journal of Computer Sciences and Engineering. Research Paper Volume-6, Issue-1 E-ISSN:
International Journal of Computer Sciences and Engineering Open Access Research Paper Volume-6, Issue-1 E-ISSN: 2347-2693 Precomputing Shell Fragments for OLAP using Inverted Index Data Structure D. Datta
More informationDATA WAREHOUING UNIT I
BHARATHIDASAN ENGINEERING COLLEGE NATTRAMAPALLI DEPARTMENT OF COMPUTER SCIENCE SUB CODE & NAME: IT6702/DWDM DEPT: IT Staff Name : N.RAMESH DATA WAREHOUING UNIT I 1. Define data warehouse? NOV/DEC 2009
More informationSubspace Discovery for Promotion: A Cell Clustering Approach
Subspace Discovery for Promotion: A Cell Clustering Approach Tianyi Wu and Jiawei Han University of Illinois at Urbana-Champaign, USA {twu5,hanj}@illinois.edu Abstract. The promotion analysis problem has
More informationThe University of Iowa Intelligent Systems Laboratory The University of Iowa Intelligent Systems Laboratory
Warehousing Outline Andrew Kusiak 2139 Seamans Center Iowa City, IA 52242-1527 andrew-kusiak@uiowa.edu http://www.icaen.uiowa.edu/~ankusiak Tel. 319-335 5934 Introduction warehousing concepts Relationship
More informationData Preprocessing. Why Data Preprocessing? MIT-652 Data Mining Applications. Chapter 3: Data Preprocessing. Multi-Dimensional Measure of Data Quality
Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data e.g., occupation = noisy: containing
More informationInformation Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 07 : 06/11/2012 Data Mining: Concepts and Techniques (3 rd ed.) Chapter
More informationEfficient Computation of Data Cubes. Network Database Lab
Efficient Computation of Data Cubes Network Database Lab Outlines Introduction Some CUBE Algorithms ArrayCube PartitionedCube and MemoryCube Bottom-Up Cube (BUC) Conclusions References Network Database
More informationEfficient Aggregation for Graph Summarization
Efficient Aggregation for Graph Summarization Yuanyuan Tian (University of Michigan) Richard A. Hankins (Nokia Research Center) Jignesh M. Patel (University of Michigan) Motivation Graphs are everywhere
More informationSQL Server 2005 Analysis Services
atabase and ata Mining Group of atabase and ata Mining Group of atabase and ata Mining Group of atabase and ata Mining Group of atabase and ata Mining Group of atabase and ata Mining Group of SQL Server
More informationIncognito: Efficient Full Domain K Anonymity
Incognito: Efficient Full Domain K Anonymity Kristen LeFevre David J. DeWitt Raghu Ramakrishnan University of Wisconsin Madison 1210 West Dayton St. Madison, WI 53706 Talk Prepared By Parul Halwe(05305002)
More informationData Warehouse and Data Mining
Data Warehouse and Data Mining Lecture No. 03 Architecture of DW Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro Basic
More informationGraph-Based Synopses for Relational Data. Alkis Polyzotis (UC Santa Cruz)
Graph-Based Synopses for Relational Data Alkis Polyzotis (UC Santa Cruz) Data Synopses Data Query Result Data Synopsis Query Approximate Result Problem: exact answer may be too costly to compute Examples:
More informationIMPLEMENTING STATISTICAL DOMAIN DATABASES IN POLAND. OPPORTUNITIES AND THREATS. Central Statistical Office in Poland
IMPLEMENTING STATISTICAL DOMAIN DATABASES IN POLAND. OPPORTUNITIES AND THREATS. Central Statistical Office in Poland Agenda 2 Background Current state The goal of the SDD Architecture Technologies Data
More informationOLAP on Information Networks: a new Framework for Dealing with Bibliographic Data
OLAP on Information Networks: a new Framework for Dealing with Bibliographic Data Wararat Jakawat, C. Favre, Sabine Loudcher To cite this version: Wararat Jakawat, C. Favre, Sabine Loudcher. OLAP on Information
More informationComputing Complex Iceberg Cubes by Multiway Aggregation and Bounding
Computing Complex Iceberg Cubes by Multiway Aggregation and Bounding LienHua Pauline Chou and Xiuzhen Zhang School of Computer Science and Information Technology RMIT University, Melbourne, VIC., Australia,
More informationAnalyzing a Greedy Approximation of an MDL Summarization
Analyzing a Greedy Approximation of an MDL Summarization Peter Fontana fontanap@seas.upenn.edu Faculty Advisor: Dr. Sudipto Guha April 10, 2007 Abstract Many OLAP (On-line Analytical Processing) applications
More informationOLAP2 outline. Multi Dimensional Data Model. A Sample Data Cube
OLAP2 outline Multi Dimensional Data Model Need for Multi Dimensional Analysis OLAP Operators Data Cube Demonstration Using SQL Multi Dimensional Data Model Multi dimensional analysis is a popular approach
More informationData Warehouse and Data Mining
Data Warehouse and Data Mining Lecture No. 04-06 Data Warehouse Architecture Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology
More informationCompressed Aggregations for mobile OLAP Dissemination
INTERNATIONAL WORKSHOP ON MOBILE INFORMATION SYSTEMS (WMIS 2007) Compressed Aggregations for mobile OLAP Dissemination Ilias Michalarias Arkadiy Omelchenko {ilmich,omelark}@wiwiss.fu-berlin.de 1/25 Overview
More information2. Data Preprocessing
2. Data Preprocessing Contents of this Chapter 2.1 Introduction 2.2 Data cleaning 2.3 Data integration 2.4 Data transformation 2.5 Data reduction Reference: [Han and Kamber 2006, Chapter 2] SFU, CMPT 459
More informationMap-Reduce for Cube Computation
299 Map-Reduce for Cube Computation Prof. Pramod Patil 1, Prini Kotian 2, Aishwarya Gaonkar 3, Sachin Wani 4, Pramod Gaikwad 5 Department of Computer Science, Dr.D.Y.Patil Institute of Engineering and
More informationData Mining Concepts & Techniques
Data Mining Concepts & Techniques Lecture No. 01 Databases, Data warehouse Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro
More informationFast Contextual Preference Scoring of Database Tuples
Fast Contextual Preference Scoring of Database Tuples Kostas Stefanidis Department of Computer Science, University of Ioannina, Greece Joint work with Evaggelia Pitoura http://dmod.cs.uoi.gr 2 Motivation
More informationData Cube Materialization Using Map Reduce
Data Cube Materialization Using Map Reduce Kawhale Rohitkumar 1, Sarita Patil 2 Student, Dept. of Computer Engineering, G.H Raisoni College of Engineering and Management, Pune, SavitribaiPhule Pune University,
More informationMining Trusted Information in Medical Science: An Information Network Approach
Mining Trusted Information in Medical Science: An Information Network Approach Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign Collaborated with many, especially Yizhou
More informationChapter 18: Data Analysis and Mining
Chapter 18: Data Analysis and Mining Database System Concepts See www.db-book.com for conditions on re-use Chapter 18: Data Analysis and Mining Decision Support Systems Data Analysis and OLAP 18.2 Decision
More informationAdvanced Data Management Technologies
ADMT 2017/18 Unit 13 J. Gamper 1/42 Advanced Data Management Technologies Unit 13 DW Pre-aggregation and View Maintenance J. Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE Acknowledgements:
More informationDW Performance Optimization (II)
DW Performance Optimization (II) Overview Data Cube in ROLAP and MOLAP ROLAP Technique(s) Efficient Data Cube Computation MOLAP Technique(s) Prefix Sum Array Multiway Augmented Tree Aalborg University
More informationBig Data Analytics. Rasoul Karimi
Big Data Analytics Rasoul Karimi Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 1 Outline
More informationImplementing and Maintaining Microsoft SQL Server 2008 Analysis Services
Course 6234A: Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services Course Details Course Outline Module 1: Introduction to Microsoft SQL Server Analysis Services This module introduces
More informationExam Datawarehousing INFOH419 July 2013
Exam Datawarehousing INFOH419 July 2013 Lecturer: Toon Calders Student name:... The exam is open book, so all books and notes can be used. The use of a basic calculator is allowed. The use of a laptop
More informationECT7110 Introduction to Data Warehousing
ECT7110 Introduction to Data Warehousing Prof. Wai Lam ECT7110 Introduction to Data Warehousing 1 What is Data Warehouse? Defined in many different ways, but not rigorously. A decision support database
More informationAspEm: Embedding Learning by Aspects in Heterogeneous Information Networks
AspEm: Embedding Learning by Aspects in Heterogeneous Information Networks Yu Shi, Huan Gui, Qi Zhu, Lance Kaplan, Jiawei Han University of Illinois at Urbana-Champaign (UIUC) Facebook Inc. U.S. Army Research
More informationDatabase design View Access patterns Need for separate data warehouse:- A multidimensional data model:-
UNIT III: Data Warehouse and OLAP Technology: An Overview : What Is a Data Warehouse? A Multidimensional Data Model, Data Warehouse Architecture, Data Warehouse Implementation, From Data Warehousing to
More informationCSPP 53017: Data Warehousing Winter 2013! Lecture 7! Svetlozar Nestorov! Class News!
CSPP 53017: Data Warehousing Winter 2013! Lecture 7! Svetlozar Nestorov! Class News! Make-up class on Saturday, Mar 9 in Gleacher 203 10:30am 1:30pm.! Last 15 minute in-class quiz (6:30pm) on Mar 5.! Covers
More informationData Mining. Yi-Cheng Chen ( 陳以錚 ) Dept. of Computer Science & Information Engineering, Tamkang University
Data Mining Yi-Cheng Chen ( 陳以錚 ) Dept. of Computer Science & Information Engineering, Tamkang University Why Mine Data? Commercial Viewpoint Lots of data is being collected and warehoused Web data, e-commerce
More informationData Warehousing and Decision Support (mostly using Relational Databases) CS634 Class 20
Data Warehousing and Decision Support (mostly using Relational Databases) CS634 Class 20 Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke, Chapter 25 Introduction Increasingly,
More informationLectures for the course: Data Warehousing and Data Mining (IT 60107)
Lectures for the course: Data Warehousing and Data Mining (IT 60107) Week 1 Lecture 1 21/07/2011 Introduction to the course Pre-requisite Expectations Evaluation Guideline Term Paper and Term Project Guideline
More informationECLT 5810 Introduction to Data Warehousing
ECLT 5810 Introduction to Data Warehousing Prof. Wai Lam ECLT 5810 Introduction to Data Warehousing 1 What is Data Warehouse? Provides tools for business executives Systematically organize and understand
More information3. Data Preprocessing. 3.1 Introduction
3. Data Preprocessing Contents of this Chapter 3.1 Introduction 3.2 Data cleaning 3.3 Data integration 3.4 Data transformation 3.5 Data reduction SFU, CMPT 740, 03-3, Martin Ester 84 3.1 Introduction Motivation
More informationStreamOLAP. Salman Ahmed SHAIKH. Cost-based Optimization of Stream OLAP. DBSJ Japanese Journal Vol. 14-J, Article No.
StreamOLAP Cost-based Optimization of Stream OLAP Salman Ahmed SHAIKH Kosuke NAKABASAMI Hiroyuki KITAGAWA Salman Ahmed SHAIKH Toshiyuki AMAGASA (SPE) OLAP OLAP SPE SPE OLAP OLAP OLAP Due to the increase
More informationOLAP and Data Warehousing
OLAP and Data Warehousing Lab Exercises Part I OLAP Purpose: The purpose of this practical guide to data warehousing is to learn how online analytical processing (OLAP) methods and tools can be used to
More informationChapter 5, Data Cube Computation
CSI 4352, Introduction to Data Mining Chapter 5, Data Cube Computation Young-Rae Cho Associate Professor Department of Computer Science Baylor University A Roadmap for Data Cube Computation Full Cube Full
More informationIT DATA WAREHOUSING AND DATA MINING UNIT-2 BUSINESS ANALYSIS
PART A 1. What are production reporting tools? Give examples. (May/June 2013) Production reporting tools will let companies generate regular operational reports or support high-volume batch jobs. Such
More informationUNIT 2. DATA PREPROCESSING AND ASSOCIATION RULES
UNIT 2. DATA PREPROCESSING AND ASSOCIATION RULES Data Pre-processing-Data Cleaning, Integration, Transformation, Reduction, Discretization Concept Hierarchies-Concept Description: Data Generalization And
More informationIntroduction to MDDBs
3 CHAPTER 2 Introduction to MDDBs What Is OLAP? 3 What Is SAS/MDDB Server Software? 4 What Is an MDDB? 4 Understanding the MDDB Structure 5 How Can I Use MDDBs? 7 Why Should I Use MDDBs? 8 What Is OLAP?
More informationData Preprocessing. Slides by: Shree Jaswal
Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data
More informationData Warehousing and Data Mining. Announcements (December 1) Data integration. CPS 116 Introduction to Database Systems
Data Warehousing and Data Mining CPS 116 Introduction to Database Systems Announcements (December 1) 2 Homework #4 due today Sample solution available Thursday Course project demo period has begun! Check
More informationData Warehousing and Data Mining
Data Warehousing and Data Mining Lecture 3 Efficient Cube Computation CITS3401 CITS5504 Wei Liu School of Computer Science and Software Engineering Faculty of Engineering, Computing and Mathematics Acknowledgement:
More informationTable Of Contents: xix Foreword to Second Edition
Data Mining : Concepts and Techniques Table Of Contents: Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments xxxi About the Authors xxxv Chapter 1 Introduction 1 (38) 1.1 Why Data
More informationInformation Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 05(b) : 23/10/2012 Data Mining: Concepts and Techniques (3 rd ed.) Chapter
More informationData Mining. Data warehousing. Hamid Beigy. Sharif University of Technology. Fall 1394
Data Mining Data warehousing Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 22 Table of contents 1 Introduction 2 Data warehousing
More informationPrivacy Overview and Data Mining CSC 301 Spring 2018 Howard Rosenthal
Privacy Overview and Data Mining CSC 301 Spring 2018 Howard Rosenthal Course Notes: Much of the material in the slides comes from the books and their associated support materials, below as well as many
More informationA. Papadopoulos, G. Pallis, M. D. Dikaiakos. Identifying Clusters with Attribute Homogeneity and Similar Connectivity in Information Networks
A. Papadopoulos, G. Pallis, M. D. Dikaiakos Identifying Clusters with Attribute Homogeneity and Similar Connectivity in Information Networks IEEE/WIC/ACM International Conference on Web Intelligence Nov.
More informationNew Perspectives in Social Data Management
New Perspectives in Social Data Management Sihem Amer-Yahia Research Director CNRS @ LIG Sihem.Amer-Yahia@imag.fr TSUKUBA University May 20 th, 2014 Traditional data management stack High-level specification
More informationPerforming OLAP over Graph Data: Query Language, Implementation, and a Case Study
Performing OLAP over Graph Data: Query Language, Implementation, and a Case Study Leticia Gómez, Alejandro Vaisman Instituto Tecnológico de Buenos Aires, Argentina Bart Kuijpers Hasselt University and
More informationOptimal Query Mapping in Mobile OLAP
Optimal Query Mapping in Mobile OLAP Ilias Michalarias and Hans -J. Lenz Freie Universität Berlin {ilmich,hjlenz}@wiwiss.fu-berlin.de Overview I. mobile OLAP Research Context II. Query Mapping in molap
More informationComputing Appropriate Representations for Multidimensional Data
Computing Appropriate Representations for Multidimensional Data Yeow Wei Choong LI - Université F Rabelais HELP Institute - Malaysia choong yw@helpedumy Dominique Laurent LI - Université F Rabelais Tours
More informationData Cube Technology
Data Cube Technology Erwin M. Bakker & Stefan Manegold https://homepages.cwi.nl/~manegold/dbdm/ http://liacs.leidenuniv.nl/~bakkerem2/dbdm/ s.manegold@liacs.leidenuniv.nl e.m.bakker@liacs.leidenuniv.nl
More informationData Warehousing & Mining. Data integration. OLTP versus OLAP. CPS 116 Introduction to Database Systems
Data Warehousing & Mining CPS 116 Introduction to Database Systems Data integration 2 Data resides in many distributed, heterogeneous OLTP (On-Line Transaction Processing) sources Sales, inventory, customer,
More informationData and Text Mining
Data representation and manipulation I prof. dr. Bojan Cestnik Temida d.o.o. & Jozef Stefan Institute Ljubljana bojan.cestnik@temida.si prof. dr. Bojan Cestnik 1 Contents Introduction Basic Data Mining
More informationDecision Support, Data Warehousing, and OLAP
Decision Support, Data Warehousing, and OLAP : Contents Terminology : OLAP vs. OLTP Data Warehousing Architecture Technologies References 1 Decision Support and OLAP Information technology to help knowledge
More informationQuestion Bank. 4) It is the source of information later delivered to data marts.
Question Bank Year: 2016-2017 Subject Dept: CS Semester: First Subject Name: Data Mining. Q1) What is data warehouse? ANS. A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile
More informationData Preprocessing Yudho Giri Sucahyo y, Ph.D , CISA
Obj ti Objectives Motivation: Why preprocess the Data? Data Preprocessing Techniques Data Cleaning Data Integration and Transformation Data Reduction Data Preprocessing Lecture 3/DMBI/IKI83403T/MTI/UI
More informationSummary of Last Chapter. Course Content. Chapter 3 Objectives. Chapter 3: Data Preprocessing. Dr. Osmar R. Zaïane. University of Alberta 4
Principles of Knowledge Discovery in Data Fall 2004 Chapter 3: Data Preprocessing Dr. Osmar R. Zaïane University of Alberta Summary of Last Chapter What is a data warehouse and what is it for? What is
More informationData Cube Technology. Chapter 5: Data Cube Technology. Data Cube: A Lattice of Cuboids. Data Cube: A Lattice of Cuboids
Chapter 5: Data Cube Technology Data Cube Technology Data Cube Computation: Basic Concepts Data Cube Computation Methods Erwin M. Bakker & Stefan Manegold https://homepages.cwi.nl/~manegold/dbdm/ http://liacs.leidenuniv.nl/~bakkerem2/dbdm/
More informationSpeeding Up Data Science: From a Data Management Perspective
Speeding Up Data Science: From a Data Management Perspective Jiannan Wang Database System Lab (DSL) Simon Fraser University NWDS Meeting, Jan 5, 2018 1 Simon Fraser University 2 SFU DB/DM Group Ke Wang
More informationCarnegie Mellon Univ. Dept. of Computer Science /615 DB Applications. Data mining - detailed outline. Problem
Faloutsos & Pavlo 15415/615 Carnegie Mellon Univ. Dept. of Computer Science 15415/615 DB Applications Lecture # 24: Data Warehousing / Data Mining (R&G, ch 25 and 26) Data mining detailed outline Problem
More informationEssentials for Modern Data Analysis Systems
Essentials for Modern Data Analysis Systems Mehrdad Jahangiri, Cyrus Shahabi University of Southern California Los Angeles, CA 90089-0781 {jahangir, shahabi}@usc.edu Abstract Earth scientists need to perform
More informationCall: SAS BI Course Content:35-40hours
SAS BI Course Content:35-40hours Course Outline SAS Data Integration Studio 4.2 Introduction * to SAS DIS Studio Features of SAS DIS Studio Tasks performed by SAS DIS Studio Navigation to SAS DIS Studio
More informationSchema-Agnostic Indexing with Azure Document DB
Schema-Agnostic Indexing with Azure Document DB Introduction Azure DocumentDB is Microsoft s multi-tenant distributed database service for managing JSON documents at Internet scale Multi-tenancy is an
More informationPREGEL: A SYSTEM FOR LARGE- SCALE GRAPH PROCESSING
PREGEL: A SYSTEM FOR LARGE- SCALE GRAPH PROCESSING G. Malewicz, M. Austern, A. Bik, J. Dehnert, I. Horn, N. Leiser, G. Czajkowski Google, Inc. SIGMOD 2010 Presented by Ke Hong (some figures borrowed from
More informationThe Near Greedy Algorithm for Views Selection in Data Warehouses and Its Performance Guarantees
The Near Greedy Algorithm for Views Selection in Data Warehouses and Its Performance Guarantees Omar H. Karam Faculty of Informatics and Computer Science, The British University in Egypt and Faculty of
More informationData mining - detailed outline. Carnegie Mellon Univ. Dept. of Computer Science /615 DB Applications. Problem.
Faloutsos & Pavlo 15415/615 Carnegie Mellon Univ. Dept. of Computer Science 15415/615 DB Applications Data Warehousing / Data Mining (R&G, ch 25 and 26) C. Faloutsos and A. Pavlo Data mining detailed outline
More informationProbabilistic Graph Summarization
Probabilistic Graph Summarization Nasrin Hassanlou, Maryam Shoaran, and Alex Thomo University of Victoria, Victoria, Canada {hassanlou,maryam,thomo}@cs.uvic.ca 1 Abstract We study group-summarization of
More informationThis tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.
About the Tutorial Data Mining is defined as the procedure of extracting information from huge sets of data. In other words, we can say that data mining is mining knowledge from data. The tutorial starts
More informationBig Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data. Fall 2012
Big Trend in Business Intelligence: Data Mining over Big Data Web Transaction Data Fall 2012 Data Warehousing and OLAP Introduction Decision Support Technology On Line Analytical Processing Star Schema
More informationOLAP Introduction and Overview
1 CHAPTER 1 OLAP Introduction and Overview What Is OLAP? 1 Data Storage and Access 1 Benefits of OLAP 2 What Is a Cube? 2 Understanding the Cube Structure 3 What Is SAS OLAP Server? 3 About Cube Metadata
More informationECLT 5810 Data Preprocessing. Prof. Wai Lam
ECLT 5810 Data Preprocessing Prof. Wai Lam Why Data Preprocessing? Data in the real world is imperfect incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate
More informationWhat is a Data Warehouse?
What is a Data Warehouse? COMP 465 Data Mining Data Warehousing Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Defined in many different ways,
More informationAcknowledgment. MTAT Data Mining. Week 7: Online Analytical Processing and Data Warehouses. Typical Data Analysis Process.
MTAT.03.183 Data Mining Week 7: Online Analytical Processing and Data Warehouses Marlon Dumas marlon.dumas ät ut. ee Acknowledgment This slide deck is a mashup of the following publicly available slide
More informationAdnan YAZICI Computer Engineering Department
Data Warehouse Adnan YAZICI Computer Engineering Department Middle East Technical University, A.Yazici, 2010 Definition A data warehouse is a subject-oriented integrated time-variant nonvolatile collection
More informationBREAKING CHRISTIAN NEWS MEDIA KIT
BREAKING CHRISTIAN NEWS MEDIA KIT Peter Spencer Advertising Specialist Peter@ELPublication.com Direct: 541-905-7687 Hours: 8am - 3pm Pacific Time We will generally respond within 24 hours of inquiry BREAKING
More informationCS 1655 / Spring 2013! Secure Data Management and Web Applications
CS 1655 / Spring 2013 Secure Data Management and Web Applications 03 Data Warehousing Alexandros Labrinidis University of Pittsburgh What is a Data Warehouse A data warehouse: archives information gathered
More informationImproving the Performance of OLAP Queries Using Families of Statistics Trees
Improving the Performance of OLAP Queries Using Families of Statistics Trees Joachim Hammer Dept. of Computer and Information Science University of Florida Lixin Fu Dept. of Mathematical Sciences University
More informationC-Cubing: Efficient Computation of Closed Cubes by Aggregation-Based Checking
C-Cubing: Efficient Computation of Closed Cubes by Aggregation-Based Checking Dong Xin Zheng Shao Jiawei Han Hongyan Liu University of Illinois at Urbana-Champaign, Urbana, IL 6, USA Tsinghua University,
More informationCourse Book Academic Year
Nawroz University College of Computer and IT Department of Computer Science Stage: Third Course Book Academic Year 2015-2016 Subject Advanced Database No. of Hours No. of Units 6 Distribution of Marks
More informationStep-by-step data transformation
Step-by-step data transformation Explanation of what BI4Dynamics does in a process of delivering business intelligence Contents 1. Introduction... 3 Before we start... 3 1 st. STEP: CREATING A STAGING
More informationChapter 3. Foundations of Business Intelligence: Databases and Information Management
Chapter 3 Foundations of Business Intelligence: Databases and Information Management THE DATA HIERARCHY TRADITIONAL FILE PROCESSING Organizing Data in a Traditional File Environment Problems with the traditional
More information