
MANY-TO-MANY RELATIONSHIPS

A many-to-many relationship exists when the field used to relate two tables contains the same value multiple times in each table. For example, a hotel may have one table with reservation data and another with payment data, and both tables store the name of the guest. A guest can have multiple reservations under their name as well as multiple payments recorded in their name. If a relationship between the reservations and payments tables is created on the guest's name, the result is a many-to-many relationship, because the guest's name appears multiple times in each table.

In general, when a field in two or more tables contains the same values, and these values are duplicated in both tables, a connection based on this field results in a many-to-many relationship. The problem with this kind of relationship is that it can produce complex data sets which return incorrect results, or which use excessive computing resources and return no results at all. Another clear symptom is that the ElastiCube's ecdata file inflates to a size much larger than expected.

There are several methods to resolve or bypass a many-to-many relationship. The right solution depends on the business model, the logic of the business questions at hand, and the schema. The following steps are examined below:

1. Testing for a many-to-many relationship.
2. Understanding which scenario best fits your current schema.
3. Applying the respective solution according to the schema logic.

There can be 3 types of relationships:

One-to-One relationship - In this scenario both sides of the relationship have unique values for every row.
One-to-Many relationship - In this scenario one side of the relationship holds unique values for every row, but the other side holds duplicate values for any or all of the corresponding values in the first table.
Many-to-Many relationship - In this scenario both sides of the relationship hold duplicated values, causing excessive calculations for every query run against it.

Testing if a relationship is Many-to-Many

We can easily determine whether a relationship is Many-to-Many by checking its cardinality. For each side of the relationship, compare the number of unique (distinct) values in the field with the total number of values in the field. If the two numbers are equal, there are no duplicates on this side, and the relationship is either One-to-Many or One-to-One. If the total number of values is larger than the number of unique values, this side of the relationship has duplicated values, and we need to investigate the other side. If the other side holds only unique values, this is a One-to-Many relationship. If not, we have a Many-to-Many relationship on our hands.

It is important to identify and manage Many-to-Many (M2M) relationships, as they can cause queries to respond extremely slowly or even return incorrect results. M2M relationships occur when two tables are joined on a field containing duplicate values in both tables. For example, the same guest may have multiple reservations and multiple payments at a hotel, so joining the reservations and payments tables on the guest results in an M2M relationship.

We have written a simple SQL statement you can use to check for potential M2M relationships. Steps:

1. Open the ecube file containing the schema of the ElastiCube.
2. Click Add Data > Custom SQL Expression.
3. Enter and adjust the SQL statement below. See the image below for the necessary changes.

   SELECT [Do I have duplications?]
   FROM (
     SELECT distinct_count(t1.col1)<>count(t1.col1) AS [Do I have duplications?] FROM [Table1] t1
     UNION ALL
     SELECT distinct_count(t2.col2)<>count(t2.col2) FROM [Table2] t2
   ) AS temp
   GROUP BY [Do I have duplications?]

4. In the top right of the expression editor window, click the 'Parse SQL Expression' button. If the expression parses successfully, click the 'Preview result table' button in the top right.

5. If the result returned is 'True' for both tables, a many-to-many relationship exists and will need to be considered in the ElastiCube design.

Image 1: Many-to-Many relationship prior to resolution

If the two values (the unique count and the total count) are equal on the Reservations side, every guest id appears only once there, making all values unique. We can stop investigating at this stage: even if the other side of the relationship has duplicate values for guest id, we are still dealing with a One-to-Many relationship, where the unique values are on the Reservations side and the duplicate values are on the Payments side. For more, see One-to-Many Relationships below.

If more than two tables are connected to this relationship, that is, if more than two tables are merged on the same field, we have a few more options. The solution for a single many-to-many relationship is a sub-problem of this scenario. In this case, we need to run the test on every table to check the uniqueness or duplication of the merged fields.

Possible resolutions for 2 tables, one relationship:

1. Break the relationship into two
The direct solution for such a problem is to break this relationship into two separate one-to-many relationships, as seen in Image 2. The logic behind testing this issue can be visualized in the decision tree below.

a. Create a custom SQL expression in the ElastiCube. In the expression of this table, select all the individual values for the identifier column from both sides. The expression should look like this:

   SELECT * FROM (
     SELECT DISTINCT r.guestid, r.guestname FROM [Reservations] r
     UNION
     SELECT DISTINCT p.guestid, p.guestname FROM [Payments] p
   ) AS G

This query takes all Guest Id values from both tables and, through the UNION statement, brings in only the unique values, making this a complete list of all distinct Guest Id values.

b. Merge the Guest Id field from the new 'linking' table to the Guest Id fields of the other two tables, thus creating two One-to-Many relationships. We can now use this Guest Id field as the rows or axes elements of a widget, pulling the unique values from the new Guest dimension, with measures from the two other tables.

Image 2: Two O-to-M relationships

2. Create Aggregated Table
In situations where we have more than one fact table in our ElastiCube (a fact table is a primary table containing the measures or fields used for calculations in the dashboard), there are several situations in which an aggregated table can resolve a many-to-many relationship.
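Returning to resolution 1, the linking table can be tried out on toy data. The following is a minimal sketch in Python using the standard sqlite3 module rather than the ElastiCube's SQL dialect; the sample rows, the nights and amount columns, and the Guests table name are hypothetical. It shows that joining Reservations and Payments directly on guestid inflates the payment total, while a distinct guest list lets each table be aggregated on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Reservations (guestid INTEGER, guestname TEXT, nights INTEGER);
CREATE TABLE Payments (guestid INTEGER, guestname TEXT, amount REAL);
INSERT INTO Reservations VALUES (1, 'Ann', 2), (1, 'Ann', 3), (2, 'Bob', 1);
INSERT INTO Payments VALUES (1, 'Ann', 100.0), (1, 'Ann', 50.0), (3, 'Cy', 80.0);
""")

# Direct many-to-many join: Ann's 2 reservations x 2 payments yield 4 rows,
# so each of her payments is counted twice.
direct_total = conn.execute("""
    SELECT SUM(p.amount)
    FROM Reservations r JOIN Payments p ON r.guestid = p.guestid
""").fetchone()[0]

# Linking table: one row per distinct guest found in either table.
conn.execute("""
    CREATE TABLE Guests AS
    SELECT DISTINCT guestid, guestname FROM Reservations
    UNION
    SELECT DISTINCT guestid, guestname FROM Payments
""")
guests = conn.execute(
    "SELECT guestid, guestname FROM Guests ORDER BY guestid"
).fetchall()

# Each fact table is now aggregated separately against the unique guest list,
# which corresponds to two one-to-many relationships.
per_guest = conn.execute("""
    SELECT g.guestid,
           (SELECT COALESCE(SUM(r.nights), 0) FROM Reservations r
             WHERE r.guestid = g.guestid),
           (SELECT COALESCE(SUM(p.amount), 0) FROM Payments p
             WHERE p.guestid = g.guestid)
    FROM Guests g
    ORDER BY g.guestid
""").fetchall()

print(direct_total)
print(guests)
print(per_guest)
```

In this toy data Ann actually paid 150 in total, yet the direct join reports 300 for her; the per-guest aggregation through the linking table returns the expected figures.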

Image 3: Two Fact tables

Assuming we'd like to segment our data according to a few different dimensions, creating relationships directly between these fields will create many-to-many relationships in one of two ways, according to the schema:

2.1 Neither table holds unique values, and all values from one table are also present in the second table.
In this scenario either a linked dimension (as described in solution 1) or an aggregated table can be created, which will hold all the unique values and the desired calculations for one of the tables. To create an aggregated table, create a custom SQL expression that aggregates values from the table which holds all the values (its own, and the subset present in the other table), using the following expression:

   SELECT i.orderdatekey, i.productkey, sum(i.discountamount), sum(i.salesamount), avg(i.unitpricediscountpct)
   FROM [FactInternetSales] i
   GROUP BY i.orderdatekey, i.productkey

This custom SQL expression selects the distinct OrderDateKeys and their corresponding ProductKeys from FactInternetSales, grouped by these fields, together with single-value aggregations for the other fields: in this case the sum of Discount Amount, the sum of Sales Amount and the average Unit Price Discount. After merging the OrderDateKey and ProductKey to the two other tables, you can pull the values from this new table into the rows or axes panel of a widget in the BiStudio, with measures and additional aggregations from the two other tables.

*Note: the non-aggregated table needs to be a subset of the aggregated table in terms of the primary fields.

2.2 Neither table holds unique values, and both tables contain different values for several fields.
Resolving this scenario combines the solutions from 2.1 and 1: create an aggregated table as described in 2.1, and a dimension table as described in 1. The final resolution should look like this:

Image 4: Two Fact tables with a date dimension table and an aggregative Products table

Possible resolutions for more than 2 tables, more than 1 relationship:

3. Using the Lookup Function
In most scenarios we aggregate values according to a given id, from the unique side of the relationship to the duplicate side. In specific cases, however, it is the other way around.

For example, in the following scenario we have 3 tables with two one-to-many relationships between them. This can potentially create a many-to-many relationship if we query the two leaf tables, meaning that the query result table will contain multiple rows which cannot be distinguished from one another.

Image 5: Two consecutive M-to-M relationships

Using the Lookup function, one can import values from a remote table by matching values in a different column. This creates a new column, in the table on which we'd like to aggregate a given field or fields, holding the matching value of the identifying field from the other table.

Take the following example of tables T1, T2 and T3: we'd like to run a query which displays aggregations over the duplicate ids from T1, with a measure from T3. If we ran the query as is, we would get multiple values in the query's result set and would not be able to run this aggregation. To resolve this, we use the Lookup function to import the values from T3 into T2, and then rerun the query on tables T1 and T2 only.

Using the Lookup function, available under 'Miscellaneous Functions' in the custom SQL editor, we can import the values of 'M3' from the 'T3' table into the 'T2' table. Create a new custom column, and use the Lookup function to import the values of the attribute. In this case, the Lookup function should look like this:

   Lookup([T3], [T3].[M3], [T2].id2, [T3].id2)

Running this statement in table T2 imports the matching values of M3 from T3, according to the matching values of id2 between the two tables.

**LOOKUP(remote_table, remote_result_column, current_match_column, remote_match_column)
Matches the current value with another value from a remote table. The result is the value in remote_result_column for which the corresponding remote_match_column equals the current_match_column.
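The effect of the Lookup call above can be illustrated outside the ElastiCube. The following is a rough Python emulation, not Sisense's implementation: the lookup helper, the sample rows and the id1 column are hypothetical, and returning the first match (or None when nothing matches) is an assumption about the behaviour rather than something the text above states.

```python
def lookup(remote_rows, remote_result_column, current_value, remote_match_column):
    """Return remote_result_column from the first remote row whose
    remote_match_column equals current_value, or None if none matches.
    (First-match and None-on-miss are assumptions of this sketch.)"""
    for row in remote_rows:
        if row[remote_match_column] == current_value:
            return row[remote_result_column]
    return None


# Hypothetical contents of T2 and T3; id2 is the field shared by both tables.
t2 = [
    {"id1": 1, "id2": 10},
    {"id1": 2, "id2": 20},
    {"id1": 3, "id2": 30},
]
t3 = [
    {"id2": 10, "M3": 5.0},
    {"id2": 20, "M3": 7.5},
]

# Equivalent in spirit to adding a custom column to T2 defined as
# Lookup([T3], [T3].[M3], [T2].id2, [T3].id2).
t2_with_m3 = [dict(row, M3=lookup(t3, "M3", row["id2"], "id2")) for row in t2]

for row in t2_with_m3:
    print(row)
```

Once M3 lives in T2, a query only needs T1 and T2, so the path through T3 no longer takes part in the join.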
Image 6: Two consecutive M-to-O relationships after Lookup fix

4. Concatenate the two tables into one
Assuming we have 2 separate tables with duplicate id values in each, and each holding different columns for each id, we can create a new table which holds all values for every id, and pull the aggregations from this new table.

Notice that the two original tables, Table_1 and Table_2, have different columns.

Image 7: Concatenating tables

Using the following SQL statement, we can import the data from both tables, with the ids and the respective columns:

   SELECT s.id AS id, s.m1, s.m2, ToInt( NULL ) m3, ToInt( NULL ) m4 FROM [Table 1] s
   UNION
   SELECT t.id, ToInt( NULL ), ToInt( NULL ), t.m3, t.m4 FROM [Table 2] t

This will create a table with 5 columns:

1. Id
2. M1 (from Table_1)
3. M2 (from Table_1)
4. M3 (from Table_2)
5. M4 (from Table_2)

The values missing from each table will be NULLs, which results in the following table:

Image 8: Concatenated table; result set
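The concatenation can be reproduced on sample data. The following is a minimal sketch using Python's standard sqlite3 module with plain NULL in place of ToInt( NULL ); the rows and the Combined table name are hypothetical. Note that it uses UNION ALL rather than the UNION shown above, on the assumption that identical rows within a table should be kept for aggregation, since UNION would collapse exact duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table_1 (id INTEGER, m1 INTEGER, m2 INTEGER);
CREATE TABLE Table_2 (id INTEGER, m3 INTEGER, m4 INTEGER);
INSERT INTO Table_1 VALUES (1, 10, 20), (1, 10, 20), (2, 5, 6);
INSERT INTO Table_2 VALUES (1, 7, 8), (3, 9, 9);

-- Pad the columns each table lacks with NULL so both halves line up.
CREATE TABLE Combined AS
SELECT s.id AS id, s.m1 AS m1, s.m2 AS m2, NULL AS m3, NULL AS m4 FROM Table_1 s
UNION ALL
SELECT t.id, NULL, NULL, t.m3, t.m4 FROM Table_2 t;
""")

row_count = conn.execute("SELECT COUNT(*) FROM Combined").fetchone()[0]

# All aggregations per id can now be pulled from the single combined table.
# SUM ignores NULLs and returns NULL (None) when an id has no values at all.
totals = conn.execute("""
    SELECT id, SUM(m1), SUM(m2), SUM(m3), SUM(m4)
    FROM Combined
    GROUP BY id
    ORDER BY id
""").fetchall()

print(row_count)
print(totals)
```

With UNION instead of UNION ALL, the two identical (1, 10, 20) rows of Table_1 would be merged into one and the sums for id 1 would be halved, which is worth checking against the intended business logic.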

Image 9: Determining a Many-to-Many relationship; decision tree. This is based on the first example with the Payments and Reservations tables.

One-to-Many Relationships

In most scenarios we aggregate values according to a given id, from the unique side of the relationship to the duplicate side. A One-to-Many relationship occurs when one side holds individual, unique values for all records in the id field, while the other side holds duplicate values for the identifying field.

For instance, in the following scenario a relationship exists between the 'Categories' table and the 'Products' table. The CategoryID is a unique identifier for the different categories in the Categories table, and each record holds additional information for its category, e.g. CategoryName, Description, Picture and so on. We distinguish between the different categories by their ID.

On the other side of the relationship, in the Products table, we'd like to know which category each product belongs to: juice, milk and beer belong to the beverages category, while bananas, kiwi and mango belong to the fruits category. In addition to the unique ProductID for each product, the Products table therefore holds the CategoryID. Because many products can belong to the same category, the CategoryID appears multiple times in the Products table. The CategoryID in the Products table is thus not unique, making this relationship a One-to-Many.

By placing the CategoryID in the rows or series element of a widget, together with a measure from the Products table, we can aggregate values from the Products table by the unique CategoryIDs from the Categories table. In this scenario, a possible aggregation would be to count the number of Products in each Category, or to sum the QuantityPerUnit for each Category.

Image 8: One-to-Many Relationship
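The duplicate test and the decision tree can be applied to the Categories and Products example. The following is a minimal sketch using Python's standard sqlite3 module; the helper names has_duplicates and classify_relationship and the sample rows are hypothetical, and the table and column names are assumed to be trusted because they are interpolated into the SQL text.

```python
import sqlite3


def has_duplicates(conn, table, column):
    """True when the column holds more values than distinct values."""
    total, distinct = conn.execute(
        f'SELECT COUNT("{column}"), COUNT(DISTINCT "{column}") FROM "{table}"'
    ).fetchone()
    return total != distinct


def classify_relationship(conn, left, right):
    """left and right are (table, column) pairs; follows the decision tree."""
    left_dup = has_duplicates(conn, *left)
    right_dup = has_duplicates(conn, *right)
    if left_dup and right_dup:
        return "many-to-many"
    if left_dup or right_dup:
        return "one-to-many"
    return "one-to-one"


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (CategoryID INTEGER, CategoryName TEXT);
CREATE TABLE Products (ProductID INTEGER, ProductName TEXT, CategoryID INTEGER);
INSERT INTO Categories VALUES (1, 'Beverages'), (2, 'Fruits');
INSERT INTO Products VALUES
  (1, 'Juice', 1), (2, 'Milk', 1), (3, 'Beer', 1),
  (4, 'Banana', 2), (5, 'Kiwi', 2), (6, 'Mango', 2);
""")

kind = classify_relationship(
    conn, ("Categories", "CategoryID"), ("Products", "CategoryID")
)

# Aggregating from the unique side to the duplicate side: products per category.
products_per_category = conn.execute("""
    SELECT c.CategoryName, COUNT(p.ProductID)
    FROM Categories c
    JOIN Products p ON p.CategoryID = c.CategoryID
    GROUP BY c.CategoryID, c.CategoryName
    ORDER BY c.CategoryID
""").fetchall()

print(kind)
print(products_per_category)
```

Because CategoryID is unique in Categories, every product row matches exactly one category, so the counts are not inflated by the join.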