Horizontal Aggregations in SQL to Prepare Data Sets Using PIVOT Operator


R. Saravanan 1, J. Sivapriya 2, M. Shahidha 3
1 Assistant Professor, Department of IT, SMVEC, Puducherry, India
2,3 UG student, Department of IT, Puducherry, India

Abstract--Preparing a data set for use as input to data mining is a difficult process that involves complex queries, joining tables, and aggregating columns. Existing SQL aggregations are limited for preparing data sets because they return one column per aggregated group. Our project aims at implementing a new class of functions called horizontal aggregation. Horizontal aggregation builds a data set with a horizontal denormalized layout, which is the standard layout required by most data mining algorithms. The PIVOT operator, offered by RDBMSs, is used to calculate aggregate operations. Our project aims at aggregating columns using the PIVOT operator. The PIVOT operator reorganizes and summarizes selected columns and rows of a table to produce the desired reports. Using the PIVOT operator, a horizontally aggregated data set is obtained, which can act as input for any application involving data sets. This horizontal aggregation will improve the performance of the clustering process in data mining.

Keywords--aggregation, dataset preparation, horizontal, pivoting, SQL.

I. INTRODUCTION

The term data set may also be used more loosely, to refer to the data in a collection of closely related tables, corresponding to a particular experiment or event. The data set lists values for each of the variables, such as height and weight of an object, for each member of the data set. Each value is known as a datum. The data set may comprise data for one or more members, corresponding to the number of rows.

II. HORIZONTAL AGGREGATION

Building a suitable data set for data mining purposes is a time-consuming task.
This task generally requires writing long SQL statements or customizing SQL code if it is automatically generated by some tool. There are two main ingredients in such SQL code: joins and aggregations. The most widely known aggregation is the sum of a column over groups of rows. There exist many aggregation functions and operators in SQL. Unfortunately, all these aggregations have limitations when building data sets for data mining purposes.

Horizontal aggregations represent an extended form of traditional SQL aggregations, which return a set of values in a horizontal layout instead of a single value per row. Horizontal aggregations provide several unique features and advantages. First, they represent a template to generate SQL code from a data mining tool. This SQL code reduces manual work in the data preparation phase of a data mining project. Second, since the SQL code is automatically generated, it is likely to be more efficient than SQL code written by an end user. Third, the data set can be created entirely inside the DBMS. Horizontal aggregations require only a small syntax extension to aggregate functions called in a SELECT statement.

III. SUMMARIZATION AND AGGREGATION

Unfortunately, most data mining tasks require dimensions (variables) that are not readily available from the database. Such dimensions typically require computing aggregations at several granularity levels. This is because most columns required by data mining techniques are measures (or metrics), which translate into sums or counts computed with SQL. Unfortunately, granularity levels are not hierarchical (like cubes or OLAP [6]), making the use of separate summary tables necessary (e.g. summarization by product or by customer, in a retail database). Thus, the user needs to decide which tables and columns need to be summarized, the primary key(s) of the output tables (i.e. the grouping columns), and which kinds of aggregations (e.g. sum(), average(), min()) are required.
For a data mining practitioner it is best to create as many variables (features, dimensions) as possible in order to identify those that can help compute a more accurate model. Summarization therefore tends to create tables with many columns (sometimes hundreds), which makes data mining analysis harder and query processing slower [2]. This task is related to attribute relevance, which determines which attributes are more important for a given model.

International Journal of Emerging Technology and Advanced Engineering

Attribute relevance is commonly determined by dimensionality reduction and feature selection techniques [1]. Most data mining techniques, especially for classification and regression, are designed to perform variable (feature) selection. A straightforward optimization is to compute as many dimensions as possible in the same statement, exploiting the same GROUP BY clause. A typical query to derive dimensions from a transaction table is as follows:

    SELECT customer_id,
           count(*) AS cntitems,
           sum(salesamt) AS totalsales,
           sum(case when salesamt < 0 then 1 end) AS cntreturns
    FROM sales
    GROUP BY customer_id;

IV. HORIZONTAL AGGREGATION FOR PREPARING TABULAR DATA SET

In a data mining project, a significant portion of time is devoted to building a data set suitable for analysis. In a relational database environment, building such a data set usually requires joining tables and aggregating columns with SQL queries. Existing SQL aggregations are limited since they return a single number per aggregated group, producing one row for each computed number [7]. These aggregations help, but a significant effort is still required to build data sets suitable for data mining purposes, where a tabular format is generally required. This work proposes very simple, yet powerful, extensions to SQL aggregate functions to produce aggregations in tabular form, returning a set of numbers instead of one number per row. We call this new class of functions horizontal aggregations. Horizontal aggregations help build answer sets in tabular form (e.g. point-dimension, observation-variable, instance-feature), which is the standard form needed by most data mining algorithms. Two common data preparation tasks are explained: transposition/aggregation and transforming categorical attributes into binary dimensions. Two strategies are proposed to evaluate horizontal aggregations using standard SQL.
The first strategy is based only on relational operators and the second one uses the "case" construct. Experiments with large data sets study the proposed query optimization strategies.

Two basic strategies to evaluate horizontal aggregations have been proposed [7]. The first relies only on relational operations, that is, only select, project, join and aggregation queries; it is called the SPJ strategy. The second relies on the SQL "case" construct; it is called the CASE strategy. Each table has an index on its primary key for efficient join processing. Additional indexing mechanisms to accelerate query evaluation are not considered.

Horizontal aggregation is a class of functions that returns aggregated columns in a horizontal layout. Managing large data sets without DBMS support can be a difficult job. Trying different subsets of data points and dimensions is more convenient, faster and easier to do inside a relational database with SQL queries than outside with alternative tools.

There are several advantages of horizontal aggregation.
1) Horizontal aggregation represents a template to generate SQL code from a data mining tool. This SQL code reduces manual work in the data preparation phase of data mining projects.
2) The code is generated automatically and is likely to be more efficient than SQL code written by an end user. Thus data sets for data mining projects can be created in less time.
3) The data sets can be created entirely inside the DBMS.

K-means clustering is then used to cluster the attributes that result from horizontal aggregation.

SPJ strategy

The SPJ strategy is interesting from a theoretical point of view because it is based on relational operators only. The basic idea is to create one table with a vertical aggregation for each result column, and then join all those tables to produce FH (horizontal layout). The data is aggregated from F into N projected tables with N selection/projection/join/aggregation queries.
Each table FI corresponds to one subgrouping combination and has {D1, ..., Dj} as primary key and an aggregation on A as the only non-key column. The additional table F0 is outer joined with the projected tables to get a complete result set. Two basic sub-strategies have been proposed to compute FH. The first one aggregates directly from F. The second one computes the equivalent vertical aggregation in a temporary table FV grouping by D1, ..., Dk. Horizontal aggregations can then be computed indirectly from FV, since standard aggregations are distributive [1].
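
The SPJ steps described above can be sketched as follows. This is a minimal illustration using Python's standard sqlite3 module, not the authors' implementation. The table F, its columns (cst_id, prod_code, qty), the sample rows and the F_A/F_B/F_C table names are all assumptions made for the example.

```python
import sqlite3

# Hypothetical fact table F; the table name, columns and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE F (cst_id INTEGER, prod_code TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO F VALUES (?, ?, ?)",
    [(1, "A", 10), (1, "B", 20), (1, "A", 5), (2, "A", 7), (2, "C", 3), (3, "B", 4)],
)

# F0 holds every distinct grouping key, so no group is lost in the joins.
conn.execute("CREATE TABLE F0 (cst_id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO F0 SELECT DISTINCT cst_id FROM F")

# One vertical aggregation table per subgrouping value (F_A, F_B, F_C here).
# The values are interpolated into identifiers, which is only safe for trusted data.
codes = [row[0] for row in conn.execute("SELECT DISTINCT prod_code FROM F ORDER BY prod_code")]
for code in codes:
    conn.execute(f'CREATE TABLE "F_{code}" (cst_id INTEGER PRIMARY KEY, total INTEGER)')
    conn.execute(
        f'INSERT INTO "F_{code}" '
        "SELECT cst_id, SUM(qty) FROM F WHERE prod_code = ? GROUP BY cst_id",
        (code,),
    )

# FH: left outer join F0 with every projected table; missing combinations become NULL.
select_list = ", ".join(f'"F_{c}".total AS "{c}_SUM_QTY"' for c in codes)
joins = " ".join(f'LEFT OUTER JOIN "F_{c}" ON "F_{c}".cst_id = F0.cst_id' for c in codes)
fh = conn.execute(
    f"SELECT F0.cst_id, {select_list} FROM F0 {joins} ORDER BY F0.cst_id"
).fetchall()

for row in fh:
    print(row)
```

Each projected table carries a primary key on the grouping column, matching the indexing assumption stated earlier. The number of joins grows with the number of subgrouping values, which is one reason the strategy is mainly of theoretical interest.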

CASE strategy

For this strategy, the "case" programming construct available in SQL can be used. The case statement returns a value selected from a set of values based on Boolean expressions. From a relational database theory point of view this is equivalent to doing a simple projection/aggregation query where each non-key value is given by a function that returns a number based on some conjunction of conditions [1]. Two basic sub-strategies are proposed to compute FH. In a similar manner to SPJ, the first one aggregates directly from F, and the second one computes the vertical aggregation in a temporary table FV and then computes the horizontal aggregations indirectly from FV.

The direct aggregation strategy is as follows. Horizontal aggregation queries can be evaluated by directly aggregating from F and transposing rows at the same time to produce FH. First, the unique combinations of Dh, ..., Dk that define the matching Boolean expression for each result column must be obtained. The SQL code to compute horizontal aggregations directly from F is as follows:

Observe that Vagg() is a standard SQL aggregation that has a "case" statement as argument. Horizontal aggregations need to set the result to null when there are no qualifying rows for the specific horizontal group, to be consistent with the SPJ strategy and also with the extended relational model.

PIVOT strategy

It is possible to express pivoting in standard SQL, though the syntax is cumbersome and its performance is generally poor. One method to express pivoting uses scalar sub-queries in the projection list. Each pivoted column is created through a separate (but nearly identical) sub-query. For database systems that do not support PIVOT, users could employ this technique to perform pivoting operations.

V. POSSIBLE PIVOT SYNTAX

Alas, this approach has limitations that restrict the power of pivoting. Each column has redundant syntax, which becomes cumbersome as the number of pivoted columns increases.
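
The CASE strategy and the scalar sub-query formulation described above can be compared side by side. The sketch below uses Python's standard sqlite3 module; the table F, its columns and its rows are hypothetical, and the queries are illustrations written for this example rather than the code from the cited work.

```python
import sqlite3

# Hypothetical fact table F; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE F (cst_id INTEGER, prod_code TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO F VALUES (?, ?, ?)",
    [(1, "A", 10), (1, "B", 20), (1, "A", 5), (2, "A", 7), (2, "C", 3), (3, "B", 4)],
)

# CASE strategy: one pass over F, one aggregate with a CASE argument per result column.
# ELSE NULL keeps the result NULL when no row qualifies for a horizontal group.
case_sql = """
SELECT cst_id,
       SUM(CASE WHEN prod_code = 'A' THEN qty ELSE NULL END) AS a_sum_qty,
       SUM(CASE WHEN prod_code = 'B' THEN qty ELSE NULL END) AS b_sum_qty,
       SUM(CASE WHEN prod_code = 'C' THEN qty ELSE NULL END) AS c_sum_qty
FROM F
GROUP BY cst_id
ORDER BY cst_id
"""

# Scalar sub-query pivoting: one nearly identical sub-query per pivoted column.
subquery_sql = """
SELECT k.cst_id,
       (SELECT SUM(f.qty) FROM F AS f
         WHERE f.cst_id = k.cst_id AND f.prod_code = 'A') AS a_sum_qty,
       (SELECT SUM(f.qty) FROM F AS f
         WHERE f.cst_id = k.cst_id AND f.prod_code = 'B') AS b_sum_qty,
       (SELECT SUM(f.qty) FROM F AS f
         WHERE f.cst_id = k.cst_id AND f.prod_code = 'C') AS c_sum_qty
FROM (SELECT DISTINCT cst_id FROM F) AS k
ORDER BY k.cst_id
"""

case_rows = conn.execute(case_sql).fetchall()
subquery_rows = conn.execute(subquery_sql).fetchall()

for row in case_rows:
    print(row)
print("same result:", case_rows == subquery_rows)
```

Both queries return the same rows, but the sub-query form repeats the table reference and the join predicate for every pivoted column, which is the redundancy noted in the text.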
The sub-query syntax is also potentially hard to optimize. The query optimizer is presented with a number of sub-queries, making it harder to identify that the whole operation represents a pivot on a single table. In practice, this is not an easy operation, making pivot-specific optimizations very difficult. The common problem is that the intent of the query is difficult to infer from the syntax or from the common relational algebra representation. Therefore, we propose the following syntax for PIVOT as an additional option within the ANSI SQL grammar. This syntax is easier to read and better captures the intent of the desired operation. Repetition is eliminated, making queries easier to read, write, and maintain. This approach also enables additional query optimization techniques.

We introduce two new data manipulation operators, Pivot and Unpivot, for use inside the Relational Database Management System. These improve many existing user scenarios and enable several new ones. Further, this paper outlines the basic syntactic, semantic, and implementation issues necessary to add this functionality to an existing Relational Database Management System based on algebraic, cost-based optimization and algebraic data flow execution. Pivot is an extension of GROUP BY with unique restrictions and optimization opportunities, which makes it easy to implement incrementally on top of existing grouping implementations. Finally, we present a number of algebraic transformations useful in an implementation of Pivot and Unpivot.

Consider the following table:

TABLE I
S.No.  CST_ID  PROD_CODE  QTY
[row values not recoverable from the transcription]

In the above table S.No. represents the serial number, CST_ID the customer id, PROD_CODE the product code and QTY the quantity. Consider a sample query for the table that calculates the sum of the column QTY over the rows:

    SELECT *
    FROM (SELECT cst_id, prod_code, qty FROM PIVOT_TEST)
    PIVOT (SUM(qty) AS SUM_QTY
           FOR (prod_code) IN ('A' AS a, 'B' AS b, 'C' AS c, 'D' AS d))
    ORDER BY CST_ID;

The resulting horizontally aggregated table is shown in TABLE II. Before aggregation the table consists of the columns id (record number), cst_id, product code and quantity. After applying the aggregate function sum, the result is horizontally aggregated using the PIVOT operator in SQL. This provides a data set that is horizontally oriented, i.e. with rows transposed into columns. The table originally contains 26 records; after aggregation the query returns 4 rows with 4 aggregated columns. This makes the data set effective for mining. Clustering is an important step in data mining, and it can be done effectively if the data is aggregated horizontally.

There exist two DBMS limitations with horizontal aggregations: reaching the maximum number of columns in one table, and reaching the maximum column name length when columns are automatically named. To elaborate, a horizontal aggregation can return a table that exceeds the maximum number of columns in the DBMS when the columns have a large number of distinct combinations of values, or when there are multiple horizontal aggregations in the same query. The second issue is automatically generating unique column names. If there are many subgrouping columns R1, ..., Rk or the columns are of string data types, this may generate very long column names, which may exceed DBMS limits.
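
The PIVOT query and the two limits just described can be illustrated together. The sketch below, written with Python's standard sqlite3 module, generates a CASE-based emulation of the PIVOT query for an engine without a PIVOT operator and checks the generated columns against limits. The helper names build_pivot_sql and sql_literal, the limit values and the sample rows are assumptions made for this example; real limits depend on the DBMS, and the rows are not those of TABLE I.

```python
import sqlite3

# Hypothetical limits for illustration; real values depend on the DBMS.
MAX_COLUMNS = 1000
MAX_NAME_LENGTH = 30

def sql_literal(value):
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"

def build_pivot_sql(conn, table, group_col, pivot_col, value_col):
    """Generate a CASE-based query emulating PIVOT (SUM(value_col) FOR pivot_col IN (...)).

    Table names, column names and pivot values are assumed to be trusted.
    """
    values = [row[0] for row in conn.execute(
        f"SELECT DISTINCT {pivot_col} FROM {table} ORDER BY {pivot_col}")]
    names = [f"{v}_SUM_{value_col.upper()}" for v in values]
    # Limit 1: the grouping column plus one column per distinct pivot value.
    if len(names) + 1 > MAX_COLUMNS:
        raise ValueError("result would exceed the maximum number of columns")
    # Limit 2: automatically generated names may become too long.
    too_long = [n for n in names if len(n) > MAX_NAME_LENGTH]
    if too_long:
        raise ValueError(f"generated column names are too long: {too_long}")
    columns = ",\n       ".join(
        f'SUM(CASE WHEN {pivot_col} = {sql_literal(v)} THEN {value_col} END) AS "{n}"'
        for v, n in zip(values, names)
    )
    return (f"SELECT {group_col},\n       {columns}\n"
            f"FROM {table}\nGROUP BY {group_col}\nORDER BY {group_col}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PIVOT_TEST (cst_id INTEGER, prod_code TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO PIVOT_TEST VALUES (?, ?, ?)",
    [(1, "A", 10), (1, "B", 20), (2, "A", 5), (2, "D", 1),
     (3, "C", 7), (4, "B", 2), (4, "D", 8)],
)

cursor = conn.execute(build_pivot_sql(conn, "PIVOT_TEST", "cst_id", "prod_code", "qty"))
column_names = [d[0] for d in cursor.description]
result = cursor.fetchall()

print(column_names)
for row in result:
    print(row)
```

Because the column list is derived from the distinct values of prod_code, both the column count and the name lengths are known before the query runs, so a limit violation can be reported up front instead of failing inside the DBMS.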
However, these are not important limitations, because a large number of dimensions is likely to correspond to a sparse matrix (having many zeroes or nulls) on which it will be difficult or impossible to compute a data mining model. Also, the column name length issue can be solved as explained below.

The problem of exceeding the maximum number of columns can be solved by vertically partitioning the horizontally aggregated table so that each partition table does not exceed the maximum number of columns allowed by the DBMS. Evidently, each partition table must have L1, ..., Lj as its primary key. The column name length issue can be solved by generating column identifiers with integers and creating a dimension description table that maps identifiers to full descriptions, but the meaning of each dimension is then lost. An alternative is the use of abbreviations, which may require manual input.

VI. CONCLUSION

We introduced a new class of extended aggregate functions, called horizontal aggregations, which help prepare data sets for data mining [1].

Basically, a horizontal aggregation returns a set of numbers instead of a single number for each group, resembling a multidimensional vector. We proposed an abstract, but minimal, extension to SQL standard aggregate functions to compute horizontal aggregations, which just requires specifying subgrouping columns inside the aggregation function call. From a query optimization perspective, we proposed three query evaluation methods. Our proposed horizontal aggregations can be used as a database method to automatically generate efficient SQL queries with three sets of parameters: grouping columns, subgrouping columns, and aggregated column.

REFERENCES

[1] Ordonez, C. and Zhibo Chen. 2012. Horizontal Aggregation in SQL to prepare Data Sets for Data Mining Analysis. IEEE Transactions on Knowledge and Data Engineering (TKDE), pages
[2] Ordonez, C. Data Set Preprocessing and Transformation in a Database System, Intelligent Data Analysis, vol. 15, no. 4, pp
[3] Ordonez, C. Horizontal Aggregations for Building Tabular Data Set, Proc. Ninth ACM SIGMOD Workshop Data Mining and Knowledge Discovery (DMKD 04), pp
[4] Ordonez, C. Vertical and horizontal percentage aggregations. In Proc. ACM SIGMOD Conference, pages
[5] Dontu.Jagannadh, Gayathri, T. and Nagendranadh, M.V.S.S. Horizontal aggregations for mining relational databases. International Journal of Computer Science and Information Technologies, Vol. 3 (2), pages
[6] Nisha, S. and Lakshmipathi, B. Optimization of horizontal aggregation in SQL by using K-means algorithm.
[7] Umamaheswari, Mahesh, B. Horizontal Layout Preparation Using Automatic Machine Learning Algorithms, ISSN:
[8] Cunningham, C., Graefe, G. and Galindo-Legaria, C.A. PIVOT and UNPIVOT: Optimization and execution strategies in an RDBMS. In Proc. VLDB Conference, pages
[9]
[10]

TABLE II
CST_ID  A_SUM_QTY  B_SUM_QTY  C_SUM_QTY  D_SUM_QTY


More information

Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services

Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services Course 6234A: Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services Course Details Course Outline Module 1: Introduction to Microsoft SQL Server Analysis Services This module introduces

More information

Querying Microsoft SQL Server 2008/2012

Querying Microsoft SQL Server 2008/2012 Querying Microsoft SQL Server 2008/2012 Course 10774A 5 Days Instructor-led, Hands-on Introduction This 5-day instructor led course provides students with the technical skills required to write basic Transact-SQL

More information

Naive Bayes Classifiers Programmed in Query Language

Naive Bayes Classifiers Programmed in Query Language 166 Naive Bayes Classifiers Programmed in Query Language 1 Y.V. Siddartha Reddy, 2 Dr.Supreethi K.P 1 Student M.Tech, 2 Assistant Professor, JNTU Hyderabad, yisddarhareddy@gmail.com, supreethi.pujari@gmail.com

More information

Further GroupBy & Extend Operations

Further GroupBy & Extend Operations Slide 1 Further GroupBy & Extend Operations Objectives of the Lecture : To consider whole relation Grouping; To consider the SQL Grouping option Having; To consider the Extend operator & its implementation

More information


