Constructing Horizontal layout and Clustering Horizontal layout by applying Fuzzy Concepts for Data mining Reasoning


International Journal Of Engineering And Computer Science ISSN: Volume 4 Issue 1 January 2015, Page No

Kalluri N V Satya Naresh, Divya Vani.Y
divyasudha99@gmail.com
Shri Vishnu Engineering College for Women, Bhimavaram, Andhra Pradesh, India

Abstract: Clustering is one of the significant tasks in data mining; it benefits many users by supporting analysis and decision making. This paper introduces a quick and efficient way to construct a horizontal layout and to use it directly in data mining algorithms such as clustering. Preparing a data set for analysis in a data mining project is a time-consuming, laborious task, so horizontal layouts are created and stored in the database, which removes the burden of repeating data preprocessing in data mining projects. The vertical layouts produced by vertical aggregations in SQL are unsuitable for data mining algorithms, so horizontal aggregations are used to create horizontal layouts. A horizontal layout is preferable to a vertical layout because normal SQL (Structured Query Language) aggregations return only one column per aggregated group, whereas horizontal layouts return many values per aggregated group or row, which makes them useful for data mining algorithms. Horizontal aggregations are evaluated with the CASE and SPJ methods to create horizontal layouts efficiently. This paper shows that a horizontal layout can be created more easily with the CASE method than with the SPJ method. Preparing a data set for clustering takes considerable time and effort, so the created horizontal layout is used for clustering directly, without that overhead.
Because uncertainty is a key feature of real data, the horizontal layout is clustered using soft computing concepts such as fuzzy sets. The clustered data is useful to users for analysis and decision making, and the whole process is explained with examples and experimental results.

Keywords: Horizontal Aggregation, Horizontal layout, Vertical layout, Vertical Aggregation, Data mining algorithms, Clustering, Fuzzy Concepts.

1. Introduction: Horizontal layouts are very helpful in data mining algorithms, so this paper describes how to create a horizontal layout with little effort and how to cluster it while handling imprecise data.

Kalluri N V Satya Naresh, IJECS Volume 4 Issue 1 January, 2015 Page No Page 10028

Preparing a data set for a data mining project is generally a very time-consuming process. The vertical layouts produced by normal SQL aggregation functions (vertical aggregations) are unsuitable for data mining tasks: they consist of a large number of rows, which is not I/O (input/output) efficient. To solve the problem of preparing data sets, horizontal aggregations are adopted to create a horizontal layout easily. Horizontal layouts are more I/O efficient than vertical layouts for data mining algorithms such as classification, regression analysis, PDA and clustering. A stored horizontal layout avoids the burden of building data sets through a data preprocessing phase and a data set creation phase with complex SQL queries. Vertical layouts built with normal SQL functions are limited for data mining because they return only one column per aggregated group or row; horizontal layouts are instead created with functions called horizontal aggregations, which return many columns or values per aggregated group or row. Horizontal aggregations have several advantages, one being that their SQL code can be generated automatically, and in this paper they are evaluated using the SPJ and CASE methods. The paper shows by example that creating a horizontal layout is easier and more time efficient with the CASE method than with the SPJ method. The horizontal layout created in advance is then used directly for clustering, without any further preprocessing, which saves time and effort. The horizontal layout is clustered using fuzzy concepts, which handle the impreciseness and vagueness of data. Data aggregation is the process in which information is gathered, expressed in summary form and reused for analysis, for example demographic analysis.
General aggregation aims to obtain more information about particular groups in the data based on specific variables such as gender, name, age, address, profession, phone number or income. Most data mining algorithms require a data set in horizontal layout as input, because a horizontal layout returns several values per aggregated row instead of one value per aggregated row. A new class of aggregate functions has been proposed that aggregates numeric expressions and transposes the results to return a table in horizontal layout. Functions of this class are called horizontal aggregations. Horizontal aggregations are an extended form of traditional SQL aggregations: they return a set of values or columns in a horizontal layout per aggregated row or group instead of a single column or value per aggregated row. Several operators and functions are available to compute aggregations in SQL. Sum is the most widely used aggregation of a column; other aggregation operators return the row count, maximum, average and minimum over groups of rows. All of these existing operators have limitations when they are used to build large data sets for data mining purposes. Database schemas for OLTP (online transaction processing) need to be highly normalized, but data mining, machine learning and statistical algorithms generally require aggregated data in summarized form. Data mining algorithms take cross-tabular (horizontal) data as input, so significant effort is required to compute the aggregations. Overall, creating a data set for a data mining project is a very time-consuming process.
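The vertical aggregation described above can be illustrated with a minimal sketch. It assumes Python's standard `sqlite3` module and a hypothetical table `sales(storeid, day, amt)` with made-up rows; none of these names come from the paper. It only shows that a standard `SUM ... GROUP BY` returns one aggregated column and one row per group.

```python
import sqlite3

# Hypothetical sample rows (not from the paper): (store id, day of week, amount).
ROWS = [
    (1, "Mon", 10.0), (1, "Mon", 5.0), (1, "Tue", 7.0),
    (2, "Mon", 3.0), (2, "Wed", 4.0),
]

def vertical_layout(rows):
    """Return SUM(amt) per (storeid, day): one aggregated column, many rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (storeid INTEGER, day TEXT, amt REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT storeid, day, SUM(amt) FROM sales "
        "GROUP BY storeid, day ORDER BY storeid, day"
    ).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    for row in vertical_layout(ROWS):
        print(row)
```

Each store appears once per day in this output, which is why a vertical layout grows in rows rather than columns.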
Horizontal layouts are I/O and time efficient for data mining algorithms such as classification, regression analysis, PDA and clustering, and they can avoid the burden of creating data sets by performing a data preprocessing

phase and a data set creation phase with complex SQL queries. Vertical layouts created with normal SQL functions are limited for data mining algorithms because they return only one column per aggregated group or row, so the horizontal layout is created with functions called horizontal aggregations, which return many columns or values per aggregated group or row instead of one value per row. Horizontal aggregations have several advantages, such as allowing SQL code to be generated automatically, and they are evaluated using the SPJ and CASE methods. A horizontal aggregation is an advanced class of function that returns the aggregated attributes or columns in a horizontal layout. Most algorithms require data sets in horizontal layout as input. Managing data sets without the support of a DBMS is a difficult task. Inside a relational database, trying different subsets of dimensions and data points is easier, faster and more flexible than working outside it with an alternative tool. Like select, project and join, a horizontal aggregation can be performed by an operator, and it is best implemented inside the query processor. Soft computing and its tools are increasingly used in both everyday and advanced applications. In real applications data uncertainty is a prominent feature, and because hard computing cannot handle vague and uncertain data, soft computing is used. Zadeh introduced the notion of graded membership through the concept of the fuzzy set, which captures impreciseness in data and generalizes the characteristic function of a set. Clustering, the most important unsupervised learning problem, deals with discovering structure in a collection of unlabeled data. Fuzzy clustering algorithms are used to cluster inexact and imprecise data. In the K-Means algorithm the number of clusters is fixed and every element belongs to exactly one cluster; in the Fuzzy C-Means algorithm the number of clusters is also fixed, but every element can belong to several clusters with different membership values.
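The difference between hard assignment and graded membership can be sketched for a single one-dimensional value. The sketch below is illustrative only: the function names, the sample centers and the fuzzifier value `m = 2.0` are assumptions, and the membership formula is the standard Fuzzy C-Means one rather than anything specific to this paper's implementation.

```python
def hard_assignment(x, centers):
    """Index of the nearest center: the point belongs to exactly one cluster."""
    return min(range(len(centers)), key=lambda i: abs(x - centers[i]))

def fuzzy_memberships(x, centers, m=2.0):
    """Graded membership of x in every cluster; the values sum to 1.

    m is the fuzzifier (assumed to be > 1); m = 2.0 is only a common choice.
    """
    dists = [abs(x - c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    if 0.0 in dists:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]

if __name__ == "__main__":
    centers = [0.0, 10.0]          # hypothetical cluster centers
    print(hard_assignment(2.0, centers))
    print(fuzzy_memberships(2.0, centers))
```

A value close to one center gets a membership near 1 for that cluster and a small but non-zero membership for the other, whereas hard assignment discards that information.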
Because a horizontal layout can be used directly by data mining algorithms, we use it here for clustering, one of the most important tasks in data mining. The horizontal layout is clustered with the Fuzzy C-Means algorithm.

2. Literature Review: A database is an organized collection of data that models relevant aspects of reality in a way that supports processes requiring information. A Database Management System (DBMS) is a specially developed software application that interacts with applications, users and the database to capture and analyze data. A DBMS is designed to let users define, create, update, query and administer databases. Well-known DBMSs include MySQL, PostgreSQL, MariaDB, SQLite, Oracle, Microsoft SQL Server, dBase, SAP HANA, FoxPro, LibreOffice Base, IBM DB2 and FileMaker Pro. The SELECT statement is used to select data from a database. Projection is the selection of the table columns that should appear in the resulting table or data set. An SQL join builds a table from two or more tables by combining their rows based on a common field. A left outer join returns all the rows from the left table together with the matching rows from the right table. An aggregation function combines the values of multiple rows into a single value based on a certain condition. The most commonly used aggregation functions are average(), maximum(), mode(), median(), count(), minimum() and sum(). These normal SQL aggregation functions are also called vertical aggregation functions and are used to create a vertical layout. The GROUP BY clause gathers all the rows that contain

the same data in the selected columns and allows aggregation functions to operate on one or more columns. Data mining is the process of extracting knowledge from data. Data from various sources is collected and stored in data warehouses; data mining functionalities are then applied to the preprocessed data, giving results in a form users can understand. Data cleaning, transformation, reduction, regression analysis, association rule generation, classification, clustering and outlier analysis are all data mining tasks. Among these functionalities, this paper deals with clustering. Data clustering is the technique of partitioning a data set into distinct clusters according to the similarity of its elements: elements with similar features are kept in a single cluster, whereas dissimilar elements are kept in different clusters. In 1965 Zadeh introduced the concept of the fuzzy set. A fuzzy set is associated with a membership function and is designed to deal with imprecise data. A fuzzy set $A$ in a universe $X$ is defined by its membership function $\mu_A : X \to [0, 1]$; that is, every $y \in X$ is associated with a real number $\mu_A(y)$, called the membership value of $y$ in $A$, which satisfies $0 \le \mu_A(y) \le 1$. To cluster data while handling impreciseness, the fuzzy set concept is applied: the created horizontal layouts are clustered, and the clustered data is useful to users for analysis and decision making.

2.2 Need For Creating Horizontal Layout: A horizontal layout mainly eases the burden of data mining projects, because building data sets in the data preparation phase takes a lot of time and effort. Horizontal layouts can be used directly as input data sets by data mining algorithms such as classification, regression analysis, clustering and PDA, without preparing the data sets from the data tables again.
Common SQL aggregation functions such as min, avg, sum and max can be used to create a vertical layout. Vertical layouts produced with these standard SQL aggregation functions are not I/O efficient for data mining algorithms, because they generate only one column per aggregated group and a large number of rows. A horizontal layout is therefore needed, with many columns per aggregated group, i.e. many values per row. Such a layout can be produced with functions called horizontal aggregations. Data mining tools can generate the required SQL code automatically. Horizontal aggregations can be evaluated with methods such as CASE and SPJ.

2.3 Advantages of creating horizontal layouts using horizontal aggregations and clustering them:
(.) In data mining tools SQL code can be generated automatically, because a horizontal aggregation provides a template that makes it possible to write, optimize and test SQL queries for correctness automatically.
(.) SQL queries generated automatically tend to be more efficient than queries written by an end user.
(.) The data set produced by horizontal aggregations can be created directly in the database.
(.) The horizontal layouts created can be given directly as input to data mining algorithms such as classification, regression analysis, clustering and PDA.
(.) The clustered data obtained by clustering the horizontal layout is more useful to users for analysis and decision making.

2.4 Definitions: T is a database table with primary key P, discrete columns C1, C2, ..., Ci and one numeric column N; it is written T(P, C1, C2, ..., Ci, N). In OLAP terms, T is the fact table with primary key P, i dimensions and measure column N, where M is the size of the table and C1, C2, ..., Ci are foreign keys in the fact table and primary keys in the lookup tables. T is the input table; executing SQL queries on it creates the tables TV and TH, where TV is the vertical layout and TH is the horizontal layout. The goal of horizontal aggregations is to convert the vertical layout into the horizontal layout. As an example, consider the following table T with primary key P, discrete columns C1 and C2 and numeric column N.

Table 2.1 Database Table

Consider the query: Select C1, C2, Sum(N) from T group by C1, C2 order by C1, C2

Table 2.2 Vertical layout

Table 2.2 is the output of the above query, which uses the SQL aggregation function sum; it is called a vertical layout. Because this vertical layout has only one aggregated column and both C1 and C2 act as the primary key, it is not useful as input to data mining algorithms, so a horizontal tabular layout is required. Table 2.3 below is a horizontal layout with two aggregated columns and one primary key, which is suitable as input to data mining tasks or algorithms.

Table 2.3 A Horizontal layout Table

3. Methodology

3.1 Horizontal Aggregations: Horizontal aggregations help when the user wants the output in horizontal form, or wants to combine a vertical layout with aggregations that depend on the grouping columns. Because vertical layouts are not very convenient for data mining

algorithms, horizontal layouts are created using horizontal aggregations. Horizontal aggregations convert the vertical layout into a horizontal layout by transposing the aggregation column N into a list of columns, one for each combination of the transposing columns Y1, ..., Yk. Consider an SQL query that takes X1, ..., Xm as a subset of C1, ..., Cp. The syntax for creating the vertical layout is:

Select X1, ..., Xm, sum(N) from T group by X1, ..., Xm

This query produces a vertical layout data set with m+1 columns, where the m columns X1, ..., Xm act as the primary key and sum(N) is the only aggregated column. To convert the vertical layout into a horizontal layout, horizontal aggregation functions are used. The syntax for creating the horizontal layout is:

SELECT X1, ..., Xj, Ha(N BY Y1, ..., Yk) FROM T GROUP BY X1, ..., Xj

As a concrete example, consider a stores database that keeps store information in the table transaction, which has the columns strid, deptid, date, month, year, day, rate, qty, totalsales, itemqty and costamt. Suppose we want to find the total sales of each store (strid) for each day of the week. The normal SQL statement for this query is:

Select strid, day, sum(totalsales) from transaction group by strid, day order by strid, day

This gives a vertical layout like the one shown below.

The basic requirement of horizontal aggregations is to transpose the aggregated column N by a list of columns Y1, ..., Yk, where Y1, ..., Yk are a subset of the columns X1, ..., Xm and k < m. Generating SQL code for a horizontal aggregation therefore takes four input parameters, T, X1, ..., Xm, N and Y1, ..., Yk, where T is the input table, X1, ..., Xm are the grouping columns, N is the aggregated column and Y1, ..., Yk are the transposing columns. The framework for horizontal aggregation is similar to the framework for vertical aggregation.
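Because `Ha(N BY Y1, ..., Yk)` is not standard SQL, a tool has to turn the four input parameters into ordinary SQL. The following is a minimal sketch of such a generator, not the authors' tool: the function name `horizontal_aggregation_sql` and the column alias scheme are hypothetical, it handles a single transposing column only, and it emits a CASE-based query.

```python
def horizontal_aggregation_sql(table, group_cols, agg_col, by_col, by_values,
                               func="SUM"):
    """Build a CASE-based SELECT that emulates func(agg_col BY by_col).

    Assumptions: one transposing column, and table/column names that are
    trusted identifiers (they are interpolated into the SQL text unchanged).
    """
    select_items = list(group_cols)
    for value in by_values:
        # Quote the transposed value as an SQL string literal.
        literal = "'" + str(value).replace("'", "''") + "'"
        select_items.append(
            f"{func}(CASE WHEN {by_col} = {literal} THEN {agg_col} ELSE NULL END) "
            f'AS "{agg_col}_{value}"'
        )
    return (
        f"SELECT {', '.join(select_items)} FROM {table} "
        f"GROUP BY {', '.join(group_cols)}"
    )

if __name__ == "__main__":
    print(horizontal_aggregation_sql(
        "sales", ["strid"], "totalsales", "day", ["Mon", "Tue"]))
```

In practice the list of transposed values would come from a `SELECT DISTINCT` on the transposing column, so the number of generated columns follows the data.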
The horizontal aggregation function is denoted Ha(N BY Y1, ..., Yk), where Ha is a standard SQL aggregation function, N is the aggregation column and Y1, ..., Yk are the transposing columns. The standard SQL (vertical) aggregation function is extended with the BY clause, which transposes the aggregation column N over the list of transposing columns Y1, ..., Yk and thereby produces a horizontal layout instead of a vertical layout.

Fig Vertical layout created by using vertical aggregations

This vertical layout is not useful for data mining tasks, because it has only one aggregated column and both strid and day of week act as the primary key, so it returns many records. So, by using horizontal aggregations, a

horizontal layout is created with many aggregated columns and only strid as the primary key. The SQL syntax with horizontal aggregations is:

Select strid, sum(totalsales BY day) from transaction group by strid

Fig Horizontal layout created by using Horizontal aggregations

3.2 Creation and Clustering Horizontal Layout
This paper covers the creation of the horizontal layout with the CASE and SPJ methods and clusters the resulting horizontal layout with the Fuzzy C-Means algorithm. An example with results is also given. Horizontal layouts can be created by the CASE, SPJ and PIVOT methods. PIVOT and CASE give the same result with almost the same time complexity, and CASE has better time complexity than SPJ, so we use only the CASE and SPJ methods in our process; both give the same result with different time complexities. Creating and clustering horizontal layouts is done in three modules. The proposed system architecture is as follows:

Fig 3.2 System Architecture

Module 1 (Selection Process)
We select the table from the database and then the columns to group by, aggregate and transpose for the horizontal layout:
Select the group by columns X1, ..., Xj
Select the aggregate column N
Select the transposing columns Y1, ..., Yk

Module 2 (Creation of Horizontal Layout)
In this module horizontal layouts are created using the SPJ and CASE methods.

SPJ Method: In this method we aggregate the column horizontally with the help of the SPJ (Select, Project, Join) method. The basic idea is to create one table with a vertical aggregation for each result column, and then join all those tables to produce TH. We aggregate from T into d projected tables with d Select-Project-Join-Aggregation queries (selection, projection, join, aggregation).
Each table TI (I = 1, ..., d) corresponds to one subgrouping combination and has {X1, ..., Xj} as primary key and an aggregation on N as

the only non-key column. An additional table T0 must be introduced; it is outer joined with the projected tables to get a complete result set.

Three main steps in the SPJ method to create the horizontal layout:
(.) First, table T0 is created with the distinct combinations of the group by columns X1, ..., Xj.
(.) For each unique combination of the transposing columns Y1, ..., Yk, tables T1, ..., Td are created.
(.) Lastly, table T0 is left outer joined with each table T1 to Td.

How these tables are created is explained below. Table T0 defines the number of result rows and builds the primary key. T0 is populated so that it contains every existing combination of X1, ..., Xj. Table T0 has X1, ..., Xj as primary key and has no non-key column.

INSERT INTO T0 SELECT DISTINCT X1, ..., Xj FROM T

Next, tables T1 to Td are created. Tables T1, ..., Td contain the individual aggregations for each combination of Y1, ..., Yk. The primary key of each table T1, ..., Td is X1, ..., Xj and N is the aggregated column. The aggregation can be computed from T or from the vertical layout TV.

INSERT INTO TI SELECT X1, ..., Xj, V(N) FROM T WHERE Y1 = v1I AND ... AND Yk = vkI GROUP BY X1, ..., Xj

Each table TI thus aggregates only those rows that correspond to the I-th unique combination of Y1, ..., Yk, given by the WHERE clause. A possible optimization is synchronizing table scans to compute the d tables in one pass. Finally, to get TH we need d left outer joins between T0 and the d tables, so that all individual aggregations are properly assembled as a set of d dimensions for each group. Outer joins set result columns to null for combinations that are missing for the given group. In general, null should be the default value for groups with missing combinations. We believe it would be incorrect to set the result to zero or some other number by default if there are no qualifying rows. Such an approach should be considered on a per-case basis.

INSERT INTO TH SELECT T0.X1, T0.X2, ..., T0.Xj, T1.N, T2.N, ..., Td.N FROM T0 LEFT OUTER JOIN T1 ON T0.X1 = T1.X1 AND ... AND T0.Xj = T1.Xj LEFT OUTER JOIN T2 ON T0.X1 = T2.X1 AND ... AND T0.Xj = T2.Xj ... LEFT OUTER JOIN Td ON T0.X1 = Td.X1 AND ... AND T0.Xj = Td.Xj

Real Time Example for SPJ method: Consider a database with stores information, in which Transaction is a table with the columns StoreId, DepId, Date, Month, Year, Day, ItemId, Rate, Qty and Amt. Suppose we want to find the total sales amount of each store for each day of the week. The following queries are computed to construct the horizontal layout using the SPJ method:

Query1: INSERT INTO F0 SELECT DISTINCT storeid FROM Transaction;
Query2: INSERT INTO F1 SELECT storeid, sum(amt) AS totalsalesamt FROM Transaction WHERE Day = 'Mon' GROUP BY storeid;

Query3: INSERT INTO F2 SELECT storeid, sum(amt) AS totalsalesamt FROM Transaction WHERE Day = 'Tue' GROUP BY storeid;
Query4: INSERT INTO F3 SELECT storeid, sum(amt) AS totalsalesamt FROM Transaction WHERE Day = 'Wed' GROUP BY storeid;
Query5: INSERT INTO F4 SELECT storeid, sum(amt) AS totalsalesamt FROM Transaction WHERE Day = 'Thu' GROUP BY storeid;
Query6: INSERT INTO F5 SELECT storeid, sum(amt) AS totalsalesamt FROM Transaction WHERE Day = 'Fri' GROUP BY storeid;
Query7: INSERT INTO F6 SELECT storeid, sum(amt) AS totalsalesamt FROM Transaction WHERE Day = 'Sat' GROUP BY storeid;
Query8: INSERT INTO F7 SELECT storeid, sum(amt) AS totalsalesamt FROM Transaction WHERE Day = 'Sun' GROUP BY storeid;
Query9: INSERT INTO FH SELECT F0.storeid, F1.totalsalesamt AS Mon_amt, F2.totalsalesamt AS Tue_amt, F3.totalsalesamt AS Wed_amt, F4.totalsalesamt AS Thu_amt, F5.totalsalesamt AS Fri_amt, F6.totalsalesamt AS Sat_amt, F7.totalsalesamt AS Sun_amt FROM F0 LEFT OUTER JOIN F1 ON F0.storeid = F1.storeid LEFT OUTER JOIN F2 ON F0.storeid = F2.storeid LEFT OUTER JOIN F3 ON F0.storeid = F3.storeid LEFT OUTER JOIN F4 ON F0.storeid = F4.storeid LEFT OUTER JOIN F5 ON F0.storeid = F5.storeid LEFT OUTER JOIN F6 ON F0.storeid = F6.storeid LEFT OUTER JOIN F7 ON F0.storeid = F7.storeid;

Evaluating the above queries gives the desired horizontal layout, but it takes a lot of effort, because many subqueries must be written and many join operations performed. For the same request, a single query is enough to create the vertical layout: select storeid, day, sum(amt) from Transaction group by storeid, day. To create the horizontal layout, however, we write 9 queries; to reduce this effort and the time complexity, the CASE method can be used to create the horizontal layout easily.

CASE Method: In this module we aggregate the column horizontally using the CASE method.
The CASE statement returns a value selected from a set of values based on Boolean expressions. From a relational database theory point of view this is equivalent to a simple projection/aggregation query in which each non-key value is given by a function that returns a number based on some conjunction of conditions. Unlike SPJ, which builds intermediate tables, this method aggregates directly from T: horizontal aggregation queries are evaluated by aggregating from T and transposing rows at the same time to produce TH. First, we need to get the unique combinations of Y1, ..., Yk that define the matching Boolean expression for each result column. In the SQL code that computes horizontal aggregations directly from T, V() is a standard (vertical) SQL aggregation that has a CASE statement as argument. Horizontal aggregations need to set the result to null when there are no qualifying rows for the specific horizontal group, to be consistent with the SPJ method and also with the extended relational model.
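The two evaluation strategies can be compared side by side in a small sketch. It assumes Python's standard `sqlite3` module and a hypothetical table `trans(storeid, day, amt)` with made-up rows, and it transposes over two days only to stay short. The SPJ variant follows the three steps described earlier (key table, one aggregated table per day, left outer joins); the CASE variant uses a single query. Both are expected to return null (`None` in Python) where a store has no rows for a day.

```python
import sqlite3

# Hypothetical rows (not from the paper). Store 2 has no sales on 'Tue'.
ROWS = [(1, "Mon", 10.0), (1, "Tue", 7.0), (1, "Mon", 5.0), (2, "Mon", 3.0)]
DAYS = ["Mon", "Tue"]

def load(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trans (storeid INTEGER, day TEXT, amt REAL)")
    conn.executemany("INSERT INTO trans VALUES (?, ?, ?)", rows)
    return conn

def spj_layout(conn, days):
    """SPJ: key table F0, one aggregated table per day, then left outer joins."""
    conn.execute("CREATE TEMP TABLE F0 (storeid INTEGER)")
    conn.execute("INSERT INTO F0 SELECT DISTINCT storeid FROM trans")
    select, joins = ["F0.storeid"], []
    for i, day in enumerate(days, start=1):
        conn.execute(f"CREATE TEMP TABLE F{i} (storeid INTEGER, total REAL)")
        conn.execute(
            f"INSERT INTO F{i} SELECT storeid, SUM(amt) FROM trans "
            "WHERE day = ? GROUP BY storeid", (day,))
        select.append(f"F{i}.total")
        joins.append(f"LEFT OUTER JOIN F{i} ON F0.storeid = F{i}.storeid")
    sql = (f"SELECT {', '.join(select)} FROM F0 {' '.join(joins)} "
           "ORDER BY F0.storeid")
    return conn.execute(sql).fetchall()

def case_layout(conn, days):
    """CASE: a single aggregation query with one CASE expression per day."""
    cols = ", ".join(
        "SUM(CASE WHEN day = ? THEN amt ELSE NULL END)" for _ in days)
    sql = f"SELECT storeid, {cols} FROM trans GROUP BY storeid ORDER BY storeid"
    return conn.execute(sql, days).fetchall()

if __name__ == "__main__":
    conn = load(ROWS)
    print(spj_layout(conn, DAYS))
    print(case_layout(conn, DAYS))
```

The SPJ variant issues 2d + 3 statements for d transposed values in this sketch, while the CASE variant issues one, which mirrors the effort argument made in the text.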

The SQL syntax for the CASE method is given below, where T is the original table and TH is the horizontal layout:

SELECT DISTINCT Y1, ..., Yk FROM T;
INSERT INTO TH SELECT X1, ..., Xj, V(CASE WHEN Y1 = v11 AND ... AND Yk = vk1 THEN N ELSE null END), ..., V(CASE WHEN Y1 = v1d AND ... AND Yk = vkd THEN N ELSE null END) FROM T GROUP BY X1, X2, ..., Xj;

Example: Suppose that, in a store database, we want to find the total items sold in each department of each store for each day of the week. The following query is evaluated to create the horizontal layout using the CASE method:

select StoreId, DepId, sum(CASE when Day='Fri' then Qty else null end), sum(CASE when Day='Mon' then Qty else null end), sum(CASE when Day='Sat' then Qty else null end), sum(CASE when Day='Thr' then Qty else null end), sum(CASE when Day='Tue' then Qty else null end), sum(CASE when Day='Wed' then Qty else null end) from Trans1 Group By StoreId, DepId

Module 3 (Clustering)
The main objective of this paper is to create a data set easily so that it can be used directly in data mining tasks or projects, avoiding the data preprocessing phase. The horizontal layout can be used by any data mining algorithm, so we use it directly for clustering. The previously created horizontal layout is taken as input for clustering. Suppose that, in a stores database, we want to find the stores that have the same total sales amount for each day of the week, or to cluster the stores based on total sales for each day of the week; then the previously created data set can be taken directly as input for clustering instead of creating the data set again. The clustered horizontal layout is useful for analysis and decision making. Because the Fuzzy C-Means algorithm can handle vagueness in data, it is used to cluster the horizontal layouts.

FUZZY C-MEANS ALGORITHM: In real-life situations, clustering a data set by hard c-means leads to a partition of the data set in which each element belongs to exactly one cluster. This is undesirable in many cases, so the applicability of hard c-means is limited. Fuzzy C-Means instead uses the concept of fuzzy sets, so that an element can belong to any number of clusters with different membership values. The objective function is

\[ J_m(U, v) = \sum_{k=1}^{n} \sum_{i=1}^{c} (\mu_{ik})^{m'} (d_{ik})^{2} \]

where $m'$ is a real number with $m' > 1$, called the fuzzifier, and $\mu_{ik} \in [0, 1]$ is the membership of the $k$-th pattern to $v_i$.

Algorithm:
STEP 1: Fix $c$ ($2 \le c < n$) and select a value for $m'$. Initialize the partition matrix $U^{(0)}$. For $r = 0, 1, 2, \ldots$ do:
STEP 2: Calculate the $c$ centers $\{v_i^{(r)}\}$, $i = 1, 2, \ldots, c$, using the formula

\[ v_{ij} = \frac{\sum_{k=1}^{n} \mu_{ik}^{m'} \, x_{kj}}{\sum_{k=1}^{n} \mu_{ik}^{m'}} \]

STEP 3: Update the partition matrix for the $r$-th step, $U^{(r)}$, to $U^{(r+1)} = (\mu_{ik}^{(r+1)})$, where

\[ \mu_{ik}^{(r+1)} = \left[ \sum_{j=1}^{c} \left( \frac{d_{ik}^{(r)}}{d_{jk}^{(r)}} \right)^{2/(m'-1)} \right]^{-1} \quad \text{if } I_k = \emptyset, \]

\[ \mu_{ik}^{(r+1)} = 0 \quad \text{for all } i \in \tilde{I}_k \text{ if } I_k \neq \emptyset, \]

taking $I_k = \{ i \mid 2 \le c < n;\ d_{ik}^{(r)} = 0 \}$ and $\tilde{I}_k = \{1, 2, \ldots, c\} - I_k$ (when $I_k \neq \emptyset$, the membership of pattern $k$ is shared among the classes in $I_k$ so that it sums to 1).
STEP 4: If $\lVert U^{(r+1)} - U^{(r)} \rVert \le \varepsilon_L$, STOP; else go to STEP 2.

Here $c$ denotes the number of clusters, $v$ the cluster centers, $x$ a data point, $d$ the distance between a cluster center and a data point, and $U$ the partition matrix, in which each element represents the membership value of a data point $x$ in a cluster.

4. Results: The construction of a horizontal layout with the SPJ method and the CASE method is shown using one real-time example. After the horizontal layout is created, it is taken as the input data set for clustering, and clustering is done using the Fuzzy C-Means algorithm.

Example: Consider a database with stores information. Transaction is a table in the database with the columns StoreId, DepId, Date, Month, Year, Day, ItemId, Rate, Qty and Amt. Suppose we want to find the total sales amount of each store for each day of the week.

SPJ Method: The following queries are computed to construct the horizontal layout using the SPJ method.

Query1: INSERT INTO F0 SELECT DISTINCT storeid FROM Transaction;
Query2: INSERT INTO F1 SELECT storeid, sum(amt) AS totalsalesamt FROM Transaction WHERE Day = 'Mon' GROUP BY storeid;
Query3: INSERT INTO F2 SELECT storeid, sum(amt) AS totalsalesamt FROM Transaction WHERE Day = 'Tue' GROUP BY storeid;
Query4: INSERT INTO F3 SELECT storeid, sum(amt) AS totalsalesamt FROM Transaction WHERE Day = 'Wed' GROUP BY storeid;
Query5: INSERT INTO F4 SELECT storeid, sum(amt) AS totalsalesamt FROM Transaction WHERE Day = 'Thu' GROUP BY storeid;
Query6: INSERT INTO F5 SELECT storeid, sum(amt) AS totalsalesamt FROM Transaction WHERE Day = 'Fri' GROUP BY storeid;
Query7: INSERT INTO F6 SELECT storeid, sum(amt) AS totalsalesamt FROM Transaction WHERE Day = 'Sat' GROUP BY storeid;
Query8: INSERT INTO F7 SELECT storeid, sum(amt) AS totalsalesamt FROM Transaction WHERE Day = 'Sun' GROUP BY storeid;

Query9: INSERT INTO FH SELECT F0.storeid, F1.totalsalesamt AS Mon_amt, F2.totalsalesamt AS Tue_amt, F3.totalsalesamt AS Wed_amt, F4.totalsalesamt AS Thu_amt, F5.totalsalesamt AS Fri_amt, F6.totalsalesamt AS Sat_amt, F7.totalsalesamt AS Sun_amt FROM F0 LEFT OUTER JOIN F1 ON F0.storeid=F1.storeid LEFT OUTER JOIN F2 ON F0.storeid=F2.storeid LEFT OUTER JOIN F3 ON F0.storeid=F3.storeid LEFT OUTER JOIN F4 ON F0.storeid=F4.storeid LEFT OUTER JOIN F5 ON F0.storeid=F5.storeid LEFT OUTER JOIN F6 ON F0.storeid=F6.storeid LEFT OUTER JOIN F7 ON F0.storeid=F7.storeid;

Evaluating the above queries yields the desired horizontal layout, but it takes a lot of effort, because many subqueries must be written and many join operations performed. By comparison, a vertical layout of the same data needs just one query: select storeid, day, sum(amt) from Transaction group by storeid, day. Creating the horizontal layout took nine queries, so to reduce the effort the CASE method can be used instead.

CASE Method: Suppose we want to find the total items sold in each department of each store by each day of the week. The following query is evaluated to create the horizontal layout using the CASE method.

select StoreId, DepId, sum(CASE when Day='Fri' then Qty else null end), sum(CASE when Day='Mon' then Qty else null end), sum(CASE when Day='Sat' then Qty else null end), sum(CASE when Day='Thr' then Qty else null end), sum(CASE when Day='Tue' then Qty else null end), sum(CASE when Day='Wed' then Qty else null end) from Trans1 Group By StoreId, DepId.

The results are as follows: First we need to select, from the database containing stores information, the Transaction table for which we want to create the horizontal layout. The input frame is as follows. By pressing the select table button we can select the Transaction table, and by pressing the display button the selected table is displayed as follows. After this, pressing the generate button displays the SQL CODE GENERATION frame. In this frame, pressing the view Columns button displays all the columns of the selected table.
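The difference between the two methods can be tried on a small in-memory table. The following Python sketch uses the standard sqlite3 module; the sample rows, the helper names (load_trans1, horizontal_layout_case, horizontal_layout_spj) and the choice of SQLite are assumptions made for illustration and are not part of the tool described in this paper. It builds the StoreId, DepId by day layout once with a single CASE query and once with the SPJ sequence of F0, F1, ... tables combined by left outer joins.

```python
import sqlite3

# Hypothetical sample rows (StoreId, DepId, Day, Qty); not data from the paper.
ROWS = [
    (1, 10, "Mon", 5),
    (1, 10, "Mon", 3),
    (1, 10, "Tue", 2),
    (1, 20, "Fri", 7),
    (2, 10, "Mon", 4),
    (2, 10, "Wed", 6),
]
DAYS = ["Mon", "Tue", "Wed", "Thr", "Fri", "Sat"]


def load_trans1(rows):
    """Create an in-memory Trans1 table holding the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Trans1 (StoreId INTEGER, DepId INTEGER, Day TEXT, Qty INTEGER)"
    )
    conn.executemany("INSERT INTO Trans1 VALUES (?, ?, ?, ?)", rows)
    return conn


def horizontal_layout_case(rows, days=DAYS):
    """CASE method: one query with one sum(CASE ...) column per day."""
    conn = load_trans1(rows)
    # Day names are bound as parameters, one placeholder per output column.
    case_cols = ", ".join(
        "sum(CASE WHEN Day = ? THEN Qty ELSE NULL END)" for _ in days
    )
    sql = (
        f"SELECT StoreId, DepId, {case_cols} FROM Trans1 "
        "GROUP BY StoreId, DepId ORDER BY StoreId, DepId"
    )
    result = conn.execute(sql, days).fetchall()
    conn.close()
    return result


def horizontal_layout_spj(rows, days=DAYS):
    """SPJ method: one aggregate table per day, then left outer joins."""
    conn = load_trans1(rows)
    conn.execute("CREATE TABLE F0 (StoreId INTEGER, DepId INTEGER)")
    conn.execute("INSERT INTO F0 SELECT DISTINCT StoreId, DepId FROM Trans1")
    for i, day in enumerate(days, start=1):
        conn.execute(
            f"CREATE TABLE F{i} (StoreId INTEGER, DepId INTEGER, total INTEGER)"
        )
        conn.execute(
            f"INSERT INTO F{i} SELECT StoreId, DepId, sum(Qty) FROM Trans1 "
            "WHERE Day = ? GROUP BY StoreId, DepId",
            (day,),
        )
    indexes = range(1, len(days) + 1)
    cols = ", ".join(f"F{i}.total" for i in indexes)
    joins = " ".join(
        f"LEFT OUTER JOIN F{i} ON F0.StoreId = F{i}.StoreId "
        f"AND F0.DepId = F{i}.DepId"
        for i in indexes
    )
    sql = (
        f"SELECT F0.StoreId, F0.DepId, {cols} FROM F0 {joins} "
        "ORDER BY F0.StoreId, F0.DepId"
    )
    result = conn.execute(sql).fetchall()
    conn.close()
    return result
```

On the sample rows both functions return the same three rows, with None wherever a store and department have no sales on a given day. The CASE version needs one statement, while the SPJ version creates, fills and joins one table per day.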

By selecting the columns to group by, aggregate and transpose, together with the aggregation function and the method name, and then clicking the Generate button, we get the horizontal layout as output.

The above horizontal layout output is taken as input for clustering, and clustering is performed using the fuzzy C-means algorithm. Suppose that, from the stores database, we want to find the stores that have the same total sales amount for each day of the week, or to cluster the stores based on total sales for each day of the week. Then the previously created data set can be taken directly as input for clustering instead of creating the data set again.

The clustering results are as follows: This is the input frame, where we select the data set that we want to cluster using the browse button. Here we select the previously created horizontal layout as input for clustering. The storeids are clustered using the fuzzy C-means algorithm.
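The clustering step can be sketched in a few lines of Python. This is a minimal pure-Python illustration of the fuzzy C-means iteration (membership-weighted cluster centres, the membership update of STEP 3 and the stopping test of STEP 4), not the implementation used to produce the results in this paper. The function name fuzzy_c_means, the sample daily totals, the random initialisation and the use of the largest absolute change in membership as the norm are all assumptions, and rows are assumed to contain no missing values.

```python
import math
import random

# Hypothetical horizontal layout: storeid -> total sales per day (three days).
SAMPLE_LAYOUT = {
    1: [100.0, 110.0, 105.0],
    2: [98.0, 112.0, 101.0],
    3: [400.0, 390.0, 410.0],
    4: [405.0, 395.0, 400.0],
}


def fuzzy_c_means(points, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Return (centers, U); U[i][k] is the membership of point k in cluster i."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial partition matrix whose columns each sum to 1.
    U = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(c)]
    for k in range(n):
        column_sum = sum(U[i][k] for i in range(c))
        for i in range(c):
            U[i][k] /= column_sum

    centers = []
    for _ in range(max_iter):
        # Cluster centres: means of the points weighted by membership**m.
        centers = []
        for i in range(c):
            w = [U[i][k] ** m for k in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[k] * points[k][j] for k in range(n)) / total
                 for j in range(dim)]
            )

        # Membership update (STEP 3).
        new_U = [[0.0] * n for _ in range(c)]
        for k in range(n):
            d = [math.dist(points[k], centers[i]) for i in range(c)]
            zero = [i for i in range(c) if d[i] == 0.0]
            if zero:
                # The point coincides with a centre: membership is shared
                # among those clusters and is 0 for all others.
                for i in zero:
                    new_U[i][k] = 1.0 / len(zero)
            else:
                for i in range(c):
                    new_U[i][k] = 1.0 / sum(
                        (d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c)
                    )

        # Stopping test (STEP 4), using the max-abs change as the norm.
        change = max(
            abs(new_U[i][k] - U[i][k]) for i in range(c) for k in range(n)
        )
        U = new_U
        if change <= eps:
            break
    return centers, U
```

Calling fuzzy_c_means(list(SAMPLE_LAYOUT.values()), c=2) returns two centres and a 2 x 4 membership matrix. Each store can then be assigned to the cluster in which its membership is largest, while the membership values themselves show how strongly it belongs to each cluster.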

5. Conclusion:

• Preparing a data set for a data mining project takes considerable effort and time, but a horizontal layout data set can be created easily using horizontal aggregation functions.
• It is easier to create the horizontal layout with the CASE method than with the SPJ method: the SPJ method requires computing many subqueries, whereas in the CASE method a single query is enough.
• The time complexity of the CASE method, O(N log N + dkn log n + dN), is better than the time complexity of the SPJ method, O(N log N) + dkn log n + dN, where N is the size of the input table F, n is the size of the output table (the horizontal layout), d is the number of distinct combinations of the transposing columns and k is the number of transposing columns.
• The fuzzy C-means algorithm can give better clustering results than the K-means and hard C-means algorithms, as it handles vagueness in the data.

6. Future Work:

• Other data mining algorithms, such as classification, regression analysis and decision making, can also be implemented by taking the horizontal layout as input.
• The horizontal layout can be clustered using other soft computing clustering algorithms to handle impreciseness in data.
• Missing values in the data are not handled, so the rough set concept can be used to handle missing data.
• To reduce the execution time of the clustering algorithm, it can be parallelized using OpenMP.

REFERENCES

[1] C. Ordonez and Z. Chen, Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis, IEEE Trans. Knowledge and Data Eng., vol. 24, no. 4, Apr.
[2] G. Bhargava, P. Goel, and B.R. Iyer, Hypergraph Based Reorderings of Outer Join Queries with Complex Predicates, Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '95).
[3] J.A. Blakeley, V. Rao, I. Kunen, A. Prout, M. Henaire, and C. Kleinerman, .NET Database Programmability and Extensibility in Microsoft SQL Server, Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '08).
[4] J. Clear, D. Dunn, B. Harvey, M.L. Heytens, and P. Lohman, Non-Stop SQL/MX Primitives for Knowledge Discovery, Proc. ACM SIGKDD Fifth Int'l Conf. Knowledge Discovery and Data Mining (KDD '99).
[5] E.F. Codd, Extending the Database Relational Model to Capture More Meaning, ACM Trans. Database Systems, vol. 4, no. 4.
[6] C. Cunningham, G. Graefe, and C.A. Galindo-Legaria, PIVOT and UNPIVOT: Optimization and Execution Strategies in an RDBMS, Proc. 13th Int'l Conf. Very Large Data Bases (VLDB '04).
[7] C. Galindo-Legaria and A. Rosenthal, Outer Join Simplification and Reordering for Query Optimization, ACM Trans. Database Systems, vol. 22, no. 1.
[8] H. Garcia-Molina, J.D. Ullman, and J. Widom, Database Systems: The Complete Book, first ed. Prentice Hall.
[9] G. Graefe, U. Fayyad, and S. Chaudhuri, On the Efficient Gathering of Sufficient Statistics for Classification from Large SQL Databases, Proc. ACM Conf. Knowledge Discovery and Data Mining (KDD '98).
[10] J. Gray, A. Bosworth, A. Layman, and H. Pirahesh, Data Cube: A Relational Aggregation Operator Generalizing Group-by, Cross-Tab and Sub-Total, Proc. Int'l Conf. Data Eng.
[11] J. Han and M. Kamber, Data Mining: Concepts and Techniques, first ed. Morgan Kaufmann.
[12] G. Luo, J.F. Naughton, C.J. Ellmann, and M. Watzke, Locking Protocols for Materialized Aggregate Join Views, IEEE Trans. Knowledge and Data Eng., vol. 17, no. 6, June.
[13] C. Ordonez, Horizontal Aggregations for Building Tabular Data Sets, Proc. Ninth ACM SIGMOD Workshop Data Mining and Knowledge Discovery (DMKD '04).
[14] C. Ordonez, Vertical and Horizontal Percentage Aggregations, Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '04).
[15] C. Ordonez, Integrating K-Means Clustering with a Relational DBMS Using SQL, IEEE Trans. Knowledge and Data Eng., vol. 18, no. 2, Feb.
[16] C. Ordonez, Statistical Model Computation with UDFs, IEEE Trans. Knowledge and Data Eng., vol. 22, no. 12, Dec.
[17] C. Ordonez, Data Set Preprocessing and Transformation in a Database System, Intelligent Data Analysis, vol. 15, no. 4.
[18] C. Ordonez and S. Pitchaimalai, Bayesian Classifiers Programmed in SQL, IEEE Trans. Knowledge and Data Eng., vol. 22, no. 1, Jan.
[19] S. Sarawagi, S. Thomas, and R. Agrawal, Integrating Association Rule Mining with Relational Database Systems: Alternatives and Implications, Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '98).
[20] H. Wang, C. Zaniolo, and C.R. Luo, ATLAS: A Small But Complete SQL Extension for Data Mining and Data Streams, Proc. 29th Int'l Conf. Very Large Data Bases (VLDB '03).
[21] A. Witkowski, S. Bellamkonda, T. Bozkaya, G. Dorman, N. Folkert, A. Gupta, L. Sheng, and S. Subramanian, Spreadsheets in RDBMS for OLAP, Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '03).
[22] L.A. Zadeh, Fuzzy Sets, Information and Control, vol. 8, 1965.
[23] M. Sugeno, Fuzzy Measures and Fuzzy Integrals, in Fuzzy Automata and Decision Processes, M. Gupta, G.N. Saridis, and B.R. Gaines, eds., North Holland, Amsterdam, New York, 1977.
[24] K.T. Atanassov, Intuitionistic Fuzzy Sets, Fuzzy Sets and Systems, vol. 20, 1986.

Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering

Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering Abstract Mrs. C. Poongodi 1, Ms. R. Kalaivani 2 1 PG Student, 2 Assistant Professor, Department of

More information

Preparation of Data Set for Data Mining Analysis using Horizontal Aggregation in SQL

Preparation of Data Set for Data Mining Analysis using Horizontal Aggregation in SQL Preparation of Data Set for Data Mining Analysis using Horizontal Aggregation in SQL Vidya Bodhe P.G. Student /Department of CE KKWIEER Nasik, University of Pune, India vidya.jambhulkar@gmail.com Abstract

More information

Generating Data Sets for Data Mining Analysis using Horizontal Aggregations in SQL

Generating Data Sets for Data Mining Analysis using Horizontal Aggregations in SQL Generating Data Sets for Data Mining Analysis using Horizontal Aggregations in SQL Sanjay Gandhi G 1, Dr.Balaji S 2 Associate Professor, Dept. of CSE, VISIT Engg College, Tadepalligudem, Scholar Bangalore

More information

Horizontal Aggregations in SQL to Generate Data Sets for Data Mining Analysis in an Optimized Manner

Horizontal Aggregations in SQL to Generate Data Sets for Data Mining Analysis in an Optimized Manner International Journal of Computer Science and Engineering Open Access Research Paper Volume-2, Issue-3 E-ISSN: 2347-2693 Horizontal Aggregations in SQL to Generate Data Sets for Data Mining Analysis in

More information

A Better Approach for Horizontal Aggregations in SQL Using Data Sets for Data Mining Analysis

A Better Approach for Horizontal Aggregations in SQL Using Data Sets for Data Mining Analysis Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 2, Issue. 8, August 2013,

More information

An Overview of various methodologies used in Data set Preparation for Data mining Analysis

An Overview of various methodologies used in Data set Preparation for Data mining Analysis An Overview of various methodologies used in Data set Preparation for Data mining Analysis Arun P Kuttappan 1, P Saranya 2 1 M. E Student, Dept. of Computer Science and Engineering, Gnanamani College of

More information

INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY

INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY [Agrawal, 2(4): April, 2013] ISSN: 2277-9655 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY An Horizontal Aggregation Approach for Preparation of Data Sets in Data Mining Mayur

More information

Horizontal Aggregations for Mining Relational Databases

Horizontal Aggregations for Mining Relational Databases Horizontal Aggregations for Mining Relational Databases Dontu.Jagannadh, T.Gayathri, M.V.S.S Nagendranadh. Department of CSE Sasi Institute of Technology And Engineering,Tadepalligudem, Andhrapradesh,

More information

Horizontal Aggregations in SQL to Prepare Data Sets Using PIVOT Operator

Horizontal Aggregations in SQL to Prepare Data Sets Using PIVOT Operator Horizontal Aggregations in SQL to Prepare Data Sets Using PIVOT Operator R.Saravanan 1, J.Sivapriya 2, M.Shahidha 3 1 Assisstant Professor, Department of IT,SMVEC, Puducherry, India 2,3 UG student, Department

More information

Horizontal Aggregation in SQL to Prepare Dataset for Generation of Decision Tree using C4.5 Algorithm in WEKA

Horizontal Aggregation in SQL to Prepare Dataset for Generation of Decision Tree using C4.5 Algorithm in WEKA Horizontal Aggregation in SQL to Prepare Dataset for Generation of Decision Tree using C4.5 Algorithm in WEKA Mayur N. Agrawal 1, Ankush M. Mahajan 2, C.D. Badgujar 3, Hemant P. Mande 4, Gireesh Dixit

More information

Horizontal Aggregations for Building Tabular Data Sets

Horizontal Aggregations for Building Tabular Data Sets Horizontal Aggregations for Building Tabular Data Sets Carlos Ordonez Teradata, NCR San Diego, CA, USA ABSTRACT In a data mining project, a significant portion of time is devoted to building a data set

More information

Horizontal Aggregation Function Using Multi Class Clustering (MCC) and Weighted (PCA)

Horizontal Aggregation Function Using Multi Class Clustering (MCC) and Weighted (PCA) Horizontal Aggregation Function Using Multi Class Clustering (MCC) and Weighted (PCA) Dr. K. Sathesh Kumar 1, P. Sabiya 2, S.Deepika 2 Assistant Professor, Department of Computer Science and Information

More information

Vertical and Horizontal Percentage Aggregations

Vertical and Horizontal Percentage Aggregations Vertical and Horizontal Percentage Aggregations Carlos Ordonez Teradata, NCR San Diego, CA, USA ABSTRACT Existing SQL aggregate functions present important limitations to compute percentages. This article

More information

Fundamental methods to evaluate horizontal aggregation in SQL

Fundamental methods to evaluate horizontal aggregation in SQL Fundamental methods to evaluate in SQL Krupali R. Dhawale 1, Vani A. Hiremani 2 Abstract In data mining, we are extracting data from historical knowledge and create data sets. Many hyper graph concepts

More information

An Improved Apriori Algorithm for Association Rules

An Improved Apriori Algorithm for Association Rules Research article An Improved Apriori Algorithm for Association Rules Hassan M. Najadat 1, Mohammed Al-Maolegi 2, Bassam Arkok 3 Computer Science, Jordan University of Science and Technology, Irbid, Jordan

More information

Efficient integration of data mining techniques in DBMSs

Efficient integration of data mining techniques in DBMSs Efficient integration of data mining techniques in DBMSs Fadila Bentayeb Jérôme Darmont Cédric Udréa ERIC, University of Lyon 2 5 avenue Pierre Mendès-France 69676 Bron Cedex, FRANCE {bentayeb jdarmont

More information

Novel Materialized View Selection in a Multidimensional Database

Novel Materialized View Selection in a Multidimensional Database Graphic Era University From the SelectedWorks of vijay singh Winter February 10, 2009 Novel Materialized View Selection in a Multidimensional Database vijay singh Available at: https://works.bepress.com/vijaysingh/5/

More information

A Hybrid Approach for Horizontal Aggregation Function Using Clustering

A Hybrid Approach for Horizontal Aggregation Function Using Clustering A Hybrid Approach for Horizontal Aggregation Function Using Clustering 1 Dr.K.Sathesh Kumar, 2 Dr.S.Ramkumar 1 Assistant Professor, Department of Computer Science and Information Technology, 2 Assistant

More information

V Locking Protocol for Materialized Aggregate Join Views on B-tree Indices Gang Luo IBM T.J. Watson Research Center

V Locking Protocol for Materialized Aggregate Join Views on B-tree Indices Gang Luo IBM T.J. Watson Research Center V Locking Protocol for Materialized Aggregate Join Views on B-tree Indices Gang Luo IBM T.J. Watson Research Center luog@us.ibm.com Abstract. Immediate materialized view maintenance with transactional

More information

Keywords Data alignment, Data annotation, Web database, Search Result Record

Keywords Data alignment, Data annotation, Web database, Search Result Record Volume 5, Issue 8, August 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Annotating Web

More information

RECORD DEDUPLICATION USING GENETIC PROGRAMMING APPROACH

RECORD DEDUPLICATION USING GENETIC PROGRAMMING APPROACH Int. J. Engg. Res. & Sci. & Tech. 2013 V Karthika et al., 2013 Research Paper ISSN 2319-5991 www.ijerst.com Vol. 2, No. 2, May 2013 2013 IJERST. All Rights Reserved RECORD DEDUPLICATION USING GENETIC PROGRAMMING

More information

Dynamic Data in terms of Data Mining Streams

Dynamic Data in terms of Data Mining Streams International Journal of Computer Science and Software Engineering Volume 1, Number 1 (2015), pp. 25-31 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining

More information

Discovering the Association Rules in OLAP Data Cube with Daily Downloads of Folklore Materials *

Discovering the Association Rules in OLAP Data Cube with Daily Downloads of Folklore Materials * Discovering the Association Rules in OLAP Data Cube with Daily Downloads of Folklore Materials * Galina Bogdanova, Tsvetanka Georgieva Abstract: Association rules mining is one kind of data mining techniques

More information

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a

Research on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a International Conference on Education Technology, Management and Humanities Science (ETMHS 2015) Research on Applications of Data Mining in Electronic Commerce Xiuping YANG 1, a 1 Computer Science Department,

More information

Detection and Deletion of Outliers from Large Datasets

Detection and Deletion of Outliers from Large Datasets Detection and Deletion of Outliers from Large Datasets Nithya.Jayaprakash 1, Ms. Caroline Mary 2 M. tech Student, Dept of Computer Science, Mohandas College of Engineering and Technology, India 1 Assistant

More information

A HYBRID APPROACH FOR HANDLING UNCERTAINTY - PROBABILISTIC THEORY, CERTAINTY FACTOR AND FUZZY LOGIC

A HYBRID APPROACH FOR HANDLING UNCERTAINTY - PROBABILISTIC THEORY, CERTAINTY FACTOR AND FUZZY LOGIC Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 2, Issue. 11, November 2013,

More information

Item Set Extraction of Mining Association Rule

Item Set Extraction of Mining Association Rule Item Set Extraction of Mining Association Rule Shabana Yasmeen, Prof. P.Pradeep Kumar, A.Ranjith Kumar Department CSE, Vivekananda Institute of Technology and Science, Karimnagar, A.P, India Abstract:

More information

Query Processing and Optimization using Set Predicates.C.Saranya et al.,

Query Processing and Optimization using Set Predicates.C.Saranya et al., International Journal of Technology and Engineering System (IJTES) Vol 7. No.5 2015 Pp. 470-477 gopalax Journals, Singapore available at : www.ijcns.com ISSN: 0976-1345 ---------------------------------------------------------------------------------------------------------------

More information

Databases Lectures 1 and 2

Databases Lectures 1 and 2 Databases Lectures 1 and 2 Timothy G. Griffin Computer Laboratory University of Cambridge, UK Databases, Lent 2009 T. Griffin (cl.cam.ac.uk) Databases Lectures 1 and 2 DB 2009 1 / 36 Re-ordered Syllabus

More information

Multi-Modal Data Fusion: A Description

Multi-Modal Data Fusion: A Description Multi-Modal Data Fusion: A Description Sarah Coppock and Lawrence J. Mazlack ECECS Department University of Cincinnati Cincinnati, Ohio 45221-0030 USA {coppocs,mazlack}@uc.edu Abstract. Clustering groups

More information

Optimization of Queries in Distributed Database Management System

Optimization of Queries in Distributed Database Management System Optimization of Queries in Distributed Database Management System Bhagvant Institute of Technology, Muzaffarnagar Abstract The query optimizer is widely considered to be the most important component of

More information

Query Optimization in Distributed Databases. Dilşat ABDULLAH

Query Optimization in Distributed Databases. Dilşat ABDULLAH Query Optimization in Distributed Databases Dilşat ABDULLAH 1302108 Department of Computer Engineering Middle East Technical University December 2003 ABSTRACT Query optimization refers to the process of

More information

A Rough Set Approach for Generation and Validation of Rules for Missing Attribute Values of a Data Set

A Rough Set Approach for Generation and Validation of Rules for Missing Attribute Values of a Data Set A Rough Set Approach for Generation and Validation of Rules for Missing Attribute Values of a Data Set Renu Vashist School of Computer Science and Engineering Shri Mata Vaishno Devi University, Katra,

More information

Supporting Fuzzy Keyword Search in Databases

Supporting Fuzzy Keyword Search in Databases I J C T A, 9(24), 2016, pp. 385-391 International Science Press Supporting Fuzzy Keyword Search in Databases Jayavarthini C.* and Priya S. ABSTRACT An efficient keyword search system computes answers as

More information

Chapter 1, Introduction

Chapter 1, Introduction CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from

More information

Bitmap index-based decision trees

Bitmap index-based decision trees Bitmap index-based decision trees Cécile Favre and Fadila Bentayeb ERIC - Université Lumière Lyon 2, Bâtiment L, 5 avenue Pierre Mendès-France 69676 BRON Cedex FRANCE {cfavre, bentayeb}@eric.univ-lyon2.fr

More information

Correlation Based Feature Selection with Irrelevant Feature Removal

Correlation Based Feature Selection with Irrelevant Feature Removal Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,

More information

arxiv: v1 [cs.db] 10 May 2007

arxiv: v1 [cs.db] 10 May 2007 Decision tree modeling with relational views Fadila Bentayeb and Jérôme Darmont arxiv:0705.1455v1 [cs.db] 10 May 2007 ERIC Université Lumière Lyon 2 5 avenue Pierre Mendès-France 69676 Bron Cedex France

More information

EFFICIENT ALGORITHM FOR MINING FREQUENT ITEMSETS USING CLUSTERING TECHNIQUES

EFFICIENT ALGORITHM FOR MINING FREQUENT ITEMSETS USING CLUSTERING TECHNIQUES EFFICIENT ALGORITHM FOR MINING FREQUENT ITEMSETS USING CLUSTERING TECHNIQUES D.Kerana Hanirex Research Scholar Bharath University Dr.M.A.Dorai Rangaswamy Professor,Dept of IT, Easwari Engg.College Abstract

More information

Uncertain Data Classification Using Decision Tree Classification Tool With Probability Density Function Modeling Technique

Uncertain Data Classification Using Decision Tree Classification Tool With Probability Density Function Modeling Technique Research Paper Uncertain Data Classification Using Decision Tree Classification Tool With Probability Density Function Modeling Technique C. Sudarsana Reddy 1 S. Aquter Babu 2 Dr. V. Vasu 3 Department

More information

Implementation of CHUD based on Association Matrix

Implementation of CHUD based on Association Matrix Implementation of CHUD based on Association Matrix Abhijit P. Ingale 1, Kailash Patidar 2, Megha Jain 3 1 apingale83@gmail.com, 2 kailashpatidar123@gmail.com, 3 06meghajain@gmail.com, Sri Satya Sai Institute

More information

Optimization of Query Processing in XML Document Using Association and Path Based Indexing

Optimization of Query Processing in XML Document Using Association and Path Based Indexing Optimization of Query Processing in XML Document Using Association and Path Based Indexing D.Karthiga 1, S.Gunasekaran 2 Student,Dept. of CSE, V.S.B Engineering College, TamilNadu, India 1 Assistant Professor,Dept.

More information

DATA WAREHOUSING IN LIBRARIES FOR MANAGING DATABASE

DATA WAREHOUSING IN LIBRARIES FOR MANAGING DATABASE DATA WAREHOUSING IN LIBRARIES FOR MANAGING DATABASE Dr. Kirti Singh, Librarian, SSD Women s Institute of Technology, Bathinda Abstract: Major libraries have large collections and circulation. Managing

More information

Web Mining Using Cloud Computing Technology

Web Mining Using Cloud Computing Technology International Journal of Scientific Research in Computer Science and Engineering Review Paper Volume-3, Issue-2 ISSN: 2320-7639 Web Mining Using Cloud Computing Technology Rajesh Shah 1 * and Suresh Jain

More information

Carnegie Mellon Univ. Dept. of Computer Science /615 DB Applications. Data mining - detailed outline. Problem

Carnegie Mellon Univ. Dept. of Computer Science /615 DB Applications. Data mining - detailed outline. Problem Faloutsos & Pavlo 15415/615 Carnegie Mellon Univ. Dept. of Computer Science 15415/615 DB Applications Lecture # 24: Data Warehousing / Data Mining (R&G, ch 25 and 26) Data mining detailed outline Problem

More information

Implementation of Aggregate Function in Multi Dimension Privacy Preservation Algorithms for OLAP

Implementation of Aggregate Function in Multi Dimension Privacy Preservation Algorithms for OLAP 324 Implementation of Aggregate Function in Multi Dimension Privacy Preservation Algorithms for OLAP Shivaji Yadav(131322) Assistant Professor, CSE Dept. CSE, IIMT College of Engineering, Greater Noida,

More information

Analysis of Dendrogram Tree for Identifying and Visualizing Trends in Multi-attribute Transactional Data

Analysis of Dendrogram Tree for Identifying and Visualizing Trends in Multi-attribute Transactional Data Analysis of Dendrogram Tree for Identifying and Visualizing Trends in Multi-attribute Transactional Data D.Radha Rani 1, A.Vini Bharati 2, P.Lakshmi Durga Madhuri 3, M.Phaneendra Babu 4, A.Sravani 5 Department

More information

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 3

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 3 Data Mining: Concepts and Techniques (3 rd ed.) Chapter 3 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2011 Han, Kamber & Pei. All rights

More information

ANGUILT TECHNOLOGY TO PREVENT DATA LEAKAGE AND ITS DETECTION ON CLOUD

ANGUILT TECHNOLOGY TO PREVENT DATA LEAKAGE AND ITS DETECTION ON CLOUD Volume 118 No. 20 2018, 1935-1943 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu ANGUILT TECHNOLOGY TO PREVENT DATA LEAKAGE AND ITS DETECTION ON

More information

INFREQUENT WEIGHTED ITEM SET MINING USING NODE SET BASED ALGORITHM

INFREQUENT WEIGHTED ITEM SET MINING USING NODE SET BASED ALGORITHM INFREQUENT WEIGHTED ITEM SET MINING USING NODE SET BASED ALGORITHM G.Amlu #1 S.Chandralekha #2 and PraveenKumar *1 # B.Tech, Information Technology, Anand Institute of Higher Technology, Chennai, India

More information

Data mining - detailed outline. Carnegie Mellon Univ. Dept. of Computer Science /615 DB Applications. Problem.

Data mining - detailed outline. Carnegie Mellon Univ. Dept. of Computer Science /615 DB Applications. Problem. Faloutsos & Pavlo 15415/615 Carnegie Mellon Univ. Dept. of Computer Science 15415/615 DB Applications Data Warehousing / Data Mining (R&G, ch 25 and 26) C. Faloutsos and A. Pavlo Data mining detailed outline

More information

Improved Frequent Pattern Mining Algorithm with Indexing

Improved Frequent Pattern Mining Algorithm with Indexing IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 16, Issue 6, Ver. VII (Nov Dec. 2014), PP 73-78 Improved Frequent Pattern Mining Algorithm with Indexing Prof.

More information

TIM 50 - Business Information Systems

TIM 50 - Business Information Systems TIM 50 - Business Information Systems Lecture 15 UC Santa Cruz May 20, 2014 Announcements DB 2 Due Tuesday Next Week The Database Approach to Data Management Database: Collection of related files containing

More information

Rule-Based Method for Entity Resolution Using Optimized Root Discovery (ORD)

Rule-Based Method for Entity Resolution Using Optimized Root Discovery (ORD) American-Eurasian Journal of Scientific Research 12 (5): 255-259, 2017 ISSN 1818-6785 IDOSI Publications, 2017 DOI: 10.5829/idosi.aejsr.2017.255.259 Rule-Based Method for Entity Resolution Using Optimized

More information

Classification with Diffuse or Incomplete Information

Classification with Diffuse or Incomplete Information Classification with Diffuse or Incomplete Information AMAURY CABALLERO, KANG YEN Florida International University Abstract. In many different fields like finance, business, pattern recognition, communication

More information

Improving Resource Management And Solving Scheduling Problem In Dataware House Using OLAP AND OLTP Authors Seenu Kohar 1, Surender Singh 2

Improving Resource Management And Solving Scheduling Problem In Dataware House Using OLAP AND OLTP Authors Seenu Kohar 1, Surender Singh 2 Improving Resource Management And Solving Scheduling Problem In Dataware House Using OLAP AND OLTP Authors Seenu Kohar 1, Surender Singh 2 1 M.tech Computer Engineering OITM Hissar, GJU Univesity Hissar

More information

Data Warehouse Design Using Row and Column Data Distribution

Data Warehouse Design Using Row and Column Data Distribution Int'l Conf. Information and Knowledge Engineering IKE'15 55 Data Warehouse Design Using Row and Column Data Distribution Behrooz Seyed-Abbassi and Vivekanand Madesi School of Computing, University of North

More information

Interactive Exploration and Visualization of OLAP Cubes

Interactive Exploration and Visualization of OLAP Cubes Interactive Exploration and Visualization of OLAP Cubes Carlos Ordonez University of Houston Houston, TX 77204, USA Zhibo Chen University of Houston Houston, TX 77204, USA Javier García-García UNAM/IPN

More information

Integrated Usage of Heterogeneous Databases for Novice Users

Integrated Usage of Heterogeneous Databases for Novice Users International Journal of Networked and Distributed Computing, Vol. 3, No. 2 (April 2015), 109-118 Integrated Usage of Heterogeneous Databases for Novice Users Ayano Terakawa Dept. of Information Science,

More information

Integration new Apriori algorithm MDNC and Six Sigma to Improve Array yield in the TFT-LCD Industry

Integration new Apriori algorithm MDNC and Six Sigma to Improve Array yield in the TFT-LCD Industry Integration new Apriori algorithm MDNC and Six Sigma to Improve Array yield in the TFT-LCD Industry Chiung-Fen Huang *, Ruey-Shun Chen** * Institute of Information Management, Chiao Tung University Management

More information

Naive Bayes Classifiers Programmed in Query Language

Naive Bayes Classifiers Programmed in Query Language 166 Naive Bayes Classifiers Programmed in Query Language 1 Y.V. Siddartha Reddy, 2 Dr.Supreethi K.P 1 Student M.Tech, 2 Assistant Professor, JNTU Hyderabad, yisddarhareddy@gmail.com, supreethi.pujari@gmail.com

More information

Building a Concept Hierarchy from a Distance Matrix

Building a Concept Hierarchy from a Distance Matrix Building a Concept Hierarchy from a Distance Matrix Huang-Cheng Kuo 1 and Jen-Peng Huang 2 1 Department of Computer Science and Information Engineering National Chiayi University, Taiwan 600 hckuo@mail.ncyu.edu.tw

More information

Data warehouses Decision support The multidimensional model OLAP queries

Data warehouses Decision support The multidimensional model OLAP queries Data warehouses Decision support The multidimensional model OLAP queries Traditional DBMSs are used by organizations for maintaining data to record day to day operations On-line Transaction Processing

More information
