Chapter 1 Introduction to Data Mining

1.1 Introduction to Data Mining

Data mining refers to the process of extracting, or mining, knowledge from large amounts of data (Hand, et al., 2000). It is the process of searching for patterns by scanning huge amounts of data (Han & Kamber, 2006). Storing enormous quantities of data is useful only if valuable knowledge can be extracted from it. To find useful patterns within the data, different kinds of algorithms can categorize the data either automatically or semi-automatically (Agrawal & Srikant, 1994). These patterns are used to obtain sets of rules. The patterns discovered must be meaningful, so that they lead to advantages in areas such as decision making, market analysis, financial growth and business intelligence. Obtaining such meaningful patterns requires a significantly large amount of data. To cope with this volume of data, data mining draws on concepts from machine learning and statistics. Data mining yields insight into and understanding of data, and it provides knowledge. It also provides the capability to predict future observations. Besides prediction, data mining is useful for summarizing the underlying relationships in data.

Data mining can mine data from different kinds of data storage, such as text data, databases, data warehouses, transactional data, multimedia data, sequences, the web, streams, time series, spatiotemporal data, graphs, and social and information networks. Nowadays, data mining has grown so much that it produces fruitful results in many fields, such as insurance, risk management, health care, customer management, financial analysis, manufacturing operations and anticipating the reimbursement of corporate expense claims.

The focus of this thesis is on how data mining is relevant to knowledge discovery at multiple levels of abstraction. Data mining examines data from various angles and sums up the outcome as valuable information. It also explores data along different dimensions, and then categorizes and summarizes the associations among them. To be precise, the process of finding patterns and interrelations among data is known as data mining. Ongoing development in data mining has contributed several types of algorithms, drawn from the areas of databases, statistics, machine learning and pattern recognition, which are useful for technology utilization and adaptation.

Data mining is mainly used today by companies to acquire information about their products, customers, marketing strategies and other relevant aspects (Barbara, et al., 2001). Using data mining, companies can find associations between "external" elements, such as customer demographics and economic indicators, and "internal" elements, such as product positioning, staff skills and price.

1.2 History of Data Mining

Data mining is the outgrowth of an area with an extensive history (Coenen, 2004). The term itself was coined in the 1990s. The origin of data mining can be traced back along three lines: first artificial intelligence (Wu, 2004), second statistics and third machine learning (Zhou, 2003).

Artificial intelligence (AI) is based on heuristics; it tries to apply human-like thinking to statistical problems. Many high-end business products use artificial intelligence techniques; for example, relational database systems use query optimization techniques. Statistics is the basis for numerous data mining techniques, for example variance, regression analysis, discriminant analysis, confidence intervals, standard deviation, standard distribution and cluster analysis. These are used to examine data and the relationships within it. Machine learning (ML) (Michalski, et al., 1998) combines artificial intelligence and statistics. The focus of machine learning is to develop algorithms that can teach themselves and improve whenever they encounter new kinds of data. It also uses several statistical techniques. With the help of these techniques, people can take decisions based on the quality of the data.

At first, data mining algorithms were developed mainly for numerical data, but they were later extended to all types of data, such as multimedia, text, spatial, image and web data. Initially, data mining started with the analysis of individual databases; subsequently, data mining techniques were formulated for traditional and relational databases, flat files and data warehouses. Afterwards, by blending machine learning techniques and statistics, diverse algorithms were developed to mine structured and unstructured data. The area of data mining has kept developing because of its tremendous achievements in terms of range of applications, scientific progress and understanding. The ever-increasing complexity in several fields and improvements in technology have posed fresh challenges to data mining. These include heterogeneous data formats, progress in networking, computation resources, scientific research fields and new business demands.

According to Fayyad (1996), KDD will continue to develop across various fields, such as machine learning, artificial intelligence, databases, machine discovery, scientific discovery and information retrieval (Fayyad, et al., 1996). Techniques from all of these fields are used in the knowledge discovery process.

1.3 Knowledge Discovery in Database

The basic purpose of knowledge discovery is to generate knowledge in a form that a human can understand. It is a way to extract valuable information and knowledge from a large volume of data. Data mining is one step in the Knowledge Discovery in Database (KDD) process, which uses particular algorithms to extract patterns (models) from data (Pazzani, 2000). The term KDD refers to the entire process of extracting usable knowledge from data. The KDD process consists of five stages, as shown in Figure 1.1.

Figure 1.1: KDD Process Model

The phases of this process are data selection, data preprocessing, data transformation, data mining and evaluation. First, data is obtained from data warehouses or other data sources; then preprocessing, such as data cleaning and data integration, is applied. After that, data transformation or data reduction is performed on the preprocessed data. In the next phase, an appropriate data mining technique is applied to the data. As a result of data mining, useful patterns and structures emerge. These patterns and structures are then interpreted as knowledge. Therefore, data mining plays an essential role in the knowledge discovery process.

The KDD process converts data into high-level knowledge. The overall process of discovering patterns and relationships in the database can be automated or semi-automated. In the KDD process, data mining is one of the most important steps. Although the two terms KDD and DM are closely related, they refer to slightly different concepts: data mining is only the application of a specific algorithm, chosen according to the overall goal of the KDD process. The data mining phase extracts knowledge from the data. The result is then represented in a form that any user can understand, so that the user can make important decisions. Data mining can mine data from different kinds of data storage, such as text data, databases, data warehouses, transactional data, multimedia data, data streams, spatiotemporal data, sequences, time series, the web, graphs, and social and information networks (Han & Kamber, 2008).

1.4 Data Mining Models

By convention, data mining models are mainly of two types: the descriptive model and the predictive model.

Descriptive Model

The primary goal of these models is to derive patterns (correlations, trends) that summarize the underlying relationships in data. Descriptive data mining is normally applied to generate correlations, frequencies and cross tabulations (Witten & Frank, 2005). A descriptive model can be used to extract interesting patterns from the data, to find previously unknown patterns and to find interesting subgroups in the data, for example to identify the web pages that users access together. Under the descriptive model, association rule discovery, clustering, sequential pattern mining and summarization are used.

Predictive Model

The idea behind these models is to build a model from data whose outcome is known and to use it to anticipate the outcome for unknown data sets (Han & Kamber, 2008) (Witten & Frank, 2005).
For example, a bank holds data about the loans it has granted in the past. In this data, the independent variables are the characteristics of the clients to whom loans were granted, and the dependent variable is whether the loan was repaid or not. A model built from this data helps in deciding whether a loan should be granted to a new client. For predictive data mining, regression, deviation detection and classification are used.
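The loan example above can be made concrete with a very small sketch. It is not a method taken from the thesis: it assumes invented loan records with two hypothetical client characteristics (income and number of existing debts) and learns a single-split rule, sometimes called a decision stump, from the known outcomes. The names `PAST_LOANS`, `fit_stump` and `predict` are illustrative only.

```python
# Hypothetical past loans: (annual income in thousands, number of existing debts, repaid?).
# Every value here is invented purely for illustration.
PAST_LOANS = [
    (25, 3, False),
    (30, 2, False),
    (45, 1, True),
    (60, 0, True),
    (52, 2, True),
    (28, 0, False),
    (70, 1, True),
    (35, 3, False),
]


def fit_stump(rows):
    """Pick the single feature/threshold split with the fewest training errors."""
    best = None
    for feature in range(len(rows[0]) - 1):  # last column is the known outcome
        for threshold in sorted({row[feature] for row in rows}):
            for repaid_if_at_least in (True, False):
                # Count rows where the rule's prediction disagrees with the outcome.
                errors = sum(
                    ((row[feature] >= threshold) == repaid_if_at_least) != row[-1]
                    for row in rows
                )
                if best is None or errors < best[0]:
                    best = (errors, feature, threshold, repaid_if_at_least)
    return best[1:]  # (feature index, threshold, repaid_if_at_least)


def predict(model, applicant):
    """Apply the learned rule to the independent variables of a new client."""
    feature, threshold, repaid_if_at_least = model
    return (applicant[feature] >= threshold) == repaid_if_at_least


model = fit_stump(PAST_LOANS)
print("learned rule:", model)
print("applicant (50, 1):", predict(model, (50, 1)))
print("applicant (27, 0):", predict(model, (27, 0)))
```

The independent variables are the first two fields of each record and the dependent variable is the last one, which mirrors the roles described in the text. A realistic predictive model would use many more attributes and be validated on data that was held out from training.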

1.5 Functionality of Data Mining

The function of data mining is to extract knowledge and interesting patterns from given data. Many functionalities are available for extracting patterns. Data mining searches data for interesting patterns that are initially unknown but potentially useful. The particular functionality can be selected on the basis of the application area and the kind of information to be mined. Using these functionalities, various kinds of knowledge can be mined, such as association rules, classification rules, characterization, clustering, discriminant rules, deviation analysis and predictive analysis. The uses of data mining are rich and extensive; it can serve several applications and areas (Tan, et al., 2005). Figure 1.2 shows the basic functionalities, such as outlier analysis, clustering, classification, frequent pattern mining and characterization. These functionalities are explained below.

Figure 1.2: Data Mining Functionalities

Classification & Prediction

Classification in data mining is capable of processing large amounts of data (Han & Kamber, 2008). Classification assigns the items in a data set to target classes: it predicts the target category for each instance in the data, that is, it assigns a class label to previously uncategorized cases. Because the class label is given for all the training data, this is termed supervised learning. Classification is used to assign data items to predefined classes (Weiss & Kulikowski, 1991). In this kind of task, training samples are provided first. On the basis of these samples, a model is built that predicts the class from the values of the other attributes. Classification techniques are used by knowledge discovery applications such as the categorization of trends in financial markets, and in this way they automatically recognize interesting patterns in large databases.

Classification techniques deduce a model from the database. The database contains attributes that denote the category of a tuple; these are called the predicted attributes. The remaining attributes are known as the predicting attributes. A combination of values for the predicted attributes specifies a class. When classification rules are learned, the user first defines the conditions for each class, and the data mining system then builds descriptions of these classes. The system needs tuples, or cases, with known attribute values so that it can learn to predict the class of a case. Once the classes are defined, the system can infer the patterns that govern the classification and so arrive at a description of each class. A description refers to the attributes of the training set that are helpful for prediction, so that it matches the cases that belong to the class and ignores the others. A rule is said to be accurate if its description covers the cases of its class and excludes the cases that do not belong to it. An illustration of a decision tree for classification is shown in Figure 1.3.

Figure 1.3: Classification using Decision Tree
Image Source: [Weblink1]

Several data mining methods can be used for classification and prediction, such as decision tree based methods, rule-based methods, Naïve Bayes and Bayesian belief networks, the nearest-neighbor method, neural networks, support vector machines and ensemble methods.

Clustering

Clustering is the task of grouping a set of objects in such a way that objects of a similar kind are placed in the same cluster (Cheeseman & Stutz, 1996) (Ester & Kriegel, 1995) (Ng & Han, 1994). It differs from classification because it does not use any training data. It is a key technique of data mining, commonly used for statistical data analysis in fields including machine learning, image analysis, pattern recognition, information retrieval and bioinformatics. Basically, clustering creates a number of partitions (Witten & Frank, 2005), and the values are placed into those partitions according to a similarity measure based on some metric. Clustering is an unsupervised technique: the categories, groups or classes are not defined in advance. The objects are grouped on the basis of their proximity or similarity. In this kind of learning, the system discovers the classes itself; that is, it selects an attribute based on the given data and partitions the data on that attribute, then selects another attribute to partition the data further, and so on. Objects are often organized into mutually exclusive and/or exhaustive clusters.

Figure 1.4: Clustering
Image Source: [Weblink2]

Clustering in terms of similarity is a very powerful method. It translates an intuitive notion of similarity into a quantitative measure (Zhang & Ramakrishan, 1996). There are many approaches to creating clusters. One approach is to create rules that assign membership in a group on the basis of the degree of similarity between members. Another approach is to construct a set of functions that measure some property of a partition as a function of some parameter of the partition. Figure 1.4 illustrates the clustering functionality.

Characterization and Discrimination

Data characterization is a summarization of the general features or characteristics of a target class of data (Witten & Frank, 2005). In data characterization, the abstraction is done according to the specific requirements of the users. Usually, the data is collected by running a query. Summarization is the procedure of finding a brief description of a subset of data (Fayyad, et al., 1996). There are many refined techniques for summarization; they are generally applied in data analysis and to assist automatic report generation. In data discrimination, the objects of the target class are compared with objects from one or more other classes with regard to specific generalized characteristics (Dash & Liu, 1997) (Pitt & Nayak, 2007).

Outlier Analysis/ Deviation Detection

Outliers are objects that do not comply with the general model or behavior of the data (Han & Kamber, 2008). If outliers are present in a dataset, they are discarded before the data is processed with the other data mining functionalities.

Figure 1.5: Outlier Analysis
Image Source: [Weblink3]

Generally, outliers represent noise or exceptions. In Figure 1.5, R represents data that is an outlier. Deviation detection is used to detect major changes in data from earlier normalized or calculated values (Fayyad, et al., 1996).
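As a rough illustration of discarding outliers before further processing, the sketch below flags values that lie far from the mean in units of standard deviation (a z-score test). The z-score rule, the threshold of 2.0 and the sample readings are assumptions made for this example; the text does not prescribe a particular detection method.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Invented readings; one value clearly departs from the general behavior.
READINGS = [10, 12, 11, 13, 12, 11, 95, 12]

outliers = find_outliers(READINGS)
cleaned = [v for v in READINGS if v not in outliers]
print("outliers:", outliers)
print("data passed on to other functionalities:", cleaned)
```

With small samples a single extreme value inflates the standard deviation, so the threshold has to be chosen with care; more robust statistics, such as the median, are a common alternative.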

1.5.5 Frequent Patterns Mining

Patterns that appear frequently in the data are known as frequent patterns (Agrawal & Srikant, 1994). Itemsets, sequences and subsequences can all be considered patterns. A frequent pattern, or large itemset, is an itemset that meets the minimum support requirement. The support of an itemset is the number (or proportion) of transactions in which it occurs. For example, if items A, B and C occur together in eight transactions out of ten, the itemset {A, B, C} has a support of 80%. Finding such frequent patterns plays an essential role in mining associations and many other interesting relationships among data. Frequent pattern mining is therefore an important data mining task with numerous practical applications, and it has received a lot of attention in data mining research. The discovery of frequent patterns helps in many business decision making processes, such as catalog design, cross marketing and the analysis of customer shopping behavior. Frequent itemsets are used to generate association rules.

Data mining functionalities cover a wide range of applications and allow the discovery of different kinds of knowledge at different levels of abstraction. Accordingly, data mining works effectively when the appropriate functionality is applied to the data. This research work emphasizes the mining of frequent patterns. Frequent patterns are the basis of association rules, which are explained in detail in the Association Rules section.

Association Rules

Association rule mining is a method for finding interesting relations among items in databases (Agrawal & Srikant, 1994). Using various measures, it identifies strong rules in the databases. An association rule consists of two parts: the antecedent and the consequent. The antecedent is an item found in the database, and the consequent is an item found in combination with the antecedent.
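The support figure from the frequent-pattern example above (itemset {A, B, C} occurring in eight of ten transactions) can be reproduced with a short sketch. The transactions and the minimum support of 0.5 are invented for illustration, and `support` and `is_frequent` are hypothetical helper names.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def is_frequent(itemset, transactions, min_support):
    """An itemset is frequent when its support meets the minimum support."""
    return support(itemset, transactions) >= min_support


# Ten invented transactions; A, B and C occur together in the first eight.
TRANSACTIONS = [
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
    {"A", "B", "C"},
    {"A", "B", "C", "E"},
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "D"},
    {"B", "E"},
]

print(support({"A", "B", "C"}, TRANSACTIONS))
print(is_frequent({"A", "B", "C"}, TRANSACTIONS, min_support=0.5))
print(is_frequent({"D", "E"}, TRANSACTIONS, min_support=0.5))
```

Support is expressed here as a proportion rather than a raw count, which matches the percentage used in the example.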
Association rules are created by inspecting the data for frequent patterns and then using the concepts of support and confidence to determine the most important relationships. Support is the number of transactions in the database in which the items occur. Confidence builds on support: it is the proportion of the transactions containing the antecedent that also contain the consequent. Take as an example the rule "80% of all the records that contain item A also contain item B". Here A plays the role of the antecedent and B is the consequent, so the presence of B depends on A. The support of the rule is the count of records that contain both A and B, and its confidence is 80%. On the basis of level, association rules can be classified into two categories.

1) Single level association rules
2) Multiple level association rules

Single Level Association Rules

Single level association rules provide only coarse information. They yield general rules without capturing more precise ones. For example, it is useful to find that 80% of customers who buy milk also buy bread, but it is better to find that, among the customers who buy bread, 75% buy only wheat bread. This kind of hidden information within or between levels of abstraction can be provided by multiple level association rules.

Multiple Level Association Rules

Many associations or correlations of interest occur among hierarchies of items. Depending on the nature of the domain, items can be organized into different hierarchies. For instance, beverages in a supermarket, goods in a department store or objects in a sports shop can be arranged into classes and subclasses. From these, hierarchies can be constructed, which play a key role in relating items to one another and are essential for mining multiple level association rules. Discovering these rules requires two passes: first, frequent patterns are generated at each level of the concept hierarchy on the basis of a minimum support threshold; then, association rules are generated from these frequent patterns. In multiple level association rule mining, the items in an itemset are characterized using a concept hierarchy, and mining occurs at multiple levels of the hierarchy. At the lowest levels, it may be that no rules satisfy the constraints. At the highest levels, rules can be extremely general.
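The two passes described above can be sketched on a toy example. The sketch is a simplified illustration under stated assumptions, not one of the algorithms studied in the thesis: the two-level hierarchy, the transactions and all thresholds are invented, itemsets are limited to at most two items, and each level is mined independently by brute force. Confidence is computed as the support of the whole rule divided by the support of its antecedent.

```python
from itertools import combinations

# Hypothetical two-level concept hierarchy: leaf item -> parent category.
HIERARCHY = {
    "skim milk": "milk",
    "whole milk": "milk",
    "wheat bread": "bread",
    "white bread": "bread",
}

# Invented leaf-level transactions, for illustration only.
TRANSACTIONS = [
    {"skim milk", "wheat bread"},
    {"skim milk", "wheat bread"},
    {"whole milk", "wheat bread"},
    {"whole milk", "white bread"},
    {"skim milk"},
]


def generalize(transactions):
    """Rewrite each transaction in terms of parent categories."""
    return [{HIERARCHY[item] for item in t} for t in transactions]


def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support, max_size=2):
    """Pass 1: brute-force search for itemsets meeting `min_support`."""
    items = sorted(set().union(*transactions))
    found = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            s = support(frozenset(combo), transactions)
            if s >= min_support:
                found[frozenset(combo)] = s
    return found


def rules(frequent, min_confidence):
    """Pass 2: derive one-to-one rules from the frequent 2-itemsets."""
    out = []
    for itemset, s in frequent.items():
        if len(itemset) != 2:
            continue
        for antecedent in itemset:
            (consequent,) = itemset - {antecedent}
            # Any subset of a frequent itemset is itself frequent,
            # so the antecedent's support is always available.
            confidence = s / frequent[frozenset([antecedent])]
            if confidence >= min_confidence:
                out.append((antecedent, consequent, s, confidence))
    return sorted(out)


# Assumed per-level minimum supports: higher for categories, lower for items.
LEVELS = [
    ("category", generalize(TRANSACTIONS), 0.6),
    ("item", TRANSACTIONS, 0.4),
]
for name, data, min_sup in LEVELS:
    for a, c, s, conf in rules(frequent_itemsets(data, min_sup), 0.6):
        print(f"[{name} level] {a} -> {c}  support={s:.2f}  confidence={conf:.2f}")
```

Practical multiple level algorithms avoid this exhaustive search, for example by examining lower-level items only when their ancestors are frequent; the sketch leaves such pruning out to stay short.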
Generally, a top-down approach is used in which the support threshold is either the same at every level or varies from level to level (support is reduced going from higher to lower levels) (Han & Fu, 1995).

Association Rule Mining Applications

Initially, association rule mining was intended to address the decision support problem faced by the majority of retail organizations (Agrawal, et al., 1993). With the development of bar-code technology, it became possible for retail organizations to collect and store large amounts of sales data, referred to as basket data. In this type of data, a record typically consists of the date of the transaction and the items purchased in it. Successful organizations viewed such databases as an important part of their marketing infrastructure. They were interested in establishing information-driven marketing processes managed by database technology, which help marketers to develop and use customized marketing strategies and programs. Nowadays, association rules have a variety of applications in multiple domains. Market basket analysis, the original motivation for mining association rules, is described first, followed by other applications.

Market Basket Analysis: By mining transactional data, a retail store can discover associations among the sales of items. This information can be useful in various ways. For example, rules with Maintenance Agreement as the consequent might help to increase Maintenance Agreement sales. Rules involving Home Appliances might point to other related products. A related application is loss-leader analysis. Stores often sell some products at a loss during a promotion, in the hope that customers will buy other items along with the loss-leader. However, many customers might cherry-pick the item on sale. By mining associations over the period of the promotion as well as the period before it, and tracking the changes in support and confidence of the rules involving the promotional product, the store can find out whether cherry-picking takes place (Rajak & Gupta, 2008).

Item Placement: Deciding where to place items in a store requires knowledge of which items are sold together. A closely related application is catalog layout: mail-order companies can use association rule mining to help determine which items should be placed on the same page of a catalog.

Attached Mailing: Instead of sending the identical catalog to everyone, direct marketing retailers can use associations and sequential patterns to tailor the catalog to the items a customer has bought.
Moreover, such tailored catalogs may be much smaller and mailed less frequently, which helps to reduce mailing costs.

Fraud Detection: Insurance companies are interested in finding groups of medical service providers, such as doctors or clinics, who ping-pong patients among each other for unnecessary tests. Using paid medical claims data, each patient can be mapped to a transaction, and every doctor or clinic visited by the patient to an item in that transaction. The items in an association rule then correspond to a set of providers, and the support of the rule corresponds to the number of patients these providers have in common. On this basis, the insurance company can investigate the claim records for sets of providers who have a large number of patients in common, to decide whether any fraudulent activity actually occurred. Another practical application is detecting the use of wrong medical settlement codes. For example, insurance companies are interested in detecting unbundling, where a set of settlement codes for the components of a medical procedure is used to claim payment, rather than the settlement code for the overall procedure. (The motivation is that the sum of the payments for the constituent codes may be greater than the normal settlement for the procedure.) Associations among medical settlement codes can also be used to find sets of payment codes that are frequently used together.

Medical Research: A data sequence may correspond to the symptoms or diseases of a patient, with each transaction corresponding to the symptoms displayed or diseases diagnosed during one visit to the doctor. The patterns discovered from this data could be used in disease research to help identify symptoms or diseases that precede certain diseases (Serban, et al., 2006).

1.7 Outline of the Thesis

The organization of the thesis is as follows:

Chapter-1 This chapter gives a basic introduction to data mining, its different models and its various functionalities. It also explains single level and multiple level association rules and their applications in multiple domains.

Chapter-2 This chapter covers the literature review related to association rule mining algorithms. It is organized into sub-sections detailing the literature reviewed on different aspects of multiple level association rule mining. As the main focus of the research is mining multiple level association rules, it is first necessary to look at what association rule mining is. The chapter therefore begins with some basic concepts that support the research work directly or indirectly. After that, single level association rule mining approaches are explained.
Subsequently, the chapter presents an overview of pertinent literature and research on multiple level association rule mining methods. The last segment comprises a study of miscellaneous research papers used in carrying out the research work.

Chapter-3 This chapter outlines the definition of the problem, on the basis of which the research objectives were articulated. It highlights the objectives of the research and outlines the significance of the study. A research methodology to address the identified objectives is also given.

Chapter-4 This chapter gives a comprehensive survey and study of problems with various existing methods. These existing methods have some issues and challenges in this field. The discussion of the shortcomings of evolutionary algorithms leads to some improvements. The chapter also introduces concept hierarchies and their types, and investigates the need for concept hierarchies in multiple level association rule mining and in other data warehousing and data mining applications. A case study of an efficient encoding scheme for concept hierarchies is described. Finally, the chapter provides a summary of concept hierarchies.

Chapter-5 The traditional algorithms for mining association rules at multiple levels of abstraction are explained in this chapter. Two well-established algorithms, MLT2_L1 and the Level Wise Filtered Table (LWFT) algorithm, are presented for finding multiple level frequent itemsets. The main focus of the chapter is the basic working of the MLT2_L1 and LWFT algorithms. Finally, the chapter critically examines the weaknesses of the MLT2_L1 and LWFT algorithms.

Chapter-6 In this chapter, new algorithms, TransTrie and MLTransTrie, are proposed for the discovery of association rules at different levels of abstraction. The MLTransTrie algorithm employs the TransTrie algorithm at each level to generate frequent patterns. The working of the proposed algorithm is demonstrated with an example database, followed by a summary of the chapter.

Chapter-7 In this chapter, the results of the proposed MLTransTrie algorithm are presented and discussed. To study the performance of the algorithm, different support thresholds were used. In this experimental research, the process model starts with the selection of datasets. The datasets used in this study were taken from the UCI Repository of Machine Learning databases, available online. Four datasets of various sizes and with different numbers of attributes are used: the real-world datasets Breast-cancer, Credit-g, Mushroom and Soybean.
To demonstrate the competence of the proposed algorithm, a comparative analysis is performed with well-known evolutionary algorithms. Finally, the research work carried out in the previous chapters is wrapped up and inferences are drawn.

Chapter-8 This chapter presents the detailed conclusions of the research carried out throughout the doctoral work and discusses future research. This research work has expanded the scope of the study of mining association rules from a single level to multiple concept levels. The chapter is followed by the bibliography/references and then the appendix.

After carrying out this research work, the researcher believes that this effort will be a valuable contribution to the research community, academicians, society, the corporate sector and decision analysts as well.


More information

An Improved Apriori Algorithm for Association Rules

An Improved Apriori Algorithm for Association Rules Research article An Improved Apriori Algorithm for Association Rules Hassan M. Najadat 1, Mohammed Al-Maolegi 2, Bassam Arkok 3 Computer Science, Jordan University of Science and Technology, Irbid, Jordan

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to JULY 2011 Afsaneh Yazdani What motivated? Wide availability of huge amounts of data and the imminent need for turning such data into useful information and knowledge What motivated? Data

More information

1. Inroduction to Data Mininig

1. Inroduction to Data Mininig 1. Inroduction to Data Mininig 1.1 Introduction Universe of Data Information Technology has grown in various directions in the recent years. One natural evolutionary path has been the development of the

More information

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani LINK MINING PROCESS Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani Higher Colleges of Technology, United Arab Emirates ABSTRACT Many data mining and knowledge discovery methodologies and process models

More information

DATA MINING II - 1DL460

DATA MINING II - 1DL460 DATA MINING II - 1DL460 Spring 2016 A second course in data mining http://www.it.uu.se/edu/course/homepage/infoutv2/vt16 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,

More information

ISSN: (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies

ISSN: (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies ISSN: 2321-7782 (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques 24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE

More information

Chapter 1, Introduction

Chapter 1, Introduction CSI 4352, Introduction to Data Mining Chapter 1, Introduction Young-Rae Cho Associate Professor Department of Computer Science Baylor University What is Data Mining? Definition Knowledge Discovery from

More information

Oracle9i Data Mining. Data Sheet August 2002

Oracle9i Data Mining. Data Sheet August 2002 Oracle9i Data Mining Data Sheet August 2002 Oracle9i Data Mining enables companies to build integrated business intelligence applications. Using data mining functionality embedded in the Oracle9i Database,

More information

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data

Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Outlier Detection Using Unsupervised and Semi-Supervised Technique on High Dimensional Data Ms. Gayatri Attarde 1, Prof. Aarti Deshpande 2 M. E Student, Department of Computer Engineering, GHRCCEM, University

More information

Chapter 2 BACKGROUND OF WEB MINING

Chapter 2 BACKGROUND OF WEB MINING Chapter 2 BACKGROUND OF WEB MINING Overview 2.1. Introduction to Data Mining Data mining is an important and fast developing area in web mining where already a lot of research has been done. Recently,

More information

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset. Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied

More information

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395 Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 21 Table of contents 1 Introduction 2 Data mining

More information

International Journal of Advance Engineering and Research Development. A Survey on Data Mining Methods and its Applications

International Journal of Advance Engineering and Research Development. A Survey on Data Mining Methods and its Applications Scientific Journal of Impact Factor (SJIF): 4.72 International Journal of Advance Engineering and Research Development Volume 5, Issue 01, January -2018 e-issn (O): 2348-4470 p-issn (P): 2348-6406 A Survey

More information

Database and Knowledge-Base Systems: Data Mining. Martin Ester

Database and Knowledge-Base Systems: Data Mining. Martin Ester Database and Knowledge-Base Systems: Data Mining Martin Ester Simon Fraser University School of Computing Science Graduate Course Spring 2006 CMPT 843, SFU, Martin Ester, 1-06 1 Introduction [Fayyad, Piatetsky-Shapiro

More information

Knowledge Discovery in Data Bases

Knowledge Discovery in Data Bases Knowledge Discovery in Data Bases Chien-Chung Chan Department of CS University of Akron Akron, OH 44325-4003 2/24/99 1 Why KDD? We are drowning in information, but starving for knowledge John Naisbett

More information

Table Of Contents: xix Foreword to Second Edition

Table Of Contents: xix Foreword to Second Edition Data Mining : Concepts and Techniques Table Of Contents: Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments xxxi About the Authors xxxv Chapter 1 Introduction 1 (38) 1.1 Why Data

More information

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,

More information

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394 Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 20 Table of contents 1 Introduction 2 Data mining

More information

Overview of Web Mining Techniques and its Application towards Web

Overview of Web Mining Techniques and its Application towards Web Overview of Web Mining Techniques and its Application towards Web *Prof.Pooja Mehta Abstract The World Wide Web (WWW) acts as an interactive and popular way to transfer information. Due to the enormous

More information

CMPUT 391 Database Management Systems. Data Mining. Textbook: Chapter (without 17.10)

CMPUT 391 Database Management Systems. Data Mining. Textbook: Chapter (without 17.10) CMPUT 391 Database Management Systems Data Mining Textbook: Chapter 17.7-17.11 (without 17.10) University of Alberta 1 Overview Motivation KDD and Data Mining Association Rules Clustering Classification

More information

Analysis of a Population of Diabetic Patients Databases in Weka Tool P.Yasodha, M. Kannan

Analysis of a Population of Diabetic Patients Databases in Weka Tool P.Yasodha, M. Kannan International Journal of Scientific & Engineering Research Volume 2, Issue 5, May-2011 1 Analysis of a Population of Diabetic Patients Databases in Weka Tool P.Yasodha, M. Kannan Abstract - Data mining

More information

Data Mining Concepts

Data Mining Concepts Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms Sequential

More information

D B M G Data Base and Data Mining Group of Politecnico di Torino

D B M G Data Base and Data Mining Group of Politecnico di Torino DataBase and Data Mining Group of Data mining fundamentals Data Base and Data Mining Group of Data analysis Most companies own huge databases containing operational data textual documents experiment results

More information

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R,

Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, Lecture Topic Projects 1 Intro, schedule, and logistics 2 Data Science components and tasks 3 Data types Project #1 out 4 Introduction to R, statistics foundations 5 Introduction to D3, visual analytics

More information

APRIORI ALGORITHM FOR MINING FREQUENT ITEMSETS A REVIEW

APRIORI ALGORITHM FOR MINING FREQUENT ITEMSETS A REVIEW International Journal of Computer Application and Engineering Technology Volume 3-Issue 3, July 2014. Pp. 232-236 www.ijcaet.net APRIORI ALGORITHM FOR MINING FREQUENT ITEMSETS A REVIEW Priyanka 1 *, Er.

More information

What Is Data Mining? CMPT 354: Database I -- Data Mining 2

What Is Data Mining? CMPT 354: Database I -- Data Mining 2 Data Mining What Is Data Mining? Mining data mining knowledge Data mining is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data CMPT

More information

COMP90049 Knowledge Technologies

COMP90049 Knowledge Technologies COMP90049 Knowledge Technologies Data Mining (Lecture Set 3) 2017 Rao Kotagiri Department of Computing and Information Systems The Melbourne School of Engineering Some of slides are derived from Prof Vipin

More information

INTRODUCTION... 2 FEATURES OF DARWIN... 4 SPECIAL FEATURES OF DARWIN LATEST FEATURES OF DARWIN STRENGTHS & LIMITATIONS OF DARWIN...

INTRODUCTION... 2 FEATURES OF DARWIN... 4 SPECIAL FEATURES OF DARWIN LATEST FEATURES OF DARWIN STRENGTHS & LIMITATIONS OF DARWIN... INTRODUCTION... 2 WHAT IS DATA MINING?... 2 HOW TO ACHIEVE DATA MINING... 2 THE ROLE OF DARWIN... 3 FEATURES OF DARWIN... 4 USER FRIENDLY... 4 SCALABILITY... 6 VISUALIZATION... 8 FUNCTIONALITY... 10 Data

More information

Basic Data Mining Technique

Basic Data Mining Technique Basic Data Mining Technique What is classification? What is prediction? Supervised and Unsupervised Learning Decision trees Association rule K-nearest neighbor classifier Case-based reasoning Genetic algorithm

More information

Introduction to Data Mining and Data Analytics

Introduction to Data Mining and Data Analytics 1/28/2016 MIST.7060 Data Analytics 1 Introduction to Data Mining and Data Analytics What Are Data Mining and Data Analytics? Data mining is the process of discovering hidden patterns in data, where Patterns

More information

Contents. Foreword to Second Edition. Acknowledgments About the Authors

Contents. Foreword to Second Edition. Acknowledgments About the Authors Contents Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments About the Authors xxxi xxxv Chapter 1 Introduction 1 1.1 Why Data Mining? 1 1.1.1 Moving toward the Information Age 1

More information

Winter Semester 2009/10 Free University of Bozen, Bolzano

Winter Semester 2009/10 Free University of Bozen, Bolzano Data Warehousing and Data Mining Winter Semester 2009/10 Free University of Bozen, Bolzano DW Lecturer: Johann Gamper gamper@inf.unibz.it DM Lecturer: Mouna Kacimi mouna.kacimi@unibz.it http://www.inf.unibz.it/dis/teaching/dwdm/index.html

More information

INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad

INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad INSTITUTE OF AERONAUTICAL ENGINEERING (Autonomous) Dundigal, Hyderabad - 500 043 INFORMATION TECHNOLOGY DEFINITIONS AND TERMINOLOGY Course Name : DATA WAREHOUSING AND DATA MINING Course Code : AIT006 Program

More information

GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV

GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATIONS (MCA) Semester: IV Subject Name: Elective I Data Warehousing & Data Mining (DWDM) Subject Code: 2640005 Learning Objectives: To understand

More information

Jarek Szlichta

Jarek Szlichta Jarek Szlichta http://data.science.uoit.ca/ Approximate terminology, though there is some overlap: Data(base) operations Executing specific operations or queries over data Data mining Looking for patterns

More information

A Study on Mining of Frequent Subsequences and Sequential Pattern Search- Searching Sequence Pattern by Subset Partition

A Study on Mining of Frequent Subsequences and Sequential Pattern Search- Searching Sequence Pattern by Subset Partition A Study on Mining of Frequent Subsequences and Sequential Pattern Search- Searching Sequence Pattern by Subset Partition S.Vigneswaran 1, M.Yashothai 2 1 Research Scholar (SRF), Anna University, Chennai.

More information

Prognosis of Lung Cancer Using Data Mining Techniques

Prognosis of Lung Cancer Using Data Mining Techniques Prognosis of Lung Cancer Using Data Mining Techniques 1 C. Saranya, M.Phil, Research Scholar, Dr.M.G.R.Chockalingam Arts College, Arni 2 K. R. Dillirani, Associate Professor, Department of Computer Science,

More information

Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management

Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management Frequent Item Set using Apriori and Map Reduce algorithm: An Application in Inventory Management Kranti Patil 1, Jayashree Fegade 2, Diksha Chiramade 3, Srujan Patil 4, Pradnya A. Vikhar 5 1,2,3,4,5 KCES

More information

Optimization using Ant Colony Algorithm

Optimization using Ant Colony Algorithm Optimization using Ant Colony Algorithm Er. Priya Batta 1, Er. Geetika Sharmai 2, Er. Deepshikha 3 1Faculty, Department of Computer Science, Chandigarh University,Gharaun,Mohali,Punjab 2Faculty, Department

More information

Data Mining. Yi-Cheng Chen ( 陳以錚 ) Dept. of Computer Science & Information Engineering, Tamkang University

Data Mining. Yi-Cheng Chen ( 陳以錚 ) Dept. of Computer Science & Information Engineering, Tamkang University Data Mining Yi-Cheng Chen ( 陳以錚 ) Dept. of Computer Science & Information Engineering, Tamkang University Why Mine Data? Commercial Viewpoint Lots of data is being collected and warehoused Web data, e-commerce

More information

International Journal of Research in Advent Technology, Vol.7, No.3, March 2019 E-ISSN: Available online at

International Journal of Research in Advent Technology, Vol.7, No.3, March 2019 E-ISSN: Available online at Performance Evaluation of Ensemble Method Based Outlier Detection Algorithm Priya. M 1, M. Karthikeyan 2 Department of Computer and Information Science, Annamalai University, Annamalai Nagar, Tamil Nadu,

More information

Overview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer

Overview. Data-mining. Commercial & Scientific Applications. Ongoing Research Activities. From Research to Technology Transfer Data Mining George Karypis Department of Computer Science Digital Technology Center University of Minnesota, Minneapolis, USA. http://www.cs.umn.edu/~karypis karypis@cs.umn.edu Overview Data-mining What

More information

CS423: Data Mining. Introduction. Jakramate Bootkrajang. Department of Computer Science Chiang Mai University

CS423: Data Mining. Introduction. Jakramate Bootkrajang. Department of Computer Science Chiang Mai University CS423: Data Mining Introduction Jakramate Bootkrajang Department of Computer Science Chiang Mai University Jakramate Bootkrajang CS423: Data Mining 1 / 29 Quote of the day Never memorize something that

More information

Clustering Analysis based on Data Mining Applications Xuedong Fan

Clustering Analysis based on Data Mining Applications Xuedong Fan Applied Mechanics and Materials Online: 203-02-3 ISSN: 662-7482, Vols. 303-306, pp 026-029 doi:0.4028/www.scientific.net/amm.303-306.026 203 Trans Tech Publications, Switzerland Clustering Analysis based

More information

Data mining fundamentals

Data mining fundamentals Data mining fundamentals Elena Baralis Politecnico di Torino Data analysis Most companies own huge bases containing operational textual documents experiment results These bases are a potential source of

More information

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE Vandit Agarwal 1, Mandhani Kushal 2 and Preetham Kumar 3

More information

Study on the Application Analysis and Future Development of Data Mining Technology

Study on the Application Analysis and Future Development of Data Mining Technology Study on the Application Analysis and Future Development of Data Mining Technology Ge ZHU 1, Feng LIN 2,* 1 Department of Information Science and Technology, Heilongjiang University, Harbin 150080, China

More information

Using Association Rules for Better Treatment of Missing Values

Using Association Rules for Better Treatment of Missing Values Using Association Rules for Better Treatment of Missing Values SHARIQ BASHIR, SAAD RAZZAQ, UMER MAQBOOL, SONYA TAHIR, A. RAUF BAIG Department of Computer Science (Machine Intelligence Group) National University

More information

A SURVEY ON DATA MINING TECHNIQUES FOR CLASSIFICATION OF IMAGES

A SURVEY ON DATA MINING TECHNIQUES FOR CLASSIFICATION OF IMAGES A SURVEY ON DATA MINING TECHNIQUES FOR CLASSIFICATION OF IMAGES 1 Preeti lata sahu, 2 Ms.Aradhana Singh, 3 Mr.K.L.Sinha 1 M.Tech Scholar, 2 Assistant Professor, 3 Sr. Assistant Professor, Department of

More information

KNOWLEDGE DISCOVERY AND DATA MINING

KNOWLEDGE DISCOVERY AND DATA MINING KNOWLEDGE DISCOVERY AND DATA MINING Prof. Fabio A. Schreiber Dipartimento di Elettronica e Informazione Politecnico di Milano INFORMATION MANAGEMENT TECHNOLOGIES DATA WAREHOUSE DECISION SUPPORT SYSTEMS

More information

Data mining overview. Data Mining. Data mining overview. Data mining overview. Data mining overview. Data mining overview 3/24/2014

Data mining overview. Data Mining. Data mining overview. Data mining overview. Data mining overview. Data mining overview 3/24/2014 Data Mining Data mining processes What technological infrastructure is required? Data mining is a system of searching through large amounts of data for patterns. It is a relatively new concept which is

More information

A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm

A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm A Technical Analysis of Market Basket by using Association Rule Mining and Apriori Algorithm S.Pradeepkumar*, Mrs.C.Grace Padma** M.Phil Research Scholar, Department of Computer Science, RVS College of

More information

Data Mining. Ryan Benton Center for Advanced Computer Studies University of Louisiana at Lafayette Lafayette, La., USA.

Data Mining. Ryan Benton Center for Advanced Computer Studies University of Louisiana at Lafayette Lafayette, La., USA. Data Mining Ryan Benton Center for Advanced Computer Studies University of Louisiana at Lafayette Lafayette, La., USA January 13, 2011 Important Note! This presentation was obtained from Dr. Vijay Raghavan

More information

Supervised Learning with Neural Networks. We now look at how an agent might learn to solve a general problem by seeing examples.

Supervised Learning with Neural Networks. We now look at how an agent might learn to solve a general problem by seeing examples. Supervised Learning with Neural Networks We now look at how an agent might learn to solve a general problem by seeing examples. Aims: to present an outline of supervised learning as part of AI; to introduce

More information

Image Mining: frameworks and techniques

Image Mining: frameworks and techniques Image Mining: frameworks and techniques Madhumathi.k 1, Dr.Antony Selvadoss Thanamani 2 M.Phil, Department of computer science, NGM College, Pollachi, Coimbatore, India 1 HOD Department of Computer Science,

More information

A REVIEW ON VARIOUS APPROACHES OF CLUSTERING IN DATA MINING

A REVIEW ON VARIOUS APPROACHES OF CLUSTERING IN DATA MINING A REVIEW ON VARIOUS APPROACHES OF CLUSTERING IN DATA MINING Abhinav Kathuria Email - abhinav.kathuria90@gmail.com Abstract: Data mining is the process of the extraction of the hidden pattern from the data

More information

International Journal of Computer Engineering and Applications, ICCSTAR-2016, Special Issue, May.16

International Journal of Computer Engineering and Applications, ICCSTAR-2016, Special Issue, May.16 The Survey Of Data Mining And Warehousing Architha.S, A.Kishore Kumar Department of Computer Engineering Department of computer engineering city engineering college VTU Bangalore, India ABSTRACT: Data

More information

Hierarchical Online Mining for Associative Rules

Hierarchical Online Mining for Associative Rules Hierarchical Online Mining for Associative Rules Naresh Jotwani Dhirubhai Ambani Institute of Information & Communication Technology Gandhinagar 382009 INDIA naresh_jotwani@da-iict.org Abstract Mining

More information

CS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University

CS377: Database Systems Data Warehouse and Data Mining. Li Xiong Department of Mathematics and Computer Science Emory University CS377: Database Systems Data Warehouse and Data Mining Li Xiong Department of Mathematics and Computer Science Emory University 1 1960s: Evolution of Database Technology Data collection, database creation,

More information

Data Clustering With Leaders and Subleaders Algorithm

Data Clustering With Leaders and Subleaders Algorithm IOSR Journal of Engineering (IOSRJEN) e-issn: 2250-3021, p-issn: 2278-8719, Volume 2, Issue 11 (November2012), PP 01-07 Data Clustering With Leaders and Subleaders Algorithm Srinivasulu M 1,Kotilingswara

More information

Introducing Partial Matching Approach in Association Rules for Better Treatment of Missing Values

Introducing Partial Matching Approach in Association Rules for Better Treatment of Missing Values Introducing Partial Matching Approach in Association Rules for Better Treatment of Missing Values SHARIQ BASHIR, SAAD RAZZAQ, UMER MAQBOOL, SONYA TAHIR, A. RAUF BAIG Department of Computer Science (Machine

More information

Mining Frequent Patterns with Counting Inference at Multiple Levels

Mining Frequent Patterns with Counting Inference at Multiple Levels International Journal of Computer Applications (097 7) Volume 3 No.10, July 010 Mining Frequent Patterns with Counting Inference at Multiple Levels Mittar Vishav Deptt. Of IT M.M.University, Mullana Ruchika

More information

INTRODUCTION TO DATA MINING. Daniel Rodríguez, University of Alcalá

INTRODUCTION TO DATA MINING. Daniel Rodríguez, University of Alcalá INTRODUCTION TO DATA MINING Daniel Rodríguez, University of Alcalá Outline Knowledge Discovery in Datasets Model Representation Types of models Supervised Unsupervised Evaluation (Acknowledgement: Jesús

More information

DATA MINING TRANSACTION

DATA MINING TRANSACTION DATA MINING Data Mining is the process of extracting patterns from data. Data mining is seen as an increasingly important tool by modern business to transform data into an informational advantage. It is

More information

Pattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42

Pattern Mining. Knowledge Discovery and Data Mining 1. Roman Kern KTI, TU Graz. Roman Kern (KTI, TU Graz) Pattern Mining / 42 Pattern Mining Knowledge Discovery and Data Mining 1 Roman Kern KTI, TU Graz 2016-01-14 Roman Kern (KTI, TU Graz) Pattern Mining 2016-01-14 1 / 42 Outline 1 Introduction 2 Apriori Algorithm 3 FP-Growth

More information

Outlier Ensembles. Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY Keynote, Outlier Detection and Description Workshop, 2013

Outlier Ensembles. Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY Keynote, Outlier Detection and Description Workshop, 2013 Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY 10598 Outlier Ensembles Keynote, Outlier Detection and Description Workshop, 2013 Based on the ACM SIGKDD Explorations Position Paper: Outlier

More information

Chapter 3: Data Mining:

Chapter 3: Data Mining: Chapter 3: Data Mining: 3.1 What is Data Mining? Data Mining is the process of automatically discovering useful information in large repository. Why do we need Data mining? Conventional database systems

More information

Correlation Based Feature Selection with Irrelevant Feature Removal

Correlation Based Feature Selection with Irrelevant Feature Removal Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,

More information

Parametric Comparisons of Classification Techniques in Data Mining Applications

Parametric Comparisons of Classification Techniques in Data Mining Applications Parametric Comparisons of Clas Techniques in Data Mining Applications Geeta Kashyap 1, Ekta Chauhan 2 1 Student of Masters of Technology, 2 Assistant Professor, Department of Computer Science and Engineering,

More information

Chapter 4 Data Mining A Short Introduction

Chapter 4 Data Mining A Short Introduction Chapter 4 Data Mining A Short Introduction Data Mining - 1 1 Today's Question 1. Data Mining Overview 2. Association Rule Mining 3. Clustering 4. Classification Data Mining - 2 2 1. Data Mining Overview

More information

Data Mining & Machine Learning F2.4DN1/F2.9DM1

Data Mining & Machine Learning F2.4DN1/F2.9DM1 Data Mining & Machine Learning F2.4DN1/F2.9DM1 Nick Taylor N.K.Taylor@hw.ac.uk Room EM1.62 Data Data Mining - Content Introduction to Data Mining What it is, Who does it and Why Data Warehousing Virtuous

More information

Knowledge Discovery. Javier Béjar URL - Spring 2019 CS - MIA

Knowledge Discovery. Javier Béjar URL - Spring 2019 CS - MIA Knowledge Discovery Javier Béjar URL - Spring 2019 CS - MIA Knowledge Discovery (KDD) Knowledge Discovery in Databases (KDD) Practical application of the methodologies from machine learning/statistics

More information

An Introduction to Data Mining BY:GAGAN DEEP KAUSHAL

An Introduction to Data Mining BY:GAGAN DEEP KAUSHAL An Introduction to Data Mining BY:GAGAN DEEP KAUSHAL Trends leading to Data Flood More data is generated: Bank, telecom, other business transactions... Scientific Data: astronomy, biology, etc Web, text,

More information

Clustering and Association using K-Mean over Well-Formed Protected Relational Data

Clustering and Association using K-Mean over Well-Formed Protected Relational Data Clustering and Association using K-Mean over Well-Formed Protected Relational Data Aparna Student M.Tech Computer Science and Engineering Department of Computer Science SRM University, Kattankulathur-603203

More information

Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset

Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset Infrequent Weighted Itemset Mining Using SVM Classifier in Transaction Dataset M.Hamsathvani 1, D.Rajeswari 2 M.E, R.Kalaiselvi 3 1 PG Scholar(M.E), Angel College of Engineering and Technology, Tiruppur,

More information

The Transpose Technique to Reduce Number of Transactions of Apriori Algorithm

The Transpose Technique to Reduce Number of Transactions of Apriori Algorithm The Transpose Technique to Reduce Number of Transactions of Apriori Algorithm Narinder Kumar 1, Anshu Sharma 2, Sarabjit Kaur 3 1 Research Scholar, Dept. Of Computer Science & Engineering, CT Institute

More information

Chapter 5: Summary and Conclusion CHAPTER 5 SUMMARY AND CONCLUSION. Chapter 1: Introduction

Chapter 5: Summary and Conclusion CHAPTER 5 SUMMARY AND CONCLUSION. Chapter 1: Introduction CHAPTER 5 SUMMARY AND CONCLUSION Chapter 1: Introduction Data mining is used to extract the hidden, potential, useful and valuable information from very large amount of data. Data mining tools can handle

More information

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SHRI ANGALAMMAN COLLEGE OF ENGINEERING & TECHNOLOGY (An ISO 9001:2008 Certified Institution) SIRUGANOOR,TRICHY-621105. DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING Year / Semester: IV/VII CS1011-DATA

More information

Lies, Damned Lies and Statistics Using Data Mining Techniques to Find the True Facts.

Lies, Damned Lies and Statistics Using Data Mining Techniques to Find the True Facts. Lies, Damned Lies and Statistics Using Data Mining Techniques to Find the True Facts. BY SCOTT A. BARNES, CPA, CFF, CGMA The adversarial nature of the American legal system creates a natural conflict between

DATA MINING AND WAREHOUSING

Question 1: Define data warehouse. Answer: A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data that supports management's decision-making

Data Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1395

Table of contents: 1 Introduction 2 Data preprocessing

SOCIAL MEDIA MINING. Data Mining Essentials

Dear instructors/users of these slides: Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate

Association Pattern Mining. Lijun Zhang

Lijun Zhang, zlj@nju.edu.cn, http://cs.nju.edu.cn/zlj. Outline: Introduction; The Frequent Pattern Mining Model; Association Rule Generation Framework; Frequent Itemset Mining Algorithms

Foundation of Data Mining: Introduction

Hillol Kargupta, CSEE Department, UMBC, hillol@cs.umbc.edu, ITE 342, (410) 455-3972, www.cs.umbc.edu/~hillol. Acknowledgement: Tan, Steinbach, and Kumar provided some

Data Mining. Introduction. Piotr Paszek.

Piotr Paszek, piotr.paszek@us.edu.pl. Plan of the lecture: 1 Data Mining (DM) 2 Knowledge Discovery in Databases (KDD) 3 CRISP-DM 4 DM software

CHAPTER 1 INTRODUCTION

1.1 Data Mining. The term data mining refers to the technique of extracting hidden predictive knowledge from large databases. It includes different

International Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 4, Jul Aug 2017

RESEARCH ARTICLE, OPEN ACCESS. Classifying Brain Dataset Using Classification Based Association Rules

Yunfeng Zhang 1, Huan Wang 2, Jie Zhu 1 1 Computer Science & Engineering Department, North China Institute of Aerospace

BioTechnology: An Indian Journal, ISSN 0974-7435, Volume 10, Issue 20, 2014. Full paper, BTAIJ, 10(20), 2014 [12526-12531]. Exploration on the data mining system construction

A Comparative Study of Data Mining Process Models (KDD, CRISP-DM and SEMMA)

International Journal of Innovation and Scientific Research, ISSN 2351-8014, Vol. 12 No. 1, Nov. 2014, pp. 217-222. © 2014 Innovative Space of Scientific Research Journals, http://www.ijisr.issr-journals.org/
