SOFTWARE TOOLS FOR TEACHING UNDERGRADUATE DATA MINING COURSE

Size: px
Start display at page:

Download "SOFTWARE TOOLS FOR TEACHING UNDERGRADUATE DATA MINING COURSE"

Transcription

1 SOFTWARE TOOLS FOR TEACHING UNDERGRADUATE DATA MINING COURSE Ashwin Satyanarayana New York City College of Technology, N-913, 300 Jay Street, Brooklyn, NY Abstract: Data mining, a growingly popular field in Computer Science, is the transformation of large amounts of data into meaningful patterns and rules. Recent studies have noted the rise of data mining as a career path with increasing opportunities for graduates. Data mining introduces new challenges for faculty in universities who teach courses in this area. One of the main challenges for faculty would be to identify which software tool to use to introduce this subject in a one semester undergraduate course. In this paper, we compare and contrast three popular commercial and three popular open source tools that are available for faculty. Keywords: Data Mining, SAS, IBM SPSS Modeler, MATLAB, R, WEKA, RapidMiner Introduction: Enormous amounts of data are generated every minute. Some sources of data, such as those found on the Internet are obvious. Social networking sites, search and retrieval engines, media sharing sites, stock trading sites, and news sources continually store enormous amounts of new data throughout the day [5]. We are in a new era in modern information technology - the Big Data era. In March, 2012, the U.S. Government announced a Big Data Research and Development Initaitve -- a $200 million dollar commitment to improve our ability to extract knowledge and insights from large and complex collections of digital data. Government agencies such as NSF, NIH, and DoD are investing hundreds of millions of dollars toward the development of systems that can help them extract knowledge from their data. The career potential for our graduates continue to blossom in this field. A recent study released by Gartner projects that in 2013, big data is forecast to drive $34 billion of IT spending, with a total of $232 billion to be spent through 2016 [1]. Over the last 10 years, a number of commercial and open source tools have been developed to examine and transform data. In this paper, an overview of some tools are presented that can be used to teach data mining in a one semester course. We present 3 of the most popular commercial tools and 3 open source tools. Tools: The ultimate goal of data mining is to create a model, a model that can improve the way you read and interpret your existing data and your future data. Since there are so many commercial and open source tools available for data mining, the major step is to choose the right tools to create a good model. From there, the tools are then used to refine the model to make it more useful. There are 3 main needs that we would like to achieve for the tools that we pick for the course for teaching a single semester basic data mining course:

2 1. Easily available for students to download 2. Easily to load data, build models, and interpret results 3. Should work with big data According to a survey by KDnuggets [2], the most popular tools used by 900 data miners, here are the most popular data mining tools used by real projects: Name of the Tool % of users Commercial (C)/Open Source (O) RapidMiner 37.80% O R 29.80% O Weka 14.30% O SAS 12.00% C Matlab 9.20% C IBM SPSS Statistics 7.90% C Most Popular Tools For Data Mining IBM SPSS Statistics Matlab SAS Weka % of users R RapidMiner 0.00% 5.00% 10.00% 15.00% 20.00% 25.00% 30.00% 35.00% 40.00% Figure 1. Most popular tools for Data Mining (according to KDNuggets poll 2010) Commercial Tools: In this section we will discuss the most popular commercial tools that are available for real projects. One of the benefits of using commercial tools in universities is that students will get hands on experience with working with tools that are used in industry. However one of the drawbacks of using commercial tools is that they are expensive and not easily accessible to students to download. Considering this, it would be best to use commercial tools for an advanced data mining course and not for an introductory course. We will see later in this paper, that using open source tools for introductory courses is ideal.

3 In this paper we will discuss 3 popular commercial tools: (a) SAS Enterprise Miner (b) MATLAB (c) IBM SPSS Modeler SAS Enterprise Miner SAS Enterprise Miner software streamlines the data mining process to create highly accurate predictive and descriptive models. The models are based on analysis of vast amounts of data from across an enterprise. Interactive statistical and visualization tools help you better search for trends and anomalies and help you focus on the model development process. Here are good sources of information that can be used for getting an introduction to this tool: Enterprise Miner nodes are arranged into the following categories according the SAS process for data mining: SEMMA. Sample identify input data sets (identify input data; sample from a larger data set; partition data set into training, validation, and test data sets). Explore explore data sets statistically and graphically (plot the data, obtain descriptive statistics, identify important variables, perform association analysis). Modify prepare the data for analysis (create additional variables or transform existing variables for analysis, identify outliers, replace missing values, modify the way in which variables are used for the analysis, perform cluster analysis, analyze data with selforganizing maps (known as SOMs) or Kohonen networks). Model fit a predictive model (model a target variable by using a regression model, a decision tree, a neural network, or a user-defined model). Assess compare competing predictive models (build charts that plot the percentage of respondents, percentage of respondents captured, lift, and profit). Since most of the work of SAS Enterprise mining deals with Nodes, here are some sample notes: The Input Data Source node reads data sources and defines their attributes for later processing by Enterprise Miner. The Sampling node enables you to perform random sampling, stratified random sampling, and cluster sampling. Sampling is recommended for extremely large databases because it can significantly decrease modeltraining time.

4 The Data Partition node enables you to partition data sets into training, test, and validation data sets. The training data set is used for preliminary model fitting. The validation data set is used to monitor and tune the model weights during estimation and is also used for model assessment. The test data set is an additional data set that you can use for model assessment. The Distribution Explorer node enables you to explore large volumes of data in multidimensional histograms. The Multiplot node enables you to explore large volumes of data graphically The Association node enables you to identify association relationships within the data. For example, if a customer buys a loaf of bread, how likely is the customer to buy a gallon of milk as well? The Clustering node enables you to segment your data; that is, it enables you to identify data observations that are similar in some way. The Regression node enables you to fit both linear and logistic regression models to your data. You can use both continuous and discrete variables as inputs. Here is a sample of how to combine nodes to create a model: Figure 2: Combining Nodes to Create a Model Once the nodes are created, all we need to do is right-click on the Assessment node and click Run. As each node finishes, it will turn green. In Summary, SAS Enterprise Miner, is a simple commercial tool, which will help students play with different aspects of Data Mining: Data Preparation and Investigation, Fitting and Comparing Candidate Models and Generating Reports.

5 MATLAB: MATLAB has excellent built-in support for many data analysis and visualization routines, [3] in particular, one of its most useful facilities is that of efficient exploratory data analysis, which is a natural fit in the context of data mining. Advantages of MATLAB: Two major advantages are that of portability and domain specific representations. MATLAB's portability comes from the fact that all MATLAB users will have the same range of basic functions at their disposal. The representation which MATLAB implements, is dealing with all data in the form of matrices. This allows for many varied algorithmic implementations [4], which, as we shall see, is crucial for any data mining package. Other advantages of MATLAB include its interactive interface, debugging facilities, object oriented nature and in particular its high quality graphics and visualization. facilities [Burton 2006]. Lastly and perhaps most importantly, MATLAB's add on feature, in the form of toolboxes, makes it possible to extend the existing capabilities of the language with ease [Burton 2006]. Data mining, which, for the most part, consists of numerical methodologies [Woolf 2005], is thus a natural fit for implementation with the MATLAB package. Disadvantages: The main drawback of MATLAB is the fact that it is an interpreted language, which leads to performance cuts, as compared with third generation languages such as C, upon which MATLAB is built. In the context of data mining, this can be a very serious issue, particularly when one is dealing with enormous quantities of data. Dwinnell [3] points out that 4GL's with their own compilers are able to largely overcome this disadvantage. MATLAB does in fact possess its own compiler but it is distributed as a toolbox by The MathWorks (distributors of MATLAB). One of the central themes of MATLAB is to synthesize many different data mining tools to discover the potential of MATLAB for data mining. Here are a list of MATLAB toolboxes that are used for Data Mining: Name of the Toolbox Classification toolbox ARMADA Data Mining Tool NetLab - Neural Network Toolbox Bootstrap Matlab Toolbox The Spider Bayes Net Toolbox for Matlab SVM and Kernel Methods Matlab Toolbox Link: ab/ ndex.html html

6 IBM SPSS Modeler [7]: IBM SPSS Modeler is a data mining software application from IBM. It is a data mining and text analytics workbench used to build predictive models. It has a visual interface which allows users to leverage statistical and data mining algorithms without programming. IBM SPSS Modeler offers a variety of modeling methods taken from machine learning, artificial intelligence, and statistics. The methods available on the Modeling palette allow you to derive new information from your data and to develop predictive models. Each method has certain strengths and is best suited for particular types of problems. Modeling methods are divided into three categories: Classification Association Segmentation Classification Models: Classification models use the values of one or more input fields to predict the value of one or more output, or target, fields. Some examples of these techniques are: decision trees (C&R Tree, QUEST, CHAID and C5.0 algorithms), regression (linear, logistic, generalized linear, and Cox regression algorithms), neural networks, support vector machines, and Bayesian networks. Association Models: Association models find patterns in your data where one or more entities (such as events, purchases, or attributes) are associated with one or more other entities. The

7 models construct rule sets that define these relationships. Here the fields within the data can act as both inputs and targets. You could find these associations manually, but association rule algorithms do so much more quickly, and can explore more complex patterns. Apriori and Carma models are examples of the use of such algorithms. One other type of association model is a sequence detection model, which finds sequential patterns in time-structured data. Segmentation Models: Segmentation models divide the data into segments, or clusters, of records that have similar patterns of input fields. As they are only interested in the input fields, segmentation models have no concept of output or target fields. Examples of segmentation models are Kohonen networks, K-Means clustering, two-step clustering and anomaly detection.

8 Open Source Tools: In this section we will discuss 3 tools which are available free for download. The benefit of using Open Source tools is that students can download these tools at home. Here are the 3 most popular tools that we will be discussing in this paper: (a) RapidMiner (b) R (c) WEKA RapidMiner: RapidMiner [8][9] (formerly Yale) is an environment for machine learning and data mining processes. A modular operator concept allows the design of complex nested operator chains for a huge number of learning problems. The data handling is transparent to the operators. They do not have to cope with the actual data format or different data views - the RapidMiner core takes care of the necessary transformations. Today, RapidMiner is the world-wide leading open-source data mining solution and is widely used by researchers and companies. Working with RapidMiner fundamentally consists in defining analysis processes by indicating a succession of individual work steps. In RapidMiner these process components are called operators. An operator is defined by several things: The description of the expected inputs, The description of the supplied outputs, The action performed by the operator on the inputs, which ultimately leads to the supply of the outputs, A number of parameters which can control the action performed. Figure 4: Operator used in RapidMiner If several operators are interconnected, then we speak of an analysis process or process for short. Such a succession of steps can for example load a data set, transform the data, compute a model and apply the model to another data set.

9 Figure 5: An analysis process consisting of several operators. Such processes can easily grow to several hundred operators in size in RapidMiner and spread over several levels or subprocesses. The process inspections continually performed in the background as well as the process navigation aids shown below ensure that you do not lose track and that you define correct processes,even for more complex tasks. RapidMiner is the top voted tool according to KDnuggets. Students would find the animated operators easy to drag and drop, and learn and understand data mining. We would strongly recommend this tool for an introductory undergraduate course. R: R [10] is a free software environment for statistical computing and graphics. R can be easily extended with 4,728 packages available on CRAN(as of Sept 6, 2013). Some of the important Data mining tasks (Classification, Clustering and Association Rules) along with the R commands for them are shown below: Classification with R: Decision Trees Random Forest SVM Neural Network rparty, party randomforest, party e1071, kernlab nnet, neuralnet, RSNNS Building a Decision Tree can be very simple using a few commands. Here is a sample of the commands:

10 # build a decision tree library(party) iris.formula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width iris.ctree <- ctree(iris.formula, data=iris.train) plot(iris.ctree) Clustering: k-means k-mediods Hierarchical clustering BIRCH kmeans(), kmeansruns() pam(), pamk() hcust(), agnes(), diana() Birch To do k-means clustering on the Iris dataset involves the following simple steps: set.seed(8953) iris2 <- iris # remove class IDs iris2$species <- NULL # k-means clustering iris.kmeans <- kmeans(iris2, 3) # check result table(iris$species, iris.kmeans$cluster) ## ## ## setosa ## versicolor ## virginica

11 # plot clusters and their centers plot(iris2[c("sepal.length", "Sepal.Width")], col=iris.kmeans$cluster) points(iris.kmeans$centers[, c("sepal.length", "Sepal.Width")], col=1:3, pch="*", cex=5) Association Rule Mining: Association Rules Sequential Patterns Visualisation of associations apriori(), eclat() in package arules arulessequence arulesviz A simple association rule on the titanic dataset can be done as follows: # find association rules with the APRIORI algorithm library(arules) rules <- apriori(titanic.raw, control=list(verbose=f), parameter=list(minlen=2, supp=0.005, conf=0.8), appearance=list(rhs=c("survived=no", "Survived=Yes"), default="lhs")) # sort rules quality(rules) <- round(quality(rules), digits=3) rules.sorted <- sort(rules, by="lift") # have a look at rules # inspect(rules.sorted) # lhs rhs support confidence lift

12 # 1 fclass=2nd, # Age=Childg => fsurvived=yesg # 2 fclass=2nd, # Sex=Female, # Age=Childg => fsurvived=yesg # 3 fclass=1st, # Sex=Femaleg => fsurvived=yesg # 4 fclass=1st, # Sex=Female, # Age=Adultg => fsurvived=yesg # 5 fclass=2nd, # Sex=Male, # Age=Adultg => fsurvived=nog # 6 fclass=2nd, # Sex=Femaleg => fsurvived=yesg # 7 fclass=crew, # Sex=Femaleg => fsurvived=yesg # 8 fclass=crew, # Sex=Female, # Age=Adultg => fsurvived=yesg # 9 fclass=2nd, # Sex=Maleg => fsurvived=nog In summary, R is a very useful tool for dataminers. It comes with a lot of packages which make it very simple for data mining operations. One of the challenges of teaching R in an introductory course is that it requires a steep learning curve. Students should be encouraged to learn R for an advanced course. WEKA: WEKA [11] is the product of the University of Waikato (New Zealand) and was first implemented in its modern form in It uses the GNU General Public License (GPL). The software is written in the Java language and contains a GUI for interacting with data files and producing visual results (think tables and curves). It also has a general API, so you can embed WEKA, like any other library, in your own applications to such things as automated server-side data-mining tasks. We will briefly look at how you can do the following using WEKA: (a) Building a dataset (b) Loading the dataset (c) Creating the model (d) Interpreting the model To load data into WEKA, we have to put it into a format that will be understood. WEKA's preferred method for loading data is in the Attribute-Relation File Format (ARFF), where you can define the type of data being loaded, then supply the data itself. In the file, you define each column and what each column contains. Here is an example of building a dataset using WEKA:

13 In the case of the regression model, you are limited to a NUMERIC or a DATE column. Finally, you supply each row of data in a comma-delimited format. The ARFF file we'll be using with WEKA appears below. Notice in the rows of data that we've left out my house. Since we are creating the model, we cannot input my house into it since the selling price is housesize lotsize bedrooms granite bathroom sellingprice 3529,9191,6,0,0, ,10061,5,1,1, ,10150,5,0,1, ,14156,4,1,0, ,9600,4,0,1, ,19994,6,1,1, (b) Loading the dataset into WEKA: Now that the data file has been created, it's time to create our regression model. Start WEKA, and then choose the Explorer. You'll be taken to the Explorer screen, with the Preprocess tab selected. Select the Open File button and select the ARFF file you created in the section above. After selecting the file, your WEKA Explorer should look similar to the screenshot in Figure below.

14 (c) Creating a model (regression model): To create the model, click on the Classify tab. The first step is to select the model we want to build, so WEKA knows how to work with the data, and how to create the appropriate model: 1. Click the Choose button, then expand the functions branch. 2. Select the LinearRegression leaf. This tells WEKA that we want to build a regression model. As you can see from the other choices, though, there are lots of possible models to build. Lots! This should give you a good indication of how we are only touching the surface of this subject. Also of note: There is another choice called SimpleLinearRegression in the same branch. Do not choose this because simple regression only looks at one variable, and we have six. When you've selected the right model, your WEKA Explorer should look like Figure below: WEKA doesn't mess around. It puts the regression model right there in the output, as shown in below.

15 sellingprice = ( * housesize) + ( * lotsize) + ( * bedrooms) + ( * bathroom) In summary, this section discussed the first data-mining model, the regression model (specifically, the linear regression multi-variable model), and showed how to use it in WEKA. This regression model is easy to use and can be used for myriad data sets. In summary, WEKA can be used as an introductory tool for datamining students mainly because of its GUI and its explorer which make is easy for students to explore the different aspects of data mining. Conclusion: In this paper, we have presented 3 commercial tools (SAS Enterprise Miner, IBM SPSS Modeler, MATLAB), and 3 open source tools (RapidMiner, R, WEKA), and discussed how to build models using them with pros and cons of each. We feel that using open source tools in a university setting is the best for teaching a basic undergraduate class in one semester. If however, it is an advanced data mining class, the commercial tools can be considered for teaching this course. References: [1] Gartner Press Release. Gartner Says Big Data Will Drive $28 Billion of IT Spending in October 17, [2] [3] Murphy, K., Bayes Net Toolbox for MATLAB, 2005, Accessed: 10 May 2006, < [4] Dwinnell, W., Modeling Methodology 5: Mathematical Programming Languages, 1998, < [5] B. King, A.Satyanarayana Teaching Data Mining in the Era of Big Data ASEE Annual Conference. [6] IBM SPSS Modeler: [7] IBM SPSS Modeler Documentation: ementine_documentation.htm [8] Rapid Miner: Installation, Tutorial: [9] Rapid Miner Manual: [10] R: [11] WEKA: Download, Documentation:

Non-trivial extraction of implicit, previously unknown and potentially useful information from data

Non-trivial extraction of implicit, previously unknown and potentially useful information from data CS 795/895 Applied Visual Analytics Spring 2013 Data Mining Dr. Michele C. Weigle http://www.cs.odu.edu/~mweigle/cs795-s13/ What is Data Mining? Many Definitions Non-trivial extraction of implicit, previously

More information

CHAPTER 4 METHODOLOGY AND TOOLS

CHAPTER 4 METHODOLOGY AND TOOLS CHAPTER 4 METHODOLOGY AND TOOLS 4.1 RESEARCH METHODOLOGY In an effort to test empirically the suggested data mining technique, the data processing quality, it is important to find a real-world for effective

More information

Performance Analysis of Data Mining Classification Techniques

Performance Analysis of Data Mining Classification Techniques Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal

More information

arulescba: Classification for Factor and Transactional Data Sets Using Association Rules

arulescba: Classification for Factor and Transactional Data Sets Using Association Rules arulescba: Classification for Factor and Transactional Data Sets Using Association Rules Ian Johnson Southern Methodist University Abstract This paper presents an R package, arulescba, which uses association

More information

Now, Data Mining Is Within Your Reach

Now, Data Mining Is Within Your Reach Clementine Desktop Specifications Now, Data Mining Is Within Your Reach Data mining delivers significant, measurable value. By uncovering previously unknown patterns and connections in data, data mining

More information

CS 8520: Artificial Intelligence. Weka Lab. Paula Matuszek Fall, CSC 8520 Fall Paula Matuszek

CS 8520: Artificial Intelligence. Weka Lab. Paula Matuszek Fall, CSC 8520 Fall Paula Matuszek CS 8520: Artificial Intelligence Weka Lab Paula Matuszek Fall, 2015!1 Weka is Waikato Environment for Knowledge Analysis Machine Learning Software Suite from the University of Waikato Been under development

More information

GETTING STARTED WITH DATA MINING

GETTING STARTED WITH DATA MINING GETTING STARTED WITH DATA MINING Nora Galambos, PhD Senior Data Scientist Office of Institutional Research, Planning & Effectiveness Stony Brook University AIR Forum 2017 Washington, D.C. 1 Using Data

More information

Data Mining Overview. CHAPTER 1 Introduction to SAS Enterprise Miner Software

Data Mining Overview. CHAPTER 1 Introduction to SAS Enterprise Miner Software 1 CHAPTER 1 Introduction to SAS Enterprise Miner Software Data Mining Overview 1 Layout of the SAS Enterprise Miner Window 2 Using the Application Main Menus 3 Using the Toolbox 8 Using the Pop-Up Menus

More information

Application of Data Mining in Manufacturing Industry

Application of Data Mining in Manufacturing Industry International Journal of Information Sciences and Application. ISSN 0974-2255 Volume 3, Number 2 (2011), pp. 59-64 International Research Publication House http://www.irphouse.com Application of Data Mining

More information

Tanagra: An Evaluation

Tanagra: An Evaluation Tanagra: An Evaluation Jessica Enright Jonathan Klippenstein November 5th, 2004 1 Introduction to Tanagra Tanagra was written as an aid to education and research on data mining by Ricco Rakotomalala [1].

More information

Gain Greater Productivity in Enterprise Data Mining

Gain Greater Productivity in Enterprise Data Mining Clementine 9.0 Specifications Gain Greater Productivity in Enterprise Data Mining Discover patterns and associations in your organization s data and make decisions that lead to significant, measurable

More information

Infographics and Visualisation (or: Beyond the Pie Chart) LSS: ITNPBD4, 1 November 2016

Infographics and Visualisation (or: Beyond the Pie Chart) LSS: ITNPBD4, 1 November 2016 Infographics and Visualisation (or: Beyond the Pie Chart) LSS: ITNPBD4, 1 November 2016 Overview Overview (short: we covered most of this in the tutorial) Why infographics and visualisation What s the

More information

An Introduction to WEKA Explorer. In part from: Yizhou Sun 2008

An Introduction to WEKA Explorer. In part from: Yizhou Sun 2008 An Introduction to WEKA Explorer In part from: Yizhou Sun 2008 What is WEKA? Waikato Environment for Knowledge Analysis It s a data mining/machine learning tool developed by Department of Computer Science,,

More information

PROJECT 1 DATA ANALYSIS (KR-VS-KP)

PROJECT 1 DATA ANALYSIS (KR-VS-KP) PROJECT 1 DATA ANALYSIS (KR-VS-KP) Author: Tomáš Píhrt (xpiht00@vse.cz) Date: 12. 12. 2015 Contents 1 Introduction... 1 1.1 Data description... 1 1.2 Attributes... 2 1.3 Data pre-processing & preparation...

More information

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts Chapter 28 Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms

More information

Prototyping DM Techniques with WEKA and YALE Open-Source Software

Prototyping DM Techniques with WEKA and YALE Open-Source Software TIES443 Contents Tutorial 1 Prototyping DM Techniques with WEKA and YALE Open-Source Software Department of Mathematical Information Technology University of Jyväskylä Mykola Pechenizkiy Course webpage:

More information

ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA

ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA INSIGHTS@SAS: ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA AGENDA 09.00 09.15 Intro 09.15 10.30 Analytics using SAS Enterprise Guide Ellen Lokollo 10.45 12.00 Advanced Analytics using SAS

More information

CMPUT 695 Fall 2004 Assignment 2 Xelopes

CMPUT 695 Fall 2004 Assignment 2 Xelopes CMPUT 695 Fall 2004 Assignment 2 Xelopes Paul Nalos, Ben Chu November 5, 2004 1 Introduction We evaluated Xelopes, a data mining library produced by prudsys 1. Xelopes is available for Java, C++, and CORBA

More information

Practical Data Mining COMP-321B. Tutorial 1: Introduction to the WEKA Explorer

Practical Data Mining COMP-321B. Tutorial 1: Introduction to the WEKA Explorer Practical Data Mining COMP-321B Tutorial 1: Introduction to the WEKA Explorer Gabi Schmidberger Mark Hall Richard Kirkby July 12, 2006 c 2006 University of Waikato 1 Setting up your Environment Before

More information

Data analysis case study using R for readily available data set using any one machine learning Algorithm

Data analysis case study using R for readily available data set using any one machine learning Algorithm Assignment-4 Data analysis case study using R for readily available data set using any one machine learning Algorithm Broadly, there are 3 types of Machine Learning Algorithms.. 1. Supervised Learning

More information

Data mining overview. Data Mining. Data mining overview. Data mining overview. Data mining overview. Data mining overview 3/24/2014

Data mining overview. Data Mining. Data mining overview. Data mining overview. Data mining overview. Data mining overview 3/24/2014 Data Mining Data mining processes What technological infrastructure is required? Data mining is a system of searching through large amounts of data for patterns. It is a relatively new concept which is

More information

A Survey of Statistical Modeling Tools

A Survey of Statistical Modeling Tools 1 of 6 A Survey of Statistical Modeling Tools Madhuri Kulkarni (A survey paper written under the guidance of Prof. Raj Jain) Abstract: A plethora of statistical modeling tools are available in the market

More information

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset. Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied

More information

Knowledge Discovery. URL - Spring 2018 CS - MIA 1/22

Knowledge Discovery. URL - Spring 2018 CS - MIA 1/22 Knowledge Discovery Javier Béjar cbea URL - Spring 2018 CS - MIA 1/22 Knowledge Discovery (KDD) Knowledge Discovery in Databases (KDD) Practical application of the methodologies from machine learning/statistics

More information

INTRODUCTION... 2 FEATURES OF DARWIN... 4 SPECIAL FEATURES OF DARWIN LATEST FEATURES OF DARWIN STRENGTHS & LIMITATIONS OF DARWIN...

INTRODUCTION... 2 FEATURES OF DARWIN... 4 SPECIAL FEATURES OF DARWIN LATEST FEATURES OF DARWIN STRENGTHS & LIMITATIONS OF DARWIN... INTRODUCTION... 2 WHAT IS DATA MINING?... 2 HOW TO ACHIEVE DATA MINING... 2 THE ROLE OF DARWIN... 3 FEATURES OF DARWIN... 4 USER FRIENDLY... 4 SCALABILITY... 6 VISUALIZATION... 8 FUNCTIONALITY... 10 Data

More information

Data Science. Data Analyst. Data Scientist. Data Architect

Data Science. Data Analyst. Data Scientist. Data Architect Data Science Data Analyst Data Analysis in Excel Programming in R Introduction to Python/SQL/Tableau Data Visualization in R / Tableau Exploratory Data Analysis Data Scientist Inferential Statistics &

More information

Chapter 5: Summary and Conclusion CHAPTER 5 SUMMARY AND CONCLUSION. Chapter 1: Introduction

Chapter 5: Summary and Conclusion CHAPTER 5 SUMMARY AND CONCLUSION. Chapter 1: Introduction CHAPTER 5 SUMMARY AND CONCLUSION Chapter 1: Introduction Data mining is used to extract the hidden, potential, useful and valuable information from very large amount of data. Data mining tools can handle

More information

RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE

RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE Luigi Grimaudo (luigi.grimaudo@polito.it) DataBase And Data Mining Research Group (DBDMG) Summary RapidMiner project Strengths

More information

Summary. RapidMiner Project 12/13/2011 RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE

Summary. RapidMiner Project 12/13/2011 RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE Luigi Grimaudo (luigi.grimaudo@polito.it) DataBase And Data Mining Research Group (DBDMG) Summary RapidMiner project Strengths

More information

Community edition(open-source) Enterprise edition

Community edition(open-source) Enterprise edition Suseela Bhaskaruni Rapid Miner is an environment for machine learning and data mining experiments. Widely used for both research and real-world data mining tasks. Software versions: Community edition(open-source)

More information

A Comparative Study of Data Mining Process Models (KDD, CRISP-DM and SEMMA)

A Comparative Study of Data Mining Process Models (KDD, CRISP-DM and SEMMA) International Journal of Innovation and Scientific Research ISSN 2351-8014 Vol. 12 No. 1 Nov. 2014, pp. 217-222 2014 Innovative Space of Scientific Research Journals http://www.ijisr.issr-journals.org/

More information

Enterprise Miner Tutorial Notes 2 1

Enterprise Miner Tutorial Notes 2 1 Enterprise Miner Tutorial Notes 2 1 ECT7110 E-Commerce Data Mining Techniques Tutorial 2 How to Join Table in Enterprise Miner e.g. we need to join the following two tables: Join1 Join 2 ID Name Gender

More information

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani

International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani LINK MINING PROCESS Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani Higher Colleges of Technology, United Arab Emirates ABSTRACT Many data mining and knowledge discovery methodologies and process models

More information

IJMIE Volume 2, Issue 9 ISSN:

IJMIE Volume 2, Issue 9 ISSN: WEB USAGE MINING: LEARNER CENTRIC APPROACH FOR E-BUSINESS APPLICATIONS B. NAVEENA DEVI* Abstract Emerging of web has put forward a great deal of challenges to web researchers for web based information

More information

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining. About the Tutorial Data Mining is defined as the procedure of extracting information from huge sets of data. In other words, we can say that data mining is mining knowledge from data. The tutorial starts

More information

The Explorer. chapter Getting started

The Explorer. chapter Getting started chapter 10 The Explorer Weka s main graphical user interface, the Explorer, gives access to all its facilities using menu selection and form filling. It is illustrated in Figure 10.1. There are six different

More information

Data Mining Concepts

Data Mining Concepts Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms Sequential

More information

Event: PASS SQL Saturday - DC 2018 Presenter: Jon Tupitza, CTO Architect

Event: PASS SQL Saturday - DC 2018 Presenter: Jon Tupitza, CTO Architect Event: PASS SQL Saturday - DC 2018 Presenter: Jon Tupitza, CTO Architect BEOP.CTO.TP4 Owner: OCTO Revision: 0001 Approved by: JAT Effective: 08/30/2018 Buchanan & Edwards Proprietary: Printed copies of

More information

Data Mining Tools. Jean-Gabriel Ganascia LIP6 University Pierre et Marie Curie 4, place Jussieu, Paris, Cedex 05

Data Mining Tools. Jean-Gabriel Ganascia LIP6 University Pierre et Marie Curie 4, place Jussieu, Paris, Cedex 05 Data Mining Tools Jean-Gabriel Ganascia LIP6 University Pierre et Marie Curie 4, place Jussieu, 75252 Paris, Cedex 05 Jean-Gabriel.Ganascia@lip6.fr DATA BASES Data mining Extraction Data mining Interpretation/

More information

Decision Making Procedure: Applications of IBM SPSS Cluster Analysis and Decision Tree

Decision Making Procedure: Applications of IBM SPSS Cluster Analysis and Decision Tree World Applied Sciences Journal 21 (8): 1207-1212, 2013 ISSN 1818-4952 IDOSI Publications, 2013 DOI: 10.5829/idosi.wasj.2013.21.8.2913 Decision Making Procedure: Applications of IBM SPSS Cluster Analysis

More information

ISSN: (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies

ISSN: (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies ISSN: 2321-7782 (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

CE4031 and CZ4031 Database System Principles

CE4031 and CZ4031 Database System Principles CE431 and CZ431 Database System Principles Course CE/CZ431 Course Database System Principles CE/CZ21 Algorithms; CZ27 Introduction to Databases CZ433 Advanced Data Management (not offered currently) Lectures

More information

Gain Insight and Improve Performance with Data Mining

Gain Insight and Improve Performance with Data Mining Clementine 11.0 Specifications Gain Insight and Improve Performance with Data Mining Data mining provides organizations with a clearer view of current conditions and deeper insight into future events.

More information

Tutorial on Machine Learning Tools

Tutorial on Machine Learning Tools Tutorial on Machine Learning Tools Yanbing Xue Milos Hauskrecht Why do we need these tools? Widely deployed classical models No need to code from scratch Easy-to-use GUI Outline Matlab Apps Weka 3 UI TensorFlow

More information

Data Mining With Weka A Short Tutorial

Data Mining With Weka A Short Tutorial Data Mining With Weka A Short Tutorial Dr. Wenjia Wang School of Computing Sciences University of East Anglia (UEA), Norwich, UK Content 1. Introduction to Weka 2. Data Mining Functions and Tools 3. Data

More information

Oracle9i Data Mining. Data Sheet August 2002

Oracle9i Data Mining. Data Sheet August 2002 Oracle9i Data Mining Data Sheet August 2002 Oracle9i Data Mining enables companies to build integrated business intelligence applications. Using data mining functionality embedded in the Oracle9i Database,

More information

Association Rule Mining. Introduction 46. Study core 46

Association Rule Mining. Introduction 46. Study core 46 Learning Unit 7 Association Rule Mining Introduction 46 Study core 46 1 Association Rule Mining: Motivation and Main Concepts 46 2 Apriori Algorithm 47 3 FP-Growth Algorithm 47 4 Assignment Bundle: Frequent

More information

Lesson 3: Building a Market Basket Scenario (Intermediate Data Mining Tutorial)

Lesson 3: Building a Market Basket Scenario (Intermediate Data Mining Tutorial) From this diagram, you can see that the aggregated mining model preserves the overall range and trends in values while minimizing the fluctuations in the individual data series. Conclusion You have learned

More information

Fraud Detection Using Random Forest Algorithm

Fraud Detection Using Random Forest Algorithm Fraud Detection Using Random Forest Algorithm Eesha Goel Computer Science Engineering and Technology, GZSCCET, Bhatinda, India eesha1992@rediffmail.com Abhilasha Computer Science Engineering and Technology,

More information

ENTERPRISE MINER: 1 DATA EXPLORATION AND VISUALISATION

ENTERPRISE MINER: 1 DATA EXPLORATION AND VISUALISATION ENTERPRISE MINER: 1 DATA EXPLORATION AND VISUALISATION JOZEF MOFFAT, ANALYTICS & INNOVATION PRACTICE, SAS UK 10, MAY 2016 DATA EXPLORATION AND VISUALISATION AGENDA SAS Webinar 10th May 2016 at 10:00 AM

More information

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SHRI ANGALAMMAN COLLEGE OF ENGINEERING & TECHNOLOGY (An ISO 9001:2008 Certified Institution) SIRUGANOOR,TRICHY-621105. DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING Year / Semester: IV/VII CS1011-DATA

More information

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,

More information

Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005

Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005 Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005 Abstract Deciding on which algorithm to use, in terms of which is the most effective and accurate

More information

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1 Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that

More information

What s New in Spotfire DXP 1.1. Spotfire Product Management January 2007

What s New in Spotfire DXP 1.1. Spotfire Product Management January 2007 What s New in Spotfire DXP 1.1 Spotfire Product Management January 2007 Spotfire DXP Version 1.1 This document highlights the new capabilities planned for release in version 1.1 of Spotfire DXP. In this

More information

Association Rule Mining Using Revolution R for Market Basket Analysis

Association Rule Mining Using Revolution R for Market Basket Analysis Association Rule Mining Using Revolution R for Market Basket Analysis Veepu Uppal 1, Dr.Rajesh Kumar Singh 2 1 Assistant Professor, Manav Rachna University, Faridabad, INDIA 2 Principal, Shaheed Udham

More information

MATLAB is a multi-paradigm numerical computing environment fourth-generation programming language. A proprietary programming language developed by

MATLAB is a multi-paradigm numerical computing environment fourth-generation programming language. A proprietary programming language developed by 1 MATLAB is a multi-paradigm numerical computing environment fourth-generation programming language. A proprietary programming language developed by MathWorks In 2004, MATLAB had around one million users

More information

Contents. Preface to the Second Edition

Contents. Preface to the Second Edition Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................

More information

Automate Transform Analyze

Automate Transform Analyze Competitive Intelligence 2.0 Turning the Web s Big Data into Big Insights Automate Transform Analyze Introduction Today, the web continues to grow at a dizzying pace. There are more than 1 billion websites

More information

SCHEME OF COURSE WORK. Data Warehousing and Data mining

SCHEME OF COURSE WORK. Data Warehousing and Data mining SCHEME OF COURSE WORK Course Details: Course Title Course Code Program: Specialization: Semester Prerequisites Department of Information Technology Data Warehousing and Data mining : 15CT1132 : B.TECH

More information

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395 Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 21 Table of contents 1 Introduction 2 Data mining

More information

SAS Visual Analytics 8.1: Getting Started with Analytical Models

SAS Visual Analytics 8.1: Getting Started with Analytical Models SAS Visual Analytics 8.1: Getting Started with Analytical Models Using This Book Audience This book covers the basics of building, comparing, and exploring analytical models in SAS Visual Analytics. The

More information

Curriculum Scheme. Dr. Ambedkar Institute of Technology, Bengaluru-56 (An Autonomous Institute, Affiliated to V T U, Belagavi)

Curriculum Scheme. Dr. Ambedkar Institute of Technology, Bengaluru-56 (An Autonomous Institute, Affiliated to V T U, Belagavi) Curriculum Scheme INSTITUTION VISION & MISSION VISION: To create Dynamic, Resourceful, Adept and Innovative Technical professionals to meet global challenges. MISSION: To offer state of the art undergraduate,

More information

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394

Data Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1394 Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 20 Table of contents 1 Introduction 2 Data mining

More information

An Introduction to Data Mining in Institutional Research. Dr. Thulasi Kumar Director of Institutional Research University of Northern Iowa

An Introduction to Data Mining in Institutional Research. Dr. Thulasi Kumar Director of Institutional Research University of Northern Iowa An Introduction to Data Mining in Institutional Research Dr. Thulasi Kumar Director of Institutional Research University of Northern Iowa AIR/SPSS Professional Development Series Background Covering variety

More information

Data Mining: Exploring Data

Data Mining: Exploring Data Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar But we start with a brief discussion of the Friedman article and the relationship between Data

More information

Data Preprocessing. Slides by: Shree Jaswal

Data Preprocessing. Slides by: Shree Jaswal Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data

More information

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques

Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques 24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE

More information

Big Data Specialized Studies

Big Data Specialized Studies Information Technologies Programs Big Data Specialized Studies Accelerate Your Career extension.uci.edu/bigdata Offered in partnership with University of California, Irvine Extension s professional certificate

More information

Basic Concepts Weka Workbench and its terminology

Basic Concepts Weka Workbench and its terminology Changelog: 14 Oct, 30 Oct Basic Concepts Weka Workbench and its terminology Lecture Part Outline Concepts, instances, attributes How to prepare the input: ARFF, attributes, missing values, getting to know

More information

Understanding Rule Behavior through Apriori Algorithm over Social Network Data

Understanding Rule Behavior through Apriori Algorithm over Social Network Data Global Journal of Computer Science and Technology Volume 12 Issue 10 Version 1.0 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals Inc. (USA) Online ISSN: 0975-4172

More information

Slide and Sample Data.

Slide and Sample Data. Data Mining with R ว ชา การค นพบองค ความร และการทำเหม องข อม ลช นส ง Veerasak Kritsanapraphan Chulalongkorn University Email : veerasak.kr568@cbs.chula.ac.th @veerasakk Slide and Sample Data https://github.com/vkrit/chula_datamining

More information

Paper SAS Taming the Rule. Charlotte Crain, Chris Upton, SAS Institute Inc.

Paper SAS Taming the Rule. Charlotte Crain, Chris Upton, SAS Institute Inc. ABSTRACT Paper SAS2620-2016 Taming the Rule Charlotte Crain, Chris Upton, SAS Institute Inc. When business rules are deployed and executed--whether a rule is fired or not if the rule-fire outcomes are

More information

Tutorial Case studies

Tutorial Case studies 1 Topic Wrapper for feature subset selection Continuation. This tutorial is the continuation of the preceding one about the wrapper feature selection in the supervised learning context (http://data-mining-tutorials.blogspot.com/2010/03/wrapper-forfeature-selection.html).

More information

A study of classification algorithms using Rapidminer

A study of classification algorithms using Rapidminer Volume 119 No. 12 2018, 15977-15988 ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu A study of classification algorithms using Rapidminer Dr.J.Arunadevi 1, S.Ramya 2, M.Ramesh Raja

More information

Pre-Requisites: CS2510. NU Core Designations: AD

Pre-Requisites: CS2510. NU Core Designations: AD DS4100: Data Collection, Integration and Analysis Teaches how to collect data from multiple sources and integrate them into consistent data sets. Explains how to use semi-automated and automated classification

More information

Data Mining Technology Based on Bayesian Network Structure Applied in Learning

Data Mining Technology Based on Bayesian Network Structure Applied in Learning , pp.67-71 http://dx.doi.org/10.14257/astl.2016.137.12 Data Mining Technology Based on Bayesian Network Structure Applied in Learning Chunhua Wang, Dong Han College of Information Engineering, Huanghuai

More information

To enable us to input the real world data warehouse requirements into cost comparison model in a way that would not favor any single vendor, we needed

To enable us to input the real world data warehouse requirements into cost comparison model in a way that would not favor any single vendor, we needed Probable Cost of Ownership for Enterprise Data Warehouse Software Providing a financial basis for choosing the right software to power your next Data Warehouse project Executive Summary The purchase of

More information

Machine Learning: Algorithms and Applications Mockup Examination

Machine Learning: Algorithms and Applications Mockup Examination Machine Learning: Algorithms and Applications Mockup Examination 14 May 2012 FIRST NAME STUDENT NUMBER LAST NAME SIGNATURE Instructions for students Write First Name, Last Name, Student Number and Signature

More information

Modelling Structures in Data Mining Techniques

Modelling Structures in Data Mining Techniques Modelling Structures in Data Mining Techniques Ananth Y N 1, Narahari.N.S 2 Associate Professor, Dept of Computer Science, School of Graduate Studies- JainUniversity- J.C.Road, Bangalore, INDIA 1 Professor

More information

Cse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University

Cse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University Cse634 DATA MINING TEST REVIEW Professor Anita Wasilewska Computer Science Department Stony Brook University Preprocessing stage Preprocessing: includes all the operations that have to be performed before

More information

An Effectual Approach to Swelling the Selling Methodology in Market Basket Analysis using FP Growth

An Effectual Approach to Swelling the Selling Methodology in Market Basket Analysis using FP Growth An Effectual Approach to Swelling the Selling Methodology in Market Basket Analysis using FP Growth P.Sathish kumar, T.Suvathi K.S.Rangasamy College of Technology suvathi007@gmail.com Received: 03/01/2017,

More information

Advance analytics and Comparison study of Data & Data Mining

Advance analytics and Comparison study of Data & Data Mining Advance analytics and Comparison study of Data & Data Mining Review paper on Concepts and Practice with RapidMiner Vandana Kaushik 1, Dr. Vikas Siwach 2 M.Tech (Software Engineering) Kaushikvandana22@gmail.com

More information

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE

AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE AN IMPROVISED FREQUENT PATTERN TREE BASED ASSOCIATION RULE MINING TECHNIQUE WITH MINING FREQUENT ITEM SETS ALGORITHM AND A MODIFIED HEADER TABLE Vandit Agarwal 1, Mandhani Kushal 2 and Preetham Kumar 3

More information

PREDICTION OF POPULAR SMARTPHONE COMPANIES IN THE SOCIETY

PREDICTION OF POPULAR SMARTPHONE COMPANIES IN THE SOCIETY PREDICTION OF POPULAR SMARTPHONE COMPANIES IN THE SOCIETY T.Ramya 1, A.Mithra 2, J.Sathiya 3, T.Abirami 4 1 Assistant Professor, 2,3,4 Nadar Saraswathi college of Arts and Science, Theni, Tamil Nadu (India)

More information

CE4031 and CZ4031 Database System Principles

CE4031 and CZ4031 Database System Principles CE4031 and CZ4031 Database System Principles Academic AY1819 Semester 1 CE/CZ4031 Database System Principles s CE/CZ2001 Algorithms; CZ2007 Introduction to Databases CZ4033 Advanced Data Management (not

More information

COMPARISON OF DIFFERENT CLASSIFICATION TECHNIQUES

COMPARISON OF DIFFERENT CLASSIFICATION TECHNIQUES COMPARISON OF DIFFERENT CLASSIFICATION TECHNIQUES USING DIFFERENT DATASETS V. Vaithiyanathan 1, K. Rajeswari 2, Kapil Tajane 3, Rahul Pitale 3 1 Associate Dean Research, CTS Chair Professor, SASTRA University,

More information

Data Mining - Data. Dr. Jean-Michel RICHER Dr. Jean-Michel RICHER Data Mining - Data 1 / 47

Data Mining - Data. Dr. Jean-Michel RICHER Dr. Jean-Michel RICHER Data Mining - Data 1 / 47 Data Mining - Data Dr. Jean-Michel RICHER 2018 jean-michel.richer@univ-angers.fr Dr. Jean-Michel RICHER Data Mining - Data 1 / 47 Outline 1. Introduction 2. Data preprocessing 3. CPA with R 4. Exercise

More information

Nearest Neighbor Classification

Nearest Neighbor Classification Nearest Neighbor Classification Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 11, 2017 1 / 48 Outline 1 Administration 2 First learning algorithm: Nearest

More information

Knowledge Engineering and Data Mining. Knowledge engineering has 6 basic phases:

Knowledge Engineering and Data Mining. Knowledge engineering has 6 basic phases: Knowledge Engineering and Data Mining Knowledge Engineering The process of building intelligent knowledge based systems is called knowledge engineering Knowledge engineering has 6 basic phases: 1. Problem

More information

Data Analyst Nanodegree Syllabus

Data Analyst Nanodegree Syllabus Data Analyst Nanodegree Syllabus Discover Insights from Data with Python, R, SQL, and Tableau Before You Start Prerequisites : In order to succeed in this program, we recommend having experience working

More information

Applying Supervised Learning

Applying Supervised Learning Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains

More information

Sensor Based Time Series Classification of Body Movement

Sensor Based Time Series Classification of Body Movement Sensor Based Time Series Classification of Body Movement Swapna Philip, Yu Cao*, and Ming Li Department of Computer Science California State University, Fresno Fresno, CA, U.S.A swapna.philip@gmail.com,

More information

Hands on Datamining & Machine Learning with Weka

Hands on Datamining & Machine Learning with Weka Step1: Click the Experimenter button to launch the Weka Experimenter. The Weka Experimenter allows you to design your own experiments of running algorithms on datasets, run the experiments and analyze

More information

International Journal of Computer Engineering and Applications, ICCSTAR-2016, Special Issue, May.16

International Journal of Computer Engineering and Applications, ICCSTAR-2016, Special Issue, May.16 The Survey Of Data Mining And Warehousing Architha.S, A.Kishore Kumar Department of Computer Engineering Department of computer engineering city engineering college VTU Bangalore, India ABSTRACT: Data

More information

Intrusion Detection Using Data Mining Technique (Classification)

Intrusion Detection Using Data Mining Technique (Classification) Intrusion Detection Using Data Mining Technique (Classification) Dr.D.Aruna Kumari Phd 1 N.Tejeswani 2 G.Sravani 3 R.Phani Krishna 4 1 Associative professor, K L University,Guntur(dt), 2 B.Tech(1V/1V),ECM,

More information

SAS E-MINER: AN OVERVIEW

SAS E-MINER: AN OVERVIEW SAS E-MINER: AN OVERVIEW Samir Farooqi, R.S. Tomar and R.K. Saini I.A.S.R.I., Library Avenue, Pusa, New Delhi 110 012 Samir@iasri.res.in; tomar@iasri.res.in; saini@iasri.res.in Introduction SAS Enterprise

More information

The Procedure Proposal of Manufacturing Systems Management by Using of Gained Knowledge from Production Data

The Procedure Proposal of Manufacturing Systems Management by Using of Gained Knowledge from Production Data The Procedure Proposal of Manufacturing Systems Management by Using of Gained Knowledge from Production Data Pavol Tanuska Member IAENG, Pavel Vazan, Michal Kebisek, Milan Strbo Abstract The paper gives

More information

Data Warehousing and Machine Learning

Data Warehousing and Machine Learning Data Warehousing and Machine Learning Introduction Thomas D. Nielsen Aalborg University Department of Computer Science Spring 2008 DWML Spring 2008 1 / 47 What is Data Mining?? Introduction DWML Spring

More information

IBM SPSS Statistics and open source: A powerful combination. Let s go

IBM SPSS Statistics and open source: A powerful combination. Let s go and open source: A powerful combination Let s go The purpose of this paper is to demonstrate the features and capabilities provided by the integration of IBM SPSS Statistics and open source programming

More information