Prototyping DM Techniques with WEKA and YALE Open-Source Software
TIES443 Tutorial 1: Prototyping DM Techniques with WEKA and YALE Open-Source Software
Department of Mathematical Information Technology, University of Jyväskylä
Mykola Pechenizkiy
Course webpage: [...]
November 7

Contents
- Brief review of DM software
- Commercial
- Open-source: WEKA, YALE, the R Project for Statistical Computing, Pentaho (whole BI solutions)
- Matlab (Sami will tell you more during the 2nd tutorial)
- WEKA vs. YALE comparison: exploration, experimentation, visualization
- 1st assignment

Data Mining Software
- Many providers of commercial DM software: SAS Enterprise Miner, SPSS Clementine, Statistica Data Miner, MS SQL Server, PolyAnalyst, KnowledgeSTUDIO, IBM Intelligent Miner. Universities can now receive free copies of DB2 and Intelligent Miner for educational or research purposes. See [...] for a list.
- Open source: WEKA (Waikato Environment for Knowledge Analysis), YALE (Yet Another Learning Environment)
- Many others: MLC++, Minitab, AlphaMiner, Rattle, KNIME
- The Pentaho BI project: a pioneering initiative by the open-source development community to provide organizations with a comprehensive set of BI capabilities that enable them to radically improve business performance, efficiency, and effectiveness.

Data Mining with WEKA
The following slides are from [...] by Eibe Frank. Copyright: Martin Kramer (mkramer@wxs.nl)

WEKA: the software
- Machine learning/data mining software written in Java (distributed under the GNU General Public License)
- Used for research, education, and applications
- Complements the book "Data Mining" by Witten & Frank
- Main features: a comprehensive set of data pre-processing tools, learning algorithms and evaluation methods; graphical user interfaces (incl. data visualization); an environment for comparing learning algorithms

WEKA only deals with "flat" files
Example of a flat file in ARFF format (the tokens that the extraction dropped from the header lines are restored here):

    @relation heart-disease-simplified

    @attribute age numeric
    @attribute sex { female, male}
    @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
    @attribute cholesterol numeric
    @attribute exercise_induced_angina { no, yes}
    @attribute class { present, not_present}

    @data
    63,male,typ_angina,233,no,not_present
    67,male,asympt,286,yes,present
    67,male,asympt,229,yes,present
    38,female,non_anginal,?,no,not_present

Command line tutorial

Explorer: pre-processing the data
- Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
- Data can also be read from a URL or from an SQL database (using JDBC)
- Pre-processing tools in WEKA are called "filters"
- WEKA contains filters for discretization, normalization, resampling, attribute selection, transforming and combining attributes, among others
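To make the flat-file idea concrete, here is a minimal sketch of reading such a file in plain Python. It is not WEKA's own loader: the function name `parse_arff`, the relation name `heart_sample`, and the reduced value list for `chest_pain_type` are assumptions made for this illustration, and only dense files with numeric and nominal attributes are handled.

```python
# Illustrative ARFF reader (an assumption-laden sketch, not WEKA's parser).
SAMPLE = """\
% hypothetical header; the data rows are the ones shown on the slide
@relation heart_sample
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""


def parse_arff(text):
    """Return (relation, attributes, rows) for a small dense ARFF text.

    attributes is a list of (name, kind) pairs, where kind is "numeric"
    or a list of nominal values. Missing values ("?") become None.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (_, kind), value in zip(attributes, values):
                if value == "?":
                    row.append(None)          # missing value
                elif kind == "numeric":
                    row.append(float(value))
                else:
                    row.append(value)         # nominal value kept as text
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            elif spec.lower() in ("numeric", "real", "integer"):
                kind = "numeric"
            else:
                kind = spec.lower()           # other types are left untouched
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE)
```

Each row comes back as one flat list of values, which is the sense in which the format is "flat": one instance per line, one column per attribute.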
Explorer: building "classifiers"
- Classifiers in WEKA are models for predicting nominal or numeric quantities
- Implemented learning schemes include: decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes nets, among others
- "Meta"-classifiers include: bagging, boosting, stacking, error-correcting output codes, locally weighted learning, among others
Explorer: clustering data
- WEKA contains "clusterers" for finding groups of similar instances in a dataset
- Implemented schemes are: k-means, EM, Cobweb, X-means, FarthestFirst
- Clusters can be visualized and compared to "true" clusters (if given)
- Evaluation is based on log-likelihood if the clustering scheme produces a probability distribution
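As an illustration of what a clusterer does, the following is a minimal k-means sketch in plain Python. It is not WEKA's implementation; the function names, the seeding strategy (random sample of the data points) and the toy data are assumptions for this example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means: returns (centroids, assignment), one label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # assumption: seed with k data points
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:   # converged: no label changed
            break
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignment = kmeans(points, k=2)
```

On this toy data the two groups of three points end up with different labels; comparing such labels with known "true" clusters is the kind of check the Explorer offers when class values are available.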
Explorer: finding associations
- WEKA contains an implementation of the Apriori algorithm for learning association rules
- Works only with discrete data
- Can identify statistical dependencies between groups of attributes, e.g. milk, butter => bread, eggs (with confidence 0.9 and support 2000)
- Apriori can compute all rules that have a given minimum support and exceed a given confidence
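The two quantities in the rule above can be sketched in a few lines of plain Python. Support is taken here as the number of transactions containing an itemset (matching the count of 2000 on the slide), and confidence as support(antecedent and consequent) / support(antecedent). The level-wise search is a simplified Apriori-style enumeration, not WEKA's code; all function names and the toy transactions are assumptions.

```python
from itertools import combinations


def support_count(transactions, itemset):
    """Number of transactions that contain every item of itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent => consequent."""
    both = support_count(transactions, set(antecedent) | set(consequent))
    ante = support_count(transactions, antecedent)
    return both / ante if ante else 0.0


def frequent_itemsets(transactions, min_support):
    """Level-wise search: a k-itemset is a candidate only if all of its
    (k-1)-subsets are frequent (the Apriori pruning idea)."""
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items
               if support_count(transactions, [i]) >= min_support]
    result = {}
    size = 1
    while current:
        for s in current:
            result[s] = support_count(transactions, s)
        size += 1
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        current = [c for c in candidates
                   if all(frozenset(sub) in result for sub in combinations(c, size - 1))
                   and support_count(transactions, c) >= min_support]
    return result


transactions = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "bread"},
    {"eggs"},
]
rule_conf = confidence(transactions, {"milk", "butter"}, {"bread", "eggs"})
frequent = frequent_itemsets(transactions, min_support=3)
```

Here three transactions contain milk and butter and two of them also contain bread and eggs, so the rule has support 2 and confidence 2/3.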
Explorer: attribute selection
- Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
- Attribute selection methods contain two parts:
- A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
- An evaluation method: correlation-based, wrapper, information gain, chi-squared, among others
- Very flexible: WEKA allows (almost) arbitrary combinations of these two
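One such combination, a ranking search with an information-gain evaluator, can be sketched as follows. This is an illustration in plain Python rather than WEKA's implementation; the function names are assumptions, and the four rows reuse the nominal columns of the heart-disease sample.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index, class_index=-1):
    """Class entropy minus the weighted entropy after splitting on one attribute."""
    labels = [r[class_index] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[attr_index], []).append(r[class_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, names):
    """Evaluate each attribute on its own and sort by gain, best first."""
    scored = [(information_gain(rows, i), name) for i, name in enumerate(names)]
    return sorted(scored, reverse=True)


# Columns: sex, exercise_induced_angina, class
rows = [
    ("male", "no", "not_present"),
    ("male", "yes", "present"),
    ("male", "yes", "present"),
    ("female", "no", "not_present"),
]
ranked = rank_attributes(rows, ["sex", "exercise_induced_angina"])
```

On these four rows `exercise_induced_angina` separates the classes perfectly (gain of 1 bit), so it is ranked ahead of `sex`. A wrapper evaluator would instead score attribute subsets by running a classifier on them.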
Explorer: data visualization
- Visualization is very useful in practice: e.g. it helps to determine the difficulty of the learning problem
- WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
- To do: rotating 3-d visualizations (Xgobi-style)
- Color-coded class values
- "Jitter" option to deal with nominal attributes (and to detect "hidden" data points)
- "Zoom-in" function
Performing experiments
- The Experimenter makes it easy to compare the performance of different learning schemes
- For classification and regression problems
- Results can be written to a file or a database
- Evaluation options: cross-validation, learning curve, hold-out
- Can also iterate over different parameter settings
- Significance testing built in!
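The cross-validation and significance-testing options can be pictured with the sketch below. It shows a generic k-fold split and a plain paired t statistic over per-fold scores; this is a stand-in chosen for illustration and is not claimed to be the test the Experimenter applies. Function names and the example scores are assumptions.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]      # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


def paired_t_statistic(scores_a, scores_b):
    """t statistic of the per-fold score differences of two schemes."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        # All differences identical: the statistic is undefined or unbounded.
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / math.sqrt(var / n)


splits = list(k_fold_indices(10, 5))
t_value = paired_t_statistic([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

In an experiment, each scheme would be trained on `train` and scored on `test` for every split, and the resulting per-fold scores would be fed to the test; the statistic is then compared against a t distribution with n - 1 degrees of freedom.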
The Knowledge Flow GUI
- New graphical user interface for WEKA
- Java-Beans-based interface for setting up and running machine learning experiments
- Data sources, classifiers, etc. are beans and can be connected graphically
- Data "flows" through components: e.g., "data source" -> "filter" -> "classifier" -> "evaluator"
- Layouts can be saved and loaded again later
Conclusion: try it yourself!
- WEKA is available at [...]
- The site also has a list of projects based on WEKA
- YALE has different interfaces and different ideas behind it, but it also integrates all available DM techniques from WEKA
The following slides are compiled from screenshots and related descriptions available from the YALE pages.

YALE: Yet Another Learning Environment
Developed at the Artificial Intelligence Unit of the University of Dortmund.

Features of YALE
- Freely available open-source knowledge discovery environment
- 100% pure Java (runs on every major platform and operating system)
- KD processes are modeled as simple operator trees, which is both intuitive and powerful
- Operator trees or subtrees can be saved as building blocks for later re-use
- Internal XML representation ensures a standardized interchange format for data mining experiments
- Simple scripting language allowing for automatic large-scale experiments
- Multi-layered data view concept ensures efficient and transparent data handling

Features of YALE: flexibility in using YALE
- Graphical user interface (GUI) for interactive prototyping
- Command line mode (batch mode) for automated large-scale applications
- Java API to ease usage of YALE from your own programs
- Simple plugin and extension mechanisms; some plugins already exist and you can easily add your own
- Powerful plotting facility offering a large set of sophisticated high-dimensional visualization techniques for data and models
- More than 350 machine learning, evaluation, input and output, pre- and post-processing, and visualization operators, plus numerous meta optimization schemes
- Machine learning library WEKA fully integrated
- YALE's potential applications include text mining, multimedia mining, feature engineering, data stream mining and tracking drifting concepts, development of ensemble methods, and distributed data mining

Experiment setup
- The initial operator tree consists only of a root node.
- The "Tree View" tab is the most often used editor for YALE experiments. Left: the current operator tree. Right: a table with the parameters of the currently selected operator.
- The lower part of the YALE main frame displays log and error messages.
- After the learning operator "J48", a breakpoint indicates that the intermediate results can be inspected. Due to the modular concept of YALE, it is always possible to inspect and save intermediate results, e.g. the results for each individual run in a cross-validation.
- New operators can be added to the experiment directly from the context menu of the parent operator, or through the new operator dialog shown in this screenshot. Several search constraints exist, and a short description of each operator is shown.
- The operator trees are coded and represented in a simple XML format. The XML editor tab allows for fast and direct manipulation of the current experiment.
- All views can also be printed and exported to a wide range of graphic formats including jpg, png, ps and pdf.
- The "Box View" is another viewer for YALE experiments. The box format is an intuitive way of representing the nesting of the operators, but editing is not possible in this view.
- The "Monitor" tab provides an overview of the memory currently in use and is an important tool for large-scale data mining tasks on huge data sets. The amount of memory used during an experiment run can even be logged in the same way as all other provided logging values.
- Data can be imported from several file formats with the attribute editor. Other file formats such as ARFF, C4.5, CSV, and dBase can be loaded with specialized operators.
- The Attribute Editor can be used to create meta data descriptions from almost arbitrary file formats. These meta data descriptions can then be used by an input operator which actually loads the data.
- Additional attributes (features) can easily be constructed from your data. YALE provides several approaches to construct the best feature space automatically. These approaches range from feature space transformations like PCA, GHA, ICA or their kernel versions, through standard feature selection techniques, to several evolutionary approaches for feature construction and extraction.
- Help features ease the learning phase for new users: an online tutorial, tool tip texts, a beginner and an expert mode, operator info screens, a GUI manual, and the YALE tutorial.
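The idea of an experiment stored as a nested XML operator tree, as described at the start of this slide, can be pictured with the sketch below. The element names (`operator`, `parameter`), the attribute names and all operator class names are assumptions made for illustration; they are not taken from the YALE documentation and should not be read as its actual schema.

```python
import xml.etree.ElementTree as ET


def operator(name, cls, params=None, children=()):
    """Build one operator node; nested operators become child elements."""
    node = ET.Element("operator", {"name": name, "class": cls})
    for key, value in (params or {}).items():
        ET.SubElement(node, "parameter", {"key": key, "value": str(value)})
    node.extend(children)
    return node


def nesting(node, depth=0):
    """Return the operator names as indented lines, mirroring the tree view."""
    lines = ["  " * depth + node.get("name")]
    for child in node.findall("operator"):
        lines.extend(nesting(child, depth + 1))
    return lines


# Hypothetical experiment: load data, then cross-validate a J48 learner.
tree = operator("Root", "Experiment", children=[
    operator("Input", "ExampleSource", {"attributes": "data.aml"}),
    operator("XVal", "XValidation", {"number_of_validations": 10}, children=[
        operator("Learner", "J48"),
    ]),
])
xml_text = ET.tostring(tree, encoding="unicode")
outline = nesting(ET.fromstring(xml_text))
```

Because the whole experiment is one tree, a subtree such as the cross-validation block can be serialized on its own and reused elsewhere, which is the building-block idea mentioned among the YALE features.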
Data Visualization
- Each time a data set is presented in the results tab (e.g. after loading it), several views appear: a meta data view describing all attributes, a data view showing the actual data, and a plot view providing a large set of (high-dimensional) plotters for the data set at hand.
- The basic scatter plotter: two of the attributes are used as axes, and the class label attribute is used for colorization. The legend at the top maps the colors used to the classes or, in the case of a real-valued color plot column, to the corresponding real values.
- The standard scatter plotter also allows jittering, zooming, and displaying example ids. Double-clicking a data point opens a visualizer. The standard example visualizer is presented here.
- 2D scatter plots can be put together into a scatter plot matrix, where a usual scatter plot is drawn for every pair of dimensions. This plotter is only available for fewer than 10 dimensions. For a higher number of dimensions, one of the other high-dimensional data plotters presented below should be used.
- A 3D scatter plot exists, similar to the colorized 2D scatter plot discussed above. The viewport can be rotated and zoomed to fit your needs.
- The built-in 2D and 3D plotters are a quick and easy way to view your numerical and nominal results, even as an online plot at experiment runtime.
- The SOM (Self-Organizing Map) plotter uses a Kohonen net for dimensionality reduction. Plotting of the U-, the P-, and the U*-Matrix is supported with different color schemes. The data points can be colorized by one of the data columns, e.g. with the prediction label. In a second screenshot, a gray scale color scheme was used to plot the U-Matrix.
- The parallel plotter prints the axes of all dimensions parallel to each other. This is the natural visualization technique for series data but can also be useful for other types of data. The main advantage of parallel plots is that a very high number of dimensions can be visualized with this technique. The dimensions are colorized with the feature weights: the more yellow a dimension is marked, the more important this column is.
- Quartile plots (also known as box plots) are often used for experiment results like performance values, but this type of plot can summarize the statistical properties of data columns in general.
- Histogram plots (also known as distribution plots).
- RadViz is another high-dimensional data plotter, where the data columns are placed as radial dimension anchors. Each data point is connected to each anchor with a spring whose strength corresponds to the feature value. This leads to a fixed position in the two-dimensional plane. Again, weights are used to mark the more important columns.
- A survey plot is a sort of vertical histogram matrix that is also suitable for a large number of dimensions. Each line corresponds to one data point and can be colorized by one of the columns. The length of each section corresponds to the value of the data point for that dimension. For up to three dimensions the order of the histograms can be selected.
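The spring model behind RadViz, described above, has a simple closed form: with every column rescaled to [0, 1] and one anchor per column placed evenly on the unit circle, a point rests at the average of the anchor positions weighted by its feature values. The sketch below follows that common textbook formulation; it is not taken from YALE's code, and the function name and toy data are assumptions.

```python
import math


def radviz(rows):
    """Project rows of numeric features to 2D RadViz coordinates."""
    m = len(rows[0])
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    # One anchor per column, evenly spaced on the unit circle.
    anchors = [(math.cos(2 * math.pi * j / m), math.sin(2 * math.pi * j / m))
               for j in range(m)]
    projected = []
    for row in rows:
        # Min-max scale each value; constant columns contribute no pull.
        w = [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        total = sum(w)
        if total == 0:
            projected.append((0.0, 0.0))   # no spring pulls: stay at the centre
            continue
        x = sum(wi * ax for wi, (ax, _) in zip(w, anchors)) / total
        y = sum(wi * ay for wi, (_, ay) in zip(w, anchors)) / total
        projected.append((x, y))
    return projected


points_2d = radviz([(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)])
```

A point that is large in only one column lands on that column's anchor, while a point with equal values in all columns lands at the centre, which is why the order of the anchors affects how readable the plot is.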
- Andrews curves are another way of visualizing high-dimensional data. Each data point is projected onto a set of orthogonal trigonometric functions and displayed as a curve. Andrews curves are known to preserve distances, so they have many uses in data analysis and exploration. Outliers and hidden patterns can often be detected well in these plots.

Visualization of Models and other Results
- The result of a learning step is called a model. Some models provide a graphical representation of the learned hypothesis. This screenshot presents a learned decision tree for the widely known "labor negotiations" data set from the UCI repository.
- Results like learned models, performance values, data sets or selected attributes are displayed when the experiment is completed or a breakpoint is reached.
- In cases where no graphical representation of a learned model is available, at least a textual description of the learned model is presented. In this screenshot you see a Stacking model consisting of a rule model (the upper half) and a neural network (starting in the lower half). Both base models are described by simple and understandable texts.
- This is a density plot (similar to a contour plot) of the decision function of a Support Vector Machine (SVM). Almost all SVM implementations in YALE provide a table and a plot view of the learned model. In this screenshot, red points refer to support vectors and blue points to normal training examples. Bluish regions will be predicted negative, reddish regions will be predicted positive.
- In the next plot only the support vectors are shown, colorized by the predicted function value for the corresponding data point. Examples on the red side will be predicted positive; examples on the blue side will be predicted negative. There is a perfectly linear separation in two of the dimensions, and it seems that the parameters were not chosen optimally, since the number of support vectors is rather high.
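The Andrews curves mentioned in this part of the slides map a point x = (x1, x2, x3, ...) to the function f_x(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + ... on the interval [-pi, pi]. The sketch below, with function names chosen here for illustration, evaluates the curve and checks the distance property numerically: the integral of (f_x - f_y)^2 over the interval equals pi times the squared Euclidean distance between x and y.

```python
import math


def andrews_curve(x, t):
    """Value of the Andrews curve of point x at angle t."""
    value = x[0] / math.sqrt(2)
    for i, xi in enumerate(x[1:], start=1):
        harmonic = (i + 1) // 2            # 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 1:
            value += xi * math.sin(harmonic * t)
        else:
            value += xi * math.cos(harmonic * t)
    return value


def curve_distance_sq(x, y, samples=1000):
    """Numerically integrate (f_x - f_y)^2 over [-pi, pi)."""
    step = 2 * math.pi / samples
    total = 0.0
    for k in range(samples):
        t = -math.pi + k * step
        total += (andrews_curve(x, t) - andrews_curve(y, t)) ** 2
    return total * step


x, y = (1.0, 2.0, 3.0), (0.0, 1.0, 5.0)
euclid_sq = sum((a - b) ** 2 for a, b in zip(x, y))   # 1 + 1 + 4 = 6
curve_sq = curve_distance_sq(x, y)
```

Because the basis functions are orthogonal, points that are close in the original space produce curves that stay close everywhere, and an outlier shows up as a curve that departs from the bundle.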
- The alpha values (Lagrange multipliers) of the SVM are plotted against the function values and colorized with the true label. We applied slight jittering to make more points visible. This model seems to be "well-learned", since only a few points have an alpha value not equal to zero, and these are the points with function values approximately [...]
- This surface plot presents the result of a meta optimization experiment in which the parameters of one of the operators are optimized. The plot can be rotated and zoomed.

WEKA & YALE Comparison
You tell me in your report.

Now let's go through the first assignment.

1st Assignment
nment1.pdf
My advice is to come back to this assignment and to the WEKA and YALE tools after each forthcoming lecture, to see how things are implemented and how they can be used in practice.
More informationLecture Topic Projects
Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, basic tasks, data types 3 Introduction to D3, basic vis techniques for non-spatial data Project #1 out 4 Data
More informationWEKA: Practical Machine Learning Tools and Techniques in Java. Seminar A.I. Tools WS 2006/07 Rossen Dimov
WEKA: Practical Machine Learning Tools and Techniques in Java Seminar A.I. Tools WS 2006/07 Rossen Dimov Overview Basic introduction to Machine Learning Weka Tool Conclusion Document classification Demo
More informationCOSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor
COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality
More informationPre-Requisites: CS2510. NU Core Designations: AD
DS4100: Data Collection, Integration and Analysis Teaches how to collect data from multiple sources and integrate them into consistent data sets. Explains how to use semi-automated and automated classification
More informationInformation Driven Healthcare:
Information Driven Healthcare: Machine Learning course Lecture: Feature selection I --- Concepts Centre for Doctoral Training in Healthcare Innovation Dr. Athanasios Tsanas ( Thanasis ), Wellcome Trust
More informationEffect of Principle Component Analysis and Support Vector Machine in Software Fault Prediction
International Journal of Computer Trends and Technology (IJCTT) volume 7 number 3 Jan 2014 Effect of Principle Component Analysis and Support Vector Machine in Software Fault Prediction A. Shanthini 1,
More informationRight-click on whatever it is you are trying to change Get help about the screen you are on Help Help Get help interpreting a table
Q Cheat Sheets What to do when you cannot figure out how to use Q What to do when the data looks wrong Right-click on whatever it is you are trying to change Get help about the screen you are on Help Help
More informationOracle Big Data Science IOUG Collaborate 16
Oracle Big Data Science IOUG Collaborate 16 Session 4762 Tim and Dan Vlamis Tuesday, April 12, 2016 Vlamis Software Solutions Vlamis Software founded in 1992 in Kansas City, Missouri Developed 200+ Oracle
More informationSAS Visual Analytics 8.2: Working with Report Content
SAS Visual Analytics 8.2: Working with Report Content About Objects After selecting your data source and data items, add one or more objects to display the results. SAS Visual Analytics provides objects
More information6.034 Design Assignment 2
6.034 Design Assignment 2 April 5, 2005 Weka Script Due: Friday April 8, in recitation Paper Due: Wednesday April 13, in class Oral reports: Friday April 15, by appointment The goal of this assignment
More informationMulti-label classification using rule-based classifier systems
Multi-label classification using rule-based classifier systems Shabnam Nazmi (PhD candidate) Department of electrical and computer engineering North Carolina A&T state university Advisor: Dr. A. Homaifar
More informationPerformance Evaluation of Various Classification Algorithms
Performance Evaluation of Various Classification Algorithms Shafali Deora Amritsar College of Engineering & Technology, Punjab Technical University -----------------------------------------------------------***----------------------------------------------------------
More informationApplied Regression Modeling: A Business Approach
i Applied Regression Modeling: A Business Approach Computer software help: SAS SAS (originally Statistical Analysis Software ) is a commercial statistical software package based on a powerful programming
More informationCOMPARISON OF DIFFERENT CLASSIFICATION TECHNIQUES
COMPARISON OF DIFFERENT CLASSIFICATION TECHNIQUES USING DIFFERENT DATASETS V. Vaithiyanathan 1, K. Rajeswari 2, Kapil Tajane 3, Rahul Pitale 3 1 Associate Dean Research, CTS Chair Professor, SASTRA University,
More informationTutorials Case studies
1. Subject Three curves for the evaluation of supervised learning methods. Evaluation of classifiers is an important step of the supervised learning process. We want to measure the performance of the classifier.
More information1 Case study of SVM (Rob)
DRAFT a final version will be posted shortly COS 424: Interacting with Data Lecturer: Rob Schapire and David Blei Lecture # 8 Scribe: Indraneel Mukherjee March 1, 2007 In the previous lecture we saw how
More informationEnhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques
24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE
More informationSPSS QM II. SPSS Manual Quantitative methods II (7.5hp) SHORT INSTRUCTIONS BE CAREFUL
SPSS QM II SHORT INSTRUCTIONS This presentation contains only relatively short instructions on how to perform some statistical analyses in SPSS. Details around a certain function/analysis method not covered
More informationRecitation Supplement: Creating a Neural Network for Classification SAS EM December 2, 2002
Recitation Supplement: Creating a Neural Network for Classification SAS EM December 2, 2002 Introduction Neural networks are flexible nonlinear models that can be used for regression and classification
More informationPROGRAMMING AND ENGINEERING COMPUTING WITH MATLAB Huei-Huang Lee SDC. Better Textbooks. Lower Prices.
PROGRAMMING AND ENGINEERING COMPUTING WITH MATLAB 2018 Huei-Huang Lee SDC P U B L I C AT I O N S Better Textbooks. Lower Prices. www.sdcpublications.com Powered by TCPDF (www.tcpdf.org) Visit the following
More informationMACHINE LEARNING Example: Google search
MACHINE LEARNING Lauri Ilison, PhD Data Scientist 20.11.2014 Example: Google search 1 27.11.14 Facebook: 350 million photo uploads every day The dream is to build full knowledge of the world and know everything
More informationSubject. Dataset. Copy paste feature of the diagram. Importing the dataset. Copy paste feature into the diagram.
Subject Copy paste feature into the diagram. When we define the data analysis process into Tanagra, it is possible to copy components (or entire branches of components) towards another location into the
More informationLinear Models. Lecture Outline: Numeric Prediction: Linear Regression. Linear Classification. The Perceptron. Support Vector Machines
Linear Models Lecture Outline: Numeric Prediction: Linear Regression Linear Classification The Perceptron Support Vector Machines Reading: Chapter 4.6 Witten and Frank, 2nd ed. Chapter 4 of Mitchell Solving
More informationINTRODUCTION TO DATA MINING. Daniel Rodríguez, University of Alcalá
INTRODUCTION TO DATA MINING Daniel Rodríguez, University of Alcalá Outline Knowledge Discovery in Datasets Model Representation Types of models Supervised Unsupervised Evaluation (Acknowledgement: Jesús
More informationMinitab 18 Feature List
Minitab 18 Feature List * New or Improved Assistant Measurement systems analysis * Capability analysis Graphical analysis Hypothesis tests Regression DOE Control charts * Graphics Scatterplots, matrix
More informationData Set. What is Data Mining? Data Mining (Big Data Analytics) Illustrative Applications. What is Knowledge Discovery?
Data Mining (Big Data Analytics) Andrew Kusiak Intelligent Systems Laboratory 2139 Seamans Center The University of Iowa Iowa City, IA 52242-1527 andrew-kusiak@uiowa.edu http://user.engineering.uiowa.edu/~ankusiak/
More informationGraphing Calculator Tutorial
Graphing Calculator Tutorial This tutorial is designed as an interactive activity. The best way to learn the calculator functions will be to work the examples on your own calculator as you read the tutorial.
More informationExcel Manual X Axis Labels Below Chart 2010 Scatter
Excel Manual X Axis Labels Below Chart 2010 Scatter Of course, I want the chart itself to remain the same, so, the x values of dots are in row "b(o/c)", their y values are in "a(h/c)" row, and their respective
More informationMachine Learning in Action
Machine Learning in Action PETER HARRINGTON Ill MANNING Shelter Island brief contents PART l (~tj\ssification...,... 1 1 Machine learning basics 3 2 Classifying with k-nearest Neighbors 18 3 Splitting
More informationLearn What s New. Statistical Software
Statistical Software Learn What s New Upgrade now to access new and improved statistical features and other enhancements that make it even easier to analyze your data. The Assistant Data Customization
More informationSAS Visual Analytics 8.2: Getting Started with Reports
SAS Visual Analytics 8.2: Getting Started with Reports Introduction Reporting The SAS Visual Analytics tools give you everything you need to produce and distribute clear and compelling reports. SAS Visual
More informationNew ensemble methods for evolving data streams
New ensemble methods for evolving data streams A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby, and R. Gavaldà Laboratory for Relational Algorithmics, Complexity and Learning LARCA UPC-Barcelona Tech, Catalonia
More informationAn Effective Performance of Feature Selection with Classification of Data Mining Using SVM Algorithm
Proceedings of the National Conference on Recent Trends in Mathematical Computing NCRTMC 13 427 An Effective Performance of Feature Selection with Classification of Data Mining Using SVM Algorithm A.Veeraswamy
More informationK236: Basis of Data Science
Schedule of K236 K236: Basis of Data Science Lecture 6: Data Preprocessing Lecturer: Tu Bao Ho and Hieu Chi Dam TA: Moharasan Gandhimathi and Nuttapong Sanglerdsinlapachai 1. Introduction to data science
More informationMICROSOFT BUSINESS INTELLIGENCE
SSIS MICROSOFT BUSINESS INTELLIGENCE 1) Introduction to Integration Services Defining sql server integration services Exploring the need for migrating diverse Data the role of business intelligence (bi)
More information