A Soft-Computing Approach to Knowledge Flow Synthesis and Optimization
|
|
- Alice Wright
- 5 years ago
- Views:
Transcription
1 A Soft-Computing Approach to Knowledge Flow Synthesis and Optimization Tomáš Řehořek Pavel Kordík Computational Intelligence Group (CIG), Faculty of Information Technology (FIT), Czech Technical University (CTU) in Prague September 5, 2012 T. Řehořek, P. Kordík (FIT CTU Prague) A Soft-Computing Approach to Knowledge Flow SynthesisSeptember and Optimization 5, / 15
2 Introduction: Knowledge Flows in Predictive Analysis Tasks in Predictive Analysis Common tasks in Predictive Data Analysis: 1 Data Preprocessing Feature Selection, Transformation of Space (e.g. PCA) Methods can be put into a Chain 2 Selection Selection of Algorithms k-nn, Naive Bayes, Decision Tree, Neural Networks,..., Ensemble Techniques (Majority Vote, Bagging, Stacking... ) Parameter Tuning Distance Measure and k in k-nn, Minimal Gain for Split in Decision Tree T. Řehořek, P. Kordík (FIT CTU Prague) A Soft-Computing Approach to Knowledge Flow SynthesisSeptember and Optimization 5, / 15
3 Introduction: Knowledge Flows in Predictive Analysis Dependence on Data The optimal choice of Preprocessing and ing methods is -dependent: Different Datasets require different preprocessing and are suitable for different modeling algorithms Traditional approach: Explore the in trial-and-error manner Even mining experts follow this scenario T. Řehořek, P. Kordík (FIT CTU Prague) A Soft-Computing Approach to Knowledge Flow SynthesisSeptember and Optimization 5, / 15
4 Introduction: Knowledge Flows in Predictive Analysis Knowledge Flows Modern approach to express the whole process: Knowledge Flows (KFs) Directed Graphs of interconnected, properly configured Actions Example: KFs in RapidMiner 5 learning environment: T. Řehořek, P. Kordík (FIT CTU Prague) A Soft-Computing Approach to Knowledge Flow SynthesisSeptember and Optimization 5, / 15
5 Optimization of Knowledge Flows Evolving Good Graphs Our Approach: KFs are viewed as Directed Acyclic Graphs (DAGs) with labeled nodes. Knowledge Discovery Workflow Process Labeled Directed Acyclic Graph R 1 d1 L m P d A d S d W R 2 d2 These graphs are subject to optimization by means of evolutionary computation. T. Řehořek, P. Kordík (FIT CTU Prague) A Soft-Computing Approach to Knowledge Flow SynthesisSeptember and Optimization 5, / 15
6 Problem Statement: Finding Optimal Predictive Analysis KF for a given Dataset Given a Dataset D of examples from R n L, find: 1 Sequence p 1,..., p k of properly configured preprocessing actions, 2 Properly configured learning algorithm (or possible hierarchical ensemble of such algorithms) m such that model obtained as m (p k (... p 1 (D)...)) has minimal generalization error. Example: Feature Selection PCA Majority Vote (Decision Tree, k-nn, Naive Bayes) Dataset p 1 p 2 m Dataset T. Řehořek, P. Kordík (FIT CTU Prague) A Soft-Computing Approach to Knowledge Flow SynthesisSeptember and Optimization 5, / 15
7 Methodology Embryonic STGP [Koza1997] We use Embryonic Strongly Typed Genetic Programming Proposed by J.R.Koza for evolving Analog Electrical Circuits [Koza1997] Encodes the graph in form of rooted tree The tree codes a plan for developing a complete graph from a simple graph referred to as the embryo Fitness function: classification accuracy 1 T (x,l) T { 0, m(x) l 1, m(x) = l T. Řehořek, P. Kordík (FIT CTU Prague) A Soft-Computing Approach to Knowledge Flow SynthesisSeptember and Optimization 5, / 15
8 Methodology Embryonic KF in RapidMiner 5 We use the RapidMiner 5 software to measure fitness Embryonic Knowledge Flow for our experiment: Preprocessing Cross-Validation Preprocessing Dataset Proprocessing Learning Classification Testing Classification Accuracy Process Body Fixed Variable T. Řehořek, P. Kordík (FIT CTU Prague) A Soft-Computing Approach to Knowledge Flow SynthesisSeptember and Optimization 5, / 15
9 Methodology STGP Grammar: Validaton Process Validation Process <ValidationProcess> ::= Process(<Preprocessing>,<Learner>) Preprocessing Learner Syntax Insert arbitrary number of preprocessing steps Substitute learner 20-fold Cross Validation Testing Semantics Dataset Training Learner Learning subprocess application Performace evaluation Classification accuracy Classification accuracy Testing subprocess T. Řehořek, P. Kordík (FIT CTU Prague) A Soft-Computing Approach to Knowledge Flow SynthesisSeptember and Optimization 5, / 15
10 application Performace evaluation Testing Classification accuracy application Performace evaluation Testing Classification accuracy application Performace evaluation Testing Classification accuracy Methodology STGP Grammar: Preprocessing, ing Learner <Learner> ::= <KNN> <NaiveBayes> <DecisionTree> <MajorityVote> <Bagging> K-NN Naive Bayes Decision Tree Majority Vote Bagging Syntax Preprocessing <Preprocessing> ::= <PCA> <FilterAttributes> PreprocessingTerminal K-NN Use k-nearest neighbor learner 20-fold Cross Validation Select PCA Projection Attributes <PCA> ::= PCA(<NaturalNumber>),<Preprocessing> PCA Projection Preprocessing Terminal Syntax Semantics Insert Principal Component Analysis (PCA) Dimensionality Reduction Natural Number Boolean constant Distance Measure Training Distance measure used? Weighted vote? K-NN Number of neighbors? Learning subprocess Testing subprocess Preprocessing sequence so far (potentially empty) PCA Naive Bayes Use Naive Bayes Learner 20-fold Cross Validation Natural Number Preprocessing <SelectAttributes> ::= SelectAttributes( <SetOfPositiveIntegers>, BooleanConstant), <Preprocessing>; Select Attributes Insert next preprocessing Set number of components here Insert Feature Selection Dimensionality Redution Preprocessing sequence so far (potentially empty) Select Attributes criterion Decision Tree Criterion Boolean constant Decision Tree pre-pruning Boolean constant Training Laplace Correlation? Use Decision Tree Learner Training Naive Bayes Learning subprocess 20-fold Cross Validation Naive Bayes Testing subprocess Set of Positive Integers Boolean constant Preprocessing Invert the selection? Insert next preprocessing here minimal size for split Natural Number pruning Boolean constant Apply parameters Learning subprocess Testing subprocess Set indices of attributes to be selected minimal leaf size Natural Number prepruning alternatives Real Number from [0, ] Preprocessing Terminal Do not insert any node Preprocessing sequence so far (potentially empty) minimal gain Real Number from [0, ] maximal depth Natural Size Bound confidence level Real Number from [1e-7, 0.5] Semantics T. Řehořek, P. Kordík (FIT CTU Prague) A Soft-Computing Approach to Knowledge Flow Synthesis September and Optimization 5, / 15
11 Preliminary Results Ecoli Dataset: Sample Tree Evolved Sample tree evolved on the Ecoli tset Attribute Selection (remove attribute #0) SA VP BG Naive Bayes Learner Bagging AS true PCA NB 0 3 EPS PCA Projection (3 components) Preprocessing Sequence Terminal T. Řehořek, P. Kordík (FIT CTU Prague) A Soft-Computing Approach to Knowledge Flow Synthesis September and Optimization 5, / 15
12 Preliminary Results Ecoli Dataset: Evolved KF in RapidMiner 5 T. Řehořek, P. Kordík (FIT CTU Prague) A Soft-Computing Approach to Knowledge Flow Synthesis September and Optimization 5, / 15
13 Preliminary Results Ecoli Dataset: Generation vs. Average Fitness of the Population 0.9 Ecoli Dataset: Generation vs. Average Fitness of the Population Average Fitness Generation Ecoli Dataset: Generation vs. Best Known Fitness T. Řehořek, P. Kordík (FIT CTU Prague) A Soft-Computing Approach to Knowledge Flow Synthesis September and Optimization 5, / 15
14 EPS VP NB true VM MSH NB true KNN 10 EuclideanDistance DTC 7 sqrt 8 abs NB NTB 2 * DT ^2 ^2 ^ * CNL * true DTC sqrt 9 10 abs DT NTB 5 CNL abs * 9 9 sqrt 2 KNN DTC CamberraDistance sqrt 9 8 abs NB NTB 5 DT CNL sqrt abs * KNN 8 8 CamberraDistance DTC * sqrt 5 5 NB 9 abs NB true NTB 3 DT CNL ^ DTC abs abs DT NTB 8 MST CNL abs 7 true Preliminary Results Very Complex Tree Evolved of the Vehicle Dataset A very complex tree evolved on the Vehicle set T. Řehořek, P. Kordík (FIT CTU Prague) A Soft-Computing Approach to Knowledge Flow Synthesis September and Optimization 5, / 15
15 Questions & Discussion Thank you for you attention! Tomáš Řehořek T. Řehořek, P. Kordík (FIT CTU Prague) A Soft-Computing Approach to Knowledge Flow Synthesis September and Optimization 5, / 15
Data mining: concepts and algorithms
Data mining: concepts and algorithms Practice Data mining Objective Exploit data mining algorithms to analyze a real dataset using the RapidMiner machine learning tool. The practice session is organized
More informationNaïve Bayes for text classification
Road Map Basic concepts Decision tree induction Evaluation of classifiers Rule induction Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Support
More informationEnsemble Learning. Another approach is to leverage the algorithms we have via ensemble methods
Ensemble Learning Ensemble Learning So far we have seen learning algorithms that take a training set and output a classifier What if we want more accuracy than current algorithms afford? Develop new learning
More informationOn Classification: An Empirical Study of Existing Algorithms Based on Two Kaggle Competitions
On Classification: An Empirical Study of Existing Algorithms Based on Two Kaggle Competitions CAMCOS Report Day December 9th, 2015 San Jose State University Project Theme: Classification The Kaggle Competition
More informationContents. Preface to the Second Edition
Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................
More informationMachine Learning Techniques for Data Mining
Machine Learning Techniques for Data Mining Eibe Frank University of Waikato New Zealand 10/25/2000 1 PART VII Moving on: Engineering the input and output 10/25/2000 2 Applying a learner is not all Already
More informationSlides for Data Mining by I. H. Witten and E. Frank
Slides for Data Mining by I. H. Witten and E. Frank 7 Engineering the input and output Attribute selection Scheme-independent, scheme-specific Attribute discretization Unsupervised, supervised, error-
More informationCse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University
Cse634 DATA MINING TEST REVIEW Professor Anita Wasilewska Computer Science Department Stony Brook University Preprocessing stage Preprocessing: includes all the operations that have to be performed before
More informationApplying Supervised Learning
Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains
More informationCAMCOS Report Day. December 9 th, 2015 San Jose State University Project Theme: Classification
CAMCOS Report Day December 9 th, 2015 San Jose State University Project Theme: Classification On Classification: An Empirical Study of Existing Algorithms based on two Kaggle Competitions Team 1 Team 2
More informationInternational Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X
Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,
More informationData Mining and Knowledge Discovery: Practice Notes
Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 2013/12/09 1 Practice plan 2013/11/11: Predictive data mining 1 Decision trees Evaluating classifiers 1: separate
More informationSubject. Dataset. Copy paste feature of the diagram. Importing the dataset. Copy paste feature into the diagram.
Subject Copy paste feature into the diagram. When we define the data analysis process into Tanagra, it is possible to copy components (or entire branches of components) towards another location into the
More informationCOSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor
COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality
More informationOverview. Non-Parametrics Models Definitions KNN. Ensemble Methods Definitions, Examples Random Forests. Clustering. k-means Clustering 2 / 8
Tutorial 3 1 / 8 Overview Non-Parametrics Models Definitions KNN Ensemble Methods Definitions, Examples Random Forests Clustering Definitions, Examples k-means Clustering 2 / 8 Non-Parametrics Models Definitions
More informationOrange Model Maps Documentation v0.2.8
Orange Model Maps Documentation v0.2.8 Release 0.2.8 Miha Stajdohar, University of Ljubljana, FRI January 30, 2014 Contents 1 Scripting Reference 3 1.1 Model Maps (modelmaps).......................................
More informationData Mining and Knowledge Discovery: Practice Notes
Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 2016/01/12 1 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization
More informationUsing Machine Learning to Identify Security Issues in Open-Source Libraries. Asankhaya Sharma Yaqin Zhou SourceClear
Using Machine Learning to Identify Security Issues in Open-Source Libraries Asankhaya Sharma Yaqin Zhou SourceClear Outline - Overview of problem space Unidentified security issues How Machine Learning
More informationName of the lecturer Doç. Dr. Selma Ayşe ÖZEL
Y.L. CENG-541 Information Retrieval Systems MASTER Doç. Dr. Selma Ayşe ÖZEL Information retrieval strategies: vector space model, probabilistic retrieval, language models, inference networks, extended
More informationData Mining and Knowledge Discovery: Practice Notes
Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 2016/01/12 1 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization
More informationCS6375: Machine Learning Gautam Kunapuli. Mid-Term Review
Gautam Kunapuli Machine Learning Data is identically and independently distributed Goal is to learn a function that maps to Data is generated using an unknown function Learn a hypothesis that minimizes
More informationData Mining and Knowledge Discovery Practice notes: Numeric Prediction, Association Rules
Keywords Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 06/0/ Data Attribute, example, attribute-value data, target variable, class, discretization Algorithms
More informationMidterm Examination CS540-2: Introduction to Artificial Intelligence
Midterm Examination CS540-2: Introduction to Artificial Intelligence March 15, 2018 LAST NAME: FIRST NAME: Problem Score Max Score 1 12 2 13 3 9 4 11 5 8 6 13 7 9 8 16 9 9 Total 100 Question 1. [12] Search
More informationData Mining: An experimental approach with WEKA on UCI Dataset
Data Mining: An experimental approach with WEKA on UCI Dataset Ajay Kumar Dept. of computer science Shivaji College University of Delhi, India Indranath Chatterjee Dept. of computer science Faculty of
More informationRegularization of Evolving Polynomial Models
Regularization of Evolving Polynomial Models Pavel Kordík Dept. of Computer Science and Engineering, Karlovo nám. 13, 121 35 Praha 2, Czech Republic kordikp@fel.cvut.cz Abstract. Black box models such
More informationSupervised and Unsupervised Learning (II)
Supervised and Unsupervised Learning (II) Yong Zheng Center for Web Intelligence DePaul University, Chicago IPD 346 - Data Science for Business Program DePaul University, Chicago, USA Intro: Supervised
More informationPredictive Analytics: Demystifying Current and Emerging Methodologies. Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA
Predictive Analytics: Demystifying Current and Emerging Methodologies Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA May 18, 2017 About the Presenters Tom Kolde, FCAS, MAAA Consulting Actuary Chicago,
More informationSupervised Learning Classification Algorithms Comparison
Supervised Learning Classification Algorithms Comparison Aditya Singh Rathore B.Tech, J.K. Lakshmipat University -------------------------------------------------------------***---------------------------------------------------------
More informationSUPERVISED LEARNING METHODS. Stanley Liang, PhD Candidate, Lassonde School of Engineering, York University Helix Science Engagement Programs 2018
SUPERVISED LEARNING METHODS Stanley Liang, PhD Candidate, Lassonde School of Engineering, York University Helix Science Engagement Programs 2018 2 CHOICE OF ML You cannot know which algorithm will work
More informationEvaluation of different biological data and computational classification methods for use in protein interaction prediction.
Evaluation of different biological data and computational classification methods for use in protein interaction prediction. Yanjun Qi, Ziv Bar-Joseph, Judith Klein-Seetharaman Protein 2006 Motivation Correctly
More informationData Mining Practical Machine Learning Tools and Techniques
Engineering the input and output Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 7 of Data Mining by I. H. Witten and E. Frank Attribute selection z Scheme-independent, scheme-specific
More informationModel Selection Introduction to Machine Learning. Matt Gormley Lecture 4 January 29, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Model Selection Matt Gormley Lecture 4 January 29, 2018 1 Q&A Q: How do we deal
More informationInformation theory methods for feature selection
Information theory methods for feature selection Zuzana Reitermanová Department of Computer Science Faculty of Mathematics and Physics Charles University in Prague, Czech Republic Diplomový a doktorandský
More informationMachine Learning in Action
Machine Learning in Action PETER HARRINGTON Ill MANNING Shelter Island brief contents PART l (~tj\ssification...,... 1 1 Machine learning basics 3 2 Classifying with k-nearest Neighbors 18 3 Splitting
More informationTour-Based Mode Choice Modeling: Using An Ensemble of (Un-) Conditional Data-Mining Classifiers
Tour-Based Mode Choice Modeling: Using An Ensemble of (Un-) Conditional Data-Mining Classifiers James P. Biagioni Piotr M. Szczurek Peter C. Nelson, Ph.D. Abolfazl Mohammadian, Ph.D. Agenda Background
More informationCSE 158. Web Mining and Recommender Systems. Midterm recap
CSE 158 Web Mining and Recommender Systems Midterm recap Midterm on Wednesday! 5:10 pm 6:10 pm Closed book but I ll provide a similar level of basic info as in the last page of previous midterms CSE 158
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:
More informationPractical Guidance for Machine Learning Applications
Practical Guidance for Machine Learning Applications Brett Wujek About the authors Material from SGF Paper SAS2360-2016 Brett Wujek Senior Data Scientist, Advanced Analytics R&D ~20 years developing engineering
More informationTable Of Contents: xix Foreword to Second Edition
Data Mining : Concepts and Techniques Table Of Contents: Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments xxxi About the Authors xxxv Chapter 1 Introduction 1 (38) 1.1 Why Data
More informationMulti-label classification using rule-based classifier systems
Multi-label classification using rule-based classifier systems Shabnam Nazmi (PhD candidate) Department of electrical and computer engineering North Carolina A&T state university Advisor: Dr. A. Homaifar
More informationData Mining and Knowledge Discovery Practice notes: Numeric Prediction, Association Rules
Keywords Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si Data Attribute, example, attribute-value data, target variable, class, discretization Algorithms
More information10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors
Dejan Sarka Anomaly Detection Sponsors About me SQL Server MVP (17 years) and MCT (20 years) 25 years working with SQL Server Authoring 16 th book Authoring many courses, articles Agenda Introduction Simple
More informationA study of classification algorithms using Rapidminer
Volume 119 No. 12 2018, 15977-15988 ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu A study of classification algorithms using Rapidminer Dr.J.Arunadevi 1, S.Ramya 2, M.Ramesh Raja
More informationContents. Foreword to Second Edition. Acknowledgments About the Authors
Contents Foreword xix Foreword to Second Edition xxi Preface xxiii Acknowledgments About the Authors xxxi xxxv Chapter 1 Introduction 1 1.1 Why Data Mining? 1 1.1.1 Moving toward the Information Age 1
More informationTrade-offs in Explanatory
1 Trade-offs in Explanatory 21 st of February 2012 Model Learning Data Analysis Project Madalina Fiterau DAP Committee Artur Dubrawski Jeff Schneider Geoff Gordon 2 Outline Motivation: need for interpretable
More informationEffect of Principle Component Analysis and Support Vector Machine in Software Fault Prediction
International Journal of Computer Trends and Technology (IJCTT) volume 7 number 3 Jan 2014 Effect of Principle Component Analysis and Support Vector Machine in Software Fault Prediction A. Shanthini 1,
More informationInstance and case-based reasoning
Instance and case-based reasoning ML for NLP Lecturer: Kevin Koidl Assist. Lecturer Alfredo Maldonado https://www.scss.tcd.ie/kevin.koidl/cs462/ kevin.koidl@scss.tcd.ie, maldonaa@tcd.ie 27 Instance-based
More informationMachine Learning with MATLAB --classification
Machine Learning with MATLAB --classification Stanley Liang, PhD York University Classification the definition In machine learning and statistics, classification is the problem of identifying to which
More informationClassification. Slide sources:
Classification Slide sources: Gideon Dror, Academic College of TA Yaffo Nathan Ifill, Leicester MA4102 Data Mining and Neural Networks Andrew Moore, CMU : http://www.cs.cmu.edu/~awm/tutorials 1 Outline
More informationLecture #17: Autoencoders and Random Forests with R. Mat Kallada Introduction to Data Mining with R
Lecture #17: Autoencoders and Random Forests with R Mat Kallada Introduction to Data Mining with R Assignment 4 Posted last Sunday Due next Monday! Autoencoders in R Firstly, what is an autoencoder? Autoencoders
More informationFeature Selection with Decision Tree Criterion
Feature Selection with Decision Tree Criterion Krzysztof Grąbczewski and Norbert Jankowski Department of Computer Methods Nicolaus Copernicus University Toruń, Poland kgrabcze,norbert@phys.uni.torun.pl
More information7. Metalearning for Automated Workflow Design
AutoML 2017. at ECML PKDD 2017, Skopje. Automatic Selection, Configuration & Composition of ML Algorithms 7. Metalearning for Automated Workflow Design by. Pavel Brazdil, Frank Hutter, Holger Hoos, Joaquin
More informationTopics In Feature Selection
Topics In Feature Selection CSI 5388 Theme Presentation Joe Burpee 2005/2/16 Feature Selection (FS) aka Attribute Selection Witten and Frank book Section 7.1 Liu site http://athena.csee.umbc.edu/idm02/
More informationVISUALIZATION TECHNIQUES UTILIZING THE SENSITIVITY ANALYSIS OF MODELS
VISUALIZATION TECHNIQUES UTILIZING THE SENSITIVITY ANALYSIS OF MODELS Ivo Kondapaneni, Pavel Kordík, Pavel Slavík Department of Computer Science and Engineering, Faculty of Eletrical Engineering, Czech
More information7 Techniques for Data Dimensionality Reduction
7 Techniques for Data Dimensionality Reduction Rosaria Silipo KNIME.com The 2009 KDD Challenge Prediction Targets: Churn (contract renewals), Appetency (likelihood to buy specific product), Upselling (likelihood
More informationA Comparative Study of Classification Techniques for Fire Data Set
A Comparative Study of Classification Techniques for Fire Data Set Rachna Raghuwanshi M.Tech CSE Gyan Ganga Institute of Technology & Science, Jabalpur Abstract:Classification of data has become an important
More informationk-nearest Neighbors + Model Selection
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University k-nearest Neighbors + Model Selection Matt Gormley Lecture 5 Jan. 30, 2019 1 Reminders
More informationPreprocessing of Stream Data using Attribute Selection based on Survival of the Fittest
Preprocessing of Stream Data using Attribute Selection based on Survival of the Fittest Bhakti V. Gavali 1, Prof. Vivekanand Reddy 2 1 Department of Computer Science and Engineering, Visvesvaraya Technological
More informationPerformance Analysis of Data Mining Classification Techniques
Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal
More informationEnhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques
24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE
More informationContents. ACE Presentation. Comparison with existing frameworks. Technical aspects. ACE 2.0 and future work. 24 October 2009 ACE 2
ACE Contents ACE Presentation Comparison with existing frameworks Technical aspects ACE 2.0 and future work 24 October 2009 ACE 2 ACE Presentation 24 October 2009 ACE 3 ACE Presentation Framework for using
More informationEnsemble Methods, Decision Trees
CS 1675: Intro to Machine Learning Ensemble Methods, Decision Trees Prof. Adriana Kovashka University of Pittsburgh November 13, 2018 Plan for This Lecture Ensemble methods: introduction Boosting Algorithm
More informationIntroduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.
Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. 6 What is Web Mining? p. 6 Summary of Chapters p. 8 How
More informationCHAPTER 6 EXPERIMENTS
CHAPTER 6 EXPERIMENTS 6.1 HYPOTHESIS On the basis of the trend as depicted by the data Mining Technique, it is possible to draw conclusions about the Business organization and commercial Software industry.
More informationThe exam is closed book, closed notes except your one-page (two-sided) cheat sheet.
CS 189 Spring 2015 Introduction to Machine Learning Final You have 2 hours 50 minutes for the exam. The exam is closed book, closed notes except your one-page (two-sided) cheat sheet. No calculators or
More informationCS570: Introduction to Data Mining
CS570: Introduction to Data Mining Classification Advanced Reading: Chapter 8 & 9 Han, Chapters 4 & 5 Tan Anca Doloc-Mihu, Ph.D. Slides courtesy of Li Xiong, Ph.D., 2011 Han, Kamber & Pei. Data Mining.
More informationK Nearest Neighbor Wrap Up K- Means Clustering. Slides adapted from Prof. Carpuat
K Nearest Neighbor Wrap Up K- Means Clustering Slides adapted from Prof. Carpuat K Nearest Neighbor classification Classification is based on Test instance with Training Data K: number of neighbors that
More informationFeature Selection Technique to Improve Performance Prediction in a Wafer Fabrication Process
Feature Selection Technique to Improve Performance Prediction in a Wafer Fabrication Process KITTISAK KERDPRASOP and NITTAYA KERDPRASOP Data Engineering Research Unit, School of Computer Engineering, Suranaree
More information6.034 Quiz 2, Spring 2005
6.034 Quiz 2, Spring 2005 Open Book, Open Notes Name: Problem 1 (13 pts) 2 (8 pts) 3 (7 pts) 4 (9 pts) 5 (8 pts) 6 (16 pts) 7 (15 pts) 8 (12 pts) 9 (12 pts) Total (100 pts) Score 1 1 Decision Trees (13
More informationData Mining and Knowledge Discovery Practice notes 2
Keywords Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si Data Attribute, example, attribute-value data, target variable, class, discretization Algorithms
More informationKeyword Extraction by KNN considering Similarity among Features
64 Int'l Conf. on Advances in Big Data Analytics ABDA'15 Keyword Extraction by KNN considering Similarity among Features Taeho Jo Department of Computer and Information Engineering, Inha University, Incheon,
More informationPROBLEM 4
PROBLEM 2 PROBLEM 4 PROBLEM 5 PROBLEM 6 PROBLEM 7 PROBLEM 8 PROBLEM 9 PROBLEM 10 PROBLEM 11 PROBLEM 12 PROBLEM 13 PROBLEM 14 PROBLEM 16 PROBLEM 17 PROBLEM 22 PROBLEM 23 PROBLEM 24 PROBLEM 25
More informationLaplacian Eigenmaps and Bayesian Clustering Based Layout Pattern Sampling and Its Applications to Hotspot Detection and OPC
Laplacian Eigenmaps and Bayesian Clustering Based Layout Pattern Sampling and Its Applications to Hotspot Detection and OPC Tetsuaki Matsunawa 1, Bei Yu 2 and David Z. Pan 3 1 Toshiba Corporation 2 The
More informationBIG DATA SCIENTIST Certification. Big Data Scientist
BIG DATA SCIENTIST Certification Big Data Scientist Big Data Science Professional (BDSCP) certifications are formal accreditations that prove proficiency in specific areas of Big Data. To obtain a certification,
More informationSupervised vs unsupervised clustering
Classification Supervised vs unsupervised clustering Cluster analysis: Classes are not known a- priori. Classification: Classes are defined a-priori Sometimes called supervised clustering Extract useful
More informationUninformed Search Methods. Informed Search Methods. Midterm Exam 3/13/18. Thursday, March 15, 7:30 9:30 p.m. room 125 Ag Hall
Midterm Exam Thursday, March 15, 7:30 9:30 p.m. room 125 Ag Hall Covers topics through Decision Trees and Random Forests (does not include constraint satisfaction) Closed book 8.5 x 11 sheet with notes
More informationApplied Statistics for Neuroscientists Part IIa: Machine Learning
Applied Statistics for Neuroscientists Part IIa: Machine Learning Dr. Seyed-Ahmad Ahmadi 04.04.2017 16.11.2017 Outline Machine Learning Difference between statistics and machine learning Modeling the problem
More informationData Engineering. Data preprocessing and transformation
Data Engineering Data preprocessing and transformation Just apply a learner? NO! Algorithms are biased No free lunch theorem: considering all possible data distributions, no algorithm is better than another
More informationFunction Approximation and Feature Selection Tool
Function Approximation and Feature Selection Tool Version: 1.0 The current version provides facility for adaptive feature selection and prediction using flexible neural tree. Developers: Varun Kumar Ojha
More informationData Mining and Knowledge Discovery: Practice Notes
Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 8.11.2017 1 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization
More informationLecture 27: Review. Reading: All chapters in ISLR. STATS 202: Data mining and analysis. December 6, 2017
Lecture 27: Review Reading: All chapters in ISLR. STATS 202: Data mining and analysis December 6, 2017 1 / 16 Final exam: Announcements Tuesday, December 12, 8:30-11:30 am, in the following rooms: Last
More informationIntroduction to Data Science. Introduction to Data Science with Python. Python Basics: Basic Syntax, Data Structures. Python Concepts (Core)
Introduction to Data Science What is Analytics and Data Science? Overview of Data Science and Analytics Why Analytics is is becoming popular now? Application of Analytics in business Analytics Vs Data
More informationCSE4334/5334 DATA MINING
CSE4334/5334 DATA MINING Lecture 4: Classification (1) CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai Li (Slides courtesy
More informationPerceptron-Based Oblique Tree (P-BOT)
Perceptron-Based Oblique Tree (P-BOT) Ben Axelrod Stephen Campos John Envarli G.I.T. G.I.T. G.I.T. baxelrod@cc.gatech sjcampos@cc.gatech envarli@cc.gatech Abstract Decision trees are simple and fast data
More informationBasic Data Mining Technique
Basic Data Mining Technique What is classification? What is prediction? Supervised and Unsupervised Learning Decision trees Association rule K-nearest neighbor classifier Case-based reasoning Genetic algorithm
More informationLecture 7. Data Stream Mining. Building decision trees
1 / 26 Lecture 7. Data Stream Mining. Building decision trees Ricard Gavaldà MIRI Seminar on Data Streams, Spring 2015 Contents 2 / 26 1 Data Stream Mining 2 Decision Tree Learning Data Stream Mining 3
More informationRAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE
RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE Luigi Grimaudo (luigi.grimaudo@polito.it) DataBase And Data Mining Research Group (DBDMG) Summary RapidMiner project Strengths
More informationCS 395T Computational Learning Theory. Scribe: Wei Tang
CS 395T Computational Learning Theory Lecture 1: September 5th, 2007 Lecturer: Adam Klivans Scribe: Wei Tang 1.1 Introduction Many tasks from real application domain can be described as a process of learning.
More informationSummary. RapidMiner Project 12/13/2011 RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE
RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE Luigi Grimaudo (luigi.grimaudo@polito.it) DataBase And Data Mining Research Group (DBDMG) Summary RapidMiner project Strengths
More informationAn Ensemble Data Mining Approach for Intrusion Detection in a Computer Network
International Journal of Science and Engineering Investigations vol. 6, issue 62, March 2017 ISSN: 2251-8843 An Ensemble Data Mining Approach for Intrusion Detection in a Computer Network Abisola Ayomide
More informationData Mining and Machine Learning: Techniques and Algorithms
Instance based classification Data Mining and Machine Learning: Techniques and Algorithms Eneldo Loza Mencía eneldo@ke.tu-darmstadt.de Knowledge Engineering Group, TU Darmstadt International Week 2019,
More informationA Novel Feature Selection Framework for Automatic Web Page Classification
International Journal of Automation and Computing 9(4), August 2012, 442-448 DOI: 10.1007/s11633-012-0665-x A Novel Feature Selection Framework for Automatic Web Page Classification J. Alamelu Mangai 1
More information6.034 Design Assignment 2
6.034 Design Assignment 2 April 5, 2005 Weka Script Due: Friday April 8, in recitation Paper Due: Wednesday April 13, in class Oral reports: Friday April 15, by appointment The goal of this assignment
More informationData Science Bootcamp Curriculum. NYC Data Science Academy
Data Science Bootcamp Curriculum NYC Data Science Academy 100+ hours free, self-paced online course. Access to part-time in-person courses hosted at NYC campus Machine Learning with R and Python Foundations
More informationCse352 Artifficial Intelligence Short Review for Midterm. Professor Anita Wasilewska Computer Science Department Stony Brook University
Cse352 Artifficial Intelligence Short Review for Midterm Professor Anita Wasilewska Computer Science Department Stony Brook University Midterm Midterm INCLUDES CLASSIFICATION CLASSIFOCATION by Decision
More informationPart I: Data Mining Foundations
Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web and the Internet 2 1.3. Web Data Mining 4 1.3.1. What is Data Mining? 6 1.3.2. What is Web Mining?
More informationIEE 520 Data Mining. Project Report. Shilpa Madhavan Shinde
IEE 520 Data Mining Project Report Shilpa Madhavan Shinde Contents I. Dataset Description... 3 II. Data Classification... 3 III. Class Imbalance... 5 IV. Classification after Sampling... 5 V. Final Model...
More informationMachine Learning. Supervised Learning. Manfred Huber
Machine Learning Supervised Learning Manfred Huber 2015 1 Supervised Learning Supervised learning is learning where the training data contains the target output of the learning system. Training data D
More informationTutorial on Machine Learning Tools
Tutorial on Machine Learning Tools Yanbing Xue Milos Hauskrecht Why do we need these tools? Widely deployed classical models No need to code from scratch Easy-to-use GUI Outline Matlab Apps Weka 3 UI TensorFlow
More informationThe exam is closed book, closed notes except your one-page cheat sheet.
CS 189 Fall 2015 Introduction to Machine Learning Final Please do not turn over the page before you are instructed to do so. You have 2 hours and 50 minutes. Please write your initials on the top-right
More information