Orange Documentation. Release 3.0. Biolab
November 10, 2014
Contents

1 Data model (data)
    1.1 Data Storage (storage)
    1.2 Data Table (table)
    1.3 SQL table (data.sql)
    1.4 Domain description (domain)
    1.5 Variable Descriptors (variable)
    1.6 Values (value)
    1.7 Data Instance (instance)
    1.8 Data Filters (filter)
    1.9 Continuization and normalization (continuizer)
    1.10 Data discretization (discretization)
2 Feature (feature)
    2.1 Scoring (scoring)
    2.2 Feature discretization (discretization)
3 Classification (classification)
    3.1 Majority (majority)
    3.2 Naive Bayes (naive_bayes)
    3.3 Support Vector Machines (svm)
4 Evaluation (evaluation)
    4.1 Sampling procedures for testing models (testing)
5 Miscellaneous (misc)
    5.1 Symbolic constants (enum)
6 Development
    6.1 Widget development
7 Indices and tables
Bibliography
CHAPTER 1

Data model (data)

Orange stores data in Orange.data.Storage classes. The most commonly used storage is Orange.data.Table, which stores all data in two-dimensional numpy arrays. Each row of the data represents a data instance. Individual data instances are represented as instances of Orange.data.Instance. Different storage classes may derive subclasses of Instance to represent the retrieved rows more efficiently and to allow modifying the data through the data instance. For example, if table is an Orange.data.Table, table[0] returns the row as an Orange.data.RowInstance.

Every storage class and data instance has an associated domain description (an instance of Orange.data.Domain) that stores descriptions of the data columns. Every column is described by an instance of a class derived from Orange.data.Variable. The subclasses correspond to continuous variables (Orange.data.ContinuousVariable), discrete variables (Orange.data.DiscreteVariable) and string variables (Orange.data.StringVariable). These descriptors contain the variable's name, symbolic values, the number of decimals shown in printouts and similar properties.

The data is divided into attributes (features, independent variables), class variables (classes, targets, outcomes, dependent variables) and meta attributes. This division applies to domain descriptions, to data storages, which contain separate arrays for each of the three parts of the data, and to data instances. Attributes and classes are represented with numeric values and are used in modelling. Meta attributes contain additional data which may be of any type. (Currently, only string values are supported in addition to numeric ones.)

In indexing, columns can be referred to by their names, descriptors or an integer index.
For example, if inst is a data instance and var is a descriptor of a continuous variable referring to the first column in the data, which is also named "petal length", then inst[var], inst[0] and inst["petal length"] all refer to the first value of the instance. Negative indices are used for meta attributes, starting with -1.

Continuous and discrete values can be represented by any numerical type; by default, Orange uses double precision (64-bit) floats. Discrete values are represented by whole numbers.

1.1 Data Storage (storage)

Orange.data.storage.Storage is an abstract class representing a data object in which rows represent data instances (examples, in machine learning terminology) and columns represent variables (features, attributes, classes, targets, meta attributes). The data is divided into three parts that represent independent variables (X), dependent variables (Y) and meta data (metas). If practical, the class should expose those parts as properties. In the associated domain (Orange.data.Domain), the three parts correspond to the lists of variable descriptors attributes, class_vars and metas.
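As a minimal, self-contained sketch of the storage layout and indexing conventions described above (all names are illustrative; this is not the actual Orange implementation):

```python
# Illustrative sketch of a three-part storage with Orange-like indexing.
# Not the actual Orange implementation; all names are illustrative.
class MiniTable:
    def __init__(self, names, X, Y):
        self.names = list(names)            # column names: attributes, then class
        self.X = [list(row) for row in X]   # attributes (independent variables)
        self.Y = list(Y)                    # class values (dependent variable)

    def _col_index(self, col):
        # Columns may be referred to by name or by integer index.
        return self.names.index(col) if isinstance(col, str) else col

    def __getitem__(self, index):
        if isinstance(index, tuple):        # table[row, column] -> single value
            row, col = index
            col = self._col_index(col)
            n_attrs = len(self.X[row])
            return self.X[row][col] if col < n_attrs else self.Y[row]
        if isinstance(index, slice):        # table[a:b] -> new storage
            return MiniTable(self.names, self.X[index], self.Y[index])
        return self.X[index] + [self.Y[index]]   # one full row

    def __len__(self):
        return len(self.X)

table = MiniTable(["petal length", "petal width", "iris"],
                  X=[[1.4, 0.2], [4.7, 1.4]], Y=["setosa", "versicolor"])
print(table[0])                  # full first row
print(table[0, "petal length"])  # single value, column given by name
print(len(table[0:1]))           # slicing returns a new storage
```

The same row can thus be addressed through an integer index, a column name, or a slice, mirroring the conventions that the real Table and Instance classes support.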
Any of those parts may be missing, dense, sparse or sparse boolean. The difference between the latter two is that sparse data can be seen as a list of pairs (variable, value), while in sparse boolean data the variable (item) is simply present or absent, as in market basket analysis. The actual storage of sparse data depends upon the storage type.

There is no uniform constructor signature: every derived class provides one or more specific constructors. There are currently two derived classes, Orange.data.Table and Orange.data.sql.Table; the former stores the data in-memory, in numpy objects, and the latter in SQL (currently, only PostgreSQL is supported).

Derived classes must implement at least the methods for getting rows and the number of instances (__getitem__ and __len__). To make the storage fast enough to be practically useful, it must also reimplement a number of filters, preprocessors and aggregators. For instance, the method _filter_values(self, filter) returns a new storage which contains only the rows that match the criteria given in the filter. Orange.data.Table implements an efficient method based on numpy indexing, while Orange.data.sql.Table, which stores a table as an SQL query, converts the filter into a WHERE clause.

Orange.data.storage.Storage.domain (Orange.data.Domain)
    The domain describing the columns of the data.

Data access

__getitem__(self, index)
    Return one or more rows of the data.
    If the index is an int, e.g. data[7], the corresponding row is returned as an instance of Instance. Concrete implementations of Storage use specific derived classes for instances.
    If the index is a slice or a sequence of ints (e.g. data[7:10] or data[[7, 42, 15]]), indexing returns a new storage with the selected rows.
    If there are two indices, where the first is an int (a row number) and the second can be interpreted as a column, e.g. data[3, 5] or data[3, "gender"] or data[3, y] (where y is an instance of Variable), a single value is returned as an instance of Value.
In all other cases, the first index should be a row index, a slice or a sequence, and the second index, which represents a set of columns, should be an int, a slice, a sequence or a numpy array. The result is a new storage with a new domain.

__len__(self)
    Return the number of data instances (rows).

Inspection

Storage.X_density, Storage.Y_density, Storage.metas_density
    Indicate whether the attributes, classes and meta attributes are dense (Storage.DENSE) or sparse (Storage.SPARSE). If they are sparse and all values are 0 or 1, they are marked as sparse boolean (Storage.SPARSE_BOOL). The Storage class provides a default of DENSE. If the data has no attributes, classes or meta attributes, the corresponding method should re...

Filters

Storage classes should define the following methods to optimize the filtering operations as allowed by the underlying data structure. Orange.data.Table executes them directly through numpy (or bottleneck and related) methods, while Orange.data.sql.Table appends them to the WHERE clause of the query that defines the data.
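For illustration, a fallback implementation of a filter such as _filter_is_defined could be sketched as follows (a conceptual stand-in operating on plain lists, not Orange's actual code):

```python
import math

# Conceptual sketch of a "keep rows without undefined values" filter.
# Undefined values are represented as NaN, as in Orange's numeric storage.
def filter_is_defined(rows, columns=None, negate=False):
    """Return the rows that have no NaN in the checked columns."""
    def defined(row):
        checked = row if columns is None else [row[c] for c in columns]
        return not any(isinstance(v, float) and math.isnan(v) for v in checked)
    return [row for row in rows if defined(row) != negate]

nan = float("nan")
data = [[5.1, 3.5], [nan, 3.0], [4.7, nan]]
print(filter_is_defined(data))               # only the fully defined row
print(filter_is_defined(data, columns=[0]))  # rows defined in column 0
print(filter_is_defined(data, negate=True))  # rows with some undefined value
```

A storage class would implement the same logic over its own representation, e.g. with vectorized numpy operations or an SQL WHERE clause.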
These methods should not be called directly but through the classes defined in Orange.data.filter. The module Orange.data.filter also provides slower fallback functions for the filters not defined in the storage.

Orange.data.storage._filter_is_defined(self, columns=None, negate=False)
    Extract rows without undefined values.
    Parameters:
        columns (sequence of ints, variable names or descriptors): optional list of columns that are checked for unknowns
        negate (bool): invert the selection
    Returns: a new storage of the same type, or a Table
    Return type: Orange.data.storage.Storage

Orange.data.storage._filter_has_class(self, negate=False)
    Return rows with a known value of the target attribute. If there are multiple classes, all must be defined.
    Parameters:
        negate (bool): invert the selection
    Returns: a new storage of the same type, or a Table
    Return type: Orange.data.storage.Storage

Orange.data.storage._filter_same_value(self, column, value, negate=False)
    Select rows based on a value of the given variable.
    Parameters:
        column (int, str or Orange.data.Variable): the column that is checked
        value (int, float or str): the value of the variable
        negate (bool): invert the selection
    Returns: a new storage of the same type, or a Table
    Return type: Orange.data.storage.Storage

Orange.data.storage._filter_values(self, filter)
    Apply the given filter to the data.
    Parameters:
        filter (Orange.data.Filter): a filter for selecting the rows
    Returns: a new storage of the same type, or a Table
    Return type: Orange.data.storage.Storage

Aggregators

Similarly to filters, storage classes should provide several methods for fast computation of statistics. These methods are not called directly but through modules within Orange.statistics.

Orange.data.storage._compute_basic_stats(self, columns=None, include_metas=False, compute_variance=False)
    Compute basic statistics for the specified variables: the minimal and maximal value, the mean and the variance (or a zero placeholder), and the number of missing and defined values.
    Parameters:
        columns (list of ints, variable names or descriptors of type Orange.data.Variable): a list of columns for which the statistics are computed; if None, the function computes the statistics for all variables
        include_metas (bool): a flag which tells whether to include meta attributes (applicable only if columns is None)
        compute_variance (bool): a flag which tells whether to compute the variance
    Returns: a list with a tuple (min, max, mean, variance, #nans, #non-nans) for each variable
    Return type: list

Orange.data.storage._compute_distributions(self, columns=None)
    Compute the distribution for the specified variables. The result is a list of pairs containing the distribution and the number of rows for which the variable value was missing.
    For discrete variables, the distribution is represented as a vector with the absolute frequency of each value. For continuous variables, the result is a 2-d array of shape (2, number-of-distinct-values); the first row contains the (distinct) values of the variable and the second their absolute frequencies.
    Parameters:
        columns (list of ints, variable names or descriptors of type Orange.data.Variable): a list of columns for which the distributions are computed; if None, the function runs over all variables
    Returns: a list of distributions
    Return type: list of numpy arrays

1.2 Data Table (table)

Constructors

The preferred way to construct a table is to invoke a named constructor.

Inspection

Row manipulation

Weights

1.3 SQL table (data.sql)

1.4 Domain description (domain)

1.5 Variable Descriptors (variable)

Every variable is associated with a descriptor that stores its name, type and other properties, and takes care of conversion of values from textual format to floats and back. Derived classes store lists of existing descriptors for variables of each particular type. They provide a supplementary constructor, make, which returns an existing variable instead of creating a new one, when possible. Descriptors facilitate computation of variables from other variables.
This is used in domain conversion: if the destination domain contains a variable that is not present in the original domain, the variable's value is computed by calling its method compute_value, passing the data instance from the original domain as an argument.
Method compute_value calls get_value_from if it is defined and returns its result. If the variable does not have get_value_from, compute_value returns Unknown.

1.6 Values (value)

1.7 Data Instance (instance)

Class Instance represents a data instance. The base class stores its own data, while derived classes are used to represent views into various data storages. Like data tables, every data instance is associated with a domain, and its data is split into attributes, classes, meta attributes and the weight. The instance's data can be retrieved through the attributes x, y and metas. For derived classes, changing this data does not necessarily modify the corresponding data tables or other structures; use indexing to modify the data.

Rows of Data Tables

1.8 Data Filters (filter)

Instances of classes derived from Filter are used for filtering the data. When called with an individual data instance (Orange.data.Instance), they accept or reject the instance by returning either True or False. When called with a data storage (e.g. an instance of Orange.data.Table), they check whether the corresponding class provides the method that implements the particular filter. If so, the method is called and the result is of the same type as the storage; e.g., filter methods of Orange.data.Table return new instances of Orange.data.Table, and filter methods of SQL proxies return new SQL proxies. If the class corresponding to the storage does not implement a particular filter, the fallback computes the indices of the rows to be selected and returns data[indices].

1.9 Continuization and normalization (continuizer)

class Orange.data.continuizer.DomainContinuizer

Construct a domain in which discrete attributes are replaced by continuous ones; existing continuous attributes can be normalized. The attributes are treated according to their types:

- binary variables are transformed into 0.0/1.0 or -1.0/1.0 indicator variables;
- multinomial variables are treated according to the flag multinomial_treatment;
- discrete attributes with fewer than two possible values are removed;
- continuous variables can be normalized or left unchanged.

The typical use of the class is as follows:
    continuizer = Orange.data.continuizer.DomainContinuizer()
    continuizer.multinomial_treatment = continuizer.LowestIsBase
    domain0 = continuizer(data)
    data0 = Orange.data.Table(domain0, data)

Domain continuizers can be given either a data set or a domain, and return a new domain. When given only the domain, they cannot normalize continuous attributes or use the most frequent value as the base value, and will raise an exception under such settings. The constructor can also be passed data or a domain, in which case the constructed continuizer is immediately applied to the data or domain and the transformed domain is returned instead of the continuizer instance.

By default, the class does not change continuous and class attributes, while discrete attributes are replaced with N indicator attributes (NValues) with values 0 and 1.

zero_based
    Determines the value used as the low value of the variable. When binary variables are transformed into continuous ones, or when a multivalued variable is transformed into multiple variables, the transformed variable can either have values 0.0 and 1.0 (default, zero_based is True) or -1.0 and 1.0 (zero_based is False). This attribute also determines the interval for normalized continuous variables (either [-1, 1] or [0, 1]). The following text assumes the default case.

multinomial_treatment
    Decides the treatment of multinomial variables. Let N be the number of the variable's values. (Default: NValues)

    DomainContinuizer.NValues
        The variable is replaced by N indicator variables, each corresponding to one value of the original variable. For each value of the original attribute, only the corresponding new attribute has a value of one, and the others are zero. Note that these variables are not independent, so they cannot be used (directly) in, for instance, linear or logistic regression.
        For example, the data set bridges has the feature RIVER with values M, A, O and Y, in that order. Its value for the 15th row is M.
        Continuization replaces the variable with variables RIVER=M, RIVER=A, RIVER=O and RIVER=Y. For the 15th row, the first has a value of 1 and the others are 0.

    DomainContinuizer.LowestIsBase
        Similar to the above, except that it creates only N-1 variables. The missing indicator belongs to the lowest value: when the original variable has the lowest value, all indicators are 0. If the variable descriptor has base_value defined, the specified value is used as the base instead of the lowest one.
        Continuizing the variable RIVER gives similar results as above, except that it omits RIVER=M; all three variables are zero for the 15th data instance.

    DomainContinuizer.FrequentIsBase
        Like above, except that the most frequent value is used as the base. If there are multiple most frequent values, the one with the lowest index is used. The frequency of values is extracted from the data, so this option cannot be used if the constructor is given only a domain.
        The variable RIVER would be continuized similarly to the above, except that RIVER=A, the most frequent value, is omitted.

    DomainContinuizer.Ignore
        Discrete variables are omitted.
    DomainContinuizer.IgnoreMulti
        Discrete variables with more than two values are omitted; two-valued variables are treated the same as in LowestIsBase.

    DomainContinuizer.ReportError
        Raise an error if there are any multinomial variables in the data.

    DomainContinuizer.AsOrdinal
        Multivalued variables are treated as ordinal and replaced by a continuous variable with the value's index, e.g. 0, 1, 2, 3...

    DomainContinuizer.AsNormalizedOrdinal
        As above, except that the resulting continuous value lies in the range from 0 to 1, e.g. 0, 0.25, 0.5, 0.75, 1 for a five-valued variable.

normalize_continuous
    If None, continuous variables are left unchanged. If DomainContinuizer.NormalizeBySD, they are replaced with standardized values: the average value is subtracted and the result divided by the standard deviation. The attribute zero_based has no effect on this standardization. If DomainContinuizer.NormalizeBySpan, they are replaced with normalized values: the minimum value in the data is subtracted and the result divided by the span (max - min). Statistics are computed from the data, so the constructor must be given data, not just a domain. (Default: None)

transform_class
    If True, the class is replaced by continuous attributes or normalized as well. Multiclass problems are thus transformed into multitarget ones. (Default: False)

1.10 Data discretization (discretization)

Continuous features in the data can be discretized using a uniform discretization method.
Discretization replaces continuous features with the corresponding categorical features:

    import Orange
    iris = Orange.data.Table("iris.tab")
    disc_iris = Orange.data.discretization.DiscretizeTable(
        iris, method=Orange.feature.discretization.EqualFreq(n=3))
    print("Original data set:")
    for e in iris[:3]:
        print(e)
    print("Discretized data set:")
    for e in disc_iris[:3]:
        print(e)

Discretization introduces new categorical features with discretized values:

    Original data set:
    [5.1, 3.5, 1.4, 0.2 | Iris-setosa]
    [4.9, 3.0, 1.4, 0.2 | Iris-setosa]
    [4.7, 3.2, 1.3, 0.2 | Iris-setosa]
    Discretized data set:
    [< , >= , < , <  | Iris-setosa]
    [< , [ , ), < , <  | Iris-setosa]
    [< , >= , < , <  | Iris-setosa]

Data discretization uses feature discretization classes from Orange.feature.discretization and applies them to the entire data set. The default discretization method (equal frequency with four intervals) can be replaced with other discretization approaches as demonstrated below:
    disc = Orange.data.discretization.DiscretizeTable()
    disc.method = Orange.feature.discretization.EqualFreq(n=2)
    disc_iris = disc(iris)

Data discretization classes
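Conceptually, table-wide equal-frequency discretization computes per-column cut points and replaces each continuous value with a label for its interval. A minimal, self-contained sketch (illustrative only, not Orange's implementation):

```python
# Illustrative sketch: replace a continuous column of a small "table"
# with categorical interval labels (equal-frequency, two bins).
# Not Orange's implementation; all names are illustrative.
def equal_freq_cut(values, n):
    """Cut points that split the sorted values into n roughly equal groups."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // n] for i in range(1, n)]

def label(value, cuts):
    """Interval label of the form '< c' or '>= c'."""
    for c in cuts:
        if value < c:
            return "< %g" % c
    return ">= %g" % cuts[-1]

table = [[1.4, 0.2], [4.7, 1.4], [1.3, 0.2], [5.0, 1.5]]
cuts = equal_freq_cut([row[0] for row in table], n=2)
disc_table = [[label(row[0], cuts)] + row[1:] for row in table]
print(disc_table)
```

Orange's discretization classes do the analogous transformation for every continuous column of a Table, producing new DiscreteVariable descriptors in a new domain.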
CHAPTER 2

Feature (feature)

2.1 Scoring (scoring)

A feature score is an assessment of the usefulness of the feature for prediction of the dependent (class) variable. Orange provides classes that compute the common feature scores for classification and regression.

The code below computes the information gain of the feature tear_rate in the Lenses data set:

    >>> data = Orange.data.Table("lenses")
    >>> Orange.feature.scoring.InfoGain("tear_rate", data)

An alternative way of invoking the scorers is to construct the scoring object, as in the following example:

    >>> gain = Orange.feature.scoring.InfoGain()
    >>> for att in data.domain.attributes:
    ...     print("%s %.3f" % (att.name, gain(att, data)))
    age
    prescription
    astigmatic
    tear_rate

Feature scoring in classification problems

class Orange.feature.scoring.InfoGain
    Information gain is the expected decrease of entropy. See the Wikipedia entry on information gain.

class Orange.feature.scoring.GainRatio
    Information gain ratio is the ratio between the information gain and the entropy of the feature's value distribution. The score was introduced in [Quinlan1986] to alleviate the overestimation for multi-valued features. See the Wikipedia entry on gain ratio.

class Orange.feature.scoring.Gini
    The Gini index is the probability that two randomly chosen instances will have different classes. See the Wikipedia entry on the Gini index.

Feature scoring in regression problems

TBD.
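As an illustration of what such a scorer computes, information gain can be sketched in a few lines of plain Python (illustrative only; Orange's scorers operate on Table objects and variable descriptors):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(feature_values, labels):
    """Expected decrease of class entropy after splitting by the feature."""
    n = len(labels)
    split = {}
    for v, y in zip(feature_values, labels):
        split.setdefault(v, []).append(y)
    remainder = sum(len(part) / n * entropy(part) for part in split.values())
    return entropy(labels) - remainder

# A feature that perfectly separates the classes has maximal gain.
feature = ["reduced", "normal", "reduced", "normal"]
classes = ["none", "soft", "none", "soft"]
print(info_gain(feature, classes))  # 1.0 for this perfectly informative feature
```

Gain ratio divides this quantity by the entropy of the feature's own value distribution, which is what tempers the scores of multi-valued features.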
2.2 Feature discretization (discretization)

This module transforms continuous features into discrete ones. Usually, one would discretize the whole data set with the Orange.data.discretization module; this module is limited to single features.

Discretization algorithms return a discretized variable (with fixed parameters) that can transform either the learning or the testing data. Parameter learning is separate from the transformation, since in machine learning only the training set should be used to induce parameters.

To obtain discretized features, call a discretization algorithm with the data and the feature to discretize. The feature can be given either as an index, a name or an Orange.data.Variable. The following example creates a discretized feature:

    import Orange
    data = Orange.data.Table("iris.tab")
    disc = Orange.feature.discretization.EqualFreq(n=4)
    disc_var = disc(data, 0)

The values of the first attribute are discretized when the data is transformed to an Orange.data.Domain that includes disc_var. In the example below we add the discretized first attribute to the original domain:

    ndomain = Orange.data.Domain([disc_var] + list(data.domain.attributes),
                                 data.domain.class_vars)
    ndata = Orange.data.Table(ndomain, data)
    print(ndata)

The printout:

    [[< , 5.1, 3.5, 1.4, 0.2 | Iris-setosa],
     [< , 4.9, 3.0, 1.4, 0.2 | Iris-setosa],
     [< , 4.7, 3.2, 1.3, 0.2 | Iris-setosa],
     [< , 4.6, 3.1, 1.5, 0.2 | Iris-setosa],
     [< , 5.0, 3.6, 1.4, 0.2 | Iris-setosa],
     ...
    ]

Discretization Algorithms
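The separation between inducing parameters on the training data and applying them to unseen data, described above, can be sketched as follows (plain Python, illustrative only; the names are not part of the Orange API):

```python
# Illustrative sketch: equal-frequency cut points are induced on the
# training data only, then reused to transform unseen (testing) values.
def fit_equal_freq(train_values, n):
    """Induce n-1 cut points from the training values."""
    ordered = sorted(train_values)
    return [ordered[len(ordered) * i // n] for i in range(1, n)]

def transform(value, cuts):
    """Index of the interval the value falls into, given fixed cuts."""
    return sum(value >= c for c in cuts)

train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
cuts = fit_equal_freq(train, n=4)   # induced from the training set only
test = [0.5, 3.5, 8.5]              # unseen values reuse the same cuts
print(cuts)
print([transform(v, cuts) for v in test])
```

This mirrors why a discretization algorithm returns a variable with fixed parameters: the same cut points must transform both the learning and the testing data.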
CHAPTER 3

Classification (classification)

3.1 Majority (majority)

The accuracy of models is often compared with the default accuracy, that is, the accuracy of a model which classifies all instances into the majority class. Fitting such a model consists of computing the class distribution and its mode. The model is represented as an instance of Orange.classification.majority.ConstantClassifier.

Example

    >>> iris = Orange.data.Table("iris")
    >>> maj = Orange.classification.majority.MajorityLearner()
    >>> model = maj(iris[30:110])
    >>> print(model)
    ConstantClassifier [ ]
    >>> model(iris[0])
    Value('iris', Iris-versicolor)

3.2 Naive Bayes (naive_bayes)

The module for Naive Bayes (NB) classification contains two implementations:

- Orange.classification.naive_bayes.BayesClassifier is based on the popular scikit-learn package.
- Orange.classification.naive_bayes.BayesStorageClassifier demonstrates the use of the contingency table provided by the underlying storage. It returns only the predicted value.

Example

    >>> from Orange.data import Table
    >>> from Orange.classification.naive_bayes import *
    >>> iris = Table("iris")
    >>> fitter = NaiveBayes()
    >>> model = fitter(iris[:-1])
    >>> model(iris[-1])
    Value('iris', Iris-virginica)
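The majority ("constant") model of section 3.1, computing the class distribution and taking its mode, can be sketched in plain Python (an illustration, not the Orange classes):

```python
from collections import Counter

# Illustrative sketch of a majority ("constant") classifier.
class ConstantModel:
    def __init__(self, labels):
        self.distribution = Counter(labels)                  # class distribution
        self.value = self.distribution.most_common(1)[0][0]  # its mode

    def __call__(self, instance):
        return self.value   # every instance gets the majority class

labels = ["versicolor"] * 50 + ["virginica"] * 30
model = ConstantModel(labels)
print(model(None))          # the majority class, regardless of the instance
print(model.distribution)
```

Whatever a real model achieves should be compared against the accuracy of such a constant predictor, which here would be 50/80 on its own training data.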
3.3 Support Vector Machines (svm)

The module for Support Vector Machine (SVM) classification is based on the popular scikit-learn package, which uses the LibSVM and LIBLINEAR libraries. It provides several learning algorithms:

- Orange.classification.svm.SVMLearner, a general SVM learner;
- Orange.classification.svm.LinearSVMLearner, a fast learner useful for data sets with a large number of features;
- Orange.classification.svm.NuSVMLearner, a general SVM learner that uses a parameter to control the number of support vectors;
- Orange.classification.svm.SVRLearner, a general learner for support vector regression;
- Orange.classification.svm.NuSVRLearner, which is similar to NuSVMLearner for regression;
- Orange.classification.svm.OneClassSVMLearner, an unsupervised learner useful for novelty and outlier detection.

Example

    >>> from Orange.data import Table
    >>> from Orange.classification.svm import SVMLearner
    >>> from Orange.evaluation import CrossValidation, CA
    >>> iris = Table("iris")
    >>> fitter = SVMLearner()
    >>> cross = CrossValidation(iris, fitter)
    >>> prediction = cross.kfold(10)
    >>> CA(iris, prediction[0])
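Classification accuracy, the CA score used in the example above, is simply the share of correctly predicted instances; a plain-Python sketch:

```python
# Illustrative sketch of the classification-accuracy (CA) score.
def classification_accuracy(true_labels, predicted_labels):
    """Fraction of instances whose prediction matches the true class."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

truth = ["setosa", "setosa", "virginica", "versicolor"]
preds = ["setosa", "virginica", "virginica", "versicolor"]
print(classification_accuracy(truth, preds))  # 0.75
```

Orange's CA computes the same quantity from a table's known classes and a model's predictions over that table.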
CHAPTER 4

Evaluation (evaluation)

4.1 Sampling procedures for testing models (testing)
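The section title refers to procedures such as k-fold cross-validation; the index bookkeeping behind them can be sketched as follows (illustrative, not the Orange API):

```python
# Illustrative sketch of k-fold index sampling for testing models:
# each instance appears in the test fold exactly once.
def kfold_indices(n, k):
    """Return k (train, test) pairs of row indices over n instances."""
    folds = []
    for i in range(k):
        test = list(range(i * n // k, (i + 1) * n // k))
        test_set = set(test)
        train = [j for j in range(n) if j not in test_set]
        folds.append((train, test))
    return folds

for train, test in kfold_indices(n=6, k=3):
    print(train, test)
```

A testing procedure then fits the learner on each train part and collects predictions on the corresponding test part, so every instance is predicted by a model that never saw it.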
CHAPTER 5

Miscellaneous (misc)

5.1 Symbolic constants (enum)
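Symbolic constants map readable names to fixed values; Python's standard enum module illustrates the idea (shown as general background only, not as the Orange.misc.enum API):

```python
from enum import Enum

# Generic illustration of symbolic constants (not Orange.misc.enum).
# Density mirrors the DENSE / SPARSE / SPARSE_BOOL constants used by Storage.
class Density(Enum):
    DENSE = 0
    SPARSE = 1
    SPARSE_BOOL = 2

print(Density.SPARSE.name, Density.SPARSE.value)
print(Density(0))   # look up a constant by its value
```

Using named constants instead of bare integers makes code such as density checks self-documenting.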
CHAPTER 6

Development

6.1 Widget development

Library of Common GUI Controls

gui is a library of functions for constructing a control (like a check box, line edit or combo box), inserting it into the parent's layout, setting tooltips, callbacks and so forth, and establishing synchronization with a Python object's attribute (including saving and retrieving the value when the widget is closed and reopened), all in a single call.

Almost all functions need three arguments: the widget into which the control is inserted, the master widget whose attribute the control's value is synchronized with, and the name of that attribute (value). All other arguments should be given as keyword arguments, for clarity and to allow for potential future compatibility-breaking changes in the module. Several arguments that are common to all functions must always be given as keyword arguments.

Common options

All controls accept the following arguments that can only be given as keyword arguments.

tooltip
    A string for a tooltip that appears when the mouse is over the control.
disabled
    Tells whether the control should be disabled upon initialization.
addSpace
    Gives the amount of space that is inserted after the control (or the box that encloses it). If True, a space of 8 pixels is inserted. Default is 0.
addToLayout
    The control is added to the parent's layout unless this flag is set to False.
stretch
    The stretch factor for this widget, used when adding to the layout. Default is 0.
sizePolicy
    The size policy for the box or the control.

Common Arguments

Many functions share common arguments.

widget
    The widget on which the control is drawn.
master
    The object that includes the attribute used to store the control's value; most often the self of the widget into which the control is inserted.
value
    A string with the name of the master's attribute that synchronizes with the control's value.
box
    Indicates whether there should be a box around the control. If box is False (default), no box is drawn; if it is a string, it is used as the label for the box; if box is any other true value (such as True), an unlabeled box is drawn.
callback
    A function that is called when the state of the control changes. This can be a single function or a list of functions that are called in the given order. The callback function should not change the value of the controlled attribute (the one given as the value argument described above) to avoid a cycle (a workaround is shown in the description of the listbox function).
label
    A string that is displayed as the control's label.
labelWidth
    The width of the label. This is useful for aligning the controls.
orientation
    When the label is used, this argument determines the relative placement of the label and the control. The label can be above the control ("vertical" or True, the default) or in the same line with the control ("horizontal" or False). Orientation can also be an instance of QLayout.

Common Attributes

box
    If the constructed widget is enclosed in a box, the attribute box refers to the box.

Widgets

This section describes the wrappers for controls like check boxes, buttons and similar. Using them is preferred over calling Qt directly, for convenience, readability, ease of porting to newer versions of Qt and, in particular, because they set up a lot of things that happen behind the scenes.

Other widgets

Utility functions

Internal functions and classes

This part of the documentation describes some classes and functions that are used internally. The classes will likely maintain compatibility in the future, while the functions may be changed.
Wrappers for Qt classes

Wrappers for Python classes

Other functions
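The attribute synchronization that these gui functions establish between a control and its master can be sketched without Qt (a conceptual sketch; the real gui module wires Qt signals to the same effect, and all names here are illustrative):

```python
# Conceptual sketch of control <-> master attribute synchronization,
# without Qt. The real gui module connects Qt signals to do this.
class MiniControl:
    def __init__(self, master, value, callback=None):
        self.master = master
        self.value_name = value      # name of the synchronized attribute
        self.callback = callback

    def user_changed(self, new_value):
        """Simulate the user manipulating the control."""
        setattr(self.master, self.value_name, new_value)  # sync to master
        if self.callback:
            self.callback()          # notify after the value is stored

class MasterWidget:
    def __init__(self):
        self.threshold = 0.5
        self.changes = 0

master = MasterWidget()
control = MiniControl(master, "threshold",
                      callback=lambda: setattr(master, "changes",
                                               master.changes + 1))
control.user_changed(0.9)
print(master.threshold, master.changes)
```

This is why the callback must not itself assign to the controlled attribute: the control would observe the change and re-invoke the callback, producing a cycle.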
CHAPTER 7

Indices and tables

genindex
modindex
search
Bibliography

[Kononenko2007] Igor Kononenko, Matjaz Kukar: Machine Learning and Data Mining. Woodhead Publishing.
[Quinlan1986] J. R. Quinlan: Induction of Decision Trees. Machine Learning.
[Breiman1984] L. Breiman et al.: Classification and Regression Trees. Chapman and Hall.
[Kononenko1995] I. Kononenko: On biases in estimating multi-valued attributes. International Joint Conference on Artificial Intelligence.
Index

Symbols
__getitem__() (in module Orange.data.storage)
__len__() (Orange.data.storage method)
_compute_distributions() (in module Orange.data.storage)
_filter_has_class() (in module Orange.data.storage)
_filter_is_defined() (in module Orange.data.storage)
_filter_same_value() (in module Orange.data.storage)
_filter_values() (in module Orange.data.storage)

C
classification: majority classifier; support vector machines (SVM)

D
data discretization
discretization
domain (in module Orange.data.storage)

F
feature discretization
feature scoring: gain ratio; gini index; information gain

G
GainRatio (class in Orange.feature.scoring)
Gini (class in Orange.feature.scoring)

I
InfoGain (class in Orange.feature.scoring)

M
majority classifier
multinomial_treatment (DomainContinuizer attribute)

N
Naive Bayes classifier
normalize_continuous (DomainContinuizer attribute)

O
Orange.data.continuizer.DomainContinuizer (class in Orange.data.continuizer)

S
support vector machines (SVM)

T
transform_class (DomainContinuizer attribute)

Z
zero_based (DomainContinuizer attribute)
More information\n is used in a string to indicate the newline character. An expression produces data. The simplest expression
Chapter 1 Summary Comments are indicated by a hash sign # (also known as the pound or number sign). Text to the right of the hash sign is ignored. (But, hash loses its special meaning if it is part of
More informationData Mining. 3.2 Decision Tree Classifier. Fall Instructor: Dr. Masoud Yaghini. Chapter 5: Decision Tree Classifier
Data Mining 3.2 Decision Tree Classifier Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction Basic Algorithm for Decision Tree Induction Attribute Selection Measures Information Gain Gain Ratio
More informationLars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
Syllabus Fri. 27.10. (1) 0. Introduction A. Supervised Learning: Linear Models & Fundamentals Fri. 3.11. (2) A.1 Linear Regression Fri. 10.11. (3) A.2 Linear Classification Fri. 17.11. (4) A.3 Regularization
More informationIntroduction to Data Science. Introduction to Data Science with Python. Python Basics: Basic Syntax, Data Structures. Python Concepts (Core)
Introduction to Data Science What is Analytics and Data Science? Overview of Data Science and Analytics Why Analytics is is becoming popular now? Application of Analytics in business Analytics Vs Data
More informationDATA ANALYSIS I. Types of Attributes Sparse, Incomplete, Inaccurate Data
DATA ANALYSIS I Types of Attributes Sparse, Incomplete, Inaccurate Data Sources Bramer, M. (2013). Principles of data mining. Springer. [12-21] Witten, I. H., Frank, E. (2011). Data Mining: Practical machine
More informationPython With Data Science
Course Overview This course covers theoretical and technical aspects of using Python in Applied Data Science projects and Data Logistics use cases. Who Should Attend Data Scientists, Software Developers,
More informationOrange3-Prototypes Documentation. Biolab, University of Ljubljana
Biolab, University of Ljubljana Dec 17, 2018 Contents 1 Widgets 1 2 Indices and tables 11 i ii CHAPTER 1 Widgets 1.1 Contingency Table Construct a contingency table from given data. Inputs Data input
More informationAnalytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.
Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied
More informationData Mining and Knowledge Discovery Practice notes: Numeric Prediction, Association Rules
Keywords Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si Data Attribute, example, attribute-value data, target variable, class, discretization Algorithms
More informationData Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation
Data Mining Part 2. Data Understanding and Preparation 2.4 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Normalization Attribute Construction Aggregation Attribute Subset Selection Discretization
More informationMachine Learning: Algorithms and Applications Mockup Examination
Machine Learning: Algorithms and Applications Mockup Examination 14 May 2012 FIRST NAME STUDENT NUMBER LAST NAME SIGNATURE Instructions for students Write First Name, Last Name, Student Number and Signature
More informationCS6375: Machine Learning Gautam Kunapuli. Mid-Term Review
Gautam Kunapuli Machine Learning Data is identically and independently distributed Goal is to learn a function that maps to Data is generated using an unknown function Learn a hypothesis that minimizes
More informationData analysis case study using R for readily available data set using any one machine learning Algorithm
Assignment-4 Data analysis case study using R for readily available data set using any one machine learning Algorithm Broadly, there are 3 types of Machine Learning Algorithms.. 1. Supervised Learning
More informationWeka VotedPerceptron & Attribute Transformation (1)
Weka VotedPerceptron & Attribute Transformation (1) Lab6 (in- class): 5 DIC 2016-13:15-15:00 (CHOMSKY) ACKNOWLEDGEMENTS: INFORMATION, EXAMPLES AND TASKS IN THIS LAB COME FROM SEVERAL WEB SOURCES. Learning
More informationPerformance Evaluation of Various Classification Algorithms
Performance Evaluation of Various Classification Algorithms Shafali Deora Amritsar College of Engineering & Technology, Punjab Technical University -----------------------------------------------------------***----------------------------------------------------------
More informationMIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018
MIT 801 [Presented by Anna Bosman] 16 February 2018 Machine Learning What is machine learning? Artificial Intelligence? Yes as we know it. What is intelligence? The ability to acquire and apply knowledge
More informationChapter 1 - The Spark Machine Learning Library
Chapter 1 - The Spark Machine Learning Library Objectives Key objectives of this chapter: The Spark Machine Learning Library (MLlib) MLlib dense and sparse vectors and matrices Types of distributed matrices
More informationData Mining: Exploring Data. Lecture Notes for Chapter 3
Data Mining: Exploring Data Lecture Notes for Chapter 3 1 What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include
More informationData Mining. Practical Machine Learning Tools and Techniques. Slides for Chapter 3 of Data Mining by I. H. Witten, E. Frank and M. A.
Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 3 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Output: Knowledge representation Tables Linear models Trees Rules
More information10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors
Dejan Sarka Anomaly Detection Sponsors About me SQL Server MVP (17 years) and MCT (20 years) 25 years working with SQL Server Authoring 16 th book Authoring many courses, articles Agenda Introduction Simple
More informationData Mining Tools. Jean-Gabriel Ganascia LIP6 University Pierre et Marie Curie 4, place Jussieu, Paris, Cedex 05
Data Mining Tools Jean-Gabriel Ganascia LIP6 University Pierre et Marie Curie 4, place Jussieu, 75252 Paris, Cedex 05 Jean-Gabriel.Ganascia@lip6.fr DATA BASES Data mining Extraction Data mining Interpretation/
More informationThe Explorer. chapter Getting started
chapter 10 The Explorer Weka s main graphical user interface, the Explorer, gives access to all its facilities using menu selection and form filling. It is illustrated in Figure 10.1. There are six different
More informationCredit card Fraud Detection using Predictive Modeling: a Review
February 207 IJIRT Volume 3 Issue 9 ISSN: 2396002 Credit card Fraud Detection using Predictive Modeling: a Review Varre.Perantalu, K. BhargavKiran 2 PG Scholar, CSE, Vishnu Institute of Technology, Bhimavaram,
More informationBusiness Club. Decision Trees
Business Club Decision Trees Business Club Analytics Team December 2017 Index 1. Motivation- A Case Study 2. The Trees a. What is a decision tree b. Representation 3. Regression v/s Classification 4. Building
More informationSupervised Learning Classification Algorithms Comparison
Supervised Learning Classification Algorithms Comparison Aditya Singh Rathore B.Tech, J.K. Lakshmipat University -------------------------------------------------------------***---------------------------------------------------------
More informationEPL451: Data Mining on the Web Lab 5
EPL451: Data Mining on the Web Lab 5 Παύλος Αντωνίου Γραφείο: B109, ΘΕΕ01 University of Cyprus Department of Computer Science Predictive modeling techniques IBM reported in June 2012 that 90% of data available
More informationCh.1 Introduction. Why Machine Learning (ML)?
Syllabus, prerequisites Ch.1 Introduction Notation: Means pencil-and-paper QUIZ Means coding QUIZ Why Machine Learning (ML)? Two problems with conventional if - else decision systems: brittleness: The
More informationLouis Fourrier Fabien Gaie Thomas Rolf
CS 229 Stay Alert! The Ford Challenge Louis Fourrier Fabien Gaie Thomas Rolf Louis Fourrier Fabien Gaie Thomas Rolf 1. Problem description a. Goal Our final project is a recent Kaggle competition submitted
More information> Data Mining Overview with Clementine
> Data Mining Overview with Clementine This two-day course introduces you to the major steps of the data mining process. The course goal is for you to be able to begin planning or evaluate your firm s
More informationData Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining
Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar What is data exploration? A preliminary exploration of the data to better understand its characteristics.
More informationData Mining: Exploring Data. Lecture Notes for Data Exploration Chapter. Introduction to Data Mining
Data Mining: Exploring Data Lecture Notes for Data Exploration Chapter Introduction to Data Mining by Tan, Steinbach, Karpatne, Kumar 02/03/2018 Introduction to Data Mining 1 What is data exploration?
More informationSOCIAL MEDIA MINING. Data Mining Essentials
SOCIAL MEDIA MINING Data Mining Essentials Dear instructors/users of these slides: Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate
More informationData Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.4. Spring 2010 Instructor: Dr. Masoud Yaghini Outline Using IF-THEN Rules for Classification Rule Extraction from a Decision Tree 1R Algorithm Sequential Covering Algorithms
More informationChapter 1 Summary. Chapter 2 Summary. end of a string, in which case the string can span multiple lines.
Chapter 1 Summary Comments are indicated by a hash sign # (also known as the pound or number sign). Text to the right of the hash sign is ignored. (But, hash loses its special meaning if it is part of
More informationSCIENCE. An Introduction to Python Brief History Why Python Where to use
DATA SCIENCE Python is a general-purpose interpreted, interactive, object-oriented and high-level programming language. Currently Python is the most popular Language in IT. Python adopted as a language
More informationPython for Data Analysis. Prof.Sushila Aghav-Palwe Assistant Professor MIT
Python for Data Analysis Prof.Sushila Aghav-Palwe Assistant Professor MIT Four steps to apply data analytics: 1. Define your Objective What are you trying to achieve? What could the result look like? 2.
More informationData Mining. 3.3 Rule-Based Classification. Fall Instructor: Dr. Masoud Yaghini. Rule-Based Classification
Data Mining 3.3 Fall 2008 Instructor: Dr. Masoud Yaghini Outline Using IF-THEN Rules for Classification Rules With Exceptions Rule Extraction from a Decision Tree 1R Algorithm Sequential Covering Algorithms
More informationCSE4334/5334 DATA MINING
CSE4334/5334 DATA MINING Lecture 4: Classification (1) CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai Li (Slides courtesy
More informationPython Scripting for Computational Science
Hans Petter Langtangen Python Scripting for Computational Science Third Edition With 62 Figures 43 Springer Table of Contents 1 Introduction... 1 1.1 Scripting versus Traditional Programming... 1 1.1.1
More informationECLT 5810 Clustering
ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping
More informationFeatures: representation, normalization, selection. Chapter e-9
Features: representation, normalization, selection Chapter e-9 1 Features Distinguish between instances (e.g. an image that you need to classify), and the features you create for an instance. Features
More informationMachine Learning. A. Supervised Learning A.7. Decision Trees. Lars Schmidt-Thieme
Machine Learning A. Supervised Learning A.7. Decision Trees Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim, Germany 1 /
More informationLecture 7: Decision Trees
Lecture 7: Decision Trees Instructor: Outline 1 Geometric Perspective of Classification 2 Decision Trees Geometric Perspective of Classification Perspective of Classification Algorithmic Geometric Probabilistic...
More informationWeka ( )
Weka ( http://www.cs.waikato.ac.nz/ml/weka/ ) The phases in which classifier s design can be divided are reflected in WEKA s Explorer structure: Data pre-processing (filtering) and representation Supervised
More informationContents. Preface to the Second Edition
Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................
More informationECLT 5810 Clustering
ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping
More informationIntroduction to Machine Learning
Introduction to Machine Learning Eric Medvet 16/3/2017 1/77 Outline Machine Learning: what and why? Motivating example Tree-based methods Regression trees Trees aggregation 2/77 Teachers Eric Medvet Dipartimento
More informationegrapher Language Reference Manual
egrapher Language Reference Manual Long Long: ll3078@columbia.edu Xinli Jia: xj2191@columbia.edu Jiefu Ying: jy2799@columbia.edu Linnan Wang: lw2645@columbia.edu Darren Chen: dsc2155@columbia.edu 1. Introduction
More informationIn stochastic gradient descent implementations, the fixed learning rate η is often replaced by an adaptive learning rate that decreases over time,
Chapter 2 Although stochastic gradient descent can be considered as an approximation of gradient descent, it typically reaches convergence much faster because of the more frequent weight updates. Since
More informationLearning and Evaluating Classifiers under Sample Selection Bias
Learning and Evaluating Classifiers under Sample Selection Bias Bianca Zadrozny IBM T.J. Watson Research Center, Yorktown Heights, NY 598 zadrozny@us.ibm.com Abstract Classifier learning methods commonly
More informationA Systematic Overview of Data Mining Algorithms. Sargur Srihari University at Buffalo The State University of New York
A Systematic Overview of Data Mining Algorithms Sargur Srihari University at Buffalo The State University of New York 1 Topics Data Mining Algorithm Definition Example of CART Classification Iris, Wine
More informationSlides for Data Mining by I. H. Witten and E. Frank
Slides for Data Mining by I. H. Witten and E. Frank 7 Engineering the input and output Attribute selection Scheme-independent, scheme-specific Attribute discretization Unsupervised, supervised, error-
More informationData Mining and Analytics
Data Mining and Analytics Aik Choon Tan, Ph.D. Associate Professor of Bioinformatics Division of Medical Oncology Department of Medicine aikchoon.tan@ucdenver.edu 9/22/2017 http://tanlab.ucdenver.edu/labhomepage/teaching/bsbt6111/
More informationCOMP 465: Data Mining Classification Basics
Supervised vs. Unsupervised Learning COMP 465: Data Mining Classification Basics Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Supervised
More informationPython Scripting for Computational Science
Hans Petter Langtangen Python Scripting for Computational Science Third Edition With 62 Figures Sprin ger Table of Contents 1 Introduction 1 1.1 Scripting versus Traditional Programming 1 1.1.1 Why Scripting
More information09/05/2018. Spark MLlib is the Spark component providing the machine learning/data mining algorithms
Spark MLlib is the Spark component providing the machine learning/data mining algorithms Pre-processing techniques Classification (supervised learning) Clustering (unsupervised learning) Itemset mining
More informationData Mining Concepts & Techniques
Data Mining Concepts & Techniques Lecture No. 03 Data Processing, Data Mining Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology
More informationPython Working with files. May 4, 2017
Python Working with files May 4, 2017 So far, everything we have done in Python was using in-memory operations. After closing the Python interpreter or after the script was done, all our input and output
More informationNearest Neighbor Classification
Nearest Neighbor Classification Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 11, 2017 1 / 48 Outline 1 Administration 2 First learning algorithm: Nearest
More informationCIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensive Computing. University of Florida, CISE Department Prof.
CIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensive Computing University of Florida, CISE Department Prof. Daisy Zhe Wang Data Visualization Value of Visualization Data And Image Models
More informationPerformance Analysis of Data Mining Classification Techniques
Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal
More informationClassification. Instructor: Wei Ding
Classification Decision Tree Instructor: Wei Ding Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Preliminaries Each data record is characterized by a tuple (x, y), where x is the attribute
More informationSemi-supervised Learning
Semi-supervised Learning Piyush Rai CS5350/6350: Machine Learning November 8, 2011 Semi-supervised Learning Supervised Learning models require labeled data Learning a reliable model usually requires plenty
More informationCSCI567 Machine Learning (Fall 2014)
CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu September 9, 2014 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 2014) September 9, 2014 1 / 47
More informationMachine Learning Techniques for Data Mining
Machine Learning Techniques for Data Mining Eibe Frank University of Waikato New Zealand 10/25/2000 1 PART VII Moving on: Engineering the input and output 10/25/2000 2 Applying a learner is not all Already
More informationMachine Learning for. Artem Lind & Aleskandr Tkachenko
Machine Learning for Object Recognition Artem Lind & Aleskandr Tkachenko Outline Problem overview Classification demo Examples of learning algorithms Probabilistic modeling Bayes classifier Maximum margin
More informationRecursion and Induction: Haskell; Primitive Data Types; Writing Function Definitions
Recursion and Induction: Haskell; Primitive Data Types; Writing Function Definitions Greg Plaxton Theory in Programming Practice, Spring 2005 Department of Computer Science University of Texas at Austin
More informationCh.1 Introduction. Why Machine Learning (ML)? manual designing of rules requires knowing how humans do it.
Ch.1 Introduction Syllabus, prerequisites Notation: Means pencil-and-paper QUIZ Means coding QUIZ Code respository for our text: https://github.com/amueller/introduction_to_ml_with_python Why Machine Learning
More informationAdvanced Algorithms and Computational Models (module A)
Advanced Algorithms and Computational Models (module A) Giacomo Fiumara giacomo.fiumara@unime.it 2014-2015 1 / 34 Python's built-in classes A class is immutable if each object of that class has a xed value
More information2/26/2017. Spark MLlib is the Spark component providing the machine learning/data mining algorithms. MLlib APIs are divided into two packages:
Spark MLlib is the Spark component providing the machine learning/data mining algorithms Pre-processing techniques Classification (supervised learning) Clustering (unsupervised learning) Itemset mining
More informationData Mining: Concepts and Techniques Classification and Prediction Chapter 6.1-3
Data Mining: Concepts and Techniques Classification and Prediction Chapter 6.1-3 January 25, 2007 CSE-4412: Data Mining 1 Chapter 6 Classification and Prediction 1. What is classification? What is prediction?
More informationarulescba: Classification for Factor and Transactional Data Sets Using Association Rules
arulescba: Classification for Factor and Transactional Data Sets Using Association Rules Ian Johnson Southern Methodist University Abstract This paper presents an R package, arulescba, which uses association
More informationCONCEPT FORMATION AND DECISION TREE INDUCTION USING THE GENETIC PROGRAMMING PARADIGM
1 CONCEPT FORMATION AND DECISION TREE INDUCTION USING THE GENETIC PROGRAMMING PARADIGM John R. Koza Computer Science Department Stanford University Stanford, California 94305 USA E-MAIL: Koza@Sunburn.Stanford.Edu
More informationThe Warhol Language Reference Manual
The Warhol Language Reference Manual Martina Atabong maa2247 Charvinia Neblett cdn2118 Samuel Nnodim son2105 Catherine Wes ciw2109 Sarina Xie sx2166 Introduction Warhol is a functional and imperative programming
More informationA Two-level Learning Method for Generalized Multi-instance Problems
A wo-level Learning Method for Generalized Multi-instance Problems Nils Weidmann 1,2, Eibe Frank 2, and Bernhard Pfahringer 2 1 Department of Computer Science University of Freiburg Freiburg, Germany weidmann@informatik.uni-freiburg.de
More informationStudy on Classifiers using Genetic Algorithm and Class based Rules Generation
2012 International Conference on Software and Computer Applications (ICSCA 2012) IPCSIT vol. 41 (2012) (2012) IACSIT Press, Singapore Study on Classifiers using Genetic Algorithm and Class based Rules
More informationDevelopment of Prediction Model for Linked Data based on the Decision Tree for Track A, Task A1
Development of Prediction Model for Linked Data based on the Decision Tree for Track A, Task A1 Dongkyu Jeon and Wooju Kim Dept. of Information and Industrial Engineering, Yonsei University, Seoul, Korea
More informationDATA STRUCTURES CHAPTER 1
DATA STRUCTURES CHAPTER 1 FOUNDATIONAL OF DATA STRUCTURES This unit introduces some basic concepts that the student needs to be familiar with before attempting to develop any software. It describes data
More informationCISC 4631 Data Mining
CISC 4631 Data Mining Lecture 03: Introduction to classification Linear classifier Theses slides are based on the slides by Tan, Steinbach and Kumar (textbook authors) Eamonn Koegh (UC Riverside) 1 Classification:
More informationBabu Madhav Institute of Information Technology, UTU 2015
Five years Integrated M.Sc.(IT)(Semester 5) Question Bank 060010502:Programming in Python Unit-1:Introduction To Python Q-1 Answer the following Questions in short. 1. Which operator is used for slicing?
More information2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data.
Code No: M0502/R05 Set No. 1 1. (a) Explain data mining as a step in the process of knowledge discovery. (b) Differentiate operational database systems and data warehousing. [8+8] 2. (a) Briefly discuss
More informationInternational Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X
Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,
More informationData Mining Practical Machine Learning Tools and Techniques
Output: Knowledge representation Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter of Data Mining by I. H. Witten and E. Frank Decision tables Decision trees Decision rules
More informationVariable and Data Type I
Islamic University Of Gaza Faculty of Engineering Computer Engineering Department Lab 2 Variable and Data Type I Eng. Ibraheem Lubbad September 24, 2016 Variable is reserved a location in memory to store
More informationData Mining Algorithms: Basic Methods
Algorithms: The basic methods Inferring rudimentary rules Data Mining Algorithms: Basic Methods Chapter 4 of Data Mining Statistical modeling Constructing decision trees Constructing rules Association
More information