Orange Documentation
Release 3.0
Biolab
November 10, 2014

Contents

1 Data model (data)
    1.1 Data Storage (storage)
    1.2 Data Table (table)
    1.3 SQL table (data.sql)
    1.4 Domain description (domain)
    1.5 Variable Descriptors (variable)
    1.6 Values (value)
    1.7 Data Instance (instance)
    1.8 Data Filters (filter)
    1.9 Continuization and normalization (continuizer)
    1.10 Data discretization (discretization)
2 Feature (feature)
    2.1 Scoring (scoring)
    2.2 Feature discretization (discretization)
3 Classification (classification)
    3.1 Majority (majority)
    3.2 Naive Bayes (naive_bayes)
    3.3 Support Vector Machines (svm)
4 Evaluation (evaluation)
    4.1 Sampling procedures for testing models (testing)
5 Miscellaneous (misc)
    5.1 Symbolic constants (enum)
6 Development
    6.1 Widget development
7 Indices and tables
Bibliography


CHAPTER 1
Data model (data)

Orange stores data in Orange.data.Storage classes. The most commonly used storage is Orange.data.Table, which stores all data in two-dimensional numpy arrays. Each row of the data represents a data instance.

Individual data instances are represented as instances of Orange.data.Instance. Different storage classes may derive subclasses of Instance to represent the retrieved rows in the data more efficiently and to allow modifying the data through modifying the data instance. For example, if table is an Orange.data.Table, table[0] returns the row as an Orange.data.RowInstance.

Every storage class and data instance has an associated domain description, domain (an instance of Orange.data.Domain), that stores descriptions of the data columns. Every column is described by an instance of a class derived from Orange.data.Variable. The subclasses correspond to continuous variables (Orange.data.ContinuousVariable), discrete variables (Orange.data.DiscreteVariable) and string variables (Orange.data.StringVariable). These descriptors contain the variable's name, symbolic values, the number of decimals in printouts and similar.

The data is divided into attributes (features, independent variables), class variables (classes, targets, outcomes, dependent variables) and meta attributes. This division applies to domain descriptions, to data storages that contain separate arrays for each of the three parts of the data, and to data instances. Attributes and classes are represented with numeric values and are used in modelling. Meta attributes contain additional data which may be of any type. (Currently, only string values are supported in addition to continuous and discrete.)

In indexing, columns can be referred to by their names, descriptors or an integer index. For example, if inst is a data instance and var is a descriptor of type ContinuousVariable referring to the first column in the data, which is also named "petal length", then inst[var], inst[0] and inst["petal length"] all refer to the first value of the instance. Negative indices are used for meta attributes, starting with -1.

Continuous and discrete values can be represented by any numerical type; by default, Orange uses double precision (64-bit) floats. Discrete values are represented by whole numbers.

1.1 Data Storage (storage)

Orange.data.storage.Storage is an abstract class representing a data object in which rows represent data instances (examples, in machine learning terminology) and columns represent variables (features, attributes, classes, targets, meta attributes).

Data is divided into three parts that represent independent variables (X), dependent variables (Y) and meta data (metas). If practical, the class should expose those parts as properties. In the associated domain (Orange.data.Domain), the three parts correspond to the lists of variable descriptors attributes, class_vars and metas.
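These conventions can be illustrated with a short, hedged sketch (it assumes the iris data set that ships with Orange):

import Orange

data = Orange.data.Table("iris")

# The three parts of the data and the matching descriptor lists.
print(data.X.shape, data.Y.shape, data.metas.shape)
print(data.domain.attributes, data.domain.class_vars)

# A single row is a data instance; a column can be indexed by position,
# by name or by descriptor, all referring to the same value.
inst = data[0]
var = data.domain["petal length"]   # descriptor lookup by name
print(inst[var], inst["petal length"], inst[2])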

Any of those parts may be missing, dense, sparse or sparse boolean. The difference between the latter two is that sparse data can be seen as a list of pairs (variable, value), while in sparse boolean data the variable (item) is simply present or absent, like in market basket analysis. The actual storage of sparse data depends upon the storage type.

There is no uniform constructor signature: every derived class provides one or more specific constructors. There are currently two derived classes, Orange.data.Table and Orange.data.sql.Table, the former storing the data in-memory, in numpy objects, and the latter in SQL (currently, only PostgreSQL is supported).

Derived classes must implement at least the methods for getting rows and the number of instances (__getitem__ and __len__). To make a storage fast enough to be practically useful, it must also reimplement a number of filters, preprocessors and aggregators. For instance, the method _filter_values(self, filter) returns a new storage which only contains the rows that match the criteria given in the filter. Orange.data.Table implements an efficient method based on numpy indexing, and Orange.data.sql.Table, which stores a table as an SQL query, converts the filter into a WHERE clause.

Orange.data.storage.domain (Orange.data.Domain)
    The domain describing the columns of the data.

Data access

Orange.data.storage.__getitem__(self, index)
    Return one or more rows of the data.

    If the index is an int, e.g. data[7], the corresponding row is returned as an instance of Instance. Concrete implementations of Storage use specific derived classes for instances.

    If the index is a slice or a sequence of ints (e.g. data[7:10] or data[[7, 42, 15]]), indexing returns a new storage with the selected rows.

    If there are two indices, where the first is an int (a row number) and the second can be interpreted as a column, e.g. data[3, 5] or data[3, "gender"] or data[3, y] (where y is an instance of Variable), a single value is returned as an instance of Value.

    In all other cases, the first index should be a row index, a slice or a sequence, and the second index, which represents a set of columns, should be an int, a slice, a sequence or a numpy array. The result is a new storage with a new domain.

Orange.data.storage.__len__(self)
    Return the number of data instances (rows).

Inspection

Storage.X_density, Storage.Y_density, Storage.metas_density
    Indicate whether the attributes, classes and meta attributes are dense (Storage.DENSE) or sparse (Storage.SPARSE). If they are sparse and all values are 0 or 1, they are marked as sparse boolean (Storage.SPARSE_BOOL). The Storage class provides a default of DENSE. If the data has no attributes, classes or meta attributes, the corresponding method should return Storage.MISSING.

Filters

Storage should define the following methods to optimize the filtering operations as allowed by the underlying data structure. Orange.data.Table executes them directly through numpy (or bottleneck or related) methods, while Orange.data.sql.Table appends them to the WHERE clause of the query that defines the data.
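Before the individual methods, a minimal, hedged sketch of how filtering looks from user code (it assumes the bundled iris data and the IsDefined and SameValue wrapper classes from Orange.data.filter):

import Orange

data = Orange.data.Table("iris")

# Rows with no undefined values; falls back to a slow generic path if the
# storage does not implement _filter_is_defined.
complete = Orange.data.filter.IsDefined()(data)

# Rows whose class equals a given value; dispatches to _filter_same_value
# when the storage provides it.
setosa = Orange.data.filter.SameValue(data.domain.class_var, "Iris-setosa")(data)

print(len(complete), len(setosa))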

These methods should not be called directly but through the classes defined in Orange.data.filter. Methods in Orange.data.filter also provide slower fallback functions for the functions not defined in the storage.

Orange.data.storage._filter_is_defined(self, columns=None, negate=False)
    Extract rows without undefined values.

    Parameters:
        columns (sequence of ints, variable names or descriptors) - optional list of columns that are checked for unknowns
        negate (bool) - invert the selection
    Returns: a new storage of the same type or Table
    Return type: Orange.data.storage.Storage

Orange.data.storage._filter_has_class(self, negate=False)
    Return rows with a known value of the target attribute. If there are multiple classes, all must be defined.

    Parameters:
        negate (bool) - invert the selection
    Returns: a new storage of the same type or Table
    Return type: Orange.data.storage.Storage

Orange.data.storage._filter_same_value(self, column, value, negate=False)
    Select rows based on a value of the given variable.

    Parameters:
        column (int, str or Orange.data.Variable) - the column that is checked
        value (int, float or str) - the value of the variable
        negate (bool) - invert the selection
    Returns: a new storage of the same type or Table
    Return type: Orange.data.storage.Storage

Orange.data.storage._filter_values(self, filter)
    Apply the given filter to the data.

    Parameters:
        filter (Orange.data.Filter) - a filter for selecting the rows
    Returns: a new storage of the same type or Table
    Return type: Orange.data.storage.Storage

Aggregators

Similarly to filters, storage classes should provide several methods for fast computation of statistics. These methods are not called directly but by modules within Orange.statistics.

Orange.data.storage._compute_basic_stats(self, columns=None, include_metas=False, compute_variance=False)
    Compute basic statistics for the specified variables: the minimal and maximal value, the mean and the variance (or a zero placeholder), and the number of missing and defined values.

    Parameters:
        columns (list of ints, variable names or descriptors of type Orange.data.Variable) - a list of columns for which the statistics are computed; if None, the function computes the statistics for all variables

        include_metas (bool) - a flag which tells whether to include meta attributes (applicable only if columns is None)
        compute_variance (bool) - a flag which tells whether to compute the variance
    Returns: a list with a tuple (min, max, mean, variance, #nans, #non-nans) for each variable
    Return type: list

Orange.data.storage._compute_distributions(self, columns=None)
    Compute the distribution for the specified variables. The result is a list of pairs containing the distribution and the number of rows for which the variable value was missing.

    For discrete variables, the distribution is represented as a vector with the absolute frequency of each value. For continuous variables, the result is a 2-d array of shape (2, number-of-distinct-values); the first row contains the (distinct) values of the variable and the second their absolute frequencies.

    Parameters:
        columns (list of ints, variable names or descriptors of type Orange.data.Variable) - a list of columns for which the distributions are computed; if None, the function runs over all variables
    Returns: a list of distributions
    Return type: list of numpy arrays

1.2 Data Table (table)

Constructors

The preferred way to construct a table is to invoke a named constructor.

Inspection

Row manipulation

Weights

1.3 SQL table (data.sql)

1.4 Domain description (domain)

1.5 Variable Descriptors (variable)

Every variable is associated with a descriptor that stores its name, type and other properties, and takes care of the conversion of values from textual format to floats and back.

Derived classes store lists of existing descriptors for variables of each particular type. They provide a supplementary constructor, make, which returns an existing variable instead of creating a new one, when possible.

Descriptors facilitate computation of variables from other variables. This is used in domain conversion: if the destination domain contains a variable that is not present in the original domain, the variable's value is computed by calling its method compute_value, passing the data instance from the original domain as an argument.
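A short, hedged sketch of such a conversion (it assumes the bundled iris data; all variables here are shared between the two domains, so values are copied rather than computed):

import Orange

data = Orange.data.Table("iris")

# A new domain that reuses two of the original descriptors and the class.
subdomain = Orange.data.Domain(data.domain.attributes[:2], data.domain.class_var)

# Converting the table to the new domain; a variable missing from the
# source domain would have its values computed through compute_value.
subdata = Orange.data.Table(subdomain, data)
print(subdata[0])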

Method compute_value calls get_value_from if defined and returns its result. If the variable does not have get_value_from, compute_value returns Unknown.

1.6 Values (value)

1.7 Data Instance (instance)

Class Instance represents a data instance. The base class stores its own data, and derived classes are used to represent views into various data storages.

Like data tables, every data instance is associated with a domain, and its data is split into attributes, classes, meta attributes and the weight. The instance's data can be retrieved through the attributes x, y and metas. For derived classes, changing this data does not necessarily modify the corresponding data tables or other structures. Use indexing to modify the data.

Rows of Data Tables

1.8 Data Filters (filter)

Instances of classes derived from Filter are used for filtering the data.

When called with an individual data instance (Orange.data.Instance), they accept or reject the instance by returning either True or False.

When called with a data storage (e.g. an instance of Orange.data.Table), they check whether the corresponding class provides the method that implements the particular filter. If so, the method is called and the result should be of the same type as the storage; e.g., filter methods of Orange.data.Table return new instances of Orange.data.Table, and filter methods of SQL proxies return new SQL proxies.

If the class corresponding to the storage does not implement a particular filter, the fallback computes the indices of the rows to be selected and returns data[indices].

1.9 Continuization and normalization (continuizer)

class Orange.data.continuizer.DomainContinuizer

Construct a domain in which discrete attributes are replaced by continuous ones; existing continuous attributes can be normalized.

The attributes are treated according to their types:

- binary variables are transformed into 0.0/1.0 or -1.0/1.0 indicator variables;
- multinomial variables are treated according to the flag multinomial_treatment;
- discrete attributes with less than two possible values are removed;
- continuous variables can be normalized or left unchanged.

The typical use of the class is as follows:

continuizer = Orange.data.continuizer.DomainContinuizer()
continuizer.multinomial_treatment = continuizer.LowestIsBase
domain0 = continuizer(data)
data0 = Orange.data.Table(domain0, data)

Domain continuizers can be given either a data set or a domain, and return a new domain. When given only the domain, they cannot normalize continuous attributes or use the most frequent value as the base value, and will raise an exception with such settings.

The constructor can also be passed data or a domain, in which case the constructed continuizer is immediately applied to the data or domain, and the transformed domain is returned instead of the continuizer instance.

By default, the class does not change continuous and class attributes, while discrete attributes are replaced with N attributes (NValues) with values 0 and 1.

zero_based
    Determines the value used as the low value of the variable. When binary variables are transformed into continuous ones, or when a multivalued variable is transformed into multiple variables, the transformed variable can either have values 0.0 and 1.0 (default, zero_based is True) or -1.0 and 1.0 (zero_based is False). This attribute also determines the interval for normalized continuous variables (either [-1, 1] or [0, 1]). The following text assumes the default case. (Default: True)

multinomial_treatment
    Decides the treatment of multinomial variables. Let N be the number of the variable's values. (Default: NValues)

    DomainContinuizer.NValues
        The variable is replaced by N indicator variables, each corresponding to one value of the original variable. For each value of the original attribute, only the corresponding new attribute will have a value of one and the others will be zero. Note that these variables are not independent, so they cannot be used (directly) in, for instance, linear or logistic regression.

        For example, the data set bridges has a feature RIVER with values M, A, O and Y, in that order. Its value for the 15th row is M. Continuization replaces the variable with variables RIVER=M, RIVER=A, RIVER=O and RIVER=Y. For the 15th row, the first has value 1 and the others are 0.

    DomainContinuizer.LowestIsBase
        Similar to the above, except that it creates only N-1 variables. The missing indicator belongs to the lowest value: when the original variable has the lowest value, all indicators are 0. If the variable descriptor has base_value defined, the specified value is used as the base instead of the lowest one.

        Continuizing the variable RIVER gives similar results as above, except that it would omit RIVER=M; all three variables would be zero for the 15th data instance.

    DomainContinuizer.FrequentIsBase
        Like above, except that the most frequent value is used as the base. If there are multiple most frequent values, the one with the lowest index is used. The frequency of values is extracted from the data, so this option cannot be used if the constructor is given only a domain.

        The variable RIVER would be continuized similarly to above, except that it omits RIVER=A, which is the most frequent value.

    DomainContinuizer.Ignore
        Discrete variables are omitted.

    DomainContinuizer.IgnoreMulti
        Discrete variables with more than two values are omitted; two-valued ones are treated the same as in LowestIsBase.

    DomainContinuizer.ReportError
        Raise an error if there are any multinomial variables in the data.

    DomainContinuizer.AsOrdinal
        Multivalued variables are treated as ordinal and replaced by a continuous variable with the value's index, e.g. 0, 1, 2, 3...

    DomainContinuizer.AsNormalizedOrdinal
        As above, except that the resulting continuous value will be in the range from 0 to 1, e.g. 0, 0.25, 0.5, 0.75, 1 for a five-valued variable.

normalize_continuous
    If None, continuous variables are left unchanged. If DomainContinuizer.NormalizeBySD, they are replaced with standardized values obtained by subtracting the average value and dividing by the standard deviation. The attribute zero_based has no effect on this standardization. If DomainContinuizer.NormalizeBySpan, they are replaced with normalized values obtained by subtracting the minimum value of the data and dividing by the span (max - min). Statistics are computed from the data, so the constructor must be given data, not just a domain. (Default: None)

transform_class
    If True, the class is replaced by continuous attributes or normalized as well. Multiclass problems are thus transformed to multitarget ones. (Default: False)

1.10 Data discretization (discretization)

Continuous features in the data can be discretized using a uniform discretization method. Discretization replaces continuous features with the corresponding categorical features:

import Orange
iris = Orange.data.Table("iris.tab")
disc_iris = Orange.data.discretization.DiscretizeTable(
    iris, method=Orange.feature.discretization.EqualFreq(n=3))

print("Original data set:")
for e in iris[:3]:
    print(e)
print("Discretized data set:")
for e in disc_iris[:3]:
    print(e)

Discretization introduces new categorical features with discretized values:

Original data set:
[5.1, 3.5, 1.4, 0.2 Iris-setosa]
[4.9, 3.0, 1.4, 0.2 Iris-setosa]
[4.7, 3.2, 1.3, 0.2 Iris-setosa]
Discretized data set:
[< , >= , < , < Iris-setosa]
[< , [ , ), < , < Iris-setosa]
[< , >= , < , < Iris-setosa]

Data discretization uses the feature discretization classes from Orange.feature.discretization and applies them on the entire data set. The default discretization method (equal frequency with four intervals) can be replaced with other discretization approaches, as demonstrated below:

disc = Orange.data.discretization.DiscretizeTable()
disc.method = Orange.feature.discretization.EqualFreq(n=2)
disc_iris = disc(iris)

Data discretization classes

CHAPTER 2
Feature (feature)

2.1 Scoring (scoring)

A feature score is an assessment of the usefulness of a feature for prediction of the dependent (class) variable. Orange provides classes that compute the common feature scores for classification and regression.

The code below computes the information gain of the feature tear_rate in the lenses data set:

>>> data = Orange.data.Table("lenses")
>>> Orange.feature.scoring.InfoGain("tear_rate", data)

An alternative way of invoking the scorers is to construct the scoring object, as in the following example:

>>> gain = Orange.feature.scoring.InfoGain()
>>> for att in data.domain.attributes:
...     print("%s %.3f" % (att.name, gain(att, data)))
age ...
prescription ...
astigmatic ...
tear_rate ...

Feature scoring in classification problems

class Orange.feature.scoring.InfoGain
    Information gain is the expected decrease of entropy. See the Wikipedia entry on information gain.

class Orange.feature.scoring.GainRatio
    Information gain ratio is the ratio between the information gain and the entropy of the feature's value distribution. The score was introduced in [Quinlan1986] to alleviate overestimation for multi-valued features. See the Wikipedia entry on gain ratio.

class Orange.feature.scoring.Gini
    The Gini index is the probability that two randomly chosen instances will have different classes. See the Wikipedia entry on Gini index.

Feature scoring in regression problems

TBD.

2.2 Feature discretization (discretization)

This module transforms continuous features into discrete ones. To discretize a whole data set, one would usually use the Orange.data.discretization module instead; this module is limited to single features.

Discretization algorithms return a discretized variable (with fixed parameters) that can transform either the learning or the testing data. Parameter learning is separate from the transformation, as in machine learning only the training set should be used to induce parameters.

To obtain discretized features, call a discretization algorithm with the data and the feature to discretize. The feature can be given as an index, a name or an Orange.data.Variable. The following example creates a discretized feature:

import Orange
data = Orange.data.Table("iris.tab")
disc = Orange.feature.discretization.EqualFreq(n=4)
disc_var = disc(data, 0)

The values of the first attribute will be discretized when the data is transformed to an Orange.data.Domain that includes disc_var. In the example below we add the discretized first attribute to the original domain.

ndomain = Orange.data.Domain([disc_var] + list(data.domain.attributes), data.domain.class_vars)
ndata = Orange.data.Table(ndomain, data)
print(ndata)

The printout:

[[< , 5.1, 3.5, 1.4, 0.2 Iris-setosa],
 [< , 4.9, 3.0, 1.4, 0.2 Iris-setosa],
 [< , 4.7, 3.2, 1.3, 0.2 Iris-setosa],
 [< , 4.6, 3.1, 1.5, 0.2 Iris-setosa],
 [< , 5.0, 3.6, 1.4, 0.2 Iris-setosa],
 ...
]

Discretization Algorithms
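As a hedged illustration, here is an equal-width alternative to EqualFreq; the EqualWidth name is an assumption about this module's contents:

import Orange

data = Orange.data.Table("iris.tab")
# Equal-width binning: splits the attribute's range into n equal intervals.
disc = Orange.feature.discretization.EqualWidth(n=3)
disc_var = disc(data, 0)
print(disc_var.values)   # the three interval labels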

CHAPTER 3
Classification (classification)

3.1 Majority (majority)

The accuracy of models is often compared with the default accuracy, that is, the accuracy of a model which classifies all instances into the majority class. Fitting such a model consists of computing the class distribution and its mode. The model is represented as an instance of Orange.classification.majority.ConstantClassifier.

Example

>>> iris = Orange.data.Table('iris')
>>> maj = Orange.classification.majority.MajorityLearner()
>>> model = maj(iris[30:110])
>>> print(model)
ConstantClassifier [...]
>>> model(iris[0])
Value('iris', Iris-versicolor)

3.2 Naive Bayes (naive_bayes)

The module for Naive Bayes (NB) classification contains two implementations:

- Orange.classification.naive_bayes.BayesClassifier is based on the popular scikit-learn package.
- Orange.classification.naive_bayes.BayesStorageClassifier demonstrates the use of the contingency table provided by the underlying storage. It returns only the predicted value.

Example

>>> from Orange.data import Table
>>> from Orange.classification.naive_bayes import *
>>> iris = Table('iris')
>>> fitter = NaiveBayes()
>>> model = fitter(iris[:-1])
>>> model(iris[-1])
Value('iris', Iris-virginica)

3.3 Support Vector Machines (svm)

The module for Support Vector Machine (SVM) classification is based on the popular scikit-learn package, which uses the LibSVM and LIBLINEAR libraries. It provides several learning algorithms:

- Orange.classification.svm.SVMLearner, a general SVM learner;
- Orange.classification.svm.LinearSVMLearner, a fast learner useful for data sets with a large number of features;
- Orange.classification.svm.NuSVMLearner, a general SVM learner that uses a parameter to control the number of support vectors;
- Orange.classification.svm.SVRLearner, a general learner for support vector regression;
- Orange.classification.svm.NuSVRLearner, which is similar to NuSVMLearner, for regression;
- Orange.classification.svm.OneClassSVMLearner, an unsupervised learner useful for novelty and outlier detection.

Example

>>> from Orange.data import Table
>>> from Orange.classification.svm import SVMLearner
>>> from Orange.evaluation import CrossValidation, CA
>>> iris = Table('iris')
>>> fitter = SVMLearner()
>>> cross = CrossValidation(iris, fitter)
>>> prediction = cross.kfold(10)
>>> CA(iris, prediction[0])
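A hedged sketch of the outlier-detection learner listed above; the assumption, borrowed from the underlying scikit-learn estimator, is that the model labels inliers 1 and outliers -1 and that the learner accepts a nu parameter:

from Orange.data import Table
from Orange.classification.svm import OneClassSVMLearner

iris = Table('iris')
fitter = OneClassSVMLearner(nu=0.1)   # nu bounds the fraction of outliers
model = fitter(iris)
print([model(inst) for inst in iris[:5]])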

CHAPTER 4
Evaluation (evaluation)

4.1 Sampling procedures for testing models (testing)
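A minimal, hedged sketch of these procedures, inferred solely from the CrossValidation and CA usage in the SVM example above, so the exact names and signatures are assumptions:

from Orange.data import Table
from Orange.classification.majority import MajorityLearner
from Orange.evaluation import CrossValidation, CA

iris = Table('iris')
cross = CrossValidation(iris, MajorityLearner())
results = cross.kfold(10)        # 10-fold cross-validation
print(CA(iris, results[0]))      # classification accuracy of the predictions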


CHAPTER 5
Miscellaneous (misc)

5.1 Symbolic constants (enum)


CHAPTER 6
Development

6.1 Widget development

Library of Common GUI Controls

gui is a library of functions for constructing a control (like a check box, line edit or combo box), inserting it into the parent's layout, setting tooltips and callbacks, and establishing synchronization with a Python object's attribute (including saving and retrieving the value when the widget is closed and reopened), all in a single call.

Almost all functions need three arguments:

- the widget into which the control is inserted,
- the master widget, with whose attributes the control's value is synchronized,
- the name of that attribute (value).

All other arguments should be given as keyword arguments, for clarity and also to allow potential future compatibility-breaking changes in the module. Several arguments that are common to all functions must always be given as keyword arguments.

Common options

All controls accept the following arguments that can only be given as keyword arguments.

tooltip
    A string for a tooltip that appears when the mouse is over the control.

disabled
    Tells whether the control should be disabled upon initialization.

addSpace
    Gives the amount of space that is inserted after the control (or the box that encloses it). If True, a space of 8 pixels is inserted. Default is 0.

addToLayout
    The control is added to the parent's layout unless this flag is set to False.

stretch
    The stretch factor for this widget, used when adding to the layout. Default is 0.

sizePolicy
    The size policy for the box or the control.

Common Arguments

Many functions share common arguments.

widget
    The widget on which the control will be drawn.

master
    The object which includes the attribute that is used to store the control's value; most often the self of the widget into which the control is inserted.

value
    A string with the name of the master's attribute that synchronizes with the control's value.

box
    Indicates whether there should be a box around the control. If box is False (the default), no box is drawn; if it is a string, it is used as the label for the box's name; if box is any other true value (such as True), an unlabeled box is drawn.

callback
    A function that is called when the state of the control is changed. This can be a single function or a list of functions that will be called in the given order. The callback function should not change the value of the controlled attribute (the one given as the value argument described above) to avoid a cycle (a workaround is shown in the description of the listbox function).

label
    A string that is displayed as the control's label.

labelwidth
    The width of the label. This is useful for aligning the controls.

orientation
    When a label is given, this argument determines the relative placement of the label and the control. The label can be above the control ("vertical" or True, the default) or in the same line with the control ("horizontal" or False). Orientation can also be an instance of QLayout.

Common Attributes

box
    If the constructed widget is enclosed into a box, the attribute box refers to the box.

Widgets

This section describes the wrappers for controls like check boxes, buttons and similar. Using them is preferred over calling Qt directly, for convenience, readability, ease of porting to newer versions of Qt and, in particular, because they set up a lot of things that happen behind the scenes.

Other widgets

Utility functions

Internal functions and classes

This part of the documentation describes some classes and functions that are used internally. The classes will likely maintain compatibility in the future, while the functions may be changed.

Wrappers for Qt classes

Wrappers for Python classes

Other functions
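To tie this section together, here is a minimal, hedged sketch of the calling convention described above; the module path Orange.widgets.gui and the widgetBox/checkBox names are assumptions about the library's layout:

from Orange.widgets import gui, widget

class OWDemo(widget.OWWidget):
    name = "Demo"
    want_main_area = False

    # the master's attribute synchronized with the check box below
    normalize = True

    def __init__(self):
        super().__init__()
        box = gui.widgetBox(self.controlArea, "Options")
        # widget, master and value are positional; the rest are keywords
        gui.checkBox(box, self, "normalize",
                     label="Normalize data",
                     tooltip="Rescale values before processing",
                     callback=self.setting_changed)

    def setting_changed(self):
        print("normalize is now", self.normalize)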

CHAPTER 7
Indices and tables

- genindex
- modindex
- search


Bibliography

[Kononenko2007] Igor Kononenko, Matjaz Kukar: Machine Learning and Data Mining, Woodhead Publishing, 2007.

[Quinlan1986] J. R. Quinlan: Induction of Decision Trees, Machine Learning, 1986.

[Breiman1984] L. Breiman et al.: Classification and Regression Trees, Chapman and Hall, 1984.

[Kononenko1995] I. Kononenko: On biases in estimating multi-valued attributes, International Joint Conference on Artificial Intelligence, 1995.




More information

Data Mining: Concepts and Techniques Classification and Prediction Chapter 6.1-3

Data Mining: Concepts and Techniques Classification and Prediction Chapter 6.1-3 Data Mining: Concepts and Techniques Classification and Prediction Chapter 6.1-3 January 25, 2007 CSE-4412: Data Mining 1 Chapter 6 Classification and Prediction 1. What is classification? What is prediction?

More information

arulescba: Classification for Factor and Transactional Data Sets Using Association Rules

arulescba: Classification for Factor and Transactional Data Sets Using Association Rules arulescba: Classification for Factor and Transactional Data Sets Using Association Rules Ian Johnson Southern Methodist University Abstract This paper presents an R package, arulescba, which uses association

More information

CONCEPT FORMATION AND DECISION TREE INDUCTION USING THE GENETIC PROGRAMMING PARADIGM

CONCEPT FORMATION AND DECISION TREE INDUCTION USING THE GENETIC PROGRAMMING PARADIGM 1 CONCEPT FORMATION AND DECISION TREE INDUCTION USING THE GENETIC PROGRAMMING PARADIGM John R. Koza Computer Science Department Stanford University Stanford, California 94305 USA E-MAIL: Koza@Sunburn.Stanford.Edu

More information

The Warhol Language Reference Manual

The Warhol Language Reference Manual The Warhol Language Reference Manual Martina Atabong maa2247 Charvinia Neblett cdn2118 Samuel Nnodim son2105 Catherine Wes ciw2109 Sarina Xie sx2166 Introduction Warhol is a functional and imperative programming

More information

A Two-level Learning Method for Generalized Multi-instance Problems

A Two-level Learning Method for Generalized Multi-instance Problems A wo-level Learning Method for Generalized Multi-instance Problems Nils Weidmann 1,2, Eibe Frank 2, and Bernhard Pfahringer 2 1 Department of Computer Science University of Freiburg Freiburg, Germany weidmann@informatik.uni-freiburg.de

More information

Study on Classifiers using Genetic Algorithm and Class based Rules Generation

Study on Classifiers using Genetic Algorithm and Class based Rules Generation 2012 International Conference on Software and Computer Applications (ICSCA 2012) IPCSIT vol. 41 (2012) (2012) IACSIT Press, Singapore Study on Classifiers using Genetic Algorithm and Class based Rules

More information

Development of Prediction Model for Linked Data based on the Decision Tree for Track A, Task A1

Development of Prediction Model for Linked Data based on the Decision Tree for Track A, Task A1 Development of Prediction Model for Linked Data based on the Decision Tree for Track A, Task A1 Dongkyu Jeon and Wooju Kim Dept. of Information and Industrial Engineering, Yonsei University, Seoul, Korea

More information

DATA STRUCTURES CHAPTER 1

DATA STRUCTURES CHAPTER 1 DATA STRUCTURES CHAPTER 1 FOUNDATIONAL OF DATA STRUCTURES This unit introduces some basic concepts that the student needs to be familiar with before attempting to develop any software. It describes data

More information

CISC 4631 Data Mining

CISC 4631 Data Mining CISC 4631 Data Mining Lecture 03: Introduction to classification Linear classifier Theses slides are based on the slides by Tan, Steinbach and Kumar (textbook authors) Eamonn Koegh (UC Riverside) 1 Classification:

More information

Babu Madhav Institute of Information Technology, UTU 2015

Babu Madhav Institute of Information Technology, UTU 2015 Five years Integrated M.Sc.(IT)(Semester 5) Question Bank 060010502:Programming in Python Unit-1:Introduction To Python Q-1 Answer the following Questions in short. 1. Which operator is used for slicing?

More information

2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data.

2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data. Code No: M0502/R05 Set No. 1 1. (a) Explain data mining as a step in the process of knowledge discovery. (b) Differentiate operational database systems and data warehousing. [8+8] 2. (a) Briefly discuss

More information

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,

More information

Data Mining Practical Machine Learning Tools and Techniques

Data Mining Practical Machine Learning Tools and Techniques Output: Knowledge representation Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter of Data Mining by I. H. Witten and E. Frank Decision tables Decision trees Decision rules

More information

Variable and Data Type I

Variable and Data Type I Islamic University Of Gaza Faculty of Engineering Computer Engineering Department Lab 2 Variable and Data Type I Eng. Ibraheem Lubbad September 24, 2016 Variable is reserved a location in memory to store

More information

Data Mining Algorithms: Basic Methods

Data Mining Algorithms: Basic Methods Algorithms: The basic methods Inferring rudimentary rules Data Mining Algorithms: Basic Methods Chapter 4 of Data Mining Statistical modeling Constructing decision trees Constructing rules Association

More information