Orange3-Prototypes Documentation. Biolab, University of Ljubljana

Biolab, University of Ljubljana
Dec 17, 2018

Contents
1 Widgets
2 Indices and tables

CHAPTER 1
Widgets

1.1 Contingency Table

Construct a contingency table from given data.

Inputs
Data: input dataset

Outputs
Contingency Table: data table with frequency counts

Contingency Table counts the co-occurrences (frequencies) of the values of two discrete variables, one placed in rows and the other in columns.

1. Attribute whose values are placed in rows.
2. Attribute whose values are placed in columns.
3. Click Apply to commit the changes. To communicate changes automatically, tick Apply Automatically.
4. Access widget help and produce a report.
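The core computation, counting how often each pair of values occurs, can be sketched in plain Python. This is a minimal illustration assuming the two variables are given as parallel lists of labels; `contingency_table` is a hypothetical helper, not part of Orange's API, and the data are made up rather than taken from the titanic data set.

```python
from collections import Counter


def contingency_table(rows, cols):
    """Count how often each (row value, column value) pair occurs.

    rows, cols: equal-length sequences of discrete labels, one pair per instance.
    Returns (row_values, col_values, counts), where counts[i][j] is the number
    of instances with row value row_values[i] and column value col_values[j].
    """
    if len(rows) != len(cols):
        raise ValueError("rows and cols must have the same length")
    pair_counts = Counter(zip(rows, cols))
    row_values = sorted(set(rows))
    col_values = sorted(set(cols))
    # Counter returns 0 for pairs that never occur, so empty cells become 0.
    counts = [[pair_counts[(r, c)] for c in col_values] for r in row_values]
    return row_values, col_values, counts


# Hypothetical toy data in the spirit of the titanic example (not the real data set).
status = ["first", "second", "second", "third", "second", "crew"]
age = ["adult", "child", "adult", "child", "child", "adult"]

row_values, col_values, table = contingency_table(status, age)
for value, line in zip(row_values, table):
    print(value, line)
# crew [1, 0]
# first [1, 0]
# second [1, 2]
# third [0, 1]
```

Reading a single cell, such as the "second" row and "child" column, answers a question like the one in the example below.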

1.1.1 Example

A contingency table can be computed only for discrete variables, so we will use the titanic data set as an example. Load the data in the File widget and pass it to Contingency Table.

Say we want to know how many second-class passengers on the Titanic were children. Let us select status for rows and age for columns. We can observe the computed table in a Data Table widget; the answer to our question is in the cell where the second-class row meets the child column.

1.2 EnKlik Anketa

Import data from EnKlikAnketa (1ka.si) public URL.

Inputs
None

Outputs
Data: survey results

The EnKlik Anketa widget retrieves survey results obtained from the EnKlikAnketa service. You need to create a public link to retrieve the results. Go to the survey you wish to retrieve, select the Data (Podatki) tab and create a public link (javna povezava) in the top right corner. Then insert the link into the Public link URL field. The link should look something like this: podatki/123456/78a9b1cd/.

1. A public link to the survey results. To observe the results live, set the reload rate (5 s to 5 min).
2. Attribute list. You can change the attribute type and role, just like in the File widget.

3. Survey meta information.
4. Tick the box on the left to commit the changes automatically. Alternatively, click Commit.
5. Access widget help.

1.2.1 Example

The EnKlik Anketa widget is great for observing results from online surveys. We have created a sample survey and imported it into the widget. We have 41 responses to 8 questions, 7 of which were recognized as features and 1 as a meta attribute.

The widget uses the survey questions as feature names. This, however, might be slightly impractical for analytical purposes, as we can see in the Data Table. We will shorten the names with the Edit Domain widget, which lets us change attribute names and even rename the values of discrete attributes. Now our attribute names are much easier to work with, as we can see in Data Table (1).

1.3 Neighbors

Compute nearest neighbors in data according to reference.

Signals

Inputs:
Data: an input data set.
Reference: a reference data instance for neighbor computation.

Outputs:
Neighbors: a data table of nearest neighbors according to the reference.

Description

The Neighbors widget computes nearest neighbors for a given reference and a given distance measure.

1. Information on the input data.
2. Distance measure for computing neighbors. Supported measures are: Euclidean, Manhattan, Mahalanobis, Cosine, Jaccard, Spearman, absolute Spearman, Pearson, absolute Pearson. If Exclude references is ticked, the reference data won't be included in the output.
3. Number of neighbors on the output.
4. Click Apply to commit the changes. To communicate changes automatically, tick Apply Automatically.
5. Access widget help.

Examples

In the first example, we passed the iris data to Neighbors and to Data Table. In Data Table, we selected an iris instance that will serve as our reference, meaning we wish to retrieve the 10 examples closest to the selected data instance. We connected Data Table to Neighbors as well. We can observe the results of the neighbor computation in Data Table (1), where we can see the 10 instances closest to our selected iris flower.

Another example requires the installation of the Image Analytics add-on. We loaded 15 paintings by famous painters with the Import Images widget and passed them to Image Embedding, where we selected the Painters embedder.
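The neighbor search described above can be sketched as: compute the distance from every instance to the reference, optionally drop the reference, and keep the k smallest. This is a hedged plain-Python illustration, not Orange's implementation; it covers only three of the listed measures (Euclidean, Manhattan, Cosine), the helper names are hypothetical, and excluding the reference by value equality is an assumption of the sketch.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine(a, b):
    # Cosine distance = 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def nearest_neighbors(data, reference, k, distance=euclidean, exclude_reference=True):
    """Return the k instances in data closest to reference under distance.

    If exclude_reference is True, instances equal to the reference are dropped
    before ranking (equality by value is an assumption of this sketch).
    """
    candidates = [row for row in data if not (exclude_reference and row == reference)]
    return sorted(candidates, key=lambda row: distance(row, reference))[:k]


# Hypothetical iris-like measurements: sepal length, sepal width, petal length, petal width.
data = [
    (5.1, 3.5, 1.4, 0.2),
    (4.9, 3.0, 1.4, 0.2),
    (6.3, 3.3, 6.0, 2.5),
    (5.0, 3.6, 1.4, 0.2),
    (6.4, 3.2, 4.5, 1.5),
]
reference = data[0]

closest = nearest_neighbors(data, reference, k=2)
with_reference = nearest_neighbors(data, reference, k=2, distance=cosine, exclude_reference=False)
print(closest)
print(with_reference)
```

With the reference kept and k set to 2, the output holds the reference itself plus its single closest neighbor, which mirrors the painting example that follows.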

From here on, the procedure is the same as in the iris example. We passed the embedded images to Image Viewer and selected a painting by Monet to serve as our reference image. We passed the image to Neighbors, where we set the distance measure to Cosine, unticked Exclude references and set the number of neighbors to 2. Since the reference itself is then part of the output, this lets us find the actual closest neighbor of the reference painting and observe the two side by side in Image Viewer (1).

1.4 Parallel Coordinates

Parallel coordinates display of multi-dimensional data.

Inputs
Data: input dataset
Features: list of attributes

Outputs
Selected Data: instances selected from the plot
Annotated Data: data with an additional column showing whether a point is selected
Features: list of attributes

The Parallel Coordinates widget shows high-dimensional data in a plot. The widget displays the first 9 attributes and colors the lines by class, if a class is present. The widget also enables plot optimization and subset selection.

1. Color lines (instances) by an attribute. Lines are colored by class by default.
2. Select the dimensions you wish to display. Click Optimize Selected Dimensions to optimize the plot.
3. Click Apply to commit the changes. To communicate changes automatically, tick Apply Automatically.
4. Access help, save image and produce a report.

To select a subset from the plot, hover over a dimension until the cursor changes to a +, then drag the selection across the dimension. Selected data instances will be on the output of the widget. You can select subsets on several dimensions; only the data instances that match all the criteria will be on the output. To remove the selection, click on the dimension outside of the selected range.
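The selection rule above, where an instance reaches the output only if it falls inside the selected range on every brushed dimension, amounts to a logical AND of per-dimension filters. A minimal plain-Python sketch, assuming rows are dicts and each criterion is either an inclusive numeric range or a set of allowed categorical values; the helper names, attribute names and patient values are hypothetical.

```python
def matches(value, criterion):
    """A (low, high) tuple is an inclusive numeric range; otherwise a set of allowed values."""
    if isinstance(criterion, tuple):
        low, high = criterion
        return low <= value <= high
    return value in criterion


def select(rows, criteria):
    """Keep only rows that satisfy every criterion (logical AND across dimensions)."""
    return [
        row for row in rows
        if all(matches(row[name], criterion) for name, criterion in criteria.items())
    ]


# Hypothetical patients; attribute names and values are illustrative only.
patients = [
    {"cholesterol": 250, "rest_ecg": "left vent hypertrophy"},
    {"cholesterol": 320, "rest_ecg": "left vent hypertrophy"},
    {"cholesterol": 210, "rest_ecg": "normal"},
    {"cholesterol": 280, "rest_ecg": "left vent hypertrophy"},
]
criteria = {
    "cholesterol": (200, 300),
    "rest_ecg": {"left vent hypertrophy"},
}
print(select(patients, criteria))
```

Removing a dimension's entry from `criteria` plays the role of clicking outside the selected range: that dimension no longer constrains the output.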

1.4.1 Example

Parallel Coordinates can display multi-dimensional data, hence we will use the heart-disease data set. We load it with the File widget and send it to Parallel Coordinates. We optimized the projection and selected patients who have left vent hypertrophy and a cholesterol level between 200 and 300. Finally, we sent the selected patients to Data Table for observation.

1.5 Feature Statistics

Show basic statistics for data features.

Inputs
Data: input dataset

Outputs
None

The Feature Statistics widget displays basic information on each feature: its type, distribution, center, dispersion, minimum and maximum value, and the proportion of missing values.

1. Information on the input data.
2. Color histograms in the Distribution column by a feature.
3. Click Send to commit the changes. To communicate changes automatically, tick Send Automatically.
4. Access widget help and create a report.
5. Feature statistics table:
   feature type (numeric, categorical, text or time)

   feature name
   distribution in a histogram (continuous variables are binned into 10 bins); the class variable is used as color by default
   center of the feature (mean for numeric, mode for categorical; text and time features report nan)
   dispersion of the feature (standard deviation for numeric, entropy for categorical)
   minimum value
   maximum value
   percentage of missing values

1.5.1 Example

We will use the iris data in the File widget and pass it to Feature Statistics. We see that the iris data has 4 numeric features and 1 categorical class variable. Distributions in the widget are colored by class, where iris-setosa is blue, iris-versicolor red and iris-virginica green. We can observe other basic statistics and see whether there are any missing values in our data. Try changing the data set to housing, banking-crises and zoo to observe different feature types.
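The per-feature statistics listed above can be reproduced for a single column in plain Python. This is a hedged sketch, not the widget's code: missing values are assumed to be encoded as None, the population standard deviation and base-2 entropy are our assumptions (the text does not say which variants the widget uses), and the helper names and values are hypothetical.

```python
import math
from collections import Counter


def numeric_stats(values):
    """Summary of a numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]  # assumes at least one value is present
    mean = sum(present) / len(present)
    # Population standard deviation (an assumption; the widget's exact variant is not stated).
    std = math.sqrt(sum((v - mean) ** 2 for v in present) / len(present))
    return {
        "center": mean,
        "dispersion": std,
        "min": min(present),
        "max": max(present),
        "missing %": 100.0 * (len(values) - len(present)) / len(values),
    }


def categorical_stats(values):
    """Summary of a categorical column: mode as center, entropy (in bits) as dispersion."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    n = len(present)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "center": counts.most_common(1)[0][0],
        "dispersion": entropy,
        "missing %": 100.0 * (len(values) - n) / len(values),
    }


def histogram(values, bins=10):
    """Count values in `bins` equal-width bins between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid division by zero for a constant column
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1  # the maximum lands in the last bin
    return counts


sepal_length = [5.1, 4.9, None, 6.3, 5.0]  # hypothetical values with one missing
iris_class = ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-virginica"]
num = numeric_stats(sepal_length)
cat = categorical_stats(iris_class)
hist = histogram([v for v in sepal_length if v is not None])
print(num)
print(cat)
print(hist)
```

Text and time features, which the widget reports as nan, are not handled by this sketch.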

CHAPTER 2
Indices and tables

genindex
modindex
search
