KNIME User Training KNIME AG. Copyright 2017 KNIME AG

Size: px
Start display at page:

Download "KNIME User Training KNIME AG. Copyright 2017 KNIME AG"

Transcription

1 KNIME User Training KNIME AG

2 Overview KNIME Analytics Platform 1 2

3 What is KNIME Analytics Platform? A tool for data analysis, manipulation, visualization, and reporting Based on the graphical programming paradigm Provides a diverse array of extensions: Text Mining Network Mining Cheminformatics Weka machine learning Many integrations, such as Java, R, Python, etc. 2 3

4 Additional Resources KNIME pages ( SOLUTIONS for example workflows RESOURCES/LEARNING HUB RESOURCES/NODE GUIDE KNIME Tech pages (tech.knime.org) FORUM for questions and answers DOCUMENTATION for docs, FAQ, changelogs,... COMMUNITY CONTRIBUTIONS for dev instructions and third party nodes KNIME TV on YouTube 3 4

5 The KNIME Analytics Platform 4 5

6 Visual KNIME Workflows NODES perform tasks on data Inputs Outputs Status Not Configured Idle Executed Error Nodes are combined to create WORKFLOWS 5 6

7 Data Access Databases MySQL, PostgreSQL any JDBC (Oracle, DB2, MS SQL Server) Files Csv, txt Excel, Word, PDF SAS, SPSS XML PMML Images, texts, networks, chem Web, Cloud REST, Web services Twitter, Google 6 7

8 Big Data Spark HDFS support Hive Impala HP Vertica In-database processing 7 8

9 Transformation Preprocessing Row, column, matrix based Data blending Join, concatenate, append Aggregation Grouping, pivoting, binning Feature Creation and Selection 8 9

10 Analyze & Data Mining Regression Linear, logistic Classification Decision tree, ensembles, SVM, MLP, Naïve Bayes Clustering k-means, DBSCAN, hierarchical Validation Cross-validation, scoring, ROC Misc PCA, MDS, item set mining External R, Weka 9 10

11 Visualization Interactive Scatter plot, histogram, pie charts, box plot Highlighting (brushing) JFreeChart JavaScript Misc Tag cloud, open street map, networks, molecules External R 10 11

12 Deployment Database Files Excel, csv, txt XML PMML to: local, KNIME Server, SSH-, FTP-Server BIRT Reporting 11 12

13 Over 1500 native and embedded nodes included: Data Access MySQL, Oracle,... SAS, SPSS,... Excel, Flat,... Hive, Impala,... XML, JSON, PMML Text, Doc, Image,... Web Crawlers Industry Specific Community / 3rd Transformation Row, Column Matrix Text, Image Time Series Java Python Community / 3rd Analysis & Mining Statistics Data Mining Machine Learning Web Analytics Text Mining Network Analysis Social Media Analysis R, Weka, Python Community / 3rd Visualization R JFreeChart JavaScript Community / 3rd Deployment via BIRT PMML XML, JSON Databases Excel, Flat, etc. Text, Doc, Image Industry Specific Community / 3rd 12 13

14 Overview Installing KNIME Analytics Platform The KNIME Workspace The KNIME File Extensions The KNIME Workbench Workflow editor Explorer Node repository Node description Installing new features 13 14

15 Install KNIME Analytics Platform Select the KNIME version for your computer: Mac, Win, or Linux and 32 / 64bit Note different downloads (minimal or full) Download archive and extract the file, or download installer package and run it 14 15

16 Start KNIME Analytics Platform Go to the installation directory and launch KNIME, or use the shortcut created on your Desktop

17 The KNIME Workspace The workspace is the folder/directory in which workflows (and potentially data files) are stored for the current KNIME session. Workspaces are portable (just like KNIME) 16 17

18 Welcome Page 17 18

19 The KNIME Workbench Servers and Workflows Workflow Editor Node Recommendations Node Description Node Repository Console Outline Licensed under a Creative Commons Attribution

20 Creating New Workflows, Importing and Exporting Right-click Workspace in KNIME Explorer to create new workflow or workflow group or to import workflow Right-click on workflow or workflow group to export 20

21 KNIME File Extensions Dedicated file extensions for Workflows and Workflow groups associated with KNIME Analytics Platform *.knwf for KNIME Workflow Files *.knar for KNIME Archive Files 20 21

22 More on Nodes A node can have 3 states: Idle: The node is not yet configured and cannot be executed with its current settings. Configured: The node has been set up correctly, and may be executed at any time Executed: The node has been successfully executed. Results may be viewed and used in downstream nodes

23 Inserting and Connecting Nodes Insert nodes into workspace by dragging them from Node Repository or by double-clicking in Node Repository Connect nodes by left-clicking output port of Node A and dragging the cursor to (matching) input port of Node B Common port types: Model Image Flow Variable Data Database Conection Database Query 22 23

24 Node Configuration Most nodes require configuration To access a node configuration window: Double-click the node Right-click > Configure 23 24

25 Node Execution Right-click node Select Execute in context menu If execution is successful, status shows green light If execution encounters errors, status shows red light 24 25

26 Node Views Right-click node Select Views in context menu Select output port to inspect execution results Plot View Data View 25 26

27 Workflow Coach Recommendation engine It gives hints about which node use next in the workflow Based on KNIME communities' usage statistics Usage statistics available also with Personal Productivity Extension and KNIME Server products (these products require a purchased license) 26 27

28 Getting Started: KNIME Example Server Public repository with large selection of example workflows for many, many applications Connect via KNIME Explorer 27 28

29 Online Node Guide Workflows from Example Server also available online

30 Hot Keys (for future reference) Task Hot key Description Node Configuration F6 opens the configuration window of the selected node Node Execution Move Nodes and Annotations Workflow Operations F7 Shift + F7 Shift + F10 F9 Shift + F9 Ctrl + Shift + Arrow Ctrl + Shift + PgUp/PgDown F8 Ctrl + S Ctrl + Shift + S Ctrl + Shift + W executes selected configured nodes executes all configured nodes executes all configured nodes and opens all views cancels selected running nodes cancels all running nodes moves the selected node in the arrow direction moves the selected annotation in the front or in the back of all overlapping annotations resets selected nodes saves the workflow saves all open workflows closes all open workflows Meta-node Shift + F12 opens meta-node wizard 29 30

31 Importing Data Accessing files and databases 1 31

32 Data Source Nodes Typically characterized by: Orange color No input ports, 1-2 output ports Status Output port Node description 2 32

33 New Node: File Reader Workhorse of the KNIME Source nodes Reads text based files Many advanced features allow it to read most weird files Short lines, inline comments, headers and special encoding YouTube KNIME TV Channel video:

34 File Reader Configuration File path Basic Settings Advanced Settings Preview Help Button 4 34

35 Alternative Faster Way Drag & Drop OR Copy & Paste 5 35

36 Filenames and the knime:// protocol Absolute URL Mountpoint-relative URL Local path 36

37 Workflow Relative File Paths Best choice if workflows are to be shared Requires matching folder structure within workflow group Independent of environment outside of workflow group Example: Path to Sentiment Analysis.table Local path: C:\Users\rb\knime-workspace\KNIMEUserTraining\data\Sentiment Analysis.table Workflow relative: YouTube KNIME TV Channel:

38 New Node: Excel Reader (XLS) Reads.xls and.xlsx file from Microsoft Excel Supports reading from multiple sheets 8 38

39 Excel Reader Configuration File path Sheet specific settings Preview 9 39

40 New Node: Table Reader Reads tables from the native KNIME Format. Maximum performance, minimum configuration File path YouTube KNIME TV channel video:

41 Database Connectivity Read data from any JDBC enabled database Write your own SQL or model it using dedicated nodes 11 41

42 New Nodes: Database Connectors Native: Postgres, MySQL, SQLite, MSSQL (SQL Server) Database Connector (e.g. Oracle, DB2, HANA). Commercial extensions: HIVE and Impala 12 42

43 New Node: SQLite Propagate connection information to other DB nodes File-based database Useful for prototyping (switch to real connector on deployment) 13 43

44 New Node: Database Table Selector Take connection information and construct a query Explore DB metadata Output is an SQL query 14 44

45 New Node: Database Connection Table Reader Executes SQL Query Reads results into a KNIME table 15 45

46 Other Useful Data Sources PMML Reader reads standard predictive models XML Reader with XPATH support Python/R Source nodes SAS7BDAT (Labs) REST/SOAP for web services, and many more 16 46

47 Importing Data Exercise Starting with exercise: Importing Data Read the following files Sentiment Analysis.table Sentiment Rating.csv Product Data2.xls Optional: Read table web_activity from the database WebActivity.sqlite (hint: drag and drop the file to your workflow to get started) 17 47

48 Data Manipulation Clean, join, aggregate 1 48

49 Data Manipulation Nodes Yellow color with a variety of input and output ports Apply a transformation to input data Many, many nodes! 2 49

50 New Node: Concatenate Combine rows from 2 tables with shared columns Handles duplicate row keys gracefully Take the union or intersection of columns 3 50

51 New Node: Cell Replacer Replaces the content of a column based on a lookup Top port references the table to be searched Bottom port holds the lookup table (search keys and replacement values) 4 51

52 New Node: String Manipulation Create and edit values in String columns Clean up capitalization (eg. Lowercase) Modify existing strings or create new columns 5 52

53 Data Manipulation Exercise, Activity I Starting with exercise: Data Manipulation, Activity I Concatenate web activity data from old and new systems Replace sentiment evaluation (strings) with corresponding numeric values Use String Manipulation to ensure that all entries of the Products column are lower case from the product data spreadsheet. 6 53

54 Joining Columns of Data Left Table Join by ID Right Table Inner Join Left Outer Join Missing values in the right table. Missing values in the left table. Right Outer Join 7 54

55 Joining Columns of Data Left Table Join by ID Right Table Full Outer Join Missing values in the left table. Missing values in the right table. 8 55

56 New Node: Joiner Combines columns from 2 different tables Top port contains Left data table Bottom port contains the Right data table 9 56

57 Joiner Configuration Linking Rows Values to join on. Multiple joining columns are allowed

58 Joiner Configuration Column Selection Columns from left table to output table Columns from right table to output table 11 58

59 Data Aggregation aggregated on group by method: sum( value ) 12 59

60 New Node: GroupBy Aggregate to remove duplicates or summarize data First tab provides grouping options Second tab provides control over aggregation details Aggregation columns Aggregation methods YouTube KNIME TV video:

61 In-database Data Manipulation Model SQL query using nodes DB versions of GroupBy, Joiner, Row Filter, Sorter, etc

62 Comments & Annotations Double-click to write Right-click to change properties Double-click to write Right-click to change properties YouTube KNIME TV Channel:

63 Workflow Organisation Good Practices Workflow annotations Node labels Metanodes Right click -> Collapse... Organize workflow by task Hide complexity & improve readability 16 63

64 Data Manipulation Exercise, Activity II Starting with exercise Data Manipulation, Activity II Join all data together using a series of joiner nodes and the Customer Key field Resolve duplicates in the joined dataset (hint: GroupBy node) Clean up and document your workflow using annotations, node labels and metanodes 17 64

65 Data Visualization Charts and tables 1 65

66 Data Visualization 2 66

67 Data Visualization Interactive plots and tables (with Highlighting) JavaScript integration for interactive views R View nodes for building advanced graphics in KNIME 3 67

68 New Node: Color Manager One of several visual property managers (e.g. size, shape) Color by nominal or continuous values Sync colors between views using the color model port 4 68

69 Hiliting Hilited data is visible across all views Keep multiple views open to explore complex data 5 69

70 New Node: Scatter Plot Plot different columns on X and Y Displays data including pre calculated visual properties (size, shape, color) Supports highlighting Produces a view, no image 6 70

71 New Node: JavaScript Scatter Plot Plot different columns on X and Y Displays data including pre calculated visual properties (size, shape, color) Does not support highlighting Produces an interactive view and an image 7 71

72 New Node: JavaScript Scatter Plot 3 configuration tabs 8 72

73 New Node: Scatter Plot (JFreeChart) Plot different columns on X and Y Displays data including pre-calculated visual properties (size, shape, color) Does not support highlighting Produces a static view and an image 9 73

74 New Node: Interactive Table No graphics Supports highlighting 10 74

75 Other Nodes: R View R View nodes for maximum customizibility 11 75

76 Visualization Exercise Start with exercise: Visualization Color data by Product Produce a scatter plot of Age vs. Estimated Yearly Income In the Scatter Plot node highlight some data points and view only highlighted points using an Interactive Table node Optional: Visualize data with the R (View node) (start with a script like: plot(knime.in) 12 76

77 Data Mining Partition, learn, predict, score 1 77

78 Data Mining Strategies Example applications: Anomaly Detection (fraud, predictive maintenance) Association Rule Learning (market basket analysis) Clustering (market segmentation) Classification (next best offer, churn preventions) Regression (trend estimation) 2 78

79 Data Mining: Process Overview Original Data Set Training Set Train Model Apply Model Score Model Test Set Partition data Train and apply models Evaluate performance 3 79

80 Data Mining in KNIME KNIME has many modeling tools! Decision tree, random forest, SVM, regression, neural networks, clustering, and integrations with other libraries: WEKA, libsvm, R, Python (scikit-learn) etc. And many model evaluation nodes ROC, standard, numeric and entropy scorers Feature elimination Cross validation 4 80

81 New Node: Partitioning Use to split data into training and evaluation sets Partition by count (e.g. 10 rows) or fraction (e.g. 10%) Sample by a variety of methods; random, linear, stratified 5 81

82 Learner-predictor Motif Most data mining approaches in KNIME use a Learner-predictor motif. The Learner node trains the model with its input data. Trained Model The Predictor node applies the model to a different subset of data. New data! 6 82

83 Classification Predict nominal outcomes on existing data (supervised) Applications Churn analysis (yes/no) Chemical activity (active/inactive) Spam detection (spam/not spam) Optical character recognition (A-Z) Methods Decision Trees Neural Networks Naïve Bayes Logistic Regression 7 83

84 KNIME s Decision Tree J.R. Quinlan, C4.5 Programs for machine learning J. Shafer, R. Agrawal, M. Mehta, SPRINT: A Scalable Parallel Classifier for Data Mining C4.5 builds a tree from a set of training data using the concept of information entropy. At each node of the tree, the attribute of the data with the highest normalized information gain (difference in entropy) is chosen to split the data. The C4.5 algorithm then recurses on the smaller sub lists. 8 84

85 New Node: Decision Tree Learner 9 85

86 Decision Tree View Most unmarried people earn < 50K per year 10 86

87 New Node: Decision Tree Predictor Takes a decision tree model & applies it to new data Check the box to append class probabilities 11 87

88 New Node: Scorer Compare predicted results to known truth in order to evaluate model quality Confusion matrix shows the distribution of model errors An accuracy statistics table provides a detailed analysis of model quality

89 New Node: Scorer True Positives Income = <=50K Predicted = <=50K False Positives Income = <=50K Predicted = >50K False Negatives True Negatives 13 89

90 Scorer: Accuracy Measures 14 90

91 Receiver Operating Characteristics Sort by confidence in target class Plot true positive rate vs false positive rate Ideal models achieve 100% TPR with 0% FPR Area under the curve indicates model quality (1=ideal model, 0.5 = random outcome) 15 91

92 New Node: ROC Curve Requires individual class probabilities from a preceding predictor User must define: 1. Original class column 2. Positive class value 3. Probability for that class from 1 or more models See also the JavaScript ROC Curve node 16 92

93 Data Mining Exercise, Activity I Starting with exercise: Data Mining, Activity I: Partition the fully joined data 50%, Stratified Sampling Train a decision tree on the training data (Learn against Target column) Use the model to predict the upsell potential for remaining records. Evaluate the quality of a model with a Scorer. Optional: Find AUC for the model using ROC curve

94 Regression Predict numeric outcomes on existing data (supervised) Applications Forecasting Quantitative Analysis Methods Linear Polynomial Regression Trees Partial Least Squares 18 94

95 New Nodes: Linear Regression Learner & Regression Predictor A linear model relating a dependent variable to 1 or more independent variables Model coefficients provided in 2nd output port Also available: Polynomial and Tree Ensemble Regression nodes 19 95

96 New Node: Numeric Scorer Similar to scorer node, but for nodes with numeric predictions (e.g. linear/polynomial regression) Compare dependent variable values to predicted values to evaluate goodness of fit. Report R 2, RMSD, SEM etc

97 Data Mining Exercise, Activity II Starting with exercise: Data Mining, Activity II: Partition the fully joined data 50%, Stratified Sampling Train a linear regression model that predicts age as a function of some other parameters in the data set Use the model to predict the age of the remaining users Evaluate the quality of a model with a Numeric Scorer. Is this model useful for predicting customer age? 21 97

98 Clustering Discover hidden structure in unlabeled data (unsupervised) Applications Market Segmentation Diversity picking Methods K-means/medoids Hierarchical DBScan Neighbourgrams 22 98

99 New Nodes: k-means Clustering Looks at n observations to define the means for k clusters. Each observation is then assigned to its closest cluster center. You must provide k

100 New Node: Entropy Scorer Similar to scorer node, but used with unsupervised learning (no target to predict) Cluster labels and reference clusters do not need to be in the same domain (e.g. Match Cluster 1 to iris setosa ) Reports entropy based statistics which indicate model quality (low entropy, high quality is the aim)

101 Data Mining Exercise, Activity III Start with exercise: Data Mining, Activity III Read the location_data.table file Filter to entries from California (region_code = CA) Train a k-means model with k=3. Use only position data for clustering (latitude and longitude) Evaluate with Entropy Scorer. Compare cluster labels to the City column. Hint: use the k-means output for both ports in the input to the scorer. Optional: Plot latitude and longitude in a view (OSM Map or Scatter Plot) and use that to help you visually optimize k

102 Integrating External Tools 1 102

103 Goal of This Session This session gives a quick overview of the external tools that can be called within KNIME, e.g.: Java, R, any external tool of choice, web service, and so on

104 KNIME Labs KNIME Labs enable you to preview new KNIME features and plug-ins that are still under development. The nodes provided in KNIME Labs (KNIME Tech) are not (yet) part of the official KNIME version because the functionality and/or API may not be finalized. You can get these plug-ins by installing the KNIME Labs extension package. Most of the plug-ins come already with a detailed description and some workflow examples

105 Java Snippet Fastest running scripting node in KNIME Syntax highlighting, auto completion, error checking Templates allow you to save scripts for later re-use Import custom libraries 4 105

106 Java Edit Variable Same as Java snippet, but without table ports 5 106

107 R Integration Run R inside KNIME. Works with existing R installations. Nodes for many tasks, but all with similiar interfaces First run: install.packages( Rserve ) and install.packages( Cairo )* *mac only 6 107

108 R Integration Syntax Highlighting Create and store templates Select R workspace Evaluate script R error messages Show Results 7 108

109 Python Integration Run Python inside KNIME Works with existing installations UI modeled after R integration 8 109

110 Python Scripting UI 9 110

111 RESTful Web Services In KNIME Labs: JSON Response: XML Response:

112 RESTful Web Services

113 KNIME Server as a REST resource

114 KNIME Server as a REST resource Use curl, SOAPUI or Chrome extension Postman to explore the REST API

115 Generic Web Service Client (SOAP) X resets all fields Input Values Analyze parses the wsdl file and automatically fills the details Output Values Select to include

116 External Tool Node

117 JSON and JSONPath Use the JSON Reader (or the GET Resource) nodes to get an JSON cell Use JSONPath nodes to query the JSON and extract certain parameters Editor window simplifies construction of JSON queries by auto-generating them (click on properties)

118 JSONPath

119 XML and XPath Use the XML Reader (or the GET Resource) nodes to get an XML cell Use XPath nodes to query the XML and extract certain parameters Editor window simplifies construction of XPath queries by auto-generating them (click on XML elements)

120 XPath

121 Hive, HDFS, Spark Architecture

122 Big Data Connector Extensions

123 Query Big Data Platform with Hive

124 Loading Data to Hive

125 Spark Functionality

126 Machine Learning in Spark

127 Exercises Use the REST nodes to call an external web service Choose either books.json or books.xml and use the appropriate tools to extract the book name, author and title Programmers: choose your favorite from Java, Python and R Take the KNIME table and multiply all values in one column by 10. Python/R: Build a simple decision tree learner Alternatively use the Generic JavaScript node and D3.js to create a custom JavaScript visualization that can be viewed in KNIME Analytics Platform, and KNIME WebPortal

128 Exporting Data & Deployment 1 128

129 Exporting Data After an analysis is completed, what next? Write results to a file Create/update a database Save the model for use elsewhere Generate a rich report Deploy via KNIME WebPortal Deploy via workflow as RESTful web service 2 129

130 Input/Output in Deployment Input File (CSV, Table, XLS, ) Database JSON for REST API Output Report (BIRT, Tableau, Spotfire) File (CSV, table, XLS, ) Dashboard on WebPortal 3 130

131 To Report / To BIRT Report Also available Nodes to Tableau and Spotfire 4 131

132 To File / Database 5 132

133 REST API on Server 6 133

134 To Dashboard on WebPortal Step 1 Upload File Step 2 Dashboard 7 134

135 Workflow on KNIME WebPortal Step 1 Upload File Step 2 Dashboard Available in KNIME Server 8 135

136 Wrapped Node to produce Dashboard on Web Page Interactive Table / Plots on web page Bar Chart HTML Text 9 136

137 Data Export Nodes Typically characterized by: Magenta color 1 input port, no output ports Create file on file system or write to database

138 New Node: Table Writer

139 New Node: XLS Writer

140 New Node: Database Writer Only if no Connector node

141 New Node: Database Update

142 Reporting in KNIME Reporting in KNIME is done via a 3 rd party application named BIRT (Business Intelligence Reporting Tool) Data is sent to BIRT from KNIME using special nodes. Reports in BIRT are constructed from report items, which may include images, tables, charts and labels. Reports may be generated in a variety of formats (html, pdf, pptx, xlsx, docx)

143 New Node: Data to Report Send a data table to BIRT

144 New Node: Image to Report Send an image to BIRT PNG and SVG are supported formats (see node description for details)

145 Edit the Report Open the workflow > Click the Report Editor button in the tool bar Licensed under a Creative Commons Attribution

146 Reporting Perspective Data from KNIME Views Report Layout Report Items

147 Charting in BIRT Many chart types Fine control of plot appearance Familiar Excel Like interface Supports interactivity

148 Exporting Data Exercise Starting with exercise: Exporting Data Write your predictions to disk as a KNIME table Create a heatmap of the normalized confusion matrix and send it to BIRT Send your model accuracy table to BIRT Define a very simple report showing the accuracy of your model Generate a PDF of your report

149 Flow Variables 1 149

150 Goal of this Session What is a Flow Variable? Create a Flow Variable Use a Flow Variable as a parameter in the node settings Use a quickform node to parameterise a Wrapped Metanode 2 150

151 Flow variables: Usage example Each month I am asked to produce a sales report for the most popular product Filter only rows With Products = Gold Investment 3 151

152 Flow variables: Usage example Each month I need to launch the Analytics Platform, run the workflow up to the GroupBy node, then get the most popular product and update the Row Filter. Or do I? Perhaps flow variables can help 4 152

153 Automatically filter by most popular product Count products, and put most important at the top of the list Create a variable containing the most popular product Pass the flow variable to the row filter 5 153

154 Flow Variable Ports 6 154

155 Apply a Flow Variable (button) The Flow Variable button 7 155

156 Apply a Flow Variable (advanced) The Flow Variable tab List of available Flow Variables 8 156

157 Create a Flow Variable (button) Name of the new Flow Variable 9 157

158 Create a Flow Variable (advanced) Converting a setting value into a Flow Variable Name of the new variable

159 Summary: Flow Variables Flow Variables are workflow parameters used to overwrite existing node settings. A Flow Variable is carried along workflow branches (parallel branches don t share local flow variables). Flow Variables can be of type String, Integer, or Double. Flow Variables can be created in the Flow Variables tab of any node, using the Table Row to Variable Node or using QuickForms

160 New Node: Sorter Sorts a table! Choice of ascending or descending Sort by multiple columns

161 Flow Variables Exercise: Activity I Starting with exercise: Flow Variables Create a workflow that filters the products to contain only rows for the Gold Investment product. Find the most common product type and automatically filter the table according to that value

162 Quickform Nodes for Variable Creation and Output

163 Quickform Node Configuration Use Quickforms to create Flow Variables Default value is selected (here P+B Investment)

164 Wrapped Metanodes Similar to Metanodes Differ in key areas: Limited variable scope (c.f. global scope for Metanodes) Use with Quick Form nodes (Analytics Platform 3.1+) Key to advanced functionality in KNIME products: Use for new WebPortal pages

165 Metanodes vs. Wrapped Metanodes Metanodes Quick Forms Legacy Standard Variable scope Global Local Wrapped Metanodes* WebPortal Execution Old New (work with loops/switches) JavaScript views in WebPortal Not supported Supported WebPortal Usage Quickforms used globally Views/Quickforms must be embedded in a Wrapped Metanode Recommended uses Legacy workflows New developments Compatibility KNIME Server 3.x/4.x KNIME Server 4.2+ * Valid for KNIME Analytics Platform 3.1 and above

166 Create a Wrapped Metanode Select nodes to wrap Right-click a node Choose Encapsulate into Wrapped Metanode

167 Simple configuration of Wrapped Metanode Right-click a Wrapped Metanode to configure Use in WebPortal

168 Configure Wrapped Metanode Ports Add ports to metanodes or remove them to adapt to changes after creation of metanode

169 Passing Variables from Wrapped Metanodes Flow variables by default only available locally within wrapped metanode If needed, they can be passed to context outside metanode

170 Workflows with Reports in the KNIME WebPortal Step 1 Select workflow to run, configure notification, and report formats Step 2 Run settings and set of workflow parameters Last Step See results and report Optionally: Embed Interactive JavaScript visualizations Available in KNIME Server

171 Flow Variables Exercise: Activity II Starting with exercise: Flow Variables Create a Wrapped Metanode that allows users to filter records by choosing a value from the Products column. 171

172 Date/Time Data 1 172

173 Date & Time Overview Dedicated data type for date and time data Supported in Date&Time nodes (and others: GroupBy, Pivot, Line Plot) Complete re-write in KNIME

174 New Node: String to Date&Time Convert date/time data from string into a native Date&time cell Automaticall guesses correct format for many different types of date formatting Enter format manually if autoguessing didn t work KNIME automatically adds custom formats to auto-guess list Convert multiple columns of same date format in one node Select columns to transform Enter date format manually Select type of output column Click to auto-guess format 3 174

175 Date&Time Data Types Date Time Date & Time Date & Time + Time zone 175

176 New Node: Date&Time-based Row Filter Filter rows from a specified time period Range can be limited on upper bound, lower bound or both Options for end point: Date&Time: Fixed data and time Duration: Duration string (e.g. 2y 3M) Numerical: Select granularity from dropdown and enter number 5 176

177 New Node: String to Duration Takes a string and converts it to a duration cell Three different options to format input strings Example: Convert 1 year, 2 months, 3 weeks, and 4 days to duration cell ISO-8601: P1Y2M3W4D Short letter: 1y 2M 3w 4d Long word: 1 year 2 months 3 weeks 4 days 6 177

178 Duration-based Filtering Date&Time-based Row Filter allows to extract time periods From the start date, select all rows within the defined period Hours, days, weeks, months, etc. 1 Month 7 178

179 New Node: Date&Time Difference Choose desired resolution (days, hours, minutes, etc.) Check the difference between a time column and Another time column Execution time User defined time Time from previous row To calculate difference to second column, both columns need to have the same type! 8 179

180 New Node: Date&Time Shift Shifts date or time by either a Duration or a numerical value Use Duration: Use duration column Or shift by user-defined value E.g. 1y, 2M, 5h, etc. Use Numerical in combination with user-defined granularity Use numerical column Or shift by user-defined value Select granularity 180

181 New Nodes: Modify Time / Modify Date Modify Date&Time columns Three options: Append time (date) to to date (time) column Change time (date) to fixed value Remove time (date) from Date&Time column Column selection shows only columns suitable for currently selected option 181

182 Modify Time / Modify Date: Options Modify Time: Available options Modify Date: Available options Input: Date Input: Date Append time N/A Input: Time Input: Time N/A Append date Input: Date&Time Input: Date&Time Change time; Remove time Change date; Remove date Input: Date&Time (Time zone) Input: Date&Time (Time zone) Change time; Remove time Change date; Remove date 182

183 New Node: Modify Time Zone Similar to Modify Time/Modify Date Input: Date&Time Set time zone Input: Date&Time (Time zone) Set time zone Shift time zone Remove time zone Select time zone from dropdown list 183

184 Date and Time Analysis Exercise, Activity I Starting with exercise: Date and Time Analysis, Activity I Read the file: meter_data.csv Use String Manipulation to construct a date/timestamp column from the individual date and time columns Convert the date/time stamp column to a KNIME timestamp with the String to Date&Time node Extract entries from January 2007 using the Date&Time-based Row Filter

185 New Node: Extract Date&Time Fields Extract date fields (year, day, month) or time fields (hour, minute, second) from a date&time cell. Pick and choose which fields to include Useful when used in combination with data aggregation nodes (groupby, pivot etc.)

186 New Node: Moving Average Effectively a smoothing node Smoothing defined by a window type (centered, forward or backward) and weighted or not Useful when plotting aggregated time-series data to more easily see trends

187 New Node: Moving Aggregation Blend of GroupBy + Moving Average Functionality Group by moving window Aggregate using standard KNIME methods

188 New Node: JavaScript Line Plot Line plot with support for Date columns

189 Date and Time Analysis Exercise, Activity II Starting with exercise: Date and Time Analysis, Activity II Read the file: sampled_meter_data.table Use Extract Date&Time Fields to pull out the Year and Day of Year and Hour values from the timestamp Use GroupBy to aggregate the data by year, day and hour. Group by these values and calculate mean time and intensity Calculate a Gaussian centered moving average for the Intensity column and plot this data in a line plot Calculate the Maximum of the intensity column for the preceding day (1440 minutes) Optional: Plot the raw intensity, moving average, and moving maximum in a Line Chart

190 Workflow Control Loops, switches, try-catch 1 190

191 Workflow Control Structures Loops Iterate over a workflow snippet with variable inputs. Switches Direct the path of a workflow by selectively executing one or more workflow branches. Try-Catch Handle workflow branches that may fail in execution and you don t know before execution 2 191

192 The Loop Block A loop block is defined by appropriate loop start and loop end nodes. Loop body = Nodes in between and side branches. Loop body Loop start node Loop end node 3 192

193 New Node: Group Loop Start Similar to GroupBy except without aggregation tab. Each iteration of the loop passes the next group of rows. You implement the aggregation task. It can be anything from a complex calculation to updating a database

194 New Node: Create File Name Inputs: Directory, base file name, file Extension Output: Flow variable -> use as input for e.g. writer node 5 194

195 Example: Writing aggregated files Group Loop Start Variable Loop End Group data by specific column values Iterate over all groups of data Create an appropriate file name Write grouped data to tables with new file name 6 195

196 Workflow Control Exercise, Activity I Starting with exercise: Workflow Control, Activity I Read the file: CurrentDetailData.table Group over all of the values in the Products column For each group of data, write a new KNIME table to your KNIME Workspace. Give it an appropriate filename. (Hint: Group Loop Start creates a flow variable naming the current group) 7 196

197 New Node: List Files List all files in a directory Restrict to: top level directory (i.e. not recursive), specific file extensions matching name patterns (regex or wildcard) Provides file references as a table of URLs and absolute paths 8 197

198 New Node: TableRow to Variable Loop Start Similar to TableRow to Variable node. Each iteration of the loop converts the next row of the input table into flow variables. Inject variables into other nodes (often file readers) to reexecute subflows with a progression of settings 9 198

199 Example: Reading Many Files List all files in a directory Convert each file to a flow variable (1 per iteration) In each iteration, read the file, collect the results

200 Workflow Control Exercise, Activity II Starting with exercise: Workflow Control, Activity II Use List files to find the file names of the files created in exercise 2 Iterate over that list of files, and read them into a single KNIME Table

201 Switches A switch allows you to selectively activate branches of a workflow Inactive branches are marked with a red x on their output ports. Inactive nodes propagate down stream. Inactive Active

202 New Node: Rule Engine/Rule Engine Variable Define custom logic to using simple rules. Rules like: <Antecedent> => <Consequence> (1=1 => true ) May be used in flow variables or tables Easiest way to encode logic for switches

203 New Node: If Switch Control which branches of your workflow are active programatically. Controlled with a flow variable, setting the value to the literal strings: top, bottom, both May be used in flow variables or tables (different nodes)

204 Workflow Control Exercise, Activity III Starting with exercise: Workflow Control, Activity III Read the data file: CurrentDetailData.table Use a Single Input Quickform to let a user choose the values scatter or bar Use a Rule Engine Variable node to convert that into a format readable by an IF Switch ( top or bottom ) Use an IF Switch to create either a scatter plot or a bar plot depending on the input

205 Try-catch A way to catch errors in workflows. Useful when it is hard to know if a node will execute (for example, when connecting to a web service). KNIME tries to execute the nodes, but if it fails will fall back to an alternate branch

206 Streaming Standard execution: Node by node. Node processes all data, finishes, then passes data to next node, etc. Streaming: Nodes executed concurrently, each nodes passes data to the next as soon as it is available, i.e. before node is fully executed Faster execution, esp. for reading/preprocessing data Create wrapped metanode -> Configure -> Job Manager Selection -> Simple Streaming Not available for all nodes (show in node repository) Can only execute entire metanode, not individual nodes Intermediate results not available since nothing is cached

207 Streaming

208 Advanced Data Mining Random Forest, Tree Ensembles, Parameter Optimization, Cross Validation 1 208

209 New Nodes: Random Forest Ensemble learning method for classification and regression tasks It consists of a chosen number of decision trees Each of the decision tree models is learned on a different set of rows (records) and a different set of columns (describing attributes) 2 209

210 KNIME s Tree Ensemble models The general idea is to take advantage of the wisdom of the crowd : combining predictions from a large number of weak predictors leads to a more accurate predictor. This is called bagging. X P1 P2 Pn y Typically: for classification the individual models vote and the majority wins; for regression, the individual predictions are averaged 3 210

211 How does bagging work? Pick a different random subset of the training data for each model in the ensemble (bag). Build tree Build tree Build tree

212 An extra benefit of bagging: out of bag estimation Allows testing the model using the training data: when validating, each model should only vote on data points that were not used to train it X 1 X P1 P2 Pn P1 P2 Pn y 1 OOB y 2 OOB 5 212

213 Random Forests Bags of decision trees, but an extra element of randomization is applied when building the trees: each node in the decision tree only sees a subset of the input columns, typically N. Random forests tend to be very robust w.r.t. overfitting (though the individual trees are almost certainly overfit) Extra benefit: training tends to be much faster Build tree

214 New Nodes: Random Forest The output model describes a random forest and is applied in the corresponding predictor node using a simple majority vote. Tree Ensemble Learner node provides more functionalities 7 214

215 Gradient Boosting Another algorithm for creating ensembles of decision trees Starts with a tree built on a subset of the data Builds additional trees to fit the residual errors Typically uses fairly shallow trees Can introduce randomness in choice of data subsets ( stochastic gradient boosting ) and in variable choice

216 Out-of-bag vs. leave out model scoring Results from predicting quality of white wine 9 216

217 Tree Ensembles Random Forest variant More options to set Trees may be trained using subsets of rows and/or columns and this approach may lead to greater accuracy

218 Tree Ensembles Optimization of a tree ensemble is complex due to a surplus of configuration options (number of models, number of columns, number of rows, tree depth etc.)

219 New Nodes: Tree Ensemble Learner/Predictor Choose which columns to include Configure a prototype tree (depth, split criteria etc.) Setup ensemble parameters (model count, row/column subsampling)

220 Advanced Data Mining Exercise, Activity I Starting with exercise: Advanced Data Mining, Activity I Read the data file CurrentDetailData.table Partition the data 50/50 using stratified sampling on the Products column Create a Tree Ensemble model to predict the Products column Use a tree depth of 5, 50 models, and 75% of rows and columns for each iteration. Calculate the overall accuracy of the model and filter that value into a flow variable

221 Parameter Optimization Some modeling approaches are very sensitive to their configuration. Calculating optimum settings is not always possible. Parameter Optimization loops may help find a good configuration

222 New Node: Parameter Optimization Loop Start Define some parameters to optimize Set upper/lower bounds and step sizes (and flag integers) Choose an optimization method Brute force for maximum accuracy but slower computation Hill climbing for better faster runtimes but may get stuck in local optimum settings

223 New Node: Parameter Optimization Loop End Collects some value to optimize in a flow variable. Value may be maximized (accuracy) or minimized (error)

224 Advanced Data Mining Exercise, Activity II Starting with exercise: Advanced Data Mining, Activity II Add a parameter optimization loop to your Tree Ensemble Model Use Hill climbing to determine the optimum number of models (min=10,max=200,step=10, int = yes) Maximize the accuracy in the loop and node. What were the optimal settings? (Hint: don t forget to use the flow variable in your learner)

225 Cross Validation Used to evaluate model stability. Re-execute the modeling process many times using different data partitions. Collect aggregated statistics on model accuracy

226 Example: Cross Validation X-Partitioner X-Aggregator X-Partitioner replaces Partition X-Aggregator replaces Scorer Can be used with any learner/predictor

227 Advanced Data Mining Exercise, Activity III Starting with exercise: Advanced Data Mining, Activity III Create a 10-fold cross validation for your Tree Ensemble Learner. Calculate the mean error for the cross validation. Does the model seem stable?

228 Model Selection Compare and deploy 1 228

229 Model Selection We can build many models, but which should we use? Probably the best one Most accurate?, Best ROC?, Other criteria? In this section we will: Bring our models and performance indicators into a single table Sort that table to choose the best model Extract the best model and use it to make predictions 2 229

230 New Nodes: PMML to Cell/Cell to PMML Insert PMML models to or extract them from a KNIME table cell

231 New Node: Constant Value Column Insert a constant value as a new column to the table 4 231

232 New Node: PMML Predictor Score a table using a standard PMML model No configuration required 5 232

233 Model Selection Exercise Starting with exercise: Model Selection Train a decision tree on all the data. Partition and score the data as in the logistic regression example Join this model to the accuracy from the scorer. Label your model and concatenate this and other models together. Convert the best scoring model back to a model port. Predict new results for CurrentDetailData.table 6 233

234 Supplemental Workflows 1 234

235 Webportal Enabled Data Mining Builds and scores model on uploaded data Quickform nodes for stepwise configuration of workflow 1. Upload data file 2. Select target and features 3. Specify training / test fraction 4. Create downloadable PMML file 5. Report generation 2 235

236 WebPortal Enabled Data Mining 3 236

237 Webportal Enabled Data Mining Execution on Server via WebPortal

238 Webportal Enabled Data Mining Execution on Server via WebPortal

239 Webportal Enabled Data Mining Execution on Server via WebPortal

240 Webportal Enabled Data Mining Execution on Server via WebPortal

241 Webportal Enabled Data Mining Execution on Server via WebPortal

242 Webportal Enabled Data Mining Execution on Server via WebPortal

243 RESTful Geolocation Try Catch Block REST call to get lat/long for IPs

244 RESTful Geolocation Translates IPs to geo coordinates via RESTful service GET Resource: access RESTful API via GET IP to geo coordinates (lat/lon) Read REST Representation: parse REST result JSON, XML, CSV, Try Catch nodes to log errors gracefully

245 Geographic Analysis

246 Geographic Analysis Reads IPs from download weblog and related geo-coordinates Aggregates downloads by city, country, and US states OSM Map View to visualize geo-coordinates OSM Map to Image to create image of map view

247 Text and Network Mining KNIME Forums Create forum user interaction network Extract important terms, topics from forum sections

248 Text and Network Mining KNIME Forums

249 Text and Network Mining KNIME Forums Crawls KNIME forums and creates report Text Mining: tagging of users, topics, nodes Text Mining: tag cloud creation Network Mining: creation of user interaction networks Network Mining: visualization of user networks

250 Open Session

251 The End

KNIME Analytics Platform Course for Beginners

KNIME Analytics Platform Course for Beginners KNIME Analytics Platform Course for Beginners KNIME AG Overview KNIME Analytics Platform 1 2 What is KNIME Analytics Platform? A tool for data analysis, manipulation, visualization, and reporting Based

More information

Installation KNIME AG. All rights reserved. 1

Installation KNIME AG. All rights reserved. 1 Installation 1. Install KNIME Analytics Platform (from thumb drive) 2. Help > Install New Software > Add (> Archive): 00_InstallationFiles/CommunityContributions_trunk.zip https://update.knime.org/community-contributions/trunk

More information

KNIME Big Data Training

KNIME Big Data Training KNIME Big Data Training education@knime.com Overview KNIME Analytics Platform 1 2 What is KNIME Analytics Platform? A tool for data analysis, manipulation, visualization, and reporting Based on the graphical

More information

Text Mining Course for KNIME Analytics Platform

Text Mining Course for KNIME Analytics Platform Text Mining Course for KNIME Analytics Platform KNIME AG Table of Contents 1. The Open Analytics Platform 2. The Text Processing Extension 3. Importing Text 4. Enrichment 5. Preprocessing 6. Transformation

More information

KNIME for the life sciences Cambridge Meetup

KNIME for the life sciences Cambridge Meetup KNIME for the life sciences Cambridge Meetup Greg Landrum, Ph.D. KNIME.com AG 12 July 2016 What is KNIME? A bit of motivation: tool blending, data blending, documentation, automation, reproducibility More

More information

Copyright 2018 by KNIME Press

Copyright 2018 by KNIME Press 2 Copyright 2018 by KNIME Press All rights reserved. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval

More information

What is KNIME? workflows nodes standard data mining, data analysis data manipulation

What is KNIME? workflows nodes standard data mining, data analysis data manipulation KNIME TUTORIAL What is KNIME? KNIME = Konstanz Information Miner Developed at University of Konstanz in Germany Desktop version available free of charge (Open Source) Modular platform for building and

More information

The Top 10 New Features in KNIME 2.8. Rosaria Silipo KNIME.com AG, San Francisco

The Top 10 New Features in KNIME 2.8. Rosaria Silipo KNIME.com AG, San Francisco The Top 10 New Features in KNIME 2.8 Rosaria Silipo KNIME.com AG, San Francisco KNIME 2.8 KNIME 2.8 was out end of July 2013 Many New Features Documentation available at: http://tech.knime.org/whats-new-in-knime-28

More information

KNIME What s new?! Bernd Wiswedel KNIME.com AG, Zurich, Switzerland

KNIME What s new?! Bernd Wiswedel KNIME.com AG, Zurich, Switzerland KNIME What s new?! Bernd Wiswedel KNIME.com AG, Zurich, Switzerland Data Access ASCII (File/CSV Reader, ) Excel Web Services Remote Files (http, ftp, ) Other domain standards (e.g. Sdf) Databases Data

More information

KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa

KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa KNIME TUTORIAL Anna Monreale KDD-Lab, University of Pisa Email: annam@di.unipi.it Outline Introduction on KNIME KNIME components Exercise: Data Understanding Exercise: Market Basket Analysis Exercise:

More information

Intellicus Enterprise Reporting and BI Platform

Intellicus Enterprise Reporting and BI Platform Designing Adhoc Reports Intellicus Enterprise Reporting and BI Platform Intellicus Technologies info@intellicus.com www.intellicus.com Designing Adhoc Reports i Copyright 2012 Intellicus Technologies This

More information

Style Report Enterprise Edition

Style Report Enterprise Edition INTRODUCTION Style Report Enterprise Edition Welcome to Style Report Enterprise Edition! Style Report is a report design and interactive analysis package that allows you to explore, analyze, monitor, report,

More information

Business Insight Authoring

Business Insight Authoring Business Insight Authoring Getting Started Guide ImageNow Version: 6.7.x Written by: Product Documentation, R&D Date: August 2016 2014 Perceptive Software. All rights reserved CaptureNow, ImageNow, Interact,

More information

1 Topic. Image classification using Knime.

1 Topic. Image classification using Knime. 1 Topic Image classification using Knime. The aim of image mining is to extract valuable knowledge from image data. In the context of supervised image classification, we want to assign automatically a

More information

Rosaria Silipo, Michael P. Mazanetz. The KNIME Cookbook Recipes for the Advanced User

Rosaria Silipo, Michael P. Mazanetz. The KNIME Cookbook Recipes for the Advanced User Rosaria Silipo, Michael P. Mazanetz The KNIME Cookbook Recipes for the Advanced User 1 Copyright 2012 by KNIME Press All rights reserved. This publication is protected by copyright, and permission must

More information

Data Science. Data Analyst. Data Scientist. Data Architect

Data Science. Data Analyst. Data Scientist. Data Architect Data Science Data Analyst Data Analysis in Excel Programming in R Introduction to Python/SQL/Tableau Data Visualization in R / Tableau Exploratory Data Analysis Data Scientist Inferential Statistics &

More information

Business Intelligence and Reporting Tools

Business Intelligence and Reporting Tools Business Intelligence and Reporting Tools Release 1.0 Requirements Document Version 1.0 November 8, 2004 Contents Eclipse Business Intelligence and Reporting Tools Project Requirements...2 Project Overview...2

More information

As a reference, please find a version of the Machine Learning Process described in the diagram below.

As a reference, please find a version of the Machine Learning Process described in the diagram below. PREDICTION OVERVIEW In this experiment, two of the Project PEACH datasets will be used to predict the reaction of a user to atmospheric factors. This experiment represents the first iteration of the Machine

More information

SAS Visual Analytics 8.2: Getting Started with Reports

SAS Visual Analytics 8.2: Getting Started with Reports SAS Visual Analytics 8.2: Getting Started with Reports Introduction Reporting The SAS Visual Analytics tools give you everything you need to produce and distribute clear and compelling reports. SAS Visual

More information

Specialist ICT Learning

Specialist ICT Learning Specialist ICT Learning APPLIED DATA SCIENCE AND BIG DATA ANALYTICS GTBD7 Course Description This intensive training course provides theoretical and technical aspects of Data Science and Business Analytics.

More information

Tutorial on Machine Learning Tools

Tutorial on Machine Learning Tools Tutorial on Machine Learning Tools Yanbing Xue Milos Hauskrecht Why do we need these tools? Widely deployed classical models No need to code from scratch Easy-to-use GUI Outline Matlab Apps Weka 3 UI TensorFlow

More information

EZY Intellect Pte. Ltd., #1 Changi North Street 1, Singapore

EZY Intellect Pte. Ltd., #1 Changi North Street 1, Singapore Tableau in Business Intelligence Duration: 6 Days Tableau Desktop Tableau Introduction Tableau Introduction. Overview of Tableau workbook, worksheets. Dimension & Measures Discrete and Continuous Install

More information

Sitecore Experience Platform 8.0 Rev: September 13, Sitecore Experience Platform 8.0

Sitecore Experience Platform 8.0 Rev: September 13, Sitecore Experience Platform 8.0 Sitecore Experience Platform 8.0 Rev: September 13, 2018 Sitecore Experience Platform 8.0 All the official Sitecore documentation. Page 1 of 455 Experience Analytics glossary This topic contains a glossary

More information

ACHIEVEMENTS FROM TRAINING

ACHIEVEMENTS FROM TRAINING LEARN WELL TECHNOCRAFT DATA SCIENCE/ MACHINE LEARNING SYLLABUS 8TH YEAR OF ACCOMPLISHMENTS AUTHORIZED GLOBAL CERTIFICATION CENTER FOR MICROSOFT, ORACLE, IBM, AWS AND MANY MORE. 8411002339/7709292162 WWW.DW-LEARNWELL.COM

More information

ACTIVE Net Insights user guide. (v5.4)

ACTIVE Net Insights user guide. (v5.4) ACTIVE Net Insights user guide (v5.4) Version Date 5.4 January 23, 2018 5.3 November 28, 2017 5.2 October 24, 2017 5.1 September 26, 2017 ACTIVE Network, LLC 2017 Active Network, LLC, and/or its affiliates

More information

Tableau Training Content

Tableau Training Content TABLEAU DESKTOP INTRODUCTION AND GETTING STARTED Tableau desktop role in the tableau product line Application terminology View terminology Data terminology Visual cues for fields BEST PRACTICES IN CONNECTING

More information

GOBENCH IQ Release v

GOBENCH IQ Release v GOBENCH IQ Release v1.2.3.3 2018-06-11 New Add-Ons / Features / Enhancements in GOBENCH IQ v1.2.3.3 GOBENCH IQ v1.2.3.3 contains several new features and enhancements ** New version of the comparison Excel

More information

Switching to Sheets from Microsoft Excel Learning Center gsuite.google.com/learning-center

Switching to Sheets from Microsoft Excel Learning Center gsuite.google.com/learning-center Switching to Sheets from Microsoft Excel 2010 Learning Center gsuite.google.com/learning-center Welcome to Sheets Now that you've switched from Microsoft Excel to G Suite, learn how to use Google Sheets

More information

Release notes SPSS Statistics 20.0 FP1 Abstract Number Description

Release notes SPSS Statistics 20.0 FP1 Abstract Number Description Release notes SPSS Statistics 20.0 FP1 Abstract This is a comprehensive list of defect corrections for the SPSS Statistics 20.0 Fix Pack 1. Details of the fixes are listed below under the tab for the respective

More information

Introduction to Data Science. Introduction to Data Science with Python. Python Basics: Basic Syntax, Data Structures. Python Concepts (Core)

Introduction to Data Science. Introduction to Data Science with Python. Python Basics: Basic Syntax, Data Structures. Python Concepts (Core) Introduction to Data Science What is Analytics and Data Science? Overview of Data Science and Analytics Why Analytics is is becoming popular now? Application of Analytics in business Analytics Vs Data

More information

Tutorial Case studies

Tutorial Case studies 1 Topic Wrapper for feature subset selection Continuation. This tutorial is the continuation of the preceding one about the wrapper feature selection in the supervised learning context (http://data-mining-tutorials.blogspot.com/2010/03/wrapper-forfeature-selection.html).

More information

SAS Web Report Studio 3.1

SAS Web Report Studio 3.1 SAS Web Report Studio 3.1 User s Guide SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2006. SAS Web Report Studio 3.1: User s Guide. Cary, NC: SAS

More information

Prototyping Data Intensive Apps: TrendingTopics.org

Prototyping Data Intensive Apps: TrendingTopics.org Prototyping Data Intensive Apps: TrendingTopics.org Pete Skomoroch Research Scientist at LinkedIn Consultant at Data Wrangling @peteskomoroch 09/29/09 1 Talk Outline TrendingTopics Overview Wikipedia Page

More information

How to choose the right approach to analytics and reporting

How to choose the right approach to analytics and reporting SOLUTION OVERVIEW How to choose the right approach to analytics and reporting A comprehensive comparison of the open source and commercial versions of the OpenText Analytics Suite In today s digital world,

More information

Datameer for Data Preparation:

Datameer for Data Preparation: Datameer for Data Preparation: Explore, Profile, Blend, Cleanse, Enrich, Share, Operationalize DATAMEER FOR DATA PREPARATION: EXPLORE, PROFILE, BLEND, CLEANSE, ENRICH, SHARE, OPERATIONALIZE Datameer Datameer

More information

Intellicus Enterprise Reporting and BI Platform

Intellicus Enterprise Reporting and BI Platform Working with Query Objects Intellicus Enterprise Reporting and BI Platform ` Intellicus Technologies info@intellicus.com www.intellicus.com Working with Query Objects i Copyright 2012 Intellicus Technologies

More information

Data Explorer: User Guide 1. Data Explorer User Guide

Data Explorer: User Guide 1. Data Explorer User Guide Data Explorer: User Guide 1 Data Explorer User Guide Data Explorer: User Guide 2 Contents About this User Guide.. 4 System Requirements. 4 Browser Requirements... 4 Important Terminology.. 5 Getting Started

More information

SAS (Statistical Analysis Software/System)

SAS (Statistical Analysis Software/System) SAS (Statistical Analysis Software/System) SAS Adv. Analytics or Predictive Modelling:- Class Room: Training Fee & Duration : 30K & 3 Months Online Training Fee & Duration : 33K & 3 Months Learning SAS:

More information

DATA 301 Introduction to Data Analytics Visualization. Dr. Ramon Lawrence University of British Columbia Okanagan

DATA 301 Introduction to Data Analytics Visualization. Dr. Ramon Lawrence University of British Columbia Okanagan DATA 301 Introduction to Data Analytics Visualization Dr. Ramon Lawrence University of British Columbia Okanagan ramon.lawrence@ubc.ca DATA 301: Data Analytics (2) Why learn Visualization? Visualization

More information

Enterprise Miner Tutorial Notes 2 1

Enterprise Miner Tutorial Notes 2 1 Enterprise Miner Tutorial Notes 2 1 ECT7110 E-Commerce Data Mining Techniques Tutorial 2 How to Join Table in Enterprise Miner e.g. we need to join the following two tables: Join1 Join 2 ID Name Gender

More information

ZENworks Reporting System Reference. January 2017

ZENworks Reporting System Reference. January 2017 ZENworks Reporting System Reference January 2017 Legal Notices For information about legal notices, trademarks, disclaimers, warranties, export and other use restrictions, U.S. Government rights, patent

More information

Data Should Not be a Four Letter Word Microsoft Excel QUICK TOUR

Data Should Not be a Four Letter Word Microsoft Excel QUICK TOUR Toolbar Tour AutoSum + more functions Chart Wizard Currency, Percent, Comma Style Increase-Decrease Decimal Name Box Chart Wizard QUICK TOUR Name Box AutoSum Numeric Style Chart Wizard Formula Bar Active

More information

User Guide. Kronodoc Kronodoc Oy. Intelligent methods for process improvement and project execution

User Guide. Kronodoc Kronodoc Oy. Intelligent methods for process improvement and project execution User Guide Kronodoc 3.0 Intelligent methods for process improvement and project execution 2003 Kronodoc Oy 2 Table of Contents 1 User Guide 5 2 Information Structure in Kronodoc 6 3 Entering and Exiting

More information

Introduction to BEST Viewpoints

Introduction to BEST Viewpoints Introduction to BEST Viewpoints This is not all but just one of the documentation files included in BEST Viewpoints. Introduction BEST Viewpoints is a user friendly data manipulation and analysis application

More information

What s New In Sawmill 8 Why Should I Upgrade To Sawmill 8?

What s New In Sawmill 8 Why Should I Upgrade To Sawmill 8? What s New In Sawmill 8 Why Should I Upgrade To Sawmill 8? Sawmill 8 is a major new version of Sawmill, the result of several years of development. Nearly every aspect of Sawmill has been enhanced, and

More information

BEAWebLogic. Portal. Overview

BEAWebLogic. Portal. Overview BEAWebLogic Portal Overview Version 10.2 Revised: February 2008 Contents About the BEA WebLogic Portal Documentation Introduction to WebLogic Portal Portal Concepts.........................................................2-2

More information

Data Science Essentials

Data Science Essentials Data Science Essentials Lab 6 Introduction to Machine Learning Overview In this lab, you will use Azure Machine Learning to train, evaluate, and publish a classification model, a regression model, and

More information

alteryx training courses

alteryx training courses alteryx training courses alteryx designer 2 day course This course covers Alteryx Designer for new and intermediate Alteryx users. It introduces the User Interface and works through core Alteryx capability,

More information

SAS Visual Analytics 8.2: Working with Report Content

SAS Visual Analytics 8.2: Working with Report Content SAS Visual Analytics 8.2: Working with Report Content About Objects After selecting your data source and data items, add one or more objects to display the results. SAS Visual Analytics provides objects

More information

Designing Ad hoc Report. Version: 7.3

Designing Ad hoc Report. Version: 7.3 Designing Ad hoc Report Version: 7.3 Copyright 2015 Intellicus Technologies This document and its content is copyrighted material of Intellicus Technologies. The content may not be copied or derived from,

More information

Imagine. Create. Discover. User Manual. TopLine Results Corporation

Imagine. Create. Discover. User Manual. TopLine Results Corporation Imagine. Create. Discover. User Manual TopLine Results Corporation 2008-2009 Created: Tuesday, March 17, 2009 Table of Contents 1 Welcome 1 Features 2 2 Installation 4 System Requirements 5 Obtaining Installation

More information

Applied Machine Learning

Applied Machine Learning Applied Machine Learning Lab 3 Working with Text Data Overview In this lab, you will use R or Python to work with text data. Specifically, you will use code to clean text, remove stop words, and apply

More information

Export out report results in multiple formats like PDF, Excel, Print, , etc.

Export out report results in multiple formats like PDF, Excel, Print,  , etc. Edition Comparison DOCSVAULT Docsvault is full of features that can help small businesses and large enterprises go paperless. The feature matrix below displays Docsvault s abilities for its Enterprise

More information

Edition 3.2. Tripolis Solutions Dialogue Manual version 3.2 2

Edition 3.2. Tripolis Solutions Dialogue Manual version 3.2 2 Edition 3.2 Tripolis Solutions Dialogue Manual version 3.2 2 Table of Content DIALOGUE SETUP... 7 Introduction... 8 Process flow... 9 USER SETTINGS... 10 Language, Name and Email address settings... 10

More information

Instructor : Dr. Sunnie Chung. Independent Study Spring Pentaho. 1 P a g e

Instructor : Dr. Sunnie Chung. Independent Study Spring Pentaho. 1 P a g e ABSTRACT Pentaho Business Analytics from different data source, Analytics from csv/sql,create Star Schema Fact & Dimension Tables, kettle transformation for big data integration, MongoDB kettle Transformation,

More information

User Guide. Copyright Wordfast, LLC All rights reserved.

User Guide. Copyright Wordfast, LLC All rights reserved. User Guide All rights reserved. Table of Contents About this Guide... 7 Conventions...7 Typographical... 7 Icons... 7 1 Release Notes Summary... 8 New Features and Improvements... 8 Fixed Issues... 8 2

More information

Spotfire: Brisbane Breakfast & Learn. Thursday, 9 November 2017

Spotfire: Brisbane Breakfast & Learn. Thursday, 9 November 2017 Spotfire: Brisbane Breakfast & Learn Thursday, 9 November 2017 CONFIDENTIALITY The following information is confidential information of TIBCO Software Inc. Use, duplication, transmission, or republication

More information

Talend Data Preparation Free Desktop. Getting Started Guide V2.1

Talend Data Preparation Free Desktop. Getting Started Guide V2.1 Talend Data Free Desktop Getting Guide V2.1 1 Talend Data Training Getting Guide To navigate to a specific location within this guide, click one of the boxes below. Overview of Data Access Data And Getting

More information

MicroStrategy Desktop

MicroStrategy Desktop MicroStrategy Desktop Quick Start Guide MicroStrategy Desktop is designed to enable business professionals like you to explore data, simply and without needing direct support from IT. 1 Import data from

More information

Quick Start Guide. Version R94. English

Quick Start Guide. Version R94. English Custom Reports Quick Start Guide Version R94 English December 12, 2016 Copyright Agreement The purchase and use of all Software and Services is subject to the Agreement as defined in Kaseya s Click-Accept

More information

Logi Info v12.5 WHAT S NEW

Logi Info v12.5 WHAT S NEW Logi Info v12.5 WHAT S NEW Introduction Logi empowers companies to embed analytics into the fabric of their organizations and products enabling anyone to analyze data, share insights, and make informed

More information

Integration Service. Admin Console User Guide. On-Premises

Integration Service. Admin Console User Guide. On-Premises Kony MobileFabric TM Integration Service Admin Console User Guide On-Premises Release 7.3 Document Relevance and Accuracy This document is considered relevant to the Release stated on this title page and

More information

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda Agenda Oracle9i Warehouse Review Dulcian, Inc. Oracle9i Server OLAP Server Analytical SQL Mining ETL Infrastructure 9i Warehouse Builder Oracle 9i Server Overview E-Business Intelligence Platform 9i Server:

More information

Designing Ad hoc Reports. Version: 16.0

Designing Ad hoc Reports. Version: 16.0 Designing Ad hoc Reports Version: 16.0 Copyright 2017 Intellicus Technologies This document and its content is copyrighted material of Intellicus Technologies. The content may not be copied or derived

More information

Web Analysis in 4 Easy Steps. Rosaria Silipo, Bernd Wiswedel and Tobias Kötter

Web Analysis in 4 Easy Steps. Rosaria Silipo, Bernd Wiswedel and Tobias Kötter Web Analysis in 4 Easy Steps Rosaria Silipo, Bernd Wiswedel and Tobias Kötter KNIME Forum Analysis KNIME Forum Analysis Steps: 1. Get data into KNIME 2. Extract simple statistics (how many posts, response

More information

The following topics describe how to work with reports in the Firepower System:

The following topics describe how to work with reports in the Firepower System: The following topics describe how to work with reports in the Firepower System: Introduction to Reports Introduction to Reports, on page 1 Risk Reports, on page 1 Standard Reports, on page 2 About Working

More information

KNIME Enalos+ Molecular Descriptor nodes

KNIME Enalos+ Molecular Descriptor nodes KNIME Enalos+ Molecular Descriptor nodes A Brief Tutorial Novamechanics Ltd Contact: info@novamechanics.com Version 1, June 2017 Table of Contents Introduction... 1 Step 1-Workbench overview... 1 Step

More information

MicroStrategy Academic Program

MicroStrategy Academic Program MicroStrategy Academic Program Creating a center of excellence for enterprise analytics and mobility. DATA PREPARATION: HOW TO WRANGLE, ENRICH, AND PROFILE DATA APPROXIMATE TIME NEEDED: 1 HOUR TABLE OF

More information

From Insight to Action: Analytics from Both Sides of the Brain. Vaz Balasingham Director of Solutions Consulting

From Insight to Action: Analytics from Both Sides of the Brain. Vaz Balasingham Director of Solutions Consulting From Insight to Action: Analytics from Both Sides of the Brain Vaz Balasingham Director of Solutions Consulting vbalasin@tibco.com Insight to Action from Both Sides of the Brain Value Grow Revenue Reduce

More information

Python Certification Training

Python Certification Training Introduction To Python Python Certification Training Goal : Give brief idea of what Python is and touch on basics. Define Python Know why Python is popular Setup Python environment Discuss flow control

More information

Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data

Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data THE RISE OF BIG DATA BIG DATA: A REVOLUTION IN ACCESS Large-scale data sets are nothing

More information

ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA

ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA INSIGHTS@SAS: ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA AGENDA 09.00 09.15 Intro 09.15 10.30 Analytics using SAS Enterprise Guide Ellen Lokollo 10.45 12.00 Advanced Analytics using SAS

More information

SmartView. User Guide - Analysis. Version 2.0

SmartView. User Guide - Analysis. Version 2.0 SmartView User Guide - Analysis Version 2.0 Table of Contents Page i Table of Contents Table Of Contents I Introduction 1 Dashboard Layouts 2 Dashboard Mode 2 Story Mode 3 Dashboard Controls 4 Dashboards

More information

User Guide. Copyright Wordfast, LLC All rights reserved.

User Guide. Copyright Wordfast, LLC All rights reserved. User Guide All rights reserved. Table of Contents Summary... 7 New Features and Improvements... 7 Fixed Issues... 8 1 About Pro... 9 Key Advantages... 9 2 Get Started... 10 Requirements... 10 Install and

More information

End-to-End data mining feature integration, transformation and selection with Datameer Datameer, Inc. All rights reserved.

End-to-End data mining feature integration, transformation and selection with Datameer Datameer, Inc. All rights reserved. End-to-End data mining feature integration, transformation and selection with Datameer Fastest time to Insights Rapid Data Integration Zero coding data integration Wizard-led data integration & No ETL

More information

HarePoint Analytics. For SharePoint. User Manual

HarePoint Analytics. For SharePoint. User Manual HarePoint Analytics For SharePoint User Manual HarePoint Analytics for SharePoint 2013 product version: 15.5 HarePoint Analytics for SharePoint 2016 product version: 16.0 04/27/2017 2 Introduction HarePoint.Com

More information

Intermediate Microsoft Excel 2010

Intermediate Microsoft Excel 2010 P a g e 1 Intermediate Microsoft Excel 2010 ABOUT THIS CLASS This class is designed to continue where the Microsoft Excel 2010 Basics class left off. Specifically, we will cover additional ways to organize

More information

DATA SCIENCE INTRODUCTION QSHORE TECHNOLOGIES. About the Course:

DATA SCIENCE INTRODUCTION QSHORE TECHNOLOGIES. About the Course: DATA SCIENCE About the Course: In this course you will get an introduction to the main tools and ideas which are required for Data Scientist/Business Analyst/Data Analyst/Analytics Manager/Actuarial Scientist/Business

More information

Excel Shortcuts Increasing YOUR Productivity

Excel Shortcuts Increasing YOUR Productivity Excel Shortcuts Increasing YOUR Productivity CompuHELP Division of Tommy Harrington Enterprises, Inc. tommy@tommyharrington.com https://www.facebook.com/tommyharringtonextremeexcel Excel Shortcuts Increasing

More information

SAS Data Explorer 2.1: User s Guide

SAS Data Explorer 2.1: User s Guide SAS Data Explorer 2.1: User s Guide Working with SAS Data Explorer Understanding SAS Data Explorer SAS Data Explorer and the Choose Data Window SAS Data Explorer enables you to copy data to memory on SAS

More information

Expedient User Manual Getting Started

Expedient User Manual Getting Started Volume 1 Expedient User Manual Getting Started Gavin Millman & Associates Pty Ltd 281 Buckley Street Essendon VIC 3040 Phone 03 9331 3944 Web www.expedientsoftware.com.au Table of Contents Logging In...

More information

Release notes for version 3.7.2

Release notes for version 3.7.2 Release notes for version 3.7.2 Important! Create a backup copy of your projects before updating to the new version. Projects saved in the new version can t be opened in versions earlier than 3.7. Breaking

More information

WINDEV 23 - WEBDEV 23 - WINDEV Mobile 23 Documentation version

WINDEV 23 - WEBDEV 23 - WINDEV Mobile 23 Documentation version WINDEV 23 - WEBDEV 23 - WINDEV Mobile 23 Documentation version 23-1 - 04-18 Summary Part 1 - Report editor 1. Introduction... 13 2. How to create a report... 23 3. Data sources of a report... 43 4. Describing

More information

Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs

Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs 1.1 Introduction Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs For brevity, the Lavastorm Analytics Library (LAL) Predictive and Statistical Analytics Node Pack will be

More information

Enterprise Miner Version 4.0. Changes and Enhancements

Enterprise Miner Version 4.0. Changes and Enhancements Enterprise Miner Version 4.0 Changes and Enhancements Table of Contents General Information.................................................................. 1 Upgrading Previous Version Enterprise Miner

More information

Teiid Designer User Guide 7.5.0

Teiid Designer User Guide 7.5.0 Teiid Designer User Guide 1 7.5.0 1. Introduction... 1 1.1. What is Teiid Designer?... 1 1.2. Why Use Teiid Designer?... 2 1.3. Metadata Overview... 2 1.3.1. What is Metadata... 2 1.3.2. Editing Metadata

More information

User Guide. Copyright Wordfast, LLC All rights reserved.

User Guide. Copyright Wordfast, LLC All rights reserved. User Guide All rights reserved. Table of Contents About this Guide... 7 Conventions...7 Typographical... 7 Icons... 7 1 Release Notes Summary... 8 New Features and Improvements... 8 Fixed Issues... 8 Known

More information

Abstract. For notes detailing the changes in each release, see the MySQL for Excel Release Notes. For legal information, see the Legal Notices.

Abstract. For notes detailing the changes in each release, see the MySQL for Excel Release Notes. For legal information, see the Legal Notices. MySQL for Excel Abstract This is the MySQL for Excel Reference Manual. It documents MySQL for Excel 1.3 through 1.3.7. Much of the documentation also applies to the previous 1.2 series. For notes detailing

More information

Spreadsheet definition: Starting a New Excel Worksheet: Navigating Through an Excel Worksheet

Spreadsheet definition: Starting a New Excel Worksheet: Navigating Through an Excel Worksheet Copyright 1 99 Spreadsheet definition: A spreadsheet stores and manipulates data that lends itself to being stored in a table type format (e.g. Accounts, Science Experiments, Mathematical Trends, Statistics,

More information

Release notes for version 3.9.2

Release notes for version 3.9.2 Release notes for version 3.9.2 What s new Overview Here is what we were focused on while developing version 3.9.2, and a few announcements: Continuing improving ETL capabilities of EasyMorph by adding

More information

Introduction to Data Science

Introduction to Data Science Introduction to Data Science Lab 4 Introduction to Machine Learning Overview In the previous labs, you explored a dataset containing details of lemonade sales. In this lab, you will use machine learning

More information

SPSS Statistics Patch Description

SPSS Statistics Patch Description SPSS Statistics 18.0.1 Patch Description Product: SPSS Statistics 18.0.1 Date: December 8, 2009 Description: This patch resolves the following issues: 1. A problem with reading large SAS data files was

More information

Random Forest A. Fornaser

Random Forest A. Fornaser Random Forest A. Fornaser alberto.fornaser@unitn.it Sources Lecture 15: decision trees, information theory and random forests, Dr. Richard E. Turner Trees and Random Forests, Adele Cutler, Utah State University

More information

SAS Factory Miner 14.2: User s Guide

SAS Factory Miner 14.2: User s Guide SAS Factory Miner 14.2: User s Guide SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2016. SAS Factory Miner 14.2: User s Guide. Cary, NC: SAS Institute

More information

Manipulating Database Objects

Manipulating Database Objects Manipulating Database Objects Purpose This tutorial shows you how to manipulate database objects using Oracle Application Express. Time to Complete Approximately 30 minutes. Topics This tutorial covers

More information

Embarcadero PowerSQL 1.1 Evaluation Guide. Published: July 14, 2008

Embarcadero PowerSQL 1.1 Evaluation Guide. Published: July 14, 2008 Embarcadero PowerSQL 1.1 Evaluation Guide Published: July 14, 2008 Contents INTRODUCTION TO POWERSQL... 3 Product Benefits... 3 Product Benefits... 3 Product Benefits... 3 ABOUT THIS EVALUATION GUIDE...

More information

Tackling Big Data Using MATLAB

Tackling Big Data Using MATLAB Tackling Big Data Using MATLAB Alka Nair Application Engineer 2015 The MathWorks, Inc. 1 Building Machine Learning Models with Big Data Access Preprocess, Exploration & Model Development Scale up & Integrate

More information

TUTORIAL. HCS- Tools + Scripting Integrations

TUTORIAL. HCS- Tools + Scripting Integrations TUTORIAL HCS- Tools + Scripting Integrations HCS- Tools... 3 Setup... 3 Task and Data... 4 1) Data Input Opera Reader... 7 2) Meta data integration Expand barcode... 8 3) Meta data integration Join Layout...

More information

Visual Workflow Implementation Guide

Visual Workflow Implementation Guide Version 30.0: Spring 14 Visual Workflow Implementation Guide Note: Any unreleased services or features referenced in this or other press releases or public statements are not currently available and may

More information

Enterprise Data Catalog for Microsoft Azure Tutorial

Enterprise Data Catalog for Microsoft Azure Tutorial Enterprise Data Catalog for Microsoft Azure Tutorial VERSION 10.2 JANUARY 2018 Page 1 of 45 Contents Tutorial Objectives... 4 Enterprise Data Catalog Overview... 5 Overview... 5 Objectives... 5 Enterprise

More information