Data Mining: STATISTICA
Outline: Prepare the data; Classification and regression; Clustering; Association rules; Graphic user interface
Prepare the Data: Statistica can read data from Excel, .txt, and many other file types. Compared with WEKA, Statistica makes data preparation much easier. Open an Excel File: click Import selected sheet to a Spreadsheet, select the Excel sheet where your data is stored, and choose Get variable names from the first row.
Open an Excel File: change variable types as needed.
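Outside Statistica, the same two steps can be done in a few lines: take the variable names from the first row, then change the type of the numeric columns. The sketch below assumes the sheet has been exported as comma-separated text. The function name `load_table` and the iris-style column names are hypothetical, chosen only for illustration.

```python
import csv
import io

def load_table(text, numeric_columns):
    """Read comma-separated text whose first row holds the variable names,
    converting the listed columns from strings to floats."""
    reader = csv.DictReader(io.StringIO(text))  # header row -> variable names
    rows = []
    for row in reader:
        for name in numeric_columns:
            row[name] = float(row[name])  # change variable type: text -> number
        rows.append(row)
    return reader.fieldnames, rows

# Hypothetical iris-style sample; the column names are illustrative only.
sample = """sepal_length,sepal_width,species
5.1,3.5,setosa
7.0,3.2,versicolor
"""
names, rows = load_table(sample, ["sepal_length", "sepal_width"])
print(names)  # ['sepal_length', 'sepal_width', 'species']
print(rows[1]["sepal_length"] + 1)  # 8.0, arithmetic works after conversion
```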
Classification and Regression: C&RT, boosting trees, neural networks. C&RT Classification: the Iris data set is used as an example.
C&RT Classification: open the Data Mining menu and find Interactive Trees. C&RT Classification: view the final tree and interpret the results.
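The classic C&RT classification criterion is Gini impurity. Each candidate split is scored by the size-weighted impurity of its two child nodes, and the split with the lowest score wins. The sketch below applies this to one numeric predictor. It does not reproduce Statistica's exact settings or stopping rules, and the petal-length values are a handful of illustrative iris-like numbers, not the full data set.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum over classes of p_k squared."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs, labels):
    """Try a threshold midway between each pair of adjacent distinct values
    and return (threshold, weighted child impurity) with the lowest impurity."""
    best = (None, float("inf"))
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, labels) if x <= t]
        right = [y for x, y in zip(xs, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

petal_length = [1.4, 1.3, 1.5, 4.7, 4.5, 5.9]
species = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "virginica"]
print(best_split(petal_length, species))  # threshold 3.0 isolates setosa
```

A full tree applies `best_split` to every predictor, keeps the best one, and recurses into each child node.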
C&RT---Regression: use the CPU data set and select regression analysis. C&RT---Regression: regression tree structure.
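A regression tree works the same way as a classification tree, but it scores splits by the sum of squared errors (SSE) within each child node. Each leaf then predicts the mean of its training cases. The sketch below finds one such split. The predictor and response values are made up and only loosely resemble a cache-size versus performance relationship. They are not taken from the CPU data set.

```python
def sse(values):
    """Sum of squared deviations from the mean: the node's impurity."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_regression_split(xs, ys):
    """Return (threshold, left_mean, right_mean) minimising total child SSE.
    The two means are the values the leaves would predict."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, t, sum(left) / len(left), sum(right) / len(right))
    return best[1:]

cache = [8, 16, 32, 64, 128, 256]          # hypothetical predictor values
performance = [20, 22, 24, 80, 90, 100]    # hypothetical response values
print(best_regression_split(cache, performance))  # (48.0, 22.0, 90.0)
```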
C&RT---Regression: predicted values. Boosting tree Classification: in the Data Mining menu, find the Boosted tree classifier and regression.
Boosting tree Classification: see the results and predictor importance.
Boosting tree Regression: CPU data set. Boosting tree Regression: see the results, predictor importance, and predicted values.
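Boosted trees are usually built by gradient boosting. The model starts from a constant, then repeatedly fits a small tree to the current residuals and adds a shrunken copy of it. The sketch below makes several simplifying assumptions: squared-error loss, one-split trees (stumps), and no random subsampling. It shows the idea, not Statistica's exact procedure. The names `boost`, `n_trees`, and `learning_rate` are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single split (threshold, left_value, right_value) by squared error."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        cost = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or cost < best[0]:
            best = (cost, t, lm, rm)
    return best[1:]

def boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Start from the mean, then add n_trees shrunken stumps, each fitted
    to the residuals left by the model so far. Returns a predict function."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        preds = [p + learning_rate * (lv if x <= t else rv)
                 for x, p in zip(xs, preds)]

    def predict(x):
        return base + sum(learning_rate * (lv if x <= t else rv)
                          for t, lv, rv in stumps)
    return predict

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 2.0, 2.0, 8.0, 9.0]
predict = boost(xs, ys)
print([round(predict(x), 2) for x in xs])
```

The small learning rate means each tree corrects only part of the remaining error. For this reason boosting typically uses many trees.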
Neural Networks Classification: in the Data Mining menu, find Automated Neural Networks. Neural Networks Classification: choose Classification, then select variables.
Neural Networks Classification: Statistica trains a set of different neural networks and keeps the best-performing ones. Neural Networks Classification: see the classification results.
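Each candidate network turns the input variables into class probabilities. The sketch below shows that computation for one common architecture: a multilayer perceptron with one tanh hidden layer and softmax outputs. The weights are hypothetical placeholders; in practice they are learned during training. Training, and the comparison Statistica makes when it keeps the best networks, are not shown here.

```python
import math

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer network: tanh hidden units,
    softmax outputs. Returns one probability per class."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w_out, b_out)]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical weights for 2 inputs, 2 hidden units and 3 classes.
w_hidden = [[0.5, -0.2], [-0.3, 0.8]]
b_hidden = [0.1, 0.0]
w_out = [[1.0, -1.0], [0.2, 0.3], [-0.5, 1.2]]
b_out = [0.0, 0.1, -0.1]
probs = mlp_forward([1.4, 0.2], w_hidden, b_hidden, w_out, b_out)
predicted_class = probs.index(max(probs))  # class with the highest probability
print(probs, predicted_class)
```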
Neural Networks Classification: see the classification results---predictions.
Neural Networks Classification: see the classification results---confusion matrix. Neural Networks Regression: CPU data set.
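A confusion matrix counts how often each observed class was predicted as each class. Correct predictions lie on the diagonal. The sketch below uses one common convention, with rows as observed classes and columns as predicted classes; Statistica's table layout may differ. The labels are made up for illustration.

```python
from collections import Counter

def confusion_matrix(actual, predicted, classes):
    """Rows = observed class, columns = predicted class."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]

def accuracy(matrix):
    """Fraction of cases on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(map(sum, matrix))

classes = ["setosa", "versicolor", "virginica"]
actual = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
predicted = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]
m = confusion_matrix(actual, predicted, classes)
print(m)            # [[2, 0, 0], [0, 1, 1], [0, 0, 2]]
print(accuracy(m))  # 5 of 6 correct
```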
Neural Networks Regression: CPU data set; select variables. Neural Networks Regression: training and results.
Neural Networks Regression: predictions. Neural Networks Regression: summary statistics for the predictions.
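Common summaries of regression predictions include the mean absolute error, the root mean squared error, and the correlation between observed and predicted values. The exact set of statistics Statistica reports may differ from this list. The sketch below computes these three on made-up numbers.

```python
import math

def regression_summary(observed, predicted):
    """Mean absolute error, root mean squared error and Pearson correlation."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return {"mae": mae, "rmse": rmse, "r": cov / (so * sp)}

stats = regression_summary([10, 20, 30, 40], [12, 18, 33, 37])
print(stats)  # mae 2.5, rmse about 2.55, r about 0.975
```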
Clustering: use the Deere data set. Clustering: select k-means and choose the variables.
Clustering: choose the distance metric and the initial cluster centers. Clustering: set 5 clusters and see the results.
Clustering: centroids (cluster means). Clustering: members and their distances to the centroids.
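k-means repeats two steps until cluster memberships stop changing:

1. Assign each case to its nearest center.
2. Move each center to the mean of its members.

The outputs match the results shown above: centroids, memberships, and each member's distance to its centroid. The sketch below assumes Euclidean distance and caller-supplied initial centers. The 2-D points are made up rather than taken from the Deere data, and the function names are hypothetical.

```python
import math

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))

def kmeans(points, initial_centers, max_iter=100):
    """Lloyd's k-means from the given centers. Returns
    (centroids, cluster index per point, distance of each point to its centroid)."""
    centers = [tuple(c) for c in initial_centers]
    assign = None
    for _ in range(max_iter):
        new_assign = [nearest(p, centers) for p in points]
        if new_assign == assign:  # memberships stable: converged
            break
        assign = new_assign
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    distances = [math.dist(p, centers[a]) for p, a in zip(points, assign)]
    return centers, assign, distances

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, assign, dist = kmeans(points, [points[0], points[3]])
print(centers, assign)  # second centroid is (8.5, 8.5)
```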
Association rules: use the Deere data set. Association rules: select variables and set appropriate parameters.
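The main parameters of association-rule mining are a minimum support and a minimum confidence. Support is the fraction of cases that contain all items in the rule. Confidence is support(A and B) divided by support(A). The sketch below keeps only one-item-to-one-item rules that pass both thresholds. A full Apriori-style search also grows larger itemsets. The item names and thresholds are hypothetical.

```python
from itertools import combinations

def pair_rules(transactions, min_support, min_confidence):
    """Rules lhs -> rhs (single items) as (lhs, rhs, support, confidence)."""
    n = len(transactions)
    sets = [set(t) for t in transactions]
    items = sorted(set().union(*sets))

    def support(itemset):
        # fraction of transactions that contain every item in itemset
        return sum(itemset <= s for s in sets) / n

    rules = []
    for a, b in combinations(items, 2):
        both = support({a, b})
        if both < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            conf = both / support({lhs})
            if conf >= min_confidence:
                rules.append((lhs, rhs, both, conf))
    return rules

baskets = [["bread", "milk"], ["bread", "butter"],
           ["bread", "milk", "butter"], ["milk"]]
print(pair_rules(baskets, min_support=0.5, min_confidence=0.7))
# [('butter', 'bread', 0.5, 1.0)]
```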
Association rules: see the rules. Graphic User Interface: divide the CPU data into training and testing data sets.
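A random training/testing split can be sketched as follows: shuffle a copy of the cases with a fixed seed so the split is reproducible, then hold out a fraction for testing. The function name `split_rows`, the 30% test fraction, and the seed are hypothetical choices, not Statistica defaults.

```python
import random

def split_rows(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and cut it into (training, testing) parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = split_rows(list(range(10)), test_fraction=0.3, seed=1)
print(len(train), len(test))  # 7 3
```

The models are then built on `train` only. `test` is kept aside to check how well they predict unseen cases.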
Graphic User Interface: choose different algorithms.
Graphic User Interface: insert the selected data mining algorithms into the workspace. Graphic User Interface: select data sources.
Graphic User Interface: specify whether the data are used to build the model or as a testing set. Graphic User Interface: connect the data to the data mining algorithms.
Graphic User Interface: to set up deployment, double-click the data mining algorithm icon.
Graphic User Interface: click the Run button. Graphic User Interface: view the deployment code (C code) by double-clicking the icons in the Reports section.
Graphic User Interface: test the learned models on the testing data set. First disable the connections between the training data set and the data mining algorithms, then connect the testing data set to the data mining algorithms.
Graphic User Interface: see the prediction results.