
Suseela Bhaskaruni

RapidMiner is an environment for machine learning and data mining experiments. It is widely used for both research and real-world data mining tasks. It is available in two software versions: a Community edition (open-source) and an Enterprise edition.

- Developed in Java
- Knowledge discovery processes are modelled as operator trees
- A scripting language allows large-scale experiments to be automated
- Available through a GUI, a command-line mode, and a Java API
- Visualization schemes for data and models
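The operator-tree idea can be illustrated with a small sketch. This is plain Python, not RapidMiner's Java API or its process format: the classes `Operator` and `Chain`, the two leaf operators, and the sample rows are all hypothetical names chosen for illustration.

```python
from typing import Callable, List

Row = dict


class Operator:
    """A leaf node of the process tree: takes rows in, hands rows out."""

    def __init__(self, name: str, fn: Callable[[List[Row]], List[Row]]):
        self.name = name
        self.fn = fn

    def apply(self, rows: List[Row]) -> List[Row]:
        return self.fn(rows)


class Chain(Operator):
    """An inner node: runs its child operators in order, like a pipeline."""

    def __init__(self, name: str, children: List[Operator]):
        self.name = name
        self.children = children

    def apply(self, rows: List[Row]) -> List[Row]:
        for child in self.children:
            rows = child.apply(rows)
        return rows


# Hypothetical leaf operators (assumptions, not RapidMiner operators).
drop_missing = Operator(
    "drop_missing", lambda rows: [r for r in rows if None not in r.values()]
)
scale_x = Operator(
    "scale_x", lambda rows: [{**r, "x": r["x"] / 10} for r in rows]
)

# The whole process is one tree: root -> preprocessing -> two leaves.
process = Chain("root", [Chain("preprocessing", [drop_missing, scale_x])])

data = [{"x": 10, "y": 1}, {"x": None, "y": 0}, {"x": 30, "y": 1}]
result = process.apply(data)
print(result)
```

Because a `Chain` is itself an `Operator`, chains can be nested to any depth, which is what makes the structure a tree rather than a flat list of steps.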

- Text mining
- Multimedia mining
- Feature engineering
- Data stream mining
- Development of ensemble methods
- Distributed mining
- Tracking drifting concepts

RapidMiner can read data files, and can read and write models, parameter sets, and attribute sets.

RapidMiner offers a number of learning techniques, including:
- Support vector machines (SVM)
- Decision trees
- Rule learners
- Bayesian learners
- Logistic learners
- Association rule mining and clustering
- Meta-learning schemes, including Bayesian Boosting

1. Process Designing Canvas: Here you design mining processes of arbitrary complexity using the building blocks provided in panel #2. Note how the building blocks are pipelined to indicate the dataflow between components.
2. Operators & Repositories: The Operators panel contains hundreds of building blocks organized in categories. There are components for almost everything (data transformations, modeling, evaluation, etc.). The Repositories panel provides access to sample and user-defined datasets and processes.
3. Component Metadata: Provides access to the parameters (metadata) of the block selected in the Designing Canvas. In Figure 2 you can see the parameters of the Decision Tree operator located in the middle of the Designing Canvas.
4. Help: Provides documentation for the block selected in the Designing Canvas (1) or the component selected in the Operators panel (2). The information is always up to date because the content is retrieved from the RapidWiki (the online documentation of RapidMiner).
5. Reporting Area: The Log panel gives feedback on the steps taking place, whereas the Problems panel explains what is going wrong (if anything) and suggests solutions.
6. Overview: Shows an overview of the Designing Canvas (1) and lets you navigate easily to subareas of a large or complex process.

- An SVM is a mathematical entity, or algorithm, that analyses data and helps us discover patterns in it
- SVM provides a learning technique for:
  - Pattern recognition
  - Regression estimation
- The solutions it provides are:
  - Theoretically elegant
  - Computationally efficient
  - Very effective in many large practical problems
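To make the pattern-recognition use concrete, here is a minimal sketch of a linear SVM trained by sub-gradient descent on the regularised hinge loss. It is an illustration only and is not the solver RapidMiner uses; the function names, the hyperparameters `lam`, `lr`, and `epochs`, and the six toy points are assumptions.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2 * |w|^2 + hinge loss, one example at a time."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            if y * (dot(w, x) + b) < 1:
                # Example is inside the margin or misclassified:
                # the hinge term contributes -y * x to the gradient.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Only the regulariser is active: shrink w slightly.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1


# Toy, linearly separable data (labels must be +1 or -1).
points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -1), (-1, -3)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
print([predict(w, b, p) for p in points])
```

The learned `w` and `b` define a separating line; new points are classified by which side of that line they fall on.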

- Face detection
- Object recognition
- Handwritten character/digit recognition
- Speaker/speech recognition
- Image retrieval
- Prediction
- Data condensation

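- Numerical prediction
- Simple linear regression uses only one independent variable, x
- The relationship between x and y is described by a linear function
- Changes in y are assumed to be caused by changes in x
- With one independent variable the model is: y = b0 + b1x + ε
- With p independent variables it generalizes to: y = b0 + b1x1 + b2x2 + ... + bpxp + ε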
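For the one-variable case, the coefficients b0 and b1 can be estimated by ordinary least squares. The sketch below shows the standard closed-form estimates; the function name `fit_line` and the five data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates of b0 (intercept) and b1 (slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Made-up sample data, roughly following y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

b0, b1 = fit_line(xs, ys)
print(round(b0, 2), round(b1, 2))  # prints 1.09 1.97
```

A prediction for a new x is then `b0 + b1 * x`; the difference between that prediction and the observed y is the error term ε.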

[Figure: four scatter plots showing a positive linear relationship, a relationship that is not linear, a negative linear relationship, and no relationship]
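These relationship types can be told apart numerically, at least in part, with the Pearson correlation coefficient r. The sketch below uses made-up data; `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: +1 perfect positive, -1 perfect negative."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [-2, -1, 0, 1, 2]
positive = [2 * x + 1 for x in xs]   # positive linear relationship
negative = [-3 * x + 4 for x in xs]  # negative linear relationship
curved = [x * x for x in xs]         # relationship that is not linear

r_pos = pearson_r(xs, positive)
r_neg = pearson_r(xs, negative)
r_curved = pearson_r(xs, curved)
print(r_pos, r_neg, r_curved)
```

Note that r is close to zero for the curved data even though y depends entirely on x. A value of r near zero therefore cannot separate "not linear" from "no relationship", which is why the scatter plot is still worth inspecting.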

- Error values (ε) are statistically independent
- Error values are normally distributed for any given value of x
- The probability distribution of the errors has constant variance
- The underlying relationship between the x variable and the y variable is linear
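A rough way to probe the independence assumption is to compute the residuals of a fitted line and their Durbin-Watson statistic, which lies between 0 and 4 and is near 2 when consecutive residuals are uncorrelated. The sketch below is only illustrative: the data and the coefficients `b0` and `b1` are hypothetical values assumed to come from an earlier least-squares fit, and five points are far too few for a real diagnostic.

```python
def residuals(xs, ys, b0, b1):
    """Observed y minus the value predicted by the line b0 + b1 * x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def durbin_watson(res):
    """Sum of squared successive differences over sum of squares."""
    num = sum((res[i] - res[i - 1]) ** 2 for i in range(1, len(res)))
    den = sum(e * e for e in res)
    return num / den


# Hypothetical data and hypothetical fitted coefficients.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = 1.09, 1.97

res = residuals(xs, ys, b0, b1)
dw = durbin_watson(res)
print([round(e, 2) for e in res], round(dw, 2))
```

Values of the statistic well below 2 suggest positive correlation between neighbouring errors, and values well above 2 suggest negative correlation.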

- An inductive learning task: uses particular facts to reach more general conclusions
- A predictive model based on a branching series of Boolean tests
- These smaller Boolean tests are less complex than a one-stage classifier
- Useful for classification, prediction, and fitting data
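The "branching series of Boolean tests" can be sketched as a small data structure. This is a hand-built tree for illustration, not a tree learned from data and not RapidMiner's Decision Tree operator; the classes `Leaf` and `Split`, the attribute names, and the labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    label: str


@dataclass
class Split:
    question: Callable[[dict], bool]  # one simple Boolean test
    if_true: "Node"
    if_false: "Node"


Node = Union[Leaf, Split]


def classify(node: Node, example: dict) -> str:
    """Follow the Boolean tests from the root until a leaf is reached."""
    while isinstance(node, Split):
        node = node.if_true if node.question(example) else node.if_false
    return node.label


# Hypothetical hand-built tree over made-up weather attributes.
tree = Split(
    lambda e: e["outlook"] == "sunny",
    Split(lambda e: e["humidity"] > 70, Leaf("stay in"), Leaf("play")),
    Split(lambda e: e["windy"], Leaf("stay in"), Leaf("play")),
)

sunny_humid = {"outlook": "sunny", "humidity": 85, "windy": False}
calm_rain = {"outlook": "rain", "humidity": 85, "windy": False}
print(classify(tree, sunny_humid), "/", classify(tree, calm_rain))
```

Each example answers at most two simple questions here, whereas a one-stage classifier would have to weigh all attributes at once.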
