DATA MINING TEAM #1. Kristen Durst Mark Gillespie Banan Mandura. MBA 664: Database Management


OUTLINE
INTRODUCTION
DATA MINING DEFINITION AND APPLICATIONS
DATA MINING PRODUCTS
DATA MINING PROCESS
DATA MINING TECHNIQUES
DATA MINING EXAMPLE
CONCLUSION
REFERENCES
APPENDIX: FIGURES

INTRODUCTION

The purpose of this paper is to provide a brief overview of data mining and how data mining complements database technology. First, a definition of data mining is provided and some example applications are discussed. Next, a few of the better-known data mining companies are presented along with the software and services they provide. Following the review of data mining products, an approach to the data mining process is discussed along with an overview of a few of the more prominent data mining analysis techniques. Finally, a data mining example is presented that illustrates the data mining process by means of a data collection and statistical approach to a real-world problem. The intent is to give the reader a better feel for the data mining process and how it may be applied in practice.

DATA MINING DEFINITION AND APPLICATIONS

Data mining is an analysis process applied to large amounts of data with the intent of identifying hidden, previously unknown patterns and relationships within the data, thereby enabling the user to draw conclusions and predict future outcomes. Practitioners of data mining are less concerned with determining what has happened than with predicting what will happen. Interest in data mining has grown over the last several years as advances in computer processing and digital data storage have greatly increased the speed with which data can be accessed and processed while reducing the cost and infrastructure required to store the data and the results.

As will be discussed later, data mining does require a process, but in practice that process is not uniform from user to user. However, it will generally include the following three high-level steps:

(a) Description of the data to summarize attributes of the available data
(b) Predictive modeling derived from a portion of the existing data
(c) Verification of the model against the larger domain of data in the real world

Despite the wide interest in and buzzword status of data mining, a user who wishes to implement data mining must recognize what data mining is not and what data mining cannot do.

Data mining is not simply the blind application of a series of algorithms to large sets of data. The data mining analyst must still understand the data and its origins, the business in which the data originated and is used, the analytical methods applied to the data, and the results of that analysis. Furthermore, data mining does not indicate what to do with the data and the results. Only a knowledgeable user of the data will be able to assess the value of the patterns and relationships gleaned from data mining and apply them to make a positive impact on the business.

Data mining can be implemented in any business to aid the analysis and resolution of many kinds of problems; however, its use has been most widely noted in the telecommunications, credit card, financial, and retail industries. For instance, the telecommunications industry has studied data to determine which customers are most likely to churn, that is, to drop their cell phone contracts; the credit card industry is able to detect and track fraudulent use of its services; financial companies are able to predict corporate stock performance; and retailers are able to tailor which products to stock and offer to particular customers.

Unfortunately, the benefits of data mining do not come without a cost, and practitioners must recognize the potential legal and ethical concerns resulting from the widespread application of data mining tools. Of particular concern is the ability to track and identify individual consumer behavior by aggregating data from multiple sources even when the original data was anonymous; this concern has led many corporations to adopt data control policies.

DATA MINING PRODUCTS

A wide range of data mining software and service providers exist in the marketplace today, and they serve a wide range of customers.
According to a 2008 study by the Gartner Group, an information technology research and advisory firm, five of the largest data mining software companies are the following:

ANGOSS SOFTWARE COMPANY
Angoss offers a suite of software tools to perform predictive diagnostics. These tools cover all phases of the data mining process, including profiling, exploration, modeling, implementation, scoring, and validation. Key software tools include KnowledgeSEEKER for profiling and visualization; KnowledgeSTUDIO, a decision-tree-based tool for predictive analytics; and StrategyBUILDER, a tool that combines analysis results into business rules.

INFOR GLOBAL SOLUTIONS
Infor Global Solutions is the world's third largest software company and has acquired a wide range of software applications, including Infor CRM Epiphany, an integrated software tool that performs marketing, sales, and service analytics.

PORTRAIT SOFTWARE
Portrait Software provides a suite of marketing analysis tools to support marketing, service, and selling activities. Its products perform marketing automation as well as predictive analytics. Quadstone Analytics is one of its predictive modeling tools, and it employs various techniques including decision trees, regression, additive scorecards, clustering, and uplift modeling.

SAS INSTITUTE
SAS is a leader in the data mining community and provides tools and solutions to a broad range of customers. SAS Enterprise Miner and SAS Analytics offer customers access to a multitude of methods and techniques for statistical analysis, data visualization, forecasting, and model management and deployment. (SAS was originally an acronym for Statistical Analysis System.)

SPSS INC
SPSS Inc. provides a range of products in four families that allow customers to perform Data Collection, Modeling, Statistical Analysis, and Deployment. These tools can be integrated with Clementine, a data mining workbench that uses a wide range of data mining techniques. (The name SPSS is derived from Statistical Package for the Social Sciences.)

DATA MINING PROCESS

A formal, uniformly accepted methodology for the process of data mining does not truly exist. However, a 2002 survey by KDnuggets.com, a leading web-based data mining resource, indicated that 51% of the 189 respondents follow CRISP-DM (CRoss Industry Standard Process for Data Mining), a methodology developed and advocated by SPSS. Another 12% of respondents reported that they apply the tools described by SAS's SEMMA approach. The remaining 38% of those taking the survey indicated that they follow their own methodology, a methodology devised by their employer, or none at all.

Despite the apparent lack of a uniform process for data mining, all approaches will likely incorporate activities to accomplish the tasks of (1) problem definition, (2) data collection, (3) data review, (4) data conditioning, (5) model building, (6) model evaluation, and (7) documentation and deployment. Because SPSS and SAS are known leaders in the data mining community, SPSS's CRISP-DM method and SAS's SEMMA approach are discussed in more detail below. Although these approaches do not explicitly call out the seven activities just described, those activities are embedded within both, and they will likely be incorporated into any successful data mining approach.

CRISP-DM (CRoss Industry Standard Process for Data Mining)

CRISP-DM was conceived in 1996 by a consortium consisting of DaimlerChrysler, SPSS, and NCR. The intent was to develop a data mining approach that was not specific to any particular industry, application, or analysis tool. With funding from the European Commission, the consortium conducted a workshop and, upon finding general agreement on the need for a data mining template, CRISP-DM was born.

CRISP-DM is a hierarchical process model that consists of a set of tasks with various degrees of definition. The top level of the hierarchy is the Phase.
Each Phase consists of generic tasks, the second level of the hierarchy. The tasks are generic in order to maintain the neutrality of the process, and they are intended to be both complete (applicable to the entire process) and stable (tolerant of new and unplanned developments). Specialized tasks form the third level, and these are designed for the unique, particular nature of the problems to be solved. Finally, records of actions, decisions, and results form the fourth and final level of the CRISP-DM hierarchy. The data mining context determines the mapping from the generic levels (levels 1 and 2) to the more specific levels (levels 3 and 4).

Moreover, CRISP-DM is described by a six-phase reference model that flows in a particular sequence but does not require the user to follow the phases in a fixed path. Users will likely need to move back and forth iteratively between phases as individual phase results come into focus, and the CRISP-DM methodology accommodates that need. Finally, CRISP-DM is designed to be cyclical, with the understanding that the data mining activity may not end once a solution is derived. New questions and problems are likely to be identified from the solution, and these may demand a continuous flow of follow-on activity. The six-phase CRISP-DM cyclical model is briefly described below.

Phase 1, Business Understanding: The purpose of Business Understanding is to assess the objectives and requirements of the business and articulate these needs as a specific problem or problems the business wishes to solve.

Phase 2, Data Understanding: Data Understanding consists of preliminary data collection along with the assessment of any insights into the data and any data quality issues. Potential data segregation may occur and preliminary hypotheses may be formed.

Phase 3, Data Preparation: In Data Preparation, data quality issues are resolved and the final data set for analysis is generated. Any required data transformations and data cleansing are completed. Multiple iterations may be required.

Phase 4, Modeling: The methodology is neutral with respect to the various data modeling approaches. Multiple modeling choices may be reviewed and tailored to the specific problem and available data, and multiple techniques may be applied. If the desired modeling technique requires specific data conditions, a return to Phase 3 may be required.

Phase 5, Evaluation: In Phase 5, the completed model is validated for sufficient quality. If the quality of the model is lacking or it fails to meet the needs of the business, a review and a return to Phase 4 may be necessary.

Phase 6, Deployment: In Deployment, the data and model are organized and presented to the customer for use. Data visualization is critical, as is documentation of all detailed process steps and their results.

SEMMA (Sample, Explore, Modify, Model, Assess)

SAS maintains that SEMMA is not so much a data mining methodology as a set of tools, deployed within its SAS Enterprise Miner software, that can be integrated into any data mining method. Under SEMMA it is the user's responsibility to define the business problem to be solved and to acquire and condition the data appropriately. The SEMMA focus is on model development. A brief description of the five elements of SEMMA follows:

SAMPLE: The Sample activity consists of extracting a statistically representative data set from the larger data domain. The sample must adequately represent the larger data set but be small enough for ease of manipulation. The data may be partitioned to facilitate model training, validation, and test.

EXPLORE: In the Explore activity, different views and plots of the data are generated and trends or unusual data instances are discovered. Additionally, traditional statistical analysis tools or data mining techniques may be employed to identify any data subgroups.

MODIFY: Modification consists of the creation, selection, and transformation of data in preparation for the modeling activity. New variables or groups may be defined, and any outliers (data points resulting from special-cause variation) may be eliminated. The data set is updated accordingly.

MODEL: Modeling allows the user to fit the data using a wide variety of modeling techniques and to predict outcomes as required by the overarching business need. Techniques that may be applied include neural nets, decision trees, logistic regression, and k-nearest neighbor, to name a few.

ASSESS: Finally, in assessment, the model is evaluated for usefulness in solving the articulated business problem and validated against a subset of the data reserved for that purpose. As in all data mining approaches, the model is checked for overfitting to ensure it is not tuned so tightly to the model development subset that it cannot adequately predict outcomes from other data sets.

From the brief assessment of CRISP-DM and SEMMA above, it is clear that there is commonality of activity in any data mining approach even if the terminology and articulation of methods differ. As indicated at the start of this section, any good data mining approach will include the tasks of (1) problem definition, (2) data collection, (3) data review, (4) data conditioning, (5) model building, (6) model evaluation, and (7) documentation and deployment.

DATA MINING TECHNIQUES

STATISTICAL METHODS

The data mining techniques most people are familiar with are statistical methods such as sample statistics and linear regression. These are usually used for very simple problems that have very few predictive variables. If the problem is more complex, another method is more appropriate.

Sample statistics involve looking at particular variables and calculating the minimum value, maximum value, mean, median, and variance. For example, a retail store could analyze its sales data and compute summary statistics for particular products over the previous quarter. The store can quickly draw conclusions about particular product lines and, if it finds something unexpected or interesting, mine further using more complex methods.

Linear regression is an easy way to predict values based on a simple equation. When many interactions between variables are needed to find the correct model, linear regression is not appropriate. For simple situations, however, it can be a very powerful method.
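Both statistical methods described in this section are simple enough to sketch with Python's standard library. The sketch below is illustrative only: the function names `summarize` and `fit_line` and all of the data values are hypothetical and do not come from the paper. It computes the five summary statistics named above and fits a least-squares line of the form y = intercept + slope * x.

```python
from statistics import mean, median, variance


def summarize(values):
    """Return the summary statistics named in the text for one variable."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "variance": variance(values),  # sample variance (n - 1 denominator)
    }


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical unit sales for one product over five periods.
sales = [120, 135, 150, 110, 160]
stats = summarize(sales)

# Hypothetical (x, y) pairs chosen to lie exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
intercept, slope = fit_line(xs, ys)

# Predict y for a new x by plugging it into the fitted equation.
predicted = intercept + slope * 5
```

Real data will not lie exactly on a line, so in practice the fit should be checked (for example, by inspecting the residuals) before the equation is used for prediction.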
For example, Company ABC has data on its customers' income levels and its sales. As shown in Figure 4, as a customer's income increases, the customer's total purchase amount increases. A line is fit through the data to minimize the error between the data points and the line. The line then becomes an equation:

Total Purchase Amount = y-intercept + (Slope * Customer's Income)

ABC can predict the sales amount for a customer by putting that customer's income level into the equation. Also, since ABC knows the relationship between income and sales, it can restrict its marketing to certain income levels.

NEAREST NEIGHBOR FOR PREDICTION

Nearest neighbor for prediction is a very easy data mining technique to understand. It rests on the idea that you can predict an outcome, or how something is going to behave, from the outcomes of existing cases whose predictive variables are close to its own. An everyday example of how this is used is in real estate. When someone buys a house, the realtor will check what other houses in the area sold for, because this is a good predictor of what the house for sale should be worth. This technique works best when there are only a few predictive variables.

A simple example of how a business might use this technique is a company with a product it wants to start selling in a new city. Company XYZ wants to estimate how many units will sell so it can determine whether it is worth moving into the new market. XYZ has a database of current sales data for each city where the product is already sold. The predictive variables are the population of the city and the distance from the city to where the competitor's product is sold. As shown in Figure 5, each city is represented by a letter corresponding to one of three categories of units sold: >200 units, 100 to 200 units, and <100 units. These markers are placed on the graph according to the population of the city and its distance from the competitor.
A U marker represents the new city, where the amount of product sold is unknown and is to be predicted. Using the nearest neighbor for prediction method, the U marker is nearest to more cities in the A sales category than in any other sales category. We can therefore predict that this city will behave in the same way and will have sales greater than 200 units. XYZ should plan on extending its product line to this market given the high prediction of sales.

With the nearest neighbor for prediction method it is also possible to estimate how confident the company can be in its prediction. If the prediction variables are extremely close to those of the neighbors, there is a higher level of confidence. If no neighbors are close, a prediction can still be made, but with very little confidence. This is extremely valuable because a company would not want to follow through on a major investment based on a prediction that has a low level of confidence.

NEURAL NETWORK

A neural network is a much more complex data mining technique. Its benefits are that it can use extremely large numbers of predictive variables, that once the network has been created and confirmed as successful it can be used again and again, and that it can be used in many different types of situations. Its disadvantages are that the outcomes are not very easy to interpret and that it can be very time consuming to get the data into the right format for running the model.

A neural network is a complex computer model that takes input variables and outputs a solution. Neural networks include an input layer, a hidden layer, and an output layer. The input layer consists of the predictive variables that go into the model. The hidden layer is created by the computer model and is not seen by the user. The output layer is the end prediction calculated by the model. All of the variables that go into the neural network have to be converted into numeric variables with values between 0 and 1.

Company XYZ could use a simple neural network to predict the same scenario that was used for the nearest neighbor for prediction method. The database containing the population of each city, the distance from the city to where the competitor's product is sold, and the product sales would be used to create the model. The computer would use the population of the city and the distance from the competition for the input layer and then go through a training phase. During the training phase the computer assigns various weights to each of the variables and outputs a number that represents the predicted product sales. This number will be between 0 and 1 and must be interpreted in terms of the actual sales ranges in the database. For example, an output below a lower cutoff means the product sales are less than 100 units, an output between the lower and upper cutoffs means the product sales are between 100 and 200 units, and an output above the upper cutoff means the product sales are greater than 200 units. The computer keeps comparing its output against the actual data and adjusting the weights as needed to create the best model for computing the predicted product sales. As shown in Figure 6, once the model has finished training on the data, the new city can be entered into the model and the predicted product sales can be computed. This model has an output of 0.736, so the product sales are predicted to be greater than 200 units.

CLUSTERING/SEGMENTING

Clustering or segmenting is a data mining technique in which nothing specific is predicted. Instead, the technique forms groups whose members are similar to one another and very different from the members of other groups. This can give a good overall view of the data and what is going on in the business. For example, if Company MNO has a database of demographic information on its customers and their buying habits, segmenting can be used to find buying patterns based on that demographic data. As shown in Figure 7, male consumers under the age of forty behave alike, and their behavior is drastically different from that of female consumers over the age of forty. For this particular example the data can be grouped by gender, by age, or by both gender and age. These groupings can then be used for different marketing campaigns, with marketing techniques tailored to each group. Not all the variables in the database will be used for clustering or segmentation, and the user will need to remove those that do not make meaningful sense.

Clustering can also be used to identify potential problems by finding outliers. For example, through clustering, Company MNO determined that it has a much higher sales volume for snowboards in stores located within fifty miles of a ski resort. However, it found one store with a low volume of sales even though the store is only twenty-four miles from a ski resort. With some more research, Company MNO realized that sales were down in that store because the area had become saturated with competitors. With this new information the company decided to pull its store out of the area because it cannot compete with the larger stores.
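The paper does not name a specific clustering algorithm. As one possible illustration, the sketch below uses k-means, a common choice, which is an assumption here and not the method behind Figure 7. The `kmeans` function, the starting centroids, and the customer data (age, monthly spend) are all hypothetical.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: keep the old centroid if a cluster ends up empty.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: (age in years, monthly spend in dollars).
customers = [(22, 80), (25, 90), (31, 85), (52, 20), (58, 30), (61, 25)]

# Hypothetical starting centroids: the first and last customer.
labels, centers = kmeans(customers, [customers[0], customers[-1]])
```

With these values the younger, higher-spending customers fall into one group and the older, lower-spending customers into the other. When variables are on very different scales, they are usually rescaled first so that no single variable dominates the distance calculation.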
DECISION TREE

A decision tree is a predictive model that groups classification variables into a tree. Each branch represents a classification group that has been divided. The decision tree splits the data into groups by examining all of the data and first picking the variable that produces the greatest split between categories. Each category can then continue to be split at each level until there are no more logical splits to be made. Decision trees are designed to handle categorical data, but numeric data can be converted into categories for use in the tree. The advantages of decision trees are that they can be very easy to interpret, that little is involved in getting the data ready to process, and that they can be used in a variety of situations. One disadvantage is that for simpler problems this method can be more time consuming than linear regression.

Recently a decision tree was used at a consumer products company to help determine why a consumer study did not produce the results that were predicted. The conclusions and numbers are the same, but the product and categories have been changed to protect confidentiality. A study was designed to test how well consumers liked a change in the design of a chair cushion. Each consumer ranked how comfortable the chair was on a scale of zero to five, in increments of 0.5, with zero being extremely uncomfortable and five being extremely comfortable. Then a second chair was presented to the consumer, who scored that chair as well. It was predicted that the second chair would score higher, and the average increase in score from chair one to chair two was calculated across all the consumers. As shown in Figure 8, the change was minimal, and a decision tree was constructed to provide insight into the reason why. The largest split in what influenced the average change in score was the baseline score, that is, the score for chair one. If the score for chair one was below 3.75, the average change in score was an increase. If the score for chair one was greater than 3.75, the average change in score was a decrease. The company continued to split the tree to see what was influencing the scores further down, but the true value of the tree was in the first split.
The consumers who started with a high score did not have much room for improvement and actually averaged a decrease. The consumers who started with a low score did improve their score, as predicted. From this information it was determined that the study was designed incorrectly and that the company should have recruited only consumers who used more of the scale in their evaluation of the first chair.

DATA MINING EXAMPLE

A brief data mining example follows. Although this example did not specifically follow one of the more widely accepted data mining methodologies or use any of the sophisticated data mining modeling techniques, it does illustrate the principles of data mining: large amounts of data were extracted from multiple databases, and a meaningful model was generated to describe a business process and ultimately change behavior.

In this example, the business problem was assessing and improving the on time delivery of New Product Introduction (NPI) hardware. NPI hardware is defined as fabricated assemblies that are components of a larger machine assembly. The delivery problem stems from the fact that the very first set of NPI hardware, required for the assembly of the very first machine, is often fabricated late relative to the required due date. For the study at hand, the actual delivery of the first set of hardware was more than 25 days late relative to the customer orders, with an interquartile range of 35 days.

To initiate the analysis, a fishbone diagram was drawn up to assess potential causes of late delivery, and a data collection plan was generated. The data collection plan identified data for extraction from two different databases, the engineering product definition database and the manufacturing database. The data extracted from these two databases was merged into a single data set for analysis. Following an evaluation of the process capability for on time delivery, an initial review of the data indicated two distinct subgroups: assemblies described as brackets and assemblies described as not brackets. Analysis then proceeded on the distinct subgroups, and new data was generated to describe design and manufacturing subprocesses based on the extraction of time-based event data from the databases.

Preliminary plots of the data were generated, and initial models were attempted using linear regression and general linear model approaches. Unfortunately, no single regression was able to adequately describe the data.
Finally, an attempt was made to categorize some of the sub-process times by percentiles and plot the data against on time delivery as a main effects plot. This plot revealed that manufacturing activity was related to on time delivery but design activity was not. In fact, the main effects plot indicated that parts designed closer to the due date actually had better on time delivery than parts designed further from the due date. Thus it was clear that no single regression of design and manufacturing data could adequately describe the process.

From this new understanding of the data, an attempt was made to fit only the manufacturing data to on time delivery, and an acceptable regression was found. Similarly, an attempt was made to fit the design data to one of the manufacturing variables, the creation of the part identity in the manufacturing database, and again a suitable regression was found. These two regressions were combined through a common term to form a single regression equation. This equation, while not fitting the on time delivery data very well, as evidenced by a poor R-squared value, did produce an accurate representation of the on time delivery distribution. Thus, while the combined regression model could not predict how late a particular part would be, it could, based on the design release distribution, predict what percentage of parts would be late and when all the parts would be available for assembly of the first machine. Moreover, this model showed that the greatest contributor to the variation in on time delivery was the variation in design lead time.

Using the derived regression equation along with the known distributions for the input parameters and the desired on time delivery distribution, a Monte Carlo simulation was employed to calculate the cumulative distribution of design releases required to ensure on time delivery. This cumulative distribution was compared to previous design release plans, and those plans were shown to be faulty.

As a result of this study, design plans were updated to reflect the learning and the required design release schedules. A data visualization tool was developed to extract data from the design database and plot the design release requirements against the design release plan and the actual design releases. This tool made it possible to assess the ability to meet on time delivery before all designs were released and to take corrective action in advance of the hardware due date.
Furthermore, a data tool was developed to extract and format manufacturing data (Bill of Material, manufacturing status, quantity on hand, and manufacturing work order information) for coordination of the design and manufacturing processes and for the rapid assessment of manufacturing status for all assemblies required to build a particular machine.

Admittedly, this example is not a direct implementation of one of the data mining methodologies described in the paper, nor does it illustrate any of the elaborate data modeling techniques that might be used on larger, more complex data sets. However, it does illustrate the basic principles of the data mining process (defining a business need, data collection, data review, data conditioning, model generation, model evaluation, and documentation and deployment) and the use of data and data modeling to change behavior and solve a business need.
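The Monte Carlo step in this example can be sketched in a few lines. The sketch below is only an illustration of the general idea (draw the inputs from their distributions, push each draw through an equation, and read off the resulting distribution). The coefficients, the normal distributions, and the function names are all hypothetical placeholders; they are not the regression or the data from the study.

```python
import random

# Hypothetical linear equation: days_late = B0 + B_DESIGN * d + B_MFG * m
B0 = 0.0
B_DESIGN = -1.0  # each day of earlier design release removes a day of lateness
B_MFG = 1.0      # each day of manufacturing adds a day of lateness


def simulate_days_late(n_trials, seed=0):
    """Return simulated days late (negative means early) for n_trials parts."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    results = []
    for _ in range(n_trials):
        # Hypothetical input distributions, in days.
        release_before_due = rng.gauss(60, 20)
        manufacturing_span = rng.gauss(45, 10)
        results.append(
            B0 + B_DESIGN * release_before_due + B_MFG * manufacturing_span
        )
    return results


def fraction_late(days_late):
    """Share of simulated parts delivered after the due date."""
    return sum(1 for d in days_late if d > 0) / len(days_late)


trials = simulate_days_late(10_000, seed=1)
share_late = fraction_late(trials)
```

Sorting `trials` gives an empirical cumulative distribution of delivery against the due date. The study worked in the opposite direction, from the desired delivery distribution back to the required design release distribution, which this sketch does not attempt to reproduce.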

CONCLUSION

This paper has provided an overview of data mining and has included a data mining definition, a review of where data mining may be applied, a summary of data mining products, an assessment of the data mining process, and a synopsis of some of the major data mining techniques. The paper concludes with an example project that attempts to illustrate the data mining process and how it may be used to solve a real world problem.

REFERENCES

Angoss Software. Angoss Software Corporation, Inc. Last visited 7 APR 09.
Chapman, Pete et al. CRISP-DM. Retrieved from < Step-by-Step_Data_Mining_Guide.pdf> 5 APR 09.
Infor Solutions. Infor Global Solutions. Last visited 7 APR 09.
Poll: What Main Methodology Are You Using for Data Mining. JUL KDnuggets. Last visited 5 APR 09.
Portrait Software Solutions. Portrait Software plc. Last visited 7 APR 09.
SAS Enterprise Miner. SAS Institute, Inc. Last visited 5 APR 09.
SAS Products and Solutions. SAS Institute, Inc. Last visited 7 APR 09.
SPSS Software. SPSS, Inc. Last visited 7 APR 09.
Two Crows Corporation. Introduction to Data Mining and Knowledge Discovery, Third Edition. Retrieved 25 FEB 09.

Figure 1: KDnuggets.com, 2002 Survey; Data Mining Process
Figure 2: CRISP-DM Breakdown

Figure 3: CRISP-DM Phases and Flow
Figure 4: Linear Regression Technique

Figure 5: Nearest Neighbor Technique
Figure 6: Neural Net Technique

Figure 7: Clustering / Segmenting Technique
Figure 8: Decision Tree Technique

Figure 9: Example On Time Delivery Problem Statement (plot labels: On Time Delivery; Probability; Delivery Actual - fit; Delivery Required)
Figure 10: Example On Time Delivery Data Collection (panels: Brainstorm Variation Sources; Data Collection Plan)

TOTAL LEAD TIME by Part Type: p < .05
Level       N     Mean    StDev
BRACKET     520   x6.76   x3.14
DUCT        138   x6.70   x0.40
MANIFOLD     44   x9.95   x4.68
TUBE         47   x3.60   x2.79
Pooled StDev =
Figure 11: Example Data Segmentation

Figure 12.1: Example Model Building (Main Effects Plot - Data Means for SHIP-DUE; variable labels: SHIP_DUE, IR CREATE, BOM CREATE, BOMC_MODC, BOMC_MODP, BOMC_MODI, MODC_DUE, MODI_DUE, BOMC_DUE, MODI_MODC, MODEL_CR, BOM_CR-D, SCHED_ST, MO_FINIS, MOD_ISSU, MAN-DUE, MO_START)

SHIP-DUE = *(MODEL_CR-DUE) *(CR-ISS) *(MAN_BOMC) *(SCH_ST-MAN) *(MOS_MOFIN) [R^2A 4.4%] {R^2A(1) 76.5%, R^2A(2) 68.0%}
Figure 12.2: Example Model Building (timeline from Model Create through MO Finish to SHIP DATE and DUE DATE; annotations: X make more negative, Y make smaller, X make smaller; percentages shown: 52.8%, 3.5%, 28.3%, 7.1%, 8.4%)
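The pooled standard deviation listed with Figure 11 combines the spread of the individual part types into a single estimate. As a minimal sketch of that calculation, assuming the usual degrees-of-freedom weighting used in one-way analysis of variance, the Python below works from per-group counts and standard deviations. The function name `pooled_stdev` and the input numbers are hypothetical and are not the values from the paper's data set, whose leading digits are masked in the figure.

```python
import math


def pooled_stdev(groups):
    """Return the pooled standard deviation for (n, stdev) pairs, one pair per group."""
    # Weight each group's variance by its degrees of freedom (n - 1).
    weighted_variance = sum((n - 1) * s ** 2 for n, s in groups)
    degrees_of_freedom = sum(n - 1 for n, _ in groups)
    if degrees_of_freedom <= 0:
        raise ValueError("need at least one group with more than one observation")
    return math.sqrt(weighted_variance / degrees_of_freedom)


# Hypothetical (n, stdev) pairs for illustration only; NOT the Figure 11 values.
example_groups = [(10, 2.0), (20, 4.0)]
print(pooled_stdev(example_groups))
```

Because larger groups carry more weight, the pooled value sits closer to the standard deviation of the group with the most observations.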

Figure 13: Example Model Evaluation (Overlay Chart; plot labels: Actual Delivery; Predicted Delivery (Regression); Probability; SHIP DUE MODEL; SHIP DUE ACTUAL)
Figure 14: Example Model Evaluation (Overlay Chart; plot labels: Issue Required for On-Time Delivery; Issue Actual; Probability; MODI ACT; modi calc new)

Figure 15: Example Deployment (BRACKETS SUMMARY chart; plot labels: Plan Requirements; Actual; Number of Parts; Cumulative Percent; Date; Days; All Due Dates; series: CUM Req Issue, CUM Plan Issue, CUM Actual Issue, OLD PLAN, NEW PLAN, REQUIRED; warnings panel "1.1 BRACKET PLANNING": # Issued No PRE, # Issued Post Due, # Multiple Issued Files, # Complex Not Planned Early, # Complex Not Issued Early)


More information

Knowledge Modelling and Management. Part B (9)

Knowledge Modelling and Management. Part B (9) Knowledge Modelling and Management Part B (9) Yun-Heh Chen-Burger http://www.aiai.ed.ac.uk/~jessicac/project/kmm 1 A Brief Introduction to Business Intelligence 2 What is Business Intelligence? Business

More information

Mn/DOT Market Research Reporting General Guidelines for Qualitative and Quantitative Market Research Reports Revised: August 2, 2011

Mn/DOT Market Research Reporting General Guidelines for Qualitative and Quantitative Market Research Reports Revised: August 2, 2011 Mn/DOT Market Research Reporting General Guidelines for Qualitative and Quantitative Market Research Reports Revised: August 2, 2011 The following guidelines have been developed to help our vendors understand

More information

Usability Report for Online Writing Portfolio

Usability Report for Online Writing Portfolio Usability Report for Online Writing Portfolio October 30, 2012 WR 305.01 Written By: Kelsey Carper I pledge on my honor that I have not given or received any unauthorized assistance in the completion of

More information

CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1. Daphne Skipper, Augusta University (2016)

CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1. Daphne Skipper, Augusta University (2016) CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1 Daphne Skipper, Augusta University (2016) 1. Stem-and-Leaf Graphs, Line Graphs, and Bar Graphs The distribution of data is

More information

Measures of Central Tendency

Measures of Central Tendency Page of 6 Measures of Central Tendency A measure of central tendency is a value used to represent the typical or average value in a data set. The Mean The sum of all data values divided by the number of

More information

Data Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha

Data Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha Data Preprocessing S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha 1 Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking

More information

Big Data Analytics The Data Mining process. Roger Bohn March. 2016

Big Data Analytics The Data Mining process. Roger Bohn March. 2016 1 Big Data Analytics The Data Mining process Roger Bohn March. 2016 Office hours HK thursday5 to 6 in the library 3115 If trouble, email or Slack private message. RB Wed. 2 to 3:30 in my office Some material

More information

CHAPTER 2 DESCRIPTIVE STATISTICS

CHAPTER 2 DESCRIPTIVE STATISTICS CHAPTER 2 DESCRIPTIVE STATISTICS 1. Stem-and-Leaf Graphs, Line Graphs, and Bar Graphs The distribution of data is how the data is spread or distributed over the range of the data values. This is one of

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Computer Science 591Y Department of Computer Science University of Massachusetts Amherst February 3, 2005 Topics Tasks (Definition, example, and notes) Classification

More information

Building Better Parametric Cost Models

Building Better Parametric Cost Models Building Better Parametric Cost Models Based on the PMI PMBOK Guide Fourth Edition 37 IPDI has been reviewed and approved as a provider of project management training by the Project Management Institute

More information

COMP 465 Special Topics: Data Mining

COMP 465 Special Topics: Data Mining COMP 465 Special Topics: Data Mining Introduction & Course Overview 1 Course Page & Class Schedule http://cs.rhodes.edu/welshc/comp465_s15/ What s there? Course info Course schedule Lecture media (slides,

More information

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski Data Analysis and Solver Plugins for KSpread USER S MANUAL Tomasz Maliszewski tmaliszewski@wp.pl Table of Content CHAPTER 1: INTRODUCTION... 3 1.1. ABOUT DATA ANALYSIS PLUGIN... 3 1.3. ABOUT SOLVER PLUGIN...

More information

Orange Juice data. Emanuele Taufer. 4/12/2018 Orange Juice data (1)

Orange Juice data. Emanuele Taufer. 4/12/2018 Orange Juice data (1) Orange Juice data Emanuele Taufer file:///c:/users/emanuele.taufer/google%20drive/2%20corsi/5%20qmma%20-%20mim/0%20labs/l10-oj-data.html#(1) 1/31 Orange Juice Data The data contain weekly sales of refrigerated

More information

RightNow Technologies Best Practices Implementation Guide. RightNow Technologies, Inc.

RightNow Technologies Best Practices Implementation Guide. RightNow Technologies, Inc. RightNow Technologies Best Practices Implementation Guide RightNow Technologies, Inc. www.rightnow.com http://rightnow.custhelp.com Welcome Welcome to the RightNow Technologies Best Practice Implementation

More information