An Introduction to Data Mining in Institutional Research Dr. Thulasi Kumar Director of Institutional Research University of Northern Iowa
AIR/SPSS Professional Development Series: Background. Covering a variety of topics. Up-to-date information on www.airweb.org
Common Questions 1. Will I be able to get copies of the slides after the event? Yes. 2. Is this web seminar being taped so I or others can view it after the fact? Yes. 3. Can I ask questions during this event? Yes. Copyright 2003-4, SPSS Inc.
Today's Agenda: Data Mining Overview; History; How it compares to other analytic techniques; Phases in the Data Mining Process; Applications of Data Mining in Institutional Research; Data Mining solutions; Question and Answer
The Evolution of Data Analysis
Evolutionary Step | Business Question | Enabling Technologies | Product Providers | Characteristics
Data Collection (1960s) | "What was my total revenue in the last five years?" | Computers, tapes, disks | IBM, CDC | Retrospective, static data delivery
Data Access (1980s) | "What were unit sales in New England last March?" | Relational databases (RDBMS), Structured Query Language (SQL), ODBC | Oracle, Sybase, Informix, IBM, Microsoft | Retrospective, dynamic data delivery at record level
Data Warehousing & Decision Support (1990s) | "What were unit sales in New England last March? Drill down to Boston." | On-line analytic processing (OLAP), multidimensional databases, data warehouses | SPSS, Comshare, Arbor, Cognos, Microstrategy, NCR | Retrospective, dynamic data delivery at multiple levels
Data Mining (Emerging Today) | "What's likely to happen to Boston unit sales next month? Why?" | Advanced algorithms, multiprocessor computers, massive databases | SPSS/Clementine, Lockheed, IBM, SGI, SAS, NCR, Oracle, numerous startups | Prospective, proactive information delivery
Source: SPSS BI
What is Data Mining? The process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data stored in repositories and by using pattern recognition technologies as well as statistical and mathematical techniques (The Gartner Group). The exploration and analysis of large quantities of data in order to discover meaningful patterns and rules (Berry and Linoff). The nontrivial extraction of implicit, previously unknown, and potentially useful information from data (Frawley, Piatetsky-Shapiro, and Matheus).
Differences between Statistics and Data Mining
STATISTICS | DATA MINING
Confirmative | Explorative
Small data sets/file-based | Large data sets/databases
Small number of variables | Large number of variables
Deductive | Inductive
Numeric data | Numeric and non-numeric
Clean data | Data cleaning
Paradigm Shift Traditional IR Work: Data file => Descriptive/Regression Analysis => Tabulations/Reports Historical Predictive Data Mining-Driven IR Work: Database => Data Mining (Visualization, Association, Clustering, Predictive Modeling) => Immediate Actions Historical Predictive Source: Jing Luan, Cabrillo College, CA
Data Mining is not: OLAP; Data Warehousing; Data Visualization; SQL; Ad Hoc Queries; Reporting
Data Mining Roots and Algorithms
Statistics: distributions, mathematics, etc.
Machine Learning: computer science, heuristics and induction algorithms
Artificial Intelligence: emulating human intelligence
Neural Networks: biological models, psychology and engineering
Data Mining is
Predictive Modeling: Linear/Logistic Regression; Neural Networks; Decision Trees
Clustering: Kohonen Neural Networks Clustering; K-Means Clustering; Nearest Neighbor Clustering
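To make the K-Means entry concrete, here is a minimal sketch in plain Python. The student records (GPA, credit hours per term) and the two starting centroids are invented for illustration; they are not from the webcast. In practice the features would usually be standardized first, because credit hours dominate the distance here.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move the centroids."""
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        assignments = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: nothing moved
        centroids = new_centroids
    return assignments, centroids

# Hypothetical students: (GPA, credit hours attempted per term)
students = [(3.8, 16), (3.6, 15), (3.9, 17), (2.1, 9), (2.4, 8), (1.9, 10)]
assignments, centroids = kmeans(students, centroids=[(4.0, 18), (2.0, 8)])
print(assignments)
print(centroids)
```

The two groups that come out are only as meaningful as the variables that go in; deciding what the segments mean is left to the analyst.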
Data Mining is (cont'd) Segmentation Decision Trees Neural Networks Predictive Modeling Affinity Analysis Association Rule Sequence Generators
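Affinity analysis with association rules can be sketched in a few lines. The example below counts which pairs of courses appear together and reports support and confidence for each rule. The course codes, the transcripts, and the 0.4 support threshold are all hypothetical, and only pairs are considered, not larger itemsets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical course "baskets": the set of courses each student has taken.
transcripts = [
    {"MATH101", "STAT200", "CS110"},
    {"MATH101", "STAT200"},
    {"MATH101", "CS110"},
    {"STAT200", "CS110"},
    {"MATH101", "STAT200", "ENGL105"},
]

def pair_rules(baskets, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for frequent course pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of students who took both courses
        if support < min_support:
            continue
        # Confidence of "a => b": of the students who took a, the share who also took b.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Highest confidence first; ties broken alphabetically for a stable order.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

rules = pair_rules(transcripts)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} => {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

Support says how common a pairing is overall, while confidence says how reliably one course signals the other. A rule is usually worth acting on only when both are reasonably high.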
Phases in the DM Process: CRISP-DM. Business Understanding; Data Understanding; Data Preparation; Modeling; Evaluation; Deployment. www.crisp-dm.org
CRISP-DM
Business Understanding: understand project objectives and identify the data mining problem
Data Understanding: capture, understand, and explore your data for quality issues
Data Preparation: clean data, merge data, derive attributes, etc.
Modeling: select the data mining techniques and build the model
Evaluation: evaluate the results and approve models
Deployment: put models into practice; plan monitoring and maintenance
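The Data Preparation phase (cleaning, merging, deriving attributes) might look like the sketch below. The two extracts, their field names, and the derived `in_state` flag (which assumes an Iowa institution) are hypothetical; a real project would read these tables from the student information system.

```python
# Hypothetical raw extracts; field names and values are illustrative only.
students = [
    {"id": "1001", "hs_gpa": "3.4", "state": "ia"},
    {"id": "1002", "hs_gpa": "", "state": "IA "},
    {"id": "1003", "hs_gpa": "2.9", "state": "MN"},
]
enrollments = [
    {"id": "1001", "term": "2003F", "credits": 15},
    {"id": "1001", "term": "2004S", "credits": 12},
    {"id": "1003", "term": "2003F", "credits": 9},
]

def prepare(students, enrollments):
    """Clean student rows, merge in enrollment totals, and derive modeling attributes."""
    # Merge step: roll enrollments up to one total per student id.
    totals = {}
    for row in enrollments:
        t = totals.setdefault(row["id"], {"credits": 0, "terms": 0})
        t["credits"] += row["credits"]
        t["terms"] += 1

    prepared = []
    for s in students:
        t = totals.get(s["id"], {"credits": 0, "terms": 0})
        state = s["state"].strip().upper()  # clean: trim whitespace, fix case
        prepared.append({
            "id": s["id"],
            # Clean: keep a blank GPA as missing (None) instead of guessing a value.
            "hs_gpa": float(s["hs_gpa"]) if s["hs_gpa"].strip() else None,
            "state": state,
            "total_credits": t["credits"],
            # Derive: average load per enrolled term (0.0 if never enrolled).
            "avg_credits_per_term": t["credits"] / t["terms"] if t["terms"] else 0.0,
            "in_state": state == "IA",
        })
    return prepared

table = prepare(students, enrollments)
for row in table:
    print(row)
```

Keeping missing values explicit, rather than filling them silently, leaves the choice of how to handle them to the Modeling phase.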
Data at the heart of the Predictive Enterprise
Interaction data: offers, results, context, click streams, notes
Attitudinal data: opinions, preferences, needs, desires
Descriptive data: attributes, characteristics, self-declared info, (geo)demographics
Behavioral data: orders, transactions, payment history, usage history
Source: SPSS BI
Data Mining Applications Institutional Effectiveness Which students make greatest use of institutional services? What courses provide high full-time equivalent students (FTES) and allow better use of space? What are the patterns in course taking? What courses tend to be taken as a group?
Data Mining Applications (cont'd) Enrollment Management Who are our best students? Where do our students come from? Who is most likely to return for another semester? Who is most likely to fail or drop out?
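A question such as "Who is most likely to return for another semester?" is commonly framed as a classification problem. The sketch below fits a small logistic regression by gradient descent in plain Python. The eight student records, the two predictors, and the learning-rate settings are invented for illustration; a production model would be built in a data mining workbench or statistics package and validated on held-out data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression weights (bias first) by batch gradient descent."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for features, label in zip(X, y):
            row = [1.0] + list(features)  # prepend 1.0 for the bias term
            error = sigmoid(sum(w * v for w, v in zip(weights, row))) - label
            for j, v in enumerate(row):
                grad[j] += error * v
        # Step against the average gradient of the log-loss.
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad)]
    return weights

def predict_proba(weights, features):
    """Estimated probability that a student with these features returns."""
    row = [1.0] + list(features)
    return sigmoid(sum(w * v for w, v in zip(weights, row)))

# Hypothetical records: (first-term GPA, credits earned / 10); label 1 = returned.
X = [(3.6, 1.5), (3.2, 1.4), (2.9, 1.2), (3.8, 1.6),
     (1.8, 0.6), (2.1, 0.9), (1.5, 0.5), (2.3, 0.7)]
y = [1, 1, 1, 1, 0, 0, 0, 0]

weights = fit_logistic(X, y)
p_strong = predict_proba(weights, (3.5, 1.5))
p_weak = predict_proba(weights, (1.7, 0.6))
print(round(p_strong, 2), round(p_weak, 2))
```

The output is a score, not a decision. Which probability counts as "at risk", and what intervention follows, is a judgment for the institution.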
Data Mining Applications (cont'd) Marketing Who is most likely to respond to our new campaign? Which type of marketing/recruiting works best? Where should we focus our advertising and recruiting?
Data Mining Applications (cont'd) Alumni What are the different types/groups of alumni? Who is likely to pledge, for how much, and when? Where and on whom should we focus our fundraising drives?
Data Mining Applications in Institutional Research
Categorize your students | Classification | Cafeteria meal planning; student housing planning
Predict student retention/alumni donations | Neural Nets/Regression | Identify high-risk students; estimate/predict alumni contribution; predict new student application rate
Group similar students | Segmentation | Course planning; academic scheduling; identify student preferences for clubs and social organizations
Identify courses that are taken together | Association | Faculty teaching load estimation; course planning; academic scheduling
Find patterns and trends over time | Sequence | Predict alumni donation; predict potential demand for library resources
Data Mining with Clementine Industry-leading workbench for data mining Comprehensive range of tools for all stages of the data mining process Pioneered visual approach for maximum productivity Multiple modeling techniques to predict future events
Summary
A successful data mining strategy involves: well-defined goals, project objectives, and questions; sufficient and relevant data; careful consideration and selection of software and analysts (technical and domain experts); support from senior administrators (VPs and the President).
DM provides a set of tools, techniques, and a standardized process.
Domain expertise in institutional research is needed to build, test, validate, and deploy models.
DM does not build models automatically. Analysts do.
Next Steps: Data Mining Resources http://www.kdnuggets.com/ http://www.dmhe.org/ http://www.uni.edu/instrsch/dm/index.html http://www.spss.com/data_mining/
Questions?
Next Steps: Webcasts and White Papers. December 12th, 2 pm: Moving Beyond the Basics: Data Mining for Institutional Research. Information at www.spss.com/airseries3. Visit www.spss.com/airseries2 to download a copy of the SPSS Data Mining Tips Guide
For more information: www.spss.com, www.airweb.org. Complete the evaluation form and tell us what you thought of today's webcast
THANK YOU! Survey also at: http://www.airweb.org/page.asp?page=217&meetingid=0010