Ascertaining Apropos Mining Algorithms for Business Applications
Misha Sheth
Dept. of Computer Engineering, D. J. Sanghvi COE, Mumbai University

Parth Mehta
Dept. of Computer Engineering, D. J. Sanghvi COE, Mumbai University

Prof. Chetashri Bhadane
Asst. Prof., Dept. of Computer Engineering, D. J. Sanghvi COE, Mumbai University

ABSTRACT
With the onset of the information technology era, enterprises increasingly attempt to collect and store colossal amounts of data. This calls for efficient data mining technologies to expedite data processing, information retrieval and subsequent knowledge generation. Because the complexities of data mining are difficult to understand, determining the optimal method among the various data mining techniques becomes of prime importance. To resolve this problem, we analyze several approaches with respect to their methodology, type of input parameters, speed of training, ease of modelling, and issues specific to each method. This allows for swift and profitable application of data mining mechanisms. Further, leveraging the specific strengths and weaknesses of these techniques in a business context, we look at two applications of data mining in the financial area and suggest an appropriate method for each of them.

Keywords
Data Mining, Knowledge Discovery, Classification Methods, Business Decision

INTRODUCTION
With the advent of the Internet and the growth of business, there is a gradual realization that the huge amount of data collected can be processed and analyzed to support strategic decision making. This data must be converted into information through a sequential data gathering and mining process. Once data has been collected using various collection methods, it is cleaned and processed to remove discrepancies. Numerous data mining approaches can then be applied to this data to yield the desired outcomes, which ultimately culminates in intelligent decision making.
Through knowledge discovery in databases, useful knowledge, discrepancies, and important information can be drawn out of the database for investigation from various perspectives. However, because business applications span diverse domains, it is difficult to determine precisely which algorithm suits an application's specific category, requirements and desired outcomes. We use a literature survey to summarize the approaches and concepts involved. Moreover, selecting a technique requires both a conceptual analysis and an operational definition of the business decision and application. Applications are usually composed of several problems to be solved, and in this review we study each application by breaking it into parts to establish a standard description.

The rest of the paper is organized as follows. We first review previous work in this area, including an explanation of various data mining techniques and of the two applications being considered, namely cross-selling and segmentation analysis. We then provide a comparative study of five data mining techniques: Naïve Bayes, Decision Tree, Neural Network, Support Vector Machine and Logistic Regression. Finally, we evaluate the results and conclude the paper.

LITERATURE REVIEW
People often treat data mining as a synonym for the popular term Knowledge Discovery in Databases, or KDD. It is also correct to view data mining as simply a crucial step in the process of knowledge discovery in databases. Selecting a data mining algorithm includes choosing the method(s) to be used for finding patterns in the data, such as deciding which parameters and models may be appropriate, and matching a particular data mining method with the overall requirements of the KDD process.
The mining results that match the requirements are interpreted and organized, and in the final step are either put into action or presented to interested parties. The concept of data mining encompasses all activities and methods that use the collected data to derive implicit information and evaluate historical records to gain valuable knowledge.

Naïve Bayes
Bayesian classifiers are statistical classifiers that can predict class membership probabilities. They assign the most likely class to a given example described by its feature vector [3]. Learning such classifiers can be greatly simplified by assuming that features are independent given the class, that is, $P(X \mid C) = \prod_{i=1}^{n} P(X_i \mid C)$, where $X = (X_1, \dots, X_n)$ is a feature vector and $C$ is a class [3]. Naïve Bayes (NB) probabilistic classifiers are among the most commonly used [3]. The primary idea in this approach is to use the joint probabilities of words and categories to estimate the probabilities of categories for a given scenario or document. The "naïve" part of the Naïve Bayes technique is the assumption of word independence: the conditional probability of a word given a category is assumed to be independent of the conditional probabilities of other words given that category [5]. This assumption is the primary reason that Naïve Bayes classifiers are far more efficient to compute than non-naïve Bayes techniques, whose complexity is exponential, since word combinations are not used as predictors [5].

Decision Tree
A decision tree (DT) is an extremely useful tool for classification. It is simple and easy to understand and interpret. Furthermore, building the classification model does not require a lot of time. A decision tree has a flowchart-like tree structure, where every internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label [5]. The node at the top of the tree is the root node. While the tree is being constructed, attribute selection measures are employed to find the attribute that best partitions the tuples into distinct classes. Information Gain, Gain Ratio, and Gini Index are popular attribute selection measures.

When constructing a DT for classification, a crucial factor to address is how closely the model is adjusted to the training set. If a tight stopping criterion is employed during construction, the result is a small, underfitted DT. If, on the other hand, a loose stopping criterion is used, the result is a large DT that overfits the training data. Pruning methods were developed to solve this dilemma. Using a loose stopping criterion, the DT is first allowed to overfit the training set. The overfitted tree is then cut back to a smaller tree by eliminating sub-branches that do not seem to contribute to generalization accuracy. Such pruning leads to improved performance [7].

Neural Networks
A neural network is a mathematical or computational model based on biological neural networks. One can think of it as an emulation of a biological neural mechanism. A typical network consists of a set of input nodes that are connected to a set of output nodes through a set of hidden nodes [2]. It consists of a system of interconnected artificial neurons and processes information using a connectionist approach to computation.
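The input, hidden and output node structure described above can be illustrated with a minimal sketch. It is not taken from the surveyed papers: the sigmoid activation, the layer sizes and all weight values are assumptions chosen only to make the flow of information concrete.

```python
import math


def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """Compute one fully connected layer.

    Each row of `weights` holds the incoming connection weights of one node.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Pass the inputs through the hidden nodes and then the output nodes."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)


# Hypothetical network: 2 input nodes -> 2 hidden nodes -> 1 output node.
hidden_w = [[0.5, -0.5], [0.25, 0.75]]
hidden_b = [0.0, 0.0]
output_w = [[1.0, -1.0]]
output_b = [0.0]

outputs = forward([1.0, 1.0], hidden_w, hidden_b, output_w, output_b)
print(outputs)
```

Training would consist of adjusting `hidden_w` and `output_w` according to some learning rule; the sketch only shows how a fixed set of weights turns inputs into outputs.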
Often, a neural network is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase [1]. In a feed-forward neural network, information flows in the forward direction only, from the input nodes, through the hidden nodes (if any), to the output nodes [4]. There are no cycles or loops in the network. Recurrent neural networks (RNs), on the other hand, are models with bi-directional data flow. While a feed-forward network propagates data linearly from input to output, RNs also propagate data from later processing stages back to earlier stages [4].

Further, a neural network can be configured so that applying a set of inputs produces the desired set of outputs. A popular way to do this is to "train" the neural network by providing it with teaching patterns and allowing it to change its weights according to some learning rule. Learning situations can be categorized as supervised learning, where the network is trained by providing it with input and matching output patterns, or as unsupervised learning, where an (output) unit is trained to respond to clusters of patterns within the input. In the unsupervised paradigm the system is expected to find statistically salient features of the input population [4].

Support Vector Machine
A support vector machine (SVM) is an algorithm that uses a nonlinear mapping to transform the original training data into a higher dimension. In this new dimension, it searches for the linear optimal separating hyperplane. A hyperplane is a decision boundary that separates the tuples of one class from another. With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane. The SVM finds this hyperplane using support vectors (essential training tuples) and margins (defined by the support vectors) [5].
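The effect of the nonlinear mapping can be seen in a small sketch. This is an illustration under stated assumptions, not an SVM implementation: the mapping `feature_map`, the data points and the hyperplane parameters `w` and `b` are all hypothetical and chosen by hand, whereas a real SVM would learn the maximum-margin hyperplane from the support vectors.

```python
def feature_map(x):
    """Hypothetical nonlinear mapping from one dimension to two: x -> (x, x^2)."""
    return (x, x * x)


def side_of_hyperplane(point, w, b):
    """Return +1 or -1 depending on which side of w . z + b = 0 the point lies."""
    score = sum(wi * zi for wi, zi in zip(w, point)) + b
    return 1 if score >= 0 else -1


# One-dimensional data: class -1 lies between the two groups of class +1,
# so no single threshold on x can separate the classes.
xs = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
labels = [1, 1, -1, -1, -1, 1, 1]

# In the mapped space the line x^2 = 2 is a linear separating hyperplane.
w, b = (0.0, 1.0), -2.0
predictions = [side_of_hyperplane(feature_map(x), w, b) for x in xs]
print(predictions == labels)
```

The points cannot be split by a linear boundary in the original space, but after the mapping a straight line separates them, which is the idea the SVM exploits.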
The essential idea behind the support vector machine is illustrated by the example shown in Figure 1. Here the data is assumed to be linearly separable, so there exists a linear hyperplane that separates the points into two different classes. In a two-dimensional model, the hyperplane is simply a straight line. Figure 1 illustrates two such hyperplanes, B1 and B2. Both can divide the training examples into their respective classes without committing any misclassification errors [5]. Although even the fastest SVMs can be extremely slow to train, they are highly accurate, particularly because of their ability to model complex nonlinear decision boundaries. They are also much less prone to overfitting than other methods [5].

Figure 1: An example of a two-class problem with two separating hyperplanes.

Regression
Linear regression (LR) is mainly used to model continuous-valued functions. It is widely used, owing largely to its simple structure. Generalized linear models provide the theoretical foundation on which LR can be applied to the modelling of categorical response variables [5]. Common types of generalized linear models include logistic regression (LogR) and Poisson regression. Logistic regression models the probability of an event occurring as a linear function of a set of predictor variables. Count data commonly exhibit a Poisson distribution and are usually modelled using Poisson regression [5].

Business Application
A business application can be broken down into several business problems. These business problems can be further divided into problem processing steps and problem characteristics, which are derived from problem descriptions. The two applications studied in this review are cross-selling and segmentation analysis.
Cross-Selling
Cross-selling applications primarily consist of financial product cross-selling and retail member customer cross-selling. A cross-selling strategy has three advantages. First, targeting customers with the products they are highly likely to buy should increase sales and, in turn, profits. Second, reducing the number of people targeted through more selective targeting should reduce costs. Finally, it is well known in the financial sector that loyal customers normally hold more than two products on average; hence, persuading customers to buy more than one product should increase customer loyalty [6]. To achieve cross-selling effects, the key is knowing which person would be interested in which product. The overall goal is to discover characteristics of current customers that can then be used to mark all other customer segments, classifying them into potential promotion targets and unlikely purchasers [6].

Segmentation Analysis
Segmentation is essentially classifying customers into groups with shared characteristics, such as demographic, geographic, or behavioral traits, and marketing to them as a group. In a market with differing demands, applying a market segmentation strategy can boost the expected returns [6]. A major share of marketing research concentrates on examining how variables such as demographics and socioeconomic status can be used to predict differences in consumption and brand loyalty. The segmentation problem should be treated as two different situations: known character parameters and unknown character parameters [6]. Character parameters are known when segmentation analysis deals with customers who have transactional or behavioral records stored in the enterprise database, and the analytic parameters are predefined and derived from the analyst's interests [6].
Business Decision and Application Analysis
In this section we look at how each application can be decomposed into four parts in order to form a standard description. The four parts are as follows.
1. Business application activity (e.g. cross-selling).
2. Steps of solving that business problem. Each step obtains certain derived knowledge, matching a certain pattern in the data, by investigating problem characteristics (e.g. customer segmentation analysis of cross-selling).
3. Characteristics, i.e. information that needs to be assigned or predefined in the processing steps (e.g. customer background data of cross-selling).
4. Outcome required as the analytical result of a problem processing step (e.g. customer profile of segmentation analysis).
The analysis of the problem is shown in Table 1 [6].

Table 1: Business Decision and Application Analysis

Cross Selling
Steps: 1. Find relationships of characteristic. 2. Match campaigns to potential customers.
Characteristics: Transaction (input unit, discrete time sequence)
Outcome: Profile

Segmentation Analysis
Steps: 1. Classify s 2. Match campaigns to potential customers (classification algorithms)
Table 2: Comparative Study of Classification Algorithms

Naïve Bayes
Basic function: Naïve Bayes is a statistical classifier and predicts class membership probabilities.
Types of values: It is a continuous classifier.
Speed of training and convergence: It is fast and needs less training data.
Ease of modelling: It is hard to debug or understand, and difficult to test.
Issue of overfitting: It is computationally expensive for datasets with high-dimensional attributes.
Specific issue: Features are assumed to be independent; normalization is needed.

Decision Tree
Basic function: Decision tree learning is a heuristic, one-step lookahead (hill climbing), non-backtracking search.
Types of values: It predicts categorical and continuous values.
Speed of training and convergence: It is fast, because a decision tree inherently "throws away" the input features that it does not find useful.
Ease of modelling: It is easy to understand and is used for modelling and visual representations.
Issue of overfitting: There is an issue of overfitting.
Specific issue: Outliers are lost through pruning; the tree has to be tuned by adding weights.

Neural Networks
Basic function: A neural network contains hidden interconnections between input and output nodes, forming a large network.
Types of values: Neural networks are data-driven, self-adaptive methods; they can adjust themselves to the data without any explicit specification.
Speed of training and convergence: It is slow. A neural network uses all the input nodes if no selection is performed.
Ease of modelling: It is difficult to explain and has a complicated visual representation.
Issue of overfitting: No general method exists to determine the optimal number of neurons needed to solve a problem.
Specific issue: Training is time consuming and requires several passes through the network.

Support Vector Machine
Basic function: It is an algorithm that uses a nonlinear mapping to transform data into a higher dimension.
Types of values: With a suitable choice of parameters, an SVM can separate any consistent data set.
Speed of training and convergence: It is slow. It takes a lot of time to train on the data.
Ease of modelling: It offers good accuracy and flexibility.
Issue of overfitting: It incorporates capacity control to prevent overfitting.
Specific issue: It is only directly applicable to two-class tasks.

Logistic Regression
Basic function: It models the probability of an event occurring as a linear function of a set of predictor variables.
Types of values: It is used to model continuous-valued functions.
Speed of training and convergence: -
Ease of modelling: New data can be incorporated into the model easily; it is best when classification thresholds need to be changed.
Issue of overfitting: -
Specific issue: It can be insensitive to minute data.

Proposed Explication:
Since cross-selling places more emphasis on visual representation, so that other departments such as marketing can draw coherent conclusions from the results, decision tree classification algorithms should be used. However, the tree should be pruned, as in C4.5, to avoid overfitting, and weights must be added so that outliers are not lost. Segmentation analysis also matches campaigns to potential customers; however, the need for understandable modelling is not as important. Therefore, the most efficient algorithm to use would be neural networks, since all kinds of data can be used and the required associations can be formed over the network.

CONCLUSION
Business applications require careful analysis to efficiently decide the most suitable data mining algorithm to use according to their characteristics. Classification of customers and products is of prime importance. Therefore, data mining algorithms must be carefully analyzed and used according to the specifics of the problem.
REFERENCES
[1] Guoqiang Peter Zhang. Neural Networks for Classification: A Survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 30, No. 4.
[2] Indranil Bose, Radha K. Mahapatra. Business data mining: a machine learning perspective. Information & Management. Elsevier.
[3] I. Rish. An empirical study of the naïve Bayes classifier. T. J. Watson Research Center.
[4] Dr. Yashpal Singh, Alok Singh Chauhan. Neural Networks in Data Mining. Journal of Theoretical and Applied Information Technology.
[5] Reza Entezari-Maleki, Arash Rezaei and Behrouz Minaei-Bidgoli. Comparison of Classification Methods Based on the Type of Attributes and Sample Size. Iran University of Science & Technology, Tehran, Iran.
[6] Jia-Lang Sang, T. C. Chen. An Analytic approach to select data mining for business decision. Expert Systems with Applications.
[7] Carlos J. Mantas, Joaquin Abellan. Credal-C4.5: Decision tree based on imprecise probabilities to classify noisy data. Expert Systems with Applications.
International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X
Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Image Data: Classification via Neural Networks Instructor: Yizhou Sun yzsun@ccs.neu.edu November 19, 2015 Methods to Learn Classification Clustering Frequent Pattern Mining
More informationChapter 5: Summary and Conclusion CHAPTER 5 SUMMARY AND CONCLUSION. Chapter 1: Introduction
CHAPTER 5 SUMMARY AND CONCLUSION Chapter 1: Introduction Data mining is used to extract the hidden, potential, useful and valuable information from very large amount of data. Data mining tools can handle
More informationAMOL MUKUND LONDHE, DR.CHELPA LINGAM
International Journal of Advances in Applied Science and Engineering (IJAEAS) ISSN (P): 2348-1811; ISSN (E): 2348-182X Vol. 2, Issue 4, Dec 2015, 53-58 IIST COMPARATIVE ANALYSIS OF ANN WITH TRADITIONAL
More informationQuestion Bank. 4) It is the source of information later delivered to data marts.
Question Bank Year: 2016-2017 Subject Dept: CS Semester: First Subject Name: Data Mining. Q1) What is data warehouse? ANS. A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile
More informationCLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS
CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of
More informationSTUDY PAPER ON CLASSIFICATION TECHIQUE IN DATA MINING
Journal of Analysis and Computation (JAC) (An International Peer Reviewed Journal), www.ijaconline.com, ISSN 0973-2861 International Conference on Emerging Trends in IOT & Machine Learning, 2018 STUDY
More informationISSN: (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies
ISSN: 2321-7782 (Online) Volume 3, Issue 9, September 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
More informationClassification Algorithms in Data Mining
August 9th, 2016 Suhas Mallesh Yash Thakkar Ashok Choudhary CIS660 Data Mining and Big Data Processing -Dr. Sunnie S. Chung Classification Algorithms in Data Mining Deciding on the classification algorithms
More informationComparative analysis of data mining methods for predicting credit default probabilities in a retail bank portfolio
Comparative analysis of data mining methods for predicting credit default probabilities in a retail bank portfolio Adela Ioana Tudor, Adela Bâra, Simona Vasilica Oprea Department of Economic Informatics
More informationNeural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer
More informationA STUDY OF SOME DATA MINING CLASSIFICATION TECHNIQUES
A STUDY OF SOME DATA MINING CLASSIFICATION TECHNIQUES Narsaiah Putta Assistant professor Department of CSE, VASAVI College of Engineering, Hyderabad, Telangana, India Abstract Abstract An Classification
More informationData Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation
Data Mining Part 2. Data Understanding and Preparation 2.4 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Normalization Attribute Construction Aggregation Attribute Subset Selection Discretization
More informationPerformance Analysis of Data Mining Classification Techniques
Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal
More informationCOMPARISON OF DIFFERENT CLASSIFICATION TECHNIQUES
COMPARISON OF DIFFERENT CLASSIFICATION TECHNIQUES USING DIFFERENT DATASETS V. Vaithiyanathan 1, K. Rajeswari 2, Kapil Tajane 3, Rahul Pitale 3 1 Associate Dean Research, CTS Chair Professor, SASTRA University,
More informationAnalytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.
Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied
More informationCse634 DATA MINING TEST REVIEW. Professor Anita Wasilewska Computer Science Department Stony Brook University
Cse634 DATA MINING TEST REVIEW Professor Anita Wasilewska Computer Science Department Stony Brook University Preprocessing stage Preprocessing: includes all the operations that have to be performed before
More informationApplying Supervised Learning
Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains
More informationA SURVEY ON DATA MINING TECHNIQUES FOR CLASSIFICATION OF IMAGES
A SURVEY ON DATA MINING TECHNIQUES FOR CLASSIFICATION OF IMAGES 1 Preeti lata sahu, 2 Ms.Aradhana Singh, 3 Mr.K.L.Sinha 1 M.Tech Scholar, 2 Assistant Professor, 3 Sr. Assistant Professor, Department of
More informationBusiness Club. Decision Trees
Business Club Decision Trees Business Club Analytics Team December 2017 Index 1. Motivation- A Case Study 2. The Trees a. What is a decision tree b. Representation 3. Regression v/s Classification 4. Building
More informationDATA MINING AND WAREHOUSING
DATA MINING AND WAREHOUSING Qno Question Answer 1 Define data warehouse? Data warehouse is a subject oriented, integrated, time-variant, and nonvolatile collection of data that supports management's decision-making
More informationLecture #11: The Perceptron
Lecture #11: The Perceptron Mat Kallada STAT2450 - Introduction to Data Mining Outline for Today Welcome back! Assignment 3 The Perceptron Learning Method Perceptron Learning Rule Assignment 3 Will be
More informationData mining with Support Vector Machine
Data mining with Support Vector Machine Ms. Arti Patle IES, IPS Academy Indore (M.P.) artipatle@gmail.com Mr. Deepak Singh Chouhan IES, IPS Academy Indore (M.P.) deepak.schouhan@yahoo.com Abstract: Machine
More informationStat 602X Exam 2 Spring 2011
Stat 60X Exam Spring 0 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed . Below is a small p classification training set (for classes) displayed in
More informationReview on Methods of Selecting Number of Hidden Nodes in Artificial Neural Network
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 11, November 2014,
More informationEnhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques
24 Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Enhancing Forecasting Performance of Naïve-Bayes Classifiers with Discretization Techniques Ruxandra PETRE
More informationInternational Journal of Advance Engineering and Research Development. A Survey on Data Mining Methods and its Applications
Scientific Journal of Impact Factor (SJIF): 4.72 International Journal of Advance Engineering and Research Development Volume 5, Issue 01, January -2018 e-issn (O): 2348-4470 p-issn (P): 2348-6406 A Survey
More informationInternational Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani
LINK MINING PROCESS Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani Higher Colleges of Technology, United Arab Emirates ABSTRACT Many data mining and knowledge discovery methodologies and process models
More informationParametric Comparisons of Classification Techniques in Data Mining Applications
Parametric Comparisons of Clas Techniques in Data Mining Applications Geeta Kashyap 1, Ekta Chauhan 2 1 Student of Masters of Technology, 2 Assistant Professor, Department of Computer Science and Engineering,
More informationNaïve Bayes for text classification
Road Map Basic concepts Decision tree induction Evaluation of classifiers Rule induction Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Support
More informationStudy on the Application Analysis and Future Development of Data Mining Technology
Study on the Application Analysis and Future Development of Data Mining Technology Ge ZHU 1, Feng LIN 2,* 1 Department of Information Science and Technology, Heilongjiang University, Harbin 150080, China
More information4.12 Generalization. In back-propagation learning, as many training examples as possible are typically used.
1 4.12 Generalization In back-propagation learning, as many training examples as possible are typically used. It is hoped that the network so designed generalizes well. A network generalizes well when
More informationEvent: PASS SQL Saturday - DC 2018 Presenter: Jon Tupitza, CTO Architect
Event: PASS SQL Saturday - DC 2018 Presenter: Jon Tupitza, CTO Architect BEOP.CTO.TP4 Owner: OCTO Revision: 0001 Approved by: JAT Effective: 08/30/2018 Buchanan & Edwards Proprietary: Printed copies of
More informationChapter 28. Outline. Definitions of Data Mining. Data Mining Concepts
Chapter 28 Data Mining Concepts Outline Data Mining Data Warehousing Knowledge Discovery in Databases (KDD) Goals of Data Mining and Knowledge Discovery Association Rules Additional Data Mining Algorithms
More informationContents. Preface to the Second Edition
Preface to the Second Edition v 1 Introduction 1 1.1 What Is Data Mining?....................... 4 1.2 Motivating Challenges....................... 5 1.3 The Origins of Data Mining....................
More informationData Mining. 3.2 Decision Tree Classifier. Fall Instructor: Dr. Masoud Yaghini. Chapter 5: Decision Tree Classifier
Data Mining 3.2 Decision Tree Classifier Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction Basic Algorithm for Decision Tree Induction Attribute Selection Measures Information Gain Gain Ratio
More informationData Preprocessing. Slides by: Shree Jaswal
Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data
More informationHybrid Models Using Unsupervised Clustering for Prediction of Customer Churn
Hybrid Models Using Unsupervised Clustering for Prediction of Customer Churn Indranil Bose and Xi Chen Abstract In this paper, we use two-stage hybrid models consisting of unsupervised clustering techniques
More informationNeural Networks In Data Mining
Neural Networks In Mining Abstract-The application of neural networks in the data mining has become wider. Although neural networks may have complex structure, long training time, and uneasily understandable
More informationA study of classification algorithms using Rapidminer
Volume 119 No. 12 2018, 15977-15988 ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu A study of classification algorithms using Rapidminer Dr.J.Arunadevi 1, S.Ramya 2, M.Ramesh Raja