CHAPTER 3 MACHINE LEARNING MODEL FOR PREDICTION OF PERFORMANCE
- Tamsyn Dawson
- 5 years ago
In this work, educational data mining is applied to qualitative data about students, and their performance is analyzed with the C4.5 decision tree algorithm. The results indicate that student performance is also influenced by qualitative data. Knowledge acquired in the form of a tree is easy for users to assimilate. The work is an attempt to look into the higher education domain of data mining in order to analyze students' performance. Decision tree induction is one of the common approaches for extracting knowledge from sets of feature-based examples. Machine learning techniques make it possible to develop a tool that can help to predict [90] performance on the basis of data. Machine learning techniques have been applied successfully in various fields such as medical science, pattern recognition, image recognition, and various control applications [99]. In the machine learning method used here, the learned function is represented by a decision tree.

Introduction

Data mining is commonly described as the process of discovering significant patterns in large volumes of data. It offers a great variety of techniques, methods and tools for thorough analysis of the data available in various fields [53]. The term is used by analogy: just as rock is mined for something valuable, a large database is mined for valuable information. The term is, however, something of a misnomer, since mining for gold in rocks is called gold mining and not rock mining; by analogy, data mining should have been called knowledge mining. Data mining is also known as knowledge discovery in databases (KDD), a term that describes a more complete process. Other terms that refer to data mining are data dredging, knowledge extraction and pattern discovery. In principle, data mining is not specific to one type of media or data and should be applicable to any kind of information repository. However, approaches may differ when applied to different types of data, and the challenges presented by different types of data vary significantly [72].

In higher education, the ability to predict a student's performance is very important for improving quality. In higher education institutions an ample amount of knowledge is hidden and can be extracted. This knowledge can be any student-specific information such as success rate, academic performance, dropout rate, course preference, subject specialization, or placement success [99]. The quality of the students in a higher education institution is classified by their academic performance. Many factors influence students' performance, such as financial condition, living location, parents' qualification, and socio-economic, non-academic and academic factors. Various data mining techniques are useful for deriving hidden knowledge from these factors. The technique behind the extraction of the hidden knowledge is the knowledge discovery process, which extracts knowledge from the available dataset and should create a knowledge base for the benefit of the institution [101].

The factors that describe student performance can be used for predicting it. For prediction, a number of well-known data mining classification algorithms can be used, such as ID3, Simple CART, J48, NB Tree, and C4.5. The model is mainly focused on finding the prediction accuracy of students' academic performance using two different datasets. The experimental model also shows that the student attributes considered are highly influential in predicting the results. The research work focuses on the development of data mining classification models for predicting students' performance in higher education. The work is done on a small dataset with a number of attributes in order to analyze the performance of the students. Feature selection has been an active area of research in the machine learning, statistics and data mining communities [99].
Various attribute selection methods exist to identify the attributes that have the greatest impact. Such an environment leaves scope for research that investigates the efficiency of machine learning techniques. Data mining combines tools from statistics and machine learning with database management. It can be defined as the process that, starting from apparently unstructured data, tries to extract knowledge and/or previously unknown interesting patterns. Machine learning algorithms are used during this process.
Machine learning

Machine learning (ML) is the science that evolved to develop algorithms and models. According to Arthur Samuel, machine learning gives computers the capability to learn without being explicitly programmed. Basically, machine learning is a set of tools for teaching computers from data. With the help of machine learning techniques a tool can be constructed that automatically predicts [90] the suitability of a particular course for a student on the basis of data. Machine learning techniques have been applied successfully in various fields such as medical science, pattern recognition, image recognition, and various control applications [99]. In the machine learning method used here, the learned function is represented by a decision tree.

From the artificial intelligence point of view, learning is central to human knowledge and intelligence. It is also essential for building intelligent machines. From the software engineering point of view, machine learning offers a way to program computers that can be easier than writing code in the traditional way. Beyond typical statistics problems, machine learning has been applied to a vast number of problems. Machine learning is often designed with different considerations than statistics (e.g., speed is often more important than accuracy). Machine learning methods proceed in two phases: training, in which a model is developed from a collection of training data, and testing, in which the model is used to make decisions about new test data. Data mining may apply machine learning techniques; it may also drive the advancement of machine learning techniques or algorithms [38]. To use machine learning algorithms, one has to formulate the problem in one's domain in the form the algorithm expects, usually a set of features. Machine learning can be categorized into three classes [81]:

1. Supervised learning: This is basically learning for classification or concept learning, in which the training data is labeled with the appropriate response. Classification and regression are the most common types of supervised learning.
2. Unsupervised learning: Clustering and association are the common forms of unsupervised learning, in which a collection of unlabeled data is given. The task is to analyze the unlabeled data and discover patterns in it.
3. Reinforcement learning: A robot or controller seeks to learn the optimal actions to take based on the outcomes of past actions.

From the data mining point of view, machine learning is a research area of computer science that has grown quickly due to advances in data analysis research. ML has also found a place in the database industry, where it is capable of extracting valuable knowledge from large data stores. The problem most frequently studied by data mining and machine learning researchers is classification. It consists of predicting the value of a categorical attribute, i.e. the class, based on the values of the predicting attributes. There are different classification methods. From the data mining point of view, machine learning approaches can be categorized into two distinct groups: symbolic approaches and statistical approaches. Symbolic approaches perform inductive learning of symbolic descriptions, such as rules, decision trees or logical representations [81]. Statistical approaches follow pattern-recognition methods, including k-nearest neighbor, Bayesian classifiers, neural network learning and support vector machines.

Decision-making

People can use knowledge for decision-making. Classification and prediction are two common methods of data analysis. These methods can also be used to describe significant classes and to make predictions. The decision tree is a very popular model for classification and prediction because it does not require any domain knowledge or parameter setting [38]. The decision tree method is widely used because of its high accuracy in classifying the data set [11]. A decision tree is used for classification. For example, suppose there is a tuple X whose class label is to be determined. The attribute values of the tuple are tested against the decision tree.
The path followed from the root ends at a leaf that holds the class prediction for that tuple [38]. The decision tree technique uses a top-down approach. The root node of a decision tree plays the main role: starting from the root, each node is split recursively according to the algorithm. The tree is generated from a training set of tuples, and the resulting leaf nodes are associated with class labels. The commonly used algorithms for building a decision tree are ID3, C4.5 and CART. Different tools are available for implementing data mining classification techniques, such as Rapid Miner, WEKA, and TANAGRA. These tools also help to predict students' academic performance for future prospects. Therefore, an illustrative algorithm is used for one of the most common machine learning techniques, namely decision trees [38].

This work uses the C4.5 decision tree learning method, which is suitable for discrete-valued functions. C4.5 is a successor to ID3 [75] and is one of the most commonly used machine learning algorithms. It handles discrete-valued attributes when building a decision tree. For a continuous attribute, C4.5 splits the values into two partitions around a selected threshold, so that all values above the threshold go to one child and the rest go to the other. It also handles missing attribute values. For attribute selection C4.5 uses the gain ratio, which removes the bias of information gain toward attributes with many outcome values [82]. In the partitioning process of ID3, each level uses a statistical property known as information gain, which determines the best attribute for the training set. C4.5, the successor of ID3, uses an extension of information gain, the gain ratio [38].

As mentioned above, the decision tree is a top-down approach; the difficulty is selecting the attribute on which to split at each node. The best split divides the target class into the purest child nodes. The measure of this purity is called the information, and the gain is expressed as an amount of information.
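The top-down, recursive splitting described above can be sketched in a few lines of Python. This is only a minimal illustration on categorical attributes, not the C4.5 implementation used in this work: the six-row toy data set, the function names (`entropy`, `gain_ratio`, `build_tree`, `classify`) and the default label are all hypothetical, and pruning, continuous attributes and missing values are left out.

```python
import math
from collections import Counter

def entropy(labels):
    """Information of a label list: -sum(p_i * log2(p_i))."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """Gain ratio obtained by splitting on the categorical attribute attr."""
    total = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    info_after = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / total * math.log2(len(g) / total)
                      for g in groups.values())
    if split_info == 0:  # the attribute takes a single value in this subset
        return 0.0
    return (entropy(labels) - info_after) / split_info

def build_tree(rows, labels, attrs):
    """Return a class label (leaf) or a pair (attribute, {value: subtree})."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    if gain_ratio(rows, labels, best) <= 0:
        return majority
    # An attribute used higher in the tree is not reused along the same path.
    remaining = [a for a in attrs if a != best]
    children = {}
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        children[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], remaining)
    return (best, children)

def classify(tree, row, default="no"):
    """Follow the path from the root to a leaf for one tuple."""
    while isinstance(tree, tuple):
        attr, children = tree
        if row[attr] not in children:
            return default  # hypothetical fallback for unseen values
        tree = children[row[attr]]
    return tree

# Hypothetical toy data, not the data set of this chapter.
rows = [
    {"grade": "first", "loc": "urban"}, {"grade": "first", "loc": "rural"},
    {"grade": "second", "loc": "urban"}, {"grade": "second", "loc": "rural"},
    {"grade": "third", "loc": "urban"}, {"grade": "third", "loc": "rural"},
]
labels = ["yes", "yes", "no", "yes", "no", "no"]
tree = build_tree(rows, labels, ["grade", "loc"])
```

On this toy data the root split is made on grade, and only the second-grade branch needs a further split on location.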
Gain Ratio

The process of selecting a new attribute and subdividing the training examples is repeated for each non-terminal successor node. Attributes that have been incorporated higher in the tree are excluded, so that any given attribute can appear at most once along any path through the tree [11]. The gain ratio can be used for attribute selection, but before calculating the gain ratio the split information has to be calculated. For an attribute A, the information gain Gain(S, A) relative to a collection of examples S is defined by equations (1) and (2), and the calculated values are given in the tables that follow.

$$\mathrm{Info}(D) = -\sum_{i=1}^{n} p_i \log_2(p_i)$$ (1) [38]

$$\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)$$ (2) [38]

$$\mathrm{SplitInformation}(S, A) = -\sum_{i=1}^{n} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|}$$ (3) [38]

So the gain ratio is

$$\mathrm{GainRatio}(S, A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInformation}(S, A)}$$ (4)

In the above equations A is a categorical attribute; using A, SplitInformation(S, A), the information of S with respect to the values of A, can be calculated, as shown in Table 3.4. The calculated gain values are displayed in Table 3.3.

Illustrative example

The attributes of the data set used in this work are listed in Table 3.1. The attribute parentq stands for parent qualification and loc for location; grade describes the grade or division the student obtained in the previously passed class; and suitable is the decision label. The decision label has two values, yes or no, i.e. whether the student is suitable for admission to a computer course. In the attribute list, grade is calculated on the basis of percentage: if perc >= 60% then grade = First; if perc >= 50% then grade = Second; otherwise grade = Third. 150 records are taken. The example set is illustrated in Figure 3.1.

Equation (1) gives the information of the decision attribute suitable, in which 110 records are yes and 27 are no.
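$$\mathrm{Info}(D) = -\tfrac{110}{138}\log_2\tfrac{110}{138} - \tfrac{27}{138}\log_2\tfrac{27}{138} = 0.7207$$

Table 3.1: List of attributes

Attribute   Values
parentq     {Educated, Uneducated}
loc         {urban, rural}
grade       {First, Second, Third}
suitable    {yes, no}

The information is calculated in the same way for all attributes. The parent attribute has two classes, educated and uneducated. For the educated class the information is

$$\mathrm{Info}(\text{educated}) = \tfrac{58}{138}\left(-\tfrac{53}{58}\log_2\tfrac{53}{58} - \tfrac{5}{58}\log_2\tfrac{5}{58}\right) = 0.1627$$

For the uneducated class the information is

$$\mathrm{Info}(\text{uneducated}) = \tfrac{79}{138}\left(-\tfrac{57}{79}\log_2\tfrac{57}{79} - \tfrac{22}{79}\log_2\tfrac{22}{79}\right) = 0.4885$$

$$\mathrm{Info}(\text{parent}) = \mathrm{Info}(\text{educated}) + \mathrm{Info}(\text{uneducated}) = 0.1627 + 0.4885 = 0.6512$$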
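The location attribute has two values, urban and rural. The information for both is

$$\mathrm{Info}(\text{urban}) = \tfrac{64}{138}\left(-\tfrac{44}{64}\log_2\tfrac{44}{64} - \tfrac{20}{64}\log_2\tfrac{20}{64}\right) = 0.4155$$

$$\mathrm{Info}(\text{rural}) = \tfrac{73}{138}\left(-\tfrac{66}{73}\log_2\tfrac{66}{73} - \tfrac{7}{73}\log_2\tfrac{7}{73}\right) = 0.2411$$

$$\mathrm{Info}(\text{loc}) = \mathrm{Info}(\text{urban}) + \mathrm{Info}(\text{rural}) = 0.4155 + 0.2411 \approx 0.6567$$

Figure 3.1: Example set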
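The grade attribute has three classes: first, second and third. The information is calculated for all classes as described above.

$$\mathrm{Info}(\text{grade}) = \mathrm{Info}(\text{first}) + \mathrm{Info}(\text{second}) + \mathrm{Info}(\text{third})$$
$$= \tfrac{66}{138}\left(-\tfrac{63}{66}\log_2\tfrac{63}{66} - \tfrac{6}{66}\log_2\tfrac{6}{66}\right) + \tfrac{43}{138}\left(-\tfrac{25}{43}\log_2\tfrac{25}{43} - \tfrac{18}{43}\log_2\tfrac{18}{43}\right) + \tfrac{28}{138}\left(-\tfrac{22}{28}\log_2\tfrac{22}{28} - \tfrac{6}{28}\log_2\tfrac{6}{28}\right) = 0.5853$$

The consolidated information values are given in Table 3.2.

Table 3.2: Information Gain

Info(A)          Value
Info(parentq)    0.6512
Info(location)   0.6567
Info(grade)      0.5853

Equation (2) gives the gain values. The gain values of all attributes are shown in Table 3.3.

$$\mathrm{Gain}(\text{parentq}) = \mathrm{Info}(D) - \mathrm{Info}_{\text{parentq}}(D) = 0.7207 - 0.6512 = 0.0695$$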
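$$\mathrm{Gain}(\text{location}) = \mathrm{Info}(D) - \mathrm{Info}_{\text{location}}(D) = 0.7207 - 0.6567 = 0.0640$$

$$\mathrm{Gain}(\text{grade}) = \mathrm{Info}(D) - \mathrm{Info}_{\text{grade}}(D) = 0.7207 - 0.5853 = 0.1354$$

Table 3.3: Gain Value

Gain             Value
Gain(parentq)    0.0695
Gain(location)   0.0640
Gain(grade)      0.1354

The split information with respect to the decision attribute is calculated from equation (3). Table 3.4 displays the split information for the three attributes.

$$\mathrm{Split}(\text{suitable}, \text{parent}) = -\tfrac{59}{138}\log_2\tfrac{59}{138} - \tfrac{79}{138}\log_2\tfrac{79}{138}$$

$$\mathrm{Split}(\text{suitable}, \text{location}) = -\tfrac{73}{138}\log_2\tfrac{73}{138} - \tfrac{64}{138}\log_2\tfrac{64}{138}$$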
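$$\mathrm{Split}(\text{suitable}, \text{grade}) = -\tfrac{66}{138}\log_2\tfrac{66}{138} - \tfrac{43}{138}\log_2\tfrac{43}{138} - \tfrac{28}{138}\log_2\tfrac{28}{138}$$

Table 3.4: Split information of the sample

Split Information    Value
Split(S, parentq)
Split(S, location)
Split(S, grade)

$$\mathrm{GainRatio}(\text{suitable}, \text{parentq}) = 0.0695 \,/\, \mathrm{Split}(S, \text{parentq})$$

$$\mathrm{GainRatio}(\text{suitable}, \text{location}) = 0.0640 \,/\, \mathrm{Split}(S, \text{location})$$

$$\mathrm{GainRatio}(\text{suitable}, \text{grade}) = 0.1354 \,/\, \mathrm{Split}(S, \text{grade})$$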
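The chain of calculations in equations (1) to (4) can be sketched as a short Python script that works directly on class counts. The yes/no counts for parentq and location are the ones quoted in the text; the function names are hypothetical. Because the script normalizes by the sum of the counts it is given, its results need not agree digit for digit with the printed figures, so it should be read as an illustration of the procedure only.

```python
import math

def info(counts):
    """Equation (1): information of a list of counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def gain_ratio(partition):
    """partition maps each attribute value to [yes_count, no_count].

    Returns (gain, split_information, gain_ratio).
    """
    sizes = [sum(c) for c in partition.values()]
    total = sum(sizes)
    class_totals = [sum(col) for col in zip(*partition.values())]
    info_d = info(class_totals)
    # Weighted information after splitting on the attribute.
    info_a = sum(sum(c) / total * info(c) for c in partition.values())
    gain = info_d - info_a            # equation (2)
    split = info(sizes)               # equation (3)
    return gain, split, gain / split  # equation (4)

# Class counts (yes, no) per attribute value, as quoted in the text.
parentq = {"educated": [53, 5], "uneducated": [57, 22]}
location = {"urban": [44, 20], "rural": [66, 7]}

for name, part in (("parentq", parentq), ("location", location)):
    gain, split, ratio = gain_ratio(part)
    print(f"{name}: gain={gain:.4f} split={split:.4f} ratio={ratio:.4f}")
```

The split information is simply the information of the partition sizes, which is why `info` is reused for equation (3).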
Table 3.5: Gain Ratio

Gain Ratio               Value
GainRatio(S, parentq)
GainRatio(S, location)
GainRatio(S, grade)

The gain ratio is shown in Table 3.5. The grade attribute has the highest gain ratio; therefore it is selected as the root node of the tree. For the sample data, C4.5 splits the data table based on the value of the students' grade. The above process is then repeated to select further nodes until a decision node is reached [11].

Steps involved in modeling

Extracting or mining knowledge from a data set is known as data mining. The steps for extracting educational knowledge using data mining techniques are described below and shown in Figure 3.2. Data quality problems can be addressed by data cleaning, whose main solution approaches are outlined here. Real-world data collected for mining tend to be unclean: they may be noisy, inconsistent and incomplete [78].
Figure 3.2: Steps for extracting knowledge (Select Data, Data Refinement, Data Modeling, Evaluation, Deployment)

Data Refinement: The data refining process refines disparate data to increase the understanding of the data; it removes data inconsistency and redundancy. The data refinement process can develop an integrated data resource [38]. There can be multiple data sources, such as data warehouses, federated database systems or web-based information systems. When such data are integrated, the need for data cleaning increases significantly, because the sources often contain redundant data in different representations. Depending on the database or data warehousing implementation, the data refining process may be completed after the integration of the different datasets. Inconsistent data are the raw material, and the integrated data resource is the final product.
The refinement process may involve two steps: data cleaning and data transformation. Data cleaning is needed in order to integrate and transform heterogeneous data sources. It raises the data quality to the level necessary for analysis. Data cleaning deals with incomplete, missing and non-existent values. Specifically, filtering the problematic data can introduce sample bias into the data, and using data overlays could introduce missing values [69]. Data warehouses load and constantly refresh huge amounts of data from a variety of sources, so there is a high probability that they contain unclean data [97]. Moreover, data warehouses support decision-making, so the correctness of the data is vital to avoid wrong conclusions. For instance, duplicated or missing information will produce incorrect or misleading statistics ("garbage in, garbage out"). Due to the wide range of possible data inconsistencies and the sheer data volume, data cleaning is considered one of the biggest problems in data warehousing.

The collected data need to be transformed into the required format. Data may be qualitative or quantitative, and converting data from one form to another is called data transformation. Some techniques require a specific form of data; therefore, a data preparation phase is needed. Data transformation includes data preparation operations such as the production of derived attributes, entirely new records, or transformed values for existing attributes [78]. In this work, Excel is used for the refinement process, and each student's percentage is converted into a grade according to the traditional method.

Data Modeling: Designing a model for extracting knowledge from a database is called data modeling. In the modeling phase, several modeling techniques are selected and applied. The purpose of data modeling is to recognize all the entities that the data contain and then to define the relationships between these entities. Data models can be conceptual, logical or physical.
Conceptual data modeling typically identifies the highest-level relationships between different entities, whereas enterprise data modeling is similar to conceptual data modeling but addresses unique requirements [53]. Logical data modeling illustrates the specific entities, attributes and relationships involved in a business function, and serves as the basis for the creation of the physical data model. Physical data modeling represents an application-specific and database-specific implementation of a logical data model. The first step in modeling is selecting the actual modeling technique to be used, e.g., building decision trees or generating a neural network. Prior to building a model, a procedure needs to be defined to test the model's quality and validity [53]. The main goal of modeling is consistency: when the model is applied to unseen data, it should predict the true value.

Evaluation: Before final deployment of the model, it is necessary to evaluate the model thoroughly. The steps executed to construct the model also need to be reviewed, to be certain that the model achieves the objectives. A key objective is to determine whether there is some important issue that has not been considered sufficiently. At the end of the evaluation, the user obtains the results for which data mining was used. Evaluation deals with factors such as the accuracy and generality of the model [70]. This step measures the degree to which the model achieves the objectives, and it determines the factors that may make the model deficient. A confusion matrix is well suited to evaluating predictions [53]. A confusion matrix, or classification matrix, is used to appraise the prediction accuracy of a model. It shows whether the model is making mistakes in its predictions and, if so, in what percentage of cases. Numerous classification rules are used in generating a confusion matrix [53]. Almost all performance metrics are expressed in terms of the elements of the confusion matrix generated by the model on a test sample. The format of a confusion matrix for a two-class case with the classes yes and no is shown in Table 3.6. A column represents an actual class, while a row represents the predicted class.
The total number of instances in the test set is represented at the top of the table (P = total number of positive instances, N = total number of negative instances), while the number of instances predicted to belong to each class is represented at the left of the table (p = total number of instances classified as positive, n = total number of instances classified as negative). TP (true positives) is the number of correctly classified positive examples. In a similar manner, FN (false negatives) is the number of positive examples classified as negative, TN (true negatives) is the number of correctly classified negative examples and, finally, FP (false positives) is the number of negative examples for which the positive class was predicted.

The accuracy on the positive class is represented by the true positive rate (TPrate): TPrate = TP/(TP+FN). The corresponding measure for the negative class is the true negative rate (TNrate), calculated as the number of negative examples correctly identified out of all negative samples: TNrate = TN/(TN+FP). It is also important to evaluate how many of the examples identified as belonging to a given class actually belong to that class. This is done with the help of the positive and negative predictive values [53]. The positive predictive value (PPV) is PPV = TP/(TP+FP), while the negative predictive value (NPV) represents the number of negatives correctly identified out of all examples classified as negative: NPV = TN/(TN+FN). The TPrate, TNrate, PPV and NPV measure correct outcomes, which need to be maximized; sometimes their complements are more interesting. All these parameters provide a more exact view of the performance of a classification method. Depending on the problem, one can select the most relevant measurements and focus on those alone, or provide a composite metric that best serves the given objective.

Table 3.6: Confusion Matrix

                     Actual class
Predicted class      Yes (P)                No (N)
Yes                  True Positive (TP)     False Positive (FP)
No                   False Negative (FN)    True Negative (TN)
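The four rates defined above can be computed directly from the cells of Table 3.6. The following Python sketch assumes the usual definitions TPrate = TP/(TP+FN), TNrate = TN/(TN+FP), PPV = TP/(TP+FP) and NPV = TN/(TN+FN); the function name and the counts are hypothetical, and overall accuracy is added for convenience. No guard against empty classes (division by zero) is included.

```python
def binary_metrics(tp, fp, fn, tn):
    """Rates derived from a two-class confusion matrix."""
    return {
        "TPrate": tp / (tp + fn),   # positives correctly identified
        "TNrate": tn / (tn + fp),   # negatives correctly identified
        "PPV": tp / (tp + fp),      # predicted positives that are positive
        "NPV": tn / (tn + fn),      # predicted negatives that are negative
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical counts, not the results of this chapter.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```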
The actual values in a confusion matrix are often represented as percentages. Whether or not a confusion matrix is good depends on the costs of misclassification [18]. The confusion matrix plays an important role in model building; it is calculated in a later section.

The steps followed are:

1. For data collection, select an educational environment (for this task an educational organization, SSSSMV, Bhilai, C.G., is selected).
2. Select the relevant data for mining (the data of admitted students are used).
3. Remove inconsistent or noisy data and treat incomplete and erroneous data.
4. Apply data transformation to the modified data (after this step the data are in a new format).
5. Apply data mining and extract meaningful information from the training set (the decision tree technique is applied).
6. Evaluate the extracted information/result.

Tool used in the experiment: WEKA

WEKA stands for Waikato Environment for Knowledge Analysis [36]. It is a computer program developed at the University of Waikato in New Zealand. WEKA is implemented in Java, which makes it simple, portable and platform independent. The main purpose of developing WEKA was to identify information in data sets. The WEKA workbench is a collection of visualization tools and algorithms for data analysis and predictive modeling. WEKA provides good graphical user interfaces [61]. Qualitative data were collected for the experiment, and 10-fold cross-validation was applied. WEKA was chosen for this work [47]. WEKA supports the .csv data format, so the data were entered in Excel and saved in .csv format. WEKA can perform several standard data mining tasks such as clustering, classification, data preprocessing, association, visualization, and feature selection. WEKA's graphical environment provides the Explorer, Experimenter, Knowledge Flow, and Simple CLI applications. The C4.5 algorithm yields an acceptable level of accuracy through WEKA. The decision tree generated by WEKA is depicted in Figure 3.3.

Output

This chapter focuses on how the quality of higher education can be enhanced. As mentioned in this work, improvement is possible if the potential of students is recognized. For this work, the students' qualitative data were analyzed with a decision tree, which is visualized in Figure 3.3. The student's grade is the root node, with the branches first, second and third. The root node is selected on the basis of the highest gain ratio. The decision tree shows that the course is suitable for all students with first grade. For second and third grade the same process is repeated; for second grade the highest information gain is for location, so students with second grade are examined on the basis of location. In the tree, the parent's qualification and the living location are taken as branch nodes, and so on.

10-fold cross-validation is applied; it is a way of reducing the variance of the performance estimate. With cross-validation, the data set is divided just once, but into 10 pieces (folds). Nine of the pieces are used for training, and the remaining piece is used for testing. The whole procedure is performed 10 times, each time using a different piece for testing. That is 10-fold cross-validation [26]. Table 3.7 shows the contingency table, or confusion matrix. The number of correctly classified instances is the sum of the diagonal of the matrix (19+91=110); all the remaining instances are incorrectly classified [62]. It also shows the accuracy information of the model.
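The 10-fold procedure described above can be sketched with the Python standard library alone. This is an illustration of the splitting scheme only; the function name, the fixed seed and the sample count of 138 are assumptions made for the example, and WEKA's own procedure may differ in detail (for example by stratifying the folds by class).

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Divide the shuffled indices once into k nearly equal pieces.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(138, k=10))
for fold_no, (train, test) in enumerate(splits, start=1):
    # A classifier would be trained on `train` and evaluated on `test` here.
    print(f"fold {fold_no}: {len(train)} training, {len(test)} test instances")
```

Every instance is used for testing exactly once, and the per-fold results are then combined into one confusion matrix or averaged.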
Table 3.7: Accuracy Information of C4.5

Correctly Classified Instances      %
Incorrectly Classified Instances    %

Table 3.8 displays the confusion matrix for the model of Figure 3.3. The true positive (TP) rate is the proportion of instances of a class that are classified as that class. In the confusion matrix, this is the diagonal element divided by the sum over the relevant row, i.e. 19/(19+91) for class yes and 28/(0+28) = 1.0 for class no in our example. The true positive rates are shown in Table 3.9. The false positive (FP) rate is the proportion of instances that are classified as a class but belong to a different class. In the matrix, this is the column sum of the class minus the diagonal element, divided by the row sums of all other classes, i.e. 0/110 = 0.0 for class yes and 63/110 for class no, as shown in Table 3.9 [49].
Table 3.8: Confusion Matrix (C4.5, class: suitable)

Class    Yes    No
Yes
No       0      28

Table 3.9: Class Accuracy

Class label    TP Rate    FP Rate
Yes
No
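The per-class TP and FP rates of Table 3.9 follow mechanically from a confusion matrix once the row and column sums are known. The Python sketch below applies the definitions given above (diagonal element over the row sum, and column sum minus diagonal over the row sums of the other classes), assuming that rows hold the actual classes and columns the predicted classes. The function name and the 2x2 matrix are hypothetical and are not the values of Table 3.8.

```python
def per_class_rates(matrix, labels):
    """Return {label: (tp_rate, fp_rate)} for a square confusion matrix.

    Assumes matrix[i][j] counts instances of actual class i predicted as j.
    """
    total = sum(sum(row) for row in matrix)
    rates = {}
    for i, label in enumerate(labels):
        row_sum = sum(matrix[i])                 # all instances of this class
        col_sum = sum(row[i] for row in matrix)  # all predicted as this class
        others = total - row_sum                 # instances of other classes
        tp_rate = matrix[i][i] / row_sum if row_sum else 0.0
        fp_rate = (col_sum - matrix[i][i]) / others if others else 0.0
        rates[label] = (tp_rate, fp_rate)
    return rates

# Hypothetical matrix: 100 actual "yes" instances and 50 actual "no" instances.
example = [[80, 20],
           [5, 45]]
rates = per_class_rates(example, ["yes", "no"])
```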
Figure 3.3: Decision tree
3.6. Conclusion

Predicting students' academic performance is of great concern to the higher education system. Data mining can be used in a higher educational system to predict students' academic performance. This work conducts a study to predict student performance for a particular course such as BCA, MCA or any computer course. This is done with students' qualitative data, in order to show their influence on student performance, using the decision tree machine learning technique. The study concludes that student performance is affected by qualitative factors. Machine learning has come far from its nascent stages and can prove to be a powerful tool in academia. In the future, applications similar to the one developed, as well as any improvements thereof, may become an integrated part of every academic institution. The success of any educational organization depends mainly on the results it produces in terms of student success rate [66]. This work successfully derived a prediction mechanism for the success of students course-wise, by social status and grade-wise. The method proved to be effective, with approximately 94% of results predicted correctly. Moreover, the method helps college managements to improve their teaching-learning process and academic activities midway through the course in order to improve student performance. In future, this technique can be improved by adding more qualitative data such as hobbies, financial help, caste, attendance, and sports ability. The technique can also be used in any educational organization or institution to predict the performance of students, so that they can improve their results and the dropout rate can be reduced. This work shows the accuracy of the model through the classification matrix, or confusion matrix. The accuracy table shows that the model accuracy is greater than 50%, which means the model can predict the true value at a satisfactory rate.
More informationBusiness Club. Decision Trees
Business Club Decision Trees Business Club Analytics Team December 2017 Index 1. Motivation- A Case Study 2. The Trees a. What is a decision tree b. Representation 3. Regression v/s Classification 4. Building
More informationINTRODUCTION TO MACHINE LEARNING. Measuring model performance or error
INTRODUCTION TO MACHINE LEARNING Measuring model performance or error Is our model any good? Context of task Accuracy Computation time Interpretability 3 types of tasks Classification Regression Clustering
More informationK- Nearest Neighbors(KNN) And Predictive Accuracy
Contact: mailto: Ammar@cu.edu.eg Drammarcu@gmail.com K- Nearest Neighbors(KNN) And Predictive Accuracy Dr. Ammar Mohammed Associate Professor of Computer Science ISSR, Cairo University PhD of CS ( Uni.
More informationCOMP 465: Data Mining Classification Basics
Supervised vs. Unsupervised Learning COMP 465: Data Mining Classification Basics Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Supervised
More informationDecision Trees Dr. G. Bharadwaja Kumar VIT Chennai
Decision Trees Decision Tree Decision Trees (DTs) are a nonparametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target
More informationClassification. Instructor: Wei Ding
Classification Decision Tree Instructor: Wei Ding Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Preliminaries Each data record is characterized by a tuple (x, y), where x is the attribute
More informationCSE4334/5334 DATA MINING
CSE4334/5334 DATA MINING Lecture 4: Classification (1) CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai Li (Slides courtesy
More informationPart I. Instructor: Wei Ding
Classification Part I Instructor: Wei Ding Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Classification: Definition Given a collection of records (training set ) Each record contains a set
More informationLecture 6 K- Nearest Neighbors(KNN) And Predictive Accuracy
Lecture 6 K- Nearest Neighbors(KNN) And Predictive Accuracy Machine Learning Dr.Ammar Mohammed Nearest Neighbors Set of Stored Cases Atr1... AtrN Class A Store the training samples Use training samples
More informationInternational Journal of Software and Web Sciences (IJSWS)
International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) ISSN (Print): 2279-0063 ISSN (Online): 2279-0071 International
More informationInternational Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.3, May Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani
LINK MINING PROCESS Dr.Zakea Il-Agure and Mr.Hicham Noureddine Itani Higher Colleges of Technology, United Arab Emirates ABSTRACT Many data mining and knowledge discovery methodologies and process models
More informationData Mining. Introduction. Hamid Beigy. Sharif University of Technology. Fall 1395
Data Mining Introduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 21 Table of contents 1 Introduction 2 Data mining
More informationQuestion Bank. 4) It is the source of information later delivered to data marts.
Question Bank Year: 2016-2017 Subject Dept: CS Semester: First Subject Name: Data Mining. Q1) What is data warehouse? ANS. A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile