CHAPTER 3 MACHINE LEARNING MODEL FOR PREDICTION OF PERFORMANCE


Tamsyn Dawson
5 years ago

In this work, educational data mining is applied to qualitative data about students, and their performance is analyzed using the C4.5 decision tree algorithm. The results indicate that student performance is also influenced by qualitative data. Knowledge acquired in the form of a tree is easy for users to assimilate. The work looks into the higher education domain of data mining in order to analyze student performance. Decision tree induction is one of the common approaches for extracting knowledge from sets of feature-based examples. A machine learning technique can be used to develop a tool that helps to predict [90] performance on the basis of data. Machine learning techniques have been applied successfully in various fields such as medical science, pattern recognition, image recognition, and various control applications [99]. In the machine learning method used here, the learned function is represented by a decision tree.

Introduction

Data mining is commonly described as the process of discovering significant patterns in large volumes of data. It offers a great variety of techniques, methods and tools for thorough analysis of the data available in various fields [53]. The term is used because extracting valuable information from a large database resembles mining rock for a valuable mineral. The name is, however, a misnomer: mining for gold in rocks is called gold mining, not rock mining, so by analogy data mining should have been called knowledge mining. Data mining is also known as knowledge discovery in databases (KDD), a term that describes a more complete process. Other terms that refer to data mining are data dredging, knowledge extraction and pattern discovery. In principle, data mining is not specific to one type of media or data and ought to be applicable to any kind of information repository. However, approaches may differ when applied to different types of data. Indeed, the challenges presented by different types of data vary significantly [72].

In higher education, the ability to predict a student's performance is very important for enhancing quality. In higher education institutions a large amount of knowledge is hidden and can be extracted. This knowledge can be any student-specific information such as success rate, academic performance, dropout rate, course preference, subject specialization, or placement success [99]. The quality of the students in a higher education institution is judged by their academic performance. Many factors influence students' performance, such as financial condition, living location, parents' qualification, and socio-economic, non-academic and academic factors. Various data mining techniques are useful for deriving hidden knowledge from these factors. The technique behind the extraction is the knowledge discovery process, which extracts knowledge from the available dataset and should create a knowledge base for the benefit of the institution [101].

The factors that describe student performance can be used to predict it. A number of well-known data mining classification algorithms can be used for prediction, such as ID3, Simple CART, J48, NB Tree, and C4.5. The model focuses mainly on finding the prediction accuracy of students' academic performance using two different datasets. The experimental model also indicates that the student attributes considered are highly influential in predicting the results. The research work focuses on the development of data mining classification models for predicting students' performance in higher education. The work is done on a small dataset with a number of attributes to analyze the performance of the students. Feature selection has been an active area of research in the machine learning, statistics and data mining communities [99]. Various attribute selection methods exist to identify the attributes that have the greatest impact. Such an environment offers scope for research investigating the efficiency of machine learning techniques. Data mining combines tools from statistics and machine learning with database management. It can be defined as the process that, starting from apparently unstructured data, tries to extract knowledge and/or unknown interesting patterns. Machine learning algorithms are used during this process.

Machine learning

Machine learning (ML) is the science that evolved to develop algorithms and models. According to Arthur Samuel, machine learning gives computers the capability to learn without being explicitly programmed. Basically, machine learning is a set of tools that teach computers how to perform a task. With the help of a machine learning technique a tool can be constructed that automatically predicts [90] the suitability of a particular course for a student on the basis of data. Machine learning techniques have been applied successfully in various fields such as medical science, pattern recognition, image recognition, and various control applications [99]. In the machine learning method used here, the learned function is represented by a decision tree.

From the artificial intelligence point of view, learning is central to human knowledge and intelligence, and it is also essential for building intelligent machines. From the software engineering point of view, machine learning allows computers to be programmed by example, which can be easier than writing code in the traditional way. Beyond typical statistics problems, machine learning has been applied to a vast number of problems. Machine learning is often designed with different considerations than statistics (e.g., speed is often more important than accuracy). Machine learning methods work in two phases: in training, a model is developed from a collection of training data; in testing, the model is used to make decisions about new test data. Data mining may apply machine learning techniques; it may also drive the advancement of machine learning techniques or algorithms [38]. To use a machine learning algorithm, one has to formulate the problem in one's domain in the form the algorithm expects, usually a set of features. Machine learning can be categorized into three classes [81]:
1. Supervised learning: learning for classification or concept learning, in which the training data is labeled with the appropriate response. Classification and regression are the most common types of supervised learning.

2. Unsupervised learning: a collection of unlabeled data is given, and the task is to analyze it and discover patterns. Clustering and association are the common types of unsupervised learning.
3. Reinforcement learning: a robot or controller seeks to learn the optimal actions to take based on the outcomes of past actions.

From the data mining point of view, machine learning is a research area of computer science that has grown quickly due to advances in data analysis research. ML has also found a place in the database industry, where it is capable of extracting valuable knowledge from large data stores. The problem most frequently studied by data mining and machine learning researchers is classification. It consists of predicting the value of a categorical attribute (the class) based on the values of the predicting attributes. There are different classification methods. From the data mining point of view, machine learning approaches can be divided into two distinct groups: symbolic approaches and statistical approaches. Symbolic approaches perform inductive learning of symbolic descriptions such as rules, decision trees or logical representations [81]. Statistical approaches follow pattern-recognition methods, including k-nearest neighbor, Bayesian classifiers, neural network learning and support vector machines.

Decision-making

People can use knowledge for decision-making. Classification and prediction are two common methods of data analysis; they can be used to describe significant classes and to make predictions. The decision tree is very popular as a classification and prediction model because it does not require any domain knowledge or parameter setting [38]. The decision tree method is widely used because of its high accuracy in classifying data sets [11]. A decision tree is used for classification as follows. Suppose a tuple X is to be assigned a class label. The attribute values of the tuple are tested against the decision tree, and the path traced from the root ends in a leaf that holds the class prediction for that tuple [38]. Decision tree techniques use a top-down approach. The root node of a decision tree plays the main role: starting from the root, each node is split recursively according to the algorithm. The tree is generated from a training set of tuples, and the resulting leaf nodes are associated with class labels. The commonly used algorithms for building a decision tree are ID3, C4.5 and CART. Different tools are available for implementing data mining classification techniques, such as RapidMiner, WEKA, and TANAGRA. These tools also help to predict students' future academic performance.

Therefore, this work uses an illustrative algorithm for one of the most common machine learning techniques, namely decision trees [38]. The work uses the C4.5 decision tree learning method, which is suitable for discrete-valued functions. C4.5 is the successor to ID3 [75] and is one of the most commonly used machine learning algorithms. It handles discrete values when building a decision tree. C4.5 divides the values of an attribute into two partitions around a selected threshold, so that all values above the threshold go to one child and the rest go to the other child. It also handles missing attribute values. For attribute selection, C4.5 uses the gain ratio, which removes the bias of information gain toward attributes with many outcome values [82]. In the partitioning process of ID3, each level uses a statistical property known as information gain, which determines the best attribute for the training set. C4.5, the successor of ID3, uses an extension of information gain called the gain ratio [38]. As mentioned above, decision tree construction is top-down, and the difficulty is to select the attribute on which to split at each node. The best split divides the target class into the purest possible child nodes. The measure of the purity of the child nodes is called the information, and the gain is expressed in terms of this amount of information.
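
The classification procedure described above (testing a tuple's attribute values against the tree until a leaf is reached) can be sketched in a few lines of Python. This is only an illustrative sketch: the attribute names anticipate the illustrative example later in the chapter, the root attribute, the "first" branch and the use of location under "second" follow the chapter's description of its tree, and every other branch is a hypothetical placeholder.

```python
# Hypothetical tree. Only the root attribute (grade), the "first -> yes"
# branch and the use of loc under "second" follow the chapter's description;
# the remaining branches and leaf labels are illustrative assumptions.
TREE = {
    "attribute": "grade",
    "branches": {
        "first": "yes",
        "second": {
            "attribute": "loc",
            "branches": {"urban": "no", "rural": "yes"},
        },
        "third": {
            "attribute": "parentq",
            "branches": {"educated": "yes", "uneducated": "no"},
        },
    },
}


def classify(tree, record):
    """Walk from the root to a leaf and return the leaf's class label."""
    node = tree
    while isinstance(node, dict):  # internal nodes are dicts, leaves are labels
        value = record[node["attribute"]]
        node = node["branches"][value]
    return node


student = {"parentq": "educated", "loc": "rural", "grade": "second"}
print(classify(TREE, student))
```

Representing internal nodes as dictionaries and leaves as plain labels keeps the traversal to a single loop; a learned C4.5 tree would be produced by the induction algorithm rather than written by hand.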
Gain Ratio: The process of selecting a new attribute and subdividing the training examples is repeated for each non-terminal successor node. Attributes that have been incorporated higher in the tree are excluded, so that any given attribute can appear at most once along any path through the tree [11]. The gain ratio can be used for attribute selection, but the split information has to be calculated before the gain ratio. For an attribute A, the information gain Gain(S, A) relative to a collection of examples S is defined by the following equations.

$\mathrm{Info}(D) = -\sum_{i=1}^{n} p_i \log_2(p_i)$    (1) [38]

$\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)$    (2) [38]

$\mathrm{SplitInformation}(S, A) = -\sum_{i=1}^{n} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|}$    (3) [38]

So the gain ratio is

$\mathrm{GainRatio}(S, A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInformation}(S, A)}$    (4)

In equation (1), p_i is the proportion of examples that belong to class i. In the above equations A is a categorical attribute, S_i is the subset of S for which A takes its i-th value, and Info_A(D) is the information that remains after the data is partitioned on A. SplitInformation(S, A), the information of S with respect to the values of A, is shown in Table 3.4. The calculated gain values are tabulated in the illustrative example.

Illustrative example

The data set used in this work is described in Table 3.1. The attribute parentq denotes parent qualification, loc denotes location, grade denotes the grade or division of the student in the previously passed class, and suitable is the decision label. The decision label has two values, yes or no, indicating whether the student is suitable for admission to a computer course. The grade attribute is calculated from the percentage: if perc >= 60% then grade = first; if perc >= 50% then grade = second; otherwise grade = third. 150 records are taken. The example set is illustrated in Figure 3.1. From equation (1), calculate the information of the decision label suitable, for which 110 records are yes and 27 are no.

$\mathrm{Info}(D) = -\frac{110}{138}\log_2\frac{110}{138} - \frac{27}{138}\log_2\frac{27}{138} = 0.7207$

Table 3.1: List of attributes
Attribute    Values
parentq      {Educated, Uneducated}
loc          {urban, rural}
grade        {First, Second, Third}
suitable     {yes, no}

Similarly, calculate the information for all attributes. The parent qualification attribute has two classes, educated and uneducated. For the educated class the information is

$\mathrm{Info}(educated) = \frac{58}{138}\left(-\frac{53}{58}\log_2\frac{53}{58} - \frac{5}{58}\log_2\frac{5}{58}\right) = 0.1627$

For the uneducated class the information is

$\mathrm{Info}(uneducated) = \frac{79}{138}\left(-\frac{57}{79}\log_2\frac{57}{79} - \frac{22}{79}\log_2\frac{22}{79}\right) = 0.4885$

$\mathrm{Info}(parent) = \mathrm{Info}(educated) + \mathrm{Info}(uneducated) = 0.1627 + 0.4885 = 0.6512$

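The location attribute has two values, urban and rural. The information for both is

$\mathrm{Info}(urban) = \frac{64}{138}\left(-\frac{44}{64}\log_2\frac{44}{64} - \frac{20}{64}\log_2\frac{20}{64}\right) = 0.4155$

$\mathrm{Info}(rural) = \frac{73}{138}\left(-\frac{66}{73}\log_2\frac{66}{73} - \frac{7}{73}\log_2\frac{7}{73}\right) = 0.2411$

$\mathrm{Info}(loc) = \mathrm{Info}(urban) + \mathrm{Info}(rural) = 0.6567$

Figure 3.1: Example set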
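
The calculations above for parentq and loc follow one pattern: the information of each partition is weighted by the partition's share of the data and the weighted terms are summed. A minimal Python sketch of equations (1) and (2) is given below. The function names and the toy counts are assumptions made for illustration, and the sketch normalizes by the actual sum of the counts, so it is not expected to reproduce the chapter's rounded figures exactly.

```python
import math


def entropy(counts):
    """Equation (1): information of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def info_after_split(partitions):
    """Info_A(D): size-weighted sum of the information of each partition.

    `partitions` maps an attribute value to its per-class counts.
    """
    total = sum(sum(counts) for counts in partitions.values())
    return sum(
        sum(counts) / total * entropy(counts) for counts in partitions.values()
    )


def gain(class_counts, partitions):
    """Equation (2): Gain(A) = Info(D) - Info_A(D)."""
    return entropy(class_counts) - info_after_split(partitions)


# Hypothetical toy counts in the order [yes, no]; not the chapter's data set.
toy_classes = [4, 4]
toy_partitions = {"urban": [3, 1], "rural": [1, 3]}

print(round(entropy(toy_classes), 4))
print(round(gain(toy_classes, toy_partitions), 4))
```

Skipping zero counts in `entropy` avoids evaluating log2(0) for pure partitions, whose information is zero by convention.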

The grade attribute has three classes: first, second and third. The information is calculated for all classes as described above.

$\mathrm{Info}(grade) = \mathrm{Info}(first) + \mathrm{Info}(second) + \mathrm{Info}(third)$
$= \frac{66}{138}\left(-\frac{63}{66}\log_2\frac{63}{66} - \frac{6}{66}\log_2\frac{6}{66}\right) + \frac{43}{138}\left(-\frac{25}{43}\log_2\frac{25}{43} - \frac{18}{43}\log_2\frac{18}{43}\right) + \frac{28}{138}\left(-\frac{22}{28}\log_2\frac{22}{28} - \frac{6}{28}\log_2\frac{6}{28}\right) = 0.5853$

The consolidated information values are given in Table 3.2.

Table 3.2: Information Gain
Info(A)           Value
Info(parentq)     0.6512
Info(location)    0.6567
Info(grade)       0.5853

The gain values follow from equation (2). The gain values of all attributes are shown in Table 3.3.

$\mathrm{Gain}(parentq) = \mathrm{Info}(D) - \mathrm{Info}_{parentq}(D) = 0.7207 - 0.6512 = 0.0695$

$\mathrm{Gain}(location) = \mathrm{Info}(D) - \mathrm{Info}_{location}(D) = 0.7207 - 0.6567 = 0.0640$

$\mathrm{Gain}(grade) = \mathrm{Info}(D) - \mathrm{Info}_{grade}(D) = 0.7207 - 0.5853 = 0.1354$

Table 3.3: Gain Value
Gain              Value
Gain(parentq)     0.0695
Gain(location)    0.0640
Gain(grade)       0.1354

The split information of each attribute is calculated from equation (3). Table 3.4 displays the split information for the three attributes.

$\mathrm{Split}(suitable, parent) = -\frac{59}{138}\log_2\frac{59}{138} - \frac{79}{138}\log_2\frac{79}{138}$

$\mathrm{Split}(suitable, location) = -\frac{73}{138}\log_2\frac{73}{138} - \frac{64}{138}\log_2\frac{64}{138}$

$\mathrm{Split}(suitable, grade) = -\frac{66}{138}\log_2\frac{66}{138} - \frac{43}{138}\log_2\frac{43}{138} - \frac{28}{138}\log_2\frac{28}{138}$

Table 3.4: Split information of the sample (rows: Split(S, parentq), Split(S, location), Split(S, grade))

The gain ratios follow from equation (4):

$\mathrm{GainRatio}(suitable, parentq) = 0.0695 / \mathrm{Split}(suitable, parent)$

$\mathrm{GainRatio}(suitable, location) = 0.064 / \mathrm{Split}(suitable, location)$

$\mathrm{GainRatio}(suitable, grade) = 0.1354 / \mathrm{Split}(suitable, grade)$
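
The split information and gain ratio expressions above can be evaluated with a short script. The sketch below takes the partition sizes, the denominator 138 and the gain values exactly as they are written in the text; the function and variable names are assumptions, and because of rounding the printed figures need not coincide with those of the original tables.

```python
import math

TOTAL = 138  # denominator used in the chapter's split expressions

# Partition sizes and gain values as they appear in the text.
partition_sizes = {
    "parentq": [59, 79],
    "location": [73, 64],
    "grade": [66, 43, 28],
}
gains = {"parentq": 0.0695, "location": 0.0640, "grade": 0.1354}


def split_information(sizes, total):
    """Equation (3): -sum(|S_i|/|S| * log2(|S_i|/|S|))."""
    return -sum((s / total) * math.log2(s / total) for s in sizes)


def gain_ratio(gain_value, sizes, total):
    """Equation (4): Gain(A) / SplitInformation(S, A)."""
    return gain_value / split_information(sizes, total)


ratios = {
    name: gain_ratio(gains[name], partition_sizes[name], TOTAL) for name in gains
}
for name, ratio in ratios.items():
    print(f"{name}: split={split_information(partition_sizes[name], TOTAL):.4f} "
          f"gain ratio={ratio:.4f}")

best = max(ratios, key=ratios.get)
print("attribute with the highest gain ratio:", best)
```

Selecting the attribute with `max` over the ratios mirrors the attribute-selection rule of C4.5 at a single node; a full implementation would repeat this step recursively on each partition.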

Table 3.5: Gain Ratio (rows: GainRatio(S, parentq), GainRatio(S, location), GainRatio(S, grade))

The gain ratios are shown in Table 3.5. The grade attribute has the highest gain ratio; it is therefore selected as the root node of the tree. For the sample data, C4.5 splits the data table based on the value of the students' grade. The above process is then repeated to select further nodes until a decision node is reached [11].

Steps involved in modeling

Extracting knowledge from a data set is known as data mining. The steps for extracting educational knowledge using data mining techniques are as follows and are shown in Figure 3.2. Data quality problems are addressed by data cleaning, for which there are several main solution approaches. Real-world data collected for mining tend to be unclean: they may be noisy, inconsistent and incomplete [78].


Figure 3.2: Steps for extracting knowledge (Select Data, Data Refinement, Data Modeling, Evaluation, Deployment)

Data Refinement: The data refining process refines disparate data to increase the understanding of the data; it removes data inconsistency and redundancy. The data refinement process can develop an integrated data resource [38]. There can be multiple data sources, such as data warehouses, federated database systems or web-based information systems. When such data are integrated, the need for data cleaning increases significantly because the sources often contain redundant data in different representations. Depending on the database or data warehousing implementation, the data refining process may be completed after the integration of the different datasets. Inconsistent data are the raw material, and the integrated data resource is the final product.

The refinement process may involve two steps: data cleaning and data transformation. Data cleaning is needed to integrate and transform heterogeneous data sources. It raises the data quality to the level necessary for analysis. Data cleaning deals with incomplete, missing and non-existent values. Specifically, filtering out the problematic data can introduce sample bias into the data, and using data overlays can introduce missing values [69]. Data warehouses load and continuously refresh huge amounts of data from a variety of sources, so the probability that they contain unclean data is high [97]. Moreover, data warehouses support decision-making, so the correctness of the data is vital to avoid wrong conclusions. For instance, duplicated or missing information will produce incorrect or misleading statistics ("garbage in, garbage out"). Due to the wide range of possible data inconsistencies and the sheer data volume, data cleaning is considered to be one of the biggest problems in data warehousing.

The collected data then needs to be transformed into the required format. Data may be qualitative or quantitative, and converting data from one form to another is called data transformation. Some techniques require a specific form of data; therefore a data preparation phase is needed. Data transformation includes data preparation operations such as the production of derived attributes, entirely new records, or transformed values for existing attributes [78]. In this work, Excel is used for the refinement process: students' percentages are converted into grades according to the traditional method.

Data Modeling: Designing a model for extracting knowledge from a database is called data modeling. In the modeling phase, several modeling techniques are selected and applied. The purpose of data modeling is to recognize all the entities in the data and then to define the relationships between these entities. Data models can be conceptual, logical or physical. Conceptual data modeling typically identifies the highest-level relationships between different entities, whereas enterprise data modeling is similar to conceptual data modeling but addresses the unique requirements [53]. Logical data modeling illustrates the specific entities, attributes and relationships involved in a business function and serves as the basis for the creation of the physical data model. Physical data modeling represents an application-specific and database-specific implementation of a logical data model. The first step in modeling is to select the actual modeling technique to be used, e.g., building decision trees or generating a neural network. Before a model is built, a procedure needs to be defined to test the model's quality and validity [53]. The main goal of modeling is consistency: when the model is applied to unseen data, it should predict the true values.

Evaluation: Before final deployment of the model, it is necessary to evaluate the model thoroughly. The steps executed to construct the model also need to be reviewed to make sure that it achieves the objectives. A key objective is to determine whether there is some important issue that has not been considered sufficiently. At the end of the evaluation, a decision on the use of the data mining results should be reached. Evaluation deals with factors such as the accuracy and generality of the model [70]. This step measures the degree to which the model achieves the objectives. The evaluation process determines the factors due to which the model may be deficient. For evaluation, the confusion matrix is well suited to prediction tasks [53]. A confusion matrix, or classification matrix, is used to appraise the prediction accuracy of a model. It shows whether the model is making mistakes in its predictions and, if so, in what percentage of cases. Numerous classification rules are used to generate a confusion matrix [53]. Almost all performance metrics are expressed in terms of the elements of the confusion matrix generated by the model on a test sample. The format of a confusion matrix for a two-class case with the classes yes and no is shown in Table 3.6. A column represents an actual class, while a row represents a predicted class.
The total number of instances of each class in the test set is shown at the top of the table (P = total number of positive instances, N = total number of negative instances), while the number of instances predicted to belong to each class is shown to the left of the table (p = total number of instances classified as positive, n = total number of instances classified as negative). TP (true positives) is the number of correctly classified positive examples. In a similar manner, FN (false negatives) is the number of positive examples classified as negative, TN (true negatives) is the number of correctly classified negative examples and, finally, FP (false positives) is the number of negative examples for which the positive class was predicted. The positive class is measured by the true positive rate (TPrate): TPrate = TP/(TP+FN). The corresponding measure for the negative class is the true negative rate (TNrate), calculated as the number of negative examples correctly identified out of all negative examples: TNrate = TN/(TN+FP). It is also important to evaluate how many of the examples identified as belonging to a given class actually belong to that class. This is done with the positive and negative predictive values [53]. The positive predictive value (PPV) is PPV = TP/(TP+FP), while the negative predictive value (NPV) represents the number of negatives correctly identified out of all examples classified as negative: NPV = TN/(TN+FN). The TPrate, TNrate, PPV and NPV measure correct outcomes, which need to be maximized; sometimes their complements are of more interest. All these parameters provide a more exact view of the performance of a classification method. One can select a few of these measurements and focus on those alone, or provide a composite metric that best serves the given objective.

Table 3.6: Confusion Matrix
                   Actual class
Predicted class    Yes (P)                No (N)
Yes                True Positive (TP)     False Positive (FP)
No                 False Negative (FN)    True Negative (TN)
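
The four rates defined above can be computed directly from the cells of Table 3.6. The sketch below is illustrative only: the function name and the counts are hypothetical and are not results of this work, the overall accuracy is added as a commonly used extra measure, and NPV is computed as TN/(TN+FN), that is, out of all examples classified as negative, in line with its verbal definition.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Rates derived from a two-class confusion matrix.

    Assumes every denominator is non-zero, i.e. each actual class and each
    predicted class occurs at least once.
    """
    return {
        "TPrate": tp / (tp + fn),    # correctly identified positives
        "TNrate": tn / (tn + fp),    # correctly identified negatives
        "PPV": tp / (tp + fp),       # predicted positives that are positive
        "NPV": tn / (tn + fn),       # predicted negatives that are negative
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts chosen only for illustration.
metrics = confusion_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```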

The actual values in a confusion matrix are often represented as percentages. Whether or not a confusion matrix is good depends on the costs of misclassification [18]. The confusion matrix plays an important role in model building; it is calculated in a later section. The steps followed are:
1. For data collection, select an educational environment (for this task an educational organization was selected, i.e. SSSSMV, Bhilai, C.G.).
2. Select the relevant data for mining (the data of admitted students is used).
3. Remove inconsistent or noisy data and treat incomplete and erroneous data.
4. Apply data transformation to the modified data (after which the data is in a new format).
5. Apply data mining and extract meaningful information from the training set (the decision tree technique is applied).
6. Evaluate the extracted information/result.

Tool used in experiment: WEKA

The Waikato Environment for Knowledge Analysis is known as WEKA [36]. WEKA is a computer program developed at the University of Waikato in New Zealand. It is implemented in Java, which makes it portable and platform independent. The main purpose of developing WEKA was to extract information from data sets. The WEKA workbench is a collection of visualization tools and algorithms for data analysis and predictive modeling. WEKA provides good graphical user interfaces [61]. Qualitative data was collected for the experiment and 10-fold cross-validation was applied. WEKA was chosen as the tool [47]. WEKA supports the .csv data format, so the data was entered in Excel and saved in .csv format. WEKA can perform several standard data mining tasks such as clustering, classification, data preprocessing, association, visualization, and feature selection. WEKA's graphical environment provides the Explorer, Experimenter, KnowledgeFlow, and Simple CLI applications. The C4.5 algorithm yields an acceptable level of accuracy through WEKA. The decision tree generated by the WEKA tool is depicted in the following figure.

Output

This chapter focuses on how the quality of higher education can be enhanced. As mentioned in this work, improvement is possible if the potential of students is recognized. For this work, students' qualitative data has been analyzed using a decision tree, which is visualized in Figure 3.3. The student's grade is the root node, with branches for first, second and third. The root node is selected on the basis of the highest gain ratio. The decision tree shows that the course is suitable for all students with first grade. For second and third grade the same process is repeated; for second grade the highest information gain is for location, so students with second grade are examined on the basis of location. In the tree, parent's qualification and living location are taken as branch nodes, and so on.

10-fold cross-validation is applied; it is a way of reducing the variance of the accuracy estimate. With cross-validation, the data set is divided only once, but into 10 pieces (folds). Nine of the pieces are used for training and the remaining piece is used for testing. The whole procedure is performed 10 times, each time with a different piece used for testing. This is 10-fold cross-validation [26]. Table 3.7 summarizes the contingency table, or confusion matrix. The number of correctly classified instances is the sum of the diagonal of the matrix (19+91=110); all remaining instances are incorrectly classified [62]. The table also shows the accuracy information of the model.
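
The 10-fold procedure described above can be sketched as a generator of training and test index sets. This is a simplified illustration, not the procedure of any particular tool: the function name is an assumption, the figure of 150 records is taken from the illustrative example, and the sketch omits the shuffling and stratification that practical tools typically apply before splitting.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(150, k=10))
print(len(folds))
print([len(test) for _, test in folds])
```

In an evaluation loop, a model would be trained on each `train` list and scored on the matching `test` list, and the ten scores would then be averaged.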

Table 3.7: Accuracy Information of C4.5 (rows: Correctly Classified Instances, Incorrectly Classified Instances; values given as percentages)

Table 3.8 displays the confusion matrix mentioned in 3.3. The true positive (TP) rate of a class is the proportion of examples of that class that were classified as that class. In the confusion matrix this is the diagonal element divided by the sum over the relevant row, i.e. 19/(19+91) ≈ 0.173 for class yes and 28/(0+28) = 1.0 for class no in our example. The true positive rates are shown in Table 3.9. The false positive (FP) rate of a class is the proportion of examples that were classified as that class but belong to a different class. In the matrix this is the column sum of the class minus the diagonal element, divided by the row sums of all other classes, i.e. 0/110 = 0.0 for class yes and 63/110 ≈ 0.573 for class no, as shown in Table 3.9 [49].

Table 3.8: Confusion Matrix (Suitable, C4.5)
Class    Yes    No
Yes
No       0      28

Table 3.9: Class Accuracy (columns: Class label, TP Rate, FP Rate; rows: Yes, No)

Figure 3.3: Decision tree

3.6. Conclusion

Predicting students' academic performance is of great concern to the higher education system. Data mining can be used in a higher educational system to predict students' academic performance. This work conducts a study to predict students' performance in a particular course such as BCA, MCA or any other computer course. It uses students' qualitative data to show its influence on performance, using the decision tree machine learning technique. The study concludes that student performance is affected by qualitative factors. Machine learning has come far from its nascent stages and can prove to be a powerful tool in academia. In the future, applications similar to the one developed, as well as any improvements thereof, may become an integrated part of every academic institution. The success of any educational organization depends mainly on the results it produces in terms of student success rate [66].

This work successfully derived a prediction mechanism for student success by course, social status and grade. The method has proved effective: approximately 94% of the results are predicted correctly. The method helps college managements to improve their teaching-learning process and academic activities midway through the course in order to improve student performance. In future, the technique can be improved by adding more qualitative data such as hobbies, financial help, caste, attendance, and sports ability. The technique can also be used in any educational organization or institution to predict the performance of students, so that they can improve their results and reduce the dropout rate. This work shows the accuracy of the model through the classification matrix or confusion matrix. The accuracy table shows that the model accuracy is greater than 50%, which means the model predicts true values at a satisfactory rate.

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

Analysis about Classification Techniques on Categorical Data in Data Mining. Assistant Professor P. Meena, Department of Computer Science, Adhiyaman Arts and Science College for Women, Uthangarai, Krishnagiri,