Implementation Naïve Bayes Algorithm for Student Classification Based on Graduation Status


International Journal of Applied Business and Information Systems
ISSN: 2597-8993
Vol 1, No 2, September 2017, pp. 6-12

Implementation Naïve Bayes Algorithm for Student Classification Based on Graduation Status

Ayundyah Kesumawati 1), Din Waikabu 2)
Statistics Department, Faculty of Mathematics and Natural Sciences, Islamic University of Indonesia (UII), 55584 Sleman, Yogyakarta, Indonesia
1 ayundyah.k@uii.ac.id, 2 12611071@students.uii.ac.id

ARTICLE INFO
Article history: Received April 12, 2017; Revised June 04, 2017; Accepted June 20, 2017
Keywords: Naïve Bayes; Graduation Status; Classification

ABSTRACT
Length of study is the time a student takes to complete a college program. A bachelor's degree normally takes four years, but some students complete their studies beyond this limit (over four years), which influences the accreditation score of the institution. In this paper we use five variables: grade point average (GPA), concentration in high school, sex, participation in assistance, and city of residence. Students are classified by graduation status: more than four years, or four years or less. The method used to classify a student's study time is the Naïve Bayes algorithm. The study classifies students of the Department of Statistics, Islamic University of Indonesia, by graduation status. The results show that the Naïve Bayes classification performs quite well, with an accuracy of 81.18%.

I. Introduction

Higher education in Indonesia is under the Directorate of Higher Education and is accredited by the National Accreditation Board for Higher Education. Higher education institutions are categorized into two types: public and private [1]. There are four kinds of higher education institutions: universities, institutes, academies and polytechnics. Higher education in Indonesia is largely offered by the private sector. Out of around 3,500 institutions, only around 150 are public (established and operated by the government) [2].
Islamic University of Indonesia is a private university in Indonesia. It has 9 faculties and 46 programs under those faculties. The Department of Statistics is one of the programs under the Faculty of Mathematics and Natural Sciences. One of the quality standards in higher education is based on the student body, meaning the ratio of students to lecturers. Every higher education institution is expected to produce high-quality alumni. One criterion of high quality can be described by the graduation status of the students in an institution. Graduation status can affect the quality of alumni, especially for the Statistics Department. This in turn affects the quality of the Department of Statistics, since study programs in Indonesia are measured through the accreditation conducted by the National Accreditation Board of Higher Education (BAN PT). Quality is measured by seven major standards, one of which is students. For the student standard, the assessed components include the grade point average and the duration of study [3]. According to the accreditation assessment matrix instrument, the percentage of students who graduate on time is one element of the accreditation assessment [4]. This makes the issue extremely important for the management of the department.

One way to address this problem is to detect the factors that influence graduation status. This paper uses five variables that are presumed to influence a student's graduation status: grade point average (GPA), concentration in high school, sex, participation in assistance, and city of residence.

W: http://pubs.ascee.org/index.php/ijabis   E: info@ascee.org

The classification method used in this research is the Naïve Bayes algorithm. Naïve Bayes has been reported as the best classifier against several common classifiers in terms of accuracy and computational efficiency [5]. The remaining parts of this paper are organized as follows. In section 2, some background on the Naïve Bayes algorithm and classification accuracy is reviewed. The methodology and the results, which use accuracy for classification, are presented in section 3. In section 4, the numerical experiment with the Naïve Bayes algorithm is presented. Finally, discussions and conclusions are presented in section 5.

II. Methodology

This section discusses the techniques used in this study.

A. Data

This research was conducted in the Department of Statistics, Faculty of Mathematics and Natural Sciences, Islamic University of Indonesia. The data are student registration data and alumni data of the department from 1997 to 2012: 404 alumni from 1997-2011 and 116 active students of the class of 2012. The paper uses one dependent variable, graduation status, and five independent variables: sex, city of residence, concentration in high school, assistance, and grade point average (GPA). The categories of each variable are as follows:

Table 1. Variable Category

Variable    | Description                  | Category                            | Category description
Dependent   | Graduation Status            | On Time                             | The student completes the study in 4 years (8 semesters)
            |                              | Not On Time                         | The student completes the study beyond 4 years
Independent | Sex                          | Male                                |
            |                              | Female                              |
            | City of Residence            | Yogyakarta and Central Java         |
            |                              | Outside Yogyakarta and Central Java |
            | Concentration in High School | Science                             |
            |                              | Social                              |
            |                              | Other                               |
            | Assistance                   | Yes                                 |
            |                              | No                                  |
            | GPA                          | Plenty                              | < 2.75
            |                              | Satisfy                             | 2.75 - 3.00
            |                              | Very Satisfy                        | 3.01 - 3.51
            |                              | Cumlaude                            | > 3.51

B. Naïve Bayes Algorithm

Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given training sample belongs to a particular class [6].
Many studies have compared the performance of decision tree and selected neural network classifiers with a simple Bayesian classifier known as the Naïve Bayes classifier. The Naïve Bayes classifier assumes that the effect of an attribute value on a given class is independent of the values of the other attributes; this assumption is called class conditional independence. Naïve Bayes is based on Bayes' theorem and has a classification capability similar to that of decision trees and neural networks. On large data sets, Naïve Bayes has been shown to have high accuracy and speed. Bayes' theorem has the following general form:

$$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)} \qquad (1)$$

with:
X : data with unknown class
H : hypothesis that data X belongs to a specific class
P(H | X) : probability of hypothesis H given condition X (posterior probability)
P(H) : probability of hypothesis H (prior probability)
P(X | H) : probability of X given hypothesis H
P(X) : probability of X

Bayes' formula can be expressed informally as

$$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}$$

We know how frequently some particular evidence is observed, given a known outcome. We can use this known fact to compute the reverse: the chance of that outcome happening, given the evidence.

The Naïve Bayes classifier works as follows [5]. We are given a set of labelled training samples and their associated class labels. Each training sample is represented by an n-dimensional attribute vector, $X = (x_1, x_2, \ldots, x_n)$, depicting n measurements made on the sample from n attributes, respectively $A_1, A_2, \ldots, A_n$.

Suppose that there are m classes, $C_1, C_2, \ldots, C_m$. Given a sample X, the classifier predicts that X belongs to the class having the highest posterior probability, conditioned on X. That is, the Naïve Bayes classifier predicts that sample X belongs to the class $C_i$ if and only if

$$P(C_i \mid X) > P(C_j \mid X) \quad \text{for } 1 \le j \le m,\ j \ne i$$

Thus we maximize $P(C_i \mid X)$. The class $C_i$ for which $P(C_i \mid X)$ is maximized is called the maximum posterior hypothesis. By Bayes' theorem (equation (1)),

$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}$$

Since P(X) is the same for all classes, only $P(X \mid C_i)\,P(C_i)$ needs to be maximized. If the class prior probabilities are not known, it is commonly assumed that the classes are equally likely, that is, $P(C_1) = P(C_2) = \ldots = P(C_m)$, and we would therefore maximize $P(X \mid C_i)$. Otherwise, we maximize $P(X \mid C_i)\,P(C_i)$. Note that the class prior probabilities may be estimated by $P(C_i) = |C_{i,D}| / |D|$, where $|C_{i,D}|$ is the number of training samples of class $C_i$ in D and $|D|$ is the total number of training samples. For data sets with many attributes, it would be extremely computationally expensive to compute $P(X \mid C_i)$ directly.
To reduce the computation needed to evaluate $P(X \mid C_i)$, the naïve assumption of class conditional independence is made. This presumes that the values of the attributes are conditionally independent of one another, given the class label of the training sample. Thus,

$$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i) = P(x_1 \mid C_i) \times P(x_2 \mid C_i) \times \ldots \times P(x_n \mid C_i)$$
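The decision rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation (the paper reports using R and Weka): the function names `train` and `predict` and the five toy records are hypothetical, probabilities are estimated as plain relative frequencies, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Estimate P(C_i) and P(x_k | C_i) as relative frequencies."""
    class_counts = Counter(labels)
    n = len(labels)
    priors = {c: class_counts[c] / n for c in class_counts}
    # value_counts[c][k][v]: number of class-c samples whose k-th attribute is v
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for x, c in zip(rows, labels):
        for k, v in enumerate(x):
            value_counts[c][k][v] += 1
    cond = {
        c: {k: {v: cnt / class_counts[c] for v, cnt in counter.items()}
            for k, counter in per_attr.items()}
        for c, per_attr in value_counts.items()
    }
    return priors, cond


def predict(x, priors, cond):
    """Return the class maximizing P(X | C_i) P(C_i), plus all scores."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for k, v in enumerate(x):
            # A value never seen with class c gets probability 0 (no smoothing).
            score *= cond[c].get(k, {}).get(v, 0.0)
        scores[c] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy records: (sex, concentration in high school).
rows = [("Male", "Science"), ("Female", "Science"), ("Male", "Social"),
        ("Female", "Social"), ("Male", "Science")]
labels = ["On Time", "On Time", "Not On Time", "Not On Time", "Not On Time"]

priors, cond = train(rows, labels)
label, scores = predict(("Female", "Science"), priors, cond)
print(label, scores)
```

Because P(X) is the same for every class, the sketch compares the unnormalized products directly instead of computing full posteriors.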

The probabilities $P(x_1 \mid C_i), \ldots, P(x_n \mid C_i)$ can easily be estimated from the training samples, where:

If $A_k$ is categorical, then $P(x_k \mid C_i) = s_{ik} / s_i$, where $s_{ik}$ is the number of training samples of class $C_i$ having the value $x_k$ for $A_k$, and $s_i$ is the number of training samples belonging to $C_i$.

If $A_k$ is continuous-valued, then the attribute is assumed to have a Gaussian distribution, so that $P(x_k \mid C_i) = g(x_k, \mu_{C_i}, \sigma_{C_i})$, where $g(x_k, \mu_{C_i}, \sigma_{C_i})$ is the normal density function for attribute $A_k$, while $\mu_{C_i}$ and $\sigma_{C_i}$ are the mean and standard deviation, respectively.

In order to classify an unknown input X, $P(X \mid C_i)\,P(C_i)$ is evaluated for each class $C_i$.

C. Classification Accuracy

The classification accuracy A depends on the number of samples correctly classified (true positives plus true negatives) and is evaluated by the formula [6]

$$A = \frac{\text{number of correctly classified samples}}{\text{total number of samples}} \times 100\%$$

III. Numerical Experimental Result

This section describes the results of classifying Statistics Department students by graduation status, including the Naïve Bayes model and the classification results. A dataset of 404 alumni classified into two categories is used for evaluation.

A. Descriptive Statistics

This section describes the descriptive statistics of the data. Alumni of the Statistics Department are spread throughout Indonesia as follows:

[Figure 1: map or chart of alumni by region; not recoverable from the source.]
Fig. 1. Distribution Alumni of Statistics Department

Based on Figure 1, most alumni come from Central Java and Yogyakarta. In this paper, city of residence is therefore divided into two regions: Central Java and Yogyakarta, and outside Central Java and Yogyakarta. The percentage for the graduation status of students of the Department of Statistics is as follows:

[Figure 2: bar chart titled "Percentage for Graduation Status"; vertical axis 0% to 100%; series: On Time, Not On Time.]
Fig. 2. Percentage for Graduation Status

From Figure 2, it can be concluded that from 1995 until 2011 there are several years in which 100% of the students did not complete their study in four years or less.

[Figure 3: bar chart of student counts by sex (Female, Male) and graduation status (Not On Time, On Time); plotted values 165, 95, 109, 35.]
Fig. 3. Graduation Status Based on Sex

Based on Figure 3, it can be concluded that female students have a higher count than male students for both graduation statuses, not on time and on time. This is plausible because there are more female students than male students.

B. Naïve Bayes Classifier

In applying the Naïve Bayes classifier, the selected dataset contains two categories of data: Graduation On Time and Graduation Not On Time. 404 records (about 78% of the data) are used to build the training dataset for the classifier. The other 116 records are used as the testing dataset to test the classifier. The data used in this paper are as follows:

Table 2. Data Training

No  | Sex    | City of Residence                   | Concentration in High School | GPA          | Graduation Status
1   | Male   | Yogyakarta and Central Java         | Science                      | Very Satisfy | Not On Time
:   | :      | :                                   | :                            | :            | :
404 | Female | Outside Yogyakarta and Central Java | Science                      | Cumlaude     | On Time

The data in Table 2 were processed using R software and Weka 3.8. The prior probability of each class of graduation status (Y) and the conditional probability of each independent variable given Y are shown in the output of the R program. The output of the R program is as follows:

Table 3. Prior Probability for Graduation Status

Prior Probability | Not On Time | On Time
                  | 0.6782178   | 0.3217822

The prior probabilities in Table 3 show that, for this case, the larger probability is graduation not on time. The conditional probability for each variable is as follows:

Table 4. Conditional Probability for Graduation Status

Variable (X)                 | Category                            | Y = Not On Time | Y = On Time
Sex                          | Male                                | 0.6021898       | 0.7307692
                             | Female                              | 0.3978102       | 0.2692308
GPA                          | Cumlaude                            | 0.1277372       | 0.6384615
                             | Very Satisfy                        | 0.5364964       | 0.3615385
                             | Satisfy                             | 0.1934307       | 0
                             | Plenty                              | 0.1423358       | 0
City of Residence            | Outside Yogyakarta and Central Java | 0.5182482       | 0.5
                             | Yogyakarta and Central Java         | 0.4817518       | 0.5
Concentration in High School | Science                             | 0.72262774      | 0.95384615
                             | Social                              | 0.13868613      | 0.02307692
                             | Other                               | 0.13868613      | 0.02307692

Based on Table 4, it can be concluded that the largest probability for each variable is: male for on time status, cumlaude GPA for on time status, city of residence outside Yogyakarta and Central Java for not on time status, and science concentration for on time status. These probabilities determine the classification accuracy shown in the next section.

C. Accuracy of Classification

According to Table 5, of the 302 students with "On Time" graduation status, 250 are classified correctly and the other 52 are not.

Table 5. Accuracy of Classification

Graduation Status  | On Time | Not On Time
On Time            | 250     | 24
Not On Time        | 52      | 78
Percentage Correct | 81.18%

Thus, it can be said that the Naïve Bayes algorithm successfully classifies the graduation status of students in the Statistics Department of UII with an accuracy of 81.18%.

IV. Conclusion

In this paper, the Naïve Bayes classifier has been discussed as the best classifier for this problem. The accuracy of the Naïve Bayes classification model is 81.18%.
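As a worked illustration of how the reported tables combine, the following Python sketch multiplies the prior probabilities of Table 3 by the conditional probabilities of Table 4 for one student profile, and recomputes the accuracy from the counts in Table 5. It is a sketch under stated assumptions, not the authors' R or Weka code: the names `PRIOR`, `COND`, `posterior` and `accuracy_percent` and the example profile are hypothetical, and Assistance is left out because Table 4 does not list it.

```python
# Probabilities copied from Tables 3 and 4; tuples are (Not On Time, On Time).
CLASSES = ("Not On Time", "On Time")
PRIOR = (0.6782178, 0.3217822)
COND = {
    "Sex": {"Male": (0.6021898, 0.7307692),
            "Female": (0.3978102, 0.2692308)},
    "GPA": {"Cumlaude": (0.1277372, 0.6384615),
            "Very Satisfy": (0.5364964, 0.3615385),
            "Satisfy": (0.1934307, 0.0),
            "Plenty": (0.1423358, 0.0)},
    "City of Residence": {
        "Outside Yogyakarta and Central Java": (0.5182482, 0.5),
        "Yogyakarta and Central Java": (0.4817518, 0.5)},
    "Concentration in High School": {
        "Science": (0.72262774, 0.95384615),
        "Social": (0.13868613, 0.02307692),
        "Other": (0.13868613, 0.02307692)},
}


def posterior(profile):
    """Normalized P(C_i | X) for a profile given as {variable: category}."""
    scores = list(PRIOR)
    for variable, category in profile.items():
        for i, p in enumerate(COND[variable][category]):
            scores[i] *= p
    total = sum(scores)
    return {c: s / total for c, s in zip(CLASSES, scores)}


def accuracy_percent(matrix):
    """Share of counts on the main diagonal of a square count matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


# Hypothetical profile, chosen only to illustrate the calculation.
profile = {"Sex": "Male", "GPA": "Cumlaude",
           "City of Residence": "Yogyakarta and Central Java",
           "Concentration in High School": "Science"}
post = posterior(profile)
print(max(post, key=post.get), post)
print(accuracy_percent([[250, 24], [52, 78]]))
```

For this profile the On Time class receives the larger posterior, driven mainly by the cumlaude GPA term. The accuracy is (250 + 78) / 404, which agrees with the percentage reported in Table 5 up to rounding.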
Acknowledgment

The authors acknowledge with thanks the Statistics Department, Islamic University of Indonesia, for providing financial assistance for this research.

References

[1] Indonesian Government Regulation No. 66 (2010). Management and Delivery of Education.
[2] Moeliodihardjo, B. Y. (2014). Higher Education Sector in Indonesia. International Seminar on Massification of Higher Education in Large Academic Systems. British Council. New Delhi.

[3] Han, J. and Kamber, M. (2006). Data Mining: Concepts and Techniques. Elsevier. San Francisco.
[4] BAN PT - National Accreditation Board of Higher Education. (2008). Book VI Matrix Instrument Rating Program Accreditation.
[5] Ting, S. L., Ip, W. H., and Tsang, A. H. C. (2011). Is Naïve Bayes a Good Classifier for Document Classification? International Journal of Software Engineering and Its Application, 5(3), Hongkong.
[6] BAN PT - National Accreditation Board of Higher Education. (2011). Accreditation of Higher Education Institutions - Book III Guidelines for Preparation of Form, pp. 4.
[7] Rachmawan, A. F., & Ulama, B. S. S. (2016). Ads Filtering Menggunakan Jaringan Syaraf Tiruan Perceptron, Naïve Bayes Classifier, dan Regresi Logistik. Jurnal Sains dan Seni ITS, 5(1), D83-D89.