Internatonal Journal of Appled Busness and Informaton Systems ISSN: 2597-8993 Vol 1, No 2, September 2017, pp. 6-12 6 Implementaton Naïve Bayes Algorthm for Student Classfcaton Based on Graduaton Status Ayundyah Kesumawat 1), Dn Wakabu 2) Statstcs Department, Faculty of Mathematcs and Natural Scences, Islamc Unversty of Indonesa (UII), 55584 Sleman, Yogyakarta, Indonesa) 1 ayundyah.k@u.ac.d, 2 12611071@students.u.ac.d ARTICLE INFO AB ST R ACT Artcle hstory: Receved Aprl 12, 2017 Revsed June 04, 2017 Accepted June 20, 2017 Keywords: Nave Bayes Graduaton Status Classfcaton Length of study n collage s a tme t takes a student to have completed the study n college. Bachelor degree n achevng normal that t takes tme for four years, but stll there are students who completed ther studes beyond normal lmts (over four years). Ths such as nfluence on the value of accredtaton of nsttuton. In ths paper we used fve varables: grade pont average (GPA), Concentraton n Hgh School, Sex, partcpaton n assstance and cty of resdence, whch are classfed by the Graduaton Status students over four years and less than equal to four years. The method used for the classfcaton of a student's study tme s Nave Bayes algorthm. Ths study nvestgated classfcaton student based on Graduaton Status n Department Statstcs of Islamc Unversty of Indonesa. From the result, Naïve Bayes algorthm classfcaton s qute good wth accuracy value for Naïve Bayes s 81,18%. I. Introducton Hgher educaton n Indonesa are under Drectorate of Hgher Educaton and accredted by the Natonal Accredtaton Board for Hgher Educaton. The hgher educaton nsttuton s categorzed nto two types: publc and prvate hgher educaton [1]. There are four types of hgher educaton nsttutons: unverstes, nsttutes, academes and polytechncs. Unverstes n Indonesa are largely offered by the prvate sector. Out of around 3,500 nsttutons, only around 150 nsttutons are publc (establshed and operated by the government) [2]. Islamc Unversty of Indonesa s a prvate unversty n Indonesa. There are 9 facultes, and 46 program under faculty. Department of statstcs s one of the program under Mathematcs and Natural Scence Faculty. One of the standard qualty from hgher educaton s based on the student body t means the comparson student and lecturer. Expectaton for any hgher educaton s afford alumn wth hgh qualty. One of crtera for a hgh qualty can descrbe by graduaton status of the student n a nsttuton. The graduaton status can affect the qualty of alumn especally for Statstcs Department. Ths stuaton wll affect the qualty of the Department of statstcs, especally courses n Indonesa measured by the accredtaton conducted by the Natonal Accredtaton Board of Hgher Educaton (BAN PT). Qualty s measured by seven major standards, s one of ts students. Especally wth regard to the evaluaton of the student standard component of the assessment s the grade pont average and the duraton of the study [3]. Based on the assessment matrx nstrument of accredtaton that the percentage of students who graduate on tme s one element of the accredtaton assessment [4]. Ths ssue s extremely mportant for the management of department consderng the percentage of students graduatng on tme s one of the elements of accredtaton set by the Natonal Accredtaton Board. One of method that can used to solve ths problem s detected factor that mpled Graduaton Status. In ths paper used fve varable that supposed to be a factor that mpled Graduaton Status of student whch are grade pont average (GPA), Concentraton n Hgh School, Sex, partcpaton n assstance and cty of resdence. DOI: W : http://pubs.ascee.org/ndex.php/jabs E : nfo@ascee.org
ISSN: Internatonal Journal of Appled Busness and Informaton Systems 7 Classfcaton method used n ths research method s Nave Bayes Algorthm and Naïve Bayes s the best classfer aganst several common classfers n term of accuracy and computatonal effcency [5]. The remanng parts of ths paper are organzed n the followng structures. In secton 2, some backgrounds of Naïve Bayes Algorthm and accuracy classfcaton wll be revewed. The methodology and result whch are used accuracy for classfcaton are presented n secton 3. In secton 4, numercal expermental Naïve Bayes algorthm s presented. Fnally, dscussons and conclusons are presented n secton 5. II. Methodology Ths secton dscusses on technques that have been performed n ths study. A. Data Ths research was conducted n the Department of Statstcs Faculty of Mathematcs and Natural Scences, Islamc Unversty of Indonesa, the data used n ths research s regstraton data from student n department of statstcs Islamc Unversty of Indonesa and alumn data snce 1997-2012. The number of data used s 404 alumn snce 1997-2011 and 116 actve student class of year 2012. In ths paper used 5 varable whch s graduaton status, gender, cty of resdence, concentraton n hgh school, assstances, and Grade Pont Average (GPA). The category for each varable as follow : Table 1. Varabel Category Varable Descrpton Category Descrpton Varable Dependent Varable Independent Graduaton Status Sex Cty of Recdence Concentraton n Hgh School Assstance GPA On Tme Not On Tme Male Female Yogyakarta and Central Java Outsde Yogyakarta and Central Java Scence Socal Other Yes No The student that complete ther study for 4 years (8 semester) The student that complete ther study beyond 4 years Plenty < 2,75 Satsfy 2,75 3,00 Very Satsfy 3,01 3,51 Cumlaude > 3,51 2.2. Naïve Bayes Algorthm One of the statstcal classfer are Bayesan Classfer. They can predct class membershp probabltes, such as the probablty that a gven tranng sample belongs to a partcular class [6]. So many studes comparng by performance decson tree and selected neural network classfer wth A smple Bayesan classfer known as the Naïve Bayes Classfer. Attrbute value on a class s assumed ndependent n Naïve Bayes Classfer or t called class condtonal ndependence. The Naïve Bayes. Nave Bayes s based on Bayes theorem whch has a smlar classfcaton capablty wth decson tree and neural network. For large data mplementaton Nave Bayes proved to have hgh accuracy and speed. Bayes Theorem has the followng general form: Ayundyah Kesumawat & Dn Wakabu (Implementaton Naïve Bayes Algorthm)
8 Internatonal Journal of Appled Busness and Informaton Systems ISSN: 2597-8993 P( H X ) P( X H ) P( H ) P( X ) wth : X : Data wth unknown class H : Hypothess X data s a specfc class P(H X) : The probablty of the hypothess H s based on the condton X P(H) : The probablty of the hypothess H (pror prob.) P (X H) : The probablty of X on the condton P(X) : The probablty of X Bayes formula can be expressed nformally by sayng that lkelhood pror Posteror evdence We know how frequently some partcular evdence s observed, gven a known outcome. We can use ths known fact to compute the reverse, to compute the chance of that outcome happenng, gven the evdence The Naïve Bayes classfer, works as follows [5]: Gven a set of labelled tranng samples and ther assocated class label. As usual, each tranng sample s represented by an n-dmensonal attrbute vector, X (x 1, x 2,..., x n), depctng n measurements made on the tranng sample from n attrbutes, respectvely, A1, A2,..., A. Suppose that there are m classes, C1, C2,..., C m. Gven a tranng sample, X, the classfer wll predct that X belongs to the class havng the hghest posteror probablty, condtoned on X. That s, the Naïve Bayes Classfer predcts that tranng sample X belongs to the class C f and only f P( C X ) P( C X ) for 1 j m, j j Thus we maxmze P( C X ). The class C for whch P( C X ) s maxmzed s called the maxmum posteror hypothess. By Bayes theorem (equaton (1)), n P(C X) P( X C ) P(C ) PX ( ) Suppose P(X) s equal for all classes, only P(X C ) P(C ) maxmzed. If the class pror probabltes are not known, then t s commonly assumed that the classes are equally lkely, that s, P( C1) P( C2)... P( C m ), and we would therefore maxmze P(X C ). Otherwse, we maxmze P(X C ) P(C ). Note that the class pror probabltes may be estmated by P( C ) C / D, D, where D, C s the number of tranng samples of class C n D. If data sets wth many attrbutes gven, t would be extremely computatonally hard to compute P(X C ). In case to reduce computaton for evaluatng P(X C ), the Naïve Bayes assumpton of class condtonal s ndependence. To make presumes that the values of the attrbutes are condtonally ndependent of one another, gven the class label of the tranng sample. Thus, P( X C ) P( x C ) P( x C )... P( x C ) 1 2 n Ayundyah Kesumawat & Dn Wakabu (Implementaton Naïve Bayes Algorthm)
ISSN: Internatonal Journal of Appled Busness and Informaton Systems 9 We can easly estmate the probabltes P( x1 C ),.., P( xn C ) can be estmated from the tranng samples, where If Ak s categorcal, then P( xk C ) S k / S, where S k s the number of tranng samples of class C havng value x k and S s the number of tranng samples belongng to C. If Ak s contnuous-valued, then the attrbute s assumed to have a Gaussan dstrbuton so that P( x C ) g( x, c, c ) where g( x, c, c ) s Normal densty functon for attrbute Ak, whle k k c and c k are the mean and standard devaton, respectvely. In order to classfy an unknown data nput X, P(X C )P(C ) s evaluated for each class C. 2.3. Classfcaton Accuracy The classfcaton accuracy A of an ndvdual program depends on the number of samples correctly classfed (true postves plus true negatves) and s evaluated by the formula. [6] A true classfcaton statement 100% number of statement III. Numercal Expermental Result In ths secton descrbe the results of classfcaton for Statstcs Department Student based on Graduaton Status ncludng Nave Bayes models, and the results of the classfcaton. Frst of all, a datasets wth 404 alumn classfed n two dfferent categores s used for evaluaton. A. Descrtve Statstcs In ths secton descrbe the descrptve statstcs from the data. Alumn n Statstcs Department are spread throughout Indonesa as follows : Fg. 1. Dstrbuton Alumn of Statstcs Department Based on Fgure 1., the hghest cty resdence of alumn s from Central Java and Yogyakarta. In ths paper, for cty resdence dvded by two regon are cty resdence from Central Java and Yogyakarta, and outsde Central Java and Yogyakarta. The percentage for the graduaton status of a student department of Statstcs as follows : Ayundyah Kesumawat & Dn Wakabu (Implementaton Naïve Bayes Algorthm)
10 Internatonal Journal of Appled Busness and Informaton Systems ISSN: 2597-8993 Percentage for Graduaton Status 100% 80% 60% 40% 20% 0% 0% 32% 54% 100%100%100% 85% 100% 86%80% 85%80% 82% 64% 50%50% 82% 100% 46% 68% 0% 0% 0% 15% 0% 14% 20%15%20% 18% 36% 50%50% 18% On Tme Not On Tme Fg. 2. Percentage for Graduaton Status From Fgure 2, t can be conclude that from 1995 untl 2011 there are several data that showed a 100% that the student dd not completed ther study n a less than equal to four years. 200 100 165 95 109 35 0 Female Male Not On Tme On Tme Fg. 3. Graduaton Status Based on Sex Based on fgure 3, t can be conclude the student that female student has an hgher number for Graduaton status not on tme and on tme than a male student. It s possble happen, because there are many female student than male student so t mpled ths condton. B. Naïve Bayes Classfer In applyng the Nave Bayes Classfer the selected dataset contans two categores of data : Graduaton on Tme and Graduaton Not On Tme. 30% data (404 data) are used to bulds the tranng dataset for the classfer. The other (116 data) data are used as the testng dataset to test the classfer. The data used n ths paper as follow : No Sex Cty of Resdence Table 2. Data Tranng Concentraton Hgh School 1 Male Yogyakarta and Central Java Scence : 404 Female Outsde Yogyakarta and Central Java GPA Very Satsfy Graduaton Status Not On Tme Scence Cumlaude On Tme The data n table 2 proceed by usng R software and Weka 3.8 to solve the problem. Then, n ths secton the pror probablty of each class n graduaton status (Y) and the condtonal probablty for each ndependent varable (except : Graduaton Status) to Y showed n output R program. The output program from R as follow : Ayundyah Kesumawat & Dn Wakabu (Implementaton Naïve Bayes Algorthm)
ISSN: Internatonal Journal of Appled Busness and Informaton Systems 11 Table 3. Pror Probablty for Graduaton Status Pror Probablty Not On Tme On Tme 0.6782178 0.3217822 The pror probablty for graduaton status n table 3 show that the bggest probablty s graduaton not on tme for ths case. It mpled the condtonal probablty for each varable as follow : Varable (X) Sex GPA Cty of Resdence Concentraton n Hgh School Table 4. Condtonal Probablty for Graduaton Status Probablty of Graduaton Categores Status (Y) Not on Tme On Tme Male 0.6021898 0.7307692 Female 0.3978102 0.2692308 Cumlaude 0.1277372 0.6384615 Very Satsfy 0.5364964 0.3615385 Satsfy 0.1934307 0 Plenty 0.1423358 0 Outsde Yogyakarta and Central Java 0.5182482 0.5 Yogyakarta and Central Java 0.4817518 0.5 Scence 0.72262774 0.95384615 Socal 0.13868613 0.02307692 Other 0.13868613 0.02307692 Based on table 4, t can be conclude that the bggest probablty n each varable there are male for on tme status, cumlaude GPA n on tme status, cty of resdence that outsde Yogyakarta and Central Java n not on tme status, and scence concentraton n on tme status. That probablty wll mpled the accuracy of the classfcaton that showed n the next secton as follow. C. Accuracy of Classfcaton Accordng to the table 5, t can be seen from the 302 Graduaton Student wth "On Tme" status there are 250 students classfed correctly, and 52 other students are not approprate. Table 5. Accuracy of Classfcaton Graduaton Status On Tme Not On Tme On Tme 250 24 Not On Tme 52 78 Percentage Correct 81,18% Thus, t can be sad that the Nave Bayes algorthm successfully classfes Graduaton status student n Statstcs Department UII wth a percentage of 81,18% accuracy. IV. Concluson In ths paper, Naïve Bayes Classfer has been dscussed as the best classfer n ths problem. Thus, the level of accuracy n classfcaton models Nave Bayes algorthm 81,18%. Acknowledgment The authors acknowledgment wth thanks the Statstcs Department, Islamc Unversty of Indonesa for provdng the fnancal assstance n the research actvty. References [1] Indonesan Government Regulaton No. 66 (2010). Management and Delvery of Educaton. [2] Moelodhardjo, B. Y., (2014). Hgher Educaton Sector n Indonesa. Internatonal Semnar on Massfcaton of Hgher Educaton n Large Academc Systems. Brtsh Councl. New Delh. Ayundyah Kesumawat & Dn Wakabu (Implementaton Naïve Bayes Algorthm)
12 Internatonal Journal of Appled Busness and Informaton Systems ISSN: 2597-8993 [3] Han, J. and Kamber, M. (2006). Data Mnng : Concepts and Technques. Elsever. San Franssco. [4] BAN PT - Natonal Accredtaton Board of Hgher Educaton. (2008). Book VI Matrx Instrument Ratng Program Accredtaton. [5] S.L. Tng, W.H. Ip, Albert H.C. Tsang. (2011). Is Naïve Bayes a Good Classfer for Document Classfcaton?. Internatonal Journal of Software Engneerng and Its Applcaton, 5(3), Hongkong. [6] BAN PT - Natonal Accredtaton Board of Hgher Educaton. (2011). Accredtaton of Hgher Educaton Insttutons - Book III Gudelnes for Preparaton of Form, pp.4. [7] Rachmawan, A. F., & Ulama, B. S. S. (2016). Ads Flterng Mengunakan Jarngan Syaraf Truan Perceptron, Naïve Bayes Classfer, dan Regres logstk. Jurnal Sans dan Sen ITS, 5(1), D83-D89. Ayundyah Kesumawat & Dn Wakabu (Implementaton Naïve Bayes Algorthm)