A Binary Neural Decision Table Classifier

Victoria J. Hodge, Simon O'Keefe, Jim Austin
vicky@cs.york.ac.uk, sok@cs.york.ac.uk, austin@cs.york.ac.uk
Advanced Computer Architecture Group, Department of Computer Science
University of York, York, YO10 5DD, UK

Abstract

In this paper, we introduce a neural-network-based decision table algorithm. We focus on the implementation details of the decision table algorithm when it is constructed using the neural network. Decision tables are simple supervised classifiers which, Kohavi demonstrated, can outperform state-of-the-art classifiers such as C4.5. We couple this power with the efficiency and flexibility of a binary associative-memory neural network. We demonstrate how the binary associative-memory neural network can form the decision table index to map between attribute values and data records. We also show how two attribute selection algorithms, which may be used to pre-select the attributes for the decision table, can easily be implemented within the binary associative-memory neural framework. The first attribute selector uses mutual information between attributes and classes to select the attributes that classify best. The second attribute selector uses a probabilistic approach to evaluate randomly selected attribute subsets.

Introduction

Supervised classifier algorithms aim to predict the class of an unseen data item. They induce a hypothesis using the training data to map inputs onto classified outputs (decisions); this hypothesis should then correctly classify previously unseen data items. There is a wide variety of classifiers including decision trees, neural networks, Bayesian classifiers, Support Vector Machines and k-nearest neighbour. We have previously developed a k-NN classifier [HA04] using an associative memory neural network called the Advanced Uncertain Reasoning Architecture (AURA) [A95]. In this paper, we extend the approach to encompass a decision table supervised classifier, coupling the classification power of the decision table with the speed and storage efficiency of an associative memory neural network.

The decision table has two components: a schema and a body. The schema is the set of attributes pre-selected to represent the data and is usually a subset of the data's total attributes. There are various approaches for attribute selection; we discuss two later in this paper. The body is essentially a table of labelled data items where the attributes specified by the schema form the rows and the decisions (classifications) form the columns. Each column is mutually exclusive and represents an equivalence set of records as defined by the attributes of the schema.

Kohavi [K95] uses a Decision Table Majority (DTM) for classification: if an unseen item exactly matches a stored item in the body, the decision table assigns the stored item's decision to the unseen item; if there is no exact match, the decision table assigns the majority class across all items to the unseen item. Our decision table approach implements both DTM and proximity-based matching as implemented in our k-NN classifier, whereby if there is no exact match the decision table assigns the class of the nearest stored item to the unseen item.
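As a concrete illustration of DTM classification, here is a minimal Python sketch (our own schematic, not the AURA implementation described below; the dictionary-based body and all names are illustrative assumptions):

    from collections import Counter

    def dtm_classify(body, majority_class, item):
        # Decision Table Majority: exact match on the schema attributes,
        # falling back to the data set's majority class.
        hist = body.get(tuple(item))            # the equivalence set's class histogram
        if hist:                                # exact match found
            return hist.most_common(1)[0][0]    # majority class within the set
        return majority_class                   # no exact match: global majority

    # body maps schema attribute values to a Counter over classes
    body = {('F', 'young'): Counter({'yes': 3, 'no': 1}),
            ('M', 'old'):   Counter({'no': 2})}
    print(dtm_classify(body, 'yes', ['F', 'young']))   # -> 'yes'
    print(dtm_classify(body, 'yes', ['M', 'young']))   # -> 'yes' (fallback)

The proximity-based variant replaces the fallback with the class of the nearest stored item, as in our k-NN classifier.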
RAM-based Neural Networks

The AURA C++ library provides a range of classes and methods for rapid partial matching of large data sets [A95]. In this paper, we define partial matching as the retrieval of those stored records that match some or all of the input record. In our AURA decision table, we use best partial matching to retrieve the records that are the top matches. AURA belongs to a class of neural networks called Random Access Memory (RAM-based) networks. RAM-based networks were first developed by Bledsoe & Browning [BB59] and Aleksander & Albrow [AA68] for pattern recognition, and led to the WISARD pattern recognition machine [ATB84]; see [A98] for a detailed compilation of RAM methods. RAM-based networks are founded on the twin principles of matrices (usually called Correlation Matrix Memories, or CMMs) and n-tupling. Each matrix accepts m inputs as a vector or tuple addressing m rows, and n outputs as a vector addressing n columns of the matrix. During the training phase, the matrix weight M_lk is incremented if both the input row I_l and the output column O_k are set.
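The following NumPy sketch illustrates these twin principles (a minimal sketch under our own naming; the AURA C++ library provides the real classes and methods):

    import numpy as np

    m, n = 8, 4                               # m input rows, n output columns
    M = np.zeros((m, n), dtype=np.uint8)      # the correlation matrix (CMM)

    def train(I, O):
        # One training step: set M[l, k] wherever input bit I[l] and
        # output bit O[k] are both set (a binary OR of the outer product).
        M[:] |= np.outer(I, O).astype(np.uint8)

    def recall(I):
        # Recall: the summed output vector is the dot product of I and the
        # CMM; it is thresholded afterwards to give a binary output.
        return I.astype(int) @ M

    I = np.zeros(m, dtype=np.uint8); I[[0, 2]] = 1
    O = np.zeros(n, dtype=np.uint8); O[1] = 1
    train(I, O)
    print(recall(I))                          # column 1 sums to 2: both input bits match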

Therefore, training is a single-epoch process with one training step for each input-output association, preserving the high speed. During recall, the presentation of vector I elicits the recall of vector O, as vector I contains all of the addressing information necessary to access and retrieve vector O. This training and recall makes RAM-based networks computationally simple and transparent, with well-understood properties. RAM-based networks are also able to partially match records during retrieval; they can therefore rapidly match records that are close to the input but do not match exactly.

AURA

AURA has been used in an information retrieval system [H01], high-speed rule matching systems [AKL95], 3-D structure matching [TA00] and trademark searching [AA98]. AURA techniques have demonstrated superior performance with respect to speed compared to conventional data indexing approaches [HA01] such as hashing and inverted file lists, which may be used for a decision table body. AURA trains 20 times faster than an inverted file list and 6 times faster than a hashing algorithm. It is up to 24 times faster than the inverted file list for recall and up to 4 times faster than the hashing algorithm. AURA techniques have also demonstrated superior speed and accuracy compared to conventional neural classifiers [ZAK99]. The rapid training, computational simplicity, network transparency and partial match capability of RAM networks, coupled with our robust quantisation and encoding method for mapping the numeric attributes of the data set onto binary vectors for training and recall, make AURA ideal as the basis of an efficient implementation. A more formal definition of AURA, its components and methods now follows.

Correlation Matrix Memories (CMMs) are the building blocks for AURA systems. AURA uses binary input I and output O vectors to train records into the CMM and recall sets of matching records from the CMM, as in Equation 1 and Figure 1.

Equation 1: $\mathrm{CMM} = \bigvee_{\text{all}} I \cdot O^T$, where $\bigvee$ is logical OR.

Training is a single-epoch process with one training step for each input-output association (each $I \cdot O^T$ in Equation 1), which equates to one step for each record in the data set.

Figure 1: A CMM with input vector i and output vector o. Four matrix locations are set following training: $i_0 \cdot o_0$, $i_2 \cdot o_{n-2}$, $i_{m-1} \cdot o_0$ and $i_m \cdot o_n$.

For the methodology described in this paper, we:
- Train the data set into the CMM (the decision table body CMM), which indexes all records in the data set and allows them to be matched.
- Select the attributes for the schema using schema CMMs. We describe two selection algorithms: one uses a single CMM, while the second uses two coupled CMMs.
- Match and classify unseen items using the trained decision table.

Data

For the data sets:
- Symbolic and numerical unordered attributes are enumerated and each separate token maps onto an integer (Text → Integer) which identifies the bit to set within the vector. For example, a SEX_TYPE attribute would map as (F → 0) and (M → 1).
- Kohavi's DTM methodology is principally aimed at symbolic attributes, but the AURA decision table can also handle continuous numeric attributes. Any real-valued or ordered numeric attributes are quantised (mapped to discrete bins) and each individual bin maps onto an integer which identifies the bit to set in the input vector. A range of input values for attribute f maps onto each bin which, in turn, maps to a unique integer to index the vector, as in Equation 2. The range of attribute values mapping to each bin is equal.

Equation 2: $R_f \rightarrow bins_{fk} \mapsto Integer_{fk} + offset_f$, where $R_f \subseteq FV_f$ and $\mathrm{cardinality}(Integer_f) = \mathrm{cardinality}(bins_f)$.

In Equation 2, offset_f is a cumulative integer offset within the binary vector for each attribute f, and offset_{f+1} = offset_f + nBins_f, where nBins_f is the number of bins for attribute f, FV_f is the set of attribute values for attribute f, → is a many-to-one mapping and ↦ is a one-to-one mapping.
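A minimal Python sketch of this encoding, assuming equal-width bins and contiguous per-attribute offsets (all names here are illustrative, not the library's API):

    import numpy as np

    def bin_index(value, lo, hi, n_bins):
        # Map a numeric value onto an equal-width bin over [lo, hi].
        b = int((value - lo) / (hi - lo) * n_bins)
        return min(max(b, 0), n_bins - 1)     # clamp the upper edge

    def record_to_vector(record, specs):
        # specs holds, per attribute, (offset, lo, hi, n_bins) for ordered
        # numeric attributes or (offset, token_map) for unordered ones.
        length = sum(s[3] if len(s) == 4 else len(s[1]) for s in specs)
        v = np.zeros(length, dtype=np.uint8)
        for value, spec in zip(record, specs):
            if len(spec) == 4:                        # ordered numeric attribute
                offset, lo, hi, n_bins = spec
                v[offset + bin_index(value, lo, hi, n_bins)] = 1
            else:                                     # unordered symbolic attribute
                offset, token_map = spec
                v[offset + token_map[value]] = 1      # e.g. {'F': 0, 'M': 1}
        return v

    specs = [(0, {'F': 0, 'M': 1}), (2, 0.0, 100.0, 5)]   # SEX_TYPE, then AGE
    print(record_to_vector(['F', 37.0], specs))           # bits 0 and 3 set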

This quantisation (binning) approach aims to subdivide each attribute uniformly across its range. The range of values is divided into b bins such that each bin is of equal width; the even widths of the bins prevent distortion of the inter-bin distances. Once the bins and integer mappings have been determined, we map the records onto binary vectors. Each attribute maps onto a consecutive section of bits in the binary vector:

    For each record in the data set
        For each attribute
            Calculate the bin for the attribute value;
            Set the bit in the vector as in Equation 2;

Each binary vector represents a record from the data set.

Body Training

The decision table body is an index of all contingencies and the decision to take for each. Input vectors represent quantised records and form an input I to the CMM during training. The CMM associates the input with a unique output vector O^T during training which represents an equivalence set of records. This produces a CMM where the rows represent the attributes and their respective values and the columns represent equivalence sets of records (where equivalence is determined by the attributes designated by the schema). We use an array of linked lists to store the equivalence sets of records and a second array to store the counts of each class for each equivalence set as a histogram. The algorithm is:

    1) Input the vector to the CMM;
    2) Threshold at value nF;
    3) If exact match
    4)     Add the record to the column's list;
    5)     Add the class to the histogram;
    6) Else train the record as the next column;

where nF is the number of attributes. Steps 1 and 2 are equivalent to testing for an exact match during body recall, as described next. Figure 3 shows a trained CMM where each row is an attribute value and each column represents an equivalence set.

Body Recall

The decision table classifies by finding the set of matching records. To recall the matches for a query record, we firstly produce an input vector by quantising the target values for each attribute to identify the bins, and thus the CMM rows to activate, as in Equation 2. To retrieve the matching records for a particular record, AURA effectively calculates the dot product of the input vector I_k and the CMM, computing a positive integer-valued output vector O_k (the summed output vector) as in Equation 3 and Figures 2 and 3.

Equation 3: $O_k^T = I_k^T \cdot \mathrm{CMM}$

The AURA technique thresholds the summed output O_k to produce a binary output vector, as in Figure 2 for an exact match and Figure 3 for a partial match.

Figure 2: Diagram showing the CMM recall for an exact match. The left hand column is the input vector. The dot is the value for each attribute (a value for an unordered attribute or a bin for an ordered numeric attribute). AURA multiplies the input vector by the values in the matrix columns using the dot product, sums each column to produce the summed output vector, and then thresholds this vector at a value equivalent to the number of attributes in the input (6 here) to produce the thresholded attribute vector, which indicates the matching column (the middle column here).

For an exact match (as in Kohavi's DTM), we use the Willshaw threshold. It sets a bit in the thresholded output vector for every location in the summed output vector whose value is at least the threshold value. The threshold value is set to the number of attributes, nF, for an exact match.
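Continuing the NumPy sketch from earlier, exact-match recall with the Willshaw threshold might look as follows (illustrative names; the threshold is the number of attributes, nF):

    import numpy as np

    def willshaw_recall(M, I, n_attributes):
        # Exact match: threshold the summed output at nF, so a column
        # fires only if every one of the nF activated rows is set for it.
        summed = I.astype(int) @ M
        return (summed >= n_attributes).astype(np.uint8)

    # One column per equivalence set; the class histograms are stored
    # outside the CMM in an array, as described above.
    M = np.array([[1, 0],
                  [0, 1],
                  [1, 1]], dtype=np.uint8)
    I = np.array([1, 0, 1], dtype=np.uint8)   # one bit per attribute (nF = 2)
    print(willshaw_recall(M, I, 2))           # -> [1 0]: column 0 matches exactly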
If there is an exact match, there will be a bit set in the thresholded output vector indicating the matching equivalence set. It is then simply a case of looking up the class histogram for this equivalence set in the stored array and classifying the record by the majority class in the histogram.

If there are no bits set in the thresholded output vector, we classify the unseen record according to the majority class across the data set.

Figure 3: Diagram showing the CMM recall for a partial match. The left hand column is the input vector. The dot is the value for each attribute (a value for an unordered attribute or a bin for an ordered numeric attribute). AURA multiplies the input vector by the values in the matrix columns using the dot product, sums each column to produce the summed output vector, and then thresholds this vector at a value equivalent to the highest value in the vector (5 here) to produce the thresholded attribute vector, which indicates the matching column (the middle column here).

For partial matching, we use the L-Max threshold. L-Max thresholding essentially retrieves at least L top matches. It sets a bit in the thresholded output vector for every location in the summed output vector whose value is at least the threshold value. The AURA C++ library automatically sets the threshold to the highest integer value that will retrieve at least L matches. For the AURA decision table, L is set to the value of 1. There will be a bit set in the thresholded output vector indicating the best matching equivalence set. It is then simply a case of looking up the class histogram for this equivalence set in the stored array and classifying the unseen record as the majority class. We note there may be more than one best matching equivalence set, in which case the majority class across all best matching sets must be calculated.
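A corresponding sketch of L-Max thresholding with L = 1 (illustrative; the AURA C++ library chooses the threshold automatically, as noted above):

    import numpy as np

    def l_max_recall(M, I, L=1):
        # Partial match: set the threshold to the highest integer value
        # that still retrieves at least L columns.
        summed = I.astype(int) @ M
        threshold = np.sort(summed)[-L]
        return (summed >= threshold).astype(np.uint8)

    M = np.array([[1, 0],
                  [0, 0],
                  [0, 1]], dtype=np.uint8)
    I = np.array([1, 0, 1], dtype=np.uint8)   # no column matches both set rows
    print(l_max_recall(M, I))                 # -> [1 1]: two equally good partial
                                              # matches, so the majority class is
                                              # taken across both equivalence sets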
Schema Training

In the decision table body CMM, the rows represented attribute values and the columns represented equivalence sets. In the schema CMM used for the first attribute selection algorithm, the rows represent attribute values and the columns represent individual records. For our second attribute selection algorithm, we use two CMMs where the first CMM (CMM1) indexes the second (CMM2). In CMM1, the rows represent records and the columns represent attribute values. In CMM2, the rows represent attribute values and the columns represent the records; CMM2 is therefore identical to the CMM used for the first attribute selection algorithm.

During training for the first attribute selection algorithm, and for CMM2 of the second attribute selection algorithm, the input vectors I represent the attribute values in the data records. The CMM associates the input with a unique output vector O^T. Each output vector is orthogonal, with a single bit set corresponding to the record's position in the data set: the first record has the first bit set in the output vector, the second record the second bit, and so on. During training for CMM1, the records provide the input vectors I with a single bit set and the output vectors O^T represent the attribute values in the data records. The CMM training process is given in Equation 1.

Schema Attribute Selection

As with Kohavi, we assume that all records are to be used in the body and during attribute selection. There are two fundamental approaches to attribute selection used in classification: a filter approach, which selects the optimal set of attributes independently of the classifier algorithm, and a wrapper approach, which selects attributes to optimise classification using the algorithm itself. We examine two filter approaches, which are more flexible than wrapper approaches as they are not directly coupled to the classification algorithm. For a data set with N attributes there are O(N^M) possible combinations of M attributes, which is intractable to search exhaustively.

In the following, we use one filter approach (mutual information attribute selection) that examines attributes on an individual basis, and another probabilistic filter approach (a probabilistic Las Vegas algorithm) that examines randomly selected subsets of attributes.

Mutual Information Attribute Selection

Wettschereck [W94] describes a mutual information attribute selection algorithm which calculates the mutual information between class C and each attribute F.

The mutual information between two attributes is the reduction in uncertainty concerning the possible values of one attribute that is obtained when the value of the other attribute is determined. For unordered attributes, where nFV is the number of distinct attribute values f_j for attribute F and c ranges over the classes C:

Equation 4: $I(C,F) = \sum_{j=1}^{nFV} \sum_{c=1}^{|C|} p(C=c, F=f_j) \log \frac{p(C=c, F=f_j)}{p(C=c)\, p(F=f_j)}$

For ordered numeric attributes, the technique computes the mutual information between a discrete random variable (the class) and a continuous random variable (the attribute). It estimates the probability function of the attribute using density estimation. We assume attribute F has density f(x) and that the joint density of C and F is f(x, C=c). Then the mutual information is:

Equation 5: $I(C,F) = \sum_{c=1}^{|C|} \int_x f(x, C=c) \log \frac{f(x, C=c)}{f(x)\, p(C=c)}\, dx$

Equation 5 requires an estimate of the density function f(x) and the joint density function f(x, C=c). To approximate f(x) and f(x, C=c), we utilise the binning to represent the density, which is analogous to the Trapezium Rule's use of the areas of slices (trapezia) to represent the area under a graph for integration. We use the bins to represent strips of the probability density function and count the number of records mapping into each bin to estimate the density.

In AURA, for unordered data, the mutual information is given by Equation 6:

Equation 6: $I(C,F) = \sum_{j=1}^{nRowsFV} \frac{nRow_{f_j}}{n} \sum_{c=1}^{|C|} \frac{n(BV_{f_j} \wedge BV_c)}{nRow_{f_j}} \log \frac{n(BV_{f_j} \wedge BV_c)\,/\,nRow_{f_j}}{nClass_c\,/\,n}$

where nRowsFV is the number of rows in the CMM for attribute F (one per attribute value f_j), n is the number of records in the data set, BV_fj is the binary vector (CMM row) for value f_j, nRow_fj is the number of set bits in BV_fj, BV_c is a binary vector with one bit set for each record in class c, n(BV_fj ∧ BV_c) is a count of the set bits when BV_c is logically ANDed with BV_fj, and nClass_c is the number of records in class c.

In AURA, for real-valued or discrete ordered numeric attributes, the mutual information is given by Equation 7:

Equation 7: $I(C,F) = \sum_{b=1}^{nBins} \frac{nRow_b}{n} \sum_{c=1}^{|C|} \frac{n(BV_b \wedge BV_c)}{nRow_b} \log \frac{n(BV_b \wedge BV_c)\,/\,nRow_b}{nClass_c\,/\,n}$

where nBins is the number of bins in the CMM for attribute F, BV_b is the binary vector (CMM row) for bin b, nRow_b is the number of set bits in BV_b, n(BV_b ∧ BV_c) is a count of the set bits when BV_c is logically ANDed with BV_b, and the remaining terms are as in Equation 6.

The technique assumes independence of the attributes and ignores missing values. It is also the user's prerogative to determine the number of attributes to select.
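As a sketch of Equation 6, the counts can be read straight off the binary vectors (our own illustration; we take logs base 2 and skip empty cells, whose contribution is zero):

    import numpy as np

    def mutual_information(rows_f, class_vectors):
        # rows_f[j]: binary CMM row for attribute value (or bin) j, one bit
        # per record; class_vectors[c]: one bit per record in class c.
        n = len(rows_f[0])                            # number of records
        mi = 0.0
        for bv_f in rows_f:
            n_row = bv_f.sum()                        # records with this value/bin
            for bv_c in class_vectors:
                n_class = bv_c.sum()                  # records in this class
                joint = np.logical_and(bv_f, bv_c).sum()   # n(BV_f AND BV_c)
                if joint:                             # 0 log 0 contributes nothing
                    mi += (joint / n) * np.log2(joint * n / (n_row * n_class))
        return mi

    rows_f  = [np.array([1, 1, 0, 0]), np.array([0, 0, 1, 1])]   # two bins
    classes = [np.array([1, 1, 0, 0]), np.array([0, 0, 1, 1])]   # two classes
    print(mutual_information(rows_f, classes))        # -> 1.0 bit: F predicts C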

Probabilistic Las Vegas Algorithm

Liu & Setiono [LS96] introduced a probabilistic Las Vegas algorithm which uses random search and inconsistency to evaluate attribute subsets. For each equivalence set of records (where the records match according to the attributes designated in the schema), the inconsistency is defined as the number of matching records minus the largest number of matching records in any one class. The inconsistency scores are summed across all equivalence sets to produce an inconsistency score for the particular attribute selection. The technique uses random search to select attributes, as random search is less susceptible to local minima than heuristic searches such as forward search or backward search. Forward search works by greedily adding attributes to a subset of selected attributes until some termination condition is met, whereby adding new attributes to the subset does not increase the discriminatory power of the subset above a prespecified threshold value. Backward search works by greedily removing attributes from an initial set of all attributes until some termination condition is met, whereby removing an attribute from the subset decreases the discriminatory power of the subset above a prespecified threshold. A poor attribute choice at the beginning of a forward or backward search will adversely affect the final selection, whereas a random search does not rely on any initial choices. Liu and Setiono defined their algorithm as:

    1) nF_best = N;
    2) For i = 1 to MAX_TRIES
    3)     S = randomAttributeSet(seed);
    4)     nF = numberOfAttributes(S);
    5)     If (nF < nF_best)
    6)         If (InconCheck(S,D) < γ)
    7)             S_best = S; nF_best = nF;
    8) End for

where D is the data set, N the number of attributes and γ the permissible inconsistency score. Liu & Setiono recommend setting MAX_TRIES to 77 × N^5.

Figure 4: The two-CMM combination we use for Liu & Setiono's algorithm. In the first CMM (CMM1), the records index the rows (one row per record) and the attribute values index the columns. The outputs from CMM1 (the matching attribute values) feed straight into the second CMM (CMM2), where the attribute values index the rows and the records index the columns (one column per record).

Liu and Setiono's algorithm may be calculated simply using the AURA schema CMMs. We need to use two linked CMMs for the calculation, as in Figure 4. We rotate the schema CMM (CMM1) through 90°: CMM1's rows index the records and CMM1's columns index the attribute values. If we feed the outputs from CMM1 (the activated attribute values) into CMM2 then we can calculate the inconsistency scores easily. Line 6 of Liu and Setiono's algorithm listed above then becomes:

    Place all records in a queue Q;
    While !empty(Q)
        Remove R, the head record, from Q;
        Activate row R in CMM1;
        Threshold CMM1 at value 1;
        Feed the CMM1 output into CMM2;
        Threshold CMM2 at value nF;
        B = bits set in the thresholded vector;
        Max = cardinality of the largest class;
        InconCheck(S,D) += B - Max;
    End while

The queue effectively holds the unprocessed records. By activating the head record's row in CMM1 and Willshaw thresholding at value 1 (denoting all active columns, i.e., all attribute values in the record), we can determine that record's attribute values. When these values (restricted to the attributes in the current subset S) are fed into CMM2, we effectively activate all records matching them. This approach is the most efficient, as the CMMs store all attributes and their values but we only focus on those attributes under investigation during each iteration of the algorithm. An alternative approach would be to store just those attributes selected in the random subset each time we execute line 6 of Liu and Setiono's algorithm, but the CMMs would then need to be retrained many times (up to 77 × N^5 times). After thresholding CMM2 at the value nF (the number of attributes in S), we retrieve the equivalence set of matching records, where equivalence is specified by the current attribute selection S. It is then simply a matter of counting the number of matching records (the number of bits set in the thresholded output vector), calculating the number of these matching records in each class, identifying the largest class membership and subtracting the largest class membership from the number of records. The algorithm has now processed all of the records in this equivalence set, so it removes these records from the queue. If we repeat this process for each record at the head of the queue until the queue is empty, we will have processed all equivalence sets. We can then calculate InconCheck(S,D) for this attribute selection and compare it with the threshold value, as in line 6 of Liu and Setiono's algorithm. Once we have iterated through Liu and Setiono's algorithm MAX_TRIES times, we have selected an optimal attribute subset. We have not tried all combinations of all attributes, as this is intractable for a large data set.
However, we have made a sufficient approximation.
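The following Python sketch restates the procedure with plain dictionary grouping in place of the two CMMs (our own illustration; the data, γ and MAX_TRIES values are arbitrary):

    from collections import Counter, defaultdict
    import random

    def incon_check(S, data):
        # Sum, over the equivalence sets induced by attribute subset S, of
        # (set size - largest single-class count within the set).
        groups = defaultdict(Counter)
        for record, cls in data:
            groups[tuple(record[a] for a in S)][cls] += 1
        return sum(sum(h.values()) - max(h.values()) for h in groups.values())

    def lvf(data, n_attrs, max_tries, gamma):
        # Liu & Setiono's random-search selection.
        best = list(range(n_attrs))                   # nF_best = N initially
        for _ in range(max_tries):
            S = random.sample(range(n_attrs), random.randint(1, n_attrs))
            if len(S) < len(best) and incon_check(S, data) < gamma:
                best = S
        return best

    data = [(['F', 'young'], 'yes'), (['F', 'young'], 'yes'),
            (['M', 'young'], 'no'),  (['M', 'old'],   'no')]
    print(lvf(data, 2, 50, 1))    # almost always [0]: attribute 0 alone
                                  # is consistent (inconsistency score 0)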

Conclusion

In this paper we have introduced a binary neural decision table classifier. The AURA neural architecture, which underpins the classifier, has demonstrated superior training and recall speed compared to conventional indexing approaches such as hashing or inverted file lists, which may be used for a decision table: AURA trains 20 times faster than an inverted file list and 6 times faster than a hashing algorithm, and it is up to 24 times faster than the inverted file list for recall and up to 4 times faster than the hashing algorithm. We have described the implementation details of the technique. Our next step is to evaluate the AURA decision table for speed and memory usage against a conventional decision table implementation.

We have shown how two quite different attribute selection approaches may be implemented within the AURA decision table framework. We described a mutual information attribute selector that examines attributes on an individual basis and scores them according to their class discrimination ability. We also demonstrated a probabilistic Las Vegas algorithm which uses random search and inconsistency to evaluate attribute subsets. We feel the technique is flexible and easily extended to other attribute selection algorithms.

Acknowledgement

This work was supported by EPSRC Grant number GR/R550/0.

References

[AA68] Aleksander, I. & Albrow, R.C. Pattern recognition with adaptive logic elements. IEE Conference on Pattern Recognition, pp 68-74, 1968.

[ATB84] Aleksander, I., Thomas, W.V. & Bowden, P.A. WISARD: a radical step forward in image recognition. Sensor Review, pp 120-124, 1984.

[AA98] Alwis, S. & Austin, J. A novel architecture for trademark image retrieval systems. In Electronic Workshops in Computing, 1998.

[A95] Austin, J. Distributed associative memories for high speed symbolic reasoning. In IJCAI Working Notes of the Workshop on Connectionist-Symbolic Integration: From Unified to Hybrid Approaches, pp 87-93, 1995.

[A98] Austin, J. RAM-based Neural Networks. Progress in Neural Processing 9, Singapore: World Scientific Publishing Co., 1998.

[AKL95] Austin, J., Kennedy, J. & Lees, K. A neural architecture for fast rule matching. In Artificial Neural Networks and Expert Systems Conference (ANNES '95), Dunedin, New Zealand, 1995.

[BB59] Bledsoe, W.W. & Browning, I. Pattern recognition and reading by machine. In Proceedings of the Eastern Joint Computer Conference, pp 225-232, 1959.

[H01] Hodge, V. Integrating Information Retrieval & Neural Networks. PhD Thesis, Department of Computer Science, The University of York, 2001.

[HA01] Hodge, V. & Austin, J. An evaluation of standard retrieval algorithms and a binary neural approach. Neural Networks 14(3), pp 287-303, Elsevier, 2001.

[HA04] Hodge, V. & Austin, J. A high performance k-NN approach using binary neural networks. To appear, Neural Networks, Elsevier, 2004.

[K95] Kohavi, R. The power of decision tables. In Proceedings of the European Conference on Machine Learning, LNAI 914, Springer-Verlag, pp 174-189, 1995.

[LS96] Liu, H. & Setiono, R. A probabilistic approach to feature selection - a filter solution. In 13th International Conference on Machine Learning (ICML'96), pp 319-327, 1996.

[TA00] Turner, A. & Austin, J. Performance evaluation of a fast chemical structure matching method using distributed neural relaxation. In 4th International Conference on Knowledge-Based Intelligent Engineering Systems, 2000.

[W94] Wettschereck, D. A Study of Distance-Based Machine Learning Algorithms. PhD Thesis, Department of Computer Science, Oregon State University, 1994.

[ZAK99] Zhou, P., Austin, J. & Kennedy, J. A high performance k-NN classifier using a binary correlation matrix memory. In Advances in Neural Information Processing Systems, Vol. 11, 1999.