Comparing High-Order Boolean Features

Brigham Young University, BYU ScholarsArchive, All Faculty Publications, 2005-07-01.

Original publication citation: Adam Drake and Dan Ventura, "Comparing High-Order Boolean Features", Proceedings of the Joint Conference on Information Sciences, pp. 428-431, July 2005.

Adam Drake (acd2@cs.byu.edu) and Dan Ventura (ventura@cs.byu.edu)
Computer Science Department, Brigham Young University

Abstract

Many learning algorithms attempt, either explicitly or implicitly, to discover useful high-order features. When considering all possible functions that could be encountered, no particular type of high-order feature should be more useful than any other. However, this paper presents arguments and empirical results that suggest that for the learning problems typically encountered in practice, some high-order features may be more useful than others.

Keywords: high-order correlations, feature selection, learning theory.

1. Introduction

Searching for useful high-order relationships (relationships between two or more of the original input features of a learning problem) is a fundamental task of many learning algorithms. Typically, the search for useful high-order features and the types of high-order features learned are implicit in the algorithms. High-order features allow algorithms to more accurately and/or more efficiently model phenomena for which the original, first-order features may be insufficient.

There are many high-order features that could be considered by a learning algorithm, typically far more than can be considered in a feasible amount of time. Therefore, algorithms must limit themselves to searching for one or a few types of high-order relationships. This paper explores the question of whether certain types of high-order relationships are more likely than others to be found in the data of real-world learning problems, and by extension, whether certain types of relationships are more useful to examine than others.

1.1. Motivation

Learning algorithms based on the discrete Fourier transform of Boolean functions (functions of the form $f: \{0,1\}^n \to \{1,-1\}$) have been used with great success in the field of computational learning theory to prove various learnability results [1][2][3]. However, the potential benefit of applying Fourier-based techniques to real-world problems is not well studied.
One real-world application has been presented [4], but it requires the use of a membership oracle, limiting its applicability. The question of whether Fourier-based algorithms can effectively solve more general real-world problems, for which oracle queries may not be possible, remains open.

The study of practical Fourier-based learning leads to a question about the utility of Fourier representations. Fourier-based algorithms represent functions as a linear combination of Fourier basis functions. Let $f$ be an arbitrary function of $n$ Boolean inputs, $x_1$ through $x_n$. The Fourier transform of $f$ gives the coefficients, $\hat{f}(\alpha)$, that allow $f$ to be represented as a linear combination of the basis functions $\chi_\alpha$:

$$f(x_1, \ldots, x_n) = \sum_{\alpha \in \{0,1\}^n} \hat{f}(\alpha)\, \chi_\alpha(x_1, \ldots, x_n)$$

The basis functions are defined as follows:

$$\chi_\alpha(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} \alpha_i x_i \text{ is even} \\ -1 & \text{if } \sum_{i=1}^{n} \alpha_i x_i \text{ is odd} \end{cases}$$

and the Fourier coefficients are computed as shown here:

$$\hat{f}(\alpha) = \frac{1}{2^n} \sum_{x \in \{0,1\}^n} f(x)\, \chi_\alpha(x)$$

Note that the Fourier transform used here, also known as a Walsh transform, is a simplified Fourier transform for functions of Boolean inputs. The Fourier basis functions are parity functions, each computing the parity (or the logical XOR) of all inputs $x_i$ such that $\alpha_i = 1$. Thus, the high-order features considered by Fourier-based learning algorithms are high-order XOR functions.

The Fourier basis is capable of representing any Boolean function; however, the fact that the representation is based on XOR relationships suggests that a Fourier-based approach would be especially beneficial when useful high-order XOR relationships exist in the data. Similarly, it would seem less beneficial when such correlations do not exist. This observation raises the following question: Are high-order XOR relationships likely to be found in the data of real-world problems? And more generally, are some high-order relationships more likely to be found than others?

This paper presents an argument that because feature selection is done by humans, and is therefore biased towards human reasoning, the high-order relationships that exist in real-world data will tend to be biased towards relationships that are indicative of the way humans correlate data. This hypothesis is tested by comparing the prevalence of high-order XOR relationships, which are relatively non-intuitive, with that of the more intuitive high-order AND and OR relationships. Tests on several real-world problems suggest that AND and OR relationships are more likely to be found in the data of real-world problems.

2. Kth-Order Boolean Features

The high-order features considered in this paper are patterned after the basis functions of the Fourier transform. Each high-order feature is a function over a subset of the original Boolean input features. The three function types considered here are conjunction (AND), disjunction (OR), and parity (XOR) functions.

Let $n$ be the number of input features of a particular problem, let $x_i$ be the value of the $i$th feature, and let $S \subseteq \{1, \ldots, n\}$ be the subset of features over which a particular Boolean function is defined. Then the AND, OR, and XOR functions can be defined as follows (note that the XOR functions defined below are functionally equivalent to the Fourier basis functions described previously, but are now defined in terms of the subset $S$):

$$\mathrm{AND}_S(x_1, \ldots, x_n) = \begin{cases} -1 & \text{if } \bigwedge_{i \in S} x_i = 1 \\ 1 & \text{if } \bigwedge_{i \in S} x_i = 0 \end{cases}$$

$$\mathrm{OR}_S(x_1, \ldots, x_n) = \begin{cases} -1 & \text{if } \bigvee_{i \in S} x_i = 1 \\ 1 & \text{if } \bigvee_{i \in S} x_i = 0 \end{cases}$$

$$\mathrm{XOR}_S(x_1, \ldots, x_n) = \begin{cases} -1 & \text{if } \sum_{i \in S} x_i \text{ is odd} \\ 1 & \text{if } \sum_{i \in S} x_i \text{ is even} \end{cases}$$

The AND, OR, and XOR functions compute the logical AND, OR, and XOR, respectively, of the input features specified in $S$. For example, the function $\mathrm{AND}_{\{1,3,4\}}$ computes the logical AND of the first, third, and fourth features. It is equivalent to the expression $x_1 \wedge x_3 \wedge x_4$. The order of a function, $k$, is the number of elements in $S$. Thus, $\mathrm{AND}_{\{1,3,4\}}$ is a third-order feature. In this paper, high-order features are defined as those for which $k \ge 2$.
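The definitions above can be made concrete with a short Python sketch. It is an illustration only, not the authors' code: the function names are hypothetical, subsets use 0-based indices (so `(0, 2, 3)` stands for the features 1, 3, and 4), and it assumes the convention that an output of -1 encodes logical true, which makes each XOR feature coincide exactly with the corresponding Fourier basis function.

```python
from itertools import product

def and_feature(S, x):
    """AND over the inputs indexed by S: -1 if all are 1, else +1 (assumed convention)."""
    return -1 if all(x[i] for i in S) else 1

def or_feature(S, x):
    """OR over the inputs indexed by S: -1 if any is 1, else +1 (assumed convention)."""
    return -1 if any(x[i] for i in S) else 1

def xor_feature(S, x):
    """Parity over the inputs indexed by S: -1 if an odd number are 1, else +1."""
    return -1 if sum(x[i] for i in S) % 2 == 1 else 1

def chi(alpha, x):
    """Fourier (Walsh) basis function: +1 if sum(alpha_i * x_i) is even, -1 if odd."""
    return -1 if sum(a * b for a, b in zip(alpha, x)) % 2 == 1 else 1

def fourier_coefficient(f, alpha, n):
    """Brute-force coefficient: average of f(x) * chi_alpha(x) over all 2^n inputs."""
    total = sum(f(x) * chi(alpha, x) for x in product((0, 1), repeat=n))
    return total / 2 ** n

# A third-order feature over the first, third, and fourth inputs (0-based indices).
S = (0, 2, 3)
alpha = (1, 0, 1, 1)  # indicator vector of S

# XOR_S agrees with chi_alpha on every input, so its coefficient at alpha is 1.
print(fourier_coefficient(lambda x: xor_feature(S, x), alpha, 4))  # 1.0
```

The brute-force sum visits all $2^n$ inputs, so this sketch is only practical for small $n$.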
Given a data set with $n$ input features, there are $2^n$ possible subsets, and therefore $2^n$ possible functions, for each type of relationship. One of these subsets is the empty set, which for each relationship type gives a constant function. In addition, there are $n$ subsets containing only one feature. These $n$ first-order functions are also equivalent for each type of relationship. The remaining $2^n - n - 1$ functions are unique for each type of relationship, and compute all possible second- and higher-order AND, OR, and XOR relationships.

When computing the Fourier transform of a function $f$, a negative coefficient indicates that $f$ is negatively correlated with some XOR function, $\mathrm{XOR}_S$, and therefore positively correlated with $\mathrm{XNOR}_S$. Similarly, if $f$ is negatively correlated with an AND or OR function, it is positively correlated with the corresponding NAND or NOR function, respectively. However, for simplicity, the inversion is ignored in the following discussion, and a strong correlation could refer to either a strong positive or a strong negative correlation. Thus, for example, an AND correlation could refer to either an AND or a NAND correlation.

(The grouping of AND with NAND and OR with NOR is natural when patterning the AND and OR functions after the Fourier basis functions. However, by De Morgan's law, $\mathrm{NAND}_S$ is equivalent to $\mathrm{OR}_{\bar{S}}$, and $\mathrm{NOR}_S$ is equivalent to $\mathrm{AND}_{\bar{S}}$, where $\bar{S}$ signifies that the inputs in $S$ are inverted. Consequently, AND could be logically grouped with NOR, and OR with NAND. However, the choice of grouping does not significantly alter the results presented in this paper, nor does it affect the conclusions.)

3. An Argument for Intuitive High-Order Features

A "no free lunch" [5] argument would suggest that no high-order relationships will be better on average than any others. When considering two possible high-order relationships, there will be just as many functions for which the first is better as there will be for the second. However, there are reasons why some correlations might be more likely to be useful in practice. Data sets encountered in the real world are not randomly generated.
In general, data sets are gathered by people who select the features that they think will be most useful in analyzing a particular problem. Because people are selecting the features, the data sets of real-world problems will be biased towards whatever reasoning humans use to select features. Thus, the question of whether certain high-order relationships are more likely to appear in data than others can be recast as a question of whether the features selected by humans are more likely to exhibit some high-order relationships than others.

A consideration of these issues leads to the following reasoning. It is very natural for people to think in terms of conjunctions (AND) and disjunctions (OR). These logical operators are very intuitive. Because people tend to think in terms of AND and OR, we hypothesize that humans are more likely to pick features that combine well in useful high-order AND and OR relationships than in other less intuitive relationships. The XOR relationship, although fairly intuitive when involving only two variables, is less intuitive when more variables are involved. It seems less likely that people will select features that exhibit useful high-order XOR relationships.

The generalization of these ideas would be that high-order relationships that are intuitive and representative of the way people think are more likely to be useful features in human-based data sets. Although significant testing would be required to verify this claim, the results of this paper provide some early supporting evidence.

A final consideration is not only whether certain high-order features exist, but whether they are useful. Even if it is true that intuitive features are more useful, it may still be possible to find other high-order correlations. Although these coincidental relationships may exist in the data, because they do not reflect the bias introduced by human feature selection they may not generalize as well to unseen data.

4. Comparing High-Order Features

To test the prevalence of different high-order correlations, several real-world data sets were taken from the UCI machine learning repository [6]. As this work was motivated by a study of functions with Boolean inputs, all data sets considered either contain only Boolean-valued features or have had their non-Boolean features encoded as Boolean features. For continuously-valued inputs, a reasonable threshold was chosen, and values above the threshold were assigned a 1, while values below the threshold were assigned a 0.
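The thresholding of continuous inputs described above can be sketched in a few lines of Python. This is a minimal illustration with made-up values: the function name `binarize` and the sample threshold are hypothetical, and because the text does not say how a value exactly equal to the threshold is handled, the sketch assumes it maps to 0.

```python
def binarize(values, threshold):
    """Map each continuous value to 1 if it is above the threshold, else 0.

    Assumption: a value exactly equal to the threshold is assigned 0.
    """
    return [1 if v > threshold else 0 for v in values]

# Hypothetical continuous feature (e.g. an age column) with a threshold of 40.
print(binarize([23, 45, 38, 61], 40))  # [0, 1, 0, 1]
```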
Nominally-valued inputs were encoded into binary using the minimum number of bits required to account for each possible value. There was some concern that this choice of encoding might affect the types of high-order correlations found, but our testing suggested that it made little or no difference. If anything, encoding the original input variables would seem to increase the likelihood of non-intuitive correlations being found.

Each data set was examined in terms of AND, OR, and XOR relationships. For each type of relationship, the most highly correlated feature was determined by checking how well all $2^n$ functions (and their inverses) correctly classified examples in the data set. In addition, the accuracy of the most highly correlated first-order feature was computed to give some idea of the usefulness of the high-order features.

Table 1 shows the results of this experiment. The classification accuracy of the most highly correlated function of each type, along with the accuracy of the most highly correlated first-order feature, is shown for each data set. The best accuracy for each data set is marked with an asterisk.

Data Set   1st     AND     OR      XOR
Adult      80.3    82.1*   81.6    81.6
Chess      68.3    67.7    81.1*   75.3
German     71.7    71.7    73.1*   71.7
Heart      75.6    76.3    77.0*   76.3
Pima       73.6    75.4*   71.1    65.9
SPECT      66.3    79.4    87.6*   70.8
Voting     96.3*   95.9    90.1    88.1
WBC        87.3    87.1    96.0*   92.7
WBC2       76.8    80.3*   78.8    77.3
WBC3       91.4    94.4*   89.8    91.2

Table 1: Best classification accuracy (%) of any first-order or higher-order AND, OR, or XOR feature. For each data set, the best feature is marked with an asterisk.

For each of the ten data sets tested, the most highly correlated high-order function was always either an AND or an OR function, supporting the idea that AND and OR relationships are more likely to be found in real-world data. No XOR function was ever the most highly correlated high-order feature. On the other hand, the best XOR function was sometimes not far behind in accuracy, and it was not always the worst of the three.

For one of the data sets, the Voting set, none of the high-order features were better than the best first-order feature.
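The exhaustive search described in this section can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the names `feature_output` and `best_feature` are hypothetical, features and labels are represented as 0/1 values, and the toy data set is invented for the example. Scoring a feature "or its inverse" is done by taking the larger of the number of matches and the number of mismatches.

```python
from itertools import combinations, product

def feature_output(kind, S, x):
    """Return 1 if the AND/OR/XOR of the inputs indexed by S is true, else 0."""
    bits = [x[i] for i in S]
    if kind == "AND":
        return int(all(bits))
    if kind == "OR":
        return int(any(bits))
    if kind == "XOR":
        return sum(bits) % 2
    raise ValueError(f"unknown feature type: {kind}")

def best_feature(kind, X, y, min_order=2):
    """Try every subset of at least min_order inputs; return (accuracy, subset)."""
    n = len(X[0])
    best = (0.0, ())
    for k in range(min_order, n + 1):
        for S in combinations(range(n), k):
            hits = sum(feature_output(kind, S, x) == label for x, label in zip(X, y))
            # The inverse of a feature is correct exactly where the feature is wrong.
            accuracy = max(hits, len(y) - hits) / len(y)
            if accuracy > best[0]:
                best = (accuracy, S)
    return best

# Toy data set (invented): all 3-bit inputs, labeled by x0 AND x1.
X = list(product((0, 1), repeat=3))
y = [x[0] & x[1] for x in X]

print(best_feature("AND", X, y))  # (1.0, (0, 1))
print(best_feature("XOR", X, y))  # (0.75, (0, 1))
```

Because the number of subsets grows as $2^n$, a direct search like this one is only feasible for a modest number of input features.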
Of the three high-order feature types, however, the best feature for the Voting set was a high-order AND feature.

Table 2 shows the orders of the most highly correlated high-order AND, OR, and XOR features. In several cases, multiple features of a single type were equally well correlated. In these cases, a range of orders is reported, indicating that the orders of the best features fell within that range. The number of features in each data set is shown in parentheses.

Data Set      AND     OR      XOR
Adult (34)    3       2       2
Chess (37)    2       5-6     6
German (24)   5       4-7     2
Heart (6)     2       2-3     2
Pima (8)      2       2       2
SPECT (22)    5-22    8-2     2
Voting (6)    2       2       3
WBC (36)      2       5-9     2
WBC2 (33)     4-20    2-6     3-4
WBC3 (30)     4       2       2

Table 2: The orders of the most highly correlated AND, OR, or XOR features for each data set. The number of input features in each data set is shown in parentheses.

5. Conclusions and Future Work

Given that we had no prior reason to believe that a particular correlation type would be more or less prevalent in the data sets tested, the fact that no XOR relationship was ever the best individual high-order feature is significant. These results seem to suggest that algorithms that implicitly or explicitly search for high-order AND and OR correlations will tend to be more successful than those based on XOR.

An interesting observation is that although the best high-order features were always either AND or OR relationships, neither the AND nor the OR features were universally better. For example, the Chess data set exhibited strong high-order OR correlations, but only weak AND correlations. On the other hand, the Voting data set contained significantly stronger AND correlations than OR correlations. This suggests that an algorithm capable of effectively learning both types of correlations should be more successful over a broader range of learning problems.

A potentially interesting implication of this research regards the importance of being able to learn XOR relationships. For example, the perceptron learning algorithm has received criticism for its inability to learn XOR relationships [7]. However, the results presented here suggest that this may not be a significant weakness when working with typical real-world problems.

Another interesting observation regards the orders of the best high-order features shown in Table 2. In general, the orders of the most useful relationships tended to be fairly low relative to the total number of input features. This observation also seems to support the idea that data sets are biased towards human reasoning, as humans are not likely to consider very high-order relationships. (The unusually high-order correlations found in the SPECT and WBC2 data sets are primarily a consequence of the high ratios of positive to negative examples found in those sets.)
It is important to note that the results of this paper test individual high-order features, and not the learning potential of combinations of high-order features. An interesting test for future work will be to determine if the same patterns exist when combinations of high-order features are used. For example, do combinations of either AND or OR features always outperform combinations of XOR features? Another interesting question is whether an algorithm that can use each type of high-order feature benefits from the use of high-order XOR features or if they tend to not be useful even in combination with other features.

It may be true that high-order features that are more intuitive to humans are more likely to be useful in solving real-world learning problems. Although the research presented here is supportive, more research will be required to validate this claim. Future work would include testing over a broader range of learning problems and testing more types of correlations. For example, this research tested correlations of Boolean-valued attributes on classification problems, but there are other correlations and problems to consider. For example, which high-order correlations are most useful when dealing with real-valued features or when performing regression?

Another important area of future work will be in comparing the generalization capabilities of each type of high-order feature. Although many high-order relationships may exist in data, some may not generalize as well as others.

6. References

[1] N. Linial, Y. Mansour, and N. Nisan, Constant Depth Circuits, Fourier Transform, and Learnability, Journal of the Association for Computing Machinery, vol. 40, n. 3, pp. 607-620, 1993.
[2] E. Kushilevitz and Y. Mansour, Learning Decision Trees using the Fourier Spectrum, SIAM Journal on Computing, vol. 22, n. 6, pp. 1331-1348, 1993.
[3] J. Jackson, An Efficient Membership-Query Algorithm for Learning DNF with Respect to the Uniform Distribution, Journal of Computer and System Sciences, vol. 55, pp. 414-440, 1997.
[4] Y. Mansour and S. Sahar, Implementation Issues in the Fourier Transform Algorithm, Machine Learning, vol. 4, pp. 5-33, 2000.
[5] D. Wolpert and W. Macready, No Free Lunch Theorems for Optimization, IEEE Transactions on Evolutionary Computation, vol. 1, n. 1, pp. 67-82, 1997.
[6] C. L. Blake and C. J. Merz, UCI Repository of Machine Learning Databases [http://www.ics.uci.edu/~mlearn/mlrepository.html], Irvine, CA: University of California, Department of Information and Computer Science, 1998.
[7] M. Minsky and S. Papert, Perceptrons, MIT Press: Cambridge, MA, 1969.