ASSOCIATION RULE MINING BASED ON IMAGE CONTENT

International Journal of Information Technology and Knowledge Management
January-June 2011, Volume 4, No. 1, pp. 143-146

Deepa S. Deshpande
Department of Computer Science & Engineering, MGM's Jawaharlal Nehru Engineering College, Aurangabad (Maharashtra), Dr. B.A.M.U. University, India
Email: deepadsd@yahoo.com

Image mining is concerned with knowledge discovery in image databases. We present a data mining approach to find association rules based on image content. The approach has four major steps: preprocessing, feature extraction, preparation of a transactional database, and association rule mining. The purpose of our experiments is to explore the feasibility of this approach. The results show that there is promise in image mining based on content. Mammography is one of the best methods for breast cancer detection, but in some cases radiologists cannot detect tumors despite their experience. A computer-aided method using association rules could assist medical staff and improve the accuracy of detection. It is well known that data mining techniques are better suited to databases larger than the one used for these preliminary tests. In particular, a computer-aided method based on association rules becomes more accurate with a larger dataset. Traditional association rule algorithms adopt an iterative method to discover frequent item sets, which requires very large calculations and a complicated transaction process. Because of this, a new association rule algorithm is proposed in this paper. Experimental results show that this new method can quickly discover frequent item sets and effectively mine potential association rules.

1. INTRODUCTION

Advances in image acquisition and storage technology have led to tremendous growth in very large and detailed image databases. A vast amount of image data, such as satellite images, medical images, and digital photographs, is generated every day. These images, if analyzed, can reveal useful information to human users. Unfortunately, it is difficult or even impossible for humans to discover the underlying knowledge and patterns when handling a large collection of images.

Image mining deals with the extraction of implicit knowledge, image data relationships, or other patterns not explicitly stored in the image databases. The images from an image database are first preprocessed to improve their quality. These images then undergo various transformations and feature extraction to generate the important features from the images. With the generated features, mining can be carried out using data mining techniques to discover significant patterns. The resulting patterns are evaluated and interpreted to obtain the final knowledge, which can be applied to applications.

In traditional association rule mining, an association rule is represented in the form LHS => RHS, with both LHS and RHS allowed to contain multiple items. The support of an association rule is defined as the percentage of transactions that contain all items (both LHS and RHS) of the rule, and the confidence of an association rule is defined as the percentage of transactions containing the LHS that also contain the RHS. An association rule holds if its support is greater than minsup and its confidence is greater than minconf, where minsup and minconf are configurable.

The problem of finding association rules is decomposed into two sub-problems: finding all item sets with at least minimum support (also called large item sets), and using these large item sets to generate the desired rules (tested for minimum confidence). Large item set generation is achieved by generating candidate item sets and keeping the ones with minimum support. This requires very large calculations and a complicated transaction process. Of the two steps, the generation of association rules is rather straightforward, while the discovery of frequent item sets dominates the processing time, so this paper focuses explicitly on the first step by proposing a new algorithm.
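
The support and confidence definitions above can be illustrated with a minimal sketch. The transactions and item names below are hypothetical and are not taken from the paper; the function names are likewise chosen only for illustration.

```python
# Toy transactions; the item names are hypothetical, not data from the paper.
transactions = [
    {"dense", "high_contrast", "low_entropy"},
    {"dense", "high_contrast"},
    {"fatty", "high_contrast", "low_entropy"},
    {"dense", "low_entropy"},
    {"dense", "high_contrast", "low_entropy"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain `itemset`."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Among transactions containing LHS, the fraction that also contain RHS."""
    lhs_count = support_count(lhs, transactions)
    if lhs_count == 0:
        return 0.0
    return support_count(set(lhs) | set(rhs), transactions) / lhs_count


def rule_holds(lhs, rhs, transactions, minsup, minconf):
    """A rule holds if support > minsup and confidence > minconf."""
    both = set(lhs) | set(rhs)
    return (support(both, transactions) > minsup
            and confidence(lhs, rhs, transactions) > minconf)


lhs, rhs = {"dense", "high_contrast"}, {"low_entropy"}
rule_support = support(lhs | rhs, transactions)       # 2 of 5 transactions
rule_confidence = confidence(lhs, rhs, transactions)  # 2 of the 3 LHS matches
```

In this toy case the rule {dense, high_contrast} => {low_entropy} is supported by two of five transactions, and two of the three transactions that contain the LHS also contain the RHS.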
2. SCHEME FOR EXPERIMENT

Step-I: Pre-processing Phase: Since real-life data is often incomplete, noisy and inconsistent, pre-processing becomes a necessity. In our case, the images were very large (typical size was 1024 x 1024) and almost 50% of each image consisted of background with a lot of noise. In addition, these images were scanned under different illumination conditions, so some images appeared too bright and some too dark. The first step toward noise removal was pruning the images with the crop operation from image processing. Cropping cuts off the unwanted portions of the image. In this way we eliminated almost all the background information and most of the noise.

The next pre-processing step was image enhancement. Image enhancement improves the quality of the image with respect to a specific application. Enhancement can be done either in the spatial domain or in the frequency domain. Here we work in the spatial domain and deal directly with the image plane itself. To diminish the effect of over-brightness or over-darkness in the images, and at the same time accentuate the image features, we applied the widely used Histogram Equalization method. The noise removal step was necessary before this enhancement because otherwise the noise would be enhanced as well.

Step-II: Feature Extraction Process: Once preprocessing has been applied, texture features are extracted using the GLCM (Gray Level Co-occurrence Matrix) technique. Statistical parameters such as standard deviation, mean, moments, smoothness, uniformity and entropy can be extracted from the preprocessed images. The GLCM of an image is computed using a displacement vector d, defined by its radius δ and orientation θ. Frequency normalization can be employed by dividing the value in each cell by the total number of possible pixel pairs. Hence the normalization factor for θ = 0° and unit displacement would be (N_x - 1) x N_y, where N_x represents the width and N_y represents the height of the image. The quantization level is an equally important consideration for determining the co-occurrence texture features. Also, neighboring co-occurrence matrix elements are highly correlated, as they are measures of similar image qualities. Each of these factors is discussed below in detail.

Choice of radius δ: The value of δ ranges from 1, 2 to 10. Applying a large displacement value to a fine texture would yield a GLCM that does not capture detailed textural information. It has been observed that overall classification accuracies with δ = 1, 2, 4, 8 are acceptable, with the best results for δ = 1 and 2. This conclusion is justified, as a pixel is more likely to be correlated with closely located pixels than with pixels located far away.

Choice of angle θ: Every pixel has eight neighboring pixels, allowing eight choices for θ, which are 0°, 45°, 90°, 135°, 180°, 225°, 270° or 315°.
However, taking into consideration the definition of GLCM, the co-occurring pairs obtained by choosing θ equal to 0° would be similar to those obtained by choosing θ equal to 180°. This concept extends to 45°, 90° and 135° as well. Hence, we have four choices for the value of θ.

The texture measures are defined below, where z_i is the i-th gray level, p(z_i) is its normalized histogram value and L is the number of gray levels:

Moment               Expression                                       Measure of texture
Mean                 m = \sum_{i=0}^{L-1} z_i p(z_i)                  A measure of average intensity
Standard deviation   \sigma = \sqrt{\mu_2(z)} = \sqrt{\sigma^2}       A measure of average contrast
Smoothness           R = 1 - 1/(1 + \sigma^2)                         Measures the relative smoothness of the intensity in a region
Third moment         \mu_3 = \sum_{i=0}^{L-1} (z_i - m)^3 p(z_i)      Measures the skewness of a histogram
Uniformity           U = \sum_{i=0}^{L-1} p^2(z_i)                    Measures the uniformity of intensity in the histogram
Entropy              e = -\sum_{i=0}^{L-1} p(z_i) \log_2 p(z_i)       A measure of randomness

Step-III: Preparation of Transactional Database: The extracted features are organized in a database in the form of transactions, which in turn constitute the input for deriving association rules. The transactions are of the form [Image ID, F1, F2, ..., Fn], where F1, ..., Fn are the n features extracted for a given image. Sample texture measures of mammogram images are given below:

Image sample   Average intensity   Average contrast   Smoothness   Third moment   Uniformity   Entropy
Mam 1          39.6760             4.8696             0.075        0.6056         0.1663       4.7401
Mam 2          47.9076             1.9005             0.0736       6.341          0.1910       4.6683
Mam 3          43.7049             46.3144            0.0319       0.4708         0.156        4.4888
Mam 4          43.334              40.3894            0.045        0.45           0.1030       5.4656
Mam 5          43.3946             40.4359            0.045        0.419          0.1036       5.4638
Mam 6          6.3899              68.4661            0.067        .1793          0.33         3.310
Mam 7          68.0774             71.3436            0.076        1.6967         0.47         3.0586
Mam 8          61.969              74.953             0.078        3.7407         0.058        4.9878
Mam 9          55.0435             81.8304            0.0934       8.8683         0.557        4.463
Mam 10         43.1755             69.3156            0.0688       6.161          0.3507       3.9049
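
The GLCM construction described in Step-II can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names, the tiny 3 x 3 image and the mapping from angle to pixel offset (x to the right, y downward) are assumptions made for the example.

```python
def offset(theta, delta=1):
    """Pixel displacement (dx, dy) for orientation theta (degrees) and radius delta.

    Assumption: x grows to the right and y grows downward (row index).
    """
    unit = {0: (1, 0), 45: (1, -1), 90: (0, -1), 135: (-1, -1),
            180: (-1, 0), 225: (-1, 1), 270: (0, 1), 315: (1, 1)}
    dx, dy = unit[theta]
    return dx * delta, dy * delta


def glcm(image, theta, delta=1, levels=256):
    """Return (normalized co-occurrence matrix, number of pixel pairs counted)."""
    dx, dy = offset(theta, delta)
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    pairs = 0
    for y in range(rows):
        for x in range(cols):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < cols and 0 <= y2 < rows:
                counts[image[y][x]][image[y2][x2]] += 1
                pairs += 1
    if pairs == 0:
        raise ValueError("displacement is larger than the image")
    # Frequency normalization: divide every cell by the number of pixel pairs.
    # For theta = 0 and delta = 1 this number is (N_x - 1) * N_y.
    return [[c / pairs for c in row] for row in counts], pairs


# Hypothetical 3 x 3 image quantized to 3 gray levels.
image = [[0, 0, 1],
         [1, 2, 2],
         [0, 1, 2]]
p0, n0 = glcm(image, theta=0, levels=3)
p180, n180 = glcm(image, theta=180, levels=3)
```

With this construction the matrix for θ = 180° is the transpose of the matrix for θ = 0°, which is why only four orientations need to be considered.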

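The histogram-based texture measures tabulated under Step-II (mean, standard deviation, smoothness, third moment, uniformity, entropy) can be computed along the following lines. This is a sketch under stated assumptions: the function name and the four-pixel sample are hypothetical, and the optional rescaling of the variance by (L - 1)^2 before computing smoothness is an assumption that the text does not state.

```python
import math


def texture_measures(pixels, levels=256, normalize_variance=False):
    """Histogram-based texture descriptors for a flat list of integer gray levels."""
    n = len(pixels)
    hist = [0] * levels
    for z in pixels:
        hist[z] += 1
    p = [count / n for count in hist]  # normalized histogram p(z_i)

    mean = sum(z * p[z] for z in range(levels))
    variance = sum((z - mean) ** 2 * p[z] for z in range(levels))
    third_moment = sum((z - mean) ** 3 * p[z] for z in range(levels))

    # Assumption: optionally rescale the variance to [0, 1] before using it
    # in the smoothness formula; by default the formula is applied as written.
    var_r = variance / (levels - 1) ** 2 if normalize_variance else variance

    return {
        "mean": mean,
        "std": math.sqrt(variance),
        "smoothness": 1 - 1 / (1 + var_r),
        "third_moment": third_moment,
        "uniformity": sum(pz ** 2 for pz in p),
        "entropy": -sum(pz * math.log2(pz) for pz in p if pz > 0),
    }


# Hypothetical region with two equally frequent gray levels out of four.
sample = texture_measures([0, 0, 2, 2], levels=4)
```

For a region split evenly between two gray levels, the uniformity is 0.5 and the entropy is exactly one bit, which matches the reading of entropy as a measure of randomness.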
Step-IV: Association Rule Mining: Discovering frequent item sets is the key process in association rule mining. To run an association rule algorithm, numerical attributes must first be discretized, i.e. continuous attribute values are divided into multiple segments. Traditional association rule algorithms adopt an iterative method to discover frequent item sets, which requires very large calculations and a complicated transaction process. Because of this, a new association rule algorithm is proposed in this paper. The new algorithm adopts a Boolean vector method to discover frequent item sets. In general, it consists of four phases:

1. Transforming the transaction database into the Boolean matrix.
2. Generating the set of frequent 1-itemsets L1.
3. Pruning the Boolean matrix.
4. Generating the set of frequent k-itemsets Lk (k > 1).

The detailed algorithm, phase by phase, is presented below:

1. Transforming the transaction database into the Boolean matrix: The mined transaction database is D, with m transactions and n items. Let T = {T1, T2, ..., Tm} be the set of transactions and I = {I1, I2, ..., In} be the set of items. We set up a Boolean matrix Am*n, which has m rows and n columns. Scanning the transaction database D, we use a binning procedure to convert each real-valued feature into a set of binary features. The 0 to 1 range of each feature is uniformly divided into k bins, and each of the k binary features records whether the feature value lies within the corresponding range.

2. Generating the set of frequent 1-itemsets L1: The Boolean matrix Am*n is scanned and the support numbers of all items are computed. The support number Ij.supth of item Ij is the number of 1s in the jth column of the Boolean matrix Am*n. If Ij.supth is smaller than the minimum support number minsupth, itemset {Ij} is not a frequent 1-itemset and the jth column is deleted from Am*n. Otherwise itemset {Ij} is a frequent 1-itemset and is added to the set of frequent 1-itemsets L1. The sum of the element values of each row is then recomputed, and the rows whose sum is smaller than 2 are deleted from the matrix.

3. Pruning the Boolean matrix: Pruning the Boolean matrix means deleting some rows and columns from it. First, the columns of the Boolean matrix are pruned according to the following proposition: let I' be the set of all items in the frequent set Lk-1, where k > 2. Compute |Lk-1(j)| for every j in I' (that is, the number of itemsets in Lk-1 that contain item j), and delete the column of the corresponding item j if |Lk-1(j)| is smaller than k - 1. Second, recompute the sum of the element values in each row of the Boolean matrix. The rows whose sum of element values is smaller than k are deleted from the matrix.

4. Generating the set of frequent k-itemsets Lk: Frequent k-itemsets are discovered only by the "and" relational calculus, which is carried out on combinations of k column vectors. If the Boolean matrix Ap*q has q columns, where 2 < q <= n and minsupth <= p <= m, then C(q, k) combinations of k-vectors are produced. The "and" relational calculus is applied to each combination of k-vectors. If the sum of the element values in the "and" result is not smaller than the minimum support number minsupth, the k-itemset corresponding to this combination of k-vectors is a frequent k-itemset and is added to the set of frequent k-itemsets Lk.

3. EXPERIMENTAL RESULTS

To appraise the performance of the new association rule mining algorithm, we conducted an experiment using the Apriori algorithm and the proposed algorithm. The algorithms were implemented in C. The experimental results are reported for different values of minimum support. They show that the performance of the new association rule mining algorithm is much better than that of the Apriori algorithm. Moreover, the smaller the minimum support, the greater the performance advantage of the new algorithm.
This is because the smaller the minimum support, the more candidate item sets the Apriori algorithm has to examine, and the longer its join and pruning processes take to execute. The new association rule mining algorithm, by contrast, does not produce candidate item sets, and it spends less time calculating k-supports because the Boolean matrix has been pruned.
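
The four phases of the Boolean-matrix algorithm can be sketched as follows. This is an illustrative reading of the description in Step-IV, not the author's C implementation: the function names and the toy matrix are hypothetical, rows are stored as Python lists rather than bit vectors, and the column-pruning rule is interpreted as "keep an item only if it occurs in at least k - 1 frequent (k - 1)-itemsets".

```python
from itertools import combinations


def to_boolean_rows(features, bins):
    """Phase 1: map each feature value in [0, 1] to `bins` binary items."""
    rows = []
    for values in features:
        row = []
        for v in values:
            idx = min(int(v * bins), bins - 1)  # v == 1.0 falls in the last bin
            row.extend(1 if b == idx else 0 for b in range(bins))
        rows.append(row)
    return rows


def frequent_itemsets(rows, min_count):
    """Phases 2-4: return {k: [(column_indices, support_count), ...]}."""
    n_items = len(rows[0])
    # Phase 2: a column is a frequent 1-itemset if it holds enough 1s.
    col_sums = [sum(r[j] for r in rows) for j in range(n_items)]
    cols = [j for j in range(n_items) if col_sums[j] >= min_count]
    result = {1: [((j,), col_sums[j]) for j in cols]}
    k = 2
    while len(cols) >= k:
        if k > 2:
            # Phase 3, columns: count how often each item occurs in L(k-1).
            seen = {j: 0 for j in cols}
            for items, _ in result[k - 1]:
                for j in items:
                    seen[j] += 1
            cols = [j for j in cols if seen[j] >= k - 1]
        # Phase 3, rows: a row with fewer than k ones cannot support a k-itemset.
        rows = [r for r in rows if sum(r[j] for j in cols) >= k]
        # Phase 4: AND each combination of k columns and sum the result.
        found = []
        for combo in combinations(cols, k):
            count = sum(1 for r in rows if all(r[j] for j in combo))
            if count >= min_count:
                found.append((combo, count))
        if not found:
            break
        result[k] = found
        k += 1
    return result


# Hypothetical Boolean matrix: 5 transactions, 4 items.
matrix = [[1, 1, 1, 0],
          [1, 1, 0, 0],
          [1, 0, 1, 0],
          [1, 1, 1, 1],
          [0, 1, 1, 0]]
itemsets = frequent_itemsets(matrix, min_count=3)
```

In this toy run the fourth column is removed in phase 2, all three pairs of the remaining items are frequent, and the only 3-item combination fails because just two rows survive the row pruning for k = 3.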

*ABBM: Algorithm Based on Boolean Matrix.

Major steps to improve the performance of the new method for association rule mining:

Adding more robust features that are capable of generalizing more effectively. This can reduce many of the inaccuracies in the detection process. We intend to look into more image detection features to get a more generalized view of the images. This would help us detect different types of association rules.

The transactional database is constructed by merging some already existing features in the original database with some new visual content features that we extracted from the images using image processing techniques. The existing features are the type of the tissue (dense, fatty and fatty glandular) and the position of the breast (left or right). The transactions are of the form [Image ID, Class Label, F1, F2, ..., Fn], where F1, ..., Fn are the n features extracted for a given image. The type of tissue is an important feature to add to the feature database, since it is well known that recognition is more difficult for some types of tissue than for others. A method that incorporates these features could increase the accuracy rate.

This project is part of an important data mining project. We show in this report that association rule mining helps reduce the load on experts, who would otherwise have to go through these images manually. We intend to build an automated system that would, to a large extent, automatically detect association rules from these images. The end system would work independently for most of the prediction process.

We need a systematic approach to determine an optimal similarity threshold for support and confidence, or at least a close one. A very high threshold means only perfect matches are accepted. Finding the right similarity threshold for each image type looks like an interesting problem. Right now the threshold is provided by the user, but it could be tuned by the algorithm itself.

4. CONCLUSION

In this paper, a new method for association rule mining is proposed.
The main features of this method are that it scans the transaction database only once, it does not produce candidate itemsets, and it adopts the Boolean vector relational calculus to discover frequent itemsets. In addition, it stores all transaction data in binary form, so it needs less memory space and can be applied to mining large databases.