Benchmarking of Update Learning Strategies on Digit Classifier Systems

D. Barbuzzi* and G. Pirlo


2012 International Conference on Frontiers in Handwriting Recognition

Benchmarking of Update Learning Strategies on Digit Classifier Systems

D. Barbuzzi, D. Impedovo, G. Pirlo
Dipartimento di Informatica, Università degli Studi di Bari "Aldo Moro", Bari, Italy
{d.impedovo, pirlo}@di.uniba.it

Abstract — Three different strategies to re-train classifiers when new labeled data become available are presented in a multi-expert scenario. The first method uses the entire new dataset. In the second, each single classifier selects new samples starting from those on which it performs a misclassification. Finally, by inspecting the multi-expert system behavior, a sample misclassified by an expert is used to update that classifier only if it also produces a misclassification by the ensemble of classifiers. This paper provides a comparison of the three approaches under different conditions on two state-of-the-art classifiers (SVM and Naive Bayes), taking into account four different combination techniques. Experiments have been performed considering the CEDAR (handwritten digit) database. It is shown how results depend on the amount of new training samples, as well as on the specific combination decision schema and on the classifiers in the ensemble.

Keywords — feedback learning, multi-expert, training sample selection

I. INTRODUCTION

A pattern recognition system consists of two main processes: enrollment (or training) and matching (or recognition). In the first phase, samples of specific classes are acquired and processed, and features are extracted. These features are labeled with the ground truth and used to generate the model representing the class. Matching mode performs the recognition of the (unknown) input pattern by comparing it to the enrolled templates. Depending on the specific scenario, a single classifier is not always able to attain acceptable or high performance, so that in many applications [1, 2, 4, 6] classifier combination is a suitable solution. On the other hand, as the specific scenario evolves, new labeled data can become available.
In these situations, the typical issue is the way in which the new data should be used. Recently, it has been shown that, in cases where a Multi-Expert (ME) system is adopted, the collective behavior of the classifiers can be used to select the most profitable samples for updating the knowledge base of the classifiers [15, 18]. More specifically, the samples to be used for re-training are selected by considering those, misclassified by a specific expert of the set, which produce a misclassification at the ME level. This approach moves from the consideration that the collective behavior of a set of classifiers can convey more information than that of each classifier of the set, and this information can be exploited for classification aims [4, 5, 17]. This paper reports a comparison of this approach with the situation in which the entire new dataset is used for learning, as well as with the case in which specific samples are selected by the individual classifier. Tests have been performed on the task of handwritten digit recognition, on the CEDAR database, considering different types of features, two state-of-the-art classifiers (a Support Vector Machine and a Naive Bayes classifier), and four different combination techniques (Majority Vote, Weighted Majority Vote, Sum Rule and Product Rule). It is shown how results depend on the specific classifier, on the combination decision schema, as well as on the training/test data distribution. The paper is organized as follows: Section II presents the background of re-training and the different strategies. Experimental setup and results are, respectively, in Sections III and IV. Section V reports the conclusions of the work.

II. LEARNING UPDATING STRATEGIES

A. Background

Template update is an interesting and open issue both in the case of supervised and of semi-supervised learning. Let us consider the scenario in which new labeled data become available. The question is: how should the new data be used?
The simplest way to update the knowledge base of the classifier is probably to use the entire new set to retrain the system given the initial training condition or, depending on the classifier, the set of new+old data. On the other hand, many interesting algorithms can be adopted in order to select specific samples. Among others, AdaBoost [7, 9] is able to improve the performance of a classifier on a given data set by focusing the learner's attention on difficult instances. Even if this approach is very powerful, it works well in the case of weak classifiers; moreover, not all learning algorithms accept weights for the incoming samples.

978-0-7695-4774-9/12 $26.00 © 2012 IEEE — DOI 10.1109/ICFHR.2012.186

Another interesting approach is bagging: a

number of weak classifiers trained on different subsets (random instances) of the entire dataset are combined by means of simple majority voting [8]. Unfortunately, bagging and AdaBoost are designed for, and work well with, weak classifiers and, on the other hand, if applied to a ME system they do not take into account the behavior of the different classifiers in the ensemble; in fact, they are applied considering a single classifier in a stand-alone modality. From this point of view, the first intent of this work is to deal with state-of-the-art performing classifiers (not weak ones) and to determine strategies which can be applied whatever classifier is considered. Considering the case of new unlabeled data, two well-known approaches used to select specific samples are self-training and co-training. These are semi-supervised learning paradigms (the updating process is performed using both the initial set of labeled templates and a set of unlabeled data acquired during on-line operation). Self-training (or self-update) [13] is based on the concept that a classifier is retrained on its own most confident output produced from unlabeled data. The classifier is first trained on the set of labeled data and, subsequently, several self-training iterations are performed to incorporate the unlabeled data until some stop rule is satisfied. Co-training (or co-update) [12] is the situation under which two classifiers improve each other. More specifically, the first expert is updated with elements confidently recognized by the other one and vice-versa: the assumption is that the classifiers involved in the co-training process have a conditionally independent view of the data. Co-training and self-training have a very strong role, in the current state of the art, in the biometric template-updating process [14]; moreover, they have recently been applied even in the field of handwriting recognition [10, 11].
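To make the self-update scheme concrete, the loop can be sketched as follows. This is a minimal illustration, not the implementation of the papers cited: the toy `NearestCentroid` classifier, the confidence threshold and the iteration cap are assumptions made only so the loop is runnable.

```python
import numpy as np

class NearestCentroid:
    """Toy classifier with a fit/predict_proba interface (hypothetical,
    only used here to make the self-training loop runnable)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, X):
        # Confidence decays with the distance from each class centroid.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        p = np.exp(-d)
        return p / p.sum(axis=1, keepdims=True)

def self_train(clf, X_lab, y_lab, X_unlab, conf_thresh=0.9, max_iter=3):
    """Self-update: retrain the classifier on its own most confident
    outputs over the unlabeled pool, until the stop rule triggers."""
    clf.fit(X_lab, y_lab)
    pool = X_unlab
    for _ in range(max_iter):
        if len(pool) == 0:
            break
        proba = clf.predict_proba(pool)
        keep = proba.max(axis=1) >= conf_thresh    # most confident outputs only
        if not keep.any():
            break                                  # stop rule: nothing confident
        X_lab = np.vstack([X_lab, pool[keep]])
        y_lab = np.concatenate([y_lab, clf.classes_[proba[keep].argmax(axis=1)]])
        pool = pool[~keep]
        clf.fit(X_lab, y_lab)                      # incorporate pseudo-labels
    return clf
```

Co-training follows the same skeleton, except that the pseudo-labels fed to one classifier come from the confident outputs of the other.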
The main result observed for self-training on the task of handwritten word recognition [10] is that the challenge of successful self-training lies in finding the optimal trade-off between data quality and data quantity for retraining. In particular, if the re-training is done with only those elements whose correctness can nearly be guaranteed, the retraining set does not change significantly and the classifier may remain nearly the same; in other cases it could discard genuine samples whose distribution is far from the one already embedded in the knowledge base, thus resulting in a performance degradation. Enlarging the retraining set, on the other hand, is only possible at the cost of increasing noise, i.e., adding mislabeled words to the training set. In this scenario, the co-training approach [11] appears much more interesting: in fact, it does not suffer the limitations of the self-update process, and the performance improvements are more evident than those observed in the self-training case, even if the confidence threshold still plays a crucial role. Co-training can easily be extended from two to n classifiers, but the basic observation is that, once more, even if an ensemble of classifiers is available, there is no analysis and use of their common classification behavior given the input to be recognized and the specific combination schema. From these observations, specific strategies are depicted in the next paragraph, taking into account the task of supervised learning.

B. Learning Strategies

Let:
- C_j, for j = 1, 2, ..., M, be the set of pattern classes;
- P = {x_k | k = 1, 2, ..., K} be a set of patterns to be fed to the Multi-Expert (ME) system. P is partitioned into S subsets P_1, P_2, ..., P_s, ..., P_S, with P_s = {x_k ∈ P | k ∈ [N_s(s−1)+1, N_s·s]} and N_s = K/S (N_s integer), which are fed one after the other to the multi-expert system.
In particular, P_1 is used for learning only, whereas P_2, P_3, ..., P_s, ..., P_S are used both for classification and for learning (when necessary);
- y_t ∈ Ω, the label of the pattern x_t, where Ω = {C_1, C_2, ..., C_M};
- A_i, the i-th classifier, for i = 1, 2, ..., N;
- F_i(k) = (F_i,1(k), F_i,2(k), ..., F_i,r(k), ..., F_i,R(k)), the numeral feature vector used by A_i to represent the pattern x_k ∈ P (for the sake of simplicity it is assumed here that each classifier uses R numeral features);
- KB_i(k), the knowledge base of A_i after the processing of P_k; in particular, KB_i(k) = (KB_i,1(k), KB_i,2(k), ..., KB_i,M(k));
- E, the multi-expert system which combines the hypotheses of the individual classifiers in order to obtain the final one.

Initially, at the first stage (s = 1), the classifier A_i is trained using the patterns x_k ∈ P* = P_1. Therefore, the knowledge base KB_i(s) of A_i is initially defined as:

KB_i(s) = (KB_i,1(s), KB_i,2(s), ..., KB_i,j(s), ..., KB_i,M(s))   (1a)

where, for j = 1, 2, ..., M:

KB_i,j(s) = (F_j,1(s), F_j,2(s), ..., F_j,r(s), ..., F_j,R(s))   (1b)

F_j,r(s) being the set of the r-th features of the i-th classifier for the patterns of class C_j that belong to P*. Successively, the subsets P_2, P_3, ..., P_s, ..., P_{S−1} are provided one after the other to the multi-classifier system, both for classification and for learning. P_S is only used as the testing set, in order to avoid biased or too optimistic results. When considering new labeled data (samples of P_2,

P_3, ..., P_s, ..., P_{S−1}), two different strategies can be followed in order to select patterns from P_s to train A_i:

1. ∀ x_t ∈ P_s: update_KB_i, i.e., all the available new patterns belonging to P_s are used to update the knowledge base of each individual classifier;

2. ∀ x_t ∈ P_s | A_i(x_t) ≠ y_t: update_KB_i, i.e., the individual classifier is updated considering only the samples belonging to P_s which have been misclassified by the classifier itself.

The second approach is derived from AdaBoost and bagging and, at the same time, can be considered a supervised version of self-training. In order to inspect and take advantage of the common behavior of the ensemble of classifiers, the third strategy proposed in this work (and compared with the previous two) is the following:

3. ∀ x_t ∈ P_s | (A_i(x_t) ≠ y_t) ∧ (E(x_t) ≠ y_t): update_KB_i, i.e., the individual classifier is updated considering all its misclassified samples if and only if they produce (or contribute to) a misclassification of the ME.

For the sake of simplicity, let us consider a ME adopting three base classifiers combined by means of Majority Vote. In the case depicted in figure 1(a), in which two classifiers correctly recognize the sample x, both the first and the second approach would update the knowledge base of A_1 with x, thus increasing the similarity index [16] of A_1, A_2 and A_3, with the only advantage of increasing the performance of A_1 on the training set and on patterns similar to x. On the other hand, the ME system would exhibit its previous performance, without any improvement. The third approach would not update the knowledge base of A_1. In the case depicted in figure 1(b), updating the knowledge base of A_1 and/or A_3 would produce an improvement of the ME performance. The third strategy takes into account the performance of the individual classifier as well as the performance at the ME level. It is able to select not only the samples to be used for the updating process, but also the classifiers to which those samples must be fed.
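Assuming each expert exposes its crisp decisions, the three selection rules above can be sketched as a single mask-building function (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def select_for_update(strategy, y_true, y_pred_i, y_pred_me):
    """Boolean mask over the new subset P_s selecting the samples used to
    update the knowledge base of classifier A_i.
      strategy 1 -- all the new samples;
      strategy 2 -- samples misclassified by A_i itself;
      strategy 3 -- samples misclassified by A_i AND by the ensemble E."""
    y_true, y_pred_i, y_pred_me = map(np.asarray, (y_true, y_pred_i, y_pred_me))
    if strategy == 1:
        return np.ones(len(y_true), dtype=bool)
    if strategy == 2:
        return y_pred_i != y_true
    if strategy == 3:
        return (y_pred_i != y_true) & (y_pred_me != y_true)
    raise ValueError("strategy must be 1, 2 or 3")
```

In a figure 1(a)-like situation (A_1 wrong but the majority-vote ensemble right), strategy 3 selects nothing for A_1, while strategy 2 would still update it.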
Of course, it is evident that many new samples will not be fed to a specific classifier and, in general, we could expect to observe a performance degradation compared with the other strategies. This can happen depending on the performance of the classifiers as well as on the ratio of new to old data. However, we have to consider and remark that we are dealing with already trained and working classifiers: initial performance is expected to be high (not weak). This leads to two considerations:
- given a specific classifier, the difference between the confidence value in the case of a misclassification and in the case of a correct one could be imputed to the fact that the specific classifier (features, matching technique, etc.) is unable to represent the sample, and no improvement would be obtained by introducing the new sample into the knowledge base. This is particularly true under the assumption that strong (not weak) classifiers are used;
- if each classifier in the ensemble were able to recognize exactly the same set of patterns, their combination would be useless. From this point of view, we are interested in not increasing the similarity index (SI) among the classifiers.

Figure 1. Examples of updating requests: cases (a) and (b) show the outputs A_1(x), A_2(x), A_3(x) of the base classifiers and the ensemble decision E(x).

III. EXPERIMENTAL SETUP

A. Data Set

A multi-expert system for handwritten digit recognition has been considered: the CEDAR database [9] P = {x_k | k = 1, 2, ..., 20351} (classes from '0' to '9') has been used. The DB has been initially partitioned into 6 subsets: P_1 = {x_1, x_2, x_3, ..., x_12750}, P_2 = {x_12751, ..., x_14119}, P_3 = {x_14120, ..., x_15488}, P_4 = {x_15489, ..., x_16857}, P_5 = {x_16858, ..., x_18223}, P_6 = {x_18224, ..., x_20351}. In particular, P_1 ∪ P_2 ∪ P_3 ∪ P_4 ∪ P_5 represents the set usually adopted for training when considering the CEDAR DB [6]. P_6 is the testing dataset.
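The fixed partition above can be checked with a few lines of arithmetic; `BOUNDS` simply transcribes the index ranges listed above, and the helper names are illustrative:

```python
# 1-based, inclusive index ranges of the six CEDAR subsets listed above.
BOUNDS = [(1, 12750), (12751, 14119), (14120, 15488),
          (15489, 16857), (16858, 18223), (18224, 20351)]

def subset_sizes(bounds=BOUNDS):
    """Cardinality |P_s| of each subset."""
    return [hi - lo + 1 for lo, hi in bounds]

def new_to_initial_ratio(bounds=BOUNDS):
    """Size of the feedback pool P_2..P_5 relative to the initial
    training set P_1, in percent."""
    sizes = subset_sizes(bounds)
    return 100.0 * sum(sizes[1:5]) / sizes[0]
```

The feedback pool P_2 ∪ ... ∪ P_5 comes out at about 42.9% of |P_1|, close to the 42.86% share quoted in Section IV.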
Each digit is zoned into 16 uniform (regular) regions [5]; successively, for each region, the following sets of features have been considered [6]:

F_1: feature set 1: hole, up cavity, down cavity, left cavity, right cavity, up end point, down end point, left end point, right end point, crossing points, up extrema points, down extrema points, left extrema points, right extrema points;

F_2: feature set 2 (contour profiles): max/min peaks, max/min profiles, max/min width, max/min height;

F_3: feature set 3 (intersections with lines): 5 horizontal lines, 5 vertical lines, 5 slant −45° lines and 5 slant +45° lines.

B. Classifiers

Tests have been performed taking into account Support Vector Machines (SVMs) and a Naive Bayes (NB) classifier. The SVM is a binary (two-class) classifier; multi-class recognition is performed here by combining multiple binary SVMs. The kernel function adopted is the RBF; performance is influenced by the standard deviation value (σ) and by the tolerance of classification errors in learning. In this work, σ² = 0.3·var (var being the variance of the training data set) [6]. The NB classifier fits, in the training phase, a multivariate normal density to each class C_j, considering a diagonal covariance matrix. Given an input to be classified, the Maximum A Posteriori (MAP) decision rule is adopted to select the most probable hypothesis among the different classes.

C. Combination techniques

Many approaches have been considered so far for classifier combination. These approaches differ in the type of output they combine, the system topology and the degree of a-priori knowledge they use [1, 2, 3]. The combination technique plays a crucial role in the selection of the new patterns to be fed to the classifier in the proposed approach. In this work, the following decision combination strategies have been considered and compared: Majority Vote (MV), Weighted Majority Vote (WMV), Sum Rule (SR) and Product Rule (PR). MV considers only the labels provided by the individual classifiers; it is generally adopted when no knowledge is available about the performance of the classifiers, so that they are all considered equal. The second approach can be adopted by considering weights related to the performance of the individual classifiers on a specific dataset. Given the case depicted in this work, it seems more realistic; in fact, the behavior of the classifiers can be evaluated, for instance, on the newly available dataset.
In particular, let ε_i be the error rate of the i-th classifier evaluated on the last available training set; the weight assigned to A_i is w_i = log(1/β_i), with β_i = ε_i/(1 − ε_i). The Sum Rule (SR) and the Product Rule (PR) take into account the confidence of each individual classifier given the input pattern and the different classes [1]. Before the combination, the confidence values provided by the different classifiers are normalized by means of the Z-score.

IV. RESULTS

Results are reported in terms of error rate percentage (ER). Values of the similarity index (SI) are reported in the last row of each table; moreover, for the different learning strategies and for each classifier, the ratios R1 = N_S/N_F and R2 = N_S/N_I are evaluated and reported, N_F being the total number of available new samples, N_S the number of samples (selected among the previous ones) used for learning, and N_I the number of samples used for the initial training. R1 represents the percentage of new patterns selected from those available, while R2 is a measure of their influence on the initial training set. The label X-feed refers to the use of the X modality for the feedback training process: "All" is the feedback of the entire set (first strategy in par. II.B), "A_i" is feedback at classifier level (second strategy in par. II.B). "MV", "WMV", "SR" and "PR" are feedback at ME level adopting, respectively, the majority vote, the weighted majority vote, the sum rule and the product rule schema. Table I reports the results related to the use of SVMs. The three sets of features F_1, F_2 and F_3 (see par. III.A) lead, respectively, to SVM_1, SVM_2 and SVM_3. P_1 is used for training and P_6 for testing. P_2 ∪ P_3 ∪ P_4 ∪ P_5 is used for feedback learning. In this case, the total amount of new samples is 42.86% of the number of samples of the initial training set (P_1). The first column (No-feed) reports results related to the use of P_1 for training and of P_6 for testing without applying any feedback (R1 = R2 = 0%), while the All-feed approach uses all samples belonging to the new set in order to update the knowledge base of each single classifier (R1 = 100%, R2 = 42.86%).
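The four fusion rules, with the WMV weight defined above, can be sketched as follows. This is a sketch only: mapping the Z-scored confidences to positive values via a softmax before taking the product is an assumption, since the paper does not specify how PR handles negative normalized scores.

```python
import numpy as np

def wmv_weight(err):
    """WMV weight: w_i = log(1/beta_i), beta_i = eps_i / (1 - eps_i)."""
    return np.log((1.0 - err) / err)

def combine(rule, labels=None, confidences=None, weights=None):
    """Fuse the decisions of N classifiers over M classes.
    labels      -- crisp class indices, shape (N,), used by MV/WMV;
    confidences -- per-class scores, shape (N, M), used by SR/PR after
                   per-classifier Z-score normalization."""
    if rule == "MV":
        return int(np.bincount(labels).argmax())
    if rule == "WMV":
        return int(np.bincount(labels, weights=weights).argmax())
    z = (confidences - confidences.mean(axis=1, keepdims=True)) \
        / confidences.std(axis=1, keepdims=True)
    if rule == "SR":
        return int(z.sum(axis=0).argmax())
    if rule == "PR":
        # Z-scores can be negative; map them to positive values with a
        # softmax before multiplying (assumption, see the lead-in).
        p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
        return int(np.prod(p, axis=0).argmax())
    raise ValueError(rule)
```

Note how a single low-error expert can overturn two weak ones under WMV even when it is outvoted under plain MV.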
Depending on the combination technique, a specific strategy can outperform the others. In two cases out of four (MV-feed and SR-feed), the multi-expert strategy outperforms the use of the entire new dataset, while in three cases out of four (MV-feed, WMV-feed and SR-feed) the multi-expert strategy outperforms the feedback at single-expert level. In the case of Majority Vote and of Sum Rule, feedback at ME level definitively outperforms the other two approaches. In these cases, it is of interest that a very restricted subset of samples is selected for re-training. Table II reports the results related to the use of the NB classifier under the same conditions as the previous experiment. In this case, the performance improvements provided by feedback at single-expert level are always better than those obtained by any ME technique. At the same time, it must be underlined that the worst re-training strategy is the one which considers the entire set of new available samples. A typical implementation of co- and self-training takes into account multiple training iterations. In this work, experiments have been performed considering up to 3 iterations.

TABLE I. SVM, FEEDBACK WITH P_2 ∪ P_3 ∪ P_4 ∪ P_5 (ER, R1, R2 in %; X = not applicable)

      | No-feed | A_i-feed ER (R1, R2) | MV-feed ER (R1, R2) | WMV-feed ER (R1, R2) | SR-feed ER (R1, R2) | PR-feed ER (R1, R2) | All-feed
SVM_1 | 2.94 | 2.82 (6.13, 2.63) | 3.01 (2.62, 1.12) | 3.01 (1.70, 0.73) | 2.96 (1.39, 0.60) | 2.96 (1.17, 0.50) | 2.92
SVM_2 | 8.37 | 8.13 (10.46, 4.48) | 8.36 (3.06, 1.31) | 8.55 (2.14, 0.92) | 8.04 (1.66, 0.71) | 8.22 (1.32, 0.56) | 7.97
SVM_3 | 4.09 | 4.35 (4.12, 1.76) | 4.46 (2.27, 0.97) | 4.46 (2.27, 0.97) | 4.32 (1.26, 0.54) | 4.18 (0.95, 0.41) | 4.23
MV    | 2.54 | 2.49 | 2.35 | X | X | X | 2.58
WMV   | 1.69 | 1.83 | X | 1.79 | X | X | 1.74
SR    | 1.46 | 1.41 | X | X | 1.36 | X | 1.41
PR    | 1.22 | 1.17 | X | X | X | 1.22 | 1.17
SI    | 91.29 | 91.30 | 90.98 | 90.91 | 91.32 | 91.24 | 91.55

TABLE II. NB, FEEDBACK WITH P_2 ∪ P_3 ∪ P_4 ∪ P_5 (ER, R1, R2 in %; X = not applicable)

     | No-feed | A_i-feed ER (R1, R2) | MV-feed ER (R1, R2) | WMV-feed ER (R1, R2) | SR-feed ER (R1, R2) | PR-feed ER (R1, R2) | All-feed
NB_1 | 6.81 | 5.97 (11.54, 4.95) | 6.44 (5.63, 2.41) | 6.67 (3.84, 1.65) | 6.72 (3.99, 1.71) | 6.63 (3.38, 1.45) | 6.53
NB_2 | 12.55 | 11.51 (14.49, 6.21) | 11.70 (7.10, 3.04) | 11.94 (5.31, 2.27) | 12.08 (4.45, 1.90) | 12.08 (3.77, 1.61) | 12.27
NB_3 | 10.62 | 9.59 (10.92, 4.68) | 10.24 (5.74, 2.46) | 10.24 (5.74, 2.46) | 10.29 (3.71, 1.59) | 10.34 (2.87, 1.23) | 11.04
MV   | 6.44 | 5.64 | 5.97 | X | X | X | 6.63
WMV  | 4.56 | 4.14 | X | 4.23 | X | X | 4.70
SR   | 3.67 | 3.24 | X | X | 3.52 | X | 3.81
PR   | 3.10 | 2.82 | X | X | X | 3.01 | 3.10
SI   | 84.05 | 85.34 | 84.70 | 84.52 | 84.37 | 84.32 | 84.35

TABLE III. NB, 3 FEEDBACK LEARNING ITERATIONS WITH P_2 ∪ P_3 ∪ P_4 ∪ P_5 (ER, R1, R2 in %; X = not applicable)

     | No-feed | A_i-feed ER (R1, R2) | MV-feed ER (R1, R2) | WMV-feed ER (R1, R2) | SR-feed ER (R1, R2) | PR-feed ER (R1, R2) | All-feed
NB_1 | 6.81 | 5.59 (30.19, 12.93) | 6.16 (15.06, 6.45) | 6.48 (10.41, 4.46) | 6.44 (10.50, 4.50) | 6.58 (8.96, 3.84) | 6.77
NB_2 | 12.55 | 10.71 (38.27, 16.40) | 11.28 (18.00, 7.71) | 11.70 (13.48, 5.78) | 11.61 (11.64, 4.99) | 11.56 (9.68, 4.15) | 12.45
NB_3 | 10.62 | 9.12 (29.36, 12.58) | 9.68 (15.00, 6.43) | 9.63 (15.08, 6.46) | 10.20 (9.46, 4.05) | 10.39 (7.46, 3.20) | 11.61
MV   | 6.44 | 5.60 | 5.39 | X | X | X | 6.91
WMV  | 4.56 | 4.09 | X | 4.13 | X | X | 4.79
SR   | 3.67 | 3.01 | X | X | 3.34 | X | 3.99
PR   | 3.10 | 2.63 | X | X | X | 2.76 | 3.24
SI   | 84.05 | 86.47 | 85.26 | 84.96 | 84.56 | 84.32 | 85.37

TABLE IV.
SVM AND NB (FEATURE SET F = F_1 ∪ F_2 ∪ F_3), FEEDBACK WITH P_2, P_3, P_4, P_5 USED INDEPENDENTLY, AVERAGE VALUES (ER, R1, R2 in %; X = not applicable)

    | No-feed | A_i-feed ER (R1, R2) | MV-feed ER (R1, R2) | WMV-feed ER (R1, R2) | SR-feed ER (R1, R2) | PR-feed ER (R1, R2) | All-feed
SVM | 2.07 | 2.10 (1.60, 0.17) | 2.06 (1.18, 0.13) | 2.07 (1.33, 0.14) | 2.08 (1.18, 0.13) | 2.08 (1.15, 0.12) | 1.91
NB  | 4.18 | 4.02 (4.18, 0.45) | 4.10 (2.25, 0.24) | 4.11 (2.23, 0.24) | 4.13 (2.38, 0.25) | 4.11 (2.28, 0.24) | 3.93
MV  | 3.48 | 3.44 | 3.51 | X | X | X | 3.32
WMV | 2.54 | 2.54 | X | 2.54 | X | X | 2.35
SR  | 2.35 | 2.35 | X | X | 2.30 | X | 2.41
PR  | 2.21 | 2.22 | X | X | X | 2.21 | 2.17
SI  | 94.24 | 94.42 | 94.33 | 94.31 | 94.31 | 94.30 | 94.62

In the case of the SVM classifiers, slight improvements have been observed in terms of ER, while confirming the general trend among the different feedback strategies already reported in Table I. A much more interesting trend has been observed in the case of the NB classifiers (Table III), where 3 re-training iterations have been considered. The spread between the performance obtained with a single-expert strategy and a ME one is sensibly lower than in the case of a single iteration (Table II); moreover, MV-feed is able to outperform feedback at single-expert level. It is also of interest to observe that, due to over-fitting, feeding the entire set of new samples for feedback learning produces a decrease of performance as the number of iterations increases. Finally, Table IV reports the results related to the use of a unique feature set F = F_1 ∪ F_2 ∪ F_3, the two different classifiers, and a reduced set of samples provided for feedback learning. In particular, P_1 is used for the initial training and P_6 for testing. P_2, P_3, P_4, P_5 are independently

used, one from the other, for feedback learning; performance was evaluated for each set and the average is finally reported. The first consideration is that the ER achieved by the SVM is so low that it appears useless to combine it with the NB (no complementary information is added). Of course, this represents an extreme working point. Feedback at ME level is able to outperform All-feed only in the case of the Sum Rule (SR), by using a reduced subset of the available new patterns. In all other cases, the results provided by the ME feedback approach are equal to those obtained by feedback at single-expert level. A statistical significance at the < 0.03 level was achieved for all the tests performed and reported here.

V. CONCLUSIONS

This paper shows the possibility of improving the effectiveness of a multi-classifier system, when new labeled data are available, by a suitable use of the information extracted from the collective behavior of the classifiers. Experiments have been performed considering state-of-the-art classifiers, features and combination techniques. It has been shown that the performance of feedback training strictly depends on the classifier structure and on the combination strategy of the ME, which is responsible for sample selection, but also on the data distribution and on the similarity between the samples in the feedback set and those in the testing set. It has also been shown that multiple training iterations on the same set of data are able to improve performance both in the case of feedback at single- and at multi-expert level. Finally, even in cases in which the cardinality of the newly selected training set is negligible compared to that of the initial training set, the feedback strategy is able to produce improvements. Future work will inspect more deeply the possibility of iterative re-training given a set of new labeled samples, as well as the possibility of evaluating the approaches on the task of semi-supervised learning.

REFERENCES

[1] J. Kittler, M. Hatef, R.P.W. Duin, J. Matas, "On combining classifiers", IEEE Trans. on PAMI, Vol. 20, No. 3, pp. 226-239, 1998.
[2] R. Plamondon, S.N.
Srihari, "On-line and off-line handwriting recognition: a comprehensive survey", IEEE Trans. on PAMI, Vol. 22, No. 1, pp. 63-84, 2000.
[3] C.Y. Suen, C. Nadal, R. Legault, T.A. Mai, L. Lam, "Computer recognition of unconstrained handwritten numerals", Proc. IEEE, Vol. 80, Issue 7, pp. 1162-1180, 1992.
[4] C.Y. Suen, J. Tan, "Analysis of errors of handwritten digits made by a multitude of classifiers", Pattern Recognition Letters, Vol. 26, Issue 3, pp. 369-379, 2005.
[5] G. Pirlo, D. Impedovo, "Fuzzy-zoning-based classification for handwritten characters", IEEE Trans. on Fuzzy Systems, Vol. 19, Issue 4, pp. 780-785, 2011.
[6] C.L. Liu, K. Nakashima, H. Sako, H. Fujisawa, "Handwritten digit recognition: benchmarking of state-of-the-art techniques", Pattern Recognition, Vol. 36, No. 10, pp. 2271-2285, 2003.
[7] R.E. Schapire, "The strength of weak learnability", Machine Learning, Vol. 5, Issue 2, pp. 197-227, 1990.
[8] R. Polikar, "Bootstrap-inspired techniques in computational intelligence", IEEE Signal Processing Magazine, Vol. 24, No. 4, pp. 59-72, 2007.
[9] Y. Freund and R.E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting", Journal of Computer and System Sciences, Vol. 55, No. 1, pp. 119-139, 1997.
[10] V. Frinken, H. Bunke, "Evaluating retraining rules for semi-supervised learning in neural network based cursive word recognition", in Proc. of ICDAR, pp. 31-35, 2009.
[11] V. Frinken, A. Fischer, H. Bunke, A. Fornés, "Co-training for handwritten word recognition", in Proc. of ICDAR, pp. 314-318, 2011.
[12] A. Blum and T. Mitchell, "Combining labeled and unlabeled data with co-training", in ACM Proc. of COLT, pp. 92-100, 1998.
[13] H.J. Scudder, "Probability of error of some adaptive pattern-recognition machines", IEEE Trans. on Information Theory, Vol. 11, No. 3, pp. 363-371, 1965.
[14] U. Uludag, A. Ross, A. Jain, "Biometric template selection and update: a case study in fingerprints", Pattern Recognition, Vol. 37, Issue 7, pp. 1533-1542, 2004.
[15] D. Impedovo, G. Pirlo, "Updating knowledge in feedback-based multi-classifier systems", in IEEE Proc.
of the International Conference on Document Analysis and Recognition (ICDAR 2011), pp. 227-231, 2011.
[16] D. Impedovo, G. Pirlo, L. Sarcinella, E. Stasolla, "Artificial classifier generation for multi-expert system evaluation", in IEEE Proc. of the International Conference on Frontiers in Handwriting Recognition (ICFHR 2010), pp. 421-426, 2010.
[17] D. Impedovo, G. Pirlo, M. Petrone, "A multi-resolution multi-classifier system for speaker verification", early view on Wiley Expert Systems, 2011.
[18] G. Pirlo, C.A. Trullo, D. Impedovo, "A feedback-based multi-classifier system", in IEEE Proc. of the International Conference on Document Analysis and Recognition (ICDAR 2009), pp. 713-717, 2009.