Transformation Invariance in Pattern Recognition: Tangent Distance and Propagation

Patrice Y. Simard,1 Yann A. Le Cun,2 John S. Denker,2 Bernard Victorri3

1 Microsoft Research, 1 Microsoft Way, Redmond, WA. E-mail: patrice@microsoft.com
2 AT&T Labs, 100 Schulz Dr., Red Bank, NJ. E-mail (Y.A.L.): yann@research.att.com; E-mail (J.S.D.): jsd@research.att.com
3 LATTICE-CNRS, ENS Paris, France. E-mail: victorri@ens.fr

Correspondence to: P. Simard. The majority of this work was completed at AT&T Labs.

ABSTRACT: In pattern recognition, statistical modeling, or regression, the amount of data is a critical factor affecting performance. If the amount of data and computational resources are unlimited, even trivial algorithms will converge to the optimal solution. In the practical case of limited data and other resources, however, satisfactory performance requires sophisticated methods that regularize the problem by introducing a priori knowledge. Invariance of the output with respect to certain transformations of the input is a typical example of such a priori knowledge. We introduce the concept of tangent vectors, which compactly represent the essence of these transformation invariances, and two classes of algorithms, tangent distance and tangent propagation, which make use of these invariances to improve performance. © 2000 John Wiley & Sons, Inc. Int J Imaging Syst Technol, 11, 2000

I. INTRODUCTION

Pattern recognition is one of the main tasks of biological information processing systems and a major challenge of computer science. The problem of pattern recognition is to classify objects into categories, given that objects in a particular category may have widely varying features and objects in different categories may have quite similar features. A typical example is handwritten digit recognition. Characters, typically represented as fixed-size images (e.g., 16 × 16 pixels), must be classified into 1 of 10 categories using a classification function. Building such a classification function is a major technological challenge, as irrelevant variabilities among objects of the same class must be eliminated and meaningful differences between objects of different classes must be identified. For most real pattern recognition tasks, these classification functions are too complicated to be synthesized by hand using only what humans know about the task. Instead, we use sophisticated techniques that combine human a priori knowledge with information automatically extracted from a set of labeled examples (the training set). These techniques can be divided into two camps, according to the number of parameters they require: the memory-based algorithms, which in effect store a sizeable subset of the entire training set, and the learned-function techniques, which learn by adjusting a comparatively small number of parameters. This distinction is arbitrary, because the patterns stored by a memory-based algorithm can be considered the parameters of a very complex learned function. The distinction is, however, useful in this work: memory-based algorithms often rely on a metric that can be modified to incorporate transformation invariances, whereas the learned-function algorithms consist of selecting a classification function whose derivatives can be constrained to reflect the same transformation invariances. The two methods for incorporating invariances are different enough to justify two independent sections.

A. Memory-Based Algorithms.
To compute the classification function, many practical pattern recognition systems and several biological models simply store all the examples, together with their labels, in a memory. Each incoming pattern can then be compared with all the stored prototypes; the labels associated with the prototypes that best match the input determine the output. This is the simplest example of a memory-based model. Memory-based models require three things: a distance measure to compare inputs to prototypes, an output function to produce an output by combining the labels of the prototypes, and a storage scheme to build the set of prototypes. All three aspects have been abundantly treated in the literature. Output functions range from simply voting the labels associated with the k closest prototypes (k-nearest neighbors) to computing a score for each class as a linear combination of the distances to all the prototypes, using fixed (Parzen, 1962) or learned (Broomhead and Lowe, 1988) coefficients. Storage schemes vary from storing the entire training set and picking appropriate subsets of it (Dasarathy, 1991) to storing learned patterns, as in learning vector quantization (LVQ; Kohonen, 1984). Distance measures can be as simple as the Euclidean distance, assuming the patterns and prototypes are represented as vectors, or more complex, as in the generalized quadratic metric (Fukunaga and Flick, 1984) or in elastic matching methods (Hinton et al., 1992). A simple but inefficient pattern recognition method is to use a simple distance measure, such as the Euclidean distance between vectors representing the raw input, combined with a large set of prototypes.
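As a concrete reference point, this simplest memory-based scheme (Euclidean distance plus k-nearest-neighbor voting) fits in a few lines. The following NumPy sketch is illustrative only; it assumes patterns are stored as flattened vectors and labels as an integer array.

```python
import numpy as np

def knn_classify(x, prototypes, labels, k=3):
    """Vote among the k prototypes closest to x in Euclidean distance.

    prototypes : array of shape (N, n), one flattened pattern per row
    labels     : integer array of shape (N,)
    """
    d = np.sum((prototypes - x) ** 2, axis=1)  # squared Euclidean distances
    nearest = np.argsort(d)[:k]                # indices of the k best matches
    return np.argmax(np.bincount(labels[nearest]))
```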

This method is inefficient because almost all possible instances of a category must be present in the prototype set. In the case of handwritten digit recognition, this means that digits of each class in all possible positions, sizes, angles, writing styles, line thicknesses, and skews must be stored. In real situations, this approach leads to impractically large prototype sets or to mediocre recognition accuracy (Fig. 1): an unlabeled image of a thick, slanted 9 must be classified by finding the closest prototype image from two images representing, respectively, a thin, upright 9 and a thick, slanted 4. According to the Euclidean distance (the sum of the squares of the pixel-to-pixel differences), the 4 is closer. The result is an incorrect classification.

Figure 1. According to the Euclidean distance, the pattern to be classified is more similar to prototype B. A better distance measure would find that prototype A is closer, because it differs mainly by a rotation and a thickness transformation, two transformations that should leave the classification invariant.

The classical way of dealing with this problem is to use a so-called feature extractor, whose purpose is to compute a representation of the patterns that is minimally affected by transformations of the patterns that do not modify their category. For character recognition, the representation should be invariant with respect to position, size changes, slight rotations, distortions, or changes in line thickness. The design and implementation of feature extractors is the major bottleneck of building a pattern recognition system. For example, the problem illustrated in Figure 1 can be solved by deslanting and thinning the images.

An alternative is to use an invariant distance measure, constructed in such a way that the distance between a prototype and a pattern is not affected by irrelevant transformations of the pattern or of the prototype. With an invariant distance measure, each prototype can match many possible instances of a pattern, thereby greatly reducing the number of prototypes required. The natural way of doing this is to use deformable prototypes: during the matching process, each prototype is deformed so as to best fit the incoming pattern, and the quality of the fit, possibly combined with a measure of the amount of deformation, is then used as the distance measure (Hinton et al., 1992). With the example of Figure 1, the 9 prototype would be rotated and thickened so as to best match the incoming 9. This approach has two shortcomings. First, a set of allowed deformations must be designed based on a priori knowledge; fortunately, this is feasible for many tasks, including character recognition. Second, the search for the best-matching deformation is often enormously expensive and/or unreliable.

Consider the case of patterns that can be represented by vectors. For example, the pixel values of a 16 × 16 pixel character image can be viewed as the components of a 256-dimensional (256-D) vector. One pattern, or one prototype, is a point in this 256-D space. Assuming that the set of allowable transformations is continuous, the set of all the patterns that can be obtained by transforming one prototype using one or a combination of allowable transformations is a surface in the 256-D pixel space. More precisely, when a pattern P is transformed (e.g., rotated) according to a transformation s(P, α) that depends on one parameter α (e.g., the angle of the rotation), the set of all the transformed patterns

S_P = \{ x \mid \exists \alpha \text{ for which } x = s(P, \alpha) \} \quad (1)

is a 1-D curve in the vector space of the inputs.
In the remainder of this study, we will always assume that we have chosen s to be differentiable with respect to both P and α, and such that s(P, 0) = P. When the set of transformations is parameterized by n parameters α_i, the intrinsic dimension of the manifold S_P is n. For example, if the allowable transformations of character images are horizontal and vertical shifts, rotations, and scaling, the surface will be a 4-D manifold. In general, the manifold will not be linear. Even a simple image translation corresponds to a highly nonlinear transformation in the high-dimensional pixel space: if the image of an 8 is translated upward, some pixels oscillate from white to black and back several times. Matching a deformable prototype to an incoming pattern now amounts to finding the point on the surface that is at a minimum distance from the point representing the incoming pattern. This nonlinearity makes the matching much more expensive and unreliable. Simple minimization methods such as gradient descent (or conjugate gradient) can be used to find the minimum-distance point; however, these methods only converge to a local minimum, and running such an iterative procedure for each prototype is usually prohibitively expensive. If the set of transformations happens to be linear in pixel space, then the manifold is a linear subspace (a hyperplane). The matching procedure is then reduced to finding the shortest distance between a point (vector) and a hyperplane, an easy-to-solve quadratic minimization problem. This special case has been studied and is sometimes referred to as Procrustes analysis (Sibson, 1978). It has been applied to signature verification (Hastie et al., 1991) and on-line character recognition (Sinden and Wilfong, 1992). This study considers the more general case of nonlinear transformations, such as geometric transformations of gray-level images. Remember that even a simple image translation corresponds to a highly nonlinear transformation in the high-dimensional pixel space. The main idea of this study is to approximate the surface of possible transforms of a pattern by its tangent plane at the pattern, thereby reducing the matching to finding the shortest distance between two planes. This distance is called the tangent distance. The result of the approximation is shown in Figure 2, in the case of rotation for handwritten digits. The theoretical curve in pixel space that represents Eq. (1), together with its linear approximation, is shown in Figure 2 (top). Points of the transformation curve are depicted below for various amounts of rotation (each angle corresponds to a value of α). Figure 2 (bottom) depicts the linear approximation of the curve s(P, α) given by the Taylor expansion of s around 0:

s(P, \alpha) = s(P, 0) + \alpha \frac{\partial s(P, \alpha)}{\partial \alpha}\Big|_{\alpha=0} + O(\alpha^2) \approx P + \alpha T \quad (2)

This linear approximation is completely characterized by the point P = s(P, 0) and the tangent vector T = \partial s(P, \alpha)/\partial \alpha |_{\alpha=0}. Tangent vectors, also called the Lie derivatives of the transformation s, will be discussed in Section IV. For reasonably small angles (α ≪ 1), the approximation is very good (Fig. 2).
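To make Eqs. (1) and (2) concrete, the following sketch estimates the rotation tangent vector T of a smoothed pattern by a finite difference and compares the true rotation s(P, α) with the linear approximation P + αT. SciPy's rotate stands in for s; the random pattern, smoothing, and angles are illustrative choices, not the paper's data.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, rotate

def s(P, alpha):
    """Transformation s(P, alpha): rotate P by alpha degrees about its center."""
    return rotate(P, alpha, reshape=False, order=3)

P = gaussian_filter(np.random.rand(16, 16), sigma=1.0)  # stand-in smoothed pattern
eps = 1.0
T = (s(P, eps) - P) / eps                 # finite-difference tangent vector
for alpha in (2.0, 5.0, 10.0):
    err = np.linalg.norm(s(P, alpha) - (P + alpha * T))
    print(f"alpha = {alpha:4.1f} deg, linear-approximation error = {err:.4f}")
```

The error grows with α, reflecting that the tangent plane is only a local approximation of the transformation curve.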

Figure 2. (Top) Representation of the effect of the rotation in pixel space. (Middle) Small rotations of an original digitized image of the digit 2, for different angle values of α. (Bottom) Images obtained by moving along the tangent to the transformation curve for the same original digitized image P, by adding various amounts (α) of the tangent vector T.

Figure 3 illustrates the difference among the Euclidean distance, the full invariant distance (minimum distance between manifolds), and the tangent distance. Both the prototype and the pattern are deformable (two-sided distance). However, for simplicity or efficiency reasons, it is also possible to deform only the prototype or only the unknown pattern (one-sided distance). Although we concentrate on using tangent distance to recognize images, the method can be applied to many different types of signals, such as temporal signals, speech, and sensor data.

Figure 3. Illustration of the Euclidean distance and the tangent distance between P and E. The curves S_P and S_E represent the sets of points obtained by applying the chosen transformations (e.g., translations and rotations) to P and E. The lines going through P and E represent the tangents to these curves. Assuming that the working space has more dimensions than the number of chosen transformations (on the diagram, assume one transformation in a 3-D space), the tangent spaces do not intersect and the tangent distance is uniquely defined.

B. Learned-Function Algorithms. Rather than trying to keep a representation of the training set, it is also possible to choose a classification function by learning a set of parameters. This is the approach taken in neural networks, curve fitting, and regression. We assume that all data are drawn independently from a given statistical distribution and that our learning machine is characterized by the set of functions it can implement, G_w(x), indexed by the vector of parameters w. We write F(x) for the correct or desired labeling of the point x. The task is to find a value for w such that G_w best approximates F. We can use a finite set of training data to help find this vector; we assume the correct labeling F(x) is known for all points in the training set. For example, G_w may be the function computed by a neural net having weights w, or G_w may be a polynomial having coefficients w. Without additional information, finding a value for w is an ill-posed problem unless the number of parameters is small and/or the size of the training set is large, because the training set does not provide enough information to distinguish the best solution among all the candidate w's (Fig. 4, left).

Figure 4. Learning a given function (solid line) from a limited set of examples (x_1 to x_4). The fitted curves are shown by a dotted line. (Left) The only constraint is that the fitted curve goes through the examples. (Right) The fitted curves go through each example, and their derivatives evaluated at the examples agree with the derivatives of the given function.

The desired function F (solid line) is to be approximated by functions G_w (dotted line) from the four examples {(x_i, F(x_i))}, i = 1, ..., 4. As exemplified in Figure 4, the fitted function G_w largely disagrees with the desired function F between the examples, but it is not possible to infer this from the training set alone. Many values of w can generate many different functions G_w, some of which may be terrible approximations of F, even though they are in complete agreement with the training set.
Because of this, it is customary to add regularizers, or additional constraints, to restrict the search for an acceptable w. For example, we may require the function G_w to be smooth by adding the constraint that \|w\|^2 should be minimized. It is important that the regularizer reflect a property of F; hence, regularizers depend on a priori knowledge about the function to be modeled. Selecting a good family of functions {G_w} is a difficult task, sometimes known as model selection (Hastie and Tibshirani, 1990; Hoerl and Kennard, 1970). If the family contains a large set of functions, it is more likely to contain a good approximation of F (the function we are trying to approximate). However, it is also more likely that the candidate selected using the training set will generalize poorly, because many functions in the family will agree with the training data yet take outrageous values between the training samples. If, on the other hand, the family contains a small set of

functions, it is more likely that a function G_w that fits the data will be a good approximation of F. The capacity of the family of functions is often referred to as the VC dimension (Vapnik, 1982; Vapnik and Chervonenkis, 1971). If a large amount of data is available, the family should be large (high VC dimension), so that more functions can be approximated, and in particular F. If, on the other hand, the data are scarce, the family should be restricted to a small set of functions (low VC dimension), to control the values between the (more distant) samples. (This point of view also applies to memory-based systems. In the case where all the training data can be kept in memory, however, the VC dimension is infinite and the formalism is meaningless; the VC dimension belongs to a learning paradigm and is not useful unless learning is involved.)

The VC dimension can also be controlled by putting a knob on how much effect is given to some regularizers. For instance, it is possible to control the capacity of a neural network by adding weight decay as a regularizer. Weight decay is a heuristic that favors smooth classification functions by making a tradeoff: decreasing \|w\|^2 at the cost, usually, of a slightly increased error on the training set. Because the optimal classification function is not necessarily smooth, for instance at a decision boundary, the weight decay regularizer can have adverse effects. As mentioned earlier, the regularizer should reflect interesting properties (a priori knowledge) of the function to be learned. If the functions F and G_w are assumed to be differentiable, which is generally the case, the search for G_w can be greatly improved by requiring that the derivatives of G_w evaluated at the points {x_i} be more or less equal (this is the regularizer knob) to the derivatives of F at the same points (Fig. 4, right). This result can be extended to multidimensional inputs, in which case we can impose the equality of the derivatives of F and G_w in certain directions, not necessarily in all directions of the input space. Such constraints find immediate use in traditional pattern recognition problems. It is often the case that a priori knowledge is available on how the desired function varies with respect to some transformations of the input; it is straightforward to derive the corresponding constraint on the directional derivatives of the fitted function G_w in the directions of the transformations (previously named tangent vectors). Typical examples can be found in pattern recognition, where the desired classification function is known to be invariant with respect to some transformations of the input, such as translation, rotation, and scaling; in other words, the directional derivatives of the classification function in the directions of these transformations are zero (Fig. 4). The right part of Figure 4 shows how the additional constraints on G_w help generalization by constraining the values of G_w outside the training set. For every transformation that has a known effect on the classification function, a regularizer can be added in the form of a constraint on the directional derivative of G_w in the direction of the tangent vector (such as the one depicted in Fig. 2), computed from the curve of transformation. Section II analyzes in detail how to use a distance based on tangent vectors in memory-based algorithms; Section III discusses the use of tangent vectors in neural networks, with the tangent propagation algorithm; and Section IV compares different algorithms to compute tangent vectors.

II. TANGENT DISTANCE

The Euclidean distance between two patterns P and E is in general not appropriate, because it is sensitive to irrelevant transformations
of P and of E. In contrast, the transformed distance 𝒟(E, P) is defined to be the minimal distance between the two manifolds S_P and S_E; it is therefore invariant with respect to the transformations used to generate S_P and S_E (Fig. 3). Unfortunately, these manifolds have no analytic expression in general, and finding the distance between them is a difficult optimization problem with multiple local minima. Besides, true invariance is not necessarily desirable, because a rotation of a 6 into a 9 does not preserve the correct classification. Our approach consists of computing the minimum distance between the linear surfaces that best approximate the nonlinear manifolds S_P and S_E. This solves three problems at once: (1) linear manifolds have simple analytic expressions that can be easily computed and stored, (2) finding the minimum distance between linear manifolds is a simple least-squares problem that can be solved efficiently, and (3) this distance is locally, not globally, invariant: the distance between a 6 and a slightly rotated 6 is small, but the distance between a 6 and a 9 is large. The different distances between P and E are represented schematically in Figure 3, which shows two patterns P and E in 3-D space. The manifolds generated by s are represented by 1-D curves going through E and P, respectively. The linear approximations to the manifolds are represented by lines tangential to the curves at E and P. These lines do not intersect in three dimensions, and the shortest distance between them (uniquely defined) is the tangent distance D(E, P). The distance 𝒟(E, P) between the two nonlinear transformation curves is also shown in Figure 3. An efficient implementation of the tangent distance D(E, P) is given in Section IIA, using image recognition as an illustration. We then compare our methods with the best-known competing methods. Finally, we discuss possible variations on the tangent distance and how it can be generalized to problems other than pattern recognition.

A. Implementation. We describe formally the computation of the tangent distance. Let the function s transform an image P to s(P, α) according to the parameter α. We require s to be differentiable with respect to α and P, and require s(P, 0) = P. For example, if P is a 2-D image, s(P, α) could be a rotation of P by the angle α. If we are interested in all transformations of images that conserve distances (isometries), s(P, α) would be a rotation by α_θ followed by a translation by (α_x, α_y) of the image P. In this case, α = (α_θ, α_x, α_y) is a vector of parameters of dimension 3. In general, α = (α_1, ..., α_m) is of dimension m. Because s is differentiable, the set S_P = {x | ∃α for which x = s(P, α)} is a differentiable manifold that can be approximated to the first order by a hyperplane T_P. This hyperplane is tangent to S_P at P and is generated by the columns of the matrix

L_P = \frac{\partial s(P, \alpha)}{\partial \alpha}\Big|_{\alpha=0} = \left[ \frac{\partial s(P, \alpha)}{\partial \alpha_1}, \ldots, \frac{\partial s(P, \alpha)}{\partial \alpha_m} \right]_{\alpha=0} \quad (3)

whose columns are vectors tangential to the manifold. If E and P are two patterns to be compared, the respective tangent planes T_E and T_P can be used to define a new distance D between these two patterns. The tangent distance D(E, P) between E and P is defined by

D(E, P) = \min_{x \in T_E, \, y \in T_P} \| x - y \|^2 \quad (4)

The equations of the tangent planes T_E and T_P are given by:

E(\alpha_E) = E + L_E \alpha_E \quad (5)

P(\alpha_P) = P + L_P \alpha_P \quad (6)

where L_E and L_P are the matrices containing the tangent vectors (Eq. 3), and the vectors \alpha_E and \alpha_P are the coordinates of E(\alpha_E) and P(\alpha_P) (using bases L_E and L_P) in the corresponding tangent planes. Note that E, E(\alpha_E), L_E, and \alpha_E denote vectors and matrices in the linear Eq. (5). For example, if the pixel space were of dimension 5 and there were two tangent vectors, we could rewrite Eq. (5) as:

\begin{pmatrix} E(\alpha)_1 \\ E(\alpha)_2 \\ E(\alpha)_3 \\ E(\alpha)_4 \\ E(\alpha)_5 \end{pmatrix} =
\begin{pmatrix} E_1 \\ E_2 \\ E_3 \\ E_4 \\ E_5 \end{pmatrix} +
\begin{pmatrix} L_{11} & L_{12} \\ L_{21} & L_{22} \\ L_{31} & L_{32} \\ L_{41} & L_{42} \\ L_{51} & L_{52} \end{pmatrix}
\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} \quad (7)

The quantities L_E and L_P are attributes of the patterns, so in many cases they can be precomputed and stored. Computing the tangent distance

D(E, P) = \min_{\alpha_E, \alpha_P} \| E(\alpha_E) - P(\alpha_P) \|^2 = \min_{\alpha_E, \alpha_P} d(\alpha_E, \alpha_P) \quad (8)

where d(\alpha_E, \alpha_P) = \| E(\alpha_E) - P(\alpha_P) \|^2, amounts to solving a linear least-squares problem. In the interest of clarity, and to make the computational costs more apparent, the details of the computation of \alpha_P and \alpha_E are spelled out below (the advanced reader can skip to the next section). The optimality condition is that the partial derivatives of d(\alpha_E, \alpha_P) with respect to \alpha_P and \alpha_E should be zero:

\frac{\partial d(\alpha_E, \alpha_P)}{\partial \alpha_E} = 2 \, (E(\alpha_E) - P(\alpha_P))^\top L_E = 0 \quad (9)

\frac{\partial d(\alpha_E, \alpha_P)}{\partial \alpha_P} = 2 \, (P(\alpha_P) - E(\alpha_E))^\top L_P = 0 \quad (10)

Substituting E(\alpha_E) and P(\alpha_P) by their expressions yields the following linear system of equations, which we must solve for \alpha_P and \alpha_E:

L_P^\top (E - P + L_E \alpha_E - L_P \alpha_P) = 0 \quad (11)

L_E^\top (E - P + L_E \alpha_E - L_P \alpha_P) = 0 \quad (12)

The solution of this system is:

(L_{PE} L_{EE}^{-1} L_E^\top - L_P^\top)(E - P) = (L_{PE} L_{EE}^{-1} L_{EP} - L_{PP}) \alpha_P \quad (13)

(L_{EP} L_{PP}^{-1} L_P^\top - L_E^\top)(E - P) = (L_{EE} - L_{EP} L_{PP}^{-1} L_{PE}) \alpha_E \quad (14)

where L_{EE} = L_E^\top L_E, L_{PE} = L_P^\top L_E, L_{EP} = L_E^\top L_P, and L_{PP} = L_P^\top L_P. LU decompositions of L_{EE} and L_{PP} can be precomputed. The most expensive part in solving this system is evaluating L_{EP} (L_{PE} can be obtained by transposing L_{EP}); it requires m_E × m_P dot products, where m_E is the number of tangent vectors for E and m_P is the number of tangent vectors for P. Once L_{EP} has been computed, \alpha_P and \alpha_E can be obtained by solving two (small) linear systems of, respectively, m_E and m_P equations. The tangent distance is obtained by computing \| E(\alpha_E) - P(\alpha_P) \|^2 using the values of \alpha_P and \alpha_E in Eqs. (5) and (6). If n is the dimension of the input space (i.e., the length of vectors E and P), the algorithm described above requires roughly n(m_E + 1)(m_P + 1) + 3(m_E^3 + m_P^3) multiply-adds. Approximations to the tangent distance can, however, be computed more efficiently.

B. Some Illustrative Results. Local Invariance. The local invariance of tangent distance can be illustrated by transforming a reference image by various amounts and measuring its distance to a set of prototypes. (Local invariance refers to invariance with respect to small transformations, e.g., a rotation by a very small angle. In contrast, global invariance refers to invariance with respect to arbitrarily large transformations, e.g., a rotation of 180°. Global invariance is not desirable in digit recognition, because we need to distinguish a 6 from a 9.) Figure 5 (bottom) shows 10 typical handwritten digit images. One of them, the digit 3, is chosen to be the reference. The reference is translated horizontally by the amount indicated on the abscissa. There are 10 curves for Euclidean distance and 10 more curves for tangent distance, measuring the distance between the translated reference and each of the 10 digits. Because the reference was chosen from among the 10 digits, it is not surprising that the curve corresponding to the digit 3 goes to 0 when the reference is not translated (0-pixel translation). It is clear from Figure 5 that if the reference (the image 3) is translated by more than two pixels, the Euclidean distance will confuse it with other digits, namely 8 or 5. In contrast, there is no possible confusion when tangent distance is used; in fact, in this example the tangent distance correctly identifies the reference up to a translation of five pixels. Similar curves were obtained with all the other transformations (e.g., rotation and scaling).
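The computation spelled out in Section IIA translates almost line for line into NumPy. The sketch below solves Eqs. (13) and (14) and evaluates Eq. (8); it assumes the tangent vectors are stored as matrix columns and that the small systems are nonsingular (see "Controlled Deformation" below for the regularized variant).

```python
import numpy as np

def tangent_distance(E, P, LE, LP):
    """Two-sided tangent distance, Eqs. (5)-(14).

    E, P   : flattened patterns, shape (n,)
    LE, LP : tangent vectors as columns, shapes (n, mE) and (n, mP)
    """
    LEE, LPP = LE.T @ LE, LP.T @ LP   # small Gram matrices (precomputable)
    LEP = LE.T @ LP                   # the expensive mE x mP dot products
    LPE = LEP.T
    d = E - P
    # Eq. (13): solve for alpha_P.
    A_P = LPE @ np.linalg.solve(LEE, LEP) - LPP
    b_P = (LPE @ np.linalg.solve(LEE, LE.T) - LP.T) @ d
    alpha_P = np.linalg.solve(A_P, b_P)
    # Eq. (14): solve for alpha_E.
    A_E = LEE - LEP @ np.linalg.solve(LPP, LPE)
    b_E = (LEP @ np.linalg.solve(LPP, LP.T) - LE.T) @ d
    alpha_E = np.linalg.solve(A_E, b_E)
    diff = (E + LE @ alpha_E) - (P + LP @ alpha_P)  # E(alpha_E) - P(alpha_P)
    return np.dot(diff, diff)
```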
The local invariance of tangent distance with respect to small transformations generally implies more accurate classification for much larger transformations. This is the single most important feature of tangent distance. The locality of the invariance has another important benefit: local invariance can be enforced with very few tangent vectors. The reason is that for infinitesimal (local) transformations, there is a direct correspondence (an isomorphism, actually; see Lie algebra in Choquet-Bruhat et al., 1982) between the tangent vectors of the tangent plane and the various compositions of transformations. For example, the three tangent vectors for X-translation, Y-translation, and rotation around the origin generate a tangent plane corresponding to all the possible compositions of horizontal translations, vertical translations, and rotations. The resulting tangent distance is then locally invariant to all the translations and all the rotations (around any center). Figure 6 further illustrates this phenomenon by displaying points in the tangent plane generated from only five tangent vectors. Each of these images looks as if it had been obtained by applying various combinations of scaling, rotation, horizontal and vertical skewing, and thickening; yet the tangent distance between any of these points and the original image is 0.

Handwritten Digit Recognition. Experiments were conducted to evaluate the performance of tangent distance for handwritten digit recognition. An interesting characteristic of digit images is that we can readily identify a set of local transformations that do not affect the identity of the character, while covering a large portion of the set of possible instances of the character. Seven such image transformations were identified: X- and Y-translations, rotation, scaling, two hyperbolic transformations (which can generate shearing and

squeezing), and line thickening or thinning. The first six transformations were chosen to span the set of all possible linear coordinate transforms in the image plane. (Nevertheless, they correspond to highly nonlinear transforms in pixel space.) Additional transformations were tried with less success.

Figure 5. Euclidean and tangent distances between 10 typical images of handwritten digits and a translated image of the digit 3. The abscissa represents the amount of horizontal translation (measured in pixels).

Three databases were used to test our algorithm:

1. U.S. Postal Service (USPS) database: The database consisted of 16 × 16 pixel size-normalized images of handwritten digits taken from U.S. mail envelopes. The training and testing sets had, respectively, 9,709 and 2,007 examples.

2. NIST1 database: The second experiment was a competition organized by the National Institute of Standards and Technology (NIST) in spring 1992. The object of the competition was to classify a test set of 59,000 handwritten digits, given a training set of 223,000 patterns.

3. NIST2 database: The third experiment was performed on a database made out of the training and testing databases provided by NIST (see above). NIST had divided the data into two sets, which unfortunately had different distributions: the NIST1 training set (223,000 patterns) was easier than the testing set (59,000 patterns). In our NIST2 experiments, we combined these two sets 50/50 to make a training set of 60,000 patterns and testing and validation sets of 10,000 patterns each, all having the same characteristics.

For each of these three databases, we tried to evaluate human performance to benchmark the difficulty of the database.

Figure 6. (Left) Original image. (Middle) Five tangent vectors corresponding, respectively, to the five transformations: scaling, rotation, expansion of the X axis while compressing the Y axis, expansion of the first diagonal while compressing the second diagonal, and thickening. (Right) 32 points in the tangent space generated by adding or subtracting each of the five tangent vectors.

Table I. Performance, in percent of errors, for (in order) human, K-nearest neighbor (K-NN), tangent distance (TD), LeNet1 (simple neural network), LeNet4 (large neural network), optimal margin classifier (OMC), local learning (LL), and boosting (Boost).

        Human   K-NN   TD   LeNet1   LeNet4   OMC   LL   Boost
USPS
NIST1
NIST2

For the USPS, two members of our group went through the test set, and both obtained a 2.5% raw error performance. The human performance on NIST1 was provided by NIST. The human performance on NIST2 was measured on a small subsample of the database and must therefore be taken with caution. Several of the leading algorithms were tested on each of these databases. The first experiment used the K-nearest-neighbor algorithm with the ordinary Euclidean distance. The prototype set consisted of all available training examples. A 1-nearest-neighbor rule gave optimal performance on USPS, whereas a 3-nearest-neighbors rule performed better on NIST2. The second experiment was similar to the first, but the distance function was changed to tangent distance with seven transformations. For the USPS and NIST2 databases, the prototype set was constructed as before. For NIST1, however, it was constructed by cycling through the training set: any pattern that was misclassified was added to the prototype set. After a few cycles, no more prototypes were added (the training error was 0). This resulted in 10,000 prototypes. A 3-nearest-neighbors rule gave optimal performance on this set. Other algorithms, such as neural nets (LeCun et al., 1990, 1995), the optimal margin classifier (Cortes and Vapnik, 1995), local learning (Bottou and Vapnik, 1992), and boosting (Drucker et al., 1993), were also run on these databases; a case study can be found in LeCun et al. (1995). The results are summarized in Table I. As the table illustrates, the tangent distance algorithm equals or outperforms all the other algorithms we tested in all cases except one: boosted LeNet4 was the winner on the NIST2 database. This is not surprising: the K-nearest-neighbor algorithm (with no preprocessing) is very unsophisticated compared with local learning, the optimal margin classifier, and boosting. The advantage of tangent distance is the a priori knowledge of transformation invariance embedded in the distance. When the training set is sufficiently large, as is the case for NIST2, some of this knowledge can be picked up from the data by the more sophisticated algorithms. In other words, the value of a priori knowledge decreases as the size of the training set increases.

C. How to Make Tangent Distance Work. This section is dedicated to the technological know-how necessary to make tangent distance work in various applications. Tricks of this sort are usually not published, for various reasons (they are not always theoretically sound, page area is too valuable, the tricks are specific to one particular application, and commercial competitive considerations discourage telling everyone how to reproduce the results). However, they are often a determining factor in making the technology a success, and several of them are discussed here.

Smoothing the Input Space. This is the single most important factor in obtaining good performance with tangent distance. By definition, the tangent vectors are the Lie derivatives of the transformation function s(P, α) with respect to α. They can be written as:

L_P = \frac{\partial s(P, \alpha)}{\partial \alpha}\Big|_{\alpha=0} = \lim_{\epsilon \to 0} \frac{s(P, \epsilon) - s(P, 0)}{\epsilon} \quad (15)

It is therefore important that s be differentiable (and well behaved) with respect to α. In particular, it is clear from Eq. (15) that s(P, ε) must be computed for arbitrarily small ε.
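Anticipating the smoothing trick described next, Eq. (15) can be approximated numerically once the image has been smoothed so that sub-pixel shifts are well defined. A minimal SciPy sketch, in which the smoothing width and step size are illustrative choices:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, shift

def x_translation_tangent(P, sigma=1.0, eps=0.1):
    """Finite-difference estimate of the X-translation tangent vector,
    (s(P, eps) - s(P, 0)) / eps, computed on a Gaussian-smoothed image."""
    f = gaussian_filter(P.astype(float), sigma)  # make s differentiable
    f_eps = shift(f, (0.0, eps), order=3)        # s(f, eps): sub-pixel shift in x
    return (f_eps - f) / eps
```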
Fortunately, even when P can only take discrete values, it is easy to make s differentiable. The trick is to use a smoothing interpolating function C_σ as a preprocessing for P, such that s(C_σ(P), α) is differentiable (with respect to C_σ(P) and α, not with respect to P). For instance, if the input space for P consists of binary images, C_σ(P) can be the convolution of P with a Gaussian function of standard deviation σ. If s(C_σ(P), α) is a translation of α pixels, the derivative of s(C_σ(P), α) with respect to α can easily be computed, because s(C_σ(P), α) can be obtained by translating Gaussian functions. Preprocessing is discussed in more detail in Section IV. The smoothing factor σ controls the locality of the invariance: the smoother the transformation curve defined by s, the longer the linear approximation remains valid. In general, the best smoothing is the maximum smoothing that does not blur the features. For example, in handwritten character recognition with 16 × 16 pixel images, a Gaussian function with a standard deviation of one pixel yielded the best results; increased smoothing led to confusion (such as a 5 mistaken for a 6 because the lower loop had been closed by the smoothing), and decreased smoothing did not make full use of the invariance properties. If the available computation time allows it, the best strategy is to extract features first, smooth shamelessly, and then compute the tangent distance on the smoothed features.

Controlled Deformation. The linear system given in Eq. (8) is singular if some of the tangent vectors for E or P are parallel. Although the probability of this happening is zero when the data are drawn from a real-valued continuous distribution (as is the case in handwritten character recognition), it is possible for a pattern to be duplicated in both the training and the test set, resulting in a division-by-zero error. The fix is simple and elegant: Eq. (8) can be replaced by

D(E, P) = \min_{\alpha_E, \alpha_P} \left( \| E(\alpha_E) - P(\alpha_P) \|^2 + k \| L_E \alpha_E \|^2 + k \| L_P \alpha_P \|^2 \right) \quad (16)

The physical interpretation of this equation is illustrated in Figure 7: the point E(α_E) on the tangent plane T_E is attached to E by a spring of constant k and to P(α_P) (on the tangent plane T_P) by a spring of constant 1, and P(α_P) is also attached to P by a spring of constant k (all three springs have zero natural length). The new tangent distance is the total elastic potential energy stored in the three springs at equilibrium. As for the standard tangent distance, the solution can easily be obtained by differentiating Eq. (16) with respect to α_E and α_P. The differentiation yields:

L_P^\top (E - P + L_E \alpha_E - (1 + k) L_P \alpha_P) = 0 \quad (17)

L_E^\top (E - P + (1 + k) L_E \alpha_E - L_P \alpha_P) = 0 \quad (18)

Figure 7. The tangent distance between E and P is the elastic energy stored in the three springs connecting P, P(α_P), E(α_E), and E. P(α_P) and E(α_E) can move without friction along the tangent planes. The spring constants are indicated on the figure.

The solution of this system is:

(L_{PE} L_{EE}^{-1} L_E^\top - (1 + k) L_P^\top)(E - P) = (L_{PE} L_{EE}^{-1} L_{EP} - (1 + k)^2 L_{PP}) \alpha_P \quad (19)

(L_{EP} L_{PP}^{-1} L_P^\top - (1 + k) L_E^\top)(E - P) = ((1 + k)^2 L_{EE} - L_{EP} L_{PP}^{-1} L_{PE}) \alpha_E \quad (20)

where L_{EE} = L_E^\top L_E, L_{PE} = L_P^\top L_E, L_{EP} = L_E^\top L_P, and L_{PP} = L_P^\top L_P. The system has the same complexity as the vanilla tangent distance, except that it always has a solution for k > 0 and is more numerically stable. Note that in the limit cases, the system yields the standard tangent distance (k = 0) and the Euclidean distance (k → ∞). This approach is also useful when the number of tangent vectors is greater than or equal to the number of dimensions of the space: the standard tangent distance would most likely be zero (the tangent spaces intersect), but the spring tangent distance still expresses valuable information about the invariances. If the dimension of the input space is large compared with the number of tangent vectors, keeping k as small as possible is better, because a small k does not interfere with the sliding along the tangent planes (E(α_E) and P(α_P) are less constrained). Contrary to intuition, there is no danger of sliding too far in a high-dimensional space, because tangent vectors are always roughly orthogonal, and sliding far would require them to be parallel.

Hierarchy of Distances. If several invariances are used, classification using the full tangent distance alone would be quite expensive. Fortunately, if a typical memory-based algorithm such as K-nearest neighbors is used, it is unnecessary to compute the full tangent distance between the unclassified pattern and all the labeled samples. In particular, if a crude estimate of the tangent distance indicates with sufficient confidence that a sample is far from the pattern to be classified, no more computation is needed to know that this sample is not one of the K nearest neighbors. Based on this observation, one can build a hierarchy of distances that greatly reduces the computation required for each classification. Assume, for instance, that we have m approximations D_i of the tangent distance, ordered such that D_1 is the crudest approximation and D_m is exactly the tangent distance (for instance, D_1 to D_5 could be the Euclidean distance at increasing resolutions, and D_6 to D_10 could each add a tangent vector at full resolution). The basic idea is to keep a pool of all the prototypes that could potentially be among the K nearest neighbors of the unclassified pattern. Initially, the pool contains all the samples. Each distance D_i corresponds to a stage of the classification process. The classification algorithm has three steps at each stage and proceeds from stage 1 to stage m, or until the classification is complete. In step 1, the distance D_i between each sample in the pool and the unclassified pattern is computed. In step 2, a classification and a confidence score are computed from these distances; if the confidence is good enough, that is, better than C_i (e.g., if all the samples left in the pool are in the same class), the classification is complete. Otherwise, in step 3, the K_i samples closest according to distance D_i are kept in the pool, and the remaining samples are discarded. Finding the K_i closest samples can be done in O(p) operations, where p is the number of samples in the pool, because these elements need not be sorted (Aho et al., 1983; Press et al., 1988).
The reduced pool is then passed to stage i + 1. The two constants C_i and K_i must be determined in advance using a validation set. This can easily be done graphically by plotting the error as a function of K_i and C_i at each stage (starting with K_i equal to the number of labeled samples and C_i = 1 for all stages). At each stage, there is a minimum K_i and a minimum C_i that give optimal performance on the validation set; by taking larger values, we can decrease the probability of making errors on the test sets. The slightly worse performance of a hierarchy of distances is often well worth the speedup. The computational cost of classifying one pattern is then:

\text{computational cost} = \sum_i (\text{number of prototypes at stage } i) \times (\text{distance complexity at stage } i) \times (\text{probability of reaching stage } i) \quad (21)

All this is better illustrated in Figure 8. This system was used for the USPS experiment. In classification of handwritten digits (16 × 16 pixel images), D_1, D_2, and D_3 were the Euclidean distances at resolutions 2 × 2, 4 × 4, and 8 × 8, respectively. D_4 was the one-sided tangent distance with X-translation, on the sample side only, at resolution 8 × 8. D_5 was the double-sided tangent distance with X-translation at full resolution.

Figure 8. Pattern recognition using a hierarchy of distances. The filter proceeds from left (starting with the whole database) to right (where only a few prototypes remain). At each stage, distances between prototypes and the unknown pattern are computed and sorted, and the best candidate prototypes are selected for the next stage. As the complexity of the distance increases, the number of prototypes decreases, making computation feasible. At each stage, a classification is attempted and a confidence score is computed; if the confidence score is high enough, the remaining stages are skipped.
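A sketch of the filtering loop just described, with the distance functions D_i supplied crudest first and the pool sizes K_i tuned beforehand on a validation set; the pool-purity test stands in for a confidence score compared against C_i, and all names are illustrative:

```python
import numpy as np

def hierarchical_classify(x, samples, labels, distances, K):
    """Multi-stage nearest-neighbor filtering (Fig. 8).

    distances : list of functions d(x, s), crudest approximation first,
                the exact tangent distance last
    K         : per-stage pool sizes K_i
    """
    pool = np.arange(len(samples))
    for dist, k in zip(distances, K):
        d = np.array([dist(x, samples[j]) for j in pool])  # step 1
        if len(np.unique(labels[pool])) == 1:              # step 2: confidence
            return labels[pool[0]]
        k = min(k, len(pool))
        pool = pool[np.argpartition(d, k - 1)[:k]]         # step 3, O(p)
    # Decide among the survivors with the most accurate distance.
    d = np.array([distances[-1](x, samples[j]) for j in pool])
    return labels[pool[np.argmin(d)]]
```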

Table II. Summary of the computation for the classification of one pattern. The first column is the distance index i; the second column indicates the number of tangent vectors (0 for the Euclidean distance); the third column indicates the resolution in pixels; the fourth is K_i, the number of prototypes on which the distance D_i must be computed; the fifth column indicates the number of additional dot products that must be computed to evaluate distance D_i; the sixth column indicates the probability of not skipping that stage after the confidence score has been used; and the last column indicates the total average number of multiply-adds that must be performed (product of columns 3 to 6) at each stage.

Each of the subsequent distances added one tangent vector on each side (Y-translation, scaling, rotation, hyperbolic deformation 1, hyperbolic deformation 2, and thickness) until the full tangent distance was computed (D_11). Table II shows the expected number of multiply-adds at each stage. It should be noted that the full tangent distance need only be computed for 1 in 20 unknown patterns (probability 0.05), and only against 5 samples out of the original 10,000. The net speedup was on the order of 500 compared with computing the full tangent distance between every unknown pattern and every sample (six times faster than computing the Euclidean distance at full resolution).

Multiple Iterations. Tangent distance can be viewed as one iteration of a Newton-type algorithm that finds the points of minimum distance on the true transformation manifolds. The vectors α_E and α_P are the coordinates of the two closest points in the respective tangent spaces, but they can also be interpreted as parameter values for the real (nonlinear) transformations. In other words, α_E and α_P can be used to compute the points s(E, α_E) and s(P, α_P), the real nonlinear transformations of E and P. From these new points, we can recompute the tangent vectors and the tangent distance, and reiterate the process. If the appropriate conditions are met, this process converges to a local minimum of the distance between the two transformation manifolds of P and E. It did not improve handwritten character recognition, but it yielded impressive results in face recognition (Vasconcelos and Lippman, 1998); in that case, each successive iteration was done at increasing resolution (hence combining hierarchical distances and multiple iterations), making the whole process computationally efficient.

III. TANGENT PROPAGATION

The previous section dealt with memory-based techniques. We now apply tangent-distance principles to learned-function techniques. The key idea is to incorporate the invariance directly into the classification function through the optimization of its parameters. More precisely, assume the classification function can be written as G_w(x), where x is the input to the classifier and w is a parameter vector that must be optimized to yield good classification. We present an algorithm, called tangent propagation, in which gradient descent on w is used to improve both classification and transformation invariance on the training data. In a neural network context, the process can be viewed as a generalization of the widely used back propagation method: in addition to propagating information about the training labels, the new algorithm also propagates transformation invariance information.
We again assume that all data are drawn independently from a given statistical distribution and that our learning machine is characterized by the set of functions it can implement, G_w(x), indexed by the vector of parameters w. Ideally, we would like to find the w that minimizes the energy function

E = \int \| G_w(x) - F(x) \|^2 \, dx \quad (22)

where F(x) represents the correct or desired labeling of the point x. In the real world, we must estimate this integral using only a finite set B of training points drawn from the distribution. That is, we try to minimize

E_p = \sum_{i=1}^{p} \| G_w(x_i) - F(x_i) \|^2 \quad (23)

where the sum runs over the training set B. An estimate of w can be computed by following a gradient descent using the weight-update rule:

\Delta w = -\eta \, \frac{\partial E_p}{\partial w} \quad (24)

Consider an input transformation s(x, α) controlled by a parameter α. As always, we require that s be differentiable and that s(x, 0) = x. Now, in addition to the known labels of the training data, we assume that \partial F(s(x_i, \alpha))/\partial \alpha is known at α = 0 for each point x_i in the training set. To incorporate the invariance property into G_w(x), we add the constraint that the quantity

E_r = \sum_{i=1}^{p} \left\| \frac{\partial}{\partial \alpha} \big( G_w(s(x_i, \alpha)) - F(s(x_i, \alpha)) \big) \Big|_{\alpha=0} \right\|^2 \quad (25)

should be small. In many pattern classification problems, we are interested in the local classification invariance property for F(x) with respect to the transformation s (the classification does not change when the input is slightly transformed), so we can simplify Eq. (25) to

E_r = \sum_{i=1}^{p} \left\| \frac{\partial G_w(s(x_i, \alpha))}{\partial \alpha} \Big|_{\alpha=0} \right\|^2 \quad (26)

because \partial F(s(x_i, \alpha))/\partial \alpha = 0 at α = 0. To minimize this term, we can modify the gradient descent rule to use the energy function

E = \mu E_p + \lambda E_r \quad (27)

with the weight-update rule:

\Delta w = -\eta \, \frac{\partial E}{\partial w} \quad (28)

The learning rates (or regularization parameters) μ and λ are tremendously important, because they determine the tradeoff between learning the invariances (based on the chosen directional derivatives) and learning the label itself (i.e., the zeroth derivative) at each point in the training set. The local variation of the classification function, which appears in Eq. (26), can be written as

\frac{\partial G_w(s(x, \alpha))}{\partial \alpha}\Big|_{\alpha=0} = \nabla_x G_w(x) \cdot \frac{\partial s(x, \alpha)}{\partial \alpha}\Big|_{\alpha=0} \quad (29)

because s(x, α) = x if α = 0, where \nabla_x G_w(x) is the Jacobian of G_w(x) for pattern x and \partial s(x, \alpha)/\partial \alpha is the tangent vector associated with transformation s, as described in the previous section. Multiplying the tangent vector by the Jacobian involves one forward propagation through a linearized version of the network. If α is multidimensional, the forward propagation must be repeated for each tangent vector. The theory of Lie algebras (Gilmore, 1974) ensures that compositions of local (small) transformations correspond to linear combinations of the corresponding tangent vectors (this result is discussed further in Section IV). Consequently, if E_r(x) = 0 is verified, the network derivative in the direction of a linear combination of the tangent vectors is equal to the same linear combination of the desired derivatives. In other words, if the network is successfully trained to be locally invariant with respect to horizontal and vertical translations, it will be invariant with respect to compositions thereof. It is possible to devise an efficient algorithm, tangent prop, for performing the weight update (Eq. 28). It is analogous to ordinary back propagation: in addition to propagating neuron activations, it also propagates the tangent vectors. The equations can easily be derived from Figure 9.

Figure 9. Forward (a, x, ρ, ξ) and backward (b, y, ψ, φ) propagated variables in the regular (roman symbols) and the Jacobian (linearized) network (Greek symbols). Converging forks (in the direction in which the signal is traveling) are sums; diverging forks duplicate the values.

A. Local Rule. The forward propagation equation is:

a_i^l = \sum_j w_{ij}^l x_j^{l-1}, \qquad x_i^l = \sigma(a_i^l) \quad (30)

where σ is a nonlinear differentiable function (typically a sigmoid). The forward propagation starts at the first layer (l = 1), with x^0 being the input layer, and ends at the output layer (l = L). Similarly, the tangent forward propagation (tangent prop) is defined by:

\rho_i^l = \sum_j w_{ij}^l \xi_j^{l-1}, \qquad \xi_i^l = \sigma'(a_i^l) \, \rho_i^l \quad (31)

The tangent forward propagation starts at the first layer (l = 1), with ξ^0 being the tangent vector \partial s(x, \alpha)/\partial \alpha |_{\alpha=0}, and ends at the output layer (l = L). The tangent gradient back propagation can be computed using the chain rule:

\psi_i^l = \frac{\partial E_r}{\partial \xi_i^l} = \sum_k \varphi_k^{l+1} w_{ki}^{l+1} \quad (32)

\varphi_i^l = \frac{\partial E_r}{\partial \rho_i^l} = \sigma'(a_i^l) \, \psi_i^l \quad (33)

The tangent backward propagation starts at the output layer (l = L), with ξ^L being the network variation \partial G_w(s(x, \alpha))/\partial \alpha |_{\alpha=0}, and ends at the input layer. Similarly, the gradient back propagation equations are:

y_i^l = \frac{\partial E}{\partial x_i^l} = \sum_k b_k^{l+1} w_{ki}^{l+1} \quad (34)

b_i^l = \frac{\partial E}{\partial a_i^l} = \sigma'(a_i^l) \, y_i^l + \sigma''(a_i^l) \, \rho_i^l \, \psi_i^l \quad (35)
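A minimal NumPy sketch of Eqs. (30) and (31) for a two-layer tanh network; the architecture and names are illustrative, not the paper's. The returned ξ^L is the directional derivative penalized by E_r in Eq. (26), and its gradients, Eqs. (32)-(37), can be obtained with ordinary backpropagation or automatic differentiation.

```python
import numpy as np

def tangent_forward(x, t, W1, W2):
    """Forward prop (Eq. 30) and tangent forward prop (Eq. 31) for
    G_w(x) = W2 tanh(W1 x); t is the tangent vector ds(x, alpha)/dalpha."""
    a1 = W1 @ x                    # a^1 = W^1 x^0
    x1 = np.tanh(a1)               # x^1 = sigma(a^1)
    rho1 = W1 @ t                  # rho^1 = W^1 xi^0
    xi1 = (1.0 - x1 ** 2) * rho1   # xi^1 = sigma'(a^1) rho^1
    out = W2 @ x1                  # network output x^L
    dout = W2 @ xi1                # xi^L: derivative of the output along t
    return out, dout
```

The invariance penalty for one pattern is then simply np.sum(dout ** 2), added to the classification loss with weight λ as in Eq. (27).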

Figure 10. Generalization performance as a function of the training set size for the tangent prop and back prop algorithms.

The standard backward propagation starts at the output layer (l = L), with x^L = G_w(x^0) being the network output, and ends at the input layer. Finally, the weight update is:

\frac{\partial E}{\partial w_{ij}^l} = \frac{\partial E}{\partial a_i^l} \frac{\partial a_i^l}{\partial w_{ij}^l} + \frac{\partial E}{\partial \rho_i^l} \frac{\partial \rho_i^l}{\partial w_{ij}^l} = b_i^l x_j^{l-1} + \varphi_i^l \xi_j^{l-1} \quad (36)

\Delta w_{ij}^l = -\eta \, (b_i^l x_j^{l-1} + \varphi_i^l \xi_j^{l-1}) \quad (37)

The computation requires one forward propagation and one backward propagation per pattern and per tangent vector during training. After the network is trained, it is approximately locally invariant with respect to the chosen transformations, and the evaluation of the learned function is in all ways identical to that of a network not trained for invariance (except that the weights have different values).

B. Results. Two experiments illustrate the advantages of tangent prop. The first experiment is a classification task, using a small (linearly separable) set of 480 binarized handwritten digits. The training sets consist of 10, 20, 40, 80, 160, or 320 patterns, and the test set contains 160 patterns. The patterns are smoothed using a Gaussian kernel with a standard deviation of one-half pixel. For each of the training set patterns, the tangent vectors for horizontal and vertical translation are computed. The network has two hidden layers with locally connected shared weights and one output layer with 10 units (5,194 connections, 1,060 free parameters; LeCun, 1989). The generalization performance as a function of the training set size, for traditional back prop and for tangent prop, is compared in Figure 10. We conducted additional experiments in which we implemented translations, rotations, expansions, and hyperbolic deformations. This set of six generators is a basis for all linear transformations of coordinates for 2-D images. It is straightforward to implement other generators, including gray-level shifting, smooth segmentation, local continuous coordinate transformations, and independent image segment transformations.

The next experiment is designed to show that in applications where the data are highly correlated, tangent prop yields a large speed advantage. Because the distortion model implies adding lots of highly correlated data, the advantage of tangent prop over the distortion model becomes clear. The task is to approximate a function that has plateaus at three locations. We want to enforce local invariance near each of the training points (Fig. 11, bottom). The network has 1 input unit, 20 hidden units, and 1 output unit. Two strategies are possible: either generate a small set of training points covering each of the plateaus (open squares in Fig. 11, bottom) or generate one training point for each plateau (closed squares) and enforce local invariance around them (by setting the desired derivative to 0). The training set of the former method is used as a measure of performance for both methods. All parameters were adjusted for approximately optimal performance in all cases. The learning curves for both models are shown in Figure 11 (top). Each sweep through the training set for tangent prop is a little faster, because it requires only six forward propagations, whereas the distortion model requires nine. As can be seen, stable performance is achieved after 1,300 sweeps for tangent prop, vs. 8,000 for the distortion model; the overall speedup is therefore about 10. In this example, tangent prop can take advantage of a large regularization term. The distortion model is at a disadvantage because the only parameter that effectively controls the amount of regularization is the magnitude of the distortions, and this cannot be increased to large values because the right answer is only invariant under small distortions.

Figure 11.
Comparison of the distortion model (left) and tangent prop (right). The top row gives the learning curves (error vs. number of sweeps through the training set). The bottom row gives the final input-output function of the network; the dashed line is the result for unadorned back prop.

C. How to Make Tangent Prop Work. Large Network Capacity. Relatively few experiments have been done with tangent propagation. It is clear, however, that the invariance constraint can be extremely beneficial; if the network does not have enough capacity, though, it will not benefit from the extra knowledge introduced by the invariance.

Interleaving of the Tangent Vectors. Because the tangent vectors introduce even more correlation inside the training set, a substantial speedup can be obtained by alternating a regular forward and backward propagation with a tangent forward and backward propagation (even if there are several tangent vectors, only one is used at each pattern). For instance, if there were three tangent vectors, the training sequence could be:

x_1, t_1(x_1), x_2, t_2(x_2), x_3, t_3(x_3), x_4, t_1(x_4), x_5, t_2(x_5), \ldots \quad (38)

where x_i means a forward and backward propagation for pattern i, and t_j(x_i) means a tangent forward and backward propagation of tangent vector j of pattern i. With such interleaving, the learning converges faster than when all the tangent vectors are grouped together. Of course, this only makes sense with on-line updates, as opposed to batch updates.

IV. TANGENT VECTORS

We now consider the general paradigm for transformation invariance and for the tangent vectors used in the two previous sections. Before we introduce each transformation and its corresponding tangent vectors, the theory behind the practice is explained. There are two aspects to the problem. First, it is possible to establish a formal connection between groups of transformations of the input space (such as translations and rotations of ℝ²) and their effect on a functional of that space (such as a mapping from ℝ² to ℝ, which may represent an image in continuous form); the theory of Lie groups and Lie algebras (Choquet-Bruhat et al., 1982) allows us to do this. The second aspect involves coding: computer images are finite vectors of discrete variables, so how can a theory developed for differentiable functionals from ℝ² to ℝ be applied to these vectors? We provide a brief explanation of the theorems of Lie groups and Lie algebras that are applicable to pattern recognition, and we explore solutions to the coding problem. Finally, some examples of transformations and coding are given for particular applications.

A. Lie Groups and Lie Algebras. Consider an input space, for example the plane ℝ², and a differentiable function f that maps points of the input space to ℝ:

f : X \mapsto f(X) \in \mathbb{R} \quad (39)

The function f(X) = f(x, y) can be interpreted as the continuous (defined for all points of ℝ²) equivalent of the discrete computer image P[i, j]. Next, consider a family of transformations t_α, parameterized by α, each of which maps points of the input space bijectively to points of the input space:

t_\alpha : X \mapsto t_\alpha(X) \quad (40)

We assume that t_α is differentiable with respect to α and X, and that t_0 is the identity. For example, t_α could be the group of affine transformations of ℝ²:

t_\alpha : \begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} x + \alpha_1 x + \alpha_2 y + \alpha_5 \\ \alpha_3 x + y + \alpha_4 y + \alpha_6 \end{pmatrix}, \quad \text{with } \alpha \text{ such that the map is invertible} \quad (41)

This is a Lie group with six parameters. (A Lie group is a group that is also a differentiable manifold, such that the differentiable structure is compatible with the group structure.) Another example is the group of direct isometries:

t_\alpha : \begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} x \cos\theta - y \sin\theta + a \\ x \sin\theta + y \cos\theta + b \end{pmatrix} \quad (42)

which is a Lie group with three parameters, α = (θ, a, b). We now consider the functional s(f, α), defined by

s(f, \alpha) = f \circ t_\alpha^{-1} \quad (43)

This functional s, which takes another functional f as an argument, should remind the reader of Figure 2, where P, the discrete equivalent of f, is the argument of s.
IV. TANGENT VECTORS

We consider the general paradigm for transformation invariance and for the tangent vectors used in the two previous sections. Before we introduce each transformation and its corresponding tangent vectors, the theory behind the practice is explained. There are two aspects to the problem. First, it is possible to establish a formal connection between groups of transformations of the input space (such as translation and rotation of R^2) and their effect on a functional of that space (such as a mapping of R^2 to R, which may represent an image, in continuous form). The theory of Lie groups and Lie algebras (Choquet-Bruhat et al., 1982) allows us to do this. The second problem involves coding. Computer images are finite vectors of discrete variables. How can a theory that was developed for differentiable functionals of R^2 to R be applied to these vectors? We provide a brief explanation of the theorems of Lie groups and Lie algebras that are applicable to pattern recognition. We also explore solutions to the coding problem. Finally, some examples of transformation and coding are given for particular applications.

A. Lie Groups and Lie Algebras. Consider an input space (e.g., the plane R^2) and a differentiable function f that maps points of the input space to R:

f : X \mapsto f(X)    (39)

The function f(X) = f(x, y) can be interpreted as the continuous (defined for all points of R^2) equivalent of the discrete computer image P[i, j]. Next, consider a family of transformations t_\alpha, parameterized by \alpha, which maps bijectively a point of the input space to another point of the input space:

t_\alpha : X \mapsto t_\alpha(X)    (40)

We assume that t_\alpha is differentiable with respect to \alpha and X, and that t_0 is the identity. For example, t_\alpha could be the group of affine transformations of R^2:

t_\alpha : (x, y) \mapsto (x + \alpha_1 x + \alpha_2 y + \alpha_5,\; \alpha_3 x + y + \alpha_4 y + \alpha_6), \quad \alpha = (\alpha_1, \dots, \alpha_6)    (41)

This is a Lie group with six parameters (a Lie group is a group that is also a differentiable manifold, such that the differentiable structure is compatible with the group structure). Another example is the group of direct isometries:

t_\alpha : (x, y) \mapsto (x \cos\theta - y \sin\theta + a,\; x \sin\theta + y \cos\theta + b)    (42)

with \alpha = (\theta, a, b), which is a Lie group with three parameters. We now consider the functional s(f, \alpha), defined by

s(f, \alpha) = f \circ t_\alpha^{-1}    (43)

This functional s, which takes another functional f as an argument, should remind the reader of Figure 2, where P, the discrete equivalent of f, is the argument of s.

The Lie algebra associated with the action of t_\alpha on f is the space generated by the m local transformations L_i of f, defined by:

L_i(f) = \partial s(f, \alpha) / \partial \alpha_i \,\big|_{\alpha = 0}    (44)

We can now write the local approximation of s as:

s(f, \alpha) = f + \alpha_1 L_1(f) + \alpha_2 L_2(f) + \dots + \alpha_m L_m(f) + o(\|\alpha\|^2)    (45)

This equation is the continuous equivalent of Eq. (2) used in the introduction. The following example illustrates how the L_i can be computed from t_\alpha. Consider the group of direct isometries defined in Eq. (42), with parameter \alpha = (\theta, a, b) as before and X = (x, y):

s(f, \alpha)(X) = f\big((x - a)\cos\theta + (y - b)\sin\theta,\; -(x - a)\sin\theta + (y - b)\cos\theta\big)    (46)

If we differentiate around \alpha = (0, 0, 0) with respect to \theta, we obtain:

\partial s(f, \alpha)(X) / \partial\theta \,\big|_{\alpha = 0} = y\, \partial f/\partial x\,(x, y) - x\, \partial f/\partial y\,(x, y)    (47)

i.e.,

L_\theta = y\, \partial/\partial x - x\, \partial/\partial y    (48)
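As a sanity check on Eqs. (44)-(48) (our illustration, not part of the paper), the sketch below compares a finite-difference derivative of s(f, \theta) for a pure rotation (a = b = 0) with the operator L_\theta applied to a smooth test function; the test function and evaluation point are arbitrary.

    import math

    # A smooth test image f(x, y) and its exact partial derivatives.
    def f(x, y):
        return math.exp(-(x - 0.3)**2 - 0.5 * (y + 0.2)**2)

    def fx(x, y):
        return -2.0 * (x - 0.3) * f(x, y)

    def fy(x, y):
        return -(y + 0.2) * f(x, y)

    def s(theta, x, y):
        # s(f, theta) = f o t_theta^{-1} for a pure rotation (a = b = 0):
        # rotate the coordinates by -theta before evaluating f, as in Eq. (46).
        c, sn = math.cos(theta), math.sin(theta)
        return f(x * c + y * sn, -x * sn + y * c)

    x, y, eps = 0.7, -0.4, 1e-6
    finite_diff = (s(eps, x, y) - s(-eps, x, y)) / (2 * eps)
    lie_operator = y * fx(x, y) - x * fy(x, y)   # Eq. (48) applied to f
    print(finite_diff, lie_operator)             # agree up to finite-difference error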

The operators L_a = -\partial/\partial x and L_b = -\partial/\partial y can be obtained in a similar fashion. All local transformations of the group can be written as:

s(f, \alpha) = f + \theta \big( y\, \partial f/\partial x - x\, \partial f/\partial y \big) - a\, \partial f/\partial x - b\, \partial f/\partial y + o(\|\alpha\|^2)    (49)

which corresponds to a linear combination of the three basic operators L_\theta, L_a, and L_b. (These operators are said to generate a Lie algebra: on top of the addition and the multiplication by a scalar, there is a special multiplication called the Lie bracket, defined by [L_1, L_2] = L_1 \circ L_2 - L_2 \circ L_1. In the above example, [L_\theta, L_a] = L_b, [L_a, L_b] = 0, and [L_b, L_\theta] = L_a.) The most important property is that the three operators generate the whole space of local transformations. The result of applying the operators to a function f, such as a 2-D image, is a set of vectors (referred to as tangent vectors in the previous sections). Each point in the tangent space corresponds to a unique transformation. Conversely, any transformation of the Lie group (in the example, all rotations of any angle and center, together with all translations) corresponds to a point in the tangent plane.

B. Tangent Vectors. The last problem to be solved is that of coding. Computer images, for instance, are coded as a finite set of discrete (even binary) values. These are hardly the differentiable mappings of R^2 to R that we assumed in Section IV.A. To solve this problem, we introduce a smooth interpolating function C, which maps the discrete vectors to continuous mappings of R^2 to R. For example, if P is an image of n pixels, it can be mapped to a continuously valued function f over R^2 by convolving it with a 2-D Gaussian function g_\sigma of standard deviation \sigma. This is because g_\sigma is a differentiable mapping of R^2 to R, and P can be interpreted as a sum of impulse functions. In the 2-D case, the new interpretation of P can be written as:

P(x, y) = \sum_{i,j} P[i][j]\, \delta(x - i)\, \delta(y - j)    (50)

where P[i][j] denotes the finite vector of discrete values, as stored in a computer. The result of the convolution is of course differentiable, because it is a sum of Gaussian functions. The Gaussian mapping is given by:

C_\sigma : P \mapsto f = P * g_\sigma    (51)

In the 2-D case, the function f can be written as:

f(x, y) = \sum_{i,j} P[i][j]\, g_\sigma(x - i, y - j)    (52)

Other coding functions can be used, such as cubic spline or even bilinear interpolation. Bilinear interpolation between the pixels yields a function f that is differentiable almost everywhere. The fact that the derivatives have two values at the integer locations (because the bilinear interpolation is different on both sides of each pixel) is not a problem in practice; one can just choose one of the two values.

Figure 12. Graphic illustration of the computation of f and two tangent vectors corresponding to L_X = \partial/\partial x (X-translation) and L_Y = \partial/\partial y (Y-translation), from a binary image I. The Gaussian function g_\sigma(x, y) = \exp(-(x^2 + y^2)/(2\sigma^2)) has a standard deviation of \sigma = 0.9 in this example, although its graphic representations (small images on the right) have been rescaled for clarity.

The Gaussian mapping is preferred for two reasons. First, the smoothing parameter \sigma can be used to control the locality of the invariance. This is because when f is smoother, the local approximation of Eq. (45) is valid for larger transformations. Second, when combined with the transformation operator L, the derivative can be applied to the closed form of the Gaussian function. For instance, if the X-translation operator L_X = \partial/\partial x is applied to f = P * g_\sigma, the actual computation becomes:

L_X(f) = \partial (P * g_\sigma) / \partial x = P * \big( \partial g_\sigma / \partial x \big)    (53)

because of the differentiation properties of the convolution when the support is compact. This is easily done by convolving the original image with the X-derivative of the Gaussian function g_\sigma (Fig. 12).
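In practice, Eq. (53) amounts to a pair of Gaussian-derivative convolutions. A minimal sketch using NumPy and SciPy follows (our example; the paper does not prescribe an implementation). scipy.ndimage.gaussian_filter with order=1 along an axis convolves the input with the corresponding derivative of the Gaussian.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    sigma = 0.9                        # smoothing used in Figure 12
    P = np.zeros((16, 16))
    P[5:11, 7:9] = 1.0                 # a small binary test image (assumption)

    # f = P * g_sigma, Eq. (51): plain Gaussian smoothing (order=0).
    f = gaussian_filter(P, sigma, order=0)

    # Eq. (53): convolve P with the derivatives of the Gaussian.
    # With arrays indexed [row, col] = [y, x], axis 1 is the x axis.
    tangent_x = gaussian_filter(P, sigma, order=(0, 1))   # P * dg/dx
    tangent_y = gaussian_filter(P, sigma, order=(1, 0))   # P * dg/dy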
Similarly, the tangent vector for scaling can be computed with:

L_S(f) = x\, \partial f/\partial x + y\, \partial f/\partial y = x \big( P * \partial g_\sigma/\partial x \big) + y \big( P * \partial g_\sigma/\partial y \big)    (54)

This operation is illustrated in Figure 13.

Figure 13. Graphic illustration of the computation of the tangent vector T_u = D_x S_x + D_y S_y (bottom image). The displacement for each pixel is proportional to the distance of the pixel to the center of the image (D_x(x, y) = x - x_0 and D_y(x, y) = y - y_0). The two multiplications (horizontal lines) and the addition (vertical right column) are done pixel by pixel.
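Following Figure 13, the scaling tangent vector can be assembled pixel by pixel from the two translation tangent images and the displacement fields. The sketch below is our illustration of Eq. (54), with the image center chosen as (x_0, y_0); names and the test image are assumptions.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def scaling_tangent(I, sigma=0.9):
        """T_u = D_x * S_x + D_y * S_y, as in Figure 13 / Eq. (54)."""
        S_x = gaussian_filter(I, sigma, order=(0, 1))   # I * dg/dx
        S_y = gaussian_filter(I, sigma, order=(1, 0))   # I * dg/dy
        h, w = I.shape
        Y, X = np.mgrid[0:h, 0:w]
        D_x = X - (w - 1) / 2.0        # displacement grows with the distance
        D_y = Y - (h - 1) / 2.0        # from the image center (x_0, y_0)
        return D_x * S_x + D_y * S_y   # pixel-by-pixel multiply and add

    I = np.zeros((16, 16))
    I[4:12, 6:10] = 1.0
    T_scaling = scaling_tangent(I)     # one precomputed tangent-vector image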

C. Important Transformations in Image Processing. This section summarizes how to compute the tangent vectors for image processing (in 2-D). Each discrete image I_i is convolved with a Gaussian of standard deviation \sigma to obtain a representation of the continuous image f_i, according to Eq. (55):

f_i = I_i * g_\sigma    (55)

The resulting image f_i will be used in all the computations requiring I_i (except for computing the tangent vectors). For each image I_i, the tangent vectors are computed by applying the operators corresponding to the transformations of interest to the expression I_i * g_\sigma. The result, which can be precomputed, is an image that is the tangent vector. The following list contains some of the most useful tangent vectors (a computational sketch for two of these operators follows the list):

1. X-translation: This transformation is useful when the classification function is invariant with respect to the input transformation:

t_\alpha : (x, y) \mapsto (x + \alpha,\; y)    (56)

The Lie operator is defined by:

L_X = \partial/\partial x    (57)

2. Y-translation: This transformation is useful when the classification function is invariant with respect to the input transformation:

t_\alpha : (x, y) \mapsto (x,\; y + \alpha)    (58)

The Lie operator is defined by:

L_Y = \partial/\partial y    (59)

3. Rotation: This transformation is useful when the classification function is invariant with respect to the input transformation:

t_\alpha : (x, y) \mapsto (x \cos\alpha - y \sin\alpha,\; x \sin\alpha + y \cos\alpha)    (60)

The Lie operator is defined by:

L_R = y\, \partial/\partial x - x\, \partial/\partial y    (61)

4. Scaling: This transformation is useful when the classification function is invariant with respect to the input transformation:

t_\alpha : (x, y) \mapsto (x + \alpha x,\; y + \alpha y)    (62)

The Lie operator is defined by:

L_S = x\, \partial/\partial x + y\, \partial/\partial y    (63)

5. Parallel hyperbolic transformation: This transformation is useful when the classification function is invariant with respect to the input transformation:

t_\alpha : (x, y) \mapsto (x + \alpha x,\; y - \alpha y)    (64)

The Lie operator is defined by:

L = x\, \partial/\partial x - y\, \partial/\partial y    (65)

6. Diagonal hyperbolic transformation: This transformation is useful when the classification function is invariant with respect to the input transformation:

t_\alpha : (x, y) \mapsto (x + \alpha y,\; y + \alpha x)    (66)

The Lie operator is defined by:

L = y\, \partial/\partial x + x\, \partial/\partial y    (67)

7. Thickening: This transformation is useful when the classification function is invariant with respect to variations of thickness. This is known in morphology as dilation and its inverse, erosion. It is useful in certain domains (such as handwritten character recognition) because thickening and thinning are natural variations that correspond to the pressure applied on a pen or to different absorption properties of the ink on the paper. A dilation (resp. erosion) can be defined as the operation of replacing each value f(x, y) by the largest (resp. smallest) value of f found within a neighborhood of a certain shape, centered at (x, y). The region is called the structural element. We assume here that the structural element is a sphere of radius \alpha. We define the thickening transformation as the function that takes the function f and generates the function f^\alpha defined by:

f^\alpha(X) = \max_{\|r\| \le \alpha} f(X + r) \quad \text{for } \alpha \ge 0    (68)

f^\alpha(X) = \min_{\|r\| \le |\alpha|} f(X + r) \quad \text{for } \alpha < 0    (69)

The derivative of the thickening for \alpha \ge 0 can be written as:

\lim_{\alpha \to 0} \big( f^\alpha(X) - f(X) \big) / \alpha = \lim_{\alpha \to 0} \max_{\|r\| \le \alpha} \big( f(X + r) - f(X) \big) / \alpha    (70)

f(X) can be put within the max expression because it does not depend on r. Because \alpha tends toward 0, we can write:

f(X + r) - f(X) = r \cdot \nabla f(X) + O(\|r\|^2)    (71)

The maximum of

\max_{\|r\| \le \alpha} \big( f(X + r) - f(X) \big) \approx \max_{\|r\| \le \alpha} \; r \cdot \nabla f(X)    (72)

is attained when r and \nabla f(X) are collinear, that is, when

r = \alpha\, \nabla f(X) / \|\nabla f(X)\|    (73)

assuming \alpha \ge 0. It can easily be shown that this equation also holds when \alpha is negative, because we then try to minimize Eq. (69). Therefore:

\lim_{\alpha \to 0} \big( f^\alpha(X) - f(X) \big) / \alpha = \|\nabla f(X)\|    (74)

which is the tangent vector of interest: the norm of the gradient of the image, which is easily computed. Note that this is true for positive or negative \alpha; the same tangent vector describes both thickening and thinning. Alternatively, our computation of the displacement r can be used, and the following transformation of the input can be defined:

t_\alpha(f) : (x, y) \mapsto (x + \alpha r_x,\; y + \alpha r_y)    (75)

where

(r_x, r_y) = \nabla f(X) / \|\nabla f(X)\|    (76)

This transformation of the input space is different for each pattern f (we do not have a Lie group of transformations), but the field structure generated by the (pseudo-Lie) operator is still useful. The operator used to find the tangent vector is defined by:

L_T(f) = \|\nabla f\|    (77)

which means that the tangent vector image is obtained by computing the norm of the gray-level gradient of the image at each point (the displacement field of Eq. (76) is the gradient normalized at each point).

The last five transformations are depicted in Figure 14 with their tangent vectors. The last operator corresponds to a thickening or thinning of the image. This unusual transformation is extremely useful for handwritten character recognition.

Figure 14. Illustration of five tangent vectors (top), corresponding displacements (middle), and transformation effects (bottom). The displacements D_x and D_y are represented in the form of a vector field. The tangent vector for the thickness deformation (right column) corresponds to the norm of the gradient of the gray-level image.
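As a computational illustration of the list above (our sketch, with the same assumptions as the earlier examples), the rotation tangent vector of Eq. (61) combines the two translation tangent images with coordinates measured from the image center, and the thickening tangent vector of Eqs. (74) and (77) is the pixelwise norm of the smoothed gradient.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def rotation_tangent(I, sigma=0.9):
        """L_R(f) = y df/dx - x df/dy (Eq. 61), with (x, y) measured
        from the image center (a common choice; the center is assumed)."""
        S_x = gaussian_filter(I, sigma, order=(0, 1))   # df/dx
        S_y = gaussian_filter(I, sigma, order=(1, 0))   # df/dy
        h, w = I.shape
        Y, X = np.mgrid[0:h, 0:w]
        Xc = X - (w - 1) / 2.0
        Yc = Y - (h - 1) / 2.0
        return Yc * S_x - Xc * S_y

    def thickening_tangent(I, sigma=0.9):
        """L_T(f) = ||grad f|| (Eqs. 74 and 77): the norm of the gradient
        of the smoothed image, computed pixel by pixel."""
        S_x = gaussian_filter(I, sigma, order=(0, 1))
        S_y = gaussian_filter(I, sigma, order=(1, 0))
        return np.sqrt(S_x**2 + S_y**2)

    I = np.zeros((16, 16))
    I[4:12, 6:10] = 1.0
    T_rot, T_thick = rotation_tangent(I), thickening_tangent(I)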
