A Neural Network Model for Storing and Retrieving 2D Images of Rotated 3D Object Using Principal Components

Tsukasa AMANO, Shuichi KUROGI, Ayako EGUCHI, Takeshi NISHIDA, Yasuhiro FUCHIKAWA
Department of Control Engineering, Kyushu Institute of Technology, Kitakyushu, Fukuoka 804-8550, Japan

and

Toshihiro IDA
Department of Electronics and Control Engineering, Kitakyushu National College of Technology, Kitakyushu, Fukuoka 802-0985, Japan

ABSTRACT

A neural network model for storing and retrieving two-dimensional (2D) images of a rotated three-dimensional (3D) object is presented, where principal components of the 2D images are used for data compression. The network is intended for examining how a huge number of images can be stored and retrieved, and how we can construct a neural model of mental rotation, which is supposed to play an important role in human perception. Numerical experiments with the present model show that a huge number of 2D images can be stored and retrieved owing to the eigenspace method utilizing principal components, while the calculation time for retrieving a rotated image is proportional to the rotation angle.

Keywords: Neural Network Model, Storing and Retrieving 2D Images of Rotated 3D Object, Mental Rotation, Principal Components of 2D Images, Data Compression.

1. INTRODUCTION

A neural network model for storing and retrieving two-dimensional (2D) images of a rotated three-dimensional (3D) object is presented. The present model, like our other models [1], [2], has been developed from the following two points of view. One is the engineering point of view, where an efficient method for retrieving rotated images, as well as images transformed by translation, magnification, projection, etc., is useful in the fields of image processing and computer vision for measuring rotation angles, invariant pattern matching, and so forth. Although we here consider only 2D images of a rotated 3D object, the problem is difficult to deal with because there is a huge number of such 2D images.
The other is the psychological point of view, where we would like to model mental rotation and mental transformation [3], [4], because mental transformation seems to play an important role in human perception. Here, we focus on the finding that the response time for identifying a rotated object is proportional to the rotation angle.

To overcome the problem that there is a huge number of 2D images, we use the eigenspace method, which is widely used for data approximation and compression in various fields such as computer vision, communication, and statistics. The eigenspace method uses the principal components derived by principal component analysis (PCA). So far, a number of PCA artificial neural networks which can learn data and output the principal components have been developed [8]. Here, we do not try to model a PCA neural network but use the principal components which are supposed to be output by a PCA neural network. For retrieving memorized data, we try to embed the psychological finding that the response time is proportional to the rotation angle. This proportionality, however, depends on whether the image is rotated in the picture plane or in depth [5], [6], which suggests that 2D images for different rotation axes had better be processed differently, so we apply the eigenspace method to the images of each rotation axis separately.

In the following sections, we first show the eigenspace method for compressing rotated images, then explain storing and retrieving compressed images while introducing a neural network model, and last examine the present method by means of numerical experiments.

2. IMAGE COMPRESSION AND STORING

Principal Components of Images

Suppose there is a 3D (three-dimensional) object as used in [3] which is projected onto a 2D (two-dimensional) image consisting of $N \times N$ pixels. Let $x_0$ be an $m\,(= N^2)$ dimensional vector representing the original 2D image of the 3D object, and let $x(a_{lX}, a_{lY}, a_{lZ}, \theta_i)$, or $x(a_l, \theta_i)$ or $x_{li}$ for short, be the zero-mean vector representing the 2D image of the 3D object rotated by the angle $\theta_i$ around the rotation axis $a_l \triangleq (a_{lX}, a_{lY}, a_{lZ})^T$, where $i$ and $l$ are indices for rotation angles and axes, and $X$, $Y$ and $Z$ denote the coordinate axes. In this article, we use the images with $N = 128$, $\theta_i = i - 1$ [degree] for $i = 1, 2, \ldots, n$ and $n = 360$ (see Fig. 1).

Fig. 1. Examples of 2D images; $x_0$ indicates the original image, and the images with the expression $x(a_X, a_Y, a_Z, \theta_i)$ are the images rotated by the angle $\theta_i$ [degree] around the axis $a = (a_X, a_Y, a_Z)$ with respect to the XYZ coordinate system.

The singular value decomposition (SVD) of a matrix $X_l \triangleq [x_{l1}, x_{l2}, \ldots, x_{ln}] \in R^{m \times n}$ ($m \ge n \ge 1$) is given by

$$X_l = U_l D_l V_l^T, \tag{1}$$

where $U_l \triangleq [u_{l1}, u_{l2}, \ldots, u_{ln}] \in R^{m \times n}$ and $V_l \triangleq [v_{l1}, v_{l2}, \ldots, v_{ln}] \in R^{n \times n}$ are matrices with orthonormal columns, $D_l = \mathrm{diag}[d_{l1}, d_{l2}, \ldots, d_{ln}] \in R^{n \times n}$ is a diagonal matrix, and $d_{l1} \ge d_{l2} \ge \cdots \ge d_{ln} \ge 0$. The vectors $u_{lj}$ and $v_{lj}$ are called the left and right singular vectors, respectively, and $d_{lj}$ are called the singular values. The $j$th left singular vector $u_{lj}$ is also the $j$th eigenvector of the covariance matrix $X_l X_l^T$, and $d_{lj}^2$ are the eigenvalues of $X_l X_l^T$. Further, $u_{lj}$ is called the $j$th principal component, especially for statistical data, and the PCA neural networks described above can learn to output the principal components.

The eigenspace method utilizes the partial Karhunen-Loève (KL) expansion of the order $K\,(< n)$ given by

$$\hat{x}^{(K)}_{li} \triangleq U^{(K)}_l p^{(K)}_{li} = \sum_{j=1}^{K} p_{lij}\, u_{lj}, \tag{2}$$

for approximating given data $x_{li}$, where the matrix $U^{(K)}_l$ is given by

$$U^{(K)}_l \triangleq [u_{l1}, u_{l2}, \ldots, u_{lK}, 0, \ldots, 0] \tag{3}$$

and the coefficient vector $p^{(K)}_{li}$ is given by

$$p^{(K)}_{li} \triangleq (p_{li1}, p_{li2}, \ldots, p_{liK}, 0, \ldots, 0)^T \triangleq U^{(K)T}_l x_{li}. \tag{4}$$

The mean approximation error is derived as

$$e^{(K)}_l \triangleq \frac{1}{n} \sum_{i=1}^{n} \left\| x_{li} - \hat{x}^{(K)}_{li} \right\|^2 = \frac{1}{n} \sum_{j=K+1}^{n} d_{lj}^2, \tag{5}$$

which indicates that the approximation error becomes smaller as $K$ becomes larger, and the cumulative proportion given by

$$\mu^{(K)}_l \triangleq \frac{\sum_{j=1}^{K} d_{lj}^2}{\sum_{j=1}^{n} d_{lj}^2} \tag{6}$$

indicates how much the reconstructed vector $\hat{x}^{(K)}_{li}$ resembles the original vector $x_{li}$. For two reconstructed image vectors $\hat{x}^{(K)}_{li}$ and $\hat{x}^{(K)}_{lj}$, and the coefficient vectors $p^{(K)}_{li} = U^{(K)T}_l x_{li}$ and $p^{(K)}_{lj} = U^{(K)T}_l x_{lj}$, we have

$$\left\| \hat{x}^{(K)}_{li} - \hat{x}^{(K)}_{lj} \right\|^2 = \left\| p^{(K)}_{li} - p^{(K)}_{lj} \right\|^2, \tag{7}$$

because the nonzero columns of $U^{(K)}_l$ are orthonormal. Thus, we can use the square distance of $p^{(K)}_{li}$ and $p^{(K)}_{lj}$ for evaluating the distance between the reconstructed images, where we can reduce the computational space and time because $p^{(K)}_{li} = (p_{li1}, p_{li2}, \ldots, p_{liK}, 0, \ldots, 0)^T$ is identical to a $K$ dimensional vector while $\hat{x}^{(K)}_{li}$ is an $m = N^2\,(> K)$ dimensional vector.
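The eigenspace compression of Eqs. (1) to (7) can be illustrated with a minimal NumPy sketch. It is only an illustration under stated assumptions: the data matrix is random rather than a set of rotated images, the zero mean is taken over the pixels of each image, the coefficients are kept as a $K$ dimensional vector instead of the zero-padded form, and the name `fit_eigenspace` is hypothetical. The sketch compares the mean approximation error with the sum of the discarded squared singular values divided by $n$, and the image-space distance of Eq. (7) with the coefficient-space distance.

```python
import numpy as np

def fit_eigenspace(X, K):
    """Return the first K principal components of X (m x n, one image per
    column), the K x n partial KL coefficients, the singular values, and
    the cumulative proportion of Eq. (6)."""
    U, d, _ = np.linalg.svd(X, full_matrices=False)   # d is sorted descending
    U_K = U[:, :K]                                    # m x K
    P_K = U_K.T @ X                                   # column i is p_i, Eq. (4)
    mu_K = np.sum(d[:K] ** 2) / np.sum(d ** 2)        # Eq. (6)
    return U_K, P_K, d, mu_K

# Synthetic stand-in for the image matrix (assumption: random data, not images).
rng = np.random.default_rng(0)
m, n, K = 256, 36, 10
X = rng.standard_normal((m, n))
X -= X.mean(axis=0, keepdims=True)   # assumption: each image has zero pixel mean

U_K, P_K, d, mu_K = fit_eigenspace(X, K)
X_hat = U_K @ P_K                    # reconstruction, Eq. (2)

# Mean approximation error versus the discarded singular values, Eq. (5).
mean_err = np.mean(np.sum((X - X_hat) ** 2, axis=0))
tail = np.sum(d[K:] ** 2) / n

# Square distance between two images in image space and in coefficient space, Eq. (7).
dist_img = np.sum((X_hat[:, 0] - X_hat[:, 1]) ** 2)
dist_coef = np.sum((P_K[:, 0] - P_K[:, 1]) ** 2)

print(np.isclose(mean_err, tail), np.isclose(dist_img, dist_coef))
```

Only `U_K` (m x K) and `P_K` (K x n) need to be kept per rotation axis, which is where the saving in space and matching time comes from.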
Storing and Matching

Suppose we have stored or memorized the coefficient vectors $p^{(K)}_{li[M]} = U^{(K)T}_{l[M]} x_{li[M]}$ and the matrices $U^{(K)}_{l[M]}$ for the angles $\theta_{i[M]}$ ($i = 1, 2, \ldots, n$) and the axes $a_{l[M]}$ ($l = 1, 2, \ldots$). Given an input or perceived vector $x_{[P]}$, we have to decide whether $x_{[P]}$ is one of the memorized $x_{li[M]}$, and to identify the angle $\theta_{j[M]}$ and the axis $a_{s[M]}$ if $x_{[P]} = x_{sj[M]}$, where the subscripts $[M]$ and $[P]$ are used for discriminating memorized and perceived (or presented) quantities, respectively. To solve this retrieving problem, let $p^{(K)}_{l[P]}$ denote the KL coefficient of $x_{[P]}$ with respect to $U^{(K)}_{l[M]}$ as follows,

$$p^{(K)}_{l[P]} \triangleq U^{(K)T}_{l[M]} x_{[P]}. \tag{8}$$

Then, we have the square distance to the memorized vectors as

$$d^{(K)}_{li[M]}(x_{[P]}) \triangleq \left\| p^{(K)}_{l[P]} - p^{(K)}_{li[M]} \right\|^2 \tag{9}$$

$$= \left\| U^{(K)T}_{l[M]} \left( x_{[P]} - x_{li[M]} \right) \right\|^2, \tag{10}$$

and when $x_{[P]} = x_{sj[P]}$ we have

$$d^{(K)}_{li[M]}(x_{sj[P]}) = \left\| U^{(K)T}_{l[M]} U^{(n)}_{s[P]} p^{(n)}_{sj[P]} - p^{(K)}_{li[M]} \right\|^2. \tag{11}$$

This equation shows that when $K\,(< n)$ is sufficiently large and $x_{sj[P]} = x_{li[M]}$ (or $s = l$ and $j = i$), the square distance $d^{(K)}_{li[M]}(x_{sj[P]})$ takes the global minimum value 0 (or a small value near 0 when there is noise), although when $K$ is too small $d^{(K)}_{li[M]}(x_{sj[P]})$ may take small values near 0 even when $x_{sj[P]}$ and $x_{li[M]}$ are not the same. Thus, for a sufficiently large $K$ we are supposed to be able to identify the angle $\theta_{j[P]} = \theta_{i[M]}$ and the axis $a_{s[P]} = a_{l[M]}$ which achieve $x_{sj[P]} = x_{li[M]}$ by obtaining the smallest $d^{(K)}_{li[M]}(x_{sj[P]})$ over all memorized $l$ and $i$. Here, with as small a $K$ as possible, we can take advantage of the partial KL coefficients, such that (1) the calculation time and space can be reduced as described above, (2) the KL coefficients are robust to noise since the noise is averaged by means of the PCA scheme, and (3) the square distance $d^{(K)}_{li[M]}(x_{sj[P]})$ for smaller $K$ changes more smoothly with the increase of the angle $\theta_{j[P]}$, so that we can search for the minimum of the distance more stably, as shown in the next section.

Fig. 2. Images with different rotation angles and axes to be discriminated: $x(0, 0, 1, 90)$ and $x(1, 0, 0, 270)$.

3. PROCESS TO RETRIEVE ROTATION ANGLES AND AXES

Property of Square Distance vs. Angle

Before describing the retrieving process, we would like to show how $d^{(K)}_{li[M]}(x_{sj[P]})$ changes with the memorized angle $\theta_{i[M]}$. Namely, we first show the result of an experiment, where for the original image $x_0$ shown in Fig. 1 we use the images rotated by the angles $\theta_{i[M]} = 0, 1, 2, \ldots, 359$ [degree] around the axis $a_l = (0, 0, 1)^T$ as memorized images, and the two images shown in Fig. 2 as perceived images. The result is shown in Fig. 3, where (a) shows the result for $K = n = 360$ using non-approximated images, and (b) shows the result for $K = 10$ with the cumulative proportion $\mu^{(K)}_l = 21.6\%$, which does not seem so big but is big enough for discriminating the perceived images by means of the process to retrieve rotation angles shown in the next section. Further, the range of the angles $\theta_{i[M]}$ whose distance $d^{(K)}_{li[M]}(x_{sj[P]})$ for $x(0, 0, 1, 90)$ is less than the minimum $d^{(K)}_{li[M]}(x_{sj[P]})$ for $x(1, 0, 0, 270)$ is 6 degrees (from 88 to 93 [degree]) for $K = 360$ and 8 degrees (from 87 to 94 [degree]) for $K = 10$, which indicates the robustness of the images using $K = 10$ to the change of rotation angle, although the range is not widened so much.

Fig. 3. The square distance $d^{(K)}_{li[M]}(x_{sj[P]})$ of the similar images $x_{sj[P]} = x(0, 0, 1, 90)$ and $x_{sj[P]} = x(1, 0, 0, 270)$ shown in Fig. 2 to the images $x_{li[M]}$ rotated by the angles $\theta_{i[M]} = 0, 1, 2, \ldots, 359$ [degree] around the axis $a_l = (0, 0, 1)^T$; (a) $K = n = 360$, (b) $K = 10$. Horizontal axis: memorized rotation angle $\theta_{i[M]}$ [degree]; vertical axis: square distance.

Process to Retrieve Rotation Angles

For both (a) and (b) in Fig. 3, when we select the angle $\theta_{i[M]}$ with $d^{(K)}_{li[M]}(x_{sj[P]}) = 0$, we can reject $x(1, 0, 0, 270)$, accept $x(0, 0, 1, 90)$ and identify the angle $\theta_{i[M]} = 90$, which is correct for this memorized axis $a_l = (0, 0, 1)$. Further, for robustness, we had better select $\theta_{i[M]}$ with the minimum of $d^{(K)}_{li[M]}(x_{sj[P]})$ less than a threshold $d_\theta$, where $d_\theta$ can be determined as a value less than the minimum $d^{(K)}_{li[M]}(x_{sj[P]})$ for the image $x_{sj[P]} = x(1, 0, 0, 270)$ to be discriminated. Actually, the minimum $d^{(K)}_{li[M]}(x_{sj[P]})$ for $x_{sj[P]} = x(1, 0, 0, 270)$ was 64.9 for $K = 360$, and 35.5 for $K = 10$.

Instead of examining all $\theta_{i[M]}$, for faster search and for constructing the model of mental rotation, we can search successively as $\theta_{i[M]} = 0, \pm 1, \pm 2, \ldots, \pm 180$, where a negative $\theta_{i[M]}$ indicates the angle $360 + \theta_{i[M]}$. Then we can find the angle $\theta_i$ satisfying

$$d^{(K)}_{li[M]} \le \min\left\{ d^{(K)}_{l(i-1)[M]},\ d^{(K)}_{l(i+1)[M]},\ d_\theta \right\}. \tag{12}$$

Since this equation indicates that $d^{(K)}_{li[M]}$ is smaller than or equal to the threshold $d_\theta$ as well as being a local minimum with respect to the rotation angle $\theta_i$ ($i = 1, 2, \ldots$), it can be equal to or very near to the global minimum. Since $d^{(K)}_{li[M]}$ for $K = 10$ looks much smoother than that for $K = n$, the global minimum may be searched much more stably for $K = 10$.

Fig. 4. Schematic diagram of the network NET$_l$ = NET($a_l$) for the rotation axis $a_l$ for storing and retrieving rotation angles $\theta_i$.

Process to Retrieve Rotation Axes

The above process to search rotation angles around the rotation axis $a_l$ can be carried out by the network NET$_l$ shown in Fig. 4, where the layer of principal components should learn and execute the PCA, which is supposed to be realized by a PCA neural network [8]. Further, on the topological layer the KL coefficient vectors $p_{li}$ are mapped topologically with respect to the rotation angle $\theta_i$, where this kind of topology preserving map is well known as the SOM (self-organizing map) [7]. Although more investigation on the network implementation may be interesting, we leave it for future research. When the networks NET$_l$ for all $l$ are run in parallel, the net NET$_l$ which first outputs the result $\theta_i$ is supposed to indicate the correct rotation axis $a_l$, because the net NET$_l$ is supposed to output only when the rotation axis is correct, as shown in the previous section.

4. NUMERICAL EXPERIMENTS

For memorized images, we have made 250 random axes $a_l = (a_{lX}, a_{lY}, a_{lZ})$ for $l = 1, 2, \ldots, 250$ chosen from the region satisfying $a_X \ge 0$, $a_Y \ge 0$, $a_Z \ge 0$ and $0 < a_X^2 + a_Y^2 + a_Z^2 \le 1$, and each $a_l$ is normalized as $\|a_l\| = 1$. For each axis $a_l$, the original image $x_0$ is rotated by $\theta_i = i - 1$ [degree] for $i = 1, 2, \ldots, 360$.

Fig. 5. Examples of images and parameter values used for perceived images (left) and those recognized (right). Namely, on the left-hand side, the images are generated as perceived ones with the parameter values and fed to the retrieval algorithm, while on the right-hand side, the images are generated from the parameter values retrieved.

Fig. 6. Relation between the calculation time and the angle.
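The axis sampling described above can be sketched as follows. This is only an illustration under assumptions not stated in the text: rejection sampling from the unit cube is one possible way to draw points from the stated region, Rodrigues' formula is assumed for the rotation of the 3D object about a unit axis, and the names `sample_axes` and `rotation_matrix` are hypothetical. Rendering the rotated object into an $N \times N$ image is outside the sketch.

```python
import numpy as np

def sample_axes(count, rng):
    """Draw unit axes with non-negative components by rejection sampling."""
    axes = []
    while len(axes) < count:
        a = rng.uniform(0.0, 1.0, size=3)      # candidate in the cube [0, 1]^3
        r2 = float(a @ a)
        if 0.0 < r2 <= 1.0:                    # keep points inside the unit ball
            axes.append(a / np.sqrt(r2))       # normalize so that |a| = 1
    return np.array(axes)

def rotation_matrix(axis, theta_deg):
    """Rotation by theta_deg degrees about the unit vector axis (Rodrigues)."""
    t = np.deg2rad(theta_deg)
    ax, ay, az = axis
    S = np.array([[0.0, -az, ay],
                  [az, 0.0, -ax],
                  [-ay, ax, 0.0]])             # cross-product matrix of the axis
    return np.eye(3) + np.sin(t) * S + (1.0 - np.cos(t)) * (S @ S)

rng = np.random.default_rng(0)
axes = sample_axes(250, rng)

# One rotation matrix per memorized angle 0, 1, ..., 359 degrees for the first axis.
rotations = [rotation_matrix(axes[0], theta) for theta in range(360)]

# Rotating by theta about a equals rotating by 360 - theta about -a.
same = np.allclose(rotation_matrix(axes[0], 30.0),
                   rotation_matrix(-axes[0], 330.0))
print(axes.shape, same)
```

The last check reflects the equivalence between the axis $a$ with an angle $\theta$ and the axis $-a$ with an angle $360 - \theta$, which is why axes from only half of all directions need to be memorized.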
The partial KL transform of the order $K = 10$ is then applied to $X_l = [x_{l1}, x_{l2}, \ldots, x_{l360}]$ for each axis to obtain $U^{(10)}_l = [u_{l1}, u_{l2}, \ldots, u_{l10}]$ and $p^{(10)}_{li} = U^{(10)T}_l x_{li}$ for $i = 1, 2, \ldots, 360$. We generated 500 images randomly as perceived images, ran the retrieving process, and obtained the rotation angles and the axes. Some examples are shown in Fig. 5, which shows that there are some errors in the retrieved parameter values, but the images are regarded as retrieved correctly. For all perceived images, the correct recognition rate was 88.6%. Note that the axes for perceived images were generated from $a_X \ge 0$, $a_Y \ge 0$, $a_Z \ge 0$ as for the memorized ones, that 250 memorized axes were necessary for this recognition rate, and that $1000 = 250 \times 4$ axes are supposed to be necessary for all 3D axes, since the axes with $a_X \ge 0$, $a_Y \ge 0$, $a_Z \ge 0$ are a quarter of all axes ($a_X \in R$, $a_Y \in R$ and $a_Z \ge 0$), where the axis $a = (a_X, a_Y, a_Z)$ with an angle $\theta$ is equivalent to the axis $-a = (-a_X, -a_Y, -a_Z)$ with an angle $360 - \theta$.

The calculation time is shown in Fig. 6, where the preprocessing time for obtaining the principal components via the SVD, about 2 minutes, is not included. Here, we used a personal computer with an Athlon XP 1800+ (1.53 GHz) CPU and VineLinux 2.5. From the figure, we can see that the calculation time was proportional to the rotation angle, which is the same property as the psychological finding [3].

5. CONCLUDING REMARKS

We have presented a method for storing and retrieving 2D images of a rotated 3D object. As a result of numerical experiments, we could store a huge number of 2D images owing to the eigenspace method, and achieve the property that the computational time to retrieve a presented image is proportional to the rotation angle. However, there are a number of problems we would like to solve in the future: from the engineering point of view, we could not memorize the images for all rotation axes due to the memory capacity; from the psychological point of view, the present model has to be refined for explaining other psychological findings as well as the present one; and so on.

6. REFERENCES

[1] S. Kurogi, T. Nishida, K. Yamamoto, Image transformation of local features for rotation invariant pattern matching, Proc. of ICONIP 2001, vol.2, pp.693-698, 2001.
[2] S. Kurogi, T. Amano, Invariant pattern matching using 3D image transformation of local features, Proc. of JNNS 2002, pp.117-120, 2002.
[3] R. N. Shepard and J. Metzler, Mental rotation of three-dimensional objects, Science, vol.171, pp.701-703, 1971.
[4] R. N. Shepard and L. A. Cooper, Mental images and their transformations, MIT Press: Cambridge, MA, 1982.
[5] M. Parsons, Visual discrimination of abstract mirror-reflected three-dimensional objects at many orientations, Perception & Psychophysics, vol.42, pp.49-59, 1987.
[6] N. Kanamori and Y. Takeda, The difference of mental processes between depth and plane rotation in natural objects, Technical Report on Attention and Cognition, No.24, 2003.
[7] T. Kohonen, Self-organization and associative memory, Springer-Verlag, Berlin, 1984.
[8] K. I. Diamantaras, S. Y. Kung, Principal component neural networks, John Wiley & Sons, Inc., 1996.
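
As a supplementary illustration of the retrieval process of Section 3, the successive search over $\theta = 0, \pm 1, \pm 2, \ldots$ with the acceptance test of Eq. (12) can be sketched as below. This is not the authors' implementation but a sketch under stated assumptions: the memorized "images" are circular shifts of a synthetic one-dimensional signal instead of rendered object views, the parallel networks are emulated by choosing the axis whose search stops after the fewest examined angles, and the names `make_net`, `search_angle` and `retrieve` are hypothetical.

```python
import numpy as np

def make_net(amps, phases, K, m=360):
    """Build synthetic memorized data for one axis: column i of X is the base
    signal shifted by i samples (standing in for a rotation by i degrees)."""
    k = np.arange(m)
    f = np.arange(1, len(amps) + 1)
    base = (amps[:, None] *
            np.cos(2 * np.pi * f[:, None] * k / m + phases[:, None])).sum(axis=0)
    X = np.stack([np.roll(base, i) for i in range(m)], axis=1)
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    U_K = U[:, :K]
    return X, (U_K, U_K.T @ X)        # stored: components and coefficients

def search_angle(P_mem, p, d_theta):
    """Examine angle indices 0, +1, -1, +2, -2, ... and return (index, steps)
    for the first index satisfying Eq. (12), or (None, steps) otherwise."""
    n = P_mem.shape[1]

    def dist(i):
        diff = p - P_mem[:, i % n]
        return float(diff @ diff)      # square distance of Eq. (9)

    steps = 0
    for k in range(n // 2 + 1):
        candidates = [k] if k == (n - k) % n else [k, n - k]
        for i in candidates:
            steps += 1
            if dist(i) <= min(dist(i - 1), dist(i + 1), d_theta):
                return i, steps
    return None, steps

def retrieve(nets, x, d_theta):
    """Emulate the parallel networks: the axis whose search stops first wins."""
    best = None
    for l, (U_K, P_mem) in enumerate(nets):
        i, steps = search_angle(P_mem, U_K.T @ x, d_theta)   # Eq. (8)
        if i is not None and (best is None or steps < best[2]):
            best = (l, i, steps)
    return best                        # (axis index, angle index, steps) or None

f = np.arange(1, 21, dtype=float)
X1, net1 = make_net(1.0 / f, 0.3 * f ** 2, K=10)      # synthetic "axis 0"
X2, net2 = make_net(f ** -1.5, 1.1 * f ** 2, K=10)    # synthetic "axis 1"
nets = [net1, net2]

result_a = retrieve(nets, X1[:, 90], d_theta=0.01)
result_b = retrieve(nets, X2[:, 200], d_theta=0.01)
print(result_a, result_b)
```

The step count returned by `search_angle` grows linearly with the angular offset from 0, which is the property the model uses to reproduce a response time proportional to the rotation angle.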