IJCSNS International Journal of Computer Science and Network Security, VOL.6 No.3A, March 2006

Development of Face Tracking and Recognition Algorithm for DVR (Digital Video Recorder)

Jang-Seon Ryu and Eung-Tae Kim
Korea Polytechnic University, Siheung-City, Kyonggi-Do, Korea

Summary
In this paper, an efficient face tracking and recognition algorithm and system are presented for the DVR (Digital Video Recorder). The DVR is a video surveillance system that is now replacing the traditional analog CCTV (Closed-Circuit Television) system. It records digital images captured from a camera and replays them. Recently, intelligent home systems have integrated the DVR with biometric techniques in order to strengthen video security. For this purpose, we track faces and obtain the facial area by using a PTZ (pan-tilt-zoom) camera. We then develop a face tracking and recognition algorithm for the DVR system that gives the security system outstanding reliability. In order to increase face recognition rates, an efficient pre-processing technique and an HMM (Hidden Markov Model) method based on the DCT are proposed, and their performance is analyzed.

Key words:
Face tracking, face recognition, DVR, DCT-HMM, video surveillance.

1. Introduction
Current video surveillance systems such as CCTV provide only basic security monitoring. Examples are gate control through video calls, or storage and management of images captured from security cameras installed in residential areas, so their service range has been very limited. On the other hand, the digital technology of video surveillance systems is making rapid progress, and the need for intelligent home security systems tends to increase. The DVR (Digital Video Recorder) system is the main system of digital home security, and its importance is therefore growing. Next-generation video surveillance technology is also required to support judgment and control of the home security environment, for example going in and out, visits and invasions. To this end, the DVR is integrated with face recognition and tracking technology [1-4]. This enhances the efficiency of the video surveillance system, stores the video information more efficiently, and helps organize an optimum security system. In particular, if a visitor is far away from the camera, his/her face may be recorded too small or may not be discriminated at all. To compensate for these disadvantages, a complete security system should find a human face automatically and track and recognize the facial area through the Pan-Tilt and Zoom-in/Zoom-out functions. It should then record that area at high resolution. If only a profile face image or a part of a face is detected, this system tracks and zooms repeatedly to extract a frontal face. As a result, face recognition is more efficient and a frontal face image can be recorded at high resolution.

Consequently, a person may be unable to watch and monitor continuously at certain times or places. In those cases the proposed intelligent surveillance system based on face tracking and recognition technology is essential for unmanned surveillance, because it supports intelligent recognition under various circumstances and raises alarms actively. This system can therefore create new additional services and expand the DVR market. In this paper, we present a face tracking and recognition algorithm that is suitable for a real-time DVR system. Specifically, we propose a face detection and tracking method that uses a PTZ (pan-tilt-zoom) camera, together with a fast face recognition algorithm.

2. Face tracking and recognition algorithm for DVR

2.1 The proposed DVR system with face recognition
Figure 1 shows the proposed DVR system based on face tracking and recognition. First, an RGB image is captured from the PTZ camera. Through the face detection function, the system judges whether the captured image includes a face candidate area. For this purpose the Haar-like method [5,6] detects the face area; it is simple enough to detect the face area quickly and accurately. Haar-like features can be constructed in many shapes and computed in different ways. In our system, Haar-like features are used as shown in Figure 2. The Haar-like oriented edge filter used here has a block structure, so it can be evaluated very fast.
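Haar-like features are cheap to compute because of their block structure. With an integral image, any rectangle sum costs four look-ups, as in Viola and Jones [5]. The minimal Python sketch below shows this for a two-rectangle edge feature and a three-rectangle line feature. The function names and feature shapes are illustrative assumptions, not the exact feature set of Figure 2.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x (zero-padded)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left pixel is (x, y): four look-ups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def edge_feature(ii, x, y, w, h):
    """Two-rectangle (vertical edge) feature: left half minus right half; w even."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


def line_feature(ii, x, y, w, h):
    """Three-rectangle (line) feature: outer thirds minus middle third; w divisible by 3."""
    third = w // 3
    outer = rect_sum(ii, x, y, third, h) + rect_sum(ii, x + 2 * third, y, third, h)
    return outer - rect_sum(ii, x + third, y, third, h)
```

After a single pass to build `ii`, every feature costs a constant number of additions, whatever the rectangle size. This is why a detector can scan many window positions and scales quickly.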
We used the AdaBoost method to select features, with threshold-based weak classifiers that output discrete values in {-1, 1}. When a face candidate area is found, the system works together with the DVR. It tracks the face at high speed and extracts facial features by moving the PTZ camera up and down and right and left, and by zooming in and out.

Manuscript received March 25, 2006.
Manuscript revised March 30, 2006.
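The feature-selection step can be sketched with discrete AdaBoost over threshold-based weak classifiers (decision stumps) that output -1 or +1. This is a minimal sketch under stated assumptions. Each weak classifier looks at one precomputed feature value, and candidate thresholds are midpoints between observed values. Names such as `adaboost` and `strong_classify` are hypothetical, and the cascade structure of [5] is not shown.

```python
import math


def stump_predict(x, threshold, polarity):
    """Threshold-based weak classifier with output in {-1, +1}."""
    return 1 if polarity * x < polarity * threshold else -1


def adaboost(features, labels, rounds):
    """Discrete AdaBoost over single-feature decision stumps.

    features[i][f]: value of (Haar-like) feature f on training window i
    labels[i]:      +1 for a face window, -1 for a non-face window
    Returns a list of (alpha, feature index, threshold, polarity)."""
    n, n_feat = len(labels), len(features[0])
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        best = None
        for f in range(n_feat):
            vals = sorted({row[f] for row in features})
            # candidate thresholds: below, between and above the observed values
            cands = ([vals[0] - 1] + [(a + b) / 2 for a, b in zip(vals, vals[1:])]
                     + [vals[-1] + 1])
            for thr in cands:
                for pol in (1, -1):
                    err = sum(wi for wi, row, y in zip(w, features, labels)
                              if stump_predict(row[f], thr, pol) != y)
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol)
        err, f, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, thr, pol))
        # misclassified windows gain weight, correctly classified ones lose it
        w = [wi * math.exp(-alpha * y * stump_predict(row[f], thr, pol))
             for wi, row, y in zip(w, features, labels)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def strong_classify(ensemble, row):
    """Sign of the alpha-weighted vote of the selected weak classifiers."""
    score = sum(a * stump_predict(row[f], thr, pol) for a, f, thr, pol in ensemble)
    return 1 if score >= 0 else -1
```

Each round picks the single feature and threshold with the lowest weighted error. Because of this, boosting doubles as feature selection: only the features that appear in `ensemble` need to be evaluated at detection time.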
Fig. 1 The proposed DVR system

Fig. 2 Haar-like feature samples

The tracked face image is modeled and processed by the proposed Normalized-Intensity/Discrete-Cosine-Transform/Hidden Markov Model (NI-DCT-HMM) method. Our system recognizes the face using low-frequency DCT coefficients as the feature points of the HMM method. These coefficients are insensitive to changes such as facial expression or whether accessories are worn. The method therefore achieves a high recognition rate even though the feature points are few. According to the recognition result, the system takes proper actions such as raising an alarm. During recognition, the system also uses the stored data trained by the NI-DCT-HMM method.

2.2 Normalized Intensity
The input image is affected by outside illumination, and variations in illumination decrease the face recognition rate. To solve this problem, the image is normalized within a certain range of intensity. Figure 3 shows how excessive changes of intensity affect the frequency domain. The image in Figure 3(b) is brighter than (a): it shows the DCT result after adding 100 to the pixel values of the original image. In addition, the DCT values of (b) are greater than those of (a) in the diagonal direction. This causes a large change in the frequency coefficients, which affects the detection of facial features and decreases face recognition rates. Therefore, image intensity is normalized in the pre-processing unit as shown in Figure 4. After an image is captured, the average value (A) of the current image is calculated. The averaged intensity (B) of the images accumulated in the database is added to each pixel of the input image, and A is then subtracted, so that each pixel is shifted by (B − A). This prevents over-change of pixel values and high variation of the DCT information. Finally, each pixel value is re-leveled to the 8-bit range. Figure 5 shows that the pre-processed DCT information is very similar to the original. According to the proposed method, an average intensity of 20 is compensated in (a) of Figure 3.

Fig. 3 DCT changes due to intensity: (a) original image; (b) changed image (intensity 100 added to the original image)

Fig. 4 Normalized brightness intensity (average calculation (A), average accumulated in database (B), subtraction, re-leveling of pixel values)
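The normalization of Figure 4 amounts to shifting every pixel by B − A and then re-leveling the result. The minimal sketch below makes two assumptions. Grayscale images are stored as lists of rows. "Re-leveling to 8 bits" is read as rounding and clipping to [0, 255], since the text does not say whether values are clipped or rescaled. `update_db_mean` is a hypothetical helper that maintains B as images are added to the database.

```python
def normalize_intensity(image, db_mean):
    """Shift each pixel by (B - A), then re-level to the 8-bit range.

    A is the mean of the current image; B (db_mean) is the mean intensity
    accumulated over the database images. Re-leveling is assumed here to be
    rounding plus clipping to [0, 255]."""
    pixels = [p for row in image for p in row]
    shift = db_mean - sum(pixels) / len(pixels)  # B - A
    return [[min(255, max(0, round(p + shift))) for p in row] for row in image]


def update_db_mean(db_mean, n_images, new_image):
    """Hypothetical helper: fold one more stored image into the running mean B."""
    pixels = [p for row in new_image for p in row]
    image_mean = sum(pixels) / len(pixels)
    return (db_mean * n_images + image_mean) / (n_images + 1), n_images + 1
```

Adding 100 to every pixel, as in Figure 3(b), leaves the normalized output unchanged as long as no pixel saturates. This is the property the pre-processing unit relies on before the DCT is taken.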
Fig. 5 Normalized image and DCT after brightness compensation

2.3 HMM modeling and training
The process of HMM (Hidden Markov Model) modeling is shown in Figure 6 and Figure 7. First, a 1D-HMM is run on the input image. The image is divided across its width and height into overlapping square blocks of side L. The overlap size (P) between consecutive blocks is equal to or smaller than L−1. Then the 2D-HMM is run, dividing the forehead into 3 areas, the eyes into 5, the nose into 5, the mouth into 5, and the chin into 3. In each area, the DCT is performed on a 12x12 block, which is larger than the general DCT block size. The feature points are the 3x3 low-frequency coefficients, including the DC coefficient. Figure 6 shows the block division for 2-D HMM modeling.

Fig. 6 1D-HMM & 2D-HMM

The 2-D DCT is defined as

$$F(u,v) = \alpha(u)\,\alpha(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y) \cos\frac{(2x+1)u\pi}{2N} \cos\frac{(2y+1)v\pi}{2N} \qquad (1)$$

where

$$\alpha(w) = \begin{cases} \sqrt{1/N}, & w = 0 \\ \sqrt{2/N}, & w = 1, 2, \ldots, N-1. \end{cases}$$

The DCT feature points in each area are trained using the multi-stage HMM method shown in Figure 7. The training order is forehead-eye-nose-mouth-chin. The HMM modeling is described in detail as follows. Every HMM has non-observable (hidden) states and an observable sequence that those hidden states generate. An HMM λ is characterized by three parameters (A, B, Π). Let $O = (o_1, o_2, \ldots, o_T)$, where each $o_t$ is a D-element vector, be the sequence observed at T different instants. Let the corresponding state sequence be $Q = (q_1, q_2, \ldots, q_T)$, where $q_t \in \{1, 2, \ldots, N\}$ and N is the number of states in the model.

Fig. 7 Multi-stage HMM

The HMM parameters λ = (A, B, Π) are defined as follows:
A: the transition probability matrix, whose elements are

$$a_{ij} = P[q_{t+1} = j \mid q_t = i], \quad 1 \le i, j \le N \qquad (2)$$

B: the emission probability matrix, which determines the output observation given that the HMM is in a particular state.
Every element of this matrix,

$$b_j(o_t), \quad 1 \le j \le N,\ 1 \le t \le T \qquad (3)$$

is the conditional density of observation $o_t$ at time t given that the HMM is in state $q_t = s_j$.
Π: the initial state distribution, whose i-th entry is

$$\pi_i = P[q_1 = i], \quad 1 \le i \le N \qquad (4)$$

A multi-stage HMM generalizes the HMM: each state of a one-dimensional HMM is itself an HMM, as shown in Figure 7. It therefore consists of a set of super states along with a set of embedded states. The super states model the two-dimensional data along one direction, while the embedded HMMs model the data along the other direction. The elements of an embedded HMM are:
1. A set of $N_0$ super states.
2. The initial super state probability distribution $\Pi_0 = \{\pi_{0,i}\}$, where $\pi_{0,i}$ is the probability of being in super state i at time zero.
3. The state transition matrix between the super states, $A_0 = \{a_{0,ij}\}$, where $a_{0,ij}$ is the probability of making a transition from super state i to super state j.
4. In an embedded HMM, each super state k is itself a standard HMM with parameter set $\Lambda^{(k)} = (\Pi^{(k)}, A^{(k)}, B^{(k)})$. Here $\Pi^{(k)}$ is the initial state probability distribution of the embedded states, $A^{(k)}$ is the state transition matrix for the embedded states, and $B^{(k)}$ is the probability distribution matrix of the observations, given as a continuous mixture.

In a continuous-mixture embedded HMM, the observations are characterized by continuous probability density functions. These are taken to be finite Gaussian mixtures of the form

$$b_i^{(k)}(O_{t_0,t_1}) = \sum_{m=1}^{M_i^{(k)}} c_{i,m}^{(k)}\, \mathcal{N}\!\left(O_{t_0,t_1};\ \mu_{i,m}^{(k)},\ U_{i,m}^{(k)}\right) \qquad (5)$$

where $c_{i,m}^{(k)}$ is the mixture coefficient for the m-th mixture in state i of super state k. $\mathcal{N}(O_{t_0,t_1}; \mu_{i,m}^{(k)}, U_{i,m}^{(k)})$ is a Gaussian density with mean vector $\mu_{i,m}^{(k)}$ and covariance matrix $U_{i,m}^{(k)}$.

The HMM training procedure is as follows.
Step 1: Take all Ω training sequences $\{O_\omega\}$, $1 \le \omega \le \Omega$, generated from Ω face images of one particular subject, each of length T. Cluster them into N clusters using some clustering algorithm. Each cluster represents a state of the training vectors, numbered from 1 to N.
Step 2: Assign each training vector the number of its nearest cluster. The t-th training vector is assigned number i if its distance to cluster i (for example, the Euclidean distance) is smaller than its distance to any other cluster j, j ≠ i.
Step 3: Calculate the mean $\{\mu_i\}$ and covariance matrix $\{\Sigma_i\}$ for each state (cluster):

$$\mu_i = \frac{1}{N_i} \sum_{o_t \in i} o_t, \quad 1 \le i \le N \qquad (6)$$

$$\Sigma_i = \frac{1}{N_i} \sum_{o_t \in i} (o_t - \mu_i)(o_t - \mu_i)^{\top}, \quad 1 \le i \le N \qquad (7)$$

where $N_i$ is the number of vectors assigned to state i.
Step 4: Calculate the Π and A matrices by event counting:

$$\pi_i = \frac{\text{No. of occurrences of } o_1 \in i}{\text{No. of training sequences } \Omega}, \quad 1 \le i \le N \qquad (8)$$

$$a_{ij} = \frac{\text{No. of occurrences of } o_t \in i \text{ and } o_{t+1} \in j}{\text{No. of occurrences of } o_t \in i}, \quad 1 \le i, j \le N,\ 1 \le t \le T-1 \qquad (9)$$

Step 5: Calculate the B matrix, that is, the probability density of each training vector for each state:

$$b_j(o_t) = \frac{1}{(2\pi)^{D/2}\, |\Sigma_j|^{1/2}} \exp\!\left(-\frac{1}{2}(o_t - \mu_j)^{\top} \Sigma_j^{-1} (o_t - \mu_j)\right) \qquad (10)$$

Here $b_j(o_t)$ is Gaussian.
In Eq. (10), $1 \le j \le N$ and $o_t$ is of size $D \times 1$.
Step 6: Use the Viterbi algorithm to find the optimal state sequence $Q^*$ for each training sequence, and reassign the states: a vector $o_t$ is assigned state i if $q_t^* = i$.
Step 7: A reassignment is effective only for training vectors whose earlier state assignment differs from the Viterbi state. If any effective reassignment occurred, repeat Steps 3 to 6. Otherwise stop: the HMM is trained for the given training sequences.
Step 8: After training on the input images, we tested the model with the recognition module. The data with the highest probability among the trained sequences is the face currently being trained. This highest probability is stored in the database for later use in the recognition process.

2.4 The proposed face recognition algorithm
The proposed face recognition algorithm is described in Figure 8. The feature points of the training images are stored in the database through the 2-D HMM method based on the DCT. During face recognition, feature points are first extracted from the input image and compared with those in the database. The face is then recognized by Viterbi back-tracking, as shown in Figure 8.

Fig. 8 Flowchart of the proposed face recognition

A probability value is used to judge whether the database includes the person currently being recognized. The trained HMMs are used to compute the likelihood function as follows. Let O be the DCT-based observation sequence generated from the face image to be recognized.
Step 1: Using the Viterbi back-tracking algorithm, compute
$$Q^* = \arg\max_{Q} P[O, Q \mid \lambda] \qquad (11)$$

Step 2: The recognized face corresponds to the model λ for which the likelihood $P[O, Q^* \mid \lambda]$ is maximum.
Step 3: Repeat Steps 1 and 2 for all the stored data in the database.
Step 4: Determine the candidate by maximum likelihood, comparing the probability from Step 2 with the probabilities stored in the database.
Step 5: Remove the best candidate from the database and repeat Step 4 to select 2 more candidates.
Step 6: Suppose the probability generated during Steps 4-5 differs from the probabilities in the database. The system then judges the unidentified person to be an intruder, stores his/her image, and raises an alarm about the intruder's appearance.
Step 7: If the probability value lies within a certain range in Steps 4-5, recognition is considered successful and the user name is displayed.

3. Experimental results
In this section, we evaluate the performance of the proposed face recognition algorithm for DVR. The CPU used is a Celeron 1 GHz (512M), and the NTSC PTZ camera resolution is 60K pixels. The grabber application was built with MS Dot Net 2000 and DirectX 9.0. Figure 9 shows the developed face recognition DVR system with the PTZ camera. It can track and recognize the face of an intruder. If the face occupies only a small portion of the image, the PTZ camera zooms in to obtain an acceptable facial image. The developed DVR system consists of 3 parts: (1) the real-time image is captured from the PTZ camera; (2) the face area is tracked and extracted for face recognition; (3) the system finds the most similar face in the database and displays the registered face.

Fig. 9 The proposed face recognition DVR system

3.1 Database of face images
For this experiment, sample images of 100 persons are used. The DCT-HMM method is used to build various kinds of face recognition databases. The trained face images include 5 different images per person, covering variations in facial expression, posture, whether glasses are worn, and so on.

3.2 Face image for recognition
Figure 10 shows untrained image data for face recognition. Various experiments are performed for each case, such as size, input angle, and face angle. The input image size ranges from 100x80 to 352x288. We evaluated the average performance of the proposed method over 100 trials.

Fig. 10 Samples for face recognition

3.3 Frontal view face recognition time
In this experiment, we compared the face recognition speed for frontal view images with and without normalized intensity.
Figure 11 shows the face recognition time with and without the normalized intensity method.

Fig. 11 Recognition time for the frontal view face: (a) without normalized intensity; (b) with normalized intensity (recognition time in seconds versus trial number)

On average, the recognition time without normalized intensity is 2.6 seconds and the recognition rate is 15%. With normalized intensity, the recognition time and rate are 2.59 seconds and 98%, which is much better performance. Face recognition is thus very sensitive to variations in illumination. The proposed method compensates for differences between the environment in which images are captured from the camera and the one in which the recognition database was built. This improves performance with little additional processing time.

3.4 Nonfrontal face image recognition time
The performance of our system is evaluated for nonfrontal face images. Figure 12 shows the recognition rates of our algorithm as the view angle of the face varies. The face is rotated from the center to the right, so the captured images contain non-frontal views. The average recognition time of our system is 3.55 seconds. The average recognition rate is 82% within ±4 degrees left-right, but it drops rapidly around 3.1 degrees, where the DCT coefficients change greatly. This corresponds to the angle at which the image around the eyes starts to change. As a result, the failure of the eye-related states in the HMM process is considered to decrease the recognition rates.

Fig. 12 Recognition rates for the nonfrontal face image (recognition rate, 0% to 100%, versus rotation angle, 0.00° to 3.90°)

3.5 Top and bottom recognition time
The performance of our system is evaluated for face images tilted up and down. Figure 13 shows the recognition time of our system as the face is tilted up and down.
In most cases, the camera is installed on the ceiling or a wall, so the up-and-down variation of the view angle is small. The top and bottom recognition images are therefore tested within ±5 degrees up and down. For the top view, the average recognition time and rate are 2.22 seconds and 92.73%, respectively. For the down view, they are 2.42 seconds and 88%, respectively. The processing time varies only slightly between 0 and 3 degrees, but beyond 3 degrees more processing time is needed. The top and bottom results differ because the two cases have few similarities, and top recognition extracts more facial feature points than bottom recognition.
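The recognition chain of Sections 2.3 and 2.4 can be sketched end to end. This minimal Python sketch is not the authors' implementation, and it makes several simplifications:
- It uses a plain one-dimensional HMM with one diagonal-covariance Gaussian per state, instead of the multi-stage (embedded) 2-D HMM with the Gaussian mixtures of Eq. (5) and the full covariance of Eq. (7).
- It starts training from a uniform segmentation instead of the clustering of Steps 1-2.
- It scans 12x12 blocks in raster order with a hypothetical overlap of 8 pixels.

All function names and the variance floor are illustrative.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block, following Eq. (1)."""
    n = len(block)

    def alpha(w):
        return math.sqrt((1.0 if w == 0 else 2.0) / n)

    def coeff(u, v):
        s = sum(block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                for x in range(n) for y in range(n))
        return alpha(u) * alpha(v) * s

    return [[coeff(u, v) for v in range(n)] for u in range(n)]


def block_features(image, size=12, overlap=8, k=3):
    """Scan overlapping size x size blocks in raster order (step = size - overlap)
    and keep the k x k low-frequency DCT coefficients, DC term included."""
    step = size - overlap
    feats = []
    for top in range(0, len(image) - size + 1, step):
        for left in range(0, len(image[0]) - size + 1, step):
            c = dct2([row[left:left + size] for row in image[top:top + size]])
            feats.append([c[u][v] for u in range(k) for v in range(k)])
    return feats


def safe_log(p):
    return math.log(p) if p > 0 else -math.inf


def log_gauss(o, mean, var):
    """Log of Eq. (10), simplified to a diagonal covariance matrix."""
    return -0.5 * sum(math.log(2 * math.pi * s) + (x - m) ** 2 / s
                      for x, m, s in zip(o, mean, var))


def viterbi(obs, log_pi, log_a, means, variances):
    """Return (log P(O, Q* | lambda), Q*) for a plain 1-D HMM, cf. Eq. (11)."""
    n = len(log_pi)
    delta = [log_pi[i] + log_gauss(obs[0], means[i], variances[i]) for i in range(n)]
    backptrs = []
    for o in obs[1:]:
        ptr = [max(range(n), key=lambda i: delta[i] + log_a[i][j]) for j in range(n)]
        delta = [delta[ptr[j]] + log_a[ptr[j]][j] + log_gauss(o, means[j], variances[j])
                 for j in range(n)]
        backptrs.append(ptr)
    state = max(range(n), key=lambda i: delta[i])
    best, path = delta[state], [state]
    for ptr in reversed(backptrs):  # back-tracking
        state = ptr[state]
        path.append(state)
    return best, path[::-1]


def estimate(seqs, labels, n, var_floor=1e-3):
    """Steps 3-4: per-state mean and variance, then Pi and A by event counting."""
    groups = [[] for _ in range(n)]
    first, trans = [0] * n, [[0] * n for _ in range(n)]
    for seq, lab in zip(seqs, labels):
        first[lab[0]] += 1
        for o, s in zip(seq, lab):
            groups[s].append(o)
        for a, b in zip(lab, lab[1:]):
            trans[a][b] += 1
    if not all(groups):
        raise ValueError("every state needs at least one training vector")
    means = [[sum(col) / len(g) for col in zip(*g)] for g in groups]
    variances = [[max(sum((x - m) ** 2 for x in col) / len(g), var_floor)
                  for col, m in zip(zip(*g), mu)] for g, mu in zip(groups, means)]
    log_pi = [safe_log(c / len(seqs)) for c in first]
    log_a = [[safe_log(c / max(sum(row), 1)) for c in row] for row in trans]
    return log_pi, log_a, means, variances


def train(seqs, n_states, max_iter=20):
    """Steps 3-7; uniform segmentation replaces the clustering of Steps 1-2."""
    labels = [[t * n_states // len(s) for t in range(len(s))] for s in seqs]
    model = estimate(seqs, labels, n_states)
    for _ in range(max_iter):
        new = [viterbi(s, *model)[1] for s in seqs]  # Step 6: Viterbi reassignment
        if new == labels or len({q for lab in new for q in lab}) < n_states:
            break  # Step 7: no effective reassignment (or a state would empty)
        labels, model = new, estimate(seqs, new, n_states)
    return model


def recognize(obs, models, top=3):
    """Steps 1-5 of Section 2.4: rank stored models by Viterbi log-likelihood."""
    scored = [(viterbi(obs, *m)[0], name) for name, m in models.items()]
    return sorted(scored, reverse=True)[:top]
```

As a usage example, each subject gets a model `train([block_features(img) for img in images], n_states)`. Then `recognize(block_features(probe), models)` returns up to three (log-likelihood, name) candidates, mirroring Steps 4-5. Steps 6-7 accept a match when its probability lies "within a certain range", which would be a threshold on the best score. That threshold is omitted here because the paper does not give its value.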
Fig. 13 Recognition within the top and bottom area: (a) simulation results for images within top ±5 degrees; (b) simulation results for images within bottom ±5 degrees (recognition time in seconds versus tilt angle, 0.7° to 5°)

4. Conclusion
In this paper, we presented a face tracking and recognition algorithm for the DVR system. The proposed system includes face detection using Haar-like features, tracking with a PTZ camera, the normalized intensity method for pre-processing, a 2D DCT-HMM, and a Viterbi back-tracking decision. The face recognition algorithm based on the DCT-HMM method showed fast processing and high recognition rates with a small database. However, its performance can be degraded by changes in illumination, and the normalized intensity method is proposed to overcome this problem. Experimental results showed that the recognition rate with normalized intensity is about 6 times that without it, while the recognition time increased by only 0.5 seconds. The increase is comparatively small because the proposed algorithm is simple enough to run at high speed. For nonfrontal face images, our system showed the same recognition rates for left- and right-angle images, owing to the symmetry of the face. For top and bottom images, however, the results differed according to whether the eye region was captured: the top image captures more of the area around the eyes than the bottom image. This suggests that the frequency coefficients around the eyes play an important role in overall performance. It is therefore very important to build the database and extract face images so that a sufficient face area, including the eyes, is detected.

Acknowledgments
This work is financially supported by SMBA (Small and Medium Business Administration) through the project of the Technology Innovation and Development for Small and Medium Business.

References
[1] F. S. Samaria and S. Young, "HMM-based Architecture for Face Identification," Image and Vision Computing, vol. 12, no. 8, pp. 537-543, Oct. 1994.
[2] Ming-Hsuan Yang, David J. Kriegman, and Narendra Ahuja, "Detecting Faces in Images: A Survey," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 1, Jan. 2002.
[3] Baback Moghaddam and Alex Pentland, "Probabilistic Visual Learning for Object Representation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 7, July.
[4] Theme Section "Face and Gesture Recognition," IEEE Pattern Recognition and Machine Intelligence, vol. 10, no. 4, July 1997.
[5] P. Viola and M. Jones, "Robust Real-time Object Detection," International Journal of Computer Vision, 2002.
[6] R. Lienhart and J. Maydt, "An Extended Set of Haar-like Features for Rapid Object Detection," IEEE International Conference on Image Processing, vol. 1, pp. 900-903, Sep. 2002.
[7] F. S. Samaria and S. Young, "HMM-based Architecture for Face Identification," Image and Vision Computing, vol. 12, no. 8, pp. 537-543, Oct. 1994.
[8] K. R. Rao and P. Yip, Discrete Cosine Transform: Algorithms, Advantages, Applications. New York: Academic, 1990.
[9] Ara V. Nefian and Monson H. Hayes III, "An Embedded HMM-Based Approach for Face Detection and Recognition," ICASSP, vol. 6, pp. 5-9, 1999.
[10] H. Othman and T. Aboulnasr, "Low Complexity 2-D Hidden Markov Model for Face Recognition," ISCAS, May 28-31, 2000.
[11] Kohir and V. V. Desai, "Face Recognition Using a DCT-HMM Approach," Fourth IEEE Workshop on Applications of Computer Vision, pp. 226-231, Oct. 1998.
[12] S. Werner and G. Rigoll, "Pseudo 2-Dimensional Hidden Markov Models in Speech Recognition," IEEE Workshop on Automatic Speech Recognition and Understanding, pp. 441-444, Dec. 2001.
[13] S. H. Lin, S. Y. Kung, and L. J. Lin, "Face Recognition/Detection by Probabilistic Decision-Based Neural Network," IEEE Trans. Neural Networks, vol. 8, no. 1, pp. 114-132, Jan. 1997.
[14] A. Lanitis, C. J. Taylor, and T. F. Cootes, "An Automatic Face Identification System Using Flexible Appearance Models," Image and Vision Computing, vol. 13, no. 5, pp. 393-401.
[15] A. Rajagopalan, K. Kumar, J. Karlekar, R. Manivasakan, M. Patil, U. Desai, P. Poonacha, and S. Chaudhuri, "Finding Faces in Photographs," Proc. Sixth IEEE Int'l Conf. Computer Vision, pp. 640-645, 1998.
[16] R. J. Qian, M. I. Sezan, and K. E. Matthews, "A Robust Real-time Face Tracking Algorithm," IEEE International Conference on Image Processing, 1998.

Eung-Tae Kim received the B.S. degree (summa cum laude) in Electronics Engineering from Inha University, Inchon, Korea, in 1991. He received the M.S. and Ph.D. degrees in electrical & electronics engineering from the Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea, in 1993 and 1999, respectively. From 1998 to 2004, he was a senior researcher at the Digital TV Lab. of LG Electronics Co. Ltd. Since 2004, he has been with the Department of Electronics Engineering at Korea Polytechnic University. His research interests include low bit-rate video transmission techniques, MPEG systems and video, multimedia transmission, and DTV/DMB/DVR systems.

Jang-Seon Ryu received the B.S. degree in Electronics Engineering from Korea Polytechnic University, Siheung, Korea, in 2006. He is currently an M.S. student in the Department of Information and Communication, Graduate School of Knowledge-Based Technology and Energy, Korea Polytechnic University. His research interests include high-performance video codecs on embedded systems, multimedia system-on-chip design, digital video recorders, and digital multimedia broadcasting.