Real-Time Non-Rigid Multi-Frame Depth Video Super-Resolution

Kassem Al Ismaeil 1, Djamila Aouada 1, Thomas Solignac 2, Bruno Mirbach 2, Björn Ottersten 1
1 Interdisciplinary Centre for Security, Reliability, and Trust, University of Luxembourg. {kassem.alismaeil,djamila.aouada,bjorn.ottersten}@uni.lu
2 Advanced Engineering Department, IEE S.A. {thomas.solignac,bruno.mirbach}@iee.lu

Abstract

This paper proposes to enhance low resolution dynamic depth videos containing freely non-rigidly moving objects with a new dynamic multi-frame super-resolution algorithm. Existing methods are either limited to rigid objects, or restricted to global lateral motions discarding radial displacements. We address these shortcomings by accounting for non-rigid displacements in 3D. In addition to 2D optical flow, we estimate the depth displacement, and simultaneously correct the depth measurement by Kalman filtering. This concept is incorporated efficiently in a multi-frame super-resolution framework. It is formulated in a recursive manner that ensures an efficient deployment in real time. Results show the overall improved performance of the proposed method as compared to alternative approaches, and specifically in handling relatively large 3D motions. Test examples range from a full moving human body to a highly dynamic facial video with varying expressions.

1. Introduction

The recent developments in depth sensing technologies, be it time-of-flight (ToF) cameras or structured light cameras, have seen the explosion of their applications in gaming, automotive sensing, surveillance, medical care, and many more. The major problem of these sensors is their high contamination with noise and low spatial resolution. In addition, in the case of large distances between the sensor and the scene of interest, a similar effect is observed even by using a relatively high resolution depth sensor. In this paper, we consider dynamic depth videos with one or multiple moving objects deforming non-rigidly.
This is a very typical scenario encountered in people sensing, cloth deformation, hand gesture, variations of facial expressions, to name a few. Such scenes are more challenging than static scenes. Indeed, in addition to challenges due to noise and outliers, non-rigid deformations in 3D cause occlusions, which result in missing data and in undesired holes.

Figure 1. Different super-resolution methods applied to a real low resolution dynamic depth sequence captured with a ToF camera, with an SR scale factor of r = 4. (a) Low resolution depth frame. (b) Bicubic interpolation. (c) Patch Based Single Image Super Resolution (SISR) [5]. (d) Upsampling for Precise Super Resolution (UP-SR) [4]. (e) Proposed algorithm (50 ms per frame).

This work was supported by the National Research Fund, Luxembourg, under the CORE project C11/BM/1204105/FAVE/Ottersten.

Super-resolution (SR) algorithms have been proposed as a solution to this problem. Two categories of algorithms may be distinguished. The first is multi-frame SR, which uses multiple frames in an inverse problem formulation to reconstruct one high resolution frame [16, 7, 4]. The second category is known as single image SR. It is based on dictionary learning and heavy training [5, 12]. In [4], we proposed the first dynamic multi-frame depth SR. This algorithm is, however, limited to lateral motions, and fails in the case of radial deformations. Moreover, it is not practical due to a heavy cumulative motion estimation process applied to a certain number of frames buffered in the memory. Alternatively, a recursive formulation may be thought of as in [15], where an iterative SR was proposed based on a block affine motion model, resulting in a relatively efficient processing. This, however, is not applicable to non-lateral motions. Earlier attempts at recursive SR approaches have proposed to use a Kalman filter formulation [8, 10, 9, 13, 18]. These methods work only under two conditions: a constant translational motion between low resolution frames, which represents the system motion model (i.e., the transition matrix), and an intensity consistency assumption between each pair of images in the video sequence. In the case of dynamic depth videos, these assumptions are not always valid. Indeed, for such videos, individual pixel motions have to be tracked through the video. A local motion model such as a dense 2D optical flow as in [4] is not sufficient. It is necessary to account in the SR reconstruction for the full 3D motion, known as scene flow, or the 2.5D motion, known as range flow. For a reduced complexity, we herein propose to approximate range flow by estimating radial motions on top of the 2D optical flow. Moreover, we propose a recursive depth multi-frame SR algorithm by using multiple Kalman filters. To ensure efficiency, we propose to treat a video as a set of one-dimensional signals. By so doing, we show that we reach an approximation of range flow, which enables us to take radial deformations into account in the SR estimation. To adequately preserve the smoothness properties of the depth surface, and remove noise and blur without over-smoothing, we propose to use a multi-level version of the iterative bilateral total variation regularization given in [11]. In summary, the contribution of this paper is a new multi-frame depth SR algorithm which has the following properties:
1) Recursive, hence suitable for real-time applications.
2) Robust to radial motions without explicitly computing range flow.
3) Accurate depth video reconstruction thanks to the proposed multi-level iterative bilateral regularization.

An overview of the proposed algorithm is shown in Figure 2. The remainder of the paper is organized as follows: Section 2 states the problem of depth video super-resolution. Section 3 explains the proposed concept for handling radial motion within the super-resolution framework. The proposed recursive depth video SR algorithm is presented in Section 4. Quantitative and qualitative evaluations and comparisons with other approaches are given in Section 5. Finally, the conclusion is given in Section 6.

The following notations will be considered: bold small letters correspond to vectors. Bold capital letters denote matrices. Italic letters are scalars. $\mathbf{p}_t$ denotes a pixel position on the image plane at instant $t$, and $\mathbf{m}_t$ denotes the corresponding 2D optical flow at $t$.

2. Background and Problem Formulation

We briefly review the problem of multi-frame SR of dynamic depth videos and highlight the challenges that remain untackled by existing approaches. Let us consider a video of $N$ observed low resolution (LR) depth frames of a dynamically deforming depth scene $F$ acquired using a depth sensor, ToF or structured light. The scene is assumed to contain one or multiple moving objects. Each LR frame $\mathbf{g}_t$, $t = 1, \dots, N$, is represented by a column vector of size $(m \times 1)$ corresponding to the lexicographic ordering of frame pixels. The objective of depth SR is to reconstruct a higher resolution (HR) depth video $\{\mathbf{f}_t, t = 1, \dots, N\}$, where each frame is of size $(n \times 1)$ with $\frac{n}{m} = r \in \mathbb{N}$ being the SR scale factor. The classical multi-frame depth SR problem may be simplified by reconstructing one HR frame at a time, referred to as the reference frame, by using the observed video. Therefore, if the reference time is $t_0$, then the problem is to reconstruct $\mathbf{f}_{t_0}$ using the $(N - t_0 + 1)$ preceding measurements. The operation may be repeated for $t_0 = 1, \dots, N$. A noisy LR observation is modelled as follows:

$$\mathbf{g}_t = \mathbf{D}\mathbf{H}\mathbf{M}_{t_0}^{t}\mathbf{f}_{t_0} + \mathbf{n}_t, \quad t, t_0 \in [1, N] \subset \mathbb{N}, \tag{1}$$

where $\mathbf{D}$ is a known constant downsampling matrix of dimension $(m \times n)$.
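The system blur is represented by the time and space invariant matrix $\mathbf{H}$. The $(n \times n)$ matrices $\mathbf{M}_{t_0}^{t}$ correspond to the motion between $\mathbf{f}_{t_0}$ and $\mathbf{g}_t$ before their downsampling. The vector $\mathbf{n}_t$ is an additive white noise at time instant $t$. Without loss of generality, both $\mathbf{H}$ and $\mathbf{M}_{t_0}^{t}$ are assumed to be block circulant matrices, so they are commutative. As a result, the estimation of $\mathbf{f}_{t_0}$ may be decomposed into two steps: estimation of a blurred HR image, followed by a deblurring step.

While the LidarBoost algorithm [16] is a reference method for multi-frame depth SR, it is only applicable to static scenes for object scanning. The UP-SR algorithm in [4] is, so far, the only depth multi-frame SR proposed for dynamic scenes. This algorithm is based on two key components. The first one is to densely upsample the observed LR sequence prior to any operation. This is shown to ensure a more accurate registration of frames. The resulting $r$ times upsampled image is defined as $\mathbf{g}_t{\uparrow} = \mathbf{U}\mathbf{g}_t$, where $\mathbf{U}$ is an $(n \times m)$ upsampling matrix. The second component of UP-SR is to use a cumulative motion compensation approach between the reference frame and all observations. This operation starts by estimating the motion between consecutive frames, using classical dense 2D optical flow estimation between the upsampled versions $\mathbf{g}_{t-1}{\uparrow}$ and $\mathbf{g}_t{\uparrow}$, namely,

$$\hat{\mathbf{M}}_{t-1}^{t} = \arg\min_{\mathbf{M}} \Psi\left(\mathbf{g}_{t-1}{\uparrow}, \mathbf{g}_t{\uparrow}, \mathbf{M}\right), \tag{2}$$

where $\Psi$ is a dense optical flow-related cost function and

$$\mathbf{g}_t{\uparrow} = \mathbf{M}_{t-1}^{t}\,\mathbf{g}_{t-1}{\uparrow} + \boldsymbol{\delta}_t. \tag{3}$$

The vector $\boldsymbol{\delta}_t$ is referred to as the innovation image. It contains novel points appearing, or disappearing, due to occlusions or large motions. This innovation is assumed in [4] to be negligible. In addition, similarly to [8], for analytical convenience, it is assumed that all pixels in $\mathbf{g}_t{\uparrow}$ originate from pixels in $\mathbf{g}_{t-1}{\uparrow}$ in a one-to-one mapping. Therefore, each row in $\mathbf{M}_{t-1}^{t}$ contains 1 for each position corresponding to the address of the source pixel in $\mathbf{g}_{t-1}{\uparrow}$. This assumption of bijectiveness implies that the matrix $\hat{\mathbf{M}}_{t-1}^{t}$ is assumed to be an invertible permutation, such that $[\hat{\mathbf{M}}_{t-1}^{t}]^{-1} = \hat{\mathbf{M}}_{t}^{t-1}$. Furthermore, its estimate leads to the following registration to $\mathbf{g}_{t-1}{\uparrow}$:

$$\mathbf{g}_t^{t-1}{\uparrow} = \hat{\mathbf{M}}_{t}^{t-1}\,\mathbf{g}_t{\uparrow}. \tag{4}$$

Figure 2. Flow chart of the proposed multi-frame depth super-resolution algorithm for dynamic depth videos containing one or multiple non-rigidly deforming objects.

Using a cumulative motion compensation approach, the registration of a non-consecutive frame $\mathbf{g}_t{\uparrow}$ to the reference $\mathbf{g}_{t_0}{\uparrow}$ is achieved as follows:

$$\mathbf{g}_t^{t_0}{\uparrow} = \hat{\mathbf{M}}_{t}^{t_0}\,\mathbf{g}_t{\uparrow} = \underbrace{\hat{\mathbf{M}}_{t_0+1}^{t_0} \cdots \hat{\mathbf{M}}_{t}^{t-1}}_{(t - t_0)\ \text{times}}\,\mathbf{g}_t{\uparrow}. \tag{5}$$

Choosing the upsampling matrix $\mathbf{U}$ to be the transpose of $\mathbf{D}$, the product $\mathbf{U}\mathbf{D} = \mathbf{A}$ gives a block circulant matrix $\mathbf{A}$ that defines a new blurring matrix $\mathbf{B} = \mathbf{A}\mathbf{H}$. Therefore, the estimation of $\mathbf{f}_{t_0}$ starts by estimating its blurred version $\mathbf{h}_{t_0} = \mathbf{B}\mathbf{f}_{t_0}$. The data model in (1) becomes

$$\mathbf{g}_t^{t_0}{\uparrow} = \mathbf{h}_{t_0} + \boldsymbol{\nu}_t, \quad t, t_0 \in [1, N] \subset \mathbb{N}, \tag{6}$$

where $\boldsymbol{\nu}_t = \hat{\mathbf{M}}_{t}^{t_0}\mathbf{U}\mathbf{n}_t$ is an additive noise vector of length $n$. It is assumed to be independent and identically distributed. Using an $L_1$ norm, the blurred estimate is found by pixel-wise temporal median filtering of the upsampled registered LR observations:

$$\hat{\mathbf{h}}_{t_0} = \arg\min_{\mathbf{h}_{t_0}} \sum_{t=t_0}^{N} \left\|\mathbf{h}_{t_0} - \mathbf{g}_t^{t_0}{\uparrow}\right\|_1 = \operatorname{med}_t\left\{\mathbf{g}_t^{t_0}{\uparrow}\right\}_{t=t_0}^{N}. \tag{7}$$

Then, as a second step, an image deblurring follows to recover $\hat{\mathbf{f}}_{t_0}$ from $\hat{\mathbf{h}}_{t_0}$. The robustness of the UP-SR algorithm in handling large motions is achieved thanks to the cumulative motion approach combined with upsampling, as has been shown experimentally in [4].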
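The pixel-wise temporal median of equation (7) can be illustrated with a small NumPy sketch. It assumes the LR frames have already been upsampled and registered to the reference frame, which is the costly part of UP-SR and is not shown here. The function name `blurred_estimate` and the toy data (a flat plane at 1000 mm with one invalid pixel) are hypothetical and only serve the illustration.

```python
import numpy as np

def blurred_estimate(registered_frames):
    """Pixel-wise temporal median of registered, upsampled depth frames.

    registered_frames: array-like of shape (T, H, W), depth in mm.
    Returns the (H, W) blurred estimate, i.e. the L1 minimiser per pixel.
    """
    stack = np.asarray(registered_frames, dtype=float)
    return np.median(stack, axis=0)

# Toy data (assumption): a flat plane at 1000 mm observed 9 times
# with Gaussian noise of 5 mm standard deviation.
rng = np.random.default_rng(0)
truth = np.full((4, 4), 1000.0)
frames = truth + rng.normal(0.0, 5.0, size=(9, 4, 4))
frames[3, 1, 2] = 0.0  # one invalid measurement in a single frame

h_hat = blurred_estimate(frames)
mean_estimate = frames.mean(axis=0)  # shown only for comparison
```

Because the median minimises the sum of absolute deviations, the single invalid measurement barely affects `h_hat`, whereas it pulls the temporal mean at that pixel far away from the true depth.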
However, as described above, the only considered motions are lateral motions using 2D dense optical flow. Radial displacements in the depth direction, often encountered in depth sequences, are therefore not handled. Moreover, the UP-SR registration step is based on a heavy cumulative motion estimation, which makes this algorithm not suitable for real-time applications.

3. Range Flow Approximation

We argue that the above mentioned challenges may be resolved by incorporating the 2.5D version of dense optical flow [20], known as range flow, in the UP-SR framework. The direct computation of range flow can be complex. Instead of its direct computation, we propose an approximation by decomposing range flow into 2D optical flow and a filtered radial motion.

3.1. Flow Decoupling

In order to address the problem of radial motions, it is important to consider the full 3D motion per pixel. At a time instant $t$, and for a pixel position $\mathbf{p}_t = (x_t, y_t)$ on the sensor image plane, the depth surface $F$ can be defined as the following mapping:

$$F : \mathbb{R}^2 \times \mathbb{N} \to \mathbb{R}^3, \quad \mathbf{p}_t \mapsto \left(x_t, y_t, z_t(x_t, y_t)\right). \tag{8}$$

The deformation of the surface $F$ from $(t_0 - 1)$ to $t_0$ takes the point $\mathbf{p}_{t_0-1}$ to a new position $\mathbf{p}_{t_0}$. Given $u_{t_0} = \frac{\partial x_{t_0}}{\partial t}$ and $v_{t_0} = \frac{\partial y_{t_0}}{\partial t}$, the vector $\mathbf{l} = (u_{t_0}, v_{t_0}, 1)^T$ represents the direction of the displacement from $\mathbf{p}_{t_0-1}$ to $\mathbf{p}_{t_0}$. The surface deformation may then be expressed through the derivative of $F$ following the direction $\mathbf{l}$, resulting in a range flow $(u_{t_0}, v_{t_0}, w_{t_0})$ where the lateral displacement is $\mathbf{m}_{t_0} = (u_{t_0}, v_{t_0})$ and the radial displacement in the depth direction is $w_{t_0} = \frac{\partial z_{t_0}}{\partial t}$. Applying the gradient constraint on the depth total derivative, we find the range flow constraint as first proposed in [20], and defined as follows:

$$u_{t_0}\frac{\partial z_{t_0}}{\partial x} + v_{t_0}\frac{\partial z_{t_0}}{\partial y} + w_{t_0} = \frac{dz_{t_0}}{dt}. \tag{9}$$

In this work we propose to decouple $\mathbf{m}_{t_0}$ from the radial displacement $w_{t_0}$. We compute $\mathbf{m}_{t_0}$ using available approaches for 2D optical flow estimation, applied to the low resolution 2D intensity images associated with the considered depth sensor. Note that the intensity (amplitude) images provided by the ToF camera cannot be used directly. Their values differ significantly depending on the integration time and the object distance from the camera. Thus, in order to guarantee an accurate registration, we apply a standardization step similar to the one proposed in [17] prior to motion estimation, see Figure 3. If the intensity images are not available (e.g., using synthetic data), the 2D optical flow can be directly estimated using the depth images after a preprocessing step with a bilateral filter. The bilateral filter is only used in the preprocessing step, while the original depth data is mapped in the registration step. We define the registered depth image from $(t_0 - 1)$ to $t_0$ as $z_{t_0-1}^{t_0}$. Consequently, the radial displacement $w_{t_0}$ may be approximated by the temporal difference between depth values, i.e.,

$$w_{t_0} \approx z_{t_0}(\mathbf{p}_{t_0}) - z_{t_0-1}^{t_0}(\mathbf{p}_{t_0}). \tag{10}$$

This first approximation of $w_{t_0}$ is an initial value that requires further refinement directly accounting for the system noise. We propose to do that using tracking with a Kalman filter, as detailed in Section 3.2.

3.2. Refinement by Filtering

Let us start by simplifying the notation as $z_t(\mathbf{p}_t) \triangleq z_t$.
Since, by definition, we have $z_{t-1}(\mathbf{p}_{t-1}) = z_{t-1}$, we may write $z_{t-1}^{t}(\mathbf{p}_t) \triangleq z_{t-1}$. We consider the following state vector:

$$\mathbf{s}_t = \begin{pmatrix} z_t \\ w_t \end{pmatrix}, \tag{11}$$

where both the depth measurement and the radial displacement are to be filtered.

Figure 3. Correcting the amplitude images using a standardization step [17]. We can see in (a) and (b) the original amplitude images for a dynamic scene containing a hand moving towards the camera, where the intensity (amplitude) values differ significantly depending on the object distance from the camera. The corrected amplitude images for the same scene are presented in (c) and (d), where the intensity consistency is preserved.

To apply the Kalman filter, one needs to introduce a Gaussian system, so a noisy depth observation may be modelled as

$$z_t = \mathbf{b}\,\mathbf{s}_t + n_t, \tag{12}$$

where the observation vector is the row vector $\mathbf{b} = (1, 0)$, and the observation noise $n_t$ is Gaussian with variance $\sigma_n^2$, i.e., $n_t \sim \mathcal{N}(0, \sigma_n^2)$. We assume a constant velocity model with an acceleration $\gamma_t$ following a Gaussian distribution, $\gamma_t \sim \mathcal{N}(0, \sigma_a^2)$. The dynamic model is then defined as

$$\begin{cases} z_t = z_{t-1} + w_{t-1}\,\Delta t + \frac{1}{2}\gamma_t\,\Delta t^2, \\ w_t = w_{t-1} + \gamma_t\,\Delta t, \end{cases} \tag{13}$$

which can be rewritten as:

$$\mathbf{s}_t = \mathbf{K}\,\mathbf{s}_{t-1} + \boldsymbol{\gamma}_t, \tag{14}$$

where $\mathbf{K} = \begin{pmatrix} 1 & \Delta t \\ 0 & 1 \end{pmatrix}$, and $\boldsymbol{\gamma}_t = \gamma_t \begin{pmatrix} \Delta t^2/2 \\ \Delta t \end{pmatrix}$ is the process error, which is white Gaussian with the covariance

$$\mathbf{Q} = \sigma_a^2\,\Delta t^2 \begin{pmatrix} \Delta t^2/4 & \Delta t/2 \\ \Delta t/2 & 1 \end{pmatrix}. \tag{15}$$

Using the standard Kalman equations, the prediction is achieved as

$$\begin{cases} \hat{\mathbf{s}}_{t|t-1} = \mathbf{K}\,\mathbf{s}_{t-1|t-1}, \\ \hat{\mathbf{P}}_{t|t-1} = \mathbf{K}\,\mathbf{P}_{t-1|t-1}\,\mathbf{K}^T + \mathbf{Q}. \end{cases} \tag{16}$$

The error in the prediction $\hat{\mathbf{s}}_{t|t-1}$ is corrected using the observed measurement $z_t$. This error is considered as the difference between the prediction and the observation, and is weighted using the Kalman gain matrix $\mathbf{G}_t$, which is calculated as follows:

$$\mathbf{G}_t = \hat{\mathbf{P}}_{t|t-1}\,\mathbf{b}^T \left(\mathbf{b}\,\hat{\mathbf{P}}_{t|t-1}\,\mathbf{b}^T + \sigma_n^2\right)^{-1}. \tag{17}$$

The corrected state vector $\mathbf{s}_{t|t} = \begin{pmatrix} z_{t|t} \\ w_{t|t} \end{pmatrix}$ and the corrected error covariance matrix $\mathbf{P}_{t|t}$ are computed as follows:

$$\begin{cases} \mathbf{s}_{t|t} = \hat{\mathbf{s}}_{t|t-1} + \mathbf{G}_t\left(z_t - \mathbf{b}\,\hat{\mathbf{s}}_{t|t-1}\right), \\ \mathbf{P}_{t|t} = \hat{\mathbf{P}}_{t|t-1} - \mathbf{G}_t\,\mathbf{b}\,\hat{\mathbf{P}}_{t|t-1}. \end{cases} \tag{18}$$

This per-pixel filtering is extended to the whole depth frame and incorporated in the SR framework in Section 4.

4. Proposed Recursive Depth Video Super-Resolution

In what follows, we define a recursive multi-frame super-resolution algorithm by incorporating the Kalman filtering framework of Section 3.2 into the dynamic depth video SR problem. In addition to handling radial motions, and in order to properly preserve non-rigidity, we propose to recursively filter each pixel trajectory separately by assuming that all trajectories are independent. This assumption requires a corrective step to bring back the correlation between neighbouring pixels from the original depth surface $F$. To that end, we use a maximum a posteriori (MAP) estimation where we propose a multi-level iterative bilateral total variation (TV) regularization. The advantage of the per-pixel processing is to keep the exact same formulation as in Section 3.2; hence, all the required matrix inversions will be for $(2 \times 2)$ matrices. The burden of traditional Kalman filter based SR as in [8] will consequently be avoided.

For a recursive multi-frame SR algorithm, instead of using the whole video sequence of length $N$ to recover one frame, we use the preceding recovered frame $\hat{\mathbf{f}}_{t-1}$ to estimate $\mathbf{f}_t$ from the current upsampled observation $\mathbf{g}_t{\uparrow}$. Similarly to the UP-SR algorithm, we estimate $\mathbf{f}_t$ in two steps: first, finding a blurred version $\hat{\mathbf{h}}_t$ as the result of the Kalman filtering, then a deblurred version $\hat{\mathbf{f}}_t$ as the result of the MAP iterative regularization.

4.1. Blurred Estimation

To extend the range flow approximation of Section 3 to a full frame, the point $\mathbf{p}_t$ is now considered as an element of a grid constituting a discrete sampling of $\mathbb{R}^2$. We thus end up with discrete positions $\mathbf{p}_t^i = (x_t^i, y_t^i)$ such that $i \in [1, n]$.
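As a rough illustration of the per-pixel filter of Section 3.2, the following NumPy sketch runs the predict and correct cycle of equations (16) to (18) on a single pixel trajectory. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `kalman_step` and `filter_trajectory`, the initial covariance, and the value chosen for $\sigma_a$ are all hypothetical. The toy trajectory borrows the set-up of Section 5.1 (50 mm of radial motion per frame, $\Delta t = 0.1$ s, $\sigma_n = 50$ mm).

```python
import numpy as np

def kalman_step(s, P, z_obs, dt, sigma_n, sigma_a):
    """One predict/correct cycle for the state s = (z, w) of one pixel."""
    K = np.array([[1.0, dt], [0.0, 1.0]])       # transition matrix of (14)
    b = np.array([[1.0, 0.0]])                  # row observation vector
    Q = sigma_a**2 * dt**2 * np.array([[dt**2 / 4.0, dt / 2.0],
                                       [dt / 2.0, 1.0]])   # covariance (15)
    # Prediction, equation (16).
    s_pred = K @ s
    P_pred = K @ P @ K.T + Q
    # Kalman gain, equation (17); the inverted quantity is a scalar.
    innovation_var = (b @ P_pred @ b.T).item() + sigma_n**2
    G = (P_pred @ b.T) / innovation_var         # shape (2, 1)
    # Correction, equation (18).
    residual = z_obs - (b @ s_pred).item()
    s_new = s_pred + G[:, 0] * residual
    P_new = P_pred - G @ b @ P_pred
    return s_new, P_new

def filter_trajectory(z_obs, dt, sigma_n, sigma_a):
    """Filter one depth trajectory; returns an array of (z, w) per frame."""
    s = np.array([z_obs[0], 0.0])               # assumed start: zero velocity
    P = np.diag([sigma_n**2, 1.0e6])            # assumed initial uncertainty
    states = [s]
    for z in z_obs[1:]:
        s, P = kalman_step(s, P, z, dt, sigma_n, sigma_a)
        states.append(s)
    return np.array(states)

# Toy trajectory (assumption): a pixel receding by 50 mm per frame.
dt = 0.1
true_depth = 1000.0 + 500.0 * dt * np.arange(20)
rng = np.random.default_rng(0)
noisy_depth = true_depth + rng.normal(0.0, 50.0, size=true_depth.shape)
track = filter_trajectory(noisy_depth, dt, sigma_n=50.0, sigma_a=1000.0)
filtered_depth, filtered_velocity = track[:, 0], track[:, 1]
```

Since $\mathbf{b}$ selects the depth component only, the term inverted in (17) is a scalar, so each pixel costs a handful of $(2 \times 2)$ products per frame. Running one such filter per pixel is what makes the recursive formulation cheap.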
We define the depth image at $t$ as the column vector of all the blurred depth values $z_t(\mathbf{p}_t^i)$, and write $\mathbf{h}_t = [z_t(\mathbf{p}_t^i)]$, for all $i$. The obtained motion vectors are further scaled using the SR factor $r$. The scaled motion vectors are then used in order to register the depth images $\hat{\mathbf{f}}_{t-1}$ and $\mathbf{g}_t{\uparrow}$, resulting in $\mathbf{f}_{t-1}^{t}$. The registration step reorders the pixels in order to have a correspondence that enables a direct pixel-wise filtering over time. Moreover, to apply the Kalman filter of Section 3.2, one needs to define a Gaussian system similar to the one defined by (12) and (14). The observation model in (12) is applicable to the SR data model in (6) under the assumption of a zero mean additive white Gaussian noise. The dynamic model in (14) is actually equivalent to the model in (3), and one can prove that the innovation is related to the depth displacement $w_{t-1}^i$ and the acceleration uncertainty $\gamma_t^i$ of the pixel $\mathbf{p}_t^i$ by the following equation:

$$\boldsymbol{\delta}_t(i) = w_{t-1}^i\,\Delta t + \frac{1}{2}\gamma_t^i\,(\Delta t)^2. \tag{19}$$

The result of the $n$ joint filters run in parallel is the blurred depth image estimate $\hat{\mathbf{h}}_t$. Furthermore, in order to separate background from foreground depth pixels, and to tackle the problem of flying pixels, especially around edges, we define a fixed threshold $\tau$ such that:

$$\begin{cases} \text{Continue the track} & \text{if } |z_t - \hat{z}_{t|t-1}| < \tau, \\ \text{New track and spatial median} & \text{if } |z_t - \hat{z}_{t|t-1}| \geq \tau. \end{cases}$$

The choice of the threshold value $\tau$ is related to the type of the depth sensor used and the level of the sensor-specific noise. In order to correct the artifacts due to this one-dimensional processing of an image, we propose a multi-level iterative bilateral TV deblurring step as described in the next section.

4.2. Multi-Level Iterative Bilateral TV Deblurring

In order to estimate the deblurred high resolution depth image $\mathbf{f}_t$ from $\hat{\mathbf{h}}_t$, we apply the following MAP deblurring framework:

$$\hat{\mathbf{f}}_t = \arg\min_{\mathbf{f}_t} \left( \left\|\mathbf{B}\mathbf{f}_t - \hat{\mathbf{h}}_t\right\|_1 + \lambda\,\Gamma(\mathbf{f}_t) \right), \tag{20}$$

where $\lambda$ is the regularization parameter, and $\mathbf{B}$ is the blurring matrix. We choose to use a bilateral TV regularizer [11] defined as:

$$\Gamma(\mathbf{f}_t) = \sum_{i=-I}^{I} \sum_{j=-J}^{J} \alpha^{|i|+|j|} \left\|\mathbf{f}_t - \mathbf{S}_x^i \mathbf{S}_y^j \mathbf{f}_t\right\|_1. \tag{21}$$

The matrices $\mathbf{S}_x^i$ and $\mathbf{S}_y^j$ are shifting matrices which shift $\mathbf{f}_t$ by $i$ and $j$ pixels in the horizontal and vertical directions, respectively. The scalar $\alpha \in\, ]0, 1]$ is the base of the exponential kernel which controls the speed of decay [3]. In order to effectively deblur $\hat{\mathbf{h}}_t$ while keeping the details of $\mathbf{f}_t$ without over-smoothing, we apply the MAP estimation in (20), where we propose to use a multi-level version in a similar fashion as in [14, 19, 6]. Combined with a steepest descent numerical solver, the proposed solution is described by the following pseudo code:

for $l = 1, \dots, L$
    for $k = 1, \dots, K$
        $\hat{\mathbf{f}}_t^{k,l} = \hat{\mathbf{f}}_t^{(k-1),l} - \beta \Big\{ \mathbf{B}^T \operatorname{sign}\big(\mathbf{B}\hat{\mathbf{f}}_t^{(k-1),l} - \hat{\mathbf{h}}_t\big) + \frac{\lambda}{2^{l}} \sum_{i=-I}^{I} \sum_{j=-J}^{J} \alpha^{|i|+|j|} \big[\mathbf{I} - \mathbf{S}_y^{-j}\mathbf{S}_x^{-i}\big] \operatorname{sign}\big(\hat{\mathbf{f}}_t^{(k-1),l} - \mathbf{S}_x^{i}\mathbf{S}_y^{j}\hat{\mathbf{f}}_t^{(k-1),l}\big) \Big\}$
    end for
    $\hat{\mathbf{h}}_t \leftarrow \hat{\mathbf{f}}_t^{K,l}$
end for

The parameter $\beta$ is a scalar which represents the step size in the direction of the gradient, $\mathbf{I}$ is the identity matrix, and $\operatorname{sign}(\cdot)$ is the sign function. In our experiments, we used three levels with $L = 3$, and seven iterations per level with $K = 7$.

5. Experimental Results

In this section, we evaluate the performance of the proposed algorithm using: (i) synthetic depth videos, and (ii) real depth videos of dynamic scenes captured by a ToF camera (pmd CamBoard nano). We show the effectiveness of the proposed algorithm as compared to state-of-the-art methods, for which we provide quantitative and qualitative evaluations.

5.1. Synthetic Data

In order to provide a quantitative evaluation, we first start with a simple and fully controlled set-up. We use a generated sequence of 20 depth frames of a synthetic hand moving radially with respect to the camera (5 cm difference between each two successive frames, and $\Delta t = 0.1$ seconds). We downsample the sequence with a scale factor of $r = 2$ and $r = 4$. These sequences are further degraded with additive noise with $\sigma$ varying from 10 to 80 mm. The created LR noisy depth sequences are then super-resolved using the proposed algorithm with scale factors of $r = 1$, $r = 2$, and $r = 4$. In the simple case where $r = 1$, the SR problem is merely a denoising one. In other words, the objective is not to increase resolution, and hence there is no blur due to upsampling. In contrast, by increasing the SR factor $r$, more blurring effects occur, leading to a higher 3D error in the final reconstructed HR scene (see Figure 4). In order to evaluate the quality of the filtered depth data and the filtered velocity, we randomly choose one pixel $\mathbf{p}_t$ and track its filtered depth value $z_t$ and its filtered velocity $\partial z_t / \partial t$ through the super-resolved sequence. We do the same for all SR factors.
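The degradation used to create the LR noisy test sequences can be sketched in a few lines of NumPy. This is only a sketch under stated assumptions: the paper does not say how the downsampling is implemented, so plain decimation (keeping every $r$-th pixel) is assumed, the noise is assumed Gaussian, and a flat plane stands in for the synthetic hand. The function name `degrade` and the frame size are hypothetical.

```python
import numpy as np

def degrade(hr_frames, r, sigma_mm, rng):
    """Simulate LR noisy depth frames from an HR sequence of shape (T, H, W).

    Assumptions: decimation by r in both image directions and additive
    zero-mean Gaussian noise with standard deviation sigma_mm (in mm).
    """
    lr = hr_frames[:, ::r, ::r]
    return lr + rng.normal(0.0, sigma_mm, size=lr.shape)

# Stand-in scene (assumption): a plane approaching the camera by 50 mm per
# frame over 20 frames, in place of the synthetic hand of the experiment.
n_frames, height, width = 20, 120, 160
depths_mm = 2000.0 - 50.0 * np.arange(n_frames)
hr_frames = np.broadcast_to(depths_mm[:, None, None],
                            (n_frames, height, width)).copy()

rng = np.random.default_rng(0)
lr_frames = degrade(hr_frames, r=4, sigma_mm=50.0, rng=rng)
```

Sweeping `sigma_mm` from 10 to 80 and `r` over 1, 2 and 4 would reproduce the grid of conditions described above, up to the assumptions made here about the downsampling operator and the noise distribution.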
In Figure 5, we report the tracking results of the randomly chosen pixels from the super-resolved sequences with $r = 1$, $r = 2$, and $r = 4$, and a fixed noise level of $\sigma = 50$ mm. We can see how the depth values are filtered (blue lines) as compared to the noisy depth measurements (red lines) for all scale factors, as shown in Figure 5 (a), (b), and (c). Similar behaviour is observed for the corresponding filtered velocities in Figure 5 (d), (e), and (f).

Figure 4. 3D RMSE in mm of the super-resolved hand sequence using the proposed method with different SR scale factors. Increasing the SR factor leads to a higher 3D reconstruction error. This is due to the blurring effects of the upsampling process and the lower resolution of the LR depth sequence used, as compared to the one used with r = 1.

Figure 5. Tracking results for different depth values randomly chosen from the super-resolved sequences with different SR scale factors r = 1, r = 2, and r = 4, are plotted in (a), (b), and (c), respectively. The corresponding filtered depth displacements are shown in (d), (e), and (f), respectively.

5.2. Publicly Available Data

We tested the proposed method using a complex scene with a highly non-rigidly moving object. We use the publicly available Samba [1] data. This dataset provides a real sequence of a full 3D dynamic dancing lady scene with high resolution ground truth. This sequence is quite complex, as it contains both non-rigid radial motions and self-occlusions, represented by hand and leg movements, respectively. We use the publicly available toolbox V-REP [2] to create from the Samba data a synthetic depth sequence with fully known ground truth. We choose to fix a depth camera at a distance of 2 meters from the 3D scene. Its resolution is $1024^2$ pixels. The camera is used to capture the depth sequence. Then, similarly to the previous set-up, we downsample the obtained depth sequence with $r = 4$ and further degrade it with additive noise with standard deviation $\sigma$ varying from 0 to 50 mm. The created LR noisy depth sequence is then super-resolved using state-of-the-art methods, namely the conventional bicubic interpolation, UP-SR [4], and SISR [5], as well as the proposed algorithm. To measure the accuracy of each method, we back-project the reconstructed HR depth images to the 3D world using the camera matrix. Then, we calculate the 3D RMSE of each back-projected 3D point cloud as compared to the 3D ground truth. Table 1 shows the 3D reconstruction error of the bicubic, UP-SR [4], and SISR [5] methods as compared to the proposed method for different noise levels. The comparison is done at two levels: (i) different parts of the reconstructed 3D body, namely the hand, the torso, and the leg, and (ii) the full reconstructed 3D body.

Table 1. 3D RMSE in mm for the super-resolved dancing girl sequence using different SR methods. These methods are applied on LR noisy depth sequences with two noise levels. The super-resolution scale factor for this experiment is r = 4.

            σ = 25 mm                          σ = 50 mm
Method      Hand   Torso  Leg   Full body     Hand   Torso  Leg    Full body
Bicubic     10.5   7.5    8.9   8.8           25.2   14.9   13.1   16.5
SISR        9.0    5.6    8.4   6.6           14.1   6.9    9.6    9.7
UP-SR       22.2   15.6   9.3   15.9          29.7   17.4   12.8   23.5
Proposed    9.6    3.6    7.5   6.3           9.9    4.8    8.1    9.5

Figure 6. 3D plotting of one super-resolved depth frame with r = 4. (a) 3D plotting of one LR depth frame. (b) Bicubic interpolation. (c) Patch based single image SR (SISR) [5]. (d) UP-SR [4]. (e) Our new proposed algorithm. (f) 3D ground truth.

As expected, by applying the conventional bicubic interpolation method directly on depth images, a large error is obtained. This error is mainly due to the flying pixels around object boundaries. Thus, we run another round of experiments using a modified bicubic interpolation, where we remove all flying pixels by defining a fixed threshold. Yet, the 3D reconstruction error is still relatively high across all noise levels, see Table 1. This is due to the fact that bicubic interpolation does not profit from the temporal information provided by the sequence. We observe in Table 1 that the proposed method provides, most of the time, better results as compared to state-of-the-art algorithms. In order to visually evaluate the performance of the proposed algorithm, we plot the super-resolved results of the dancing girl sequence in 3D. We show the results for the sequence at the noise level of $\sigma = 30$ mm. We note that the proposed algorithm outperforms state-of-the-art methods by keeping the fine details (e.g., the details of the face), as can be seen in Figure 6 (e). Note that the UP-SR algorithm fails in the presence of radial movements and self-occlusions, see red boxes in Figure 6 (d). In contrast, the SISR algorithm can handle these cases, but cannot keep the fine details due to its patch based nature, see Figure 6 (c). In addition, a heavy training phase is required.

5.3. Real Data

Finally, we tested the proposed algorithm on a real sequence captured with a ToF camera (pmd CamBoard nano). The captured LR depth sequence contains a non-rigidly moving face. Samples of the LR captured frames are plotted in the first and second rows of Figure 7. We super-resolve this sequence using the proposed algorithm with an SR scale factor of $r = 4$. Obtained results are given in 3D in the third and fourth rows of Figure 7. They show the effectiveness of the proposed algorithm in reducing the noise, and further increasing the resolution of the reconstructed 3D face under large non-rigid deformations. To visually appreciate these results as compared to state-of-the-art methods, we tested the bicubic, UP-SR, and SISR methods on the same LR real depth sequence. Obtained results show the superiority of the proposed algorithm as compared to the other methods, see Figure 1. In Figure 8, we plot the filtered depth value of a randomly chosen tracked pixel.
The blue line shows the filtered trajectory of this pixel as compared to its raw noisy measurement in red. The algorithm's run time on this sequence is 50 ms per frame on a 2.2 GHz i7 processor with 4 gigabytes of RAM.

Figure 7. Results of applying the proposed algorithm on a real sequence captured by an LR ToF camera (120 × 160 pixels) of a non-rigidly moving face. First and second rows contain a 3D plotting of selected LR captured frames. Third and fourth rows contain the 3D plotting of the super-resolved depth frames with r = 4.

Figure 8. Filtered depth value profile of a tracked pixel through the super-resolved sequence of a real face, with an SR scale factor of 4.

6. Conclusion

A new real-time dynamic multi-frame super-resolution algorithm for depth videos has been proposed. It has been shown to be effective in enhancing the resolution and the quality of low resolution dynamic scenes with highly non-rigidly moving objects. Obtained results show the robustness of the proposed algorithm against radial motions. This is handled by first estimating the depth displacement, and simultaneously correcting the depth measurement by Kalman filtering. For the sake of real-time processing, the proposed algorithm is based on per-pixel temporal processing of the depth video sequence, such that multiple one-dimensional signals are filtered separately. Each filtered depth frame is further refined using a multi-level iterative bilateral total variation regularization after filtering and before proceeding to the next frame in the sequence. In the case of self-occlusions, the proposed algorithm needs a few depth measurements before converging, which is not suitable for some applications. Our future work will focus on increasing robustness to self-occlusions.

References

[1] http://people.csail.mit.edu/drdaniel/mesh_animation/.
[2] http://www.k-team.com/mobile-robotics-products/v-rep.
[3] K. Al Ismaeil, D. Aouada, B. Mirbach, and B. Ottersten. Bilateral filter evaluation based on exponential kernels. In Pattern Recognition (ICPR), 2012 20th IEEE International Conference on, pages 258-261, Nov 2012.
[4] K. Al Ismaeil, D. Aouada, B. Mirbach, and B. Ottersten. Dynamic super resolution of depth sequences with non-rigid motions. In Image Processing (ICIP), 2013 20th IEEE International Conference on, pages 660-664, Sept 2013.
[5] O. M. Aodha, N. Campbell, A. Nair, and G. Brostow. Patch based synthesis for single depth image super-resolution, 2012.
[6] M. Charest, M. Elad, and P. Milanfar. A general iterative regularization framework for image denoising. In Information Sciences and Systems, 2006 40th Annual Conference on, pages 452-457, March 2006.
[7] Y. Cui, S. Schuon, S. Thrun, D. Stricker, and C. Theobalt. Algorithms for 3d shape scanning with a depth camera. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 35(5):1039-1050, May 2013.
[8] M. Elad and A. Feuer. Super-resolution reconstruction of image sequences. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 21(9):817-834, Sep 1999.
[9] M. Elad and A. Feuer. Superresolution restoration of an image sequence: adaptive filtering approach. Image Processing, IEEE Transactions on, 8(3):387-395, Mar 1999.
[10] S. Farsiu, M. Elad, and P. Milanfar. Video-to-video dynamic super-resolution for grayscale and color sequences. EURASIP J. Appl. Signal Process., 2006:232-232, Jan. 2006.
[11] S. Farsiu, M. Robinson, M. Elad, and P. Milanfar. Fast and robust multiframe super resolution. Image Processing, IEEE Transactions on, 13(10):1327-1344, Oct 2004.
[12] J. Li, Z. Lu, G. Zeng, R. Gan, and H. Zha. Similarity-aware patchwork assembly for depth image super-resolution. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 3374-3381, June 2014.
[13] C. B. Newland, D. A. Gray, and D. Gibbins. Modified kalman filtering for image super-resolution: Experimental convergence results. In Proceedings of the Ninth IASTED International Conference on Signal and Image Processing, SIP 07, pages 58-63, Anaheim, CA, USA, 2007. ACTA Press.
[14] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin. An iterative regularization method for total variation-based image restoration. Simul, 4:460-489, 2005.
[15] V. Patanavijit, S. Tae-O-Sot, and S. Jitapunkul. A robust iterative super-resolution reconstruction of image sequences using a lorentzian bayesian approach with fast affine block-based registration. In Image Processing, 2007. ICIP 2007. IEEE International Conference on, volume 5, pages V-393 to V-396, Sept 2007.
[16] S. Schuon, C. Theobalt, J. Davis, and S. Thrun. Lidarboost: Depth superresolution for tof 3d shape scanning. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 343-350, June 2009.
[17] M. Sturmer, J. Penne, and J. Hornegger. Standardization of intensity-values acquired by time-of-flight-cameras. In Computer Vision and Pattern Recognition, 2008. CVPRW 2008. IEEE Workshop on, pages 660-664, Sept 2013.
[18] J. Tian and K.-K. Ma. A new state-space approach for superresolution image sequence reconstruction. In Image Processing, 2005. ICIP 2005. IEEE International Conference on, volume 1, pages I-881-4, Sept 2005.
[19] Q. L. Q. S. S. X. Wenshu Li1, Chao Zhao1. A parameter-adaptive iterative regularization model for image denoising, 2012.
[20] M. Yamamoto, P. Boulanger, J.-A. Beraldin, and M. Rioux. Direct estimation of range flow on deformable shape from a video rate range camera. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 15(1):82-89, Jan 1993.