arXiv: v1 [cs.CV] 25 Apr 2017


SfM-Net: Learning of Structure and Motion from Video

Sudheendra Vijayanarasimhan, Susanna Ricco, Cordelia Schmid, Rahul Sukthankar, Katerina Fragkiadaki
Google Research; Inria, Grenoble, France; Carnegie Mellon University

Abstract

We propose SfM-Net, a geometry-aware neural network for motion estimation in videos that decomposes frame-to-frame pixel motion in terms of scene and object depth, camera motion and 3D object rotations and translations. Given a sequence of frames, SfM-Net predicts depth, segmentation, camera and rigid object motions, converts those into a dense frame-to-frame motion field (optical flow), differentiably warps frames in time to match pixels and back-propagates. The model can be trained with various degrees of supervision: 1) self-supervised by the reprojection photometric error (completely unsupervised), 2) supervised by ego-motion (camera motion), or 3) supervised by depth (e.g., as provided by RGBD sensors). SfM-Net extracts meaningful depth estimates and successfully estimates frame-to-frame camera rotations and translations. It often successfully segments the moving objects in the scene, even though such supervision is never provided.

[Figure 1 diagram labels: input frames; CNN (weight sharing); depth; masks; flow; photometric error; forward-backward constraints; estimated depth, camera motion, object motion and segmentation.]

Figure 1. SfM-Net: Given a pair of frames as input, our model decomposes frame-to-frame pixel motion into 3D scene depth, 3D camera rotation and translation, a set of motion masks and corresponding 3D rigid rotations and translations. It backprojects the resulting 3D scene flow into 2D optical flow and warps accordingly to match pixels from one frame to the next. Forward-backward consistency checks constrain the estimated depth.

1. Introduction

We propose SfM-Net, a neural network that is trained to extract 3D structure, ego-motion, segmentation, object rotations and translations in an end-to-end fashion in videos, by exploiting the geometry of image formation. Given a pair of frames and camera intrinsics, SfM-Net, depicted in Figure 1, computes depth, 3D camera motion, a set of 3D rotations and translations for the dynamic objects in the scene, and corresponding pixel assignment masks. Those in turn provide a geometrically meaningful motion field (optical flow) that is used to differentiably warp each frame to the next. Pixel matching across consecutive frames, constrained by forward-backward consistency on the computed motion and 3D structure, provides gradients during training in the case of self-supervision. SfM-Net can take advantage of varying levels of supervision, as demonstrated in our experiments: completely unsupervised (self-supervised), supervised by camera motion, or supervised by depth (from Kinect).

SfM-Net is inspired by works that impose geometric constraints on optical flow, exploiting rigidity of the visual scene, such as early low-parametric optical flow methods [2, 19, 23] or the so-called direct methods for visual SLAM (Simultaneous Localization and Mapping) that perform dense pixel matching from frame to frame while estimating a camera trajectory and depth of the pixels in the scene [9, 26]. In contrast to those, instead of optimizing directly over optical flow vectors, 3D point coordinates or camera rotation and translation, our model optimizes over neural network weights that, given a pair of frames, produce such 3D structure and motion. In this way, our method learns to estimate structure and motion, and can in principle improve as it processes more videos, in contrast to non-learning based alternatives. It can thus be made robust to lack of texture, degenerate camera motion trajectories or dynamic objects (our model explicitly accounts for those), by providing appropriate supervision. Our work is also inspired by and builds upon recent works on learning geometrically interpretable optical flow fields for point cloud prediction in time [5] and backpropagating through camera projection for 3D human pose estimation [33] or single-view depth estimation [11, 35].

In summary, our contributions are:

• A method for self-supervised learning in videos in-the-wild, through explicit modeling of the geometry of scene motion and image formation.
• A deep network that predicts pixel-wise depth from a single frame along with camera motion, object motion, and object masks directly from a pair of frames.
• Forward-backward constraints for learning a consistent 3D structure from frame to frame and better exploiting self-supervision, extending left-right consistency constraints of [13].

We show results of our approach on KITTI [12, 21], MoSeg [4], and RGB-D SLAM [27] benchmarks under different levels of supervision. SfM-Net learns to predict structure, object, and camera motion by training on realistic video sequences using limited ground-truth annotations.

2. Related work

Back-propagating through warps and camera projection. Differentiable warping [16] has been used to learn end-to-end unsupervised optical flow [34], disparity flow in a stereo rig [13] and video prediction [24]. The closest previous works to ours are SE3-Nets [5], 3D image interpreter [33], and Garg et al.'s depth CNN [11]. SE3-Nets [5] use an actuation force from a robot and an input point cloud to forecast a set of 3D rigid object motions (rotation and translations) and corresponding pixel motion assignment masks under a static camera assumption. Our work uses a similar representation of pixel motion masks and 3D motions to capture the dynamic objects in the scene.
However, our work differs in that 1) we predict depth and camera motion while SE3-Nets operate on given point clouds and assume no camera motion, 2) SE3-Nets are supervised with pre-recorded 3D optical flow, while this work admits diverse and much weaker supervision, as well as complete lack of supervision, 3) SE3-Nets consider one frame and an action as input to predict the future motion, while our model uses pairs of frames as input to estimate the inter-frame motion, and 4) SE3-Nets are applied to toy or lab-like setups whereas we show results on real videos.

Wu et al. [33] learn 3D sparse landmark positions of chairs and human body joints from a single image by computing a simplified camera model and minimizing a camera re-projection error of the landmark positions. They use synthetic data to pre-train the 2D to 3D mapping of their network. Our work considers dense structure estimation and uses videos to obtain the necessary self-supervision, instead of static images.

Garg et al. [11] also predict depth from a single image, supervised by photometric error. However, they do not infer camera motion or object motion, instead requiring stereo pairs with known baseline during training. Concurrent work to ours [35] removes the constraint that the ground-truth pose of the camera be known at training time, and instead estimates the camera motion between frames using another neural network. Our approach tackles the more challenging problem of simultaneously estimating both camera and object motion.

Geometry-aware motion estimation. Motion estimation methods that exploit rigidity of the video scene and the geometry of image formation to impose constraints on optical flow fields have a long history in computer vision [2, 3, 19]. Instead of non-parametric dense flow fields [14], researchers have proposed affine or projective transformations that better exploit the low dimensionality of rigid object motion [23]. When depth information is available, motions are rigid rotations and translations [15].
Similarly, direct methods for visual SLAM having RGB [26] or RGBD [17] video as input perform dense pixel matching from frame to frame while estimating a camera trajectory and depth of the pixels in the scene, with impressive 3D point cloud reconstructions. These works typically make a static world assumption, which makes them susceptible to the presence of moving objects in the scene. Instead, SfM-Net explicitly accounts for moving objects using motion masks and 3D translation and rotation prediction.

Learning-based motion estimation. Recent works [7, 20, 29] propose learning frame-to-frame motion fields with deep neural networks supervised with ground-truth motion obtained from simulation or synthetic movies. This enables efficient motion estimation that learns to deal with lack of texture using training examples rather than relying only on smoothness constraints of the motion field, as previous optimization methods do [28]. Instead of directly optimizing over unknown motion parameters, such approaches optimize neural network weights that allow motion prediction in the presence of ambiguities in the given pair of frames.

Unsupervised learning in videos. Video holds great potential for learning semantically meaningful visual representations under weak supervision. Recent works have explored this direction by using videos to propagate semantic labels in time using motion constraints [25], impose temporal coherence (slowness) on the learned visual features [32], predict temporal evolution [30], learn temporal instance-level associations [31], predict temporal ordering of video frames [22], etc. Most of those unsupervised methods are shown to be good pre-training mechanisms for object detection or classification, as done in [22, 30, 31]. In contrast and complementary to the works above, our model extracts fine-grained 3D structure and 3D motion from monocular videos with weak supervision, instead of semantic feature representations.

[Figure 2 diagram labels: MOTION NETWORK (input: pair of frames, 384x128x6; outputs: object masks, camera motion, object motion); STRUCTURE NETWORK (input: single frame; outputs: depth, point cloud); transformed point cloud; flow.]

Figure 2. SfM-Net architecture. For each pair of consecutive frames $I_t, I_{t+1}$, a conv/deconv sub-network predicts depth $d_t$ while another predicts a set of $K$ segmentation masks $m_t$. The coarsest feature maps of the motion-mask encoder are further decoded through fully connected layers towards 3D rotations and translations for the camera and the $K$ segmentations. The predicted depth is converted into a per-frame point cloud using estimated or known camera intrinsics. Then, it is transformed according to the predicted 3D scene flow, as composed by the 3D camera motion and independent 3D mask motions. Transformed 3D depth is projected back to the 2D next frame, and thus provides corresponding 2D optical flow fields. Differentiable backward warping maps frame $I_{t+1}$ to $I_t$, and gradients are computed based on pixel errors. Forward-backward constraints are imposed by repeating this process for the inverted frame pair $I_{t+1}, I_t$ and constraining the depths $d_t$ and $d_{t+1}$ to be consistent through the estimated scene motion.

3. Learning SfM

3.1. SfM-Net architecture

Our model is shown in Figure 2. Given frames $I_t, I_{t+1} \in \mathbb{R}^{w \times h}$, we predict frame depth $d_t \in [0, \infty)^{w \times h}$, camera rotation and translation $\{R_t^c, t_t^c\} \in SE3$, and a set of $K$ motion masks $m_t^k \in [0, 1]^{w \times h}$, $k \in 1, \ldots, K$ that denote membership of each pixel to $K$ corresponding rigid object motions $\{R_t^k, t_t^k\} \in SE3$, $k \in \{1, \ldots, K\}$.
Note that a pixel may be assigned to none of the motion masks, denoting that it is a background pixel and part of the static world. Using the above estimates, optical flow is computed by first generating the 3D point cloud corresponding to the image pixels using the depth map and camera intrinsics, transforming the point cloud based on camera and object rigid transformations, and back-projecting the transformed 3D coordinates to the image plane. Then, given the optical flow field between initial and projected pixel coordinates, differentiable backward warping is used to map frame $I_{t+1}$ to $I_t$. Forward-backward constraints are imposed by repeating this process from frame $I_{t+1}$ to $I_t$ and constraining the depths $d_t$ and $d_{t+1}$ to be consistent through the estimated scene motion. We provide details of each of these components below.

Depth and per-frame point clouds. We compute per-frame depth using a standard conv/deconv subnetwork operating on a single frame (the structure network in Figure 2). We use a RELU activation at our final layer, since depth values are non-negative. Given depth $d_t$, we obtain the 3D point cloud $\mathbf{X}_t^i = (X_t^i, Y_t^i, Z_t^i)$, $i \in 1, \ldots, w \times h$ corresponding to the pixels in the scene using a pinhole camera model. Let $(x_t^i, y_t^i)$ be the column and row positions of the $i$-th pixel in frame $I_t$ and let $(c_x, c_y, f)$ be the camera intrinsics, then

$$\mathbf{X}_t^i = \begin{bmatrix} X_t^i \\ Y_t^i \\ Z_t^i \end{bmatrix} = \frac{d_t^i}{f} \begin{bmatrix} \frac{x_t^i}{w} - c_x \\ \frac{y_t^i}{h} - c_y \\ f \end{bmatrix} \qquad (1)$$

where $d_t^i$ denotes the depth value of the $i$-th pixel. We use the camera intrinsics when available and revert to default values of (0.5, 0.5, 1.0) otherwise. Therefore, the predicted depth will only be correct up to a scalar multiplier.

Scene motion. We compute the motion of the camera and of independently moving objects in the scene using a conv/deconv subnetwork that operates on a pair of images (the motion network in Figure 2). We depth-concatenate the pair of frames and use a series of convolutional layers to produce an embedding layer. We use two fully-connected layers to predict the motion of the camera between the frames and a predefined number $K$ of rigid body motions that explain moving objects in the scene.

Let $\{R_t^c, t_t^c\} \in SE3$ denote the 3D rotation and translation of the camera from frame $I_t$ to frame $I_{t+1}$ (relative camera pose across consecutive frames). We represent $R_t^c$ using an Euler angle representation as $R_t^{cx}(\alpha) R_t^{cy}(\beta) R_t^{cz}(\gamma)$ where

$$R_t^{cx}(\alpha) = \begin{bmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad R_t^{cy}(\beta) = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix}, \quad R_t^{cz}(\gamma) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\gamma & -\sin\gamma \\ 0 & \sin\gamma & \cos\gamma \end{bmatrix},$$

and $\alpha, \beta, \gamma$ are the angles of rotation about the x, y, z-axes respectively. The fully-connected layers are used to predict translation parameters $t^c$, the pivot points of the camera rotation $p^c \in \mathbb{R}^3$ as in [5], and $\sin\alpha, \sin\beta, \sin\gamma$. These last three parameters are constrained to be in the interval $[-1, 1]$ by using RELU activation and the minimum function.

Let $\{R_t^k, t_t^k\} \in SE3$, $k \in \{1, \ldots, K\}$ denote the 3D rigid motions of up to $K$ objects in the scene. We use similar representations as for camera motion and predict parameters using fully-connected layers on top of the same embedding $E$. While camera motion is a global transformation applied to all the pixels in the scene, the object motion transforms are weighted by the predicted membership probability of each pixel to each rigid motion, $m_t^k \in [0, 1]^{(h \times w)}$, $k \in \{1, \ldots, K\}$. These masks are produced by feeding the embedding layer through a deconvolutional tower.
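Equation 1 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `backproject` and the nested-list depth format are hypothetical, and the default intrinsics are the (0.5, 0.5, 1.0) fallback values mentioned above.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject(depth: List[List[float]],
                cx: float = 0.5, cy: float = 0.5, f: float = 1.0) -> List[Point3D]:
    """Back-project a depth map (h rows, w columns) into a 3D point cloud.

    Follows Equation 1: pixel coordinates are divided by the image width
    and height before the principal point (cx, cy) is subtracted.
    """
    h = len(depth)
    w = len(depth[0])
    cloud = []
    for y in range(h):          # row position of the pixel
        for x in range(w):      # column position of the pixel
            d = depth[y][x]
            cloud.append((d / f * (x / w - cx),   # X
                          d / f * (y / h - cy),   # Y
                          d))                     # Z = (d / f) * f
    return cloud
```

Points are returned in row-major order, so the pixel at column `x` and row `y` ends up at index `y * w + x`.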
We use sigmoid activations at the last layer instead of softmax in order to allow each pixel to belong to any number of rigid body motions. When a pixel has zero activation across all $K$ maps it is assigned to the static background, whose motion is a function of the global camera motion alone. We allow a pixel to belong to multiple rigid body transforms in order to capture composition of motions, e.g., through kinematic chains, such as articulated bodies. Learning the required number of motions for a sequence is an interesting open problem. We found that we could fix $K = 3$ for all experiments presented here. Note that our method can learn to ignore unnecessary object motions in a sequence by assigning no pixels to the corresponding mask.

Optical flow. We obtain optical flow by first transforming the point cloud obtained in Equation 1 using the camera and object motion rigid body transformations, followed by projecting the 3D point onto the image plane using the camera intrinsics. In the following, we drop the pixel superscript $i$ from the 3D coordinates, since it is clear we are referring to the motion transformation of the $i$-th pixel of the $t$-th frame. We first apply the object transformations:

$$\mathbf{X}'_t = \mathbf{X}_t + \sum_{k=1}^{K} m_t^k(i) \left( R_t^k (\mathbf{X}_t - p_k) + t_t^k - \mathbf{X}_t \right).$$

We then apply the camera transformation:

$$\mathbf{X}''_t = R_t^c (\mathbf{X}'_t - p_t^c) + t_t^c.$$

Finally we obtain the row and column position of the pixel in the second frame $(x_{t+1}^i, y_{t+1}^i)$ by projecting the corresponding 3D point $\mathbf{X}''_t = (X''_t, Y''_t, Z''_t)$ back to the image plane as follows:

$$\begin{bmatrix} \frac{x_{t+1}^i}{w} \\ \frac{y_{t+1}^i}{h} \end{bmatrix} = \frac{f}{Z''_t} \begin{bmatrix} X''_t \\ Y''_t \end{bmatrix} + \begin{bmatrix} c_x \\ c_y \end{bmatrix}$$

The flow $U, V$ between the two frames at pixel $i$ is then

$$(U_t(i), V_t(i)) = (x_{t+1}^i - x_t^i,\; y_{t+1}^i - y_t^i).$$

3.2. Supervision

SfM-Net inverts the image formation and extracts depth, camera and object motions that gave rise to the observed temporal differences, similar to previous SfM works [1, 6]. Such inverse problems are ill-posed as many solutions of depth, camera and object motion can give rise to the same observed frame-to-frame pixel values.
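Both the flow construction of Section 3.1 and the ambiguity just mentioned can be checked numerically for a single pixel. The sketch below is a plain-Python illustration, not the authors' code: `rotate` and `pixel_flow` are hypothetical names, and each rigid motion is passed as an assumed `(R, p, t)` tuple of rotation matrix, pivot and translation.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def rotate(R: Mat3, v: Vec3) -> Vec3:
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(R[r][c] * v[c] for c in range(3)) for r in range(3))


def pixel_flow(x, y, d, w, h, masks, obj_motions, cam_motion,
               cx=0.5, cy=0.5, f=1.0):
    """Return the flow (U, V) of one pixel at column x, row y with depth d.

    masks:       K membership values m^k(i) in [0, 1] for this pixel
    obj_motions: K tuples (R, p, t) of rotation, pivot and translation
    cam_motion:  one tuple (R, p, t) for the camera
    """
    # Equation 1: back-project the pixel to a 3D point.
    X = (d / f * (x / w - cx), d / f * (y / h - cy), d)

    # Object motion: add the mask-weighted displacement of each rigid motion.
    Xp = list(X)
    for m, (R, p, t) in zip(masks, obj_motions):
        moved = rotate(R, tuple(X[j] - p[j] for j in range(3)))
        for j in range(3):
            Xp[j] += m * (moved[j] + t[j] - X[j])

    # Camera motion: a global transform applied to every pixel.
    R, p, t = cam_motion
    moved = rotate(R, tuple(Xp[j] - p[j] for j in range(3)))
    Xpp = tuple(moved[j] + t[j] for j in range(3))

    # Project back to the image plane and take the pixel displacement.
    x_next = w * (f * Xpp[0] / Xpp[2] + cx)
    y_next = h * (f * Xpp[1] / Xpp[2] + cy)
    return x_next - x, y_next - y
```

With an identity rotation, zero pivots, and no object motion, doubling both the depth and the camera translation leaves the flow unchanged. This is one concrete instance of several (depth, motion) solutions explaining the same pixel displacement.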
A learning-based solution, as opposed to direct optimization, has the advantage of learning to handle such ambiguities through partial supervision of their weights or appropriate pre-training, or simply because the same coefficients (network weights) need to explain a large abundance of video data consistently. We detail the various supervision modes below and explore a subset of them in the experimental section.

Self-Supervision. Given unconstrained video, without accompanying ground-truth structure or motion information, our model is trained to minimize the photometric error between the first frame and the second frame warped towards the first according to the predicted motion field, based on well-known brightness constancy assumptions [14]:

$$L_t^{color} = \frac{1}{w h} \sum_{x, y} \left\| I_t(x, y) - I_{t+1}(x', y') \right\|_1$$

where $x' = x + U_t(x, y)$ and $y' = y + V_t(x, y)$. We use the differentiable image warping proposed in spatial transformer networks [16] and compute the color constancy loss in a fully differentiable manner.

Supervising camera motion. If ground-truth camera pose trajectories are available, we can supervise our model by computing corresponding ground-truth camera rotation and translation $R_t^{c\text{-}GT}, t_t^{c\text{-}GT}$ from frame to frame, and constrain our camera motion predictions accordingly. Specifically, we compute the relative transformation between predicted and ground-truth camera motion $\{t_t^{err} = \mathrm{inv}(R_t^c)(t_t^{c\text{-}GT} - t_t^c),\; R_t^{err} = \mathrm{inv}(R_t^c) R_t^{c\text{-}GT}\}$ and minimize its rotation angle and translation norm [27]:

$$L_t^{c\text{-}trans} = \left\| t_t^{err} \right\|_2, \qquad L_t^{c\text{-}rot} = \arccos\left( \min\left( 1, \max\left( -1, \frac{\mathrm{trace}(R_t^{err}) - 1}{2} \right) \right) \right) \qquad (2)$$

Spatial smoothness priors. When our network is self-supervised, we add robust spatial smoothness penalties on the optical flow field, the depth, and the inferred motion maps, by penalizing the L1 norm of the gradients across adjacent pixels, as usually done in previous works [18]. For depth prediction, we penalize the norm of second order gradients in order to encourage not constant but rather smoothly changing depth values.

Forward-backward consistency constraints. We incorporate forward-backward consistency constraints between inferred scene depth in different frames as follows. Given inferred depth $d_t$ from frame pair $I_t, I_{t+1}$ and $d_{t+1}$ from frame pair $I_{t+1}, I_t$, we ask for those to be consistent under the inferred scene motion, that is:

$$L_t^{FB} = \frac{1}{w h} \sum_{x, y} \left| \left( d_t(x, y) + W_t(x, y) \right) - d_{t+1}\left( x + U_t(x, y),\; y + V_t(x, y) \right) \right|$$

where $W_t(x, y)$ is the Z component of the scene flow obtained from the point cloud transformation. Composing scene flow forward and backward across consecutive frames allows us to impose such forward-backward consistency cycles across gaps of more than one frame; however, we have not yet seen empirical gain from doing so.
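The self-supervised photometric loss above can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions, not the spatial transformer sampler of [16]: images are single-channel nested lists, out-of-range sample positions are clamped to the border (a choice the text does not specify), and the names `bilinear_sample` and `photometric_loss` are hypothetical.

```python
import math
from typing import List

Image = List[List[float]]


def bilinear_sample(img: Image, x: float, y: float) -> float:
    """Sample img at real-valued column x and row y, clamping to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom


def photometric_loss(frame_t: Image, frame_t1: Image, U: Image, V: Image) -> float:
    """Mean absolute difference between I_t and I_{t+1} warped back by (U, V)."""
    h, w = len(frame_t), len(frame_t[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            # Backward warping: look up I_{t+1} at (x + U, y + V).
            warped = bilinear_sample(frame_t1, x + U[y][x], y + V[y][x])
            total += abs(frame_t[y][x] - warped)
    return total / (w * h)
```

Because bilinear interpolation is piecewise linear in the sample position, the loss has usable gradients with respect to the flow, which is what lets the photometric error train the depth and motion predictions.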
Supervising depth. If depth is available on parts of the input image, such as with video sequences captured by a Kinect sensor, we can use depth supervision in the form of robust depth regression:

$$L_t^{depth} = \frac{1}{w h} \sum_{x, y} dmask_t^{GT}(x, y) \cdot \left\| d_t(x, y) - d_t^{GT}(x, y) \right\|_1,$$

where $dmask_t^{GT}$ denotes a binary image that signals presence of ground-truth depth.

Supervising optical flow and object motion. Ground-truth optical flow, object masks, or object motions require expensive human annotation on real videos. However, these signals are available in recent synthetic datasets [20]. In such cases, our model could be trained to minimize, for example, an L1 regression loss between predicted $\{U(x, y), V(x, y)\}$ and ground-truth $\{U^{GT}(x, y), V^{GT}(x, y)\}$ flow vectors.

3.3. Implementation details

Our depth-predicting structure and object-mask-predicting motion conv/deconv networks share similar architectures but use independent weights. Each consists of a series of 3×3 convolutional layers alternating between stride 1 and stride 2, followed by deconvolutional operations consisting of a depth-to-space upsampling, concatenation with corresponding feature maps from the convolutional portion, and a 3×3 convolutional layer. Batch normalization is applied to all convolutional layer outputs. The structure network takes a single frame as input, while the motion network takes a pair of frames. We predict depth values using a 1×1 convolutional layer on top of the image-sized feature map. We use RELU activations because depths are positive and a bias of 1 to prevent small depth values. The maximum predicted depth value is further clipped at 100 to prevent large gradients. We predict object masks from the image-sized feature map of the motion network using a 1×1 convolutional layer with sigmoid activations. To encourage sharp masks we multiply the logits of the masks by a parameter that is a function of the number of steps for which the network has been trained. The pivot variables are predicted as heat maps using a softmax function over all the locations in the image followed by a weighted average of the pixel locations.

4. Experimental results

The main contribution of SfM-Net is the ability to explicitly model both camera and object motion in a sequence, allowing us to train on unrestricted videos containing moving objects. To demonstrate this, we trained self-supervised networks (using zero ground-truth supervision) on the KITTI datasets [12, 21] and on the MoSeg dataset [4]. KITTI contains pairs of frames captured from a moving vehicle in which other independently moving vehicles are visible. MoSeg contains sequences with challenging object motion, including articulated motions from moving people and animals.

KITTI. Our first experiment validates that explicitly modeling object motion is necessary to effectively learn from unconstrained videos. We evaluate unsupervised depth prediction using our models on the KITTI 2012 and KITTI 2015 datasets, which contain close to 200 frame-sequence and stereo pairs. Due to the global scale ambiguity in monocular setups, we use the scale-invariant error metric (log RMSE) proposed in [8], which is defined as

$$E_{scale\text{-}inv} = \frac{1}{N} \sum_{x, y} \bar{d}(x, y)^2 - \frac{1}{N^2} \left( \sum_{x, y} \bar{d}(x, y) \right)^2,$$

where $N$ is the number of pixels and $\bar{d} = (\log(d) - \log(d^{GT}))$ denotes the difference between the log of ground-truth and predicted depth maps. We pre-train our unsupervised depth prediction models using adjacent frame pairs on the raw KITTI dataset, which contains 42,000 frames, and train and evaluate on KITTI 2012 and 2015, which have depth ground truth.

We compare with the results of Garg et al. [11], who use stereo pairs to estimate depth. Their approach assumes the camera pose between the frames is a known constant (stereo baseline) and optimizes the photometric error in order to estimate the depth. In contrast, our model considers a more challenging "in the wild" setting where we are only given sequences of frames from a video, and camera pose, depth and object motion are all estimated without any form of supervision. Garg et al. report a log RMSE of […] on a subset of the KITTI dataset. To compare with our approach on the full set we emulate the model of Garg et al. using our architecture by removing object masks from our network and using stereo pairs with photometric error. We also evaluate our full model on frame sequence pairs with camera motion estimation both with and without explicit object motion estimation.
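The scale-invariant error used for the KITTI evaluation can be sketched in plain Python. This is an illustration of the formula of [8] as written above, not the authors' evaluation code: the function name `scale_invariant_error` is hypothetical, and both inputs are assumed to be flat sequences of strictly positive depths restricted to pixels with valid ground truth.

```python
import math
from typing import Sequence


def scale_invariant_error(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Scale-invariant error in log space between predicted and true depths."""
    # Per-pixel difference of log depths.
    diffs = [math.log(p) - math.log(g) for p, g in zip(pred, gt)]
    n = len(diffs)
    mean_of_squares = sum(d * d for d in diffs) / n
    square_of_mean = (sum(diffs) / n) ** 2
    # Subtracting the squared mean removes any global log-scale offset.
    return mean_of_squares - square_of_mean
```

Multiplying every predicted depth by the same constant shifts all log differences equally and therefore leaves the error unchanged, which is why the metric suits monocular predictions that are only correct up to a scalar multiplier.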
Table 1 shows the log RMSE error between the ground-truth depth and the three approaches. When using stereo pairs we obtain a value of 0.31, which is on par with existing results on the KITTI benchmark (see [11]). When using frame sequence pairs instead of calibrated stereo pairs the problem becomes more difficult, as we must now infer the unknown camera and object motion between the two frames. As expected, the depth estimates learned in this scenario are less accurate, but performance is much worse when no motion masks are used. The gap between the two approaches is wider on the KITTI 2015 dataset, which contains more moving objects. This shows that it is important to account for moving objects when training on videos in the wild.

[Table 1 layout: rows are "with stereo pairs", "seq. with motion masks", "seq. without motion masks"; columns are log RMSE on KITTI 2012 and on KITTI 2015.]

Table 1. RMSE of log depth with respect to ground truth for our model with stereo pairs and with and without motion masks on sequences in the KITTI 2012 and 2015 datasets. When using stereo pairs the camera pose between the frames is fixed and the model is equivalent to the approach of Garg et al. [11]. Motion masks help improve the error on both datasets, but more so on the KITTI 2015 dataset, which contains more moving objects.

[Figure 3 columns: RGB frame; predicted depth (stereo pairs); predicted depth (sequence).]

Figure 3. Qualitative comparison of the estimated depth using our unsupervised model on sequences versus using stereo pairs in the KITTI 2012 benchmark. When using stereo pairs the camera pose between the pair is constant and hence the model is equivalent to the approach of Garg et al. [11]. For sequences, our model needs to additionally predict camera rotation and translation between the two frames. The first six rows show successful predictions even without camera pose information and the last two illustrate failure cases. The failure cases show that when there is no translation between the two frames depth estimation fails, whereas when using stereo pairs there is always a constant offset between the frames.
Figure 3 shows qualitative examples comparing the depth obtained when using stereo pairs with a fixed baseline and when using frame sequences without camera pose information. When there is large translation between the frames, depth estimation without camera pose information is as good as using stereo pairs. The failure cases in the last two rows show that the network did not learn to accurately predict depth for scenes where it saw little or no translation between the frames during training. This is not the case when using stereo pairs as there is always a constant offset between the frames. Using more data could help here because it increases the likelihood of generic scenes appearing in a sequence containing interesting camera motion.

Analysis of our failure cases suggests possible directions for improvement. Moving objects introduce significant occlusions, which should be handled carefully. Because our network has no direct supervision on object masks or object motion, it does not necessarily learn that object and camera motions should be different. These priors could be built into our loss or learned directly if some ground-truth masks or object motions are provided as explicit supervision.

Figure 4 provides qualitative examples of the predicted motion masks and flow fields along with the ground truth in the KITTI 2015 dataset. Often, the predicted motion masks are fairly close to the ground truth and help explain part of the motion in the scene. We notice that object masks tended to miss very small, distant moving objects. This may be due to the fact that these objects and their motions are too small to be separated from the background. The bottom two rows show cases where the predicted masks do not correspond to moving objects. In the first example, although the mask is not semantically meaningful, note that the estimated flow field is reasonable, with some mistakes in the region occluded by the moving car. In the second failure case, the moving car on the left is completely missed but the motion of the static background is well captured. This is a particularly difficult example for the self-supervised photometric loss because the moving object appears in heavy shadow.

[Figure 4 columns: predicted motion masks; ground truth mask; predicted flow; ground truth flow.]

Figure 4. Ground truth segmentation and flow compared to predicted motion masks and flow from SfM-Net in KITTI 2015. The model was trained in a fully unsupervised manner. The top six rows show successful prediction and the last two show typical failure cases.

MoSeg.
The moving objects in KITTI are primarily vehicles, which undergo rigid-body transformations, making it a good match for our model. To verify that our network can still learn in the presence of non-rigid motion, we retrained it from scratch under self-supervision on the MoSeg dataset, using frames from all sequences. Because each motion mask corresponds to a rigid 3D rotation and translation, we do not expect a single motion mask to capture a deformable object. Instead, different rigidly moving object parts will be assigned to different masks. This is not a problem from the perspective of accurate camera motion estimation, where the important issue is distinguishing pixels whose motion is caused by the camera pose transformation directly from those whose motion is affected by independent object motions in the scene.

Qualitative results on sampled frames from the dataset are shown in Fig. 5. Because MoSeg only contains ground-truth annotations for segmentation, we cannot quantitatively evaluate the estimated depth, camera trajectories, or optical flow fields. However, we did evaluate the quality of the object motion masks by computing Intersection over Union (IoU) for each ground-truth segmentation mask against the best matching motion mask and its complement (a total of six proposed segments in each frame, two from each of the three motion masks), averaging across frames and ground-truth objects. We obtain an IoU of 0.29, which is similar to previous unsupervised approaches for the small number of segmentation proposals we use per frame. See, for example, the last column of Figure 5 from [10], whose proposed methods for moving object proposals achieve IoU around 0.3 with four proposals. They require more than 800 proposals to reach an IoU above […].

[Figure 5 columns: RGB frame; predicted flow; motion masks.]

Figure 5. Motion segments computed from SfM-Net in MoSeg [4]. The model was trained in a fully unsupervised manner.

Kinect depth supervision. While the fully unsupervised results show promise, our network can benefit from extra supervision of depth or camera motion when available. The improved depth prediction given ground-truth camera poses on KITTI stereo demonstrates some gain. We also experimented with adding depth supervision to improve camera motion estimation using the RGB-D SLAM dataset [27]. Given ground-truth camera pose trajectories, we estimated relative camera pose (camera motion) from each frame to the next and compared it with the predicted camera motion from our model, by measuring translation and rotation error of their relative transformation, as done in the corresponding evaluation script for relative camera pose error and detailed in Eq. 2. We report camera rotation and translation error in Table 2 for each of the Freiburg1 sequences, compared to the error in the benchmark's baseline trajectories. Our model was trained from scratch for each sequence and used the focal length value provided with the dataset. We observe that our results better estimate the frame-to-frame translation and are comparable for rotation.

[Table 2 layout: columns are Seq., transl. [27], rot. [27], transl. ours, rot. ours; surviving row labels are plant, teddy, desk, desk.]

Table 2. Camera pose relative error from frame to frame for various video sequences of the Freiburg RGBD-SLAM benchmark.

5. Conclusion

Current geometric SLAM methods obtain excellent ego-motion and rigid 3D reconstruction results, but often come at a price of extensive engineering, low tolerance to moving objects (which are treated as noise during reconstruction), and sensitivity to camera calibration. Furthermore, matching and reconstruction are difficult in low textured regions. Incorporating learning into depth reconstruction, camera motion prediction and object segmentation, while still preserving the constraints of image formation, is a promising way to robustify SLAM and visual odometry even further. However, the exact training scenario required to solve this more difficult inference problem remains an open question. Exploiting long history and far in time forward-backward constraints with visibility reasoning is an important future direction. Further, exploiting a small amount of annotated videos for object segmentation, depth, and camera motion, and combining those with an abundance of self-supervised videos, could help initialize the network weights in the right regime and facilitate learning. Many other curriculum learning regimes, including those that incorporate synthetic datasets, can also be considered.

Acknowledgements. We thank our colleagues Tinghui Zhou, Matthew Brown, Noah Snavely, and David Lowe for their advice and Bryan Seybold for his work generating synthetic datasets for our initial experiments.

References

[1] I. Akhter, Y. A. Sheikh, S. Khan, and T. Kanade. Nonrigid structure from motion in trajectory space. In NIPS.
[2] J. R. Bergen, P. Anandan, K. J. Hanna, and R. Hingorani. Hierarchical model-based motion estimation. In ECCV.
[3] M. Black, Y. Yacoob, A. Jepson, and D. Fleet. Learning parameterized models of image motion. In CVPR.
[4] T. Brox and J. Malik. Object segmentation by long term analysis of point trajectories. In ECCV.
[5] A. Byravan and D. Fox. SE3-Nets: Learning rigid body motion using deep neural networks. CoRR.
[6] J. Costeira and T. Kanade. A multi-body factorization method for motion analysis. ICCV.
[7] A. Dosovitskiy, P. Fischer, E. Ilg, P. Häusser, C. Hazirbas, V. Golkov, P. Smagt, D. Cremers, and T. Brox. FlowNet: Learning optical flow with convolutional networks. In ICCV.
[8] D. Eigen, C. Puhrsch, and R. Fergus. Depth map prediction from a single image using a multi-scale deep network. In NIPS.
[9] J. Engel, T. Schöps, and D. Cremers. LSD-SLAM: Large-scale direct monocular SLAM. In ECCV.
[10] K. Fragkiadaki, P. A. Arbeláez, P. Felsen, and J. Malik. Spatio-temporal moving object proposals. CoRR.
[11] R. Garg, B. V. Kumar, G. Carneiro, and I. Reid. Unsupervised CNN for single view depth estimation: Geometry to the rescue. In ECCV.
[12] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In CVPR.
[13] C. Godard, O. Mac Aodha, and G. J. Brostow. Unsupervised monocular depth estimation with left-right consistency. CoRR.
[14] B. K. Horn and B. G. Schunck. Determining optical flow. Artificial Intelligence, 17.
[15] M. Hornacek, A. Fitzgibbon, and C. Rother. SphereFlow: 6 DoF scene flow from RGB-D pairs. In CVPR.
[16] M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu. Spatial transformer networks. In NIPS.
[17] C. Kerl, J. Sturm, and D. Cremers. Dense visual SLAM for RGB-D cameras. In IROS.
[18] N. Kong and M. J. Black. Intrinsic depth: Improving depth transfer with intrinsic images. In ICCV.
[19] L. Z. Manor and M. Irani. Multi-frame estimation of planar motion. PAMI, 22(10).
[20] N. Mayer, E. Ilg, P. Häusser, P. Fischer, D. Cremers, A. Dosovitskiy, and T. Brox. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In CVPR.
[21] M. Menze and A. Geiger. Object scene flow for autonomous vehicles. In CVPR.
[22] I. Misra, C. L. Zitnick, and M. Hebert. Unsupervised learning using sequential verification for action recognition. In ECCV.
[23] T. Nir, A. Bruckstein, and R. Kimmel. Over-parameterized variational optical flow. IJCV, 76(2).
[24] V. Patraucean, A. Handa, and R. Cipolla. Spatio-temporal video autoencoder with differentiable memory. CoRR.
[25] A. Prest, C. Leistner, J. Civera, C. Schmid, and V. Ferrari. Learning object class detectors from weakly annotated video. In CVPR.
[26] T. Schöps, J. Engel, and D. Cremers. Semi-dense visual odometry for AR on a smartphone. In ISMAR.
[27] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers. A benchmark for the evaluation of RGB-D SLAM systems. In IROS.
[28] D. Sun, S. Roth, and M. J. Black. Secrets of optical flow estimation and their principles. In CVPR.
[29] J. Thewlis, S. Zheng, P. H. Torr, and A. Vedaldi. Fully-trainable deep matching. In BMVC.
[30] J. Walker, C. Doersch, A. Gupta, and M. Hebert. An uncertain future: Forecasting from static images using variational autoencoders. In ECCV.
[31] X. Wang and A. Gupta. Unsupervised learning of visual representations using videos. In ICCV.
[32] L. Wiskott and T. J. Sejnowski. Slow feature analysis: Unsupervised learning of invariances. Neural Comput., 14(4).
[33] J. Wu, T. Xue, J. J. Lim, Y. Tian, J. B. Tenenbaum, A. Torralba, and W. T. Freeman. Single image 3D interpreter network. In ECCV.
[34] J. J. Yu, A. W. Harley, and K. G. Derpanis. Back to basics: Unsupervised learning of optical flow via brightness constancy and motion smoothness. In ECCV.
[35] T. Zhou, M. Brown, N. Snavely, and D. Lowe. Unsupervised learning of depth and ego-motion from video. In CVPR.

CAMERA CALIBRATION BY REGISTRATION STEREO RECONSTRUCTION TO 3D MODEL

CAMERA CALIBRATION BY REGISTRATION STEREO RECONSTRUCTION TO 3D MODEL CAMERA CALIBRATION BY REGISTRATION STEREO RECONSTRUCTION TO 3D MODEL Klečka Jan Docoral Degree Programme (1), FEEC BUT E-mail: xkleck01@sud.feec.vubr.cz Supervised by: Horák Karel E-mail: horak@feec.vubr.cz

More information

DAGM 2011 Tutorial on Convex Optimization for Computer Vision

DAGM 2011 Tutorial on Convex Optimization for Computer Vision DAGM 2011 Tuorial on Convex Opimizaion for Compuer Vision Par 3: Convex Soluions for Sereo and Opical Flow Daniel Cremers Compuer Vision Group Technical Universiy of Munich Graz Universiy of Technology

More information

Visual Perception as Bayesian Inference. David J Fleet. University of Toronto

Visual Perception as Bayesian Inference. David J Fleet. University of Toronto Visual Percepion as Bayesian Inference David J Flee Universiy of Torono Basic rules of probabiliy sum rule (for muually exclusive a ): produc rule (condiioning): independence (def n ): Bayes rule: marginalizaion:

More information

STEREO PLANE MATCHING TECHNIQUE

STEREO PLANE MATCHING TECHNIQUE STEREO PLANE MATCHING TECHNIQUE Commission III KEY WORDS: Sereo Maching, Surface Modeling, Projecive Transformaion, Homography ABSTRACT: This paper presens a new ype of sereo maching algorihm called Sereo

More information

CENG 477 Introduction to Computer Graphics. Modeling Transformations

CENG 477 Introduction to Computer Graphics. Modeling Transformations CENG 477 Inroducion o Compuer Graphics Modeling Transformaions Modeling Transformaions Model coordinaes o World coordinaes: Model coordinaes: All shapes wih heir local coordinaes and sies. world World

More information

Probabilistic Detection and Tracking of Motion Discontinuities

Probabilistic Detection and Tracking of Motion Discontinuities Probabilisic Deecion and Tracking of Moion Disconinuiies Michael J. Black David J. Flee Xerox Palo Alo Research Cener 3333 Coyoe Hill Road Palo Alo, CA 94304 fblack,fleeg@parc.xerox.com hp://www.parc.xerox.com/fblack,fleeg/

More information

4.1 3D GEOMETRIC TRANSFORMATIONS

4.1 3D GEOMETRIC TRANSFORMATIONS MODULE IV MCA - 3 COMPUTER GRAPHICS ADMN 29- Dep. of Compuer Science And Applicaions, SJCET, Palai 94 4. 3D GEOMETRIC TRANSFORMATIONS Mehods for geomeric ransformaions and objec modeling in hree dimensions

More information

Implementing Ray Casting in Tetrahedral Meshes with Programmable Graphics Hardware (Technical Report)

Implementing Ray Casting in Tetrahedral Meshes with Programmable Graphics Hardware (Technical Report) Implemening Ray Casing in Terahedral Meshes wih Programmable Graphics Hardware (Technical Repor) Marin Kraus, Thomas Erl March 28, 2002 1 Inroducion Alhough cell-projecion, e.g., [3, 2], and resampling,

More information

Scale Recovery for Monocular Visual Odometry Using Depth Estimated with Deep Convolutional Neural Fields

Scale Recovery for Monocular Visual Odometry Using Depth Estimated with Deep Convolutional Neural Fields Scale Recovery for Monocular Visual Odomery Using Deph Esimaed wih Deep Convoluional Neural Fields Xiaochuan Yin, Xiangwei Wang, Xiaoguo Du, Qijun Chen Tongji Universiy yinxiaochuan@homail.com,wangxiangwei.cpp@gmail.com,

More information

A Matching Algorithm for Content-Based Image Retrieval

A Matching Algorithm for Content-Based Image Retrieval A Maching Algorihm for Conen-Based Image Rerieval Sue J. Cho Deparmen of Compuer Science Seoul Naional Universiy Seoul, Korea Absrac Conen-based image rerieval sysem rerieves an image from a daabase using

More information

Learning in Games via Opponent Strategy Estimation and Policy Search

Learning in Games via Opponent Strategy Estimation and Policy Search Learning in Games via Opponen Sraegy Esimaion and Policy Search Yavar Naddaf Deparmen of Compuer Science Universiy of Briish Columbia Vancouver, BC yavar@naddaf.name Nando de Freias (Supervisor) Deparmen

More information

EECS 487: Interactive Computer Graphics

EECS 487: Interactive Computer Graphics EECS 487: Ineracive Compuer Graphics Lecure 7: B-splines curves Raional Bézier and NURBS Cubic Splines A represenaion of cubic spline consiss of: four conrol poins (why four?) hese are compleely user specified

More information

4. Minimax and planning problems

4. Minimax and planning problems CS/ECE/ISyE 524 Inroducion o Opimizaion Spring 2017 18 4. Minima and planning problems ˆ Opimizing piecewise linear funcions ˆ Minima problems ˆ Eample: Chebyshev cener ˆ Muli-period planning problems

More information

MORPHOLOGICAL SEGMENTATION OF IMAGE SEQUENCES

MORPHOLOGICAL SEGMENTATION OF IMAGE SEQUENCES MORPHOLOGICAL SEGMENTATION OF IMAGE SEQUENCES B. MARCOTEGUI and F. MEYER Ecole des Mines de Paris, Cenre de Morphologie Mahémaique, 35, rue Sain-Honoré, F 77305 Fonainebleau Cedex, France Absrac. In image

More information

Stereoscopic Neural Style Transfer

Stereoscopic Neural Style Transfer Sereoscopic Neural Syle Transfer Dongdong Chen 1 Lu Yuan 2, Jing Liao 2, Nenghai Yu 1, Gang Hua 2 1 Universiy of Science and Technology of China 2 Microsof Research cd722522@mail.usc.edu.cn, {luyuan,jliao}@microsof.com,

More information

Dynamic Depth Recovery from Multiple Synchronized Video Streams 1

Dynamic Depth Recovery from Multiple Synchronized Video Streams 1 Dynamic Deph Recoery from Muliple ynchronized Video reams Hai ao, Harpree. awhney, and Rakesh Kumar Deparmen of Compuer Engineering arnoff Corporaion Uniersiy of California a ana Cruz Washingon Road ana

More information

Image segmentation. Motivation. Objective. Definitions. A classification of segmentation techniques. Assumptions for thresholding

Image segmentation. Motivation. Objective. Definitions. A classification of segmentation techniques. Assumptions for thresholding Moivaion Image segmenaion Which pixels belong o he same objec in an image/video sequence? (spaial segmenaion) Which frames belong o he same video sho? (emporal segmenaion) Which frames belong o he same

More information

Real-Time Non-Rigid Multi-Frame Depth Video Super-Resolution

Real-Time Non-Rigid Multi-Frame Depth Video Super-Resolution Real-Time Non-Rigid Muli-Frame Deph Video Super-Resoluion Kassem Al Ismaeil 1, Djamila Aouada 1, Thomas Solignac 2, Bruno Mirbach 2, Björn Oersen 1 1 Inerdisciplinary Cenre for Securiy, Reliabiliy, and

More information

arxiv: v2 [cs.cv] 20 May 2018

arxiv: v2 [cs.cv] 20 May 2018 Sereoscopic Neural Syle Transfer Dongdong Chen 1 Lu Yuan 2, Jing Liao 2, Nenghai Yu 1, Gang Hua 2 1 Universiy of Science and Technology of China 2 Microsof Research cd722522@mail.usc.edu.cn, {jliao, luyuan,

More information

Visual Indoor Localization with a Floor-Plan Map

Visual Indoor Localization with a Floor-Plan Map Visual Indoor Localizaion wih a Floor-Plan Map Hang Chu Dep. of ECE Cornell Universiy Ihaca, NY 14850 hc772@cornell.edu Absrac In his repor, a indoor localizaion mehod is presened. The mehod akes firsperson

More information

In Proceedings of CVPR '96. Structure and Motion of Curved 3D Objects from. using these methods [12].

In Proceedings of CVPR '96. Structure and Motion of Curved 3D Objects from. using these methods [12]. In Proceedings of CVPR '96 Srucure and Moion of Curved 3D Objecs from Monocular Silhouees B Vijayakumar David J Kriegman Dep of Elecrical Engineering Yale Universiy New Haven, CT 652-8267 Jean Ponce Compuer

More information

MATH Differential Equations September 15, 2008 Project 1, Fall 2008 Due: September 24, 2008

MATH Differential Equations September 15, 2008 Project 1, Fall 2008 Due: September 24, 2008 MATH 5 - Differenial Equaions Sepember 15, 8 Projec 1, Fall 8 Due: Sepember 4, 8 Lab 1.3 - Logisics Populaion Models wih Harvesing For his projec we consider lab 1.3 of Differenial Equaions pages 146 o

More information

In fmri a Dual Echo Time EPI Pulse Sequence Can Induce Sources of Error in Dynamic Magnetic Field Maps

In fmri a Dual Echo Time EPI Pulse Sequence Can Induce Sources of Error in Dynamic Magnetic Field Maps In fmri a Dual Echo Time EPI Pulse Sequence Can Induce Sources of Error in Dynamic Magneic Field Maps A. D. Hahn 1, A. S. Nencka 1 and D. B. Rowe 2,1 1 Medical College of Wisconsin, Milwaukee, WI, Unied

More information

Video-Based Face Recognition Using Probabilistic Appearance Manifolds

Video-Based Face Recognition Using Probabilistic Appearance Manifolds Video-Based Face Recogniion Using Probabilisic Appearance Manifolds Kuang-Chih Lee Jeffrey Ho Ming-Hsuan Yang David Kriegman klee10@uiuc.edu jho@cs.ucsd.edu myang@honda-ri.com kriegman@cs.ucsd.edu Compuer

More information

Simultaneous Localization and Mapping with Stereo Vision

Simultaneous Localization and Mapping with Stereo Vision Simulaneous Localizaion and Mapping wih Sereo Vision Mahew N. Dailey Compuer Science and Informaion Managemen Asian Insiue of Technology Pahumhani, Thailand Email: mdailey@ai.ac.h Manukid Parnichkun Mecharonics

More information

TrackNet: Simultaneous Detection and Tracking of Multiple Objects

TrackNet: Simultaneous Detection and Tracking of Multiple Objects TrackNe: Simulaneous Deecion and Tracking of Muliple Objecs Chenge Li New York Universiy cl2840@nyu.edu Gregory Dobler New York Universiy greg.dobler@nyu.edu Yilin Song New York Universiy ys1297@nyu.edu

More information

Video Content Description Using Fuzzy Spatio-Temporal Relations

Video Content Description Using Fuzzy Spatio-Temporal Relations Proceedings of he 4s Hawaii Inernaional Conference on Sysem Sciences - 008 Video Conen Descripion Using Fuzzy Spaio-Temporal Relaions rchana M. Rajurkar *, R.C. Joshi and Sananu Chaudhary 3 Dep of Compuer

More information

Robust LSTM-Autoencoders for Face De-Occlusion in the Wild

Robust LSTM-Autoencoders for Face De-Occlusion in the Wild IEEE TRANSACTIONS ON IMAGE PROCESSING, DRAFT 1 Robus LSTM-Auoencoders for Face De-Occlusion in he Wild Fang Zhao, Jiashi Feng, Jian Zhao, Wenhan Yang, Shuicheng Yan arxiv:1612.08534v1 [cs.cv] 27 Dec 2016

More information

Gauss-Jordan Algorithm

Gauss-Jordan Algorithm Gauss-Jordan Algorihm The Gauss-Jordan algorihm is a sep by sep procedure for solving a sysem of linear equaions which may conain any number of variables and any number of equaions. The algorihm is carried

More information

Evaluation and Improvement of Region-based Motion Segmentation

Evaluation and Improvement of Region-based Motion Segmentation Evaluaion and Improvemen of Region-based Moion Segmenaion Mark Ross Universiy Koblenz-Landau, Insiue of Compuaional Visualisics, Universiässraße 1, 56070 Koblenz, Germany Email: ross@uni-koblenz.de Absrac

More information

LAMP: 3D Layered, Adaptive-resolution and Multiperspective Panorama - a New Scene Representation

LAMP: 3D Layered, Adaptive-resolution and Multiperspective Panorama - a New Scene Representation Submission o Special Issue of CVIU on Model-based and Image-based 3D Scene Represenaion for Ineracive Visualizaion LAMP: 3D Layered, Adapive-resoluion and Muliperspecive Panorama - a New Scene Represenaion

More information

A Fast Stereo-Based Multi-Person Tracking using an Approximated Likelihood Map for Overlapping Silhouette Templates

A Fast Stereo-Based Multi-Person Tracking using an Approximated Likelihood Map for Overlapping Silhouette Templates A Fas Sereo-Based Muli-Person Tracking using an Approximaed Likelihood Map for Overlapping Silhouee Templaes Junji Saake Jun Miura Deparmen of Compuer Science and Engineering Toyohashi Universiy of Technology

More information

An Iterative Scheme for Motion-Based Scene Segmentation

An Iterative Scheme for Motion-Based Scene Segmentation An Ieraive Scheme for Moion-Based Scene Segmenaion Alexander Bachmann and Hildegard Kuehne Deparmen for Measuremen and Conrol Insiue for Anhropomaics Universiy of Karlsruhe (H), 76 131 Karlsruhe, Germany

More information

Real-time 2D Video/3D LiDAR Registration

Real-time 2D Video/3D LiDAR Registration Real-ime 2D Video/3D LiDAR Regisraion C. Bodenseiner Fraunhofer IOSB chrisoph.bodenseiner@iosb.fraunhofer.de M. Arens Fraunhofer IOSB michael.arens@iosb.fraunhofer.de Absrac Progress in LiDAR scanning

More information

Viewpoint Invariant 3D Landmark Model Inference from Monocular 2D Images Using Higher-Order Priors

Viewpoint Invariant 3D Landmark Model Inference from Monocular 2D Images Using Higher-Order Priors Viewpoin Invarian 3D Landmark Model Inference from Monocular 2D Images Using Higher-Order Priors Chaohui Wang 1,2, Yun Zeng 3, Loic Simon 1, Ioannis Kakadiaris 4, Dimiris Samaras 3, Nikos Paragios 1,2

More information

Projection & Interaction

Projection & Interaction Projecion & Ineracion Algebra of projecion Canonical viewing volume rackball inerface ransform Hierarchies Preview of Assignmen #2 Lecure 8 Comp 236 Spring 25 Projecions Our lives are grealy simplified

More information

Rao-Blackwellized Particle Filtering for Probing-Based 6-DOF Localization in Robotic Assembly

Rao-Blackwellized Particle Filtering for Probing-Based 6-DOF Localization in Robotic Assembly MITSUBISHI ELECTRIC RESEARCH LABORATORIES hp://www.merl.com Rao-Blackwellized Paricle Filering for Probing-Based 6-DOF Localizaion in Roboic Assembly Yuichi Taguchi, Tim Marks, Haruhisa Okuda TR1-8 June

More information

Landmarks: A New Model for Similarity-Based Pattern Querying in Time Series Databases

Landmarks: A New Model for Similarity-Based Pattern Querying in Time Series Databases Lmarks: A New Model for Similariy-Based Paern Querying in Time Series Daabases Chang-Shing Perng Haixun Wang Sylvia R. Zhang D. So Parker perng@cs.ucla.edu hxwang@cs.ucla.edu Sylvia Zhang@cle.com so@cs.ucla.edu

More information

A Hierarchical Object Recognition System Based on Multi-scale Principal Curvature Regions

A Hierarchical Object Recognition System Based on Multi-scale Principal Curvature Regions A Hierarchical Objec Recogniion Sysem Based on Muli-scale Principal Curvaure Regions Wei Zhang, Hongli Deng, Thomas G Dieerich and Eric N Morensen School of Elecrical Engineering and Compuer Science Oregon

More information

Detection and segmentation of moving objects in highly dynamic scenes

Detection and segmentation of moving objects in highly dynamic scenes Deecion and segmenaion of moving objecs in highly dynamic scenes Aurélie Bugeau Parick Pérez INRIA, Cenre Rennes - Breagne Alanique Universié de Rennes, Campus de Beaulieu, 35 042 Rennes Cedex, France

More information

Hierarchical Recurrent Filtering for Fully Convolutional DenseNets

Hierarchical Recurrent Filtering for Fully Convolutional DenseNets Hierarchical Recurren Filering for Fully Convoluional DenseNes Jo rg Wagner1,2, Volker Fischer1, Michael Herman1 and Sven Behnke2 1- Bosch Cener for Arificial Inelligence - 71272 Renningen - Germany 2-

More information

Deep Appearance Models for Face Rendering

Deep Appearance Models for Face Rendering Deep Appearance Models for Face Rendering STEPHEN LOMBARDI, Facebook Realiy Labs JASON SARAGIH, Facebook Realiy Labs TOMAS SIMON, Facebook Realiy Labs YASER SHEIKH, Facebook Realiy Labs Deep Appearance

More information

Multi-Scale Object Candidates for Generic Object Tracking in Street Scenes

Multi-Scale Object Candidates for Generic Object Tracking in Street Scenes Muli-Scale Objec Candidaes for Generic Objec Tracking in Sree Scenes Aljoša Ošep, Alexander Hermans, Francis Engelmann, Dirk Klosermann, Markus Mahias and Basian Leibe Absrac Mos vision based sysems for

More information

Michiel Helder and Marielle C.T.A Geurts. Hoofdkantoor PTT Post / Dutch Postal Services Headquarters

Michiel Helder and Marielle C.T.A Geurts. Hoofdkantoor PTT Post / Dutch Postal Services Headquarters SHORT TERM PREDICTIONS A MONITORING SYSTEM by Michiel Helder and Marielle C.T.A Geurs Hoofdkanoor PTT Pos / Duch Posal Services Headquarers Keywords macro ime series shor erm predicions ARIMA-models faciliy

More information

NEWTON S SECOND LAW OF MOTION

NEWTON S SECOND LAW OF MOTION Course and Secion Dae Names NEWTON S SECOND LAW OF MOTION The acceleraion of an objec is defined as he rae of change of elociy. If he elociy changes by an amoun in a ime, hen he aerage acceleraion during

More information

Multi-Target Detection and Tracking from a Single Camera in Unmanned Aerial Vehicles (UAVs)

Multi-Target Detection and Tracking from a Single Camera in Unmanned Aerial Vehicles (UAVs) 2016 IEEE/RSJ Inernaional Conference on Inelligen Robos and Sysems (IROS) Daejeon Convenion Cener Ocober 9-14, 2016, Daejeon, Korea Muli-Targe Deecion and Tracking from a Single Camera in Unmanned Aerial

More information

Optimal Crane Scheduling

Optimal Crane Scheduling Opimal Crane Scheduling Samid Hoda, John Hooker Laife Genc Kaya, Ben Peerson Carnegie Mellon Universiy Iiro Harjunkoski ABB Corporae Research EWO - 13 November 2007 1/16 Problem Track-mouned cranes move

More information

A Neural Network Approach to Missing Marker Reconstruction

A Neural Network Approach to Missing Marker Reconstruction A Neural Nework Approach o Missing Marker Reconsrucion Taras Kucherenko Hedvig Kjellsröm Deparmen of Roboics, Percepion, and Learning KTH Royal Insiue of Technology, Sockholm, Sweden Email: {arask,hedvig}@kh.se

More information

Sam knows that his MP3 player has 40% of its battery life left and that the battery charges by an additional 12 percentage points every 15 minutes.

Sam knows that his MP3 player has 40% of its battery life left and that the battery charges by an additional 12 percentage points every 15 minutes. 8.F Baery Charging Task Sam wans o ake his MP3 player and his video game player on a car rip. An hour before hey plan o leave, he realized ha he forgo o charge he baeries las nigh. A ha poin, he plugged

More information

Computer representations of piecewise

Computer representations of piecewise Edior: Gabriel Taubin Inroducion o Geomeric Processing hrough Opimizaion Gabriel Taubin Brown Universiy Compuer represenaions o piecewise smooh suraces have become vial echnologies in areas ranging rom

More information

Mobile Robots Mapping

Mobile Robots Mapping Mobile Robos Mapping 1 Roboics is Easy conrol behavior percepion modelling domain model environmen model informaion exracion raw daa planning ask cogniion reasoning pah planning navigaion pah execuion

More information

Quantitative macro models feature an infinite number of periods A more realistic (?) view of time

Quantitative macro models feature an infinite number of periods A more realistic (?) view of time INFINIE-HORIZON CONSUMPION-SAVINGS MODEL SEPEMBER, Inroducion BASICS Quaniaive macro models feaure an infinie number of periods A more realisic (?) view of ime Infinie number of periods A meaphor for many

More information

Occlusion-Free Hand Motion Tracking by Multiple Cameras and Particle Filtering with Prediction

Occlusion-Free Hand Motion Tracking by Multiple Cameras and Particle Filtering with Prediction 58 IJCSNS Inernaional Journal of Compuer Science and Nework Securiy, VOL.6 No.10, Ocober 006 Occlusion-Free Hand Moion Tracking by Muliple Cameras and Paricle Filering wih Predicion Makoo Kao, and Gang

More information

A METHOD OF MODELING DEFORMATION OF AN OBJECT EMPLOYING SURROUNDING VIDEO CAMERAS

A METHOD OF MODELING DEFORMATION OF AN OBJECT EMPLOYING SURROUNDING VIDEO CAMERAS A METHOD OF MODELING DEFORMATION OF AN OBJECT EMLOYING SURROUNDING IDEO CAMERAS Joo Kooi TAN, Seiji ISHIKAWA Deparmen of Mechanical and Conrol Engineering Kushu Insiue of Technolog, Japan ehelan@is.cnl.kuech.ac.jp,

More information

ACQUIRING high-quality and well-defined depth data. Online Temporally Consistent Indoor Depth Video Enhancement via Static Structure

ACQUIRING high-quality and well-defined depth data. Online Temporally Consistent Indoor Depth Video Enhancement via Static Structure SUBMITTED TO TRANSACTION ON IMAGE PROCESSING 1 Online Temporally Consisen Indoor Deph Video Enhancemen via Saic Srucure Lu Sheng, Suden Member, IEEE, King Ngi Ngan, Fellow, IEEE, Chern-Loon Lim and Songnan

More information

IROS 2015 Workshop on On-line decision-making in multi-robot coordination (DEMUR 15)

IROS 2015 Workshop on On-line decision-making in multi-robot coordination (DEMUR 15) IROS 2015 Workshop on On-line decision-making in muli-robo coordinaion () OPTIMIZATION-BASED COOPERATIVE MULTI-ROBOT TARGET TRACKING WITH REASONING ABOUT OCCLUSIONS KAROL HAUSMAN a,, GREGORY KAHN b, SACHIN

More information

Image Content Representation

Image Content Representation Image Conen Represenaion Represenaion for curves and shapes regions relaionships beween regions E.G.M. Perakis Image Represenaion & Recogniion 1 Reliable Represenaion Uniqueness: mus uniquely specify an

More information

Effects needed for Realism. Ray Tracing. Ray Tracing: History. Outline. Foundations of Computer Graphics (Fall 2012)

Effects needed for Realism. Ray Tracing. Ray Tracing: History. Outline. Foundations of Computer Graphics (Fall 2012) Foundaions of ompuer Graphics (Fall 2012) S 184, Lecure 16: Ray Tracing hp://ins.eecs.berkeley.edu/~cs184 Effecs needed for Realism (Sof) Shadows Reflecions (Mirrors and Glossy) Transparency (Waer, Glass)

More information

arxiv: v1 [cs.cv] 11 Jan 2019

arxiv: v1 [cs.cv] 11 Jan 2019 A General Opimizaion-based Framework for Global Pose Esimaion wih Muliple Sensors arxiv:191.3642v1 [cs.cv] 11 Jan 219 Tong Qin, Shaozu Cao, Jie Pan, and Shaojie Shen Absrac Accurae sae esimaion is a fundamenal

More information

Robust Segmentation and Tracking of Colored Objects in Video

Robust Segmentation and Tracking of Colored Objects in Video IEEE TRANSACTIONS ON CSVT, VOL. 4, NO. 6, 2004 Robus Segmenaion and Tracking of Colored Objecs in Video Theo Gevers, member, IEEE Absrac Segmening and racking of objecs in video is of grea imporance for

More information

Improving Occupancy Grid FastSLAM by Integrating Navigation Sensors

Improving Occupancy Grid FastSLAM by Integrating Navigation Sensors Improving Occupancy Grid FasSLAM by Inegraing Navigaion Sensors Chrisopher Weyers Sensors Direcorae Air Force Research Laboraory Wrigh-Paerson AFB, OH 45433 Gilber Peerson Deparmen of Elecrical and Compuer

More information

Real-Time Avatar Animation Steered by Live Body Motion

Real-Time Avatar Animation Steered by Live Body Motion Real-Time Avaar Animaion Seered by Live Body Moion Oliver Schreer, Ralf Tanger, Peer Eiser, Peer Kauff, Bernhard Kaspar, and Roman Engler 3 Fraunhofer Insiue for Telecommunicaions/Heinrich-Herz-Insiu,

More information

A High-Speed Adaptive Multi-Module Structured Light Scanner

A High-Speed Adaptive Multi-Module Structured Light Scanner A High-Speed Adapive Muli-Module Srucured Ligh Scanner Andreas Griesser 1 Luc Van Gool 1,2 1 Swiss Fed.Ins.of Techn.(ETH) 2 Kaholieke Univ. Leuven D-ITET/Compuer Vision Lab ESAT/VISICS Zürich, Swizerland

More information

Robust Visual Tracking for Multiple Targets

Robust Visual Tracking for Multiple Targets Robus Visual Tracking for Muliple Targes Yizheng Cai, Nando de Freias, and James J. Lile Universiy of Briish Columbia, Vancouver, B.C., Canada, V6T 1Z4 {yizhengc, nando, lile}@cs.ubc.ca Absrac. We address

More information

A Face Detection Method Based on Skin Color Model

A Face Detection Method Based on Skin Color Model A Face Deecion Mehod Based on Skin Color Model Dazhi Zhang Boying Wu Jiebao Sun Qinglei Liao Deparmen of Mahemaics Harbin Insiue of Technology Harbin China 150000 Zhang_dz@163.com mahwby@hi.edu.cn sunjiebao@om.com

More information

Design Alternatives for a Thin Lens Spatial Integrator Array

Design Alternatives for a Thin Lens Spatial Integrator Array Egyp. J. Solids, Vol. (7), No. (), (004) 75 Design Alernaives for a Thin Lens Spaial Inegraor Array Hala Kamal *, Daniel V azquez and Javier Alda and E. Bernabeu Opics Deparmen. Universiy Compluense of

More information

Analysis of Various Types of Bugs in the Object Oriented Java Script Language Coding

Analysis of Various Types of Bugs in the Object Oriented Java Script Language Coding Indian Journal of Science and Technology, Vol 8(21), DOI: 10.17485/ijs/2015/v8i21/69958, Sepember 2015 ISSN (Prin) : 0974-6846 ISSN (Online) : 0974-5645 Analysis of Various Types of Bugs in he Objec Oriened

More information

Robust Multi-view Face Detection Using Error Correcting Output Codes

Robust Multi-view Face Detection Using Error Correcting Output Codes Robus Muli-view Face Deecion Using Error Correcing Oupu Codes Hongming Zhang,2, Wen GaoP P, Xilin Chen 2, Shiguang Shan 2, and Debin Zhao Deparmen of Compuer Science and Engineering, Harbin Insiue of Technolog

More information

SENSING using 3D technologies, structured light cameras

SENSING using 3D technologies, structured light cameras IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 39, NO. 10, OCTOBER 2017 2045 Real-Time Enhancemen of Dynamic Deph Videos wih Non-Rigid Deformaions Kassem Al Ismaeil, Suden Member,

More information

Robust 3D Visual Tracking Using Particle Filtering on the SE(3) Group

Robust 3D Visual Tracking Using Particle Filtering on the SE(3) Group Robus 3D Visual Tracking Using Paricle Filering on he SE(3) Group Changhyun Choi and Henrik I. Chrisensen Roboics & Inelligen Machines, College of Compuing Georgia Insiue of Technology Alana, GA 3332,

More information

Stereo Vision Based Navigation of a Six-Legged Walking Robot in Unknown Rough Terrain

Stereo Vision Based Navigation of a Six-Legged Walking Robot in Unknown Rough Terrain Sereo Vision Based Navigaion of a Six-Legged Walking Robo in Unknown Rough Terrain Anne Selzer, Heiko Hirschmüller, Marin Görner Absrac This paper presens a visual navigaion algorihm for he six-legged

More information

arxiv: v1 [cs.cv] 4 Jun 2018

arxiv: v1 [cs.cv] 4 Jun 2018 Cube Padding for Weakly-Supervised Saliency Predicion in 360 Videos Hsien-Tzu Cheng 1, Chun-Hung Chao 1, Jin-Dong Dong 1, Hao-Kai Wen, Tyng-Luh Liu 3, Min Sun 1 1 Naional Tsing Hua Universiy Taiwan AI

More information

IntentSearch:Capturing User Intention for One-Click Internet Image Search

IntentSearch:Capturing User Intention for One-Click Internet Image Search JOURNAL OF L A T E X CLASS FILES, VOL. 6, NO. 1, JANUARY 2010 1 InenSearch:Capuring User Inenion for One-Click Inerne Image Search Xiaoou Tang, Fellow, IEEE, Ke Liu, Jingyu Cui, Suden Member, IEEE, Fang

More information

Improved TLD Algorithm for Face Tracking

Improved TLD Algorithm for Face Tracking Absrac Improved TLD Algorihm for Face Tracking Huimin Li a, Chaojing Yu b and Jing Chen c Chongqing Universiy of Poss and Telecommunicaions, Chongqing 400065, China a li.huimin666@163.com, b 15023299065@163.com,

More information

Time Expression Recognition Using a Constituent-based Tagging Scheme

Time Expression Recognition Using a Constituent-based Tagging Scheme Track: Web Conen Analysis, Semanics and Knowledge Time Expression Recogniion Using a Consiuen-based Tagging Scheme Xiaoshi Zhong and Erik Cambria School of Compuer Science and Engineering Nanyang Technological

More information

Real time 3D face and facial feature tracking

Real time 3D face and facial feature tracking J Real-Time Image Proc (2007) 2:35 44 DOI 10.1007/s11554-007-0032-2 ORIGINAL RESEARCH PAPER Real ime 3D face and facial feaure racking Fadi Dornaika Æ Javier Orozco Received: 23 November 2006 / Acceped:

More information

RGB-D Object Tracking: A Particle Filter Approach on GPU

RGB-D Object Tracking: A Particle Filter Approach on GPU RGB-D Objec Tracking: A Paricle Filer Approach on GPU Changhyun Choi and Henrik I. Chrisensen Cener for Roboics & Inelligen Machines College of Compuing Georgia Insiue of Technology Alana, GA 3332, USA

More information

Nonrigid Structure-from-Motion: Estimating Shape and Motion with Hierarchical Priors

Nonrigid Structure-from-Motion: Estimating Shape and Motion with Hierarchical Priors 878 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 30, NO. 5, MAY 2008 Nonrigid Srucure-from-Moion: Esimaing Shape and Moion wih Hierarchical Priors Lorenzo Torresani, Aaron Herzmann,

More information

Graffiti Detection Using Two Views

Graffiti Detection Using Two Views Graffii Deecion Using wo Views Luigi Di Sefano Federico ombari Alessandro Lanza luigi.disefano@unibo.i federico.ombari@unibo.i alanza@arces.unibo.i Sefano Maoccia sefano.maoccia@unibo.i Sefano Moni sefano.moni3@sudio.unibo.i

More information

Real Time Integral-Based Structural Health Monitoring

Real Time Integral-Based Structural Health Monitoring Real Time Inegral-Based Srucural Healh Monioring The nd Inernaional Conference on Sensing Technology ICST 7 J. G. Chase, I. Singh-Leve, C. E. Hann, X. Chen Deparmen of Mechanical Engineering, Universiy

More information

Gender Classification of Faces Using Adaboost*

Gender Classification of Faces Using Adaboost* Gender Classificaion of Faces Using Adaboos* Rodrigo Verschae 1,2,3, Javier Ruiz-del-Solar 1,2, and Mauricio Correa 1,2 1 Deparmen of Elecrical Engineering, Universidad de Chile 2 Cener for Web Research,

More information

Segmentation by Level Sets and Symmetry

Segmentation by Level Sets and Symmetry Segmenaion by Level Ses and Symmery Tammy Riklin-Raviv Nahum Kiryai Nir Sochen Tel Aviv Universiy, Tel Aviv 69978, Israel ammy@eng.au.ac.il nk@eng.au.ac.il sochen@pos.au.ac.il Absrac Shape symmery is an

More information

Image warping Li Zhang CS559

Image warping Li Zhang CS559 Wha is an image Image arping Li Zhang S559 We can hink of an image as a funcion, f: R 2 R: f(, ) gives he inensi a posiion (, ) defined over a recangle, ih a finie range: f: [a,b][c,d] [,] f Slides solen

More information

Moving Object Detection Using MRF Model and Entropy based Adaptive Thresholding

Moving Object Detection Using MRF Model and Entropy based Adaptive Thresholding Moving Objec Deecion Using MRF Model and Enropy based Adapive Thresholding Badri Narayan Subudhi, Pradipa Kumar Nanda and Ashish Ghosh Machine Inelligence Uni, Indian Saisical Insiue, Kolkaa, 700108, India,

More information
