TrackNet: Simultaneous Detection and Tracking of Multiple Objects
Chenge Li (New York University), Gregory Dobler (New York University), Yilin Song (New York University), Xin Feng (Chongqing University of Technology), Yao Wang (New York University)

Abstract

Object detection and object tracking are usually treated as two separate processes. Object detection in still images relies on spatial appearance features, whereas object tracking in videos relies on both spatial appearance and temporal motion features. Significant progress has been made for object detection in 2D images using deep learning networks such as region CNN and subsequent variants. The usual pipeline for object tracking requires that the object be successfully detected in the first frame or in every frame, and tracking is done by associating detection results. Performing object detection and object tracking through a single network remains a challenging open question. We propose a novel network structure that can directly detect a 3D tube enclosing a moving object in a video by extending the region-CNN framework for object detection in an image. The proposed TrackNet works over short video segments and outputs a bounding tube for each detected moving object, which includes shifted bounding boxes covering the detected object in successive frames. A Tube Proposal Network (TPN) inside the TrackNet is proposed to predict the objectiveness of each candidate tube and location parameters specifying the bounding tube with a high objectiveness score. We have trained and tested TrackNet on UA-DETRAC, the largest traffic video dataset available for multi-object detection and tracking, and obtained very promising results.

1. Introduction

Object detection and object tracking have been two long-standing challenges for the computer vision community, and much progress has been made on both fronts. For object detection, complex hand-crafted features plus shallow classifiers such as HOG+SVM [3] and multiple resolution image pyramids plus multiple filters such as DPM [6] were both popular detection pipelines.
In 2012, AlexNet [14] showed significant performance improvements over those traditional pipelines in the ImageNet competition [4] for a 1000-category image classification task. With this success, Convolutional Neural Networks (CNN) regained popularity in the research community for a variety of vision tasks, from image classification, object detection, and localization to dense image captioning.

Object tracking, especially Multi-Object Tracking (MOT), has many real-world applications including autonomous vehicles, virtual/augmented reality, and robot navigation. Most existing MOT systems can be classified into two groups [16]: Detection Based Tracking (DBT) and Detection Free Tracking (DFT). DBT requires objects to be detected in every frame, followed by a tracker that associates or links the detection regions based either on object features or on probabilistic movement characteristics. DFT, on the other hand, requires manual initialization of objects in the initial frame, followed by association of those with automatically detected objects in subsequent frames (i.e., DFT trackers cannot handle objects that were not initialized). Both approaches rely on pre-computed object locations. A general MOT system entails an object detection step to find target locations in multiple (and sometimes every) video frames.

We argue that object detection and tracking should not be treated as two independent tasks, but rather, effective object detection in video should employ both spatial appearance and temporal motion features. In this work, we propose a unified network structure for simultaneous object detection and tracking. Our network treats a group of consecutive pictures (GoP) as a 3D volume, and detects moving objects in the GoP as tubes within it. As shown in figure 1, our system uses the popular 3D convolutional network [23] and the VGG [21] net with pure 2D convolutions as the underlying networks for spatial and temporal feature extraction from raw input videos. A spatial transformer module [10] is used to deal with varying object orientations. A tube proposal network (TPN) generates bounding tube proposals for detected candidates. A post-TPN stage then further refines the tube classification scores and location parameters for tube proposals with higher objectiveness scores. We have explored multiple structures inside TPN for generating accurate tube proposals. We evaluate our method on the UA-DETRAC [26] dataset both for detection and tracking.

Figure 1. The TrackNet structure. A two-branched backbone produces both appearance-only and spatial-temporal features from a video GoP. A spatial transformer is inserted to introduce global feature warping. A tube proposal network is introduced to generate flexible bounding tube proposals. Proposal tubes are used to guide more specific pooling, and the tube-pooled features are used to further predict more specific class labels and refine tube locations. The final output will be bounding tubes for moving objects inside this video GoP. A short tracklet is also generated by connecting the centroids of bounding boxes at each frame.

Motivations for Tube Proposals. One obvious advantage of tube proposals over box proposals is the convenience of getting all objects' spatial-temporal locations in one shot. A bounding tube is readily available after one forward pass instead of N forward passes plus post-processing association. The other advantage of tube proposals is that they provide much stronger regularization during training. Suppose that there are B moving objects, each of which appears in all N frames. If we first detect B box proposals in each frame, we would have to examine $B^N$ possible tubes, while there are only B tubes that are correct. Therefore during training, the ground truth bounding tubes actually occur very rarely in this high-dimensional space. Each one of the ground truth tubes carries highly structured information implicitly.
This sparsity and implicit structure information will serve as very strong regularization: only tubes with certain spatial and motion features over the entire GoP are good candidates. Furthermore, the step from proposing boxes to proposing tubes is in fact a very natural extension. When a person sees two nearby frames (not necessarily adjacent), they will naturally fill in the gap between one object's two appearances. Therefore, when dealing with object detection or tracking in videos, proposing tubes is more natural than proposing boxes.

Existing tube-based algorithms such as [12, 13, 11] have some limitations in their tube generation processes. There, the proposed tubes are obtained from frame-by-frame detection results or by using some tracking algorithm to explicitly create a tube, which makes the tube proposal stage very time-consuming and computationally expensive. Kang et al. in [11] proposed to generate tube proposals from static anchors more efficiently; however, the features are pooled from the same box location across multiple frames. Hence only stationary straight anchor tubes are considered in their case.

The major innovations of this work include: i) We extend the region-CNN architecture to directly detect bounding tubes covering moving objects in a video, to simultaneously accomplish detection and tracking; ii) We propose to combine the features generated by a 3D convolution network and a 2D convolution network to capture both motion information and appearance information; iii) We propose to insert a spatial transformer module in the feature domain to make the input features to the tube proposal network and refinement network invariant to camera view angles; iv) We further propose to initialize candidate anchor tubes based on the optical flow motion vectors near the candidate location.

2. Related Work

Object Detection in Images. Object detection has progressed rapidly in recent years [8, 19, 17, 18]. Girshick et al. first introduced R-CNN [8] to identify and label objects in 2D images.
From object region proposals that are generated by an independent algorithm (e.g., selective search [24]), R-CNN runs a forward pass once for each proposed region to determine whether this region contains an object. The same authors further improved R-CNN to fast R-CNN [7] by sharing convolution feature maps among all object region proposals, and hence only one forward pass for all region proposals is needed. Faster R-CNN further improved upon fast R-CNN by introducing a region proposal network (RPN) that directly regresses fixed anchors to object region proposals from feature maps extracted using a 2D convolutional network. Our work is inspired by the structure of faster R-CNN [19], but we replace RPN by a tube proposal network (TPN) acting on a 3D convolutional network, and we utilize a 3D (as opposed to 2D) ROI pooling mechanism. The regression loss objective function is also extended to consider the difference between ground truth and detected tube locations in all frames, which is optimized end-to-end jointly with tube classification loss.

Object Detection in Videos. Most systems proposed so far for identifying (and tracking) moving objects in videos rely on 2D object detection in each frame, which is computationally expensive and does not jointly consider the objects' motion information. Some systems (e.g., [12, 13]) have used explicit motion information (e.g. optical flow) as a linking feature to associate detection regions or to smooth detection scores as a post-processing step. However, that motion information is derived separately outside of the network and is not integrated organically with the network training. Recently, [11] proposed a tubelet generation module (also referred to as Tubelet Proposal Network); however, their tubelets are generated from 2D proposal boxes generated by a separate algorithm (e.g., selective search [24]). The object detection is based on pooled features from GoogLeNet [22] feature maps. Despite the powerful ability of GoogLeNet, this method pools from multiple 2D feature maps which are then concatenated, as opposed to our spatial-temporal features, which are generated directly from video segments.

Tran et al. in [23] proposed C3D, a 3D convolutional CNN structure, to extract spatio-temporal features to classify videos into different categories. C3D has shown good performance compared with traditional computer vision algorithms such as dense trajectories [25]. Shou et al. in [20] extended the C3D network to locate interesting action time slots in an untrimmed long video by targeting video segments. We argue that features extracted by this kind of 3D convolutional network are well suited for object detection in videos. Even though C3D was not specifically trained for accurate object localization, locations that are highly activated in the feature maps typically correspond to moving objects. Therefore we use the C3D network structure for feature extraction in the TrackNet, and we refine the network weights through end-to-end training.

3. TrackNet Model Architecture

3.1. Two Stream Feature Extraction and Transformation

In order to utilize both spatial and temporal features, our network is based on VGG net trained on ImageNet and C3D trained on UCF101 for video classification. Figure 1 provides an overview of the TrackNet structure. We divide a video into groups of pictures (GoPs) of fixed length (8 frames in our implementation) and feed the raw video frames in each GoP into a two-headed backbone structure, where the first head is a VGG-like subnetwork with convolutions and spatial max-poolings and the second head is a C3D-like subnetwork with 3D convolutions and spatial-temporal max-poolings. These two kinds of features complement each other in that one focuses more on appearance whereas the other focuses more on motion. The resulting spatial-temporal feature maps and the 2D feature maps then separately go through a squashing convolutional layer with $1 \times 1$ kernel size and 128 output channels. These two squashing layers reduce the number of feature map channels. The squashed feature maps are concatenated afterwards.

Figure 2. Tube Proposal Network (TPN) in detail: based on the shared convolution feature maps, a TPN consists of two parts: the classification module and the tube offset regression module. We show a heat map of the predicted objectiveness scores from the classification module: regions where objects are moving have higher (warmer) values.

Spatial Transformer. In some videos, the network will observe objects' frontal appearance, whereas in others, the side appearances of objects will be observed. Inspired by [10], we utilize a learnable module, the spatial transformer, to map the concatenated features from different viewing angles into a unified manifold. We use an affine transformation in our case; however, one can use more complicated transformations to suit other cases, as indicated in [10]. Our transformer has a very simple structure with only one convolutional layer and one fully connected layer, as shown in figure 1.
Six affine transformation parameters $\theta$ are output from the fully connected layer and then used to transform the original concatenated feature maps U. Instead of sampling from the original feature maps using a regular mesh grid G, the sampling grid is transformed using $\theta$ to $T_\theta(G)$, which is applied to the original feature maps U to produce the warped output feature maps V. All feature maps are transformed channel-wise in the same way. The transformed features V are then fed into the tube proposal network (TPN) shown in figure 2 to generate tube proposals, and into a post-TPN stage to further refine those proposals. We will describe the details of the different subcomponents in the following sections.
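The grid warping described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `affine_grid` and `warp_features` are hypothetical, the grid is assumed to use normalized coordinates in [-1, 1], and nearest-neighbour sampling stands in for the differentiable sampling used by spatial transformer networks [10].

```python
import numpy as np

def affine_grid(theta, height, width):
    """Map a regular grid G through a 2x3 affine matrix theta, giving T_theta(G).

    Coordinates are normalized to [-1, 1] (an assumption of this sketch).
    Returns an array of shape (2, height, width) holding (x, y) sample positions.
    """
    ys, xs = np.meshgrid(np.linspace(-1.0, 1.0, height),
                         np.linspace(-1.0, 1.0, width), indexing="ij")
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(height * width)])  # (3, H*W)
    return (theta @ grid).reshape(2, height, width)

def warp_features(U, theta):
    """Warp feature maps U of shape (C, H, W) into V by sampling at T_theta(G).

    Every channel is sampled with the same grid; nearest-neighbour lookup is
    used here for brevity instead of bilinear interpolation.
    """
    _, H, W = U.shape
    sx, sy = affine_grid(theta, H, W)
    cols = np.clip(np.rint((sx + 1.0) * (W - 1) / 2.0).astype(int), 0, W - 1)
    rows = np.clip(np.rint((sy + 1.0) * (H - 1) / 2.0).astype(int), 0, H - 1)
    return U[:, rows, cols]

identity = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
U = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
V = warp_features(U, identity)  # the identity transform leaves U unchanged
```

With the six parameters set to the identity, V equals U; any other $\theta$ shifts, scales, rotates, or shears the sampling grid for all channels at once.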
Figure 3. Example of the M = 9 objectiveness scores, shown on the right as heat maps. Each score heat map corresponds to a specific set of anchor size and aspect ratio, e.g. size=40, aspect ratio=0.81. It can be seen that score maps of smaller-sized anchor tubes are more sensitive to smaller objects, whereas score maps of larger-sized anchor tubes are more sensitive to larger objects.

Tube Proposal Network (TPN)

TPN will produce many initial tube proposals. Similar to faster R-CNN's region proposal network (RPN), the TPN generates multiple candidate anchor tubes at each pixel location. For each anchor tube, the classification module predicts the objectiveness score of the tube (i.e. the probability of having objects inside the tube) and the offset regression module generates the position offsets from the anchor tubes to the ground truth bounding tubes. Both the classification module and the offset regression module inside TPN share the same transformed feature maps V of size $256 \times W \times H$.

Stationary and Tilted Anchor Tubes

Faster R-CNN's RPN uses M predefined anchors with different base sizes and aspect ratios. Analogously, we start from a fixed set of anchor tubes. As discussed in the introduction section, the tube space is a very high-dimensional space. It is neither possible nor necessary to consider every possible tube. Consider a short video segment with T frames (e.g. 8 frames in a 30 FPS video): an object's trajectory is usually very smooth and nearly linear in such a short time period. These quasi-linear tubelets may differ in size, direction or speed; however, they all live on a low-dimensional manifold with limited degrees of freedom. Because of the quasi-linearity of objects' short trajectories, we use straight anchor tubes as the initial candidates. A naive way to construct a 3D tube is to have the same bounding box positions in all frames, which we call stationary tubes $T_s$, because they correspond to non-moving objects. These simple tubes, however, are not always desirable, especially when dealing with videos with varying viewing points and camera positions.
Consider the typical scenes from the UA-DETRAC [26] dataset (see sample pictures in figure 4): traffic can flow in an arbitrary direction, and a stationary tube would have very low overlap with the true tube, resulting in a very large offset, which is hard to regress. In order to have better initial tube candidates, we utilize motion vectors (MV) derived from optical flow fields and construct tilted tubes by modifying the initial stationary tubes using the average motion directions. First, the dominant motion direction (positive or negative) is decided based on the votes of all the motion vectors at this pixel location in this video GoP. If the dominant direction is positive, then the largest half of the motion vectors is averaged to give the mean MV, while if the dominant direction is negative, then the smallest half of the motion vectors is averaged to give the result. For frame index $t \in [1, N]$, box positions of the tilted tube T can be derived from the straight tubes $T_s$ as:

$$T[t, :, :, k] = T_s[t, :, :, k] + mv \cdot (t - 1)$$

where $mv$ is either $mv_x$ or $mv_y$, the mean motion vectors in the x and y directions at each pixel location, and $[:, :]$ stands for parallel computation for all pixel locations in the feature map at the same time. $(T[t, w, h, 0], T[t, w, h, 1])$ is the upper left corner position, whereas $(T[t, w, h, 2], T[t, w, h, 3])$ is the lower right corner position of the bounding box at $(w, h)$ in frame $t$. During training or testing, both stationary and tilted tubes are used as the initial tube candidates.

Unlike faster R-CNN, we do not pre-define the size and aspect ratio for the anchors. Rather, we identify the typical sizes and aspect ratios of the ground truth bounding boxes in the training data using K-means clustering (inspired by YOLO [17]), and use the centroids of all clusters as the sizes and aspect ratios for the base stationary anchors.
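The construction of tilted anchor tubes can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: the helper names `dominant_mean_mv` and `tilt_anchor_tubes` are hypothetical, ties in the direction vote are resolved as positive, and box coordinates 0 and 2 are assumed to be x values (shifted by $mv_x$) while 1 and 3 are y values (shifted by $mv_y$).

```python
import numpy as np

def dominant_mean_mv(mvs):
    """Mean motion vector along one axis at one pixel location.

    The sign with more votes is the dominant direction; the largest half of
    the values is averaged if it is positive, the smallest half otherwise.
    Ties are treated as positive (an assumption of this sketch).
    """
    mvs = np.sort(np.asarray(mvs, dtype=float))
    half = max(1, mvs.size // 2)
    positive = np.count_nonzero(mvs > 0) >= np.count_nonzero(mvs < 0)
    return mvs[-half:].mean() if positive else mvs[:half].mean()

def tilt_anchor_tubes(stationary, mv_x, mv_y):
    """Shift stationary tubes frame by frame along the mean motion.

    stationary: (N, W, H, 4) boxes stored as (x1, y1, x2, y2) per frame.
    mv_x, mv_y: (W, H) mean motion per frame at each feature-map location.
    Implements T[t, :, :, k] = T_s[t, :, :, k] + mv * (t - 1) for t = 1..N.
    """
    n_frames = stationary.shape[0]
    steps = np.arange(n_frames).reshape(-1, 1, 1, 1)      # (t - 1) per frame
    shift = np.stack([mv_x, mv_y, mv_x, mv_y], axis=-1)   # (W, H, 4)
    return stationary + steps * shift

stationary = np.zeros((8, 2, 3, 4)) + np.array([10.0, 20.0, 30.0, 40.0])
tilted = tilt_anchor_tubes(stationary,
                           mv_x=np.full((2, 3), 2.0),
                           mv_y=np.full((2, 3), -1.0))
```

The first frame of a tilted tube coincides with the stationary tube, and each later frame is displaced by one more multiple of the mean motion vector, so the tube stays straight but leans in the direction of traffic flow.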
We find that the network has difficulty regressing to large offset values if starting from poorly chosen initial anchors, and therefore an appropriate initialization for anchor sizes, aspect ratios and moving directions is rather important.

Tube Classification Module

Based on the shared feature maps V, M objectiveness scores are predicted in the classification module for M anchor tubes at each feature map location by a convolutional layer with kernel size $3 \times 3$ (acting on K = 256 feature maps). Essentially, the objectiveness score for each anchor is determined from a $3 \times 3 \times K$ feature tensor. This module can be viewed as a fully convolutional network. During training, the tube overlap, i.e. the 3D intersection over union (3D-IoU) between the anchor tubes and ground truth tubes, is computed. Anchor tubes with high 3D-IoU are selected as positive proposals and assigned label +1, whereas anchor tubes with low 3D-IoU scores (partially overlapped) are assigned label −1, and the remaining anchor tubes (including pure background) are ignored. The classification module is trained with the cross-entropy classification loss $L_{clsTPN}$ with respect to the ground truth labels. In figure 2, the tube classification module is shown. The heat map is the objectiveness score output, where bigger (warmer) values indicate higher probabilities of containing
objects in that location. Here we only show one heat map corresponding to one set of anchor tube size. In our implementation, we have M = 9 sets of base tubes with different sizes or aspect ratios, as in figure 3.

Tube Offset Regression Module

Anchor tubes are ranked based on their objectiveness scores. For tubes with higher scores, the offsets between the corner positions of the tubes and the ground truth positions are computed as the regression targets. The regression module is trained to generate these regression targets from the input $3 \times 3 \times K$ features. Given M candidate anchor tubes at each feature map location, the offsets are predicted so as to bend the straight anchor tubes into a shape closer to the ground truth tubes. Following R-CNN, we use the center position, width and height to parameterize the position of a rectangular bounding box in each frame. The offsets of these parameters between the bounding boxes of all frames in an anchor tube (ST) and those in the ground truth tube (GT) are our 3D tube regression target for this anchor tube. We adopt the parameterization of the 4 coordinates in [8], but similar to [17], we normalize the spatial coordinates by the actual width and height of the video frame, so that the normalized coordinates and hence the 4 parameters are all in the range of [0, 1], which helps the convergence speed. The 3D tube regression target for positive anchor tube $i$ at frame $t$ is defined as $tar_{i,t} = [X_g, Y_g, W_g, H_g]$, with

$$X_g = \frac{GT_{center\_x} - ST_{center\_x}}{ST_w}, \qquad Y_g = \frac{GT_{center\_y} - ST_{center\_y}}{ST_h},$$

$$W_g = \log\frac{GT_w}{ST_w}, \qquad H_g = \log\frac{GT_h}{ST_h}.$$

By learning to regress to these targets, the system can derive the refined locations for all anchor tubes that have high overlap with ground truth bounding tubes. We have explored two ways to wire the tube offset regression module: (1) directly predicting offsets of all frames and (2) utilizing linear interpolation.

Option 1: Directly predict tube parameters. In this structure, we directly estimate the offsets of every frame. Given a video GoP of length T, the regression network directly predicts $4T$ parameters for every tube.
As our straight tube candidates are spread over all pixel locations, the regression network is implemented using a convolution layer with $4 \times T \times M$ output maps.

Option 2: Linear interpolation of bounding box offsets from offsets at two frames. Despite the fact that an object inside a video can have arbitrarily complex motion, most objects' motions are very smooth in real-world videos. Given a short enough time period, we can approximate the trajectory of each corner of the bounding tube with a straight line. This is particularly true for traffic videos containing moving vehicles. Motivated by this observation, instead of determining the offsets of the corner positions in all frames, the regression network only estimates the offsets in the beginning and ending frames, and linearly interpolates the offsets in the other frames. During training, the regression loss considers the difference between the true offsets (targets) and the estimated offsets for all frames, which are interpolated from the offsets in the beginning and ending frames. The advantage of this approach is that only 8 parameters are estimated for a given tube, as opposed to $4T$ parameters. Compared to directly estimating the offset at every frame, this approach also implicitly applies a smoothness constraint along the corner trajectories and prevents the network from generating erratic trajectories. We implement the interpolation using a convolution layer with a spatial $1 \times 1$ kernel. For example, if we have a video segment with length T = 8 frames, $X_1, Y_1, W_1, H_1, X_T, Y_T, W_T, H_T$ are the predicted center offsets and width and height offsets at the first frame ($t = 1$) and the last frame ($t = 8$). The offset at time frame $t$ can be easily implemented using a convolutional layer with kernel matrix:

$$K = \begin{bmatrix} 1 & 6/7 & 5/7 & 4/7 & 3/7 & 2/7 & 1/7 & 0 \\ 0 & 1/7 & 2/7 & 3/7 & 4/7 & 5/7 & 6/7 & 1 \end{bmatrix}$$

If we view the first frame prediction result (with 4 channels for 4 parameters) and the last frame prediction result as 2 separate input feature maps, then these 2 feature maps convolved with this kernel will produce 8 feature maps, corresponding to the predicted offsets for all 8 frames.
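The interpolation in Option 2 amounts to a small matrix product, which the following NumPy sketch makes explicit. The function names `interpolation_kernel` and `interpolate_offsets` are hypothetical, and the $1 \times 1$ convolution is written as a tensor contraction over the two input maps (first frame and last frame), which is equivalent at every spatial location.

```python
import numpy as np

def interpolation_kernel(T):
    """Build the 2 x T kernel K: row 0 weights the first frame, row 1 the last."""
    w_last = np.arange(T) / (T - 1)
    return np.stack([1.0 - w_last, w_last])

def interpolate_offsets(first, last, T):
    """Linearly interpolate offsets for all T frames from two predictions.

    first, last: arrays of identical shape, e.g. (4,) for one tube or
    (4, W, H) for offset maps (X, Y, W, H channels).
    Returns an array of shape (T, ...) with one set of offsets per frame.
    """
    K = interpolation_kernel(T)                # (2, T)
    ends = np.stack([first, last])             # (2, ...)
    return np.tensordot(K.T, ends, axes=1)     # (T, ...)

first = np.array([0.0, 0.0, 0.0, 0.0])
last = np.array([7.0, 14.0, -7.0, 0.0])
offsets = interpolate_offsets(first, last, T=8)   # shape (8, 4)
```

For T = 8 the kernel reproduces the matrix given above, and only the 8 end-point values are free parameters; the offsets of the six intermediate frames follow from them.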
Note that we could implement higher order interpolation by using more than 2 input feature maps and setting the kernel matrix accordingly. We could also train the kernel matrix as part of the regression network to learn the appropriate interpolation kernel.

post-TPN: Classification and Refinement

As shown in Figure 1, the tube proposal network generates many tube proposals, whose positions are determined by the original candidate tubes and the predicted offsets. Proposal tubes with high objectiveness scores go through a second stage of classification and regression. In this stage, tube proposals are further classified into different classes (such as car, bus, van etc. for the UA-DETRAC dataset). The position offsets for the tube are also refined. Instead of using the features pooled from the $3 \times 3$ neighborhood on the feature map as in TPN, features specific to the proposal tube regions are pooled using tube pooling.

Tube Pooling. ROI pooling was introduced in [9], which
enables different proposal regions to be described by feature vectors of the same dimension. In our case, a proposal tube consists of bounding boxes in different frames that differ in size and location. Pooling based on one particular bounding box inside the tube would be deficient. Instead of pooling from the same feature map multiple times as in [11], the union of all bounding boxes in a proposal tube is found, and features covering the union region are extracted from the transformed feature maps V. After the tube pooling, this feature vector is fed into a post-TPN subnetwork, which further assesses its class and refines the tube position information. The subnetwork has two fully connected (fc) layers, followed by another two fc layers that predict classification scores and offsets separately. In our implementation, 256 proposal tubes (half positive, half negative) are considered and features are pooled from feature map V using the tube union ROI, leading to a total of [...] features. For the offset regression, as in the TPN regression module, either linear interpolation or direct prediction of offsets at all frames can be chosen.

Figure 4. Examples of the predicted bounding tubes. The first and third rows show the bounding box in the middle frames, and the second and fourth rows show the whole bounding tube on the same middle frames with the centroids connected as the tracklets. TrackNet is trained to generate bounding tubes to cover both small and large vehicle sizes with different aspect ratios. It also generates sparser bounding tubes for fast-moving vehicles and denser bounding tubes for slower vehicles or vehicles that are farther away.

Multi-task Loss to Train the TrackNet

Both classification loss and regression loss are used to penalize the proposed tubes. For the TPN, the predicted objectiveness score for each anchor tube incurs the cross-entropy classification loss $L_{clsTPN}$ with respect to its ground truth label. Positive anchor tubes incur the regression loss $L_{regTPN}$ with respect to the offset targets. During the post-TPN stage, true class labels (e.g.
background, car, bus, van) are used for the cross-entropy loss $L_{cls}$. Both regression losses $L_{regTPN}$ and $L_{reg}$ use the smooth $\ell_1$ loss defined in [7]. The above losses are combined to form the total loss for a proposal tube:

$$L(s_i, p_i) = \lambda_1 L_{cls}(l_i, s_i) + \lambda_2 \sum_{t=1}^{T} L_{reg}(tar_{i,t}, p_{i,t}) + \lambda_3 L_{clsTPN}(l_i, s_i) + \lambda_4 \sum_{t=1}^{T} L_{regTPN}(tar_{i,t}, p_{i,t}) + \lambda_5 L_{smooth} \qquad (1)$$

where $l_i$ is the ground truth label for anchor tube $i$, and $s_i$ is the predicted objectiveness score or the specific class score for anchor tube $i$. $tar_{i,t}$ is the ground truth target, a four-parameter vector representing the offsets between the ground truth location and the location of positive anchor tube $i$ at time $t$, i.e. $tar_{i,t} = [X_g, Y_g, W_g, H_g]_{i,t}$, and $p_{i,t}$ is the predicted offset vector, i.e. $p_{i,t} = [X, Y, W, H]_{i,t}$. When we use the option of directly regressing box locations in each frame, we add a smoothness loss term $L_{smooth}$ to further enhance the smoothness (quasi-linearity) of the tube, which can be derived from the total variation of the tube positions or the average position change between two frames. The $\lambda$s control the weights of the different losses. From experiments, we found that the results are not very sensitive to these hyperparameters. We set $\lambda_1 = \lambda_2 = \lambda_3 = \lambda_4 = 1$ and $\lambda_5 = [...]$ in all of the following experiments. The multi-task loss for training is defined as:

$$L_{multi\text{-}task} = \sum_{i=1}^{N_{tubes}} L(s_i, p_i) \qquad (2)$$

Table 1. Detection results of TrackNet and the Faster RCNN [19] baseline on the UA-DETRAC dataset [26] (evaluated using the COCO API [15]). The average precision (AP) (%) and average recall (AR) (%) rates are reported under different settings (i.e. IoU thresholds; bounding box area; thresholds on max detections per image). TrackNet is our final model. All TrackNet variants reported here used the 300 top proposals during the test except the one indicated with 2000 proposals.
Columns: $T_\theta(G)$, VGG, LP; AP by IoU threshold and by area (small, medium, large); AR by number of max detections. Rows: TrackNet (no VGG, no transformer, predict all); TrackNet (no VGG, no transformer, interpolate); TrackNet (no transformer); TrackNet Left view only; TrackNet Right view only; TrackNet Frontal view only; TrackNet (300 proposals during test); TrackNet (2000 proposals during test); Fine-tuned Faster RCNN. (Cell values not recoverable.)

4. Experiments and Evaluation

Dataset. Most object detection datasets consist of 2D images, such as ImageNet [4], PASCAL VOC [5], and Microsoft COCO [15]. In the ILSVRC2015 challenge, ImageNet introduced the VID task with 30 categories to attract attention to object detection in videos. However, most of the videos only contain very few dominant objects, whereas in the real world, multiple object detection and multiple object tracking need to be addressed simultaneously. In order to evaluate our model both on detection and tracking of multiple objects, the UA-DETRAC dataset [26] is used.
This dataset was introduced as a benchmark for both object detection and tracking, and consists of challenging video sequences captured from real-world traffic scenes with different viewing angles. We split the dataset into 45 training and 15 testing videos and made sure that both training and testing cover all the different camera views. The video lengths range from around 700 frames to around 2500 frames. This dataset spans a variety of conditions such as sunny, cloudy, rainy and night. We did not sample the dataset to ensure that the training and testing sets each include samples taken under different weather conditions. However, the trained model turns out to be fairly robust to different weather conditions except for a few night videos, which have very different lighting conditions from the others. What is more, the cameras used to capture the night videos also had out-of-focus problems in low-light conditions, which resulted in blurry images that may have caused the performance drop.

Training. During training, 8 consecutive video frames are randomly selected from the training set without any data augmentation. We first fine-tune the VGG branch alone under the framework of faster R-CNN [19] using the whole training set as a warmup. The proposed TrackNet is then trained with the VGG branch frozen. We also fine-tune the last convolution layers (conv5a, conv5b) in the C3D backbone. The initial learning rate was [...] and was reduced by a factor of 10 after 10K iterations. We used the Adam optimizer and trained the model for a total of 50K iterations.

Fine-tune Faster RCNN as the baseline. We fine-tune the whole Faster RCNN model with VGG as the base network on the same training dataset for 70K iterations. After fine-tuning, the faster RCNN achieved a very high recall rate on the UA-DETRAC dataset for vehicle detection and is used as a very strong baseline. We evaluate the TrackNet using two sets of metrics: (1) object detection performance in each frame using the COCO API [15], and (2) object tracking performance using the MOT metrics [1]. Some visual detection and tracking results are in Figure 4.

Object Detection Performance.
In order to compare TrackNet with the 2D detection baseline, we consider all bounding tubes generated by TrackNet and evaluate all bounding boxes in each frame. Table 1 shows the average precision (AP) and average recall (AR) rates for different evaluation conditions. The criterion for labeling a detection as a true match becomes stricter as one increases the IoU threshold. From the table we can see that: (1) TrackNet has outperformed the strong baseline by approximately 10.8 percentage points in terms of mAP (31.53% versus 20.77%). It is particularly better at detecting larger objects (41.29% versus 25.54%).
(2) TrackNet has more confidence when limiting the maximum number of detections to only 1. It outperformed the baseline by around 9.5 percentage points (26.11% versus 16.64%). (3) TrackNet has a lower recall rate compared with the baseline, especially when the overlap criterion is stricter (49.61% versus 79.69%). The proposed model utilizes the concatenation of both spatial-temporal features and appearance features. It has higher precision (fewer false positives) for object detection, because a bounding box is generated only as part of a detected bounding tube. This however also has the consequence of reducing the recall rate at the frame level.

Table 2. Tracking results evaluated using MOT metrics [1]: recall, precision, FAR (false positive rate), MOTA and MOTP rates (%) with matching thresholds (Euclidean distance) of 0.8 and 1.0.
Columns: Rcll, Prcn, FAR, MOTA, MOTP. Rows: TrackNet Left view only; TrackNet Right view only; TrackNet Frontal view only; TrackNet; Fine-tuned Faster RCNN+SORT [2]. (Cell values not recoverable.)

Object Tracking Performance. In order to compare the tracking performance, the bounding boxes produced by the baseline are linked using a real-time associator (tracker), SORT [2]. Since the TrackNet already produces tracklets, no association needs to be done within one GoP of N frames. Tubes are simply connected based on the 3D-IoU across GoPs to form longer tracks. After association, the tracks generated by faster RCNN+SORT and by the TrackNet are compared using the multiple object tracking metrics [1]. Performances under matching thresholds of 0.8 and 1.0 are reported in table 2. The proposed model outperformed the faster RCNN+SORT baseline by about 15 percentage points in terms of tracking precision (87.3% versus 72.0%) and by 1.1% in MOTA score, which agrees with the observation that TrackNet has a much lower false positive rate by utilizing both motion and appearance information. Note that TrackNet has a lower MOTP score, which implies bigger distances between the ground truth tracks and the matched tracks.
This is expected given that the feature input is at the GoP level, not at the frame level.

Ablation Analysis

In order to understand the roles that the major design components play, different versions of TrackNet are trained and tested using the same datasets. In table 1, we show the comparisons between TrackNet without transformer, TrackNet without VGG feature concatenation, TrackNet without transformer or VGG in either predict-all mode or linear interpolation (LP) mode, and the full-version TrackNet. We further split the training and testing dataset based on the viewing angles into left view, right view and frontal view, and show the performances when training and testing only on the sub-datasets separately. From the table it is clear that the performance is boosted by VGG concatenation and by inserting the spatial transformer. The linear interpolation (LP) conveniently serves as an implicit smoothness regularization and improves the performance with even fewer parameters.

5. Conclusion

We present the TrackNet, which can detect and track multiple objects in videos jointly by generating bounding tubes. Utilizing the spatial-temporal features detected by a 3D convolutional network in addition to the spatial features from the VGG network, the TrackNet generates tube proposals, further classifies them and refines their locations. TrackNet consists of three stages: (1) feature extraction and spatial transformation, (2) the tube proposal network (TPN), and (3) post-TPN classification and refinement. We explored several ways to do tube proposal and offset regression. TrackNet was trained and tested on the challenging traffic video dataset UA-DETRAC and achieved very promising results. In future work we would like to improve TrackNet in terms of more precise localization. Pooling features from multiple scales in the spatial and temporal domains will be tested, and the linear interpolation structure will be relaxed to allow more complex motion patterns. Our current experiment results show that C3D features are not sufficient to accurately locate objects.
We suspect that it is possible to design a modified 3D convolutional network that can capture more detailed spatial information than the C3D architecture, so that TrackNet can provide good performance without requiring a separate 2D CNN for feature extraction. This will be another direction for our future research.

References

[1] K. Bernardin and R. Stiefelhagen. Evaluating multiple object tracking performance: the CLEAR MOT metrics. EURASIP Journal on Image and Video Processing, 2008(1):246309.
[2] A. Bewley, Z. Ge, L. Ott, F. Ramos, and B. Upcroft. Simple online and realtime tracking. In Image Processing (ICIP), 2016 IEEE International Conference on. IEEE.
[3] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, CVPR. IEEE Computer Society Conference on, volume 1. IEEE.
[4] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, CVPR. IEEE Conference on. IEEE.
[5] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2).
[6] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9).
[7] R. Girshick. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision.
[8] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[9] K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. In European Conference on Computer Vision. Springer.
[10] M. Jaderberg, K. Simonyan, A. Zisserman, et al. Spatial transformer networks. In Advances in Neural Information Processing Systems.
[11] K. Kang, H. Li, T. Xiao, W. Ouyang, J. Yan, X. Liu, and X. Wang. Object detection in videos with tubelet proposal networks. arXiv preprint.
[12] K. Kang, H. Li, J. Yan, X. Zeng, B. Yang, T. Xiao, C. Zhang, Z. Wang, R. Wang, X. Wang, et al. T-CNN: Tubelets with convolutional neural networks for object detection from videos. arXiv preprint.
[13] K. Kang, W. Ouyang, H. Li, and X. Wang. Object detection from video tubelets with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[14] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems.
[15] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision. Springer.
[16] W. Luo, X. Zhao, and T.-K. Kim. Multiple object tracking: A review. arXiv preprint.
[17] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[18] J. Redmon and A. Farhadi. YOLO9000: Better, faster, stronger. arXiv preprint.
[19] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pages 91-99.
[20] Z. Shou, D. Wang, and S.-F. Chang. Temporal action localization in untrimmed videos via multi-stage CNNs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[21] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint.
[22] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1-9.
[23] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision.
[24] J. R. Uijlings, K. E. Van De Sande, T. Gevers, and A. W. Smeulders. Selective search for object recognition. International Journal of Computer Vision, 104(2).
[25] H. Wang and C. Schmid. Action recognition with improved trajectories. In Proceedings of the IEEE International Conference on Computer Vision.
[26] L. Wen, D. Du, Z. Cai, Z. Lei, M. Chang, H. Qi, J. Lim, M.-H. Yang, and S. Lyu. DETRAC: A new benchmark and protocol for multi-object tracking. arXiv preprint.