Deep Spatial-Temporal Joint Feature Representation for Video Object Detection

sensors
Article

Deep Spatial-Temporal Joint Feature Representation for Video Object Detection

Baojun Zhao 1,2, Boya Zhao 1,2, Linbo Tang 1,2,*, Yuqi Han 1,2 and Wenzheng Wang 1,2

1 School of Information and Electronics, Beijing Institute of Technology, Beijing, China; zbj@bit.edu.cn (B.Z.); zhaoboya@bit.edu.cn (B.Z.); yuqi_han@bit.edu.cn (Y.H.); wwz@bit.edu.cn (W.W.)
2 Beijing Key Laboratory of Embedded Real-time Information Processing Technology, Beijing Institute of Technology, Beijing, China
* Correspondence: tanglinbo@bit.edu.cn

Received: 2 February 2018; Accepted: 27 February 2018; Published: 4 March 2018

Abstract: With the development of deep neural networks, many object detection frameworks have shown great success in the fields of smart surveillance, self-driving cars, and facial recognition. However, the data sources are usually videos, while most object detection frameworks are established on still images and use only spatial information, which means that feature consistency cannot be ensured because the training procedure loses temporal information. To address these problems, we propose a single, fully-convolutional neural network-based object detection framework that involves temporal information by using Siamese networks. In the training procedure, first, the prediction network combines multiscale feature maps to handle objects of various sizes. Second, we introduce a correlation loss by using the Siamese network, which provides neighboring frame features; this correlation loss represents object co-occurrences across time to aid consistent feature generation. Since the correlation loss requires track ID and detection label information, our video object detection network is evaluated on the large-scale ImageNet VID dataset, which provides such annotations; it achieves a 69.5% mean average precision (mAP).

Keywords: deep neural network; video object detection; temporal information; Siamese network; multiscale feature representation

1. Introduction

Object detection in images has received much attention in recent years, with tremendous progress mostly due to the emergence of deep neural networks, especially deep convolutional neural networks [1-4] and their region-based descendants [5-11]. These methods achieve excellent results on still-image datasets such as the Pattern Analysis, Statistical Modelling and Computational Learning Visual Object Classes (PASCAL VOC) and Microsoft Common Objects in Context (COCO) datasets. With this success, computer vision tasks have been extended from the still-image domain to the video domain because, in reality, the data sources of practical applications such as smart surveillance, self-driving, and face recognition are mostly videos. The additional challenges are [12]: (1) motion blur, due to rapid camera or object movement; (2) quality, since internet video clips are of lower quality than still images, even at the same resolution; (3) partial occlusion, due to position changes of the camera or the object; and (4) pose, since unconventional object-to-camera poses frequently appear in video clips. To overcome this gap, most video object detection methods [13-16] use exhaustive post-processing on top of still-image detectors. For example, T-CNN [13] applies the two-stage Faster RCNN [6] detection framework to individual video frames; context suppression and tracking are then applied to the detection results. Since they do not actually involve temporal information, those video object detection methods do not achieve favourable results on video sources.

Sensors 2018, 18, 774

In this paper, to introduce time-series information into the network, we propose a novel training framework based on a Siamese network [17,18] that is fully-convolutional with respect to adjacent frames. By introducing the Siamese network, we can effectively maintain feature consistency by computing feature similarity across adjacent frames, which in turn maintains feature consistency in the detection process. In the training stream, the input is a pair of neighboring frames. Then, a backbone network and a set of fully-convolutional layers are cascaded to generate a multiscale feature representation for detection; in this way, we can handle multiscale objects. Afterwards, we design a novel loss function based on the Siamese network to maintain feature consistency. The correlation loss is a constraint between neighboring frames, calculated by computing the similarity of object features with the same label and track ID. In the detection stream, which shares the multiscale feature representation, the detection results are optimized by Soft-NMS [19]. Our network is evaluated on the large-scale ImageNet VID validation dataset, where it yields a 1.6% improvement over the baseline Single Shot Multibox Detector. Moreover, to demonstrate the effectiveness of the proposed framework, we also conduct an experiment on the YouTube Objects dataset, where the proposed framework obtains a 4.4% gain over the baseline.

2. Related Works

In this section, we review still-image object detection, video object detection, video action detection, and 3D shape classification.

2.1. Still Image Object Detection

There are two classes of detection methods: one is based on region proposal classification, and the other is based on sliding windows. Before the large improvement in computational resources, the best of these two approaches were the Deformable Part Model (DPM) [20] and Selective Search (SS) [21]. After the dramatic improvement shown by R-CNN [8], region proposal object detection methods became prevalent. Among one-stage detection frameworks based on sliding windows, You Only Look Once (YOLO) [22] and the Single Shot Multibox Detector (SSD) [10] became renowned. These two detection methods have comparable results, and there are added benefits to using a one-stage architecture: the processing speed of a one-stage method is faster than that of a region proposal method. For example, SSD300 reaches 46 fps, whereas Faster RCNN [6] runs at 7 fps.

2.2. Video Object Detection

Recently, ImageNet [23] introduced a new challenge for object detection from video clips, the well-known ImageNet VID, which brings the object detection task into the video domain. In this competition, nearly all detection methods involve time-series information through post-processing applied after detection on individual frames. T-CNN [13] incorporates optical flow information to fix neighboring frame results. MCMOT [16] approaches post-processing as a multi-object tracking problem by applying a series of hand-crafted rules (e.g., confidence thresholds and changing point detection). Seq-NMS [14] regards post-processing as a confidence re-scoring problem: the boxes of a video sequence are re-scored to the average confidence. Unfortunately, these methods incorporate temporal information only by post-processing, and detection in a video becomes a multi-stage pipeline; the temporal information is not truly involved in the algorithm.

2.3. Video Action Detection

Different from video object detection, video action detection aims to detect every occurrence of a given action within a long video and to localize each detection in both space and time. How to use temporal information effectively is a problem common to video object detection and video action detection.
Finding Action Tubes [24] extracts frame-level action proposals using selective search and links them using the Viterbi algorithm. Multi-region RCNN [25] applies two-stream

R-CNNs for action detection, where a spatial Region Proposal Network (RPN) and a motion RPN are used to generate frame-level action proposals. Tube CNN [26] proposes an end-to-end Tube Convolutional Neural Network for action detection, which exploits a 3D convolutional network to extract spatial-temporal features.

2.4. 3D Shape Classification

3D data carry three-dimensional information, consisting of 2D information and depth information. A video clip is similar to 3D data in that it also has three dimensions (2D information and time-series information). The processing of 3D data is mainly focused on shape classification. The problem common to video object detection and 3D shape classification is how to use the third-dimensional information (depth or time-series information). LWU [27] leverages stochastic gradient Markov Chain Monte Carlo (SG-MCMC) and Bayesian interpretations to learn weight uncertainty and improve robustness in deep neural networks. GM3D [28] extracts geodesic moments as shape features and uses a two-layer stacked sparse auto-encoder to digest these features and predict the shape category.

3. Method

Figure 1 illustrates the entire pipeline of the proposed method, which is introduced in two parts: the training stream and the testing stream. First, to give the training procedure a better starting point, the backbone of the proposed fully-convolutional neural network is VGG-16, pre-trained on the ImageNet CLS-LOC dataset. Second, by feeding forward a set of fully-convolutional layers cascaded after the backbone network, a multiscale feature representation is generated to handle multiscale objects. Third, a set of different anchor shapes is generated on the multiscale feature maps to adapt to objects with different scales and aspect ratios. Fourth, considering the predictions for each anchor and the intersection over union (IOU) of the anchor and the ground truth, the detection loss is formed. Finally, the above procedures are replicated twice to establish a Siamese network; by measuring similarity with the neighboring frame feature, the correlation loss, which consists of a center-value loss and an anchor coordinate loss, is computed. The testing procedure shares the backbone, multiscale feature representation, anchor generation, and prediction, and the testing results are computed after Soft-NMS optimization.

3.1. Network Architecture

The following section details the proposed network, which contains the backbone network, multiscale feature representation, anchor generation, anchor prediction, and training sample selection. The network architecture describes the data flow in the feed-forward procedure.

3.1.1. Backbone Network

In this paper, the backbone network is VGG-16 [3], pre-trained on the ImageNet CLS-LOC dataset [23]. As shown in Figure 2, the original VGG-16 is a deep CNN that includes 13 convolutional layers and three fully-connected layers. The convolutional layers generate deep features, which are then fed into the fully-connected layers. Similar to SSD, we remove the fully-connected layers and only use the convolutional layers to generate feature maps. The following details the entire backbone network:

(1) Input: 300 x 300 images with RGB channels.
(2) Convolutional layers: The convolutional layers mainly contain five groups: conv1, conv2, conv3, conv4, and conv5. Conv1 includes two convolutional layers with 3 x 3 x 64 kernels. Conv2 includes two convolutional layers with 3 x 3 x 128 kernels. Similar to conv1 and conv2, conv3, conv4, and conv5 include three convolutional layers each, with 256, 512, and 512 kernels, respectively.

(3) Activation and pooling: The activation function is rectified linear units (ReLU) [29], and the kernel size of the max pooling layers is 2 x 2.

Figure 1. The architecture of the proposed method. The orange part is the training procedure on neighboring frames (frame t and frame t + 1). The green part is the testing procedure of frame n.

Figure 2. The architecture of VGG16, showing the convolutional and pooling layers. Along the feed-forward procedure, the size of the feature map decreases. s and p refer to the stride and padding size, respectively.
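As a concrete reference, the following is a minimal PyTorch sketch of the truncated VGG-16 backbone described above (13 convolutional layers in five groups, ReLU activations, 2 x 2 max pooling, fully-connected layers removed). The module structure and helper name are our own illustration, not the authors' released code.

```python
import torch
import torch.nn as nn

def vgg16_backbone():
    """Truncated VGG-16: 13 conv layers in five groups (64, 128, 256, 512, 512
    channels), ReLU activations, 2 x 2 max pooling, FC layers removed.
    ceil_mode keeps the odd map sizes (75 -> 38) that SSD-style heads expect."""
    cfg = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
    layers, in_ch = [], 3
    for n_convs, out_ch in cfg:
        for _ in range(n_convs):
            layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
            in_ch = out_ch
        layers.append(nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True))
    return nn.Sequential(*layers)

net = vgg16_backbone()
x = torch.randn(1, 3, 300, 300)   # 300 x 300 RGB input
print(net(x).shape)               # torch.Size([1, 512, 10, 10])
# conv4_3 (the 38 x 38 detection map) is the activation just before the
# fourth pooling layer; SSD additionally replaces pool5 with a 3x3/s1 pool.
```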

3.1.2. Multiscale Feature Representation

After the backbone network, the multiscale feature representation network is cascaded, generated by feed-forward convolutional networks [4,30]. According to SSD [10], conv4_3 has a different feature scale compared to the other layers in the backbone network. Therefore, the multiscale feature representation starts from conv4_3 after applying an L2 normalization that scales the feature norm at each location in the feature map to 20; this scale is learned during back propagation.

Considering that the size of the input image is initialized to 300 x 300, conv4_3 has a 38 x 38 feature map size, and conv4_3 is the first feature scale map. The multiscale feature representation is generated after conv4_3 by applying feed-forward convolutions. Traditionally, a low-resolution feature map can be generated from a high-resolution feature map by a single convolutional layer, but the computational costs are high. To reduce computational costs, each scale block contains two convolutional layers. The first convolutional kernel size is 1 x 1 x M x N, where M is the upper layer's channel number and N is the current channel number; by using a 1 x 1 x M x N kernel, the intermediate feature map channel count is decreased from M to N. The second convolution kernel size is 3 x 3 x N x K, where K is the current scale-feature channel number. Table 1 shows the details of the multiscale feature representation. There are six scale feature maps with feature map sizes of 38 x 38, 19 x 19, 10 x 10, 5 x 5, 3 x 3, and 1 x 1, where 38 x 38 is conv4_3 in VGG16. In the options column, s is the convolutional kernel stride, p is the padding size, and dilation denotes dilated convolution; in the dilation option, the stride is 1, the padding size is 6, and the convolution uses a 3 x 3 kernel with dilation 6.

Table 1. Details of the multiscale feature representation.

Stage | Conv Kernel Size | Feature Map Size | Usage | Options
Conv4_3 | (backbone) | 38 x 38 | Detection for scale 1 | s-1, p-1
Conv6_1 | 3 x 3 x 512 x 1024 | 19 x 19 | Enlarge receptive field | dilation
Conv6_2 | 1 x 1 x 1024 x 1024 | 19 x 19 | Detection for scale 2 | s-1, p-0
Conv7_1 | 1 x 1 x 1024 x 256 | 19 x 19 | Reduce channels | s-1, p-0
Conv7_2 | 3 x 3 x 256 x 512 | 10 x 10 | Detection for scale 3 | s-2, p-1
Conv8_1 | 1 x 1 x 512 x 128 | 10 x 10 | Reduce channels | s-1, p-0
Conv8_2 | 3 x 3 x 128 x 256 | 5 x 5 | Detection for scale 4 | s-2, p-1
Conv9_1 | 1 x 1 x 256 x 128 | 5 x 5 | Reduce channels | s-1, p-0
Conv9_2 | 3 x 3 x 128 x 256 | 3 x 3 | Detection for scale 5 | s-1, p-0
Conv10_1 | 1 x 1 x 256 x 128 | 3 x 3 | Reduce channels | s-1, p-0
Conv10_2 | 3 x 3 x 128 x 256 | 1 x 1 | Detection for scale 6 | s-1, p-0
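A sketch of the cascaded scale blocks in Table 1, assuming the channel widths reconstructed there (the 1 x 1 reduction followed by a 3 x 3 convolution per block); the comments map each block to its table rows, and the function name is illustrative.

```python
import torch.nn as nn

def extra_layers():
    """Scale blocks cascaded after the backbone, following Table 1."""
    return nn.ModuleList([
        nn.Sequential(  # conv6_1 (dilated) + conv6_2: 19 x 19
            nn.Conv2d(512, 1024, 3, padding=6, dilation=6), nn.ReLU(inplace=True),
            nn.Conv2d(1024, 1024, 1), nn.ReLU(inplace=True)),
        nn.Sequential(  # conv7_1 + conv7_2: 19 -> 10
            nn.Conv2d(1024, 256, 1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 512, 3, stride=2, padding=1), nn.ReLU(inplace=True)),
        nn.Sequential(  # conv8_1 + conv8_2: 10 -> 5
            nn.Conv2d(512, 128, 1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(inplace=True)),
        nn.Sequential(  # conv9_1 + conv9_2: 5 -> 3 (s1, p0)
            nn.Conv2d(256, 128, 1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3), nn.ReLU(inplace=True)),
        nn.Sequential(  # conv10_1 + conv10_2: 3 -> 1 (s1, p0)
            nn.Conv2d(256, 128, 1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3), nn.ReLU(inplace=True)),
    ])
```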

3.1.3. Anchor Generation

The anchors in the proposed network play two roles. The first role is in the data selection procedure: the IOU of the ground truth and an anchor decides whether this anchor is a positive sample [31]. Traditionally, if the IOU is more than 50%, the anchor is a positive sample. The second role is in the training and testing procedures. In the training procedure, the classification and location losses are computed over anchors. In the testing procedure, bounding box results are computed by the anchor location layers, and the detection results are obtained after non-maximum suppression of the bounding box results. Figure 3 shows the flow of anchor generation.

Traditional object detection methods suggest processing an image at different scales and then combining the results. Since we have a multiscale feature representation, we can utilize these feature maps with different scales in a single network to achieve a similar effect to the traditional image processing method. Similar to Faster RCNN and SSD, anchors are generated on the feature maps by dense sampling. First, to handle objects of different sizes, objects of different scales are detected on different feature maps; therefore, we allocate different scales to different feature maps. In a feature map with low resolution, the anchor scale is large, and in a feature map with high resolution, the anchor scale is small. Second, for a certain scale, the anchors should have different aspect ratios, and the design of these aspect ratios is decided by the objects' aspect ratios. Suppose that we have m feature maps and detection is applied to these m feature maps; the anchor scale of the nth feature map is computed as follows:

$$s_n = s_{\min} + \frac{s_{\max} - s_{\min}}{m - 1}(n - 1), \quad n \in [1, m] \tag{1}$$

In the formulation, $s_{\min}$ is the minimum scale of objects to be detected, and $s_{\max}$ is the maximum scale of objects to be detected; n indexes the nth square feature map, and $s_n$ is the scale of a certain feature map. According to the scales, the anchor area is obtained as $area = s_n^2$. For the aspects of the anchors, different aspect ratios are set according to the objects' aspect ratios. Traditionally, the aspect ratios contain $a_r = [1, 2, 3, 1/2, 1/3]$. Furthermore, for scale consistency, a scale of $s'_n = \sqrt{s_n s_{n+1}}$ with an aspect ratio of 1 is also considered. The width and height are computed as follows:

$$\begin{cases} anchor\_w = s_n \sqrt{a_r} \\ anchor\_h = s_n / \sqrt{a_r} \end{cases} \tag{2}$$

Finally, there are six anchors for each pixel in the feature maps.

In the proposed model, we allocate different scales from 0.1 to 0.95, in which 0.95 indicates that the object occupies the entire image and 0.1 indicates the down-sampling rate on Conv4_3. Moreover, we set the starting scale of Conv6_2 to 0.2. The aspect ratios contain $a_r = [1, 2, 3, 1/2, 1/3]$, plus an aspect ratio of 1 for the scale of $\sqrt{s_n s_{n+1}}$.

Table 2 shows the anchor details, with the height, width, and number for each feature map described in Section 3.1.2. In the table, for clarity, the channel number of each feature map is hidden. To include more objects with different aspect ratios and scales, there are six anchor shapes for each pixel on the feature map.
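The anchor bookkeeping of Equations (1) and (2) is small enough to sketch directly; `anchor_scales` and `anchor_shapes` are hypothetical helper names.

```python
import math

def anchor_scales(s_min=0.2, s_max=0.95, m=5):
    """Eq. (1): linearly spaced scales for the m feature maps used for detection."""
    return [s_min + (s_max - s_min) / (m - 1) * (n - 1) for n in range(1, m + 1)]

def anchor_shapes(s_n, s_next, aspect_ratios=(1.0, 2.0, 3.0, 1/2, 1/3)):
    """Eq. (2): (width, height) pairs for one pixel; the sixth shape uses the
    intermediate scale sqrt(s_n * s_{n+1}) with aspect ratio 1."""
    shapes = [(s_n * math.sqrt(ar), s_n / math.sqrt(ar)) for ar in aspect_ratios]
    shapes.append((math.sqrt(s_n * s_next),) * 2)
    return shapes

# Example: the scale-0.2 map (Conv6_2) reproduces the widths in Figure 3,
# e.g. 0.346 (ar = 3), 0.283 (ar = 2), 0.141 (ar = 1/2), 0.115 (ar = 1/3),
# and the extra square shape (about 0.28 here; Figure 3 lists 0.27).
for w, h in anchor_shapes(0.2, anchor_scales()[1]):
    print(round(w, 3), round(h, 3))
```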
Figure 3. Anchor generation flow. For each pixel on the feature map, six anchor shapes are generated that share the same center, with scales of 0.2 and 0.27 (anchor 1: scale 0.2, aspect ratio 3: 0.346; anchor 2: scale 0.2, aspect ratio 2: 0.283; anchor 3: scale 0.2, aspect ratio 1: 0.2 x 0.2; anchor 4: scale 0.2, aspect ratio 0.5: 0.141; anchor 5: scale 0.2, aspect ratio 0.33: 0.115; anchor 6: scale 0.27, aspect ratio 1: 0.27 x 0.27). The center of each anchor is the pixel center; in the figure, the center is (0.5, 0.5).

Table 2. Anchor details in the multiscale feature representation. Each pixel carries six anchor shapes whose heights and widths follow Equation (2) for the given scale.

Feature Map | Feature Map Size | Scale | Number of Anchors
Conv4_3 | 38 x 38 | 0.1 | 8664
Conv6_2 | 19 x 19 | 0.2 | 2166
Conv7_2 | 10 x 10 | 0.39 | 600
Conv8_2 | 5 x 5 | 0.58 | 150
Conv9_2 | 3 x 3 | 0.76 | 54
Conv10_2 | 1 x 1 | 0.95 | 6

3.1.4. Anchor Prediction

After associating a set of anchors with each feature map, anchor prediction is the next key procedure in the algorithm. At each anchor, we predict the offsets relative to the anchor shape and the per-class scores that indicate the presence of a class instance in this anchor. Suppose that the number of classes we want to predict is n. For each anchor, the output size is (n + 1) + 4: (n + 1) covers the classes plus background, and 4 is the bounding box offset for this anchor. Figure 4 shows the prediction procedure for an anchor. Referring to the multiscale feature representation in Section 3.1.2, Table 3 shows the detailed prediction kernels. The confidence output at each anchor position is (n + 1) x 6 dimensions, with (n + 1) classes and six shapes. The bounding box output at each anchor is (n + 1) x 4 dimensions, with (n + 1) classes and (bx, by, bw, bh).

Figure 4. Confidence and bounding box prediction from an anchor. The red point is the center of the example anchor. There are two kernels for this anchor: the red kernel is the confidence kernel, and the blue kernel is the bounding box prediction kernel.

Table 3. Details of the prediction kernels.

Feature | Feature Map Size | Confidence Kernel | Location Kernel
Conv4_3 | 38 x 38 | 3 x 3 x (n + 1) x 6 | 3 x 3 x (n + 1) x 4 x 6
Conv6_2 | 19 x 19 | 3 x 3 x (n + 1) x 6 | 3 x 3 x (n + 1) x 4 x 6
Conv7_2 | 10 x 10 | 3 x 3 x (n + 1) x 6 | 3 x 3 x (n + 1) x 4 x 6
Conv8_2 | 5 x 5 | 3 x 3 x (n + 1) x 6 | 3 x 3 x (n + 1) x 4 x 6
Conv9_2 | 3 x 3 | 3 x 3 x (n + 1) x 6 | 3 x 3 x (n + 1) x 4 x 6
Conv10_2 | 1 x 1 | 3 x 3 x (n + 1) x 6 | 3 x 3 x (n + 1) x 4 x 6
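A sketch of the per-map prediction convolutions of Table 3. We follow the paper's statement that each anchor outputs (n + 1) class scores and (n + 1) x 4 class-specific offsets (standard SSD instead predicts four class-agnostic offsets); the source channel counts are taken from Table 1, and the function name is illustrative.

```python
import torch.nn as nn

def prediction_heads(source_channels=(512, 1024, 512, 256, 256, 256),
                     num_classes=30, shapes_per_pixel=6):
    """3x3 conv heads over the six source maps (conv4_3 ... conv10_2).
    Per pixel: shapes * (n+1) confidence channels and shapes * (n+1) * 4
    location channels, matching Table 3."""
    n_out = num_classes + 1  # classes + background
    conf, loc = nn.ModuleList(), nn.ModuleList()
    for c in source_channels:
        conf.append(nn.Conv2d(c, shapes_per_pixel * n_out, 3, padding=1))
        loc.append(nn.Conv2d(c, shapes_per_pixel * n_out * 4, 3, padding=1))
    return conf, loc
```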

3.1.5. Training Sample Selection

In the training procedure, training samples are generated from the anchors by a matching strategy between anchors and ground truths. The strategy begins by matching each anchor to a ground truth. Different from Multibox, positive samples are all anchors with a Jaccard overlap higher than a threshold; the others are negative samples. This strategy simplifies the training sample selection problem and provides the network with more positive samples. Moreover, to improve performance, the network also incorporates hard example mining and data augmentation.

1. Hard Example Mining

Since negative samples are selected from the background, the number of negative samples is much larger than the number of positive samples. This difference causes a significant imbalance between positive and negative training samples. To overcome this difficulty, following OHEM [32], we sort the samples by their prediction scores. By selecting the top scores, we restrict the ratio between negative and positive training samples to 3:1. In this way, we avoid training mainly on negative samples.

2. Data Augmentation

To improve the generalization of the network, we also apply data augmentation. Similar to SSD, each training image is augmented by one of the following options:

- Use the entire original input image;
- Sample a patch so that the minimum Jaccard overlap with the objects is 0.1, 0.3, 0.5, 0.7, or 0.9;
- Randomly sample a patch.

After the aforementioned sampling step, each augmented image is resized to a fixed size and is horizontally flipped with a probability of 0.5.

3.2. Loss Function

In the proposed network, the loss function mainly contains two parts: the detection loss and the correlation loss. The detection loss relates to object classification and object bounding box regression. The correlation loss is the neighboring objects' feature correlation, which involves time-series information in the network.

3.2.1. Detection Loss

The detection loss relates to an object in a single frame, and it involves the frame information concerning the object. Similar to Faster RCNN [6], the detection loss consists of a combined classification loss $L_{cls}$ and a bounding box regression loss $L_{bbox}$; the overall detection loss is the sum of the two. Suppose that f is a feature at a certain anchor with a Jaccard overlap higher than 0.5. Then:

$$L_{det}(f, l, g, c) = \frac{1}{n}\big(\alpha L_{bbox}(f, l, g) + L_{cls}(f, c)\big), \tag{3}$$

where n is the number of anchors whose Jaccard overlap is higher than 0.5; f is the feature of a certain anchor; c is the classification score of the anchor; and l and g are the bounding box offsets and ground truth, respectively. Similar to SSD, $L_{cls}$ is formed by a multi-class softmax:

$$L_{cls}(f, c) = -\Big(\sum_{i \in positive}^{n} f_{ij}^{label} \log(\hat{c}_i^{label}) + \sum_{i \in negative} \log(\hat{c}_i^{0})\Big), \tag{4}$$

where:

$$\hat{c}_i^{label} = \frac{\exp(c_i^{label})}{\sum_{label} \exp(c_i^{label})}, \tag{5}$$

In the formulation, $c_i^{label}$ is the anchor classification prediction score of a certain label in the ith anchor box; for example, $c_5^{person}$ is the prediction score of "person" in the 5th anchor box. Let $f_{ij}^{label} \in \{1, 0\}$ be the indicator for matching the ith anchor box to the jth ground truth of a label. Moreover, the background label is 0; therefore, for negative samples, $\hat{c}_i^{label} = \hat{c}_i^{0}$.
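A minimal sketch of the mined classification loss, combining the softmax loss of Equations (4) and (5) with the 3:1 negative:positive selection of Section 3.1.5. One assumption to flag: we rank background anchors by their per-anchor loss, the usual OHEM criterion, which is equivalent in spirit to the paper's sorting by prediction score.

```python
import torch
import torch.nn.functional as F

def mined_cls_loss(conf, labels, neg_pos_ratio=3):
    """conf: (num_anchors, n_classes + 1) logits; labels: (num_anchors,) with
    0 = background. Keeps all positives plus the hardest negatives at a 3:1
    negative:positive ratio, as in OHEM-style hard example mining."""
    ce = F.cross_entropy(conf, labels, reduction='none')  # per-anchor loss
    pos = labels > 0
    num_neg = neg_pos_ratio * int(pos.sum())
    neg_losses = ce[~pos]
    hard_neg, _ = neg_losses.topk(min(num_neg, neg_losses.numel()))
    return (ce[pos].sum() + hard_neg.sum()) / max(int(pos.sum()), 1)
```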

$L_{bbox}$ is based on the Huber loss [33] between the ground truth and the bounding box; the Huber loss is less sensitive to outliers in the data than the squared error loss. $L_{bbox}$ is formulated as follows:

$$L_{bbox} = \sum_{i \in positive}^{n} \sum_{m \in (bx, by, bw, bh)} f_{ij}^{label}\, \mathrm{Huber}\big(pre_i^m - \hat{g}_j^m\big), \tag{6}$$

in which:

$$\mathrm{Huber}(x) = \begin{cases} 0.5x^2 & (|x| \le \alpha) \\ \alpha(|x| - 0.5\alpha) & (|x| > \alpha) \end{cases} \tag{7}$$

Similar to R-CNN, $\alpha$ is a hyper-parameter and is set to 1. Suppose that a bounding box has four parameters (bx, by, bw, bh), where (bx, by) is the center of the bounding box and (bw, bh) are the width and height, respectively. pre and g refer to the prediction box and ground truth box, respectively, and i and j are the same as in $L_{cls}$. Moreover, when formulating $L_{bbox}$, we adopt the parameterizations of the anchor bounding box (a) and ground truth box (g) following R-CNN [8]:

$$\hat{g}_j^{bx} = \frac{g_j^{bx} - a^{bx}}{a^{bw}}, \quad \hat{g}_j^{by} = \frac{g_j^{by} - a^{by}}{a^{bh}}, \quad \hat{g}_j^{bw} = \log\frac{g_j^{bw}}{a^{bw}}, \quad \hat{g}_j^{bh} = \log\frac{g_j^{bh}}{a^{bh}} \tag{8}$$

This parameterization enhances the effect of the center (bx, by) and weakens the effect of the width (bw) and height (bh).
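Equations (7) and (8) translate directly to code; a sketch with the box tensors assumed to be in center-size (bx, by, bw, bh) form:

```python
import torch

def encode_boxes(gt, anchors):
    """Eq. (8): encode ground truth boxes against anchors; both are (N, 4)
    tensors in (bx, by, bw, bh) center-size form."""
    txy = (gt[:, :2] - anchors[:, :2]) / anchors[:, 2:]   # centers over anchor w, h
    twh = torch.log(gt[:, 2:] / anchors[:, 2:])           # log width/height ratios
    return torch.cat([txy, twh], dim=1)

def huber(x, alpha=1.0):
    """Eq. (7): quadratic below alpha, linear above (smooth L1 for alpha = 1)."""
    absx = x.abs()
    return torch.where(absx <= alpha, 0.5 * x ** 2, alpha * (absx - 0.5 * alpha))
```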

3.2.2. Correlation Loss

The detection loss is mainly concerned with intra-frame information; in video sources, however, it does not consider inter-frame information. The correlation loss is a strategy that involves inter-frame information by measuring the feature difference between frames. As is known, traditional detection frameworks based on deep neural networks are discriminative algorithms: the main concern is how to plot a line or surface in the feature space, and a feature falling on the positive side of that line is regarded as generated by a positive sample. Given the well-constructed feature space of a convolutional neural network, a linear classifier (softmax) can easily differentiate negative features from positive features. Due to the characteristics of discriminative algorithms, features in the feature space are scattered. In video object detection, the object moves continuously; therefore, when the training dataset is complete, an intra-frame-based detection framework may have a favorable result, but consistency of the features across frames is not guaranteed.

The proposed correlation loss is inspired by the tracking task. In tracking tasks [16,34-37], feature consistency is the key point for judging the tracking result, especially in correlation-based tracking algorithms such as the correlation filter [34]. In deep feature flow [12] and flow-guided feature aggregation [38], according to the object's movement, features are copied or aggregated from key frames to the other frames; in this way, feature stability is guaranteed. Different from these approaches, we formulate a correlation loss to supervise feature consistency. As shown in Figure 1, the correlation loss is computed by the Siamese network, which is a two-path network built by replication. Since we construct a multiscale feature map, the correlation loss is generated from it. The correlation loss ($L_{corr}$) is the combination of the center value loss ($L_{center\_value}$) and the anchor coordinate loss ($L_{coordinate}$):

$$L_{corr}(f^t, f^{t+1}, map^{t+1}, a^{t+1}) = \frac{1}{n}\big(L_{center\_value}(f^t, f^{t+1}) + L_{coordinate}(f^t, map^{t+1}, a^{t+1})\big), \tag{9}$$

where n is the number of ground truths; $f^t$ and $f^{t+1}$ are the anchor center features of the neighboring frames t and t + 1, respectively; and $a^{t+1}$ is the positive anchor box in frame t + 1. Each positive anchor box has four parameters (bx, by, bw, bh). $map^{t+1}$ is the entire scale feature map of frame t + 1. Moreover, the positive anchor here is different from positive sample selection: in the correlation loss procedure, anchor selection takes the maximum Jaccard overlap between a ground truth box and the anchors, which means that the number of positive anchors equals the number of ground truths.

First, the center value loss compares the correlation distance between same-labelled object features in neighboring frames. The following is the formulation of $L_{center\_value}$:

$$L_{center\_value}(f^t, f^{t+1}) = \sum_{i \in positive}^{n} f_i^{t, label\_t, track\_t} \otimes f_j^{t+1, label\_{t+1}, track\_{t+1}}, \quad \text{if } (label\_t = label\_{t+1}) \text{ and } (track\_t = track\_{t+1}) \tag{10}$$

Here, $\otimes$ is the correlation operation, which measures the consistency of the features. label_t and track_t are the ground truth label and track ID, respectively, that relate to the anchor. By searching all positive anchors in the neighboring frame, we obtain one-to-one corresponding positive anchors with the same label and track ID in frame t and frame t + 1. $f_i^{t, label\_t, track\_t}$ is the center value of the positive anchor.

Second, the anchor coordinate loss measures the tracking result loss on the neighboring frame feature. Inspired by the correlation filter, we take the positive anchor in frame t as the first-frame tracking ground truth and obtain a response map in frame t + 1; the coordinate of the highest response is then the tracking result in frame t + 1. Figure 5 shows the $L_{coordinate}$ computation flow.

Figure 5. Coordinate loss computation flow. The red and green points are the positive anchors of frame t and frame t + 1, respectively. A score heat map is computed with the correlation operation, using a 3 x 3 kernel centered at the red point in frame t and the feature map of frame t + 1, similar to tracking in frame t + 1. After obtaining the max point coordinate, $L_{coordinate}$ can be computed from the max point coordinate and the green point coordinate.
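A hedged sketch of the center value loss. The paper leaves the exact form of the correlation operation open, so we realize it here as a Euclidean distance between the matched anchor-center features, consistent with the similarity metric of Equation (13); `matches` is a hypothetical structure pairing positive anchors by label and track ID.

```python
import torch

def center_value_loss(feat_t, feat_t1, matches):
    """A sketch of Eq. (10). feat_t / feat_t1: (C, H, W) maps from the two
    Siamese paths; matches: list of ((y_t, x_t), (y_t1, x_t1)) anchor-center
    pairs that share the same label and track ID."""
    loss = feat_t.new_zeros(())
    for (yt, xt), (y1, x1) in matches:
        # distance between the two center features of one matched object
        loss = loss + (feat_t[:, yt, xt] - feat_t1[:, y1, x1]).pow(2).sum().sqrt()
    return loss / max(len(matches), 1)
```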

Then, the anchor coordinate loss is formed as:

$$L_{coordinate}(f^t, map^{t+1}, a^{t+1}) = \sum_{i \in positive}^{n} \mathrm{Dis}\big(T(f_i^{t, label\_t, track\_t}, map^{t+1}),\; a_j^{t+1, label\_{t+1}, track\_{t+1}}\big) / map\_size, \quad \text{if } (label\_t = label\_{t+1}) \text{ and } (track\_t = track\_{t+1}) \tag{11}$$

where:

$$T(f_i^{t, label\_t, track\_t}, map^{t+1}) = \max\big(f_i^{t, label\_t, track\_t} \otimes map^{t+1}\big)[bx, by] \tag{12}$$

The T (Track) formulation obtains the center (bx, by) of the max response location. In $L_{coordinate}$, Dis computes the Euclidean distance between the centers of the tracking result and the positive anchor in frame t + 1. Moreover, because the coordinate loss can be large, we normalize it by the size of the feature map. By applying the correlation loss, object features in neighboring frames become more consistent.
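A sketch of Equations (11) and (12): correlate a 3 x 3 patch around the frame-t positive anchor with the one-pixel-padded frame t + 1 map, take the argmax as the tracked point, and normalize its distance to the frame t + 1 positive anchor by the map size. As written (and as in the paper's formulation), the argmax step is not differentiable, so this is a literal rendering rather than a training-ready implementation.

```python
import torch
import torch.nn.functional as F

def coordinate_loss(feat_t, feat_t1, pos_t, pos_t1):
    """feat_*: (C, H, W) Siamese feature maps; pos_*: (y, x) integer anchor
    centers of the matched positive anchors in frames t and t + 1."""
    C, H, W = feat_t.shape
    y, x = pos_t
    patch = F.pad(feat_t, (1, 1, 1, 1))[:, y:y + 3, x:x + 3]       # 3x3 kernel
    resp = F.conv2d(feat_t1[None], patch[None], padding=1)[0, 0]   # (H, W) heat map
    ty, tx = divmod(int(resp.flatten().argmax()), W)               # tracked point
    dy, dx = float(ty - pos_t1[0]), float(tx - pos_t1[1])
    return (dy ** 2 + dx ** 2) ** 0.5 / H                          # normalized distance
```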

4. Experiment and Results

We report results on the ImageNet VID validation dataset; the training set is the ImageNet VID training dataset. The performance of our method is compared to R-CNN [8], Fast R-CNN [5], the original SSD [10], T-CNN [15], TPN + LSTM [39], and the winner of the ImageNet VID 2015 competition [13]. The detailed evaluation metrics are described in Section 4.1. All methods in the experiments were implemented on the Pytorch deep learning framework. The computational resources include a TITAN X GPU, 128 GB of memory, and an Intel Xeon CPU (2.30 GHz); the operating system is Ubuntu. Moreover, to show the effectiveness of the proposed method, we also evaluate our model on the YouTube Objects (YTO) dataset, with the Unsupervised [40], YOLO [22], Context [41], a_lstm [42], T-CNN [15], and Base [10] models selected for comparison.

4.1. ImageNet Dataset

We evaluate our method using the 2015 ImageNet object detection from video (VID) [23] dataset, which contains 30 classes in 3862 training and 555 validation videos. The 30 object categories in ImageNet VID are a subset of the 200 categories in the ImageNet DET dataset. The objects have ground truth annotations for their bounding boxes and a tracking ID in a video. Since the ground truth for the test set is not publicly available, we measure performance as mean average precision (mAP) over the 30 classes on the validation set, following the protocols in [12,13,15,38,39], which is standard practice. Figure 6 shows the number of ground truths in each class, from which we can see that the training ground truth of each class is unbalanced. We therefore subsample the VID training set by using only 30 frames from each video.

Figure 6. The number of ground truths in each class.

Figure 7 shows the object area statistics of the dataset, from which we can see that more than half of the objects have relatively small areas. The positive samples are anchors with a Jaccard overlap with the ground truths of more than 0.5; if the Jaccard overlaps between all of the anchors and a ground truth are lower than 0.5, there would be no positive samples. Figure 8 shows the statistics of small objects: in this dataset, if the scale of the smallest anchor is 0.1, the framework will miss about 10% of the ground truths.

Figure 7. The object area characteristics of the ImageNet VID dataset.

Figure 8. The small object area characteristics of the ImageNet VID dataset.

4.2. Model Training

Baseline single shot multibox: The baseline is SSD300, which means the input is resized to 300 x 300. The multiscale feature map sizes are 38 x 38, 19 x 19, 10 x 10, 5 x 5, 3 x 3, and 1 x 1. There are six anchor shapes per pixel in the multiscale feature maps, containing $a_r = [1, 2, 3, 1/2, 1/3]$ plus an aspect ratio of 1 for a scale of $\sqrt{s_n s_{n+1}}$. The backbone is the VGG16 net with the fully-connected layers removed. Following SSD300, we change pool5 from 2 x 2-s2 to 3 x 3-s1 and use the a trous algorithm to fill the holes. The baseline uses conv4_3, conv6_2, conv7_2, conv8_2, conv9_2, and conv10_2 to predict both locations and confidences. The baseline sets the anchor scale on conv4_3 to 0.1, and the other layers are initialized by the Xavier method [43]. For all prediction layers, we include the six anchor shapes described in Section 3.1.3. The baseline uses a 10^-3 learning rate for 200,000 iterations, and training is continued for 200,000 more iterations with 10^-4 and 10^-5. Moreover, the optimization uses a momentum of 0.9 with weight decay.

Proposed network: Our network is established on SSD300, whose input image size is also resized to 300 x 300. The training differences between the proposed network and the baseline are as follows:

1. Input image

The input is a pair of neighboring frames (frame t and frame t + 1). The neighboring frames must have at least one object pair with the same label and track ID; if one of the neighboring frames is empty, the frame pair is removed.

2. Correlation loss computation

The correlation loss ($L_{corr}$) has two subsets: $L_{center\_value}$ and $L_{coordinate}$. $L_{center\_value}$ is computed from the center values of the positive anchors. In $L_{coordinate}$, when computing the response map, the kernel size is 3 x 3 and its center is the positive anchor center; furthermore, frame t + 1 is padded by one pixel.

3. Learning rate

Since the loss of the proposed network is larger than the baseline's, we first warm up the network with a learning rate of 10^-4 for 10,000 iterations. Then, the proposed network uses a 10^-3 learning rate for 200,000 iterations, and training is continued for 200,000 more iterations with 10^-4 and 10^-5.

4. Training time

The training time of the baseline is almost three days. Due to the correlation loss computation and a detection loss that covers two images, the training time of our method is longer than the baseline's: following the above training process, the entire training takes approximately seven days.

Apart from the above items, the training of the proposed network is mostly the same as the baseline's: it also predicts from six layers with six anchor shapes each, and the optimization uses a momentum of 0.9 with weight decay.

4.3. Testing and Results

We test the baseline and the proposed network on the validation dataset with subsampling. We adopt Soft-NMS [19] to accurately refine the candidate bounding boxes, and Seq-NMS [14], a post-processing method, is applied after the detection process.
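Since the raw detections are refined with Soft-NMS [19], a minimal sketch of the Gaussian variant may help; `box_iou` and the threshold values are our own illustrative choices, not the paper's settings.

```python
import torch

def box_iou(a, b):
    """IOU of two (x1, y1, x2, y2) boxes."""
    lt = torch.max(a[:2], b[:2])
    rb = torch.min(a[2:], b[2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[0] * wh[1]
    area = lambda t: (t[2] - t[0]) * (t[3] - t[1])
    return inter / (area(a) + area(b) - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: instead of discarding boxes that overlap a
    higher-scoring box, decay their scores by exp(-iou^2 / sigma).
    boxes: (N, 4) as (x1, y1, x2, y2); scores: (N,). Returns kept indices."""
    idxs = list(range(len(scores)))
    scores = scores.clone()
    keep = []
    while idxs:
        best = max(idxs, key=lambda i: float(scores[i]))
        keep.append(best)
        idxs.remove(best)
        for i in idxs:
            scores[i] *= torch.exp(-(box_iou(boxes[best], boxes[i]) ** 2) / sigma)
        idxs = [i for i in idxs if scores[i] > score_thresh]
    return keep
```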

Table 4 shows the mAP results on the validation dataset. From the table, the baseline SSD300 achieves a mAP of 67.9%, while our proposed method achieves 69.5%, a 1.6% improvement over the baseline. Table 4 also compares our proposed network with R-CNN [8], Fast R-CNN [5], T-CNN [15], TPN + LSTM [39], and the winner of the ImageNet VID 2015 competition [13] (multi-model). R-CNN and Fast R-CNN are baselines designed for still images; the last column is the ImageNet VID 2015 winner, whose result is a fusion of DeepID Net, CRAFT, and post-processing procedures. Among single models, the proposed network achieves the best score, a 1.1% improvement over TPN + LSTM. Moreover, because our proposed network is based on SSD300 and the testing process is similar, the test time per image is similar: on our equipment (Titan X), the testing speed is approximately 32 fps, somewhat slower than the original SSD because there are more anchors. During the network feed-forward, Figure 9 visualizes some intermediate features from conv4_3, conv6_2, conv7_2, conv8_2, conv9_2, and conv10_2.

Table 4. Average precision of each class and mAP on the ImageNet VID validation set, covering the 30 classes (airplane, antelope, bear, bicycle, bird, bus, car, cattle, dog, domestic cat, elephant, fox, giant panda, hamster, horse, lion, lizard, monkey, motorcycle, rabbit, red panda, sheep, snake, squirrel, tiger, train, turtle, watercraft, whale, and zebra). The bold values in the table are the best results among single models for a certain class. On mAP, the proposed network (69.5%) outperforms R-CNN [8] (45.3%), Fast R-CNN [5] (63.0%), TPN + LSTM [39] (68.4%), and the baseline (67.9%); only the multi-model winner [13] (73.8%) scores higher.

Figure 9. Multiscale feature representation of the network (columns: image, conv4_3, conv6_2, conv7_2, conv8_2, conv9_2, conv10_2). The first line is the multiscale feature representation of the proposed method, and the second line is the feature of the baseline.

To measure the effect of the correlation loss on the features, we extract the feature of the anchor with the maximum Jaccard overlap with the ground truth. Then, for computing the similarity between neighbouring frames, the similarity metric is the Euclidean distance. For each class, we choose 10 pairs of neighbouring frames, and the similarity metric is the average of those similarities. In order to avoid the effect of the channel size, the similarity index is normalized by the number of channels:

$$\mathrm{Similarity} = \sum_i^n \sqrt{\big(f_{i, frame\_t}^{a} - f_{i, frame\_{t+1}}^{a}\big)^2} \Big/ channels \Big/ n \tag{13}$$

Here, $f_{frame\_t}^{a}$ and $f_{frame\_{t+1}}^{a}$ are the features of a certain anchor in neighbouring frames, and channels is the channel number of the feature map. Figure 10 shows the feature similarity of the proposed network and the baseline; we can see that our proposed method maintains better feature similarity than the baseline.

Figure 10. The feature similarity of the proposed method. The horizontal axis is the index of the multiscale feature map and the vertical axis is the similarity index.
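Equation (13) in code form, for one anchor pair; averaging over the 10 pairs per class is left to the caller, and we read the inner square root as the Euclidean norm over channels, matching the prose above.

```python
import torch

def feature_similarity(f_t, f_t1):
    """Eq. (13) for one matched anchor: Euclidean distance between the two
    center-feature vectors (shape (channels,)), normalized by the channel
    count. Lower values mean more consistent features across frames."""
    return (f_t - f_t1).pow(2).sum().sqrt() / f_t.numel()
```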

Figure 11 shows detection results on the validation dataset.

4.4. Model Analysis

The number of anchor shapes and the number of multi-scale feature maps are key hyper-parameters in the proposed framework. To evaluate the effects of different numbers of anchor shapes and feature maps, additional comparison experiments are conducted on the ImageNet VID dataset.

4.4.1. Number of Anchor Shapes

In the proposed framework, the number of anchor shapes for each pixel in the multi-scale feature maps is one of the key hyper-parameters, and it affects both detection performance and speed. To analyze this effect, we conduct a comparison experiment between different numbers of anchors; Table 5 shows the details. The settings of this experiment are as follows (the anchor totals can be verified with the short script after Figure 11):

- Anchor-6: For each pixel in the multi-scale feature maps, there are six anchor shapes, containing $a_r = [1, 2, 3, 1/2, 1/3]$ plus an aspect ratio of 1 for a scale of $\sqrt{s_n s_{n+1}}$. In our framework, the total number of anchors is 11,640.
- Anchor-4: For each pixel in the multi-scale feature maps, there are four anchor shapes, containing $a_r = [1, 2, 1/2]$ plus an aspect ratio of 1 for a scale of $\sqrt{s_n s_{n+1}}$. In our framework, the total number of anchors is 7760.
- Anchor-4 and 6: In this setting, we follow the original SSD setting. In Conv4_3, Conv9_2, and Conv10_2, there are four anchor shapes, containing $a_r = [1, 2, 1/2]$ plus an aspect ratio of 1 for a scale of $\sqrt{s_n s_{n+1}}$. Then, in Conv6_2, Conv7_2, and Conv8_2, there are six anchor shapes, containing $a_r = [1, 2, 3, 1/2, 1/3]$ plus an aspect ratio of 1 for a scale of $\sqrt{s_n s_{n+1}}$. In our framework, the total number of anchors is 8732.

Figure 11. Results on the validation dataset. (a) Blue bounding boxes are ground truths. (b) Yellow bounding boxes are baseline results. (c) Red bounding boxes are our results.
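The anchor totals in these settings follow directly from the feature-map sizes in Table 1; a quick check:

```python
# Anchor-count arithmetic for the three settings in Table 5.
maps = {'conv4_3': 38, 'conv6_2': 19, 'conv7_2': 10,
        'conv8_2': 5, 'conv9_2': 3, 'conv10_2': 1}

def total_anchors(shapes_per_map):
    return sum(maps[k] ** 2 * s for k, s in shapes_per_map.items())

print(total_anchors({k: 6 for k in maps}))    # Anchor-6     -> 11640
print(total_anchors({k: 4 for k in maps}))    # Anchor-4     -> 7760
print(total_anchors({**{k: 6 for k in maps},
                     'conv4_3': 4, 'conv9_2': 4,
                     'conv10_2': 4}))         # Anchor-4 and 6 -> 8732
```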

Table 5. Detection performance for different numbers of anchor shapes. The bold values in the table are the fastest detection speed and the best performance.

Settings | Anchor-6 | Anchor-4 | Anchor-4 and 6
Anchor number | 11,640 | 7760 | 8732
Detection speed | 32 fps | 51 fps | 46 fps
Mean AP | 69.5 | 67.9 | -

From the table, we can see that more anchors give a better mean average precision, but more anchors also need more computational resources, so the detection speed is lower. If we remove the aspect ratios of 3 and 1/3, performance drops by 1.6% while the detection speed gains 18 fps.

4.4.2. Number of Multi-Scale Feature Maps

In our framework, the number of multi-scale feature maps is also a key hyper-parameter. To investigate the effect of the multi-scale feature representation, we conduct another comparison experiment between different numbers of multi-scale feature maps; Table 6 shows the details. In each setting, there are six anchor shapes per pixel, consisting of $a_r = [1, 2, 3, 1/2, 1/3]$ plus an aspect ratio of 1 for a scale of $\sqrt{s_n s_{n+1}}$. The following are the experimental settings:

1. Feature-6: We use six feature maps: Conv4_3, Conv6_2, Conv7_2, Conv8_2, Conv9_2, and Conv10_2. The total number of anchors is 11,640.
2. Feature-5: We use five feature maps: Conv4_3, Conv6_2, Conv7_2, Conv8_2, and Conv9_2. The total number of anchors is 11,634.
3. Feature-4: We use four feature maps: Conv4_3, Conv6_2, Conv7_2, and Conv8_2. The total number of anchors is 11,580.
4. Feature-3: We use three feature maps: Conv4_3, Conv6_2, and Conv7_2. The total number of anchors is 11,430.

Table 6. Detection performance for different numbers of feature maps. The bold values in the table are the fastest detection speed and the best performance.

Settings | Feature-6 | Feature-5 | Feature-4 | Feature-3
Anchor number | 11,640 | 11,634 | 11,580 | 11,430
Detection speed | 32 fps | 32 fps | 32 fps | 33 fps
Mean AP | 69.5 | - | - | 66.8

From the table, we can see that more feature maps significantly enhance the mean average precision. Since few anchors come from the lower-resolution feature maps, the speed does not differ greatly between settings. Compared with Feature-3, Feature-6 has a 2.7% bonus in mean average precision.

4.5. Evaluation on the YouTube Objects (YTO) Dataset

To show the effectiveness of the proposed network, we evaluate our model on a video object detection task with the YTO dataset [44].

4.5.1. YouTube Objects Dataset

The YTO dataset contains 10 object classes: airplane, bird, boat, car, cat, cow, dog, horse, motorbike, and train. These 10 object classes are a subset of the ImageNet VID dataset classes, and these objects are also moving objects. Different from the VID dataset, which contains full annotations on all video frames, the YTO training dataset is weakly annotated, i.e., each video is only ensured to contain one object of the corresponding class, and only a few frames are annotated, whereas the objects in

the YTO test dataset are all annotated. In total, the YTO dataset contains 155 videos but only 6087 annotated frames, among which 4306 are for training and 1781 for testing. The weak annotation makes it infeasible to train the proposed network on the YTO dataset. Following [13,15], since the YTO classes are a subset of the VID dataset classes, we only use this dataset for evaluation and can directly apply the trained models on the YTO dataset. The evaluation metric is the same as for the ImageNet VID dataset.

4.5.2. Evaluation Results

We evaluate the model trained on ImageNet on the YTO test dataset, and several state-of-the-art methods are selected for comparison. Table 7 shows the detailed AP lists computed by Unsupervised [40], YOLO [22], Context [41], a_lstm [42], T-CNN [15], Base [10], and our own method on the YTO test dataset.

Table 7. Detection performance on the YTO dataset over the classes airplane, bird, boat, car, cat, cow, dog, horse, motorbike, and train; the columns compare [40], [22], [41], [42], [15], the baseline, and our method. The bold values in the table are the best results among single models for a certain class.

From the table, we can see that our proposed framework outperforms the others by a large margin: compared with the baseline, our proposed method has around a 4.4% improvement, and compared with T-CNN, it has a 4.0% bonus.

5. Conclusions

In this paper, we propose a fast and accurate object detection network for video sources. The proposed network is a single-shot object detection network. Unlike traditional single-frame-based object detection networks, our proposed network involves frame-to-frame information by using object pair relations among neighboring frames. Following the Siamese network, we formulate a correlation loss to restrain the deep features; in this way, we incorporate correlation into the discriminative algorithm. The backbone network is the VGG16 model with the fully-connected layers removed. Based on the backbone network, we establish a multiscale feature representation to predict detections on multiple layers, and different anchor scales are applied to different feature maps for different object scales. Hard example mining and data augmentation are also used to balance the training samples and improve generalization. Our proposed model has been tested on the ImageNet VID dataset, the largest video object detection dataset. Compared to the baseline, our proposed network has a 1.6% bonus, and the test time does not increase. To show the effectiveness of the proposed framework, we also evaluate the model on the YTO dataset, where the proposed framework has a 4.4% bonus compared to the baseline.

Although our proposed network performs better than the baseline, it still has some limitations. The first limitation is the use of time-series information: the correlation loss is rigid with respect to the feature map. The second limitation is that we do not fully use the tracking information. The tracking

process operates only on the feature map and does not output an exact tracking result; if the network could output an exact tracking result, the testing time could be faster.

Acknowledgments: The work described in this paper is supported by the 111 Project of China under grant B14010 and the Chang Jiang Scholars Programme (grant no. T).

Author Contributions: Baojun Zhao, Boya Zhao and Linbo Tang created and designed the framework. Boya Zhao and Wenzheng Wang implemented the algorithm on Pytorch and debugged the code. Boya Zhao and Yuqi Han wrote the paper.

Conflicts of Interest: The authors declare no conflicts of interest.

References

1. LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989, 1, 541-551. [CrossRef]
2. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 1097-1105. [CrossRef]
3. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
4. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7-12 June 2015; pp. 1-9.
5. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Los Alamitos, CA, USA, 7-13 December 2015; pp. 1440-1448.
6. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137-1149. [CrossRef] [PubMed]
7. Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object detection via region-based fully convolutional networks. Adv. Neural Inf. Process. Syst. 2016.
8. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, June 2014; pp. 580-587.
9. Zhong, J.; Lei, T.; Yao, G. Robust vehicle detection in aerial images based on cascaded convolutional neural networks. Sensors 2017, 17. [CrossRef] [PubMed]
10. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single shot multibox detector. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2016; pp. 21-37.
11. Oh, S.I.; Kang, H.B. Object detection and classification by decision-level fusion for intelligent vehicle systems. Sensors 2017, 17, 207. [CrossRef] [PubMed]
12. Zhu, X.; Xiong, Y.; Dai, J.; Yuan, L.; Wei, Y. Deep feature flow for video recognition. arXiv 2016, arXiv:1611.07715.
13. Kang, K.; Li, H.; Yan, J.; Zeng, X.; Yang, B.; Xiao, T.; Zhang, C.; Wang, Z.; Wang, R.; Wang, X. T-CNN: Tubelets with convolutional neural networks for object detection from videos. IEEE Trans. Circuits Syst. Video Technol. 2017. [CrossRef]
14. Han, W.; Khorrami, P.; Paine, T.L.; Ramachandran, P.; Babaeizadeh, M.; Shi, H.; Li, J.; Yan, S.; Huang, T.S. Seq-NMS for video object detection. arXiv 2016, arXiv:1602.08465.
15. Kang, K.; Ouyang, W.; Li, H.; Wang, X. Object detection from video tubelets with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, June 2016.
16. Lee, B.; Erdenee, E.; Jin, S.; Nam, M.Y.; Jung, Y.G.; Rhee, P.K. Multi-class multi-object tracking using changing point detection. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2016.
17. Chopra, S.; Hadsell, R.; LeCun, Y. Learning a similarity metric discriminatively, with application to face verification.
In Proceedings of the IEEE CVPR Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, June 2005; pp. 539-546.
18. Weinberger, K.Q.; Saul, L.K. Distance metric learning for large margin nearest neighbor classification. J. Mach. Learn. Res. 2009, 10, 207-244.

19. Bodla, N.; Singh, B.; Chellappa, R.; Davis, L.S. Soft-NMS: Improving object detection with one line of code. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, October 2017; pp. 5561-5569.
20. Felzenszwalb, P.; McAllester, D.; Ramanan, D. A discriminatively trained, multiscale, deformable part model. In Proceedings of the IEEE CVPR Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, June 2008; pp. 1-8.
21. Uijlings, J.R.; Van De Sande, K.E.; Gevers, T.; Smeulders, A.W. Selective search for object recognition. Int. J. Comput. Vis. 2013, 104, 154-171. [CrossRef]
22. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, June 2016; pp. 779-788.
23. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211-252. [CrossRef]
24. Gkioxari, G.; Malik, J. Finding action tubes. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7-12 June 2015.
25. Peng, X.; Schmid, C. Multi-region two-stream R-CNN for action detection. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2016.
26. Hou, R.; Chen, C.; Shah, M. Tube convolutional neural network (T-CNN) for action detection in videos. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, October 2017.
27. Li, C.; Stevens, A.; Chen, C.; Pu, Y.; Gan, Z.; Carin, L. Learning weight uncertainty with stochastic gradient MCMC for shape classification. In Proceedings of Computer Vision and Pattern Recognition, Seattle, WA, USA, June 2016.
28. Luciano, L.; Hamza, A.B. Deep learning with geodesic moments for 3D shape classification. Pattern Recognit. Lett. 2018. [CrossRef]
29. Nair, V.; Hinton, G.E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, June 2010; pp. 807-814.
30. Erhan, D.; Szegedy, C.; Toshev, A.; Anguelov, D. Scalable object detection using deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, June 2014; pp. 2147-2154.
31. Hosang, J.; Benenson, R.; Schiele, B. How good are detection proposals, really? arXiv 2014, arXiv:1406.6962.
32. Shrivastava, A.; Gupta, A.; Girshick, R. Training region-based object detectors with online hard example mining. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, June 2016; pp. 761-769.
33. Huber, P.J. Robust estimation of a location parameter. Ann. Math. Stat. 1964, 35, 73-101. [CrossRef]
34. Henriques, J.F.; Caseiro, R.; Martins, P.; Batista, J. High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 583-596. [CrossRef] [PubMed]
35. Baoxian, W.; Linbo, T.; Jinglin, Y.; Baojun, Z.; Shuigen, W. Visual tracking based on extreme learning machine and sparse representation. Sensors 2015, 15.
36. Bertinetto, L.; Valmadre, J.; Henriques, J.F.; Vedaldi, A.; Torr, P.H. Fully-convolutional Siamese networks for object tracking. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2016; pp. 850-865.
37. Zhao, Z.; Han, Y.; Xu, T.; Li, X.; Song, H.; Luo, J. A reliable and real-time tracking method with color distribution. Sensors 2017, 17. [CrossRef] [PubMed]
38. Zhu, X.; Wang, Y.; Dai, J.; Yuan, L.; Wei, Y. Flow-guided feature aggregation for video object detection. arXiv 2017, arXiv:1703.10025.
39. Kang, K.; Li, H.; Xiao, T.; Ouyang, W.; Yan, J.; Liu, X.; Wang, X. Object detection in videos with tubelet proposal networks.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, July 2017.
40. Kwak, S.; Cho, M.; Laptev, I.; Ponce, J. Unsupervised Object Discovery and Tracking in Video Collections. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015.

41. Tripathi, S.; Lipton, Z.; Belongie, S.; Nguyen, T. Context Matters: Refining Object Detection in Video with Recurrent Neural Networks. In Proceedings of the British Machine Vision Conference (BMVC), York, UK, September 2016.
42. Lu, Y.; Lu, C.; Tang, C.K. Online Video Object Detection Using Association LSTM. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, October 2017.
43. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), Sardinia, Italy, May 2010.
44. Ferrari, V.; Schmid, C.; Civera, J.; Leistner, C.; Prest, A. Learning object class detectors from weakly annotated video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, June 2012.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
