(12) United States Patent (10) Patent No.: US 9,373,057 B1


Erhan et al.
(45) Date of Patent: Jun. 21, 2016

(54) TRAINING A NEURAL NETWORK TO DETECT OBJECTS IN IMAGES
(71) Applicant: Google Inc., Mountain View, CA (US)
(72) Inventors: Dumitru Erhan, Venice, CA (US); Christian Szegedy, Sunnyvale, CA (US); Dragomir Anguelov, San Francisco, CA (US)
(73) Assignee: Google Inc., Mountain View, CA (US)
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b).
(21) Appl. No.: 14/528,815
(22) Filed: Oct. 30, 2014

Related U.S. Application Data
(60) Provisional application No. 61/899,124, filed on Nov. 1, 2013.

(51) Int. Cl.: G06K 9/00; G06K 9/62; G06K 9/66
(52) U.S. Cl.: CPC ... G06K 9/6256; G06K 9/6202; G06K 9/66
(58) Field of Classification Search: USPC ... 158, 159, 278; 706/15; 706/16
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
6,549,646 B1 * 4/2003 Yeh
6,671,400 B1 * 12/2003 Ekpar
6,934,415 B2 * 8/2005 Stentiford, G06K 9/4671, 382/205
7,142,269 B2 * 11/2006 Ikeno
7,602,944 B2 * 10/2009 Campbell
7,660,437 B2 * 2/2010 Breed, G06K 9/00369

OTHER PUBLICATIONS
Alexe et al., "What is an object?" 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2010.
Carreira and Sminchisescu, "Constrained parametric min-cuts for automatic object segmentation," 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2010.
Dean et al., "Fast, Accurate Detection of 100,000 Object Classes on a Single Machine," Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.
(Continued)

Primary Examiner: Yosef Kassa
(74) Attorney, Agent, or Firm: Fish & Richardson P.C.

(57) ABSTRACT
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network to detect objects in images. One of the methods includes receiving a training image and object location data for the training image; providing the training image to a neural network and obtaining bounding box data for the training image from the neural network, wherein the bounding box data comprises data defining a plurality of candidate bounding boxes in the training image and a respective confidence score for each candidate bounding box in the training image; determining an optimal set of assignments using the object location data for the training image and the bounding box data for the training image, wherein the optimal set of assignments assigns a respective candidate bounding box to each of the object locations; and training the neural network on the training image using the optimal set of assignments.

3 Drawing Sheets

[Representative drawing: neural network training system 100 with object detection neural network 102, training images 104, training image 106, bounding box data 108, and parameter values 110.]

(56) References Cited (continued)

OTHER PUBLICATIONS
Endres and Hoiem, "Category independent object proposals," ECCV'10 Proceedings of the 11th European Conference on Computer Vision: Part V, 2010, pp. 575-588.
Everingham et al., "The PASCAL visual object classes (VOC) challenge," International Journal of Computer Vision, 88(2), Jun. 2010.
Felzenszwalb et al., "Object detection with discriminatively trained part-based models," IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), Sep. 2010.
Fischler and Elschlager, "The representation and matching of pictorial structures," IEEE Transactions on Computers, C-22(1), Jan.
Girshick et al., "Discriminatively trained deformable part models, release 5," Sep. 5, 2012 [retrieved on Nov. 3, 2014]. Retrieved from the Internet: URLs. lease5/>, 3 pages.
Gu et al., "Recognition using regions," IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, Jun. 2009.
Krizhevsky et al., "ImageNet Classification with Deep Convolutional Neural Networks," NIPS, pp. 1-9.
Lampert et al., "Beyond sliding windows: Object localization by efficient subwindow search," IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, Jun. 2008.
Song et al., "Sparselet models for efficient multiclass object detection," ECCV'12 Proceedings of the 12th European Conference on Computer Vision, vol. Part II, 2012.
Szegedy et al., "Deep neural networks for object detection," in Advances in Neural Information Processing Systems (NIPS), 2013.
van de Sande et al., "Segmentation as selective search for object recognition," IEEE International Conference on Computer Vision (ICCV), Nov. 2011.
Zhu et al., "Latent hierarchical structural learning for object detection," in 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2010.

* cited by examiner

[FIG. 1 (Sheet 1 of 3): neural network training system 100, showing object detection neural network 102, training images 104, training image 106, bounding box data 108, and parameter values 110.]

[FIG. 2 (Sheet 2 of 3): process 200. Receive training image (202); process training image using neural network (204); update parameter values for the neural network (206).]

[FIG. 3 (Sheet 3 of 3): process 300. Obtain bounding box data and object location data (302); identify bounding boxes that correspond to object locations (304); update parameter values of neural network (306).]

TRAINING A NEURAL NETWORK TO DETECT OBJECTS IN IMAGES

CROSS-REFERENCE TO RELATED APPLICATION

This application claims priority to U.S. Provisional Application No. 61/899,124, filed on Nov. 1, 2013. The disclosure of the prior application is considered part of and is incorporated by reference in the disclosure of this application.

BACKGROUND

This specification relates to detecting objects in images.

Deep neural networks are machine learning systems that employ multiple layers of models, where the outputs of lower level layers are used to construct the outputs of higher level layers.

SUMMARY

In general, one innovative aspect of the subject matter described in this specification can be embodied in methods that include the actions of receiving a training image and object location data for the training image, wherein the object location data identifies one or more object locations in the training image; providing the training image to a neural network and obtaining bounding box data for the training image from the neural network, wherein the bounding box data comprises data defining a plurality of candidate bounding boxes in the training image and a respective confidence score for each candidate bounding box in the training image; determining an optimal set of assignments using the object location data for the training image and the bounding box data for the training image, wherein the optimal set of assignments assigns a respective candidate bounding box to each of the object locations; and training the neural network on the training image using the optimal set of assignments.

Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.

A system of one or more computers can be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination of them installed on the system that in operation causes or cause the system to perform the actions.
One or more computer programs can be configured to perform particular operations or actions by virtue of including instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions.

The foregoing and other embodiments can each optionally include one or more of the following features, alone or in combination. Determining the optimal set of assignments can include performing a bipartite matching between the object locations and the candidate bounding boxes to select the optimal set of assignments. Performing the bipartite matching can include: selecting as the optimal set of assignments a set of assignments that minimizes a loss function that includes a localization loss term and a confidence loss term. The location loss term for a particular set of assignments can be based on, for each of the object locations, a distance in the training image between the object location and a candidate bounding box assigned to the object location by the particular set of assignments. The location loss term $F_{loc}$ for the particular set of assignments $x$ can satisfy:

$$F_{loc}(x, l) = \frac{1}{2} \sum_{i,j} x_{ij} \lVert l_i - g_j \rVert_2^2,$$

wherein $i$ ranges from 1 to a total number of candidate bounding boxes, $j$ ranges from 1 to a total number of object locations, $l_i$ is an $i$-th candidate bounding box, $g_j$ is a $j$-th object location, $x_{ij}$ equals one if $l_i$ is assigned to $g_j$ in the particular set of assignments $x$ and zero if $l_i$ is not assigned to $g_j$ in the particular set of assignments $x$, and $\lVert l_i - g_j \rVert_2^2$ is an L2 distance between normalized coordinates of $l_i$ and normalized coordinates of $g_j$.

The confidence loss term for a particular set of assignments can be based on, for each candidate bounding box that is assigned to any of the object locations by the particular set of assignments, how close the confidence score for the candidate bounding box is to a first target confidence score for candidate bounding boxes that are assigned to object locations.
The confidence loss term for the particular set of assignments can be further based on, for each candidate bounding box that is not assigned to any of the object locations by the particular set of assignments, how close the confidence score for the candidate bounding box is to a second target confidence score for candidate bounding boxes that are not assigned to object locations, wherein the second target confidence score is lower than the first target confidence score. The confidence loss $F_{conf}$ for the particular set of assignments $x$ can satisfy:

$$F_{conf}(x, c) = -\sum_{i,j} x_{ij} \log(c_i) - \sum_{i} \Big(1 - \sum_{j} x_{ij}\Big) \log(1 - c_i),$$

where $i$ ranges from 1 to a total number of candidate bounding boxes, $j$ ranges from 1 to a total number of object locations, $c_i$ is a confidence score for an $i$-th candidate bounding box, and $x_{ij}$ equals one if $l_i$ is assigned to a $j$-th object location by the particular set of assignments $x$ and zero if $l_i$ is not assigned to the $j$-th object location by the particular set of assignments $x$.

The neural network can be a deep convolutional neural network. The neural network can be a deep neural network that comprises an output layer and one or more hidden layers, and training the neural network can include: training the output layer by minimizing a loss function given the optimal set of assignments; and training the hidden layers through backpropagation.

Particular embodiments of the subject matter described in this specification can be implemented so as to realize one or more of the following advantages. A neural network can be trained to effectively predict multiple bounding boxes in an input image, with the confidence score assigned to each bounding box by the neural network accurately reflecting the likelihood that the bounding box contains an image of an object. Additionally, the neural network can be trained to predict the bounding boxes and generate accurate confidence scores while being agnostic to the object category that the objects contained in the bounding boxes belong to.
The details of one or more embodiments of the subject matter of this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an example neural network training system.

FIG. 2 is a flow diagram of an example process for training a neural network on a training image.

FIG. 3 is a flow diagram of an example process for updating the values of the parameters of a neural network using bounding box data and object location data.

Like reference numbers and designations in the various drawings indicate like elements.

DETAILED DESCRIPTION

This specification generally describes a system that can train a neural network that is configured to receive an input image and generate data defining a predetermined number of candidate bounding boxes within the input image and, for each candidate bounding box, a confidence score that represents the likelihood that the bounding box contains an image of an object.

FIG. 1 shows an example neural network training system 100 that is configured to train an object detection neural network 102. The neural network training system 100 is an example of a system implemented as computer programs on one or more computers in one or more locations, in which the systems, components, and techniques described below can be implemented.

The object detection neural network 102 is a neural network that is configured to receive an input image and output bounding box data that defines a predetermined number of candidate bounding boxes within the input image. Generally, the predetermined number will be an integer greater than one, e.g., ten, fifty, or one hundred, so that multiple candidate bounding boxes are defined by the output of the object detection neural network 102 for each input image received by the neural network. Each candidate bounding box covers a portion of the input image at a respective position in the input image.
The object detection neural network 102 also outputs, as part of the bounding box data and for each candidate bounding box, a respective confidence score that represents the likelihood that the candidate bounding box contains an image of an object. In particular, the object detection neural network 102 generates the output data for a given input image in accordance with current values of a set of parameters of the neural network, e.g., the current values for each of the parameters stored in a parameter values repository 110.

Generally, the object detection neural network 102 is a deep neural network that includes an output layer and one or more hidden layers. For example, the object detection neural network 102 may be a deep convolutional neural network that includes one or more convolutional layers, one or more fully-connected layers, and an output layer, with each convolutional and fully-connected layer applying a transformation to inputs received from the preceding layer in the network in accordance with current values of a respective set of parameters for the layer. Optionally, the deep convolutional neural network can also include other types of neural network layers, e.g., max pooling and regularization layers. The layers that make up an example deep convolutional neural network are described in more detail in "ImageNet classification with deep convolutional neural networks," Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, NIPS.

The output layer of the object detection neural network 102 receives an input from the preceding layer and applies one or more transformations to the received input to generate the data defining the candidate bounding boxes and the corresponding confidence scores. In some implementations, the output layer applies a linear transformation to the received input to generate, for each of the predetermined number of candidate bounding boxes, data identifying the coordinates of the vertices of the candidate bounding box within the input image.
For example, the output layer can generate, for each candidate bounding box, values that identify the normalized coordinates of the upper-left vertex of the bounding box and values that identify the normalized coordinates of the lower-right vertex of the bounding box. In these implementations, the output layer also applies a linear transformation and then a non-linear transformation to the received input to generate, for each of the candidate bounding boxes, a value that represents the confidence score for the bounding box.

The neural network training system 100 trains the object detection neural network 102 on a set of training images 104 in order to determine trained values of the parameters of the object detection neural network 102. That is, the neural network training system 100 trains the neural network in order to update the values of the parameters in the parameter repository 110 from initial values to trained values. Each training image in the set of training images 104 is associated with object location data that identifies the locations of one or more objects in the training image, i.e., data defining one or more object location bounding boxes within the training image that each include an image of a respective object.

Generally, in order to train the neural network on a given training image, e.g., a training image 106, the neural network training system 100 provides the training image to the object detection neural network 102 and obtains from the object detection neural network 102 bounding box data, e.g., bounding box data 108 for the training image 106. The bounding box data includes data that defines the predetermined number of candidate bounding boxes within the training image and the confidence score generated by the object detection neural network 102 for each candidate bounding box. The neural network training system 100 updates the current values of the parameters of the object detection neural network 102 using the bounding box data and the object location data associated with the training image that identifies the locations of the objects in the training image.
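The bounding box data described above (a fixed number of candidate boxes, each with normalized corner coordinates and a confidence score) can be made concrete with a small sketch. The following Python is illustrative only and is not taken from the patent: the flat layout of the output vector, the names `sigmoid` and `decode_output`, and the choice of a logistic sigmoid as the non-linear transformation are all assumptions made for the example.

```python
import math

def sigmoid(z):
    """Logistic function; an assumed choice of non-linear transformation."""
    return 1.0 / (1.0 + math.exp(-z))

def decode_output(raw, num_boxes):
    """Split a flat output vector into candidate boxes and confidence scores.

    Hypothetical layout: the first 4 * num_boxes values are the linear
    outputs (x_min, y_min, x_max, y_max) in normalized coordinates; the
    last num_boxes values are linear outputs that still need the
    non-linear transformation to become confidence scores.
    """
    if len(raw) != 5 * num_boxes:
        raise ValueError("expected 5 values per candidate box")
    boxes = [tuple(raw[4 * i: 4 * i + 4]) for i in range(num_boxes)]
    logits = raw[4 * num_boxes:]
    confidences = [sigmoid(z) for z in logits]
    return boxes, confidences

# Two candidate boxes followed by their two confidence logits.
raw_output = [0.1, 0.2, 0.5, 0.6, 0.3, 0.3, 0.9, 0.8, 0.0, 2.0]
boxes, confidences = decode_output(raw_output, num_boxes=2)
```

Keeping the confidence scores strictly between zero and one, as the sigmoid does, is what allows them to be used inside logarithms in a confidence loss.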
Training the object detection neural network is described in more detail below with reference to FIGS. 2 and 3.

FIG. 2 is a flow diagram of an example process 200 for training a neural network on a training image. For convenience, the process 200 will be described as being performed by a system of one or more computers located in one or more locations. For example, a neural network training system, e.g., the neural network training system 100 of FIG. 1, appropriately programmed, can perform the process 200.

The system can perform the process 200 for each training image in a set of training images as part of a neural network training technique in order to train a neural network, e.g., the object detection neural network 102 of FIG. 1. That is, the system can perform the process 200 for each of the training images in the set of training data in order to determine trained values of the parameters of the neural network.

The system receives a training image (step 202). The training image is associated with object location data that defines one or more object bounding boxes within the training image, with each object bounding box containing an image of a respective object.

The system processes the training image using the neural network (step 204). That is, the system provides the training image to the neural network and obtains from the neural network bounding box data for the training image, i.e., data identifying a predetermined number of candidate bounding boxes within the training image and a confidence score for each candidate bounding box. The confidence score for a given candidate bounding box represents the likelihood that the bounding box contains an image of an object. The neural network generates the bounding box data for the training image in accordance with current values of the parameters of the neural network.

The system updates the values of the parameters of the neural network using the bounding box data and the object location data associated with the training image (step 206). Updating the parameter values of the neural network is described in more detail below with reference to FIG. 3.

FIG. 3 is a flow diagram of an example process 300 for updating the values of the parameters of a neural network using bounding box data and object location data. For convenience, the process 300 will be described as being performed by a system of one or more computers located in one or more locations. For example, a neural network training system, e.g., the neural network training system 100 of FIG. 1, appropriately programmed, can perform the process 300.

The system obtains bounding box data and object location data for a training image (step 302).

The system identifies candidate bounding boxes identified by the neural network that correspond to object locations identified in the object location data for the training image (step 304). That is, the system determines, for each object location, a respective candidate bounding box that corresponds to the object location. In particular, the system performs a bipartite matching to generate an optimal set of assignments that assigns a respective candidate bounding box to each object location associated with the training image. As part of the bipartite matching, the system selects as the optimal set of assignments the set that minimizes a loss function that includes a localization loss term and a confidence loss term.
For example, the system may select the set of assignments $x^*$ that satisfies:

$$x^* = \arg\min_{x} F(x, l, c),$$

where $F(x, l, c)$ is the loss function, $x$ is a set of assignments, $l$ is a candidate bounding box, and $c$ is the confidence score for the candidate bounding box, and where the minimization is subject to the constraint that each set of assignments $x$ must assign exactly one candidate bounding box to each object location.

In some implementations, the loss function satisfies:

$$F(x, l, c) = \alpha F_{loc}(x, l) + F_{conf}(x, c),$$

where $\alpha$ is a constant value, $F_{loc}$ is the location loss and $F_{conf}$ is the confidence loss.

Generally, the location loss for a given set of assignments is based on, for each candidate bounding box that is assigned to an object location by the set of assignments, the distance in the training image between the candidate bounding box and the object location that the candidate bounding box is assigned to. For example, the location loss for a given set of assignments $x$ may satisfy:

$$F_{loc}(x, l) = \frac{1}{2} \sum_{i,j} x_{ij} \lVert l_i - g_j \rVert_2^2,$$

where $i$ ranges from 1 to the total number of candidate bounding boxes, $j$ ranges from 1 to the total number of object locations, $l_i$ is the $i$-th candidate bounding box, $g_j$ is the $j$-th object location, $x_{ij}$ equals one if $l_i$ is assigned to $g_j$ in the set of assignments $x$ and zero if $l_i$ is not assigned to $g_j$ in the set of assignments $x$, and $\lVert l_i - g_j \rVert_2^2$ is the L2 distance between the normalized coordinates of $l_i$ and the normalized coordinates of $g_j$.

Generally, the confidence loss for a given set of assignments is based on, for each candidate bounding box that is assigned to an object location by the given set of assignments, how close the confidence score for the candidate bounding box is to a first target confidence score for candidate bounding boxes that are assigned to object locations, e.g., a score of one.
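As a worked illustration of the location loss, consider a single candidate bounding box assigned to a single object location. The coordinate values below are hypothetical, chosen only to make the arithmetic easy to follow, and the form of the loss (one half of the squared L2 distance) is the form assumed in this description of the location loss.

```latex
% Hypothetical normalized coordinates (x_min, y_min, x_max, y_max)
l_1 = (0.1,\ 0.1,\ 0.4,\ 0.4), \qquad g_1 = (0.1,\ 0.1,\ 0.5,\ 0.5), \qquad x_{11} = 1

\lVert l_1 - g_1 \rVert_2^2 = 0^2 + 0^2 + (-0.1)^2 + (-0.1)^2 = 0.02

F_{loc}(x, l) = \tfrac{1}{2} \cdot x_{11} \cdot \lVert l_1 - g_1 \rVert_2^2 = \tfrac{1}{2} \cdot 0.02 = 0.01
```

Candidate boxes that are not assigned to any object location contribute nothing to this term, because their $x_{ij}$ values are all zero.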
The confidence loss for a given set of assignments is also based on, for each candidate bounding box that is not assigned to an object location by the given set of assignments, how close the confidence score for the candidate bounding box is to a second target confidence score for candidate bounding boxes that are not assigned to object locations, with the second target confidence score being lower than the first target confidence score, e.g., the second target score being zero if the first target confidence score is one. For example, the confidence loss for a given set of assignments $x$ may satisfy:

$$F_{conf}(x, c) = -\sum_{i,j} x_{ij} \log(c_i) - \sum_{i} \Big(1 - \sum_{j} x_{ij}\Big) \log(1 - c_i),$$

where $i$ ranges from 1 to the total number of candidate bounding boxes, $j$ ranges from 1 to the total number of object locations, $c_i$ is the confidence score for the $i$-th candidate bounding box, and $x_{ij}$ equals one if $l_i$ is assigned to a $j$-th object location by the set of assignments $x$ and zero if $l_i$ is not assigned to the $j$-th object location by the set of assignments $x$.

The system updates the values of the parameters of the neural network using the optimal set of assignments (step 306). Generally, the system updates the values of the parameters of the neural network to minimize the loss function, given that the set of assignments is the optimal set of assignments. Thus, the system updates the values of the parameters so that the distances between the candidate bounding boxes and the object locations to which the candidate bounding boxes are assigned by the optimal set of assignments are reduced, the confidence scores for candidate bounding boxes that are assigned to an object location by the optimal set of assignments are increased, and the confidence scores for candidate bounding boxes that are not assigned to an object location by the optimal set of assignments are decreased.
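The selection of an optimal set of assignments can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: it searches every one-to-one assignment exhaustively instead of running a dedicated bipartite matching algorithm, it assumes each candidate box is assigned to at most one object location, it assumes a loss of the form alpha times the location loss plus the confidence loss, and the function names, the default `alpha=1.0`, and the sample data are all hypothetical.

```python
import math
from itertools import permutations

def location_loss(box, target):
    """Half the squared L2 distance between two sets of normalized coordinates."""
    return 0.5 * sum((b - t) ** 2 for b, t in zip(box, target))

def total_loss(assignment, boxes, confidences, targets, alpha):
    """Evaluate alpha * F_loc + F_conf for one set of assignments.

    assignment[j] is the index of the candidate box assigned to object
    location j; each box index appears at most once (an assumption).
    """
    assigned = set(assignment)
    loc = sum(location_loss(boxes[i], targets[j]) for j, i in enumerate(assignment))
    # Assigned boxes are pulled towards confidence 1, all others towards 0.
    conf = -sum(math.log(confidences[i]) for i in assignment)
    conf -= sum(math.log(1.0 - c) for i, c in enumerate(confidences) if i not in assigned)
    return alpha * loc + conf

def best_assignment(boxes, confidences, targets, alpha=1.0):
    """Exhaustive search over one-to-one assignments; only viable for tiny inputs."""
    candidates = permutations(range(len(boxes)), len(targets))
    return min(candidates, key=lambda a: total_loss(a, boxes, confidences, targets, alpha))

boxes = [(0.1, 0.1, 0.4, 0.4), (0.5, 0.5, 0.9, 0.9), (0.0, 0.0, 1.0, 1.0)]
confidences = [0.8, 0.7, 0.2]
targets = [(0.5, 0.5, 1.0, 1.0), (0.1, 0.1, 0.5, 0.5)]
assignment = best_assignment(boxes, confidences, targets)
loss = total_loss(assignment, boxes, confidences, targets, alpha=1.0)
```

With these sample values, the first object location is matched to the second candidate box and the second object location to the first candidate box. Exhaustive search grows factorially with the number of boxes, so a practical system would use a polynomial-time assignment solver instead.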
In particular, the system updates the values of the parameters by performing an iteration of a backpropagation neural network training procedure, e.g., a stochastic gradient descent backpropagation training technique, to determine the updated values of the parameters of the neural network. That is, the system backpropagates the error computed for the output of the output layer through to each layer below the output layer in the neural network in order to adjust the parameters of each of the neural network layers.
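The error that is backpropagated starts from the derivatives of the loss with respect to the network outputs. The sketch below computes those derivatives for a fixed set of assignments, assuming a loss of the form alpha times half the squared coordinate distance plus the log-based confidence loss. It is an illustration only: the function name `output_gradients`, the sample data, and the representation of an assignment as a tuple of box indices are assumptions, and propagating these gradients through the hidden layers is left to whatever training framework is used.

```python
def output_gradients(assignment, boxes, confidences, targets, alpha=1.0):
    """Derivatives of alpha * F_loc + F_conf with respect to each output.

    assignment[j] is the index of the candidate box assigned to object
    location j (hypothetical representation of the set of assignments).
    """
    box_to_target = {i: j for j, i in enumerate(assignment)}
    box_grads, conf_grads = [], []
    for i, (box, c) in enumerate(zip(boxes, confidences)):
        if i in box_to_target:
            target = targets[box_to_target[i]]
            # d/dl of (alpha / 2) * ||l - g||^2 is alpha * (l - g).
            box_grads.append([alpha * (b - t) for b, t in zip(box, target)])
            # d/dc of -log(c); negative, so gradient descent raises c.
            conf_grads.append(-1.0 / c)
        else:
            # Unassigned boxes do not appear in the location loss.
            box_grads.append([0.0] * len(box))
            # d/dc of -log(1 - c); positive, so gradient descent lowers c.
            conf_grads.append(1.0 / (1.0 - c))
    return box_grads, conf_grads

boxes = [(0.1, 0.1, 0.4, 0.4), (0.0, 0.0, 1.0, 1.0)]
confidences = [0.8, 0.2]
targets = [(0.1, 0.1, 0.5, 0.5)]
box_grads, conf_grads = output_gradients((0,), boxes, confidences, targets)
```

The signs match the behavior described above: a gradient descent step moves an assigned box towards its object location and raises its confidence score, while it lowers the confidence score of every unassigned box.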

Thus, by performing the backpropagation neural network training procedure for each training image in the set of training images, the system trains the neural network to accurately determine, for an input image for which object locations are not known by the system, which of the candidate bounding boxes identified by the bounding box data generated by the neural network for the input image are likely to contain an image of an object, and for each of those candidate bounding boxes, to locate the candidate bounding box accurately in the portion of the image that contains the image of the object.

In some implementations, prior to identifying the candidate bounding boxes identified by the neural network for a given training image that correspond to object locations in the training image, the system clusters the object locations to determine a set of object location clusters that the system can use as priors for each of the candidate bounding boxes. Additionally, in some implementations, the system matches each object location to one of the priors rather than matching the candidate bounding boxes to the object locations.

Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or in addition, the program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.
The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.

The term "data processing apparatus" encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). The apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.

A computer program (which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub-programs, or portions of code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).

Computers suitable for the execution of a computer program can be based, by way of example, on general or special purpose microprocessors or both, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.

Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's client device in response to requests received from the web browser.

Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network ("LAN") and a wide area network ("WAN"), e.g., the Internet.

The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.

What is claimed is:

1. A method for training a neural network that receives an input image and outputs a predetermined number of candidate bounding boxes that each cover a respective portion of the input image at a respective position in the input image and a respective confidence score for each candidate bounding box that represents a likelihood that the candidate bounding box contains an image of an object, the method comprising:
   receiving a training image and object location data for the training image, wherein the object location data identifies one or more object locations in the training image;
   providing the training image to the neural network and obtaining bounding box data for the training image from the neural network, wherein the bounding box data comprises data defining a plurality of candidate bounding boxes in the training image and a respective confidence score for each candidate bounding box in the training image;
   determining an optimal set of assignments using the object location data for the training image and the bounding box data for the training image, wherein the optimal set of assignments assigns a respective candidate bounding box to each of the object locations; and
   training the neural network on the training image using the optimal set of assignments.

2. The method of claim 1, wherein determining the optimal set of assignments comprises performing a bipartite matching between the object locations and the candidate bounding boxes to select the optimal set of assignments.

3. The method of claim 2, wherein performing the bipartite matching comprises:
   selecting as the optimal set of assignments a set of assignments that minimizes a loss function that includes a localization loss term and a confidence loss term.

4. The method of claim 3, wherein the location loss term for a particular set of assignments is based on, for each of the object locations, a distance in the training image between the object location and a candidate bounding box assigned to the object location by the particular set of assignments.

5. The method of claim 4, wherein the location loss term $F_{loc}$ for the particular set of assignments $x$ satisfies:

$$F_{loc}(x, l) = \frac{1}{2} \sum_{i,j} x_{ij} \, \lVert l_i - g_j \rVert_2^2,$$

wherein $i$ ranges from 1 to a total number of candidate bounding boxes, $j$ ranges from 1 to a total number of object locations, $l_i$ is an $i$-th candidate bounding box, $g_j$ is a $j$-th object location, $x_{ij}$ equals one if $l_i$ is assigned to $g_j$ in the particular set of assignments $x$ and zero if $l_i$ is not assigned to $g_j$ in the particular set of assignments $x$, and $\lVert l_i - g_j \rVert_2$ is an $L_2$ distance between normalized coordinates of $l_i$ and normalized coordinates of $g_j$.

6. The method of claim 3, wherein the confidence loss term for a particular set of assignments is based on, for each candidate bounding box that is assigned to any of the object locations by the particular set of assignments, how close the confidence score for the candidate bounding box is to a first target confidence score for candidate bounding boxes that are assigned to object locations.

7. The method of claim 6, wherein the confidence loss term for the particular set of assignments is further based on, for each candidate bounding box that is not assigned to any of the object locations by the particular set of assignments, how close the confidence score for the candidate bounding box is to a second target confidence score for candidate bounding boxes that are not assigned to object locations, wherein the second target confidence score is lower than the first target confidence score.

8. The method of claim 7, wherein the confidence loss $F_{conf}$ for the particular set of assignments $x$ satisfies:

$$F_{conf}(x, c) = -\sum_{i,j} x_{ij} \log(c_i) - \sum_{i} \Bigl(1 - \sum_{j} x_{ij}\Bigr) \log(1 - c_i),$$

where $i$ ranges from 1 to a total number of candidate bounding boxes, $j$ ranges from 1 to a total number of object locations, $c_i$ is a confidence score for an $i$-th candidate bounding box, and $x_{ij}$ equals one if $l_i$ is assigned to a $j$-th object location by the particular set of assignments $x$ and zero if $l_i$ is not assigned to the $j$-th object location by the particular set of assignments $x$.

9. The method of claim 1, wherein the neural network is a deep convolutional neural network.

10. The method of claim 1, wherein the neural network is a deep neural network that comprises an output layer and one or more hidden layers, and wherein training the neural network comprises:
   training the output layer by minimizing a loss function given the optimal set of assignments; and
   training the hidden layers through backpropagation.

11. A system for training a neural network that receives an input image and outputs a predetermined number of candidate bounding boxes that each cover a respective portion of the input image at a respective position in the input image and a respective confidence score for each candidate bounding box that represents a likelihood that the candidate bounding box contains an image of an object, the system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising:
   receiving a training image and object location data for the training image, wherein the object location data identifies one or more object locations in the training image;
   providing the training image to the neural network and obtaining bounding box data for the training image from the neural network, wherein the bounding box data comprises data defining a plurality of candidate bounding boxes in the training image and a respective confidence score for each candidate bounding box in the training image;
   determining an optimal set of assignments using the object location data for the training image and the bounding box data for the training image, wherein the optimal set of assignments assigns a respective candidate bounding box to each of the object locations; and
   training the neural network on the training image using the optimal set of assignments.

12. The system of claim 11, wherein determining the optimal set of assignments comprises performing a bipartite matching between the object locations and the candidate bounding boxes to select the optimal set of assignments.

13. The system of claim 12, wherein performing the bipartite matching comprises:
   selecting as the optimal set of assignments a set of assignments that minimizes a loss function that includes a localization loss term and a confidence loss term.

14. The system of claim 13, wherein the location loss term for a particular set of assignments is based on, for each of the object locations, a distance in the training image between the object location and a candidate bounding box assigned to the object location by the particular set of assignments.

15. The system of claim 14, wherein the location loss term $F_{loc}$ for the particular set of assignments $x$ satisfies:

$$F_{loc}(x, l) = \frac{1}{2} \sum_{i,j} x_{ij} \, \lVert l_i - g_j \rVert_2^2,$$

wherein $i$ ranges from 1 to a total number of candidate bounding boxes, $j$ ranges from 1 to a total number of object locations, $l_i$ is an $i$-th candidate bounding box, $g_j$ is a $j$-th object location, $x_{ij}$ equals one if $l_i$ is assigned to $g_j$ in the particular set of assignments $x$ and zero if $l_i$ is not assigned to $g_j$ in the particular set of assignments $x$, and $\lVert l_i - g_j \rVert_2$ is an $L_2$ distance between normalized coordinates of $l_i$ and normalized coordinates of $g_j$.

16. The system of claim 13, wherein the confidence loss term for a particular set of assignments is based on, for each candidate bounding box that is assigned to any of the object locations by the particular set of assignments, how close the confidence score for the candidate bounding box is to a first target confidence score for candidate bounding boxes that are assigned to object locations.

17. The system of claim 16, wherein the confidence loss term for the particular set of assignments is further based on, for each candidate bounding box that is not assigned to any of the object locations by the particular set of assignments, how close the confidence score for the candidate bounding box is to a second target confidence score for candidate bounding boxes that are not assigned to object locations, wherein the second target confidence score is lower than the first target confidence score.

18. The system of claim 17, wherein the confidence loss $F_{conf}$ for the particular set of assignments $x$ satisfies:

$$F_{conf}(x, c) = -\sum_{i,j} x_{ij} \log(c_i) - \sum_{i} \Bigl(1 - \sum_{j} x_{ij}\Bigr) \log(1 - c_i),$$

where $i$ ranges from 1 to a total number of candidate bounding boxes, $j$ ranges from 1 to a total number of object locations, $c_i$ is a confidence score for an $i$-th candidate bounding box, and $x_{ij}$ equals one if $l_i$ is assigned to a $j$-th object location by the particular set of assignments $x$ and zero if $l_i$ is not assigned to the $j$-th object location by the particular set of assignments $x$.

19. The system of claim 11, wherein the neural network is a deep neural network that comprises an output layer and one or more hidden layers, and wherein training the neural network comprises:
   training the output layer by minimizing a loss function given the optimal set of assignments; and
   training the hidden layers through backpropagation.

20. A computer storage medium encoded with a computer program, the computer program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations for training a neural network that receives an input image and outputs a predetermined number of candidate bounding boxes that each cover a respective portion of the input image at a respective position in the input image and a respective confidence score for each candidate bounding box that represents a likelihood that the candidate bounding box contains an image of an object, the operations comprising:
   receiving a training image and object location data for the training image, wherein the object location data identifies one or more object locations in the training image;
   providing the training image to the neural network and obtaining bounding box data for the training image from the neural network, wherein the bounding box data comprises data defining a plurality of candidate bounding boxes in the training image and a respective confidence score for each candidate bounding box in the training image;
   determining an optimal set of assignments using the object location data for the training image and the bounding box data for the training image, wherein the optimal set of assignments assigns a respective candidate bounding box to each of the object locations; and
   training the neural network on the training image using the optimal set of assignments.

* * * * *
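
For illustration only, the matching and loss terms recited in the claims can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation: boxes and object locations are taken to be 4-tuples of normalized coordinates, each candidate box is assumed to be matched to at most one object location, the two terms are assumed to combine as confidence loss plus `alpha` times localization loss (the weight `alpha` is hypothetical), and an exhaustive search stands in for a bipartite matching algorithm, which is practical only for very small inputs. All function names are hypothetical.

```python
import itertools
import math


def loc_loss_pair(box, gt):
    """Half the squared L2 distance between two coordinate tuples."""
    return 0.5 * sum((b - g) ** 2 for b, g in zip(box, gt))


def total_loss(assignment, boxes, confs, gts, alpha=1.0):
    """Loss for one set of assignments.

    assignment[j] is the index i of the candidate box assigned to object j.
    Confidence scores must lie strictly between 0 and 1 so the logs are finite.
    The weighting by alpha is an assumption made for this sketch.
    """
    matched = set(assignment)
    # Localization term: sum over assigned (box, object) pairs.
    f_loc = sum(loc_loss_pair(boxes[i], gts[j]) for j, i in enumerate(assignment))
    # Confidence term: matched boxes are pushed toward 1, unmatched toward 0.
    f_conf = -sum(math.log(confs[i]) for i in matched)
    f_conf -= sum(
        math.log(1.0 - confs[i]) for i in range(len(boxes)) if i not in matched
    )
    return f_conf + alpha * f_loc


def best_assignment(boxes, confs, gts, alpha=1.0):
    """Exhaustively search one-to-one assignments and return the cheapest."""
    candidates = itertools.permutations(range(len(boxes)), len(gts))
    return min(candidates, key=lambda a: total_loss(a, boxes, confs, gts, alpha))


# Toy data: three candidate boxes, two annotated object locations.
boxes = [(0.1, 0.1, 0.4, 0.4), (0.5, 0.5, 0.9, 0.9), (0.0, 0.0, 1.0, 1.0)]
confs = [0.9, 0.8, 0.1]
gts = [(0.5, 0.5, 0.9, 0.9), (0.1, 0.1, 0.4, 0.4)]

best = best_assignment(boxes, confs, gts)
print(best, total_loss(best, boxes, confs, gts))
```

In this toy input each object location coincides exactly with one high-confidence candidate box, so the pairing that matches them has zero localization loss. For realistic numbers of candidate boxes, a polynomial-time assignment solver would replace the exhaustive search.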
