Smart Phone-based Indoor Guidance System for the Visually Impaired


Brigham Young University
BYU ScholarsArchive
All Theses and Dissertations
2012-03-13

Smart Phone-based Indoor Guidance System for the Visually Impaired
Brandon Lee Taylor
Brigham Young University - Provo

Part of the Electrical and Computer Engineering Commons

BYU ScholarsArchive Citation
Taylor, Brandon Lee, "Smart Phone-based Indoor Guidance System for the Visually Impaired" (2012). All Theses and Dissertations.

This Thesis is brought to you for free and open access by BYU ScholarsArchive. It has been accepted for inclusion in All Theses and Dissertations by an authorized administrator of BYU ScholarsArchive. For more information, please contact scholarsarchive@byu.edu.

Smart Phone-based Indoor Guidance System for the Visually Impaired

Brandon Taylor

A thesis submitted to the faculty of Brigham Young University in partial fulfillment of the requirements for the degree of Master of Science

D. J. Lee, Chair
Doran K. Wilde
James K. Archibald

Department of Electrical Engineering
Brigham Young University
April 2012

Copyright 2012 Brandon Taylor
All Rights Reserved

ABSTRACT

Smart Phone-based Guidance System for the Visually Impaired

Brandon Taylor
Department of Electrical Engineering, BYU
Master of Science

A smart phone camera-based indoor guidance system to aid the visually impaired is presented. Most proposed systems for aiding the visually impaired with indoor navigation are not feasible for widespread use due to cost, usability, or portability. We use a smart phone vision-based system to create an indoor guidance system that is simple, accessible, inexpensive, and discreet, to aid the visually impaired in navigating unfamiliar environments such as public buildings. The system consists of a smart phone and a server. The smart phone transmits pictures of the user's location to the server. The server processes the images and matches them to a database of stored images of the building. After features are matched, the location and orientation of the person are calculated using 3D location correspondence data stored for the features of each image. Positional information is then transmitted back to the smart phone and communicated to the user via text-to-speech. This thesis focuses on developing the vision technology for this unique application rather than building the complete system. Experimental results demonstrate the ability of the system to quickly and accurately determine the pose of the user in a university building.

Keywords: pose, visually impaired, fundamental matrix, direct linear transform, SURF

ACKNOWLEDGMENTS

I would like to express my appreciation and gratitude to my advisor, Dr. Dah Jye Lee, for giving me the opportunity to work with him and for lending all of his help, guidance, and support. I would also like to thank my committee members, Dr. Doran Wilde and Dr. James Archibald. Furthermore, my family has always been there for me, and I am forever grateful for their love and encouragement. Most of all, I thank my wife for her love, support, and sacrifice in helping me achieve my goals.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
Introduction
Motivation
Recent Developments for the Visually Impaired
Obstacle Avoidance
Navigation Assistance
System Requirements
System
System Overview
Stored Image Map with 3D Correlation
Image Map Global Data
Image Data
Collecting the Image Data
Floor Plans and Supplemental Location Data
Hand-held Device
Server
Vision Processing
Background
Algorithm Overview
Feature Detection, Description and Matching
Common Methods
Speeded Up Robust Features
SURF Feature Descriptors
Scale Space
Matching Images
Removing Outliers
Epipolar Geometry
RANSAC
Point Algorithm with RANSAC
Pose Calculation
Testing and Results
Test Objectives
Location Information and Image Map
Testing Hardware
Test Procedure
Test Results
Image Frequency
Algorithm Robustness
Pose Estimation Accuracy
Insufficient Features
Conclusion
Summary
Contributions
Future Work
REFERENCES

LIST OF TABLES

Table 4-1: Estimated vs. Measured Positions

LIST OF FIGURES

Figure 2-1: System Overview
Figure 2-2: Local Coordinate System
Figure 2-3: Sample Map Image Data
Figure 2-4: 2D and 3D Point Correspondence for Pose
Figure 3-1: Vision and Pose Estimation Algorithm Overview
Figure 3-2: Integral Image Creation and Use
Figure 3-3: Haar Wavelets
Figure 3-4: SURF Feature Descriptor Grid
Figure 3-5: Matching Distant Image
Figure 3-6: Epipolar Geometry
Figure 3-7: Epipolar Lines
Figure 3-8: Epipolar Line RANSAC Outlier
Figure 3-9: Initial Matched Feature Points
Figure 3-10: Matched Feature Points after RANSAC with Homography
Figure 3-11: False Inlier after Fundamental Matrix RANSAC
Figure 3-12: Point Correspondence Relation
Figure 3-13: Reprojection Outlier
Figure 3-14: Final Pose Estimate
Figure 3-15: Floor Plan with Pose
Figure 4-1: Clyde Building 4th Floor with Local Coordinate System
Figure 4-2: Map Progression Images
Figure 4-3: Insufficient Unique Features

1 INTRODUCTION

Visually impaired people need assistance to find their way in unfamiliar places. The systems currently in place for helping the visually impaired are not sufficient for navigating unknown environments. There are two important requirements for navigation. The first is localization, or figuring out the person's current location. The second is path finding, or determining the route a person needs to take to get from one place to another. Neither of these needs is sufficiently met by Braille markings in buildings. More recently, some indoor guidance systems have been developed with the visually impaired in mind. However, several problems keep these systems from being a feasible commercial solution. In this thesis we present a unique 3D vision algorithm, suitable for building an indoor guidance system for the visually impaired, that overcomes these obstacles.

In order to create an indoor guidance system that is simple, accessible, inexpensive, and discreet, we use a smart phone vision-based system to help the visually impaired as they navigate unfamiliar environments such as public buildings. The system consists of a smart phone and a server. The smart phone transmits pictures of the user's location to the server. The server processes the images and matches them to a database of stored images of the building. The camera pose is calculated from the matched image, and the current location and navigation instructions are given to the user.

1.1 Motivation

Navigating unknown environments is a necessity in life that has always proven difficult for individuals with vision impairments. Two primary problems impede a person's navigation ability. The first problem is obstacle avoidance. Obstacle avoidance deals with the objects and terrain in the immediate area around the individual, such as people, stairs, tables, and walls. Tools such as guide canes and Seeing Eye dogs have been developed and are actively used among the visually impaired to aid in obstacle avoidance. The second problem is wayfinding, or route navigation. Trying to get from point A to point B proves difficult without visual cues such as building layouts or terrain features. No widespread system has been implemented that aids in route navigation. Studies show that the visually impaired feel they have an especially difficult time learning new routes [1]. Being unable to handle these types of situations on their own also negatively affects their sense of independence.

Visually impaired people have few options when it comes to finding their way in an unfamiliar place. Traditionally, the visually impaired must either be guided by or get directions from another person. Having someone to guide them is useful but not always an option, and it leaves the impaired person feeling quite dependent on other people. Asking for directions is common but has several challenges. Some individuals find it hard to translate directions into effective information for the visually impaired person. The difficulty arises because most people without visual impairments navigate differently, typically relying on visual landmarks. In addition, the visually impaired person must remember the directions, since writing them down is not feasible. If the visually impaired person becomes lost, the only option is to find someone to help.

The only widespread solution for helping the visually impaired is Braille signs. A major problem with this system is that it is not adequate for navigation. Most public buildings include Braille signs on doors, elevators, and various other locations. Although Braille signs provide some benefit, they only help a person learn where he is; they do not offer directions. Finding the Braille signs is also an issue, as one must feel around to locate the sign. Lastly, the largest problem with Braille is that most visually impaired people do not know Braille [2], [3]. In the US, only about 10 percent of legally blind people know Braille.

GPS has proven to be an indispensable outdoor navigational aid for everyone, including the visually impaired. Programs have been developed for smart phones that use text-to-speech and voice commands to allow the visually impaired to take advantage of GPS guidance systems. However, GPS is inadequate for indoor navigation because the signal encounters too much interference to be usable.

Currently, there is no feasible solution to indoor navigation for the visually impaired. The traditional methods just discussed (Braille, GPS, spoken directions) are clearly not an optimal means of finding one's way in an unfamiliar indoor location. More recently, several systems have been tested for indoor navigation. Studies have shown that these technological aids improve the ability of the visually impaired to find their way [4]. Unfortunately, many of these systems have complicated setups that cause the user to stand out or are awkward to use [4]. Other systems require cumbersome sensor rigs that hamper normal movement and are not suitable for people to carry around and use on a daily basis [4], [5]. Moreover, studies show that visually impaired people would prefer not to attract unnecessary attention to themselves while using a navigation system [1]. Another challenge for indoor navigation tools is cost. The average user cannot afford a system that may cost over $1,000. Additionally, the financial burden placed on a building's owner to change infrastructure to accommodate a system is a factor that must be considered.

A feasible commercial solution requires a system to be compact, portable, lightweight, discreet, and easy to use. The system must also be cost-effective for both the user and the provider. Lastly, the system should offer hands-free operation so that users can hold personal belongings such as a guide cane or the harness of a Seeing Eye dog, as these are essential for obstacle avoidance.

In this thesis we propose a novel solution for indoor route navigation that overcomes the limitations of other systems. Our proposed solution, dubbed the Seeing Eye Phone, uses a smart phone worn around the user's neck that takes pictures of the user's location at regular intervals. The pictures are then sent to a server for image processing. The captured input image is compared against an image database of the building being navigated. The position and orientation of the user are estimated from the images, and details are then sent to the user about his current location as well as instructions to reach the desired location. This database is constructed offline from periodically spaced images along with their stored feature points, 3D locations, and other pertinent map information, such as the room numbers one would look for on an indoor map.

1.2 Recent Developments for the Visually Impaired

Because a large portion of the population suffers from some form of visual impairment, a significant amount of work has been done recently to find methods to aid the visually impaired. Most of the research focuses on the areas of reading, obstacle avoidance, and navigation or wayfinding. With ever-expanding technologies, new solutions are constantly being discovered to better aid the visually impaired.

1.2.1 Obstacle Avoidance

Recently, several methods have focused on obstacle avoidance for the visually impaired. Sonar has been used to provide guidance assistance in detecting obstacles and for navigation [6], [7], [8]. Hesch used a foot-mounted pedometer and a white-cane-mounted sensing package consisting of a three-axis gyroscope and a two-dimensional laser scanner [9]. Wong used stereo vision to aid in obstacle detection and communicated the object distance and details through structured sound as well as verbal sounds [10]. The NavBelt is a belt worn by the user with an array of ultrasonic sensors that assists in obstacle avoidance by sending sounds to headphones. The guide cane uses a similar sensor array but is attached to a cane with a motorized wheel. This wheel can then steer to avoid objects, much like a guide dog would [5]. The tongue vision system takes things to another level of detail. It consists of a head-mounted camera attached to an electronic stimulator that is placed in the mouth. Multiple electrodes stimulate the tongue to represent what the camera sees [4].

1.2.2 Navigation Assistance

Many different technologies have been used for navigation and localization. With GPS devices quite prevalent, several systems have been implemented for outdoor guidance of the visually impaired [11], [12], [13]. Moulton created an outdoor talking navigation system with speech recognition using GPS [13]. GPS has proven to be an accurate and reliable method for outdoor navigation but fails when used indoors.

Only more recently have researchers begun developing better indoor navigation systems. Kalia designed the Building Navigator system, which provides information about the spatial layout of rooms, hallways, and other features. It uses synthetic speech output to communicate with the user and demonstrated improvement in the user's ability to find destinations [14]. Several researchers have used Radio Frequency Identification (RFID) systems as an indoor navigational method [15], [16], [17], [18]. Na expanded on RFID by using a smart floor in which every tile has a unique RFID tag to provide more accurate and frequent localization [19]. Tjan created a digital sign system of patterned retroreflective tags that could be read by a handheld camera and vision system [20]. Coughlan used a mobile phone camera system that reads barcodes and conveys the barcode message with speech [21]. Several styles of RF-based localization have also been used [22], [23], [24]. Biswas used the signal strength of known Wi-Fi locations for indoor robot localization and navigation [23]. Inoue et al. tested a solution similar to the Wi-Fi system in which several license-free band radios were installed at locations around the building [24]. These radios communicate with a handheld receiver, which determines location from the signal strength of the transmitters. The handheld receiver in turn communicates with a smart phone via Bluetooth. One system requires a robot to accompany the user [25].

While all of these methods have proved useful for assisting the visually impaired, they have some notable drawbacks. Sonar provides a cheap method for locating obstacles but gives only rough measurements and requires dedicated hardware. It also does not help with navigation. Hesch's pedometer and cane work well for obstacle detection but do nothing for navigation. Although the tongue vision system gives more spatial detail than the others, it is extremely awkward to use, and the headgear plugging into the mouth is very unappealing. Even with the high amount of detail it provides, this type of system is not preferred because of how much unwanted attention it attracts to the user. GPS is reliable for outdoor navigation but is less accurate with smaller handheld devices such as smart phones, and it is unusable indoors because the signal encounters too much interference.

RFID has proven to work quite well in indoor navigation and localization. One disadvantage of RFID is that it is a discrete system that can only provide periodic information when the user is close enough to an installed RFID tag. Increasing map density and accuracy means increasing the number of RFID tags installed. The number of RFID tags used by Na, with a separate tag for every floor panel, effectively makes the navigation continuous but greatly increases the infrastructure cost, as outfitting a building would require thousands to tens of thousands of tags. For new buildings this would not be as much of an issue, but installing it in existing buildings poses a real problem because of the renovation required. RFID also requires a dedicated reader that is not readily available to the user.

Other sign systems can provide extra information about the location but also have the disadvantage of not providing continuous navigational information. Sign-based systems are also more noticeable, as they take up wall space and must be large enough for a camera to see. The more signs used, the more wall space is occupied, and the signs are not particularly attractive and can clutter up a building. Coughlan's system takes advantage of a smart phone camera, but again uses discrete tags that contain unique information for a given location, so every sign must be specially created for the location where it is installed.

The radio beacon and Wi-Fi systems have inaccuracies because the signals are not constant. Buildings must be calibrated, since signal strength changes with walls and other objects in the way. Even after calibration, moving people, temperature, and other factors can drastically change the signal strength, leading to inaccurate positioning. The radio beacon system has drawbacks similar to the Wi-Fi systems but also requires a separate radio to be carried.

Using robots for guidance allows more sensors to be used for increased accuracy, but it is not feasible in environments with other people moving around, and it would draw a lot of unwanted attention to the user. Robotic systems would also be quite expensive given the number of sensors used and the battery required to run them. The user would also need to keep track of the robot.

1.3 System Requirements

There is a need for an indoor guidance system for the visually impaired that fulfills all of the basic requirements discussed earlier: low cost, accurate pose estimation, simplicity, discreetness, and ease of use. Several systems have been developed for the visually impaired. They function well and help the visually impaired navigate; however, each of these systems lacks one or more of these criteria. The basic requirements are discussed in further detail below. To create a system that is usable, desirable, and feasible for commercial use, the following criteria are important:

Inexpensive for the user: Cost is always a big concern for the average user. If the system costs too much for most people to afford, it is hard to justify making it a widespread solution. One technology specialist for the visually impaired suggested that any system costing more than $1,000 could succeed only if its benefits are enormous and obvious [4].

Inexpensive infrastructure setup: Another main drawback to a commercial solution is the upfront cost of building the infrastructure. The ability to install the system in existing buildings at a reasonable expense and with minimal renovation is another key problem that must be overcome.

Accurate indoor pose estimation: Outdoor systems usually only need to be accurate to within a few meters to tens of meters to be usable. For an indoor system, accuracy is very important. An error of only several meters can mean the difference between being at the restroom and being in a completely different hallway.

Portable: The user's device must be easy to carry around and easy to operate.

Not cumbersome: Ease of use is crucial to a good system. If the user thinks the system is too much of a hassle, he may prefer not to use it. The system should also leave the person free use of his hands and hearing.

Discreet: Although some users would still use a system that draws unwanted attention to them, a discreet system is still preferred, and for some it is a necessary requirement.

2 SYSTEM

2.1 System Overview

The Seeing Eye Phone system consists primarily of a server and a smart phone. The smart phone receives input from the camera and from the user. The phone takes pictures at regular intervals and sends them to the server along with a time stamp and its most recently known position. The user can also speak a command, such as asking for his current location or how to get to his desired destination. The commands are interpreted by voice recognition software and passed on to the server. The server matches the input images to the map images in the database. Once a match is found, the server calculates the camera pose in the 3D real-world coordinate system and then uses the floor plan to find a route to the desired destination. It sends the location and directions back to the phone, which uses a text-to-speech function to direct the user to his desired destination. Figure 2-1 shows an overview of the system.

The map uses a local coordinate system in which sections of the coordinate system are given labels of desirable locations or useful navigation information. These include hallways, doors, restrooms, room numbers, floor levels, and other information one may want to find or need in order to navigate.
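The labeled sections of the local coordinate system can be pictured as a lookup from a position to a place name, which is what a command such as "Where am I?" needs. The sketch below is a minimal illustration, assuming axis-aligned rectangular regions measured in meters; the `LabeledRegion` class, the `describe_location` function, and all coordinates are hypothetical and not taken from the thesis.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LabeledRegion:
    """Hypothetical labeled section of the building's local coordinate system.

    Assumption: regions are axis-aligned rectangles in meters on a given floor.
    """
    label: str
    floor: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float, floor: int) -> bool:
        return (self.floor == floor
                and self.x_min <= x <= self.x_max
                and self.y_min <= y <= self.y_max)

    def area(self) -> float:
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)


def describe_location(regions: List[LabeledRegion],
                      x: float, y: float, floor: int) -> Optional[str]:
    """Return the label of the most specific region containing (x, y), or None."""
    containing = [r for r in regions if r.contains(x, y, floor)]
    if not containing:
        return None
    # Prefer the smallest region, e.g. a doorway that lies inside a hallway.
    return min(containing, key=LabeledRegion.area).label


# Hypothetical map fragment for illustration only.
regions = [
    LabeledRegion("hallway A", 4, 0.0, 0.0, 40.0, 3.0),
    LabeledRegion("door to room 496", 4, 12.0, 2.0, 13.5, 3.0),
]
print(describe_location(regions, 12.5, 2.5, 4))  # door to room 496
print(describe_location(regions, 30.0, 1.0, 4))  # hallway A
```

A real floor plan would more likely use polygons than rectangles; choosing the smallest containing region is just one simple way to resolve overlapping labels.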

Figure 2-1: System Overview

2.2 Stored Image Map with 3D Correlation

The image map of a building is a collection of data consisting of images, image features with their corresponding feature descriptors and 3D coordinates, floor plans, and other location data. This image map is created offline as part of installing the Seeing Eye Phone system in the building. It is then used to match images from the phone and to find directions and destinations.

2.2.1 Image Map Global Data

The image map contains several pieces of key data that all tie together. At the top level, each building uses a local coordinate system, as shown in Figure 2-2. The latitude and longitude of the origin are stored along with the rotation relative to a global coordinate system.
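Because only the origin's latitude and longitude and the rotation are stored, converting between a global position (for example, the last GPS fix before the user enters) and local building coordinates is a small calculation. The sketch below assumes the local x axis is rotated `rotation_deg` counterclockwise from geographic east, the local y axis is 90 degrees counterclockwise from x, and the building is small enough for a flat-Earth (equirectangular) approximation; these conventions and the function names are assumptions, not details from the thesis.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS-84 equatorial radius in meters


def local_to_global(x, y, origin_lat, origin_lon, rotation_deg):
    """Convert local building coordinates (meters) to approximate (lat, lon) in degrees."""
    theta = math.radians(rotation_deg)
    # Rotate the local offset into east/north components.
    east = x * math.cos(theta) - y * math.sin(theta)
    north = x * math.sin(theta) + y * math.cos(theta)
    lat = origin_lat + math.degrees(north / EARTH_RADIUS_M)
    lon = origin_lon + math.degrees(
        east / (EARTH_RADIUS_M * math.cos(math.radians(origin_lat))))
    return lat, lon


def global_to_local(lat, lon, origin_lat, origin_lon, rotation_deg):
    """Inverse of local_to_global: (lat, lon) in degrees to local (x, y) in meters."""
    north = math.radians(lat - origin_lat) * EARTH_RADIUS_M
    east = (math.radians(lon - origin_lon) * EARTH_RADIUS_M
            * math.cos(math.radians(origin_lat)))
    theta = math.radians(rotation_deg)
    # Rotate east/north back into the building's local axes.
    x = east * math.cos(theta) + north * math.sin(theta)
    y = -east * math.sin(theta) + north * math.cos(theta)
    return x, y


# Hypothetical origin and rotation, for illustration only.
lat, lon = local_to_global(12.0, -5.0, 40.0, -111.0, 30.0)
print(global_to_local(lat, lon, 40.0, -111.0, 30.0))  # approximately (12.0, -5.0)
```

The flat-Earth approximation is usually adequate over the extent of a single building; a full geodetic conversion would be needed only over much larger distances.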

Having the global data allows the last position of the user coming into the building to be used as a starting point for image searching.

Figure 2-2: Local Coordinate System (Image courtesy of Google.com)

2.2.2 Image Data

As part of the image map building process, map images are taken at regularly spaced intervals along hallways and other navigable areas of the building. The map images must be spaced closely enough to ensure that each phone image can always be matched to a map image that does not differ too much from it. Using images taken closer together improves the location estimation accuracy because of the smaller disparity between feature points. Matching to distant images leads to larger errors in 3D pose estimation because of the limited image resolution. However, if the images are spaced too closely, the map takes up too much memory and slows down image matching. Images taken every feet prove to be a good balance between accuracy, map size, and processing speed.

The images are processed with a feature detector to find quality 2D feature points. Feature descriptors are then created using pixel data in the neighborhood of each feature point. A collection of several different feature descriptors makes up the feature vector; the feature vectors are typically made up of 64 or 128 feature descriptors. This feature vector can then be matched against the input image feature vectors to find the closest matches. Figure 2-3 shows a sample of the image data that would be included in a building's image map.

Figure 2-3: Sample Map Image Data
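Matching one image's feature vectors against another's amounts to a nearest-neighbor search under a vector distance. The sketch below uses brute-force Euclidean distance and an absolute distance cutoff; the function name, the `max_distance` parameter, and the toy three-element vectors (real SURF vectors have 64 or 128 elements) are illustrative assumptions, and a practical system would likely use an indexed or approximate nearest-neighbor search instead of this exhaustive loop.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def match_feature_vectors(query: List[Sequence[float]],
                          stored: List[Sequence[float]],
                          max_distance: float) -> List[Tuple[int, int, float]]:
    """Match each query vector to its nearest stored vector.

    Returns (query_index, stored_index, distance) for every nearest neighbor
    whose distance is at most max_distance; other query vectors stay unmatched.
    """
    matches = []
    for qi, q in enumerate(query):
        best_index, best_distance = -1, math.inf
        for si, s in enumerate(stored):
            d = euclidean(q, s)
            if d < best_distance:
                best_index, best_distance = si, d
        if best_index >= 0 and best_distance <= max_distance:
            matches.append((qi, best_index, best_distance))
    return matches


# Toy 3-element vectors stand in for 64- or 128-element SURF vectors.
phone = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [10.0, 10.0, 10.0]]
map_image = [[0.0, 0.0, 0.1], [1.0, 1.0, 0.9], [5.0, 5.0, 5.0]]
print([(q, s) for q, s, _ in match_feature_vectors(phone, map_image, 0.5)])
# [(0, 0), (1, 1)]
```

The third phone vector has no stored vector within the cutoff, so it is left unmatched rather than forced onto its (distant) nearest neighbor.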

In order to speed up real-time processing in the Seeing Eye Phone system, the runtime version of the map uses preprocessed images. Once the map is created from the images, only the features, descriptors, and points need to be stored for use by the system; the entire image is not needed once it has been processed. Using preprocessed images speeds up real-time processing in the final system by avoiding redundant computations.

The main purpose of matching images is to determine the pose of the camera and hence the location and orientation of the user. The image feature points alone, however, are not enough to give an accurate pose measurement. Each feature point must be paired with a corresponding real-world 3D position in the local coordinate system, as shown in Figure 2-4. When the input image is matched to the map image, the 3D coordinates are used in conjunction with the 2D feature points in the input image and the camera parameters in order to calculate an accurate pose estimate of the phone.

Figure 2-4: 2D and 3D Point Correspondence for Pose
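The runtime map therefore needs, for each map image, only the feature points, their feature vectors, and the 3D position associated with each point. The sketch below is one possible layout, not the thesis's actual storage format: the class names, the `camera_position` field, and the `build_correspondences` helper are hypothetical. It shows how matched feature indices turn into the 2D-to-3D pairs that pose estimation consumes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


@dataclass
class MapFeature:
    keypoint: Point2D        # pixel location of the feature in the map image
    descriptor: List[float]  # feature vector (64 or 128 values for SURF)
    world_point: Point3D     # 3D position in the building's local coordinates


@dataclass
class MapImage:
    image_id: str
    floor: int
    camera_position: Point3D  # hypothetical: where the map image was taken
    features: List[MapFeature] = field(default_factory=list)


def build_correspondences(phone_keypoints: List[Point2D],
                          map_image: MapImage,
                          matches: List[Tuple[int, int]]) -> List[Tuple[Point2D, Point3D]]:
    """Pair matched phone-image 2D points with the stored 3D points of their map features.

    matches holds (phone_index, map_feature_index) pairs from feature matching.
    """
    return [(phone_keypoints[p], map_image.features[m].world_point)
            for p, m in matches]


stored = MapImage("hall-001", 4, (0.0, 0.0, 1.5), [
    MapFeature((100.0, 50.0), [0.1] * 64, (2.0, 1.0, 1.8)),
    MapFeature((300.0, 80.0), [0.2] * 64, (2.5, -1.0, 2.1)),
])
phone_points = [(110.0, 55.0), (40.0, 40.0), (290.0, 85.0)]
print(build_correspondences(phone_points, stored, [(0, 0), (2, 1)]))
# [((110.0, 55.0), (2.0, 1.0, 1.8)), ((290.0, 85.0), (2.5, -1.0, 2.1))]
```

Note that the map image's own 2D keypoints are not part of the output: once a match is established, pose estimation needs only the phone's 2D point and the stored 3D point.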

2.2.3 Collecting the Image Data

The most efficient way to create the map would be to use a robotic system that takes pictures at regular intervals. It would use a SLAM (Simultaneous Localization and Mapping) algorithm along with stereo cameras, dead reckoning, and other sensors or algorithms to generate the map images along with their corresponding 3D points. Fiducials (easy-to-detect markers) placed at key locations around the building as absolute references can help increase the accuracy of the 3D points it finds. The development of the map-creating system is not discussed in this thesis but is left to future research.

2.2.4 Floor Plans and Supplemental Location Data

In order to navigate effectively, floor plans and other location data must be added to the map. This data may include key locations one may wish to find, such as information desks, room numbers, restrooms, and other locations that would be included on a printed map or are commonly requested destinations. Floor plans are described in the local coordinate system and assigned a floor number. Combining the floor plan with the coordinate system allows the path-finding algorithm to determine the positions of key locations as well as obstacles and pathways.

2.3 Hand-held Device

The hand-held device has only a few requirements: it must be able to use Wi-Fi, run custom programs, and have a camera. Most smart phones have all of the required features and are very common in today's society. Along with being portable, user-friendly, and discreet, the smart phone makes an ideal platform for the guidance system. The user wears the phone around his neck and interacts with it using voice recognition software, which is readily available for these types of phones. The phone first connects to the server over Wi-Fi. It then starts taking images at a constant rate and sends them to the server. Simple commands such as "Where am I?" or "Find room 496" can be given to the phone. The server processes the images and returns the location or directions to the desired location. Text-to-speech programs, also readily available, convey the location and directions to the user to guide him to the destination.

2.4 Server

The server performs the image matching, pose calculation, and navigation planning using the stored image map. When the server receives an image, it first checks the additional information sent with the image, including the most recent location. It uses this initial location to determine the part of the building in which to start searching images. Usually the phone would connect to the building's local network, which would provide a good starting point. If the user is entering from outside the building, the server can use the last known GPS- or network-based position, if available. Once the initial starting point is found, the server only needs to search in the vicinity of the previous position, thus improving image matching speed.

The server uses a feature detector to find feature points in the smart phone image. It then matches these features with the features stored in the image map database. After the initial matched features are determined, the camera geometry is used to remove incorrect matches. Once the server finds the correctly matched image of the user's location, it associates the 3D points of the map features with the smart phone image features. With the associated points and the camera's intrinsic parameters, the pose of the user within the building is estimated. A route is computed to the desired destination, and directions are transmitted back to the user. The vision processing done by the server is discussed in greater detail in Chapter 3.
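Restricting the search to the vicinity of the previous position can be as simple as filtering map images by distance from the last known location and trying the nearest ones first. The sketch assumes each map image is tagged with the (x, y) position and floor at which it was taken; the `candidate_images` function and the search `radius` are hypothetical, since the thesis does not specify how the vicinity is defined.

```python
import math
from typing import Dict, List, Tuple


def candidate_images(image_positions: Dict[str, Tuple[float, float, int]],
                     last_x: float, last_y: float, last_floor: int,
                     radius: float) -> List[str]:
    """Return IDs of map images on the same floor within `radius` meters of the
    last known position, ordered nearest first."""
    nearby = []
    for image_id, (x, y, floor) in image_positions.items():
        if floor != last_floor:
            continue  # images on other floors are skipped
        distance = math.hypot(x - last_x, y - last_y)
        if distance <= radius:
            nearby.append((distance, image_id))
    nearby.sort()
    return [image_id for _, image_id in nearby]


positions = {
    "a": (0.0, 0.0, 4), "b": (3.0, 4.0, 4), "c": (1.0, 0.0, 4),
    "d": (1.0, 0.0, 3), "e": (20.0, 0.0, 4),
}
print(candidate_images(positions, 0.0, 0.0, 4, radius=5.0))  # ['a', 'c', 'b']
```

If none of the candidates yields enough matched features, one option would be to widen the radius and search again; that fallback is likewise an assumption rather than something the thesis describes.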

The server does not have stringent hardware requirements; an average quad-core computer is all that is needed to process the images. However, the building does need enough Wi-Fi coverage for the smart phone to stay in constant contact with the server.
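The thesis does not say which path-finding algorithm computes the route to the destination. As one possible illustration, assuming the floor plan is rasterized into a grid of walkable cells, a breadth-first search returns a shortest route measured in grid steps; the grid encoding ('#' for walls, '.' for open floor) and the `shortest_route` function are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def shortest_route(grid: List[str], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Breadth-first search on a floor-plan grid ('#' = wall, '.' = walkable).

    Returns the list of cells from start to goal, or None if no route exists.
    """
    rows, cols = len(grid), len(grid[0])
    previous: Dict[Cell, Optional[Cell]] = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the predecessor links back to the start.
            path = []
            step: Optional[Cell] = cell
            while step is not None:
                path.append(step)
                step = previous[step]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != '#' and (nr, nc) not in previous):
                previous[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


floor_plan = [
    "....",
    ".##.",
    "....",
]
route = shortest_route(floor_plan, (0, 0), (2, 3))
print(len(route))  # 6
```

Breadth-first search is chosen here only for brevity; a weighted search such as A* over a hallway graph would be a natural alternative when corridor lengths differ.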

26 3 VISION PROCESSING he core of ths thess focuses on the vson processng used to calculate the camera pose and hence the user s locaton and orentaton usng only one camera wth no added features to the scene. Whle some systems have used smart phone cameras to determne locaton they have all used specalzed markers. he Seeng Eye Phone system uses the natural scenery for processng nstead of fllng up the walls wth nformaton markers. Needng to recognze features of the natural ndoor scenes requres more processng power, but provdes an accurate contnuous estmate of the user s poston nstead of recognzng perodc ndvdual markers. hs system also leaves the locaton n ts orgnal form nstead of clutterng up the walls wth odd markers. 3.1 Background here exst several systems for fndng pose usng mage processng. Many robotc systems are desgned around SLAM, or smultaneous localzaton and mappng. he unque problem of SLAM s creatng a map of one s surroundngs whle at the same tme localzng oneself wthn the map. SLAM proves to be a dffcult problem snce each queston requres knowledge of the other. SLAM s prmarly used for systems navgaton n completely unknown envronments and requres the system to travel the envronment more than once n order to buld up the map. In our applcaton, localzaton of the user s beng done n an already known envronment. he map must be made avalable n order to determne the locaton as the user 18

moves. In addition, SLAM needs a way to measure the distance to obstacles on the fly, for example with stereo vision. Stereo vision is popular for vision-based systems that need the distance to features because it can compute the distance to any object seen by both cameras. Stereo vision works on the same principle as human eyes. Two cameras are mounted close together, side by side, with their optical axes parallel and facing the same direction. Each camera sees a slightly different view of the same scene, and that difference (the disparity) allows the distance from the cameras to be triangulated.

To compute the distance accurately, the relative pose of the cameras must be known, along with their individual focal lengths and other intrinsic camera values. The two cameras must be carefully set up and calibrated to measure distances to objects accurately and, in turn, to localize the cameras in a map. Because of these requirements, stereo vision systems tend to be quite large and cumbersome.

MonoSLAM performs simultaneous localization and mapping with only a single camera. However, it uses an assumed camera movement model to compare images from frame to frame, and it is less accurate [26]. This movement model would not work as well for a walking person as it does for a flying or rolling robot. In general, MonoSLAM requires smooth and predictable camera movement. People move less predictably because they can start, stop, turn, or change direction at any time. A walking person also produces camera bob from the stepping motion, which would be very difficult to account for.

Many robotic vehicles rely on supplemental sensors such as odometers, LIDAR, and sonar to complement the vision algorithms and help with localization within a map. Most of these sensors are expensive or are not available on small, commonly used devices like smart phones. LIDAR, for example, can easily cost several thousand dollars. Many methods using these sensors also require a well-defined movement model, and people do not move predictably enough to provide one.

Because smart phones are used as the guidance device, the system must work with the hardware available on the majority of smart phones. Most smart phones have some additional sensors, such as accelerometers and compasses. For this work, however, it is assumed that only one camera is available to capture images and that no secondary sensors aid in localization. Adding supplemental sensors to improve accuracy is left to future research.

The purpose of this system is both to tell users their current location and to guide them to a specified known location. The proposed Seeing Eye Phone system takes advantage of the fact that the user is within a known environment. A-priori information can therefore be used instead of building a map on the fly as SLAM systems do. However, neither stereo vision nor a single-camera system like MonoSLAM is available to supply depth information, so another method is used to provide 3D information. The image map is created beforehand and contains known features along with their 3D locations. With the 3D locations available, the pose of the user can be calculated without stereo vision, specific movement models, or expensive or unavailable sensors.

3.2 Algorithm Overview

The vision algorithm has the following key parts: feature detection, description, and matching; outlier elimination; and pose estimation. A feature detector finds features in the smart phone image, and these are then matched to the map images. The phone images go through exactly the same algorithm as the map images. The only difference is that the map images are processed before the system is used rather than in real time. Once features are determined, a unique descriptor is created for each feature point, and these descriptors are collected together to form a feature vector. Feature vectors can then be compared like any other vectors, and a very small distance between two of them constitutes a match.

During the matching process, an image with too few matched features is discarded. Once the number of matched features exceeds a threshold, a potential match to the image location has been found. These matched features are referred to as inliers in the first decision box in Figure 3-1. To reduce processing time, the phone image is compared only with images in the near vicinity of the most recent known location.

Some of the remaining matched features from the first step will inevitably be incorrect and cause errors in the pose estimation algorithm, so they must be removed before pose estimation. These incorrectly matched features are known as outliers. RANSAC, or RAndom SAmple Consensus, is used to help remove them. RANSAC fits a model to a data set while simultaneously discarding outliers. The fundamental matrix relates corresponding points of a pair of images, so it serves as the model for RANSAC during outlier removal. The phone image and map image are not a true stereo vision pair with a known relationship between the two cameras. Even so, the fundamental matrix still proves useful for getting rid of outliers.

A random sample of point correspondences, or matched feature point pairs, is used to compute the parameters of the fundamental matrix. Point pairs that do not fit the model are discarded as outliers, and the model parameters are recomputed using only the inliers. Only models with a minimum number of inliers are considered as possible solutions. The process of choosing random samples, computing model parameters, and finding outliers is repeated until the model with the least inlier error is found. If enough matches remain above the threshold after RANSAC, the algorithm continues with that image. Otherwise, it returns to the image matching stage to find a new image.

The remaining inliers give a feature correspondence between the two images, which links the phone image's 2D features to their 3D real-world locations. The pose is then determined from these 2D-to-3D point correspondences using the camera's intrinsic parameters. The Direct Linear Transform runs on a subset of correspondences to find a rough initial pose estimate. The Levenberg-Marquardt algorithm, a minimization algorithm, then refines this estimate to find a more optimal solution.

Although outliers were discarded earlier, some may still remain at this step. The fundamental matrix only constrains a matched point to lie on a line, not at a specific location. RANSAC is therefore used once again to remove any remaining outliers, this time with the pose as the model. After a pose estimate is found from a random subset of the data, the 3D points are reprojected onto the image using that estimate and the camera's intrinsic parameters. The algorithm compares the reprojected points to the original matched points, and any point with a large error is classified as an outlier. As before, the RANSAC iterations continue until the solution with the lowest error is found. If the number of inliers remaining after RANSAC is still above a certain threshold, the algorithm finally declares a true image match. The resulting pose is transmitted back to the smart phone and also passed on to the navigation algorithm. Figure 3-1 gives the general outline of the vision processing used to find the pose.
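The matching and correspondence steps described above can be sketched in plain Python. This sketch assumes brute-force nearest-neighbour search with a Euclidean distance threshold. `MapFeature`, `match_features`, `to_correspondences`, and `enough_matches` are hypothetical names, not the thesis code. Each stored map feature carries its descriptor and its measured 3D location, so a descriptor match directly yields a 2D-to-3D correspondence.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class MapFeature:
    """Hypothetical record for one stored map feature."""
    descriptor: Sequence[float]          # e.g. a 64-element SURF descriptor
    xyz: Tuple[float, float, float]      # measured 3D location in map coordinates


def match_features(phone_descs: Sequence[Sequence[float]],
                   map_feats: Sequence[MapFeature],
                   max_dist: float) -> List[Tuple[int, int]]:
    """Nearest-neighbour matching; keep pairs whose descriptor distance is below max_dist."""
    matches = []
    for i, desc in enumerate(phone_descs):
        best_j, best_dist = -1, math.inf
        for j, feat in enumerate(map_feats):
            d = math.dist(desc, feat.descriptor)   # Euclidean distance between descriptors
            if d < best_dist:
                best_j, best_dist = j, d
        if best_j >= 0 and best_dist < max_dist:
            matches.append((i, best_j))
    return matches


def to_correspondences(matches, phone_keypoints, map_feats):
    """Pair each matched 2D phone keypoint with the stored 3D location of its map feature."""
    return [(phone_keypoints[i], map_feats[j].xyz) for i, j in matches]


def enough_matches(matches, min_matches: int) -> bool:
    """Gate applied before continuing with a candidate map image (threshold is an assumption)."""
    return len(matches) >= min_matches
```

The sketch does not enforce one-to-one matching, so two phone features may match the same map feature. The later RANSAC stages are what remove such errors.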

Figure 3-1: Vision and Pose Estimation Algorithm Overview (flowchart: Feature Detection and Description → Feature Matching → Enough Inliers? → Fundamental Matrix with RANSAC → Enough Inliers? → Direct Linear Transform → Levenberg Marquardt → Reprojection with RANSAC → Enough Inliers? → Pose [R t] → To Camera / To Navigation Algorithm; each decision continues only on "Yes")
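The RANSAC loop used in both outlier-removal stages can be sketched as a generic function. The model fit and the per-point error are passed in, so the same loop could serve the fundamental-matrix stage and the pose stage. The sketch follows the variant described in the text: sample, classify, require a minimum inlier count, refit on all inliers, and keep the model whose inliers have the least total error. A simple 2D line model stands in for the real models only to keep the example small. All names and parameter values are illustrative assumptions.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")   # one data point (e.g. a matched point pair)
M = TypeVar("M")   # one model (e.g. a fundamental matrix or a pose)


def ransac(data: Sequence[T],
           fit: Callable[[Sequence[T]], Optional[M]],
           error: Callable[[M, T], float],
           sample_size: int,
           threshold: float,
           min_inliers: int,
           iterations: int,
           rng: Optional[random.Random] = None) -> Optional[Tuple[M, List[T], float]]:
    """Return (model, inliers, total inlier error) of the best model found, or None."""
    if rng is None:
        rng = random.Random(0)
    best = None
    for _ in range(iterations):
        sample = rng.sample(list(data), sample_size)    # minimal random sample
        model = fit(sample)
        if model is None:                               # degenerate sample
            continue
        inliers = [d for d in data if error(model, d) < threshold]
        if len(inliers) < min_inliers:                  # not enough support
            continue
        model = fit(inliers)                            # re-estimate from all inliers
        if model is None:
            continue
        total = sum(error(model, d) for d in inliers)
        if best is None or total < best[2]:             # least total inlier error wins
            best = (model, inliers, total)
    return best


def fit_line(points):
    """Least-squares fit of y = a*x + b; returns None when all x values coincide."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if denom == 0:
        return None
    a = (n * sxy - sx * sy) / denom
    return a, (sy - a * sx) / n


def line_error(model, point):
    """Vertical distance from a point to the line y = a*x + b."""
    a, b = model
    x, y = point
    return abs(y - (a * x + b))
```

Many RANSAC implementations instead prefer the model with the most inliers and use the error only to break ties. The least-error rule, combined with the `min_inliers` gate, mirrors the procedure as the text describes it.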

3.3 Feature Detection, Description and Matching

3.3.1 Common Methods

Algorithms for finding, describing, and matching features have been studied extensively. One popular detector is the Harris corner detector [27], which uses the eigenvalues of the second-moment matrix of the image derivatives to detect corners. The Harris corner detector is not scale invariant, though.

Scale Invariant Feature Transform (SIFT) is a more recently developed algorithm that is widely used for its proven performance [28]. In SIFT, features are found using the Difference of Gaussians to approximate the Laplacian of Gaussian. To find features, SIFT searches over multiple scales and Gaussian blur levels of the image. Each image scale, together with its set of blur levels, is called an octave. The use of octaves provides robust, scale-invariant features. Gradient magnitudes and orientations are computed to give each feature an overall orientation, which provides rotation invariance. A feature descriptor is then computed from the gradient magnitudes and orientations in subsections of a window around the feature, creating a 128-element feature vector. SIFT produces very good results in terms of finding and matching features, but it proves too slow for near-real-time applications.

3.3.2 Speeded Up Robust Features

Speeded Up Robust Features (SURF) proved to be the best choice for this application [29]. SURF is another method of finding and describing features. It offers scale- and rotation-invariant features and runs much faster than SIFT. For feature detection, SURF applies a Hessian-matrix approximation to integral images. Integral images, or summed area tables, make detection fast because the Hessian-matrix approximation is built from box filters. The sum over any box needs only four values from the integral image. Because integral images are used, SURF handles scale invariance by scaling the Hessian filter instead of the image. Changing only the filter scale avoids the cost of scaling and blurring the image multiple times, which greatly increases the speed at which SURF handles scale invariance.

The integral image is created by summing all pixel values above and to the left of the current pixel, as demonstrated in Figure 3-2. To sum the pixels in the red box using the original image, each pixel value is simply added, which in this case uses 9 additions and gives a sum of 578. With the integral image, only the four corner values of the box are needed, combined by addition and subtraction, to obtain the same sum of 578. This number of operations stays the same regardless of the area summed over. A constant operation count greatly increases speed, especially when the filters are scaled up.

Figure 3-2: Integral Image Creation and Use (panels: Image, Integral Image)

3.3.3 SURF Feature Descriptors

Both the orientation and the feature vector make heavy use of Haar-wavelet filters. Haar wavelets are widely used for spatial filtering and are shown in Figure 3-3. The black part has a value of -1 and the white part a value of +1. This makes the filter ideal for detecting high-contrast areas, such as edges and corners, which make good features.

A feature descriptor consists of a collection of image data around the feature point, together with an overall feature orientation. The orientation gives the feature rotation invariance during matching. It is based on the sum of Haar-wavelet responses within an angular window around the feature point. The feature vector is then built from the Haar-wavelet responses in each of the 16 subsections of a 4x4 window around the feature point, with 25 sample points (5x5) in each subsection. A sample window is illustrated in Figure 3-4, where the feature point would be at the center of the grid. In each subsection, the responses to the horizontal and vertical filters are summed, giving 32 values. The sums of the absolute values of the x and y responses are also stored, yielding a 64-element feature vector. With only 64 elements, this feature vector reduces the computation time of the algorithm compared with SIFT, which uses a 128-element feature vector. The reader is referred to the SURF paper by Bay et al. for further details [29].

Figure 3-3: Haar Wavelets
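A plain-Python sketch of the integral image, the four-look-up box sum, Haar-wavelet responses, and a simplified 64-element descriptor follows. The sketch rests on several assumptions:

- The right/bottom half of each Haar filter is +1 and the left/top half is -1.
- The sample spacing is `s` and the Haar filter size is `2s`.
- Rotation to the dominant orientation, Gaussian weighting, and unit-length normalization, all of which the SURF paper applies, are omitted.

The function names are hypothetical.

```python
from typing import List, Sequence


def integral_image(img: Sequence[Sequence[float]]) -> List[List[float]]:
    """Summed area table with an extra zero row/column: ii[y][x] = sum of img[:y][:x]."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x0, y0, w, h):
    """Sum of the w x h box whose top-left pixel is (x0, y0): four look-ups for any box size."""
    return ii[y0 + h][x0 + w] - ii[y0][x0 + w] - ii[y0 + h][x0] + ii[y0][x0]


def haar_x(ii, x0, y0, s):
    """Horizontal Haar response of an s x s filter (s even): right half minus left half."""
    half = s // 2
    return box_sum(ii, x0 + half, y0, half, s) - box_sum(ii, x0, y0, half, s)


def haar_y(ii, x0, y0, s):
    """Vertical Haar response of an s x s filter (s even): bottom half minus top half."""
    half = s // 2
    return box_sum(ii, x0, y0 + half, s, half) - box_sum(ii, x0, y0, s, half)


def surf_like_descriptor(ii, cx, cy, s=1):
    """Simplified 64-element descriptor: 4x4 subregions of 5x5 samples in a 20s x 20s window,
    each subregion contributing (sum dx, sum dy, sum |dx|, sum |dy|)."""
    desc = []
    x_start, y_start = cx - 10 * s, cy - 10 * s
    for sub_y in range(4):
        for sub_x in range(4):
            sdx = sdy = adx = ady = 0.0
            for j in range(5):
                for i in range(5):
                    x = x_start + (sub_x * 5 + i) * s
                    y = y_start + (sub_y * 5 + j) * s
                    # Haar filters of size 2s centred on the sample point.
                    dx = haar_x(ii, x - s, y - s, 2 * s)
                    dy = haar_y(ii, x - s, y - s, 2 * s)
                    sdx += dx
                    sdy += dy
                    adx += abs(dx)
                    ady += abs(dy)
            desc.extend([sdx, sdy, adx, ady])
    return desc
```

For example, `box_sum(integral_image([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), 1, 1, 2, 2)` gives 28 (5 + 6 + 8 + 9) using the same four look-ups that a full-image sum would need.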

Figure 3-4: SURF Feature Descriptor Grid (normal image depicted; in the algorithm the integral image is used for speed)

3.3.4 Scale Space

The scale space gives SURF scale invariance, allowing features to be detected at different distances. More scale octaves result in higher scale invariance, and the recommended number of octaves is usually 4 or 5. For this application, however, only 2 octaves are needed. In fact, making the features too scale invariant can be detrimental to pose estimation here. Object detection usually needs more scale invariance because the object could appear at any scale in the image that contains it. In this indoor location-finding application, however, it is undesirable.

Searching for an image of the end of a hallway provides a good example. The left image of the end of the hallway in Figure 3-5 is the object to be found in the smart phone image on the right. Suppose 5 octaves are used for feature detection. The camera image on the right, taken from one end of the hallway, could then be matched to the map image on the left, taken from the other end. The matched portion is marked by a black rectangle in the image on the right. This small matched area produces feature points that are too close together and at low resolution, which causes a large error in the pose estimate. It is therefore desirable for the phone image to match map features close to the phone rather than far away. Closely spaced map images allow each phone image to be matched to a nearby map image, and they effectively build several levels of scale invariance into the map itself. Using 2 octaves proves more than enough to handle the smaller scale changes between map images.

Figure 3-5: Matching Distant Image

3.3.5 Matching Images

SURF is run on the map images offline and on the phone images in real time. Because SURF is used on both, the resulting feature vectors can be compared directly. The structure of the feature vectors allows fast and easy comparison. The orientation of the feature points is used to compensate for rotation in the images. Applying the rotation to a feature vector simply shifts its elements so that the corresponding elements of the two vectors can be compared directly. The vectors are then subtracted from each other to obtain an error figure. If the error is under a certain threshold, the feature pair is considered a match.

3.4 Removing Outliers

Too strict a threshold on the feature matching distance reduces the number of incorrectly matched features, or outliers. However, it also tends to eliminate a good number of correctly matched features. To improve the accuracy of the pose estimate, it is desirable to keep as many good features as possible and to find another way to eliminate the bad ones. The matching threshold is therefore relaxed to maximize the number of correct features passed on for further testing. Camera geometry and RANSAC are then used to remove the extra false matches introduced by the relaxed threshold.

3.4.1 Epipolar Geometry

Assuming the image features remain static, the phone image and the map image are related by a transformation. Using the point correspondences between the two images, this transformation can be approximated by the fundamental matrix F, which describes the epipolar geometry between the images. The fundamental matrix F is defined by

$$ \mathbf{p}_r^{\mathsf T} F\, \mathbf{p}_l = 0, \tag{3-1} $$

where $F\mathbf{p}_l$ is the epipolar line in the image containing the point $\mathbf{p}_r$. The point $\mathbf{p}_r$ must lie on this epipolar line. See Figure 3-6.
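To make Equation 3-1 concrete, here is a small plain-Python sketch that computes the epipolar line $F\mathbf{p}_l$, the algebraic residual $\mathbf{p}_r^{\mathsf T}F\mathbf{p}_l$, and the perpendicular pixel distance from $\mathbf{p}_r$ to that line. The function names are hypothetical. The test uses the fundamental matrix of a pure sideways camera translation with identity intrinsics, for which every epipolar line is horizontal.

```python
import math
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def epipolar_line(F: Matrix3, p_l: Tuple[float, float]) -> Tuple[float, float, float]:
    """Line l = F p_l (with p_l = (x, y, 1)) as coefficients (a, b, c) of a*x' + b*y' + c = 0."""
    x, y = p_l
    a, b, c = (F[i][0] * x + F[i][1] * y + F[i][2] for i in range(3))
    return a, b, c


def epipolar_residual(F: Matrix3, p_r, p_l) -> float:
    """Algebraic residual p_r^T F p_l of Equation 3-1; zero for a perfect correspondence."""
    a, b, c = epipolar_line(F, p_l)
    return a * p_r[0] + b * p_r[1] + c


def epipolar_distance(F: Matrix3, p_r, p_l) -> float:
    """Perpendicular distance (in pixels) from p_r to the epipolar line F p_l."""
    a, b, _ = epipolar_line(F, p_l)
    return abs(epipolar_residual(F, p_r, p_l)) / math.hypot(a, b)
```

The algebraic residual is what the text later uses as the RANSAC error. Dividing it by the length of the line's (a, b) part turns it into a true pixel distance, which makes a fixed threshold easier to interpret.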

Figure 3-6: Epipolar Geometry

Every feature point in one image must lie on the epipolar line created by the corresponding feature point in the other image. Figure 3-7 shows features lying on the epipolar lines computed from the fundamental matrix and the feature points of the other image. If a corresponding feature point lies too far from its epipolar line, it is considered an outlier and removed from the list of matched features. To remove these bad features, however, the fundamental matrix must first be calculated from the feature list. That list may itself contain bad features, which can lead to an inaccurate fundamental matrix. To find a valid fundamental matrix and remove outliers, RANSAC is used with the 8-Point algorithm [30].
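The 8-point estimate that RANSAC runs on each random sample can be sketched with NumPy, assuming NumPy is available. Each correspondence contributes the row $\mathbf{a}$ of Equation 3-2. The SVD null vector gives $F'$, and a second SVD zeroes the smallest singular value to enforce rank 2. The function name is hypothetical. This is the unnormalized form; Hartley's coordinate normalization is a common addition that improves conditioning with pixel coordinates, but the text does not describe it.

```python
import numpy as np


def eight_point(pts_r, pts_l):
    """Estimate F with p_r^T F p_l = 0 from N >= 8 correspondences (unnormalized 8-point).

    pts_r, pts_l: (N, 2) arrays of (x', y') and (x, y) image coordinates.
    """
    pts_r = np.asarray(pts_r, dtype=float)
    pts_l = np.asarray(pts_l, dtype=float)
    xr, yr = pts_r[:, 0], pts_r[:, 1]
    xl, yl = pts_l[:, 0], pts_l[:, 1]
    # One row a = (x'x, x'y, x', y'x, y'y, y', x, y, 1) per correspondence (Equation 3-2).
    A = np.column_stack([xr * xl, xr * yl, xr,
                         yr * xl, yr * yl, yr,
                         xl, yl, np.ones_like(xl)])
    # f' is the right singular vector of the smallest singular value (last row of Vt).
    _, _, Vt = np.linalg.svd(A)
    F_init = Vt[-1].reshape(3, 3)
    # Enforce rank 2: zero the smallest singular value and multiply back together.
    U, S, Vt2 = np.linalg.svd(F_init)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt2
```

Inside the RANSAC loop this function would receive 8 randomly chosen correspondences. The residual $\mathbf{p}_r^{\mathsf T}F\mathbf{p}_l$ of every match is then thresholded to separate inliers from outliers.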

Figure 3-7: Epipolar Lines

3.4.2 RANSAC

Random sample consensus, or RANSAC, is an iterative, non-deterministic technique for estimating model parameters from a set of data points while eliminating bad data points, or outliers. RANSAC has proven itself as a robust parameter estimation method. Although the algorithm is iterative, typically only a few iterations are needed for good results, so it runs quickly.

RANSAC starts by taking a small random sample from the full data set to get an initial parameter estimate. The elements of this sample are automatically treated as inliers for the iteration in which they are used. After the initial estimate is made, every other element of the data set is tested against the model. A distance measure quantifies the error, and a threshold classifies each point as an inlier or an outlier. The inliers are added to the initial sample set. If the number of inliers is above the required threshold, the model parameters are recalculated using all of the inliers. These parameters and the total inlier error are recorded as the current best guess.

The process is repeated on subsequent iterations with new random samples, and each iteration's error is compared against the recorded best error. If the new parameters produce less error, they and their error are recorded as the new best estimate. After a set number of iterations, the points that fit the best estimate are considered inliers.

3.4.3 8-Point Algorithm with RANSAC

For the 8-point algorithm, let $\mathbf{p}_r = (x', y', 1)$ and $\mathbf{p}_l = (x, y, 1)$. Multiplying out Equation 3-1 and rearranging the elements gives the vector product

$$ (x'x,\ x'y,\ x',\ y'x,\ y'y,\ y',\ x,\ y,\ 1)\,(F_{11}, F_{12}, F_{13}, F_{21}, F_{22}, F_{23}, F_{31}, F_{32}, F_{33})^{\mathsf T} = \mathbf{a}\,\mathbf{f} = 0. \tag{3-2} $$

Equation 3-2 holds for the row vector $\mathbf{a}$ of any valid point correspondence. Stacking the row vectors of 8 different point correspondences forms the matrix $A$, giving the matrix equation $A\mathbf{f} = \mathbf{0}$. To obtain $\mathbf{f}'$ as an initial estimate of $\mathbf{f}$, the Singular Value Decomposition (SVD) $A = U\Sigma V^{\mathsf T}$ is computed. The column of $V$ corresponding to the smallest singular value is the estimate $\mathbf{f}'$. This column is rearranged back into matrix form to create an initial fundamental matrix $F'$, and the SVD of $F'$ is then taken. Setting the smallest singular value of this decomposition to 0 and multiplying the decomposition back together results in a valid fundamental matrix. This second decomposition is needed because the data contain noise. As a result, $F'$ does not initially satisfy the condition that a fundamental matrix has rank 2.

The RANSAC algorithm starts by choosing 8 random correspondences from the set of matched features, which may contain outliers. The parameters of the fundamental matrix $F$ are computed from these 8 point correspondences. The model is then applied to all matched points, and the error (distance from the epipolar line) is computed as $e = \mathbf{p}_r^{\mathsf T} F \mathbf{p}_l$. Any points farther than a threshold distance from the epipolar line are marked as outliers, as shown by the red dot in the left image of Figure 3-8.

Figure 3-8: Epipolar Line RANSAC Outlier

Once all outliers are eliminated for the iteration, the number of inliers is checked. If it is above a certain threshold, the total inlier error is stored along with the fundamental matrix model $F$ as the current best guess. The process is then repeated with a new set of 8 random points. After each iteration, the total error is compared to the current best error, and if the new fundamental matrix has less error, it replaces the current one. The algorithm stops after a certain number of iterations or once a certain error threshold is reached. After the final fundamental matrix is determined, the outliers are removed from the set of matched points, and the remaining points can be used for pose estimation. For this application, the resulting fundamental matrix itself is not needed, because RANSAC was used simply to help eliminate outliers. Figure 3-9 shows an initial matched image pair with many outliers, and Figure 3-10 shows the final result after running RANSAC.

Figure 3-9: Initial Matched Feature Points
Figure 3-10: Matched Feature Points after RANSAC with Homography

Some incorrect features may still remain after this step. The fundamental matrix multiplied by a point defines a line in the other image. An erroneous match that happens to lie on that epipolar line will therefore be accepted as a valid point after RANSAC. An example appears in Figure 3-11, where the outlier clearly lies on the epipolar line of the point in the other image. With the addition of 3D information, these erroneous points can be filtered out during pose estimation to further increase accuracy.

Figure 3-11: False Inlier after Fundamental Matrix RANSAC

3.5 Pose Calculation

Once the majority of outliers are gone, the algorithm proceeds to find the smart phone pose. Using the 2D-to-3D point correspondences along with the camera's intrinsic parameters, the smart phone camera pose can be found with respect to the defined map coordinates. Several methods exist for solving for the pose from a set of correspondences, and they differ in speed, ease of implementation, and accuracy. Some methods use only 3 or 4 points, but more points give greater accuracy when the points are measured in a noisy environment. For ease of implementation and accuracy, the pose is first estimated with the direct linear transformation (DLT) [31]. This estimate is then optimized with the Levenberg-Marquardt algorithm (LMA) [32]. The LMA requires an initial starting point to converge on an optimal solution. The DLT provides that starting point, and the LMA then minimizes the geometric error of the pose estimate. Using 6 or more points for the DLT results in an overdetermined system, so, as before, singular value decomposition is used to solve the DLT [33].

All data points are represented in homogeneous coordinates. The 2D image points $\mathbf{p} = (x, y, w)$ relate to the 3D points $\mathbf{P} = (X, Y, Z, W)$ by

$$ \mathbf{p} \sim K\,[R \mid \mathbf{t}]\,\mathbf{P}. \tag{3-3} $$

The rotation and translation matrix $[R \mid \mathbf{t}]$, or extrinsic camera parameter matrix, brings the 3D points into the camera's coordinate system. The intrinsic parameter matrix $K$ then projects the points onto the image plane. This relationship between the 2D and 3D points is shown in Figure 3-12.

Figure 3-12: Point Correspondence Relation

Because the points are homogeneous, the two sides of Equation 3-3 are equal only up to a scale factor. Combining the intrinsic and extrinsic matrices as $H = K[R \mid \mathbf{t}]$, the relationship between the points can be written as $\mathbf{p} \times H\mathbf{P} = \mathbf{0}$. Writing $H$ in terms of its three transposed row vectors, $H = [\mathbf{h}_1^{\mathsf T};\ \mathbf{h}_2^{\mathsf T};\ \mathbf{h}_3^{\mathsf T}]$, and expanding the cross product gives

$$ \mathbf{p} \times H\mathbf{P} = \begin{pmatrix} y\,\mathbf{h}_3^{\mathsf T}\mathbf{P} - w\,\mathbf{h}_2^{\mathsf T}\mathbf{P} \\ w\,\mathbf{h}_1^{\mathsf T}\mathbf{P} - x\,\mathbf{h}_3^{\mathsf T}\mathbf{P} \\ x\,\mathbf{h}_2^{\mathsf T}\mathbf{P} - y\,\mathbf{h}_1^{\mathsf T}\mathbf{P} \end{pmatrix} = \mathbf{0}. \tag{3-4} $$

Separating out the $\mathbf{h}$ vectors on the right and leaving the point terms in the matrix on the left results in the matrix multiplication

$$ \begin{bmatrix} \mathbf{0}^{\mathsf T} & -w\,\mathbf{P}^{\mathsf T} & y\,\mathbf{P}^{\mathsf T} \\ w\,\mathbf{P}^{\mathsf T} & \mathbf{0}^{\mathsf T} & -x\,\mathbf{P}^{\mathsf T} \\ -y\,\mathbf{P}^{\mathsf T} & x\,\mathbf{P}^{\mathsf T} & \mathbf{0}^{\mathsf T} \end{bmatrix} \begin{pmatrix} \mathbf{h}_1 \\ \mathbf{h}_2 \\ \mathbf{h}_3 \end{pmatrix} = \mathbf{0}. \tag{3-5} $$

Because the rows of the left matrix are linearly dependent, one equation can be removed, leaving

$$ \begin{bmatrix} \mathbf{0}^{\mathsf T} & -w\,\mathbf{P}^{\mathsf T} & y\,\mathbf{P}^{\mathsf T} \\ w\,\mathbf{P}^{\mathsf T} & \mathbf{0}^{\mathsf T} & -x\,\mathbf{P}^{\mathsf T} \end{bmatrix} \begin{pmatrix} \mathbf{h}_1 \\ \mathbf{h}_2 \\ \mathbf{h}_3 \end{pmatrix} = \mathbf{0}. \tag{3-6} $$

Equation 3-6 holds for any 2D-to-3D point correspondence. Stacking the left-hand matrices of 6 or more correspondences creates the overdetermined system. Calling this stacked matrix $A$, the norm $\|A\mathbf{h}\|$ is minimized subject to the constraint $\|\mathbf{h}\| = 1$, using the SVD in the same manner as for the 8-point algorithm. Once the SVD gives an estimate of $H$, multiplying by $K^{-1}$ on the left yields $[R \mid \mathbf{t}]$ up to scale, from which the components of $R$ and $\mathbf{t}$ are easily extracted.

After the DLT gives the initial pose, the LMA refines it by minimizing the reprojection error. The LMA combines gradient descent with Gauss-Newton iteration to minimize the sum of the squared errors between the data points and the model function, starting from the initial pose. Although the algorithm can use all points to solve for the pose, some erroneous points may still be present. RANSAC is therefore used once again to get rid of the few remaining outliers. Six points are chosen for each random sample on which the pose estimate is run. Using the resulting rotation and translation, the known 3D points are reprojected onto the image plane. The distance between each feature point and its projected point is measured as the error figure. See Figure 3-13.
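The DLT step of Equations 3-3 through 3-6 can be sketched in NumPy as follows. The sketch assumes NumPy is available, a known intrinsic matrix $K$, image points with $w = 1$, 3D points with $W = 1$, and at least 6 non-coplanar points. `dlt_pose` is a hypothetical name, and the LMA refinement is not shown. The text does not say exactly how $R$ and $\mathbf{t}$ were pulled out of $K^{-1}H$. Here the sign is fixed so the scale is positive, the 3x3 block is projected onto the nearest rotation with an SVD, and $\mathbf{t}$ is divided by the recovered scale. This is one common choice, not necessarily the author's.

```python
import numpy as np


def dlt_pose(K, pts2d, pts3d):
    """Estimate R, t with p ~ K [R | t] P from >= 6 non-coplanar 2D-3D correspondences."""
    pts2d = np.asarray(pts2d, dtype=float)
    pts3d = np.asarray(pts3d, dtype=float)
    zero = np.zeros(4)
    rows = []
    for (x, y), X in zip(pts2d, pts3d):
        P = np.append(X, 1.0)   # homogeneous 3D point, W = 1
        w = 1.0                 # homogeneous image point, w = 1
        # The two independent rows of Equation 3-6 for this correspondence.
        rows.append(np.concatenate([zero, -w * P, y * P]))
        rows.append(np.concatenate([w * P, zero, -x * P]))
    A = np.vstack(rows)
    # Minimize ||A h|| subject to ||h|| = 1: right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 4)            # rows are h1^T, h2^T, h3^T
    M = np.linalg.inv(K) @ H            # equals lambda * [R | t] for an unknown scale lambda
    if np.linalg.det(M[:, :3]) < 0:     # pick the sign that makes lambda positive
        M = -M
    U, S, Vt3 = np.linalg.svd(M[:, :3])
    R = U @ Vt3                         # nearest rotation matrix to M[:, :3] / lambda
    t = M[:, 3] / S.mean()              # divide out the scale
    return R, t
```

With noisy measurements this returns only the rough starting pose. A nonlinear least-squares refinement of the reprojection error, such as the LMA described above, would then polish it.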

Figure 3-13: Reprojection Outlier

If the error is above a threshold, the point is discarded as an outlier and removed from the inlier set. The purple line in Figure 3-13 represents a large error that exceeds the set threshold, while the remaining points fall within acceptable limits. With the remaining inliers, the algorithm is run again to refine the pose and measure the total reprojection error. The pose and error are saved, and the RANSAC iterations continue until the most accurate pose estimate is found. Figure 3-14 and Figure 3-15 show the final result of the pose estimation algorithm (location and orientation) overlaid on the phone image and on the floor plan. If the pose estimate is extremely different from the previous known position, it is discarded as an error and the next phone image is used.

Figure 3-14: Final Pose Estimate
Figure 3-15: Floor Plan with Pose
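The reprojection test used in the pose RANSAC stage, together with the final check against the previous position, can be sketched with NumPy. The sketch projects the 3D map points with Equation 3-3, marks points whose pixel error exceeds a threshold as outliers, and sums the inlier error. It also recovers the camera position in map coordinates as $C = -R^{\mathsf T}\mathbf{t}$ so that an implausible jump can be rejected. All names and thresholds (`threshold_px`, `max_jump`) are illustrative assumptions, not values from the thesis.

```python
import numpy as np


def reproject(K, R, t, pts3d):
    """Project 3D map points with p ~ K [R | t] P (Equation 3-3); returns pixel coordinates."""
    Xc = np.asarray(pts3d, dtype=float) @ R.T + t   # map -> camera coordinates
    proj = Xc @ K.T                                 # camera -> homogeneous pixel coordinates
    return proj[:, :2] / proj[:, 2:3]


def classify_by_reprojection(K, R, t, pts2d, pts3d, threshold_px):
    """Inlier mask and total inlier reprojection error (in pixels) for one pose hypothesis."""
    err = np.linalg.norm(reproject(K, R, t, pts3d) - np.asarray(pts2d, dtype=float), axis=1)
    inliers = err < threshold_px
    return inliers, float(err[inliers].sum())


def camera_center(R, t):
    """Camera position in map coordinates: C = -R^T t."""
    return -R.T @ t


def plausible_move(prev_center, new_center, max_jump):
    """Reject a pose whose position jumps further than max_jump from the previous one."""
    step = np.asarray(new_center, dtype=float) - np.asarray(prev_center, dtype=float)
    return float(np.linalg.norm(step)) <= max_jump
```

In a RANSAC loop, `classify_by_reprojection` would be called once per 6-point pose hypothesis. The hypothesis with the lowest total inlier error, and enough inliers, would then be kept.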

4 TESTING AND RESULTS

4.1 Test Objectives

The main objective of the test was to determine how accurately and robustly the vision algorithms find the pose of the camera in a university building. The frequency of map images was also tested to determine the optimal spacing between images. Many text-to-speech, voice recognition, and route planning algorithms have already been tested and used in commercial applications, so these methods and algorithms are not tested or compared here. This thesis evaluates the feasibility of using computer vision on smart phone images to determine a user's location and orientation indoors from an image-based map. The testing therefore focuses on the vision algorithms for indoor pose estimation.

4.2 Location Information and Image Map

The test was set up in the hallways of the 4th floor of the Clyde building at Brigham Young University. Pictures of the hallways were taken in 10-foot increments with a point-and-shoot camera. This close spacing allowed several different map image increments to be tested. The images were then run through the SURF algorithm to find the 20 best features in each image. These features were in turn measured by hand in the hallways to find their three-dimensional coordinates with respect to a predetermined world coordinate origin. In a deployed system, these measurements could be made while capturing the images, for instance by a robot running SLAM. Here, however, the features were measured by hand to establish ground truth. For ease of measurement and analysis, the north-west corner of the hallway was chosen as the origin, as shown in Figure 4-1. The feature keypoints, along with their 3D coordinates and feature descriptors, were stored in YML format as the image map dataset.

Figure 4-1: Clyde Building 4th Floor with Local Coordinate System

4.3 Testing Hardware

The smart phone used for the experiment is an LG Optimus S, a lower-tier smart phone chosen to show that higher-end phones are not required for this system. The phone has only a single-core 600 MHz processor running Android 2.2 (Froyo). The server is an AMD quad-core computer running at 3.4 GHz with 8 GB of RAM and a 1 TB 7200 rpm hard drive, running Windows 7. The program was created with Microsoft Visual Studio

4.4 Test Procedure

Calibration pictures were taken with the smart phone camera to find the camera's intrinsic parameters and distortion coefficients, and these parameters were saved for use by the pose estimation algorithm. Pictures were then taken with the smart phone camera while walking down the mapped hallways and sent to the server for processing. The image size of both the map and camera pictures was set at 640x

4.5 Test Results

4.5.1 Image Frequency

To begin testing, we checked usability at different distances from a map image. Map images that were too far away proved difficult for feature matching, even with the scale space set to several octaves. Most features appear on the walls. These are quite visible up close, but at great distances they become undetectable because of the skew created by the narrow angle between the camera and the feature point. Even at distances where a match could still be made despite the long distance to the map image, the pose estimate suffered from increased error due to pixelation. Intervals of 10 feet proved to be a good compromise between accuracy and speed.

4.5.2 Algorithm Robustness

In the Clyde building, lighting caused problems with feature matching because the floor is very reflective. The fundamental matrix with RANSAC proved robust enough to eliminate most of these erroneous features. For images that still had outliers after this step, the remaining outliers were eliminated by the second RANSAC stage during the pose calculation.

A few phone pictures could not be mapped to an image. However, another picture taken in the near vicinity was able to match the same location. Such failures are infrequent, around 1 in 15 or 20 images, and do not negatively affect pose estimation. When an image cannot find a match in the near vicinity of the previous image, it is discarded and the next phone image is used. During testing, this next image was always able to find a match.

4.5.3 Pose Estimation Accuracy

Overall, the Seeing Eye Phone system was able to accurately localize the phone within the mapped hallways. Several pose estimation results are shown in Figure 4-2. Each shows the picture taken by the smart phone with the map position overlaid for reference. Positions were tested with the user walking in the middle of the hall as well as along each edge. In pictures taken from the edge of the hall, the angle rendered most of the features on that side of the hall useless. With the image spacing used, the system was still able to handle the different angles without much difficulty. Pictures taken at extreme angles relative to the map image did have trouble finding matches, for example when facing a wall instead of looking down the open hallway. To find a match, the user would need to turn far enough to look down the hallway. Pictures taken both closer to and farther from the map images were also used to test the scale invariance of the system. As expected, some smart phone images mapped to multiple map images, since many of the hallway images overlap. Usually the closest image has the most and strongest matches. If it does not, matching to the next image further down the hallway is still valid for pose estimation.

Figure 4-2: Map Progression Images

As shown in Table 4-1, the accuracy is very good even with only 20 points per map image. The estimated position was almost always within a few feet of the measured position. Accuracy to a few feet is sufficient for indoor localization, because two or three feet of potential inaccuracy still allows a person to reach a doorway or other desired location. This vision-based method offers finer positional resolution than other methods such as periodic RFID tags or signs. It is also much more consistent than Wi-Fi or other RF-based location methods, whose readings can vary widely with temperature, people moving around, and obstructions that affect signal strength.

Table 4-1: Estimated vs. Measured Positions
Columns: Real (x, y) (m) | Estimated (x, y) (m) | Difference (x, y) (m)

The algorithm is fast enough for real-time use when only a small number of images must be searched. Because the search covers only the area around the last known position, just 5 to 15 images may need to be searched. The search for the initial starting location may take longer, but not unreasonably so. It can be likened to the time a GPS receiver takes to acquire satellites before it can find its position. The system can start finding the current location while the user gives the command for the desired destination. GPS can also be used on first entering a building to identify the entrance, so that the image search starts there and the initial location is found faster.
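The restricted search described above can be sketched by ranking map images by their distance from the last known position and keeping only the closest few. The sketch assumes each map image has a known capture position in map coordinates, such as its place on the 10-foot grid. `candidate_map_images`, `max_count`, and `radius` are hypothetical names. The default of 15 simply echoes the upper end of the 5 to 15 images mentioned in the text.

```python
import math
from typing import Dict, List, Optional, Tuple


def candidate_map_images(map_positions: Dict[str, Tuple[float, float]],
                         last_xy: Tuple[float, float],
                         max_count: int = 15,
                         radius: Optional[float] = None) -> List[str]:
    """Map image names ordered by distance from the last known position, nearest first."""
    ranked = sorted(map_positions, key=lambda name: math.dist(map_positions[name], last_xy))
    if radius is not None:
        # Optionally drop images captured farther away than the search radius.
        ranked = [name for name in ranked if math.dist(map_positions[name], last_xy) <= radius]
    return ranked[:max_count]
```

For the initial location, with no last known position, the same function could be seeded with the entrance position suggested by GPS, as the text proposes.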

4.5.4 Insufficient Features

This method of using existing features assumes that enough features are available to recognize locations. Some areas have null spots without enough features to make a map image, such as the end of some hallways or positions facing a corner, as can be seen in Figure 4-3. Some of these are only intermittent positions that do not require a good pose estimate for wayfinding. When turning, for instance, the user is understood to turn 90 degrees into a new hallway or to enter a door. Matching an image halfway through the turn is unnecessary, because matching can continue with the new hallway's features once the turn is complete. Future research may be able to use the phone's other sensors to handle areas that do not contain enough features to find the pose.

Figure 4-3: Insufficient Unique Features
