Human-Readable Fiducial Marker Classification using Convolutional Neural Networks


Yanfeng Liu, Eric T. Psota, and Lance C. Pérez
Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, United States

Abstract: Many applications require both the location and identity of objects in images and video. Most existing solutions, such as QR codes, AprilTags, and ARTags, use complex machine-readable fiducial markers with heuristically derived methods for detection and classification. However, in applications where humans are integral to the system and must be able to locate objects in the environment, fiducial markers must be human readable. An obvious and convenient choice for human-readable fiducial markers is the set of alphanumeric characters (Arabic numerals and English letters). Here, a method for classifying characters using a convolutional neural network (CNN) is presented. The network is trained on a large set of computer-generated character images, each subjected to a carefully chosen set of augmentations that simulate the conditions inherent in video capture, including rotation, scaling, shearing, and blur. Results demonstrate that training on large numbers of synthetic images produces a system that works on real images captured by a video camera. The results also reveal that certain characters are generally more reliable and easier to recognize than others; these findings can therefore be used to design a human-readable fiducial marker system that avoids easily confused characters.

Keywords: computer vision, convolutional neural network, machine learning, fiducial marker.

I. INTRODUCTION

Fiducial markers play an important role in systems that need to track the location and identity of multiple objects in a scene. Several methods have been presented over the past decade to solve the fiducial marker problem, such as QR codes [14], AprilTags [11], ARTags [2], and circular dot patterns [10]. These markers usually feature black-and-white patterns that encode binary information, as shown in Figure 1. Regarding tag design, researchers generally focus on properties such as minimum tag size, minimum distance from tag to camera, maximum viewing angle, and optimal shapes for detection [3, 7]. Specially designed fiducial markers have the advantage of low false positive rates, low false negative rates, and low inter-marker confusion rates, but they are not human-readable and require highly specialized detector/decoder algorithms [13]. In addition, they often occupy a relatively large portion of the overall image resolution.

Figure 1. Examples of machine-readable fiducial markers.

Figure 2. Synthesized training image examples. The distortions include rotation, shearing, translation, scaling, contrast adjustment, motion blur, and Gaussian noise. From top to bottom: 6, A, C, Q.

While these solutions work well under a variety of circumstances, they are not suitable for applications that require humans to identify markers, or for situations where a relatively low-resolution crop is assigned to each marker. Alternatively, alphanumeric characters are ubiquitous as human-readable fiducial markers; for example, they are already used to identify athletes, livestock, and automobiles. However, existing analyses of machine-readable markers are design-dependent, and their conclusions cannot be applied to other markers.
There are no guidelines for choosing an optimal set of human-readable fiducial markers that are robust to variations in lighting, orientation, and size. One classification method that is particularly well suited to handling these variations is the convolutional neural network (CNN). CNNs have achieved significant breakthroughs in recent years [4, 5, 8]. Compared to traditional classification methods, CNNs do not rely on heuristically designed algorithms for the target objects. With enough training data and sufficient model complexity, CNNs can learn to extract features at many levels and have even been demonstrated to exceed human ability to recognize objects in images [5].

In this paper, a CNN is designed and trained to recognize human-readable alphanumeric characters as fiducial markers. The training uses a large set of synthetically generated images of distorted characters. The results demonstrate that, although training was performed on synthetically generated data, the CNN can recognize a highly challenging set of characters cropped from real images with more than 50% accuracy. The results also reveal inherent confusion between characters; thus, for applications where only a subset of the characters is needed for identification, a set of easily differentiable characters can be chosen to maximize classification accuracy. An analysis and categorization of the main causes of confusion is provided, demonstrating that certain characters are intrinsically difficult to differentiate regardless of the classification algorithm used.

II. RELATED WORK

Previous research has explored methods for character recognition using convolutional neural networks. In [6], the authors proposed to treat each English word as an individual pattern and trained a CNN on a data set of 90k words. Each word is synthetically generated, with variations in viewing angle, orientation, blur, font, and noise. The locations of the words are hypothesized using a region proposal method inspired by [4]. Liu and Huang trained a CNN to recognize Chinese license plate characters in realistic scenes [9]. Chinese characters are morphologically different from alphanumeric characters, so the authors trained a separate softmax layer while sharing all the hidden layers with the softmax layer for alphanumeric characters. The authors also created their own database of Chinese characters due to the shortage of such data; each image was captured in real street scenes and then hand-labeled for training and testing. Radzi and Khalil-Hani applied a CNN to recognize Malaysian license plate characters and improved the speed and accuracy of several stages of the technique [12]. Their training images of characters are extracted from license plates viewed from various angles, then binarized, resized, centered, and labeled.

This paper differs from previous work on character/word recognition in two important ways. First, to the best of our knowledge, no prior work has studied the reliability of each individual alphanumeric fiducial marker relative to the others. We thoroughly compare all markers without leaving any characters out, whereas the study presented in [9] intentionally left out I and O, and [12] left out I, O, and Z. Second, while [6] achieves impressive accuracy in text detection and recognition, their network is trained by treating each word as a whole. We propose to examine each character separately and measure its features and reliability under full scale variation and distortion. Moreover, the data augmentation methods used by [6], [9], and [12] are limited in terms of rotation and translation, making them poorly suited to generalized fiducial marker detection and tracking: none of them included upside-down characters, and [9] and [12] centered the characters before training and testing.

III. DATA AUGMENTATION

Properly training a deep convolutional neural network requires a tremendous amount of highly variable data to prevent over-fitting. Collecting such data manually can be tedious and impractical; therefore, data augmentation is often used to procedurally expand the training data.
For each numeral (0 to 9) and English letter (A to Z) considered, we generate 5,000 images at 400×400 resolution and apply a total of seven types of randomized distortion to each image: rotation, shearing, translation, scaling, contrast adjustment, motion blur, and Gaussian noise. The first four distortions are combined into a single affine transformation given by

T = \begin{bmatrix} s_x & 0 & t_x \\ 0 & s_y & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & \alpha_y & 0 \\ \alpha_x & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix},

where s_x and s_y adjust scale in the horizontal and vertical directions, t_x and t_y shift the center point, \alpha_x and \alpha_y allow for shearing, and \theta rotates the image. The contrast adjustment modifies the lowest and highest intensity values, effectively linearly mapping the intensity values of the original image to the new range. Motion blur is simulated by convolving the image with an oriented, uniform line filter; the kernel is generated by assigning a random movement distance and movement angle. Finally, Gaussian noise is sampled from an additive, independent noise source that follows the distribution

f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}.

To avoid the effects of aliasing, the augmentations are applied to the larger 400×400 image. After the distortions, the images are resized to 32×32 and fed into the convolutional neural network. Figure 2 illustrates some examples of the augmented images.

Table 1. Data augmentation parameter ranges.

Parameter                     Range
Rotation angle θ              0 ~ 360
Horizontal shearing α_x       0 ~ 0.5
Vertical shearing α_y         0 ~ 0.5
Horizontal translation t_x    -80 ~ 80
Vertical translation t_y      -80 ~ 80
Horizontal scaling s_x        0.3 ~ 1
Vertical scaling s_y          0.3 ~ 1
Contrast lower bound          0 ~ 0.45
Contrast upper bound          0.55 ~ 1
Motion blur distance          3 ~ 7
Motion blur angle             0 ~ 360
Gaussian noise mean           0
Gaussian noise variance       0 ~ 0.05
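As a concrete illustration of this pipeline, the sketch below reimplements the seven distortions with the parameter ranges from Table 1. It is a minimal reconstruction assuming NumPy and OpenCV and grayscale character images on a white background; it is not the authors' original implementation, and the composition order of the affine factors is one reasonable choice.

```python
import numpy as np
import cv2

rng = np.random.default_rng()

def random_affine(img):
    """Compose scaling, translation, shearing, and rotation (Table 1 ranges)."""
    h, w = img.shape[:2]
    sx, sy = rng.uniform(0.3, 1.0, 2)      # scale
    tx, ty = rng.uniform(-80, 80, 2)       # translation in pixels
    ax, ay = rng.uniform(0.0, 0.5, 2)      # shear
    theta = rng.uniform(0.0, 2 * np.pi)    # rotation
    scale_t = np.array([[sx, 0, tx], [0, sy, ty], [0, 0, 1]])
    shear = np.array([[1, ay, 0], [ax, 1, 0], [0, 0, 1]])
    rot = np.array([[np.cos(theta), -np.sin(theta), 0],
                    [np.sin(theta),  np.cos(theta), 0],
                    [0, 0, 1]])
    # Shift the origin to the image center so rotation/scaling act about it.
    c = np.array([[1, 0, w / 2], [0, 1, h / 2], [0, 0, 1]])
    c_inv = np.array([[1, 0, -w / 2], [0, 1, -h / 2], [0, 0, 1]])
    T = c @ scale_t @ shear @ rot @ c_inv
    return cv2.warpAffine(img, T[:2], (w, h), borderValue=255)

def motion_blur(img):
    """Convolve with an oriented, uniform line filter."""
    dist = int(rng.integers(3, 8))         # blur distance 3 ~ 7
    angle = rng.uniform(0, 360)            # blur angle 0 ~ 360
    k = np.zeros((dist, dist), np.float32)
    k[dist // 2, :] = 1.0 / dist           # horizontal line kernel...
    r = cv2.getRotationMatrix2D((dist / 2 - 0.5, dist / 2 - 0.5), angle, 1.0)
    k = cv2.warpAffine(k, r, (dist, dist)) # ...rotated to a random angle
    k /= max(k.sum(), 1e-6)
    return cv2.filter2D(img, -1, k)

def augment(img):
    """Apply all seven distortions to a 400x400 uint8 image, return 32x32."""
    img = random_affine(img)
    lo, hi = rng.uniform(0, 0.45), rng.uniform(0.55, 1.0)
    img = lo + (hi - lo) * img.astype(np.float32) / 255.0   # contrast remap
    img = motion_blur((img * 255).astype(np.uint8)).astype(np.float32) / 255
    img += rng.normal(0.0, np.sqrt(rng.uniform(0, 0.05)), img.shape)  # noise
    return cv2.resize(np.clip(img, 0, 1), (32, 32),
                      interpolation=cv2.INTER_AREA)

# Usage: render a character on a white canvas, then distort it.
canvas = np.full((400, 400), 255, np.uint8)
cv2.putText(canvas, "A", (80, 320), cv2.FONT_HERSHEY_SIMPLEX, 12, 0, 24)
sample = augment(canvas)
```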

Figure 3. The architecture of the convolutional neural network trained to detect alphanumeric markers.

Figure 4. Manually cropped images from real testing videos. Top to bottom: 6, A, C, G.

IV. CONVOLUTIONAL NEURAL NETWORK ARCHITECTURE

The convolutional neural network trained to recognize digits and letters has 15 layers: 1 input layer, 3 groups of convolution-rectifier-max-pooling layers, 1 fully connected layer, 1 rectifier layer, another fully connected layer, 1 softmax layer, and 1 classification layer. This architecture was empirically found to provide a suitable balance between accuracy and overfitting. Figure 3 illustrates the architecture.

V. TRAINING PARAMETERS

The convolutional neural network is trained using stochastic gradient descent with momentum. The initial learning rate was set to 0.01 and dropped by a factor of 10 every 20 epochs, with a maximum of 100 epochs. The data set was randomly divided into a training set and a testing set such that, on average, the training set contains roughly an equal number of training images for each category. To train the network in a reasonable amount of time, we used parallel computing on an NVIDIA Titan Black GPU.
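The paper specifies the layer ordering and the training schedule but not the filter counts, kernel sizes, or momentum coefficient. The following PyTorch sketch therefore fills those in with illustrative assumptions (3×3 kernels, 32/64/128 channels, momentum 0.9, and a placeholder data loader); only the overall structure and the learning-rate schedule come from Sections IV and V.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

NUM_CLASSES = 36  # 10 digits + 26 letters

# Layer ordering follows Section IV; channel widths and kernel sizes are
# illustrative assumptions, as the paper does not specify them.
model = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 32 -> 16
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16 -> 8
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2), # 8 -> 4
    nn.Flatten(),
    nn.Linear(128 * 4 * 4, 256), nn.ReLU(),
    nn.Linear(256, NUM_CLASSES),  # softmax is folded into the loss below
)

# Placeholder data standing in for the synthetic 32x32 grayscale crops.
train_loader = DataLoader(
    TensorDataset(torch.randn(256, 1, 32, 32),
                  torch.randint(0, NUM_CLASSES, (256,))),
    batch_size=64)

# Section V: SGD with momentum (coefficient assumed), initial lr 0.01,
# dropped by a factor of 10 every 20 epochs, maximum of 100 epochs.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
criterion = nn.CrossEntropyLoss()  # log-softmax + negative log-likelihood

for epoch in range(100):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```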

VI. RESULTS

The training set and testing set are randomly divided at a 1:1 ratio from a total of 180,000 images (36 characters × 5,000 images each). The ranges of the parameters used to generate these images are provided in Table 1. Because the training and testing sets used during the training stage are both computer-generated, they are very similar to each other. Thus, an additional testing set was built to analyze classification performance in real-life situations. A series of videos of alphanumeric characters was captured under natural conditions, and 150 character images were manually cropped from each character's video for testing, for a total of 5,400 image crops. The crops were converted to black and white and down-sampled to 32×32 to fit the neural network input. Characters that were partially occluded were not selected, since occlusion was not considered by the original augmentations. Figure 4 shows examples of cropped images from real videos.

To find the data augmentation settings that give the best accuracy during both the training stage and real-video testing, two of the distortion options, Gaussian noise and contrast adjustment, were switched on and off. The affine transformation is kept in all training settings because it simulates realistic variation in image scale and angle. The setting that gives the highest accuracy (92.39%) on the computer-generated testing set uses no Gaussian noise and no contrast adjustment. The video testing set contains less Gaussian noise than was simulated in the training set, and it exhibits effects such as background reflection and bright glare that the training set does not consider. Despite these unaccounted-for image distortions, accuracy on the highly challenging manually cropped images is 59.18%.

Table 2. Success rates and top-three confusion rates for each marker, in alphanumeric order. The most accurate marker (X) and the least accurate marker (9) are highlighted.

Figure 5. Common confusions during the testing stage, with confusion type labeled.

Figure 6. Success rate by marker. The horizontal threshold line is drawn at 95%.

Figure 5 illustrates the major causes of confusion for the testing set by providing examples of each. The first type of confusion comes from scaling: some alphanumeric markers are scale sensitive. For example, 0, Q, and O all have an elliptical shape, and the main difference is simply that 0 is smaller and skinnier than Q and O; scaling together with shearing can make them look very similar.

The second type of confusion comes from rotation. For example, 9 and 6 are identical when rotated by 180 degrees. Since the random rotation in the data augmentation stage ranges from 0 to 360 degrees, the success rates for 9 and 6 are expected to be near 50%, which is supported by the results.

The third type of confusion comes from shearing. For example, 7 and L are not identical under rotation alone, but when stretched unevenly in the horizontal and vertical directions (which happens when viewed from an off angle), they are difficult to differentiate from one another.

The fourth type of confusion comes from image overexposure during capture. The exposure of an image is controlled by the aperture, shutter speed, and gain of the camera. If these settings are not chosen carefully, the image sensor saturates and the image loses a large amount of detail due to high average pixel intensity. This causes a marker to lose its exact shape and confuses the neural network. While it might be possible to explicitly train the network to handle overexposure, this variable was not considered in this work in order to limit the number of data augmentation types. In general, the problem of overexposure can be addressed by purposely underexposing the image during capture and maximizing local contrast in post-processing.

The fifth type of confusion comes from contrast. When the contrast ratio is low, there is little difference among the pixels of an image, causing them to have all low or all high values and resulting in a situation where all the neurons in the neural network are activated to similar extents. This causes the network to give unpredictable results that are not tied to a particular marker.

The sixth type of confusion comes from motion blur. When an object moves quickly relative to the camera during capture, its image is smeared along its path, blurring the image. While blur was included in the training augmentations, a uniform line filter was used to simplify the process; in practice, if the object does not move at constant velocity relative to the camera, non-uniform blur can occur.

The seventh type of confusion comes from noise in the background, mainly appearing as reflections of other objects in the scene. This type of noise differs from Gaussian noise: Gaussian noise is mainly created by amplification of the sensor signal, whereas the noise here takes the form of reflections and glare.

In many cases, the confusions of the convolutional neural network are due to transformations that make it nearly impossible to differentiate between alphanumeric characters. For example, it is impossible to differentiate a rotated 6 from a rotated 9 in Arial font. In contrast, even though some pairs are abstractly similar, the convolutional neural network is able to recognize subtle differences between them, such as C and U; Z, N, and 2; W and M; and 1 and I. However, this result is font-dependent: if the markers are presented in a different font that increases the similarity between them, the results would likely vary.

Based on the success rates shown in Figure 6, the top three most common confusion cases for each marker shown in Table 2, and the confusion type analysis provided above, we suggest the following rules for selecting alphanumeric fiducial markers (a short sketch of this selection logic follows the list):

1. Avoid using pairs of markers that are morphologically similar after a certain affine transformation or overexposure effect (6-9, L-7, S-5, 0-O-Q, and 8-B). If only one marker from such a pair is used, there is significantly less confusion.

2. Where possible, use markers that are not easily confused with others (with the success rate threshold set at 95%, these markers are X, I, K, C, Y, U, J, H, R, T, P, 1, 2, F, Z, A, N, 4, E, and W).
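To make the two rules operational, the small sketch below encodes them directly. The reliable set and the confusable groups come from the results above; the selection procedure itself is an illustrative sketch, not part of the paper.

```python
# Rule-based selection of a human-readable marker set.
RELIABLE = list("XIKCYUJHRTP12FZAN4EW")  # success rate >= 95% (Figure 6)
CONFUSABLE_GROUPS = [set("69"), set("L7"), set("S5"), set("0OQ"), set("8B")]

def select_markers(candidates):
    """Keep high-accuracy markers, at most one per confusable group."""
    chosen = []
    for m in candidates:
        # Rule 1: reject m if a member of its confusable group is chosen.
        clash = any(m in g and any(c in g for c in chosen)
                    for g in CONFUSABLE_GROUPS)
        if not clash:
            chosen.append(m)
    return chosen

print(select_markers(RELIABLE))               # no clashes: full reliable set
print(select_markers(list("69") + RELIABLE))  # keeps '6', drops '9'
```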
VII. CONCLUSION AND FUTURE RESEARCH AGENDA

In this study, a convolutional neural network was trained to classify human-readable alphanumeric fiducial markers, and the classification accuracy for each marker was evaluated. The results demonstrate that some characters are more reliable and easier to classify than others, and they provide guidance for selecting markers in future applications. We also demonstrated and categorized the major types of confusion and provided rationale for the observed error rates. In future research, it would be worthwhile to explore the effect of different fonts and distortion simulations on accuracy.

In the context of human-machine interaction, convolutional neural networks provide substantial improvements to machine detection and recognition success rates even in challenging environments, as demonstrated in this paper. As the Industry 4.0 revolution progresses, it can be expected that workers will rarely be required to manually label and detect objects themselves; instead, they will make high-level decisions to ensure that the application has the optimal settings for a specific scenario [1].

REFERENCES

[1] F. Ansari and U. Seidenberg, "A portfolio for optimal collaboration of human and cyber physical production systems in problem-solving," CELDA, p. 311.
[2] M. Fiala, "ARTag, a fiducial marker system using digital techniques," IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Vol. 2, 2005.
[3] M. Fiala, "Designing highly reliable fiducial markers," IEEE Trans. Pattern Anal. Mach. Intell., 32(7), 2010.
[4] R. Girshick, "Fast R-CNN," Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015.
[5] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[6] M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman, "Reading text in the wild with convolutional neural networks," International Journal of Computer Vision, 2016.
[7] J. Köhler, A. Pagani, and D. Stricker, "Detection and identification techniques for markers used in computer vision," OASIcs-OpenAccess Series in Informatics, Vol. 19, Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, 2012.
[9] Y. Liu and H. Huang, "Car plate character recognition using a convolutional neural network with shared hidden layers," Chinese Automation Congress (CAC), IEEE, 2015.
[10] L. Naimark and E. Foxlin, "Circular data matrix fiducial system and robust image processing for a wearable vision-inertial self-tracker," in ISMAR '02: Proceedings of the 1st International Symposium on Mixed and Augmented Reality, IEEE Computer Society, 2002.
[11] E. Olson, "AprilTag: a robust and flexible visual fiducial system," IEEE International Conference on Robotics and Automation (ICRA), 2011.
[12] S. Radzi and M. Khalil-Hani, "Character recognition of license plate numbers using a convolutional neural network," Visual Informatics: Sustaining Research and Innovations, 2011.
[13] A. C. Rice, R. Harle, and A. R. Beresford, "Analysing fundamental properties of marker-based vision system designs," Pervasive and Mobile Computing, 2(4), 2006.
[14] Denso Wave, "Quick response (QR) code specification."
