Vision based indoor object detection for a drone


DEGREE PROJECT IN THE FIELD OF TECHNOLOGY ENGINEERING PHYSICS AND THE MAIN FIELD OF STUDY COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2017

Vision based indoor object detection for a drone

LINNEA GRIP

KTH SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION

Vision based indoor object detection for a drone

LINNEA GRIP

Master in Computer Science
Date: May 31, 2017
Supervisor: Patric Jensfelt
Examiner: Hedvig Kjellström
Swedish title: Bildbaserad detektion av inomhusobjekt för drönare
School of Computer Science and Communication

Abstract

Drones are a very active area of research and object detection is a crucial part in achieving full autonomy of any robot. We investigated how state-of-the-art object detection algorithms perform on image data from a drone. For the evaluation we collected a number of datasets in an indoor office environment with different cameras and camera placements. We surveyed the object detection literature and selected the algorithm R-FCN (Region-based Fully Convolutional Network) for the evaluation. The performances on the different datasets were then compared, showing that using footage from a drone may be advantageous in scenarios where the goal is to detect as many objects as possible. Further, it was shown that the network, even if trained on normal-angle images, can be used for detecting objects in fish eye images, and that usage of a fish eye camera can increase the total number of detected objects in a scene.

Sammanfattning

Drones are a very active research area and object detection is an important part of achieving full autonomy for robots. We investigated how today's best object detection algorithms perform on image data from a drone. We conducted a literature study and chose to investigate the algorithm R-FCN (Region-based Fully Convolutional Network). To evaluate the algorithm, several datasets were recorded in an office environment with different cameras and camera placements. The performance on the different datasets was then compared, and it was shown that using images from a drone can be advantageous when the goal is to find as many objects as possible. Further, it was shown that the network, even if trained on images from a normal camera, can be used to find objects in wide-angle images, and that using a wide-angle camera can increase the total number of detected objects in a scene.

Contents

1 Introduction
   1.1 Research Question and Hypotheses
   1.2 Limitations
   1.3 Report Outline
2 Background
   2.1 Convolutional Neural Networks
   2.2 Other object detection methods
   2.3 Common Datasets
   2.4 Metrics
3 Related work
   3.1 Drones
   3.2 Object detection
4 The Object Detection Algorithm
5 Method
   5.1 Experiment Design
   5.2 Evaluation
6 Experiments
   6.1 Fish Eye Camera (Experimental Setup, Results, Analysis)
   6.2 Distance to Objects (Experimental Setup, Results, Analysis)
   6.3 Camera Angle (Experimental Setup, Results, Analysis)
7 Real Drone
8 Summary and Discussion
   8.1 Conclusions
   8.2 Error Sources
   8.3 Connection to Other Research
   8.4 Future Work
   8.5 Summary
Bibliography
A Social Aspects
   A.1 Sustainability
   A.2 Ethics
   A.3 Society

Chapter 1 Introduction

Object detection is important for reaching higher level autonomy for robots. It is a very active area of research in robotics, applied computer vision and machine learning. Unmanned Aerial Vehicles (UAVs), or drones, are increasingly being used as robotic platforms. It is of interest to see how methods that have been developed in computer vision and machine learning, and used on other robot embodiments, can be applied to drones.

The objective of this degree project is to determine how an existing object detection method can be used on image data from a drone. It will examine whether the flexibility of a drone, compared to that of a traditional ground based robot, can be used to improve the performance of object detection, assuming that a drone is indeed more flexible. For example, as a drone can move closer to objects than a wheeled robot can, it may be possible to detect more small objects in data from a drone. Here, objects that can easily be held in one hand, such as cups, cell phones and bottles, are counted as small.

When a drone navigates a building in search of objects, it is of interest for the drone to be able to view as much of its surroundings as possible. To achieve a large field of view the camera could be mounted on a tilting mechanism on the drone. This would require putting more weight on the drone; to avoid this, a wide angle (fish eye) camera is used instead. However, images taken by a fish eye camera are distorted and quite different from images taken by a normal camera. Therefore, it cannot be assumed that object detection algorithms normally used on "normal" images perform well on fish eye images. Part of the study is therefore to investigate how algorithms widely used on normal images perform on fish eye images.

Previous works ([1], [2]) stress that the images captured by a drone are often different from those available for training, which are often taken by a hand held camera. Difficulties in detecting objects in data from a drone may arise due to the positioning of the camera compared to in images taken by a human, depending on what type of images the network is trained on. Therefore, different ways of positioning the drone and the camera with respect to objects will be evaluated.

1.1 Research Question and Hypotheses

Can the performance of an object detection algorithm in an indoor scene be maintained and/or improved using the flexibility of a drone as compared to a ground based robot and, if so, how?

After a literature study, the algorithm currently best suited for indoor detection of objects is chosen. The chosen algorithm is then evaluated on different datasets in order to determine whether there are any benefits and/or drawbacks in using data acquired by a drone instead of by a wheeled robot when trying to detect objects in an indoor scene, what type of camera to use and how to take advantage of the flexibility of the drone. Several hypotheses will be addressed, including the following.

1. The chosen algorithm can be used on image data acquired by a drone.
2. The chosen algorithm, trained on images from a normal camera, can be used to some extent on images from a fish eye camera.
3. More objects can be detected in data from a fish eye camera than from a normal camera, because of the larger field of view.
4. More objects can be detected from a closer viewpoint.

1.2 Limitations

It will be assumed that a drone equipped with an RGB camera sends a continuous stream of images to a computer, which then performs computations off board the drone. It is not part of the project to perform "light weight" object detection on board the drone. The drone will navigate (not part of the project) an indoor, office-like environment and encounter and try to detect objects. It is expected to be able to detect objects such as chairs, screens and people. However, detection of smaller objects such as mugs and cell phones will also be attempted.

1.3 Report Outline

In Chapter 2 relevant theory of object detection is outlined. Chapter 3 touches on important works that have been made in the areas of object detection as well as drones. Chapter 4 briefly describes the algorithm used for detecting objects throughout the project. The general method used for performing experiments and evaluating the performance of the object detection algorithm on different datasets is described in Chapter 5. The three sections of Chapter 6 each present an experiment. They contain first a brief motivation of why the experiment was important, then a description of the experimental setup, the results obtained and lastly a short analysis of the results. These three experiments were carried out without using a real drone. Chapter 7 then shows data acquired by a camera mounted on a real, flying drone, and the detections as predicted by the algorithm.

The results are discussed in Chapter 8, which also proposes future work. In particular, Section 8.5 contains a summary of the report and Appendix A presents a brief discussion about social aspects of using drones and object detection.

Chapter 2 Background

In this chapter, important theory and concepts that may not be common knowledge are explained. Object detection entails detecting instances of predefined object classes in images. The object instances should also be localized using a so-called bounding box, a box containing the object in the image. There are many ways of performing object detection, each method with different strengths and weaknesses.

2.1 Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are special types of Neural Networks that are especially well designed for use on images. This allows for optimizing the architecture so that the number of parameters of the network can be reduced, compared to a regular Neural Network, and the method made more efficient [3]. A CNN generally consists of two main parts: convolutional layers followed by fully connected layers. CNNs are trained end-to-end, that is, from pixels to final classification without needing to introduce any particular feature extractor, which makes CNNs a good choice for various general object detection tasks. However, training a CNN requires very large sets of images compared to other object detection methods [4]. There are several CNN based methods available and state-of-the-art object detection of today builds on CNNs, as will be described in Chapter 3.

Convolutional layers

The convolutional layers of a CNN perform a sliding window operation and output feature maps. Each convolutional layer of a CNN represents a certain type of feature and each corresponding output feature map is a spatial activation image where the strongest responses to the feature of interest are indicated. For example, a convolutional layer applied on an image of a box could output a feature map showing strong activations on the positions of the corners of the box. In Figure 2.1, the depth of the dotted box represents the number of these feature layers.

In the learning process the weights of the convolutional layers are tuned so that features that reduce the prediction error more are given larger weight than less helpful features. The convolutional layers do not require any specific image size.

Figure 2.1: Schematic figure of a CNN. Figure copied from [3].

Fully-connected layers

Fully-connected layers are often added on top of the convolutional layers to perform the actual classification, since the output feature maps of the convolutional layers are still low level. The fully-connected layers are built in the same way as regular Neural Networks and have full connectivity, since all neurons of each layer are connected to all outputs from the previous layer. Often, the fully-connected part consists of regular Neural Network layers or other classifiers, such as support vector machines (e.g. [5], [6]). These layers take the feature maps as input and classify objects in the image depending on the activated features in the feature maps. The fully-connected layers require fixed size input vectors - a property that used to cause problems when images were of differing sizes (e.g. [5], [7]) but has now been addressed (e.g. [6], [8]), as will be mentioned in Chapter 3.

Region Proposals

Many object detection methods of today rely on some type of region proposal algorithm, which can be integrated with (e.g. [8], [9]) or separated from (e.g. [5], [7], [6]) the CNN itself. The region proposal algorithms suggest regions, or bounding boxes, in the images likely to contain objects, so that the rest of the computations (classification and finer localization) can be made only in these probable regions.

2.2 Other object detection methods

There are several ways, apart from CNNs, to perform object detection. The different methods have different strengths and weaknesses, such as different computation time, accuracy or performance on different types of objects. For example, methods based on HOG [10] or SIFT [11] may be more suitable for on-board classification (for example on a drone) because they require less memory and work on a CPU. However, as of today CNNs are the primary approach to most object detection problems [12], with outstanding performance.
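To make the two-part structure described in Section 2.1 concrete, the following is a minimal PyTorch sketch of a CNN with a convolutional part producing feature maps and a fully-connected part doing the classification. The layer sizes, input resolution and class count are illustrative assumptions, not the network used in this project.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Illustrative CNN: convolutional layers produce feature maps,
    fully-connected layers classify. Sizes are arbitrary examples."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(           # convolutional part
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # The fully-connected part expects a fixed-size input vector,
        # which is why plain CNN classifiers need a fixed image size.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 56 * 56, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):                         # x: (batch, 3, 224, 224)
        return self.classifier(self.features(x))
```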

2.3 Common Datasets

There are several widely used datasets in the object detection community to train and evaluate the performance of different methods and networks on standard images. Some of the largest are the 20 category PASCAL Visual Object Classes (VOC) challenge [13], ImageNet [14], with millions of classified images and at least one million images with corresponding bounding boxes, and Microsoft COCO [15], which presents a dataset of more than 300,000 images and 80 labeled categories, including smaller objects such as fruit, cell phones and computer mice in natural, everyday scenes.

2.4 Metrics

There are some common metrics for measuring the performance of object detection.

Intersection over Union

Introduced in [13], the Intersection over Union (IoU) is a metric commonly used in object detection for evaluating the correctness of a bounding box. IoU is computed by

    IoU = intersection area / union area    (2.1)

where the intersection area is the area of the intersection between the predicted bounding box and the true bounding box (their overlap). Similarly, the union area is the area of the union of the two. A predicted bounding box close to the true bounding box yields an IoU close to 1.

Precision and Recall

The precision of a classifier on a dataset is defined as the number of true positives over the total number of detected positives, that is

    Precision = number of true positives / (number of true positives + number of false positives)    (2.2)

Here, a true positive is a detection of an instance that is actually present in the image. A false positive is a detection of an instance that is not present in the image. That is, the number of true positives is the number of objects correctly classified as a certain class and the number of false positives is the number of objects incorrectly classified as that class. When no false positives are detected the precision is 1, regardless of whether there are any true positives. A precision of 1 means that all detected objects were true, but doesn't say anything about how many actually existing objects were not detected.

In the same way, the recall of a classifier on a dataset is defined as the number of true positives over the true number of instances, that is

    Recall = number of true positives / (number of true positives + number of false negatives)    (2.3)

Here, a false negative is an instance of an object that is present in the image but not detected. When there are no false negatives the recall is 1, regardless of whether there are any true positives or not. A recall of 1 only means that no objects that should have been detected were left out, and doesn't say anything about the quality of the actual predictions made. It is desirable to maximize both precision and recall, so that few instances are wrongly classified while at the same time few instances that should have been classified are left out.

F1 score

The F1 score is a way to summarize precision and recall in one number to evaluate the overall performance of a classifier. The F1 score is defined as

    F1 = 2 · precision · recall / (precision + recall)    (2.4)

Mean Average Precision

Average precision is related to the area under a precision-recall curve for a category, that is, precision plotted against recall. It is desirable for this area to be large in order for precision and recall to be maximized. The mean Average Precision (mAP) is the Average Precision averaged over all class categories in a dataset and is a common way of evaluating how well an object detection method performs.
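As a concrete illustration of the metrics above, the following is a minimal Python sketch that computes IoU for two axis-aligned boxes and precision, recall and F1 from detection counts. The box format and function names are illustrative assumptions, not code from the project.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall_f1(num_tp, num_fp, num_fn):
    """Precision, recall and F1 from counts of true positives,
    false positives and false negatives, as defined in Section 2.4."""
    precision = num_tp / (num_tp + num_fp) if (num_tp + num_fp) > 0 else 1.0
    recall = num_tp / (num_tp + num_fn) if (num_tp + num_fn) > 0 else 1.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1
```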

Chapter 3 Related work

In this chapter, previous work related to the project is briefly described. First, research related to drones and how computer vision has been used on drones is surveyed. Secondly, research in the area of object detection is described, followed by a short description of fine tuning of a CNN.

3.1 Drones

Drones are platforms capable of flying, e.g. small unmanned helicopters. A drone, as other robots, can be programmed to different levels of autonomy, from being radio controlled to being fully autonomous. To achieve full autonomy a well developed navigation and perception system is required. Drones are very flexible compared to ground based robots, as they can fly over and around things and thus view objects from a larger variety of angles. However, there is a limitation on how much weight one can put on a drone, which in turn limits the number of sensors, the on board computational power and so on. On the other hand, data can be streamed to a larger computer and processed there.

In 2014, imagery from a drone was used to count animals in images of natural environments [1]. They used imagery taken from high altitude with a skewed angle compared to "human" photos, which are usually taken from an altitude of about 1-2 meters from the front. Since their goal was to perform object detection on board the drone, GPU-requiring CNN methods were not applicable at the time and a HOG [10] based method was used. [1] stresses that most object detection algorithms are trained and tested on images taken from a "human" perspective, that is, from a certain height and angle, and can thus not be assumed to perform well on other types of images.

Drones have also been used for tracking objects on the ground, as in [16] where color thresholding was used to detect a colored rectangle to follow. In this case, no classification of the object was made. Further, [2] used an RGB camera together with a thermal camera to detect humans from on board a drone. They first found human-temperature silhouettes and then used a cascade of boosted classifiers with Haar-like features on the RGB image of the corresponding position to ensure the presence of a human.

Also here it is stressed that the images of interest are very different from images generally used in computer vision (with a "human" perspective), since they are taken from a large height and thus a skewed angle.

3.2 Object detection

Already in 1989 the first deep learning approach to object detection was proposed in [17], where supervised back-propagation networks were used to detect hand written digits in zip codes. However, until the year 2012 methods based on feature extraction such as SIFT [11] and HOG [10] were in focus, and performance on the PASCAL VOC challenge improved slowly. In 2012, [18] reintroduced the usage of Convolutional Neural Networks in object detection and won the ImageNet Large-Scale Visual Recognition Challenge [14] with their network called AlexNet. This was the starting point for a lot more research on CNNs in object detection.

[5] combined AlexNet with region proposals in 2013 (they used Selective Search [19]) and thus improved performance on PASCAL VOC significantly (from the previous best result of 35.1% mAP [19] to 53.7% mAP). The method was named R-CNN (Regions with CNN features) since it is based on first generating region proposals for the input image and then extracting a feature vector for each proposed region using a CNN. Lastly, each region is classified using a Support Vector Machine (SVM). In 2014, [12] used the features extracted by a CNN called OverFeat [4] in various recognition tasks such as image classification and scene recognition. They achieved astounding results compared to the then state-of-the-art methods in all tasks on various datasets, including PASCAL VOC [13], and thus showed that deep learning with CNNs should be considered the primary approach in any visual recognition task.

Spatial Pyramid Pooling networks (SPPnets [6]) took on the problem of earlier CNNs requiring fixed size input images in 2015, by adding an SPP layer between the last convolutional layer and the first fully-connected layer. In this way, the need to crop or warp images in order to run them through a CNN was eliminated. Further, SPPnets speed up R-CNN by sharing computation across regions. That is, in SPPnets the features of an image are computed only once instead of separately for each region of interest. SPPnets proved to be many times faster than R-CNN and to perform better or comparably [6]. Also in 2015, Fast R-CNN [7] improved the work of R-CNN [5] further by proposing a network that can simultaneously be trained to classify objects and to tune their spatial locations - leading to a significant increase in training speed (9x faster than R-CNN [5] and 3x faster than SPPnet [6]) while also achieving better accuracy on PASCAL VOC (66% mAP).

ResNet [20] introduced a deep residual learning framework at the end of 2015, which allowed networks to grow much deeper than before. They reformulated the network layers as learning residual functions with reference to the inputs instead of learning unreferenced functions, and showed that the residual mappings are easier to optimize than the original mappings.

In 2016, the R-CNN line of work was developed even further by integrating Fast R-CNN [7] with a Region Proposal Network (RPN), resulting in Faster R-CNN [9]. Until [9], the main bottleneck in object detection was the region proposals, which were often time consuming. The RPNs of [9] share convolutional layers with the object detection networks of [7], [6] and simultaneously regress region bounds and the probability that the region contains an object, at each location on a grid of the image. Usage of RPNs ensures nearly cost free region proposals and also improves the accuracy of the proposed regions.

Later in 2016, [8] proposed a Region-based Fully Convolutional Network (R-FCN), which improved object detection performance by further centralizing the method. While the previous methods had, to different extents, performed some computations several times for different regions of the images, R-FCN is fully convolutional with almost all computations shared across the whole image. To this day, R-FCN is considered a state-of-the-art method for object detection and therefore the work of this project is based on R-FCN.

Chapter 4 The Object Detection Algorithm

According to the findings of the previous chapter, R-FCN [8], being one of the best object detection frameworks of today with competitive accuracy and fast computations, is used in this project. The details of R-FCN can be found in the paper [8], but a brief overview of the architecture used is given here.

Figure 4.1: Key idea of R-FCN for object detection. Figure copied from [8].

Figure 4.1 shows the overall architecture of R-FCN. The first "white box" consists of a backbone network, in this case ResNet-101 [20]. ResNet-101 is a residual network with 100 convolutional layers followed by a pooling layer and a fully connected classification layer. Here, the two last layers are removed and the 100 convolutional layers are used to compute feature maps. From these feature maps, k × k × (C + 1) position-sensitive score maps are computed (the last "plate" in Figure 4.1). Here C is the number of object categories (+1 for background) and k is the dimension of the grid of position-sensitive score maps (3 × 3 in the figure).

These score maps are activated at a specific position relative to a certain object, for example top-left or bottom-right. For each object category there are k² score maps. An example showing how the position-sensitive score maps work is shown in Figure 4.2.

Figure 4.2: Illustration of the position-sensitive score maps of R-FCN, with k = 3. The figure is copied from [8].

Simultaneously, Regions of Interest (RoIs) are extracted using the Region Proposal Network (RPN) of [9] and the same output feature maps. A pooling layer then generates a (C + 1)-channel score map for each RoI, using the information from the position-sensitive score maps. Finally, the categories and bounding boxes are computed using a softmax function [21] and a box regression convolutional layer, respectively.

The network used in this project is pre-trained on an 80 class dataset from Microsoft COCO [15]. Several of the classes present in the dataset are "small", as defined in Chapter 1.
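The following is a minimal NumPy sketch of the position-sensitive RoI pooling step described above, written for illustration only: the channel layout, coordinate convention and average voting are assumptions based on the R-FCN paper [8], not the implementation used in the project.

```python
import numpy as np

def ps_roi_pool(score_maps, roi, k, num_classes):
    """Pool k*k*(C+1) position-sensitive score maps into C+1 class scores for one RoI.

    score_maps: array of shape (k*k*(C+1), H, W) on the feature map grid.
    roi: (x1, y1, x2, y2) in feature map coordinates.
    """
    c1 = num_classes + 1
    x1, y1, x2, y2 = roi
    bin_w, bin_h = (x2 - x1) / k, (y2 - y1) / k
    scores = np.zeros(c1)
    for i in range(k):            # vertical bin index
        for j in range(k):        # horizontal bin index
            ys = int(np.floor(y1 + i * bin_h))
            ye = max(int(np.ceil(y1 + (i + 1) * bin_h)), ys + 1)
            xs = int(np.floor(x1 + j * bin_w))
            xe = max(int(np.ceil(x1 + (j + 1) * bin_w)), xs + 1)
            # the C+1 channels dedicated to relative position (i, j)
            start = (i * k + j) * c1
            bin_maps = score_maps[start:start + c1, ys:ye, xs:xe]
            scores += bin_maps.mean(axis=(1, 2))   # average pool within the bin
    return scores / (k * k)                        # vote (average) over the k*k bins

def softmax(scores):
    """Turn the pooled class scores into class probabilities."""
    e = np.exp(scores - scores.max())
    return e / e.sum()
```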

Chapter 5 Method

The hypotheses stated in Section 1.1 are addressed in three different experiments. In this chapter, the general method of the experiments is described. Each experiment is described in more detail in Chapter 6.

5.1 Experiment Design

In all of the experiments a hand held camera is used instead of a camera mounted on a flying drone. Not using a real drone facilitates the experiments greatly, since controlling a drone is difficult. Further, images obtained by hand are assumed to be very similar to corresponding images that would have been obtained using a drone. In Chapter 7, detections made on images recorded from a real drone are displayed to support this assumption.

The procedure of each experiment includes the following steps:

1. Record various image sequences and extract a number of images (between 20 and 25), equally spaced in time.
2. Manually annotate the images with ground truth bounding boxes.
3. Input the images to R-FCN and save the resulting bounding boxes.
4. Compare the bounding boxes from R-FCN with the ground truth bounding boxes. The evaluation method is described in Section 5.2.

In step 1, between 20 and 25 images are extracted from the image sequences. In each of these images several object instances are generally present, so that the total number of objects in each dataset is larger than the number of images.

Two of the experiments are designed to directly address some of the hypotheses stated in Section 1.1. In one experiment, the numbers of detected objects in three different datasets, one recorded with a normal camera, one recorded with a fish eye camera and one recorded with a fish eye camera and then rectified, are compared in order to determine with which type of camera most objects can be detected (hypotheses 2 and 3 in Section 1.1).

In another experiment, the numbers of detected objects in four different datasets, recorded from different horizontal and vertical distances to a table with objects on it, are compared in order to determine from what distance most objects can be detected (hypothesis 4 in Section 1.1). The third and last experiment compares the numbers of detected objects in three datasets recorded with different camera angles, in order to determine how to mount the camera on the drone.

5.2 Evaluation

The experiments of Chapter 6 each contain at least two different datasets. Performances of R-FCN on the different datasets are compared, rather than defining a threshold for a "good" or "bad" performance. That is, since each experiment is designed to show in what way most objects can be detected, it is of more interest to see on which one of the datasets R-FCN performs better than to state whether it performs well on each individual dataset.

To evaluate performance, precisions, recalls and F1 scores are computed both for individual class categories and as an average over all categories present in a dataset. The procedure for computing these values can be reviewed in Chapter 2. Further, since the goal is to detect as many objects as possible, as mentioned in Section 1.1, the total number of correctly detected objects as well as the total number of objects actually present in the images are counted for each dataset.

In computing precision and recall, what is considered a correct classification, or a true positive, needs to be defined. Here an IoU threshold of IoU > 0.5 for a true positive is used, as shown in Figure 5.1. This is the standard IoU threshold of PASCAL VOC [13], and is also used in for example [8] and [9].

Figure 5.1: Illustration of the IoU requirement for a true positive: (a) IoU > 0.5, positive; (b) IoU < 0.5, negative.
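A minimal sketch of how detections could be matched to ground truth boxes under the IoU > 0.5 rule, for one image and one class, is given below. The greedy matching strategy and function names are assumptions for illustration, not necessarily the exact procedure used in the project; the iou helper is the one sketched in Section 2.4.

```python
def match_detections(gt_boxes, det_boxes, iou_fn, iou_threshold=0.5):
    """Greedily match detections of one class to ground truth boxes.

    gt_boxes, det_boxes: lists of (x1, y1, x2, y2) boxes for a single image
    and a single class. iou_fn computes the IoU of two boxes.
    Returns (num_tp, num_fp, num_fn) for use in precision/recall/F1.
    """
    unmatched_gt = list(gt_boxes)
    num_tp = 0
    for det in det_boxes:
        # find the best-overlapping, still unmatched ground truth box
        best = max(unmatched_gt, key=lambda gt: iou_fn(det, gt), default=None)
        if best is not None and iou_fn(det, best) > iou_threshold:
            num_tp += 1
            unmatched_gt.remove(best)
    num_fp = len(det_boxes) - num_tp   # detections with no matching ground truth
    num_fn = len(unmatched_gt)         # ground truth objects that were missed
    return num_tp, num_fp, num_fn
```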

Chapter 6 Experiments

This chapter contains three sections, which each describe one experiment. They start with a short motivation of why the experiment was performed, followed by a description of the experimental setup, the results and finally a short analysis of the results.

6.1 Fish Eye Camera

The goal of this experiment was to show whether a network trained on non-fish eye images can be used on fish eye images with satisfactory results. To the best of the author's knowledge this has not been tested before, and the results are used in the choice of camera for the remainder of the project.

Experimental Setup

A fish eye camera (with a field of view close to 180°) and a normal angled camera were mounted close to each other (the fish eye camera's lens about 3 cm above the normal camera lens) facing the same way, as shown in Figure 6.1.

Figure 6.1: Setup of the cameras used in the fish eye experiment. The circle with an F represents the lens of the fish eye camera and the circle with an N represents the lens of the normal camera.

Image sequences were recorded simultaneously with the two cameras while walking around an office room with various objects in it. A third image sequence was created with rectified versions of the fish eye images. 25 images, equally spaced in time, were extracted from each image sequence and input to R-FCN. Since the goal of this experiment was to compare performances on the three datasets, rather than to determine how "well" the network performs on a global scale, this relatively small number of images was sufficient. Further, as mentioned in Chapter 5, each image generally contains more than one object, so the number of objects in the datasets is larger than the number of images. The three datasets were also manually annotated with bounding boxes for the evaluation. Then, the performances on the three datasets were evaluated, comparing the annotated ground truths with the detection results from R-FCN for all datasets.

Figure 6.2: Examples of the images used in the fish eye experiment: (a) the normal camera, (b) the fish eye camera, (c) a rectified image from the fish eye camera.

Example images from the three datasets can be seen in Figure 6.2. It can be seen that the image quality of the two cameras is not exactly the same. That is, comparing Figures 6.2a and 6.2b there are some differences other than the field of view. For example, Figure 6.2a is darker than Figure 6.2b and this fact may affect the detection performance slightly. However, the training data [15] also comes from different cameras of varied quality, and the differences in image quality should not affect the results too much.
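The rectified dataset mentioned above can be produced by undistorting the fish eye frames. A minimal OpenCV sketch of such a rectification is given below; the intrinsic matrix K and distortion coefficients D are hypothetical placeholders and would in practice come from calibrating the fish eye camera, not values used in this project.

```python
import cv2
import numpy as np

# Hypothetical calibration parameters for illustration only; real values
# would come from a fish eye calibration (e.g. cv2.fisheye.calibrate).
K = np.array([[350.0, 0.0, 640.0],
              [0.0, 350.0, 360.0],
              [0.0, 0.0, 1.0]])
D = np.array([0.05, -0.01, 0.0, 0.0])

def rectify_fisheye(image, K=K, D=D):
    """Undistort (rectify) a fish eye image using OpenCV's fisheye model."""
    h, w = image.shape[:2]
    map1, map2 = cv2.fisheye.initUndistortRectifyMap(
        K, D, np.eye(3), K, (w, h), cv2.CV_16SC2)
    return cv2.remap(image, map1, map2, interpolation=cv2.INTER_LINEAR)
```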

Results

Table 6.1 summarizes the results for all three datasets in the experiment. For the normal angled camera, the fish eye camera and the rectified fish eye images, the table shows the total number of ground truth instances and correct detections of all present classes, and the precision, recall and F1 score averaged over all present object classes. The average precision for the fish eye camera was 1.0, which means that there were no false detections in that dataset. Further, the average recall of the fish eye camera was lower than that of the normal camera, which means that a larger fraction of the present objects were not detected. The F1 score, which summarizes precision and recall, was slightly lower for the fish eye camera than for the normal camera, suggesting lower performance.

Table 6.1: Number of ground truth instances, number of correctly detected instances, number of incorrectly detected instances, precision, recall and F1 score averaged over all classes for one dataset recorded with a normal camera, one recorded with a fish eye camera and one with rectified images recorded with a fish eye camera.

The lowest performance was that of the rectified fish eye images. Fewer objects were also detected in this dataset compared to the fish eye dataset. The total number of ground truths in the rectified dataset is lower than that of the fish eye dataset, since some parts of the images are lost in the rectification process. The total number of correct detections was highest for the fish eye camera, nearly twice the number of correct detections in the normal camera dataset.

Figure 6.3: Examples of the bounding boxes generated by R-FCN in the fish eye experiment: (a) the normal camera, (b) the fish eye camera, (c) a rectified image from the fish eye camera.

Figure 6.3 shows examples of the bounding boxes found by R-FCN in the three datasets. Tables 6.2, 6.3 and 6.4 show the results of the fish eye experiment for each present class. They show that some object classes are more easily detected than others. For example, no knives were detected in any of the datasets, while many bottles, cups and keyboards were detected. This is probably because the distance and viewing angle were better suited (more similar to those of the training data) for the latter objects. Table 6.3 shows a precision of 1.0 for all classes, which is because no false positives were detected in the dataset.

Table 6.2: Number of ground truth instances, number of correctly detected instances, number of incorrectly detected instances, precision, recall and F1 score for each class in a dataset recorded with a normal camera (classes: apple, banana, bottle, cell_phone, chair, cup, diningtable, fork, keyboard, knife, mouse, orange, tvmonitor).

Table 6.3: Number of ground truth instances, number of correctly detected instances, number of incorrectly detected instances, precision, recall and F1 score for each class in a dataset recorded with a fish eye camera (classes: apple, banana, bottle, cell_phone, chair, cup, fork, keyboard, knife, laptop, mouse, orange, tvmonitor).

Table 6.4: Number of ground truth instances, number of correctly detected instances, number of incorrectly detected instances, precision, recall and F1 score for each class in a dataset recorded with a fish eye camera and then rectified (classes: apple, banana, bottle, cell_phone, chair, cup, fork, keyboard, knife, laptop, mouse, orange, tvmonitor).

Comparing Tables 6.2, 6.3 and 6.4, there are some differences in which object categories are present. For example, only Table 6.2 contains the category "diningtable", but on the other hand it does not contain the category "laptop". There are different reasons for these differences. The diningtable class is present in Table 6.2 because an incorrect detection of a diningtable was made on that dataset. The laptop class is not present because the laptop seen by the fish eye camera could not be seen by the normal camera (see Figures 6.2 and 6.3, where a laptop can be seen on the right hand side of the fish eye and rectified images).

Analysis

The total number of correct detections of the fish eye camera was higher than that of the normal camera, strengthening the hypothesis that overall more objects can be detected using a fish eye camera. Further, the F1 score was a little lower for the fish eye camera than for the normal camera, but not by much. It can thus be said that it is advantageous to use a fish eye camera for object detection with a network trained on normal images, if the goal is to maximize the total number of detected objects.

Of course, the reason for this advantage of the fish eye camera is the wider field of view, and not that it is easier to detect objects in fish eye images. However, since the fish eye camera performed well, it is used in the remainder of the project.

Surprisingly, objects were detected not only in the center of the fish eye images but also at the distorted borders. Figure 6.4 is an example of this. This fact speaks for the advantage of using a fish eye camera to detect many objects - some of the "extra" detected objects compared to the normal camera are actually outside of the normal camera's field of view, and the numbers cannot be only due to, for example, different image quality.

Figure 6.4: An example of detections on the borders of a fish eye image.

6.2 Distance to Objects

The goal of this experiment was to examine from what distance to view "object clusters" in order to detect as many objects as possible. More objects are expected to be detected when the camera is closer to the objects as compared to when it is further away. The experiment shows whether this is true or not.

Experimental Setup

A similar office environment as in the previous experiment (Section 6.1) was viewed by the fish eye camera (since it performed best in detecting as many objects as possible). More specifically, a table with some objects on it was viewed from different horizontal and vertical distances. The distances were measured from the front edge of the table. The camera was facing forward. An image sequence was recorded from each distance and height from the table edge and 20 images, equally spaced in time, were extracted and run through R-FCN. As in Section 6.1, this relatively small number of images is sufficient since the goal of the experiment is to compare datasets of equal sizes rather than determining how good the performance of the network is on a more global scale. Bounding boxes for objects in the images were also manually annotated and the results compared as explained in Section 5.2.

In order to determine from what distance most objects can be detected, two different horizontal distances and two different vertical distances were examined. First, a horizontal distance of 0 cm between the camera and the table edge was used as a "close" distance. Then, as a "far away" distance, 50 cm was used. Note that a typical ground robot would often have difficulties getting even this close to objects. Further, the closest vertical distance was chosen to be 15 cm (not 0 cm because it would not be possible to fly a drone that close to the table, and a camera is typically not mounted on the lower parts of a drone). Then, the "far away" vertical distance was chosen to be 35 cm, from where many objects were still present in the image. That is, if the camera was moved even higher, there were few objects in the image because of the camera facing forward. Figure 6.5 illustrates the different camera positions with respect to the table and Figure 6.6 shows examples of images from each dataset.

Figure 6.5: Illustration of the distances in the experiment. The table is seen from the side. The dots represent the different camera positions.

Figure 6.6: Example images from different distances: (a) 0 cm horizontal distance, 15 cm vertical distance; (b) 0 cm horizontal distance, 35 cm vertical distance; (c) 50 cm horizontal distance, 15 cm vertical distance; (d) 50 cm horizontal distance, 35 cm vertical distance.

Results

Table 6.5 shows the results of the distance experiment. While more ground truths are present in the two datasets recorded from a 50 cm horizontal distance, the number of correct detections is larger in the 0 cm horizontal distance datasets. This means that the average recalls and the F1 scores in these datasets are higher.

Table 6.5: Number of ground truth instances, number of correctly detected instances, number of incorrectly detected instances, precision, recall and F1 score averaged over all classes for datasets recorded with different horizontal and vertical distances to a table.

Figure 6.7 shows examples of the bounding boxes found by R-FCN for the different distance datasets.

Figure 6.7: Example images showing the resulting bounding boxes for different distances: (a) 0 cm horizontal distance, 15 cm vertical distance; (b) 0 cm horizontal distance, 35 cm vertical distance; (c) 50 cm horizontal distance, 15 cm vertical distance; (d) 50 cm horizontal distance, 35 cm vertical distance.

Tables 6.6, 6.7, 6.8 and 6.9 show the results for each present object class in the datasets. Of the two 0 cm horizontal distance datasets, the 15 cm vertical distance dataset shows a larger recall of keyboards and mice than the 35 cm vertical distance dataset, while it is the other way round for bottles and cups. This indicates that each type of object has an "optimal" viewing distance which needs to be adjusted in order to detect that type of object. Smaller objects need to be viewed from a closer distance.

Table 6.6: Number of ground truth instances, number of correctly detected instances, number of incorrectly detected instances, precision, recall and F1 score for each class in the dataset from 0 cm away from and 15 cm above the table (classes: apple, banana, bottle, cup, keyboard, mouse, scissors, tvmonitor).

Table 6.7: Number of ground truth instances, number of correctly detected instances, number of incorrectly detected instances, precision, recall and F1 score for each class in the dataset from 0 cm away from and 35 cm above the table (classes: apple, banana, bottle, cup, keyboard, mouse, scissors, tvmonitor).

Table 6.8: Number of ground truth instances, number of correctly detected instances, number of incorrectly detected instances, precision, recall and F1 score for each class in the dataset from 50 cm away from and 15 cm above the table (classes: apple, bottle, chair, cup, keyboard, laptop, mouse, scissors, tvmonitor).

Table 6.9: Number of ground truth instances, number of correctly detected instances, number of incorrectly detected instances, precision, recall and F1 score for each class in the dataset from 50 cm away from and 35 cm above the table (classes: apple, banana, bottle, chair, cup, keyboard, laptop, mouse, scissors, tvmonitor).

Analysis

The experiment showed, as expected, that small objects can be more easily detected from a closer horizontal distance.

The experiment didn't show as clear results for the vertical distance, which could be because the change in vertical distance was not as large as the one in horizontal distance (20 cm compared to 50 cm). However, it is possible to conclude that being close to small objects increases the chance of detecting them, while larger objects may need a larger distance for best performance.

6.3 Camera Angle

In this experiment three different camera angles are tested. The goal is to determine how to mount the camera on the drone for best object detection performance. One of the advantages of using a drone for detecting objects in a room is that it can fly over large objects, such as tables, in order to get a different kind of view than, for example, a ground robot can. Therefore, in this experiment the camera is moved above and along a table.

Experimental Setup

It is of interest to be as close to the objects as possible; however, a drone cannot fly too close to things (and again, a camera is generally not mounted on the lower parts of a drone). In an office environment, the fish eye camera was moved from one side to the other about 0.4 meters above a table with objects on it. The distance of 0.4 meters was chosen trying to keep the camera as close as possible to the table, because of the results of the distance experiment in Section 6.2. However, because in this experiment the camera was moved along the table, and since there were objects on the table, it was not possible to keep a closer distance.

The camera was moved along the table three times, first with a 0 degree angle of the camera, then with a 45 degree angle and lastly with a 90 degree angle. What is meant by the different angles is demonstrated in Figure 6.8. Each time, an image sequence was recorded and 20 images were extracted and run through R-FCN. Examples from the three datasets are shown in Figure 6.9. The images were also manually annotated with bounding boxes and the results compared.

Figure 6.8: Illustration of the different camera angles: (a) 0 degree camera, (b) 45 degree camera, (c) 90 degree camera. The green arrows show the directions in which the cameras were moved.

Figure 6.9: Example images from the different camera angle datasets: (a) the 0 degree dataset, (b) the 45 degree dataset, (c) the 90 degree dataset.

Results

Table 6.10 shows the results of the angle experiment. The F1 score is a lot higher for the 90 degree dataset, which is expected as most training data images were probably taken from a close to 90 degree perspective.

Table 6.10: Number of ground truth instances, number of correctly detected instances, number of incorrectly detected instances, precision, recall and F1 score averaged over all classes in datasets from different angles.

Figure 6.10 shows the bounding boxes predicted by R-FCN in example images from the three datasets in the experiment.

Figure 6.10: Example images from the different camera angle datasets with bounding boxes from R-FCN: (a) the 0 degree dataset, (b) the 45 degree dataset, (c) the 90 degree dataset.

Tables 6.11, 6.12 and 6.13 show the results for each object category present in the different datasets. It can be seen that the higher F1 score of the 90 degree dataset compared to the other two is mostly due to a higher recall of large objects, such as TV-monitors and chairs.

Table 6.11: Number of ground truth instances, number of correctly detected instances, number of incorrectly detected instances, precision, recall and F1 score for each class in the dataset from a 0 degree camera angle (classes: banana, bottle, cell_phone, chair, cup, keyboard, laptop, mouse, tvmonitor).

Table 6.12: Number of ground truth instances, number of correctly detected instances, number of incorrectly detected instances, precision, recall and F1 score for each class in the dataset from a 45 degree camera angle (classes: banana, bottle, cell_phone, chair, cup, keyboard, mouse, tvmonitor).

Table 6.13: Number of ground truth instances, number of correctly detected instances, number of incorrectly detected instances, precision, recall and F1 score for each class in the dataset from a 90 degree camera angle (classes: banana, bottle, chair, cup, keyboard, mouse, tvmonitor).

Similarly to Section 6.1, not all object categories are present in all of Tables 6.11, 6.12 and 6.13. The reasons are the same, either that misclassifications were made or that object instances were outside of the field of view in some of the datasets.

Analysis

The performance on the 0 degree dataset is very low compared to the other two, which is logical since the objects look very different from this point of view compared to the training data.

This fact was mentioned in Chapter 3, as others ([1], [2]) working on drones had already stressed the difficulties in detecting objects in images that are different from the training data. For an example of how different objects may look from above, see Figure 6.11 and note how the cup looks almost completely round, as compared to what a cup looks like from the side.

Figure 6.11: An example of an image from the 0 degree dataset.

The F1 score of the forward facing (90 degree) camera dataset is much larger than that of the other two datasets. This suggests that the camera should be mounted facing forward. However, as can be seen in Table 6.13, one of the reasons for the superior performance of the 90 degree dataset is that more large objects (chairs) were detected. Many of these chairs were in the background of the images (Figure 6.12 is an example of this) and thus distorted in the other datasets. Therefore, depending on the environment where the drone will move, how it is meant to fly and what objects it is meant to detect, it may be reasonable to mount the camera slightly tilted.

Figure 6.12: Example images showing detection of chairs in the background: (a) 45 degree camera angle, (b) 90 degree camera angle.

Chapter 7 Real Drone

This chapter describes a small experiment that was made without any formal evaluation, in order to show whether or not R-FCN could be used on image data from a real, flying drone. A Parrot Bebop 2 drone [22], with a 180° field of view camera (as suggested in Section 6.1) mounted facing forward (as suggested in Section 6.3), was manually navigated in an office room. Attempts were made to fly as close as possible to a table with objects on it, but no distances as close as in the experiments of Chapter 6 could be reached, due to difficulties in controlling the drone. A stream of images was recorded and the images were input to R-FCN. Figure 7.1 shows an extract of images with bounding boxes from the above-mentioned dataset.

Figure 7.1: Example images from a dataset recorded from a real flying drone, with bounding boxes predicted by R-FCN.
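As an illustration of the off-board setup assumed in Section 1.2 (the drone streams images to a computer that runs the detector), a minimal OpenCV sketch of such a processing loop is given below. The stream address and the detect_fn function are hypothetical placeholders, not the actual Bebop 2 interface or the R-FCN code used in the project.

```python
import cv2

STREAM_URL = "rtsp://192.168.42.1/live"  # placeholder address, not the real Bebop 2 endpoint

def run_detector_on_stream(detect_fn, stream_url=STREAM_URL):
    """Off-board loop: read frames from the drone's video stream and run an
    object detector on each frame. detect_fn is a placeholder returning a
    list of (class_name, score, (x1, y1, x2, y2)) tuples for one image."""
    cap = cv2.VideoCapture(stream_url)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        for class_name, score, (x1, y1, x2, y2) in detect_fn(frame):
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(frame, f"{class_name}: {score:.2f}", (x1, y1 - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
        cv2.imshow("detections", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()
```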


More information

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren Kaiming He Ross Girshick Jian Sun Present by: Yixin Yang Mingdong Wang 1 Object Detection 2 1 Applications Basic

More information

Final Report: Smart Trash Net: Waste Localization and Classification

Final Report: Smart Trash Net: Waste Localization and Classification Final Report: Smart Trash Net: Waste Localization and Classification Oluwasanya Awe oawe@stanford.edu Robel Mengistu robel@stanford.edu December 15, 2017 Vikram Sreedhar vsreed@stanford.edu Abstract Given

More information

CS4758: Moving Person Avoider

CS4758: Moving Person Avoider CS4758: Moving Person Avoider Yi Heng Lee, Sze Kiat Sim Abstract We attempt to have a quadrotor autonomously avoid people while moving through an indoor environment. Our algorithm for detecting people

More information

Fuzzy Set Theory in Computer Vision: Example 3

Fuzzy Set Theory in Computer Vision: Example 3 Fuzzy Set Theory in Computer Vision: Example 3 Derek T. Anderson and James M. Keller FUZZ-IEEE, July 2017 Overview Purpose of these slides are to make you aware of a few of the different CNN architectures

More information

Efficient Segmentation-Aided Text Detection For Intelligent Robots

Efficient Segmentation-Aided Text Detection For Intelligent Robots Efficient Segmentation-Aided Text Detection For Intelligent Robots Junting Zhang, Yuewei Na, Siyang Li, C.-C. Jay Kuo University of Southern California Outline Problem Definition and Motivation Related

More information

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs Zhipeng Yan, Moyuan Huang, Hao Jiang 5/1/2017 1 Outline Background semantic segmentation Objective,

More information

R-FCN: OBJECT DETECTION VIA REGION-BASED FULLY CONVOLUTIONAL NETWORKS

R-FCN: OBJECT DETECTION VIA REGION-BASED FULLY CONVOLUTIONAL NETWORKS R-FCN: OBJECT DETECTION VIA REGION-BASED FULLY CONVOLUTIONAL NETWORKS JIFENG DAI YI LI KAIMING HE JIAN SUN MICROSOFT RESEARCH TSINGHUA UNIVERSITY MICROSOFT RESEARCH MICROSOFT RESEARCH SPEED/ACCURACY TRADE-OFFS

More information

Category-level localization

Category-level localization Category-level localization Cordelia Schmid Recognition Classification Object present/absent in an image Often presence of a significant amount of background clutter Localization / Detection Localize object

More information

Apparel Classifier and Recommender using Deep Learning

Apparel Classifier and Recommender using Deep Learning Apparel Classifier and Recommender using Deep Learning Live Demo at: http://saurabhg.me/projects/tag-that-apparel Saurabh Gupta sag043@ucsd.edu Siddhartha Agarwal siagarwa@ucsd.edu Apoorve Dave a1dave@ucsd.edu

More information

Instance-aware Semantic Segmentation via Multi-task Network Cascades

Instance-aware Semantic Segmentation via Multi-task Network Cascades Instance-aware Semantic Segmentation via Multi-task Network Cascades Jifeng Dai, Kaiming He, Jian Sun Microsoft research 2016 Yotam Gil Amit Nativ Agenda Introduction Highlights Implementation Further

More information

Classification and Detection in Images. D.A. Forsyth

Classification and Detection in Images. D.A. Forsyth Classification and Detection in Images D.A. Forsyth Classifying Images Motivating problems detecting explicit images classifying materials classifying scenes Strategy build appropriate image features train

More information

CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm

CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm CIS680: Vision & Learning Assignment 2.b: RPN, Faster R-CNN and Mask R-CNN Due: Nov. 21, 2018 at 11:59 pm Instructions This is an individual assignment. Individual means each student must hand in their

More information

CAP 6412 Advanced Computer Vision

CAP 6412 Advanced Computer Vision CAP 6412 Advanced Computer Vision http://www.cs.ucf.edu/~bgong/cap6412.html Boqing Gong April 21st, 2016 Today Administrivia Free parameters in an approach, model, or algorithm? Egocentric videos by Aisha

More information

Know your data - many types of networks

Know your data - many types of networks Architectures Know your data - many types of networks Fixed length representation Variable length representation Online video sequences, or samples of different sizes Images Specific architectures for

More information

Using Machine Learning for Classification of Cancer Cells

Using Machine Learning for Classification of Cancer Cells Using Machine Learning for Classification of Cancer Cells Camille Biscarrat University of California, Berkeley I Introduction Cell screening is a commonly used technique in the development of new drugs.

More information

Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task

Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task Fine-tuning Pre-trained Large Scaled ImageNet model on smaller dataset for Detection task Kyunghee Kim Stanford University 353 Serra Mall Stanford, CA 94305 kyunghee.kim@stanford.edu Abstract We use a

More information

Exploiting scene constraints to improve object detection algorithms for industrial applications

Exploiting scene constraints to improve object detection algorithms for industrial applications Exploiting scene constraints to improve object detection algorithms for industrial applications PhD Public Defense Steven Puttemans Promotor: Toon Goedemé 2 A general introduction Object detection? Help

More information

Inception Network Overview. David White CS793

Inception Network Overview. David White CS793 Inception Network Overview David White CS793 So, Leonardo DiCaprio dreams about dreaming... https://m.media-amazon.com/images/m/mv5bmjaxmzy3njcxnf5bml5banbnxkftztcwnti5otm0mw@@._v1_sy1000_cr0,0,675,1 000_AL_.jpg

More information

Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation

Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation Large-Scale Traffic Sign Recognition based on Local Features and Color Segmentation M. Blauth, E. Kraft, F. Hirschenberger, M. Böhm Fraunhofer Institute for Industrial Mathematics, Fraunhofer-Platz 1,

More information

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS Kuan-Chuan Peng and Tsuhan Chen School of Electrical and Computer Engineering, Cornell University, Ithaca, NY

More information

Recap Image Classification with Bags of Local Features

Recap Image Classification with Bags of Local Features Recap Image Classification with Bags of Local Features Bag of Feature models were the state of the art for image classification for a decade BoF may still be the state of the art for instance retrieval

More information

Single-Shot Refinement Neural Network for Object Detection -Supplementary Material-

Single-Shot Refinement Neural Network for Object Detection -Supplementary Material- Single-Shot Refinement Neural Network for Object Detection -Supplementary Material- Shifeng Zhang 1,2, Longyin Wen 3, Xiao Bian 3, Zhen Lei 1,2, Stan Z. Li 4,1,2 1 CBSR & NLPR, Institute of Automation,

More information

Presented at the FIG Congress 2018, May 6-11, 2018 in Istanbul, Turkey

Presented at the FIG Congress 2018, May 6-11, 2018 in Istanbul, Turkey Presented at the FIG Congress 2018, May 6-11, 2018 in Istanbul, Turkey Evangelos MALTEZOS, Charalabos IOANNIDIS, Anastasios DOULAMIS and Nikolaos DOULAMIS Laboratory of Photogrammetry, School of Rural

More information

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang

SSD: Single Shot MultiBox Detector. Author: Wei Liu et al. Presenter: Siyu Jiang SSD: Single Shot MultiBox Detector Author: Wei Liu et al. Presenter: Siyu Jiang Outline 1. Motivations 2. Contributions 3. Methodology 4. Experiments 5. Conclusions 6. Extensions Motivation Motivation

More information

Mask R-CNN. Kaiming He, Georgia, Gkioxari, Piotr Dollar, Ross Girshick Presenters: Xiaokang Wang, Mengyao Shi Feb. 13, 2018

Mask R-CNN. Kaiming He, Georgia, Gkioxari, Piotr Dollar, Ross Girshick Presenters: Xiaokang Wang, Mengyao Shi Feb. 13, 2018 Mask R-CNN Kaiming He, Georgia, Gkioxari, Piotr Dollar, Ross Girshick Presenters: Xiaokang Wang, Mengyao Shi Feb. 13, 2018 1 Common computer vision tasks Image Classification: one label is generated for

More information

3 Object Detection. BVM 2018 Tutorial: Advanced Deep Learning Methods. Paul F. Jaeger, Division of Medical Image Computing

3 Object Detection. BVM 2018 Tutorial: Advanced Deep Learning Methods. Paul F. Jaeger, Division of Medical Image Computing 3 Object Detection BVM 2018 Tutorial: Advanced Deep Learning Methods Paul F. Jaeger, of Medical Image Computing What is object detection? classification segmentation obj. detection (1 label per pixel)

More information

AUTONOMOUS IMAGE EXTRACTION AND SEGMENTATION OF IMAGE USING UAV S

AUTONOMOUS IMAGE EXTRACTION AND SEGMENTATION OF IMAGE USING UAV S AUTONOMOUS IMAGE EXTRACTION AND SEGMENTATION OF IMAGE USING UAV S Radha Krishna Rambola, Associate Professor, NMIMS University, India Akash Agrawal, Student at NMIMS University, India ABSTRACT Due to the

More information

Object Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018

Object Detection. TA : Young-geun Kim. Biostatistics Lab., Seoul National University. March-June, 2018 Object Detection TA : Young-geun Kim Biostatistics Lab., Seoul National University March-June, 2018 Seoul National University Deep Learning March-June, 2018 1 / 57 Index 1 Introduction 2 R-CNN 3 YOLO 4

More information

ECE 172A: Introduction to Intelligent Systems: Machine Vision, Fall Midterm Examination

ECE 172A: Introduction to Intelligent Systems: Machine Vision, Fall Midterm Examination ECE 172A: Introduction to Intelligent Systems: Machine Vision, Fall 2008 October 29, 2008 Notes: Midterm Examination This is a closed book and closed notes examination. Please be precise and to the point.

More information

Rich feature hierarchies for accurate object detection and semant

Rich feature hierarchies for accurate object detection and semant Rich feature hierarchies for accurate object detection and semantic segmentation Speaker: Yucong Shen 4/5/2018 Develop of Object Detection 1 DPM (Deformable parts models) 2 R-CNN 3 Fast R-CNN 4 Faster

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Announcements Computer Vision Lecture 16 Deep Learning Applications 11.01.2017 Seminar registration period starts on Friday We will offer a lab course in the summer semester Deep Robot Learning Topic:

More information

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech Convolutional Neural Networks Computer Vision Jia-Bin Huang, Virginia Tech Today s class Overview Convolutional Neural Network (CNN) Training CNN Understanding and Visualizing CNN Image Categorization:

More information

Storyline Reconstruction for Unordered Images

Storyline Reconstruction for Unordered Images Introduction: Storyline Reconstruction for Unordered Images Final Paper Sameedha Bairagi, Arpit Khandelwal, Venkatesh Raizaday Storyline reconstruction is a relatively new topic and has not been researched

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Computer Vision Lecture 16 Deep Learning Applications 11.01.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period starts

More information

Machine Learning 13. week

Machine Learning 13. week Machine Learning 13. week Deep Learning Convolutional Neural Network Recurrent Neural Network 1 Why Deep Learning is so Popular? 1. Increase in the amount of data Thanks to the Internet, huge amount of

More information

Project 3 Q&A. Jonathan Krause

Project 3 Q&A. Jonathan Krause Project 3 Q&A Jonathan Krause 1 Outline R-CNN Review Error metrics Code Overview Project 3 Report Project 3 Presentations 2 Outline R-CNN Review Error metrics Code Overview Project 3 Report Project 3 Presentations

More information

Dynamic Routing Between Capsules

Dynamic Routing Between Capsules Report Explainable Machine Learning Dynamic Routing Between Capsules Author: Michael Dorkenwald Supervisor: Dr. Ullrich Köthe 28. Juni 2018 Inhaltsverzeichnis 1 Introduction 2 2 Motivation 2 3 CapusleNet

More information

Bus Detection and recognition for visually impaired people

Bus Detection and recognition for visually impaired people Bus Detection and recognition for visually impaired people Hangrong Pan, Chucai Yi, and Yingli Tian The City College of New York The Graduate Center The City University of New York MAP4VIP Outline Motivation

More information

Lecture 5: Object Detection

Lecture 5: Object Detection Object Detection CSED703R: Deep Learning for Visual Recognition (2017F) Lecture 5: Object Detection Bohyung Han Computer Vision Lab. bhhan@postech.ac.kr 2 Traditional Object Detection Algorithms Region-based

More information

Multi-View 3D Object Detection Network for Autonomous Driving

Multi-View 3D Object Detection Network for Autonomous Driving Multi-View 3D Object Detection Network for Autonomous Driving Xiaozhi Chen, Huimin Ma, Ji Wan, Bo Li, Tian Xia CVPR 2017 (Spotlight) Presented By: Jason Ku Overview Motivation Dataset Network Architecture

More information

Pedestrian Detection Using Structured SVM

Pedestrian Detection Using Structured SVM Pedestrian Detection Using Structured SVM Wonhui Kim Stanford University Department of Electrical Engineering wonhui@stanford.edu Seungmin Lee Stanford University Department of Electrical Engineering smlee729@stanford.edu.

More information

Object Recognition II

Object Recognition II Object Recognition II Linda Shapiro EE/CSE 576 with CNN slides from Ross Girshick 1 Outline Object detection the task, evaluation, datasets Convolutional Neural Networks (CNNs) overview and history Region-based

More information

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia

Deep learning for dense per-pixel prediction. Chunhua Shen The University of Adelaide, Australia Deep learning for dense per-pixel prediction Chunhua Shen The University of Adelaide, Australia Image understanding Classification error Convolution Neural Networks 0.3 0.2 0.1 Image Classification [Krizhevsky

More information

Rich feature hierarchies for accurate object detection and semantic segmentation

Rich feature hierarchies for accurate object detection and semantic segmentation Rich feature hierarchies for accurate object detection and semantic segmentation Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik Presented by Pandian Raju and Jialin Wu Last class SGD for Document

More information

Advanced Video Analysis & Imaging

Advanced Video Analysis & Imaging Advanced Video Analysis & Imaging (5LSH0), Module 09B Machine Learning with Convolutional Neural Networks (CNNs) - Workout Farhad G. Zanjani, Clint Sebastian, Egor Bondarev, Peter H.N. de With ( p.h.n.de.with@tue.nl

More information

Classifying Images with Visual/Textual Cues. By Steven Kappes and Yan Cao

Classifying Images with Visual/Textual Cues. By Steven Kappes and Yan Cao Classifying Images with Visual/Textual Cues By Steven Kappes and Yan Cao Motivation Image search Building large sets of classified images Robotics Background Object recognition is unsolved Deformable shaped

More information

Skin and Face Detection

Skin and Face Detection Skin and Face Detection Linda Shapiro EE/CSE 576 1 What s Coming 1. Review of Bakic flesh detector 2. Fleck and Forsyth flesh detector 3. Details of Rowley face detector 4. Review of the basic AdaBoost

More information

ECE 5470 Classification, Machine Learning, and Neural Network Review

ECE 5470 Classification, Machine Learning, and Neural Network Review ECE 5470 Classification, Machine Learning, and Neural Network Review Due December 1. Solution set Instructions: These questions are to be answered on this document which should be submitted to blackboard

More information

CS 523: Multimedia Systems

CS 523: Multimedia Systems CS 523: Multimedia Systems Angus Forbes creativecoding.evl.uic.edu/courses/cs523 Today - Convolutional Neural Networks - Work on Project 1 http://playground.tensorflow.org/ Convolutional Neural Networks

More information

Lab 9. Julia Janicki. Introduction

Lab 9. Julia Janicki. Introduction Lab 9 Julia Janicki Introduction My goal for this project is to map a general land cover in the area of Alexandria in Egypt using supervised classification, specifically the Maximum Likelihood and Support

More information

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides Deep Learning in Visual Recognition Thanks Da Zhang for the slides Deep Learning is Everywhere 2 Roadmap Introduction Convolutional Neural Network Application Image Classification Object Detection Object

More information

FINE-GRAINED image classification aims to recognize. Fast Fine-grained Image Classification via Weakly Supervised Discriminative Localization

FINE-GRAINED image classification aims to recognize. Fast Fine-grained Image Classification via Weakly Supervised Discriminative Localization 1 Fast Fine-grained Image Classification via Weakly Supervised Discriminative Localization Xiangteng He, Yuxin Peng and Junjie Zhao arxiv:1710.01168v1 [cs.cv] 30 Sep 2017 Abstract Fine-grained image classification

More information

Deep Learning For Video Classification. Presented by Natalie Carlebach & Gil Sharon

Deep Learning For Video Classification. Presented by Natalie Carlebach & Gil Sharon Deep Learning For Video Classification Presented by Natalie Carlebach & Gil Sharon Overview Of Presentation Motivation Challenges of video classification Common datasets 4 different methods presented in

More information

Toward Scale-Invariance and Position-Sensitive Region Proposal Networks

Toward Scale-Invariance and Position-Sensitive Region Proposal Networks Toward Scale-Invariance and Position-Sensitive Region Proposal Networks Hsueh-Fu Lu [0000 0003 1885 3805], Xiaofei Du [0000 0002 0071 8093], and Ping-Lin Chang [0000 0002 3341 8425] Umbo Computer Vision

More information

Exploiting Depth from Single Monocular Images for Object Detection and Semantic Segmentation

Exploiting Depth from Single Monocular Images for Object Detection and Semantic Segmentation APPEARING IN IEEE TRANSACTIONS ON IMAGE PROCESSING, OCTOBER 2016 1 Exploiting Depth from Single Monocular Images for Object Detection and Semantic Segmentation Yuanzhouhan Cao, Chunhua Shen, Heng Tao Shen

More information

Hide-and-Seek: Forcing a network to be Meticulous for Weakly-supervised Object and Action Localization

Hide-and-Seek: Forcing a network to be Meticulous for Weakly-supervised Object and Action Localization Hide-and-Seek: Forcing a network to be Meticulous for Weakly-supervised Object and Action Localization Krishna Kumar Singh and Yong Jae Lee University of California, Davis ---- Paper Presentation Yixian

More information

3D Object Recognition and Scene Understanding from RGB-D Videos. Yu Xiang Postdoctoral Researcher University of Washington

3D Object Recognition and Scene Understanding from RGB-D Videos. Yu Xiang Postdoctoral Researcher University of Washington 3D Object Recognition and Scene Understanding from RGB-D Videos Yu Xiang Postdoctoral Researcher University of Washington 1 2 Act in the 3D World Sensing & Understanding Acting Intelligent System 3D World

More information

Recognition of Animal Skin Texture Attributes in the Wild. Amey Dharwadker (aap2174) Kai Zhang (kz2213)

Recognition of Animal Skin Texture Attributes in the Wild. Amey Dharwadker (aap2174) Kai Zhang (kz2213) Recognition of Animal Skin Texture Attributes in the Wild Amey Dharwadker (aap2174) Kai Zhang (kz2213) Motivation Patterns and textures are have an important role in object description and understanding

More information

Character Recognition

Character Recognition Character Recognition 5.1 INTRODUCTION Recognition is one of the important steps in image processing. There are different methods such as Histogram method, Hough transformation, Neural computing approaches

More information

Object Detection Design challenges

Object Detection Design challenges Object Detection Design challenges How to efficiently search for likely objects Even simple models require searching hundreds of thousands of positions and scales Feature design and scoring How should

More information

Industrial Technology Research Institute, Hsinchu, Taiwan, R.O.C ǂ

Industrial Technology Research Institute, Hsinchu, Taiwan, R.O.C ǂ Stop Line Detection and Distance Measurement for Road Intersection based on Deep Learning Neural Network Guan-Ting Lin 1, Patrisia Sherryl Santoso *1, Che-Tsung Lin *ǂ, Chia-Chi Tsai and Jiun-In Guo National

More information

Human Detection. A state-of-the-art survey. Mohammad Dorgham. University of Hamburg

Human Detection. A state-of-the-art survey. Mohammad Dorgham. University of Hamburg Human Detection A state-of-the-art survey Mohammad Dorgham University of Hamburg Presentation outline Motivation Applications Overview of approaches (categorized) Approaches details References Motivation

More information

Optimizing Object Detection:

Optimizing Object Detection: Lecture 10: Optimizing Object Detection: A Case Study of R-CNN, Fast R-CNN, and Faster R-CNN Visual Computing Systems Today s task: object detection Image classification: what is the object in this image?

More information

https://en.wikipedia.org/wiki/the_dress Recap: Viola-Jones sliding window detector Fast detection through two mechanisms Quickly eliminate unlikely windows Use features that are fast to compute Viola

More information

Mobile Human Detection Systems based on Sliding Windows Approach-A Review

Mobile Human Detection Systems based on Sliding Windows Approach-A Review Mobile Human Detection Systems based on Sliding Windows Approach-A Review Seminar: Mobile Human detection systems Njieutcheu Tassi cedrique Rovile Department of Computer Engineering University of Heidelberg

More information

Large-scale Video Classification with Convolutional Neural Networks

Large-scale Video Classification with Convolutional Neural Networks Large-scale Video Classification with Convolutional Neural Networks Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, Li Fei-Fei Note: Slide content mostly from : Bay Area

More information

A Deep Learning Approach to Vehicle Speed Estimation

A Deep Learning Approach to Vehicle Speed Estimation A Deep Learning Approach to Vehicle Speed Estimation Benjamin Penchas bpenchas@stanford.edu Tobin Bell tbell@stanford.edu Marco Monteiro marcorm@stanford.edu ABSTRACT Given car dashboard video footage,

More information

Supplementary Materials for DVQA: Understanding Data Visualizations via Question Answering

Supplementary Materials for DVQA: Understanding Data Visualizations via Question Answering Supplementary Materials for DVQA: Understanding Data Visualizations via Question Answering Kushal Kafle 1, Brian Price 2 Scott Cohen 2 Christopher Kanan 1 1 Rochester Institute of Technology 2 Adobe Research

More information

2 OVERVIEW OF RELATED WORK

2 OVERVIEW OF RELATED WORK Utsushi SAKAI Jun OGATA This paper presents a pedestrian detection system based on the fusion of sensors for LIDAR and convolutional neural network based image classification. By using LIDAR our method

More information

arxiv: v1 [cs.lg] 31 Oct 2018

arxiv: v1 [cs.lg] 31 Oct 2018 UNDERSTANDING DEEP NEURAL NETWORKS USING TOPOLOGICAL DATA ANALYSIS DANIEL GOLDFARB arxiv:1811.00852v1 [cs.lg] 31 Oct 2018 Abstract. Deep neural networks (DNN) are black box algorithms. They are trained

More information

A New Protocol of CSI For The Royal Canadian Mounted Police

A New Protocol of CSI For The Royal Canadian Mounted Police A New Protocol of CSI For The Royal Canadian Mounted Police I. Introduction The Royal Canadian Mounted Police started using Unmanned Aerial Vehicles to help them with their work on collision and crime

More information