Jaroslav Moravec. Object recognition using 3D convolutional neural networks


BACHELOR THESIS

Jaroslav Moravec

Object recognition using 3D convolutional neural networks

Department of Software Engineering

Supervisor of the bachelor thesis: RNDr. Jakub Lokoč, Ph.D.
Study programme: Computer Science
Study branch: ISDI

Prague 2017

I declare that I carried out this bachelor thesis independently, and only with the cited sources, literature and other professional sources. I understand that my work relates to the rights and obligations under the Act No. 121/2000 Sb., the Copyright Act, as amended, in particular the fact that Charles University has the right to conclude a license agreement on the use of this work as a school work pursuant to Section 60 subsection 1 of the Copyright Act.

In... date... signature of the author

Title: Object recognition using 3D convolutional neural networks

Author: Jaroslav Moravec (Computer Science), Department of Software Engineering

Supervisor: RNDr. Jakub Lokoč, Ph.D., Department of Software Engineering

Abstract: With the fast development of laser and sensor technologies, it has become easy to scan a real-world object and save it in a digital format into a persistent database. With the rising number of scanned 3D objects, data management and retrieval methods become necessary, and for various retrieval tasks, effective retrieval models are required. In our work, we focus on effective classification and similarity search. The investigated approach is based on convolutional neural networks, a machine learning method that has boomed in recent years. We have designed and trained several architectures of 3D convolutional neural networks and tested them on state-of-the-art benchmark 3D datasets for 3D object recognition and retrieval. We were also able to show that features trained on one dataset can be used to predict class labels on another 3D dataset.

Keywords: Object recognition, 3D convolution, neural networks

Contents

1 Introduction
2 3D Datasets
3 Search in 3D Datasets
4 Classification and Search Using DCNN
  4.1 DCNN
    4.1.1 Artificial Neural Networks
    4.1.2 Convolutional Layer
    4.1.3 Pooling Layer
    4.1.4 Local Response Normalization
    4.1.5 Fully Connected Layer
    4.1.6 Dropout
  4.2 Transformation of Model for DCNN
  4.3 Used CNN Architectures
  4.4 Object retrieval
    4.4.1 Using Our Classifier
    4.4.2 Using Similarity Search
5 Learning of DCNN
  5.1 Motivation
  5.2 Gradient Descent Optimization
    5.2.1 Gradient Descent Variants
    5.2.2 Gradient Descent Optimization Algorithms
6 Experiments
  6.1 Object Recognition and Retrieval (SHREC16)
  6.2 Object Retrieval (SHREC15)
  6.3 Object Recognition and Retrieval (ModelNet10)
7 Conclusion and Future work
Bibliography
List of Figures
List of Tables

List of Abbreviations
Attachments

1. Introduction

As mankind has always wanted to create technology that would help with work that is hard or even impossible for a single person, the fields of robotics and computer vision were established and have developed over the last decades. Nowadays, these fields are developing new autonomous robots, machines and cars for the 3D world. We therefore also need to develop methods to teach computers (i.e., machine "brains") to understand real-world objects, environments and situations. The goal of this thesis is to describe and implement one of these methods, the 3D convolutional neural network, and compare its results on various datasets with state-of-the-art methods.

During the last decades, many types of classifiers were designed to address a classification problem Obj → C, where Obj is a set of 3D objects and C is a finite set of classes. Methods can be based on the appearance of the object, e.g. edge, gradient or grayscale matching. Such methods mostly rely on information gathered from pre-computed projections of an object and thus do not take the 3D geometric shape of the object into consideration. These are probably the oldest approaches and their results cannot compete with present feature-based methods. Feature-based methods extract features from pre-captured views of the object (e.g. corners, surface patches) and therefore also take the 3D shape of the object into account. These feature vectors are then matched to decide to which class the object belongs. In this thesis, we describe and implement one of the feature-based methods for object recognition: convolutional neural networks (CNN).

The thesis is organized as follows. In chapter 2, we describe the 3D datasets that were used for training our convolutional neural network architectures in chapter 6. In chapter 3, we discuss object retrieval approaches and important definitions; these are helpful in the following chapters, especially in section 4.4 and chapter 6. Chapter 4 contains all definitions and algorithms necessary to understand convolutional neural networks; we tried to describe even the concepts that are hard to grasp so that they are easily readable for anyone interested in this topic. The insight from this chapter is important for understanding chapter 5 and thus the whole concept of learning of convolutional neural networks and why these black-box algorithms really work; it also includes the theory behind chapter 6, which refers back to it. Chapter 5 contains important information about the learning of convolutional neural networks: it describes the most common type of optimizer, gradient descent, and some of its variants; some of these optimizers are then used in chapter 6. Chapter 6, probably the most important one, contains the parameters and hyperparameters of our architectures and their results, which are described and discussed in detail. Conclusions and future work are presented in chapter 7.

2. 3D Datasets

With the development of new sensors, it is becoming easier to scan a real-world 3D model in everyday life and store it in a digital format. With the rising number of stored 3D models, novel methods for 3D data management and retrieval are also required. This is the purpose of the SHREC competition, which is held by well-known universities every year. In this work, we will use the ModelNet10 dataset and two datasets from the competitions SHREC 16 and SHREC 15. These datasets are used for the shape recognition described in chapter 3 and chapter 4. Since the structures of the datasets differ, we describe them separately.

According to the web page of the organizer of SHREC 15, the dataset (SHREC15 [2015]) contains 229 labeled objects from nine classes and other unclassified objects from several publicly available shape collections. After obtaining the first large-scale set of shapes, the organizers applied a careful post-processing step in order to repair non-manifold objects and merge objects with more than one connected component. Their final dataset only contains manifold objects with one connected component. This pre-processing step guarantees that most of the current approaches work with the dataset.

The SHREC 16 competition uses the ShapeNetCore subset of ShapeNet (SHREC16 [2016]), which contains about 51,300 3D models from 55 common categories. For the competition, the dataset was divided into train, validation and test parts in a ratio of 70/10/20. The competition has two levels of difficulty: the normal and the perturbed data. The normal data are consistently aligned with respect to the Cartesian axes, while the perturbed data are randomly rotated. In this work, we trained all our classifiers and retrieval models using the normal dataset.

As described on the web page ModelNet [2016], ModelNet (Wu et al. [2014]) is a project of Princeton University whose goal is to provide a comprehensive, clean collection of 3D CAD models of objects. The organizers chose 10 common categories of objects and collected models belonging to each category using online search engines by querying for each object category term. They then hired human workers to manually decide whether each CAD model belongs to the specified category. Furthermore, they manually aligned the orientation of the CAD models for this 10-class subset as well. We will use this dataset for our experiments; it contains models from 10 classes split into a training part and a test part of 908 models. The comparison of our results with the state of the art is discussed in chapter 6.

3. Search in 3D Datasets

In this section, we will follow the explanations presented in Bustos et al. [2005]. The problem of efficient and effective search in databases of 3D objects arises in many domains, for example:

Medical domain: 3D shape retrieval can be used for the detection of organ deformations and thus for diagnostic purposes.

Molecular biology: 3D retrieval approaches are used for structural classification, where molecules and proteins are modeled as 3D objects.

Meteorology: Similarity search in 3D data has been used to warn people allergic to different kinds of pollen. A confocal laser scan from a microscope gives 3D volumetric data of the pollen, from which its structure can be extracted. Based on this structure we can build a classifier for different pollen types.

Computer aided design: Retrieval in 3D databases can be used to support CAD tools, which are frequently used in manufacturing. When a new product is designed, it can be built from smaller 3D objects that are already in the database. Or, if some part of a 3D object needs to be substituted, e.g. to reduce costs, it can be replaced by a similar part from the database.

Army: 3D shape retrieval can be used for the classical friend/foe detection problem. The shape of an unidentified object is compared to shapes in the database; based on the result, we can say whether the object is a friend or a foe.

Movies and video games: Producers make heavy use of 3D models to enhance realism. Similarity search can be used on existing databases for the adaptation and reuse of 3D objects.

As we can see, there are diverse fields of usage for shape retrieval in 3D data, and so there are also different approaches to 3D data representation, manipulation and presentation. A complex 3D object can be represented as a set of smaller primitives that are combined into one. 3D acquisition devices usually produce voxelized object approximations or 3D point clouds, but other representations, like 3D grammars, also exist. Probably the most widely used representation is the approximation of a 3D object with a mesh of polygons (usually triangles). Basically, all mentioned representations can be used for 3D shape retrieval or can be converted to another representation suitable for similarity retrieval.

For decades, the similarity search of 3D shapes and their description was studied in the fields of computer vision, shape analysis and computational geometry. In computer vision, we usually try to segment a 3D object into 2D images and then match these segments to a set of a priori known reference 2D objects. Problems can obviously arise with invariance of the input (lighting conditions, view perspective, clutter, occlusion). But the decision problem itself is also difficult: What is the similarity notion?

What is the similarity threshold? How much tolerance is sustainable in a given application context, and which answer set sizes are required? A key part of the object retrieval task is also its efficiency, because we want to be able to search large databases quickly.

Feature vector paradigm

The feature vector paradigm is a standard method for multimedia retrieval when we do not know how to compare two objects directly. As complex and unstructured objects (like 3D models) from a universe Obj cannot be directly compared to each other, a simplified descriptor universe U is defined, consisting only of extracted (and potentially aggregated) important features of the objects, Lokoč [2010]. Assuming we have defined certain aspects of our 3D object, all these aspects are used to form a feature vector (descriptor) of this object, usually of very high dimension. Note that feature vectors can be indexed for more efficient retrieval. The resulting feature vectors should describe important characteristics of the modeled 3D object, which are determined by the utilized extraction method. Extraction methods can consider:

- properties of the 3D object's bounding box
- the distribution of normal vectors or curvature
- the Fourier transform of some spherical functions that characterize the object

It is naturally hard to find the right extraction method for a given similarity search task because no approach is suitable for all tasks at once. Every extraction method captures a different characteristic of the 3D object and so gives different results.

Definition 3.1. An extraction function e : Obj → U transforms a multimedia object from the database universe Obj into a descriptor in the descriptor universe U.

We usually do not work with the whole object universe Obj, but with a small subset X ⊆ Obj. Similarly, we define a descriptor subset S with respect to the original database X as S ⊆ U. After we choose an appropriate extraction method, the feature vector of every object in the database has to be evaluated. If we want to decide how similar one 3D object is to another, we only need to use a suitable distance function on the feature vectors of those two 3D objects. We can then produce a ranking of all database objects in ascending order with respect to their distance to a query object.

Definition 3.2. The distance measure of two 3D objects defined by their descriptors is a non-negative real number. Generally:

δ : U × U → R⁺₀

Smaller values of δ for two objects denote higher similarity. There are then two types of similarity queries in the descriptor database S:

Figure 3.1: The query relevant and retrieved objects visualization (the collection, the set A of retrieved objects and the set R of relevant objects)

Range queries: A range query range(o, r) returns, for some value r, all objects (descriptors) that are within distance r from o:

range(o, r) = {u ∈ S : δ(u, o) ≤ r}

k-nearest neighbors (k-NN) queries: Returns the k most similar objects (descriptors) from S to o, i.e. it returns the set kNN(o) = C such that:

C ⊆ S, |C| = k, ∀c ∈ C, ∀u ∈ S \ C : δ(o, c) ≤ δ(o, u)

An important family of similarity functions in vector spaces is the Minkowski family L_s, defined as:

L_s(v¹, v²) = (Σ_i |v¹_i − v²_i|^s)^{1/s}

where v¹ and v² are feature vectors from R^d and s ≥ 1. The most used functions from this family are the Manhattan distance L_1, the Euclidean distance L_2 and the maximum distance L_∞ = max_{1≤i≤d} |v¹_i − v²_i|.

Let us now introduce the notation according to fig. 3.1:

R ... the set of relevant objects
A ... the set of retrieved objects
R ∩ A ... the set of all retrieved objects that are relevant

Wesley [2010]

With respect to fig. 3.1, we can state important definitions for similarity search, which will be used in the following chapters.

Definition 3.3. Precision is the fraction of the number of retrieved objects that are relevant to the number of all retrieved objects:

precision = |R ∩ A| / |A|

Definition 3.4. Recall is the fraction of the number of retrieved objects that are relevant to the number of all relevant objects:

recall = |R ∩ A| / |R|
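The two query types and the Minkowski distances above are easy to make concrete. The following minimal Python sketch (the thesis's implementation language; the helper names and the toy descriptor set are illustrative, not taken from the thesis) evaluates a range query and a k-NN query over a small descriptor set:

```python
import numpy as np

def minkowski(v1, v2, s=2):
    """Minkowski distance L_s between two feature vectors (s = 2 is Euclidean)."""
    return np.sum(np.abs(v1 - v2) ** s) ** (1.0 / s)

def range_query(query, descriptors, r, s=2):
    """Return indices of all descriptors within distance r of the query."""
    return [i for i, u in enumerate(descriptors) if minkowski(query, u, s) <= r]

def knn_query(query, descriptors, k, s=2):
    """Return indices of the k descriptors closest to the query."""
    dists = [minkowski(query, u, s) for u in descriptors]
    return list(np.argsort(dists)[:k])

# Toy example: five 2-dimensional descriptors and one query descriptor.
S = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0], [0.5, 0.5]])
q = np.array([0.2, 0.2])
print(range_query(q, S, r=1.0))   # [0, 1, 4]
print(knn_query(q, S, k=2))       # the two nearest descriptors: [0, 4]
```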

Figure 3.2: The precision-recall curve, from Stanford [b]

Definition 3.5. The accuracy of an information retrieval system is the fraction of classifications that are correct:

accuracy = (|R ∩ A| + |R̄ ∩ Ā|) / (|R ∩ A| + |R ∩ Ā| + |R̄ ∩ A| + |R̄ ∩ Ā|)

Manning et al. [2008]

When we compute the accuracy of a classifier on some database, it is thus the fraction of all objects in the database whose classification is correct. We can express precision as a function p(r) of recall (visualized in fig. 3.2) and then define, following Su et al. [2015]:

Definition 3.6. (Average precision) computes the average value of p(r) over the interval from r = 0 to r = 1:

AveP = ∫₀¹ p(r) dr

Definition 3.7. Mean average precision is the mean of the average precision over all queries in Q:

MAP = (Σ_{q∈Q} AveP(q)) / |Q|

Beitzel et al. [2009]
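A small sketch of how these retrieval measures can be computed from a ranked result list. This is a discrete approximation of definitions 3.3-3.7 (the function and variable names are ours, not from the thesis):

```python
import numpy as np

def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set A against a relevant set R."""
    hits = len(set(retrieved) & set(relevant))
    return hits / len(retrieved), hits / len(relevant)

def average_precision(ranking, relevant):
    """Discrete approximation of AveP: average the precision at each relevant hit."""
    relevant = set(relevant)
    hits, precisions = 0, []
    for rank, obj in enumerate(ranking, start=1):
        if obj in relevant:
            hits += 1
            precisions.append(hits / rank)
    return float(np.mean(precisions)) if precisions else 0.0

def mean_average_precision(rankings, relevants):
    """MAP over a set of queries, each given by its ranking and its relevant set."""
    return float(np.mean([average_precision(r, rel)
                          for r, rel in zip(rankings, relevants)]))

# A query whose relevant objects are {1, 3}; ranking returned by some system.
print(average_precision([3, 7, 1, 9], relevant=[1, 3]))  # (1/1 + 2/3) / 2 = 0.833...
```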

4. Classification and Search Using DCNN

4.1 DCNN

In this section, we present deep convolutional neural networks, including their:

- main building block (the neuron)
- standard layer-wise organization
- forward propagation and backpropagation algorithms

The layers of convolutional neural networks are defined in the rest of the section.

4.1.1 Artificial Neural Networks

Motivation

As we can find in CS231n [2017a], artificial neural networks were developed for modeling biological neural systems. Their basic computational unit was named after its equivalent in the brain: the neuron. We will describe both systems with respect to fig. 4.1. Each neuron receives input signals from its dendrites and creates an output signal, which is then transmitted through its axon. The axon branches out and connects to the dendrites of other neurons. If the sum of all input signals is greater than a threshold, the neuron fires and sends a signal along its axon.

In the computational model used by artificial neural networks, the signal (x_i) travels from one neuron through a connection with a specific strength (w_i) to a second neuron. The multiplication of the connection strength w_i (called the weight of the connection) and the signal x_i gives one of the inputs to the second neuron. The sum of all input signals Σ_i x_i w_i plus a bias value b of the neuron is not compared with a threshold (as in the biological case); instead, we apply some activation function to it. The result of the activation function is the output of the neuron. The network is taught features of the input (by changing the weights of the connections) to make its prediction closer to the desired output. CS231n [2017a]

Neuron

We will use similar notation and some definitions from Schmid [2011].

Definition 4.1. A neuron is a triple (f, w, b), where:

f : R → R is an activation function (e.g. sigmoid, tanh, ReLU, described below in this section 4.1.1)
w ∈ R^n is a vector of weights
b ∈ R is a bias

(a) The biological neuron, from CS231n [2017c] (b) The computational model, from CS231n [2017d]
Figure 4.1: The neuron

For a neuron input x ∈ R^n, its output y ∈ R is computed as:

y = f(x^T w + b)   (4.1)

Neurons can be connected with weighted links, thus creating an artificial neural network.

Definition 4.2. An artificial neural network is a pair (N, C), where:

N is a set of neurons
C ⊆ N × N is a set of oriented connections

Layer-wise organization

As defined in the previous subsection, an artificial neural network is a collection of neurons connected to one another. Primarily, neural networks are also organized into distinct layers:

Definition 4.3. A layer l is a subset of the neurons in N: l ⊆ N

Definition 4.4. A layer-wise organized artificial neural network with c layers is an artificial neural network in which:

- N is a set of neurons
- C ⊆ N × N is an acyclic set of oriented connections
- L = {l_0, l_1, ..., l_{c−1}} is a set of neural network layers where
  l_0 ∪ l_1 ∪ ... ∪ l_{c−1} = N
  ∀i ≠ j : l_i ∩ l_j = ∅
  (n_1, n_2) ∈ C ⟹ ∃i : (n_1 ∈ l_i ∧ n_2 ∈ l_{i+1})
- the first layer is called the input layer and the last layer is called the output layer
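Definition 4.1 and eq. (4.1) translate directly into a few lines of Python (numpy). This is an illustrative sketch with made-up weights, not code from the thesis:

```python
import numpy as np

def neuron(x, w, b, f=lambda z: max(0.0, z)):
    """Output of a single neuron (f, w, b) for input x: y = f(x^T w + b)."""
    return f(float(np.dot(x, w)) + b)

print(neuron(np.array([0.5, 0.3]), np.array([0.2, 0.4]), b=0.6))  # ReLU(0.82) = 0.82
```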

Figure 4.2: The layer-wise organization (a 2-layer network with inputs x_0, x_1, weighted sums z^l_i, activations a^l_i and outputs y_0, y_1)

Definition 4.5. An artificial neural network with c layers is called a (c − 1)-layer neural network.

Definition 4.6. The cardinality of a layer l (|l|) is equal to the number of neurons in this layer.

The most commonly used layer type is a fully-connected layer, where all neurons from one layer are connected with every neuron in the adjacent layer (there is no connection between neurons in the same layer). In fig. 4.2, there is a 2-layer neural network example with one input layer, one hidden layer, one output layer, two inputs and two outputs.

Remark. In the following derivations, we will treat the bias in a special way. The reason is that for a specific layer the bias can be simulated as a new neuron with output 1 that is connected to all neurons in the layer. The weights of these connections can be modified so that each connected neuron gets a different input from the bias neuron. J. Matas [2015]

Forward propagation

We will use the notation suggested in the presentation of prof. J. Matas [2015] and visualized in fig. 4.2:

w^l_{i,j} is the weight of the connection between the i-th neuron of the (l − 1)-th layer and the j-th neuron of the l-th layer
z^l_i = Σ_j a^{l−1}_j w^l_{j,i} is the weighted sum of inputs into the i-th neuron in the l-th layer
f : R → R is an activation function
a^l_i = f(z^l_i) is the activation of the i-th neuron in the l-th layer

Let us assume we need to compute the input z^k_j of the j-th neuron in the k-th layer:

z^k_j = Σ_{i ∈ {0, ..., |l_{k−1}| − 1}} w^k_{i,j} a^{k−1}_i   (4.2)

To compute the output a^k_j of the j-th neuron in the k-th layer, we use the following equation:

a^k_j = f(z^k_j)   (4.3)

Algorithm 4.1. (Forward propagation algorithm) Let x ∈ R^n be the input of a c-layer artificial neural network, where n is equal to |l_0|. We compute the output y ∈ R^m, where m is equal to |l_c|, of the network as follows:

1. for h = 0 to n − 1:
2.     a^0_h = x_h
3. for k = 1 to c:
4.     for h = 0 to |l_k| − 1:
5.         compute the input z^k_h using eq. (4.2)
6.         compute the output a^k_h using eq. (4.3)
7. for h = 0 to m − 1:
8.     y_h = a^c_h
9. return y

J. Matas [2015]

In fig. 4.3, we show an example of the forward propagation on the same network architecture as in fig. 4.2, with the input x = (0.5, 0.3), the ReLU activation function f(x) = max(0, x) (section 4.1.1) and the weights:

w^1_{0,0} = 0.2, w^1_{1,0} = 0.4, w^1_{2,0} = 0.6
w^1_{0,1} = 0.1, w^1_{1,1} = 0.3, w^1_{2,1} = 0.5
w^2_{0,0} = 0.4, w^2_{1,0} = 0.6, w^2_{2,0} = 0.8
w^2_{0,1} = 0.7, w^2_{1,1} = 0.9, w^2_{2,1} = 0.1

Now we proceed according to the algorithm:

1. Assign the input:
   a^0_0 = x_0 = 0.5
   a^0_1 = x_1 = 0.3
2. Compute the input and output in the hidden layer:
   z^1_0 = w^1_{0,0} a^0_0 + w^1_{1,0} a^0_1 + w^1_{2,0} · 1 = 0.2 · 0.5 + 0.4 · 0.3 + 0.6 = 0.82
   a^1_0 = f(z^1_0) = 0.82
   z^1_1 = w^1_{0,1} a^0_0 + w^1_{1,1} a^0_1 + w^1_{2,1} · 1 = 0.1 · 0.5 + 0.3 · 0.3 + 0.5 = 0.64
   a^1_1 = f(z^1_1) = 0.64
3. Compute the input and output in the output layer:
   z^2_0 = w^2_{0,0} a^1_0 + w^2_{1,0} a^1_1 + w^2_{2,0} · 1 = 0.4 · 0.82 + 0.6 · 0.64 + 0.8 = 1.512
   a^2_0 = f(z^2_0) = 1.512
   z^2_1 = w^2_{0,1} a^1_0 + w^2_{1,1} a^1_1 + w^2_{2,1} · 1 = 0.7 · 0.82 + 0.9 · 0.64 + 0.1 = 1.25
   a^2_1 = f(z^2_1) = 1.25
4. Assign the output:
   y_0 = a^2_0
   y_1 = a^2_1
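The worked example above can be reproduced with a short numpy sketch of algorithm 4.1. The weights are stored per layer as matrices, with the bias folded in as the extra input 1 exactly as in the remark above; this is our illustrative code, not the thesis implementation:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, weights, f=relu):
    """Algorithm 4.1: propagate input x through layers given by weight matrices.
    weights[l][i, j] = w^l_{i,j}; the last row of each matrix holds the bias weights."""
    a = np.asarray(x, dtype=float)
    for W in weights:
        a_ext = np.append(a, 1.0)      # simulate the bias neuron with output 1
        z = a_ext @ W                  # z^l_j = sum_i w^l_{i,j} * a^{l-1}_i
        a = f(z)                       # a^l_j = f(z^l_j)
    return a

W1 = np.array([[0.2, 0.1],   # w^1_{0,0}, w^1_{0,1}
               [0.4, 0.3],   # w^1_{1,0}, w^1_{1,1}
               [0.6, 0.5]])  # bias weights w^1_{2,0}, w^1_{2,1}
W2 = np.array([[0.4, 0.7],
               [0.6, 0.9],
               [0.8, 0.1]])
print(forward([0.5, 0.3], [W1, W2]))  # [1.512, 1.25], as in the worked example
```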

Backpropagation algorithm

Figure 4.3: The forward propagation

We will derive the backpropagation algorithm following Makin [2006] and intersperse the derivation with the example from fig. 4.3. Every training algorithm for neural networks tries to change the weights of the connections in the network so that the predicted outputs for a set of inputs are close to the real ones. This closeness is defined by an error function E.

Definition 4.7. The input of a c-layer artificial neural network is x ∈ R^n, where n is equal to |l_0|.

Definition 4.8. The output of a c-layer artificial neural network is y ∈ R^m, where m is equal to |l_c|.

Definition 4.9. The target output of a c-layer artificial neural network is t ∈ R^m, where m is equal to |l_c|; t is the presumed output for the input x.

Definition 4.10. The training set T is a set of ordered pairs (x, t), where x is the input and t is the target output.

The algorithm iterates over every pair in the training set ((x, t) ∈ T) and performs four consecutive steps:

- Use forward propagation for the input x to compute the predicted output y
- Compute the error E from the predicted output y and the target output t
- Backpropagate the error signal and compute the partial derivatives of the parameters based on it
- Adapt the weights

Let T be a training set and (x, t) ∈ T be an input and a target output of a c-layer artificial neural network. The predicted output y of this neural network is the result of the forward propagation algorithm, defined in section 4.1.1, with the input x. We can define the error function:

E = (1/2) Σ_i (y_i − t_i)²   (4.4)

In this method, the weights are moved in the opposite direction of their derivative:

Δw^l_{i,j} = −α ∂E/∂w^l_{i,j}   (4.5)

The parameter α is called the learning rate and scales the step size. We can expand the partial derivative with the chain rule as follows:

∂E/∂w^l_{i,j} = ∂E/∂a^l_j · ∂a^l_j/∂z^l_j · ∂z^l_j/∂w^l_{i,j}   (4.6)

In the following derivations, we will use the first two fractions of the previous equation as a single quantity (the error term):

δ^l_j = ∂E/∂a^l_j · ∂a^l_j/∂z^l_j   (4.7)

We will consider three situations:

- the computation of the error signal on the output layer
- the computation of the derivative of the weights between the last hidden layer and the output layer
- the computation of the error signal on a hidden layer and the derivative of the weights between other layers

The error signal on the last layer

In the case that l is the output layer, this quantity can be computed as the derivative of eq. (4.4):

∂E/∂a^l_j = −(t_j − a^l_j)   (4.8)

since a^l_j = y_j from algorithm 4.1.

Example: We can now compute the error on the last layer for our example in fig. 4.4. Let us consider that the target output is t = (0.3, 0.5):

∂E/∂a^2_0 = (1.512 − 0.3) = 1.212,  ∂E/∂a^2_1 = (1.25 − 0.5) = 0.75

The derivative of weights between the last hidden and output layers

We now use eq. (4.6) to compute the derivative of the weights. Let us split the computation to make it clearer. We already derived the error signal on the output layer in eq. (4.8) above:

∂E/∂a^l_j = −(t_j − a^l_j)

As a^l_j = f(z^l_j), the derivative ∂a^l_j/∂z^l_j is only the derivative of the activation function:

∂a^l_j/∂z^l_j = f'(z^l_j)

Figure 4.4: The error signal on the last layer

We know that z^l_j is the weighted sum of the inputs into the j-th neuron of the l-th layer and hence:

∂z^l_j/∂w^l_{i,j} = a^{l−1}_i

Now we only combine it together:

∂E/∂w^l_{i,j} = −(t_j − a^l_j) · f'(z^l_j) · a^{l−1}_i   (4.9)

In convolutional neural networks, the most common activation function is the ReLU function, which is explained in more detail later in this section. Its derivative is equal to 1 if the input is greater than 0 and equal to 0 otherwise.

Example: We can now compute the derivative of the weights between the hidden layer and the last layer for our example in fig. 4.5. Consider that we want to compute ∂E/∂w^2_{0,0} and ∂E/∂w^2_{0,1}:

∂E/∂w^2_{0,0} = 1.212 · f'(1.512) · 0.82 ≈ 0.994
∂E/∂w^2_{0,1} = 0.75 · f'(1.25) · 0.82 = 0.615

Let us now divide the last situation into two separate tasks.

Figure 4.5: The derivative of weights between the hidden layer and the last layer

The error signal on a hidden layer

If we now suppose that layer l is a hidden layer, ∂E/∂a^l_j is harder to compute. We need to consider how the error from a^l_j was propagated to the activations of the next layer l + 1:

∂E/∂a^l_j = Σ_i ∂E/∂a^{l+1}_i · ∂a^{l+1}_i/∂z^{l+1}_i · ∂z^{l+1}_i/∂a^l_j   (4.10)

The first two derivatives form the error term of the next layer:

∂E/∂a^{l+1}_i · ∂a^{l+1}_i/∂z^{l+1}_i = δ^{l+1}_i

And as z^{l+1}_i = Σ_j a^l_j w^{l+1}_{j,i}, the last derivative is equal to:

∂z^{l+1}_i/∂a^l_j = w^{l+1}_{j,i}

We can rewrite eq. (4.10):

∂E/∂a^l_j = Σ_i δ^{l+1}_i w^{l+1}_{j,i}   (4.11)

Example: Now we compute the error signal on the activation of a neuron in the hidden layer, fig. 4.6. Let us consider that we want to compute ∂E/∂a^1_0:

∂E/∂a^1_0 = 1.212 · f'(1.512) · 0.4 + 0.75 · f'(1.25) · 0.7 ≈ 1.010

Figure 4.6: Computation of the error signal of one neuron in the hidden layer

The derivative of weights between other layers

We already know the error signal of the hidden layer from the derivations above:

∂E/∂a^l_j = Σ_i δ^{l+1}_i w^{l+1}_{j,i}

And we also derived the following derivative above:

∂a^l_j/∂z^l_j = f'(z^l_j)

Now we combine everything. Since ∂z^l_j/∂w^l_{i,j} = a^{l−1}_i, we get:

∂E/∂w^l_{k,j} = (Σ_i δ^{l+1}_i w^{l+1}_{j,i}) · f'(z^l_j) · a^{l−1}_k   (4.12)

Example: We can finally compute one of the derivatives of the error with respect to a weight between the input layer and the hidden layer. For example, in fig. 4.7 we want to compute ∂E/∂w^1_{0,0}:

∂E/∂w^1_{0,0} = (1.212 · f'(1.512) · 0.4 + 0.75 · f'(1.25) · 0.7) · f'(0.82) · 0.5 ≈ 0.505

Figure 4.7: The derivative of one weight between the input layer and the hidden layer

As we can see, especially from fig. 4.7, we first need to compute for each hidden layer all the error terms of the adjacent layer. So the algorithm starts from the output layer and iterates backward over all layers and weights. After this step, all weights are updated as in eq. (4.5).

Now we argue that the backpropagation algorithm is linear in the number of weights. Let k be the number of all weights in the neural network. The forward propagation is done in linear time (O(k)), because this pass iterates over all layers, neurons and weights, while every weight is used only once. The error of the prediction is computed in linear time with respect to the number of neurons in the last layer (O(m)). When backpropagating the error signal and computing the partial derivatives based on it, we also use every weight exactly once, so this step is again linear (O(k)) in the number of weights. The adaptation of weights iterates over all weights and is thus also linear in the number of weights (O(k)).
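The gradients computed by hand above can be checked with a small numpy sketch of the backward pass for this 2-layer network (same weight layout as the forward-pass sketch earlier; again illustrative code under our own naming, not the thesis implementation):

```python
import numpy as np

relu = lambda z: np.maximum(0.0, z)
relu_grad = lambda z: (z > 0).astype(float)

def backprop(x, t, weights):
    """Return dE/dW for each layer of a fully connected ReLU network,
    with E = 0.5 * sum((y - t)^2) and the bias folded in as an extra input 1."""
    # Forward pass, remembering the extended inputs and the weighted sums of every layer.
    a, acts, zs = np.asarray(x, float), [], []
    for W in weights:
        a_ext = np.append(a, 1.0)
        z = a_ext @ W
        acts.append(a_ext)
        zs.append(z)
        a = relu(z)
    grads, dE_da = [], a - np.asarray(t, float)       # dE/da on the output layer
    for W, a_ext, z in zip(weights[::-1], acts[::-1], zs[::-1]):
        delta = dE_da * relu_grad(z)                  # error term delta^l_j
        grads.insert(0, np.outer(a_ext, delta))       # dE/dw^l_{i,j} = delta^l_j * a^{l-1}_i
        dE_da = W[:-1] @ delta                        # eq. (4.11), without the bias row
    return grads

W1 = np.array([[0.2, 0.1], [0.4, 0.3], [0.6, 0.5]])
W2 = np.array([[0.4, 0.7], [0.6, 0.9], [0.8, 0.1]])
g1, g2 = backprop([0.5, 0.3], [0.3, 0.5], [W1, W2])
print(round(g2[0, 0], 3), round(g1[0, 0], 3))  # 0.994 and 0.505, as derived above
```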

Rectified linear unit

We will use the explanations from Krizhevsky et al. [2012] and the Stanford course CS231n [2017a]. The rectified linear unit computes the activation function f(x) = max(0, x). The derivative of the ReLU, which we need for backpropagation, is:

f'(x) = 1 if x > 0, and 0 otherwise.

It was found that the ReLU considerably accelerates the convergence of stochastic gradient descent compared to the sigmoid or tanh functions. Moreover, unlike the sigmoid or tanh functions, it is computationally less demanding. Unfortunately, ReLU units can be fragile during training: for example, a large gradient flowing through a ReLU neuron could cause the weights to update in such a way that the neuron never activates on any input again.

4.1.2 Convolutional Layer

This topic is well described in one of the courses available online from Stanford University, CS231n [2017b]. We will use this source to explain the concept of convolutional neural networks and their parameters and hyperparameters. To derive the forward pass and the backpropagation algorithm, we will use two online sources, Gibianski [2014] and Kafunah [2016], but keep the notation from the previous section.

Motivation

Convolutional layers are the main building block of convolutional neural networks. The parameter of this layer is a set of learnable filters that are not large spatially but extend over all input channels. During the forward pass, we convolve (see the proper definition below) each filter across the width and the height of the input volume and compute dot products between the filter entries and the input at every position. Sliding the filter over the width and the height of the input volume produces a 2-dimensional activation map that gives the filter responses at every spatial position. Intuitively, the network makes filters activate whenever they see some type of visual feature. Each filter creates one 2D activation map; the number of output channels is therefore equal to the number of used filters. CS231n [2017b] In the following derivations, we describe the concept of a general d-dimensional convolutional layer, while in the rest of the thesis we use three-dimensional convolutional layers.

Input, parameters, hyperparameters and output

Let us suppose we have a d-dimensional convolutional layer that accepts an input volume E_1 × E_2 × ... × E_d × C, where C is the number of channels. The layer uses four hyperparameters:

K ... the number of kernels
F ... the spatial size of the kernels in all dimensions
S ... the stride (by how many steps the region's position is advanced in all dimensions)

P ... the amount of zero padding (the input volume is padded with zeros around the border)

The spatial size of the kernels F satisfies F ≤ min(E_1, ..., E_d). All kernels have the same number of channels C, i.e. the same as the input. The output then has the following volume: E'_1 × E'_2 × ... × E'_d × K, where:

E'_i = (E_i − F + 2P) / S + 1

It would be better if each input region had its own kernel, i.e. a feature learned for that specific position, but this would bring an overwhelming need for memory, so we use the concept of parameter sharing: the whole input uses K filters, which are the same for each region. In addition, the convolutional layer also uses K biases (one for each filter). CS231n [2017b]

Forward propagation algorithm

Definition 4.11. Let d be the number of dimensions of the filters in a layer, D = {D_1, D_2, ..., D_d} be the dimensions of the filters and E = {E_1, E_2, ..., E_d} be the dimensions of the output of the previous layer such that ∀i : D_i ≤ E_i, and let C be the number of channels of both the filters and the output. Then let A ∈ R^{E_1 × ... × E_d × C} be the output of the neurons in the previous layer, W ∈ R^{D_1 × ... × D_d × C} be a filter and b ∈ R be the bias of the kernel W. The input to a convolutional layer at position p = (p_1, ..., p_d) is Z_p:

Z_p = b + Σ_{c=0}^{C−1} Σ_{d_1=0}^{D_1−1} ... Σ_{d_d=0}^{D_d−1} A_{p_1+d_1, ..., p_d+d_d, c} · W_{d_1, ..., d_d, c}   (4.13)

The forward propagation for the output A^{l−1} of the previous layer then consists of the convolution from definition 4.11 at each position with every filter. As we can see, the output Z of the convolution with one filter is a d-dimensional activation map with one channel. We can then concatenate the activation maps of all filters in the layer and call the result the input Z^l, which has d dimensions and K channels. The output of the convolutional layer is then computed as:

A^l = f(Z^l)

where the activation function f is applied to each element. Let us now consider an example of an input into a convolutional layer with S = 1, P = 1, K = 1, C = 3 and F = 3. The forward pass can be seen in fig. 4.8. We can compute, e.g., Z_{0,0,0} by multiplying the filter element-wise with the zero-padded input region at position (0, 0, 0) over all three channels, summing the products and adding the bias.

Figure 4.8: The convolution in a convolutional layer (the output A^{l−1} of the previous layer, the filter W, the zero-padded input Z^l and the bias b)
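A direct, unoptimized numpy sketch of the forward pass from definition 4.11 for the three-dimensional case used in this thesis; the function name and the stride/padding handling are our own illustrative choices:

```python
import numpy as np

def conv3d_forward(A, W, b, stride=1, pad=1):
    """Naive 3D convolution of one filter W (F x F x F x C) with bias b over
    an input volume A (E1 x E2 x E3 x C), following eq. (4.13)."""
    F = W.shape[0]
    A = np.pad(A, [(pad, pad)] * 3 + [(0, 0)])           # zero padding P
    E1, E2, E3, _ = A.shape
    out = np.zeros(((E1 - F) // stride + 1,
                    (E2 - F) // stride + 1,
                    (E3 - F) // stride + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                region = A[i * stride:i * stride + F,
                           j * stride:j * stride + F,
                           k * stride:k * stride + F, :]
                out[i, j, k] = b + np.sum(region * W)     # Z_p = b + sum A * W
    return out

A = np.random.rand(7, 7, 7, 3)                            # toy input, C = 3
W = np.random.rand(3, 3, 3, 3)                            # one 3x3x3 filter with C = 3
print(conv3d_forward(A, W, b=0.1, stride=1, pad=1).shape)  # (7, 7, 7)
```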

Backpropagation

In the previous section, we introduced the forward pass in convolutional layers; now we need to understand how to compute the error signal for the previous layer and how to update the parameters of the layer. We will follow the explanations from Gibianski [2014] and Stanford [a]. Let us suppose the adjacent layer was a pooling layer or a convolutional layer, so we have the error function E and the error signal ∂E/∂A^l on this layer (both layer types that we allow as the adjacent layer route the error signal back to the previous layer, see below).

Now assume we want to compute the gradient of one kernel W at a position p = (p_1, ..., p_d) (let us consider only the one-channel case). We use the chain rule, as we already did for backpropagation in the fully connected layer:

∂E/∂W_{p_1,...,p_d} = Σ_{p'_1=0}^{E_1−D_1} ... Σ_{p'_d=0}^{E_d−D_d} ∂E/∂A^l_{p'_1,...,p'_d} · ∂A^l_{p'_1,...,p'_d}/∂Z^l_{p'_1,...,p'_d} · ∂Z^l_{p'_1,...,p'_d}/∂W_{p_1,...,p_d}   (4.14)

Let us split the computation into three parts, one for each derivative:

- From eq. (4.13) (forward propagation), the derivative of the input with respect to the kernel is equal to the output of the previous layer: ∂Z^l_{p'_1,...,p'_d}/∂W_{p_1,...,p_d} = A^{l−1}_{p'_1+p_1,...,p'_d+p_d}
- The derivative of the output with respect to the input is only the derivative of the activation function: ∂A^l_{p'_1,...,p'_d}/∂Z^l_{p'_1,...,p'_d} = f'(Z^l_{p'_1,...,p'_d})
- The error signal for this layer was already computed by the adjacent layer, thus ∂E/∂A^l_{p'_1,...,p'_d} is already known.

We can then combine these three parts again:

∂E/∂W_{p_1,...,p_d} = Σ_{p'_1=0}^{E_1−D_1} ... Σ_{p'_d=0}^{E_d−D_d} ∂E/∂A^l_{p'_1,...,p'_d} · f'(Z^l_{p'_1,...,p'_d}) · A^{l−1}_{p'_1+p_1,...,p'_d+p_d}   (4.15)

We will now derive the gradient for the bias b. This is a little easier than in the case of the kernels, because the bias is only added to the input at each position and is not weighted by anything, thus the derivative of the error with respect to the bias is:

∂E/∂b = Σ_{p'_1=0}^{E_1−D_1} ... Σ_{p'_d=0}^{E_d−D_d} ∂E/∂A^l_{p'_1,...,p'_d} · f'(Z^l_{p'_1,...,p'_d})   (4.16)

The last task needed to complete the backpropagation algorithm in convolutional layers is to compute the error signal of the previous layer:

∂E/∂A^{l−1}_{p_1,...,p_d} = Σ_{p'_1=0}^{D_1−1} ... Σ_{p'_d=0}^{D_d−1} ∂E/∂Z^l_{p_1−p'_1,...,p_d−p'_d} · ∂Z^l_{p_1−p'_1,...,p_d−p'_d}/∂A^{l−1}_{p_1,...,p_d} = Σ_{p'_1=0}^{D_1−1} ... Σ_{p'_d=0}^{D_d−1} ∂E/∂Z^l_{p_1−p'_1,...,p_d−p'_d} · W_{p'_1,...,p'_d}   (4.17)

4.1.3 Pooling Layer

We will explain the concept of pooling layers following the Stanford University course CS231n [2017b].

Motivation

This type of layer is used to progressively reduce the spatial size of the representation and also the number of parameters; hence it helps to control overfitting. The layer operates independently in every channel and resizes the spatial dimensions with an operation. The operation is applied to every region of the chosen spatial size and its output is used as a representative of this region. Pooling can use, e.g.:

Max pooling: choose the maximum value of the region
Min pooling: choose the minimum value of the region
Average pooling: compute the average over the region
L_2-norm pooling: compute the L_2 norm of the region

Input, parameters, hyperparameters and output

The 2D pooling layer accepts a volume of size W_1 × H_1 × C_1, where W_1 is the width, H_1 is the height and C_1 is the number of channels. It needs two hyperparameters:

F ... the spatial size of the used regions in both dimensions
S ... the stride (by how many steps the region's position is advanced in all dimensions)

The output has volume W_2 × H_2 × C_2, where:

W_2 = (W_1 − F)/S + 1
H_2 = (H_1 − F)/S + 1
C_2 = C_1

The 3D pooling layer additionally accepts the depth D_1 (W_1 × H_1 × D_1 × C_1); the output is then W_2 × H_2 × D_2 × C_2, where D_2 = (D_1 − F)/S + 1. This layer does not use any parameters.

Backpropagation

As we mentioned in the motivation part of this subsection, this layer applies an operation to regions of the input and for each region chooses a representative with respect to the used operation. That is the forward propagation step, demonstrated in fig. 4.9.

Figure 4.9: The max-pool (2 × 2) layer forward pass

As this type of layer does not use any weights, there is nothing to update, but we still want to send the error to the previous layer. We describe backpropagation only for the max-pooling layer. The backward pass for a max(x, y) operation has a simple interpretation: it only routes the gradient to the input that had the highest value in the forward pass. So during the forward pass we need to remember the index of the maximum value in each region, and during the backward pass the error signal of the region is passed only to this index. Any other input element of the region gets an error signal equal to zero. With this, we have computed the error signal ∂E/∂A^{l−1} for the previous layer. CS231n [2017b]
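A compact numpy sketch of 2 × 2 max pooling in 2D with the gradient routing described above (the argmax index of each region is remembered during the forward pass; names and shapes are illustrative only, not the thesis code):

```python
import numpy as np

def maxpool_forward(A, k=2):
    """Forward pass: for each k x k region keep the maximum and remember its index."""
    H, W = A.shape
    out = np.zeros((H // k, W // k))
    argmax = {}
    for i in range(H // k):
        for j in range(W // k):
            region = A[i * k:(i + 1) * k, j * k:(j + 1) * k]
            idx = np.unravel_index(np.argmax(region), region.shape)
            argmax[(i, j)] = (i * k + idx[0], j * k + idx[1])
            out[i, j] = region[idx]
    return out, argmax

def maxpool_backward(dE_dout, argmax, input_shape):
    """Backward pass: route each region's error only to the remembered index."""
    dE_dA = np.zeros(input_shape)
    for (i, j), (r, c) in argmax.items():
        dE_dA[r, c] = dE_dout[i, j]
    return dE_dA

A = np.array([[1., 3., 2., 1.],
              [4., 2., 0., 5.],
              [0., 1., 1., 0.],
              [2., 0., 0., 1.]])
out, amax = maxpool_forward(A)
print(out)                                     # [[4., 5.], [2., 1.]]
print(maxpool_backward(np.ones((2, 2)), amax, A.shape))
```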

4.1.4 Local Response Normalization

The concept of this type of layer is well described by Joshi [2016]; for the forward pass we follow the explanation of Krizhevsky et al. [2012].

Motivation

In neurobiology, there is a concept called lateral inhibition, which is the capacity of an excited neuron to decrease the activity of its neighbors. This creates one significant peak, a local maximum. A local response normalization layer does the same in a convolutional neural network architecture. Joshi [2016] Today this type of layer is not so common anymore, because its contribution has been shown to be minimal, if any. Nowadays we have better training algorithms, regularization techniques and, e.g., normalized datasets; all of this helps the performance much more than LRN layers. So we describe only one of the implementation approaches (Krizhevsky's) on a 2D CNN.

Forward pass

ReLUs have the desirable property that they do not require any input normalization to prevent them from saturating. If at least some training examples produce a positive input to a ReLU, some learning will happen in that neuron. However, we still find that the following local normalization scheme aids generalization. Let a^i_{x,y} be the activity of a neuron computed by applying kernel i at position (x, y) and then applying the ReLU nonlinearity; the response-normalized activity b^i_{x,y} is given by:

b^i_{x,y} = a^i_{x,y} / (k + α Σ_{j=max(0, i−n/2)}^{min(N−1, i+n/2)} (a^j_{x,y})²)^β   (4.18)

where the sum runs over n adjacent kernel maps at the same spatial position, and N is the total number of kernels in the layer. The ordering of the kernel maps is, of course, arbitrary and determined before the training begins. This sort of response normalization implements a form of lateral inhibition inspired by the type found in real neurons, creating a competition for big activities among neuron outputs computed with different kernels. Krizhevsky et al. [2012]

4.1.5 Fully Connected Layer

This layer corresponds to the normal fully connected layer, as described in the section on artificial neural networks above. The 2D or 3D output of the previous layer is linearized and used as the input to this layer. The backpropagation algorithm is the same as described above.

4.1.6 Dropout

As stated in Krizhevsky et al. [2012], it would be much better if we had several deep convolutional neural networks whose results were combined into one. This approach is very successful in reducing test errors but appears to be too expensive. There is, however, a technique that does approximately the same thing and is very efficient: it is called dropout. This technique consists of setting the output of each neuron in the layer to zero with a specified probability; the most common choice is probability 0.5. The dropped-out neurons do not contribute to the forward pass and do not participate in backpropagation at all. So every time an input is presented to our CNN, the network chooses one of the architectures that share weights. If we did not use dropout in our fully connected layers, our network would exhibit substantial overfitting.

4.2 Transformation of Model for DCNN

The input of a 3D convolutional neural network, as described in the previous section, is a three-dimensional matrix (width × height × depth) with one channel. If we had information about, e.g., the RGB color of every point, the input would have three channels. Now let us assume we have a classic 3D object defined by its vertices and faces. An example of this type of model, as visualized in MeshLab, is shown in fig. 4.10a. We need to transform this model so that it corresponds to the expected input of a 3D CNN.

Definition 4.12. A 3D occupancy grid is a 3D map of cubes, where each cube carries information about its occupancy.

Let X be a set of 3D points and F be a set of triangular faces formed by three points of X. Let f ∈ F and a_f, b_f, c_f be the lengths of its sides.

(a) A model (b) Random points on faces (c) An occupancy grid of voxels
Figure 4.10: An example of the transformation

In our work, we assume that faces with longer sides are more important for the overall shape of the object, so we first select only those:

F' = {f ∈ F | a_f ≥ t ∧ b_f ≥ t ∧ c_f ≥ t}   (4.19)

where t is a threshold. It would surely be easier to choose the vertices of each selected face as its representative points, but for faces with a big area this approach would be very sparse. Thus, we choose random points on the selected faces in F'. We follow the approach of Osada et al. [2002]. Let x, y, z ∈ R³ be the vertices of a triangle and r_1, r_2 ~ U[0, 1]. Then we choose the point:

p = (1 − √r_1) x + √r_1 (1 − r_2) y + √r_1 r_2 z   (4.20)

Intuitively, r_1 sets the percentage from vertex x to the opposing edge, while r_2 represents the percentage along that edge. From each face in F' we create k points using eq. (4.20) and add them to the set P. Let min(P_i) be the least element in the i-th dimension over all points in P, max(P) be the greatest value over all dimensions of all points in P, and x[i], where x ∈ P and i ∈ {0, 1, 2}, be the element in the i-th dimension of the point x. We want to create a normalized set of points P'. For each point x ∈ P, we create a new point y ∈ P', where:

∀i ∈ {0, 1, 2} : y[i] = (x[i] − min(P_i)) / max(P)

We can see an example of P' in fig. 4.10b. With this normalized set of points P', we create an occupancy grid of size n³ using the following algorithm.

Algorithm 4.2. Let n be the size of all dimensions of an occupancy grid and P' a set of 3D points that we want to register into the grid. The algorithm returns the occupancy grid of size n³ with one channel:

0. Occ ← zeros(n, n, n, 1)
1. foreach p ∈ P':
2.     x ← round(p[0] · (n − 1)), y ← round(p[1] · (n − 1)), z ← round(p[2] · (n − 1))
3.     Occ[x, y, z, 0] ← 1
4. return Occ

The occupancy grid created by algorithm 4.2 is then used as the input to the convolutional neural network. With respect to our example, it is visualized in fig. 4.10c.
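A Python sketch of the whole transformation pipeline of section 4.2: sampling points on a triangle with eq. (4.20), a min-max style normalization (slightly simplified so the result always lies in [0, 1]), and the voxelization of algorithm 4.2. The function names, the uniform per-face k and the grid size n = 30 are illustrative assumptions, not values from the thesis:

```python
import numpy as np

def sample_on_triangle(x, y, z, k):
    """k random points on the triangle (x, y, z) using eq. (4.20)."""
    r1, r2 = np.random.rand(k, 1), np.random.rand(k, 1)
    s1 = np.sqrt(r1)
    return (1 - s1) * x + s1 * (1 - r2) * y + s1 * r2 * z

def normalize(P):
    """Shift by the per-dimension minimum and scale into [0, 1] by the global maximum."""
    P = P - P.min(axis=0)
    return P / P.max()

def occupancy_grid(P_norm, n=30):
    """Algorithm 4.2: register normalized points into an n^3 grid with one channel."""
    occ = np.zeros((n, n, n, 1), dtype=np.float32)
    idx = np.round(P_norm * (n - 1)).astype(int)
    occ[idx[:, 0], idx[:, 1], idx[:, 2], 0] = 1.0
    return occ

# Toy model: two triangles, 100 sampled points each, registered into a 30^3 grid.
tri1 = (np.array([0., 0., 0.]), np.array([1., 0., 0.]), np.array([0., 1., 0.]))
tri2 = (np.array([0., 0., 1.]), np.array([1., 1., 1.]), np.array([0., 1., 0.]))
P = np.vstack([sample_on_triangle(*tri1, k=100), sample_on_triangle(*tri2, k=100)])
grid = occupancy_grid(normalize(P))
print(grid.shape, int(grid.sum()))   # (30, 30, 30, 1) and the number of occupied voxels
```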

4.3 Used CNN Architectures

We trained many architectures with different parameters, but only some of them were able to pass 85% accuracy on the validation dataset of SHREC16. In the following chapter 6, we will consider only the architectures shown in table 4.1. Our architectures use the following notation:

INPUT(W × H × D × C): an input layer with a W × H × D × C output volume
CONV(K, F, S): a convolutional layer with K filters of size F × F × F and stride S
POOL(k, method): a pooling layer over regions of spatial size k with the specified method
FC(k): a fully connected layer with k neurons
Dropout(p): a dropout layer, as defined above, with probability p
OUTPUT(k): an output layer with k neurons

Table 4.1: Used CNN architectures
ID 0: INPUT(n × n × n × 1) ⇒ CONV(8, 3, 2) ⇒ CONV(16, 3, 1) ⇒ POOL(2, MAX) ⇒ FC(256) ⇒ Dropout(0.5) ⇒ OUTPUT(55)
ID 1: INPUT(n × n × n × 1) ⇒ CONV(8, 5, 2) ⇒ CONV(16, 3, 1) ⇒ POOL(2, MAX) ⇒ FC(256) ⇒ Dropout(0.5) ⇒ OUTPUT(55)
ID 2: INPUT(n × n × n × 1) ⇒ CONV(16, 5, 2) ⇒ CONV(16, 3, 1) ⇒ POOL(2, MAX) ⇒ FC(256) ⇒ Dropout(0.5) ⇒ OUTPUT(55)
ID 3: INPUT(n × n × n × 1) ⇒ CONV(8, 5, 1) ⇒ POOL(2, MAX) ⇒ CONV(16, 3, 1) ⇒ POOL(2, MAX) ⇒ FC(256) ⇒ Dropout(0.5) ⇒ OUTPUT(55)
ID 4: INPUT(n × n × n × 1) ⇒ CONV(16, 5, 1) ⇒ POOL(2, MAX) ⇒ CONV(16, 3, 1) ⇒ POOL(2, MAX) ⇒ FC(256) ⇒ Dropout(0.5) ⇒ OUTPUT(55)
ID 5: INPUT(n × n × n × 1) ⇒ CONV(16, 5, 1) ⇒ POOL(2, MAX) ⇒ CONV(32, 3, 1) ⇒ POOL(2, MAX) ⇒ FC(256) ⇒ Dropout(0.5) ⇒ OUTPUT(55)
ID 6: INPUT(n × n × n × 1) ⇒ CONV(16, 7, 1) ⇒ POOL(2, MAX) ⇒ CONV(32, 5, 1) ⇒ POOL(2, MAX) ⇒ FC(256) ⇒ Dropout(0.5) ⇒ OUTPUT(55)
ID 7: INPUT(n × n × n × 1) ⇒ CONV(16, 5, 1) ⇒ POOL(2, MAX) ⇒ CONV(16, 5, 1) ⇒ POOL(2, MAX) ⇒ FC(256) ⇒ Dropout(0.5) ⇒ OUTPUT(55)

In fig. 4.11 there is an example of a convolutional neural network. It has one convolutional layer with K = 32 filters of size 5 × 5 × 3, stride S = 1, C = 1 and zero padding P = 2. It is followed by a max-pooling layer with spatial size 2, then another convolutional layer with kernels of spatial size 5 × 5 but with 32 channels, K = 48 filters, stride S = 1 and P = 2, and another max-pooling layer with spatial size 2. Its output is flattened and goes into two fully connected layers (one with 768 neurons and the other with 256 neurons), where the latter is fully connected to the output layer neurons.

Figure 4.11: The CNN architecture (this figure is generated by adapting the code from gwding/draw_convnet)
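The thesis implementation used TensorFlow directly. Purely as an illustration of the notation above, architecture 3 from table 4.1 could be written in today's tf.keras API roughly as follows; the 30³ input resolution, the padding choice and the optimizer settings are assumptions of this sketch, not values taken from the thesis:

```python
import tensorflow as tf

def build_architecture_3(n=30, num_classes=55):
    """INPUT(n,n,n,1) => CONV(8,5,1) => POOL(2,MAX) => CONV(16,3,1) => POOL(2,MAX)
       => FC(256) => Dropout(0.5) => OUTPUT(55), as in table 4.1 (ID 3)."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n, n, n, 1)),
        tf.keras.layers.Conv3D(8, kernel_size=5, strides=1, padding="same",
                               activation="relu"),
        tf.keras.layers.MaxPool3D(pool_size=2),
        tf.keras.layers.Conv3D(16, kernel_size=3, strides=1, padding="same",
                               activation="relu"),
        tf.keras.layers.MaxPool3D(pool_size=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

model = build_architecture_3()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```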

4.4 Object retrieval

We described the object retrieval task and its approaches in chapter 3. Now we introduce our methods, which will be used in chapter 6.

4.4.1 Using Our Classifier

Based on the good results of our convolutional neural networks on object recognition (see chapter 6), we decided to use our classifier for shape retrieval as well. Let us assume we have a database of objects D and we want to compute the mean average precision on this database using our classifier. A forward pass through our trained convolutional neural network gives a vector whose length equals the number of classes; each element of this vector gives the likelihood that the object o belongs to the corresponding class. So for each class we can create a list of all objects, ordered descending by their likelihood of belonging to this class. We can think of this approach as a kind of hashing function: whenever we want to compute the average precision of a queried object, we use our CNN to classify the object and choose the prepared list of the class it was assigned to. For the computation of the average precision we use definition 3.6.

4.4.2 Using Similarity Search

Our other method is a classic similarity search approach. Let us assume that out_o is the output of the forward pass of the CNN for the object o, which we can use as a feature vector. Suppose we want to compute the average precision for a specific query object q in our database D. For each object o ∈ D we compute:

DIS(q, o) = L_2(out_q, out_o)

We can then retrieve all objects from D, ordered ascending by this similarity function, and evaluate the average precision for the query object q on this ranking.
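Section 4.4.2 amounts to a few lines once the CNN outputs are available. A hedged sketch with our own helper names (`model` stands for any trained classifier with a `predict` method, e.g. the tf.keras sketch above):

```python
import numpy as np

def rank_by_similarity(query_grid, database_grids, model):
    """Rank database objects by L2 distance between CNN output vectors (sec. 4.4.2)."""
    out_q = model.predict(query_grid[np.newaxis, ...])[0]   # 55-dimensional output vector
    out_db = model.predict(np.stack(database_grids))        # one vector per database object
    dists = np.linalg.norm(out_db - out_q, axis=1)          # DIS(q, o) = L2(out_q, out_o)
    return np.argsort(dists)                                # ascending, most similar first

# ranking = rank_by_similarity(query, database, model)
# average_precision(list(ranking), relevant_ids)   # reuse the metric sketch from chapter 3
```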

5. Learning of DCNN

For our explanation of learning approaches for DCNN weights, we follow Ruder [2016].

5.1 Motivation

In the previous chapter, in eq. (4.5), we already introduced the way to update weights to make the prediction of our convolutional neural network closer to the real label. In this chapter, we describe more sophisticated methods for updating the parameters of our networks. We have chosen to use algorithms based on gradient descent for the optimization of our learning, which is by far the most common choice in this field. In most cases, learning frameworks already contain implementations of these optimizers, so the approaches tend to be used only as black boxes. Because of this, we explain them in this chapter in more detail, to make it easier to understand our experiments in chapter 6. Gradient descent is a way to minimize the error function E by updating the parameters w of a network in the opposite direction of the gradient ∂E/∂w. The learning rate α determines the size of the steps necessary to reach a (local) minimum.

5.2 Gradient Descent Optimization

5.2.1 Gradient Descent Variants

There are three variants of the gradient descent algorithm, which differ only in the amount of data from the dataset that is given to the network for one update.

Batch gradient descent

Computes the gradient of the error function with respect to the weights for the entire dataset. As we need to calculate the gradients for the whole dataset to perform only one update, batch gradient descent can be very slow. This variant also does not allow us to update our model online, i.e. with new examples on-the-fly.

Stochastic gradient descent

This approach, in contrast, performs a parameter update for each training example (x, t). Batch gradient descent performs redundant computations for a large dataset because it recomputes gradients for similar inputs without any update; SGD does not have this redundancy as it updates the weights each time. It is usually much faster and can be used for online learning. The problem is that SGD can complicate convergence to the exact minimum of the error function, as it keeps jumping between new local minima. However, it has been shown that for lower learning rates SGD has the same convergence behavior as the batch gradient descent method.

Mini-batch gradient descent

The most common approach lies between the previous two methods and performs an update for every mini-batch of n examples. This way it:

- reduces the variance of the parameter updates, which can lead to more stable convergence
- can make use of highly optimized matrix operations common in state-of-the-art deep learning libraries, which make computing the gradient with respect to a mini-batch very efficient

The common size of a mini-batch is in the range (50, 256) but can vary with respect to the application.

5.2.2 Gradient Descent Optimization Algorithms

In this subsection we outline the optimization algorithms that we will use in our experiments in the chapter below.

Momentum

SGD has trouble with areas where the surface curves much more steeply in one dimension than in others; however, such areas are very common around local optima. In these scenarios, SGD oscillates across the slopes of the ravine while making only hesitant progress towards the local optimum. Momentum is a method that helps accelerate SGD in the relevant direction. It does this by adding a fraction of the update vector of the previous step to the current update vector:

v_s = γ v_{s−1} + α ∂E/∂w
w = w − v_s

where s is the current time step and γ is usually set to 0.9 or a similar value. The momentum term increases for dimensions whose gradients point in the same direction and reduces updates for dimensions whose gradients change direction. As a result, we gain faster convergence and reduced oscillation.
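A minimal numpy sketch of the momentum update rule above; the gradient function, the parameter vector and the hyperparameter values are placeholders of ours, not taken from the thesis:

```python
import numpy as np

def momentum_step(w, grad, v, alpha=0.01, gamma=0.9):
    """One momentum update: v_s = gamma * v_{s-1} + alpha * dE/dw; w = w - v_s."""
    v = gamma * v + alpha * grad(w)
    return w - v, v

# Toy problem: minimize E(w) = 0.5 * ||w||^2, whose gradient is w itself.
w, v = np.array([1.0, -2.0]), np.zeros(2)
for _ in range(100):
    w, v = momentum_step(w, lambda w: w, v)
print(w)   # close to the minimum at (0, 0)
```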

Adagrad

Adagrad is a gradient descent algorithm that adapts the learning rate to the parameters, performing larger updates for infrequent parameters and smaller updates for frequent parameters. Up to now, we have performed updates with the same learning rate for all weights, but Adagrad uses a different learning rate for each weight. Let us consider some weight w_i, and let g_{s,i} be the gradient of the error function with respect to w_i at time step s:

g_{s,i} = ∂E/∂w_i

The SGD update for every parameter w_i at each time step s then becomes:

w_{s+1,i} = w_{s,i} − α g_{s,i}

In its update rule, Adagrad modifies the general learning rate α at each time step s for every parameter w_i, based on the previous gradients:

w_{s+1,i} = w_{s,i} − (α / √(G_{s,ii} + ε)) g_{s,i}

where G_s ∈ R^{d×d} is a diagonal matrix in which each diagonal element (i, i) is the sum of the squares of the gradients with respect to w_i up to time step s, and ε is a smoothing term that avoids division by zero. Adagrad's main benefit is that it eliminates the need to tune the learning rate manually. On the other hand, there is a problem with the accumulation of squared gradients in the denominator: since all added terms are positive, the accumulated sum keeps growing during training. As a result, the learning rate becomes very small during training and in the end the network cannot learn any additional knowledge.

Adadelta

Adadelta is an extension of Adagrad that mitigates its biggest problem, the aggressive decrease of the learning rate. Adadelta accumulates only the k previous gradients, where k is a fixed size. Instead of inefficiently storing all k previous squared gradients, the sum of gradients is computed recursively as a decaying average of all previous squared gradients. Let C[g²]_{s,i} be the running average of squared gradients at time step s for w_i; it then depends only on the previous average and the current gradient:

C[g²]_{s,i} = γ C[g²]_{s−1,i} + (1 − γ) g²_{s,i}

where γ can be set to 0.9 or a close value, as in the case of momentum above. For clarity, we now rewrite the vanilla SGD update step for the parameter:

Δw_{s,i} = −α g_{s,i}
w_{s+1,i} = w_{s,i} + Δw_{s,i}

The Adadelta learning rate decay is derived from Adagrad:

Δw_{s,i} = −(α / √(G_{s,ii} + ε)) g_{s,i}

So we only replace the diagonal matrix G with the average over the previous squared gradients:

Δw_{s,i} = −(α / √(C[g²]_{s,i} + ε)) g_{s,i}

RMSprop

RMSprop is an unpublished adaptive learning rate method proposed by Geoff Hinton. RMSprop and Adadelta were developed independently from the need to resolve Adagrad's main problem. In fact, RMSprop is identical to the first update of Adadelta that we derived above:

C[g²]_{s,i} = 0.9 C[g²]_{s−1,i} + 0.1 g²_{s,i}
Δw_{s,i} = −(α / √(C[g²]_{s,i} + ε)) g_{s,i}

RMSprop thus also divides the learning rate by an exponentially decaying average of squared gradients.

Adam

Adam (ADAptive Moment estimation) is another method that computes adaptive learning rates for each parameter. Besides storing an exponentially decaying average of previous squared gradients, it also keeps an exponentially decaying average of past gradients, similar to momentum. Now, let:

m_{s,i} = β_1 m_{s−1,i} + (1 − β_1) g_{s,i}
v_{s,i} = β_2 v_{s−1,i} + (1 − β_2) g²_{s,i}

m_{s,i} estimates the first moment (mean) and v_{s,i} estimates the second moment (variance) of the gradients. As all m_{s,i} and v_{s,i} are initialized to 0, according to the authors of this method they are biased towards zero, especially during the initial steps. The authors counteract these biases by computing the first and second bias-corrected moments:

m̂_{s,i} = m_{s,i} / (1 − β_1^s)
v̂_{s,i} = v_{s,i} / (1 − β_2^s)

They then use these to update the parameters just as we have seen in Adadelta and RMSprop:

w_{s+1,i} = w_{s,i} − (α / (√(v̂_{s,i}) + ε)) m̂_{s,i}

where the default values are β_1 = 0.9, β_2 = 0.999 and ε = 10⁻⁸.
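And a corresponding numpy sketch of a single Adam step with the default values quoted above; again a self-contained toy, not the TensorFlow optimizer used in the experiments:

```python
import numpy as np

def adam_step(w, g, m, v, s, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameter vector w given the gradient g at time step s."""
    m = beta1 * m + (1 - beta1) * g          # first moment estimate
    v = beta2 * v + (1 - beta2) * g ** 2     # second moment estimate
    m_hat = m / (1 - beta1 ** s)             # bias-corrected first moment
    v_hat = v / (1 - beta2 ** s)             # bias-corrected second moment
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Same toy quadratic as in the momentum sketch: E(w) = 0.5 * ||w||^2.
w, m, v = np.array([1.0, -2.0]), np.zeros(2), np.zeros(2)
for s in range(1, 5001):
    w, m, v = adam_step(w, w.copy(), m, v, s)
print(w)   # close to the minimum at (0, 0)
```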

6. Experiments

All the mentioned experiments were done on a personal computer (DELL Inspiron 15) running Windows 10, with the following components:

Intel Core i7 6700HQ
16 GB RAM
NVIDIA GeForce GTX 960M

Our program uses the TensorFlow framework and is written in the Python programming language. The architectures and their numbering correspond to table 4.1 in section 4.3. All used datasets (already described in chapter 2) were preprocessed according to section 4.2. First, from each model we choose the faces with sufficiently long sides following eq. (4.19), where the threshold t is chosen with respect to the dataset: for SHREC16 we use t = 0, for SHREC15 a small constant threshold, and for ModelNet10 a threshold that depends on the number of faces of the object (t = 5 for models with many faces, smaller otherwise). On these faces, we choose k points according to eq. (4.20). The constant k also differs depending on the used dataset and is chosen piecewise with respect to the area S of the face: for SHREC16 and SHREC15, k = 100 for the largest faces, k = 10 for medium-sized faces and fewer points for the smallest ones, while for ModelNet10 up to k = 450 points are sampled from the largest faces. The parameters need to be chosen for each dataset separately because of the different normalization of the vertices. All points created from the faces are then normalized, and finally the occupancy grid is created from them as described in algorithm 4.2.

Since it would be too computationally expensive to create the occupancy grid again and again for each model in every learning epoch, we decided to do the preprocessing part only once. The occupancy grid was then saved as a four-dimensional numpy array stored in a binary file.
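The caching step can be as simple as the following sketch; the file name and the 30³ grid size are placeholders of ours, and the real pipeline stores one binary file per preprocessed model:

```python
import numpy as np

# Stand-in for the preprocessing of section 4.2 (in the real pipeline this is
# the occupancy grid computed once per model).
grid = np.zeros((30, 30, 30, 1), dtype=np.float32)

np.save("model_0001.npy", grid)        # cache the four-dimensional array as a binary file

# In every training epoch only the cached binary file is read, which is much
# faster than re-parsing the .obj/.off file and re-voxelizing the model.
grid = np.load("model_0001.npy")
batch = grid[np.newaxis, ...]          # add a batch dimension before feeding the CNN
```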

Reading the numpy binary file is much faster than reading the standard .obj or .off format (in our case more than 10 times faster for reading one model). For ModelNet10 and SHREC16, we provide a confusion matrix represented as a picture. Each row of pixels in the picture gives information about the distribution of guesses for a specific class; in other words, if all guesses were right, the picture would show only a diagonal line from (0, 0) to (#classes − 1, #classes − 1).

6.1 Object Recognition and Retrieval (SHREC16)

6.1.1 Method

As presented in chapter 2, this dataset consists of around 51 thousand models from 55 distinct classes and is used in a shape retrieval competition. All models were first preprocessed as discussed in the section above. We then trained several CNN architectures according to table 4.1 to recognize models in the validation and test parts of the dataset. During training, we used the validation dataset to remember the best model: after each epoch we computed the accuracy on the validation dataset and, if the model was the best so far, we saved it. The accuracy on the test data is then computed with the model that was best on the validation dataset. Every network was trained for 50 epochs. After that, we were able to perform a similarity search as described in section 4.4.2.

6.1.2 Results and Discussion

Object recognition

The results of all selected architectures with additional parameters are in table 6.1. The accuracy evaluation on the test dataset was done with the model from the epoch with the best validation accuracy; the accuracy on the validation dataset in our table, on the other hand, was computed with the model from the last training epoch. In the beginning, we tried to train networks with a batch size of only 10, using architectures with two convolutional layers where a pooling layer follows only after the second one. In this approach we can see the problem described in chapter 5: the convolutional neural network weights were updated too often and took big steps towards a local minimum for the small batch, so convergence to the real minimum was very slow or even impossible. In that case, experimenting with larger hyperparameters (the number of filters and their spatial size) could not help, because it would only lead to a bigger overfitting issue. Because of this problem we decided to change our approach and increase the number of objects in the batch. With a batch size of 50 the results were much better, as we can see in table 6.1.

Table 6.1: Results of object recognition on SHREC16. For each architecture the table lists the architecture id, batch size, learning rate, optimizer (Adam in all cases), the accuracy on the validation data and the accuracy on the test data; the best architectures reach 85.3% and 85.6% on the validation data.

We can also see that architectures with a bigger spatial size of the filters performed noticeably better on the test dataset. Since a further increase of the batch size did not help much anymore, we added another pooling layer after the first convolutional layer (architectures 3, 4 and 5). As table 6.1 shows, these architectures were able to pass 85% accuracy on the validation data and, with a higher batch size, even more than 74% on the test data. The last row of the table also indicates that a higher spatial size and number of filters would not help; it would only lead to a bigger overfitting issue.

Table 6.2: Labels of objects in SHREC16: 0 airplane, 1 trash can, 2 bag, 3 basket, 4 bathtub, 5 bed, 6 bench, 7 birdhouse, 8 bookshelf, 9 bottle, 10 bowl, 11 bus, 12 cabinet, 13 camera, 14 can, 15 cap, 16 car, 17 cellphone, 18 chair, 19 clock, 20 keyboard, 21 dishwasher, 22 display, 23 earphone, 24 faucet, 25 file, 26 guitar, 27 helmet, 28 jar, 29 knife, 30 lamp, 31 laptop, 32 speaker, 33 mailbox, 34 microphone, 35 microwave, 36 motorcycle, 37 mug, 38 piano, 39 pillow, 40 pistol, 41 pot, 42 printer, 43 remote control, 44 rifle, 45 rocket, 46 skateboard, 47 sofa, 48 stove, 49 table, 50 telephone, 51 tower, 52 train, 53 vessel, 54 washer.

Fig. 6.1 shows two confusion matrices on the SHREC16 dataset. We created them for the two architectures that had the best results on the validation and on the test data according to table 6.1. The labels of the dataset groups are listed in table 6.2. Our approach was, for example, very good at recognizing airplanes, guitars, rifles or motorcycles. This is mainly because their shape is very different from other objects, but also because they have a lot of representatives in both splits of the dataset. On the other hand, other objects are often mistakenly labeled as members of these big groups. It is easy to see that we did much better on the validation dataset, but on both datasets we had problems distinguishing, for example, a microphone from a lamp; this is understandable, because the shapes of these two objects are very similar.
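The confusion matrices in fig. 6.1 (and later fig. 6.4) are rendered as pictures in the way described at the beginning of this chapter. The sketch below shows one possible way to produce such a picture with numpy and matplotlib; the color map and function names are our own choices for illustration, not the exact rendering code used in the thesis.

import numpy as np
import matplotlib.pyplot as plt

def confusion_matrix(true_labels, predicted_labels, n_classes):
    # row i holds the distribution of predictions for objects of true class i
    m = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(true_labels, predicted_labels):
        m[t, p] += 1
    return m

def save_confusion_image(m, path):
    # a perfect classifier produces a single diagonal line
    # from (0, 0) to (#classes - 1, #classes - 1)
    plt.imshow(m, cmap="gray_r", interpolation="nearest")
    plt.xlabel("predicted class")
    plt.ylabel("true class")
    plt.savefig(path)
    plt.close()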

Figure 6.1: Confusion matrices on SHREC16 ((a) validation dataset, (b) test dataset).

Our approach is also not very good at recognizing birdhouses, because this class does not have enough examples in the training database (only 73 in all three parts of the SHREC16 dataset); the same issue probably affects cameras or cellphones (in the test dataset there is only one example).

Object retrieval

Table 6.3: Results of object retrieval on SHREC16 (for each architecture: architecture id, batch size, learning rate, optimizer (Adam in all cases), MAP on the validation data and MAP on the test data).

The MAP results on SHREC16 for the architectures already seen above are in table 6.3. We tested the shape retrieval as described in section 4.4. The results correlate with those of the object recognition: the three architectures that were best on the test split are still the best in this task, but the results are not as good as we would expect from our method.
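For completeness, the sketch below shows how the MAP values reported in table 6.3 can in principle be computed from a ranked result list; it is a generic illustration, not the exact evaluation code used in our experiments or by the SHREC16 organizers.

def average_precision(ranked_labels, query_label):
    # ranked_labels are the class labels of the retrieved objects,
    # ordered by increasing distance to the query
    hits, precision_sum = 0, 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(all_ranked_labels, query_labels):
    aps = [average_precision(r, q) for r, q in zip(all_ranked_labels, query_labels)]
    return sum(aps) / len(aps)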

The problem is probably visible in the confusion matrix in fig. 6.1b: the smaller classes are often mistaken for the bigger ones, and so our method, which relies on good object recognition, does not work as well on the test split as on the validation split, where the accuracy was much higher. The SHREC16 competition also used a somewhat different type of MAP evaluation, computed on only the first 1000 retrieved objects. Our method therefore does not yet seem able to compete with the best methods, although, as discussed in chapter 7, there is a lot of room for improvement of our approach.

6.2 Object Retrieval (SHREC15)

6.2.1 Method

We decided to try our trained CNNs also on a different dataset to find out whether they would still be able to retrieve objects correctly. The features learned on objects from one dataset should be detectable also in objects from another dataset, as has been shown with 2D CNNs. We cannot use the same classifier-based approach as before (section 4.4.1), because the task of this competition is to provide a distance matrix in which, for each query object, the distances to all other objects in the database are given. The distance matrix is a matrix F ∈ R^{q×p}, where q is the number of labeled query objects and p is the number of all other objects in the database (which contains many distractors in addition to the 229 labeled objects), and the element F_{i,j} is the distance of the i-th query object from the j-th object in the database.

We therefore use the similarity search defined in section 4.4.2. For each object in the database, we perform forward propagation through our selected convolutional neural network (trained on SHREC16, see above). The output of this CNN is a vector with 55 elements. This vector is then normalized and used as the feature vector of the object. Let f_i denote the normalized feature vector of the i-th object in the database. As defined in chapter 3, the element F_{i,j} is then computed as

F_{i,j} = L_2(f_i − f_j),

i.e. as the Euclidean distance between the two normalized feature vectors.

6.2.2 Results and Discussion

For this experiment we used the best architectures learned on SHREC16; the results can be seen in table 6.4. For the evaluation of our distance matrix we used the code provided by the organizers of SHREC15. This evaluator was also able to print the values of the precision-recall function, which is plotted in fig. 6.2. The results of the competition are published in Godil et al. [2015] (figure 4 therein). As we can see, our approach is even better than any other method from the competition in finding the nearest neighbor (NN). Our average precision is also better than that of most of the approaches, so we can say that the features learned on a completely different dataset were also found here and the objects were retrieved accurately (especially considering how many distractors there were).
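The construction of the distance matrix described in the method above can be sketched as follows. The feature extraction (the forward pass of the trained network) is represented here by already computed network outputs, and the function names are our own.

import numpy as np

def feature_vector(cnn_output):
    # normalize the 55-dimensional network output to unit length
    v = np.asarray(cnn_output, dtype=np.float64)
    return v / np.linalg.norm(v)

def distance_matrix(query_features, database_features):
    # F[i, j] is the L2 distance between query object i and database object j
    F = np.zeros((len(query_features), len(database_features)))
    for i, fi in enumerate(query_features):
        for j, fj in enumerate(database_features):
            F[i, j] = np.linalg.norm(fi - fj)
    return F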

Figure 6.2: The precision-recall graph on SHREC15.

Table 6.4: Results of the object retrieval on SHREC15 using architectures learned on SHREC16 (for each of the four tested architectures: architecture id, batch size, learning rate, optimizer (Adam in all cases), MAP and NN).

6.3 Object Recognition and Retrieval (ModelNet10)

6.3.1 Method

As mentioned in chapter 2, this dataset consists of around 5 thousand models from 10 classes. All models were first preprocessed as already discussed in the section above. We then trained several CNN architectures according to table 4.1 to recognize the models in the test dataset. Since this dataset has only a test split and no validation split, we could not use the best-model selection from the previous dataset and instead trained each network for a fixed number of epochs; the accuracy on the test split was then evaluated with the model from the last epoch. After that we were able to perform the similarity search using our trained classifier, as described in section 4.4.1.

6.3.2 Results and Discussion

Object recognition

Table 6.5 presents the results of our convolutional neural networks on the ModelNet10 dataset. We can see that on this dataset it was much better to use the RMSProp optimizer. It was also beneficial to use a lower learning rate and to train for a lower number of epochs.

Table 6.5: Results of object recognition on ModelNet10 (for each architecture: architecture id, number of epochs, batch size, learning rate, optimizer (Adam or RMSProp) and accuracy on the test data; the best RMSProp configurations reach 90.3%, 90.4% and 90.5%).

Fig. 6.3 plots the accuracy (on the y axis) after each epoch (on the x axis), and it also makes visible a problem shared by all architectures: even with the smaller learning rate the learning process is not very smooth, and the accuracy oscillates around 89%. The problem is probably caused by the size of the dataset: it is so small and imbalanced that the bigger groups have a much bigger impact on the learning and pull the gradients faster in their direction.

Figure 6.3: The accuracy in each epoch ((a) first four architectures, (b) last six architectures).

We also created a confusion matrix for ModelNet10 with the best trained architecture (see table 6.5). The labels are described in table 6.6 and the confusion matrix is shown in fig. 6.4. The model was right in most cases, but it often made mistakes when distinguishing between a desk and a table, or between a night stand and a dresser; both cases are understandable. We can say that our approach is very good at recognizing every group, but the results are not as good as those of other approaches in the ModelNet competition.
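The optimizer comparison in table 6.5 amounts to swapping a single operation in the training graph. The snippet below shows how this choice can be expressed with the TensorFlow 1.x API; it is a simplified illustration, not a copy of our training code.

import tensorflow as tf  # TensorFlow 1.x API

def build_train_op(loss, optimizer_name, learning_rate):
    # selects between the two optimizers compared in table 6.5
    if optimizer_name == "rmsprop":
        optimizer = tf.train.RMSPropOptimizer(learning_rate)
    elif optimizer_name == "adam":
        optimizer = tf.train.AdamOptimizer(learning_rate)
    else:
        raise ValueError("unknown optimizer: " + optimizer_name)
    return optimizer.minimize(loss)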

Even though we missed the first place by only 6.9%, we finished sixth from the last place (eleventh overall).

Table 6.6: The ModelNet10 labels: 0 bathtub, 1 bed, 2 chair, 3 desk, 4 dresser, 5 monitor, 6 night stand, 7 sofa, 8 table, 9 toilet.

Figure 6.4: The confusion matrix on ModelNet10.

Object retrieval

Table 6.7 contains the results of the object retrieval. The results correlate with those of the object recognition, since the three architectures listed before the last one are still the best. The table also shows that our object retrieval using the learned classifier is a very efficient approach on this dataset, as it achieves even better results than our recognition approach. Our results are much better than those in the competition, but the best approaches did not provide their MAP.
