D-Pruner: Filter-Based Pruning Method for Deep Convolutional Neural Network


Loc N. Huynh, Youngki Lee, Rajesh Krishna Balan
Singapore Management University {nlhuynh.2014, youngkilee,

ABSTRACT
The emergence of augmented reality devices such as Google Glass and Microsoft HoloLens has opened up a new class of vision sensing applications. These applications often need to continuously capture and analyze contextual information from video streams, and they typically adopt deep learning algorithms such as convolutional neural networks (CNNs) to achieve high recognition accuracy. Running such computationally intensive algorithms on resource-constrained mobile devices, however, remains a severe challenge. In this paper, we propose and explore a new compression technique called D-Pruner that efficiently prunes redundant parameters within a CNN model so that the model can run efficiently on mobile devices. D-Pruner removes redundancy by embedding a small additional network into each convolutional layer. This network evaluates the importance of the filters and removes unimportant ones during the fine-tuning phase, reducing the size of the model while maintaining the accuracy of the original model. We evaluated D-Pruner on datasets such as CIFAR-10 and CIFAR-100 and show that it can remove a significant number of parameters, reducing many existing models by up to 4.4 times, while keeping the accuracy drop below 1%.

Keywords: Continuous Vision, Deep Learning, Compression

1. INTRODUCTION
The appearance of augmented reality devices such as Google Glass and Microsoft HoloLens has opened up various new vision sensing applications. The core function of these applications is to continuously capture the context of users and their surroundings from streaming video data and to enable situational interactions with users. For example, a virtual assistant system for dementia patients identifies objects and people near the patient and provides the patient with intelligent guidance in real time [2]. Recently, deep learning algorithms such as convolutional neural networks (CNNs) have been actively adopted for various computer vision tasks such as image recognition, object detection, and identification to achieve higher recognition accuracy [7, 20, 22].

The key challenge in enabling continuous vision applications is to run state-of-the-art CNN models efficiently on resource-constrained mobile devices. Recent CNN models such as VGG-16 [20], ResNet [7], and Inception [22] often require a huge amount of computational resources in terms of CPU/GPU cycles and memory usage, making their execution slow on mobile devices. For instance, VGG-16 and ResNet-152 require 15.3 GFLOPs and 11.6 GFLOPs to recognize a single image, which often takes at least hundreds of milliseconds on commodity smartphones [9, 10, 14]. To address this problem, cloud offloading is often considered.
However, the offloading approach raises critical privacy concerns, as it may expose a massive volume of users' private images and videos to the cloud. Previous works [1, 5, 6, 11, 24] have shown that CNNs usually contain a lot of redundancy in terms of filters and parameters. The problem is further aggravated because developers often leverage transfer learning [19] to fine-tune state-of-the-art models on new datasets to increase recognition accuracy. For example, the first 13 convolutional layers of VGG-16 can provide robust features for a variety of new tasks, such as classifying types of fruits or animals that are not present in the ImageNet dataset [4]. Developers can attach a few additional layers on top of the existing 13 layers and fine-tune the network on the new dataset. In many cases, if not handled carefully, transfer learning makes the model unnecessarily large and redundant for execution on mobile devices.

Compression of neural networks has been actively studied to enable efficient execution of deep neural networks. Some works [1, 5] approximate each layer separately via factorization techniques and then fine-tune the whole network to restore accuracy. However, without global knowledge of the relationships between lower and upper filters, pruning filters independently can lead to a significant loss in recognition accuracy. Other works [6] remove parameters based on their magnitudes during end-to-end fine-tuning. Unfortunately, the resulting weight matrices become sparse, which makes it hard to leverage highly optimized libraries such as OpenBLAS/CLBlast [18, 23] to perform inference efficiently on mobile devices. Recently, Yao et al. proposed the DeepIoT system [24], which uses a recurrent neural network (RNN) to compress models. However, RNNs are prone to the vanishing gradient problem and may not work well with very deep networks.

In this paper, we propose a general technique called D-Pruner to reduce the memory footprint and computational cost of many existing and transferred CNN models. D-Pruner automatically identifies redundant filters in convolutional layers and removes them to make the model smaller in terms of memory and computational requirements. Its key idea is to embed a small network called a masking layer into every convolutional layer to score how much each filter contributes to the outcome. The masking layers remove only the low-scored filters, and the network is fine-tuned to keep its accuracy while the unnecessary filters are pruned out.

By learning the extended network end-to-end, D-Pruner can capture the relationships between filters and make better pruning decisions. We conducted several experiments on two different datasets (CIFAR-10 and CIFAR-100 [12]) to evaluate D-Pruner. Our results show that D-Pruner can compress existing models to be 4.4× and 2.76× smaller in terms of memory footprint, and 4.57× and 2.9× better in terms of computational cost, on CIFAR-10 and CIFAR-100 respectively. In our latency tests, the pruned models on CIFAR-10 and CIFAR-100 achieve speedups of 1.85× and 1.61× on a Samsung Galaxy S7 device. Furthermore, D-Pruner produces a model that is 8% smaller, with an accuracy of 90.48%, than the pruned VGGNet (90.5% accuracy) produced by DeepIoT [24] on CIFAR-10. We believe that mobile developers would benefit from D-Pruner when building small and efficient CNN models for many vision sensing tasks.

The contributions of our paper can be summarized as follows:

- We propose D-Pruner, a simple but effective compression technique to remove redundancy within existing and transferred CNN models. D-Pruner introduces a novel masking block to identify redundant filters that have little impact on final accuracy. We leverage knowledge from the training set to remove only a subset of the redundant filters so as to maintain accuracy at the highest level.

- We conducted intensive experiments using two different datasets on two network architectures to demonstrate the usefulness of D-Pruner. Our results on CIFAR-10 and CIFAR-100 [12] show that D-Pruner can compress existing models to be 4.4× and 2.76× smaller in terms of memory footprint, and 4.57× and 2.9× better in terms of computational cost, respectively. In our latency evaluation, the pruned models on CIFAR-10 and CIFAR-100 achieve speedups of 1.85× and 1.61× on a Samsung Galaxy S7 device.

2. RELATED WORK
Mobile Deep Learning: There have been many prior works on migrating deep learning models onto mobile devices. DeepEar [15] was the first framework to support running deep neural network models on mobile CPUs and DSPs for audio sensing tasks. Afterwards, many other frameworks such as DeepX [14], DeepSense [9], and DeepMon [10] were proposed with various optimizations for mobile vision applications. However, those systems focus primarily on optimizing the processing pipeline to make existing models run faster and more energy-efficiently. For instance, DeepX splits computations between CPUs and low-power DSPs to reduce energy consumption. DeepSense and DeepMon take another approach and apply low-level optimizations that utilize powerful mobile GPUs to execute computations in parallel. Unlike those systems, D-Pruner optimizes the models themselves to be smaller in terms of memory footprint and computational cost so that they can run efficiently on any of the systems above.

Compression Techniques: General optimizations to compress deep learning models have been widely studied for both servers and mobile devices. Denton et al. used matrix factorization to approximate the weights within a CNN [5] to reduce its parameters. Similarly, Lane et al. used matrix factorization to reduce the number of operations within dense layers and improve inference time [1]. In those approaches, each layer is approximated separately, and the layers are combined for a final fine-tuning step.
However, there is no global information sharing between the per-layer approximations to ensure that information loss in the first layer has no effect on the second layer. Hence, the accuracy drop can be significant if a large number of parameters are pruned. Han et al. proposed a method to prune network connections based on the magnitude of parameters during the fine-tuning phase [6]. However, the weight matrices become sparse after pruning, which makes it hard to leverage optimized libraries to execute the inference step efficiently. Yao et al. proposed the DeepIoT system, which leverages a recurrent neural network (RNN) to learn the relationships between parameters across many layers and prune the redundancy automatically [24]. However, RNNs are prone to the vanishing gradient problem and may not work well if the input sequence is too long; in this case, DeepIoT may not work well with state-of-the-art models such as ResNet or the Inception network. D-Pruner shares a similar concept by learning global relationships end-to-end and automatically pruning unnecessary parameters. However, D-Pruner avoids the potential vanishing gradient problem by integrating novel masking blocks directly into the CNN model and fine-tuning the whole network without any need for an RNN.

Figure 1: Convolutional Neural Network Architecture

3. CONVOLUTIONAL NEURAL NETWORK
Since the AlexNet architecture was proposed in 2012 [13], there have been many significant changes to the original network architecture (Figure 1) to improve the capabilities of CNNs on many computer vision tasks. One interesting change is the replacement of fully connected (dense) layers by a [1×1] convolutional layer and global average pooling in many state-of-the-art models such as ResNet [7] and the Inception network [22]. As dense layers consume the most parameters in a CNN [9], this change significantly reduces the size (or memory footprint) of state-of-the-art models. However, as modern networks still rely heavily on convolutional layers to extract meaningful visual features, high computational cost remains an open problem [9].

There are two widely used methods to reduce the computational cost of CNNs. The first is to use factorization techniques such as SVD (singular value decomposition) to approximate the weight matrices during the inference step and reduce the total number of operations. However, this approach tends to incur a high accuracy loss on very deep networks [1, 14]. The second is to prune redundant filters to obtain simpler but more efficient CNNs. As the computational cost is proportional to the number of filters, pruning unnecessary filters improves both training and inference time. Many works have shown promising results with this approach [11, 24]. D-Pruner follows the latter approach by recognizing redundancy automatically during the fine-tuning process. D-Pruner is designed as a general technique to compress any modern CNN model to be smaller and less resource-intensive so that it works efficiently on both servers and mobile devices.
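To make the proportionality between filter count and computational cost concrete, the following is a minimal back-of-the-envelope sketch in Python; the spatial size and filter counts are purely illustrative and are not taken from the paper's tables.

```python
def conv_mul_adds(height, width, in_channels, out_filters, kernel=3):
    """Multiply-add count of a stride-1, 'same'-padded convolutional layer."""
    return height * width * in_channels * out_filters * kernel * kernel

# Removing ~20% of the filters of a 32x32, 96-filter layer cuts its
# multiply-adds by roughly the same 20%; the next layer shrinks as well,
# because its input channel count drops from 96 to 77.
original = conv_mul_adds(32, 32, 96, 96)
pruned = conv_mul_adds(32, 32, 96, 77)
print(original, pruned, original / pruned)
```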

4. D-PRUNER ALGORITHM
In this section, we first briefly introduce how the technique works. We then provide details about our novel masking block, which determines removable filters. Finally, we show how the training process prunes unnecessary parameters based on the knowledge from the masking blocks.

The algorithm works in multiple pruning iterations. In each iteration, we first extend all convolutional layers with extra layers called masking blocks, which score how much each filter impacts the final accuracy. Each masking block outputs a set of candidate filters to be removed for each particular input image. In order to prune only filters that have little impact on the outcome, we use all training images to collect, for each filter, the probability of being removed, and we remove only the filters with a high removal probability (e.g., over 95% on the training set). We then fine-tune the new network to recover the original accuracy and obtain a smaller model. Finally, we repeat the pruning process until it converges (e.g., until the accuracy drop exceeds a certain threshold).

4.1 Masking Block
The goal of the masking block is to determine removable filters during the pruning process. For example, an ImageNet model such as VGG-16 or ResNet fine-tuned to detect several types of fruits may contain many redundant filters that are only useful for recognizing animals; these can be removed to make the model smaller and simpler. A masking block attached to a convolutional layer inspects the output of every filter and scores how strongly each filter affects the final outcome. Hence, unnecessary filters can be removed if they have little or no impact on the final accuracy. Furthermore, the masking block incurs very small computational overhead and is easy to fine-tune.

Our masking block is inspired by the SE block in the Squeeze-and-Excitation network [8], which measures the importance of each filter within a single convolutional layer. We extend the SE block with a masking function that filters out the top-k least important filters. As shown in Figure 2, a masking block consists of an average pooling layer followed by two dense layers and a softmax layer that computes the score of each filter. The masking layer then takes the scores and a maximum number of filters K to be removed, and outputs binary masks that zero out the k lowest scores. Finally, we multiply the masks with the output of the convolutional layer so that all outputs corresponding to removed filters are zeroed out. Hence, only the remaining outputs contribute to the final outcome during the fine-tuning process.

Figure 2: Masking Block (Conv → Avg. Pooling → FC-ReLU → FC → Softmax → Masking → MUL → Output)
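The following is a minimal sketch of such a masking block in TensorFlow/Keras. The paper reports a Keras implementation, but this code is our own illustration: the reduction ratio of the first dense layer and the per-call layer creation are assumptions, and in practice the dense layers would be created once per convolutional layer and trained jointly with the rest of the network.

```python
import tensorflow as tf

def masking_block(conv_output, k, reduction=4):
    """Score each filter of `conv_output` (NHWC) and zero out the k lowest-scored
    filters for every image in the batch."""
    c = int(conv_output.shape[-1])
    # Average pooling: one descriptor per filter, as in the SE block.
    s = tf.reduce_mean(conv_output, axis=[1, 2])                    # [batch, C]
    # Two dense layers followed by softmax give per-filter importance scores.
    s = tf.keras.layers.Dense(c // reduction, activation="relu")(s)
    scores = tf.keras.layers.Dense(c, activation="softmax")(s)      # [batch, C]
    # Masking: binary mask that keeps the C-k highest-scored filters.
    kept = tf.math.top_k(scores, k=c - k).indices                   # [batch, C-k]
    mask = tf.reduce_max(tf.one_hot(kept, depth=c), axis=1)         # [batch, C], 0/1
    mask = tf.reshape(mask, [-1, 1, 1, c])
    # Multiply: only the surviving filter outputs reach the next layer.
    return conv_output * mask, scores

# Example: mask the 10 lowest-scored of 64 filters for a batch of feature maps.
features = tf.random.normal([8, 32, 32, 64])
masked, scores = masking_block(features, k=10)
```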
4.2 Pruning Method
The pruning process consists of four main stages, as described in Algorithm 1.

Firstly, we attach masking blocks to the original network as shown in Figure 2 and fine-tune the network on the training set (line 4). We fix the original network and train only the masking blocks for the first few epochs, and then fine-tune the whole network for a few more epochs (line 5).

Secondly, we predict which filters should be removed from the final network architecture. Because a masking block outputs a set of removable filters for every single input image, a filter can be removed for one input but preserved for another. We therefore collect the removal distribution of each filter over all training images and remove only the filters whose removal probability is higher than a predefined threshold (e.g., 95%) (lines 6-14). For instance, the first convolutional layer of VGG-16 has 64 filters. If we use K = 10, the masking block automatically zeroes out the outputs of the 10 lowest-scored filters on each training image during the training phase, so only the 54 remaining filters contribute to the final output. However, the masking block does not always select the same 10 filters for every training image. To make a correct pruning decision, we use all training images to collect the removal probability of all 64 filters and remove only those whose probability exceeds a threshold T.

Finally, we build a new network by removing the masking blocks and the filters selected in the previous step (line 15). We transfer the learned parameters to the new network and fine-tune it for a few epochs to recover the original accuracy (lines 16-17). We update the model if the validation accuracy is within an acceptable range (lines 18-21) and repeat the pruning process until we are satisfied with the result or the accuracy drops below a certain threshold (line 3).

Algorithm 1: Pruning Algorithm
Data: network O, dataset D, constant K, threshold T, epochs N
Result: network P
 1  acc ← acc(O)
 2  P ← O
 3  while acc ≥ expected_accuracy do
 4      P' ← P + {masking blocks}
 5      fine-tune P' on D for N epochs
 6      R ← {}
 7      for each masking block l in P' do
 8          for each filter f in l do
 9              Pr_f ← P(mask(f) = 1 | D)
10              if Pr_f ≥ T then
11                  R ← R ∪ {f}
12              end
13          end
14      end
15      P'' ← P' − {masking blocks} − R
16      transfer learned parameters from P' to P''
17      fine-tune P'' on D for N epochs
18      if acc(P'') is within the acceptable range then
19          acc ← acc(P'')
20          P ← P''
21      end
22  end
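As a minimal sketch of the pruning decision in lines 6-14 of Algorithm 1: assuming a hypothetical helper mask_fn(images) that returns the masking block's multiplicative 0/1 mask (1 = kept, 0 = zeroed out) for a batch of images, the removal probability of each filter can be estimated over the training set and thresholded as follows.

```python
import numpy as np

def select_filters_to_prune(mask_fn, dataset, num_filters, threshold=0.95):
    """Estimate how often each filter is masked out over the training set and
    return the indices of filters masked on at least `threshold` of the images."""
    removed = np.zeros(num_filters)
    seen = 0
    for images, _labels in dataset:              # iterate over (images, labels) batches
        mask = np.asarray(mask_fn(images))       # [batch, num_filters], 1 = kept
        removed += (1.0 - mask).sum(axis=0)      # count how often each filter is zeroed
        seen += mask.shape[0]
    removal_prob = removed / seen                # Pr(filter is masked out | training set)
    return [f for f in range(num_filters) if removal_prob[f] >= threshold]
```

The returned indices correspond to the set R in Algorithm 1, which is removed before the network is fine-tuned again.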

5. EXPERIMENTS
5.1 Experiment Setup
Datasets. We evaluated D-Pruner by compressing existing models on two datasets: CIFAR-10 and CIFAR-100 [12]. Each dataset consists of 60,000 32×32 color images (50,000 images for training and 10,000 images for validation). CIFAR-10 and CIFAR-100 contain images in 10 and 100 classes, respectively.

Models. We trained the ALL-CNN-C model from [21], which achieves an accuracy of 90.19% on CIFAR-10 and 61.71% on CIFAR-100. To show the robustness of D-Pruner on a variety of architectures, we also trained the NIN (network in network) model from [16], which achieves an accuracy of 89.39% on CIFAR-10, for further evaluation. Unlike models such as VGGNet [24] that use dense layers for classification, both networks in our evaluation use only convolutional layers, which results in fewer parameters while achieving similar accuracy. ALL-CNN-C and NIN use approximately 281M and 222M Mul-Add operations, respectively. The network architectures of the ALL-CNN-C and NIN models on CIFAR-10 are shown in Table 3.

Training process. We used Keras [3] in D-Pruner's implementation. In every pruning step, we tried to remove K = 20% of the filters and fine-tuned the network for N = 35 epochs (10% of the number of epochs used to train the original network). We used Nesterov gradient descent [17] for fine-tuning, with the learning rate and momentum set to 0.01 and 0.9, together with weight decay. We used a threshold T of 0.95 to determine which filters to remove. We repeated the pruning process for several iterations until no more filters could be removed or the accuracy loss exceeded 1%.

Metrics. We use accuracy, number of parameters, number of Mul-Add operations, and processing latency as our key performance metrics. For the latency evaluation, we ran the pruned models using the DeepMon framework [10] and report the average latency on a Samsung Galaxy S7 (Exynos 8890 processor and Mali-T880 GPU).

                      CIFAR-10                                               CIFAR-100
                      ALL-CNN-C  ALL-CNN-C(*)  Impr.   NIN      NIN(*)      Impr.   ALL-CNN-C  ALL-CNN-C(*)  Impr.
Accuracy (%)          90.19      89.34         -       89.39    -           -       61.71      61.08         -
# Parameters          1.3M       310K          4.4     -        348K        2.77    -          501K          2.76
# Mul-Add Operations  281M       61M           4.57    222M     132M        1.68    -          97M           2.9
Latency (ms)          211(±8)    113(±14)      1.85    -        131(±11)    1.41    -          129(±14)      1.61
(*): pruned model   Impr.: improvement (×)
Table 1: Overall Performance of D-Pruner

CIFAR-10               ALL-CNN-C  ALL-CNN-C(*)  ALL-CNN-C(**)  VGGNET  VGGNET-DEEPIOT
Accuracy (%)           90.19      90.48         89.34          -       90.5
Number of Parameters   1.3M       664K          310K           29.7M   724K
(*): pruned model at 4th iteration   (**): final pruned model
Table 2: Comparison with DeepIoT

Type / Stride - Activation (ALL-CNN-C and NIN on CIFAR-10; filter shapes omitted)
Conv1 / s1 - ReLU
Conv2 / s1 - ReLU
Conv3 / s2 - ReLU
Conv4 / s1 - ReLU
Conv5 / s1 - ReLU
Conv6 / s2 - ReLU
Conv7 / s1 - ReLU
Conv8 / s1 - ReLU
Conv9 / s1 - ReLU
Global Average Pool / s1
Softmax
Table 3: Network Architectures

5.2 Overall Results
Overall, D-Pruner successfully compresses the investigated models to be much smaller and less computationally expensive. Table 1 shows the performance of the pruned versions of the ALL-CNN-C and NIN models on both CIFAR-10 and CIFAR-100. On CIFAR-10, D-Pruner compresses the ALL-CNN-C and NIN models to be 4.4× and 2.77× smaller in memory footprint (in terms of the number of parameters). It also reduces the computational cost (in terms of the number of required Mul-Add operations) by 4.57× and 1.68× for the ALL-CNN-C and NIN models, respectively. We notice that the gains of D-Pruner on the ALL-CNN-C model are significantly higher than on the NIN network for several reasons. First, the original NIN model has 1.34× fewer parameters than the ALL-CNN-C model, which makes the reduction in memory footprint appear lower.
Second, the NIN network makes heavy use of [1×1] convolutional filters, which already keeps its computational cost low; this explains why its computational cost is reduced by only 1.68× while its memory footprint is reduced by 2.77×. In the latency evaluation, the pruned ALL-CNN-C and NIN models improve inference time by up to 1.85× and 1.41×, respectively. Similarly, the pruned version of ALL-CNN-C achieves 2.76×, 2.9×, and 1.61× reductions in memory footprint, computational cost, and inference time on CIFAR-100.

5.3 Performance Breakdown
Next, we investigate how D-Pruner affects the models during each pruning iteration in terms of accuracy, number of parameters, number of Mul-Add operations, and the number of filters in each convolutional layer, using the results from pruning the ALL-CNN-C model. In general, given the expected accuracy drop, D-Pruner gradually compresses the model by pruning unnecessary filters over multiple iterations, making it smaller in terms of memory footprint and computational cost while trying to maintain the highest possible accuracy.

5.3.1 Impacts on Accuracy
We first investigate the impact of D-Pruner on the final accuracy. Figure 3 shows the accuracy of the pruned models over multiple pruning iterations on both CIFAR-10 and CIFAR-100. We achieve accuracies of 89.34% and 61.08%, compared to 90.19% and 61.71% for the original models, after 13 and 6 pruning iterations on CIFAR-10 and CIFAR-100, respectively.

Figure 3: Accuracy ((a) CIFAR-10, (b) CIFAR-100)
Figure 4: Parameters and Operations Reduction on CIFAR-10
Figure 5: Latency

Firstly, we notice that it takes 14 and 7 iterations before the accuracy loss exceeds the 1% threshold on CIFAR-10 and CIFAR-100, respectively. This implies that the original models tend to have a lot of redundancy, and that D-Pruner can prune them effectively without a significant loss in final accuracy. Secondly, we observe that accuracy actually increases during the first few iterations, which indicates that some of the redundancy negatively affects accuracy. Hence, D-Pruner can also be used to slightly improve accuracy by eliminating the most harmful redundant parameters. Finally, we note that the pruning process converges faster on CIFAR-100 than on CIFAR-10. As we use the same architecture for both tasks, it is understandable that classifying 100 classes requires more network capacity, in terms of filters and parameters, than classifying 10 classes.

5.3.2 Impacts on Parameters and Operations
Next, we investigate how many parameters and operations D-Pruner prunes in each iteration. Figure 4 shows that both the number of parameters and the number of operations gradually decrease during the pruning process. At the 13th iteration, we achieve 4.4× and 4.57× reductions in the number of parameters and Mul-Add operations on CIFAR-10. As D-Pruner's optimization reduces the number of filters in each pruning iteration, both the parameters and the number of operations always decrease during the pruning process. We see the same trend on the CIFAR-100 dataset, which results in 2.76× and 2.9× improvements in the model's parameters and number of Mul-Add operations.

5.3.3 Impacts on Latency
We also explore the performance of the pruned models on an existing mobile deep learning framework. Figure 5 shows the latency per pruning iteration on both datasets using the DeepMon framework. We achieve speedups of 1.85× and 1.61× on CIFAR-10 and CIFAR-100, respectively. However, we notice that the latency decreases only slowly after the 9th iteration. One reason is that the number of operations per layer becomes too small for DeepMon to utilize GPU resources efficiently. Batching multiple images as input could improve the average inference time.

Figure 6: Number of Filters per Iteration on CIFAR-10 (Conv1-Conv6)

5.3.4 Impacts on Number of Filters
Finally, we investigate the reduction of filters during the pruning process on CIFAR-10. Figure 6 plots the number of filters within the first two blocks of the ALL-CNN-C model, which comprise the first six convolutional layers. Each block consists of three convolutional layers, with 96 and 192 filters respectively, and ends with a spatial dimension reduction using a convolutional layer with stride 2 instead of max pooling. Interestingly, the two blocks share the same trend in filter reduction: the first and last convolutional layers within a block stop shrinking after a certain point, while the middle layer keeps shrinking throughout the pruning process. This offers some insight for building better network architectures, in which the last layer inside a block should have fewer filters than the previous layers.
5.4 Comparisons with DeepIoT
We compare our pruned models with the compressed VGGNet produced by DeepIoT [24] on the CIFAR-10 dataset, as shown in Table 2. At the 4th iteration, D-Pruner produces a model with 8% fewer parameters than DeepIoT's model while achieving comparable accuracy (90.48% vs. 90.5%), even though we start with a less accurate model. If we are willing to sacrifice 1.16% accuracy (compared to DeepIoT), we obtain a 2.33× smaller model.

We also note that DeepIoT leverages a recurrent neural network (RNN) to prune the parameters. However, RNNs are prone to the vanishing gradient problem and may not work well in very deep neural networks such as ResNet or the Inception network. In contrast, D-Pruner's masking blocks can be easily integrated into a CNN and trained with ease.

6. DISCUSSION
Even though D-Pruner produces good preliminary results, there is still plenty of work to be done in the near future.

Number of Removable Filters: D-Pruner has to prune over several iterations to search for a desirable model, and these pruning steps can take a long time. One approach to this problem is to increase the number of removable filters K in each pruning iteration. However, if K is not chosen carefully, the target model will lack the parameters needed to learn meaningful features, which will result in a large accuracy drop. An approach that automatically determines K during the fine-tuning process should be considered to make D-Pruner work more efficiently.

Improving Accuracy: As CNNs often have many redundant parameters, some of them may have a negative impact on the final outcome or make the model unstable. In our evaluations, we showed that accuracy can be improved by eliminating some parameters. This can be explored further to make pruned models not only smaller but also more accurate.

7. CONCLUSION
In this paper, we propose D-Pruner, a simple and efficient compression technique that simplifies existing CNN models so that they work more efficiently on mobile devices without loss in accuracy. On CIFAR-10, our evaluations on existing CNN models show that D-Pruner achieves a 4.4× reduction in memory footprint and a 4.57× reduction in computational cost with less than a 1% accuracy drop. Furthermore, the compressed model runs 1.85× faster on a Samsung Galaxy S7 than the original model in our inference latency evaluation.

8. REFERENCES
[1] S. Bhattacharya and N. D. Lane. Sparsification and separation of deep learning layers for constrained resource inference on wearables. In Proceedings of the 14th ACM Conference on Embedded Network Sensor Systems. ACM.
[2] J. Boger, J. Hoey, P. Poupart, C. Boutilier, G. Fernie, and A. Mihailidis. A planning system based on Markov decision processes to guide people with dementia through activities of daily living. IEEE Transactions on Information Technology in Biomedicine, 10(2).
[3] F. Chollet et al. Keras: Deep learning library for Theano and TensorFlow. URL: keras.io.
[4] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition (CVPR), IEEE Conference on. IEEE.
[5] E. L. Denton, W. Zaremba, J. Bruna, Y. LeCun, and R. Fergus. Exploiting linear structure within convolutional networks for efficient evaluation. In Advances in Neural Information Processing Systems.
[6] S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint.
[7] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[8] J. Hu, L. Shen, and G. Sun. Squeeze-and-excitation networks. arXiv preprint.
[9] L. N. Huynh, R. K. Balan, and Y. Lee. DeepSense: A GPU-based deep convolutional neural network framework on commodity mobile devices. In Proceedings of the 2016 Workshop on Wearable Systems and Applications. ACM.
[10] L. N. Huynh, Y. Lee, and R. K. Balan. DeepMon: Mobile GPU-based deep learning framework for continuous vision applications. In Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services. ACM.
[11] Y.-D. Kim, E. Park, S. Yoo, T. Choi, L. Yang, and D. Shin. Compression of deep convolutional neural networks for fast and low power mobile applications. arXiv preprint.
[12] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images.
[13] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems.
[14] N. D. Lane, S. Bhattacharya, P. Georgiev, C. Forlivesi, L. Jiao, L. Qendro, and F. Kawsar. DeepX: A software accelerator for low-power deep learning inference on mobile devices. In Information Processing in Sensor Networks (IPSN), ACM/IEEE International Conference on. IEEE.
[15] N. D. Lane, P. Georgiev, and L. Qendro. DeepEar: Robust smartphone audio sensing in unconstrained acoustic environments using deep learning. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing. ACM.
[16] M. Lin, Q. Chen, and S. Yan. Network in network. arXiv preprint.
[17] Y. Nesterov et al. Gradient methods for minimizing composite objective function.
[18] C. Nugteren. CLBlast: A tuned OpenCL BLAS library. arXiv preprint.
[19] S. J. Pan and Q. Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10).
[20] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint.
[21] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint.
[22] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In AAAI, volume 4, page 12.
[23] Z. Xianyi, W. Qian, and Z. Chothia. OpenBLAS. URL: openblas.net/.
[24] S. Yao, Y. Zhao, A. Zhang, L. Su, and T. Abdelzaher. DeepIoT: Compressing deep neural network structures for sensing systems with a compressor-critic framework. In Proceedings of the 15th ACM Conference on Embedded Network Sensor Systems. ACM, 2017.


More information

In-Place Activated BatchNorm for Memory- Optimized Training of DNNs

In-Place Activated BatchNorm for Memory- Optimized Training of DNNs In-Place Activated BatchNorm for Memory- Optimized Training of DNNs Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder Mapillary Research Paper: https://arxiv.org/abs/1712.02616 Code: https://github.com/mapillary/inplace_abn

More information

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun Presented by Tushar Bansal Objective 1. Get bounding box for all objects

More information

MCMOT: Multi-Class Multi-Object Tracking using Changing Point Detection

MCMOT: Multi-Class Multi-Object Tracking using Changing Point Detection MCMOT: Multi-Class Multi-Object Tracking using Changing Point Detection ILSVRC 2016 Object Detection from Video Byungjae Lee¹, Songguo Jin¹, Enkhbayar Erdenee¹, Mi Young Nam², Young Gui Jung², Phill Kyu

More information

LARGE-SCALE PERSON RE-IDENTIFICATION AS RETRIEVAL

LARGE-SCALE PERSON RE-IDENTIFICATION AS RETRIEVAL LARGE-SCALE PERSON RE-IDENTIFICATION AS RETRIEVAL Hantao Yao 1,2, Shiliang Zhang 3, Dongming Zhang 1, Yongdong Zhang 1,2, Jintao Li 1, Yu Wang 4, Qi Tian 5 1 Key Lab of Intelligent Information Processing

More information

Real Time Monitoring of CCTV Camera Images Using Object Detectors and Scene Classification for Retail and Surveillance Applications

Real Time Monitoring of CCTV Camera Images Using Object Detectors and Scene Classification for Retail and Surveillance Applications Real Time Monitoring of CCTV Camera Images Using Object Detectors and Scene Classification for Retail and Surveillance Applications Anand Joshi CS229-Machine Learning, Computer Science, Stanford University,

More information

Convolution Neural Network for Traditional Chinese Calligraphy Recognition

Convolution Neural Network for Traditional Chinese Calligraphy Recognition Convolution Neural Network for Traditional Chinese Calligraphy Recognition Boqi Li Mechanical Engineering Stanford University boqili@stanford.edu Abstract script. Fig. 1 shows examples of the same TCC

More information

How to Build Optimized ML Applications with Arm Software

How to Build Optimized ML Applications with Arm Software How to Build Optimized ML Applications with Arm Software Arm Technical Symposia 2018 ML Group Overview Today we will talk about applied machine learning (ML) on Arm. My aim for today is to show you just

More information

Lecture 7: Semantic Segmentation

Lecture 7: Semantic Segmentation Semantic Segmentation CSED703R: Deep Learning for Visual Recognition (207F) Segmenting images based on its semantic notion Lecture 7: Semantic Segmentation Bohyung Han Computer Vision Lab. bhhanpostech.ac.kr

More information

Convolutional Neural Networks + Neural Style Transfer. Justin Johnson 2/1/2017

Convolutional Neural Networks + Neural Style Transfer. Justin Johnson 2/1/2017 Convolutional Neural Networks + Neural Style Transfer Justin Johnson 2/1/2017 Outline Convolutional Neural Networks Convolution Pooling Feature Visualization Neural Style Transfer Feature Inversion Texture

More information

Deep Learning Based Real-time Object Recognition System with Image Web Crawler

Deep Learning Based Real-time Object Recognition System with Image Web Crawler , pp.103-110 http://dx.doi.org/10.14257/astl.2016.142.19 Deep Learning Based Real-time Object Recognition System with Image Web Crawler Myung-jae Lee 1, Hyeok-june Jeong 1, Young-guk Ha 2 1 Department

More information

Computation-Performance Optimization of Convolutional Neural Networks with Redundant Kernel Removal

Computation-Performance Optimization of Convolutional Neural Networks with Redundant Kernel Removal Computation-Performance Optimization of Convolutional Neural Networks with Redundant Kernel Removal arxiv:1705.10748v3 [cs.cv] 10 Apr 2018 Chih-Ting Liu, Yi-Heng Wu, Yu-Sheng Lin, and Shao-Yi Chien Media

More information

arxiv: v1 [cs.cv] 14 Jul 2017

arxiv: v1 [cs.cv] 14 Jul 2017 Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding Fu Li, Chuang Gan, Xiao Liu, Yunlong Bian, Xiang Long, Yandong Li, Zhichao Li, Jie Zhou, Shilei Wen Baidu IDL & Tsinghua University

More information

Fast CNN-Based Object Tracking Using Localization Layers and Deep Features Interpolation

Fast CNN-Based Object Tracking Using Localization Layers and Deep Features Interpolation Fast CNN-Based Object Tracking Using Localization Layers and Deep Features Interpolation Al-Hussein A. El-Shafie Faculty of Engineering Cairo University Giza, Egypt elshafie_a@yahoo.com Mohamed Zaki Faculty

More information

Fast Sliding Window Classification with Convolutional Neural Networks

Fast Sliding Window Classification with Convolutional Neural Networks Fast Sliding Window Classification with Convolutional Neural Networks Henry G. R. Gouk Department of Computer Science University of Waikato Private Bag 3105, Hamilton 3240, New Zealand hgouk@waikato.ac.nz

More information

Unified, real-time object detection

Unified, real-time object detection Unified, real-time object detection Final Project Report, Group 02, 8 Nov 2016 Akshat Agarwal (13068), Siddharth Tanwar (13699) CS698N: Recent Advances in Computer Vision, Jul Nov 2016 Instructor: Gaurav

More information

An Exploration of Computer Vision Techniques for Bird Species Classification

An Exploration of Computer Vision Techniques for Bird Species Classification An Exploration of Computer Vision Techniques for Bird Species Classification Anne L. Alter, Karen M. Wang December 15, 2017 Abstract Bird classification, a fine-grained categorization task, is a complex

More information

Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition

Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition Kensho Hara, Hirokatsu Kataoka, Yutaka Satoh National Institute of Advanced Industrial Science and Technology (AIST) Tsukuba,

More information

How to Build Optimized ML Applications with Arm Software

How to Build Optimized ML Applications with Arm Software How to Build Optimized ML Applications with Arm Software Arm Technical Symposia 2018 Arm K.K. Senior FAE Ryuji Tanaka Overview Today we will talk about applied machine learning (ML) on Arm. My aim for

More information

Residual Networks And Attention Models. cs273b Recitation 11/11/2016. Anna Shcherbina

Residual Networks And Attention Models. cs273b Recitation 11/11/2016. Anna Shcherbina Residual Networks And Attention Models cs273b Recitation 11/11/2016 Anna Shcherbina Introduction to ResNets Introduced in 2015 by Microsoft Research Deep Residual Learning for Image Recognition (He, Zhang,

More information

CS 2750: Machine Learning. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh April 13, 2016

CS 2750: Machine Learning. Neural Networks. Prof. Adriana Kovashka University of Pittsburgh April 13, 2016 CS 2750: Machine Learning Neural Networks Prof. Adriana Kovashka University of Pittsburgh April 13, 2016 Plan for today Neural network definition and examples Training neural networks (backprop) Convolutional

More information

Deep Learning Benchmarks Mumtaz Vauhkonen, Quaizar Vohra, Saurabh Madaan Collaboration with Adam Coates, Stanford Unviersity

Deep Learning Benchmarks Mumtaz Vauhkonen, Quaizar Vohra, Saurabh Madaan Collaboration with Adam Coates, Stanford Unviersity Deep Learning Benchmarks Mumtaz Vauhkonen, Quaizar Vohra, Saurabh Madaan Collaboration with Adam Coates, Stanford Unviersity Abstract: This project aims at creating a benchmark for Deep Learning (DL) algorithms

More information