A Faster Algorithm for Reducing the Computational Complexity of Convolutional Neural Networks


algorithms
Article

A Faster Algorithm for Reducing the Computational Complexity of Convolutional Neural Networks

Yulin Zhao 1,2,3,*, Donghui Wang 1,2, Leiou Wang 1,2 and Peng Liu 1,2,3

1 Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China; wangdh@mail.ioa.ac.cn (D.W.); wangleiou@mail.ioa.ac.cn (L.W.); liupengucas@mail.ioa.ac.cn (P.L.)
2 Key Laboratory of Information Technology for Autonomous Underwater Vehicles, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
3 University of Chinese Academy of Sciences, Beijing 100049, China
* Correspondence: zhaoyulin14@mails.ucas.ac.cn

Received: 10 September 2018; Accepted: 16 October 2018; Published: 18 October 2018

Abstract: Convolutional neural networks have achieved remarkable improvements in image and video recognition but incur a heavy computational burden. To reduce the computational complexity of a convolutional neural network, this paper proposes an algorithm based on the Winograd minimal filtering algorithm and the Strassen algorithm. A theoretical assessment shows that the proposed algorithm can dramatically reduce computational complexity. Furthermore, the Visual Geometry Group (VGG) network is employed to evaluate the algorithm in practice. The results show that the proposed algorithm can provide optimal performance by combining the savings of the two component algorithms: it saves 75% of the runtime compared with the conventional algorithm.

Keywords: convolutional neural network; Winograd; minimal filtering; Strassen; fast; complexity

1. Introduction

Deep convolutional neural networks have achieved remarkable improvements in image and video processing [1-3]. However, the computational complexity of these networks has also increased significantly. Since the prediction process of networks used in real-time applications requires very low latency, this heavy computational burden is a major problem for such systems. Detecting faces from video imagery, for example, is still a challenging task [4,5], and the success of convolutional neural networks in these applications is limited by their heavy computational burden.
There have been a number of studies on accelerating convolutional neural networks. Denil et al. [6] indicate that there are significant redundancies in the parameterizations of neural networks. Han et al. [7] and Guo et al. [8] use certain training strategies to compress neural network models without significantly weakening their performance. Some researchers [9-11] have found that low-precision computation is sufficient for these networks, and Binary/Ternary Nets [12,13] restrict the parameters to two or three values. Zhang et al. [14] used a low-rank approximation to reconstruct the convolution matrix, which reduces the complexity of the convolution. These algorithms are effective in accelerating the computation in a network, but they also cause a degradation in accuracy. The Fast Fourier Transform (FFT) can also reduce the computational complexity of convolutional neural networks without losing accuracy [15,16], but it is only effective for networks with large kernels, whereas convolutional neural networks tend to use small kernels because they achieve better accuracy than networks with larger kernels. For these reasons, there is a demand for an algorithm that can accelerate networks with small kernels.

In this paper, we present an algorithm based on Winograd's minimal filtering algorithm, which was proposed by Toom [17] and Cook [18] and generalized by Winograd [19]. The minimal filtering algorithm can reduce the computational complexity of each convolution in the network without losing accuracy. However, the resulting computational complexity is still too large for real-time requirements. To reduce the complexity of these networks further, we utilize the Strassen algorithm, which simultaneously reduces the number of convolutions in the network. Moreover, we evaluate our algorithm on the Visual Geometry Group (VGG) network. Experimental results show that it can save 75% of the time spent on computation when the batch size is 32.

Algorithms 2018, 11, 159; doi:10.3390/a11100159

The rest of this paper is organized as follows. Section 2 reviews related work on convolutional neural networks, the Winograd algorithm and the Strassen algorithm. The proposed algorithm is presented in Section 3. Several simulations are included in Section 4, and the work is concluded in Section 5.

2. Related Work

2.1. Convolutional Neural Networks

Machine learning has produced impressive results in many signal processing applications [20,21]. Convolutional neural networks extend the machine-learning capabilities of neural networks by introducing convolutional layers into the network, and are mainly used in image processing. Figure 1 shows the structure of a classical convolutional neural network, LeNet. It consists of two convolutional layers, two subsampling layers and three fully connected layers. Usually, the computation of the convolutional layers occupies most of the network.

Figure 1. The architecture of LeNet (input, convolutions, subsampling, convolutions, subsampling, full connection, output).

Convolutional layers extract features from the input maps via different kernels. Suppose there are Q input maps of size Mx × Nx and R output maps of size My × Ny, and that the convolutional kernel size is Mw × Nw. The computation of the output maps in a single layer is given by Equation (1):

    y_{r,x,y} = \sum_{q=1}^{Q} \sum_{u=1}^{Mw} \sum_{v=1}^{Nw} w_{r,q,u,v} \, x_{q,x+u,y+v},    (1)

where X is the input map, Y is the output map, and W is the kernel. The subscripts x and y indicate the position of a pixel in the map, and the subscripts u and v indicate the position of a parameter in the kernel. Equation (1) can be rewritten as Equation (2):

    y_r = \sum_{q=1}^{Q} w_{r,q} * x_q.    (2)

Suppose P images are sent together to the neural network, i.e., the batch size is P. The output Y in Equation (2) can then be expressed by Equation (3):

    y_{r,p} = \sum_{q=1}^{Q} w_{r,q} * x_{q,p}.    (3)

If we regard y_{r,p}, w_{r,q} and x_{q,p} as the elements of matrices Y, W and X, respectively, the output can be expressed as the convolutional matrix product in Equation (4).
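As a concrete point of reference, the layer computation in Equation (1) can be sketched in plain Python. This is an illustrative helper of my own (zero-based indexing, stride 1, no padding), not the paper's implementation:

```python
def conv_layer(x, w):
    """Direct convolutional layer per Equation (1).
    x: Q x Mx x Nx input maps, w: R x Q x Mw x Nw kernels (nested lists)."""
    Q, Mx, Nx = len(x), len(x[0]), len(x[0][0])
    R, Mw, Nw = len(w), len(w[0][0]), len(w[0][0][0])
    My, Ny = Mx - Mw + 1, Nx - Nw + 1        # "valid" output size
    y = [[[0.0] * Ny for _ in range(My)] for _ in range(R)]
    for r in range(R):                        # each output map
        for i in range(My):
            for j in range(Ny):
                s = 0.0
                for q in range(Q):            # sum over input maps
                    for u in range(Mw):       # slide the Mw x Nw kernel
                        for v in range(Nw):
                            s += w[r][q][u][v] * x[q][i + u][j + v]
                y[r][i][j] = s
    return y
```

Each output pixel costs Mw × Nw multiplications per input map; it is exactly this inner product that the Winograd and Strassen transformations below attack.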

    Y = W ⊛ X,    (4)

where

    Y = \begin{bmatrix} y_{1,1} & \cdots & y_{1,P} \\ \vdots & \ddots & \vdots \\ y_{R,1} & \cdots & y_{R,P} \end{bmatrix},  W = \begin{bmatrix} w_{1,1} & \cdots & w_{1,Q} \\ \vdots & \ddots & \vdots \\ w_{R,1} & \cdots & w_{R,Q} \end{bmatrix},  X = \begin{bmatrix} x_{1,1} & \cdots & x_{1,P} \\ \vdots & \ddots & \vdots \\ x_{Q,1} & \cdots & x_{Q,P} \end{bmatrix}.    (5)

Matrices Y and X are special matrices of feature maps, and matrix W is a special matrix of kernels; the product ⊛ is a matrix product whose element-level "multiplications" are convolutions. This convolutional matrix provides a new view of the computation of the output Y.

2.2. Winograd Algorithm

We denote an r-tap FIR filter with m outputs as F(m, r). The conventional algorithm for F(2, 3) is shown in Equation (6), where d_0, d_1, d_2 and d_3 are the inputs of the filter, and h_0, h_1 and h_2 are the parameters of the filter. As Equation (6) shows, it uses 6 multiplications and 4 additions to compute F(2, 3):

    F(2, 3) = \begin{bmatrix} d_0 & d_1 & d_2 \\ d_1 & d_2 & d_3 \end{bmatrix} \begin{bmatrix} h_0 \\ h_1 \\ h_2 \end{bmatrix} = \begin{bmatrix} d_0 h_0 + d_1 h_1 + d_2 h_2 \\ d_1 h_0 + d_2 h_1 + d_3 h_2 \end{bmatrix}.    (6)

If we instead use the minimal filtering algorithm [19] to compute F(m, r), it requires only (m + r - 1) multiplications. The process for computing F(2, 3) is shown in Equations (7)-(11):

    m_1 = (d_0 - d_2) h_0,    (7)

    m_2 = (d_1 + d_2) \frac{h_0 + h_1 + h_2}{2},    (8)

    m_3 = (d_2 - d_1) \frac{h_0 - h_1 + h_2}{2},    (9)

    m_4 = (d_1 - d_3) h_2,    (10)

    F(2, 3) = \begin{bmatrix} d_0 h_0 + d_1 h_1 + d_2 h_2 \\ d_1 h_0 + d_2 h_1 + d_3 h_2 \end{bmatrix} = \begin{bmatrix} m_1 + m_2 + m_3 \\ m_2 - m_3 - m_4 \end{bmatrix}.    (11)

The computation can be written in matrix form as Equation (12):

    F(2, 3) = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 0 & 1 & -1 & -1 \end{bmatrix} \left( \left( \begin{bmatrix} 1 & 0 & 0 \\ 1/2 & 1/2 & 1/2 \\ 1/2 & -1/2 & 1/2 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} h_0 \\ h_1 \\ h_2 \end{bmatrix} \right) \odot \left( \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \end{bmatrix} \right) \right).    (12)

Substituting A, G and B for the constant matrices in Equation (12), it can be rewritten as Equation (13):

    Y = A^T ((Gh) \odot (B^T d)),    (13)

where \odot indicates element-wise multiplication, and the superscript T indicates the transpose operator. A, G and B are defined in Equation (14):

    A^T = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 0 & 1 & -1 & -1 \end{bmatrix},  G = \begin{bmatrix} 1 & 0 & 0 \\ 1/2 & 1/2 & 1/2 \\ 1/2 & -1/2 & 1/2 \\ 0 & 0 & 1 \end{bmatrix},  B^T = \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix}.    (14)
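The four-multiplication scheme of Equations (7)-(11) is easy to verify numerically. The sketch below (a hypothetical helper of my own, not the paper's code) computes both outputs with exactly the four products m_1 through m_4 and can be checked against the direct formula in Equation (6):

```python
def winograd_f23(d, h):
    """F(2,3): two outputs of a 3-tap filter h over inputs d[0..3],
    using 4 multiplications instead of 6 (Equations (7)-(11))."""
    m1 = (d[0] - d[2]) * h[0]
    m2 = (d[1] + d[2]) * (h[0] + h[1] + h[2]) / 2
    m3 = (d[2] - d[1]) * (h[0] - h[1] + h[2]) / 2
    m4 = (d[1] - d[3]) * h[2]
    return [m1 + m2 + m3, m2 - m3 - m4]
```

When the filter h is fixed, the two scaled filter sums in m_2 and m_3 can be precomputed, which is exactly why the transform cost of the kernel is amortized in Algorithm 1 later in the paper.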

We can see from Equations (7)-(11) that the whole process needs only 4 multiplications. However, it also needs 4 additions to transform the data, 3 additions and 2 multiplications by a constant to transform the filter, and 4 additions to transform the final result. (To compare complexity easily, we regard a multiplication by a constant as an addition.)

The two-dimensional filter F(m × m, r × r) can be generalized from the one-dimensional filter F(m, r) as in Equation (15):

    Y = A^T ((G h G^T) \odot (B^T d B)) A.    (15)

F(2 × 2, 3 × 3) needs 4 × 4 = 16 multiplications, 32 additions to transform the data, 28 additions to transform the filter, and 24 additions to transform the final result. The conventional algorithm needs 36 multiplications to calculate the same result, so this algorithm reduces the number of multiplications from 36 to 16.

F(2 × 2, 3 × 3) can be used to compute a convolutional layer with 3 × 3 kernels: each input map is divided into smaller tiles in order to use Equation (15). If we substitute U = G w G^T and V = B^T x B, then Equation (3) can be rewritten as Equation (16):

    y_{r,p} = \sum_{q=1}^{Q} w_{r,q} * x_{q,p} = \sum_{q=1}^{Q} A^T ((G w_{r,q} G^T) \odot (B^T x_{q,p} B)) A = \sum_{q=1}^{Q} A^T (U_{r,q} \odot V_{q,p}) A.    (16)

2.3. Strassen Algorithm

Suppose there are two matrices A and B, and matrix C is the product of A and B, where the numbers of rows and columns of A, B and C are even. We can partition A, B and C into block matrices of equal size as follows:

    A = \begin{bmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{bmatrix},  B = \begin{bmatrix} B_{1,1} & B_{1,2} \\ B_{2,1} & B_{2,2} \end{bmatrix},  C = \begin{bmatrix} C_{1,1} & C_{1,2} \\ C_{2,1} & C_{2,2} \end{bmatrix}.    (17)

According to the conventional matrix multiplication algorithm, we then have Equation (18):

    C = AB = \begin{bmatrix} A_{1,1} B_{1,1} + A_{1,2} B_{2,1} & A_{1,1} B_{1,2} + A_{1,2} B_{2,2} \\ A_{2,1} B_{1,1} + A_{2,2} B_{2,1} & A_{2,1} B_{1,2} + A_{2,2} B_{2,2} \end{bmatrix}.    (18)

As Equation (18) shows, we need 8 multiplications and 4 additions to compute matrix C. The Strassen algorithm [23] can be used to reduce the number of multiplications. The process of the Strassen algorithm is as follows:

    I = (A_{1,1} + A_{2,2})(B_{1,1} + B_{2,2}),    (19)

    II = (A_{2,1} + A_{2,2}) B_{1,1},    (20)

    III = A_{1,1} (B_{1,2} - B_{2,2}),    (21)

    IV = A_{2,2} (B_{2,1} - B_{1,1}),    (22)

    V = (A_{1,1} + A_{1,2}) B_{2,2},    (23)

    VI = (A_{2,1} - A_{1,1})(B_{1,1} + B_{1,2}),    (24)

    VII = (A_{1,2} - A_{2,2})(B_{2,1} + B_{2,2}),    (25)
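A minimal sketch of the nested two-dimensional transform in Equation (15), using the constant matrices of Equation (14). The helper names are mine; a practical implementation would batch the 16 element-wise products across all tiles rather than handle one 4 × 4 tile at a time:

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

# Constant matrices from Equation (14) for F(2x2, 3x3).
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]
G = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]

def winograd_f2x2_3x3(d, g):
    """One 2x2 output tile from a 4x4 input tile d and 3x3 kernel g,
    via Y = A^T((G g G^T) .* (B^T d B))A with 16 multiplications."""
    U = matmul(matmul(G, g), transpose(G))    # transformed 4x4 kernel
    V = matmul(matmul(BT, d), transpose(BT))  # transformed 4x4 data tile
    M = [[U[i][j] * V[i][j] for j in range(4)] for i in range(4)]
    return matmul(matmul(AT, M), transpose(AT))  # 2x2 output tile
```

For an all-ones 4 × 4 tile and all-ones 3 × 3 kernel, every output element is 9, matching the direct 36-multiplication convolution while using only 16 element-wise products.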

    C_{1,1} = I + IV - V + VII,    (26)

    C_{1,2} = III + V,    (27)

    C_{2,1} = II + IV,    (28)

    C_{2,2} = I - II + III + VI,    (29)

where I, II, III, IV, V, VI and VII are temporary matrices. The whole process requires 7 multiplications and 18 additions. It reduces the number of multiplications from 8 to 7 without changing the computational results. More multiplications can be saved by using the Strassen algorithm recursively, as long as the numbers of rows and columns of the submatrices remain even: with N recursions, the multiplication count drops to (7/8)^N of the original. The Strassen algorithm is suitable for the special convolutional matrix in Equation (4) [24]; therefore, we can use it to handle a convolutional matrix.

3. Proposed Algorithm

As we can see from Section 2.2, the Winograd algorithm incurs extra additions. To avoid repeatedly transforming W and X in Equation (16), we calculate the matrices U and V separately, which reduces the number of additions incurred by the algorithm. The practical implementation is listed in Algorithm 1.

The calculation of the output M in Algorithm 1 constitutes the main multiplication complexity of the whole computation process. To reduce it, we can use the Strassen algorithm. Before doing so, we need to reformulate the expression for M. The output M in Algorithm 1 can be written as Equation (30):

    M_{r,p} = \sum_{q=1}^{Q} A^T (U_{r,q} \odot V_{q,p}) A,    (30)

where U_{r,q} and V_{q,p} are temporary matrices, and A is a constant parameter matrix. For clarity of presentation, we ignore the matrix A here. (Matrix A is not ignored in the actual implementation of the algorithm.) The output M can then be written as in Equation (31):

    M_{r,p} = \sum_{q=1}^{Q} (U_{r,q} \odot V_{q,p}).    (31)

We denote three special matrices M, U and V whose elements are M_{r,p}, U_{r,q} and V_{q,p}, respectively, as shown in Equation (33). The output M can then be written as a multiplication of matrix U and matrix V:

    M = UV,    (32)

    M = \begin{bmatrix} M_{1,1} & \cdots & M_{1,P} \\ \vdots & \ddots & \vdots \\ M_{R,1} & \cdots & M_{R,P} \end{bmatrix},  U = \begin{bmatrix} U_{1,1} & \cdots & U_{1,Q} \\ \vdots & \ddots & \vdots \\ U_{R,1} & \cdots & U_{R,Q} \end{bmatrix},  V = \begin{bmatrix} V_{1,1} & \cdots & V_{1,P} \\ \vdots & \ddots & \vdots \\ V_{Q,1} & \cdots & V_{Q,P} \end{bmatrix}.    (33)

In this case, we can partition the matrices M, U and V into equal-sized block matrices and then use the Strassen algorithm to reduce the number of multiplications between U_{r,q} and V_{q,p}. The multiplication in the Strassen algorithm is redefined as the element-wise multiplication of the matrices U_{r,q} and V_{q,p}. We name this new combination the Strassen-Winograd algorithm.
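The combination can be sketched as one Strassen step over 2 × 2 block matrices of transform-domain tiles, with "multiplication" redefined as the element-wise tile product just described. The helper names are mine; with scalar blocks, the same seven-product scheme is the plain Strassen algorithm of Equations (19)-(29):

```python
def add(a, b):  # element-wise sum of two 4x4 tiles
    return [[a[i][j] + b[i][j] for j in range(4)] for i in range(4)]

def sub(a, b):  # element-wise difference of two 4x4 tiles
    return [[a[i][j] - b[i][j] for j in range(4)] for i in range(4)]

def mul(a, b):  # the redefined "multiplication": element-wise tile product
    return [[a[i][j] * b[i][j] for j in range(4)] for i in range(4)]

def strassen_winograd_step(U, V):
    """One Strassen step on 2x2 block matrices U, V of 4x4 Winograd tiles:
    7 element-wise tile products instead of 8 (Equations (19)-(29))."""
    (u11, u12), (u21, u22) = U
    (v11, v12), (v21, v22) = V
    m1 = mul(add(u11, u22), add(v11, v22))   # I
    m2 = mul(add(u21, u22), v11)             # II
    m3 = mul(u11, sub(v12, v22))             # III
    m4 = mul(u22, sub(v21, v11))             # IV
    m5 = mul(add(u11, u12), v22)             # V
    m6 = mul(sub(u21, u11), add(v11, v12))   # VI
    m7 = mul(sub(u12, u22), add(v21, v22))   # VII
    return [[add(sub(add(m1, m4), m5), m7), add(m3, m5)],
            [add(m2, m4), add(sub(add(m1, m3), m2), m6)]]
```

The Strassen identities hold in any commutative ring, and the element-wise tile product is one, which is why the recombined blocks equal the conventional sums of Equation (31).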

Algorithm 1. Implementation of the Winograd algorithm.

1   for r = 1 to number of output maps
2     for q = 1 to number of input maps
3       U = G w G^T
4     end
5   end
6   for p = 1 to batch size
7     for q = 1 to number of input maps
8       for k = 1 to number of image tiles
9         V = B^T x B
10      end
11    end
12  end
13  for p = 1 to batch size
14    for r = 1 to number of output maps
15      for j = 1 to number of image tiles
16        M = 0
17        for q = 1 to number of input maps
18          M = M + A^T (U \odot V) A
19        end
20      end
21    end
22  end

To compare the theoretical computational complexity of the conventional algorithm, the Strassen algorithm, the Winograd algorithm and the Strassen-Winograd algorithm, we list the numbers of multiplications and additions in Table 1. The output map size is set to 64 × 64, and the kernel size is set to 3 × 3.

Table 1. Computational complexity of the different algorithms.

Matrix    Conventional             Strassen                 Winograd                 Strassen-Winograd
Size N    Mul         Add          Mul         Add          Mul         Add          Mul         Add
2         294,912     278,528      258,048     303,104      131,072     344,176      114,688     …
4         2,359,296   2,293,760    1,806,336   2,416,640    1,048,576   2,294,208    802,816     …

We can see from Table 1 that, although these algorithms incur more additions when the matrix is small, the number of extra additions is less than the number of saved multiplications. Moreover, a multiplication usually costs more time than an addition. Hence, all three algorithms are theoretically effective in reducing computational complexity.

Figure 2 shows a comparison of the computational complexity ratios. The Strassen algorithm shows less reduction in multiplications when the matrix is small, but it incurs fewer additions. The Winograd algorithm shows a stable performance, and its number of additions decreases slightly as the matrix size increases. For small matrices, the Strassen-Winograd algorithm shows a much better reduction in multiplication complexity than the Strassen algorithm; although it incurs more additions, the number of extra additions is much smaller than the number of saved multiplications. The Strassen-Winograd algorithm shows a performance similar to the Winograd algorithm. When the matrix is small, the Winograd algorithm shows a slightly better performance,

whereas the Strassen-Winograd algorithm and the Strassen algorithm perform much better as the matrix size increases.

Figure 2. Comparisons of the complexity ratio with different matrix sizes.

4. Simulation Results

Several simulations were conducted to evaluate our algorithm. We compare our algorithm with the Strassen algorithm and the Winograd algorithm, measuring performance by runtime in MATLAB R2013b (CPU: Intel(R) Core(TM) i7-3370K). For objectivity, we apply Equation (18) to the conventional algorithm and use it as the benchmark. All input data x and kernels w are randomly generated. We measure the accuracy of our algorithm by the absolute element error in the output. As a baseline for accuracy, we use the conventional algorithm with double-precision data, kernels, intermediate variables and outputs; the other algorithms in this comparison use double-precision data and kernels but single-precision intermediate variables and outputs.

The VGG network was applied to our simulation. There are nine different convolutional layers in the VGG network, whose parameters are shown in Table 2. The depth indicates the number of times a layer occurs in the network; Q indicates the number of input maps; R indicates the number of output maps; Mw and Nw represent the kernel size; and My and Ny represent the output map size. The kernel size in the VGG network is 3 × 3, so we apply F(2 × 2, 3 × 3) to the convolution operation. For the computation of an output map of size My × Ny, the map is partitioned into (My/2) × (Ny/2) tiles, each using one computation of F(2 × 2, 3 × 3).

Table 2. Parameters of the convolutional layers in the Visual Geometry Group (VGG) network.

Convolutional Layer   Depth   Q     R     Mw(Nw)   My(Ny)
Layer 1               …       3     64    3        224
Layer 2               …       64    64    3        224
Layer 3               …       64    128   3        112
Layer 4               …       128   128   3        112
Layer 5               …       128   256   3        56
Layer 6               …       256   256   3        56
Layer 7               …       256   512   3        28
Layer 8               …       512   512   3        28
Layer 9               …       512   512   3        14

As Table 2 shows, the numbers of rows and columns are not always even, and the matrices are not always square. To solve this problem, we pad a dummy row or column onto a matrix whenever we encounter an odd number of rows or columns; the matrix can then continue to use the Strassen algorithm. We apply these nine convolutional layers in turn to our simulations. For each convolutional layer, we run the four algorithms with batch sizes from 1 to 32.
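The dummy-row/column padding can be sketched as follows (an illustrative helper of my own; padding with zeros adds only zero contributions to the block products, so the result is unchanged):

```python
def pad_to_even(mat, fill=0):
    """Pad a dummy column and/or row of `fill` values so that both
    dimensions of the nested-list matrix `mat` become even."""
    rows, cols = len(mat), len(mat[0])
    out = [list(row) + [fill] * (cols % 2) for row in mat]  # pad odd width
    if rows % 2:                                            # pad odd height
        out.append([fill] * len(out[0]))
    return out
```

After the Strassen recombination, the dummy rows and columns of the result are simply discarded.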
The runtime consumption of the algorithms is listed in Table 3, and the numerical accuracy of the different algorithms in the different layers is shown in Table 4.
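Before turning to the measured runtimes, the multiplication counts of Table 1 can be reproduced with a short script. This is my own accounting under the stated setup (R = Q = P = N, 64 × 64 output maps, 3 × 3 kernels, F(2 × 2, 3 × 3) tiles, and full Strassen recursion for N a power of two), not code from the paper:

```python
def mult_counts(n):
    """Multiplications for an n x n x n convolutional matrix product
    (n a power of two), 64x64 output maps, 3x3 kernels."""
    k = n.bit_length() - 1               # Strassen recursion depth, n = 2**k
    conv = 64 * 64 * 9                   # direct 3x3 convolution of one map pair
    wino = (64 // 2) * (64 // 2) * 16    # F(2x2,3x3): 16 mults per 2x2 tile
    return {
        "conventional": n**3 * conv,         # n^3 map-pair convolutions
        "strassen": 7**k * conv,             # 7 of 8 products per recursion level
        "winograd": n**3 * wino,
        "strassen-winograd": 7**k * wino,    # both savings combined
    }
```

For N = 2 this yields 294,912 / 258,048 / 131,072 / 114,688 multiplications, matching the first row of Table 1.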

Table 3. Runtime consumption of the different algorithms (seconds), for batch sizes 1, 2, 4, 8, 16 and 32.

Layer     Algorithm            1      2      4      8      16     32
Layer 1   Conventional         4s     48s    94s    87s    375s   75s
          Strassen             4s     56s    95s    9s     383s   768s
          Winograd             4s     9s     57s    5s     30s    46s
          Strassen-Winograd    4s     33s    58s    7s     34s    470s
Layer 2   Conventional         493s   986s   97s    3939s  7888s  58s
          Strassen             49s    86s    508s   636s   465s   8438s
          Winograd             99s    598s   96s    396s   4787s  9935s
          Strassen-Winograd    99s    543s   99s    88s    3348s  6468s
Layer 3   Conventional         45s    490s   980s   96s    396s   7858s
          Strassen             47s    433s   759s   38s    35s    4076s
          Winograd             8s     56s    53s    05s    049s   40s
          Strassen-Winograd    8s     9s     4s     737s   335s   47s
Layer 4   Conventional         488s   978s   954s   3908s  789s   5639s
          Strassen             494s   864s   53s    648s   466s   840s
          Winograd             54s    509s   07s    033s   4075s  868s
          Strassen-Winograd    54s    455s   84s    466s   645s   48s
Layer 5   Conventional         50s    50s    007s   004s   40s    8076s
          Strassen             48s    436s   76s    38s    37s    4078s
          Winograd             8s     36s    47s    94s    88s    3776s
          Strassen-Winograd    8s     09s    370s   656s   67s    085s
Layer 6   Conventional         498s   00s    998s   3995s  7948s  589s
          Strassen             494s   868s   507s   646s   4643s  80s
          Winograd             3s     46s    93s    844s   3693s  738s
          Strassen-Winograd    3s     40s    75s    86s    96s    4089s
Layer 7   Conventional         44s    487s   980s   940s   390s   780s
          Strassen             4s     4s     739s   83s    50s    396s
          Winograd             6s     3s     46s    90s    839s   3680s
          Strassen-Winograd    6s     04s    358s   630s   s      96s
Layer 8   Conventional         479s   955s   97s    3833s  7675s  539s
          Strassen             474s   89s    453s   546s   4447s  78s
          Winograd             s      443s   884s   766s   354s   7068s
          Strassen-Winograd    3s     39s    686s   0s     9s     377s
Layer 9   Conventional         8s     37s    474s   95s    900s   383s
          Strassen             7s     06s    36s    63s    07s    937s
          Winograd             65s    8s     54s    507s   009s   00s
          Strassen-Winograd    65s    3s     97s    345s   606s   063s

Table 4. Maximum element error of the different algorithms in the different layers.

Layer     Conventional   Strassen   Winograd   Strassen-Winograd
Layer 1   …              …          …          …
Layer 2   …              …          …          …
Layer 3   …              …          …          …
Layer 4   …              …          …          …
Layer 5   …              …          …          …
Layer 6   …              …          …          …
Layer 7   …              …          …          …
Layer 8   …              …          …          …
Layer 9   …              …          …          …

Table 4 shows that the Winograd algorithm is slightly more accurate than the Strassen algorithm and the Strassen-Winograd algorithm. The maximum element error of these algorithms is small; compared with the minimum value in the output map, the accuracy loss incurred

by these algorithms is negligible. As we can see from Section 2, none of these algorithms theoretically causes any loss in accuracy; in practice, the loss is mainly caused by the use of single-precision data. Because the conventional algorithm with low-precision data is sufficiently accurate for deep learning [10,11], we conclude that the accuracy of our algorithm is equally sufficient.

To compare runtimes easily, we use the conventional algorithm as the benchmark and calculate the runtime savings achieved by the other algorithms. The results are shown in Figure 3. The Strassen-Winograd algorithm performs better than the benchmark in all layers except layer 1. This is because the number of input maps Q in layer 1 is three, which limits the performance of the algorithm, as a small matrix incurs proportionally more additions. Moreover, odd numbers of rows or columns need dummy rows or columns for matrix partitioning, which costs additional runtime.

Figure 3. Comparisons with different batch sizes.

The performance of the Winograd algorithm is stable from layer 2 to layer 9. It saves 53% of the runtime on average, which is close to its 56% reduction in multiplications. The performances of the Strassen algorithm and the Strassen-Winograd algorithm improve as the batch size increases. For example, in layer 7, when the batch size is 1, we cannot partition the matrix to use the Strassen algorithm, and there is almost no saving in runtime; the Strassen-Winograd algorithm saves 52% of the runtime, a saving similar to the Winograd algorithm's. When the batch size is 2, the Strassen algorithm saves

12% of the runtime, which equates to its 13% reduction in multiplications, and the Strassen-Winograd algorithm saves 58% of the runtime, which is close to its 61% reduction in multiplications. As the batch size increases, the Strassen algorithm and the Strassen-Winograd algorithm can use more recursions, which further reduces the number of multiplications and saves more runtime. When the batch size is 32, the Strassen-Winograd algorithm saves 75% of the runtime, while the Strassen algorithm and the Winograd algorithm save 49% and 53%, respectively. Though experiments with larger batch sizes were not carried out due to limitations on time and memory, the trend in performance as the batch size increases is clear and is consistent with the theoretical analysis in Section 3. We therefore conclude that the proposed algorithm provides optimal performance by combining the savings of the two component algorithms.

5. Conclusions and Future Work

The computational complexity of convolutional neural networks is an urgent problem for real-time applications. Both the Strassen algorithm and the Winograd algorithm are effective in reducing computational complexity without losing accuracy. This paper proposed combining these algorithms to reduce the heavy computational burden, and the proposed strategy was evaluated on the VGG network. Both the theoretical performance assessment and the experimental results show that the Strassen-Winograd algorithm can dramatically reduce computational complexity.

There remain limitations that need to be addressed in future research. Although the algorithm reduces the computational complexity of convolutional neural networks, the cost is an increased difficulty of implementation, especially in real-time systems and embedded devices. It also increases the difficulty of parallelizing an artificial network for hardware acceleration. In future work, we aim to apply this method to hardware accelerators in practical applications.

Author Contributions: Y.Z. performed the experiments and wrote the paper; D.W. provided suggestions about the algorithm; L.W. analyzed the complexity of the algorithms; P.L. checked the paper.

Funding: This research received no external funding.

Acknowledgments: This work was supported in part by the National Natural Science Foundation of China, and in part by the Young Talent Program of the Institute of Acoustics, Chinese Academy of Sciences, under Grant QNYC06.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
2. Liu, N.; Wan, L.; Zhang, Y.; Zhou, T.; Huo, H.; Fang, T. Exploiting Convolutional Neural Networks with Deeply Local Description for Remote Sensing Image Classification. IEEE Access 2018, 6. [CrossRef]
3. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the International Conference on Neural Information Processing Systems, 3-6 December 2012.
4. Le, N.M.; Granger, E.; Kiran, M. A comparison of CNN-based face and head detectors for real-time video surveillance applications. In Proceedings of the Seventh International Conference on Image Processing Theory, Tools and Applications, Montreal, QC, Canada, 28 November-1 December 2017.
5. Ren, S.; He, K.; Girshick, R. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8-13 December 2014.
6. Denil, M.; Shakibi, B.; Dinh, L. Predicting Parameters in Deep Learning. In Proceedings of the International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5-10 December 2013.

7. Han, S.; Pool, J.; Tran, J. Learning both Weights and Connections for Efficient Neural Networks. In Proceedings of the International Conference on Neural Information Processing Systems, Istanbul, Turkey, 9 November 2015.
8. Guo, Y.; Yao, A.; Chen, Y. Dynamic Network Surgery for Efficient DNNs. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2016.
9. Qiu, J.; Wang, J.; Yao, S. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network. In Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, 21-23 February 2016.
10. Courbariaux, M.; Bengio, Y.; David, J.P. Low Precision Arithmetic for Deep Learning. arXiv 2014.
11. Gupta, S.; Agrawal, A.; Gopalakrishnan, K.; Narayanan, P. Deep Learning with Limited Numerical Precision. In Proceedings of the International Conference on Machine Learning, Lille, France, 7-9 July 2015.
12. Rastegari, M.; Ordonez, V.; Redmon, J. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11-14 October 2016; Springer: Cham, Switzerland, 2016.
13. Zhu, C.; Han, S.; Mao, H. Trained Ternary Quantization. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24-26 April 2017.
14. Zhang, X.; Zou, J.; Ming, X.; Sun, J. Efficient and accurate approximations of nonlinear convolutional networks. In Proceedings of Computer Vision and Pattern Recognition, Boston, MA, USA, 7-12 June 2015.
15. Mathieu, M.; Henaff, M.; LeCun, Y. Fast Training of Convolutional Networks through FFTs. arXiv 2013, arXiv:1312.5851.
16. Vasilache, N.; Johnson, J.; Mathieu, M.; Chintala, S.; Piantino, S.; LeCun, Y. Fast Convolutional Nets with fbfft: A GPU Performance Evaluation. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7-9 May 2015.
17. Toom, A.L. The complexity of a scheme of functional elements simulating the multiplication of integers. Dokl. Akad. Nauk SSSR 1963, 150, 496-498.
18. Cook, S.A. On the Minimum Computation Time for Multiplication. Ph.D. Thesis, Harvard University, Cambridge, MA, USA, 1966.
19. Winograd, S. Arithmetic Complexity of Computations; SIAM: Philadelphia, PA, USA, 1980.
20. Jiao, Y.; Zhang, Y.; Wang, Y.; Wang, B.; Jin, J.; Wang, X. A novel multilayer correlation maximization model for improving CCA-based frequency recognition in SSVEP brain-computer interface. Int. J. Neural Syst. 2018, 28. [CrossRef] [PubMed]
21. Wang, R.; Zhang, Y.; Zhang, L. An adaptive neural network approach for operator functional state prediction using psychophysiological data. Integr. Comput. Aided Eng. 2015, 23. [CrossRef]
22. Lavin, A.; Gray, S. Fast Algorithms for Convolutional Neural Networks. In Proceedings of Computer Vision and Pattern Recognition, Caesars Palace, NV, USA, 26 June-1 July 2016.
23. Strassen, V. Gaussian elimination is not optimal. Numer. Math. 1969, 13, 354-356. [CrossRef]
24. Cong, J.; Xiao, B. Minimizing Computation in Convolutional Neural Networks. In Proceedings of Artificial Neural Networks and Machine Learning (ICANN 2014), Hamburg, Germany, 15-19 September 2014.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).


Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs Ritchie Zhao 1, Weinan Song 2, Wentao Zhang 2, Tianwei Xing 3, Jeng-Hau Lin 4, Mani Srivastava 3, Rajesh Gupta 4, Zhiru

More information

Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System

Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System Chi Zhang, Viktor K Prasanna University of Southern California {zhan527, prasanna}@usc.edu fpga.usc.edu ACM

More information

Real-time convolutional networks for sonar image classification in low-power embedded systems

Real-time convolutional networks for sonar image classification in low-power embedded systems Real-time convolutional networks for sonar image classification in low-power embedded systems Matias Valdenegro-Toro Ocean Systems Laboratory - School of Engineering & Physical Sciences Heriot-Watt University,

More information

Convolutional Neural Network Layer Reordering for Acceleration

Convolutional Neural Network Layer Reordering for Acceleration R1-15 SASIMI 2016 Proceedings Convolutional Neural Network Layer Reordering for Acceleration Vijay Daultani Subhajit Chaudhury Kazuhisa Ishizaka System Platform Labs Value Co-creation Center System Platform

More information

Content-Based Image Recovery

Content-Based Image Recovery Content-Based Image Recovery Hong-Yu Zhou and Jianxin Wu National Key Laboratory for Novel Software Technology Nanjing University, China zhouhy@lamda.nju.edu.cn wujx2001@nju.edu.cn Abstract. We propose

More information

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network Liwen Zheng, Canmiao Fu, Yong Zhao * School of Electronic and Computer Engineering, Shenzhen Graduate School of

More information

A Hybrid Feature Extractor using Fast Hessian Detector and SIFT

A Hybrid Feature Extractor using Fast Hessian Detector and SIFT Technologies 2015, 3, 103-110; doi:10.3390/technologies3020103 OPEN ACCESS technologies ISSN 2227-7080 www.mdpi.com/journal/technologies Article A Hybrid Feature Extractor using Fast Hessian Detector and

More information

Study of Residual Networks for Image Recognition

Study of Residual Networks for Image Recognition Study of Residual Networks for Image Recognition Mohammad Sadegh Ebrahimi Stanford University sadegh@stanford.edu Hossein Karkeh Abadi Stanford University hosseink@stanford.edu Abstract Deep neural networks

More information

A Method to Estimate the Energy Consumption of Deep Neural Networks

A Method to Estimate the Energy Consumption of Deep Neural Networks A Method to Estimate the Consumption of Deep Neural Networks Tien-Ju Yang, Yu-Hsin Chen, Joel Emer, Vivienne Sze Massachusetts Institute of Technology, Cambridge, MA, USA {tjy, yhchen, jsemer, sze}@mit.edu

More information

Deep Learning Basic Lecture - Complex Systems & Artificial Intelligence 2017/18 (VO) Asan Agibetov, PhD.

Deep Learning Basic Lecture - Complex Systems & Artificial Intelligence 2017/18 (VO) Asan Agibetov, PhD. Deep Learning 861.061 Basic Lecture - Complex Systems & Artificial Intelligence 2017/18 (VO) Asan Agibetov, PhD asan.agibetov@meduniwien.ac.at Medical University of Vienna Center for Medical Statistics,

More information

Robust Face Recognition Based on Convolutional Neural Network

Robust Face Recognition Based on Convolutional Neural Network 2017 2nd International Conference on Manufacturing Science and Information Engineering (ICMSIE 2017) ISBN: 978-1-60595-516-2 Robust Face Recognition Based on Convolutional Neural Network Ying Xu, Hui Ma,

More information

Neural Network-Hardware Co-design for Scalable RRAM-based BNN Accelerators

Neural Network-Hardware Co-design for Scalable RRAM-based BNN Accelerators Neural Network-Hardware Co-design for Scalable RRAM-based BNN Accelerators Yulhwa Kim, Hyungjun Kim, and Jae-Joon Kim Dept. of Creative IT Engineering, Pohang University of Science and Technology (POSTECH),

More information

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS Kuan-Chuan Peng and Tsuhan Chen School of Electrical and Computer Engineering, Cornell University, Ithaca, NY

More information

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides Deep Learning in Visual Recognition Thanks Da Zhang for the slides Deep Learning is Everywhere 2 Roadmap Introduction Convolutional Neural Network Application Image Classification Object Detection Object

More information

Comparing Dropout Nets to Sum-Product Networks for Predicting Molecular Activity

Comparing Dropout Nets to Sum-Product Networks for Predicting Molecular Activity 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks

BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks Surat Teerapittayanon Harvard University Email: steerapi@seas.harvard.edu Bradley McDanel Harvard University Email: mcdanel@fas.harvard.edu

More information

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech Convolutional Neural Networks Computer Vision Jia-Bin Huang, Virginia Tech Today s class Overview Convolutional Neural Network (CNN) Training CNN Understanding and Visualizing CNN Image Categorization:

More information

Video Inter-frame Forgery Identification Based on Optical Flow Consistency

Video Inter-frame Forgery Identification Based on Optical Flow Consistency Sensors & Transducers 24 by IFSA Publishing, S. L. http://www.sensorsportal.com Video Inter-frame Forgery Identification Based on Optical Flow Consistency Qi Wang, Zhaohong Li, Zhenzhen Zhang, Qinglong

More information

Supplementary material for Analyzing Filters Toward Efficient ConvNet

Supplementary material for Analyzing Filters Toward Efficient ConvNet Supplementary material for Analyzing Filters Toward Efficient Net Takumi Kobayashi National Institute of Advanced Industrial Science and Technology, Japan takumi.kobayashi@aist.go.jp A. Orthonormal Steerable

More information

Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks

Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks Naveen Suda, Vikas Chandra *, Ganesh Dasika *, Abinash Mohanty, Yufei Ma, Sarma Vrudhula, Jae-sun Seo, Yu

More information

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun Presented by Tushar Bansal Objective 1. Get bounding box for all objects

More information

Instruction Driven Cross-Layer CNN Accelerator with Winograd Transformation on FPGA

Instruction Driven Cross-Layer CNN Accelerator with Winograd Transformation on FPGA Instruction Driven Cross-Layer CNN Accelerator with Winograd Transformation on FPGA Abstract In recent years, Convolutional Neural Network (CNN) has been widely applied in computer vision tasks. FPGAs

More information

ADVANCED IMAGE PROCESSING METHODS FOR ULTRASONIC NDE RESEARCH C. H. Chen, University of Massachusetts Dartmouth, N.

ADVANCED IMAGE PROCESSING METHODS FOR ULTRASONIC NDE RESEARCH C. H. Chen, University of Massachusetts Dartmouth, N. ADVANCED IMAGE PROCESSING METHODS FOR ULTRASONIC NDE RESEARCH C. H. Chen, University of Massachusetts Dartmouth, N. Dartmouth, MA USA Abstract: The significant progress in ultrasonic NDE systems has now

More information

FPGA IMPLEMENTATION FOR REAL TIME SOBEL EDGE DETECTOR BLOCK USING 3-LINE BUFFERS

FPGA IMPLEMENTATION FOR REAL TIME SOBEL EDGE DETECTOR BLOCK USING 3-LINE BUFFERS FPGA IMPLEMENTATION FOR REAL TIME SOBEL EDGE DETECTOR BLOCK USING 3-LINE BUFFERS 1 RONNIE O. SERFA JUAN, 2 CHAN SU PARK, 3 HI SEOK KIM, 4 HYEONG WOO CHA 1,2,3,4 CheongJu University E-maul: 1 engr_serfs@yahoo.com,

More information

ECE5775 High-Level Digital Design Automation, Fall 2018 School of Electrical Computer Engineering, Cornell University

ECE5775 High-Level Digital Design Automation, Fall 2018 School of Electrical Computer Engineering, Cornell University ECE5775 High-Level Digital Design Automation, Fall 2018 School of Electrical Computer Engineering, Cornell University Lab 4: Binarized Convolutional Neural Networks Due Wednesday, October 31, 2018, 11:59pm

More information

A High-Precision Fusion Algorithm for Fabric Quality Inspection

A High-Precision Fusion Algorithm for Fabric Quality Inspection 2017 2 nd International Conference on Computer Engineering, Information Science and Internet Technology (CII 2017) ISBN: 978-1-60595-504-9 A High-Precision Fusion Algorithm for Fabric Quality Inspection

More information

C-Brain: A Deep Learning Accelerator

C-Brain: A Deep Learning Accelerator C-Brain: A Deep Learning Accelerator that Tames the Diversity of CNNs through Adaptive Data-level Parallelization Lili Song, Ying Wang, Yinhe Han, Xin Zhao, Bosheng Liu, Xiaowei Li State Key Laboratory

More information

Kaggle Data Science Bowl 2017 Technical Report

Kaggle Data Science Bowl 2017 Technical Report Kaggle Data Science Bowl 2017 Technical Report qfpxfd Team May 11, 2017 1 Team Members Table 1: Team members Name E-Mail University Jia Ding dingjia@pku.edu.cn Peking University, Beijing, China Aoxue Li

More information

Application of Improved Lzc Algorithm in the Discrimination of Photo and Text ChengJing Ye 1, a, Donghai Zeng 2,b

Application of Improved Lzc Algorithm in the Discrimination of Photo and Text ChengJing Ye 1, a, Donghai Zeng 2,b 2016 International Conference on Information Engineering and Communications Technology (IECT 2016) ISBN: 978-1-60595-375-5 Application of Improved Lzc Algorithm in the Discrimination of Photo and Text

More information

Image Classification using Fast Learning Convolutional Neural Networks

Image Classification using Fast Learning Convolutional Neural Networks , pp.50-55 http://dx.doi.org/10.14257/astl.2015.113.11 Image Classification using Fast Learning Convolutional Neural Networks Keonhee Lee 1 and Dong-Chul Park 2 1 Software Device Research Center Korea

More information

Automatic detection of books based on Faster R-CNN

Automatic detection of books based on Faster R-CNN Automatic detection of books based on Faster R-CNN Beibei Zhu, Xiaoyu Wu, Lei Yang, Yinghua Shen School of Information Engineering, Communication University of China Beijing, China e-mail: zhubeibei@cuc.edu.cn,

More information

Design Space Exploration of FPGA-Based Deep Convolutional Neural Networks

Design Space Exploration of FPGA-Based Deep Convolutional Neural Networks Design Space Exploration of FPGA-Based Deep Convolutional Neural Networks Abstract Deep Convolutional Neural Networks (DCNN) have proven to be very effective in many pattern recognition applications, such

More information

An Adaptive Threshold LBP Algorithm for Face Recognition

An Adaptive Threshold LBP Algorithm for Face Recognition An Adaptive Threshold LBP Algorithm for Face Recognition Xiaoping Jiang 1, Chuyu Guo 1,*, Hua Zhang 1, and Chenghua Li 1 1 College of Electronics and Information Engineering, Hubei Key Laboratory of Intelligent

More information

Handwritten Digit Classication using 8-bit Floating Point based Convolutional Neural Networks

Handwritten Digit Classication using 8-bit Floating Point based Convolutional Neural Networks Downloaded from orbit.dtu.dk on: Nov 22, 2018 Handwritten Digit Classication using 8-bit Floating Point based Convolutional Neural Networks Gallus, Michal; Nannarelli, Alberto Publication date: 2018 Document

More information

arxiv: v1 [cs.lg] 9 Jan 2019

arxiv: v1 [cs.lg] 9 Jan 2019 How Compact?: Assessing Compactness of Representations through Layer-Wise Pruning Hyun-Joo Jung 1 Jaedeok Kim 1 Yoonsuck Choe 1,2 1 Machine Learning Lab, Artificial Intelligence Center, Samsung Research,

More information

Inception and Residual Networks. Hantao Zhang. Deep Learning with Python.

Inception and Residual Networks. Hantao Zhang. Deep Learning with Python. Inception and Residual Networks Hantao Zhang Deep Learning with Python https://en.wikipedia.org/wiki/residual_neural_network Deep Neural Network Progress from Large Scale Visual Recognition Challenge (ILSVRC)

More information

FPGA Implementation of an Efficient Two-dimensional Wavelet Decomposing Algorithm

FPGA Implementation of an Efficient Two-dimensional Wavelet Decomposing Algorithm FPGA Implementation of an Efficient Two-dimensional Wavelet Decomposing Algorithm # Chuanyu Zhang, * Chunling Yang, # Zhenpeng Zuo # School of Electrical Engineering, Harbin Institute of Technology Harbin,

More information

Design Space Exploration of FPGA-Based Deep Convolutional Neural Networks

Design Space Exploration of FPGA-Based Deep Convolutional Neural Networks Design Space Exploration of FPGA-Based Deep Convolutional Neural Networks Mohammad Motamedi, Philipp Gysel, Venkatesh Akella and Soheil Ghiasi Electrical and Computer Engineering Department, University

More information

RSRN: Rich Side-output Residual Network for Medial Axis Detection

RSRN: Rich Side-output Residual Network for Medial Axis Detection RSRN: Rich Side-output Residual Network for Medial Axis Detection Chang Liu, Wei Ke, Jianbin Jiao, and Qixiang Ye University of Chinese Academy of Sciences, Beijing, China {liuchang615, kewei11}@mails.ucas.ac.cn,

More information

End-To-End Spam Classification With Neural Networks

End-To-End Spam Classification With Neural Networks End-To-End Spam Classification With Neural Networks Christopher Lennan, Bastian Naber, Jan Reher, Leon Weber 1 Introduction A few years ago, the majority of the internet s network traffic was due to spam

More information

Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling

Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling [DOI: 10.2197/ipsjtcva.7.99] Express Paper Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling Takayoshi Yamashita 1,a) Takaya Nakamura 1 Hiroshi Fukui 1,b) Yuji

More information

Exploring Capsules. Binghui Peng Runzhou Tao Shunyu Yao IIIS, Tsinghua University {pbh15, trz15,

Exploring Capsules. Binghui Peng Runzhou Tao Shunyu Yao IIIS, Tsinghua University {pbh15, trz15, Exploring Capsules Binghui Peng Runzhou Tao Shunyu Yao IIIS, Tsinghua University {pbh15, trz15, yao-sy15}@mails.tsinghua.edu.cn 1 Introduction Nowadays, convolutional neural networks (CNNs) have received

More information

In Live Computer Vision

In Live Computer Vision EVA 2 : Exploiting Temporal Redundancy In Live Computer Vision Mark Buckler, Philip Bedoukian, Suren Jayasuriya, Adrian Sampson International Symposium on Computer Architecture (ISCA) Tuesday June 5, 2018

More information

Machine Learning 13. week

Machine Learning 13. week Machine Learning 13. week Deep Learning Convolutional Neural Network Recurrent Neural Network 1 Why Deep Learning is so Popular? 1. Increase in the amount of data Thanks to the Internet, huge amount of

More information

Flexible Calibration of a Portable Structured Light System through Surface Plane

Flexible Calibration of a Portable Structured Light System through Surface Plane Vol. 34, No. 11 ACTA AUTOMATICA SINICA November, 2008 Flexible Calibration of a Portable Structured Light System through Surface Plane GAO Wei 1 WANG Liang 1 HU Zhan-Yi 1 Abstract For a portable structured

More information

Convolution Neural Networks for Chinese Handwriting Recognition

Convolution Neural Networks for Chinese Handwriting Recognition Convolution Neural Networks for Chinese Handwriting Recognition Xu Chen Stanford University 450 Serra Mall, Stanford, CA 94305 xchen91@stanford.edu Abstract Convolutional neural networks have been proven

More information

Classifying a specific image region using convolutional nets with an ROI mask as input

Classifying a specific image region using convolutional nets with an ROI mask as input Classifying a specific image region using convolutional nets with an ROI mask as input 1 Sagi Eppel Abstract Convolutional neural nets (CNN) are the leading computer vision method for classifying images.

More information

Learning visual odometry with a convolutional network

Learning visual odometry with a convolutional network Learning visual odometry with a convolutional network Kishore Konda 1, Roland Memisevic 2 1 Goethe University Frankfurt 2 University of Montreal konda.kishorereddy@gmail.com, roland.memisevic@gmail.com

More information

Weighted Convolutional Neural Network. Ensemble.

Weighted Convolutional Neural Network. Ensemble. Weighted Convolutional Neural Network Ensemble Xavier Frazão and Luís A. Alexandre Dept. of Informatics, Univ. Beira Interior and Instituto de Telecomunicações Covilhã, Portugal xavierfrazao@gmail.com

More information

A Novel Image Super-resolution Reconstruction Algorithm based on Modified Sparse Representation

A Novel Image Super-resolution Reconstruction Algorithm based on Modified Sparse Representation , pp.162-167 http://dx.doi.org/10.14257/astl.2016.138.33 A Novel Image Super-resolution Reconstruction Algorithm based on Modified Sparse Representation Liqiang Hu, Chaofeng He Shijiazhuang Tiedao University,

More information

Rotation Invariance Neural Network

Rotation Invariance Neural Network Rotation Invariance Neural Network Shiyuan Li Abstract Rotation invariance and translate invariance have great values in image recognition. In this paper, we bring a new architecture in convolutional neural

More information

Performance analysis of Integer DCT of different block sizes.

Performance analysis of Integer DCT of different block sizes. Performance analysis of Integer DCT of different block sizes. Aim: To investigate performance analysis of integer DCT of different block sizes. Abstract: Discrete cosine transform (DCT) has been serving

More information

ORT EP R RCH A ESE R P A IDI! " #$$% &' (# $!"

ORT EP R RCH A ESE R P A IDI!  #$$% &' (# $! R E S E A R C H R E P O R T IDIAP A Parallel Mixture of SVMs for Very Large Scale Problems Ronan Collobert a b Yoshua Bengio b IDIAP RR 01-12 April 26, 2002 Samy Bengio a published in Neural Computation,

More information

Iterative Removing Salt and Pepper Noise based on Neighbourhood Information

Iterative Removing Salt and Pepper Noise based on Neighbourhood Information Iterative Removing Salt and Pepper Noise based on Neighbourhood Information Liu Chun College of Computer Science and Information Technology Daqing Normal University Daqing, China Sun Bishen Twenty-seventh

More information

FUSION MODEL BASED ON CONVOLUTIONAL NEURAL NETWORKS WITH TWO FEATURES FOR ACOUSTIC SCENE CLASSIFICATION

FUSION MODEL BASED ON CONVOLUTIONAL NEURAL NETWORKS WITH TWO FEATURES FOR ACOUSTIC SCENE CLASSIFICATION Please contact the conference organizers at dcasechallenge@gmail.com if you require an accessible file, as the files provided by ConfTool Pro to reviewers are filtered to remove author information, and

More information

Deep Neural Network Hyperparameter Optimization with Genetic Algorithms

Deep Neural Network Hyperparameter Optimization with Genetic Algorithms Deep Neural Network Hyperparameter Optimization with Genetic Algorithms EvoDevo A Genetic Algorithm Framework Aaron Vose, Jacob Balma, Geert Wenes, and Rangan Sukumar Cray Inc. October 2017 Presenter Vose,

More information

Fast Image Matching on Web Pages

Fast Image Matching on Web Pages Fast Image Matching on Web Pages HAZEM M. EL-BAKRY Faculty of Computer Science & Information Systems, Mansoura University, EGYPT E-mail: helbakry20@yahoo.com NIKOS MASTORAKIS Technical University of Sofia,

More information

Temperature Calculation of Pellet Rotary Kiln Based on Texture

Temperature Calculation of Pellet Rotary Kiln Based on Texture Intelligent Control and Automation, 2017, 8, 67-74 http://www.scirp.org/journal/ica ISSN Online: 2153-0661 ISSN Print: 2153-0653 Temperature Calculation of Pellet Rotary Kiln Based on Texture Chunli Lin,

More information

Image Segmentation Based on Watershed and Edge Detection Techniques

Image Segmentation Based on Watershed and Edge Detection Techniques 0 The International Arab Journal of Information Technology, Vol., No., April 00 Image Segmentation Based on Watershed and Edge Detection Techniques Nassir Salman Computer Science Department, Zarqa Private

More information

arxiv: v1 [cs.cv] 20 Dec 2017

arxiv: v1 [cs.cv] 20 Dec 2017 Learning a Wavelet-like Auto-Encoder to Accelerate Deep Neural Networks Tianshui Chen 1, Liang Lin 1,5, Wangmeng Zuo 2, Xiaonan Luo 3, Lei Zhang 4 1 School of Data and Computer Science, Sun Yat-sen University,

More information

A reversible data hiding based on adaptive prediction technique and histogram shifting

A reversible data hiding based on adaptive prediction technique and histogram shifting A reversible data hiding based on adaptive prediction technique and histogram shifting Rui Liu, Rongrong Ni, Yao Zhao Institute of Information Science Beijing Jiaotong University E-mail: rrni@bjtu.edu.cn

More information

Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction

Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction by Noh, Hyeonwoo, Paul Hongsuck Seo, and Bohyung Han.[1] Presented : Badri Patro 1 1 Computer Vision Reading

More information

Extensions of One-Dimensional Gray-level Nonlinear Image Processing Filters to Three-Dimensional Color Space

Extensions of One-Dimensional Gray-level Nonlinear Image Processing Filters to Three-Dimensional Color Space Extensions of One-Dimensional Gray-level Nonlinear Image Processing Filters to Three-Dimensional Color Space Orlando HERNANDEZ and Richard KNOWLES Department Electrical and Computer Engineering, The College

More information

Feature-Fused SSD: Fast Detection for Small Objects

Feature-Fused SSD: Fast Detection for Small Objects Feature-Fused SSD: Fast Detection for Small Objects Guimei Cao, Xuemei Xie, Wenzhe Yang, Quan Liao, Guangming Shi, Jinjian Wu School of Electronic Engineering, Xidian University, China xmxie@mail.xidian.edu.cn

More information

Layer-wise Performance Bottleneck Analysis of Deep Neural Networks

Layer-wise Performance Bottleneck Analysis of Deep Neural Networks Layer-wise Performance Bottleneck Analysis of Deep Neural Networks Hengyu Zhao, Colin Weinshenker*, Mohamed Ibrahim*, Adwait Jog*, Jishen Zhao University of California, Santa Cruz, *The College of William

More information

Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks

Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks Soheil Hashemi, Nicholas Anthony, Hokchhay Tann, R. Iris Bahar, Sherief Reda School of Engineering Brown

More information

High Performance Computing

High Performance Computing High Performance Computing 9th Lecture 2016/10/28 YUKI ITO 1 Selected Paper: vdnn: Virtualized Deep Neural Networks for Scalable, MemoryEfficient Neural Network Design Minsoo Rhu, Natalia Gimelshein, Jason

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Computer Vision Lecture 16 Deep Learning for Object Categorization 14.01.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period

More information

Redundancy Resolution by Minimization of Joint Disturbance Torque for Independent Joint Controlled Kinematically Redundant Manipulators

Redundancy Resolution by Minimization of Joint Disturbance Torque for Independent Joint Controlled Kinematically Redundant Manipulators 56 ICASE :The Institute ofcontrol,automation and Systems Engineering,KOREA Vol.,No.1,March,000 Redundancy Resolution by Minimization of Joint Disturbance Torque for Independent Joint Controlled Kinematically

More information

Real-time Object Detection CS 229 Course Project

Real-time Object Detection CS 229 Course Project Real-time Object Detection CS 229 Course Project Zibo Gong 1, Tianchang He 1, and Ziyi Yang 1 1 Department of Electrical Engineering, Stanford University December 17, 2016 Abstract Objection detection

More information

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION Kingsley Kuan 1, Gaurav Manek 1, Jie Lin 1, Yuan Fang 1, Vijay Chandrasekhar 1,2 Institute for Infocomm Research, A*STAR, Singapore 1 Nanyang Technological

More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

Pedestrian Detection based on Deep Fusion Network using Feature Correlation

Pedestrian Detection based on Deep Fusion Network using Feature Correlation Pedestrian Detection based on Deep Fusion Network using Feature Correlation Yongwoo Lee, Toan Duc Bui and Jitae Shin School of Electronic and Electrical Engineering, Sungkyunkwan University, Suwon, South

More information

CHAPTER 2 TEXTURE CLASSIFICATION METHODS GRAY LEVEL CO-OCCURRENCE MATRIX AND TEXTURE UNIT

CHAPTER 2 TEXTURE CLASSIFICATION METHODS GRAY LEVEL CO-OCCURRENCE MATRIX AND TEXTURE UNIT CHAPTER 2 TEXTURE CLASSIFICATION METHODS GRAY LEVEL CO-OCCURRENCE MATRIX AND TEXTURE UNIT 2.1 BRIEF OUTLINE The classification of digital imagery is to extract useful thematic information which is one

More information

Convolutional Neural Networks

Convolutional Neural Networks Lecturer: Barnabas Poczos Introduction to Machine Learning (Lecture Notes) Convolutional Neural Networks Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.

More information

Minimal Test Cost Feature Selection with Positive Region Constraint

Minimal Test Cost Feature Selection with Positive Region Constraint Minimal Test Cost Feature Selection with Positive Region Constraint Jiabin Liu 1,2,FanMin 2,, Shujiao Liao 2, and William Zhu 2 1 Department of Computer Science, Sichuan University for Nationalities, Kangding

More information

BHNN: a Memory-Efficient Accelerator for Compressing Deep Neural Network with Blocked Hashing Techniques
