A Faster Algorithm for Reducing the Computational Complexity of Convolutional Neural Networks


algorithms
Article

A Faster Algorithm for Reducing the Computational Complexity of Convolutional Neural Networks

Yulin Zhao 1,2,3,*, Donghui Wang 1,2, Leiou Wang 1,2 and Peng Liu 1,2,3

1 Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China; wangdh@mail.ioa.ac.cn (D.W.); wangleiou@mail.ioa.ac.cn (L.W.); liupengucas@mail.ioa.ac.cn (P.L.)
2 Key Laboratory of Information Technology for Autonomous Underwater Vehicles, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
3 University of Chinese Academy of Sciences, Beijing 100049, China
* Correspondence: zhaoyulin14@mails.ucas.ac.cn

Received: 10 September 2018; Accepted: 16 October 2018; Published: 18 October 2018

Abstract: Convolutional neural networks have achieved remarkable improvements in image and video recognition but incur a heavy computational burden. To reduce the computational complexity of a convolutional neural network, this paper proposes an algorithm based on the Winograd minimal filtering algorithm and the Strassen algorithm. A theoretical assessment shows that the proposed algorithm can dramatically reduce computational complexity. Furthermore, the Visual Geometry Group (VGG) network is employed to evaluate the algorithm in practice. The results show that the proposed algorithm can provide optimal performance by combining the savings of the two component algorithms: it saves 75% of the runtime compared with the conventional algorithm.

Keywords: convolutional neural network; Winograd; minimal filtering; Strassen; fast; complexity

1. Introduction

Deep convolutional neural networks have achieved remarkable improvements in image and video processing [1-3]. However, the computational complexity of these networks has also increased significantly. Since the prediction process of networks used in real-time applications requires very low latency, this heavy computational burden is a major problem for such systems. Detecting faces from video imagery, for example, is still a challenging task [4,5], and the success of convolutional neural networks in these applications is limited by their heavy computational burden.
There have been a number of studies on accelerating convolutional neural networks. Denil et al. [6] indicate that there are significant redundancies in the parameterizations of neural networks. Han et al. [7] and Guo et al. [8] use certain training strategies to compress neural network models without significantly weakening their performance. Some researchers [9-11] have found that low-precision computation is sufficient for these networks, and Binary/Ternary Nets [12,13] restrict the parameters to two or three values. Zhang et al. [14] used a low-rank approximation to reconstruct the convolution matrix, which reduces the complexity of the convolution. These algorithms are effective in accelerating the computation in a network, but they also cause a degradation in accuracy. The Fast Fourier Transform (FFT) can also reduce the computational complexity of convolutional neural networks without losing accuracy [15,16], but it is only effective for networks with large kernels, whereas convolutional neural networks tend to use small kernels because they achieve better accuracy than networks with larger kernels. For these reasons, there is a demand for an algorithm that can accelerate networks with small kernels.

In this paper, we present an algorithm based on Winograd's minimal filtering algorithm, which was proposed by Toom [17] and Cook [18] and generalized by Winograd [19]. The minimal filtering algorithm can reduce the computational complexity of each convolution in the network without losing accuracy. However, the resulting computational complexity is still too large for real-time requirements. To reduce the complexity of these networks further, we utilize the Strassen algorithm, which simultaneously reduces the number of convolutions in the network. Moreover, we evaluate our algorithm on the Visual Geometry Group (VGG) network. Experimental results show that it can save 75% of the time spent on computation when the batch size is 32.

Algorithms 2018, 11, 159; doi:10.3390/a11100159

The rest of this paper is organized as follows. Section 2 reviews related work on convolutional neural networks, the Winograd algorithm and the Strassen algorithm. The proposed algorithm is presented in Section 3. Several simulations are included in Section 4, and the work is concluded in Section 5.

2. Related Work

2.1. Convolutional Neural Networks

Machine learning has produced impressive results in many signal processing applications [20,21]. Convolutional neural networks extend the machine-learning capabilities of neural networks by introducing convolutional layers into the network, and are mainly used in image processing. Figure 1 shows the structure of a classical convolutional neural network, LeNet. It consists of two convolutional layers, two subsampling layers and three fully connected layers. Usually, the computation of the convolutional layers occupies most of the network.

Figure 1. The architecture of LeNet (input, convolutions, subsampling, convolutions, subsampling, full connection, output).

Convolutional layers extract features from the input maps via different kernels. Suppose there are Q input maps of size Mx × Nx and R output maps of size My × Ny, and that the convolutional kernel size is Mw × Nw. The computation of the output maps in a single layer is given by Equation (1):

    y_{r,x,y} = \sum_{q=1}^{Q} \sum_{u=1}^{Mw} \sum_{v=1}^{Nw} w_{r,q,u,v} \, x_{q,x+u,y+v},    (1)

where X is the input map, Y is the output map, and W is the kernel. The subscripts x and y indicate the position of a pixel in the map, and the subscripts u and v indicate the position of a parameter in the kernel. Equation (1) can be rewritten as Equation (2):

    y_r = \sum_{q=1}^{Q} w_{r,q} * x_q.    (2)

Suppose P images are sent together to the neural network, i.e., the batch size is P. The output Y in Equation (2) can then be expressed by Equation (3):

    y_{r,p} = \sum_{q=1}^{Q} w_{r,q} * x_{q,p}.    (3)

If we regard y_{r,p}, w_{r,q} and x_{q,p} as the elements of matrices Y, W and X, respectively, the output can be expressed as the convolutional matrix product in Equation (4).
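As a concrete point of reference, the layer computation in Equation (1) can be sketched in plain Python. This is an illustrative helper of my own (zero-based indexing, stride 1, no padding), not the paper's implementation:

```python
def conv_layer(x, w):
    """Direct convolutional layer per Equation (1).
    x: Q x Mx x Nx input maps, w: R x Q x Mw x Nw kernels (nested lists)."""
    Q, Mx, Nx = len(x), len(x[0]), len(x[0][0])
    R, Mw, Nw = len(w), len(w[0][0]), len(w[0][0][0])
    My, Ny = Mx - Mw + 1, Nx - Nw + 1        # "valid" output size
    y = [[[0.0] * Ny for _ in range(My)] for _ in range(R)]
    for r in range(R):                        # each output map
        for i in range(My):
            for j in range(Ny):
                s = 0.0
                for q in range(Q):            # sum over input maps
                    for u in range(Mw):       # slide the Mw x Nw kernel
                        for v in range(Nw):
                            s += w[r][q][u][v] * x[q][i + u][j + v]
                y[r][i][j] = s
    return y
```

Each output pixel costs Mw × Nw multiplications per input map; it is exactly this inner product that the Winograd and Strassen transformations below attack.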

    Y = W ⊛ X,    (4)

where

    Y = \begin{bmatrix} y_{1,1} & \cdots & y_{1,P} \\ \vdots & \ddots & \vdots \\ y_{R,1} & \cdots & y_{R,P} \end{bmatrix},  W = \begin{bmatrix} w_{1,1} & \cdots & w_{1,Q} \\ \vdots & \ddots & \vdots \\ w_{R,1} & \cdots & w_{R,Q} \end{bmatrix},  X = \begin{bmatrix} x_{1,1} & \cdots & x_{1,P} \\ \vdots & \ddots & \vdots \\ x_{Q,1} & \cdots & x_{Q,P} \end{bmatrix}.    (5)

Matrices Y and X are special matrices of feature maps, and matrix W is a special matrix of kernels; the product ⊛ is a matrix product whose element-level "multiplications" are convolutions. This convolutional matrix provides a new view of the computation of the output Y.

2.2. Winograd Algorithm

We denote an r-tap FIR filter with m outputs as F(m, r). The conventional algorithm for F(2, 3) is shown in Equation (6), where d_0, d_1, d_2 and d_3 are the inputs of the filter, and h_0, h_1 and h_2 are the parameters of the filter. As Equation (6) shows, it uses 6 multiplications and 4 additions to compute F(2, 3):

    F(2, 3) = \begin{bmatrix} d_0 & d_1 & d_2 \\ d_1 & d_2 & d_3 \end{bmatrix} \begin{bmatrix} h_0 \\ h_1 \\ h_2 \end{bmatrix} = \begin{bmatrix} d_0 h_0 + d_1 h_1 + d_2 h_2 \\ d_1 h_0 + d_2 h_1 + d_3 h_2 \end{bmatrix}.    (6)

If we instead use the minimal filtering algorithm [19] to compute F(m, r), it requires only (m + r - 1) multiplications. The process for computing F(2, 3) is shown in Equations (7)-(11):

    m_1 = (d_0 - d_2) h_0,    (7)

    m_2 = (d_1 + d_2) \frac{h_0 + h_1 + h_2}{2},    (8)

    m_3 = (d_2 - d_1) \frac{h_0 - h_1 + h_2}{2},    (9)

    m_4 = (d_1 - d_3) h_2,    (10)

    F(2, 3) = \begin{bmatrix} d_0 h_0 + d_1 h_1 + d_2 h_2 \\ d_1 h_0 + d_2 h_1 + d_3 h_2 \end{bmatrix} = \begin{bmatrix} m_1 + m_2 + m_3 \\ m_2 - m_3 - m_4 \end{bmatrix}.    (11)

The computation can be written in matrix form as Equation (12):

    F(2, 3) = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 0 & 1 & -1 & -1 \end{bmatrix} \left( \left( \begin{bmatrix} 1 & 0 & 0 \\ 1/2 & 1/2 & 1/2 \\ 1/2 & -1/2 & 1/2 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} h_0 \\ h_1 \\ h_2 \end{bmatrix} \right) \odot \left( \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \end{bmatrix} \right) \right).    (12)

Substituting A, G and B for the constant matrices in Equation (12), it can be rewritten as Equation (13):

    Y = A^T ((Gh) \odot (B^T d)),    (13)

where \odot indicates element-wise multiplication, and the superscript T indicates the transpose operator. A, G and B are defined in Equation (14):

    A^T = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 0 & 1 & -1 & -1 \end{bmatrix},  G = \begin{bmatrix} 1 & 0 & 0 \\ 1/2 & 1/2 & 1/2 \\ 1/2 & -1/2 & 1/2 \\ 0 & 0 & 1 \end{bmatrix},  B^T = \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix}.    (14)
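The four-multiplication scheme of Equations (7)-(11) is easy to verify numerically. The sketch below (a hypothetical helper of my own, not the paper's code) computes both outputs with exactly the four products m_1 through m_4 and can be checked against the direct formula in Equation (6):

```python
def winograd_f23(d, h):
    """F(2,3): two outputs of a 3-tap filter h over inputs d[0..3],
    using 4 multiplications instead of 6 (Equations (7)-(11))."""
    m1 = (d[0] - d[2]) * h[0]
    m2 = (d[1] + d[2]) * (h[0] + h[1] + h[2]) / 2
    m3 = (d[2] - d[1]) * (h[0] - h[1] + h[2]) / 2
    m4 = (d[1] - d[3]) * h[2]
    return [m1 + m2 + m3, m2 - m3 - m4]
```

When the filter h is fixed, the two scaled filter sums in m_2 and m_3 can be precomputed, which is exactly why the transform cost of the kernel is amortized in Algorithm 1 later in the paper.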

We can see from Equations (7)-(11) that the whole process needs only 4 multiplications. However, it also needs 4 additions to transform the data, 3 additions and 2 multiplications by a constant to transform the filter, and 4 additions to transform the final result. (To compare complexity easily, we regard a multiplication by a constant as an addition.)

The two-dimensional filter F(m × m, r × r) can be generalized from the one-dimensional filter F(m, r) as in Equation (15):

    Y = A^T ((G h G^T) \odot (B^T d B)) A.    (15)

F(2 × 2, 3 × 3) needs 4 × 4 = 16 multiplications, 32 additions to transform the data, 28 additions to transform the filter, and 24 additions to transform the final result. The conventional algorithm needs 36 multiplications to calculate the same result, so this algorithm reduces the number of multiplications from 36 to 16.

F(2 × 2, 3 × 3) can be used to compute a convolutional layer with 3 × 3 kernels: each input map is divided into smaller tiles in order to use Equation (15). If we substitute U = G w G^T and V = B^T x B, then Equation (3) can be rewritten as Equation (16):

    y_{r,p} = \sum_{q=1}^{Q} w_{r,q} * x_{q,p} = \sum_{q=1}^{Q} A^T ((G w_{r,q} G^T) \odot (B^T x_{q,p} B)) A = \sum_{q=1}^{Q} A^T (U_{r,q} \odot V_{q,p}) A.    (16)

2.3. Strassen Algorithm

Suppose there are two matrices A and B, and matrix C is the product of A and B, where the numbers of rows and columns of A, B and C are even. We can partition A, B and C into block matrices of equal size as follows:

    A = \begin{bmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{bmatrix},  B = \begin{bmatrix} B_{1,1} & B_{1,2} \\ B_{2,1} & B_{2,2} \end{bmatrix},  C = \begin{bmatrix} C_{1,1} & C_{1,2} \\ C_{2,1} & C_{2,2} \end{bmatrix}.    (17)

According to the conventional matrix multiplication algorithm, we then have Equation (18):

    C = AB = \begin{bmatrix} A_{1,1} B_{1,1} + A_{1,2} B_{2,1} & A_{1,1} B_{1,2} + A_{1,2} B_{2,2} \\ A_{2,1} B_{1,1} + A_{2,2} B_{2,1} & A_{2,1} B_{1,2} + A_{2,2} B_{2,2} \end{bmatrix}.    (18)

As Equation (18) shows, we need 8 multiplications and 4 additions to compute matrix C. The Strassen algorithm [23] can be used to reduce the number of multiplications. The process of the Strassen algorithm is as follows:

    I = (A_{1,1} + A_{2,2})(B_{1,1} + B_{2,2}),    (19)

    II = (A_{2,1} + A_{2,2}) B_{1,1},    (20)

    III = A_{1,1} (B_{1,2} - B_{2,2}),    (21)

    IV = A_{2,2} (B_{2,1} - B_{1,1}),    (22)

    V = (A_{1,1} + A_{1,2}) B_{2,2},    (23)

    VI = (A_{2,1} - A_{1,1})(B_{1,1} + B_{1,2}),    (24)

    VII = (A_{1,2} - A_{2,2})(B_{2,1} + B_{2,2}),    (25)
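A minimal sketch of the nested two-dimensional transform in Equation (15), using the constant matrices of Equation (14). The helper names are mine; a practical implementation would batch the 16 element-wise products across all tiles rather than handle one 4 × 4 tile at a time:

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

# Constant matrices from Equation (14) for F(2x2, 3x3).
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]
G = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]

def winograd_f2x2_3x3(d, g):
    """One 2x2 output tile from a 4x4 input tile d and 3x3 kernel g,
    via Y = A^T((G g G^T) .* (B^T d B))A with 16 multiplications."""
    U = matmul(matmul(G, g), transpose(G))    # transformed 4x4 kernel
    V = matmul(matmul(BT, d), transpose(BT))  # transformed 4x4 data tile
    M = [[U[i][j] * V[i][j] for j in range(4)] for i in range(4)]
    return matmul(matmul(AT, M), transpose(AT))  # 2x2 output tile
```

For an all-ones 4 × 4 tile and all-ones 3 × 3 kernel, every output element is 9, matching the direct 36-multiplication convolution while using only 16 element-wise products.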

    C_{1,1} = I + IV - V + VII,    (26)

    C_{1,2} = III + V,    (27)

    C_{2,1} = II + IV,    (28)

    C_{2,2} = I - II + III + VI,    (29)

where I, II, III, IV, V, VI and VII are temporary matrices. The whole process requires 7 multiplications and 18 additions. It reduces the number of multiplications from 8 to 7 without changing the computational results. More multiplications can be saved by using the Strassen algorithm recursively, as long as the numbers of rows and columns of the submatrices remain even: with N recursions, the multiplication count drops to (7/8)^N of the original. The Strassen algorithm is suitable for the special convolutional matrix in Equation (4) [24]; therefore, we can use it to handle a convolutional matrix.

3. Proposed Algorithm

As we can see from Section 2.2, the Winograd algorithm incurs extra additions. To avoid repeatedly transforming W and X in Equation (16), we calculate the matrices U and V separately, which reduces the number of additions incurred by the algorithm. The practical implementation is listed in Algorithm 1.

The calculation of the output M in Algorithm 1 constitutes the main multiplication complexity of the whole computation process. To reduce it, we can use the Strassen algorithm. Before doing so, we need to reformulate the expression for M. The output M in Algorithm 1 can be written as Equation (30):

    M_{r,p} = \sum_{q=1}^{Q} A^T (U_{r,q} \odot V_{q,p}) A,    (30)

where U_{r,q} and V_{q,p} are temporary matrices, and A is a constant parameter matrix. For clarity of presentation, we ignore the matrix A here. (Matrix A is not ignored in the actual implementation of the algorithm.) The output M can then be written as in Equation (31):

    M_{r,p} = \sum_{q=1}^{Q} (U_{r,q} \odot V_{q,p}).    (31)

We denote three special matrices M, U and V whose elements are M_{r,p}, U_{r,q} and V_{q,p}, respectively, as shown in Equation (33). The output M can then be written as a multiplication of matrix U and matrix V:

    M = UV,    (32)

    M = \begin{bmatrix} M_{1,1} & \cdots & M_{1,P} \\ \vdots & \ddots & \vdots \\ M_{R,1} & \cdots & M_{R,P} \end{bmatrix},  U = \begin{bmatrix} U_{1,1} & \cdots & U_{1,Q} \\ \vdots & \ddots & \vdots \\ U_{R,1} & \cdots & U_{R,Q} \end{bmatrix},  V = \begin{bmatrix} V_{1,1} & \cdots & V_{1,P} \\ \vdots & \ddots & \vdots \\ V_{Q,1} & \cdots & V_{Q,P} \end{bmatrix}.    (33)

In this case, we can partition the matrices M, U and V into equal-sized block matrices and then use the Strassen algorithm to reduce the number of multiplications between U_{r,q} and V_{q,p}. The multiplication in the Strassen algorithm is redefined as the element-wise multiplication of the matrices U_{r,q} and V_{q,p}. We name this new combination the Strassen-Winograd algorithm.
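The combination can be sketched as one Strassen step over 2 × 2 block matrices of transform-domain tiles, with "multiplication" redefined as the element-wise tile product just described. The helper names are mine; with scalar blocks, the same seven-product scheme is the plain Strassen algorithm of Equations (19)-(29):

```python
def add(a, b):  # element-wise sum of two 4x4 tiles
    return [[a[i][j] + b[i][j] for j in range(4)] for i in range(4)]

def sub(a, b):  # element-wise difference of two 4x4 tiles
    return [[a[i][j] - b[i][j] for j in range(4)] for i in range(4)]

def mul(a, b):  # the redefined "multiplication": element-wise tile product
    return [[a[i][j] * b[i][j] for j in range(4)] for i in range(4)]

def strassen_winograd_step(U, V):
    """One Strassen step on 2x2 block matrices U, V of 4x4 Winograd tiles:
    7 element-wise tile products instead of 8 (Equations (19)-(29))."""
    (u11, u12), (u21, u22) = U
    (v11, v12), (v21, v22) = V
    m1 = mul(add(u11, u22), add(v11, v22))   # I
    m2 = mul(add(u21, u22), v11)             # II
    m3 = mul(u11, sub(v12, v22))             # III
    m4 = mul(u22, sub(v21, v11))             # IV
    m5 = mul(add(u11, u12), v22)             # V
    m6 = mul(sub(u21, u11), add(v11, v12))   # VI
    m7 = mul(sub(u12, u22), add(v21, v22))   # VII
    return [[add(sub(add(m1, m4), m5), m7), add(m3, m5)],
            [add(m2, m4), add(sub(add(m1, m3), m2), m6)]]
```

The Strassen identities hold in any commutative ring, and the element-wise tile product is one, which is why the recombined blocks equal the conventional sums of Equation (31).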

Algorithm 1. Implementation of the Winograd algorithm.

1   for r = 1 to number of output maps
2     for q = 1 to number of input maps
3       U = G w G^T
4     end
5   end
6   for p = 1 to batch size
7     for q = 1 to number of input maps
8       for k = 1 to number of image tiles
9         V = B^T x B
10      end
11    end
12  end
13  for p = 1 to batch size
14    for r = 1 to number of output maps
15      for j = 1 to number of image tiles
16        M = 0
17        for q = 1 to number of input maps
18          M = M + A^T (U \odot V) A
19        end
20      end
21    end
22  end

To compare the theoretical computational complexity of the conventional algorithm, the Strassen algorithm, the Winograd algorithm and the Strassen-Winograd algorithm, we list the numbers of multiplications and additions in Table 1. The output map size is set to 64 × 64, and the kernel size is set to 3 × 3.

Table 1. Computational complexity of the different algorithms.

Matrix    Conventional             Strassen                 Winograd                 Strassen-Winograd
Size N    Mul         Add          Mul         Add          Mul         Add          Mul         Add
2         294,912     278,528      258,048     303,104      131,072     344,176      114,688     …
4         2,359,296   2,293,760    1,806,336   2,416,640    1,048,576   2,294,208    802,816     …

We can see from Table 1 that, although these algorithms incur more additions when the matrix is small, the number of extra additions is less than the number of saved multiplications. Moreover, a multiplication usually costs more time than an addition. Hence, all three algorithms are theoretically effective in reducing computational complexity.

Figure 2 shows a comparison of the computational complexity ratios. The Strassen algorithm shows less reduction in multiplications when the matrix is small, but it incurs fewer additions. The Winograd algorithm shows a stable performance, and its number of additions decreases slightly as the matrix size increases. For small matrices, the Strassen-Winograd algorithm shows a much better reduction in multiplication complexity than the Strassen algorithm; although it incurs more additions, the number of extra additions is much smaller than the number of saved multiplications. The Strassen-Winograd algorithm shows a performance similar to the Winograd algorithm. When the matrix is small, the Winograd algorithm shows a slightly better performance,

whereas the Strassen-Winograd algorithm and the Strassen algorithm perform much better as the matrix size increases.

Figure 2. Comparisons of the complexity ratio with different matrix sizes.

4. Simulation Results

Several simulations were conducted to evaluate our algorithm. We compare our algorithm with the Strassen algorithm and the Winograd algorithm, measuring performance by runtime in MATLAB R2013b (CPU: Intel(R) Core(TM) i7-3370K). For objectivity, we apply Equation (18) to the conventional algorithm and use it as the benchmark. All input data x and kernels w are randomly generated. We measure the accuracy of our algorithm by the absolute element error in the output. As a baseline for accuracy, we use the conventional algorithm with double-precision data, kernels, intermediate variables and outputs; the other algorithms in this comparison use double-precision data and kernels but single-precision intermediate variables and outputs.

The VGG network was applied to our simulation. There are nine different convolutional layers in the VGG network, whose parameters are shown in Table 2. The depth indicates the number of times a layer occurs in the network; Q indicates the number of input maps; R indicates the number of output maps; Mw and Nw represent the kernel size; and My and Ny represent the output map size. The kernel size in the VGG network is 3 × 3, so we apply F(2 × 2, 3 × 3) to the convolution operation. For the computation of an output map of size My × Ny, the map is partitioned into (My/2) × (Ny/2) tiles, each using one computation of F(2 × 2, 3 × 3).

Table 2. Parameters of the convolutional layers in the Visual Geometry Group (VGG) network.

Convolutional Layer   Depth   Q     R     Mw(Nw)   My(Ny)
Layer 1               …       3     64    3        224
Layer 2               …       64    64    3        224
Layer 3               …       64    128   3        112
Layer 4               …       128   128   3        112
Layer 5               …       128   256   3        56
Layer 6               …       256   256   3        56
Layer 7               …       256   512   3        28
Layer 8               …       512   512   3        28
Layer 9               …       512   512   3        14

As Table 2 shows, the numbers of rows and columns are not always even, and the matrices are not always square. To solve this problem, we pad a dummy row or column onto a matrix whenever we encounter an odd number of rows or columns; the matrix can then continue to use the Strassen algorithm. We apply these nine convolutional layers in turn to our simulations. For each convolutional layer, we run the four algorithms with batch sizes from 1 to 32.
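The dummy-row/column padding can be sketched as follows (an illustrative helper of my own; padding with zeros adds only zero contributions to the block products, so the result is unchanged):

```python
def pad_to_even(mat, fill=0):
    """Pad a dummy column and/or row of `fill` values so that both
    dimensions of the nested-list matrix `mat` become even."""
    rows, cols = len(mat), len(mat[0])
    out = [list(row) + [fill] * (cols % 2) for row in mat]  # pad odd width
    if rows % 2:                                            # pad odd height
        out.append([fill] * len(out[0]))
    return out
```

After the Strassen recombination, the dummy rows and columns of the result are simply discarded.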
The runtime consumption of the algorithms is listed in Table 3, and the numerical accuracy of the different algorithms in the different layers is shown in Table 4.
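Before turning to the measured runtimes, the multiplication counts of Table 1 can be reproduced with a short script. This is my own accounting under the stated setup (R = Q = P = N, 64 × 64 output maps, 3 × 3 kernels, F(2 × 2, 3 × 3) tiles, and full Strassen recursion for N a power of two), not code from the paper:

```python
def mult_counts(n):
    """Multiplications for an n x n x n convolutional matrix product
    (n a power of two), 64x64 output maps, 3x3 kernels."""
    k = n.bit_length() - 1               # Strassen recursion depth, n = 2**k
    conv = 64 * 64 * 9                   # direct 3x3 convolution of one map pair
    wino = (64 // 2) * (64 // 2) * 16    # F(2x2,3x3): 16 mults per 2x2 tile
    return {
        "conventional": n**3 * conv,         # n^3 map-pair convolutions
        "strassen": 7**k * conv,             # 7 of 8 products per recursion level
        "winograd": n**3 * wino,
        "strassen-winograd": 7**k * wino,    # both savings combined
    }
```

For N = 2 this yields 294,912 / 258,048 / 131,072 / 114,688 multiplications, matching the first row of Table 1.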

Table 3. Runtime consumption of the different algorithms (seconds), for batch sizes 1, 2, 4, 8, 16 and 32.

Layer     Algorithm            1      2      4      8      16     32
Layer 1   Conventional         4s     48s    94s    87s    375s   75s
          Strassen             4s     56s    95s    9s     383s   768s
          Winograd             4s     9s     57s    5s     30s    46s
          Strassen-Winograd    4s     33s    58s    7s     34s    470s
Layer 2   Conventional         493s   986s   97s    3939s  7888s  58s
          Strassen             49s    86s    508s   636s   465s   8438s
          Winograd             99s    598s   96s    396s   4787s  9935s
          Strassen-Winograd    99s    543s   99s    88s    3348s  6468s
Layer 3   Conventional         45s    490s   980s   96s    396s   7858s
          Strassen             47s    433s   759s   38s    35s    4076s
          Winograd             8s     56s    53s    05s    049s   40s
          Strassen-Winograd    8s     9s     4s     737s   335s   47s
Layer 4   Conventional         488s   978s   954s   3908s  789s   5639s
          Strassen             494s   864s   53s    648s   466s   840s
          Winograd             54s    509s   07s    033s   4075s  868s
          Strassen-Winograd    54s    455s   84s    466s   645s   48s
Layer 5   Conventional         50s    50s    007s   004s   40s    8076s
          Strassen             48s    436s   76s    38s    37s    4078s
          Winograd             8s     36s    47s    94s    88s    3776s
          Strassen-Winograd    8s     09s    370s   656s   67s    085s
Layer 6   Conventional         498s   00s    998s   3995s  7948s  589s
          Strassen             494s   868s   507s   646s   4643s  80s
          Winograd             3s     46s    93s    844s   3693s  738s
          Strassen-Winograd    3s     40s    75s    86s    96s    4089s
Layer 7   Conventional         44s    487s   980s   940s   390s   780s
          Strassen             4s     4s     739s   83s    50s    396s
          Winograd             6s     3s     46s    90s    839s   3680s
          Strassen-Winograd    6s     04s    358s   630s   s      96s
Layer 8   Conventional         479s   955s   97s    3833s  7675s  539s
          Strassen             474s   89s    453s   546s   4447s  78s
          Winograd             s      443s   884s   766s   354s   7068s
          Strassen-Winograd    3s     39s    686s   0s     9s     377s
Layer 9   Conventional         8s     37s    474s   95s    900s   383s
          Strassen             7s     06s    36s    63s    07s    937s
          Winograd             65s    8s     54s    507s   009s   00s
          Strassen-Winograd    65s    3s     97s    345s   606s   063s

Table 4. Maximum element error of the different algorithms in the different layers.

Layer     Conventional   Strassen   Winograd   Strassen-Winograd
Layer 1   …              …          …          …
Layer 2   …              …          …          …
Layer 3   …              …          …          …
Layer 4   …              …          …          …
Layer 5   …              …          …          …
Layer 6   …              …          …          …
Layer 7   …              …          …          …
Layer 8   …              …          …          …
Layer 9   …              …          …          …

Table 4 shows that the Winograd algorithm is slightly more accurate than the Strassen algorithm and the Strassen-Winograd algorithm. The maximum element error of these algorithms is small; compared with the minimum value in the output map, the accuracy loss incurred

by these algorithms is negligible. As we can see from Section 2, none of these algorithms theoretically causes any loss in accuracy; in practice, the loss is mainly caused by the use of single-precision data. Because the conventional algorithm with low-precision data is sufficiently accurate for deep learning [10,11], we conclude that the accuracy of our algorithm is equally sufficient.

To compare runtimes easily, we use the conventional algorithm as the benchmark and calculate the runtime savings achieved by the other algorithms. The results are shown in Figure 3. The Strassen-Winograd algorithm performs better than the benchmark in all layers except layer 1. This is because the number of input maps Q in layer 1 is three, which limits the performance of the algorithm, as a small matrix incurs proportionally more additions. Moreover, odd numbers of rows or columns need dummy rows or columns for matrix partitioning, which costs additional runtime.

Figure 3. Comparisons with different batch sizes.

The performance of the Winograd algorithm is stable from layer 2 to layer 9. It saves 53% of the runtime on average, which is close to its 56% reduction in multiplications. The performances of the Strassen algorithm and the Strassen-Winograd algorithm improve as the batch size increases. For example, in layer 7, when the batch size is 1, we cannot partition the matrix to use the Strassen algorithm, and there is almost no saving in runtime; the Strassen-Winograd algorithm saves 52% of the runtime, a saving similar to the Winograd algorithm's. When the batch size is 2, the Strassen algorithm saves

12% of the runtime, which equates to its 13% reduction in multiplications, and the Strassen-Winograd algorithm saves 58% of the runtime, which is close to its 61% reduction in multiplications. As the batch size increases, the Strassen algorithm and the Strassen-Winograd algorithm can use more recursions, which further reduces the number of multiplications and saves more runtime. When the batch size is 32, the Strassen-Winograd algorithm saves 75% of the runtime, while the Strassen algorithm and the Winograd algorithm save 49% and 53%, respectively. Though experiments with larger batch sizes were not carried out due to limitations on time and memory, the trend in performance as the batch size increases is clear and is consistent with the theoretical analysis in Section 3. We therefore conclude that the proposed algorithm provides optimal performance by combining the savings of the two component algorithms.

5. Conclusions and Future Work

The computational complexity of convolutional neural networks is an urgent problem for real-time applications. Both the Strassen algorithm and the Winograd algorithm are effective in reducing computational complexity without losing accuracy. This paper proposed combining these algorithms to reduce the heavy computational burden, and the proposed strategy was evaluated on the VGG network. Both the theoretical performance assessment and the experimental results show that the Strassen-Winograd algorithm can dramatically reduce computational complexity.

There remain limitations that need to be addressed in future research. Although the algorithm reduces the computational complexity of convolutional neural networks, the cost is an increased difficulty of implementation, especially in real-time systems and embedded devices. It also increases the difficulty of parallelizing an artificial network for hardware acceleration. In future work, we aim to apply this method to hardware accelerators in practical applications.

Author Contributions: Y.Z. performed the experiments and wrote the paper; D.W. provided suggestions about the algorithm; L.W. analyzed the complexity of the algorithms; P.L. checked the paper.

Funding: This research received no external funding.

Acknowledgments: This work was supported in part by the National Natural Science Foundation of China, and in part by the Young Talent Program of the Institute of Acoustics, Chinese Academy of Sciences, under Grant QNYC06.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
2. Liu, N.; Wan, L.; Zhang, Y.; Zhou, T.; Huo, H.; Fang, T. Exploiting Convolutional Neural Networks with Deeply Local Description for Remote Sensing Image Classification. IEEE Access 2018, 6. [CrossRef]
3. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the International Conference on Neural Information Processing Systems, 3-6 December 2012.
4. Le, N.M.; Granger, E.; Kiran, M. A comparison of CNN-based face and head detectors for real-time video surveillance applications. In Proceedings of the Seventh International Conference on Image Processing Theory, Tools and Applications, Montreal, QC, Canada, 28 November-1 December 2017.
5. Ren, S.; He, K.; Girshick, R. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8-13 December 2014.
6. Denil, M.; Shakibi, B.; Dinh, L. Predicting Parameters in Deep Learning. In Proceedings of the International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5-10 December 2013.

7. Han, S.; Pool, J.; Tran, J. Learning both Weights and Connections for Efficient Neural Networks. In Proceedings of the International Conference on Neural Information Processing Systems, Istanbul, Turkey, 9 November 2015.
8. Guo, Y.; Yao, A.; Chen, Y. Dynamic Network Surgery for Efficient DNNs. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2016.
9. Qiu, J.; Wang, J.; Yao, S. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network. In Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, 21-23 February 2016.
10. Courbariaux, M.; Bengio, Y.; David, J.P. Low Precision Arithmetic for Deep Learning. arXiv 2014.
11. Gupta, S.; Agrawal, A.; Gopalakrishnan, K.; Narayanan, P. Deep Learning with Limited Numerical Precision. In Proceedings of the International Conference on Machine Learning, Lille, France, 7-9 July 2015.
12. Rastegari, M.; Ordonez, V.; Redmon, J. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11-14 October 2016; Springer: Cham, Switzerland, 2016.
13. Zhu, C.; Han, S.; Mao, H. Trained Ternary Quantization. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24-26 April 2017.
14. Zhang, X.; Zou, J.; Ming, X.; Sun, J. Efficient and accurate approximations of nonlinear convolutional networks. In Proceedings of Computer Vision and Pattern Recognition, Boston, MA, USA, 7-12 June 2015.
15. Mathieu, M.; Henaff, M.; LeCun, Y. Fast Training of Convolutional Networks through FFTs. arXiv 2013, arXiv:1312.5851.
16. Vasilache, N.; Johnson, J.; Mathieu, M.; Chintala, S.; Piantino, S.; LeCun, Y. Fast Convolutional Nets with fbfft: A GPU Performance Evaluation. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7-9 May 2015.
17. Toom, A.L. The complexity of a scheme of functional elements simulating the multiplication of integers. Dokl. Akad. Nauk SSSR 1963, 150, 496-498.
18. Cook, S.A. On the Minimum Computation Time for Multiplication. Ph.D. Thesis, Harvard University, Cambridge, MA, USA, 1966.
19. Winograd, S. Arithmetic Complexity of Computations; SIAM: Philadelphia, PA, USA, 1980.
20. Jiao, Y.; Zhang, Y.; Wang, Y.; Wang, B.; Jin, J.; Wang, X. A novel multilayer correlation maximization model for improving CCA-based frequency recognition in SSVEP brain-computer interface. Int. J. Neural Syst. 2018, 28. [CrossRef] [PubMed]
21. Wang, R.; Zhang, Y.; Zhang, L. An adaptive neural network approach for operator functional state prediction using psychophysiological data. Integr. Comput. Aided Eng. 2015, 23. [CrossRef]
22. Lavin, A.; Gray, S. Fast Algorithms for Convolutional Neural Networks. In Proceedings of Computer Vision and Pattern Recognition, Caesars Palace, NV, USA, 26 June-1 July 2016.
23. Strassen, V. Gaussian elimination is not optimal. Numer. Math. 1969, 13, 354-356. [CrossRef]
24. Cong, J.; Xiao, B. Minimizing Computation in Convolutional Neural Networks. In Proceedings of Artificial Neural Networks and Machine Learning (ICANN 2014), Hamburg, Germany, 15-19 September 2014.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).


Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs Ritchie Zhao 1, Weinan Song 2, Wentao Zhang 2, Tianwei Xing 3, Jeng-Hau Lin 4, Mani Srivastava 3, Rajesh Gupta 4, Zhiru

More information

Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System

Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System Chi Zhang, Viktor K Prasanna University of Southern California {zhan527, prasanna}@usc.edu fpga.usc.edu ACM

More information

Real-time convolutional networks for sonar image classification in low-power embedded systems

Real-time convolutional networks for sonar image classification in low-power embedded systems Real-time convolutional networks for sonar image classification in low-power embedded systems Matias Valdenegro-Toro Ocean Systems Laboratory - School of Engineering & Physical Sciences Heriot-Watt University,

More information

Convolutional Neural Network Layer Reordering for Acceleration

Convolutional Neural Network Layer Reordering for Acceleration R1-15 SASIMI 2016 Proceedings Convolutional Neural Network Layer Reordering for Acceleration Vijay Daultani Subhajit Chaudhury Kazuhisa Ishizaka System Platform Labs Value Co-creation Center System Platform

More information

Content-Based Image Recovery

Content-Based Image Recovery Content-Based Image Recovery Hong-Yu Zhou and Jianxin Wu National Key Laboratory for Novel Software Technology Nanjing University, China zhouhy@lamda.nju.edu.cn wujx2001@nju.edu.cn Abstract. We propose

More information

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network Liwen Zheng, Canmiao Fu, Yong Zhao * School of Electronic and Computer Engineering, Shenzhen Graduate School of

More information

A Hybrid Feature Extractor using Fast Hessian Detector and SIFT

A Hybrid Feature Extractor using Fast Hessian Detector and SIFT Technologies 2015, 3, 103-110; doi:10.3390/technologies3020103 OPEN ACCESS technologies ISSN 2227-7080 www.mdpi.com/journal/technologies Article A Hybrid Feature Extractor using Fast Hessian Detector and

More information

Study of Residual Networks for Image Recognition

Study of Residual Networks for Image Recognition Study of Residual Networks for Image Recognition Mohammad Sadegh Ebrahimi Stanford University sadegh@stanford.edu Hossein Karkeh Abadi Stanford University hosseink@stanford.edu Abstract Deep neural networks

More information

A Method to Estimate the Energy Consumption of Deep Neural Networks

A Method to Estimate the Energy Consumption of Deep Neural Networks A Method to Estimate the Consumption of Deep Neural Networks Tien-Ju Yang, Yu-Hsin Chen, Joel Emer, Vivienne Sze Massachusetts Institute of Technology, Cambridge, MA, USA {tjy, yhchen, jsemer, sze}@mit.edu

More information

Deep Learning Basic Lecture - Complex Systems & Artificial Intelligence 2017/18 (VO) Asan Agibetov, PhD.

Deep Learning Basic Lecture - Complex Systems & Artificial Intelligence 2017/18 (VO) Asan Agibetov, PhD. Deep Learning 861.061 Basic Lecture - Complex Systems & Artificial Intelligence 2017/18 (VO) Asan Agibetov, PhD asan.agibetov@meduniwien.ac.at Medical University of Vienna Center for Medical Statistics,

More information

Robust Face Recognition Based on Convolutional Neural Network

Robust Face Recognition Based on Convolutional Neural Network 2017 2nd International Conference on Manufacturing Science and Information Engineering (ICMSIE 2017) ISBN: 978-1-60595-516-2 Robust Face Recognition Based on Convolutional Neural Network Ying Xu, Hui Ma,

More information

Neural Network-Hardware Co-design for Scalable RRAM-based BNN Accelerators

Neural Network-Hardware Co-design for Scalable RRAM-based BNN Accelerators Neural Network-Hardware Co-design for Scalable RRAM-based BNN Accelerators Yulhwa Kim, Hyungjun Kim, and Jae-Joon Kim Dept. of Creative IT Engineering, Pohang University of Science and Technology (POSTECH),

More information

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen

A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS. Kuan-Chuan Peng and Tsuhan Chen A FRAMEWORK OF EXTRACTING MULTI-SCALE FEATURES USING MULTIPLE CONVOLUTIONAL NEURAL NETWORKS Kuan-Chuan Peng and Tsuhan Chen School of Electrical and Computer Engineering, Cornell University, Ithaca, NY

More information

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides

Deep Learning in Visual Recognition. Thanks Da Zhang for the slides Deep Learning in Visual Recognition Thanks Da Zhang for the slides Deep Learning is Everywhere 2 Roadmap Introduction Convolutional Neural Network Application Image Classification Object Detection Object

More information

Comparing Dropout Nets to Sum-Product Networks for Predicting Molecular Activity

Comparing Dropout Nets to Sum-Product Networks for Predicting Molecular Activity 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks

BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks Surat Teerapittayanon Harvard University Email: steerapi@seas.harvard.edu Bradley McDanel Harvard University Email: mcdanel@fas.harvard.edu

More information

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech

Convolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech Convolutional Neural Networks Computer Vision Jia-Bin Huang, Virginia Tech Today s class Overview Convolutional Neural Network (CNN) Training CNN Understanding and Visualizing CNN Image Categorization:

More information

Video Inter-frame Forgery Identification Based on Optical Flow Consistency

Video Inter-frame Forgery Identification Based on Optical Flow Consistency Sensors & Transducers 24 by IFSA Publishing, S. L. http://www.sensorsportal.com Video Inter-frame Forgery Identification Based on Optical Flow Consistency Qi Wang, Zhaohong Li, Zhenzhen Zhang, Qinglong

More information

Supplementary material for Analyzing Filters Toward Efficient ConvNet

Supplementary material for Analyzing Filters Toward Efficient ConvNet Supplementary material for Analyzing Filters Toward Efficient Net Takumi Kobayashi National Institute of Advanced Industrial Science and Technology, Japan takumi.kobayashi@aist.go.jp A. Orthonormal Steerable

More information

Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks

Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks Naveen Suda, Vikas Chandra *, Ganesh Dasika *, Abinash Mohanty, Yufei Ma, Sarma Vrudhula, Jae-sun Seo, Yu

More information

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun Presented by Tushar Bansal Objective 1. Get bounding box for all objects

More information

Instruction Driven Cross-Layer CNN Accelerator with Winograd Transformation on FPGA

Instruction Driven Cross-Layer CNN Accelerator with Winograd Transformation on FPGA Instruction Driven Cross-Layer CNN Accelerator with Winograd Transformation on FPGA Abstract In recent years, Convolutional Neural Network (CNN) has been widely applied in computer vision tasks. FPGAs

More information

ADVANCED IMAGE PROCESSING METHODS FOR ULTRASONIC NDE RESEARCH C. H. Chen, University of Massachusetts Dartmouth, N.

ADVANCED IMAGE PROCESSING METHODS FOR ULTRASONIC NDE RESEARCH C. H. Chen, University of Massachusetts Dartmouth, N. ADVANCED IMAGE PROCESSING METHODS FOR ULTRASONIC NDE RESEARCH C. H. Chen, University of Massachusetts Dartmouth, N. Dartmouth, MA USA Abstract: The significant progress in ultrasonic NDE systems has now

More information

FPGA IMPLEMENTATION FOR REAL TIME SOBEL EDGE DETECTOR BLOCK USING 3-LINE BUFFERS

FPGA IMPLEMENTATION FOR REAL TIME SOBEL EDGE DETECTOR BLOCK USING 3-LINE BUFFERS FPGA IMPLEMENTATION FOR REAL TIME SOBEL EDGE DETECTOR BLOCK USING 3-LINE BUFFERS 1 RONNIE O. SERFA JUAN, 2 CHAN SU PARK, 3 HI SEOK KIM, 4 HYEONG WOO CHA 1,2,3,4 CheongJu University E-maul: 1 engr_serfs@yahoo.com,

More information

ECE5775 High-Level Digital Design Automation, Fall 2018 School of Electrical Computer Engineering, Cornell University

ECE5775 High-Level Digital Design Automation, Fall 2018 School of Electrical Computer Engineering, Cornell University ECE5775 High-Level Digital Design Automation, Fall 2018 School of Electrical Computer Engineering, Cornell University Lab 4: Binarized Convolutional Neural Networks Due Wednesday, October 31, 2018, 11:59pm

More information

A High-Precision Fusion Algorithm for Fabric Quality Inspection

A High-Precision Fusion Algorithm for Fabric Quality Inspection 2017 2 nd International Conference on Computer Engineering, Information Science and Internet Technology (CII 2017) ISBN: 978-1-60595-504-9 A High-Precision Fusion Algorithm for Fabric Quality Inspection

More information

C-Brain: A Deep Learning Accelerator

C-Brain: A Deep Learning Accelerator C-Brain: A Deep Learning Accelerator that Tames the Diversity of CNNs through Adaptive Data-level Parallelization Lili Song, Ying Wang, Yinhe Han, Xin Zhao, Bosheng Liu, Xiaowei Li State Key Laboratory

More information

Kaggle Data Science Bowl 2017 Technical Report

Kaggle Data Science Bowl 2017 Technical Report Kaggle Data Science Bowl 2017 Technical Report qfpxfd Team May 11, 2017 1 Team Members Table 1: Team members Name E-Mail University Jia Ding dingjia@pku.edu.cn Peking University, Beijing, China Aoxue Li

More information

Application of Improved Lzc Algorithm in the Discrimination of Photo and Text ChengJing Ye 1, a, Donghai Zeng 2,b

Application of Improved Lzc Algorithm in the Discrimination of Photo and Text ChengJing Ye 1, a, Donghai Zeng 2,b 2016 International Conference on Information Engineering and Communications Technology (IECT 2016) ISBN: 978-1-60595-375-5 Application of Improved Lzc Algorithm in the Discrimination of Photo and Text

More information

Image Classification using Fast Learning Convolutional Neural Networks

Image Classification using Fast Learning Convolutional Neural Networks , pp.50-55 http://dx.doi.org/10.14257/astl.2015.113.11 Image Classification using Fast Learning Convolutional Neural Networks Keonhee Lee 1 and Dong-Chul Park 2 1 Software Device Research Center Korea

More information

Automatic detection of books based on Faster R-CNN

Automatic detection of books based on Faster R-CNN Automatic detection of books based on Faster R-CNN Beibei Zhu, Xiaoyu Wu, Lei Yang, Yinghua Shen School of Information Engineering, Communication University of China Beijing, China e-mail: zhubeibei@cuc.edu.cn,

More information

Design Space Exploration of FPGA-Based Deep Convolutional Neural Networks

Design Space Exploration of FPGA-Based Deep Convolutional Neural Networks Design Space Exploration of FPGA-Based Deep Convolutional Neural Networks Abstract Deep Convolutional Neural Networks (DCNN) have proven to be very effective in many pattern recognition applications, such

More information

An Adaptive Threshold LBP Algorithm for Face Recognition

An Adaptive Threshold LBP Algorithm for Face Recognition An Adaptive Threshold LBP Algorithm for Face Recognition Xiaoping Jiang 1, Chuyu Guo 1,*, Hua Zhang 1, and Chenghua Li 1 1 College of Electronics and Information Engineering, Hubei Key Laboratory of Intelligent

More information

Handwritten Digit Classication using 8-bit Floating Point based Convolutional Neural Networks

Handwritten Digit Classication using 8-bit Floating Point based Convolutional Neural Networks Downloaded from orbit.dtu.dk on: Nov 22, 2018 Handwritten Digit Classication using 8-bit Floating Point based Convolutional Neural Networks Gallus, Michal; Nannarelli, Alberto Publication date: 2018 Document

More information

arxiv: v1 [cs.lg] 9 Jan 2019

arxiv: v1 [cs.lg] 9 Jan 2019 How Compact?: Assessing Compactness of Representations through Layer-Wise Pruning Hyun-Joo Jung 1 Jaedeok Kim 1 Yoonsuck Choe 1,2 1 Machine Learning Lab, Artificial Intelligence Center, Samsung Research,

More information

Inception and Residual Networks. Hantao Zhang. Deep Learning with Python.

Inception and Residual Networks. Hantao Zhang. Deep Learning with Python. Inception and Residual Networks Hantao Zhang Deep Learning with Python https://en.wikipedia.org/wiki/residual_neural_network Deep Neural Network Progress from Large Scale Visual Recognition Challenge (ILSVRC)

More information

FPGA Implementation of an Efficient Two-dimensional Wavelet Decomposing Algorithm

FPGA Implementation of an Efficient Two-dimensional Wavelet Decomposing Algorithm FPGA Implementation of an Efficient Two-dimensional Wavelet Decomposing Algorithm # Chuanyu Zhang, * Chunling Yang, # Zhenpeng Zuo # School of Electrical Engineering, Harbin Institute of Technology Harbin,

More information

Design Space Exploration of FPGA-Based Deep Convolutional Neural Networks

Design Space Exploration of FPGA-Based Deep Convolutional Neural Networks Design Space Exploration of FPGA-Based Deep Convolutional Neural Networks Mohammad Motamedi, Philipp Gysel, Venkatesh Akella and Soheil Ghiasi Electrical and Computer Engineering Department, University

More information

RSRN: Rich Side-output Residual Network for Medial Axis Detection

RSRN: Rich Side-output Residual Network for Medial Axis Detection RSRN: Rich Side-output Residual Network for Medial Axis Detection Chang Liu, Wei Ke, Jianbin Jiao, and Qixiang Ye University of Chinese Academy of Sciences, Beijing, China {liuchang615, kewei11}@mails.ucas.ac.cn,

More information

End-To-End Spam Classification With Neural Networks

End-To-End Spam Classification With Neural Networks End-To-End Spam Classification With Neural Networks Christopher Lennan, Bastian Naber, Jan Reher, Leon Weber 1 Introduction A few years ago, the majority of the internet s network traffic was due to spam

More information

Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling

Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling [DOI: 10.2197/ipsjtcva.7.99] Express Paper Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling Takayoshi Yamashita 1,a) Takaya Nakamura 1 Hiroshi Fukui 1,b) Yuji

More information

Exploring Capsules. Binghui Peng Runzhou Tao Shunyu Yao IIIS, Tsinghua University {pbh15, trz15,

Exploring Capsules. Binghui Peng Runzhou Tao Shunyu Yao IIIS, Tsinghua University {pbh15, trz15, Exploring Capsules Binghui Peng Runzhou Tao Shunyu Yao IIIS, Tsinghua University {pbh15, trz15, yao-sy15}@mails.tsinghua.edu.cn 1 Introduction Nowadays, convolutional neural networks (CNNs) have received

More information

In Live Computer Vision

In Live Computer Vision EVA 2 : Exploiting Temporal Redundancy In Live Computer Vision Mark Buckler, Philip Bedoukian, Suren Jayasuriya, Adrian Sampson International Symposium on Computer Architecture (ISCA) Tuesday June 5, 2018

More information

Machine Learning 13. week

Machine Learning 13. week Machine Learning 13. week Deep Learning Convolutional Neural Network Recurrent Neural Network 1 Why Deep Learning is so Popular? 1. Increase in the amount of data Thanks to the Internet, huge amount of

More information

Flexible Calibration of a Portable Structured Light System through Surface Plane

Flexible Calibration of a Portable Structured Light System through Surface Plane Vol. 34, No. 11 ACTA AUTOMATICA SINICA November, 2008 Flexible Calibration of a Portable Structured Light System through Surface Plane GAO Wei 1 WANG Liang 1 HU Zhan-Yi 1 Abstract For a portable structured

More information

Convolution Neural Networks for Chinese Handwriting Recognition

Convolution Neural Networks for Chinese Handwriting Recognition Convolution Neural Networks for Chinese Handwriting Recognition Xu Chen Stanford University 450 Serra Mall, Stanford, CA 94305 xchen91@stanford.edu Abstract Convolutional neural networks have been proven

More information

Classifying a specific image region using convolutional nets with an ROI mask as input

Classifying a specific image region using convolutional nets with an ROI mask as input Classifying a specific image region using convolutional nets with an ROI mask as input 1 Sagi Eppel Abstract Convolutional neural nets (CNN) are the leading computer vision method for classifying images.

More information

Learning visual odometry with a convolutional network

Learning visual odometry with a convolutional network Learning visual odometry with a convolutional network Kishore Konda 1, Roland Memisevic 2 1 Goethe University Frankfurt 2 University of Montreal konda.kishorereddy@gmail.com, roland.memisevic@gmail.com

More information

Weighted Convolutional Neural Network. Ensemble.

Weighted Convolutional Neural Network. Ensemble. Weighted Convolutional Neural Network Ensemble Xavier Frazão and Luís A. Alexandre Dept. of Informatics, Univ. Beira Interior and Instituto de Telecomunicações Covilhã, Portugal xavierfrazao@gmail.com

More information

A Novel Image Super-resolution Reconstruction Algorithm based on Modified Sparse Representation

A Novel Image Super-resolution Reconstruction Algorithm based on Modified Sparse Representation , pp.162-167 http://dx.doi.org/10.14257/astl.2016.138.33 A Novel Image Super-resolution Reconstruction Algorithm based on Modified Sparse Representation Liqiang Hu, Chaofeng He Shijiazhuang Tiedao University,

More information

Rotation Invariance Neural Network

Rotation Invariance Neural Network Rotation Invariance Neural Network Shiyuan Li Abstract Rotation invariance and translate invariance have great values in image recognition. In this paper, we bring a new architecture in convolutional neural

More information

Performance analysis of Integer DCT of different block sizes.

Performance analysis of Integer DCT of different block sizes. Performance analysis of Integer DCT of different block sizes. Aim: To investigate performance analysis of integer DCT of different block sizes. Abstract: Discrete cosine transform (DCT) has been serving

More information

ORT EP R RCH A ESE R P A IDI! " #$$% &' (# $!"

ORT EP R RCH A ESE R P A IDI!  #$$% &' (# $! R E S E A R C H R E P O R T IDIAP A Parallel Mixture of SVMs for Very Large Scale Problems Ronan Collobert a b Yoshua Bengio b IDIAP RR 01-12 April 26, 2002 Samy Bengio a published in Neural Computation,

More information

Iterative Removing Salt and Pepper Noise based on Neighbourhood Information

Iterative Removing Salt and Pepper Noise based on Neighbourhood Information Iterative Removing Salt and Pepper Noise based on Neighbourhood Information Liu Chun College of Computer Science and Information Technology Daqing Normal University Daqing, China Sun Bishen Twenty-seventh

More information

FUSION MODEL BASED ON CONVOLUTIONAL NEURAL NETWORKS WITH TWO FEATURES FOR ACOUSTIC SCENE CLASSIFICATION

FUSION MODEL BASED ON CONVOLUTIONAL NEURAL NETWORKS WITH TWO FEATURES FOR ACOUSTIC SCENE CLASSIFICATION Please contact the conference organizers at dcasechallenge@gmail.com if you require an accessible file, as the files provided by ConfTool Pro to reviewers are filtered to remove author information, and

More information

Deep Neural Network Hyperparameter Optimization with Genetic Algorithms

Deep Neural Network Hyperparameter Optimization with Genetic Algorithms Deep Neural Network Hyperparameter Optimization with Genetic Algorithms EvoDevo A Genetic Algorithm Framework Aaron Vose, Jacob Balma, Geert Wenes, and Rangan Sukumar Cray Inc. October 2017 Presenter Vose,

More information

Fast Image Matching on Web Pages

Fast Image Matching on Web Pages Fast Image Matching on Web Pages HAZEM M. EL-BAKRY Faculty of Computer Science & Information Systems, Mansoura University, EGYPT E-mail: helbakry20@yahoo.com NIKOS MASTORAKIS Technical University of Sofia,

More information

Temperature Calculation of Pellet Rotary Kiln Based on Texture

Temperature Calculation of Pellet Rotary Kiln Based on Texture Intelligent Control and Automation, 2017, 8, 67-74 http://www.scirp.org/journal/ica ISSN Online: 2153-0661 ISSN Print: 2153-0653 Temperature Calculation of Pellet Rotary Kiln Based on Texture Chunli Lin,

More information

Image Segmentation Based on Watershed and Edge Detection Techniques

Image Segmentation Based on Watershed and Edge Detection Techniques 0 The International Arab Journal of Information Technology, Vol., No., April 00 Image Segmentation Based on Watershed and Edge Detection Techniques Nassir Salman Computer Science Department, Zarqa Private

More information

arxiv: v1 [cs.cv] 20 Dec 2017

arxiv: v1 [cs.cv] 20 Dec 2017 Learning a Wavelet-like Auto-Encoder to Accelerate Deep Neural Networks Tianshui Chen 1, Liang Lin 1,5, Wangmeng Zuo 2, Xiaonan Luo 3, Lei Zhang 4 1 School of Data and Computer Science, Sun Yat-sen University,

More information

A reversible data hiding based on adaptive prediction technique and histogram shifting

A reversible data hiding based on adaptive prediction technique and histogram shifting A reversible data hiding based on adaptive prediction technique and histogram shifting Rui Liu, Rongrong Ni, Yao Zhao Institute of Information Science Beijing Jiaotong University E-mail: rrni@bjtu.edu.cn

More information

Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction

Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction by Noh, Hyeonwoo, Paul Hongsuck Seo, and Bohyung Han.[1] Presented : Badri Patro 1 1 Computer Vision Reading

More information

Extensions of One-Dimensional Gray-level Nonlinear Image Processing Filters to Three-Dimensional Color Space

Extensions of One-Dimensional Gray-level Nonlinear Image Processing Filters to Three-Dimensional Color Space Extensions of One-Dimensional Gray-level Nonlinear Image Processing Filters to Three-Dimensional Color Space Orlando HERNANDEZ and Richard KNOWLES Department Electrical and Computer Engineering, The College

More information

Feature-Fused SSD: Fast Detection for Small Objects

Feature-Fused SSD: Fast Detection for Small Objects Feature-Fused SSD: Fast Detection for Small Objects Guimei Cao, Xuemei Xie, Wenzhe Yang, Quan Liao, Guangming Shi, Jinjian Wu School of Electronic Engineering, Xidian University, China xmxie@mail.xidian.edu.cn

More information

Layer-wise Performance Bottleneck Analysis of Deep Neural Networks

Layer-wise Performance Bottleneck Analysis of Deep Neural Networks Layer-wise Performance Bottleneck Analysis of Deep Neural Networks Hengyu Zhao, Colin Weinshenker*, Mohamed Ibrahim*, Adwait Jog*, Jishen Zhao University of California, Santa Cruz, *The College of William

More information

Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks

Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks Soheil Hashemi, Nicholas Anthony, Hokchhay Tann, R. Iris Bahar, Sherief Reda School of Engineering Brown

More information

High Performance Computing

High Performance Computing High Performance Computing 9th Lecture 2016/10/28 YUKI ITO 1 Selected Paper: vdnn: Virtualized Deep Neural Networks for Scalable, MemoryEfficient Neural Network Design Minsoo Rhu, Natalia Gimelshein, Jason

More information

Computer Vision Lecture 16

Computer Vision Lecture 16 Computer Vision Lecture 16 Deep Learning for Object Categorization 14.01.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Announcements Seminar registration period

More information

Redundancy Resolution by Minimization of Joint Disturbance Torque for Independent Joint Controlled Kinematically Redundant Manipulators

Redundancy Resolution by Minimization of Joint Disturbance Torque for Independent Joint Controlled Kinematically Redundant Manipulators 56 ICASE :The Institute ofcontrol,automation and Systems Engineering,KOREA Vol.,No.1,March,000 Redundancy Resolution by Minimization of Joint Disturbance Torque for Independent Joint Controlled Kinematically

More information

Real-time Object Detection CS 229 Course Project

Real-time Object Detection CS 229 Course Project Real-time Object Detection CS 229 Course Project Zibo Gong 1, Tianchang He 1, and Ziyi Yang 1 1 Department of Electrical Engineering, Stanford University December 17, 2016 Abstract Objection detection

More information

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION

REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION REGION AVERAGE POOLING FOR CONTEXT-AWARE OBJECT DETECTION Kingsley Kuan 1, Gaurav Manek 1, Jie Lin 1, Yuan Fang 1, Vijay Chandrasekhar 1,2 Institute for Infocomm Research, A*STAR, Singapore 1 Nanyang Technological

More information

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li Learning to Match Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li 1. Introduction The main tasks in many applications can be formalized as matching between heterogeneous objects, including search, recommendation,

More information

Pedestrian Detection based on Deep Fusion Network using Feature Correlation

Pedestrian Detection based on Deep Fusion Network using Feature Correlation Pedestrian Detection based on Deep Fusion Network using Feature Correlation Yongwoo Lee, Toan Duc Bui and Jitae Shin School of Electronic and Electrical Engineering, Sungkyunkwan University, Suwon, South

More information

CHAPTER 2 TEXTURE CLASSIFICATION METHODS GRAY LEVEL CO-OCCURRENCE MATRIX AND TEXTURE UNIT

CHAPTER 2 TEXTURE CLASSIFICATION METHODS GRAY LEVEL CO-OCCURRENCE MATRIX AND TEXTURE UNIT CHAPTER 2 TEXTURE CLASSIFICATION METHODS GRAY LEVEL CO-OCCURRENCE MATRIX AND TEXTURE UNIT 2.1 BRIEF OUTLINE The classification of digital imagery is to extract useful thematic information which is one

More information

Convolutional Neural Networks

Convolutional Neural Networks Lecturer: Barnabas Poczos Introduction to Machine Learning (Lecture Notes) Convolutional Neural Networks Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.

More information

Minimal Test Cost Feature Selection with Positive Region Constraint

Minimal Test Cost Feature Selection with Positive Region Constraint Minimal Test Cost Feature Selection with Positive Region Constraint Jiabin Liu 1,2,FanMin 2,, Shujiao Liao 2, and William Zhu 2 1 Department of Computer Science, Sichuan University for Nationalities, Kangding

More information

BHNN: a Memory-Efficient Accelerator for Compressing Deep Neural Network with Blocked Hashing Techniques
