Neneta: Heterogeneous Computing Complex-Valued Neural Network Framework
Vladimir Lekić* and Zdenka Babić*
* Faculty of Electrical Engineering, University of Banja Luka, Banja Luka, Bosnia and Herzegovina

Abstract
Due to the increased demand for computational efficiency in the training, validation and testing of artificial neural networks, many open source software frameworks have emerged. The GPU programming model of choice in such frameworks is almost exclusively CUDA. Support for complex-valued neural networks is also notably lacking. With our research going exactly in that direction, we developed and made publicly available yet another software framework, based entirely on the C++ and OpenCL standards, with which we try to solve the problems we identified in existing solutions.

I. INTRODUCTION
The attention that complex machine learning algorithms have received in recent years is tremendous. Research laboratories are competing in making their sets of training data publicly available [1],[2], along with software frameworks [3],[4],[5], tutorials and courses for using this data. On the other side, scientists, industry professionals and enthusiasts are competing in tuning the available models and tweaking algorithm performance. This is a perfectly valid approach, but what stays somewhat hidden in the machine learning hype is that everyone is building machine learning models with the same sets of training data on more or less the same hardware.

The neural network framework "neneta" introduced in this paper is a product of our research on complex-valued neural networks [6],[7]. Our goal was never to compete with the state-of-the-art frameworks already available, but to build a tool that will support us efficiently through our research. On the other hand, we believe that the presented tool has the potential to attract the attention of the broader community.
It does so not only by offering the ability to efficiently design and train neural networks on a broader range of GPUs, but also by offering to do that in a more general way, using complex-valued neural networks.

II. DESIGN DECISIONS
Choosing the programming languages and APIs when starting a project like this is anything but an easy task. The idea was also to allow the software to run on the most popular operating systems. The decisions made to meet these goals are as follows.

A. C++11
For all programming tasks related to network and GPU configuration, input data preprocessing and result presentation we use C++ [8]. There are two main reasons behind this decision. First, we already knew the language well enough to make significant progress fast. Second, although preprocessing tasks in deep neural networks are relatively simple compared to the network model itself, they are not insignificant, and they can definitely impact the overall performance of the framework. Therefore, we needed a programming language with the minimum possible overhead, but still with support for object-oriented programming. C++ was an ideal candidate.

B. OpenCL
We wanted to be able to run our software on a wide range of available devices, and of course on the ones yet to come. By this we mean not only GPUs with OpenCL [9],[10] support, but also FPGAs and DSPs utilizing this standard [11].

C. Testing
Due to the significant complexity of the software, unit testing and component testing [12] were necessary and were done for most of the components.

D. Operating systems support
We use the gcc compiler for development, together with the CMake build system. Essentially, any operating system that supports this compiler and build system and has appropriate OpenCL drivers can run neneta. So far, the software has been successfully tested on Microsoft Windows and Linux operating systems.

III. SOFTWARE ARCHITECTURE
The component diagram of neneta is shown in Fig. 1.
Components are compiled to static libraries and, at the end of the build, linked together into a single executable. As can be seen, the only requirement for the operating system is support for OpenCL. The number of GPUs is not limited, and it is completely abstracted away from the neuralnetwork component through interfaces provided by gpgpu, which is the only component interfacing directly with the GPU. The following subsections give a short description of the framework components.

Fig. 1. neneta component diagram.

MIPRO 2017/DC VIS 209

A. confighandler
This component is responsible for parsing the XML configuration files. There are three types of configuration files:
- configuration.xml - Holds general configuration information for logging (log level, log rotation, log format etc.), plotting, input data sources, persistence, OpenCL kernel sources and the GPU.
- kernels.xml - Holds profiling configuration for all the OpenCL kernels used by the framework. On startup, all configured OpenCL kernel sources are compiled by the OpenCL driver. This also means that kernels can be added, removed or modified without recompiling any of the framework libraries.
- network_params_<id>.xml - Holds configuration for the neuralnetwork component. As an example configuration, one of the complex-valued neural network layers is shown in Fig. 2.

The configuration of the layers is parsed automatically. Based on the layer type, the appropriate objects are instantiated and enqueued for execution on the GPU. At the moment the following layer types are supported:
- Input layer
- Convolution layer
- Fully connected layer
- FFT layer
- IFFT layer
- Projection layer
- Softmax layer
- Spectral-pooling layer
- Error calculation layer

    <layer type="conv" id="conv1">
        <input>input1</input>
        <channels>1</channels>
        <kernels>10</kernels>
        <kernelsize>5</kernelsize>
        <stride>1</stride>
        <actfunc>complextanh</actfunc>
        <weightsdev>1</weightsdev>
        <weightsmean>0</weightsmean>
        <weightstype>complex</weightstype>
        <biasre>0.0001</biasre>
        <biasim>0</biasim>
    </layer>

Fig. 2. Example of complex-valued neural network layer configuration.

B. plotting
The task of the plotting component is to abstract the data presentation tools from the neuralnetwork component. Normally, some kind of plotting library is used for data presentation (for example gnuplot [13], but not limited to it).

C. imageprocessing
Input data comes in various formats. The task of the imageprocessing component is to convert, adapt, merge or filter input data based on the needs of the neuralnetwork component. This component runs only on the host CPU, so these operations should not have high complexity. If an input data preprocessing step consumes significant processing time (e.g. FFT [14]), an additional layer type should be introduced in the neuralnetwork component.

D. neuralnetwork
This is the core component of the neneta framework. Essentially, the entire neural network processing is done within this component. The main features of this component are:
- It consists of the various types of layers available within the framework.
- It is straightforward to define and implement a new layer. The framework itself will enqueue it for execution on the GPU.
- Performance-critical functionality of the layers is moved to the OpenCL kernels. Changing the functionality within kernel source files doesn't require recompilation, only an application restart.

Fig. 3 shows a simplified class diagram of the ConvLayer layer. In this example, ConvLayer implements the IPersistedLayer interface. The functions store() and restore() are called for this layer after each training epoch. The other two classes that ConvLayer inherits from clearly indicate the relation of the layer to the OpenCL execution plan. Being also an IOpenCLChainableExecutionPlan allows the layer to be linked with other layers. The functions setinputbuffer(bufferio)/setbkpinputbuffer(bufferio) are called from the left/right layer during forward/back-propagation configuration. Forward and back-propagation are configured once during startup, but executed many times during training. The input parameter BufferIO is a block in the GPU's global memory. The role of this memory block is to pass the needed information between layers, which means that the maximum size of the neural network model is directly proportional to the size of the available GPU global memory.

E. imagehandler
Any set of training data can be used to train the modeled neural network. The task of the imagehandler component is to abstract the training set away from the neuralnetwork component. At the moment, support for MNIST [2] and IMAGENET [1] is available.

F. logging
During long training periods, some sort of logging system (for example the Boost logging library [15]) is desirable.
This component provides logging capabilities to the entire framework. The log file path, rotation size, logging level and logging format can be configured in the configuration.xml file.

Fig. 3. Simplified class diagram of ConvLayer.

G. persistence
The role of the persistence component is to ensure that network training can be interrupted and continued at will. The persistence interface store() can be called at the end of each batch or at the end of each training epoch. On the other hand, restore() is called only once, during the initialization phase. Persistence data are stored as a binary blob on the hard disk.

H. gpgpu
Although OpenCL offers C++ interface wrappers [16], we introduced an even higher level of abstraction in order to incorporate the GPU execution plan into the model. The gpgpu component offers interfaces for planning kernel execution in a predefined order, while also making it possible to profile kernel execution if desired.

IV. CONFIGURATION EXAMPLE
As an example, we trained a network consisting of a single Soft-Max layer on the MNIST data-set [2]. The data-set consists of 60,000 training images of handwritten Arabic numerals and 10,000 test images. In the configuration, we split the training data-set into 50,000 training and 10,000 validation images, as shown in Fig. 4. An example of such a network configuration is shown in Fig. 5. The input layer allocates a contiguous block of global memory on the GPU. Although not relevant for this example, this memory is always split equally to hold real and imaginary data, using the parameters rpipesize and ipipesize. The other parameters of the input layer are determined by the input data size (for MNIST these are 28x28 pixel gray-level images). The layer of type softmax is a real-valued layer. The parameters of the layer are self-descriptive, as shown in Fig. 5, and require no further explanation. For loss calculation we used the cross-entropy function, simply defined through the errorcalc layer.
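As a rough illustration of what a softmax output followed by a cross-entropy error calculation computes for a single sample, the following C++ sketch may help. It is not taken from the neneta sources: the function names softmax and crossEntropy are assumptions made for this example, and the framework itself runs the performance-critical parts in OpenCL kernels rather than on the host.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax over the raw outputs (logits) of a layer.
// Hypothetical helper for illustration; not part of the neneta API.
std::vector<double> softmax(const std::vector<double>& logits) {
    std::vector<double> probs(logits.size());
    if (logits.empty()) return probs;
    // Subtracting the maximum keeps std::exp from overflowing.
    const double maxLogit = *std::max_element(logits.begin(), logits.end());
    double sum = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - maxLogit);
        sum += probs[i];
    }
    for (double& p : probs) p /= sum;
    return probs;
}

// Cross-entropy loss of one sample whose true class index is `label`.
// Hypothetical helper for illustration; not part of the neneta API.
double crossEntropy(const std::vector<double>& probs, std::size_t label) {
    const double eps = 1e-12;  // guards against log(0)
    return -std::log(std::max(probs.at(label), eps));
}
```

For MNIST the logits vector would have ten entries, one per digit class, matching the outputsize of 10 used in the configuration.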
    <images source="mnist">
        <trainset>
            <offset>0</offset>
            <size>50000</size>
            <minibatchsize>1</minibatchsize>
            <path>train-images.idx3-ubyte</path>
            <labels>train-labels.idx1-ubyte</labels>
        </trainset>
        <testset>
            <offset>0</offset>
            <size>10000</size>
            <path>t10k-images.idx3-ubyte</path>
            <labels>t10k-labels.idx1-ubyte</labels>
        </testset>
        <validationset>
            <offset>50000</offset>
            <size>10000</size>
            <path>train-images.idx3-ubyte</path>
            <labels>train-labels.idx1-ubyte</labels>
        </validationset>
    </images>

Fig. 4. Configuration of MNIST data-set.

    <?xml version="1.0"?>
    <neneta>
        <layer type="input" id="input1">
            <rpipesize> </rpipesize>
            <ipipesize> </ipipesize>
            <outputsize>10</outputsize>
            <inputchannels>1</inputchannels>
        </layer>
        <layer type="softmax" id="sm1">
            <input>input1</input>
            <channels>1</channels>
            <outputsize>10</outputsize>
            <actfunc>softmax</actfunc>
            <weightsdev>1</weightsdev>
            <weightsmean>0</weightsmean>
            <bias>0.1</bias>
        </layer>
        <layer type="errorcalc" id="err1">
            <input>sm1</input>
            <channels>10</channels>
            <errorfunc>crossentropy</errorfunc>
        </layer>
    </neneta>

Fig. 5. Simple Soft-Max layer configuration.
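The offset and size entries in Fig. 4 can be read as windows over an image file: the training and validation sets both point at train-images.idx3-ubyte but use disjoint ranges of it. The C++ sketch below shows one way such windows could be sanity-checked. The SubsetWindow type and both functions are hypothetical names introduced for this example and are not part of neneta.

```cpp
#include <cstddef>

// Hypothetical mirror of one <trainset>/<validationset> entry from Fig. 4.
struct SubsetWindow {
    std::size_t offset;  // index of the first image used
    std::size_t size;    // number of consecutive images used
};

// True if the window lies completely inside a file holding totalImages images.
bool fitsInFile(const SubsetWindow& w, std::size_t totalImages) {
    return w.offset <= totalImages && w.size <= totalImages - w.offset;
}

// True if two windows over the same file share at least one image.
bool overlaps(const SubsetWindow& a, const SubsetWindow& b) {
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}
```

With the values from Fig. 4, the training window {0, 50000} and the validation window {50000, 10000} both fit into the 60,000-image training file and do not overlap, so no validation image is seen during training.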
Fig. 6. Three training epochs on MNIST data-set.

Training progress was monitored using the plotting interface for gnuplot [13], as shown in Fig. 6. More detailed training results were obtained from the log file and are presented in Table I.

TABLE I
DETAILS OF THREE TRAINING EPOCHS ON MNIST DATA-SET.

Ep.    Train. Loss    Train. Acc. [%]    Val. Loss    Val. Acc. [%]

V. PERFORMANCE COMPARISON
We compared the performance of the CPU and the GPU running the example configuration described in the previous section. The properties of the OpenCL devices used to run the simulations are given in Table II.

TABLE II
CPU AND GPU DEVICE PROPERTIES

Property                  CPU                 GPU
Name                      Phenom II X4 965    Radeon HD 5770
Vendor                    AMD                 AMD
Max. proc. elements
Max. clock freq. [MHz]
Max. gl. mem. size [B]
Max. lo. mem. size [B]
Max. work group size
Max. work items sizes     1024,1024, ,256,256

The measured execution time of forward and back-propagation for both devices is shown in Fig. 7. The measurement was taken over one entire training epoch (50,000 images). For most of the training epoch, the algorithm executes approximately 10 times faster on the GPU than on the CPU. In this simulation, neither the CPU nor the GPU was a dedicated computing device (the OS and graphics were running on them). This could explain some of the peaks in the graph. It is interesting to briefly analyze the parameters of the devices and how they relate to the algorithm performance.

Fig. 7. Performance comparison CPU-GPU.

The CPU clock speed is four times the GPU clock speed, but the number of processing elements on the GPU is 200 times higher (the AMD Radeon HD 5770 Juniper graphics card has 10 computing elements, each computing element has 16 stream cores and each stream core has 5 processing elements, i.e. 10 x 16 x 5 = 800 processing elements in total). Based on this, one could expect an even higher performance gain when executing the algorithm on the GPU: a naive estimate from these two ratios alone would be about 200 / 4 = 50 times. To explain the obtained result, it must be taken into account that the GPU has a SIMD (Single Instruction Multiple Data) processor architecture.
That means that all processing elements within a given work group always execute the same instruction. Unless particular care is taken during kernel development to avoid the problems that arise from the limitations of such an architecture (for example branch divergence), the full performance gain of the GPU cannot be achieved. Another point to consider (and maybe more relevant for this example) is the well-known bottleneck that occurs during data transfer between host (CPU) memory and GPU memory. To cope with this problem, it is desirable to transfer as much data as possible at a time to the GPU global memory (entire mini-batches) and let the GPU perform multiple passes through the network on this data.

VI. CONCLUSIONS
Although initially intended to serve as a platform for research on complex-valued neural networks, neneta can, due to its simplicity and extensibility, be used for training real-valued neural networks as well. The platform already offers a number of different types of neural network layers, and by opening the code to the public we hope to attract more researchers to contribute to it. We are aware that some of the already implemented OpenCL kernels are far from optimal in terms of execution time. Our goal for the future is to further improve the code base in that respect, and to improve its design and quality as well.

REFERENCES
[1] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09,
[2] Yann LeCun and Corinna Cortes. The MNIST database of handwritten digits,
[3] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages ,
[4] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, Software available from tensorflow.org.
[5] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv: ,
[6] Akira Hirose. Complex-valued neural networks. Springer Science & Business Media,
[7] Danilo P Mandic and Vanessa Su Lee Goh. Complex valued nonlinear adaptive filters: noncircularity, widely linear and neural models, volume 59. John Wiley & Sons,
[8] Bjarne Stroustrup. The C++ Programming Language. Addison-Wesley Professional, 4th edition,
[9] Jonathan Tompson and Kristofer Schlachter. An introduction to the OpenCL programming model. Person Education, 49,
[10] John E. Stone, David Gohara, and Guochun Shi. OpenCL: A parallel programming standard for heterogeneous computing systems. IEEE Des. Test, 12(3):66-73, May
[11] Deshanand Singh. Implementing FPGA design with the OpenCL standard. Altera whitepaper,
[12] Robert C Martin. Clean code: a handbook of agile software craftsmanship. Pearson Education,
[13] T Williams, C Kelley, HB Bröker, J Campbell, R Cunningham, D Denholm, E Elber, R Fearick, C Grammes, and L Hart. Gnuplot 5.0.5: An interactive plotting program, URL gnuplot.info.
[14] Keun-Yung Byun, Chun-Su Park, Jee-Young Sun, and Sung-Jea Ko. Vector radix 2 × 2 sliding fast Fourier transform. Mathematical Problems in Engineering, 2016,
[15] Boris Schäling. The Boost C++ Libraries. XML Press,
[16] Benedict R Gaster. The OpenCL C++ wrapper API,
More informationDecentralized and Distributed Machine Learning Model Training with Actors
Decentralized and Distributed Machine Learning Model Training with Actors Travis Addair Stanford University taddair@stanford.edu Abstract Training a machine learning model with terabytes to petabytes of
More informationSage: The New BBN Speech Processing Platform
INTERSPEECH 2016 September 8 12, 2016, San Francisco, USA Sage: The New BBN Speech Processing Platform Roger Hsiao, Ralf Meermeier, Tim Ng, Zhongqiang Huang, Maxwell Jordan, Enoch Kan, Tanel Alumäe, Jan
More informationarxiv: v1 [cs.cv] 6 Jul 2016
arxiv:607.079v [cs.cv] 6 Jul 206 Deep CORAL: Correlation Alignment for Deep Domain Adaptation Baochen Sun and Kate Saenko University of Massachusetts Lowell, Boston University Abstract. Deep neural networks
More informationChannel Locality Block: A Variant of Squeeze-and-Excitation
Channel Locality Block: A Variant of Squeeze-and-Excitation 1 st Huayu Li Northern Arizona University Flagstaff, United State Northern Arizona University hl459@nau.edu arxiv:1901.01493v1 [cs.lg] 6 Jan
More informationTowards Effective Deep Learning for Constraint Satisfaction Problems
Towards Effective Deep Learning for Constraint Satisfaction Problems Hong Xu ( ), Sven Koenig, and T. K. Satish Kumar University of Southern California, Los Angeles, CA 90089, United States of America
More informationarxiv: v2 [cs.ne] 26 Apr 2014
One weird trick for parallelizing convolutional neural networks Alex Krizhevsky Google Inc. akrizhevsky@google.com April 29, 2014 arxiv:1404.5997v2 [cs.ne] 26 Apr 2014 Abstract I present a new way to parallelize
More informationLEARNING REPRESENTATIONS FOR FASTER SIMILARITY SEARCH
LEARNING REPRESENTATIONS FOR FASTER SIMILARITY SEARCH Anonymous authors Paper under double-blind review ABSTRACT In high dimensions, the performance of nearest neighbor algorithms depends crucially on
More informationMulti-Glance Attention Models For Image Classification
Multi-Glance Attention Models For Image Classification Chinmay Duvedi Stanford University Stanford, CA cduvedi@stanford.edu Pararth Shah Stanford University Stanford, CA pararth@stanford.edu Abstract We
More informationREVISITING DISTRIBUTED SYNCHRONOUS SGD
REVISITING DISTRIBUTED SYNCHRONOUS SGD Jianmin Chen, Xinghao Pan, Rajat Monga, Samy Bengio Google Brain Mountain View, CA, USA {jmchen,xinghao,rajatmonga,bengio}@google.com Rafal Jozefowicz OpenAI San
More informationAdditive Manufacturing Defect Detection using Neural Networks
Additive Manufacturing Defect Detection using Neural Networks James Ferguson Department of Electrical Engineering and Computer Science University of Tennessee Knoxville Knoxville, Tennessee 37996 Jfergu35@vols.utk.edu
More informationarxiv: v1 [stat.ml] 13 Nov 2017
SIMPLE AND EFFICIENT ARCHITECTURE SEARCH FOR CONVOLUTIONAL NEURAL NETWORKS Thomas Elsken Bosch Center for Artificial Intelligence, Robert Bosch GmbH & University of Freiburg Thomas.Elsken@de.bosch.com
More informationA Cellular Similarity Metric Induced by Siamese Convolutional Neural Networks
A Cellular Similarity Metric Induced by Siamese Convolutional Neural Networks Morgan Paull Stanford Bioengineering mpaull@stanford.edu Abstract High-throughput microscopy imaging holds great promise for
More informationHuman Action Recognition Using CNN and BoW Methods Stanford University CS229 Machine Learning Spring 2016
Human Action Recognition Using CNN and BoW Methods Stanford University CS229 Machine Learning Spring 2016 Max Wang mwang07@stanford.edu Ting-Chun Yeh chun618@stanford.edu I. Introduction Recognizing human
More informationDeep Learning for Computer Vision II
IIIT Hyderabad Deep Learning for Computer Vision II C. V. Jawahar Paradigm Shift Feature Extraction (SIFT, HoG, ) Part Models / Encoding Classifier Sparrow Feature Learning Classifier Sparrow L 1 L 2 L
More informationCaffe tutorial. Seong Joon Oh
Caffe tutorial Seong Joon Oh What is Caffe? Convolution Architecture For Feature Extraction (CAFFE) Open framework, models, and examples for deep learning 600+ citations, 100+ contributors, 7,000+ stars,
More informationarxiv: v1 [cs.ir] 15 Sep 2017
Algorithms and Architecture for Real-time Recommendations at News UK Dion Bailey 1, Tom Pajak 1, Daoud Clarke 2, and Carlos Rodriguez 3 arxiv:1709.05278v1 [cs.ir] 15 Sep 2017 1 News UK, London, UK, {dion.bailey,
More informationTiny ImageNet Visual Recognition Challenge
Tiny ImageNet Visual Recognition Challenge Ya Le Department of Statistics Stanford University yle@stanford.edu Xuan Yang Department of Electrical Engineering Stanford University xuany@stanford.edu Abstract
More informationRethinking the Inception Architecture for Computer Vision
Rethinking the Inception Architecture for Computer Vision Christian Szegedy Google Inc. szegedy@google.com Vincent Vanhoucke vanhoucke@google.com Zbigniew Wojna University College London zbigniewwojna@gmail.com
More informationA Dense Take on Inception for Tiny ImageNet
A Dense Take on Inception for Tiny ImageNet William Kovacs Stanford University kovacswc@stanford.edu Abstract Image classificiation is one of the fundamental aspects of computer vision that has seen great
More informationReal Time Monitoring of CCTV Camera Images Using Object Detectors and Scene Classification for Retail and Surveillance Applications
Real Time Monitoring of CCTV Camera Images Using Object Detectors and Scene Classification for Retail and Surveillance Applications Anand Joshi CS229-Machine Learning, Computer Science, Stanford University,
More informationReal-time Object Detection CS 229 Course Project
Real-time Object Detection CS 229 Course Project Zibo Gong 1, Tianchang He 1, and Ziyi Yang 1 1 Department of Electrical Engineering, Stanford University December 17, 2016 Abstract Objection detection
More informationarxiv: v1 [cs.cv] 21 Dec 2013
Intriguing properties of neural networks Christian Szegedy Google Inc. Wojciech Zaremba New York University Ilya Sutskever Google Inc. Joan Bruna New York University arxiv:1312.6199v1 [cs.cv] 21 Dec 2013
More informationInternational Journal of Computer Engineering and Applications, Volume XII, Special Issue, September 18,
REAL-TIME OBJECT DETECTION WITH CONVOLUTION NEURAL NETWORK USING KERAS Asmita Goswami [1], Lokesh Soni [2 ] Department of Information Technology [1] Jaipur Engineering College and Research Center Jaipur[2]
More informationTowards Representation Learning for Biomedical Concept Detection in Medical Images: UA.PT Bioinformatics in ImageCLEF 2017
Towards Representation Learning for Biomedical Concept Detection in Medical Images: UA.PT Bioinformatics in ImageCLEF 2017 Eduardo Pinho, João Figueira Silva, Jorge Miguel Silva, and Carlos Costa DETI
More informationDistributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability.
Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability. arxiv:1609.06870v4 [cs.cv] 5 Dec 2016 Janis Keuper Fraunhofer ITWM Competence Center High Performance
More informationarxiv: v1 [cs.cv] 31 Jul 2017
Fashioning with Networks: Neural Style Transfer to Design Clothes Prutha Date University Of Maryland Baltimore County (UMBC), Baltimore, MD, USA dprutha1@umbc.edu Ashwinkumar Ganesan University Of Maryland
More informationNeural Networks with Input Specified Thresholds
Neural Networks with Input Specified Thresholds Fei Liu Stanford University liufei@stanford.edu Junyang Qian Stanford University junyangq@stanford.edu Abstract In this project report, we propose a method
More informationarxiv: v1 [cs.lg] 7 Dec 2018
NONLINEAR CONJUGATE GRADIENTS FOR SCALING SYNCHRONOUS DISTRIBUTED DNN TRAINING Saurabh Adya 1 Vinay Palakkode 1 Oncel Tuzel 1 arxiv:1812.02886v1 [cs.lg] 7 Dec 2018 ABSTRACT Nonlinear conjugate gradient
More informationarxiv: v2 [cs.cv] 3 Jan 2014
Intriguing properties of neural networks Christian Szegedy Google Inc. Wojciech Zaremba New York University Ilya Sutskever Google Inc. Joan Bruna New York University arxiv:1312.6199v2 [cs.cv] 3 Jan 2014
More informationReal Time Road Lane Segmentation and Tracking System
Real Time Road Lane Segmentation and Tracking System Michael Person, Mathew Jensen, Anthony O. Smith, Nezamoddin Nezamoddini-Kachouie, Marius Silaghi IGVC Spec 2 Team College of Engineering and Computing
More informationGradient-learned Models for Stereo Matching CS231A Project Final Report
Gradient-learned Models for Stereo Matching CS231A Project Final Report Leonid Keselman Stanford University leonidk@cs.stanford.edu Abstract In this project, we are exploring the application of machine
More informationCS 231N Final Project Report: Cervical Cancer Screening
CS 231N Final Project Report: Cervical Cancer Screening Huyen Nguyen Stanford University huyenn@stanford.edu Tucker Leavitt Stanford University tuckerl@stanford.edu Yianni Laloudakis Stanford University
More informationImplementation of Deep Convolutional Neural Net on a Digital Signal Processor
Implementation of Deep Convolutional Neural Net on a Digital Signal Processor Elaina Chai December 12, 2014 1. Abstract In this paper I will discuss the feasibility of an implementation of an algorithm
More informationFully Convolutional Neural Networks For Remote Sensing Image Classification
Fully Convolutional Neural Networks For Remote Sensing Image Classification Emmanuel Maggiori, Yuliya Tarabalka, Guillaume Charpiat, Pierre Alliez To cite this version: Emmanuel Maggiori, Yuliya Tarabalka,
More informationProgressive Neural Architecture Search
Progressive Neural Architecture Search Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, Kevin Murphy 09/10/2018 @ECCV 1 Outline Introduction
More informationDeep Neural Networks:
Deep Neural Networks: Part II Convolutional Neural Network (CNN) Yuan-Kai Wang, 2016 Web site of this course: http://pattern-recognition.weebly.com source: CNN for ImageClassification, by S. Lazebnik,
More informationQuantifying Translation-Invariance in Convolutional Neural Networks
Quantifying Translation-Invariance in Convolutional Neural Networks Eric Kauderer-Abrams Stanford University 450 Serra Mall, Stanford, CA 94305 ekabrams@stanford.edu Abstract A fundamental problem in object
More informationATTACKING BINARIZED NEURAL NETWORKS
ATTACKING BINARIZED NEURAL NETWORKS Angus Galloway 1, Graham W. Taylor 1,2,3 Medhat Moussa 1 1 School of Engineering, University of Guelph, Canada 2 Canadian Institute for Advanced Research 3 Vector Institute
More informationDeep Learning Anthropomorphic 3D Point Clouds from a Single Depth Map Camera Viewpoint
Deep Learning Anthropomorphic 3D Point Clouds from a Single Depth Map Camera Viewpoint Nolan Lunscher University of Waterloo 200 University Ave W. nlunscher@uwaterloo.ca John Zelek University of Waterloo
More informationFast Sliding Window Classification with Convolutional Neural Networks
Fast Sliding Window Classification with Convolutional Neural Networks Henry G. R. Gouk Department of Computer Science University of Waikato Private Bag 3105, Hamilton 3240, New Zealand hgouk@waikato.ac.nz
More informationDeep Embedded Clustering with Data Augmentation
Proceedings of Machine Learning Research 95:550-565, 2018 ACML 2018 Deep Embedded Clustering with Data Augmentation Xifeng Guo GUOXIFENG1990@163.COM En Zhu ENZHU@NUDT.EDU.CN Xinwang Liu XINWANGLIU@NUDT.EDU.CN
More informationDXTK : Enabling Resource-efficient Deep Learning on Mobile and Embedded Devices with the DeepX Toolkit
DXTK : Enabling Resource-efficient Deep Learning on Mobile and Embedded Devices with the DeepX Toolkit Nicholas D. Lane, Sourav Bhattacharya, Akhil Mathur Claudio Forlivesi, Fahim Kawsar Nokia Bell Labs,
More informationAdditive Manufacturing Defect Detection using Neural Networks. James Ferguson May 16, 2016
Additive Manufacturing Defect Detection using Neural Networks James Ferguson May 16, 2016 Outline Introduction Background Edge Detection Methods Results Porosity Detection Methods Results Conclusion /
More informationRETURNN: THE RWTH EXTENSIBLE TRAINING FRAMEWORK FOR UNIVERSAL RECURRENT NEURAL NETWORKS
RETURNN: THE RWTH EXTENSIBLE TRAINING FRAMEWORK FOR UNIVERSAL RECURRENT NEURAL NETWORKS Patrick Doetsch, Albert Zeyer, Paul Voigtlaender, Ilia Kulikov, Ralf Schlüter, Hermann Ney Human Language Technology
More information