Comparing Dropout Nets to Sum-Product Networks for Predicting Molecular Activity
- Gary Isaac Fields
Anonymous Author(s)

Abstract

Sum-product networks are a multi-layered architecture for computing the joint probabilities of a set of input features. These networks were recently proposed as a structure capable of performing inference efficiently, and they have been shown to outperform deep belief networks in the domain of visual classification. This paper explains the principles that govern the use of sum-product networks and some of their theoretical background, and proposes a method for testing the performance of the sum-product network in the novel domain of molecular activity prediction. The paper concludes by reporting on the development of a port of the sum-product network implementation to the Python scripting language.

1 Introduction

In October of 2012, a group of students working under the supervision of Professor Hinton at the University of Toronto won the Merck Molecular Activity Challenge hosted on [1]. The goal of the challenge was to predict the activity of molecules in different contexts given numerical descriptors generated from their structure. In practice, this required participants to analyze 15 data sets that each contained thousands of training examples with tens of thousands of features. Features are sometimes shared between data sets, but every data set contains many unique features as well.

George Dahl, one of the students involved in the winning entry, wrote an article after the competition explaining that a deep neural network trained with dropout was used for prediction [1]. The success of this algorithm over other solutions that employed more pre-processing of features is a demonstration of the power of deep learning techniques. However, a neural network is not the only deep learning option available.
In 2011, Poon and Domingos introduced a new architecture called a sum-product network (SPN) that was designed for efficient inference with a complex partition function [2]. One of the results reported in the Poon and Domingos paper is that the SPN architecture outperformed the deep belief network (DBN) architecture used by Hinton et al. by a wide margin on visual classification and face completion tasks. The question that this paper will address is whether the SPN architecture can be adapted from the visual recognition domain to other areas in which neural networks currently dominate.

The paper is structured into four main components. First, an introductory view of neural network and SPN architectures will be presented with an emphasis on the motivation for their usage. Second, claims about the SPN architecture and its strengths relative to other architectures will be discussed from a theoretical perspective. Third, a method for adapting the sum-product network to the Merck Molecular Activity Challenge prediction task is proposed. Finally, the implementation of a basic sum-product network is discussed.
1.1 Neural Networks

Artificial neural networks use layers of units to represent a functional mapping. In a typical feedforward network, the neurons in each layer can only connect to neurons in the next layer. Thus, if $a_{ij}$ is the $j$th neuron in layer $i$, it can connect only to neurons of the form $a_{(i+1)k}$. Connections between neurons are weighted, and all the incoming connections to a neuron are summed. The output of each neuron is determined by an activation function applied to the summed inputs. By introducing hidden layers of neurons between the input and output layers, neural networks can compactly represent arbitrary functions. Neural networks with at least one hidden layer and continuous, bounded, and non-constant activation functions have been proven to be universal approximators [3]. The significance of this result is that neural networks can represent functions arbitrarily well given a sufficient number of neurons.

The neural networks employed by Hinton et al. in their Merck Challenge entry used rectified linear activation functions, multiple hidden layers, and dropout regularization [1]. Dropout is a recently developed technique for increasing the robustness of neural networks that works by randomly omitting feature detectors during training rounds [4]. Omitting features during training prevents neurons from co-adapting to features and overfitting the data. This process is conceptually equivalent to averaging multiple models together, but it is more efficient computationally.

1.2 Sum-Product Networks

Sum-product networks are motivated by the difficulty of exact inference in graphical models. Consider a graphical model written in the form

$$P(X = x) = \frac{1}{Z} \prod_k \phi_k(x_{\{k\}})$$

where $x$ is a vector, $x_{\{k\}}$ is a subset of $x$ which forms the scope of the potential function $\phi_k$, and $Z$ is the partition function. Performing inference requires summing the product of the potential functions over exponentially many states to obtain

$$Z = \sum_x \prod_k \phi_k(x_{\{k\}}).$$
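To make the cost of this sum concrete, the following is a minimal sketch that computes $Z$ by brute-force enumeration for binary variables. The function name `partition_function`, the `(scope, phi)` representation of a potential, and the three-variable toy chain are illustrative assumptions and do not come from the original paper.

```python
from itertools import product


def partition_function(num_vars, potentials):
    """Brute-force Z: sum over all 2**num_vars binary states of the product
    of the potential functions.

    `potentials` is a list of (scope, phi) pairs, where `scope` is a tuple of
    variable indices and `phi` maps the values of those variables to a
    non-negative number. This representation is an assumption of the sketch.
    """
    z = 0.0
    for state in product((0, 1), repeat=num_vars):
        term = 1.0
        for scope, phi in potentials:
            term *= phi(*(state[i] for i in scope))
        z += term
    return z


def agree(a, b):
    """Hypothetical pairwise potential that favours equal neighbours."""
    return 2.0 if a == b else 1.0


# Hypothetical toy model: a chain x0 - x1 - x2 with two pairwise potentials.
toy_potentials = [((0, 1), agree), ((1, 2), agree)]
Z = partition_function(3, toy_potentials)

# Probability of the all-zero state: product of potentials divided by Z.
p_all_zero = agree(0, 0) * agree(0, 0) / Z
```

The outer loop visits every one of the $2^n$ states, so doubling the number of variables squares the work. This is the exponential cost that the sum-product network is designed to avoid.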
The sum-product network is based on network polynomials, which are an alternate representation of the potential function. The network polynomial is constructed by multiplying the probability at a state $x$, $p(x)$, with all of the indicator variables that have a value of one in that state. This operation is repeated for all states to obtain a set of products that are then summed together to yield the network polynomial. The operations required to compute the network polynomial can be represented as a tree, with each product forming a node between the indicator variables and the summation. This representation suffers from the same problem as inference in graphical models in that the number of product nodes grows exponentially with the number of indicators. The root of the problem is that one product node is required for each possible state. The insight that allows sum-product networks to avoid this problem is that they add additional layers of sums and products, which enable states to be reused.

A sum-product network is a directed acyclic graph formed from summation and product nodes. The leaf nodes in the graph are binary indicator variables and the negations of all these indicators. The edges from summation nodes to their children are weighted with non-negative values, but edges from product nodes to their children are not associated with weights. Figure 1 shows an example of a sum-product network. The network in Figure 1 is a tree, but connections between nodes are not restricted to adjacent levels as long as all connected nodes alternate between product nodes and sum nodes.

Figure 1: A sum-product network with four independent binary values. Bars over variable names indicate negation.
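The structure described above can be sketched as a small set of node classes that are evaluated bottom-up. This is only an illustrative sketch: the class names, the `indicators_for` helper, and the two-variable network with its weights are assumptions made for the example, not the network of Figure 1 and not the implementation discussed later in the paper.

```python
from math import prod


class Leaf:
    """Indicator for variable `var` taking `value` (value 0 is the negated indicator)."""

    def __init__(self, var, value):
        self.var, self.value = var, value

    def evaluate(self, indicators):
        return indicators[(self.var, self.value)]


class Product:
    """Product node: unweighted edges to its children."""

    def __init__(self, children):
        self.children = children

    def evaluate(self, indicators):
        return prod(child.evaluate(indicators) for child in self.children)


class Sum:
    """Sum node: each edge to a child carries a non-negative weight."""

    def __init__(self, weighted_children):
        self.weighted_children = weighted_children  # list of (weight, node)

    def evaluate(self, indicators):
        return sum(w * child.evaluate(indicators) for w, child in self.weighted_children)


def indicators_for(assignment, variables):
    """Build indicator values from a (possibly partial) assignment.

    A variable missing from `assignment` gets both of its indicators set to
    one, which sums that variable out of the result.
    """
    ind = {}
    for var in variables:
        for value in (0, 1):
            ind[(var, value)] = 1.0 if assignment.get(var, value) == value else 0.0
    return ind


# Hypothetical network over two binary variables: a weighted mixture of two
# product nodes, each multiplying one sum node per variable.
def bernoulli(var, p_one):
    return Sum([(p_one, Leaf(var, 1)), (1.0 - p_one, Leaf(var, 0))])


root = Sum([
    (0.7, Product([bernoulli("x1", 0.6), bernoulli("x2", 0.3)])),
    (0.3, Product([bernoulli("x1", 0.9), bernoulli("x2", 0.2)])),
])

variables = ["x1", "x2"]
joint = root.evaluate(indicators_for({"x1": 1, "x2": 0}, variables))
marginal_x1 = root.evaluate(indicators_for({"x1": 1}, variables))
total = root.evaluate(indicators_for({}, variables))
```

Each call to `evaluate` touches every edge once, so the cost of computing a joint or marginal value is linear in the size of the network rather than exponential in the number of variables.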
The sum-product network can be used to find the joint probability of a set of variables by summing the weighted values at summation nodes and multiplying these values at product nodes. The joint probability is the value of the root node. Marginal probabilities can be computed by choosing a variable to sum out and setting both the indicator for that variable and the indicator for the negation of the variable to one.

2 Theoretical Properties of Sum-Product Networks

Sum-product networks are defined to be valid if and only if the value of the root node is always equal to the probability of observing the state indicated by its leaf nodes. This is an informal explanation and assumes that the state indicated by the leaf nodes is a valid state; in the event that both an indicator and its negation are set to one, the probability of the indicated state is zero, but as mentioned above, the network will instead marginalize out the corresponding variable. The technical definition of this property is that the network is valid iff $S(e) = \phi_S(e)$, where $e$ is an event, $S(e)$ is the value of the root node given the input $e$ to the indicators, and $\phi_S(e)$ is the probability of the event.

Guaranteeing that the networks constructed using an SPN are valid is an important part of their power. As proven by Poon and Domingos [2], there are two properties that need to be met for an SPN to be valid. The children of a sum node must all be functions of the same variables (completeness), and no product node may be a function of both a variable and its negation (consistency). These two conditions can be easily met in the construction of the network and are not affected by changing the weights of edges during training; thus sum-product networks can always guarantee that inference is possible by evaluating the values in the network.

Delalleau and Bengio showed that the depth of the network, measured as the maximum number of alternating sum and product layers, affects the representational ability of the network [5].
Their theoretical results focused on two particular classes of functions and proved that the number of hidden units in deep representations grew more slowly than the same quantity for shallow representations when the same functions were represented. They concluded that deep networks offer much more compact representations than shallow networks; however, their results do not cover all the functions that the SPN architecture is capable of representing.

3 Adapting an SPN for the Merck Challenge

There are a number of modifications that need to be made to the form of the sum-product network presented in section 1.2 for the network to be usable on the molecular activity prediction task described in section 1. This section outlines these considerations and ends by detailing how the output of the system can be compared to the results of the Merck Molecular Activity Challenge.

3.1 Continuous Input

The features in the Merck data set have integer values, which is problematic because the sum-product networks shown to this point have used binary features exclusively. This limitation can be overcome by using integral nodes instead of sum nodes. The idea is to treat real-valued features as samples drawn from a multinomial distribution with an infinite number of values. If each input feature is drawn from a multinomial distribution over infinitely many values, then the weighted sum of indicators becomes an integral over the probability distribution. In the original paper on sum-product networks, Poon and Domingos assumed that pixel values were drawn from a mixture of Gaussians model [2]. The procedure for treating a real-valued input as a continuous sample begins by normalizing the input features to have zero mean and unit variance. The input values for each feature are then divided into k equal-sized sets, and the mean value of each set is used as the mean of a Gaussian in the mixture of Gaussians model.
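The normalization and binning procedure above can be sketched in a few lines of standard-library Python. Several details here are assumptions rather than statements from the paper: the function name `gaussian_component_means` is hypothetical, the values are sorted before splitting so that each set is a contiguous range of the feature, the population standard deviation is used for normalization, and any remainder is placed in the last set.

```python
from statistics import mean, pstdev


def gaussian_component_means(values, k):
    """Return k Gaussian means for one feature.

    Steps: normalize to zero mean and unit variance, sort, split into k
    (nearly) equal-sized sets, and take the mean of each set. Assumes the
    feature is not constant and that len(values) >= k.
    """
    mu, sigma = mean(values), pstdev(values)
    normalized = sorted((v - mu) / sigma for v in values)

    size = len(normalized) // k
    groups = [normalized[i * size:(i + 1) * size] for i in range(k - 1)]
    groups.append(normalized[(k - 1) * size:])  # last set absorbs the remainder
    return [mean(group) for group in groups]


# Hypothetical feature column with eight observations, split into four sets.
feature = [1, 2, 3, 4, 5, 6, 7, 8]
component_means = gaussian_component_means(feature, 4)
```

With equal-sized sets the component means average to zero, because the normalized feature itself has zero mean. The variance of each component is not specified in the text above and is left out of the sketch.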
3.2 Prediction

The value of the root node in the sum-product network naturally yields the probability of the network's inputs. For prediction, the value of interest is not the probability of the inputs but the value of an output variable. Unlike a neural network, an SPN is not an input-output mapping, and to obtain an output prediction the output variable must first be added to the network using indicator variables. During training, the values of these indicator variables are set according to the sample result. During testing, the values of the indicator variables are chosen as the values that maximize the network value.

3.3 Overlapping Feature Sets

The presence of overlapping features in the Merck data set is one of the properties that allowed the winning deep neural network to outperform other solutions by exploiting the shared structure in the data. Capturing these shared relationships requires building one SPN that spans all of the data sets. The training of this larger network differs from standard training because not all the indicator variables will have set values, since the same features aren't present in every set. A simple method for addressing this new scenario is to train the network with the non-observed features marginalized out by setting their indicator values and their negations to one.

3.4 Evaluation

The evaluation of the Merck Challenge was based on the R-squared metric:

$$R^2 = \frac{\left[\sum_i (X_i - \bar{X})(Y_i - \bar{Y})\right]^2}{\sum_i (X_i - \bar{X})^2 \, \sum_i (Y_i - \bar{Y})^2}$$

where $X_i$ and $Y_i$ are the paired observed and predicted activity values and bars denote means. The data set is divided with a temporal split, as data for testing comes from later assays of molecules than the training data. At the time the Merck Challenge ended, the R-squared score to beat was [...]. Details about the confidence of the prediction and its robustness are unfortunately not available from this single metric, so evaluation can be supplemented by using a bootstrap method to obtain confidence intervals on the R-squared score. The bootstrap evaluation proceeds by drawing N samples with replacement from the testing samples and computing the R-squared metric over these samples to obtain $\hat{R}^2_1$.
This process is repeated for k iterations to build the set $\{\hat{R}^2_1, \hat{R}^2_2, \ldots, \hat{R}^2_k\}$. Ordering this set and finding the 2.5th and 97.5th percentiles provides an estimate of a 95 percent confidence interval.

4 SPN Implementation

Poon and Domingos released a Java implementation of code for constructing and learning an SPN. Their code is intended to run on large distributed systems and as such depends on a message passing protocol between many computing nodes. The limitations of running this system on personal computing hardware motivated the development of a Python-based implementation of the system. Python was chosen because it is platform-independent, is gaining popularity in the machine learning community, and is accompanied by support tools such as Theano that provide access to hardware acceleration [6].

The algorithm initializes by constructing a densely connected sum-product network with zero edge weights. From this point, the basic learning algorithm presented in [2] is followed by incrementing edge weights following inference with each data sample. The data samples are presented iteratively until the edge weights converge. Edges with zero weight are removed from the final graph.

The Python port is not yet capable of computing reasonable predictions for the Merck challenge due to the scale of the data set. Even with the increased efficiency of the sum-product network, learning a model of the Merck data set is realistically feasible only with GPU-accelerated code or distributed computing, neither of which has yet been implemented. However, the current Python code could serve as a pedagogical tool in the explanation of sum-product networks, and future work will address these performance issues.
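Returning to the bootstrap evaluation of section 3.4, the procedure can be sketched with the standard library alone. This is a hedged illustration: the function names, the default of 1000 iterations, the fixed seed, and the synthetic data are all assumptions, and R-squared is assumed here to be the squared Pearson correlation between observed and predicted values.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between paired sequences (assumed metric)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # degenerate resample with no variation
    return sxy ** 2 / (sxx * syy)


def bootstrap_interval(x, y, k=1000, seed=0):
    """Estimate a 95 percent interval for R-squared from k bootstrap resamples."""
    rng = random.Random(seed)
    n = len(x)
    scores = []
    for _ in range(k):
        # Draw N paired samples with replacement from the N test samples.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(r_squared([x[i] for i in idx], [y[i] for i in idx]))
    scores.sort()
    lower = scores[int(0.025 * (k - 1))]
    upper = scores[int(0.975 * (k - 1))]
    return lower, upper


# Synthetic stand-in for observed and predicted activities (not Merck data).
data_rng = random.Random(1)
observed = [float(v) for v in range(50)]
predicted = [2.0 * v + data_rng.gauss(0.0, 5.0) for v in observed]

point_estimate = r_squared(observed, predicted)
lower, upper = bootstrap_interval(observed, predicted)
```

Resampling indices rather than values keeps each observed value paired with its prediction, which is what the metric requires.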
5 Conclusions and Contributions

This work has relied heavily on the original SPN paper published by Poon and Domingos, as it remains one of the only published resources about this new architecture. It is hoped that the development of an alternative Python implementation of sum-product network construction and training will spur additional interest in this line of research and its applications outside of vision processing.

References

[1] Dahl, G. (2012). Deep Learning How I Did It: Merck 1st place interview. Online article available from
[2] Poon, H. & Domingos, P. (2011, November). Sum-product networks: A new deep architecture. In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on. IEEE.
[3] Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks 4(2):
[4] Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:
[5] Delalleau, O. & Bengio, Y. (2011). Shallow vs. Deep Sum-Product Networks. In Proceedings of the 25th Conference on Neural Information Processing Systems.
[6] Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., & Bengio, Y. Theano: A CPU and GPU Math Expression Compiler. Proceedings of the Python for Scientific Computing Conference (SciPy), June 30 - July 3, Austin, TX.
More informationFrom Maxout to Channel-Out: Encoding Information on Sparse Pathways
From Maxout to Channel-Out: Encoding Information on Sparse Pathways Qi Wang and Joseph JaJa Department of Electrical and Computer Engineering and, University of Maryland Institute of Advanced Computer
More informationConvolutional Neural Networks. Computer Vision Jia-Bin Huang, Virginia Tech
Convolutional Neural Networks Computer Vision Jia-Bin Huang, Virginia Tech Today s class Overview Convolutional Neural Network (CNN) Training CNN Understanding and Visualizing CNN Image Categorization:
More informationParallel one-versus-rest SVM training on the GPU
Parallel one-versus-rest SVM training on the GPU Sander Dieleman*, Aäron van den Oord*, Benjamin Schrauwen Electronics and Information Systems (ELIS) Ghent University, Ghent, Belgium {sander.dieleman,
More informationTraining Convolutional Neural Networks for Translational Invariance on SAR ATR
Downloaded from orbit.dtu.dk on: Mar 28, 2019 Training Convolutional Neural Networks for Translational Invariance on SAR ATR Malmgren-Hansen, David; Engholm, Rasmus ; Østergaard Pedersen, Morten Published
More informationDeep Learning. Volker Tresp Summer 2014
Deep Learning Volker Tresp Summer 2014 1 Neural Network Winter and Revival While Machine Learning was flourishing, there was a Neural Network winter (late 1990 s until late 2000 s) Around 2010 there
More informationBayesian model ensembling using meta-trained recurrent neural networks
Bayesian model ensembling using meta-trained recurrent neural networks Luca Ambrogioni l.ambrogioni@donders.ru.nl Umut Güçlü u.guclu@donders.ru.nl Yağmur Güçlütürk y.gucluturk@donders.ru.nl Julia Berezutskaya
More informationLinear Separability. Linear Separability. Capabilities of Threshold Neurons. Capabilities of Threshold Neurons. Capabilities of Threshold Neurons
Linear Separability Input space in the two-dimensional case (n = ): - - - - - - w =, w =, = - - - - - - w = -, w =, = - - - - - - w = -, w =, = Linear Separability So by varying the weights and the threshold,
More informationCLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS
CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of
More informationNeural Networks: promises of current research
April 2008 www.apstat.com Current research on deep architectures A few labs are currently researching deep neural network training: Geoffrey Hinton s lab at U.Toronto Yann LeCun s lab at NYU Our LISA lab
More informationDEEP LEARNING REVIEW. Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature Presented by Divya Chitimalla
DEEP LEARNING REVIEW Yann LeCun, Yoshua Bengio & Geoffrey Hinton Nature 2015 -Presented by Divya Chitimalla What is deep learning Deep learning allows computational models that are composed of multiple
More informationDropout. Sargur N. Srihari This is part of lecture slides on Deep Learning:
Dropout Sargur N. srihari@buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Regularization Strategies 1. Parameter Norm Penalties 2. Norm Penalties
More informationBuilding Classifiers using Bayesian Networks
Building Classifiers using Bayesian Networks Nir Friedman and Moises Goldszmidt 1997 Presented by Brian Collins and Lukas Seitlinger Paper Summary The Naive Bayes classifier has reasonable performance
More informationDecision Trees Dr. G. Bharadwaja Kumar VIT Chennai
Decision Trees Decision Tree Decision Trees (DTs) are a nonparametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target
More informationMachine Learning Classifiers and Boosting
Machine Learning Classifiers and Boosting Reading Ch 18.6-18.12, 20.1-20.3.2 Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve
More informationAnalysis: TextonBoost and Semantic Texton Forests. Daniel Munoz Februrary 9, 2009
Analysis: TextonBoost and Semantic Texton Forests Daniel Munoz 16-721 Februrary 9, 2009 Papers [shotton-eccv-06] J. Shotton, J. Winn, C. Rother, A. Criminisi, TextonBoost: Joint Appearance, Shape and Context
More informationCS 229 Midterm Review
CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask
More informationSupervised Learning. Decision trees Artificial neural nets K-nearest neighbor Support vectors Linear regression Logistic regression...
Supervised Learning Decision trees Artificial neural nets K-nearest neighbor Support vectors Linear regression Logistic regression... Supervised Learning y=f(x): true function (usually not known) D: training
More informationCOMP 551 Applied Machine Learning Lecture 16: Deep Learning
COMP 551 Applied Machine Learning Lecture 16: Deep Learning Instructor: Ryan Lowe (ryan.lowe@cs.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless otherwise noted, all
More informationData Mining. Neural Networks
Data Mining Neural Networks Goals for this Unit Basic understanding of Neural Networks and how they work Ability to use Neural Networks to solve real problems Understand when neural networks may be most
More informationAnalytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.
Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied
More informationMore on Learning. Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization
More on Learning Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization Neural Net Learning Motivated by studies of the brain. A network of artificial
More informationMATTI TUHOLA WIRELESS ACCESS POINT QUALITY ASSESSMENT USING CONVOLUTIONAL NEURAL NETWORKS. Bachelor of Science Thesis
MATTI TUHOLA WIRELESS ACCESS POINT QUALITY ASSESSMENT USING CONVOLUTIONAL NEURAL NETWORKS Bachelor of Science Thesis Examiner: Heikki Huttunen Submitted: April 29, 2016 I ABSTRACT TAMPERE UNIVERSITY OF
More informationCOMP9444 Neural Networks and Deep Learning 7. Image Processing. COMP9444 c Alan Blair, 2017
COMP9444 Neural Networks and Deep Learning 7. Image Processing COMP9444 17s2 Image Processing 1 Outline Image Datasets and Tasks Convolution in Detail AlexNet Weight Initialization Batch Normalization
More informationNeural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /10/2017
3/0/207 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/0/207 Perceptron as a neural
More informationArtificial Neural Networks Lecture Notes Part 5. Stephen Lucci, PhD. Part 5
Artificial Neural Networks Lecture Notes Part 5 About this file: If you have trouble reading the contents of this file, or in case of transcription errors, email gi0062@bcmail.brooklyn.cuny.edu Acknowledgments:
More informationWeighted Convolutional Neural Network. Ensemble.
Weighted Convolutional Neural Network Ensemble Xavier Frazão and Luís A. Alexandre Dept. of Informatics, Univ. Beira Interior and Instituto de Telecomunicações Covilhã, Portugal xavierfrazao@gmail.com
More information08 An Introduction to Dense Continuous Robotic Mapping
NAVARCH/EECS 568, ROB 530 - Winter 2018 08 An Introduction to Dense Continuous Robotic Mapping Maani Ghaffari March 14, 2018 Previously: Occupancy Grid Maps Pose SLAM graph and its associated dense occupancy
More informationNatural Language Processing CS 6320 Lecture 6 Neural Language Models. Instructor: Sanda Harabagiu
Natural Language Processing CS 6320 Lecture 6 Neural Language Models Instructor: Sanda Harabagiu In this lecture We shall cover: Deep Neural Models for Natural Language Processing Introduce Feed Forward
More informationLoopy Belief Propagation
Loopy Belief Propagation Research Exam Kristin Branson September 29, 2003 Loopy Belief Propagation p.1/73 Problem Formalization Reasoning about any real-world problem requires assumptions about the structure
More informationFunction approximation using RBF network. 10 basis functions and 25 data points.
1 Function approximation using RBF network F (x j ) = m 1 w i ϕ( x j t i ) i=1 j = 1... N, m 1 = 10, N = 25 10 basis functions and 25 data points. Basis function centers are plotted with circles and data
More informationNeural Networks and Deep Learning
Neural Networks and Deep Learning Example Learning Problem Example Learning Problem Celebrity Faces in the Wild Machine Learning Pipeline Raw data Feature extract. Feature computation Inference: prediction,
More informationTraining Restricted Boltzmann Machines using Approximations to the Likelihood Gradient
Training Restricted Boltzmann Machines using Approximations to the Likelihood Gradient Tijmen Tieleman tijmen@cs.toronto.edu Department of Computer Science, University of Toronto, Toronto, Ontario M5S
More informationNeuron Selectivity as a Biologically Plausible Alternative to Backpropagation
Neuron Selectivity as a Biologically Plausible Alternative to Backpropagation C.J. Norsigian Department of Bioengineering cnorsigi@eng.ucsd.edu Vishwajith Ramesh Department of Bioengineering vramesh@eng.ucsd.edu
More informationThe exam is closed book, closed notes except your one-page cheat sheet.
CS 189 Fall 2015 Introduction to Machine Learning Final Please do not turn over the page before you are instructed to do so. You have 2 hours and 50 minutes. Please write your initials on the top-right
More informationPattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition
Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant
More information3D model classification using convolutional neural network
3D model classification using convolutional neural network JunYoung Gwak Stanford jgwak@cs.stanford.edu Abstract Our goal is to classify 3D models directly using convolutional neural network. Most of existing
More informationOn the Effectiveness of Neural Networks Classifying the MNIST Dataset
On the Effectiveness of Neural Networks Classifying the MNIST Dataset Carter W. Blum March 2017 1 Abstract Convolutional Neural Networks (CNNs) are the primary driver of the explosion of computer vision.
More informationHow Learning Differs from Optimization. Sargur N. Srihari
How Learning Differs from Optimization Sargur N. srihari@cedar.buffalo.edu 1 Topics in Optimization Optimization for Training Deep Models: Overview How learning differs from optimization Risk, empirical
More information