Intro to Learning in SD -1
1 Intro to Learning in SD - 1: Introduction to RecNN. Alessio Micheli, Apr 2018. DRAFT, please do not circulate! Dipartimento di Informatica, Università di Pisa - Italy. Computational Intelligence & Machine Learning Group
2 Alessio Micheli Learning in Structured Domain Plan in 2 lectures 1. Recurrent and Recursive Neural Networks Extensions of models for supervised and unsupervised learning in structured domains Motivation and examples The structured data RNN and RecNN Recursive Cascade Correlation and other recursive approaches Unsupervised recursive models 2. Moving to DPAG and graphs: the role of causality 2
3 3 Why structured data? Alessio Micheli Because data have relationships
4 Introduction: Motivation of ML for SD. Most known ML methods are limited to flat and fixed forms of data (vectors or sequences): fixed-length attribute-value vectors. Central issue: data representation. Graph: a very useful abstraction for real data. Labeled graphs = vector patterns + relationships: natural, for the richness of structured domains; efficient, due to the repetitive nature inherent in the data. SD + ML = adaptive processing of structured information
5 Alessio Micheli 5 Introduction: Research Area General Aim: investigation of ML models for the adaptive processing of structured information (e.g. sequences, trees, graphs): Structured domain learning, Learning in Structured Domain, Relational Learning. Fractal tree: a recursive structure
6 Advancements from ML to SDL. Learning in Structured Domains (SD) in Pisa/CIML: pioneering since the '90s the development of theoretical analysis, new approaches, and applications, especially on the basis of recursive approaches. And for you? To build an advanced background for: analysis/development of innovative models; applications in the area of interdisciplinary projects (@ CIML). Practical: theses are possible on the design of new models/applications for the extension of the input domains, toward: extension of the applicative domain; adaptivity and accuracy; efficiency
7 Feedforward versus Recurrent (memento). Feedforward: direction input → output. Recurrent neural networks: a different category of architectures, based on the addition of feedback-loop connections in the network topology. The presence of self-loop connections provides the network with dynamical properties, giving the model a memory of past computations. This allows us to extend the representation capability of the model to the processing of sequences (and structured data). Recurrent neural networks will be the subject (further developed) of the ISPR/CNS courses (see later).
8 From flat to structured data. Flat: vectors (as in the rest of AA1). Structured: sequences, trees, graphs, multi-relational data. Examples: strings, proteins, series/temporal streams (l1 l2 l3 l4 l5), small molecules, network data
9 Introduction: Motivation of ML for SD - Examples. Computer Science - AI: problem solving techniques, theorem proving (controlling search on trees by heuristics); natural language processing (evaluation of parse trees); image analysis (structured representation of components); document processing (fuzzy evaluation of web documents). Natural Sciences - computational biology and chemistry: quite complex domains where the management of relationships and structure is relevant to achieving suitable modeling of the solutions.
10 10 Example: logo recognition Alessio Micheli
11 Alessio Micheli 11 Example: Terms in 1 st order logic
12 Alessio Micheli 12 Example: language parsing S VP Anchor NP NP PP NP NP NP Foot PRP VBZ DT NN IN PRPS NN NN It has no bearing on our work force
13 Example (graphs): Molecules. A fundamental problem in Chemistry: correlate the chemical structure of molecules with their properties (e.g. physico-chemical properties, or biological activity) in order to predict these properties for new molecules. Quantitative Structure-Property Relationship (QSPR); Quantitative Structure-Activity Relationship (QSAR). Property/Activity = T(Structure): T yields a property value (regression) or toxic yes/no (classification). Molecules are not vectors! Molecules can be more naturally represented by varying-size structures. Can we predict directly from structures?
14 Introduction: Adaptive processing of SD. The problem: there has been no systematic way to extract features or metric relations between examples for SD. A form of representation learning over instances (extended to SD)! Goal: to learn a mapping between a structured information domain (SD) and a discrete or continuous space (a transduction). Property/Activity = T(Structure): property value (regression) or toxic yes/no (classification). Note: we have a set of trees/graphs in input. Task: for instance, classify each graph [starting from a training set of known pairs (graph_i, target_i)]
15 Alessio Micheli Introduction: Learning Model for SD What we mean for adaptive processing of SD: extraction of the topological information directly from data H has to be able to represent hierarchic relationships adaptive measure of similarity on structures + apt learning rule efficient handling of structure variability Classical: efficient learning good generalization performance knowledge extraction capabilities 18
16 SD Learning scenario

Data Type                                      | Symbolic (Model)               | Connectionist                  | Probabilistic
STATIC (attribute/value, real vectors)         | Rule induction, Decision trees | NN, SVM                        | Mixture models, Naïve Bayes
SEQUENTIAL (serially ordered entities)         | Learning finite state automata | Recurrent NN                   | Hidden Markov Models
STRUCTURAL (relations among domain variables)  | Inductive logic programming    | Recursive NN (kernels for SD)  | Recursive Markov models
17 20 Preview: The RecNN idea Alessio Micheli Recursive NN: Recursive and parametric realization of the transduction function Adaptive by Neural Networks Fractal tree: a recursive structure
18 Structures: just use sequences? Can we process structures as if they were sequences? E.g. any tree can be converted into a sequence with no information loss, e.g. the tree with root i becomes i(g(d(a,b),e(c)),h(f)). But: sequences may be long: the number of vertices is exponential w.r.t. the height of the tree (equivalently, tree paths are logarithmic in the number of nodes, so the dependencies along the tree are much shorter); dependencies are blurred (arbitrarily, depending on the visit); children end up far apart, while distant relatives end up close.
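The point above can be made concrete with a small sketch (not from the slides; the `Node` class and helper names are ours): serializing the example tree and comparing the length of the sequence with the height of the tree.

```python
# Sketch: serialize the example tree i(g(d(a,b),e(c)),h(f)) and compare
# sequence length with tree height (the longest root-to-leaf path).

class Node:
    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)

def serialize(t):
    """Depth-first textual serialization: label(child1,child2,...)."""
    if not t.children:
        return t.label
    return t.label + "(" + ",".join(serialize(c) for c in t.children) + ")"

def height(t):
    return 1 + max((height(c) for c in t.children), default=0)

tree = Node("i",
            Node("g", Node("d", Node("a"), Node("b")), Node("e", Node("c"))),
            Node("h", Node("f")))

s = serialize(tree)
print(s)  # i(g(d(a,b),e(c)),h(f))
# In the sequence, 'a' and 'h' are many symbols apart; in the tree,
# every node is within `height` steps of the root.
print(len(s), height(tree))
```

In the serialized form, dependencies between siblings are stretched over the whole encoding of the intermediate subtrees, while in the tree they stay within a few edges.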
19 Learning in Structured Domain. Plan in 2 lectures: 1. Recurrent and Recursive Neural Networks - extensions of models for supervised and unsupervised learning in structured domains; motivation and examples; the structured data; RNN and RecNN; Recursive Cascade Correlation and other recursive approaches; unsupervised recursive models. 2. Moving to DPAG and graphs: the role of causality
20 Structured Domains. l(v) ∈ L: set of attribute vectors. Structure G: vertices' labels (L) + topology (the skeleton of G). Sequences, trees, DOAGs/DPAGs, graphs. G: labeled directed ordered/positional acyclic graphs with super-source. A total order (or a position) on the edges leaving each vertex. Super-source: a vertex s such that every vertex can be reached by a directed path starting from s. Bounded out-degree and in-degree (the number of edges leaving and entering a vertex v). DPAG: superclass of the DOAGs: besides ordering, a distinct positive integer can be associated to each edge, allowing some positions to be absent. Trees: labeled rooted ordered trees, or positional (K-ary) trees; super-source: the root of the tree. Binary trees (K=2) are mostly used in this lecture
21 K-ary Trees. k-ary trees (simply "trees" in the following) are rooted positional trees with finite out-degree k. Given a node v in the tree T: the children of v are the successor nodes of v, each with a position j = 1,...,k; k is the maximum out-degree over the tree, i.e. the maximum number of children for any node; L(v) is the input label associated with v, and L_i(v) is the i-th element of the label; the subtree T^(j) is the tree rooted at the j-th child of v.
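A minimal data structure matching this definition can be sketched as follows (the class and function names are ours, for illustration): each node carries a label and exactly k positional child slots, possibly empty.

```python
# A minimal k-ary positional tree: each node has a label L(v) and exactly
# K child slots, possibly empty (None). Position matters: the j-th slot
# is the j-th child.

from dataclasses import dataclass, field
from typing import List, Optional

K = 2  # maximum out-degree (binary tree, K=2, as mostly used in the lecture)

@dataclass
class KTreeNode:
    label: List[float]  # L(v), the input label vector
    children: List[Optional["KTreeNode"]] = field(
        default_factory=lambda: [None] * K)  # positional slots j = 1..K

def out_degree(v: KTreeNode) -> int:
    return sum(c is not None for c in v.children)

# A root with one left child; note [leaf, None] != [None, leaf].
leaf = KTreeNode(label=[1.0])
root = KTreeNode(label=[0.5], children=[leaf, None])
print(out_degree(root))  # 1
```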
22 Data Domains. We consider sets of DPAGs: labeled directed positional acyclic graphs with super-source, bounded in-degree and out-degree (k). They include the sub-classes: DPAGs ⊃ DOAGs ⊃ k-ary trees ⊃ sequences ⊃ vectors. Notation: ch[v] is the set of successors of v; ch_j[v] is the j-th child of the node v
23 Structured Data: Examples. Labeled sequences, trees, DOAGs/DPAGs, graphs. (Figures: a single labeled vertex; a sequence l1...l5; a rooted tree with super-source; DPAGs; an undirected graph.) DPAG: labeled directed positional acyclic graph with super-source, bounded in-degree and out-degree (k).
24 Trees and DPAGs. (Figure: the same vertices represented as a DPAG, where b is a shared node, versus as a tree, where the shared node is duplicated.) Exercise after the lecture SD-2.
25 Positional versus Ordered. DPAG: for each vertex v ∈ vert(G), an injective function S_v: edg(v) → [1, 2, ..., K] is defined on the edges leaving from v. in_set(v) = {u | u ∈ V and u → v} (predecessors); out_set(v) = {u | u ∈ V and v → u} (successors).
26 Learning in Structured Domain. Plan in 2 lectures: 1. Recurrent and Recursive Neural Networks - extensions of models for supervised and unsupervised learning in structured domains; motivation and examples; the structured data; RNN and RecNN; Recursive Cascade Correlation and other recursive approaches; unsupervised recursive models. 2. Moving to DPAG and graphs: the role of causality
27 Neural Computing Approach. NN are universal approximators (theorem of Cybenko). NN can learn from examples (automatic inference), deal with noise and incomplete data, and handle continuous real and discrete data. Simple gradient descent techniques for training. A successful model in ML due to its flexibility in applications. Domain → Neural Network: static fixed-dimension patterns (vectors, records, ...) → Feedforward; dynamical patterns (temporal sequences, ...) → Recurrent; structured patterns (DPAGs, trees, ...) → Recursive
28 Recurrent Neural Networks (resume). Up to now, internal state:
x(t) = τ(x(t-1), l(t))
y(t) = g(x(t), l(t))
(Figure: a recurrent unit with delay operator q^-1, processing a sequence l1 l2 l3 l4 l5.)
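A toy sketch of these state equations (the scalar maps and weight values are illustrative placeholders, not the lecture's trained networks): the state carries a memory of all past inputs.

```python
# Sketch of x(t) = tau(x(t-1), l(t)), y(t) = g(x(t)); tau and g are toy
# scalar maps (linear + tanh), with fixed illustrative weights.
import math

def tau(x_prev, l, w_in=0.5, w_rec=0.9):
    return math.tanh(w_in * l + w_rec * x_prev)

def g(x, w_out=1.0):
    return w_out * x

x = 0.0                                  # x(0): initial state
sequence = [0.1, 0.4, -0.2, 0.3, 0.0]    # inputs l(1)..l(5)
for l in sequence:
    x = tau(x, l)   # the new state depends on the old state: memory
    y = g(x)
print(round(y, 4))  # the final output depends on the whole sequence
```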
29 RNN training and properties (resume). BPTT/RTRL: see the CNS course. Unfolding (see the ML lecture on RNN): Back-Propagation Through Time is backprop on the unrolled version. Causality: a system is causal if the output at time t0 (or vertex v) only depends on inputs at times t < t0 (depends only on v and its descendants); necessary and sufficient for an internal state. Stationarity: time invariance; the state transition function is the same at every time step (and, for structures, at every vertex).
30 Recursive Neural Networks (overview). Now:
x(v) = τ(x(ch[v]), l(v))
y(v) = g(x(v), l(v))
where x(ch[v]) = [x(ch_1[v]), ..., x(ch_k[v])] (state transition system).
(Figure: a recursive unit with k delayed inputs q_1^-1 ... q_k^-1; # feedbacks = # children.)
31 Generalized Shift Operators. Standard shift operator (time): q^-1 s(t) = s(t-1). Generalized shift operators (structure): q_j^-1 G_v = G_{ch_j[v]}, where ch_j[v] is the j-th child of v. RecNN with q:
x(v) = τ(l(v), q_1^-1 x(v), ..., q_k^-1 x(v))
y(v) = g(x(v))
where q_j^-1 x(v) = x(ch_j[v]) if ch_j[v] ≠ nil, and x_0 otherwise. Used to extend the states to the children of the vertices of the tree.
32 Overview of Domains. Synthesis formula (instantiation of the domains for the RecNN): transduction T: G → IR, where G is e.g. the set of labeled directed positional acyclic graphs with super-source, or of labeled k-ary trees; IR^n is the label space; IR^m is the descriptor (or code) space; T = g ∘ τ_E, with encoding function τ_E: G → IR^m and output function g: IR^m → IR.
33 As transductions by recursion. Key idea: a family of recursive functions for the processing of G. T_G: G → O is a functional graph transduction, T_G = g ∘ τ_E (encoding and output functions). Link between the state transition system and the recursive transduction: x(v) = τ_E(G_v); the state x(v) ∈ X associated to each v is the encoding of the subgraph rooted in v
34 Recursive Processing. Recursive definition of τ_E (encoding function), for a graph G with super-source s and subgraphs G^(1), ..., G^(k):
τ_E(G) = 0 if G is empty
τ_E(G) = τ_NN(L(root), τ_E(G^(1)), ..., τ_E(G^(k))) otherwise
τ_E performs a systematic visit of G: it guides the application of τ_NN to each node of the tree (bottom-up). Causality and stationarity assumptions.
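The recursive definition above can be sketched directly as code (a toy scalar τ_NN with m = 1 and illustrative fixed weights stands in for the trained network):

```python
# Sketch of tau_E: the empty tree maps to x0 = 0; a node maps to
# tau_NN(label, codes of its subtrees), computed bottom-up.
import math

def tau_nn(label, child_codes, w=0.7, w_hat=(0.5, 0.3), theta=0.1):
    """Toy scalar tau_NN with fixed illustrative weights (m = 1, k = 2)."""
    s = w * label + theta + sum(wj * xj for wj, xj in zip(w_hat, child_codes))
    return math.tanh(s)

def tau_e(tree):
    """tree is (label, [child, child]) or None; systematic bottom-up visit."""
    if tree is None:
        return 0.0                        # x0: code of the empty tree
    label, children = tree
    codes = [tau_e(c) for c in children]  # encode the subtrees first
    return tau_nn(label, codes)           # then apply tau_NN to the root

t = (1.0, [(0.5, [None, None]), None])    # a root with one left child
code = tau_e(t)
print(-1.0 < code < 1.0)  # True: the code lives in the state space
```

The same τ_NN (weight sharing) is applied at every vertex: this is the stationarity assumption, while computing each node's code only from its subtrees is the causality assumption.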
35 Properties (I). Transduction: non-algebraic: T_G depends not only on the information associated with the current node, but also on the context information encoded for the subgraphs. Extension of causality and stationarity as defined for sequence processing: a structural transduction is causal if its output for a vertex v only depends on v and its descendants (induced subgraphs); stationary if the state transition function (τ_NN) is independent of the vertex v. Recurrent/recursive NN transductions admit a recursive state representation with such properties. Adaptivity (NN learning algorithms) + universal approximation over the tree domain [Hammer]
36 Properties (II). T_G is IO-isomorphic if the input G and the output T_G(G) have the same skeleton (the graph after removing labels). (Figure: an input DOAG mapped to an output DOAG with the same skeleton.) Supersource transductions: a single scalar value emitted for the whole input DOAG; also known as Structure-to-Element transductions.
37 40 Properties (III) Alessio Micheli IO-isomorph causal transduction T G
38 Unfolding and Encoding Network. We will look at the RecNN data-flow process from two points of view: 1. Unfolding, by "walking" on the structures (stationarity), according to the causal assumption (inverse topological order*): the model visits the structures. 2. Building an encoding network isomorphic to the input structure (same skeleton, inverse arrows, again with stationarity and the causal assumption): we build a different encoding network for each input structure. (* see next slide)
39 Alessio Micheli Topological Order A linear ordering of its nodes s.t. each node comes before all nodes to which it has edges. Every DAG has at least one topological sort, and may have many. A numbering of the vertices of a directed acyclic graph such that every edge from a vertex numbered i to a vertex numbered j satisfies i<j. According to a Partial order For RNN: Inverse topological order 42
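A sketch of how such an order can be computed (Kahn's algorithm, a standard method; the function names and the example DAG are ours): the RecNN then processes the vertices in the *inverse* of this order, so that every child is encoded before its parent.

```python
# A topological sort of a DAG (Kahn's algorithm): each node comes before
# all nodes it has edges to. RecNN uses the inverse order: leaves first.
from collections import deque

def topological_order(adj):
    """adj: dict vertex -> list of successors. Returns one valid order."""
    indeg = {v: 0 for v in adj}
    for v in adj:
        for u in adj[v]:
            indeg[u] += 1
    queue = deque(v for v in adj if indeg[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for u in adj[v]:
            indeg[u] -= 1
            if indeg[u] == 0:
                queue.append(u)
    return order

# The tree f(d(a,b), e(c)) from the unfolding example, as a DAG:
adj = {"f": ["d", "e"], "d": ["a", "b"], "e": ["c"],
       "a": [], "b": [], "c": []}
processing_order = list(reversed(topological_order(adj)))
print(processing_order[-1])  # f: the root/super-source is processed last
```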
40 Unfolding & Encoding Process: Unfolding view (1). Unfolding the encoding process through the structures: a bottom-up process for visiting (e.g. the tree f(d(a,b), e(c))). We will see later how to do it with a NN: for each step, we build an encoding network.
41 RecNN over different structures: unfolding (2). Examples on different trees for chemical compounds (fragments such as OH, O, C, CH2, CH3). Unfolding through the structures: starting from the leaves, the same process applies to all the vertices of a tree, and to all the trees in the data set.
42 Unfolding (3) & Encoding Network. For Recurrent and Recursive NN (recursive unfolding for two different structures): applying τ_NN at each step (with weight sharing) yields the encoding network. Adaptive encoding via the free parameters of τ_NN. NN (and weight) sharing: the units are the same for all the vertices of a tree and for all the trees in the data set!!!
43 RecNN: going into details for the domains. X = IR^m: continuous state (code) space (encoded subgraph space). L = IR^n: vertex label space. O = IR^z or {0,1}^z. τ = τ_NN: IR^n × IR^m × ... × IR^m (k times) → IR^m; g: output function; x_0 = 0. τ_E: G → IR^m (subgraph code). τ and g are realized by NN with free parameters W. T_G: the RecNN realizes g ∘ τ_E
44 Realization of τ_NN.
τ_NN: IR^n × IR^m × ... × IR^m (k times) → IR^m
x = τ_NN(l, x^(1), ..., x^(k)) = σ( W l + Σ_{j=1..k} Ŵ_j x^(j) + θ )
Free parameters: W, Ŵ_j, θ. A recursive neuron (τ_NN with m = 1) processes a vertex; # feedbacks = # children (max k)
45 Equation for a single unit. X: generic encoded subgraph space; x is the output of a single RecNN unit:
x = τ_NN(L(v), X^(1), ..., X^(k)) = σ( Σ_{i=1..n} w_i L_i(v) + Σ_{j=1..k} ŵ_j X^(j) + θ ) = σ( w · L(v) + Σ_{j=1..k} ŵ_j X^(j) + θ )
where σ is a non-linear sigmoidal function, L(v) is the label of v, θ is the bias term, w is the weight vector associated with the label space, X^(j) is the code for the j-th subgraph (subtree) of v, and ŵ_j is the weight associated with the j-th subgraph space.
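For m-dimensional codes, the same map uses one matrix per child position. A minimal sketch with random illustrative weights (the dimensions and values are ours, not from the slides):

```python
# Sketch of tau_NN for m-dimensional codes:
# x = sigma( W l + sum_j What_j x^(j) + theta ), with one matrix per
# child position (positional weights). Weights here are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 3, 4, 2                             # label dim, code dim, out-degree
W = rng.normal(scale=0.3, size=(m, n))        # label weights
What = rng.normal(scale=0.3, size=(k, m, m))  # one matrix per child position j
theta = np.zeros(m)                           # bias term

def tau_nn(l, child_codes):
    net = W @ l + theta
    for j, xj in enumerate(child_codes):
        net += What[j] @ xj                   # position-dependent weights
    return np.tanh(net)                       # sigmoidal sigma

l = np.array([1.0, 0.0, -1.0])
x0 = np.zeros(m)              # code of the empty subtree
x = tau_nn(l, [x0, x0])       # a leaf: both children are nil
print(x.shape)  # (4,): the code is a point in the m-dim state space
```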
46 Fully-connected RecNN. (Figure: standard neurons realize the g function; recursive neurons, with delayed inputs q_1^-1 x and q_2^-1 x, realize τ_E; copies are made according to the graph topology; inputs: the labels.)
47 In details: Encoding Network (I). Start here! τ (and weight) sharing: the units are the same for all the vertices of a tree and for all the trees in the data set!!!
48 53 In details: Encoding Network (II) Alessio Micheli
49 Recap: State transition system.
x(v) = τ(l(v), x(ch[v]))
y(v) = g(l(v), x(v))
with x(ch[v]) = [x(ch_1[v]), ..., x(ch_k[v])]. Equivalently, with the generalized shift operator:
x(v) = τ(l(v), q_1^-1 x(v), ..., q_k^-1 x(v))
y(v) = g(x(v))
where q_j^-1 x(v) = x(ch_j[v]) if ch_j[v] ≠ nil, and x_0 otherwise. x(v) = τ_E(G_v); X is the state (code) space. (Figure: graphical model of τ/g with delayed inputs q_1^-1 ... q_k^-1.)
50 Recap: Unfolding (4). A different view, by graphical models, of the encoding networks for sequences and structures. Note that the use of graphical models makes the treatment of NN and generative approaches uniform
51 Recent Applications: Recap (from ML): RecNN for NLP, recent exploitation. Currently widely and successfully applied in NLP (e.g. by the Stanford NLP group). Shown the effectiveness of Recursive NN applied to tree representations of language (and image) data and tasks. E.g. the Sentiment Treebank: sentiment labels (movie reviews) for 215,154 phrases in the parse trees of 11,855 sentences. Recursive NN pushed the state of the art in single-sentence positive/negative classification from 80% up to 85.4%.
52 Recap (from ML): Examples. (Figure: polarity grades and human annotations over phrases; other instances in the dataset.)
53 Learning System. Supersource transductions: y = g(x(s)). IO-isomorphic transductions: the output shares the same skeleton as the input graph; an output is emitted for each vertex v ∈ G. Parametric: T_G depends on tunable parameters W. Adaptive: a learning algorithm is used to fit W to the data. H = {h : h(G) ≈ T_G(G), G ∈ G}. Tasks - Supervised: the target values are known, (G, f(G)) (or f(v)); Unsupervised: TR = {G | G ∈ G}
54 RecNN Learning Algorithms. Backpropagation through structure: an extension of BPTT, Goller & Küchler (1996); simple to understand using the graphical formalism (backprop + weight sharing on the unfolded net); the notation is adapted to the case of deltas coming from the fathers. RTRL: Sperduti & Starita (1997); for the equations, see Chapter 19 in Kolen, Kremer, A Field Guide to Dynamical Recurrent Networks, IEEE Press, 2001. RCC family: next slides
55 Alessio Micheli Learning in Structured Domain Plan in 2 lectures 1. Recurrent and Recursive Neural Networks Extensions of model for supervised and unsupervised learning in structured domains Motivation and examples The structured data RNN and RecNN Recursive Cascade Correlation and other recursive approaches Unsupervised recursive models 2. Moving to DPAG and graphs: the role of causality 60
56 RCC (I). Architecture of Cascade Correlation for Structures (Recursive Cascade Correlation, RCC). We realize the RecNN by RCC: a constructive approach; m is automatically computed by the training algorithm.
57 RCC (II). Architecture of a RCC with 3 hidden units (m=3) and k=2. The recursive hidden units (shadowed) generate the code of the input graph (function τ_E); the hidden units are added to the network during training. The box elements store the outputs of the hidden units, i.e. the codes x_i^(j) that represent the context according to the graph topology. The output unit realizes the function g and produces the final prediction value. E.g. A.M. Bianucci, A. Micheli, A. Sperduti, A. Starita. Application of Cascade Correlation Networks for Structures to Chemistry, Applied Intelligence Journal, 12 (1/2), 2000
58 RCC (III) Learning. Gradient descent: interleaving LMS (output) and maximization of the correlation between new hidden units and the residual error. Main difference with CC: calculation of the derivatives of the state w.r.t. the weights by recurrent equations; for a hidden unit with activation f, label weights w_i and recursive weights ŵ_j (reconstructed form):
∂x(v)/∂w_i = f'(·) ( l_i(v) + Σ_{h=1..k} ŵ_h ∂x(ch_h[v])/∂w_i )
∂x(v)/∂ŵ_j = f'(·) ( x(ch_j[v]) + Σ_{h=1..k} ŵ_h ∂x(ch_h[v])/∂ŵ_j )
Note: a simplification (of the sum over the other units) w.r.t. full RTRL, due to the cascade architecture! But, compared to the recurrent case, a summation over the children appears!
59 TreeESN. Extends the applicability of the RC/ESN approach to tree-structured data; an extremely efficient way of modeling RecNNs; an architectural and experimental performance baseline for trained RecNN models. The same recursive process of the RecNN, realized by efficient RC approaches. C. Gallicchio, A. Micheli. Neurocomputing. More in a NEXT LECTURE
60 HTMM (2012). Bottom-up Hidden Tree Markov Models extend HMM to trees, exploiting the recursive approach as well: a generative process from the leaves to the root, with the Markov assumption (conditional dependence): a children-to-parent hidden state transition P(Q_u | Q_ch1(u), ..., Q_chK(u)). Issue: how to decompose this joint state transition? (see [9]). Bayesian network: unfolding a graphical model over the input trees; y: observed elements; Q: hidden state variables with discrete values. More in another LECTURE. Bacciu, Micheli, Sperduti. IEEE TNNLS
61 Alessio Micheli Learning in Structured Domain Plan in 2 lectures 1. Recurrent and Recursive Neural Networks Extensions of model for supervised and unsupervised learning in structured domains Motivation and examples The structured data RNN and RecNN Recursive Cascade Correlation and other recursive approaches Unsupervised recursive models Toward next lecture 2. Moving to DPAG and graphs: the role of causality 66
62 Unsupervised learning in SD. A general framework extending the SOM approach. B. Hammer, A. Micheli, M. Strickert, A. Sperduti. A General Framework for Unsupervised Processing of Structured Data, Neurocomputing (Elsevier Science), Volume 57, Pages 3-35. B. Hammer, A. Micheli, A. Sperduti, M. Strickert. Recursive Self-organizing Network Models, Neural Networks, Elsevier Science, Volume 17, Issues 8-9, October-November 2004.
63 Self-Organizing Map Dip. Informatica University of Pisa Unsupervised learning NN Clustering (find natural groupings in a set of data) data compression projection and exploration SOM: learn a map from input space to a lattice of neural-units topology-preserving map (neighboring units respond to similar input pattern) visualization properties of the SOM output feature map (2D) Neural map Alessio Micheli 68
64 SOM Dip. Informatica University of Pisa SOM Units in the map layer compete with each other to represent the current input pattern. The unit whose weight vector is closest to the input is the winner. Competitive stage: Winner units = units with minimal d(w,x) Cooperative stage: adjust w of winner and neighborhood units Alessio Micheli 69
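One SOM step as described above can be sketched as follows (a generic textbook SOM update; the lattice size, learning rate, and neighborhood width are illustrative):

```python
# One SOM step: competitive stage (find the winner by distance to the
# input), cooperative stage (move the winner and its lattice neighbors
# toward the input, weighted by a Gaussian neighborhood function).
import math

def som_step(weights, x, eta=0.5, sigma=1.0):
    """weights: dict (i, j) -> weight vector, units on a 2D lattice."""
    def d2(w):  # squared Euclidean distance d(w, x)^2
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    # competitive stage: winner = unit with minimal d(w, x)
    winner = min(weights, key=lambda n: d2(weights[n]))
    # cooperative stage: neighborhood h decays with lattice distance
    for n, w in weights.items():
        lat2 = (n[0] - winner[0]) ** 2 + (n[1] - winner[1]) ** 2
        h = math.exp(-lat2 / (2 * sigma ** 2))
        weights[n] = [wi + eta * h * (xi - wi) for wi, xi in zip(w, x)]
    return winner

weights = {(i, j): [0.1 * i, 0.1 * j] for i in range(3) for j in range(3)}
win = som_step(weights, x=[0.2, 0.1])
print(win)  # (2, 1): the unit whose weight vector was closest to the input
```

Repeating this step over a data set produces the topology-preserving map: nearby lattice units end up responding to similar inputs.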
65 Similarity Based Approaches. Define an ad hoc similarity measure on structured data (sequences, trees, etc.), then apply a known unsupervised technique. Examples: SOM for non-vectorial data (e.g. Kohonen et al.): prototypes via the mean vs the generalized median; edit distance, graph matching, etc. The problem: there has been no systematic way to extract features or metric relations between examples for SD (general for every task).
66 Basic Idea. Transfer the recursive idea to unsupervised learning: no prior metric/pre-processing (but still a bias!!!). Evolution of the similarity measure through the recursive comparison of sub-structures, iteratively compared via a bottom-up encoding process.
67 SOM for Structured Domains. Motivations (SOM-SD): unsupervised learning for SD; a new model for SD based on the SOM; the visualization properties of the SOM output feature map (2D), e.g. for SAR analysis. SOM: learn a map from the input space to a lattice of neural units. τ_NN realized by a SOM: the discrete output display space (2D coordinates) = a discrete code space! The SOM is applied recursively to each node of the graph: a vertex is represented by its label + the subgraph code values (the coordinates of the winning units for the subgraphs).
68 SOM-SD Bottom-Up Processing (1). (Figure: a leaf vertex with label 1.3 is presented to the map as the input vector [1.3, -1, -1, -1, -1]: its label followed by "impossible" coordinates, -1, for the missing subtrees.)
69 SOM-SD Bottom-Up Processing (2). (Figure: the coordinates of the winning unit for the leaf enter the parent's input, e.g. [0.27, 2, 1, -1, -1]: the parent label 0.27 followed by the winner coordinates (2, 1) of its child, and -1s for the missing child.)
70 SOM-SD Bottom-Up Processing (3). (Figure: the process continues up to the root, e.g. with input [0.51, 0, 2, 2, 1]: the root label followed by the winner coordinates of its two subtrees.)
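The construction of a vertex's input vector in this bottom-up process can be sketched as follows (the function name is ours; the label values and winner coordinates mirror the figures above and are purely illustrative):

```python
# Sketch of how SOM-SD builds the input for a vertex: its label
# concatenated with the map coordinates of the winners for its (already
# processed) subtrees, using -1 ("impossible" coordinates) for absent
# children.

NIL = (-1, -1)  # coordinates used for missing subtrees

def somsd_input(label, child_winners, k=2):
    """label: list of floats; child_winners: winner coords per child, or None."""
    x = list(label)
    for j in range(k):
        coords = child_winners[j] if child_winners[j] is not None else NIL
        x.extend(coords)
    return x

# bottom-up: the leaf first (no children), then its parent
leaf_in = somsd_input([1.3], [None, None])
print(leaf_in)            # [1.3, -1, -1, -1, -1]
leaf_winner = (2, 1)      # suppose the map's winner for the leaf is at (2, 1)
parent_in = somsd_input([0.27], [leaf_winner, None])
print(parent_in)          # [0.27, 2, 1, -1, -1]
```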
71 A General Framework for Unsupervised Processing of SD. Motivations of the general framework: exploiting the recursive approach for unsupervised learning in SD; making explicit, in its formulation, the specific aspects (the internal representation) and the concept of similarity measure; combining different existing SOM-based approaches, a set of unrelated models sharing common characteristics; a basis for new implementations, new models and learning methods (e.g. extensions to Neural Gas and VQ); a uniform evaluation of general properties (prior to learning), e.g. w.r.t. noise tolerance and the in-principle capacity of representation.
72 A General Framework for Unsupervised Processing of SD Dip. Informatica University of Pisa Aims: Uniform formulation within the recursive dynamics of Models Learning methods (Hebbian / based on energy function) Theoretical properties: investigate fundamental mathematical properties Key point: formalize how structures are internally represented Alessio Micheli 78
73 GF for Various Models. Models: RSOM (Recursive SOM, originally for sequences, extended); TKM (Temporal Kohonen Map, originally for sequences); SOM-SD (sequences and trees). All models share the basic recursive dynamics; the models differ in the internal representation (R) of structures. Explicitly define a mapping rep: IR^N → R, from activity profiles on the neural map to formal representations of structures.
74 Internal Representation.
SOM-SD: (i, j), the coordinate vector of the winner.
RSOM: all unit activations (exp).
TKM: a single unit activation (selecting, for each unit, the previous activation).
75 GF for Unsupervised Processing of SD (III). Objective: process a tree a(t1, t2), with root label a and subtrees t1, t2. Method: compare the weights of a unit, (W_L, W_1, W_2), with a, rep(t1), rep(t2).
76 Def. General Framework (GSOMSD). Each approach is obtained by specifying: the similarity measure on labels (d_L); the similarity measure on the formal representation of trees (d_R); the set of units, W: N → (W_L, W_1, W_2), with N the number of units; the function rep: IR^N → R. Activation of unit n for a(t1, t2) (its distance from the input):
d̃(a(t1, t2), n) = d_L(a, W_L(n)) + d_R(R_1, W_1(n)) + d_R(R_2, W_2(n))
with R_j = rep(d̃(t_j, n_1), ..., d̃(t_j, n_N)) if t_j ≠ nil, and the representation of nil otherwise (R_j is the activity on the map for the subtree t_j).
77 GF for Unsupervised Processing of SD (IV). SOM-SD: rep(x_1, ..., x_N) outputs the index of the winner. RSOM: rep(x_1, ..., x_N) = (exp(-x_1), ..., exp(-x_N)). TKM: rep is the identity, with W_1 for n_i fixed to the i-th unit vector. Graph encoding: τ_E(t) = rep(d̃(t, n_1), ..., d̃(t, n_N)).
78 Hebbian Learning. A synaptic weight is increased with the simultaneous occurrence of presynaptic and postsynaptic activities: iteratively making the weights of the neuron that fires in response to input x more similar to x. + Simple; - investigation of the learning behavior is often difficult. Objectives of learning can be expressed in terms of a cost function: for flat domains (input vectors), most unsupervised Hebbian learning can be derived as gradient descent of an appropriate cost function. Kohonen used a simple geometric computation for the Hebb-like rule:
w(t+1) = w(t) + η(t) h(t) [x - w(t)]
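The Hebb-like rule above, applied repeatedly, moves the winner's weights toward the input (the learning rate and inputs below are illustrative; for the winner the neighborhood value h is 1):

```python
# Kohonen's Hebb-like rule: w(t+1) = w(t) + eta(t) * h(t) * (x - w(t)).
# Iterating it with h = 1 (the winner) drives w toward the input x.

def kohonen_update(w, x, eta, h):
    return [wi + eta * h * (xi - wi) for wi, xi in zip(w, x)]

w = [0.0, 0.0]
x = [1.0, 2.0]
for t in range(50):
    w = kohonen_update(w, x, eta=0.3, h=1.0)
print([round(v, 3) for v in w])  # [1.0, 2.0]: converged to x
```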
79 General Framework for Unsupervised Learning. Learning (in all the models): (a) Hebbian learning: the weights of the winner, W(winner), are iteratively moved toward the current inputs (a, R_1, R_2); a soft-max adaptive rule via the neighborhood function on the SOM lattice. (b) Learning based on an energy function E (the objectives of learning expressed in terms of a cost function), deriving the rule as stochastic gradient descent. General form:
E = Σ_t f(d̃(t, n_1), ..., d̃(t, n_N))
where changing f we obtain different learning algorithms, e.g. VQ for SD via E = Σ_t d̃(t, n_{j(t)})², with n_{j(t)} the winner for t; extension to Neural Gas.
80 Learning: main general result. For flat domains (input vectors), most unsupervised Hebbian training rules can be derived as gradient descent of an appropriate cost function. Computing the gradient for VQ, SOM, NG in the SD case, we found that Hebbian learning is only an (efficient) approximation of the precise gradient descent of the cost function E = Σ_t f(d̃(t, n_1), ..., d̃(t, n_N)).
81 Theoretical Investigations (examples) Dip. Informatica University of Pisa Properties of rep for SD: efficiency compactness noise tolerance (rep(x) on d R ) (nice for SOM-SD) representational capability (cardinality of R) preservation of similarity (topological map) locality (TKM) / globality (SOM-SD, RSOM) of the computation: locality limits the recognition of structures Key issue: choice of the internal representation rep Uniform evaluation of methods prior to learning Alessio Micheli 88
82 Learning in Structured Domain. Plan in 2 lectures: 1. Recurrent and Recursive Neural Networks - extensions of models for supervised and unsupervised learning in structured domains; motivation and examples; the structured data; RNN and RecNN; Recursive Cascade Correlation and other recursive approaches; unsupervised recursive models. Toward the next lecture: 2. Moving to DPAG and graphs: the role of causality
83 Analysis (I). Computational properties: approximation capability. Suitable solutions for: a structured form of computation (compositionality tackles variability); parsimony (weight sharing reduces the number of parameters); adaptive transduction (BPTS, RTRL, ...), allowing adaptive representation of SD; handling of variability; a distributed code of the structure (X); a state (code) space of constant size (independent of the structure size/depth)
84 Analysis (II): Assumptions and Open Problems. Inherited from time-sequence processing: stationarity (an effective solution for parsimony, without reducing the expressive power relevant for SD); causality (affects the computational power!). Learning tasks: supervised and unsupervised. Is a uniform approach possible, allowing the investigation of fundamental mathematical properties? Other approaches; assessments (applications); proof of principle; new methods and results
85 Alessio Micheli Graphs by NN? For Graphs by NN: see next lecture! Following a journey through the causality assumption! d b c a a How to deal with cycles and causality? 92
86 Models panorama for SD (examples). Flat data: standard ML models. Sequences: Recurrent NN/ESN, HMM, kernels for strings. Trees: Recursive NN, TreeESN, HTMM, tree kernels. DPAGs: CRCC. Graphs: GNN/GraphESN, NN4G, graph kernels, SRL. See the references for the models in the bibliography slides (later).
87 Bibliography: aims. Different parts in the following: basic/fundamentals; * possible topics for seminars; material that may be useful also for future studies. Many topics can be the subject of study and development; many, many works in the literature (new ones arrive continuously)! Many possible topics on demand, and possible theses. More bibliography on demand: micheli@di.unipi.it
88 Alessio Micheli
Bibliography (Basic, origins of RecNN)
RNN
A. Sperduti, A. Starita. Supervised Neural Networks for the Classification of Structures, IEEE Transactions on Neural Networks, Vol. 8, No. 3, 1997.
P. Frasconi, M. Gori, A. Sperduti. A General Framework for Adaptive Processing of Data Structures, IEEE Transactions on Neural Networks, Vol. 9, No. 5, 1998.
A.M. Bianucci, A. Micheli, A. Sperduti, A. Starita. Application of Cascade Correlation Networks for Structures to Chemistry, Applied Intelligence Journal (Kluwer Academic Publishers), Special Issue on "Neural Networks and Structured Knowledge", Vol. 12 (1/2), 2000.
A. Micheli, A. Sperduti, A. Starita, A.M. Bianucci. A Novel Approach to QSPR/QSAR Based on Neural Networks for Structures, chapter in "Soft Computing Approaches in Chemistry", H. Cartwright, L. M. Sztandera, Eds., Springer-Verlag, Heidelberg, March 2003.
89 Alessio Micheli
Bibliography: NN approaches - 2
* UNSUPERVISED RecursiveNN
B. Hammer, A. Micheli, M. Strickert, A. Sperduti. A General Framework for Unsupervised Processing of Structured Data, Neurocomputing (Elsevier Science), Volume 57, Pages 3-35, March 2004.
B. Hammer, A. Micheli, A. Sperduti, M. Strickert. Recursive Self-organizing Network Models, Neural Networks (Elsevier Science), Volume 17, Issues 8-9, October-November 2004.
* TreeESN: efficient RecNN
C. Gallicchio, A. Micheli. Tree Echo State Networks, Neurocomputing, Volume 101, 2013.
* HTMM: further developments (generative)
D. Bacciu, A. Micheli, A. Sperduti. Compositional Generative Mapping for Tree-Structured Data - Part I: Bottom-Up Probabilistic Modeling of Trees, IEEE Transactions on Neural Networks and Learning Systems, Vol. 23, No. 12, 2012.
90 Alessio Micheli
Bibliography: RecNN applications (example)
* NLP applications (which you can extend with recent instances, relating them to the general RecNN framework presented in this lecture and to the basic RecNN bibliography references)
R. Socher, C.C. Lin, C. Manning, A.Y. Ng. Parsing Natural Scenes and Natural Language with Recursive Neural Networks, Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011.
R. Socher, A. Perelygin, J.Y. Wu, J. Chuang, C.D. Manning, A.Y. Ng, C. Potts. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP), Seattle, Washington, USA, October 2013.
91 Alessio Micheli
Bibliography: Next lecture
* RecNN for DPAGs: how to extend the domain I
A. Micheli, D. Sona, A. Sperduti. Contextual Processing of Structured Data by Recursive Cascade Correlation, IEEE Transactions on Neural Networks, Vol. 15, No. 6, November 2004.
B. Hammer, A. Micheli, A. Sperduti. Universal Approximation Capability of Cascade Correlation for Structures, Neural Computation, Vol. 17, No. 5, 2005. (C) 2005 MIT Press.
* NN for GRAPH DATA: how to extend the domain II
* A. Micheli. Neural Network for Graphs: A Contextual Constructive Approach, IEEE Transactions on Neural Networks, Volume 20 (3), 2009.
C. Gallicchio, A. Micheli. Graph Echo State Networks, Proceedings of the International Joint Conference on Neural Networks (IJCNN), pages 1-8, 2010.
F. Scarselli, M. Gori, A.C. Tsoi, M. Hagenbuchner, G. Monfardini. The Graph Neural Network Model, IEEE Transactions on Neural Networks, 20(1), 2009.
92 Alessio Micheli
DRAFT, please do not circulate!
For information: Alessio Micheli
Dipartimento di Informatica, Università di Pisa - Italy
Computational Intelligence & Machine Learning Group