Intro to Learning in SD -1


1 Alessio Micheli Intro to Learning in SD -1 Alessio Micheli 1- Introduction to RecNN Apr 2018 DRAFT, please do not circulate! Dipartimento di Informatica Università di Pisa - Italy Computational Intelligence & Machine Learning Group 1

2 Alessio Micheli Learning in Structured Domain Plan in 2 lectures 1. Recurrent and Recursive Neural Networks Extensions of models for supervised and unsupervised learning in structured domains Motivation and examples The structured data RNN and RecNN Recursive Cascade Correlation and other recursive approaches Unsupervised recursive models 2. Moving to DPAG and graphs: the role of causality 2

3 3 Why structured data? Alessio Micheli Because data have relationships

4 4 Introduction: Motivation of ML for SD Alessio Micheli Most known ML methods are limited to flat and fixed forms of the data (vectors or sequences): fixed-length attribute-value vectors. Central: data representation. Graph: a very useful abstraction for real data. Labeled graphs = vector patterns + relationships; natural: for structured-domain richness; efficiency: repetitive nature inherent in the data. SD + ML = adaptive processing of structured information

5 Alessio Micheli 5 Introduction: Research Area General Aim: investigation of ML models for the adaptive processing of structured information (e.g. sequences, trees, graphs): Structured domain learning, Learning in Structured Domain, Relational Learning. Fractal tree: a recursive structure

6 Alessio Micheli Advancements from ML to SDL Learning in Structured Domains (SD) in Pisa/CIML: Pioneering since the 90s the development of: Theoretical analysis, New approaches, Applications, especially on the basis of recursive approaches. And for you? To build an advanced background for: Analysis/development of innovative models; Applications in the area of interdisciplinary projects (@ CIML). Practical: theses are possible for the design of new models/applications for the extension of the input domains toward: Extension of the applicative domain, Adaptivity and accuracy, Efficiency 6

7 Feedforward versus Recurrent (memento) Alessio Micheli Feedforward: direction: input → output. Recurrent neural networks: a different category of architecture, based on the addition of feedback (loop) connections in the network topology. The presence of self-loop connections provides the network with dynamical properties, retaining a memory of past computations in the model. This allows us to extend the representation capability of the model to the processing of sequences (and structured data). Recurrent neural networks will be the subject (further developed) of the ISPR/CNS courses (see later). 7

8 Alessio Micheli From flat to structured data Flat: vectors (as in the rest of AA1). Structured: sequences, trees, graphs, multi-relational data. (Figure: examples — strings, proteins, series/temporal streams l1 ... l5, small molecules, network data) 8

9 9 Alessio Micheli Introduction: Motivation of ML for SD - Examples Examples: Computer Science: AI: problem-solving techniques, theorem proving: controlling search on trees by heuristics; natural language processing: evaluation of parse trees; image analysis: structured representation of components; document processing: fuzzy evaluation of web documents. Natural Sciences: computational biology and chemistry: quite complex domains where managing relationships and structure is relevant to achieving suitable modeling of the solutions.

10 10 Example: logo recognition Alessio Micheli

11 Alessio Micheli 11 Example: Terms in 1 st order logic

12 Alessio Micheli 12 Example: language parsing (Figure: parse tree, with anchor and foot nodes, for the sentence "It has no bearing on our work force"; nodes S, VP, NP, PP, PRP, VBZ, DT, NN, IN, PRPS)

13 16 Example (graphs): Molecules Alessio Micheli A fundamental problem in Chemistry: correlate the chemical structure of molecules with their properties (e.g. physico-chemical properties, or biological activity of molecules) in order to be able to predict these properties for new molecules. Quantitative Structure-Property Relationship (QSPR); Quantitative Structure-Activity Relationship (QSAR). Property/Activity = T(Structure); T: Structure → Property value (regression) or Toxic yes/no (classification). Molecules are not vectors! Molecules can be more naturally represented by varying-size structures. Can we predict directly from structures? QSPR: correlate the chemical structure of molecules with their properties

14 17 Introduction: Adaptive processing of SD Alessio Micheli The problem: there has been no systematic way to extract features or metric relations between examples for SD. A representation learning instance (extended to SD)! Goal: to learn a mapping between a structured information domain (SD) and a discrete or continuous space (transduction). Property/Activity = T(Structure); T: Structure → Property value (regression) or Toxic yes/no (classification). Note: we have a set of trees/graphs in input. Task: for instance, classify each graph [starting from a training set of known pairs (graph_i, target_i)]

15 Alessio Micheli Introduction: Learning Model for SD What we mean by adaptive processing of SD: extraction of the topological information directly from data; H has to be able to represent hierarchical relationships; an adaptive measure of similarity on structures + an apt learning rule; efficient handling of structure variability. Classical: efficient learning, good generalization performance, knowledge extraction capabilities 18

16 19 SD Learning scenario Alessio Micheli
Data Type                                      | Symbolic                       | Connectionist                  | Probabilistic
STATIC (attribute/value, real vectors)         | Rule induction, Decision trees | NN, SVM                        | Mixture models, Naïve Bayes
SEQUENTIAL (serially ordered entities)         | Learning finite state automata | Recurrent NN                   | Hidden Markov Models
STRUCTURAL (relations among domain variables)  | Inductive logic programming    | Recursive NN (Kernels for SD)  | Recursive Markov models

17 20 Preview: The RecNN idea Alessio Micheli Recursive NN: Recursive and parametric realization of the transduction function Adaptive by Neural Networks Fractal tree: a recursive structure

18 Alessio Micheli Structures: just use a sequence? Can we process structures as if they were sequences? E.g. any tree can be converted into a sequence (no information loss), but: Sequences may be long: the number of vertices is exponential w.r.t. the height of the tree (conversely, tree paths are logarithmic in the number of nodes, so the dependencies along the tree are much shorter). Dependencies are blurred out (arbitrarily, depending on the visit). Example tree with vertices i, g, h, d, e, f, a, b, c serialized as i(g(d(a,b),e(c)),h(f)): children are far, distant relatives are close. 21
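As a small aside, here is a minimal sketch (with an assumed nested-tuple tree encoding and a hypothetical helper name, not from the lecture) of the serialization above: the tree is converted into a string without information loss, yet vertices that are adjacent in the tree can end up far apart in the sequence.

```python
# Sketch: serialize a tree (label, [children]) into the bracketed string used on the slide.
def to_sequence(node):
    label, children = node
    if not children:
        return label
    return label + "(" + ",".join(to_sequence(c) for c in children) + ")"

tree = ("i", [("g", [("d", [("a", []), ("b", [])]),
                     ("e", [("c", [])])]),
              ("h", [("f", [])])])
print(to_sequence(tree))  # i(g(d(a,b),e(c)),h(f))
# In the tree, 'i' and 'h' are parent and child (path length 1); in the string,
# 'h' only appears after the whole serialization of the first subtree of 'i'.
```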

19 Alessio Micheli Learning in Structured Domain Plan in 2 lectures 1. Recurrent and Recursive Neural Networks Extensions of models for supervised and unsupervised learning in structured domains Motivation and examples The structured data RNN and RecNN Recursive Cascade Correlation and other recursive approaches Unsupervised recursive models 2. Moving to DPAG and graphs: the role of causality 22

20 Structured Domains l(v) ∈ L: set of attribute vectors. Structure G: vertex labels L + topology (skeleton of G). Sequences, trees, DOAGs/DPAGs, graphs: Alessio Micheli G: labeled directed ordered/positional acyclic graphs with super-source. A total order (or the position) on the edges leaving each vertex. Super-source: a vertex s such that every vertex can be reached by a directed path starting from s. Bounded out-degree and in-degree (the number of edges leaving and entering a vertex v). DPAG: superclass of the DOAGs: besides ordering, a distinct positive integer can be associated to each edge, allowing some positions to be absent. Trees: labeled rooted ordered trees, or positional (K-ary) trees. Super-source: the root of the tree. Binary tree (K=2) mostly used in this lecture 23

21 24 K-ary Trees Alessio Micheli k-ary trees (trees in the following) are rooted positional trees with finite out-degree k. Given a node v in the tree T ∈ G: The children of v are the node successors of v, each with a position j = 1, ..., k; k is the maximum out-degree over G, i.e. the maximum number of children for each node; L(v) is the input label associated with v, and L_i(v) is the i-th element of the label; the subtree T^(j) is the tree rooted at the j-th child of v. (Figure: a tree with root label L and subtrees T^(1), ..., T^(k))
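A minimal sketch of the k-ary tree data structure assumed throughout these slides (the class and field names are illustrative, not from the lecture): each node carries a label vector and up to k positional children.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """A node of a k-ary (positional) tree: a label vector plus up to k ordered children."""
    label: List[float]                                               # L(v), the input label of the node
    children: List[Optional["Node"]] = field(default_factory=list)  # position j = 1..k; None = absent

# A small binary tree (k = 2): a root with two leaf children.
leaf_a = Node(label=[1.0, 0.0])
leaf_b = Node(label=[0.0, 1.0])
root = Node(label=[0.5, 0.5], children=[leaf_a, leaf_b])
```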

22 Alessio Micheli Data Domains G We consider sets of DPAGs: labeled directed positional acyclic graphs with super-source, bounded in-degree and out-degree (k). They include as sub-classes: DOAGs, k-ary trees, sequences, vectors. Notations: ch[v] is the set of successors of v; ch_j[v] is the j-th child of the node v 25

23 Alessio Micheli Structured Data: Examples Labeled sequences, trees, DOAGs/DPAGs, graphs. (Figure: a single labeled vertex; a sequence l1 ... l5; a rooted tree; a DPAG with super-source; an undirected graph) DPAG: labeled directed positional acyclic graphs with super-source, bounded in-degree and out-degree (k). 26

24 27 Trees and DPAGs Alessio Micheli (Figure: the same labeled structure drawn as a DPAG, with b a shared node, and as a tree) Exercise after the lecture SD-2

25 Alessio Micheli Positional versus Ordered DPAG: for each vertex v in vert(G), an injective function S_v : edg(v) → [1, 2, ..., K] is defined on the edges leaving from v. in_set(v) = {u | u ∈ V and u → v} (predecessors); out_set(v) = {u | u ∈ V and v → u} (successors) 28

26 Alessio Micheli Learning in Structured Domain Plan in 2 lectures 1. Recurrent and Recursive Neural Networks Extensions of models for supervised and unsupervised learning in structured domains Motivation and examples The structured data RNN and RecNN Recursive Cascade Correlation and other recursive approaches Unsupervised recursive models 2. Moving to DPAG and graphs: the role of causality 29

27 Alessio Micheli Neural Computing Approach NN are universal approximators (theorem of Cybenko). NN can learn from examples (automatic inference). NN can deal with noise and incomplete data. NN can handle continuous real and discrete data. Simple gradient descent technique for training. Successful model in ML due to the flexibility in applications. Domain → Neural Network: Static fixed-dim patterns (vectors, records, ...) → Feedforward; Dynamical patterns (temporal sequences, ...) → Recurrent; Structured patterns (DPAGs, trees, ...) → Recursive 30

28 31 Recurrent Neural Networks (resume) Alessio Micheli Up to now: internal state
x(t) = τ(x(t−1), l(t))
y(t) = g(x(t), l(t))
(Figure: recurrent unit with state x fed back through the delay operator q⁻¹, processing a sequence l1 l2 l3 l4 l5)
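A minimal sketch of the recurrent state transition recalled above, x(t) = τ(x(t−1), l(t)); the tanh non-linearity, the weight names and the dimensions are illustrative assumptions.

```python
import numpy as np

def recurrent_encode(labels, W_in, W_hat, theta):
    """Iterate x(t) = tanh(W_in l(t) + W_hat x(t-1) + theta) over a sequence of labels."""
    x = np.zeros(W_hat.shape[0])                   # x(0) = 0, the initial state
    for l in labels:                               # one step per element l(1), ..., l(T)
        x = np.tanh(W_in @ l + W_hat @ x + theta)
    return x                                       # the final state encodes the whole sequence

rng = np.random.default_rng(0)
n, m = 3, 4                                        # label size and state size (assumed)
W_in, W_hat, theta = rng.normal(size=(m, n)), rng.normal(size=(m, m)), np.zeros(m)
seq = [rng.normal(size=n) for _ in range(5)]       # l1 ... l5
print(recurrent_encode(seq, W_in, W_hat, theta))
```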

29 Alessio Micheli RNN training and properties resume BPTT/RTRL: see the CNS course. Unfolding (see ML lecture on RNN): Back-Propagation Through Time: backprop on this unrolled version. Causality: a system is causal if the output at time t0 (or vertex v) only depends on inputs at time t ≤ t0 (depends only on v and its descendants); necessary and sufficient for an internal state. Stationarity: time invariance, the state transition function is independent of the node v (the same at any time) 32

30 Recursive Neural Networks (overview) Alessio Micheli Now:
x(v) = τ(x(ch[v]), l(v))
y(v) = g(x(v), l(v))
x(ch[v]) = [x(ch_1[v]), ..., x(ch_k[v])]
State transition system. (Figure: a tree with vertices f, d, e, a, b, c and the recursive unit with input label l and feedbacks q_1⁻¹, ..., q_k⁻¹) # feedbacks = # children 33

31 Alessio Micheli Generalized Shift Operators Standard shift operator (time): q⁻¹ S_t = S_{t−1}. Generalized shift operators (structure): q_j⁻¹ G_v = G_{ch_j[v]}, where ch_j[v] is the j-th child of v. RecNN with q:
x(v) = τ(l(v), q_1⁻¹ x(v), ..., q_k⁻¹ x(v))
y(v) = g(x(v))
q_j⁻¹ x_i(v) = x_i(ch_j[v]) if ch_j[v] ≠ nil, x_0 otherwise
Used to extend the states to the children of the vertices of the tree 34

32 35 Overview of Domains Alessio Micheli Synthesis formula (instantiation of domains for the RecNN): transduction T: G → IR, with T = g ∘ τ_E, where τ_E: G → IR^m and g: IR^m → IR. G: e.g. labeled directed positional acyclic graphs with super-source, or labeled k-ary trees. IR^n: label space. IR^m: descriptor (or code) space. g: output function

33 36 As a transduction by recursion Alessio Micheli Key idea: a family of recursive functions for the processing of G. T_G : G → O is a functional graph transduction. T_G = g ∘ τ_E (encoding and output functions). Link between the state transition system and the recursive transduction: x(v) = τ_E(G_v), the state x(v) ∈ X associated to each v is the encoding of the subgraph rooted in v

34 37 Recursive Processing Alessio Micheli Recursive definition of τ_E (encoding function), for G with super-source s and subgraphs G^(1), ..., G^(k):
τ_E(G) = x_0 (= 0) if G is empty
τ_E(G) = τ_NN(L(root), τ_E(G^(1)), ..., τ_E(G^(k))) otherwise
τ_E: systematic visit of G; it guides the application of τ_NN to each node of the tree (bottom-up). Causality and stationarity assumptions.
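A minimal sketch of the recursive encoding above: τ_E visits the tree bottom-up and applies the same τ_NN at every node (stationarity), using x0 = 0 for missing subtrees. It reuses the Node class sketched earlier; realizing τ_NN as a single recursive layer with weights W, Ŵ_j, θ is an assumption consistent with the later slides.

```python
import numpy as np

def tau_NN(label, child_codes, W, W_hats, theta):
    """x(v) = tanh(W l(v) + sum_j W_hat_j x(ch_j[v]) + theta)."""
    s = W @ np.asarray(label) + theta
    for W_hat_j, x_j in zip(W_hats, child_codes):
        s = s + W_hat_j @ x_j
    return np.tanh(s)

def tau_E(node, k, W, W_hats, theta, m):
    """Recursive encoding: the empty tree maps to x0 = 0; otherwise recurse on the children first."""
    if node is None:
        return np.zeros(m)                             # x0 for missing subtrees
    kids = list(node.children) + [None] * (k - len(node.children))
    child_codes = [tau_E(c, k, W, W_hats, theta, m) for c in kids]   # bottom-up visit
    return tau_NN(node.label, child_codes, W, W_hats, theta)

rng = np.random.default_rng(0)
n, m, k = 2, 4, 2
W, W_hats, theta = rng.normal(size=(m, n)), [rng.normal(size=(m, m)) for _ in range(k)], np.zeros(m)
# code = tau_E(root, k, W, W_hats, theta, m)   # 'root' is the Node built in the earlier sketch
```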

35 38 Properties (I) Alessio Micheli Transduction: not algebraic: T_G depends not only on the information associated with the current node, but also on the context information encoded for the subgraphs. Extension of causality and stationarity as defined for sequence processing: A structural transduction is Causal if its output for a vertex v only depends on v and its descendants (induced subgraphs); Stationary if the state transition function (τ_NN) is independent of the vertex v. Recurrent/recursive NN transductions admit a recursive state representation with such properties. Adaptivity (NN learning alg.) + universal approximation over the tree domain [Hammer]

36 Properties (II) Alessio Micheli T_G is IO-isomorphic if the input graph and T_G(G) have the same skeleton (graph after removing labels). Supersource transductions map the input DOAG to a scalar value; also known as Structure-to-Element transductions. (Figure: an input DOAG mapped to an output DOAG with the same skeleton, and an input DOAG mapped to a scalar value) 39

37 40 Properties (III) Alessio Micheli IO-isomorphic causal transduction T_G

38 Alessio Micheli Unfolding and Enc. Network We will see the RecNN data flow process from two points of view: 1. Unfolding by walking on the structures (stationarity), according to the causal assumption (inverse topological order*): the model visits the structures. 2. Building an encoding network isomorphic to the input structure (same skeleton, inverted arrows, again with stationarity and the causal assumption): we build a different encoding network for each input structure. * see next slide 41

39 Alessio Micheli Topological Order A linear ordering of its nodes s.t. each node comes before all nodes to which it has edges. Every DAG has at least one topological sort, and may have many. A numbering of the vertices of a directed acyclic graph such that every edge from a vertex numbered i to a vertex numbered j satisfies i<j. According to a Partial order For RNN: Inverse topological order 42
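A minimal sketch of computing an inverse topological order for a DAG (every vertex listed after all the vertices it points to), which is the visit order used by the recursive models; the edge-list representation is an assumption.

```python
from collections import defaultdict

def inverse_topological_order(edges, vertices):
    """Return the vertices so that each one appears after all of its children (DAG assumed)."""
    children = defaultdict(list)
    for u, v in edges:                      # directed edge u -> v
        children[u].append(v)
    order, visited = [], set()
    def visit(u):
        if u in visited:
            return
        visited.add(u)
        for v in children[u]:
            visit(v)                        # children first ...
        order.append(u)                     # ... then the vertex itself
    for u in vertices:
        visit(u)
    return order

print(inverse_topological_order([("i", "g"), ("i", "h"), ("g", "d"), ("g", "e")],
                                ["i", "g", "h", "d", "e"]))
# ['d', 'e', 'g', 'h', 'i'] -- leaves first, super-source last
```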

40 44 Unfolding & Encoding Process: Unfolding view (1) Alessio Micheli Unfolding the encoding process through structures: a bottom-up process for visiting (Figure: tree with vertices f, d, e, a, b, c). We will see later how to do it with τ_NN (and hence by NN) for each step: building an encoding network

41 45 RecNN over different structures: unfolding 2 Alessio Micheli Examples on different trees for chemical compounds (figure: tree fragments with labels OH, O, C, CH2, CH3, the Start states at the leaves, and one output for each tree). Unfolding through structures: the same process applies to all the vertices of a tree and to all the trees in the data set

42 Alessio Micheli Unfolding (3) & Enc. Net. For Recurrent and Recursive NN (i.e. recursive unfolding for two different structures): τ_NN for each step (with weight sharing) yields the encoding network. Adaptive encoding via the free parameters of τ_NN. τ_NN (and weight) sharing: the units are the same for all the vertices of a tree and for all the trees in the data set!!! 46

43 47 RecNN: going to details for the domains Alessio Micheli X = IR^m: continuous state (code) space (encoded subgraph space). L = IR^n: vertex label space. O = IR^z or {0,1}^z. τ = τ_NN : IR^n × IR^m × ... × IR^m (k times) → IR^m. g: output function. x_0 = 0. τ and g are realized by NN with free parameters W; the RecNN realizes T_G = g ∘ τ_E

44 Alessio Micheli 48 Realization of τ_NN
τ_NN : IR^n × IR^m × ... × IR^m (k times) → IR^m
x = τ_NN(l, x^(1), ..., x^(k)) = σ( W l + Σ_{j=1}^{k} Ŵ_j x^(j) + θ )
Free parameters: W, Ŵ_j, θ. Recursive neuron (τ_NN with m = 1): processes a vertex; # feedbacks = # children (max k)

45 49 Equation for a single unit Alessio Micheli Single unit. X: generic encoded subgraph space; x is the output of a single RecNN unit:
X = τ_NN(L(v), X^(1), ..., X^(k)) = σ( Σ_{i=1}^{n} w_i L_i(v) + Σ_{j=1}^{k} ŵ_j X^(j) + θ ) = σ( W · L(v) + Σ_{j=1}^{k} ŵ_j X^(j) + θ )
where σ is a non-linear sigmoidal function, L(v) is the label of v, θ is the bias term, W is the weight vector associated with the label space, X^(j) is the code for the j-th subgraph (subtree) of v, and ŵ_j is the weight associated with the j-th subgraph space.
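A minimal sketch of the single recursive unit above, x = σ(Σ_i w_i L_i(v) + Σ_j ŵ_j X^(j) + θ); the logistic sigmoid and the example numbers are illustrative assumptions.

```python
import numpy as np

def recursive_unit(label, child_codes, w, w_hat, theta):
    """Single recursive unit: x = sigma(sum_i w_i L_i(v) + sum_j w_hat_j X^(j) + theta)."""
    sigma = lambda a: 1.0 / (1.0 + np.exp(-a))     # logistic sigmoid (assumed non-linearity)
    return sigma(np.dot(w, label) + np.dot(w_hat, child_codes) + theta)

# A vertex with a 3-dimensional label and k = 2 children whose codes were already computed bottom-up.
x = recursive_unit(label=[0.2, -1.0, 0.5],
                   child_codes=[0.7, 0.1],         # X^(1), X^(2) from the subtrees
                   w=[0.3, 0.3, 0.3], w_hat=[0.5, -0.2], theta=0.1)
print(x)
```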

46 50 Fully-connected RNN Alessio Micheli (Figure: standard neurons realize the output function g; recursive neurons realize τ_E; the state x is copied through q_1⁻¹ and q_2⁻¹ according to the graph topology; inputs are the vertex labels)

47 52 In details: Encoding Network (I) Alessio Micheli Start here! τ (and weight) sharing: the units are the same for all the vertices of a tree and for all the trees in the data set!!!

48 53 In details: Encoding Network (II) Alessio Micheli

49 54 Recap: State transition system Alessio Micheli
x(v) = τ(l(v), x(ch[v]))
y(v) = g(l(v), x(v))
x(ch[v]) = [x(ch_1[v]), ..., x(ch_k[v])]
With the generalized shift operator:
x(v) = τ(l(v), q_1⁻¹ x(v), ..., q_k⁻¹ x(v))
y(v) = g(x(v))
q_j⁻¹ x_i(v) = x_i(ch_j[v]) if ch_j[v] ≠ nil, x_0 otherwise
x(v) = τ_E(G_v); X: state (code) space. (Figure: graphical model of τ/g with input label l and feedbacks q_1⁻¹, ..., q_k⁻¹)

50 Alessio Micheli Recap: Unfolding 4. A different view by graphical models of the Encoding Networks for sequences and structures 55 Note that the use of graphical models makes uniform the cases of NN and generative approaches

51 56 Alessio Micheli Recent Applications: Recap (from ML): RecNN for NLP, recent exploitation Currently wide and successful application in NLP (e.g. by the Stanford NLP group), started with the Recursive NN works of Socher et al. (see the bibliography slides). Shown: the effectiveness of Recursive NN applied to tree representations of language (and image) data and tasks. E.g. Sentiment Treebank: sentiment labels (movie reviews) for 215,154 phrases in the parse trees of 11,855 sentences. Recursive NN pushes the state of the art in single-sentence positive/negative classification from 80% up to 85.4%.

52 Recap (from ML): Examples Alessio Micheli Polarity grade Human-gram annotation Other instances in the dataset 57

53 Alessio Micheli Learning System Supersource transductions: y = g(x(s)). IO-isomorphic transductions: the output shares the same skeleton as the input graph; an output is emitted for each vertex v ∈ G. Parametric: T_G depends on tunable parameters W. Adaptive: a learning algorithm is used to fit W to the data. H = {h | h(G) ≈ T_G(G), G ∈ 𝒢}. Tasks. Supervised: target values are known, (G, f(G)) (or f(v)). Unsupervised: TR = {G | G ∈ 𝒢} 58

54 Alessio Micheli RNN Learning Algorithms Backpropagation through structure: extension of BPTT, Goller & Küchler (1996). Simple to understand using the graphical formalism (backprop + weight sharing on the unfolded net): the notation is adapted to the case of deltas coming from the parents. RTRL: Sperduti & Starita (1997). Equations: see Chapter 19 in Kolen, Kremer, A Field Guide to Dynamical Recurrent Networks, IEEE Press 2001. RCC family based: next slides 59

55 Alessio Micheli Learning in Structured Domain Plan in 2 lectures 1. Recurrent and Recursive Neural Networks Extensions of model for supervised and unsupervised learning in structured domains Motivation and examples The structured data RNN and RecNN Recursive Cascade Correlation and other recursive approaches Unsupervised recursive models 2. Moving to DPAG and graphs: the role of causality 60

56 Alessio Micheli RCC (I) Architecture of Cascade Correlation for Structure (Recursive Cascade Correlation - RCC) We realize RNN by RCC: constructive approach m is automatically computed by the training algorithm. 61

57 62 RCC (II) Alessio Micheli Architecture of an RCC with 3 hidden units (m=3) and k=2. Recursive hidden units (shadowed) generate the code of the input graph (function τ_E). The hidden units are added to the network during the training. The box elements are used to store the output of the hidden units, i.e. the code x_i^(j) that represents the context according to the graph topology. The output unit realizes the function g and produces the final prediction value. E.g. A.M. Bianucci, A. Micheli, A. Sperduti, A. Starita. Application of Cascade Correlation Networks for Structures to Chemistry, Applied Intelligence Journal, 12 (1/2), 2000

58 Alessio Micheli RCC (III) Learning Gradient descent: interleaving LMS (output) and maximization of the correlation between the new hidden unit and the residual error. Main difference with CC: calculation of the following derivatives by recurrent equations (here for a hidden unit h with output x_h, label weights w_hi and recursive weights ŵ_hh^(j)):
∂x_h(v)/∂w_hi = f'(·) ( l_i(v) + Σ_{j=1}^{k} ŵ_hh^(j) ∂x_h(ch_j[v])/∂w_hi )
∂x_h(v)/∂ŵ_hh^(j') = f'(·) ( x_h(ch_{j'}[v]) + Σ_{j=1}^{k} ŵ_hh^(j) ∂x_h(ch_j[v])/∂ŵ_hh^(j') )
Note: a simplification (of the sum over the other units) due to the architecture compared to full RTRL! But, compared to the recurrent case, a summation over the children appears! 63

59 Alessio Micheli TreeESN Extend the applicability of the RC/ESN approach to tree structured data. Extremely efficient way of modeling RecNNs. Architectural and experimental performance baseline for trained RecNN models. The same recursive process of RecNN made by efficient RC approaches (figure: reservoir weights w1, w2 shared over the tree). C. Gallicchio, A. Micheli. Neurocomputing. More in a NEXT LECTURE 64

60 HTMM (2012) Alessio Micheli E.g. Bottom-up Hidden Tree Markov Models extend HMM to trees, exploiting the recursive approach as well. Generative process from the leaves to the root. Markov assumption (conditional dependence): Q_{ch_1(u)}, ..., Q_{ch_K(u)} → Q_u, a children-to-parent hidden state transition P(Q_u | Q_{ch_1(u)}, ..., Q_{ch_K(u)}). Issue: how to decompose this joint state transition? (see [9]). Bayesian network: unfolding the graphical model over the input trees; y: observed elements; Q: hidden state variables with discrete values. More in another LECTURE. Bacciu, Micheli, Sperduti. IEEE TNNLS

61 Alessio Micheli Learning in Structured Domain Plan in 2 lectures 1. Recurrent and Recursive Neural Networks Extensions of model for supervised and unsupervised learning in structured domains Motivation and examples The structured data RNN and RecNN Recursive Cascade Correlation and other recursive approaches Unsupervised recursive models Toward next lecture 2. Moving to DPAG and graphs: the role of causality 66

62 Unsupervised learning in SD A general framework extending SOM approach B. Hammer, A. Micheli, M. Strickert, A. Sperduti. A General Framework for Unsupervised Processing of Structured Data, Neurocomputing (Elsevier Science) Volume 57, Pages 3-35, March B. Hammer, A. Micheli, A. Sperduti, M. Strickert. Recursive Self-organizing Network Models. Neural Networks, Elsevier Science. Volume 17, Issues 8-9, Pages , October-November 2004.

63 Self-Organizing Map Dip. Informatica University of Pisa Unsupervised learning NN Clustering (find natural groupings in a set of data) data compression projection and exploration SOM: learn a map from input space to a lattice of neural-units topology-preserving map (neighboring units respond to similar input pattern) visualization properties of the SOM output feature map (2D) Neural map Alessio Micheli 68

64 SOM Dip. Informatica University of Pisa SOM Units in the map layer compete with each other to represent the current input pattern. The unit whose weight vector is closest to the input is the winner. Competitive stage: Winner units = units with minimal d(w,x) Cooperative stage: adjust w of winner and neighborhood units Alessio Micheli 69
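A minimal sketch of the two stages above for a standard (flat) SOM: the competitive stage selects the unit with minimal distance d(w, x), the cooperative stage moves the winner and its lattice neighbours toward the input. The Gaussian neighbourhood and the learning rate are illustrative choices.

```python
import numpy as np

def som_step(W, grid, x, eta=0.1, sigma=1.0):
    """One SOM update: competitive stage (find the winner) + cooperative stage (neighbourhood update).

    W: (N, d) unit weight vectors; grid: (N, 2) lattice coordinates of the units; x: (d,) input."""
    dists = np.linalg.norm(W - x, axis=1)               # competitive stage: d(w, x) for every unit
    win = int(np.argmin(dists))                          # winner = unit with minimal distance
    lattice_d2 = np.sum((grid - grid[win]) ** 2, axis=1)
    h = np.exp(-lattice_d2 / (2 * sigma ** 2))           # neighbourhood function on the 2D map
    W += eta * h[:, None] * (x - W)                      # cooperative stage: move weights toward x
    return win

rng = np.random.default_rng(0)
grid = np.array([(i, j) for i in range(4) for j in range(4)], dtype=float)  # 4x4 map
W = rng.normal(size=(16, 3))                             # 3-dimensional inputs (assumed)
winner = som_step(W, grid, rng.normal(size=3))
```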

65 Similarity Based Approaches Dip. Informatica University of Pisa Define an ad hoc similarity measure on structured data (sequences, trees, etc.), then apply a known unsupervised technique. Examples: SOM for non-vectorial data (e.g. Kohonen et al.): prototypes: mean vs generalized median; edit distance, graph matching, etc. The problem: there has been no systematic way to extract features or metric relations between examples for SD (general for every task) Alessio Micheli 70

66 Basic Idea Dip. Informatica University of Pisa Transfer recursive idea to unsupervised learning No prior metric/pre-processing (but still bias!!!) Evolution of the similarity measure through recursive comparison of sub-structures Iteratively compared via bottom-up encoding process c a b Alessio Micheli 72

67 SOM for Structured Domains Dip. Informatica University of Pisa Motivations (SOM-SD): Unsupervised learning for SD; a new model for SD based on the SOM; visualization properties of the SOM output feature map (2D) (e.g. for SAR analysis). SOM: learn a map from the input space to a lattice of neural units. τ_NN realized by a SOM: discrete output display space (2D coordinates) = discrete code space! The SOM is applied recursively to each node of the graph. A vertex is represented by its label + the subgraph code value (coordinates of the winning unit for the subgraph) Alessio Micheli 73

68 SOM-SD Bottom-Up Processing Dip. Informatica University of Pisa (Figure, step 1: a leaf vertex is presented to the map as its label followed by void child coordinates, e.g. 1.3, -1, -1, -1, -1, and its winning unit is found) Alessio Micheli 74

69 SOM-SD Bottom-Up Processing Dip. Informatica University of Pisa (Figure, step 2: an internal vertex is presented as its label, e.g. 0.27, followed by the winner coordinates of its already-processed subtree, e.g. 2, 1, and void coordinates -1, -1 for the missing child) Alessio Micheli 75

70 SOM-SD Bottom-Up Processing Dip. Informatica University of Pisa (Figure, step 3: the root is presented as its label, e.g. 0.51, followed by the winner coordinates of its two subtrees, e.g. 0, 2 and 2, 1) Alessio Micheli 76
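A minimal sketch of how SOM-SD builds the input presented to the map during the bottom-up pass illustrated above: the node label is concatenated with the map coordinates of the winners already found for its subtrees, with a fixed "void" coordinate (here (−1, −1), matching the slides) for missing children. The helper name is hypothetical, for illustration only.

```python
import numpy as np

VOID = (-1.0, -1.0)                                     # coordinates used for a missing child

def somsd_input(label, child_winner_coords, k):
    """Concatenate the node label with the winner coordinates of its k (possibly missing) subtrees."""
    coords = list(child_winner_coords) + [VOID] * (k - len(child_winner_coords))
    return np.concatenate([np.atleast_1d(label)] + [np.asarray(c, dtype=float) for c in coords])

# Leaf (no children): the label padded with void coordinates, as in the first step of the figure.
leaf_vec = somsd_input(label=[1.3], child_winner_coords=[], k=2)                 # [1.3, -1, -1, -1, -1]
# Internal node: the label plus the winning-unit coordinates of its two already-processed subtrees.
node_vec = somsd_input(label=[0.51], child_winner_coords=[(0, 2), (2, 1)], k=2)  # [0.51, 0, 2, 2, 1]
print(leaf_vec, node_vec)
```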

71 A General Framework for Unsupervised Processing of SD Dip. Informatica University of Pisa Motivations of the General Framework: Exploiting the recursive approach for unsupervised learning in SD; specific aspects (the internal representation) make the concept of similarity measure explicit in the formulation. Combining different existing SOM-based approaches: a set of unrelated models sharing common characteristics. Basis for new implementations, new models and learning methods (e.g. extension to Neural Gas and VQ). Uniform evaluation of general properties (prior to learning), e.g. w.r.t. noise tolerance and in-principle capacity of representation. Alessio Micheli 77

72 A General Framework for Unsupervised Processing of SD Dip. Informatica University of Pisa Aims: Uniform formulation within the recursive dynamics of Models Learning methods (Hebbian / based on energy function) Theoretical properties: investigate fundamental mathematical properties Key point: formalize how structures are internally represented Alessio Micheli 78

73 GF for Various Models Dip. Informatica University of Pisa Models: RSOM: Recursive SOM - originally for sequences (extended); TKM: Temporal Kohonen Map - originally for sequences; SOM-SD: sequences and trees. All models share the basic recursive dynamics; the models differ in the internal representation (R) of structures. Explicitly define a mapping rep: IR^N → R, from map activity profiles to the formal representation of structures. Alessio Micheli 79

74 Internal Representation Dip. Informatica University of Pisa SOM-SD: (i, j) coordinate vector of the winner. RSOM: all unit activations (exp). TKM: single unit activation (selecting for each unit its previous activation). Alessio Micheli 80

75 GF for Unsupervised Processing of SD (III) Dip. Informatica University of Pisa Objective: encode a tree a(t_1, t_2) with root label a and subtrees t_1, t_2. Method: compare each unit, with weights (W_L, W_1, W_2), with the label a and with rep(t_1), rep(t_2). Alessio Micheli 81

76 Def. General Framework (GSOMSD) Dip. Informatica University of Pisa Each approach is obtained by specifying: the similarity measure on labels (d_L); the similarity measure on the formal representation of trees (d_R); the set of units with weights W : N → (W_L, W_1, W_2), N being the number of units; the function rep: IR^N → R. Activation of unit n for a(t_1, t_2) (i.e. its distance from the input):
d̃(a(t_1, t_2), n) = d_L(a, W_L(n)) + d_R(R_1, W_1(n)) + d_R(R_2, W_2(n))
R_j = rep(d̃(t_j, n_1), ..., d̃(t_j, n_N)) if t_j ≠ nil, the null representation otherwise
(activity on the map for the subtree t_j) Alessio Micheli 82
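A minimal sketch of the GSOMSD activation above: the distance of a unit from the input a(t1, t2) combines the label distance d_L with the representation distance d_R on the two subtree representations, and the encoding of a tree is obtained bottom-up through rep. The measures d_L, d_R, rep and the (label, left, right) tree encoding are left as parameters/assumptions, since specifying them recovers the individual models (SOM-SD, RSOM, TKM).

```python
def gsomsd_distance(a, R1, R2, unit_weights, d_L, d_R):
    """d~(a(t1, t2), n) = d_L(a, W_L(n)) + d_R(R1, W_1(n)) + d_R(R2, W_2(n))."""
    W_L, W_1, W_2 = unit_weights                 # label weights and the two representation weights
    return d_L(a, W_L) + d_R(R1, W_1) + d_R(R2, W_2)

def encode_tree(tree, units, rep, d_L, d_R, null_rep):
    """R for a binary tree: rep of the vector of unit activations, computed bottom-up."""
    if tree is None:
        return null_rep                          # representation of the empty subtree (nil)
    a, t1, t2 = tree                             # assumed (label, left, right) encoding
    R1 = encode_tree(t1, units, rep, d_L, d_R, null_rep)
    R2 = encode_tree(t2, units, rep, d_L, d_R, null_rep)
    activations = [gsomsd_distance(a, R1, R2, w, d_L, d_R) for w in units]
    return rep(activations)
```

Choosing rep to return the coordinates of the winning unit recovers SOM-SD; applying an exponential to all the activations recovers RSOM (next slide).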

77 GF for Unsupervised Processing of SD (IV) Dip. Informatica University of Pisa SOM-SD: rep(x_1, ..., x_N) outputs the index of the winner. RSOM: rep(x_1, ..., x_N) = (exp(−x_1), ..., exp(−x_N)). TKM: rep is the identity, and W_1 for unit n_i is the i-th unit vector. Graph encoding: τ_E(t) = rep(d̃(t, n_1), ..., d̃(t, n_N)) Alessio Micheli 83

78 Hebbian Learning Dip. Informatica University of Pisa A synaptic weight is increased with the simultaneous occurrence of presynaptic and postsynaptic activities: iteratively making the weights of the neuron that fires in response to an input x more similar to x. + Simple. - Investigation of the learning behavior is often difficult. Objectives of learning can be expressed in terms of a cost function: for a flat domain (input vectors), most unsupervised Hebbian learning can be derived as gradient descent of an appropriate cost function. Kohonen used a simple geometric computation for the Hebb-like rule: W(t+1) = W(t) + η(t) h(t) [x − W(t)] Alessio Micheli 84

79 General Framework for Unsupervised Learning Dip. Informatica University of Pisa Learning (in all the models): Hebbian learning: the weights W(winner) are iteratively moved toward the current inputs (a, R_1, R_2); soft-max adaptive rule via the neighborhood function on the SOM lattice. Learning based on an energy function E (the objectives of learning can be expressed in terms of a cost function): derive the rule as stochastic gradient descent. General form, where changing f we obtain different learning algorithms, e.g. VQ for SD and the extension to Neural Gas:
E = Σ_t f( d̃(t, N) )
(for VQ, f picks the squared winner distance d̃(t, n_{j(t)})²) Alessio Micheli 85

80 Learning: main general result Dip. Informatica University of Pisa For a flat domain (input vectors), most unsupervised Hebbian training rules can be derived as gradient descent of an appropriate cost function. Computing the gradient for VQ, SOM, NG in the SD case, we found that Hebbian learning is only an (efficient) approximation to the precise gradient descent of the cost function E = Σ_t f( d̃(t, N) ) Alessio Micheli 86

81 Theoretical Investigations (examples) Dip. Informatica University of Pisa Properties of rep for SD: efficiency compactness noise tolerance (rep(x) on d R ) (nice for SOM-SD) representational capability (cardinality of R) preservation of similarity (topological map) locality (TKM) / globality (SOM-SD, RSOM) of the computation: locality limits the recognition of structures Key issue: choice of the internal representation rep Uniform evaluation of methods prior to learning Alessio Micheli 88

82 Alessio Micheli Learning in Structured Domain Plan in 2 lectures 1. Recurrent and Recursive Neural Networks Extensions of models for supervised and unsupervised learning in structured domains Motivation and examples The structured data RNN and RecNN Recursive Cascade Correlation and other recursive approaches Unsupervised recursive models Toward next lecture 2. Moving to DPAG and graphs: the role of causality 89

83 Alessio Micheli Analysis (I) Computational properties: approximation capability. Suitable solutions for: A structured form of computation: compositionality tackles variability. Parsimony: weight sharing reduces the number of parameters. Adaptive transduction: BPTS, RTRL, ... allowing adaptive representation of SD, handling of variability, a distributed code of the structure (X), a state (code) space of constant size (independent of structure size/depth) 90

84 Analysis (II): Assumptions and Open Problems Alessio Micheli Inherited from time sequence processing: Stationarity: an effective solution for parsimony without reducing the expressive power relevant for SD. Causality: affects the computational power! Learning task: supervised and unsupervised. Is a uniform approach possible, allowing the investigation of fundamental mathematical properties? Other approaches. Assessments (applications): proof-of-the-principle. New methods and results 91

85 Alessio Micheli Graphs by NN? For Graphs by NN: see next lecture! Following a journey through the causality assumption! d b c a a How to deal with cycles and causality? 92

86 MODELS panorama for SD (examples) Alessio Micheli Flat data: standard ML models. Sequences: Recurrent NN/ESN, HMM, kernels for strings. Trees: Recursive NN, TreeESN, HTMM, tree kernels. DPAGs: CRCC. Graphs: GNN/GraphESN, NN4G, graph kernels, SRL. See references for the models in the bibliography slides (later) 93

87 Alessio Micheli Bibliography: aims Different parts in the following: Basic/Fundamentals; * Possible topics for seminars. May be useful also for future studies. Many topics can be the subject of study and development. Many, many works in the literature (new ones arrive continuously)! Many possible topics on demand and possible theses. More bibliography on demand: micheli@di.unipi.it 94

88 95 Bibliography (Basic, origins of RecNN) Alessio Micheli RNN A. Sperduti, A. Starita. Supervised Neural Networks for the Classification of Structures, IEEE Transactions on Neural Networks. Vol. 8, n. 3, 1997. P. Frasconi, M. Gori, and A. Sperduti, A General Framework for Adaptive Processing of Data Structures, IEEE Transactions on Neural Networks. Vol. 9, No. 5, 1998. A.M. Bianucci, A. Micheli, A. Sperduti, A. Starita. Application of Cascade Correlation Networks for Structures to Chemistry, Applied Intelligence Journal (Kluwer Academic Publishers), Special Issue on "Neural Networks and Structured Knowledge", Vol. 12 (1/2), 2000. A. Micheli, A. Sperduti, A. Starita, A.M. Bianucci. A Novel Approach to QSPR/QSAR Based on Neural Networks for Structures, Chapter in Book: "Soft Computing Approaches in Chemistry", H. Cartwright, L. M. Sztandera, Eds., Springer-Verlag, Heidelberg, March 2003.

89 96 Bibliography: NN approaches-2 Alessio Micheli * UNSUPERVISED RecursiveNN B. Hammer, A. Micheli, M. Strickert, A. Sperduti. A General Framework for Unsupervised Processing of Structured Data, Neurocomputing (Elsevier Science), Volume 57, Pages 3-35, March. B. Hammer, A. Micheli, A. Sperduti, M. Strickert. Recursive Self-organizing Network Models. Neural Networks, Elsevier Science, Volume 17, Issues 8-9, October-November 2004. * TreeESN: efficient RecNN C. Gallicchio, A. Micheli. Tree Echo State Networks, Neurocomputing, volume 101. * HTMM: further developments (generative) D. Bacciu, A. Micheli and A. Sperduti. Compositional Generative Mapping for Tree-Structured Data - Part I: Bottom-Up Probabilistic Modeling of Trees, IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 12, 2012

90 97 Bibliography: RecNN applications (example) Alessio Micheli * NLP applications (that you can extend with recent instances, and relate them to the general RecNN framework present in this lecture and the basic RecNN bibliography references ) R. Socher, C.C. Lin, C. Manning, A.Y. Ng, Parsing natural scenes and natural language with recursive neural networks, Proceedings of the 28th international conference on machine learning (ICML-11) R. Socher, A. Perelygin, J.Y. Wu, J. Chuang, C.D. Manning, A.Y. Ng, C.P. Potts, Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages , Seattle, Washington, USA, October 2013

91 98 Bibliography: Next lecture Alessio Micheli * RecNN for DPAGs : how to extend the domain I A. Micheli, D. Sona, A. Sperduti. Contextual Processing of Structured Data by Recursive Cascade Correlation. IEEE Transactions on Neural Networks. Vol. 15, n. 6, Pages , November Hammer, A. Micheli, and A. Sperduti. Universal Approximation Capability of Cascade Correlation for Structures. Neural Computation. Vol. 17, No. 5, Pages , (C) 2005 MIT press. * NN for GRAPH DATA: how to extend the domain II * A. Micheli. Neural network for graphs: a contextual constructive approach, IEEE Transactions on Neural Networks, volume 20 (3), pag , doi: /TNN , C. Gallicchio, A. Micheli. Graph Echo State Networks, Proceedings of the International Joint Conference on Neural Networks (IJCNN), pages 1 8, F. Scarselli, M. Gori, A.C.Tsoi, M. Hagenbuchner, G. Monfardini. The graph neural network model, IEEE Transactions on Neural Networks, 20(1), pag , 2009.

92 Alessio Micheli DRAFT, please do not circulate! For information Alessio Micheli Dipartimento di Informatica Università di Pisa - Italy Computational Intelligence & Machine Learning Group


More information

D-Separation. b) the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, are in the set C.

D-Separation. b) the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, are in the set C. D-Separation Say: A, B, and C are non-intersecting subsets of nodes in a directed graph. A path from A to B is blocked by C if it contains a node such that either a) the arrows on the path meet either

More information

Subgraph Matching Using Graph Neural Network

Subgraph Matching Using Graph Neural Network Journal of Intelligent Learning Systems and Applications, 0,, -8 http://dxdoiorg/0/jilsa008 Published Online November 0 (http://wwwscirporg/journal/jilsa) GnanaJothi Raja Baskararaja, MeenaRani Sundaramoorthy

More information

Sparsity issues in self-organizing-maps for structures

Sparsity issues in self-organizing-maps for structures University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2011 Sparsity issues in self-organizing-maps for structures Markus Hagenbuchner

More information

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition

Pattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant

More information

This Talk. 1) Node embeddings. Map nodes to low-dimensional embeddings. 2) Graph neural networks. Deep learning architectures for graphstructured

This Talk. 1) Node embeddings. Map nodes to low-dimensional embeddings. 2) Graph neural networks. Deep learning architectures for graphstructured Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW 2018 1 This Talk 1) Node embeddings Map nodes to low-dimensional embeddings. 2) Graph neural networks Deep learning architectures

More information

Context Encoding LSTM CS224N Course Project

Context Encoding LSTM CS224N Course Project Context Encoding LSTM CS224N Course Project Abhinav Rastogi arastogi@stanford.edu Supervised by - Samuel R. Bowman December 7, 2015 Abstract This project uses ideas from greedy transition based parsing

More information

18 October, 2013 MVA ENS Cachan. Lecture 6: Introduction to graphical models Iasonas Kokkinos

18 October, 2013 MVA ENS Cachan. Lecture 6: Introduction to graphical models Iasonas Kokkinos Machine Learning for Computer Vision 1 18 October, 2013 MVA ENS Cachan Lecture 6: Introduction to graphical models Iasonas Kokkinos Iasonas.kokkinos@ecp.fr Center for Visual Computing Ecole Centrale Paris

More information

Methods for Intelligent Systems

Methods for Intelligent Systems Methods for Intelligent Systems Lecture Notes on Clustering (II) Davide Eynard eynard@elet.polimi.it Department of Electronics and Information Politecnico di Milano Davide Eynard - Lecture Notes on Clustering

More information

LECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS

LECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS LECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS Neural Networks Classifier Introduction INPUT: classification data, i.e. it contains an classification (class) attribute. WE also say that the class

More information

CS229 Final Project: Predicting Expected Response Times

CS229 Final Project: Predicting Expected  Response Times CS229 Final Project: Predicting Expected Email Response Times Laura Cruz-Albrecht (lcruzalb), Kevin Khieu (kkhieu) December 15, 2017 1 Introduction Each day, countless emails are sent out, yet the time

More information

The Un-normalized Graph p-laplacian based Semi-supervised Learning Method and Speech Recognition Problem

The Un-normalized Graph p-laplacian based Semi-supervised Learning Method and Speech Recognition Problem Int. J. Advance Soft Compu. Appl, Vol. 9, No. 1, March 2017 ISSN 2074-8523 The Un-normalized Graph p-laplacian based Semi-supervised Learning Method and Speech Recognition Problem Loc Tran 1 and Linh Tran

More information

ESANN'1999 proceedings - European Symposium on Artificial Neural Networks

ESANN'1999 proceedings - European Symposium on Artificial Neural Networks Computation of Tree-Recursive Information for Structures Gradient of Neural Information Processing, University of Ulm, 89069 Ulm, Dept. Email: kuechler@neuro.informatik.uni-ulm.de Germany, Recently, so-called

More information

Clustering with Reinforcement Learning

Clustering with Reinforcement Learning Clustering with Reinforcement Learning Wesam Barbakh and Colin Fyfe, The University of Paisley, Scotland. email:wesam.barbakh,colin.fyfe@paisley.ac.uk Abstract We show how a previously derived method of

More information

Classification Lecture Notes cse352. Neural Networks. Professor Anita Wasilewska

Classification Lecture Notes cse352. Neural Networks. Professor Anita Wasilewska Classification Lecture Notes cse352 Neural Networks Professor Anita Wasilewska Neural Networks Classification Introduction INPUT: classification data, i.e. it contains an classification (class) attribute

More information

1. Why Study Trees? Trees and Graphs. 2. Binary Trees. CITS2200 Data Structures and Algorithms. Wood... Topic 10. Trees are ubiquitous. Examples...

1. Why Study Trees? Trees and Graphs. 2. Binary Trees. CITS2200 Data Structures and Algorithms. Wood... Topic 10. Trees are ubiquitous. Examples... . Why Study Trees? CITS00 Data Structures and Algorithms Topic 0 Trees and Graphs Trees and Graphs Binary trees definitions: size, height, levels, skinny, complete Trees, forests and orchards Wood... Examples...

More information

4.12 Generalization. In back-propagation learning, as many training examples as possible are typically used.

4.12 Generalization. In back-propagation learning, as many training examples as possible are typically used. 1 4.12 Generalization In back-propagation learning, as many training examples as possible are typically used. It is hoped that the network so designed generalizes well. A network generalizes well when

More information

JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation

JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS. Puyang Xu, Ruhi Sarikaya. Microsoft Corporation JOINT INTENT DETECTION AND SLOT FILLING USING CONVOLUTIONAL NEURAL NETWORKS Puyang Xu, Ruhi Sarikaya Microsoft Corporation ABSTRACT We describe a joint model for intent detection and slot filling based

More information

Bilevel Sparse Coding

Bilevel Sparse Coding Adobe Research 345 Park Ave, San Jose, CA Mar 15, 2013 Outline 1 2 The learning model The learning algorithm 3 4 Sparse Modeling Many types of sensory data, e.g., images and audio, are in high-dimensional

More information

Lecture 13 Segmentation and Scene Understanding Chris Choy, Ph.D. candidate Stanford Vision and Learning Lab (SVL)

Lecture 13 Segmentation and Scene Understanding Chris Choy, Ph.D. candidate Stanford Vision and Learning Lab (SVL) Lecture 13 Segmentation and Scene Understanding Chris Choy, Ph.D. candidate Stanford Vision and Learning Lab (SVL) http://chrischoy.org Stanford CS231A 1 Understanding a Scene Objects Chairs, Cups, Tables,

More information

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)

More information

Conditional Random Fields - A probabilistic graphical model. Yen-Chin Lee 指導老師 : 鮑興國

Conditional Random Fields - A probabilistic graphical model. Yen-Chin Lee 指導老師 : 鮑興國 Conditional Random Fields - A probabilistic graphical model Yen-Chin Lee 指導老師 : 鮑興國 Outline Labeling sequence data problem Introduction conditional random field (CRF) Different views on building a conditional

More information

ECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning

ECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning ECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning Topics Bayes Nets: Inference (Finish) Variable Elimination Graph-view of VE: Fill-edges, induced width

More information

Week 3: Perceptron and Multi-layer Perceptron

Week 3: Perceptron and Multi-layer Perceptron Week 3: Perceptron and Multi-layer Perceptron Phong Le, Willem Zuidema November 12, 2013 Last week we studied two famous biological neuron models, Fitzhugh-Nagumo model and Izhikevich model. This week,

More information

SEMANTIC COMPUTING. Lecture 8: Introduction to Deep Learning. TU Dresden, 7 December Dagmar Gromann International Center For Computational Logic

SEMANTIC COMPUTING. Lecture 8: Introduction to Deep Learning. TU Dresden, 7 December Dagmar Gromann International Center For Computational Logic SEMANTIC COMPUTING Lecture 8: Introduction to Deep Learning Dagmar Gromann International Center For Computational Logic TU Dresden, 7 December 2018 Overview Introduction Deep Learning General Neural Networks

More information

Machine Learning. Sourangshu Bhattacharya

Machine Learning. Sourangshu Bhattacharya Machine Learning Sourangshu Bhattacharya Bayesian Networks Directed Acyclic Graph (DAG) Bayesian Networks General Factorization Curve Fitting Re-visited Maximum Likelihood Determine by minimizing sum-of-squares

More information

Figure 4.1: The evolution of a rooted tree.

Figure 4.1: The evolution of a rooted tree. 106 CHAPTER 4. INDUCTION, RECURSION AND RECURRENCES 4.6 Rooted Trees 4.6.1 The idea of a rooted tree We talked about how a tree diagram helps us visualize merge sort or other divide and conquer algorithms.

More information

DOMINATION IN SOME CLASSES OF DITREES

DOMINATION IN SOME CLASSES OF DITREES BULLETIN OF THE INTERNATIONAL MATHEMATICAL VIRTUAL INSTITUTE ISSN (p) 20-4874, ISSN (o) 20-4955 www.imvibl.org /JOURNALS / BULLETIN Vol. 6(2016), 157-167 Former BULLETIN OF THE SOCIETY OF MATHEMATICIANS

More information

Representations of Weighted Graphs (as Matrices) Algorithms and Data Structures: Minimum Spanning Trees. Weighted Graphs

Representations of Weighted Graphs (as Matrices) Algorithms and Data Structures: Minimum Spanning Trees. Weighted Graphs Representations of Weighted Graphs (as Matrices) A B Algorithms and Data Structures: Minimum Spanning Trees 9.0 F 1.0 6.0 5.0 6.0 G 5.0 I H 3.0 1.0 C 5.0 E 1.0 D 28th Oct, 1st & 4th Nov, 2011 ADS: lects

More information

Lecture 11: Clustering Introduction and Projects Machine Learning

Lecture 11: Clustering Introduction and Projects Machine Learning Lecture 11: Clustering Introduction and Projects Machine Learning Andrew Rosenberg March 12, 2010 1/1 Last Time Junction Tree Algorithm Efficient Marginals in Graphical Models 2/1 Today Clustering Project

More information

Lecture 5: Exact inference. Queries. Complexity of inference. Queries (continued) Bayesian networks can answer questions about the underlying

Lecture 5: Exact inference. Queries. Complexity of inference. Queries (continued) Bayesian networks can answer questions about the underlying given that Maximum a posteriori (MAP query: given evidence 2 which has the highest probability: instantiation of all other variables in the network,, Most probable evidence (MPE: given evidence, find an

More information