Learning highly non-separable Boolean functions using Constructive Feedforward Neural Network



1 Learning highly non-separable Boolean functions using Constructive Feedforward Neural Network
Marek Grochowski and Włodzisław Duch
Department of Informatics, Nicolaus Copernicus University, Grudziądzka 5, Toruń, Poland
grochu@is.umk.pl; Google: Duch
17th International Conference on Artificial Neural Networks

2 Motivation and previous work
Learning problems with inherently non-separable Boolean logic is still a challenge (text categorization, medical data, natural language processing, bioinformatics, etc.).
Many learning systems, such as SVMs, decision trees or similarity-based methods, cannot deal with such problems: the solutions they find are too complex and generalize poorly.
The goal is to find a general algorithm capable of solving a large set of such problems with the smallest possible number of parameters.

3 N-bit parity problem
Examples of solutions to the n-bit parity problem:
it is trivial with a periodic function that has a single parameter;
an MLP needs $O(n^2)$ parameters, and learning is difficult;
most neural network solutions proposed for the n-bit parity problem will not work for other complex problems (E. Iyoda, H. Nobuhara, K. Hirota (2003); D. Stork, J. Allen (1992); B. Wilamowski, D. Hunter (2003));
projection on a line with n threshold parameters gives a k-separable solution, Duch (2006).
[Figure: projection of input vectors (x1, x2) on the direction w.]
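The two cheap parity solutions mentioned on this slide can be checked directly. This is a minimal sketch, assuming the periodic model has the form (1 - cos(ω Σx))/2 with ω = π (the slide only says "a periodic function with a single parameter"), and that the n thresholds lie halfway between the integer projections on the diagonal direction. The function names are illustrative, not from the talk.

```python
import itertools
import math


def parity(bits):
    """Reference label: 1 if the number of ones is odd, else 0."""
    return sum(bits) % 2


def periodic_parity(bits, omega=math.pi):
    """Assumed one-parameter periodic model: (1 - cos(omega * sum(x))) / 2.

    With omega = pi the value is 0 or 1 up to rounding error, and equals the parity.
    """
    return round((1 - math.cos(omega * sum(bits))) / 2)


def threshold_parity(bits):
    """Projection on the diagonal w = (1, ..., 1) followed by n thresholds
    at 0.5, 1.5, ..., n - 0.5; the label flips each time a threshold is crossed."""
    n = len(bits)
    projection = sum(bits)  # w . x with all weights equal to 1
    crossed = sum(projection > t + 0.5 for t in range(n))
    return crossed % 2


n = 6
for bits in itertools.product((0, 1), repeat=n):
    assert periodic_parity(bits) == parity(bits)
    assert threshold_parity(bits) == parity(bits)
print(f"both models reproduce {n}-bit parity on all {2 ** n} vectors")
```

The periodic model needs one parameter (ω) for any n. The projection model needs n thresholds, but it only relies on a linear projection followed by intervals, which carries over to non-parity problems.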

4 Non-separable Boolean functions
The difficulty of learning Boolean functions grows quickly with the minimum k required to solve a given problem, i.e. the smallest number of pure (single-class) intervals into which some projection on a line can split the data (k-separability).
For many complicated problems a simple linear mapping often exists that leaves only trivial non-linearities, which may be separated using window-like neurons.
The main idea is to create a constructive network with neurons that realize window-like functions.

5 Parity problem
Example of the 10-dimensional parity problem (2^10 = 1024 vectors): sequence of labels of the vectors projected on a random direction.

6 Parity problem
The same 10-dimensional parity problem projected on the diagonal direction, with all weights equal to 1: the labels form 11 alternating pure intervals, so k = 11 (11-separability).
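The k values on these two slides can be checked by sorting the projections and counting label runs. The sketch below assumes that a projection value shared by vectors of both classes makes that direction unusable. The helper `count_intervals` is hypothetical, and the random direction uses an arbitrary seed, so it will not match the one plotted on the slide.

```python
import itertools
import random


def count_intervals(points, labels, w):
    """Smallest number of pure intervals (k) along direction w, or None.

    Vectors are grouped by their projection w . x. If one projection value holds
    both classes, no thresholds on this line can separate them. Otherwise k is
    the number of label runs in the sorted projection values.
    """
    groups = {}
    for x, c in zip(points, labels):
        p = sum(wi * xi for wi, xi in zip(w, x))
        groups.setdefault(p, set()).add(c)
    if any(len(cs) > 1 for cs in groups.values()):
        return None
    k, previous = 0, None
    for p in sorted(groups):
        (c,) = groups[p]  # the single class found at this projection value
        if c != previous:
            k += 1
            previous = c
    return k


n = 10
points = list(itertools.product((0, 1), repeat=n))  # 2**10 = 1024 vectors
labels = [sum(x) % 2 for x in points]               # parity labels

print("diagonal direction, k =", count_intervals(points, labels, [1] * n))

rng = random.Random(0)  # arbitrary seed for a reproducible random direction
random_w = [rng.gauss(0.0, 1.0) for _ in range(n)]
print("random direction,   k =", count_intervals(points, labels, random_w))
```

For parity no direction can do better than the diagonal's n + 1 intervals. Along any generic direction there is a chain of n + 1 vertices with increasing projections whose labels alternate, so the random projection above always gives k of at least 11, usually far more.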


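8 Error function
Combination of projection and clustering.
Hard-window transfer function:
$M_i(\mathbf{x}; \mathbf{w}, a, b) = \begin{cases} 1 & \text{if } \mathbf{w}\cdot\mathbf{x} \in [a, b] \\ 0 & \text{if } \mathbf{w}\cdot\mathbf{x} \notin [a, b] \end{cases}$
Error measure:
$E(\mathbf{x}; \Gamma) = \sum_{\mathbf{x}} \big(y(\mathbf{x}; \Gamma) - c(\mathbf{x})\big)^2$
This measure gives no control over the purity of the clusters.
Error function with a purity term:
$E(\mathbf{x}; \Gamma, a, b, \lambda) = \sum_{\mathbf{x}} \big(y(\mathbf{x}; \Gamma) - c(\mathbf{x})\big)^2 + \lambda \sum_{\mathbf{w}\cdot\mathbf{x} \in [a, b]} \big(y(\mathbf{x}; \Gamma) - c(\mathbf{x})\big)^2$
λ controls the trade-off between covering and purity.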
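The example below shows why the plain squared error cannot judge cluster purity. It is a small NumPy sketch that assumes the purity term sums the squared error over the vectors whose projection falls inside [a, b] (our reading of the slide's formula). On 3-bit parity, an empty window and an impure window get the same plain error, and only the purity term separates them. Function names are illustrative.

```python
import itertools

import numpy as np


def hard_window(X, w, a, b):
    """M(x; w, a, b) = 1 if w . x lies in [a, b], else 0."""
    z = X @ w
    return ((z >= a) & (z <= b)).astype(float)


def squared_error(y, c):
    """Plain error: sum over all vectors of (y - c)^2."""
    return float(np.sum((y - c) ** 2))


def error_with_purity(X, y, c, w, a, b, lam):
    """Plain error plus lam times the error of the vectors projected inside [a, b]
    (assumed reading of the purity term)."""
    z = X @ w
    inside = (z >= a) & (z <= b)
    return squared_error(y, c) + lam * float(np.sum((y[inside] - c[inside]) ** 2))


# 3-bit parity: 8 vectors, class 1 when the number of ones is odd
X = np.array(list(itertools.product((0, 1), repeat=3)), dtype=float)
c = X.sum(axis=1) % 2
w = np.ones(3)

for a, b in [(-1.0, -0.5), (0.5, 2.5)]:  # an empty window and an impure one
    y = hard_window(X, w, a, b)
    print((a, b), squared_error(y, c), error_with_purity(X, y, c, w, a, b, lam=1.0))
```

Both windows have a plain error of 4.0. With λ = 1 the purity term raises the impure window, which covers the three class-0 vectors with two ones, to 7.0.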

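9 Transfer functions
Soft window-like transfer functions:
$M(\mathbf{x}; \mathbf{w}, a, b, \beta) = \sigma(\beta(\mathbf{w}\cdot\mathbf{x} - a)) - \sigma(\beta(\mathbf{w}\cdot\mathbf{x} - b))$
$M(\mathbf{x}; \mathbf{w}, t, a, \beta) = \sigma(\beta(\mathbf{w}\cdot\mathbf{x} - t + a))\,\big(1 - \sigma(\beta(\mathbf{w}\cdot\mathbf{x} - t - a))\big)$
$M(\mathbf{x}; \mathbf{w}, a, b, \beta) = \tfrac{1}{2}\big(\tanh(\beta(\mathbf{w}\cdot\mathbf{x} - a)) - \tanh(\beta(\mathbf{w}\cdot\mathbf{x} - b))\big)$
The slope parameter β controls the softness of the transfer function.
[Figure: the window function M(x; Γ_i) for different values of the slope β.]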
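The three soft windows can be compared numerically. The sketch below is a NumPy implementation of the formulas above, applied to a few example projections z = w·x. It shows that for a large slope β all three approach the hard window, while a small β gives a soft bump. The second function is written for the window [t - a, t + a]. Function names and test values are illustrative.

```python
import numpy as np


def sigmoid(z):
    # clipping keeps np.exp from overflowing for very steep slopes
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500.0, 500.0)))


def window_sigmoid_diff(z, a, b, beta):
    """sigma(beta (z - a)) - sigma(beta (z - b)), with z = w . x"""
    return sigmoid(beta * (z - a)) - sigmoid(beta * (z - b))


def window_sigmoid_product(z, t, a, beta):
    """sigma(beta (z - t + a)) * (1 - sigma(beta (z - t - a))): window [t - a, t + a]"""
    return sigmoid(beta * (z - t + a)) * (1.0 - sigmoid(beta * (z - t - a)))


def window_tanh(z, a, b, beta):
    """(tanh(beta (z - a)) - tanh(beta (z - b))) / 2"""
    return 0.5 * (np.tanh(beta * (z - a)) - np.tanh(beta * (z - b)))


z = np.array([-1.0, 0.5, 1.0, 1.5, 3.0])        # example projections w . x
hard = ((z >= 0.0) & (z <= 2.0)).astype(float)  # hard window [0, 2]

for beta in (1.0, 50.0):
    soft = window_sigmoid_diff(z, 0.0, 2.0, beta)
    print(f"beta={beta:5.1f}", soft.round(3), "max |soft - hard| =", np.abs(soft - hard).max().round(3))
```

Since tanh(u) = 2σ(2u) - 1, the tanh window with slope β is exactly the sigmoid-difference window with slope 2β. The two forms differ only in how β is scaled.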

10 Error function
$E(\mathbf{x}; \Gamma, \lambda_1, \lambda_2) = \frac{1}{2}\sum_{\mathbf{x}} \big(y(\mathbf{x}; \Gamma) - c(\mathbf{x})\big)^2 + \underbrace{\lambda_1 \sum_{\mathbf{x}} \big(1 - c(\mathbf{x})\big)\, y(\mathbf{x}; \Gamma)}_{\text{penalty}} - \underbrace{\lambda_2 \sum_{\mathbf{x}} c(\mathbf{x})\, y(\mathbf{x}; \Gamma)}_{\text{reward}}$
The penalty factor λ1 increases the total error for vectors x_i from class c(x_i) = 0 that fall into a group of vectors from class 1 (a penalty for unclean clusters).
The reward factor λ2 decreases the total error for every vector x_i from class 1 that is correctly placed inside the created clusters (a reward for large clusters).
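The penalty/reward error and its per-vector derivative can be written in a few lines. The NumPy sketch below follows the formula above, and the derivative with respect to y is what backpropagation would pass back into the window node. The toy comparison is illustrative: it pits a small pure cluster against a larger cluster that contains one class-0 vector.

```python
import numpy as np


def penalty_reward_error(y, c, lam1, lam2):
    """E = 1/2 sum (y - c)^2 + lam1 sum (1 - c) y - lam2 sum c y."""
    squared = 0.5 * np.sum((y - c) ** 2)
    penalty = lam1 * np.sum((1 - c) * y)  # class-0 vectors inside a cluster
    reward = lam2 * np.sum(c * y)         # class-1 vectors inside a cluster
    return float(squared + penalty - reward)


def error_gradient(y, c, lam1, lam2):
    """dE/dy for each vector; backpropagation multiplies it by dy/dw, dy/da, dy/db."""
    return (y - c) + lam1 * (1 - c) - lam2 * c


c = np.array([1.0, 1.0, 0.0, 0.0])
small_pure = np.array([1.0, 0.0, 0.0, 0.0])    # covers one class-1 vector only
large_impure = np.array([1.0, 1.0, 1.0, 0.0])  # covers both class-1 vectors and one class-0

for lam1, lam2 in [(0.0, 0.0), (0.5, 0.1), (0.0, 0.5)]:
    print(lam1, lam2,
          penalty_reward_error(small_pure, c, lam1, lam2),
          penalty_reward_error(large_impure, c, lam1, lam2))
```

With λ1 = λ2 = 0 both clusters score 0.5. A dominant penalty makes the small pure cluster cheaper, and a dominant reward favours the large one. This is the trade-off explored on the "To Reward or to Punish?" slides.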

11 Learning: constructive algorithm
Learning starts with an empty hidden layer. In every phase of training one new unit is added, initialized and trained using the backpropagation algorithm.
[Figure: network with input x, one hidden window node M1 with weight vector w1, and output y.]

12 Learning: constructive algorithm
The input vectors correctly handled by the first neuron do not contribute to the error, so the weights of this neuron are kept frozen during further learning. The slope β is set to a large value to obtain hard boundaries. The next node is added and the learning procedure is repeated on the remaining data.
[Figure: network with window nodes M1 and M2, weight vectors w1 and w2, and output y.]

13 Learning: constructive algorithm
Most nodes and connections are fixed; only the weights of one node are modified at each training step.
[Figure: network with window nodes M1, M2 and M3, weight vectors w1, w2 and w3, and output y.]

14 Learning: constructive algorithm
If the number of cases correctly classified by a new node drops below a certain minimum, the learning procedure stops and this node is removed from the network.
[Figure: network with window nodes M1, M2 and M3, weight vectors w1, w2 and w3, and output y.]
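The constructive procedure on these four slides can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The node trainer is our own choice: random initialization around a class-1 vector, plain gradient descent on the penalty/reward error, and the learning rate, epoch count and λ values shown. A trained node is frozen by switching to its hard window rather than by raising β. The network output in `predict` is assumed to be a logical OR of the windows. The names `train_window_node`, `constructive_fit` and `predict` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    # clipping keeps np.exp from overflowing when the slope is steep
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500.0, 500.0)))


def hard_window(X, w, a, b):
    z = X @ w
    return (z >= a) & (z <= b)


def train_window_node(X, c, rng, lam1=0.5, lam2=0.1, beta=2.0, lr=0.05, epochs=300):
    """Hypothetical trainer: gradient descent on the penalty/reward error of one
    soft window node y = sigma(beta(w.x - a)) - sigma(beta(w.x - b))."""
    w = rng.normal(size=X.shape[1])
    centre = X[rng.choice(np.flatnonzero(c == 1))] @ w  # start on a class-1 vector
    a, b = centre - 0.5, centre + 0.5
    for _ in range(epochs):
        z = X @ w
        sa, sb = sigmoid(beta * (z - a)), sigmoid(beta * (z - b))
        g = (sa - sb - c) + lam1 * (1 - c) - lam2 * c  # dE/dy for every vector
        da, db = beta * sa * (1 - sa), beta * sb * (1 - sb)
        w = w - lr * (g * (da - db)) @ X  # dy/dw = (da - db) x
        a = a - lr * np.sum(-g * da)      # dy/da = -da
        b = b - lr * np.sum(g * db)       # dy/db = +db
        a, b = min(a, b), max(a, b)
    return w, a, b


def constructive_fit(X, c, min_cover=2, max_nodes=10, seed=0):
    rng = np.random.default_rng(seed)
    nodes, covers = [], []
    remaining = np.ones(len(X), dtype=bool)  # vectors that still contribute to the error
    while len(nodes) < max_nodes and np.sum(c[remaining] == 1) >= min_cover:
        w, a, b = train_window_node(X[remaining], c[remaining], rng)
        # hard boundaries from now on: count the class-1 cases this node handles
        correct = hard_window(X, w, a, b) & (c == 1) & remaining
        if correct.sum() < min_cover:
            break                      # too few new cases: drop this node and stop
        nodes.append((w, a, b))        # frozen during further learning
        covers.append(int(correct.sum()))
        remaining &= ~correct          # learning continues on the remaining data
    return nodes, covers


def predict(X, nodes):
    # assumed output rule: class 1 if any frozen window contains the vector
    out = np.zeros(len(X), dtype=bool)
    for w, a, b in nodes:
        out |= hard_window(X, w, a, b)
    return out.astype(int)


X = np.array([[int(bit) for bit in format(i, "04b")] for i in range(16)], dtype=float)
c = X.sum(axis=1).astype(int) % 2  # 4-bit parity
nodes, covers = constructive_fit(X, c)
print(len(nodes), "nodes, covers:", covers, "training accuracy:", (predict(X, nodes) == c).mean())
```

The coverage check only counts class-1 vectors that are still in the training set, so the covered sets are disjoint and the loop always terminates. Whether it reaches full accuracy on parity depends on the seed and hyper-parameters.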

15 Influence of penalty and reward
Average over 92 6-separable 4-dimensional Boolean functions.
[Figure: average error, average number of neurons and average number of training cycles as functions of the penalty and reward factors.]


18 Boolean functions
Parity functions, parity functions with a small (5%) perturbation of labels, and random Boolean functions.
[Figure: average training accuracy and average number of neurons versus input dimension, for parity, parity with 5% perturbation, and an average over random Boolean functions.]
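The three kinds of benchmark functions on this slide can be generated as follows. The slide does not say how the 5% perturbation was applied. This sketch assumes that round(0.05 · 2^n) randomly chosen labels of the full truth table are flipped, and that random functions draw each label uniformly. The function names and seeds are illustrative.

```python
import itertools
import random


def parity_dataset(n, flip_fraction=0.0, seed=0):
    """All 2**n binary vectors with parity labels; round(flip_fraction * 2**n)
    randomly chosen labels are flipped (assumed perturbation scheme)."""
    points = list(itertools.product((0, 1), repeat=n))
    labels = [sum(x) % 2 for x in points]
    rng = random.Random(seed)
    for i in rng.sample(range(len(points)), round(flip_fraction * len(points))):
        labels[i] = 1 - labels[i]
    return points, labels


def random_boolean_dataset(n, seed=0):
    """All 2**n binary vectors with independent, uniformly random labels."""
    rng = random.Random(seed)
    points = list(itertools.product((0, 1), repeat=n))
    return points, [rng.randint(0, 1) for _ in points]


points, noisy = parity_dataset(6, flip_fraction=0.05)
flipped = sum(label != sum(x) % 2 for x, label in zip(points, noisy))
print(len(points), "vectors,", flipped, "labels flipped")  # 64 vectors, 3 labels flipped
```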


20 Real-world data
Test on datasets from the UCI repository.
[Table: 10x10 CV test accuracy of 1-NN, Naive Bayes, SVM and c3sep on the Appendicitis, Australian, Flag, Glass, Ionosphere, Iris, Pima-diabetes, Promoters, Sonar and Wine datasets.]

21 Real-world data
Comparison of complexity, SVM vs. c3sep.
[Figure: total number of support vectors (SVM) and average number of neurons per class (c3sep) for Appendicitis, Australian, Flag, Glass, Ionosphere, Iris, Pima-diabetes, Promoters, Sonar and Wine.]

22 Summary and further work
Some conclusions:
First steps towards efficient learning of Boolean functions have been made here.
The approach presented here has been able to learn quite difficult Boolean functions using a very simple model.
A great advantage is the small computational cost of training the constructive network.
Further work:
global minimization algorithms instead of backpropagation;
looking for k-separable solutions, e.g. a network with weight sharing;
a different error function;
additional transformation of the input data;
experiments with high-dimensional real-world problems;
many more ideas...
