Controlled Information Maximization for SOM Knowledge Induced Learning

Ryotaro Kamimura
IT Education Center and Graduate School of Science and Technology, Tokai University
1117 Kitakaname, Hiratsuka, Kanagawa 259-1292, Japan
ryo@keyaki.cc.u-tokai.ac.jp

Abstract: The present paper aims to control the information content in multi-layered neural networks so as to improve generalization performance. Following Linsker's maximum information principle, information should be increased as much as possible in multi-layered neural networks. However, the information increase must be controlled appropriately to improve performance. Thus, the present paper proposes a method to control the information content so as to increase generalization performance. Experimental results on an artificial data set and the spam data set showed that improved generalization performance could be obtained by appropriately controlling the information content. In particular, better performance was observed for more complex problems. Compared with conventional methods such as the support vector machine, better performance was obtained when the information was large. Thus, the present results show the potential of SOM knowledge in training multi-layered networks.

Keywords: maximum information, controlling information, multi-layered networks, SOM, deep learning

1. Introduction

1.1 Maximum Information

Information-theoretic methods have received due attention since Linsker [1], [2], [3], [4] tried to describe information processing in living systems by the maximum information principle. Under this principle, the information content in multi-layered neural networks should be increased as much as possible at each processing stage. Linsker demonstrated the generation of feature-detecting neurons by maximizing information content for simple and linear neural networks. However, because of the difficulty of training multi-layered neural networks, few results on the performance of fully multi-layered neural networks have been reported. Recently, multi-layered neural networks have received much attention because several methods to facilitate their learning have been proposed in deep learning [5], [6], [7], [8]. Thus, the time has come to examine the effectiveness of the maximum information principle in training multi-layered neural networks. In deep learning, unsupervised feature detection is realized by the auto-encoder and the restricted Boltzmann machine. However, these are not necessarily good at detecting the main features of input patterns, because they were not developed as feature detectors. Thus, more efficient feature-detecting methods are needed for multi-layered neural networks.

1.2 SOM Knowledge

In training multi-layered neural networks, it is important to extract the main features of input patterns. In the present paper, the self-organizing map (SOM) is used to detect these features for training multi-layered neural networks. As is well known, the SOM was developed to extract important features and, in addition, to visualize them. If the features detected by the SOM can be used for training multi-layered neural networks, the training can be facilitated and, in addition, the final results can be visualized for easy interpretation. Recently, the SOM was found to be effective in training multi-layered neural networks under the condition that the information content of each hidden layer is maximized or increased as much as possible [9]. This means that Linsker's principle of maximum information preservation is effective in training multi-layered neural networks with the SOM.
Meanwhile, it has been observed that information should not simply be increased: the information increase or maximization must be controlled appropriately to obtain better performance, in particular better generalization performance. Thus, the objective of this paper is to control the process of information maximization appropriately and to explore to what extent generalization performance can be improved.

1.3 Outline

In Section 2, the SOM knowledge induced learning is introduced, which is composed of SOM and supervised multi-layered neural networks. Then, the information content is defined as a decrease in the uncertainty of neurons. This information is controlled by the number of layers multiplied by an additional parameter, which is introduced to adjust the information content to a given problem. In Section 3, the artificial and spam data sets are used to examine to what extent information can be increased and generalization performance can be improved. Experimental results show that information can be increased and, correspondingly, generalization errors can be decreased by the present method.

2. Theory and Computational Methods

2.1 SOM Knowledge Induced Learning

SOM knowledge induced learning is a method that uses the knowledge acquired by the SOM to train multi-layered neural networks. Figure 1 shows a network architecture for this learning. As shown in the figure, the learning is composed of two phases, namely, the information acquisition phase (a) and the information use phase (b). In the information acquisition phase in Figure 1(a), each competitive layer is trained with SOM to produce weights. These weights are then used to train the multi-layered neural network in Figure 1(b). In the information use phase, ordinary back-propagation learning is applied with an early stopping criterion. The question is whether the weights produced by the SOM are effective in improving generalization performance.

2.2 Information Content

The SOM knowledge is effective only together with the maximum information principle. Thus, this section deals with how to increase the information content. As shown in Figure 1, a network is composed of an input layer, multiple competitive layers and an output layer. Let us explain how to compute the outputs of the competitive and output neurons. The sth input pattern is represented by x^s = [x^s_1, x^s_2, \ldots, x^s_L]^T, s = 1, 2, \ldots, S. Connection weights into the jth competitive neuron are denoted by w_j = [w_{1j}, w_{2j}, \ldots, w_{Lj}]^T, j = 1, 2, \ldots, M. The output of the jth competitive neuron for the sth pattern is computed by

    v^s_j = \exp\left( -\frac{\| x^s - w_j \|^2}{2\sigma^2} \right),    (1)

where \sigma denotes a spread parameter or Gaussian width. The output of the jth neuron is defined by

    v_j = \frac{1}{S} \sum_{s=1}^{S} v^s_j.    (2)

The firing probabilities are computed by

    p(j) = \frac{v_j}{\sum_{m=1}^{M} v_m}.    (3)

The uncertainty or entropy of these neurons is

    H = -\sum_{j=1}^{M} p(j) \log p(j).    (4)

The information content is defined as the difference between the maximum and the observed uncertainty,

    I = H_{\max} - H = \log M + \sum_{j=1}^{M} p(j) \log p(j).    (5)

2.3 Controlled Information Maximization

This information can be increased by decreasing the Gaussian width \sigma. The width is here defined by

    \sigma(t) = \frac{1}{\beta t},    (6)

where t denotes the layer number and \beta is a parameter introduced to control the spread parameter. When the number of layers increases, the spread parameter \sigma decreases and the corresponding information tends to increase. In addition, when the parameter \beta increases, the spread parameter \sigma decreases and, correspondingly, the information tends to increase. Figure 2 shows the spread parameter \sigma as a function of the number of layers t when the parameter \beta increases from 0.1 to 0.5. As shown in the figure, the spread parameter decreases when the number of layers increases, and it also decreases when the parameter \beta increases. For higher layer numbers, the spread parameter gradually decreases and the information increases. In this case, the number of strongly firing neurons (in black) gradually diminishes, as shown in Figure 1. This means that the number of effective competitive neurons gradually diminishes, and the features are gradually compressed into a smaller number of competitive neurons.
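
Under the reconstruction of Equations (1)-(6) above, the per-layer information and the layer-dependent spread parameter can be sketched in a few lines of NumPy. This is a minimal sketch, not the author's implementation; the function names, the use of the natural logarithm, and the small constant guarding log 0 are assumptions.

```python
import numpy as np

def layer_width(t, beta):
    """Spread parameter sigma(t) = 1 / (beta * t) for layer number t (Eq. 6, as reconstructed)."""
    return 1.0 / (beta * t)

def information_content(X, W, sigma):
    """Information I = log M - H of one competitive layer (Eqs. 1-5).

    X: (S, L) input patterns, W: (M, L) SOM weights of the layer, sigma: Gaussian width.
    """
    # Eq. (1): Gaussian outputs v^s_j for every pattern s and neuron j.
    dists = np.sum((X[:, None, :] - W[None, :, :]) ** 2, axis=2)   # (S, M) squared distances
    v = np.exp(-dists / (2.0 * sigma ** 2))
    # Eq. (2): average output of each neuron over all patterns.
    v_mean = v.mean(axis=0)
    # Eq. (3): normalized firing probabilities p(j).
    p = v_mean / v_mean.sum()
    # Eq. (4): entropy H; Eq. (5): information I = log M - H.
    H = -np.sum(p * np.log(p + 1e-12))
    return np.log(len(p)) - H

# Example: information of a 5-by-5 (M = 25) competitive layer at layer t = 3 with beta = 0.5.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
W = rng.normal(size=(25, 2))
print(information_content(X, W, layer_width(3, 0.5)))
```

As the sketch makes explicit, the information is maximal (log M) when a single neuron fires and zero when all neurons fire equally, so shrinking sigma with the layer number concentrates firing and raises the information.
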
3. Results and Discussion

3.1 Application to Artificial Data

3.1.1 Experimental Outline

To show the effectiveness of the information maximization, an artificial data set was made, which could be divided into two classes as shown in Figure 3(a). Only a small number of the input patterns were used for training; even when the number of training patterns was increased, the tendency reported here was still observed. The remaining patterns were used for validation and testing. The numbers of input, competitive and output neurons were 2, 25 (5 by 5) and 2, respectively. Then, to make the problem more complex, the standard deviation of the data was increased gradually. When the standard deviation increased from one in Figure 3(a) to five in Figure 3(b), the boundary between the two classes became ambiguous and the classification problem became more difficult.

3.1.2 Weights by SOM

The SOM tries to imitate the input patterns as much as possible. This means that the connection weights tend to expand to cover all input patterns. Figures 4(a) and (b) show the connection weights (in blue) and the data (in green) obtained by the self-organizing map. In both figures, the weights in blue expanded to cover all data points in green. This means that the SOM tried to acquire as much information on the input patterns as possible in its connection weights. The question is whether these weights are effective in training multi-layered neural networks.
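
As an illustration of how the artificial data of Section 3.1.1 and the SOM weights of Section 3.1.2 might be produced, the sketch below draws two 2-D Gaussian classes whose common standard deviation can be raised to blur the class boundary, and trains a plain 5-by-5 SOM on them. The class means, the number of patterns, and the SOM learning rate, neighborhood width and number of epochs are illustrative assumptions; the paper does not report them.

```python
import numpy as np

def make_two_class_data(n_per_class, std, rng):
    """Two 2-D Gaussian classes; a larger std blurs the class boundary (cf. Fig. 3)."""
    a = rng.normal(loc=[-5.0, 0.0], scale=std, size=(n_per_class, 2))  # class 0 (means are assumptions)
    b = rng.normal(loc=[+5.0, 0.0], scale=std, size=(n_per_class, 2))  # class 1
    X = np.vstack([a, b])
    y = np.repeat([0, 1], n_per_class)
    return X, y

def train_som(X, rows=5, cols=5, epochs=50, lr0=0.5, sigma0=2.0, rng=None):
    """Plain SOM: the weights expand to cover the data, as observed in Fig. 4."""
    if rng is None:
        rng = np.random.default_rng(0)
    grid = np.array([[r, c] for r in range(rows) for c in range(cols)], dtype=float)  # (M, 2) map coordinates
    W = rng.normal(size=(rows * cols, X.shape[1]))                                    # (M, L) weights
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)              # decaying learning rate
        sigma = sigma0 * (1.0 - epoch / epochs) + 0.5  # decaying neighborhood width
        for x in rng.permutation(X):
            winner = np.argmin(np.sum((W - x) ** 2, axis=1))
            # Gaussian neighborhood on the 2-D map grid around the winner.
            h = np.exp(-np.sum((grid - grid[winner]) ** 2, axis=1) / (2.0 * sigma ** 2))
            W += lr * h[:, None] * (x - W)
    return W

rng = np.random.default_rng(0)
X, y = make_two_class_data(100, std=1.0, rng=rng)
W = train_som(X, rng=rng)   # these weights play the role of the SOM knowledge
```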

Fig. 1: Network architecture with the two components of SOM knowledge induced learning, where black neurons fire strongly: (a) information acquisition phase, in which the input layer and the five competitive layers (t = 1, ..., 5) are trained by SOM and the weights w_kj are transferred; (b) information use phase, in which the multi-layered network with weights w_kj and w_ji produces outputs o^s_i that are compared with the targets y^s_i.

Fig. 2: Spread parameter \sigma as a function of the number of layers t for different values of the parameter \beta.

3.1.3 Results with Information Maximization

Figure 5 shows the information and generalization errors as the parameter \beta increased. As shown in Figure 5(a), when the standard deviation was one, the information increased as the parameter increased; however, the generalization errors did not decrease and, in the end, they increased rapidly. Figure 5(b) shows the results when the standard deviation was three: the information increased gradually, and the generalization errors then decreased gradually. Figure 5(c) shows the results when the standard deviation was five: the information increased and the generalization errors decreased, though some fluctuations can be seen.

3.1.4 Summary of Results on Generalization

Table 1 summarizes the generalization errors. The best average errors were obtained by the information maximization; only when the standard deviation was one did the support vector machine (SVM) show performance equivalent to that of the information maximization. For each value of the standard deviation, the best error was obtained at a specific value of the parameter \beta. Thus, when the parameter \beta was higher, and correspondingly the information was higher, better performance could be obtained.
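
The generalization results above come from the information use phase of Figure 1(b): a multi-layered network is initialized from the transferred SOM weights and trained by back-propagation with early stopping (Section 2.1). The paper does not give implementation details, so the following PyTorch sketch is only one plausible realization; the sigmoid activations, the SGD optimizer and its learning rate, the epoch and patience settings, and the requirement that the supplied SOM weight matrices be shape-consistent from layer to layer are all assumptions.

```python
import torch
from torch import nn

def build_network(som_weights, n_outputs=2):
    """Information use phase (Fig. 1(b)): linear layers whose incoming weights are
    initialized from the transferred SOM weight matrices, plus an output layer."""
    layers = []
    for W in som_weights:                             # each W has shape (neurons, inputs)
        lin = nn.Linear(W.shape[1], W.shape[0])
        with torch.no_grad():
            lin.weight.copy_(torch.as_tensor(W, dtype=torch.float32))
        layers += [lin, nn.Sigmoid()]                 # sigmoid activation is an assumption
    layers.append(nn.Linear(som_weights[-1].shape[0], n_outputs))
    return nn.Sequential(*layers)

def train_with_early_stopping(model, X_tr, y_tr, X_va, y_va, epochs=500, patience=20):
    """Ordinary back-propagation with an early-stopping criterion on the validation loss."""
    opt = torch.optim.SGD(model.parameters(), lr=0.05)    # optimizer and rate are assumptions
    loss_fn = nn.CrossEntropyLoss()
    best, best_state, wait = float("inf"), None, 0
    for _ in range(epochs):
        model.train()
        opt.zero_grad()
        loss_fn(model(X_tr), y_tr).backward()
        opt.step()
        model.eval()
        with torch.no_grad():
            val = loss_fn(model(X_va), y_va).item()
        if val < best:                                    # keep the best validation weights
            best, wait = val, 0
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
        else:
            wait += 1
            if wait >= patience:                          # stop when validation loss stalls
                break
    model.load_state_dict(best_state)
    return model
```

Inputs are expected as float32 tensors and targets as long tensors of class indices; the first weight matrix must have as many columns as there are input variables, and each subsequent one as many columns as the previous layer has neurons.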

Fig. 3: The artificial data set when the standard deviation was one (a) and five (b).

Fig. 4: Weights (blue) and data (green) obtained by SOM when the standard deviation was one (a) and five (b).

Table 1: Summary of experimental results for the artificial data for standard deviations one to five, where "Conv", "Without" and "With" represent the conventional multi-layered networks and the networks without and with information maximization, respectively; the SVM column gives the support vector machine results, and the reported values of the parameter \beta are those of the networks with the best generalization performance.

3.1.5 Results without Information Maximization

Figure 6 shows the information as a function of the number of layers for the method without information maximization. As can be seen in the figure, the information tended to increase gradually as the number of layers increased, though the amount of information was smaller than that obtained with information maximization. The present method succeeds in increasing the information because it accentuates this natural tendency of information increase. However, when the layer number was three, the information decreased in Figure 6, whereas in Figure 5(b) the information increased when the standard deviation was three. Thus, the present method can increase the information even in the absence of a natural tendency of information increase. In addition, in Figure 5, the information increase seems to be correlated with improved generalization when the standard deviation is large. This means that the more complex the problem becomes, the more effective the present method will be in improving generalization performance.

Fig. 5: Information (left) and generalization errors (right) with the information maximization component as the parameter \beta increased, for standard deviations of (a) one, (b) three and (c) five; the information curves are shown for the first to fifth competitive layers.

Fig. 6: Information as a function of the number of layers for the method without the information maximization component, for standard deviations one to five.

3.2 Application to the Spam Data Set

3.2.1 Experimental Outline

The spam data set from the machine learning database [10] was used to examine the performance of the present method. The data set contains 4601 patterns with 57 variables; a subset of the patterns was used for training, another subset for validation, and the remaining patterns for testing. The numbers of input, competitive and output neurons were 57, 25 (5 by 5) and 2, respectively.

3.2.2 Information and Generalization

Figure 7 shows the information and generalization errors obtained by the present method. The information content increased gradually as the parameter \beta increased in Figure 7(a), though in the fourth layer the information eventually decreased. Figure 7(b) shows the generalization errors as the parameter \beta increased: the errors decreased gradually to their lowest value. These results show that when the information increased, the generalization errors tended to decrease accordingly. As mentioned, for the fourth layer the information first increased as the parameter \beta increased in Figure 7(a), but then decreased for the larger values of the parameter. As shown in Figure 7(b), the generalization errors fluctuated over the same range of the parameter. This fluctuation may be explained by the decrease in information in the fourth layer.

Fig. 7: Information (a) and generalization errors (b) as functions of the parameter \beta for the method with the information maximization component on the spam data set; the information curves are shown for the first to fifth competitive layers.

3.2.3 Summary of Generalization Performance

Table 2 summarizes the generalization errors. The lowest error was obtained by the present method, the second best by the support vector machine, and the third best by the conventional multi-layered networks. The worst error was produced by the method without the maximum information component. As shown in the table, the largest standard deviation of the errors was also obtained by the method without the maximum information component. With the maximum information component, the standard deviation decreased, although it remained the second largest value. Thus, learning with SOM knowledge tends to produce results with large variances, and these variances can be reduced by the information maximization component; even so, the standard deviation remained large for the present method. It is therefore necessary to examine why such large standard deviations are produced and to develop a method to stabilize learning in multi-layered neural networks with SOM knowledge.

Table 2: Summary of experimental results for the spam data, where "Conv", "Without" and "With" represent the conventional multi-layered networks and the networks without and with information maximization, respectively; the SVM column gives the support vector machine results, and the reported value of the parameter \beta is that of the network with the best generalization performance.

3.2.4 Comparison of Information Increase

Figure 8 shows the information obtained by the method without the maximum information component. The information content increased as the layer number increased from one to three, and then decreased as the layer number increased to four and five. The information maximization component could increase the information in spite of this tendency of the information to decrease in the higher layers.

Fig. 8: Information as a function of the number of layers for the method without the information maximization component on the spam data set.
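
The spam experiment of Section 3.2.1 uses the spambase data from the UCI machine learning repository [10] (4601 patterns, 57 variables plus a 0/1 spam label). A minimal loading-and-splitting sketch is given below; the local file name and the split sizes are assumptions, since the paper does not fully specify the split.

```python
import numpy as np

# spambase.data: 57 attributes plus a final 0/1 spam label per row (UCI repository [10]).
data = np.loadtxt("spambase.data", delimiter=",")   # local path is an assumption
X, y = data[:, :57], data[:, 57].astype(int)

rng = np.random.default_rng(0)
idx = rng.permutation(len(X))
n_train, n_valid = 1000, 1000                       # split sizes are assumptions
train, valid, test = np.split(idx, [n_train, n_train + n_valid])
X_train, y_train = X[train], y[train]
X_valid, y_valid = X[valid], y[valid]
X_test,  y_test  = X[test],  y[test]
```
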
3.3 Conclusion

The present paper has shown that it is important to control the information content when training multi-layered neural networks. Linsker stated that the information content should be maximized at each processing stage. However, simple information maximization does not necessarily imply better performance in multi-layered neural networks; the information content should instead be increased appropriately at each processing stage. Experimental results on the artificial data and the spam data set showed that appropriate control of the information increase was essential for improving generalization performance. One of the main problems is that the present method sometimes tended to produce results with large variances, so a method to stabilize learning needs to be developed. Although some problems remain to be solved, the present results certainly show that appropriate control of the information content is one of the most important factors in training multi-layered neural networks with SOM knowledge.

References

[1] R. Linsker, "Self-organization in a perceptual network," Computer, vol. 21, pp. 105-117, 1988.
[2] R. Linsker, "How to generate ordered maps by maximizing the mutual information between input and output," Neural Computation, vol. 1, pp. 402-411, 1989.
[3] R. Linsker, "Local synaptic learning rules suffice to maximize mutual information in a linear network," Neural Computation, vol. 4, pp. 691-702, 1992.
[4] R. Linsker, "Improved local learning rule for information maximization and related applications," Neural Networks, vol. 18, pp. 261-265, 2005.
[5] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, 2006.
[6] G. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.
[7] G. E. Hinton, "Learning multiple layers of representation," Trends in Cognitive Sciences, vol. 11, no. 10, pp. 428-434, 2007.
[8] Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1-127, 2009.
[9] R. Kamimura and R. Kitajima, "SOM knowledge induced learning with maximum information principle to improve multi-layered neural networks," in Proceedings of Computational Intelligence Conferences, 2015.
[10] K. Bache and M. Lichman, "UCI Machine Learning Repository," 2013.