HYBRID MODELS BASED ON FUZZY LOGIC AND NEURAL NETWORKS

Size: px

Start display at page:

Download "HYBRID MODELS BASED ON FUZZY LOGIC AND NEURAL NETWORKS"

Marlene Morgan
5 years ago
Views:

1 CHAPTER 3 HYBRID MODELS BASED ON FUZZY LOGIC AND NEURAL NETWORKS 3.0 INTRODUCTION Hybrid systems clubbing two or more Machine Learning Algorithms such as Fuzzy Logic Artificial Neural Network and evolutionary computation are proved to be very effective in addressing life problems. Every intelligent technique has particular computational properties (e.g. ability to learn, explanation of decisions) that make them suited for solving a particular problem and cannot be generalised. For instance though ANN's are capable of handling patterns recognition, the method of classification in ANN cannot be explained while Fuzzy logic systems, which can handle imprecise information, explain the decisions and the rules on which the decisions are taken, cannot automatically acquire the rules they use to make those decisions. These limitations have been a central driving force behind the creation of intelligent hybrid systems where two or more techniques are combined in a manner that overcome the limitations of individual techniques and utilise their capabilities. (Castellano et all, 2007 ) The varied nature of application domains also demands the importance of constructing Hybrid systems. Many complex domains have many different component problems, each of which may require different types of processing. If there is a complex application which has two distinct sub-problems, of two different kinds then a neural network and an expert system respectively can be used for solving these separate tasks. The use of intelligent hybrid systems is with successful applications in many areas including medical diagnosis..( Fan et al 2011) This chapter explains in detail the various hybrid models which are developed using both neural network adaptive capabilities and Fuzzy logics qualitative approach and also describes.the fundamental concepts based on which these models have been developed. The efficiency of such hybrid model is illustrated through its application on a real life data set on diabetes. 45

2 3.1 FUZZY-NEURAL HYBRID SYSTEM While fuzzy logic provides an inference mechanism under cognitive uncertainty, computational neural networks offer exciting advantages, such as learning, adaptation, fault-tolerance, parallelism and generalization. To enable a system to deal with cognitive uncertainties in a manner more like humans, one may incorporate the concept of fuzzy logic into the neural networks (Fuller,1995). The hybridization of both paradigms yields the capabilities of learning, good interpretation and incorporating prior knowledge. The combination can be in different forms. The simplest form may be the concurrent neurofuzzy model, where a fuzzy system and a neural network work separately. The output of one system can be fed as the input of the other system..(zhang 2011) The cooperative neurofuzzy model corresponds to the case that one system is used to adapt the parameters of the other system. Neural networks can be used to learn the membership values for fuzzy systems, to construct IF-THEN rules or to construct a decision logic (Tran et al, 2003). The true synergy of the two paradigms is a hybrid neural/fuzzy system, which captures the merits of both the systems. It can be in the form of either a fuzzy neural network or a neurofuzzy system. A hybrid neural-fuzzy system does not use multiplication, addition, or the sigmoidal function. Alternatively, fuzzy logic operations such as t-norm and t-conorm are used. A fuzzy neural network is a neural network equipped with the capability of handling fuzzy information, and it is more like a neural network, where the input signals, activation functions, weights, and/or the operators are based on the fuzzy set theory..(pedrycz, 1993) Thus, symbolic structure is incorporated. The network can be represented in an equivalent rulebased format, where the premise is the concatenation of fuzzy AND and OR logic, and the consequence is the network output. Two types of fuzzy neurons, namely AND neuron and OR neuron, are defined. The NOT logic is integrated into the weights. Weights always have values in the interval [0, 1], and negative weight is achieved by using the NOT operator. The outputs of the AND and OR neurons are shown in Figure

3 Figure 3.1 Two types of fuzzy neurons a. AND neuron b. OR neuron A neurofuzzy system is a fuzzy system whose parameters are learned by a learning algorithm obtained from neural networks. It can always be interpreted as a system of fuzzy rules. Learning is used to adaptively adjust the rules in the rule base, and to produce or optimize the membership functions of a fuzzy system. A neurofuzzy system has a neural-network architecture constructed from fuzzy reasoning. Structured knowledge is codified as fuzzy rules, while the adapting and learning capabilities of neural networks are retained. Expert knowledge can increase learning speed and estimation accuracy. (Chitra & Seenivasagam, 2013) Both fuzzy neural networks and neurofuzzy systems can be treated as neural networks, where the units employ the t-norm or t-conorm operator instead of an activation function. The hidden layers represent fuzzy rules. Neurofuzzy systems can be obtained by representing some of the parameters of a neural network, such as the inputs, weights, outputs, and shift terms as continuous fuzzy numbers. When only the input is fuzzy, it is a Type-I neurofuzzy system. When everything except the input is fuzzy, we get a Type-II model. A Type-III model is defined as one where the inputs, weights, and shift terms are all fuzzy (Du and Swamy,2006) (Blum & Socha, 2005) 3.2 FUZZY NEURO SYSTEMS Fuzzy Neuron Definition: AND Fuzzy Neuron (Fuller,1995) The input signal x i and the corresponding weight w i are combined by a triangular conorm S to produce 47

4 p i = S(w i, x i ), i = 1, 2. (3.1) The input information p i is aggregated by a triangular norm T to produce the output y = AND(p 1, p 2 ) = T(p 1, p 2 ) = T(S(w 1, x 1 ), S(w 2, x 2 )) (3.2) of the neuron. So, if T = min and S = max then the AND neuron realizes the min-max composition y min w1 x1, w2 x2. (3.3) Figure 3.2 AND Fuzzy Neuron Definition: OR Fuzzy Neuron (Fuller,1995) The input signal x i and the corresponding weight w i are combined by a triangular norm T to produce p i =T (w i, x i ), i = 1, 2 (3.4) The input information p i is aggregated by a triangular norm T to produce the output y = OR(p 1, p 2 ) = S(p 1, p 2 ) = S(T(w 1, x 1 ), T(w 2, x 2 )) (3.5) of the neuron. So, if T = min and S = max then the AND neuron realizes the min-max composition 48

5 y min w1 x1, w2 x2. (3.6) Figure 3.3 OR Fuzzy Neuron Approximate Reasoning Based Intelligent Control Berenji s Approximate Reasoning.(Zhang 2011) is a Fuzzy Neuro system Based Intelligent Control(ARIC) architecture (Figure 3.4). It is a neural network model of a fuzzy controller and learns by updating its prediction of the physical system s behaviour and fine tunes a predefined control knowledge base. Figure 3.4 Berenji s ARIC Architecture ARIC consists of two coupled feed-forward neural networks, the Actionstate Evaluation Network (AEN) and the Action Selection Network (ASN). The ASN is a multilayer neural network representation of a fuzzy controller. It consists of two separated nets, where the first one is the fuzzy inference part and the second 49

6 one is a neural network that calculates p[t,t+1], a measure of confidence associated with the fuzzy inference value u(t+1), using the weights of time t and the system state of time t+1..(zhang 2011) A stochastic modifier combines the recommended control value u(t) of the fuzzy inference part and the so called probability value p and determines the final output value t out, pt, t1 u (3.7) of the ASN. The hidden units z i of the fuzzy inference network represent the fuzzy rules, the input units x j the rule antecedents, and the output unit u represents the control action, that is the defuzzified combination of the conclusions of all rules (output of hidden units). In the input layer the system state variables are fuzzified. Only monotonic membership functions are used in ARIC, and the fuzzy labels used in the control rules are adjusted locally within each rule. The membership values of the antecedents of a rule are then multiplied by weights attached to the connection of the input unit to the hidden unit. The minimum of those values is its final input. In each hidden unit a special monotonic membership function representing the conclusion of the rule is stored. Because of the monotonicity of this function the crisp output value belonging to the minimum membership value can be easily calculated by the inverse function. This value is multiplied with the weight of the connection from the hidden unit to the output unit. The output value is then calculated as a weighted average of all rule conclusions. The AEN tries to predict the system behavior. It is a feed-forward neural network with one hidden layer that receives the system state as its input and an error signal r from the physical system as additional information. The output vt t, of the network is viewed as a prediction of future reinforcement, that depends of the weights of time t and the system state of time t, where t may be t or t+1. Better states are characterized by higher reinforcements. The weight changes are determined by a reinforcement procedure that uses the output of the ASN and the AEN. The ARIC architecture was applied to cart-pole balancing and it was shown that the system is able to solve this task (Berenji,1992, Fuller,1995). 50

7 3.3 NEUROFUZZY MODELS The ANFIS Model Architecture of the ANFIS Model : The ANFIS is a well-known neurofuzzy model. The ANFIS model,.( Jang & Mizutani (1996) shown in Figure 3.5 has a six-layer (n-nk-k-k-k-1) architecture, and is a graphical representation of the TSK model. Figure 3.5 ANFIS: Graphical representation of the TSK Model The functions of the various layers are given below. 1. Layer 1 is the input layer with n nodes. 2. Layer 2 has nk nodes, each outputting the membership value of the i th antecedent of the j th rule 2 ij A i j x i (3.8) i for i = 1,, n, j = 1,...,K, where A j defines a partition of the space of x i, and x Ai i is typically selected as a generalized bell MF defined by j i i i A i x i x i ; c j, a j, b j (3.9) j 51

8 The parameters c i i i j, a j and b j are referred to as premise parameters. 3. Layer 3 has K fuzzy neurons with the product t-norm as the aggregation operator. Each node corresponds to a rule, and the output of the j th neuron determines the degree of fulfilment of the j th rule 3 o j x (3.10) n i1 A i j i for j=1,2,3...k 4. Each neuron in layer 4 performs normalisation, and the outputs are called normalised firing strengths 4 o j o3 j K k k1 3 o (3.11) for j=1,2,3...k. 5. The output of each node in layer 5 is defined 4 f x o 5 j o j j (3.12) for j=1,2,3...k, where f j (.) is given for the j th are referred to as consequent parameters. node in layer 5.Parameters in f j (x) 6. The outputs of layer 5 are summed and the output of the network gives the TSK model K o6 5 j o j (3.13) j1 52

9 Learning of the ANFIS Model : In the ANFIS model, functions used at all the nodes are differentiable, thus the BP algorithm can be used to train the network. Each MF j is specified by a predefined shape and its corresponding shape A i parameters. The shape parameters are adjusted by a learning algorithm using a sample set of size N, {(x t, y t )}. For nonlinear modeling, the effectiveness of the model is dependent on the MFs used. The TSK fuzzy rules are employed in the ANFIS model for i = 1,,K, where A A 1, A2, A3..., An a j n Ri: IF x is Ai, THEN y = f i (x) = ai 0 j 1, j x j a i, i i i i i are fuzzy sets and i,, j 0,1,2... n are consequent parameters. The output of the network at time t is thus given by yˆ t K Ai t i1 K A i i1 x f x x t i (3.14) where Ai n n j 1 j t, j t, j A j 1 A i i j x x x. t Accordingly, the error measure at time t is defined by Et 1. yˆ t yt 2 (3.15) 2 After the rule base is specified, the ANFIS adjusts only the MFs of the antecedents and the consequent parameters. The BP algorithm can be used to train both the premise and consequent parameters. A more efficient procedure is to learn the premise parameters by the BP, but to learn the linear consequent parameters a i,j 53

10 by the RLS method (Jang,1993). The learning rate η can be adaptively adjusted by using a heuristic, for example, increase η by 10% if the error decreases in four consecutive steps, and decrease η by 10% if the error is subject to consecutive combinations of increase and decrease. It is reported in (Jang,1993) that this hybrid learning method provides better results than the MLP trained by the BP method and the cascade-correlation network (Fritzke, B. (1994).) Second-order methods are also applied for the training of the ANFIS. The LM method is used for ANFIS training. (Jang,1996), Compared to the hybrid method, the LM method achieves a better precision, but the interpretability of the final MFs is quite weak. RLS methods are used to learn the premise parameters and the consequent parameters, respectively.( Liu, Tseng (1999) The ANFIS model has been generalized for classification by employing parameterized t-norms (Sun,1994). The generalized model employs tree partitioning for structure identification and the Kalman filtering method for parameter learning. The ANFIS is attractive for applications in view of its network structure and the standard learning algorithm. Training of the ANFIS follows the spirit of the minimal disturbance principle and is thus more efficient than the MLP (Jang and Sun,1995). However, the ANFIS is computationally expensive due to the curse-of dimensionality problem arising from grid partitioning. Constraints on MFs and initialization using prior knowledge cannot be provided to the ANFIS model due to the learning procedure. The learning results may be difficult to interpret. Thus, the ANFIS model is suitable for applications, where performance is more important than interpretation. (Du and Swamy 2006 ) In order to preserve the plausibility of the ANFIS, one can add some regularization terms to the cost function so that some constraints on the interpretability are considered (Jang and Sun,1995). Tree or scattering partitioning can resolve the problem of the curse of dimensionality, but leads to a reduction in the interpretability of the generated rules. 54

11 Generalization of the ANFIS Model The ANFIS has been extended to the coactive ANFIS (Mizutani and Jang,1995) and to the generalized ANFIS (Azeem et al.,2000). The coactive ANFIS is a generalization of the ANFIS by introducing nonlinearity into the TSK rules. The generalized ANFIS is based on a generalization of the TSK model and a generalized Gaussian RBFN. The generalized fuzzy model is trained by using the generalized RBFN model, based on the functional equivalence between the two models. The sigmoid-anfis (Zhang et al.,2004) is a special form of the ANFIS, where only sigmoidal MFs are employed in this model. The sigmoid-anfis is a combination of the additive TSK-type MLP and the additive TSK-type FIS. The additive TSK-type MLP, as an extended model of the MLP, is proved to be functionally equivalent to the TSK-type FIS and to be a universal approximator (Zhang et al.,2004). The sigmoid-anfis model adopts the interactive-or operator (Benitez et al.,1997) as its fuzzy connectives. The gradient-descent algorithm can also be directly applied to TSK model without representing it in a network structure (Nomura et al.,1992). The unfolding-in-time is a method to transform an RNN into an FNN so that the BP algorithm can be used. The ANFIS-unfolded-in-time (Yilmaz et al.,2004) is a method that duplicates the ANFIS T times to integrate temporal information, where T is the number of time intervals needed in the specific problem. The ANFISunfolded-in-time is designed for prediction of time series data. Simulation results show that the recognition error is much smaller in the ANFIS-unfolded-in-time compared to that in the ANFIS Generic Fuzzy Perceptron The generic fuzzy perceptron (GFP)) (Nauck et al.,1997) has a structure similar to that of the three-layer MLP. In contrast to the MLP, the network inputs and the weights are modeled as fuzzy sets, and t-norm or t-conorm is used as the activation at each unit. The hidden layer acts as the rule layer. The output units usually use a defuzzufication function. The GFP can interpret its structure in the form of linguistic rules and the structure of the GFP can be treated as a linguistic rule base, where the weights between the input and hidden (rule) layers are called 55

12 fuzzy antecedent weights and the weights between the hidden (rule) and output layers fuzzy consequent weights. The GFP model is based on the Mamdani model. Based on the GFP model, there are three fuzzy models, namely, neurofuzzy controller (NEFCON) ( Nauck & Kruse (1992) Nauck (1997) Nurnberger et al (1999) neurofuzzy classification (NEFCLASS) and neuronfuzzy function approximation (NEFPROX) )( Nauck et al.,1997).due to the use of non-differentiable t-norm and t- conorm, the gradient-descent method cannot be applied. A set of linguistic rules are used for describing the performance of the models. This knowledge-based fuzzy error is independent of the range of the output value. Learning algorithms for all these models are derived from the fuzzy error using simple heuristics. Initial fuzzy partitions are needed to be specified for each input variable. Some connections that have identical linguistic values are forced to have the same weights so as to keep the interpretability. Prior knowledge can be integrated in the form of fuzzy rules to initialize the neurofuzzy systems, and the remaining rules are obtained by learning. Both the fuzzy sets and the fuzzy rules can be learned, and learning results in shifting the MFs and changing their supports. The NEFCON has a single output node, and is used for control. A reinforcement learning algorithm is used for online learning. The NEFCON tries to integrate as many rules known a priori as possible. The NEFCLASS and the NEFPROX can learn rules by using supervised learning instead of reinforcement learning. The rule base of a NEFCLASS system approximates an unknown function that represents the classification problem and maps an input to its class. Compared to neural networks, the NEFCLASS uses a much simpler learning strategy, where no clustering is involved in finding the rules. The NEFCLASS does not use MFs in the rules consequents. The NEFPROX is similar to the NEFCON and the NEFCLASS, but it is more general. As an example of the GFP model, the architecture of the NEFPROX is shown in Figure 3.6. If there is no prior knowledge, a NEFPROX system can be started with no hidden unit and rules can be incrementally learned. If the learning algorithm creates too many rules, only the best rules are kept by evaluating 56

13 individual rule errors. Each rule represents a number of samples of the unknown function in the form of a fuzzy sample. Parameter learning is used to compensate for the error caused by rule removing. Figure 3.6 Architecture of NEFPROX The NEFPROX is more important for function approximation. An empirical performance comparison between the ANFIS and the NEFPROX has been made in (Nauck et al.,1997). For the problems considered therein, the NEFPROX is an order of magnitude faster than the ANFIS model of (Jang,1993), but with a higher approximation error. This is due to the fact that the ANFIS uses a complex learning algorithm, which results in a longer computation time and small approximation error. Interpretation of the learning result is difficult for both the ANFIS and the NEFPROX: the ANFIS represents a TSK system, while the NEFPROX represents a Mamdani system with too many rules. To increase the interpretability of the NEFPROX, pruning strategies can be employed to reduce the number of rules.(du and Swami,2006) 3.4 OTHER NEURO FUZZY MODELS Neurofuzzy systems can employ network topologies similar to those of the layered FNN architecture, ( Hayashi et al (1993), Juang & Lin (1998) Chow et al (1999), Denoeux & Masson (2004) the RBFN model (Akhmetov et all (2001), Wu & Er MJ (2000), Pedrycz et al (2005) the SOM model (Vuorimaa,1994), and the RNN architecture ( Lin & Wai (2001) Liu (2000) ) 57

14 Neurofuzzy models typically have a layered FNN architecture and are based on TSK-type FISs. Neurofuzzy systems are usually trained by using the gradientdescent method (Kim & Kasabov (1999) Lin & Lu (1996) Ishibuchi et al (1993), Hayashi et al (1993), Pal & Mitra (1992), Liu & Li (2004) Tsekouras et al (2004)). Gradient descent in this case is sometimes termed as the fuzzy BP algorithm. CG algorithms are also used for training a neurofuzzy systems (Liu and Li,2004). Based on the fuzzification of the linear auto associative neural networks, the fuzzy PCA (Denoeux and Masson,2004) can extract a number of relevant features from highdimensional fuzzy data. A typical architecture of a neurofuzzy system includes an input layer, an output layer, and several hidden layers. The weights are fuzzy sets, and the neurons apply t-norm or t-conorm operations. The hidden layers are usually used as rule layers. The layers before the rule layers perform as premise layers, while those after perform as consequent layers.(du and Swamy 2006). 3.5 CLASSIFIERS The most significant application for Machine Learning is Data Mining. The data mining techniques find out the relationship between the multiple features of the input-output variables. The success of ML is assessed by the efficiency of the techniques as classifiers. Every instance in any data set used by ML algorithms represented using a set of features which remains the same for all data in the database. The features may be continuous, categorical or binary. If instances are given with the corresponding outputs (labels) then the learning is called supervised learning and in unsupervised learning the instances are unlabeled. The findings in the data set of unsupervised learning is to discover useful classes or to find the clusters of the data set (Kotseantis and Pintelas,2004).The classifier can be categorized into the groups as learning methods and rule between classifiers. Rule based classifiers are important to generate decision rules that have high predictability. Perceptron based techniques can only classify linearly separable set of records. If the instances or records or not linearly separable, classification is not done properly and hence Artificial Neural Network is used for classification. In the following section, the classifiers that are used in this research work have been discussed. 58

15 3.5.1 Fuzzy Classifier System (FCS) FCS learns by creating fuzzy rules which relates the value of the input variables to output variables. It has credit assignment mechanisms which also resemble those of common classifier systems, but with a fuzzy nature (Rendon, 2004).It represents its fuzzy rules as binary strings and allows inputs, outputs and internal variables to take continuous values over given range similar to fuzzy controllers. The FCS represents all its knowledge by means of fuzzy rules. The FCS operates over variables (input, output, internal) and generates outputs according to Fuzzy Rules. These rules are represented as Binary Strings that encode the membership functions of the fuzzy sets involved in the fuzzy rules. FCS linearly maps all variables to the range [0,1]. For each variable, n component fuzzy sets are defined by the user depending on the precision required. The FCS must translate the messages referring to output variable into real values. This process of defuzzification is accomplished in the output unit by decomposing each output message with its corresponding minimal messages. The activity level of minimum messages corresponding to the same variable and component sets are added up and the messages are substituted by a single message. For each output variable a fuzzy union is performed over the component sets represented by the minimal message multiplied by their activity levels. Then the gravity center of the union is taken and this gravity center is its output value (Procyk and Mamdani, 1979) k Nearest Neighbour (knn) knn is based on the principle that the instances within a dataset will generally exist in a close proximity to other instances that have similar properties. If the instances are tagged with a classification label, then the value of unclassified instances can be determined by observing the class of its nearest neighbours. The knn locates the k nearest instances to the query instance and determines its class by identifying the single most frequent class label. The instances can be considered as points within an n dimensional space where each of the n dimensions correspond to one of the n features that are used to describe an instance. The absolute position 59

16 of the instances within this space is not significant as the relative distance between instances. This relative distance is determined by using a distance metric. The distance metric must minimize the distance between two similarly classified instances, while maximizing the distance between instances of different classes. Though the knn has been demonstrated in a number of real life applications, the major disadvantage is it takes large computational time for classification, requires large storage and is also sensitive to the choice of the similarity function that is used to compare distances (Kotseantis and Pintelas, 2004) Bayesian Network A Bayesian Network (BN) is a graphical model for probability relation among a set of features. The Bayesian Network structure S is a Directed Acylic Graph (DAG) and the nodes in S are in one to one correspondence with the features X. A feature (node) is conditionally independent from its non-dependents given its parents (X i is conditionally independent from X 2 given X 3 if P( X 1 X 2 X 3 ) =P( X 1 X 3 ) for all possible values of X 1, X 2, X 3. The task of learning a Bayesian Network can be divided into two tasks; one is learning the DAG structure of the network and then determination of its parents. A BN structure can also be found by learning conditional independence relationships amongst the features of the data set. These algorithms are constraint based algorithms. The most important feature of BN compared to ANN is the possibility of taking into account prior information about a given problem in terms of structural relationship among its features and BN classifiers are not suitable for data sets with many features while the major advantage of Naïve Bayes classifier is its short computational time for training (Kotseantis and Pintelas,2004) Support Vector Machines Support vector machines (SVM) represent data points in an N dimensional feature space. The SVM is a binary classifier that marks each data point into one of the two categories. Each category is grouped from the other as far as possible. An SVM basically creates a hyper plane dividing the two classes in an N dimensional 60

17 space that is used for classification (Figure 4.3). More than one or multiple classifiers can be built for a set of given data. The best hyper plane is the one that separates the classes as far as possible or one that maximizes the margin between the two data classes. The SVM takes in many parameters like the type of kernel function, slack constant, etc. Slack constant defines how many wrong classified samples can be misclassified in the other class. The classifier also uses several kernels like polynomial, radial basis function (RBF) and quadratic functions. The polynomial kernel needs the degree of the polynomial as input and the RBF would need a sigma value as input to name a few. SVM classifier can be more flexibly used by inputting the optimization method too Cheng & Cheng, 2011). Fig 3.7 An Illustration of the hyper plane in two dimensional feature space of SVM Support Vector Machine estimates the hyper plane from the support vectors (SVs). The SVs are individuals from each class or population that are chosen as representative of their class, so that they carry all relevant information about the classification problem. The SVs are also chosen so as to minimize their distance to the hyper plane (Sun et al,2002). 3.6 APPLICATION TO DIABETES DATA To prove the efficacy of the hybrid techniques developed using soft computing methods for disease classification, the methods have been tried to classify the same dataset. In the first part, the principal constituents of soft computing methods namely, Fuzzy Logic and Artificial Neural Network are used 61

18 as classifiers for classification of the dataset. Besides these techniques, the Baye s classifier, k Nearest Neighborhood and Support Vector Machine have also been tried for classification of the chosen dataset. Finally, ANFIS which is a fusion of ANN and FL, is applied on the dataset as a classifier. MATLAB software of version (R2011a) is used for the purpose of classification using ANN, SVM, knn and Baye s classifier. WEKA software is used for the application of FL. As a case study, the PIMA Indian Diabetes Dataset from the UCI Depository of Machine learning (ML) Data base is taken.. The objective is to prove the efficiency of the hybrid model formed by clubbing two of the Machine Learning techniques Fuzzy Logic and Artificial Neural Network PIMA Indian Diabetes Data base This data base consists of 768 records. This is a collection of all female patients of at least 21 years old of PIMA Indians. The record has information on eight condition attributes which are either integer or real values and also has a binary decision attribute that reveals the presence or absence of diabetes. The Eight conditional attributes included for study are, the number of times pregnant, Plasma Glucose concentration after two hours in oral Glucose Tolerance Test, Diastolic blood pressure(mmhg),triceps skin fold thickness (mm),two hours serum insulin(mu/u/ml), Body Mass Index(wt in kg/(height in m)**2), Diabetes pedigree function and Age(in years) The data set is partitioned into training set, test set and validation set as given in Table 3.1. Table 3.1 Data Partition Set S. No. Data Set Recorder % 1. Training Set Validation Set Test Set TOTAL

19 3.6.2 Performance evaluation methods A number of useful performance metrics such as classification accuracy sensitivity, specificity and the area under the Receiver Operating Characteristic Curve (ROC) are computed. The results are analysed and the performance of the techniques have been compared. Classification accuracy : `Classification accuracy' is the most common evaluation method to measure the performance of the classifiers. Most researchers use it as the main parameter of criteria to evaluate the performance of their techniques. The higher the classification accuracy, the better the classifier is. In this study, the classification accuracy is calculated by using the following formula: Classification accuracy = (TP + TN) / Total No. Of pattern Where TP : No. Of true positives and TN : No. Of true negatives Performance matrix : A performance matrix represents information about actual and classified cases produced by a classification system. Performance of such system is commonly evaluated by demonstrating the correct and incorrect patterns. The typical construction of the performance matrix for the two class problem is presented in Table 3.2 Table 3.2 Performance Matrix Test Positive Actual Negative Positive TP FP Negative FN TN Sensitivity :The sensitivity measure was first introduced in medical domain to ensure the test ability of the classifier.(verma et al,2010). Sensitivity regards only positive cases, for instance, it can be used to find patients with observed final 63

20 diagnosis. Sensitivity is computed as a number of true positive decisions over a number of actual positive cases. That is, the sensitivity is the probability of correctly predicting the positive cases. Sensitivity = TP/(TP +FN) where TP = No. of True Positive cases and FN = No. of False Negative cases Specificity : The specificity measure was also established in medical domain and computed in the same fashion as sensitivity.(verma et al,2010).. The difference is that it deals only with negative cases.. The specificity can be calculated as the number of true negative decisions over number of actual negative cases. That is the specificity is the probability correctly predicting the negative cases Specificity= TN/(FP +TN) cases. where TN = No. of True Negative cases and FP = No. of False Positive Receiver Operating Characteristic (ROC) : The classification potential in discriminating the objects in different decision classes is measured in the form of ROC cure or as an approximation of the Area Under the Curve (AUC). The ROC is a plot of sensitivity verses (1-specificity) in the range of 0 to 1, where straight line from (0,0) to (1,1) represents no discriminating power and a line from (0,1) to (1,1) represents perfect discrimination. 3.7 ARTIFICIAL NEURAL NETWORK (ANN) MODEL A Multilayer perception Model of ANN using Sigmoid transfer function was constructed for classification. The ANN is trained with Back Propagation Algorithm for minimizing the error in each training epoch. The performance of the ANN is assessed by the test data set. The number of hidden layer used is only one and the number of nodes in the hidden layer was optimized based on the mean square error obtained from the test data set. The diagnostic variables were fed into the network and the output was calculated. 64

Output values given by the ANN is compared with the target value and the difference is the error. If there are n records used for testing the ANN, Error = (3.

21 Output values given by the ANN is compared with the target value and the difference is the error. If there are n records used for testing the ANN, Error = (3.16) where t k and y k are the target and output values. Error minimization is achieved using gradient descent algorithm Experimental set up for ANN The parameters for developing the ANN Model are chosen as follows: No. of Hidden layers used : 1 Number of output nodes : 1 Number of input nodes : 8 Hidden layer activation function Output layer activation function : sigmoidal : purelin Achieved MSE error : No. of training samples : 537 No. of testing samples : 154 No. of validating sample : 77 Fig. 3.8 MSE OF ANN MODEL 65

22 The architecture of the ANN Model is shown in figure 3.9 Input Layer Hidden Layer Output Layer Figure 3.9 ANN architecture The 8 nodes in the input layer represents 8 input features 1 Number of times pregnant 2 Plasma glucose concentration 3 Diastolic blood pressure 4 Triceps skin fold thickness 5 2 hour serum insulin 6 Body Mass Index 7 Diabetes pedigree function 8 Age The output layer represents either diabetic or non diabetic case Though ANN learns through adaptive learning, its performance can be improved through the proper designing of its learning parameters viz. number of hidden layers and number of nodes in the hidden layers and activation functions that are used for designing the architecture of ANN. The architecture of ANN

23 One hidden layer is used for designing the ANN and the optimum number of nodes for the hidden layer is found to be 10. Ten fold cross validation is applied and the average accuracy is taken as the classifier performance. Number of nodes Table 3.3 Performance of the ANN for varying number of nodes Classification Accuracy MSE Sensitivity Specificity Precision FUZZY LOGIC MODEL For Fuzzy Logic classification, the novel Fuzzy Rule based classification method, Fuzzy Unordered Rule Induction Algorithm (FURIA) has been used. FURIA rules are the extended algorithm of the well-known RIPPER algorithm. (Cohen 2002 ). FURIA algorithm learns fuzzy rules instead of conventional rules and unordered rules instead of rule lists and it extends an efficient rule stretching methods for dealing with uncovered examples (Huhn and Hullermer,2006). 3.9 ANFIS MODEL ANFIS is a combination Neural Networks and Fuzzy System. In this approach Neural networks is used to determine the parameters of the Fuzzy System. The Fuzzy System is automatically improved through the Artificial Neural Network. ANFIS has shown significant results in modelling nonlinear functions. In ANFIS, the membership function parameters are extracted from a dataset that describes the system behaviour. The ANFIS learns features in the data set and adjusts the system parameters according to a given error criterion( Jang,1993). The ANFIS presented in this study was trained with the back propagation gradient descent method in combination with the least squares method. The ANFIS classifier was then trained validated and tested. The eight input variables are fuzzified using linguistic variables. The configuration of the eight input variables fed into the ANFIS system and the corresponding Mean Squared Error of ANFIS Model is presented in Table

24 Table 3.4 Configuration of ANFIS S.No Configuration Error RESULTS The performance matrices showing the performances of the Fuzzy, ANN and ANFIS models are shown in tables 3.5, 3.6, 3.7, 3.8, 3.9 & 3.10 respectively. Table 3.5 Performance Matrix for ANN Model Predicted Target Class Table 3.6 Performance Matrix for FL Model Predicted Target Class 68

25 Table 3.7 Performance Matrix for ANFIS Model Predicted Target Class Table 3.8 Performance Matrix for SVM Model Predicted Target Class Table 3.9 Performance Matrix for Baye s Model Predicted Target Class Table 3.10 Performance Matrix for knn Model Predicted Target Class 69

26 in Table 3.11 The performances of the all the classification techniques have been presented Table 3.11 Performance Of Different Classifiers On The PIMA Indian Diabetes Data Set Classifier Sensitivity Specificity Precision Classification Accuracy ANN Fuzzy Logic ANFIS SVM Baye s knn The performance curves of the ANN and the ANFIS models are shown in figures 3.10 and 3.11 and the ROC curves of the two models are shown in figures 3.12 and Figure 3.10 Performance of ANN Figure 3.11 Performance of ANFIS 70

27 Figure 3.12 ROC Curve for ANN Figure 3.13 ROC Curve for ANFIS 71

28 3.11 CONCLUSION The soft computing techniques fuzzy logic, artificial neural network and the hybrid technique ANFIS which is a combination of fuzzy logic and neural network are applied for the classification of PIMA diabetes data set. Besides the soft computing method, Machine Learning techniques SVM, Baye's classifier and knn have been used for classification of PIMA Indian Diabetes Data Set. The hybrid model ANFIS out performs the other models. The soft computing methods perform better than the other Machine Learning techniques for disease classification. 72

Unit V. Neural Fuzzy System

Unit V. Neural Fuzzy System Unit V Neural Fuzzy System 1 Fuzzy Set In the classical set, its characteristic function assigns a value of either 1 or 0 to each individual in the universal set, There by discriminating between members