Neural Networks Lesson 9 - Fuzzy Logic

Neural Networks Lesson 9 - Prof. Michele Scarpiniti INFOCOM Dpt. - Sapienza University of Rome http://ispac.ing.uniroma1.it/scarpiniti/index.htm michele.scarpiniti@uniroma1.it Rome, 26 November 2009 M. Scarpiniti Neural Networks Lesson 9-1 / 37

Outline
1. Introduction to Fuzzy Logic
2. Fuzzy Clustering

Introduction to fuzzy logic

Fuzzy logic is a form of multi-valued logic for reasoning that is approximate rather than precise. It emerged from the fuzzy set theory proposed by Lotfi Zadeh in 1965. Although fuzzy logic has been applied in many fields, from control theory to artificial intelligence, it remains controversial among most statisticians, who prefer Bayesian logic, and among some engineers, who prefer traditional two-valued logic.

A logic based on only the two truth values True and False is sometimes inadequate for describing human reasoning. Fuzzy logic uses the whole interval between 0 (False) and 1 (True) to describe it.

Fuzzy sets

Fuzzy sets are a further development of the mathematical concept of a set. A set is any collection of objects that can be treated as a whole. Cantor described a set by its members: an item from a given universe either is a member or is not. A finite set can be specified completely by listing its members, for example A = {0, 1, 2, 3}. Nobody can list all the elements of an infinite set, so we must instead state a property that characterizes its elements, for instance the predicate x > 10.

According to Zadeh, many sets have more than an either-or criterion for membership. Take the set of young people. A one-year-old baby is clearly a member and a 100-year-old person is clearly not, but what about people aged 20, 30, or 40? Other examples come from weather reports: high temperatures, strong winds, nice days. Zadeh proposed a grade of membership, so that the transition from membership to non-membership is gradual rather than abrupt.

Fuzzy sets

A fuzzy set is thus described by the grades of membership of all its members. An item's grade of membership is normally a real number between 0 and 1, often denoted µ. The higher the number, the higher the membership. Zadeh regards Cantor's set as a special case in which members have full membership, µ = 1, and non-members have none, µ = 0. He called Cantor's sets nonfuzzy; today the term crisp set is used.

Note that a fuzzy membership function is different from a statistical probability distribution. A possible event need not be probable; a probable event, however, must also be possible. You might view a fuzzy membership function as your personal distribution, in contrast with a statistical distribution based on observations.
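As a rough illustration of the difference between a crisp and a fuzzy set of young people, the sketch below uses a hard cut-off at 30 years and a linear ramp from 20 to 60 years. The thresholds and the function names are assumptions chosen for illustration; the slides leave the grade of membership to the reader's judgment.

```python
def crisp_young(age, limit=30):
    """Crisp set: a person is either young (1.0) or not (0.0)."""
    return 1.0 if age <= limit else 0.0


def fuzzy_young(age, full=20, none=60):
    """Fuzzy set: the grade falls linearly from 1 at `full` to 0 at `none`."""
    if age <= full:
        return 1.0
    if age >= none:
        return 0.0
    return (none - age) / (none - full)


# Compare the abrupt and the gradual transition for a few ages.
for age in (1, 20, 30, 40, 100):
    print(age, crisp_young(age), fuzzy_young(age))
```

The crisp version jumps from 1 to 0 at the cut-off, while the fuzzy version assigns intermediate grades to the ages in between.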

Fuzzy sets

Notice that Zadeh does not give a formal basis for determining the grade of membership. The membership of a 50-year-old in the set young depends on one's own view. The grade of membership is a precise but subjective measure that depends on the context.

The elements of a fuzzy set are taken from a universe. The universe contains all the elements that can come into consideration, and it too depends on the context: for example, the numbers between 0 and 100 or, for non-numerical quantities, the tastes bitter, sweet, sour, salty, hot, ...

Membership function

Every element in the universe of discourse is a member of the fuzzy set to some grade, possibly zero. The set of elements with non-zero membership is called the support of the fuzzy set. The function that assigns a grade to each element x of the universe is called the membership function µ(x).

A membership function can be represented in two ways: continuous or discrete. In the continuous form the membership function is a mathematical function, for example bell-shaped (also called a π-curve), s-shaped (an s-curve), reverse s-shaped (a z-curve), Gaussian, triangular, or trapezoidal. In the discrete form the membership function and the universe are discrete points in a list (vector). A sampled (discrete) representation is sometimes more convenient.

A fuzzy set is normalized if its largest membership value equals 1.
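The discrete form, the support, and normalization can be sketched with two parallel lists. The sample values are invented for illustration, and dividing by the peak grade is only one common way to normalize a set; the slides do not prescribe a method.

```python
# Discrete fuzzy set: universe points and their grades as parallel lists.
universe = [0, 10, 20, 30, 40, 50]
grades = [0.0, 0.2, 0.6, 0.8, 0.3, 0.0]


def support(universe, grades):
    """Elements of the universe whose grade is non-zero."""
    return [x for x, mu in zip(universe, grades) if mu > 0]


def is_normalized(grades):
    """True when the largest grade equals 1."""
    return max(grades) == 1.0


def normalize(grades):
    """Rescale the grades so that the largest one becomes 1 (assumed method)."""
    peak = max(grades)
    return [mu / peak for mu in grades]


print(support(universe, grades))   # [10, 20, 30, 40]
print(is_normalized(grades))       # False
```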

Membership function: example

Some well-known membership functions are reported below.

Triangular, with a < b < c:
$$\mu(x) = \max\left(\min\left(\frac{x-a}{b-a}, \frac{c-x}{c-b}\right), 0\right)$$

Trapezoidal, with a < b \le c < d:
$$\mu(x) = \max\left(\min\left(\frac{x-a}{b-a}, 1, \frac{d-x}{d-c}\right), 0\right)$$
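A minimal sketch of the two shapes follows. The triangle has feet at a and c and its peak at b. The trapezoid is written in its common four-breakpoint form (a, b, c, d), with a plateau of height 1 between b and c; that parameterization is an assumption of this sketch.

```python
def triangular(x, a, b, c):
    """Triangle with feet at a and c and peak at b (a < b < c)."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)


def trapezoidal(x, a, b, c, d):
    """Trapezoid rising on [a, b], flat on [b, c], falling on [c, d]."""
    return max(min((x - a) / (b - a), 1.0, (d - x) / (d - c)), 0.0)


print(triangular(2.5, 0, 5, 10))      # 0.5
print(trapezoidal(5, 0, 2, 4, 6))     # 0.5
```

The outer `max(..., 0)` keeps the grade at zero outside the support, and the inner `min` picks whichever edge is currently limiting.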

Membership function

Other well-known membership functions are reported below.

Gaussian:
$$\mu(x) = \exp\left(-\frac{(x-a)^2}{2b^2}\right)$$

Bell-shaped:
$$\mu(x) = \frac{1}{1 + \left|\frac{x-a}{c}\right|^{2b}}$$
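The Gaussian and bell-shaped functions can be sketched as below, reading a as the centre in both, b as the Gaussian width or the bell steepness, and c as the bell width. This reading of the parameters is an assumption based on the usual form of these curves.

```python
import math


def gaussian(x, a, b):
    """Gaussian centred at a with width b (b > 0)."""
    return math.exp(-((x - a) ** 2) / (2 * b ** 2))


def bell(x, a, b, c):
    """Bell curve centred at a; c sets the width and b the steepness."""
    return 1.0 / (1.0 + abs((x - a) / c) ** (2 * b))


print(gaussian(3, 3, 1))   # 1.0 at the centre
print(bell(4, 3, 2, 1))    # 0.5 at one width from the centre
```

The bell curve always crosses 0.5 at a distance c from its centre, whatever the value of b; increasing b only makes the shoulders steeper.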

Singletons

A fuzzy set A is a collection of ordered pairs
$$A = \{(x, \mu(x))\}$$
where the item x belongs to the universe and µ(x) is its grade of membership in A. A single pair (x, µ(x)) is called a fuzzy singleton, so the whole set can be viewed as the union of its constituent singletons.

It is often convenient to think of a set A simply as a vector
$$\mathbf{a} = [\mu(x_1), \mu(x_2), \ldots, \mu(x_n)]$$

Linguistic variables

Just as an algebraic variable takes numbers as values, a linguistic variable takes words or sentences as values. The set of values it can take is called its term set. Each value in the term set is a fuzzy variable defined over a base variable, and the base variable defines the universe of discourse for all the fuzzy variables in the term set. In short, the hierarchy is: linguistic variable → fuzzy variable → base variable.

Example
Let x be a linguistic variable with the label Age. Its terms come from the term set
T = {Old, VeryOld, NotSoOld, MoreOrLessYoung, QuiteYoung, Young, VeryYoung}
Each term is a fuzzy variable defined on the base variable, which might be the scale from 0 to 100 years.

A primary term is a term or set that must be defined a priori, for example Young and Old, whereas sets such as VeryYoung and NotYoung are obtained by modifying a primary term.

Linguistic variables: an example

Given the set X = {1.5, 1.6, 1.75, 1.8, 2}, we can define the following characterization, where each entry is written as element/membership:
1. short = {1.5/1, 1.6/0.8, 1.75/0.5, 1.8/0.2, 2/0};
2. normal = {1.5/0, 1.6/0.5, 1.75/1, 1.8/1, 2/0};
3. tall = {1.5/0, 1.6/0.2, 1.75/0.5, 1.8/0.8, 2/1}.

Operations on fuzzy sets

As for traditional sets, some elementary operations can be defined for fuzzy sets too.

Complement:
$$\mu_{\bar{A}}(x) = 1 - \mu_A(x)$$

Intersection:
$$\mu_{A \cap B}(x) = \min(\mu_A(x), \mu_B(x))$$

Union:
$$\mu_{A \cup B}(x) = \max(\mu_A(x), \mu_B(x))$$
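The three operations act element by element on the vector form of a set. The sketch below applies them to the normal and tall sets from the height example; the function names are chosen here for illustration.

```python
heights = [1.5, 1.6, 1.75, 1.8, 2.0]
normal = [0.0, 0.5, 1.0, 1.0, 0.0]
tall = [0.0, 0.2, 0.5, 0.8, 1.0]


def complement(a):
    """Grade of 'not A': 1 - mu_A(x)."""
    return [1.0 - mu for mu in a]


def intersection(a, b):
    """Grade of 'A and B': the smaller of the two grades."""
    return [min(x, y) for x, y in zip(a, b)]


def union(a, b):
    """Grade of 'A or B': the larger of the two grades."""
    return [max(x, y) for x, y in zip(a, b)]


print(intersection(normal, tall))  # [0.0, 0.2, 0.5, 0.8, 0.0]
print(union(normal, tall))         # [0.0, 0.5, 1.0, 1.0, 1.0]
```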

Fuzzy logic

Fuzzy logic invalidates two of the cornerstones of classical logic:
1. The law (or principle) of the excluded middle (tertium non datur in Latin): $A \lor \bar{A} = \text{True}$
2. The principle of contradiction (principium contradictionis in Latin): $A \land \bar{A} = \text{False}$

Reasoning in fuzzy logic therefore differs from classical reasoning. Natural and artificial languages usually have rules of the type "if ... then ...". For example: if it is raining then we get wet; if the pressure is high then the volume is small.
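A short numerical check makes the point concrete. With the max, min, and complement operations on membership grades, "A or not A" is fully true and "A and not A" is fully false only when the grade is exactly 0 or 1. The grade 0.75 below is an arbitrary example value.

```python
def excluded_middle(mu):
    """Truth of 'A or not A' using max and the fuzzy complement."""
    return max(mu, 1.0 - mu)


def contradiction(mu):
    """Truth of 'A and not A' using min and the fuzzy complement."""
    return min(mu, 1.0 - mu)


for mu in (0.0, 0.75, 1.0):
    print(mu, excluded_middle(mu), contradiction(mu))
# Crisp grades give 1.0 and 0.0; the grade 0.75 gives 0.75 and 0.25.
```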

Fuzzy reasoning

In fuzzy reasoning the linguistic variables contained in a rule are defined by fuzzy sets:

if x is A then y is B

where
1. A and B are fuzzy sets defined on the universes X and Y;
2. the values satisfy $x \in X$ and $y \in Y$;
3. the fuzzy rule defines a relation R on the space $X \times Y$;
4. a fuzzy relation is a fuzzy set defined on several domains, $X \times Y \times Z \times \ldots$, with membership function $\mu_R(x, y, z, \ldots)$.

For example, for the binary relation $R = A \times B$:
$$\mu_R(x, y) = \min(\mu_A(x), \mu_B(y))$$
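On discrete universes the binary relation becomes a matrix with one row per element of X and one column per element of Y. The grades below are hypothetical and serve only to show the construction.

```python
def relation(mu_a, mu_b):
    """Relation matrix with R[i][j] = min(mu_A(x_i), mu_B(y_j))."""
    return [[min(a, b) for b in mu_b] for a in mu_a]


mu_a = [0.0, 0.5, 1.0]   # hypothetical grades of A on x_1, x_2, x_3
mu_b = [0.2, 1.0]        # hypothetical grades of B on y_1, y_2

R = relation(mu_a, mu_b)
print(R)  # [[0.0, 0.0], [0.2, 0.5], [0.2, 1.0]]
```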

Fuzzy rules

A fuzzy rule is a conditional statement that takes one of two forms:
1. Mamdani rules:
   IF $x_1$ is $A_1$ and $x_2$ is $A_2$ and ... and $x_N$ is $A_N$ THEN $y_1$ is $B_1$ and $y_2$ is $B_2$ and ... and $y_M$ is $B_M$
2. Takagi-Sugeno rules:
   IF $x_1$ is $A_1$ and $x_2$ is $A_2$ and ... and $x_N$ is $A_N$ THEN $y = f_k(x_1, x_2, \ldots, x_N)$

The part between the words IF and THEN is called the antecedent; the part after the word THEN is called the consequent. A single assignment "$x_i$ is $A_i$" or "$y_i$ is $B_i$" is called an atom.

Example
IF temp IS HIGH AND press IS LOW THEN lever IS UP

Inference principle: modus ponens and modus tollens

Given two domains X and Y, a fuzzy relation R, and fuzzy sets A defined on X and B defined on Y, how can we carry out fuzzy reasoning? Classical logic uses two fundamental inference principles.

Modus ponens
Premise: if (x is A) then (y is B)
Antecedent: x is A
Consequent: y is B

Modus tollens
Premise: if (x is A) then (y is B)
Antecedent: y is not B
Consequent: x is not A

Fuzzy inference principle: generalized modus ponens

The modus ponens of standard propositional calculus cannot be used in a fuzzy logic environment, because such an inference can take place if, and only if, the fact is exactly the same as the antecedent of the IF-THEN rule. Fuzzy logic uses the generalized modus ponens instead. It allows an inference when the fact is only partly known, or when it is similar but not equal to the antecedent.

Generalized modus ponens (for fuzzy logic)
Premise: if (x is A) then (y is B)
Antecedent: x is A'
Consequent: y is B'

$$\mu_{B'}(y) = \max_{x \in X} \min\left(\mu_{A'}(x), \mu_R(x, y)\right)$$

Example
Premise: if the tomato is red then it is sweet, possibly sweet-sour, and likely to be sour.
Antecedent: the tomato is more or less red ($\mu_{\text{Red}} = 0.8$).
Consequent: taste = ?
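On discrete universes the generalized modus ponens is a max-min composition of the fact with the relation matrix. The sketch below assumes the relation is built with min, as for a binary relation R = A × B, and uses hypothetical grades; the fact A' is deliberately similar to A but not equal to it.

```python
def compose(mu_fact, R):
    """Max-min composition: result[j] = max over i of min(mu_fact[i], R[i][j])."""
    n_y = len(R[0])
    return [
        max(min(f, row[j]) for f, row in zip(mu_fact, R))
        for j in range(n_y)
    ]


mu_a = [0.0, 0.5, 1.0]        # antecedent A on x_1..x_3 (hypothetical)
mu_b = [0.2, 1.0]             # consequent B on y_1, y_2 (hypothetical)
R = [[min(a, b) for b in mu_b] for a in mu_a]

mu_a_fact = [0.0, 1.0, 0.5]   # fact A', similar to A but not equal
print(compose(mu_a_fact, R))  # [0.2, 0.5]
print(compose(mu_a, R))       # [0.2, 1.0], B is recovered when the fact is A
```

When the fact matches the antecedent only partly, the inferred set is a clipped version of B; when the fact is exactly A and A is normalized, B is returned unchanged.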

Fuzzy inference principle: a first example

As a first example we show an inference with one rule and one atom:
$$\mu_{C'}(y) = \max_{x \in X} \min\left(\mu_{A'}(x), \mu_R(x, y)\right)$$
where $\mu_R(x, y) = \min(\mu_A(x), \mu_C(y))$. Hence we have
$$\mu_{C'}(y) = \max_{x \in X} \min\left(\mu_{A'}(x), \mu_A(x), \mu_C(y)\right) = \min\left\{\max_{x \in X} \min\left(\mu_{A'}(x), \mu_A(x)\right), \mu_C(y)\right\} = \min(q_0, \mu_C(y))$$
where $q_0 = \max_{x \in X} \min(\mu_{A'}(x), \mu_A(x))$ is the degree to which the fact A' matches the antecedent A.

Fuzzy inference principle: a second example

As a second example we show an inference with one rule and two atoms:
$$\mu_{C'}(y) = \max_{X} \min\left(\mu_{A'}(x_1), \mu_{B'}(x_2), \mu_R(x_1, x_2, y)\right)$$
$$= \max_{X} \min\left(\mu_{A'}(x_1), \mu_{B'}(x_2), \mu_A(x_1), \mu_B(x_2), \mu_C(y)\right)$$
$$= \min\left\{\max_{X_1} \min\left(\mu_{A'}(x_1), \mu_A(x_1)\right), \max_{X_2} \min\left(\mu_{B'}(x_2), \mu_B(x_2)\right), \mu_C(y)\right\}$$
$$= \min\left(q_{x_{10}}, q_{x_{20}}, \mu_C(y)\right)$$
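The result of the derivation is that a rule with several atoms clips its consequent at the smallest degree of match among the atoms. A minimal sketch, with hypothetical grades for the antecedents A and B, the facts A' and B', and the consequent C:

```python
def match(mu_fact, mu_antecedent):
    """Degree of match q = max over x of min(fact grade, antecedent grade)."""
    return max(min(f, a) for f, a in zip(mu_fact, mu_antecedent))


def fire_rule(facts, antecedents, mu_c):
    """Clip the consequent C at the smallest degree of match over all atoms."""
    q = min(match(f, a) for f, a in zip(facts, antecedents))
    return [min(q, c) for c in mu_c]


mu_a, mu_a_fact = [0.0, 0.5, 1.0], [0.0, 1.0, 0.5]   # first atom: q = 0.5
mu_b, mu_b_fact = [1.0, 0.4, 0.0], [0.3, 1.0, 0.0]   # second atom: q = 0.4
mu_c = [0.1, 0.6, 1.0]

print(fire_rule([mu_a_fact, mu_b_fact], [mu_a, mu_b], mu_c))  # [0.1, 0.4, 0.4]
```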

Fuzzy inference principle: a third example

As a third example we show an inference with two rules and two atoms each.

[Figure: inference with two rules and two atoms each]

Defuzzification

Defuzzification is the process of producing a quantifiable result in fuzzy logic: a fuzzy quantity is converted into a crisp one (a single number). The system that implements defuzzification is called a defuzzifier. Several methods are in use:

1. Centroid point:
$$x_C = \frac{\int_x x\,\mu(x)\,dx}{\int_x \mu(x)\,dx}$$
2. Mean of the maximum:
$$x_{med} = \frac{1}{N}\sum_{i=1}^{N} \mu_{MAX}(x_i)$$
3. Minimum max:
$$x_M = \min_X \{\mu_{MAX}(x_i)\}$$

Here $\mu_{MAX}(x_i)$, i = 1, ..., N, denotes the points of the universe at which µ(x) attains its maximum value.
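For a sampled fuzzy set the three methods reduce to a few lines. The sketch below replaces the integrals of the centroid with sums over the sample points, which is an approximation, and reads the two maximum-based methods as the mean and the smallest of the points where the grade peaks. The sample values are invented.

```python
def centroid(xs, mus):
    """Discrete centroid: sum(x * mu) / sum(mu)."""
    return sum(x * m for x, m in zip(xs, mus)) / sum(mus)


def maxima(xs, mus):
    """Points of the universe where the grade reaches its maximum."""
    peak = max(mus)
    return [x for x, m in zip(xs, mus) if m == peak]


def mean_of_maximum(xs, mus):
    """Average of the maximizing points."""
    points = maxima(xs, mus)
    return sum(points) / len(points)


def smallest_of_maximum(xs, mus):
    """Smallest of the maximizing points."""
    return min(maxima(xs, mus))


xs = [0, 1, 2, 3, 4]
mus = [0.0, 0.5, 1.0, 1.0, 0.5]
print(centroid(xs, mus))             # 2.5
print(mean_of_maximum(xs, mus))      # 2.5
print(smallest_of_maximum(xs, mus))  # 2
```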

Fuzzy Clustering

Analogies and differences between circuits and neurofuzzy networks

The fuzzy model can be seen as a digital circuit designed for signal processing. Hence there is an analogy between the fuzzy model and a digital circuit, shown in the following table:

Circuits      Fuzzy model
components    rules
topology      reasoning
synthesis     clustering and rule insertion

The neurofuzzy network can be thought of as a generalization of classical digital circuits, but there is a substantial difference, shown in the following table:

Circuits                Fuzzy model
indirect information    direct information

Linguistic information can be used directly in the neurofuzzy network. This is not possible in digital circuits, where such information must be taken into account through specific data transformations.

Neurofuzzy networks

The term neurofuzzy refers to combinations of artificial neural networks and fuzzy logic. Neurofuzzy hybridization yields a hybrid intelligent system that combines the human-like reasoning style of fuzzy systems with the learning and connectionist structure of neural networks. In the literature it is widely termed Fuzzy Neural Network (FNN) or Neuro-Fuzzy System (NFS).

A neurofuzzy system incorporates the human-like reasoning style of fuzzy systems through fuzzy sets and a linguistic model consisting of a set of IF-THEN fuzzy rules. The main strength of neurofuzzy systems is that they are universal approximators with the ability to solicit interpretable IF-THEN rules.

This strength involves two contradictory requirements in fuzzy modeling: interpretability versus accuracy. In practice, one of the two properties prevails.

We show two types of neurofuzzy systems: the first follows Mamdani rules and the second follows Takagi-Sugeno rules.

Neurofuzzy networks

A neurofuzzy network of the first type is depicted in the following figure. The output of this network is an expansion of the input on a basis given by the rules.

[Figure: neurofuzzy network of the first type]

Neurofuzzy networks

A neurofuzzy network of the second type is depicted in the following figure; it is an example with a first-order function f(x).

[Figure: neurofuzzy network of the second type]

This kind of neurofuzzy network realizes an input-output relation as a piecewise-linear approximation with smoothing at the borders. In fact, each rule is defined locally on a small region of the input data.
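The piecewise-linear behaviour with smooth borders can be sketched with a one-input Takagi-Sugeno model. The sketch assumes Gaussian antecedents and combines the local linear models by a weighted average of the rule activations, which is the common Takagi-Sugeno form; the slides do not spell out the combination formula, and the two rules below are hypothetical.

```python
import math


def ts_output(x, rules):
    """Weighted average of local linear models y_k = slope_k * x + intercept_k."""
    weights = [
        math.exp(-((x - centre) ** 2) / (2 * width ** 2))
        for centre, width, _, _ in rules
    ]
    outputs = [slope * x + intercept for _, _, slope, intercept in rules]
    return sum(w * y for w, y in zip(weights, outputs)) / sum(weights)


# Each hypothetical rule: (centre, width, slope, intercept).
rules = [
    (0.0, 1.0, 1.0, 0.0),   # near x = 0 the model is y = x
    (4.0, 1.0, 0.0, 4.0),   # near x = 4 the model is y = 4
]

for x in (0.0, 2.0, 4.0):
    print(x, ts_output(x, rules))
```

Close to a rule's centre the output follows that rule's line almost exactly; halfway between the two centres both rules are equally active and the output is the average of the two lines.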

Synthesis of neurofuzzy networks

The main problem in designing neurofuzzy systems is the determination of the fuzzy rules. The rules are established by finding high-density regions in the data space with clustering algorithms: each cluster corresponds to one rule.

The synthesis of a neurofuzzy network is therefore based on a clustering algorithm. In particular, the membership function is evaluated on the basis of the corresponding cluster density: after a shape is chosen for the membership function, its parameters are determined by that density. A possible form is the following:
$$\mu(x) = \frac{1}{1 + \left(\frac{\lVert x - v \rVert}{a}\right)^{2b}}, \quad a > 0, \; b > 0$$
where the distance between the input data x and the cluster centre v can be measured in several ways.
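A minimal sketch of this cluster membership function follows. It assumes that v is the cluster centre and that the distance is Euclidean; other distance measures would work equally well, since the text leaves the choice open.

```python
import math


def cluster_membership(x, v, a, b):
    """mu(x) = 1 / (1 + (||x - v|| / a) ** (2 * b)) with Euclidean distance."""
    distance = math.dist(x, v)
    return 1.0 / (1.0 + (distance / a) ** (2 * b))


centre = (0.0, 0.0)
print(cluster_membership((0.0, 0.0), centre, a=5, b=1))  # 1.0 at the centre
print(cluster_membership((3.0, 4.0), centre, a=5, b=1))  # 0.5 at distance a
print(cluster_membership((6.0, 8.0), centre, a=5, b=1))  # 0.2 at distance 2a
```

The parameter a sets the distance at which the grade drops to 0.5, so a dense, tight cluster would get a small a and a spread-out cluster a larger one.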

Synthesis of neurofuzzy networks

A generic algorithm for determining the neurofuzzy network, following the approach described on the previous slide, is given below.

Synthesis algorithm
1. Choose a membership function shape.
2. Apply a clustering algorithm to the training set.
3. Associate a membership function with each cluster.
4. Learn the membership function parameters in proportion to the corresponding cluster density.
5. Project the clusters onto the axes of the input space, determining the fuzzy values $A_i^k$ for the rule's antecedent.
6. Use the resulting network to obtain the crisp result y after defuzzification.

Fuzzy Clustering

In fuzzy clustering, data elements can belong to more than one cluster, and each element is associated with a set of membership levels. These indicate the strength of the association between that data element and a particular cluster.

One approach is to modify a well-known clustering algorithm to obtain a fuzzy version of it. Another is to start from procedures valid for fuzzy sets and develop a new clustering algorithm. We show two clustering algorithms, one for each category:
1. Fuzzy C-means algorithm;
2. Min-Max algorithm.

Fuzzy C-means

Given as examples the N vectors $X_k$, k = 1, 2, ..., N, the fuzzy C-means algorithm is summarized below.

Fuzzy C-means algorithm
1. Choose C vectors $V_i$ (i = 1, 2, ..., C) as centroids.
2. For each input vector $X_k$, choose a membership function $\mu_{ik}(x)$.
3. Evaluate the new centroids
$$V_i^{\text{new}} = \frac{\sum_{k=1}^{N} \mu_{ik}^m X_k}{\sum_{k=1}^{N} \mu_{ik}^m}, \quad i = 1, 2, \ldots, C$$
with m an integer greater than 1.
4. Evaluate the sum of the distances of the new centroids from the old ones:
$$E = \sum_{i=1}^{C} \lVert V_i^{\text{new}} - V_i \rVert$$
5. If E < ε, with ε a small, arbitrary positive threshold, stop the algorithm; otherwise repeat from step 2.
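The five steps can be sketched in plain Python as follows. Step 2 leaves the choice of membership function open, so this sketch assumes the standard fuzzy C-means rule, in which the grade of a point in a cluster depends on its distance to that centroid relative to its distances to all centroids. The initial centroids are passed in by the caller, and the function name, default values, and sample points are all illustrative.

```python
import math


def fuzzy_c_means(points, centroids, m=2, eps=1e-6, max_iter=100):
    """Return (centroids, memberships); memberships[k][i] is the grade of
    point k in cluster i, computed in the last iteration performed."""
    centroids = [list(v) for v in centroids]          # step 1: initial centroids
    dim = len(points[0])
    memberships = []
    for _ in range(max_iter):
        # Step 2: grades from relative distances (standard rule, assumed here).
        memberships = []
        for x in points:
            dists = [math.dist(x, v) for v in centroids]
            if min(dists) == 0.0:
                # x sits on a centroid: share full membership among those hit.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                memberships.append([h / sum(hits) for h in hits])
            else:
                memberships.append([
                    1.0 / sum((di / dj) ** (2 / (m - 1)) for dj in dists)
                    for di in dists
                ])
        # Step 3: new centroids as means weighted by mu ** m.
        new_centroids = []
        for i in range(len(centroids)):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            new_centroids.append([
                sum(w * x[d] for w, x in zip(weights, points)) / total
                for d in range(dim)
            ])
        # Step 4: total displacement of the centroids.
        shift = sum(math.dist(a, b) for a, b in zip(new_centroids, centroids))
        centroids = new_centroids
        if shift < eps:                               # step 5: stop when settled
            break
    return centroids, memberships


points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids, memberships = fuzzy_c_means(points, [points[0], points[3]])
print(centroids)
```

With two well-separated groups and one initial centroid taken from each, the centroids settle near the two group means and every point gets a grade close to 1 in its own cluster.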

Fuzzy Min-Max

The Min-Max clustering algorithm uses a neurofuzzy network called Min-Max and is based on the subdivision of the data space into hypercubes (HCs), each defined by a pair of points.
1. When a new data point is presented, the network checks its neighborhood to decide whether the point can be included in an existing hypercube by expanding it.
2. This neighborhood is measured by a membership function.
3. If no such hypercube exists, a new one is created and its vertices are defined by the data point $X_k$.
4. If two or more hypercubes partially overlap, these HCs must be contracted, deciding to which HC each data point belongs.
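Steps 1 and 3 can be sketched as below, with each hypercube stored as a pair (min corner, max corner). The sketch is deliberately incomplete: it tries the hypercubes in order and does not rank them by membership (step 2), it omits the contraction of overlapping hypercubes (step 4), and the size bound `theta` on the average side length is an assumption of this sketch that the slides do not specify.

```python
def try_expand(box, x, theta):
    """Return the hypercube grown to include x, or None if it would get too big."""
    v, w = box                                        # min corner, max corner
    new_v = [min(vi, xi) for vi, xi in zip(v, x)]
    new_w = [max(wi, xi) for wi, xi in zip(w, x)]
    # Accept the expansion only if the average side length stays within theta.
    if sum(wi - vi for vi, wi in zip(new_v, new_w)) <= theta * len(x):
        return (new_v, new_w)
    return None


def present(boxes, x, theta):
    """Expand the first hypercube that can take x, else create a new one at x."""
    for k, box in enumerate(boxes):
        expanded = try_expand(box, x, theta)
        if expanded is not None:
            boxes[k] = expanded
            return k
    boxes.append((list(x), list(x)))                  # point-sized hypercube
    return len(boxes) - 1


boxes = []
for point in [(0.1, 0.1), (0.2, 0.3), (0.9, 0.9)]:
    print(point, "-> hypercube", present(boxes, point, theta=0.3))
print(boxes)
```

The first two points end up in the same hypercube, while the third is too far away to be absorbed and starts a hypercube of its own.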

Fuzzy Min-Max

An example of the Min-Max clustering algorithm is reported in the following figure.

[Figure: example of Min-Max clustering]

Example: fuzzy control of distance

Consider a fuzzy controller that regulates the distance between two cars traveling on a road. The following figure shows how the braking force B depends on the distance D and the velocity v. The distance D is assumed constant, so only the mapping B = f(v) is modeled.

[Figure: braking force B as a function of distance D and velocity v]

The membership functions are reported below.

[Figure: membership functions]

Example: fuzzy control of distance

The rules considered are the following:
1. IF velocity is low THEN braking force is small.
2. IF velocity is medium THEN braking force is medium.
3. IF velocity is high THEN braking force is large.

The resulting relation B = f(v) is shown in the following figure.

[Figure: the relation B = f(v)]
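The three rules can be turned into a small working controller. Everything numerical in this sketch is an assumption: the triangular membership functions, the 0 to 100 scales for velocity and braking force, the max aggregation of the clipped consequents, and the discrete centroid defuzzification. It is meant only to show how rules of this kind produce a smooth mapping B = f(v).

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a, c and peak at b."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)


# Hypothetical membership functions on 0..100 scales.
VELOCITY = {"low": (-50, 0, 50), "medium": (0, 50, 100), "high": (50, 100, 150)}
FORCE = {"small": (-50, 0, 50), "medium": (0, 50, 100), "large": (50, 100, 150)}
RULES = [("low", "small"), ("medium", "medium"), ("high", "large")]


def braking_force(v, steps=101):
    """Clip each consequent at its rule's firing strength, aggregate with max,
    and defuzzify with a discrete centroid over the braking-force scale."""
    forces = [100 * i / (steps - 1) for i in range(steps)]
    aggregated = []
    for b in forces:
        clipped = [
            min(tri(v, *VELOCITY[v_term]), tri(b, *FORCE[f_term]))
            for v_term, f_term in RULES
        ]
        aggregated.append(max(clipped))
    return sum(b * m for b, m in zip(forces, aggregated)) / sum(aggregated)


for v in (0, 25, 50, 75, 100):
    print(v, round(braking_force(v), 2))
```

The braking force grows with velocity, and intermediate velocities activate two neighbouring rules at once, which is what smooths the transition between the small, medium, and large levels.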

References

L.A. Zadeh. Fuzzy sets. Information and Control, Vol. 8, pp. 338-353, 1965.
V. Kecman. Learning and Soft Computing: Support Vector Machines, Neural Networks, and Fuzzy Logic Models. The MIT Press, 2001.
O. Maimon and L. Rokach. Soft Computing for Knowledge Discovery and Data Mining. Springer, 2008.