
Chapter 1 Compact Fuzzy Models and Classifiers through Model Reduction and Evolutionary

1.2 Fuzzy Modeling

An important characteristic of fuzzy models is that they are based on partitioning information into fuzzy regions by means of fuzzy sets [23]. Contrary to classical set theory, where a crisp set divides the universe of discourse into two groups, members and non-members, fuzzy sets allow us to describe various forms of gradual transition from total membership to total non-membership. This allows for smooth transitions from one region of operation to another. In each of these regions, the characteristics of the system are more or less different. The fuzzy model is typically a rule base in which fuzzy rules capture these characteristics by means of if-then rules with fuzzy predicates that establish relations between the relevant system variables (e.g., inputs and outputs). When the fuzzy predicates are associated with linguistic terms (labels), the fuzzy model becomes a qualitative description of the system using rules like

If the temperature is moderate and the volume is small then the pressure is low.

The fuzzy sets associated with the labels moderate, small and low are given by membership functions defined in the numerical domain of the respective system variables, temperature, volume and pressure, as illustrated in Figure 1.1. Such models are often called linguistic fuzzy models.
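To make the rule above concrete, the following minimal sketch evaluates the degree to which its premise holds for numerical inputs. The triangular shape, the breakpoints, the units, and the use of the minimum as the "and" operator are all illustrative assumptions; none of them is taken from Figure 1.1.

```python
def triangle(x, left, peak, right):
    """Triangular membership function: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical fuzzy sets; the breakpoints are illustrative only.
def moderate(temperature):
    return triangle(temperature, 10.0, 20.0, 30.0)

def small(volume):
    return triangle(volume, 0.0, 1.0, 2.0)

def premise_degree(temperature, volume):
    """Degree to which 'temperature is moderate and volume is small' holds,
    using the minimum as the 'and' operator (one common choice)."""
    return min(moderate(temperature), small(volume))

print(premise_degree(20.0, 1.0))  # both sets fully matched
print(premise_degree(25.0, 1.5))  # both sets partially matched
print(premise_degree(5.0, 1.0))   # temperature outside 'moderate'
```

Unlike a crisp set, the premise is matched to a degree between 0 and 1, which is what produces the smooth transition between regions.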

One of the most commonly used inference mechanisms in fuzzy models is the compositional rule of inference [23], a generalization of the traditional modus ponens known from classical logic. In applying this inference, the fuzzy model is seen as a relation R defined on X×Y, where X is the premise space and Y is the consequent space. Each rule is a fuzzy relation defining a locally valid model.
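On discretized domains, the compositional rule of inference can be sketched in a few lines. The example below assumes the minimum operator for building the rule relation and max-min composition for the inference; other operator choices are possible. The membership values and the function names `rule_relation` and `compose` are made up for illustration.

```python
import numpy as np

def rule_relation(mu_a, mu_b):
    """Relation R(x, y) = min(A(x), B(y)) for one rule 'if x is A then y is B'."""
    return np.minimum.outer(mu_a, mu_b)

def compose(mu_a_prime, relation):
    """Max-min composition: B'(y) = max over x of min(A'(x), R(x, y))."""
    return np.max(np.minimum(mu_a_prime[:, None], relation), axis=0)

# Illustrative membership degrees on small discretized domains (assumed values).
mu_a = np.array([0.0, 0.5, 1.0, 0.5])  # A sampled at 4 points of X
mu_b = np.array([0.2, 1.0, 0.3])       # B sampled at 3 points of Y
R = rule_relation(mu_a, mu_b)

# Presenting A itself returns B, because A reaches membership 1.
print(compose(mu_a, R))

# A crisp input at the second point of X clips B at A(x) = 0.5.
crisp_input = np.array([0.0, 1.0, 0.0, 0.0])
print(compose(crisp_input, R))
```

With several rules, the individual relations would be combined (for example by taking their element-wise maximum) before the composition is applied.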

The total relation is composed by combining the relations defined by the individual rules. Different operators can be used for implementing this type of fuzzy inference. A method proposed by Mamdani [24] is frequently encountered in control engineering [25]. Mamdani fuzzy models use rules in which both the premise and consequent are described by fuzzy sets (Figure 1.1). Another fuzzy model type, often used in systems modeling and control, is the Takagi-Sugeno (TS) model [18]. Like the Mamdani model, it has a fuzzy premise; the consequents of the rules, however, are defined by (linear) functions of the premise variables. This makes them more suitable for modeling dynamic systems and for data-driven modeling. In the following we will consider fuzzy models of the TS type.

Figure 1.1 Example of a linguistic fuzzy rule

1.2.1 The Takagi-Sugeno Fuzzy Model

Rule-based models of the TS type [18] are suitable for the approximation of a broad class of functions. The TS model consists of a set of rules where the rule consequents are often taken to be linear functions of the inputs:

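$$R_i:\ \text{If } x_1 \text{ is } A_{i1} \text{ and } \ldots \text{ and } x_n \text{ is } A_{in} \text{ then } \hat{y}_i = p_{i1} x_1 + \cdots + p_{in} x_n + p_{i(n+1)}, \quad i = 1, \ldots, M. \tag{1}$$

Here Ri denotes the ith rule, x = [x1,…, xn]^T is the input vector, ŷi is the output of the ith rule, Ai1,…, Ain are fuzzy sets defined in the antecedent space by membership functions µAij(xj): ℜ → [0, 1], pi1,…, pi(n+1) are the consequent parameters, and M is the number of rules.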

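Each rule in the TS model defines a hyperplane in the antecedent-consequent product space, which locally approximates the real system’s hypersurface. The output ŷ of the model is computed as a weighted sum of the individual rule contributions:

$$\hat{y} = \frac{\sum_{i=1}^{M} \beta_i(x)\, \hat{y}_i}{\sum_{i=1}^{M} \beta_i(x)}, \tag{2}$$

where ŷi is the output of the ith rule and βi(x) is the degree of activation of the ith rule:

$$\beta_i(x) = \prod_{j=1}^{n} \mu_{A_{ij}}(x_j). \tag{3}$$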
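The computation of the model output can be sketched as follows. The sketch assumes that the degree of activation is the product of the antecedent membership degrees and that the membership functions are Gaussian. The two-rule example, the parameter values, and the names `gaussian` and `ts_output` are hypothetical.

```python
import numpy as np

def gaussian(x, center, width):
    """Gaussian membership function (an assumed shape)."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def ts_output(x, centers, widths, params):
    """Output of a TS model for one input vector.

    x:       (n,)      input vector
    centers: (M, n)    membership function centers, one row per rule
    widths:  (M, n)    membership function widths
    params:  (M, n+1)  consequent parameters, last column is the offset
    """
    mu = gaussian(x, centers, widths)            # (M, n) membership degrees
    beta = mu.prod(axis=1)                       # (M,) degrees of activation
    y_rule = params[:, :-1] @ x + params[:, -1]  # (M,) local linear outputs
    return float(beta @ y_rule / beta.sum())     # normalized weighted sum

# Two rules on one input: y = x around x = 0, y = 10 around x = 10.
centers = np.array([[0.0], [10.0]])
widths = np.array([[2.0], [2.0]])
params = np.array([[1.0, 0.0],
                   [0.0, 10.0]])

for value in (0.0, 5.0, 10.0):
    print(value, ts_output(np.array([value]), centers, widths, params))
```

Halfway between the two centers both rules are equally active, so the output there is the average of the two local models; close to either center the corresponding rule dominates.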

Transparency is strongly related to the number of rules used by the model and to the partitioning of the input space (the premise of the rule base). Fixed membership functions are often used to partition the feature space [10].

Membership functions derived from the data, however, explain the data patterns better. Typically, fewer sets and fewer rules result than in a fixed partition approach. If the membership functions derived from data have simple shapes and are well separated, then they can still be assigned meaningful linguistic labels by the domain experts.

Fuzzy clustering methods have proven useful for identifying this partitioning from data. Clustering is commonly unsupervised and carried out in the premise space (inputs only). When output data (labels) are available, however, it can be useful to supervise the clustering by considering the product space of the inputs and outputs. The clustering algorithm then seeks to establish groups within the data that are homogeneous with regard to both the structure in the input and the output [5,26]. This is the approach followed here.

From data, an initial fuzzy rule-based model is derived in two steps. First, the fuzzy antecedents Aij are determined by means of fuzzy clustering. Then, with the premise fixed, the rule consequents are determined by least squares parameter estimation [19]. For clustering, a regression matrix X^T = [x1,…, xK] and an output vector y^T = [y1,…, yK] are constructed from the available data. Note that the number of inputs (features) used is important for the transparency of the resulting model. However, we do not explicitly deal with feature selection in this chapter.

Assuming that a proper data collection has been done, clustering takes place in the product space of X and y to identify regions where the system can be locally approximated by TS rules. Various cluster algorithms exist, differing mainly in the shape or size of the cluster prototypes applied. In the following, we will apply the popular fuzzy c-means algorithm [26].
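As a rough illustration, fuzzy c-means clustering in the product space can be sketched as below. This is the textbook form of the algorithm with Euclidean distances and fuzziness exponent m = 2, which are assumptions and not settings stated in this chapter. The function name `fuzzy_c_means`, the random initialization, and the example data are likewise hypothetical.

```python
import numpy as np

def fuzzy_c_means(Z, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Cluster the rows of Z (K, d) into c fuzzy clusters.

    Returns the partition matrix U (c, K) and the prototypes V (c, d)
    computed in the last update step.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((c, Z.shape[0]))
    U /= U.sum(axis=0)  # memberships of each data object sum to one
    for _ in range(n_iter):
        Um = U ** m
        V = (Um @ Z) / Um.sum(axis=1, keepdims=True)  # weighted means
        # Squared Euclidean distance of every object to every prototype.
        d2 = ((Z[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)  # guard against division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V

# Assumed example system: a piecewise linear input-output relation.
x = np.linspace(0.0, 10.0, 41)
y = np.where(x < 5.0, 2.0 * x, 10.0)
Z = np.column_stack([x, y])  # product space of input and output

U, V = fuzzy_c_means(Z, c=2)
print(U.shape)
print(V)
```

Because the output column is part of Z, the clusters tend to follow regions where the input-output behaviour is similar, not only regions where the inputs are close.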

Given the data Z^T = [X, y], the cluster algorithm computes the fuzzy partition matrix U whose ikth element µik ∈ [0, 1] is the membership degree of the data object zk ∈ Z in cluster i. The rows of U are thus multidimensional fuzzy sets (clusters) represented point-wise. Univariate fuzzy sets Aij are obtained by projecting the rows of U onto the input variables xj:

$$\mu_{A_{ij}}(x_{jk}) = \operatorname{proj}_j(\mu_{ik}), \tag{4}$$

where proj is the point-wise projection operator [27]. The point-wise defined fuzzy sets Aij are typically non-convex. However, the core and the corresponding left and right parts of the set can be recognized. To obtain convex fuzzy sets, and to be able to compute µAij(xj) for any value of xj, the sets are approximated by fitting suitable parametric functions to the point-wise projections [19], as illustrated in Figure 1.2.
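The fitting step can be illustrated with a deliberately simple sketch: a Gaussian membership function is fitted to the projected points (xjk, µik) by least squares over a coarse grid of centers and widths. Both the Gaussian shape and the grid search are assumptions chosen for brevity; they are not necessarily the parametric functions or the fitting procedure of [19]. The names `gaussian` and `fit_gaussian_set` and the synthetic data are hypothetical.

```python
import numpy as np

def gaussian(x, center, width):
    """Gaussian membership function (an assumed parametric shape)."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def fit_gaussian_set(xj, mu, n_grid=60):
    """Least squares fit of a Gaussian to the points (xj[k], mu[k]),
    found by exhaustive search over a grid of centers and widths."""
    span = xj.max() - xj.min()
    centers = np.linspace(xj.min(), xj.max(), n_grid)
    widths = np.linspace(span / n_grid, span, n_grid)
    best_err, best_center, best_width = np.inf, None, None
    for center in centers:
        for width in widths:
            err = np.sum((gaussian(xj, center, width) - mu) ** 2)
            if err < best_err:
                best_err, best_center, best_width = err, center, width
    return best_center, best_width

# Synthetic stand-in for one projected row of U: the values of input j for
# all K data objects, paired with their membership degrees in cluster i.
# Real projections are scattered; these points are noise-free for clarity.
xj = np.linspace(0.0, 100.0, 101)
mu_i = gaussian(xj, 40.0, 10.0)

center, width = fit_gaussian_set(xj, mu_i)
print(center, width)
```

With a real partition matrix, `xj` would be the jth column of X and `mu_i` the ith row of U; the fitted function can then be evaluated at any value of xj, not only at the data points.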

Figure 1.2 Fuzzy sets are defined by fitting parametric functions (solid lines) to the projections (dots) of the point-wise defined fuzzy sets in the fuzzy […]

[…] exponential functions), the resulting model will in general have a higher accuracy in fitting the training data. Such functions, however, are less suitable for linguistic interpretation.

1.2.3 Estimating the Consequent Parameters

Once the antecedent membership functions have been fixed, the consequent parameters piq, q = 1,…, n+1, of each individual rule are obtained as a local least squares estimate. Let θi = [pi1,…, pin, pi(n+1)]^T, let Xe denote the matrix [X, 1] with rows [xk, 1], and let Wi denote a diagonal matrix in ℜ^(K×K) having the degree of activation βi(xk) (Eq. 3) as its kth diagonal element. The consequent parameters of the ith rule are the weighted least squares solution of y = Xeθi + ε, given by:

$$\theta_i = \left[ X_e^T W_i X_e \right]^{-1} X_e^T W_i\, y. \tag{6}$$
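A minimal NumPy sketch of the weighted least squares estimate for one rule is given below. The function name `local_wls`, the example data, and the Gaussian-shaped activation degrees are assumptions made for illustration. The normal equations are solved with a linear solver, which avoids forming the explicit inverse.

```python
import numpy as np

def local_wls(X, y, beta_i):
    """Weighted least squares estimate of the consequent parameters of rule i.

    X:      (K, n)  regression matrix
    y:      (K,)    output vector
    beta_i: (K,)    degrees of activation of rule i, the diagonal of W_i
    Returns theta_i = [p_i1, ..., p_in, p_i(n+1)].
    """
    Xe = np.hstack([X, np.ones((X.shape[0], 1))])  # extended matrix [X, 1]
    XtW = Xe.T * beta_i  # equals Xe^T W_i without building the K x K matrix
    return np.linalg.solve(XtW @ Xe, XtW @ y)

# Assumed example: data from y = 2x + 1 and a rule that is active around x = 3.
X = np.linspace(0.0, 10.0, 21)[:, None]
y = 2.0 * X[:, 0] + 1.0
beta = np.exp(-0.5 * ((X[:, 0] - 3.0) / 2.0) ** 2)

theta = local_wls(X, y, beta)
print(theta)  # slope first, offset last
```

Each rule is estimated separately with its own weights, so data objects that barely activate a rule have little influence on that rule's parameters.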
