Experiments in MATLAB - Statistical Based Approaches

2.4 Experiments in MATLAB

This section presents a few MATLAB experiments on the techniques discussed in this chapter.

x = rand(1,10);

Fig. 2.10 The PCA method applied in MATLAB to a random set of points lying on the line y = x.

An example of the use of the PCA method in MATLAB is given in Figure 2.10.

The figure shows the set of instructions in MATLAB for computing the principal components of a random set of two-dimensional points. In the following, all the instructions in Figure 2.10 are commented on step by step. The function rand is used for generating a random vector of points close to the line y = 2x. The x coordinates of the points are created by rand, while the y coordinates are obtained by adding a random number to 2x. In MATLAB all variables, unless otherwise specified, are matrices of real numbers (for details about MATLAB the reader is referred to Appendix A). Thus x and y are matrices in which one of the dimensions is 1, which makes them vectors. It is important to keep in mind that MATLAB treats variables as matrices when functions such as rand are used. Indeed, rand takes two input parameters: the number of rows and the number of columns of the random matrix to be generated. If a vector is needed, one of these two parameters has to be 1.
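The role of the two arguments of rand can be sketched as follows. This is only an illustration: the number of points, the variable names n, noise, col and M, and the noise amplitude 0.1 are assumptions, not values taken from Figure 2.10.

```matlab
n = 10;                      % number of points (assumed)

x = rand(1, n);              % 1-by-n row vector, entries uniform in (0,1)
noise = 0.1 * rand(1, n);    % small random perturbation (amplitude assumed)
y = 2*x + noise;             % points close to the line y = 2x

col = rand(n, 1);            % n-by-1 column vector: rows and columns swapped
M = rand(3, 4);              % a full 3-by-4 random matrix

disp(size(x));               % size of the row vector
disp(size(col));             % size of the column vector
disp(size(M));               % size of the matrix
```

Swapping the two arguments turns a row vector into a column vector, which matters as soon as vectors are combined with each other or passed to functions that treat rows and columns differently.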

After defining a set of points, the covariance matrix of the variables used for representing these points, the coordinates x and y, needs to be computed. In MATLAB, the function cov computes the covariance matrix of a given set of variables.

The result is stored in A and used as an input parameter for the function eig. The function eig computes the eigenvalues d and the eigenvectors v of the covariance matrix A. The eigenvectors play the role of the vector α1 in equation (2.1). They can be used for computing the transformed variables. The two new variables are x1 and y1.
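A minimal sketch of these steps is given below, under stated assumptions: the noise amplitude 0.1, the call cov([x' y']) with one variable per column, and the way x1 and y1 are formed (projecting each point onto the columns of v) are illustrative choices and may differ from the listing in Figure 2.10. The variable residual is introduced here only as a sanity check.

```matlab
% Points close to the line y = 2x (noise amplitude 0.1 is an assumption)
x = rand(1,10);
y = 2*x + 0.1*rand(1,10);

% Covariance matrix of the two variables; each column of the input is one variable
A = cov([x' y']);

% Columns of v are unit eigenvectors of A; d is diagonal and holds the eigenvalues
[v,d] = eig(A);

% Transformed variables: project each point onto the two eigenvectors (assumed form)
x1 = v(1,1)*x + v(2,1)*y;
y1 = v(1,2)*x + v(2,2)*y;

% Sanity check: A*v should equal v*d up to rounding errors
residual = norm(A*v - v*d);
disp(residual);
```

Since the eigenvectors returned by eig for a symmetric matrix are orthonormal, the transformation is a rotation (possibly combined with a reflection) of the original coordinates.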

Two calls to the function plot create Figure 2.11. The figure contains the original set of points and the transformed set of points. The original points are marked by circles. From the figure, it is clear that the variability of one of the transformed variables is very small. The variables var_x1 and var_y1 contain this information. Note that the basic plot function needs only two input parameters: a vector containing the x coordinates and a vector containing the y coordinates of the points to draw. In this case, other optional parameters are also used. For a description of these options refer to Appendix A. They are used for marking each point with a particular marker having a certain color. The vector [.49 1 .63] specifies a particular shade of green. The instruction hold on is used so that the graphs created by successive calls to plot are overlaid in the same figure.
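The claim about the small variability of one transformed variable can be checked numerically. The sketch below assumes that var_x1 and var_y1 are the sample variances of x1 and y1 computed with the function var, and it reuses the assumed noise amplitude 0.1. The variable share is a hypothetical name introduced for illustration.

```matlab
% Random points close to the line y = 2x (noise amplitude assumed)
x = rand(1,10);
y = 2*x + 0.1*rand(1,10);

% Principal directions from the covariance matrix
[v,d] = eig(cov([x' y']));

% Transformed variables (projection onto the eigenvectors, assumed form)
x1 = v(1,1)*x + v(2,1)*y;
y1 = v(1,2)*x + v(2,2)*y;

% Variability of each transformed variable (assumed definition)
var_x1 = var(x1);
var_y1 = var(y1);

% Fraction of the total variance carried by the dominant transformed variable
share = max(var_x1, var_y1) / (var_x1 + var_y1);
fprintf('var_x1 = %g, var_y1 = %g, share = %.4f\n', var_x1, var_y1, share);
```

The two variances coincide with the eigenvalues stored on the diagonal of d, and their sum equals var(x) + var(y), since an orthogonal transformation preserves the total variance. A share close to 1 indicates that one transformed variable can be dropped with little loss of information.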

Fig. 2.11 The figure generated if the MATLAB instructions in Figure 2.10 are executed.

Let us now generate interpolating and regression functions in MATLAB. Figure 2.12 shows a sequence of MATLAB instructions. The calls to the function plot generate Figure 2.13(a). In this example, a set of 9 points in a two-dimensional space is considered. The 9 points are specified in MATLAB through their x and y coordinates, contained in the vectors x and y, respectively. These points are drawn in the figure by the first call to the function plot. The function plot is then called a second time on the same set of points. This time no options are used and, by default, the function plot connects consecutive points with a line.

What is drawn is therefore the join-the-dots function interpolating the set of points.
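The join-the-dots function can also be evaluated numerically rather than only drawn. As a sketch, the function interp1 with its default linear method returns the value of the piecewise linear interpolant at given query points. The query points in xq are arbitrary choices made for this illustration.

```matlab
% The 9 points used in this section
x = [-8 -6 -3 -2 1 5 7 9 10];
y = [1 2 2 1 -1 1 0 0 -1];

% Query points chosen for illustration
xq = [-7 0 6];

% Piecewise linear (join-the-dots) interpolation; 'linear' is the default method
yq = interp1(x, y, xq);

disp(yq);   % approximately 1.5, -0.3333, 0.5
```

For instance, the query point 0 lies between the data points (−2, 1) and (1, −1), so the interpolated value is 1 − 2·(2/3) = −1/3.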

The polynomial interpolating the points is instead computed by using the function polyfit. The specified degree is 8, since there is a unique polynomial of degree at most n − 1 passing through n points with distinct x coordinates, and here n = 9. The function polyfit needs as input parameters the x and y coordinates of the points to interpolate, and the degree of the polynomial. The output of the function is a vector c containing the coefficients of the polynomial. In order to draw this polynomial, it must be evaluated at a certain number of values of the independent variable, and the pairs of independent/dependent

x = [-8 -6 -3 -2 1 5 7 9 10];
y = [1 2 2 1 -1 1 0 0 -1];
% draw the points as black circles filled in green
plot(x,y,'ko','MarkerSize',10,'MarkerEdgeColor','k','MarkerFaceColor',[.49 1 .63])
hold on
% join-the-dots function
plot(x,y)
% interpolating polynomial of degree 8
c = polyfit(x,y,8);
xx = -8:0.1:10;
yy = polyval(c,xx);
plot(xx,yy,'r:')

Fig. 2.12 A sequence of instructions for drawing interpolating functions in MATLAB.

Fig. 2.13 Two figures generated by MATLAB: (a) the instructions in Figure 2.12 are executed; (b) the instructions in Figure 2.14 are executed.

variables can then be used to draw the polynomial with the function plot. If the values of the independent variable are sufficiently close to each other, then the figure generated by the function plot is a good approximation of the polynomial. The vector xx is used for storing these values. Its first component is −8 (the smallest value in x), its last component is 10 (the largest value in x), and the difference between any two consecutive components of xx is 0.1. The function polyval evaluates a polynomial. It takes as input parameters the polynomial coefficients and a vector xx containing a set of values of the independent variable. The

% draw the points as black circles filled in green
plot(x,y,'ko','MarkerSize',10,'MarkerEdgeColor','k','MarkerFaceColor',[.49 1 .63])
hold on
% cubic spline
yy = spline(x,y,xx);
plot(xx,yy,'k')
% linear regression
c = polyfit(x,y,1);
yy = polyval(c,xx);
plot(xx,yy)
% quadratic regression
c = polyfit(x,y,2);
yy = polyval(c,xx);
plot(xx,yy,'m:')

Fig. 2.14 A sequence of instructions for drawing interpolating and regression functions in MATLAB.

result, the set of corresponding values of the dependent variable, is returned and stored in yy. The function plot is then called for drawing the points specified in xx and yy. The option 'r:' makes the curve red and dotted.
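As a quick check of the interpolation property, the polynomial returned by polyfit can be evaluated at the original x coordinates and compared with y. This is a sketch only: the variable max_err is introduced here for illustration, and MATLAB may issue a conditioning warning for a polynomial of this degree.

```matlab
% The 9 points used in this section
x = [-8 -6 -3 -2 1 5 7 9 10];
y = [1 2 2 1 -1 1 0 0 -1];

% Interpolating polynomial: degree = number of points minus one
c = polyfit(x, y, numel(x) - 1);

% Fine grid from the smallest to the largest x value, with step 0.1
xx = -8:0.1:10;
yy = polyval(c, xx);

% Largest deviation of the polynomial from the data at the given points
max_err = max(abs(polyval(c, x) - y));
fprintf('coefficients: %d, grid points: %d, max error at data: %g\n', ...
        numel(c), numel(xx), max_err);
```

The vector c has 9 entries, ordered from the coefficient of the highest power down to the constant term, and the error at the data points should be zero up to rounding.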

As discussed above, there are other ways of interpolating or approximating a given set of points by a function. Suppose that the variables x, y and xx are still in memory as defined in the code in Figure 2.12. Then the code in Figure 2.14 generates Figure 2.13(b). The points are drawn again by the first call to the function plot. Then, the cubic spline interpolating these points is computed. The function spline evaluates the cubic spline passing through the points specified in x and y at the values of the independent variable in xx. The corresponding values of the dependent variable are stored in yy. Once again, the function plot is called for drawing the points specified in xx and yy. This time 'k' is used as option, meaning that the curve must be black. After that, the linear regression approximating the points is computed by using the function polyfit. This function has been used before for finding the coefficients of the interpolating polynomial. The only difference lies in the degree of the polynomial: it has to be 1 if the linear regression function is needed. The two coefficients of the linear function are stored in c, and the function polyval is used for evaluating this linear function at the set of points that are then passed to plot. The same procedure is used at the end for drawing the quadratic regression function.
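The difference between interpolation and regression can be made concrete with a short numerical comparison. In the sketch below, the names ys, c1, c2, sse1, sse2 and spline_err are introduced for illustration and do not appear in Figure 2.14.

```matlab
% The 9 points used in this section and the evaluation grid
x = [-8 -6 -3 -2 1 5 7 9 10];
y = [1 2 2 1 -1 1 0 0 -1];
xx = -8:0.1:10;

% Cubic spline evaluated on the grid; it passes through every data point
ys = spline(x, y, xx);
spline_err = max(abs(spline(x, y, x) - y));

% Linear and quadratic regression: polyfit with degree 1 and 2
c1 = polyfit(x, y, 1);    % two coefficients: slope and intercept
c2 = polyfit(x, y, 2);    % three coefficients

% Sum of squared errors at the data points for each regression function
sse1 = sum((polyval(c1, x) - y).^2);
sse2 = sum((polyval(c2, x) - y).^2);
fprintf('spline error: %g, SSE linear: %g, SSE quadratic: %g\n', ...
        spline_err, sse1, sse2);
```

The spline reproduces the data exactly (up to rounding), whereas the regression functions generally do not pass through the points. The quadratic fit can never have a larger sum of squared errors than the linear fit, because every linear function is also a polynomial of degree at most 2.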

2.5 Exercises

Some exercises related to principal component analysis, interpolating functions and regression functions are presented in this section. All the solutions are reported in Chapter 10.

1. Given the set of points

(1,−1), (3,0), (2,2), compute the range of variability of their components.

2. In MATLAB, randomly generate a set of points in a two-dimensional space lying on the line y = x. Apply PCA in order to reduce the dimension of the set of points.

3. Compare the original set of points randomly generated in Exercise 2 to the set with reduced dimension obtained by PCA. For this purpose, create a figure in MATLAB that displays the two sets.

4. Given 2 points in a two-dimensional space:

(1,0), (0,−2),

compute the equation of the unique line passing through them.

5. Build a figure in MATLAB of the line obtained in Exercise 4.

6. Given 3 non-aligned points in a two-dimensional space:

(0,1), (1,2), (−1,3),

compute the equation of the unique parabola passing through them.

7. Consider 5 points in a two-dimensional space:

(4,2), (2,2), (1,4), (0,0), (−1,3).

Build a MATLAB figure containing the points and the join-the-dots function interpolating them.

8. Consider the same points of the previous exercise. Build a MATLAB figure containing the points and the quadratic regression approximating such points.

9. Consider 6 points in a two-dimensional space:

(1,2), (2,3), (1,−1), (−1,3), (1,−2), (0,−1).

Build a MATLAB figure in which the points are represented with their linear and quadratic regression functions.

10. Consider the same set of points of the previous exercise. Suppose that each point (x, y) of the set is approximated with the corresponding point (x, f(x)) of the linear regression f obtained in the previous exercise. Compute the mean arithmetic error on the whole set of points using MATLAB.

From the document DATA MINING IN AGRICULTURE (pages 59-66)