Model Functions - Achim Zielesny From Curve Fitting to Machine Learning

Since model functions play an important role throughout the book, a categorization of them is helpful. A good starting point is the most prominent model function: the straight line.

1.3.1 Linear Model Functions with One Argument

Clear["Global`*"];
<<CIP`Graphics`

The well-known functional form of the straight line is

$$y = f(x) = a_1 + a_2 x$$

(* Straight line with a1 = 1.0 and a2 = 2.0 *)
pureFunction=Function[x,1.0+2*x];
argumentRange={0.0,5.0};
functionValueRange={0.0,12.0};
labels={"x","y","Straight line"};
CIP`Graphics`Plot2dFunction[pureFunction,argumentRange,functionValueRange,labels]

The straight line is linear in two ways: it describes a linear relation between argument $x$ and function value $y$, and it is itself linear in its parameters $a_1$ and $a_2$, i.e. $a_1$ and $a_2$ have exponent 1. A general model function which is linear in its parameters can be defined as follows:

$$y = f(x) = a_1 g_1(x) + a_2 g_2(x) + \ldots + a_L g_L(x) = \sum_{v=1}^{L} a_v g_v(x)$$

This general linear function consists of $L$ parameters $a_1$ to $a_L$ that are each multiplied by a function $g_v(x)$. The functions $g_v(x)$ depend on $x$ and contain only fixed and known internal parameters. Note that the general linear function does not necessarily describe a linear relation between argument $x$ and function value $y$: this relation may be highly non-linear, e.g. for a $g_v(x)$ that is equal to $e^x$. From the point of view of the general linear function the straight line is just a special case with

$$L = 2; \quad g_1(x) = x^0 = 1; \quad g_2(x) = x$$

that leads to

$$y = f(x) = a_1 + a_2 x$$
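Whether a model function is linear in its parameters can also be checked symbolically: all second derivatives with respect to the parameters must vanish. The following is a minimal sketch that uses built-in Mathematica functions only. The helper name linearInParametersQ is hypothetical and not part of the CIP package, and the third test function is merely an assumed counterexample with a parameter in the exponent.

```mathematica
(* Hypothetical helper: True if all second derivatives of expr with
   respect to the listed parameters are zero *)
linearInParametersQ[expr_, params_List] :=
  AllTrue[Flatten[D[expr, {params, 2}]], PossibleZeroQ]

(* Straight line: g1(x) = 1, g2(x) = x *)
linearInParametersQ[a1 + a2*x, {a1, a2}]

(* Curved in x, but still linear in a1 and a2: g2(x) = Exp[x] *)
linearInParametersQ[a1 + a2*Exp[x], {a1, a2}]

(* Assumed counterexample: the parameter a2 appears inside the exponential *)
linearInParametersQ[a1*Exp[a2*x], {a1, a2}]
```

The first two calls should return True and the third False. Note that the test only inspects the parameters: the dependence on $x$ plays no role, and an additional parameter-free term would also pass.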

Another well-known example of this type of linear model function is the polynomial

$$y = f(x) = a_1 + a_2 x + a_3 x^2 + \ldots + a_L x^{L-1} = \sum_{v=1}^{L} a_v x^{v-1}$$

e.g. the quadratic parabola

$$y = f(x) = \sum_{v=1}^{3} a_v x^{v-1} = a_1 + a_2 x + a_3 x^2$$

(* Quadratic parabola with a1 = 11.0, a2 = -15.0 and a3 = 5.0 *)
pureFunction=Function[x,11.0-15.0*x+5.0*x^2];
argumentRange={0.0,3.0};
functionValueRange={-1.0,12.0};
labels={"x","y","Quadratic parabola"};
CIP`Graphics`Plot2dFunction[pureFunction,argumentRange,functionValueRange,labels]

Model functions that are linear in their parameters make up an important special case for fitting curves to experimental data: it can be shown that they lead to optimization problems with only one global optimum, which in principle may be calculated with pencil and paper by means of analytic calculation strategies (e.g. see [Hamilton 1964], [Barlow 1989], [Bevington 2002], [Brandt 2002] or [Press 2007]).
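One such analytic strategy is the least-squares solution via the normal equations. The sketch below is an illustration under stated assumptions, not the procedure of the cited references: the five data points are invented for this example (scattered around $y = 1 + 2x$), and only built-in Mathematica functions are used.

```mathematica
(* Assumed example data, scattered around y = 1 + 2 x *)
data = {{0.0, 1.1}, {1.0, 2.9}, {2.0, 5.2}, {3.0, 6.8}, {4.0, 9.1}};
xs = data[[All, 1]];
ys = data[[All, 2]];

(* Design matrix: column v holds g_v(x) for every data point,
   here g1(x) = 1 and g2(x) = x *)
design = Transpose[{ConstantArray[1.0, Length[xs]], xs}];

(* Normal equations (A^T A) a = A^T y, solved in one step:
   no start values and no iteration are needed *)
parameters = LinearSolve[Transpose[design] . design, Transpose[design] . ys]

(* Cross-check with the built-in linear least-squares fit *)
Fit[data, {1, x}, x]
```

For these assumed data both routes should give roughly $a_1 \approx 1.04$ and $a_2 \approx 1.99$. The same design-matrix construction applies to any set of functions $g_v(x)$, since only the columns change.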

Again, note that the term linear model function denotes a function that is linear in its parameters only. It does not necessarily mean a linear dependence of the function value $y$ on the argument $x$. This subtle difference often causes misunderstandings in scientific practice as far as non-linear fits are concerned.
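A short illustration of this distinction, as a sketch with assumed synthetic data: the model $y = a_1 + a_2 e^x$ is clearly curved in $x$, yet it is linear in $a_1$ and $a_2$ and can therefore be treated with the built-in linear fit.

```mathematica
(* Assumed noise-free data generated from y = 2.0 + 0.5 Exp[x] *)
data = Table[{x, 2.0 + 0.5*Exp[x]}, {x, 0.0, 3.0, 0.5}];

(* Linear least-squares fit with the basis functions g1(x) = 1 and g2(x) = Exp[x] *)
Fit[data, {1, Exp[x]}, x]
```

The fit should recover the generating parameters without any start values, although the resulting curve is far from a straight line.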

1.3.2 Non-linear Model Functions with One Argument

Clear["Global`*"];
<<CIP`Graphics`

A model function that is not linear in its parameters is called a non-linear model function, e.g.

$$y = f(x) = a_1 e^{a_2 x}$$

To recognize the non-linearity in parameters of the example function a power series expansion is helpful (in this case around $x = 0$ with a display up to the 4th power):

Series[Subscript[a, 1]*Exp[Subscript[a, 2]*x],{x,0,4}]

$$a_1 + a_1 a_2 x + \frac{1}{2} a_1 a_2^2 x^2 + \frac{1}{6} a_1 a_2^3 x^3 + \frac{1}{24} a_1 a_2^4 x^4 + O[x]^5$$

The cross terms like $a_1 a_2$ or $a_1 a_2^2$ and the higher powers of $a_2$ like $a_2^2$, $a_2^3$, $a_2^4$ etc. now become directly obvious. A prominent example is the exponential decay model that describes radioactive disintegration processes or chemical first-order kinetics:

(* Exponential decay with a1 = 1.0 and a2 = -8.0 *)
pureFunction=Function[x,1.0*Exp[-8.0*x]];
argumentRange={0.0,1.0};
functionValueRange={0.0,1.5};
labels={"x","y","Exponential decay"};
CIP`Graphics`Plot2dFunction[pureFunction,argumentRange,functionValueRange,labels]

Nature (fortunately) is not linear (otherwise living organisms would not exist), so non-linear model functions play a predominant role in science. But compared to linear models, non-linear model functions may cause severe problems in data analysis procedures. They generally lead to optimization problems with multiple optima, so analytic calculation strategies are no longer applicable in general: only iterative strategies can be followed, and these may fail disastrously.
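The dependence of an iterative strategy on its start values can be tried out with the built-in function FindFit. The following is a sketch only: the data are assumed to be noise-free samples of the decay curve shown above, and both sets of start values are arbitrary choices for illustration.

```mathematica
(* Assumed synthetic data from y = 1.0 Exp[-8.0 x], without noise *)
data = Table[{x, 1.0*Exp[-8.0*x]}, {x, 0.0, 1.0, 0.05}];

(* Non-linear model: the parameter a2 sits inside the exponential *)
model = a1*Exp[a2*x];

(* Iterative search with start values near the generating parameters *)
FindFit[data, model, {{a1, 0.5}, {a2, -5.0}}, x]

(* Same search with a remote start value for a2 *)
FindFit[data, model, {{a1, 0.5}, {a2, 20.0}}, x]
```

The first call can be expected to end close to $a_1 = 1$ and $a_2 = -8$. The second call may take a different path, converge elsewhere, or stop with warnings, which is exactly the risk described above.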

So far only one-dimensional model functions with one argument $x$ have been discussed. One-dimensional model functions play the central part in curve fitting methods, where the structural form of the model function is often known but not the values of its parameters (see chapter 2).

1.3.3 Linear Model Functions with Multiple Arguments

Clear["Global`*"];
<<CIP`Graphics`

Model functions with multiple arguments $x_1$ to $x_M$ may be linear in their parameters and are generally written in the form (that utilizes the general linear function with one argument from above):

$$y = f(x_1, x_2, \ldots, x_M) = \sum_{v=1}^{L} a_{1v} g_{1v}(x_1) + \ldots + \sum_{v=1}^{L} a_{Mv} g_{Mv}(x_M)$$

$$y = f(x_1, x_2, \ldots, x_M) = \sum_{u=1}^{M} \sum_{v=1}^{L} a_{uv} g_{uv}(x_u)$$

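The multidimensional analog of the straight line is the hyperplane that is derived from the general linear model function with

$$L = 2$$

$$y = f(x_1, x_2, \ldots, x_M) = \sum_{u=1}^{M} \sum_{v=1}^{2} a_{uv} g_{uv}(x_u) = \sum_{u=1}^{M} \left( a_{u1} g_{u1}(x_u) + a_{u2} g_{u2}(x_u) \right)$$

$$y = f(x_1, x_2, \ldots, x_M) = \sum_{u=1}^{M} a_{u1} g_{u1}(x_u) + \sum_{u=1}^{M} a_{u2} g_{u2}(x_u)$$

and

$$a_u = a_{u1}; \quad g_{u1}(x_u) = x_u; \quad a_{M+1} = \sum_{u=1}^{M} a_{u2}; \quad g_{u2}(x_u) = 1$$

that leads to

$$y = f(x_1, x_2, \ldots, x_M) = \sum_{u=1}^{M} a_u x_u + a_{M+1}$$

A 3D plane with $M = 2$

$$y = f(x_1, x_2) = a_1 x_1 + a_2 x_2 + a_3$$

is visualized below: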

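(* Plane with a1 = 2.0, a2 = 3.0 and a3 = 1.0; x and y take the roles of x1 and x2, z is the function value *)
pureFunction=Function[{x,y},1.0+2.0*x+3.0*y];
xRange={-0.1,1.1};
yRange={-0.1,1.1};
labels={"x","y","z"};
CIP`Graphics`Plot3dFunction[pureFunction,xRange,yRange,labels]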

What holds for one-dimensional linear model functions still holds for their multidimensional analogs: model fitting procedures to experimental data lead to optimization problems with one global optimum, with analytic calculation strategies for its position.
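A minimal sketch of such a fit with two arguments, using the built-in function Fit: the data are assumed to be noise-free samples of the plane plotted above, taken on a small grid chosen only for illustration.

```mathematica
(* Assumed noise-free samples {x, y, z} of the plane z = 1.0 + 2.0 x + 3.0 y *)
data = Flatten[
   Table[{x, y, 1.0 + 2.0*x + 3.0*y}, {x, 0.0, 1.0, 0.25}, {y, 0.0, 1.0, 0.25}],
   1];

(* Linear least-squares fit; the basis functions 1, x and y
   belong to the parameters a3, a1 and a2 *)
Fit[data, {1, x, y}, {x, y}]
```

The fit should reproduce the generating plane directly, again without start values or iteration.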

1.3.4 Non-linear Model Functions with Multiple Arguments

Clear["Global`*"];
<<CIP`Graphics`

Non-linear model functions with multiple arguments $x_1$ to $x_M$ like

$$y = f(x_1, x_2, \ldots, x_M) = a_1 \sin(x_1) + \exp\left\{ \sum_{u=2}^{M} a_u x_u^2 \right\}$$

(where $\exp\{x\}$ denotes $e^x$) may be viewed as curved hyper surfaces with multiple minima and maxima in comparison to linear hyperplanes. The already shown curved 3D surface may again be taken as an example:

pureFunction=Function[{x,y},
  1.9*(1.35+Exp[x]*Sin[13.0*(x-0.6)^2]*Exp[-y]*Sin[7.0*y])];
xRange={-0.1,1.1};
yRange={-0.1,1.1};
labels={"x","y","z"};
CIP`Graphics`Plot3dFunction[pureFunction,xRange,yRange,labels]

It is these kinds of curved hyper surfaces that answer the most subtle questions about nature but on the other hand they cause the worst data analysis problems.

Machine learning methods usually lead to surfaces of this kind that have to be optimized (see chapter 4): they require iterative optimization techniques, which in turn need considerable computational power to be applied with success.
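What an iterative technique faces on such a surface can be sketched with the built-in local minimizer FindMinimum, applied here to the example surface itself as a stand-in for a surface to be optimized. The two start points are assumptions chosen for illustration, and the search is unconstrained, so it is not guaranteed to stay inside the plotted range.

```mathematica
(* Curved example surface from above, written as an expression in x and y *)
surface = 1.9*(1.35 + Exp[x]*Sin[13.0*(x - 0.6)^2]*Exp[-y]*Sin[7.0*y]);

(* Local search from a first assumed start point *)
FindMinimum[surface, {{x, 0.2}, {y, 0.7}}]

(* Local search from a second assumed start point *)
FindMinimum[surface, {{x, 1.0}, {y, 0.7}}]
```

Each call returns the minimum value found together with its position. Because the search is local, different start points may end in different minima, and none of them needs to be the global one.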

1.3.5 Multiple Model Functions

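In a last step multiple model functions may be collected together to generate an output vector $\mathbf{y}$ (the answer) for an input vector $\mathbf{x}$ (the question)

$$\begin{aligned}
y_1 &= f_1(x_1, x_2, \ldots, x_M) \\
y_2 &= f_2(x_1, x_2, \ldots, x_M) \\
&\;\;\vdots \\
y_N &= f_N(x_1, x_2, \ldots, x_M)
\end{aligned}$$

which may be written in an abbreviated vector notation:

$$\mathbf{y} = \mathbf{f}(\mathbf{x})$$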

Note that the output vector $\mathbf{y}$ and the function vector $\mathbf{f}$ are of dimension $N$ whereas the input vector $\mathbf{x}$ is of (maybe different) dimension $M$. Model function collections of this kind play the crucial role in machine learning methods where function collections are constructed to describe experimental data in multiple dimensions (see chapter 4).
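The vector notation can be made concrete with a small sketch. The three component functions below are hypothetical and chosen only to show the different dimensions of input and output ($M = 2$, $N = 3$).

```mathematica
(* Hypothetical function vector f with N = 3 components of M = 2 arguments *)
f = Function[{x1, x2},
   {1.0 + 2.0*x1 + 3.0*x2,   (* f1: linear plane *)
    Exp[-x1]*Sin[x2],        (* f2: non-linear in the arguments *)
    x1*x2}];                 (* f3: product of the arguments *)

input = {0.5, 1.0};   (* input vector x of dimension M = 2 *)
output = f @@ input   (* output vector y of dimension N = 3 *)

(* Dimensions of question and answer *)
{Length[input], Length[output]}
```

The last line should give {2, 3}: a two-dimensional question is mapped to a three-dimensional answer.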

1.3.6 Summary

The Holy Grail of the sciences, to calculate nature with

$$\text{output} = f(\text{input})$$

may now be written in mathematical detail:

$$\mathbf{y} = \mathbf{f}(\mathbf{x})$$

Questions about nature are asked with adequately defined input vectors $\mathbf{x}$ that are submitted to model functions $\mathbf{f}$ to give the answer in the form of an adequately defined output vector $\mathbf{y}$. This is a rather general scheme: nearly everything can be adequately coded in input/output vectors, e.g. molecules, pharmacological effects, material properties etc. The details of this kind of coding may be subtle and difficult and are the realm of specific areas of science like chemoinformatics or bioinformatics. The proper coding is an essential precondition to any data analysis: if the interesting parts of the world are not adequately coded, then any association of them by model functions must inevitably fail.

From the document: Achim Zielesny, From Curve Fitting to Machine Learning (pages 49-57)