

4. The probability of success remains the same from trial to trial.

Perhaps the easiest way to decide whether the binomial distribution applies is to ask whether the situation under evaluation is equivalent to a sequence of flips of the same coin, in which we are interested in the probability of getting a given number (x) of tails in n flips.

Suppose we would like to plot the binomial distribution of the random variable describing the price movements of the stock over the next three years. What would be the values on the x-axis? The values for the random variable can be 0, 1, 2, or 3: the number of successes can be 0 (the stock's value went down every year), 1 (the stock's value went up in one of the three years, but went down in the remaining two), and so on. We will denote the random variable by $\tilde{X}$, but note that this is not the same random variable as the random variable in section 3.2. In order to create the listing of probabilities, we compute:

$$P(\tilde{X}=0)=(0.70)\cdot(0.70)\cdot(0.70)=0.343=34.3\%$$

$$P(\tilde{X}=1)=(0.30)\cdot(0.70)\cdot(0.70)+(0.70)\cdot(0.30)\cdot(0.70)+(0.70)\cdot(0.70)\cdot(0.30)=0.441=44.1\%$$

$$P(\tilde{X}=2)=(0.30)\cdot(0.30)\cdot(0.70)+(0.70)\cdot(0.30)\cdot(0.30)+(0.30)\cdot(0.70)\cdot(0.30)=0.189=18.9\%$$

$$P(\tilde{X}=3)=(0.30)\cdot(0.30)\cdot(0.30)=0.027=2.7\%$$
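For readers who want to verify this enumeration numerically, the following short Python sketch (with illustrative names of our choosing) loops over all $2^3$ up/down paths and groups their probabilities by the number of successes:

```python
# Brute-force check of the three-trial probabilities: enumerate all 2^3 paths.
# Assumes "success" (the stock goes up) has probability p = 0.30 in each year.
from itertools import product

p = 0.30
probs = {k: 0.0 for k in range(4)}

for path in product([0, 1], repeat=3):       # 1 = success (up), 0 = failure (down)
    prob = 1.0
    for step in path:
        prob *= p if step == 1 else (1 - p)  # independent trials multiply
    probs[sum(path)] += prob                 # group paths by number of successes

print(probs)  # {0: 0.343, 1: 0.441, 2: 0.189, 3: 0.027}, up to rounding
```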

Therefore, the distribution can be represented as the graph in Exhibit 3.3.

EXHIBIT 3.3  Binomial distribution with n = 3 trials and p = 0.30 probability of success.

To clarify why the probabilities were computed in the way just illustrated, note that there is only one way to have three successes in three trials ($\tilde{X}=3$): when every trial is a success. The probability of this event is the product of the probabilities that each of the three trials is a success. (The multiplication of probabilities to obtain the total probability is permitted because the trials are assumed to be independent.) However, there are three ways to have one success in three trials: the success can be in the first, second, or third trial. To account for these different combinations, we can use the following well-known formula from the branch of mathematics called combinatorics (sometimes called the science of counting):

$$\frac{n!}{x!\,(n-x)!}.$$

The preceding formula computes the number of ways in which one can select x out of n objects. The symbol $n!$ (pronounced "n factorial") stands for the expression $1 \cdot 2 \cdots n$. The exact formula for computing the probability of obtaining x successes in n trials when the probability of success is p (in our example, p = 0.30) is

$$P(\tilde{X}=x)=\frac{n!}{x!\,(n-x)!}\,p^{x}(1-p)^{n-x},\qquad x=0,\ldots,n.$$

EXHIBIT 3.4  Shape of binomial distributions with the same number of trials (n = 10) and different values for the probability of success p: (A) p = 0.3; (B) p = 0.5; (C) p = 0.7.

This is the PMF of the binomial distribution, and is the formula software packages use to compute the binomial distribution probabilities. Note that depending on the magnitude of the probability of success, the binomial distribution can be shaped differently. Exhibit 3.4 illustrates the binomial distribution for three different values of the probability of success.
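The PMF formula translates directly into code. Here is a minimal Python sketch (function name ours); math.comb from the standard library computes the binomial coefficient $n!/(x!\,(n-x)!)$:

```python
# Binomial PMF: P(X = x) for x successes in n trials with success probability p.
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Reproduces the three-year stock example (n = 3, p = 0.30):
print([round(binomial_pmf(x, 3, 0.30), 3) for x in range(4)])
# [0.343, 0.441, 0.189, 0.027]
```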

3.4 NORMAL DISTRIBUTION AND PROBABILITY DENSITY FUNCTIONS

The binomial distribution is a discrete probability distribution because the values the random variable can take are countable (0, 1, 2, etc.). Let us see what happens if we try to model the movements of the stock over 100 years.

EXHIBIT 3.5  Shape of binomial distributions with the same probability of success p = 0.3 and an increasing number of trials n: (A) n = 3; (B) n = 20; (C) n = 100.

Exhibit 3.5 shows the binomial distribution for probability of success of 0.30 and number of trials n=3, 20, and 100.

Note that the binomial distribution begins to look symmetric as the number of trials increases. Also, the range of values begins to look like a continuum. When the random variable takes values on a range, as opposed to at discrete points, the random variable is called continuous, and its probability distribution is called continuous as well. Continuous distributions are defined by their probability density functions (PDFs): basically, functions that describe the shape of the curve on the graph. They are often denoted by the standard mathematical notation for functions, f(x), where x represents the possible values the random variable can take.

A common mistake is to think of f(x) as the probability that the random variable will take the value x, analogously to the way we defined the PMF p(x). This is incorrect. In fact, the value of f(x) may be greater than 1, which a probability cannot be. (For example, the PDF of a uniform distribution on an interval of length 0.1 equals 10 everywhere on that interval.) Instead, we need to think about probabilities for continuous distributions in terms of areas under a curve that describes the probability distribution. Intuitively, the reason continuous distributions are associated with areas under the PDF f(x) is that a continuum represents an infinite number of values that the random variable can take. If we try to assign a bar whose height equals a nonzero probability to each value the random variable can take, as we did in the case of the binomial distribution, the total sum of all bars (and, hence, the total probability for that distribution) will be infinite. However, the total probability, added up over all possible values for the random variable, cannot be more than 1.

Consequently, a better way to think of the probability of each particular value of the random variable is as infinitely small (virtually, 0), but then realize that when many, many of these values are added together, they have a significant probability mass. This is the concept of integration in calculus. The area under a given curve can be computed by adding up an infinite number of tiny areas above intervals of length dx on the x-axis. The probability that a continuous random variable takes values between two constants a and b can be expressed as the integral

$$\int_a^b f(x)\,dx,$$

and the total probability (the area under the entire curve) should be 1:

$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$
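To make these integrals concrete, here is a small numerical sketch. It uses the exponential density $f(x) = \lambda e^{-\lambda x}$ (defined for $x \ge 0$) as an example PDF, and scipy.integrate.quad to approximate the areas; the parameter value is arbitrary:

```python
# Numerical check that the total area under a PDF is 1, and that areas over
# intervals give probabilities. Example PDF: exponential with rate lam = 2.
from math import exp
from scipy.integrate import quad

lam = 2.0

def exponential_pdf(x: float) -> float:
    return lam * exp(-lam * x)           # valid for x >= 0

total, _ = quad(exponential_pdf, 0.0, float("inf"))
print(total)                             # ~1.0: the total probability

p_ab, _ = quad(exponential_pdf, 0.5, 1.5)
print(p_ab)                              # ~0.318: P(0.5 <= X <= 1.5)
```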

It turns out that the binomial distribution approaches a very important continuous distribution, called the normal distribution, as the number of trials becomes large. The normal distribution is bell-shaped and is entirely defined by two parameters: its mean µ and standard deviation σ. This means that if we know them, we can draw the shape of the distribution. (We will introduce the concepts of mean and standard deviation shortly. For now, just think of µ and σ as inputs to the formula for the normal distribution PDF.) This is because the normal PDF is given by the formula

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$$

EXHIBIT 3.6  Standard normal distribution. (The horizontal axis shows x from −5 to 5; the vertical axis shows the PDF.)

The standard normal distribution has µ = 0 and σ = 1 (see Exhibit 3.6).

You may also encounter the notation f(x | µ, σ); that is,

$$f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$$
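As an illustration, the formula can be coded directly and compared with a library implementation (scipy.stats.norm implements the normal distribution); the example also shows that a PDF value can exceed 1 when σ is small:

```python
# The normal PDF, coded from the formula above, versus scipy's implementation.
from math import exp, pi, sqrt
from scipy.stats import norm

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

print(normal_pdf(0.0, 0.0, 1.0))           # ~0.3989: the standard normal peak
print(norm.pdf(0.0, loc=0.0, scale=1.0))   # same value from the library

# A density is not a probability and can exceed 1:
print(normal_pdf(0.0, 0.0, 0.05))          # ~7.98 for sigma = 0.05
```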

The symbol | stands for "conditional on" or "given." In other words, given specific values for µ and σ, the PDF for the normal distribution is given by the preceding formula. The symbol | will be useful in other circumstances as well, such as stating the PDF of a random variable conditional on the realization of another random variable. For example, the probability distribution of asset returns in one time period may depend on ("be conditional on") the realization of asset returns in the previous time period. We will provide a more formal definition of conditioning in section 3.10.

The normal distribution appears surprisingly often in nature and was studied in detail well before its prominent use in finance. Modern-day random process modeling in finance has borrowed a lot of findings from natural sciences such as physics and biology. For example, a classical assumption in modeling asset returns is that the changes in asset prices over small periods of time are normally distributed (despite the fact that the empirical evidence from real-world markets does not support the position that changes in asset returns follow a normal distribution). We will use the normal distribution extensively in this book.

The binomial and the normal distributions are famous representatives of the two classes of probability distributions: discrete and continuous. However, there are numerous other useful probability distributions that appear in practice. We will review some of these distributions later in this chapter. But first, let us introduce a couple of important concepts for describing probability distributions.

3.5 CONCEPT OF CUMULATIVE PROBABILITY

Cumulative probability is the probability that a random variable takes a value that is less than or equal to a given value. Cumulative probability is an important concept, and is available as a function in a number of software packages, including Excel and MATLAB. The cumulative distribution function (CDF) can be thought of as a listing of the cumulative probabilities up to every possible value the random variable can take. Examples of CDFs for a continuous and a discrete distribution are shown in Exhibit 3.7. CDFs always start at 0 and end at 1, but the shape of the curve on the graph is determined by the PDF or PMF of the underlying random variable. (The CDF of a discrete random variable has a characteristic staircase-like shape, a "step function" in mathematical jargon.)

EXHIBIT 3.7  The CDF of (A) a continuous random variable (lognormal), and (B) a discrete random variable (binomial with 10 trials and probability of success 0.30). The values on the horizontal axis are the values the random variable takes; the vertical axis shows cumulative probability from 0.0 to 1.0.

To show how one would compute cumulative probability for a discrete distribution, let us consider the binomial distribution example from section 3.3, whose CDF is plotted in Exhibit 3.7(B).

Suppose that the probability of success is 0.30, and we would like to compute the probability that the number of successes in 10 trials will be at most six. Intuitively, we are trying to add up the heights of the first seven bars (the bars for the values 0 through 6) in the first panel of Exhibit 3.4 (the panel with p = 0.3). We can write this expression as

$$P(\tilde{X}\le 6) = P(\tilde{X}=0) + \cdots + P(\tilde{X}=6)$$

$$= \frac{10!}{0!\,(10-0)!}\,0.30^{0}(1-0.30)^{10-0} + \cdots + \frac{10!}{6!\,(10-6)!}\,0.30^{6}(1-0.30)^{10-6}$$

$$= \sum_{k=0}^{6} \frac{10!}{k!\,(10-k)!}\,0.30^{k}(1-0.30)^{10-k},$$

where we have used the classical symbol $\sum$ for sum.

To construct the entire CDF, we would perform the same calculation for all possible values of $\tilde{X}$, that is, compute $P(\tilde{X}\le 0)$, $P(\tilde{X}\le 1)$, ..., $P(\tilde{X}\le 10)$. We would then plot the values of $\tilde{X}$ on the x-axis, and the corresponding $P(\tilde{X}\le x)$ on the vertical axis (by convention referred to as the y-axis).
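In code, this construction is a cumulative sum of PMF values. A minimal Python sketch (names ours) for n = 10 trials and p = 0.30:

```python
# Build the binomial CDF by accumulating the PMF values P(X = 0), ..., P(X = n).
from math import comb

n, p = 10, 0.30
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

cdf, running = [], 0.0
for prob in pmf:
    running += prob          # P(X <= x) accumulates the PMF up to x
    cdf.append(running)

print(round(cdf[6], 4))      # P(X <= 6) ~ 0.9894
print(round(cdf[n], 4))      # 1.0: the CDF ends at 1
```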

For continuous distributions, we would replace the sum by an integral, and find the area under the PDF up to (i.e., to the left of) a given constant.

For example, if f(x) is the PDF of a continuous probability distribution (such as the normal distribution, or the t, chi-square, and exponential distributions that we describe later), then the CDF (usually denoted by F(x)) can be computed as

$$F(x) = P(\tilde{X} \le x) = \int_{-\infty}^{x} f(t)\,dt,$$

where t is a dummy variable of integration.

Further, the probability that a random variable takes values between two constants a and b can be linked to the CDF as follows:

$$P(a \le \tilde{X} \le b) = \int_a^b f(x)\,dx = F(b) - F(a).$$

To illustrate this, let us look at the picture in Exhibit 3.8. Suppose we would like to compute the probability that the random variable takes a value between 20 and 60 (which is the area under the PDF between 20 and 60, and is 45.7% according to the picture). To compute that probability, we can equivalently compute the cumulative probability (the area) up to 20 (which is 39.9% according to the picture) and subtract it from the cumulative probability (the area) up to 60 (which is 100% − 14.4% = 85.6%). We obtain F(60) − F(20) = 85.6% − 39.9% = 45.7%, which is the same number.

EXHIBIT 3.8  Calculation of the probability that the random variable falls between 20 and 60 as a difference of cumulative probabilities up to 20 and 60. (The areas under the PDF are 39.9% below 20, 45.7% between 20 and 60, and 14.4% above 60.)
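This F(b) − F(a) recipe is also how such probabilities are computed in code. The sketch below uses scipy's normal CDF with illustrative parameters chosen so that the areas roughly match the picture (the exact distribution behind Exhibit 3.8 is not specified here):

```python
# P(20 <= X <= 60) as a difference of cumulative probabilities, F(60) - F(20).
from scipy.stats import norm

mu, sigma = 27.8, 30.3                 # made-up parameters for illustration
p_20_60 = norm.cdf(60, mu, sigma) - norm.cdf(20, mu, sigma)
print(p_20_60)                         # ~0.458, close to the 45.7% in the picture
```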

3.6 DESCRIBING DISTRIBUTIONS

A probability distribution can be used to represent the uncertain outcomes of a project, or the possible future value of an investment in an asset. But what does the picture of this probability distribution tell us, and how can we convey the most important insights to others? This section introduces mathematical terminology for describing probability distributions. Specifically, the graph of a probability distribution gives us information about:

Where the most likely or most representative outcomes are (central tendency).

Whether we can be optimistic or pessimistic about the future (skew).

What the risk is (spread and tails of the distribution).

Finance practitioners are also often concerned with how “fat” the tails of the distribution are—which tells us how likely it is that “extreme” events, that is, events that are very far from the “representative” outcomes in the middle of the distribution, will occur. We will discuss a measure of this distribution characteristic (kurtosis) in section 3.6.4.

3.6.1 Measures of Central Tendency

Measures of central tendency include:

Mean

Median

Mode

The mean is by far the most commonly used measure in financial applications, for theoretical reasons, despite the fact that it has some serious drawbacks, most notably its sensitivity to extreme values. We discuss the mean in the most detail, and briefly review the other two measures.

Mean. On an intuitive level, the mean (also called the "expected value" or the "average") is the weighted average of all possible outcomes in the distribution, where the weights equal the probabilities that these values are taken. This is easier to imagine in the case of discrete distributions than in the case of continuous distributions, but the main idea is the same.

In mathematical jargon, the mean is called the first moment of a probability distribution.² It is denoted as $E[\tilde{X}]$ (for "expected value of the random variable $\tilde{X}$").

In the case of a discrete distribution,

$$E[\tilde{X}] = \sum_{\substack{\text{all values } x \text{ of}\\ \text{the random variable}}} x \cdot P(\tilde{X} = x).$$

In the case of a continuous distribution,

$$E[\tilde{X}] = \int_{-\infty}^{\infty} x \cdot f(x)\,dx.$$

For example, the mean of the Bernoulli distribution in section 3.2 is $0 \cdot 0.70 + 1 \cdot 0.30 = 0.30$.

The mean of a normal distribution can be computed as

$$E[\tilde{X}] = \int_{-\infty}^{\infty} x \cdot \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx.$$

In the case of the normal distribution, of course, this calculation is redundant since the parameter µ in the expression inside the integral is actually the mean. However, the mean is not a parameter in the PDF formulas for most probability distributions, and this is the calculation that would be used to compute it. To practice, you can compute the preceding integral, and verify that the calculation indeed gives µ as the answer.
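Both formulas are easy to check numerically. In the sketch below (with parameter values of our choosing), the discrete case reproduces the Bernoulli mean of 0.30, and numerical integration of x · f(x) for a normal density returns approximately µ:

```python
# The mean as a probability-weighted average (discrete) and as an integral
# of x * f(x) (continuous).
from scipy.integrate import quad
from scipy.stats import norm

# Discrete: the Bernoulli distribution from section 3.2.
outcomes, probs = [0, 1], [0.70, 0.30]
mean_discrete = sum(x * p for x, p in zip(outcomes, probs))
print(mean_discrete)        # 0.30

# Continuous: a normal density with mu = 5, sigma = 2 (illustrative values).
mu, sigma = 5.0, 2.0
mean_continuous, _ = quad(lambda x: x * norm.pdf(x, mu, sigma),
                          -float("inf"), float("inf"))
print(mean_continuous)      # ~5.0, i.e., the parameter mu
```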

As a final remark, we note that the mean is not always the “middle point” of the range of the distribution. (We will see ample examples in this book.) One outlier (that is, one value for the random variable that is very far from the others) can shift the mean significantly. Note also that the mean does not have to be one of the values of the probability distribution, as the previous Bernoulli example illustrated. (The mean for the Bernoulli distribution was 0.30, which is not 0 or 1.) The mean is merely the “center of gravity” of the probability distribution.

Median. The median is a more robust measure of the "middle" of the distribution. It is the value on the horizontal axis such that 50% of the distribution lies on each side of it. Since the median does not take into consideration where the values on each side of it lie (as opposed to the mean, which considers the actual values and their probabilities of occurrence), the median is not as influenced by the presence of extreme values (values that are very far from the center of the distribution) as the mean is.

Mode. The mode of a distribution is the most likely outcome. One can think of it as the value at which the PDF/PMF of the distribution is at its highest.

For example, in Exhibit 3.4, the mode of the first binomial distribution is 3, the mode of the second distribution is 5, and the mode of the third distribution is 7.
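These mode values can be verified with a short argmax over the PMF; a sketch with illustrative names:

```python
# Modes of binomial distributions with n = 10 and p = 0.3, 0.5, 0.7: the value
# x at which the PMF is highest.
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    return comb(n, x) * p**x * (1 - p)**(n - x)

n = 10
for p in (0.3, 0.5, 0.7):
    pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
    mode = max(range(n + 1), key=lambda x: pmf[x])   # argmax of the PMF
    print(p, mode)   # prints modes 3, 5, and 7, matching Exhibit 3.4
```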

You may hear about “unimodal” or “bimodal” distributions. These terms basically just refer to how many “peaks” the distribution has. Almost all theoretical distributions used in financial modeling are unimodal, as their properties are easier to model mathematically. The distributions we have introduced so far, for example, are unimodal. Of course, real-world data do not always follow neat mathematical rules, and may present you with distributions that have more than one mode.

3.6.2 Measures of Risk

Variance and Standard Deviation. When thinking of risk, one usually thinks of how far the actual realization of an uncertain variable will fall from what one expects. Therefore, a natural way to define a measure of uncertainty is as the spread, or dispersion, of a distribution. Two measures that describe the spread of the distribution are variance and standard deviation. The two are strongly related: the standard deviation is the square root of the variance, and we usually need to compute the variance before computing the standard deviation. Exhibit 3.9 illustrates the relationship between variance/standard deviation and the spread of the distribution. Suppose we are considering investing in two assets, A and B. The probability distribution for B has a wider spread and higher variance/standard deviation than the probability distribution for A.

Mathematically, the variance of a random variable is related to the second moment of a probability distribution, and is computed as follows:

For discrete distributions:

$$\operatorname{Var}(\tilde{X}) = \sum_{\substack{\text{all values } x \text{ of}\\ \text{the random variable}}} (x - \mu)^2 \cdot P(\tilde{X} = x)$$

EXHIBIT 3.9  Comparison of two probability distributions in terms of risk and central tendency. (The horizontal axis shows return in percent; distribution B is more spread out than distribution A.)

For continuous distributions:

$$\operatorname{Var}(\tilde{X}) = \int_{-\infty}^{\infty} (x-\mu)^2 \cdot f(x)\,dx$$

In the preceding equations, µ denotes the mean of the distribution.

In words, the variance is the sum of the squared deviations of all values in the probability distribution from the mean (used as a measure for the center) of the distribution. The reason why squared deviations are used is so that deviations to the left of the mean do not cancel deviations to the right of the mean when added together.³ Otherwise, a distribution that has many observations far from the center and, therefore, has a large "spread," may end up with the same measure of dispersion as a distribution with values very close to the mean and a small spread. In fact, a distribution that has a very wide spread could end up with a measure of zero if all large positive deviations from the mean have corresponding large negative deviations. This would make the measure useless as an indicator of spread.

Sometimes, a more convenient (and equivalent) way of expressing the variance is through the equation

$$\operatorname{Var}(\tilde{X}) = E[\tilde{X}^2] - \left(E[\tilde{X}]\right)^2.$$

This expression links the variance explicitly to the first and the second moments of the distribution.
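A short numerical check (using the Bernoulli distribution from earlier as an example) confirms that the definition and the moment form agree:

```python
# Variance two ways for the Bernoulli distribution with p = 0.30.
outcomes, probs = [0, 1], [0.70, 0.30]

mean = sum(x * p for x, p in zip(outcomes, probs))                  # E[X] = 0.30

# Definition: probability-weighted squared deviations from the mean.
var_def = sum((x - mean)**2 * p for x, p in zip(outcomes, probs))

# Moment form: E[X^2] - (E[X])^2.
ex2 = sum(x**2 * p for x, p in zip(outcomes, probs))
var_moments = ex2 - mean**2

print(var_def, var_moments)   # both 0.21, up to floating-point rounding
print(var_def ** 0.5)         # standard deviation ~0.458
```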

The variance of a distribution measures the spread in squared units of the random variable, and the resulting number is difficult to interpret. The standard deviation is widely used instead. The standard deviation is simply the square root of the variance ($\sigma_{\tilde{X}} = \sqrt{\operatorname{Var}(\tilde{X})}$), and presents a measure of the average deviation of the values in the distribution from the mean that has the same units as the random variable, hence making it easier to interpret.

In the financial context, standard deviation is often used interchangeably with the term “volatility.”

Coefficient of Variation. Let us consider again the picture in Exhibit 3.9. We mentioned that the probability distribution for A has a smaller standard deviation than the probability distribution for B, but notice also that the mean of the distribution for A is lower than the mean of the distribution for B. If you had to invest in one of them, which one would you choose?

This situation brings up the idea of measuring spread (the "risk" of the distribution) relative to the mean (the "representative" value of the distribution). This is the statistical concept of coefficient of variation (CV), which is reported in percentages, and is mathematically expressed as

$$\mathrm{CV} = \frac{\sigma}{\mu} \times 100,$$

where µ is the mean and σ is the standard deviation of the distribution. The CV gives us a unit-free ratio that can be used to compare random variables.

If the CV for investment A is 70% and the CV for investment B is 50%, we may decide that investment A is more “risky” than B relative to the average return we can expect, even though investment A has the smaller standard deviation.
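In code, the comparison just described is a one-line ratio. The means and standard deviations below are made-up numbers chosen to reproduce the 70% and 50% figures:

```python
# Coefficient of variation: risk (standard deviation) per unit of mean, in percent.
def coefficient_of_variation(sigma: float, mu: float) -> float:
    return sigma / mu * 100

cv_a = coefficient_of_variation(sigma=7.0, mu=10.0)    # investment A: 70.0
cv_b = coefficient_of_variation(sigma=15.0, mu=30.0)   # investment B: 50.0
print(cv_a, cv_b)   # A carries more risk per unit of expected return than B
```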

CV represents the trade-off between expectation and risk from the statistical point of view. In finance, the inverse ratio is often used (that is, instead of the amount of risk per unit of expected reward, one looks at the expected reward per unit of risk). The financial measure became popular based on work by Sharpe (see, for example, Sharpe 1994). We will talk about the Sharpe ratio in the context of portfolio optimization in Chapter 7.

The main idea behind using both measures, however, is the same.

Range. The range of a random variable is the difference between the largest and the smallest values the random variable can take.