
s from the sample to approximate σ_X—and modify the estimation slightly to account for the additional inaccuracy that using a sample statistic in the formula brings.¹⁰

The idea behind confidence interval (CI) estimates is to present not only a point estimate for the parameter of interest (in this case, one value for the sample mean X), but also to state something about how far away this estimate may be from the true parameter (the true mean of the underlying distribution). This is achieved by stating an interval centered at X whose length depends on:

1. The variability of the estimate (the larger the standard error of X, the less likely it is that we will get the correct estimate for the true mean from the sample).

2. The degree of confidence we have that this interval will cover the true mean.

Typical values for the confidence level include 90%, 95%, and 99%. The (100−α)% CI for the sample mean X is computed as

\bar{X} \pm t_{\alpha/2} \cdot \frac{s}{\sqrt{N}},

where t_{α/2} is the appropriate percentile of the t-distribution with N−1 degrees of freedom.¹¹ In words, the (100−α)% CI for the sample mean X states that the probability that the interval computed from the preceding formula covers the true mean µ is (100−α)%.

As we saw in Exhibit 3.17, the t-distribution becomes very close to the normal distribution as the degrees-of-freedom parameter ν (which here equals N−1) becomes large. In the context of simulation, where sample sizes are generally large, replacing the percentile of the t-distribution with the percentile of the normal distribution makes no difference for all practical purposes.
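To make this concrete, here is a minimal MATLAB sketch (not from the text; the data vector x is a random placeholder) that computes a 95% CI for the mean and compares the t and normal critical values. The tinv and norminv functions are part of the Statistics Toolbox.

x     = randn(100,1);            % placeholder sample of N = 100 observations
N     = length(x);
alpha = 0.05;                    % 95% confidence level
se    = std(x)/sqrt(N);          % estimated standard error of the sample mean
tcrit = tinv(1 - alpha/2, N-1);  % percentile of the t-distribution with N-1 degrees of freedom
ci    = [mean(x) - tcrit*se, mean(x) + tcrit*se]   % confidence interval for the mean
zcrit = norminv(1 - alpha/2)     % nearly identical to tcrit when N is large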

3.11.3 Bootstrapping

The CI estimate for the mean is the confidence interval most widely used in practice. However, sometimes we need to evaluate the accuracy of other parameter estimates for one or more distributions. For example, we may be concerned about how accurate our estimates of a distribution percentile or of skewness are. (This kind of application is relevant in portfolio risk management.) Alternatively, suppose that you have computed the correlation between historical returns on Investment A and Investment B from Exhibit 3.9, and have discovered that it is 0.6. Is this a strong correlation in the statistical sense? That is, does it appear to be strong just by chance (because we picked a good sample), or can we expect that it will remain strong in other samples, given the overall variability in this sample’s data?

The procedure of computing the CI estimate of the mean is representative of classical methods in statistics, which relied on mathematical formulas to describe the accuracy of a sample statistic estimate. There are some theoretical results on confidence interval estimates for distribution parameters other than the mean, but in general, the results are not as easy to summarize as the closed-form expression for the CI estimate for the mean from section 3.2.

Bootstrapping¹² is a statistical technique that is useful in such situations. It involves drawing multiple samples of the same size (say, k samples) at random from the observations in the original sample of n observations. Each of the k samples is drawn with replacement—that is, every time an observation is drawn, it is placed back in the original sample, and is eligible to be drawn again, in the same sample or in future samples.

After drawing k samples from the original sample, the statistic corresponding to the parameter we are interested in estimating (whether it is a percentile, skewness, correlation, or a general function of the random variable) is computed for each of the k samples. The k values so obtained for the sample statistic are used to form an approximation for the parameter’s sampling distribution. This sampling distribution can be used, for example, to determine what values of the sample statistic are most likely to occur, what the variability of the estimate is, and what the 5th and the 95th percentiles of the approximate sampling distribution are. These percentiles provide an approximation for the confidence interval for the true parameter we are trying to estimate.
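To make the procedure concrete, the following is a minimal MATLAB sketch (not from the text; retA and retB are hypothetical return series filled with placeholder numbers) that bootstraps the correlation between two investments and reads off an approximate 90% confidence interval from the 5th and 95th percentiles. The prctile function requires the Statistics Toolbox.

retA = randn(60,1);                % placeholder returns for Investment A
retB = 0.6*retA + 0.8*randn(60,1); % placeholder returns for Investment B
n    = length(retA);
k    = 5000;                       % number of bootstrap samples
stat = zeros(k,1);
for i = 1:k
    idx     = randi(n, n, 1);      % draw n observation indices with replacement
    c       = corrcoef(retA(idx), retB(idx));
    stat(i) = c(1,2);              % sample correlation for this bootstrap sample
end
ci = prctile(stat, [5 95])         % approximate 90% confidence interval for the correlation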

At first glance, this technique appears pointless. Why do we not collect k more samples instead of reusing the same data set k times? The reason is that it is expensive to collect data, and that sometimes additional data collection is not an option, especially with financial data.

Bootstrapping is used not only for evaluating the accuracy of the estimate of a probability distribution parameter, but also as a simple method for generating forecasts for uncertain quantities based on a set of historical observations for risk management purposes. The concept of simulation is necessary to understand how the bootstrap method is actually applied, because the random draws from the original sample need to be generated with a random number generator. (See Chapter 4 for a discussion of the simulation technique, and Practice 4.2 on the companion web site for an application of the bootstrapping technique to evaluating the accuracy of the estimate of the 5th percentile of a portfolio’s return.)

3.11.4 Hypothesis Testing

Hypothesis testing is a fundamental concept in statistics. The procedure is as follows. We start out by stating a hypothesis (the null hypothesis) about a parameter for the distribution that corresponds to a variable of interest. For example, we claim that the skew of the distribution for the returns of the S&P 500 index is zero. (As we will see later in this book, while often unjustified, making assumptions like these substantially simplifies risk measurement and derivative pricing.) Then we take a sample of S&P 500 returns, and estimate the skew for that sample. Suppose we get a skew of –0.11. The statistical hypothesis testing methodology allows us to test whether the value obtained from the sample (–0.11) is sufficiently “close” to the hypothesized value (0) to conclude that, in fact, the “real” skew could indeed be zero, and that the difference is only a consequence of sampling error.

To test whether the difference between the hypothesized and the observed value is statistically significant, that is, large enough to be considered a reason to reject the original hypothesis, we would compute a quantity called a test statistic, which is a function of the observed and hypothesized parameters. In some cases, this test statistic follows a known probability distribution. For example, if we are testing a hypothesis about the sample mean, then the test statistic is called the t-statistic because it follows a t-distribution with degrees of freedom equal to the sample size minus one. Knowing that the test statistic follows a specific distribution, we can see whether the test statistic obtained from the sample is “very far” from the center of the distribution, which tells us whether the sample estimate we obtained would be a rare occurrence if the hypothesized value were true.

There are different ways to measure the rarity of the observed statistic. Modern statistical software packages typically report a p-value, which is the probability that, if our null hypothesis were true, we would get a more extreme sample statistic than the one we observed. If the p-value is “small” (the actual cutoff is arbitrary, but the values typically used are 1%, 5%, or 10%), then we would reject the null hypothesis. This is because a small p-value means that there is a very small probability that we would have obtained a sample statistic as extreme as the one we obtained if the hypothesis were true, that is, that the current test statistic is in the tails of the distribution.
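As an illustration, here is a minimal MATLAB sketch of a one-sample t-test (placeholder data; the hypothesized mean mu0 and the cutoff are assumptions for the example). The tcdf function is part of the Statistics Toolbox.

x    = randn(50,1);                        % placeholder sample of 50 observations
mu0  = 0;                                  % hypothesized mean under the null hypothesis
n    = length(x);
t    = (mean(x) - mu0) / (std(x)/sqrt(n)); % t-statistic
p    = 2 * (1 - tcdf(abs(t), n-1));        % two-sided p-value
% Reject the null hypothesis at the 5% level if p < 0.05.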

The types of hypothesis tests we have described so far are called one-sample hypothesis tests because they use information from only one sample. There are also two-sample hypothesis tests, which compare observed statistics from two samples with the goal of testing whether two populations are statistically different. They could be used, for example, to test whether the returns on one investment have been statistically better on average than the returns on another investment.

Finally, more sophisticated hypothesis tests are used in the background for a number of important applications in financial modeling. Multivariate regression analysis, which we will discuss in the context of factor model estimation in Chapter 11, uses hypothesis testing to determine whether a forecasting model is statistically significant. Goodness-of-fit tests, which are used in the context of fitting probability distributions to observed data, use statistical hypothesis testing to determine whether the observed data are statistically “significantly close” to a hypothesized probability distribution.

Chi-square tests, which we mentioned in section 3.7.2, are a very popular type of goodness-of-fit test. (See also the discussion on selecting probability distributions as inputs for simulation models in section 4.1.1 of the next chapter.)
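As a brief illustration (a sketch with placeholder data, not an example from the text), MATLAB’s chi2gof function in the Statistics Toolbox performs a chi-square goodness-of-fit test; by default it tests whether the data could plausibly have come from a normal distribution whose mean and variance are estimated from the sample.

x      = randn(200,1);   % placeholder data set of 200 observations
[h, p] = chi2gof(x)      % h = 0 means the normality hypothesis is not rejected; p is the p-value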

SUMMARY

Uncertainty can be represented by random variables with associated probability distributions.

Probability distributions can be viewed as “listings” of all possible values that can happen and the corresponding probability of occurrence, and are conveniently represented in graphs (histograms and bar charts).

Probability distributions can be discrete or continuous. Discrete distributions are defined over a countable number of values, whereas continuous distributions are defined over a range. Discrete distributions are described by probability mass functions (PMFs), whereas continuous distributions are described by probability density functions (PDFs).

A continuous random variable can take an infinite number of values, and the probability of each individual value is 0. In the context of continuous distributions, we replace the concept of probability of a value with the concept of probability of an interval of values, and the probability is the area under the PDF.

Probability distributions can be summarized in terms of central tendency (mean, median, and mode), variability (standard deviation, variance, coefficient of variation [CV], range, percentiles, etc.), skewness, and kurtosis.

The mean, variance, skew, and kurtosis of a probability distribution are related to the first, second, third, and fourth moment of the distribution.

The moments of a probability distribution can be computed from its moment generating function. Conversely, if the moments of a probability distribution are known, the distribution can in most cases be uniquely identified. However, sometimes the moments do not exist.

Important discrete probability distributions include the binomial distribution, the Bernoulli distribution, the discrete uniform distribution, and the Poisson distribution.

Important continuous probability distributions include the normal distribution, the continuous uniform distribution, the triangular distribution, the Student’s t-distribution, the lognormal distribution, the exponential distribution, the chi-square distribution, and the beta distribution.

Covariance and correlation are measures of average codependence of two random variables.

Empirical distributions are derived from data. All summary measures for probability distributions apply to empirical distributions.

The Central Limit Theorem states that the probability distribution of a sum of N independent and identically distributed observations tends to a normal distribution with a specific mean and standard deviation.

Confidence intervals allow us to state both our estimate of a parameter of an unknown probability distribution and our confidence in that estimate.

Bootstrapping is a statistical technique for evaluating the accuracy of sample statistics based on one sample of observations. It involves drawing multiple samples from the original sample, calculating the sample statistic in each of them, and analyzing the resulting distribution of the sample statistic. Bootstrapping can also be used as a simple way to generate future forecasts for uncertain quantities based on a sample of past observations.

Hypothesis testing is a statistical methodology for evaluating whether a statistic observed in a sample is significantly different from the expected value of a parameter of interest. It is used in many applications, including goodness-of-fit tests, multivariate regression analysis, and comparisons of two populations.

SOFTWARE HINTS

@RISK

@RISK has very convenient features for defining probability distributions. As explained in Appendix B, Introduction to @RISK (on the companion web site), you can see the distribution and how its shape changes immediately after entering the distribution parameters in the dialog box. It is also easy to compute probabilities and cumulative probabilities with @RISK by simply sliding the bar on the graph, as shown in Appendix B and Chapter 4. @RISK provides summary statistics for distributions automatically.

To calculate means, standard deviations, and other summary statistics for empirical distributions (that is, when you are given data in a sample, rather than data that have already been processed and organized to fit a probability distribution), you can use Excel’s built-in functions as well:

=AVERAGE(Array) computes the sample average value of Array.

=STDEV(Array) computes the sample standard deviation of Array.

=VAR(Array) computes the sample variance of Array.

=PERCENTILE(Array,k) returns the value that represents the kth percentile in Array.

=PERCENTRANK(Array,x) returns the percentile rank of the value x within the data set Array.

=COVAR(Array1,Array2) computes the covariance of Array1 and Array2 (note that COVAR uses the population formula, dividing by n rather than n−1).

=CORREL(Array1,Array2) computes the sample correlation of Array1 and Array2.

EXHIBIT 3.21 Example of calculation of the mean of a general discrete probability distribution with Excel.

Unfortunately, Excel has no built-in functions for computing descriptive statistics for general distributions. In other words, you cannot request that Excel compute the weighted average of distribution values. However, it is easy to do this manually. The function SUMPRODUCT(Array1,Array2) multiplies the corresponding values in Array1 and Array2, and adds them up. Exhibit 3.21 illustrates an example of its application. In this case, SUMPRODUCT computes the mean of a general discrete distribution by taking each of the values in the distribution (2, 5, and 10), multiplying it by its probability (0.6, 0.1, and 0.3), and adding up all the resulting products.
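Written out (pairing each value with its listed probability), the weighted average that SUMPRODUCT returns for the distribution in Exhibit 3.21 is

0.6 \cdot 2 + 0.1 \cdot 5 + 0.3 \cdot 10 = 1.2 + 0.5 + 3.0 = 4.7.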

MATLAB

We provide the MATLAB commands for producing the probability distribution graphs in this chapter.

Bernoulli Distribution  Note that the Bernoulli distribution is just a binomial distribution with one trial, so we can use the MATLAB command for the binomial PDF to plot the PDF of the Bernoulli distribution.

p = 0.3;            % Probability of success for each trial
n = 1;              % Number of trials
k = 0:n;            % Outcomes
m = binopdf(k,n,p); % Probability mass vector
bar(k,m)            % Create a bar chart
grid on

Binomial Distribution

p = 0.3;            % Probability of success for each trial
n = 3;              % Number of trials
k = 0:n;            % Outcomes
m = binopdf(k,n,p); % Probability mass vector
bar(k,m)            % Create a bar chart to visualize the probability distribution
grid on

Normal Distribution

>> mu = 0; sigma = 1;

>> x=linspace(-5,5,45); y = normpdf(x,mu,sigma);

>> plot(x,y); grid on

Discrete Uniform Distribution

>> x = 1:6; y = (1/6)*ones(1,6);

>> bar(x,y); grid on

Continuous Uniform Distribution  To obtain the picture in Exhibit 3.12(A), use

>> x = 3:0.1:3.5; y = unifpdf(x,3,3.5);

>> plot([3 x 3.5],[0 y 0])

To obtain the picture in Exhibit 3.12(B), use

>> x = 0:0.1:1; y = unifpdf(x);

>> plot([0 x 1],[0 y 0])

Triangular Distribution  MATLAB has no function for computing the triangular PDF directly, but you can plot the function manually by using plot(x,y). For example, use

>> x = [0 1 2]; y = [0 1 0];

>> plot(x,y)

Student’s t-Distribution  The function tpdf(x,n) computes the PDF of a t-distribution with n degrees of freedom. To obtain the family of t-distributions with varying degrees of freedom in Exhibit 3.14, use

>> x=linspace(-6,6,45);

>> y=tpdf(x,2); plot(x,y); grid on;

>> hold on

>> y=tpdf(x,8); plot(x,y,'r:');
>> y=tpdf(x,30); plot(x,y,'g--');
>> hold off

Lognormal Distribution

>> x = (0:0.02:10); y = lognpdf(x,0,1);

>> plot(x,y); grid on;

Exponential Distribution

>> lambda = 0.2;           % Failure rate
>> t = 0:0.3:30;           % Outcomes
>> f = exppdf(t,1/lambda); % Probability density vector
>> plot(t,f)               % Visualize the probability distribution
>> grid on

Poisson Distribution

>> x = 0:15; y = poisspdf(x,5);

>> bar(x,y); grid on

Chi-Square Distribution

>> x = 0:0.2:20; y = chi2pdf(x,5); plot(x,y); grid on

Beta Distribution  To get the graph in Exhibit 3.18(A), use

>> x=linspace(-.2,1.2,45); y = betapdf(x,4,6);

>> plot(x,y); grid on;

To get the graph in Exhibit 3.18(B), use

>> x=linspace(-.2,1.2,45); y = betapdf(x,6,4);

>> plot(x,y); grid on;

MATLAB has a number of built-in functions for computing descriptive statistics of samples of data. These include mean (for calculating the mean of a sample), std (for calculating the sample standard deviation), cov (for calculating the sample covariance), corrcoef (for calculating the sample correlation), and prctile (for calculating the kth percentile of a given set of data). See MATLAB’s help for specific syntax and examples. Also, as explained in Appendix C, Introduction to MATLAB (on the companion web site), MATLAB’s commands usually take multidimensional arrays as inputs, so you can compute summary statistics along different dimensions of the arrays. For example, suppose you have a set of data for the returns on 30 assets over 100 days, stored in a matrix DataMx with 100 rows and 30 columns. You can compute the mean return on each asset over the 100 days by computing the mean down the columns (mean(DataMx,1)), or compute the mean return across the 30 assets on each day by computing the mean along the rows (mean(DataMx,2)). The second argument of the mean command specifies the dimension along which the mean is computed.
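As a quick illustrative sketch (DataMx here is filled with random placeholder numbers rather than actual return data):

DataMx     = randn(100,30);  % 100 days (rows) by 30 assets (columns), placeholder data
assetMeans = mean(DataMx,1); % 1-by-30 row vector: each asset's average return over the 100 days
dailyMeans = mean(DataMx,2); % 100-by-1 column vector: average return across the 30 assets on each day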

NOTES

1. The notation Xi, a capital letter corresponding to the name of the random variable, followed by an index, is a standard statistics notation to denote the ith observation in a sample of observations drawn from the distribution of the random variable X̃.

2. A complete review of probability theory is beyond the scope of this book, but the concept of moments of probability distributions is often mentioned in quantitative finance, so let us explain briefly what a moment generating function (MGF) is. The kth moment of a random variable is defined as

m_k(\tilde{X}) = \sum_{\text{all values } x \text{ of the random variable}} x^k \cdot P(\tilde{X} = x) \quad \text{(discrete case)}, \qquad \text{or} \qquad m_k(\tilde{X}) = \int_{-\infty}^{\infty} x^k \cdot f(x)\, dx \quad \text{(continuous case)}.

For many useful cases (albeit not for all, because for some distributions some moments may not exist), knowing these moments allows us to identify uniquely the actual probability distribution. The MGF lets us generate these moments. It is defined as

M_{\tilde{X}}(t) = E\!\left[e^{t\tilde{X}}\right] = \sum_{\text{all values } x \text{ of the random variable}} e^{tx} \cdot P(\tilde{X} = x) \quad \text{(discrete case)}, \qquad \text{or} \qquad \int_{-\infty}^{\infty} e^{tx} \cdot f(x)\, dx \quad \text{(continuous case)}.

By differentiating the MGF with respect to t and evaluating the derivatives at t = 0 (the first derivative, the second derivative, the third derivative, and so on), we can recover the first, second, third, and so on moments. While this construction looks rather awkward, it is very useful for proving theoretical results such as “The sum of two independent normal random variables is also a normal random variable.” (The sum of two random variables does not necessarily have the same distribution as the random variables in the general case.) This is done by computing the MGF of the sum of two normal variables, and then checking whether the resulting MGF is of the same type as the MGF of a normal random variable, which is computationally more convenient than computing the actual probability distribution of the sum of two random variables.
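As a quick illustration (not from the text): for a standard normal random variable the MGF is known in closed form, and differentiating it recovers the moments:

M(t) = e^{t^2/2}, \qquad M'(0) = 0 \; (\text{the mean}), \qquad M''(0) = 1 \; (\text{the second moment, which here equals the variance}).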

3. Recall that squaring a number always results in a nonnegative number, so the sum of squared terms is nonnegative.

4. Some statistical and simulation software packages report excess kurtosis, rather than kurtosis, by default. The excess kurtosis is computed as the value for kurtosis minus 3 (the kurtosis of the normal distribution), so that the normal distribution has excess kurtosis of 0. Make sure that you check the help file for the software package you are using before you interpret the value for kurtosis in statistical output.

5. The Poisson distribution is also widely used in operations management ap-plications, such as queue management, revenue management, and call center staffing.

6. W. S. Gossett stumbled across this distribution while working for the Guinness brewery in Dublin. At the time, he was not allowed to publish his discovery under his own name, so he used the pseudonym Student, and this is how the distribution is known today.

7. The squared random variables arise when computing squared differences between observed values (which vary because the sample is random) and expected values. Chi-square goodness-of-fit tests consider the sum of the squared differences between the observed frequencies of values in a sample and the expected frequencies if the sample came from a particular distribution. Whether the difference between what was observed and what was expected is statistically “large” depends on the value of the difference relative to the critical value of the chi-square test statistic, which is computed from the chi-square distribution. See
