
Tests for two independent vectors

From the document A practical introduction to statistics (Page 46-49)

When you have two vectors of observations, it is important to distinguish between INDEPENDENT vectors (random variables) and PAIRED vectors (random variables). In the


Figure 4.4: Estimated probability density functions for the length in milliseconds of the t and the n in the Dutch prefix ont-.

case of independent vectors, the observations in the one vector are not linked in a systematic way to the observations in the other vector. Consider, for instance, sampling 100 words at random from a frequency list compiled for Jane Austen's Pride and Prejudice, and then sampling another 100 words at random from a frequency list compiled for Herman Melville's Moby Dick. The two vectors of frequencies contain independent observations, and can be compared in various ways in order to address differences in general frequency of use between the two writers. As an example of paired observations, consider the case in which a specific list of 100 word types is compiled, with for each word type its frequency in Pride and Prejudice and its frequency in Moby Dick. The observations in the two vectors are now paired: the frequencies are tied, pairwise, to a given word. For such paired vectors, more powerful tests are available. In what follows, we first discuss tests for independent vectors. We then proceed to the case of paired vectors.
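The gain in power from pairing can be illustrated with a small simulated sketch. The data below are hypothetical, not the actual Austen and Melville frequencies; the point is only that the same mean difference is much easier to detect once the word-to-word variability is removed by pairing:

```r
# Hypothetical sketch: 20 simulated 'word frequencies' measured twice.
set.seed(1)
baseline = rnorm(20, mean = 10, sd = 5)              # frequencies of 20 word types
shifted  = baseline + rnorm(20, mean = 1, sd = 0.5)  # same words, small shift

# Treating the vectors as independent ignores the pairing:
p.indep  = t.test(baseline, shifted)$p.value
# The paired test removes the shared word-specific variability:
p.paired = t.test(baseline, shifted, paired = TRUE)$p.value
p.paired < p.indep    # TRUE: the paired test is far more sensitive
```

Here the large spread across word types swamps the small shift when the vectors are treated as independent, while the paired test sees only the small, consistent within-word differences.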

4.2.1 Are the distributions the same?

Recall that we observed a bimodal density for the Dutch prefix ver- in Figure 4.1. The presence of two modes for this distribution can be traced to two distributions having been mixed together: a distribution of semantically more opaque, non-compositional words, and a distribution of semantically more transparent, compositional words. The data frame


ver with word frequencies also contains a column with information about semantic class (opaque versus transparent). Figure 4.5 plots the densities of the opaque and transparent words separately. The two distributions are quite dissimilar. There are many transparent and only a few opaque low-frequency words (recall that a log frequency of 0 represents a word with frequency 1, which explains the hump of probability mass above zero in the graph for transparent formations).

Creating Figure 4.5 requires the following steps. We first partition the words into the two classes:

> ver.transp = ver[ver$SemanticClass == "transparent",]$Frequency

> ver.opaque = ver[ver$SemanticClass == "opaque", ]$Frequency

Next, we calculate the densities and store them, as we have to determine the limits for the horizontal and vertical axes before we can begin plotting.

> ver.transp.d = density(ver.transp)

> ver.opaque.d = density(ver.opaque)

> xlimit = range(ver.transp.d$x, ver.opaque.d$x)

> ylimit = range(ver.transp.d$y, ver.opaque.d$y)

> plot(ver.transp.d, lty = 1, col = "black",
+   xlab = "frequency", ylab = "density",
+   xlim = xlimit, ylim = ylimit, main = "")

> lines(ver.opaque.d, col = "darkgrey")

Before we make too much of the separation visible in our density plot, we should check whether this separation might have arisen by chance. To avoid complaints about ties with the TWO-SAMPLE KOLMOGOROV-SMIRNOV TEST, we add some jitter.

> ks.test(jitter(ver.transp), jitter(ver.opaque))
        Two-sample Kolmogorov-Smirnov test
data:  jitter(ver.transp) and jitter(ver.opaque)
D = 0.3615, p-value < 2.2e-16
alternative hypothesis: two-sided

The very small p-value provides support for the classification of these words into transparent and opaque subsets, each with its own probability density function.

4.2.2 Are the means the same?

In Chapter 1, we had a first look at the 81 English nouns for which several kinds of ratings as well as visual lexical decision latencies were collected. Here we visualize how the word frequencies are distributed for the subsets of simple and complex words cross-classified by class (plant versus animal) by means of a trellis boxplot.

> bwplot(Frequency ~ Class | Complex, data = ratings)


Figure 4.5: Estimated probability density function of the transparent (black line) and opaque (grey line) words with the Dutch prefix ver-.

Figure 4.6 suggests that the distributions of frequencies for plants and animals differ for simplex words, with the animals having somewhat higher frequencies than the plants.

We can ascertain whether we indeed have reason to be surprised by testing whether the means of these two distributions are different. The boxplots suggest reasonably symmetric distributions, so we use the two-sample version of the t-test and apply it to the subset of morphologically simple words.

> simplex = ratings[ratings$Complex == "simplex", ]

> freqAnimals = simplex[simplex$Class == "animal", ]$Frequency

> freqPlants = simplex[simplex$Class == "plant", ]$Frequency

> t.test(freqAnimals, freqPlants)
        Welch Two Sample t-test
data:  freqAnimals and freqPlants
t = 2.674, df = 57.545, p-value = 0.009739
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.1931830 1.3443152
sample estimates:
mean of x mean of y
 5.208494  4.439745


Figure 4.6: Boxplots for frequency as a function of natural class (animal versus plant) grouped by morphological complexity for 81 English nouns.

The summary of the t-test begins with the statement that a WELCH TWO-SAMPLE t-TEST has been carried out. The t-test in its simplest form presupposes that its two input vectors are normally distributed with the same variance. Often, however, the variances of the two input vectors are not the same. The Welch two-sample t-test corrects for this difference by adjusting the degrees of freedom. Normally, degrees of freedom are integers. However, you can see in this example that the Welch adjustment led to a fractional number of degrees of freedom: 57.545.
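The adjustment is easy to see by rerunning the test with the option var.equal = TRUE, which requests the classical Student t-test with a pooled variance estimate and integer degrees of freedom. A simulated sketch (the vectors below are hypothetical stand-ins for the frequency data, which are not reproduced here):

```r
# Hypothetical stand-ins for the two frequency vectors.
set.seed(42)
x = rnorm(33, mean = 5.2, sd = 1.2)
y = rnorm(48, mean = 4.4, sd = 0.9)

t.test(x, y)$parameter                    # Welch: fractional df
t.test(x, y, var.equal = TRUE)$parameter  # Student: df = 33 + 48 - 2 = 79
```

With var.equal = TRUE the degrees of freedom are simply the total number of observations minus two; the Welch version replaces this by a (generally fractional) value that reflects how unequal the two sample variances are.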

The next lines of the summary explain what the t-test did: it calculated the difference between the two means, and then tested whether this difference is not equal to 0. The 95% confidence interval around this difference in the means, 5.208494 − 4.439745 = 0.768749, does not include zero. As expected, the p-value is less than 0.05. If you need another confidence interval, for instance the 99% confidence interval, this can be specified with the option conf.level:

> t.test(simplex[simplex$Class == "animal", ]$Frequency,
+   simplex[simplex$Class == "plant", ]$Frequency,
+   conf.level = 0.99)

t = 2.674, df = 57.545, p-value = 0.009739

alternative hypothesis: true difference in means is not equal to 0
99 percent confidence interval:
 0.002881662 1.534616532

It is important to keep in mind that the two-sample t-test is appropriate only for reasonably symmetrical distributions. For the opaque and transparent words with the prefix ver-, where we are dealing with bimodal, and markedly asymmetric, distributions, we use wilcox.test():

> wilcox.test(ver.opaque, ver.transp)

        Wilcoxon rank sum test with continuity correction
data:  ver.opaque and ver.transp
W = 113443.5, p-value < 2.2e-16

alternative hypothesis: true mu is not equal to 0

This test confirms the conclusion reached above using the Kolmogorov-Smirnov test: We are dealing with two quite different distributions. These distributions differ in shape, and they differ in their medians, such that opaque words have the higher average frequency of use.
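The medians underlying this conclusion are easily inspected with median(). As a self-contained sketch with simulated skewed data (the actual ver frequencies are distributed with the book's data sets and are not reproduced here):

```r
# Hypothetical lognormal 'frequencies': the opaque set has the higher median.
set.seed(7)
opaque = rlnorm(200, meanlog = 1.5)   # fewer, higher-frequency words
transp = rlnorm(400, meanlog = 0.5)   # many low-frequency words

median(opaque)                        # higher median for the opaque set
median(transp)
wilcox.test(opaque, transp)$p.value   # very small p-value
```

Because the Wilcoxon test operates on ranks rather than on the raw values, it is insensitive to the skew that makes the t-test inappropriate here.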

In Chapter 1 we started exploring the data set on the dative alternation in English studied by Bresnan et al. [2007]. We calculated the mean length of the theme for clauses with animate and inanimate recipients with tapply().

> tapply(verbs$LengthOfTheme, verbs$AnimacyOfRec, mean)
  animate inanimate
 1.540278  1.071130

We now use a Welch two-sample t-test to verify that the two means are significantly different.

> t.test(LengthOfTheme ~ AnimacyOfRec, data = verbs)
        Welch Two Sample t-test
data:  LengthOfTheme by AnimacyOfRec
t = 5.3168, df = 100.655, p-value = 6.381e-07
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.2941002 0.6441965
sample estimates:
   mean in group animate mean in group inanimate
                1.540278                1.071130

Inspection of the distributions by means of a boxplot suggests some asymmetry for the inanimate group, but, as the reader may verify for herself, a Wilcoxon test also leaves no doubt that we have ample reason for surprise.


4.2.3 Are the variances the same?

It may be important to know whether the variances of two normal random variables are different. Here is an example from the R help for var.test(). Two vectors of normally distributed random numbers with different means and standard deviations are defined first.

> x <- rnorm(50, mean = 0, sd = 2)

> y <- rnorm(30, mean = 1, sd = 1)

With var.test() we subsequently observe that, as expected, the two variances are not the same.

> var.test(x, y)

        F test to compare two variances
data:  x and y
F = 2.7485, num df = 49, denom df = 29, p-value = 0.004667
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 1.380908 5.171065
sample estimates:
ratio of variances
          2.748496

The F-value is the ratio of the two variances,

> var(x)/var(y)
[1] 2.748496

and the degrees of freedom are one less than the number of observations in each vector, so we can just as well calculate the p-value directly without invoking var.test():

> 2 * (1 - pf(var(x)/var(y), 49, 29))
[1] 0.004666579

This test should only be applied to variances of normally distributed random variables. The help page for var.test() points to other functions that you should consider if this condition is not met.
