Exercise 1.
(Tutorial for lesson page 5) Are people’s behaviour in relation to tobacco and their gender related, at a 10% significance level?
Here are the results of a survey made on a sample of 51 men and 66 women:
G: variable "gender"; B: variable "behaviour in relation to tobacco"
Gm: men; Gw: women
Bn: never smoked; Bs: smoke; Bss: stopped smoking

Observed frequencies:
       Gm   Gw
Bn     12   23
Bs     31   26
Bss     8   17

Theoretical frequencies according to H0:
       Gm   Gw
Bn
Bs
Bss

Detailed Chi-squares and total:
       Gm   Gw
Bn
Bs
Bss
1) Place the subtotals and the general total in the first table, and in the second one, identically.
2) Fill the second table (6 central theoretical values) following proportional calculations.
3) Table #3: calculate the six partial Chi-squares, then add them to get the value χ²calc.
4) Test writing:
Null hypothesis:
Value of the variable χ² between the observed and the theoretical samples: χ²calc =
Significance level: α =
Number of dof: (r-1)(k-1) =
Limit value of the variable χ² before rejection: χ²lim =
Comparison and decision:
[Figure: sketch locating the observed χ² and the rejection area]
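The three steps of this exercise (subtotals, theoretical frequencies by proportionality, partial Chi-squares) can be checked with a short script. This is only a sketch in plain Python; the function name `chi_square_independence` is an illustrative choice, not something defined in the lesson.

```python
# Observed counts from the survey: rows Bn, Bs, Bss; columns Gm, Gw.
observed = [
    [12, 23],
    [31, 26],
    [8, 17],
]


def chi_square_independence(table):
    """Return (expected table, partial chi-squares, total chi-square, dof)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Theoretical frequency under H0: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    # Partial chi-square of each cell: (observed - expected)^2 / expected.
    partial = [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]
    chi2 = sum(sum(row) for row in partial)
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return expected, partial, chi2, dof


expected, partial, chi2_calc, dof = chi_square_independence(observed)
for name, exp_row, part_row in zip(["Bn", "Bs", "Bss"], expected, partial):
    print(name, [round(v, 2) for v in exp_row], [round(v, 3) for v in part_row])
print("chi2_calc =", round(chi2_calc, 3), "dof =", dof)
```

The value χ²lim still has to be read in the χ² table of the form, for the chosen α and the computed number of dof, before comparing and deciding.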
Exercise 2.
Two candidates A and B compete for a presidential election. Once the new president is elected, 500 people are interviewed among the voters. 100 of them are retired people, 50 are unemployed and 350 are employees.
In this sample, the vote results are:
voters \ candidates   A     B     blank/abstention
unemployed            24    16    10
employees             122   148   80
retired               36    27    37
1) Decide, with a 1% significance level, whether people’s vote depends on their social group or not in this country.
2) What can we say if we do not include blank votes and abstentions?
Exercise 3.
The table shows attendance in two stores A and B: how many people made at least one purchase. These clients are sorted by age group (10 to 15 years old, and so on).
1) Say, with a 5% significance level, whether the chosen store depends on the age of a client.
          store
age       A    B
10 - 15   46   24
15 - 20   29   35
20 - 40   14   17
> 40      12   18
2) What age group mostly contributes to the previous result? Explain.
3) Give the meaning of the “5% significance level” in your first answer.
4) According to your Chi² table, can you be more accurate about the risk taken in this statement (your first answer)?
Exercise 4.
In a survey, 100 people were asked about their age and their attendance at cinemas. We name X the variable "age" and Y the variable "number of annual cinema shows". The survey result is the following table of counts (Fr.: citations):
Y \ X      [15 ; 25[   [25 ; 50[   ≥ 50
none       4           6           13
1 to 11    10          16          15
12 to 23   13          8           4
≥ 24       6           3           2
1) By a χ² independence test, with a 2% significance level, decide whether there’s a link or not between the age and the level of attendance at the cinema.
2) Using your form table, discuss the level of confidence you can assign to the assertion : “they are dependent”.
3) Identify the most important partial Chi-2s and give the meaning of these high values.
Exercise 5.
Using the data series introduced in exercise 11, decide, by means of a Chi-square test, whether the two variables are independent or not.
Exercise 6.
(Tutorial for lesson page 10) Let’s take a close look at how a company’s turnover evolves over time.
Turnover (M€) per quarter:
Year   Q1   Q2   Q3   Q4
N      28   45   49   36
N+1    30   44   48   40
N+2    28   46   52   37
N+3    31   42   54   39
Despite large seasonal variations, due to the company’s particular activity, is it possible to find an overall trend over several years?
Let’s calculate and plot the moving means over 5 consecutive values:
(do it as a group job: divide the set of calculations with your neighbours and share your results)
1-5 2-6 3-7 … 12-16
X Y calculations:
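To check the group's results, the twelve moving means (windows 1-5 to 12-16) can be computed with a few lines of Python. This is a minimal sketch; the helper name `moving_means` is an illustrative choice.

```python
# Quarterly turnover (M€), numbered 1 to 16, from year N to year N+3.
turnover = [28, 45, 49, 36, 30, 44, 48, 40, 28, 46, 52, 37, 31, 42, 54, 39]


def moving_means(values, window=5):
    """Mean of each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


means = moving_means(turnover)
for start, m in enumerate(means, start=1):
    print(f"{start}-{start + 4}: {m:.1f}")
```

Each mean is usually plotted at the centre of its window (value 3 for the window 1-5, and so on).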
Exercise 7.
(Tutorial for lesson page 11) Let’s return to one of the examples introduced on page 3 (lessons doc): the effect of the amount of fertilizer on the harvested production.
plot #   fertilizer X (kg/ha)   harvest Y (q/ha)
1        150                    46
2        80                     37
3        120                    46
4        220                    51
5        100                    43
1) For each half-cloud, determine the coordinates of its mean point (G1, then G2).
2) Determine the expression of Mayer’s line (G1G2).
3) On a graph, plot the points of the initial table and draw this line.
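Mayer's method can be sketched as follows: sort the points by X, split the cloud into two halves, take the mean point of each half, and draw the line through G1 and G2. With an odd number of points the split is a matter of convention; the sketch below assumes the extra point goes to the lower half, which may differ from the convention of the lesson. The function name `mayer_line` and its `lower_size` parameter are illustrative.

```python
# Fertilizer (kg/ha) and harvest (q/ha) for the five plots.
fertilizer = [150, 80, 120, 220, 100]
harvest = [46, 37, 46, 51, 43]


def mean_point(points):
    """Mean point (mean of x, mean of y) of a list of (x, y) pairs."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def mayer_line(xs, ys, lower_size=None):
    """Return G1, G2, slope and intercept of the line (G1G2)."""
    points = sorted(zip(xs, ys))  # sort by increasing X
    if lower_size is None:
        # Assumption: with an odd count, the lower half gets the extra point.
        lower_size = (len(points) + 1) // 2
    g1 = mean_point(points[:lower_size])
    g2 = mean_point(points[lower_size:])
    slope = (g2[1] - g1[1]) / (g2[0] - g1[0])
    intercept = g1[1] - slope * g1[0]
    return g1, g2, slope, intercept


g1, g2, slope, intercept = mayer_line(fertilizer, harvest)
print("G1 =", g1, "G2 =", g2)
print(f"y = {slope:.4f} x + {intercept:.2f}")
```

Passing `lower_size=2` gives the other possible split, and a slightly different line.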
Exercise 8.
Determine the expression of Mayer’s line for the case given in exercise 6.
Exercise 9.
(Tutorial for lesson page 12) Calculate or display on your calculator: the means and standard deviations, the variances, the covariance, and the expression of the least-squares line.
1) Taking the data of exercise 7 (fertilizer/harvest)
2) Taking the data of exercise 4 (age/# of cinema shows) – choose 60 as average age for the class 50 and more;
choose 36 as average number of shows for the class 24 and more.
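For question 1, the calculator results can be cross-checked with the sketch below. It assumes the "population" convention (sums divided by n, not n − 1), which should be compared with the formulas of the lesson; the function name `describe_pair` is illustrative.

```python
import math

# Data of exercise 7: fertilizer X (kg/ha) and harvest Y (q/ha).
xs = [150, 80, 120, 220, 100]
ys = [46, 37, 46, 51, 43]


def describe_pair(xs, ys):
    """Means, variances, standard deviations, covariance, least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Assumption: variances and covariance are divided by n (population form).
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    slope = cov / var_x                   # a = Cov(X, Y) / V(X)
    intercept = mean_y - slope * mean_x   # the line passes through the mean point
    return {
        "mean_x": mean_x, "mean_y": mean_y,
        "var_x": var_x, "var_y": var_y,
        "std_x": math.sqrt(var_x), "std_y": math.sqrt(var_y),
        "cov": cov, "slope": slope, "intercept": intercept,
    }


stats = describe_pair(xs, ys)
for key, value in stats.items():
    print(f"{key} = {value:.4f}")
```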
Exercise 10.
(Tutorial for lesson page 13) Let’s consider the following time series: a company’s annual expenses in advertising.
X: year           2009  2010  2011  2012  2013  2014  2015  2016  2017  2018  2019  2020
Y: expense (k€)   41    60    55    66    87    61    90    95    82    120   125   118
The corresponding scatter plot is represented:
Determine the expression of the Y on X fitting line, following the least square method; then, draw it.
Exercise 11.
500 people, having passed their driving license exam, are sorted in the table below.
They are distributed with respect to the number X of times they took the exam before passing it and to the number Y of hours of driving lessons before their first attempt.
1) Define a marginal frequency. Then, give an example from the table.
2) Describe, briefly, the way to enter the data set in your calculator.
3) Calculate the covariance of the pair (X, Y) and give a concrete comment about this value.
4) Among those who took between 15 and 25 hours of driving lessons, what is the rate of those who passed their exam on the third attempt?
5) Among those who passed their exam on the third attempt, what is the rate of those who took between 15 and 25 hours of driving lessons?
Exercise 12.
A sales agent wishes to analyse his (or her) activity and efficiency. For each appointment with a prospect, the length of the product presentation (X, in minutes) and the quantity sold (Y) were recorded. The twelve values inside the table show the number of appointments that correspond to each pair (X, Y).
1) Give the meaning of the frequency "8" found inside the table.
2) Calculate, manually, the average time spent per appointment.
3) Give the covariance of the pair (X, Y).
Exercise 13.
The following table indicates the sales price (€) of an equipment and the number of sold items, for 4 years.
year rank 1 2 3 4
sales price (€) X 300 210 270 375
# of sold items Y 198 240 222 160
year 1: 2009
1) Build the scatter plot with an orthogonal frame. The axes intersection must be the point (210, 160);
scales: 1 cm for €15 on the abscissas axis, 1 cm for 10 items on the ordinates axis.
2) Determine the coordinates of G, mean point of the cloud.
3) a. Determine the expression of the Y on X fitting line, following the least square method.
The coefficients will be expressed with 6 significant figures.
b. Draw this regression line on the graph.
4) Which year saw the highest turnover? For which amount?
going further:
5) Now, we assume that, each year, the number of sold items y and the sales price x are related this way:
y = – 0.498 x + 349. We denote S(x) the turnover achieved by selling y items, €x each.
a. Express S(x) with respect to x.
b. Find the variations of the function S defined in [210 ; 375].
c. Deduce the sales price we would have to set for a fifth year if we want a maximum turnover. How many items will be sold (round to one unit)? For what turnover?
Exercise 14.
A survey aims to compare people's spending on high-tech equipment with their income. Each column of the chart T below gives, for a given French land, the average monthly income of people (X) and the average monthly expense (Y) on high-tech equipment.
land A B C D E F
income X (€) 1550 1620 1770 1850 1930 2000
expense Y (€) 57 61 66 73 76 82
1) Calculate the covariance and then the linear correlation coefficient of the pair (X, Y).
Give an interpretation of both parameters.
2) a. Give, by means of your calculator, the expression of the Y on X regression line.
b. Obtain the expression of the Mayer's line of the series, from the chart T.
c. Both lines slightly differ. Find the income for which they both give the same expense. What makes this common point special, inside the point cloud?
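For question 1, the covariance and the linear correlation coefficient r = Cov(X, Y) / (σX σY) can be verified with the sketch below. It assumes sums divided by n; the function name `covariance_and_correlation` is illustrative.

```python
import math

# Chart T: average monthly income X (€) and high-tech expense Y (€) per land.
income = [1550, 1620, 1770, 1850, 1930, 2000]
expense = [57, 61, 66, 73, 76, 82]


def covariance_and_correlation(xs, ys):
    """Return (Cov(X, Y), r), with sums divided by n (assumed convention)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov, cov / (std_x * std_y)


cov, r = covariance_and_correlation(income, expense)
print(f"Cov(X, Y) = {cov:.2f}, r = {r:.4f}")
```

The coefficient r does not depend on whether n or n − 1 is used, as long as the same divisor is used for the covariance and both standard deviations.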
Exercise 15.
(Tutorial for lesson page 17) Data about the fuel consumption of a motorcycle have been collected (consumption: Y, in L/100 km; speed: X, in km/h):
X   10     20     30    40    50   60    70    80   90
Y   15.2   11.6   9.3   7.8   7    6.6   6.9   8    9.6
The scatter plot, on the right, clearly shows that a linear regression would be inappropriate to describe the evolution of the consumption with respect to the speed. Thus, we will propose a variable change.
1) Let’s define the variable T by: T = (X – 60)².
Complete the following table:
T
Y 15.2 11.6 9.3 7.8 7 6.6 6.9 8 9.6
2) Perform a linear regression of Y on T.
3) Thus, deduce the expression of the regression curve, for the initial scatter plot.
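The variable change can be sketched in Python: build T = (X − 60)², fit Y = aT + b by least squares, then substitute T back to get a curve in X. The helper names `least_squares` and `predict` are illustrative, not part of the lesson.

```python
# Speed X (km/h) and consumption Y (L/100 km).
speeds = [10, 20, 30, 40, 50, 60, 70, 80, 90]
consumption = [15.2, 11.6, 9.3, 7.8, 7, 6.6, 6.9, 8, 9.6]

# Question 1: the new variable T = (X - 60)^2.
t_values = [(x - 60) ** 2 for x in speeds]


def least_squares(us, vs):
    """Slope and intercept of the least-squares line of V on U."""
    n = len(us)
    mean_u, mean_v = sum(us) / n, sum(vs) / n
    s_uv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    s_uu = sum((u - mean_u) ** 2 for u in us)
    slope = s_uv / s_uu
    return slope, mean_v - slope * mean_u


# Question 2: linear regression of Y on T.
a, b = least_squares(t_values, consumption)


def predict(x):
    """Question 3: regression curve y = a (x - 60)^2 + b in the initial variable."""
    return a * (x - 60) ** 2 + b


print("T =", t_values)
print(f"Y = {a:.5f} T + {b:.3f}")
```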
Exercise 16.
(quadratic fitting) A company recorded its profits Y with respect to X, the quantity produced and sold:
X (tons) 2 3 5 7 11
Y (k€) 38 55 72 69 24
T
1) Thanks to your calculator, give the linear correlation coefficient between X and Y. Comment.
2) Let’s define the variable T = –(X – 6)².
a. Complete the table.
b. Calculate Cov(T, Y) and then the linear correlation coefficient between both variables.
c. Is a linear fitting of Y on T appropriate?
d. Determine the expression of the Y on T fitting line, following the least square method.
e. Deduce an expression of the regression of Y on X.
Exercise 17.
(quadratic fitting) A market study was conducted on a new type of product. The chart below gives, for several proposed sales prices, the number of people willing to pay that price.
unit price (€) X 2 3 4 5 6 7
number of people Y 66 47 34 25 18 14
1) Calculate the covariance of the variables X and Y, then comment on its sign.
2) We set T = X(X – 20)
a. Calculate the linear correlation coefficient between the variables T and Y.
b. Comment on its value.
c. Determine the expression of the Y on T fitting line, following the least square method.
d. Deduce an expanded expression of the regression of Y with respect to X.
3) Here we examine the expected turnover (unit selling price × number of sales), if the numbers of responses obtained in the survey are considered to be the numbers of units sold.
a. Calculate the turnovers that can be extracted from the initial table.
b. Calculate, for the same values of X, the turnovers CA' that can be got thanks to the formula obtained in question 2)d.
c. What unit selling price should we fix, so that the best turnover would be reached?
Exercise 18.
(inverse fitting) A perfumery, analysing its turnover, relates the quantities sold (Y) to the prices (X) of various perfume brands and models. The results are gathered in the following chart:
X, bottle’s price (€)    15    25    30    40   45   60   75   90
Y, # of sold bottles     202   117   107   82   78   60   55   48
Answer the questions beginning with "calculate" by using your calculator’s results.
1) a. Calculate the covariance of X and Y; comment on its sign.
b. Calculate the linear correlation coefficient of X and Y; comment on its value.
2) In order to have a more precise idea of how X and Y are related, we set the variable change: T = 850/X.
a. After having calculated the list of values of T, in a third list (calculator), justify that the linear correlation is excellent between T and Y.
b. Give the expression of the Y on T regression line, according to the least square method.
c. What is the least square criterion?
d. Deduce from question 2)b a modelled expression of Y with respect to X.
e. According to this model, how many bottles whose cost is €150 would the perfumery expect to sell?
Exercise 19.
(Tutorial for lesson page 18) Calculate the point estimates, in the given situations.
1) Returning to exercise 10, give an estimate of the expense in 2022.
2) Returning to exercise 7, give an estimate of the quantity of fertilizer that would offer a harvest of 60 q/ha.
3) Returning to exercise 15, give an estimate of the fuel consumption when the speed is 100 km/h.
Exercise 20.
(Tutorial for lesson page 18) Let’s return to exercise 10. We want to estimate the expense, for the year 2022, by a 95% confidence interval.
1) a. Get the values of Y’, from the values of X and the expression of the fitting line;
b. Get the values of Z, by dividing Y by Y’;
c. Then, give the mean and standard deviation of Z.
2) Give the point estimate of the expense in 2022.
3) Give the coefficient u corresponding to the confidence level.
4) Then, give the confidence interval.
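The four steps can be followed in Python. The sheet does not state the final formula, so the sketch below assumes that the interval is the point estimate multiplied by (mean of Z ± u × standard deviation of Z), with u taken from the standard normal law; check this against the form before relying on it. The helper `least_squares` is illustrative, and the standard deviation is the population one (divided by n), which is also an assumption.

```python
from statistics import NormalDist, fmean, pstdev

# Data of exercise 10: year X and advertising expense Y (k€).
years = list(range(2009, 2021))
expenses = [41, 60, 55, 66, 87, 61, 90, 95, 82, 120, 125, 118]


def least_squares(xs, ys):
    """Slope and intercept of the least-squares line of Y on X."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


a, b = least_squares(years, expenses)

# Step 1: fitted values Y', ratios Z = Y / Y', mean and standard deviation of Z.
fitted = [a * x + b for x in years]
ratios = [y / y_fit for y, y_fit in zip(expenses, fitted)]
z_mean, z_std = fmean(ratios), pstdev(ratios)

# Step 2: point estimate for 2022.
point_estimate = a * 2022 + b

# Step 3: coefficient u for a 95% two-sided confidence level.
u = NormalDist().inv_cdf(0.975)

# Step 4 (assumed form): point estimate times (mean of Z +/- u * std of Z).
low = point_estimate * (z_mean - u * z_std)
high = point_estimate * (z_mean + u * z_std)
print(f"point estimate = {point_estimate:.1f} k€, u = {u:.2f}")
print(f"interval = [{low:.1f} ; {high:.1f}] k€")
```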
Exercise 21.
(Tutorial for lesson page 18) Using exercise 7, estimate by a 99% confidence interval the harvest obtained with 300 kg/ha of fertilizer.
1) a. Get the values of Y’, from the values of X and the expression of the fitting line;
b. Get the values of Z, by dividing Y by Y’;
c. Then, give the mean and standard deviation of Z.
2) Give a point estimate of the harvest.
3) Give the coefficient u corresponding to the confidence level.
4) Then, give the confidence interval.
Exercise 22.
(Tutorial for lesson page 18) For each person in a sample, a survey recorded the age class (X) and the visual acuity (Y, 1/10 = 0.1):
Y \ X   [5 ; 35[   [35 ; 45[   [45 ; 55[   [55 ; 65[
0.3     1          5           10          20
0.6     8          12          25          18
0.9     55         30          14          6
Estimate the visual acuity of an 80-year-old person, by a 99% confidence interval.
Exercise 23.
In a country, two variables are compared: the consumer force index and the turnover of its car industry:
consumer force (index) X 3.26 3.85 3.44 3.08 3.6
car industry turnover (G€) Y 9.3 9.56 9.36 9.24 9.47
1) Give the expression of the Y on X Mayer’s line.
2) By means of a point estimate, give a value of the consumer force that would correspond to a G€ 10 car industry turnover.
3) Is a strong correlation between two variables a sign of a cause and effect relationship between them?
Exercise 24.
(least square + confidence interval) Monthly revenues of a commercial website are listed below, from January to December 2018:
in k€ : 3 5 4 8 10 9 13 12 17 18 18 21
1) In a few words, describe the least square method.
2) Thanks to the global trend of the evolution of the monthly revenue, give the 95% confidence interval of the predictable revenue in December 2019. (number the months from 1 for January 2018)
3) Give the probability that, in December 2019, the revenue would be less than k€ 29.23.
4) Build the scatter plot (scale: 1 cm for two months), draw the regression line and finally represent the confidence interval.
Exercise 25.
(Mayer + confidence interval) The table below includes eight of the major cities of a country. The variable X gives, in thousands, the number of city residents; the variable Y gives, in thousands, the number of students in this city.
city   X     Y
A      850   58
B      623   37
C      587   38
D      360   20
E      312   16
F      275   15
G      262   12
H      244   12
1) Build the scatter plot from this data series.
2) Give the coordinates of the mean point of the cloud.
3) a. Using Mayer’s method, determine manually the expression of the Y on X regression line.
b. Draw this line. Does G belong to it?
c. State "Mayer’s principle".
4) We will use here another fitting line, whose expression is: y' = 0.07 x – 6.
a. With this line, give the 95% confidence interval of the predictable number of students in a town that has two million inhabitants.
b. What can we say about the chances that the number of students would exceed 155,000 in such a town?
Exercise 26.
(logarithmic fitting + confidence interval) The service life of some identical office equipment has been studied. In the following table, ti represents the duration of use, expressed in thousands of hours, and R(ti) the rate of equipment still in use at the time ti (e.g.: after 1,000 hours, ti = 1, 90% of the equipment is still in use, R(ti) = 0.90).
ti 1 2 3 4 5 6 7 8 9
R(ti) 0.9 0.66 0.53 0.4 0.32 0.25 0.19 0.14 0.1
1) We set yi = ln[R(ti)] where ln is the natural logarithm. Fill the following table, then build the scatter plot, using the points Mi (ti, yi), into an orthogonal frame.
ti 1 2 3 4 5 6 7 8 9
yi
2) Does a linear fitting seem relevant for the previous scatter plot?
Calculate the linear correlation coefficient between T and Y.
3) Using the least square method, determine an expression of the Y on T regression line.
Deduce from this expression that there are two positive real numbers k and λ such that: R(t) = k e^(–λt).
4) In this question, we'll take k = 1.174 and λ = 0.266.
a. Determine the predictable rate of equipment still in use after 10,000 hours.
b. After how long are there exactly 50 % of equipment still in use?
5) Give a 99% confidence interval of the rate of equipment still in use after 10,000 hours of service.
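Questions 1 to 4 can be cross-checked with the sketch below: it takes yi = ln R(ti), fits y = at + b by least squares, and reads λ = −a and k = e^b, since ln R(t) = ln k − λt. The helper names `least_squares` and `rate` are illustrative.

```python
import math

# Duration of use t (thousands of hours) and rate R(t) of equipment still in use.
t_values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
rates = [0.9, 0.66, 0.53, 0.4, 0.32, 0.25, 0.19, 0.14, 0.1]

# Question 1: y_i = ln R(t_i).
y_values = [math.log(r) for r in rates]


def least_squares(xs, ys):
    """Slope and intercept of the least-squares line of Y on X."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


# Question 3: y = a t + b, hence R(t) = e^b * e^(a t) = k e^(-lambda t).
a, b = least_squares(t_values, y_values)
lam = -a
k = math.exp(b)


def rate(t):
    """Modelled rate of equipment still in use after t thousand hours."""
    return k * math.exp(-lam * t)


# Question 4b: R(t) = 0.5  <=>  t = ln(k / 0.5) / lambda.
half_time = math.log(k / 0.5) / lam
print(f"k = {k:.3f}, lambda = {lam:.3f}")
print(f"R(10) = {rate(10):.3f}, R = 50% at t = {half_time:.2f} thousand hours")
```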
Exercise 27.
100 children have been classified by age (X) and height (Y):
X \ Y     [95 ; 105[   [105 ; 125[   [125 ; 135[
[3 ; 5[   15           10            0
[5 ; 7[   8            32            5
[7 ; 9[   2            13            15
1) Enter this table in your calculator.
2) Give the means and standard deviations of X and Y, then calculate their covariance.
3) Calculate their linear correlation coefficient. Comment on this value.
4) Nevertheless, does the table allow us to see some trend?
5) Assuming that the relationship between age and height is linear until the age of 12, give the 95% confidence interval of the height of a 12-year-old child.
IUT TC MATHEMATICS FORM FOR BIVARIATE STATISTICS
χ² law table
The table gives values χ²lim such that p(χ² > χ²lim) = α.
[Figure: χ² curve, with the area α beyond χ²lim and the area 1 − α before it]