Université Joseph Fourier L2/STA230
Lab 2: Probabilities and quantiles and description of a continuous quantitative variable
Objectives: The first objective of this session is to compute some probabilities and quantiles of some usual distributions. The second objective is to compute the usual descriptive indicators and graphs for a continuous quantitative variable.
1 Computation of probabilities and quantiles
R provides exact computation of density functions, cumulative distribution functions and quantile functions for standard probability families. For example,
• Gaussian distribution:
– its density function is dnorm,
– its cumulative distribution function is pnorm (input a value, output a probability),
– its quantile function is qnorm (input a probability, output a value).
• Binomial distribution:
– its probability function (the discrete analogue of a density) is dbinom,
– its cumulative distribution function is pbinom,
– its quantile function is qbinom.
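As a quick illustration of how these three kinds of functions relate to each other, here is a minimal sketch. The numerical values (1.96, and a binomial with size 10 and probability 0.5) are arbitrary choices made for the example only.

```r
# Standard normal: qnorm is the inverse of pnorm
p <- pnorm(1.96)                 # P(Z <= 1.96)
q <- qnorm(p)                    # gives back 1.96 (up to rounding)
dnorm(0)                         # density at 0, i.e. 1/sqrt(2*pi)

# Binomial B(10, 0.5): pbinom is the cumulative sum of dbinom
cdf3 <- pbinom(3, size = 10, prob = 0.5)         # P(X <= 3)
sum3 <- sum(dbinom(0:3, size = 10, prob = 0.5))  # same probability, summed by hand
qbinom(0.5, size = 10, prob = 0.5)               # a median of B(10, 0.5)
```

For a discrete distribution, qbinom returns the smallest value k such that P(X <= k) is at least the requested probability.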
Exercise 1
From past experience, it is known that a certain surgery has a 90% chance of succeeding. This surgery is going to be performed on 5 patients. Let X be the random variable equal to the number of successes out of the 5 attempts.
1. What model do you propose for X?
2. Simulate X.
n <- 5; p <- 0.9 # parameters
N <- 100 # sample size
X <- rbinom(N,n,p); X # simulated sample
3. What is the probability that the surgery will fail all 5 times?
n <- 5; p <- 0.9 # parameters
N <- 1e7 # sample size
X <- rbinom(N,n,p) # sample
F <- length(which(X==0))/N; F # frequency of event
dbinom(0,n,p) # R does it exactly
4. What is the probability for the surgery to fail exactly 2 times? Calculate exactly the probability with R.
5. Simulate the convergence, as the sample size increases, of the proportion of series of 5 surgeries with exactly 2 failures.
N <- 2e4 # sample size
X <- rbinom(N,n,p) # sample
F <- cumsum(X==3)/(1:N) # frequencies of exactly 2 failures, i.e. X == 3 successes
plot(1:N,F,pch=".") # plot frequencies
abline(h=dbinom(3,n,p),col="red") # theoretical value
6. What is the probability that the surgery succeeds at least 2 times? Simulate the convergence.
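For the last question, "at least 2 successes" is the event X >= 2, whose probability is 1 - P(X <= 1). One possible sketch follows, with the same parameters as above. The fixed seed and the variable names exact, exact2 and freq are assumptions added for this illustration, not part of the exercise.

```r
set.seed(1)                                    # assumption: fixed seed for reproducibility
n <- 5; p <- 0.9                               # parameters
exact <- 1 - pbinom(1, n, p)                   # P(X >= 2) = 1 - P(X <= 1)
exact2 <- pbinom(1, n, p, lower.tail = FALSE)  # same probability, upper tail directly
N <- 2e4                                       # sample size
X <- rbinom(N, n, p)                           # sample
freq <- cumsum(X >= 2) / (1:N)                 # running frequencies of the event
plot(1:N, freq, pch = ".")                     # plot frequencies
abline(h = exact, col = "red")                 # theoretical value
```

Because the event is very likely, the running frequency stays close to 1 throughout. Restricting the vertical axis with the ylim argument of plot may make the convergence easier to see.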
Exercise 2
The height X of men in France is modeled by a normal distribution N(172, 196) (unit: cm; 196 is the variance, so the standard deviation is 14).
1. What proportion of French men are less than 160 cm tall?
mfm <- 172; sdfm <- sqrt(196) # parameters
p <- pnorm(160,mean=mfm,sd=sdfm); p # probability
2. What proportion of French men are more than two meters tall?
3. What proportion of French men are between 165 and 185 centimeters tall?
4. If ten thousand French men chosen at random were ranked by increasing height, how tall would the 9000th be?
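The remaining questions use the same two functions: an upper tail, a difference of two values of the cumulative distribution function, and a quantile. The following is a sketch of one way to write them. The names p_tall, p_mid and h9000 are chosen here for illustration, and reading the 9000th ranked height as the 0.9 quantile is an approximation.

```r
mfm <- 172; sdfm <- sqrt(196)     # parameters (mean, standard deviation)
# Question 2: upper tail P(X > 200)
p_tall <- pnorm(200, mean = mfm, sd = sdfm, lower.tail = FALSE)
# Question 3: P(165 < X < 185) as a difference of two cdf values
p_mid <- pnorm(185, mean = mfm, sd = sdfm) - pnorm(165, mean = mfm, sd = sdfm)
# Question 4: the 9000th of 10000 ranked heights is roughly the 0.9 quantile
h9000 <- qnorm(9000 / 10000, mean = mfm, sd = sdfm)
p_tall; p_mid; h9000
```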
2 Basic statistics of a continuous quantitative variable
Exercise 3
The HER (Health Exam Results) data come from the US Department of Health and Human Services, National Center for Health Statistics, and correspond to the Third National Health and Nutrition Examination Survey. The variables collected are:
iden: identification number of the individual
sex: 0 for men, 1 for women
age: in years
ht: height (cm)
wt: weight (kg)
waist: circumference (cm)
pulse: pulse rate (beats per minute)
sys: systolic blood pressure (mmHg)
dias: diastolic blood pressure (mmHg)
chol: cholesterol (mg)
BMI: body mass index (kg/m²)
leg: upper leg length (cm)
elbow: elbow breadth (cm)
wrist: wrist breadth (cm)
arm: arm circumference (cm)
treat: treatment group
1. Load the her.csv file and assign it to data.
data <- read.table("her.csv", header=TRUE, sep = "") # sep = "" means whitespace-separated; use sep = "," if the file is comma-separated
2. Display its dimensions, its first 10 rows, the data of rows 2,4,5, and columns 5,6. Display the column names.
dim(data)
head(data)
data[1:10,]
data[c(2,4,5),]
data[,c(5,6)]
names(data)
3. Assign the fourth column of data to H (height). Display the summary of H.
4. Plot a histogram of H, add a red line indicating the mean, a blue line indicating the median, two green lines indicating the quartiles.
hist(H)
abline(v=mean(H), col="red")
Plot a histogram with 30 classes and comment.
hist(H, nclass=30)
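The remaining vertical lines can be added in the same way as the mean. Here is a minimal sketch, run on simulated heights so that it works without the data file. The vector H below is hypothetical and is not the HER data.

```r
set.seed(1)                                            # assumption: fixed seed for reproducibility
H <- rnorm(80, mean = 170, sd = 10)                    # hypothetical heights, NOT the HER data
hist(H)
abline(v = mean(H), col = "red")                       # mean
abline(v = median(H), col = "blue")                    # median
abline(v = quantile(H, c(0.25, 0.75)), col = "green")  # first and third quartiles
```

With the real data, only the definition of H changes. The three abline calls stay the same.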
5. Illustrate the distribution with a boxplot.
boxplot(H)
Compare the height distributions of men and women using two box plots.
HW<-data[data$sex==1,4]
HM<-data[data$sex==0,4]
boxplot(HW, HM, names=c("Women", "Men"), main="height")
boxplot(data$ht ~ data$sex, names=c("Men", "Women"), main="height") # formula version; groups are ordered 0 (men), 1 (women)
6. Plot the empirical cumulative distribution function.
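One way to obtain the empirical cumulative distribution function is the base R function ecdf, which returns a step function that can be plotted or evaluated. The sketch below uses simulated heights. The vector H and the name Fn are assumptions made for the illustration, not the HER data.

```r
set.seed(1)                             # assumption: fixed seed for reproducibility
H <- rnorm(80, mean = 170, sd = 10)     # hypothetical heights, NOT the HER data
Fn <- ecdf(H)                           # empirical cdf, returned as a function
plot(Fn, main = "Empirical cdf of H")   # step-function plot
Fn(median(H))                           # proportion of values <= the median
```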
7. Define Hcr as H centered and reduced.
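Centering subtracts the mean and reducing divides by the standard deviation, so that the result has mean 0 and standard deviation 1. Here is a short sketch on simulated heights. The vector H is hypothetical, and Hcr2 is a name introduced here only to compare with the built-in scale function.

```r
set.seed(1)                             # assumption: fixed seed for reproducibility
H <- rnorm(80, mean = 170, sd = 10)     # hypothetical heights, NOT the HER data
Hcr <- (H - mean(H)) / sd(H)            # centered (mean 0) and reduced (sd 1)
Hcr2 <- as.vector(scale(H))             # built-in equivalent
mean(Hcr); sd(Hcr)                      # check: about 0 and 1
```

Both sd and scale use the sample standard deviation (denominator n - 1).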
8. Plot a histogram of the empirical frequencies of Hcr, superimpose a blue histogram with 30 classes, then a red density plot.
hist(Hcr,probability=TRUE)
hist(Hcr,nclass=30,probability=TRUE,border="blue",add=TRUE)
lines(density(Hcr),col="red")