Adaptive sparse Poisson functional regression for the analysis of NGS Data.
S. Ivanoff‡, F. Picard*, V. Rivoirard‡
‡ CEREMADE, Univ. Paris Dauphine, F-75775 Paris, France
* LBBE, Univ. Lyon 1, F-69622 Villeurbanne, France
April 2016
Outline
1 Studying replication by high throughput sequencing
2 Poisson functional regression
3 Calibration of the Lasso weights
4 Simulation study
5 Application on OriSeq data
Next Generation Sequencing Data
• Massively parallel sequencing of DNA molecules
• Can be used to quantify DNA in a sample
• Expression, copy numbers, DNA-protein interactions
• Focus on mapped data
[Diagram: two DNA samples (DNA1, DNA2) are cut, amplified, sequenced and mapped back to the genome, yielding read counts (e.g. DNA1: 5X, DNA2: 3X).]
Outline of the OriSeq project
• DNA replication: duplication of one molecule into two daughter molecules
• The exact duplication of mammalian genomes is strongly controlled
• Spatial control (choice of loci)
• Temporal control (firing timing)
⇒ What are the (epi)genetic determinants of these controls?
[Figure: origin of replication and surrounding chromatin.]
Mapping human replication origins: a technical challenge
• “bubbles” are small and unstable (they last only minutes per cycle)
• no clear consensus sequence (unlike in S. cerevisiae)
• their specification is associated with both DNA sequence and chromatin structure
⇒ Origin-Omics: SNS Sequencing
[Workflow: extraction and purification of Short Nascent Strands (SNS): selection of 1.5-2 kb single-stranded fragments, lambda exonuclease digestion; analysis by local qPCR, DNA tiling arrays, or sequencing (Ori-Seq).]
Example of OriSeq data
[Figure: number of reads in 2.5 kb windows along genomic coordinates.]
Outline
1 Studying replication by high throughput sequencing
2 Poisson functional regression
3 Calibration of the Lasso weights
4 Simulation study
5 Application on OriSeq data
Introduction to Poisson functional Regression
• $Y_t$: the observed number of reads at position $X_t$ on the genome, with $t = 1, \dots, n$.
• Model sequencing data by functional Poisson regression: $Y_t \mid X_t \sim \mathcal{P}(f_0(X_t))$.
• Goal: estimate $f_0$.
• $f$, a candidate estimator of $f_0$, is decomposed on a functional dictionary with $p$ elements $\{\varphi_1, \dots, \varphi_p\}$:
$$\log f(x) = \sum_{j=1}^{p} \beta_j \varphi_j(x)$$
(a small simulation sketch of this model is given below)
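The following minimal sketch, under assumptions of my own (a small Fourier-type dictionary, arbitrary sparse coefficients, and a grid of 1024 positions, none of which come from the talk), simulates counts $Y_t \sim \mathcal{P}(f_0(X_t))$ with $\log f_0 = \sum_j \beta_j \varphi_j$:

```python
import numpy as np

# Minimal sketch of the model: log f(x) = sum_j beta_j * phi_j(x),
# and Y_t | X_t ~ Poisson(f(X_t)).  The dictionary (a few Fourier atoms),
# the coefficients and the grid size are illustrative choices only.

rng = np.random.default_rng(0)

n = 1024
X = np.linspace(0.0, 1.0, n)             # sampling positions on [0, 1]

def dictionary(x):
    """A small illustrative dictionary {phi_1, ..., phi_p}."""
    atoms = [np.ones_like(x)]
    for k in range(1, 4):
        atoms.append(np.cos(2 * np.pi * k * x))
        atoms.append(np.sin(2 * np.pi * k * x))
    return np.column_stack(atoms)        # shape (n, p)

A = dictionary(X)                        # design matrix A_ij = phi_j(X_i)
beta = np.array([2.0, 0.8, 0.0, 0.0, -0.5, 0.0, 0.3])  # sparse coefficients

f0 = np.exp(A @ beta)                    # intensity f0(X_i)
Y = rng.poisson(f0)                      # observed read counts
```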
Dictionaries vs. basis approaches
• A basis approach is designed to capture specific features of the signal ($p = n$)
• What if several types of features are present simultaneously?
• Consider overcomplete dictionaries ($p > n$) (a sketch of such a dictionary follows below)
• Typical dictionary elements: histograms, Daubechies wavelets, Fourier atoms
• How to select the dictionary elements?
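As a rough illustration of an overcomplete dictionary, the sketch below stacks a discrete Haar basis and a set of Fourier atoms into a single design matrix with $p > n$ columns; the helper names, normalisations and number of frequencies are my own illustrative choices, not the dictionary used in the talk.

```python
import numpy as np

# Overcomplete dictionary on a regular grid of size n = 2^J:
# concatenate the columns of a discrete Haar basis with Fourier atoms.

def haar_basis(n):
    """Discrete Haar basis: columns of an n x n orthonormal matrix."""
    H = [np.full(n, 1.0 / np.sqrt(n))]          # scaling function
    J = int(np.log2(n))
    for j in range(J):                          # coarse to fine scales
        block = n // 2 ** j
        for k in range(2 ** j):
            psi = np.zeros(n)
            start = k * block
            psi[start:start + block // 2] = 1.0
            psi[start + block // 2:start + block] = -1.0
            H.append(psi / np.linalg.norm(psi))
    return np.column_stack(H)

def fourier_atoms(n, n_freq):
    """Constant + cosine/sine atoms up to frequency n_freq."""
    x = np.arange(n) / n
    cols = [np.ones(n) / np.sqrt(n)]
    for k in range(1, n_freq + 1):
        cols.append(np.cos(2 * np.pi * k * x) * np.sqrt(2.0 / n))
        cols.append(np.sin(2 * np.pi * k * x) * np.sqrt(2.0 / n))
    return np.column_stack(cols)

n = 256
A = np.hstack([haar_basis(n), fourier_atoms(n, n // 4)])
print(A.shape)   # (256, 385): p > n, i.e. overcomplete
```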
A Penalized likelihood framework
• We consider a likelihood-based penalized criterion to select $\beta$.
• We denote by $A$ the $n \times p$ design matrix with $A_{ij} = \varphi_j(X_i)$, and $Y = (Y_1, \dots, Y_n)^T$.
• The log-likelihood of the model is:
$$\log L(\beta) = \sum_{j \in J} \beta_j (A^T Y)_j - \sum_{i=1}^{n} \exp\Big(\sum_{j \in J} \beta_j A_{ij}\Big) - \sum_{i=1}^{n} \log(Y_i!)$$
• Selection can be performed by the Lasso such that:
$$\widehat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p} \Big\{ -\log L(\beta) + \sum_{j=1}^{p} \lambda_j |\beta_j| \Big\}.$$
(a proximal-gradient sketch of this criterion follows below)
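A minimal sketch of how such a weighted-Lasso criterion can be minimised, here with a plain proximal-gradient (ISTA) loop; the fixed step size, iteration count and function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# ISTA sketch for  -log L(beta) + sum_j lambda_j |beta_j|  in the
# Poisson model, where -log L(beta) = sum_i exp((A beta)_i) - Y^T A beta
# up to a constant in beta.

def neg_loglik_grad(beta, A, Y):
    """Gradient of -log L(beta): A^T (exp(A beta) - Y)."""
    mu = np.exp(A @ beta)
    return A.T @ (mu - Y)

def soft_threshold(z, t):
    """Proximal operator of the weighted l1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def poisson_lasso(A, Y, lam, step=1e-3, n_iter=5000):
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = neg_loglik_grad(beta, A, Y)
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Usage with a design matrix A, counts Y, and a weight vector lam:
# beta_hat = poisson_lasso(A, Y, lam)
```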
Outline
1 Studying replication by high throughput sequencing
2 Poisson functional regression
3 Calibration of the Lasso weights
4 Simulation study
5 Application on OriSeq data
Weights calibration using concentration inequalities
• In the Gaussian framework with noise variance $\sigma^2$, the Lasso weights are $\propto \sigma \sqrt{\log p}$.
• $\lambda_j$ is used to control the fluctuations of $A_j^T Y$ around its mean.
• Key role of $V_j$, a variance term (the analog of $\sigma^2$) defined by
$$V_j = \mathrm{Var}(A_j^T Y) = \sum_{i=1}^{n} f_0(X_i)\, \varphi_j^2(X_i).$$
• For any $j$, we choose a data-driven value for $\lambda_j$ as small as possible so that, with high probability, for any $j \in \{1, \dots, p\}$,
$$|A_j^T (Y - \mathbb{E}[Y])| \leq \lambda_j.$$
Form of the data-driven weights for the Lasso
• Let $j$ be fixed and $\gamma > 0$ be a constant. Define
$$\widehat{V}_j = \sum_{i=1}^{n} \varphi_j^2(X_i)\, Y_i,$$
the natural unbiased estimator of $V_j$, and
$$\widetilde{V}_j = \widehat{V}_j + \sqrt{2\gamma \log p\, \widehat{V}_j \max_i \varphi_j^2(X_i)} + 3\gamma \log p \max_i \varphi_j^2(X_i).$$
• Let
$$\lambda_j = \sqrt{2\gamma \log p\, \widetilde{V}_j} + \frac{\gamma \log p}{3} \max_i |\varphi_j(X_i)|,$$
• then
$$\Pr\Big(|A_j^T (Y - \mathbb{E}[Y])| \geq \lambda_j\Big) \leq \frac{3}{p^{\gamma}}.$$
(a sketch computing these weights follows below)
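The weights above can be computed directly from the design matrix and the counts. The sketch below transcribes the formulas of this slide; the value $\gamma = 1.01$ is an arbitrary example choice.

```python
import numpy as np

# Data-driven Lasso weights lambda_j from A (A_ij = phi_j(X_i)) and Y.

def lasso_weights(A, Y, gamma=1.01):
    n, p = A.shape
    logp = np.log(p)
    phi_max2 = np.max(A ** 2, axis=0)        # max_i phi_j(X_i)^2
    phi_max = np.max(np.abs(A), axis=0)      # max_i |phi_j(X_i)|
    V_hat = (A ** 2).T @ Y                   # V^_j = sum_i phi_j(X_i)^2 Y_i
    V_tilde = (V_hat
               + np.sqrt(2.0 * gamma * logp * V_hat * phi_max2)
               + 3.0 * gamma * logp * phi_max2)
    return np.sqrt(2.0 * gamma * logp * V_tilde) + (gamma * logp / 3.0) * phi_max

# lam = lasso_weights(A, Y)   # then plug into the weighted Lasso criterion
```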
Using the group Lasso for Poisson Functional Regression
• In some situations, coefficients can be grouped: $\{1, \dots, p\} = G_1 \cup \dots \cup G_K$
• For wavelets: group by scale, for instance (or by adjacent positions)
• In this case selection can be performed by the group-Lasso such that:
$$\widehat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p} \Big\{ -\log L(\beta) + \sum_{k=1}^{K} \lambda_k \|\beta_{G_k}\|_2 \Big\}.$$
• The block $\ell_1$-norm on $\mathbb{R}^p$ is defined by:
$$\|\beta\|_{1,2} = \sum_{k=1}^{K} \|\beta_{G_k}\|_2 = \sum_{k=1}^{K} \sqrt{\sum_{j \in G_k} |\beta_j|^2}$$
(a block-thresholding sketch follows below)
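A minimal sketch of the proximal operator of this weighted group penalty (block soft-thresholding), which replaces the coordinatewise soft-thresholding in the ISTA sketch above, together with one illustrative grouping of Haar coefficients by scale; the grouping and helper names are assumptions of mine.

```python
import numpy as np

def group_soft_threshold(beta, groups, lam, step):
    """Prox of step * sum_k lam_k ||beta_{G_k}||_2.
    groups: list of index arrays G_1, ..., G_K; lam: weights lambda_k."""
    out = beta.copy()
    for Gk, lk in zip(groups, lam):
        norm = np.linalg.norm(beta[Gk])
        shrink = max(0.0, 1.0 - step * lk / norm) if norm > 0 else 0.0
        out[Gk] = shrink * beta[Gk]
    return out

def haar_scale_groups(n):
    """Group Haar coefficients by scale, following the ordering of the
    haar_basis sketch above (scaling coefficient first, then each level)."""
    J = int(np.log2(n))
    groups, start = [np.array([0])], 1
    for j in range(J):
        groups.append(np.arange(start, start + 2 ** j))
        start += 2 ** j
    return groups
```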
Form of the data-driven weights for the group-Lasso
• $\lambda_k^g$ should depend on sharp estimates of the variance parameters $(V_j)_{j \in G_k}$.
• Let $k \in \{1, \dots, K\}$ be fixed and $\gamma > 0$ be a constant. Assume that there exists $M > 0$ such that $|f_0(x)| \leq M$ for any $x$.
• Let
$$c_k = \sup_{x \in \mathbb{R}^n} \frac{\|A_{G_k} A_{G_k}^T x\|_2}{\|A_{G_k}^T x\|_2}.$$
• For all $j \in G_k$, still with $\widehat{V}_j = \sum_{i=1}^{n} \varphi_j^2(X_i) Y_i$, define
$$\widetilde{V}_j^g = \widehat{V}_j + \sqrt{2(\gamma \log p + \log|G_k|)\, \widehat{V}_j \max_i \varphi_j^2(X_i)} + 3(\gamma \log p + \log|G_k|) \max_i \varphi_j^2(X_i).$$
Form of the data-driven weights for the group-Lasso
• Let $\gamma > 0$ be fixed. Define $b_{k,i} = \sqrt{\sum_{j \in G_k} \varphi_j^2(X_i)}$ and $b_k = \max_i b_{k,i}$. Finally, we set
$$\lambda_k^g = \Big(1 + \frac{1}{2\sqrt{2\gamma \log p}}\Big) \sqrt{\sum_{j \in G_k} \widetilde{V}_j^g} + 2\sqrt{\gamma \log p\, D_k},$$
• where $D_k = 8 M c_k^2 + 16 b_k^2 \gamma \log p$. Then
$$\Pr\Big(\|A_{G_k}^T (Y - \mathbb{E}[Y])\|_2 \geq \lambda_k^g\Big) \leq \frac{2}{p^{\gamma}}.$$
• The form of the weights is analogous to the weights in the Gaussian setting.
• We show that the associated Lasso / group-Lasso procedures are theoretically optimal (oracle inequalities).
• With a closed-form expression for the weights, much computing power is spared! (a sketch assembling these group weights follows below)
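A sketch assembling the group weights from the quantities defined on the two previous slides (as reconstructed here). It assumes a known bound $M$ on $f_0$, uses the spectral norm of $A_{G_k}$ for the supremum defining $c_k$, and treats $\gamma$ and the grouping as user inputs; this is a transcription of the slide expressions, not the authors' code.

```python
import numpy as np

def group_lasso_weights(A, Y, groups, M, gamma=1.01):
    """Data-driven group weights lambda_k^g for the Poisson group-Lasso."""
    n, p = A.shape
    logp = np.log(p)
    lam_g = []
    for Gk in groups:
        Ak = A[:, Gk]                          # columns of group G_k
        phi_max2 = np.max(Ak ** 2, axis=0)     # max_i phi_j(X_i)^2, j in G_k
        V_hat = (Ak ** 2).T @ Y                # V^_j for j in G_k
        t = gamma * logp + np.log(len(Gk))
        V_tilde = V_hat + np.sqrt(2 * t * V_hat * phi_max2) + 3 * t * phi_max2
        c_k = np.linalg.norm(Ak, 2)            # spectral norm of A_{G_k} (the sup c_k)
        b_k = np.max(np.sqrt(np.sum(Ak ** 2, axis=1)))   # max_i b_{k,i}
        D_k = 8 * M * c_k ** 2 + 16 * b_k ** 2 * gamma * logp
        lam = ((1 + 1 / (2 * np.sqrt(2 * gamma * logp))) * np.sqrt(V_tilde.sum())
               + 2 * np.sqrt(gamma * logp * D_k))
        lam_g.append(lam)
    return np.array(lam_g)
```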
Outline
1 Studying replication by high throughput sequencing
2 Poisson functional regression
3 Calibration of the Lasso weights
4 Simulation study
5 Application on OriSeq data
Simulation settings
• We considered the classical Donoho & Johnstone test functions (Blocks, Bumps, Doppler, HeaviSine)
• The intensity function $f_0$ is set such that $f_0 = \alpha \exp(g_0)$, with $\alpha \in \{1, \dots, 7\}$
• Observations are sampled on a fixed regular grid ($n = 2^{10}$) with $Y_t \sim \mathcal{P}(f_0(X_t))$
• Daubechies wavelets, Haar wavelets and Fourier atoms are used as elements of the dictionary
• We check the normalized reconstruction error:
$$\mathrm{MSE} = \frac{\|\widehat{f} - f_0\|_2^2}{\|f_0\|_2^2}$$
• Comparison with the Haar-Fisz transform and with cross-validation (a sketch of this protocol follows below)
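A sketch of this simulation protocol for one test function; the Bumps parametrisation below is one standard version of the Donoho & Johnstone function, and $\alpha = 3$ is an arbitrary value within the stated range.

```python
import numpy as np

# One test intensity f0 = alpha * exp(g0) on a regular grid of size 2^10,
# Poisson counts, and the normalised reconstruction error.

def bumps(x):
    """Standard Donoho-Johnstone Bumps function."""
    pos = np.array([.1, .13, .15, .23, .25, .4, .44, .65, .76, .78, .81])
    hgt = np.array([4, 5, 3, 4, 5, 4.2, 2.1, 4.3, 3.1, 5.1, 4.2])
    wth = np.array([.005, .005, .006, .01, .01, .03, .01, .01, .005, .008, .005])
    return sum(h * (1 + np.abs((x - p) / w)) ** -4
               for p, h, w in zip(pos, hgt, wth))

rng = np.random.default_rng(1)
n = 2 ** 10
x = np.arange(n) / n
g0 = bumps(x) / bumps(x).max()          # normalised shape g0
alpha = 3.0                             # signal strength, alpha in {1,...,7}
f0 = alpha * np.exp(g0)                 # intensity
Y = rng.poisson(f0)                     # simulated counts

def normalised_mse(f_hat, f0):
    return np.sum((f_hat - f0) ** 2) / np.sum(f0 ** 2)
```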
Reconstruction errors
[Figure: normalized MSE as a function of α for the Blocks, Bumps, Doppler and HeaviSine intensities; methods compared: lasso.exact, lasso.cvj, grplasso.2, grplasso.4, grplasso.8, HaarFisz.]
Estimated intensity functions (Lasso)
[Figure: estimated intensity functions (Lasso) for Blocks, Bumps, Doppler and HeaviSine on [0, 1]; methods: lasso.exact, lasso.cvj, HaarFisz, against the true f0.]
Estimated intensity functions (group-Lasso)
[Figure: estimated intensity functions (group-Lasso) for Blocks, Bumps, Doppler and HeaviSine on [0, 1]; methods: lasso.exact, grplasso.2, grplasso.4, grplasso.8, against the true f0.]
Choosing the best dictionary by cross-validation
[Figure: MSE versus df on Blocks, Bumps, Doppler and HeaviSine for dictionaries built from Daubechies (D), Fourier (F), Haar (H) and their combinations (DF, DH, FH, DFH); methods: lasso, grplasso.2, HaarFisz.]
Outline
1 Studying replication by high throughput sequencing
2 Poisson functional regression
3 Calibration of the Lasso weights
4 Simulation study
5 Application on OriSeq data
Promising results on OriSeq
[Figure: number of reads along genomic location (Mb) for OriSeq data.]
Perspective for the integration of multiple datasets
• This model-based framework offers many perspectives for the analysis of peak-like data
• Comparison of two conditions (A vs. B) and/or normalization:
$$\log f_A(x) = \sum_{j \in J} \beta_j \varphi_j(x), \qquad \beta_j = \alpha_j^B + \gamma_j$$
• Extend to varying coefficients models (βj(x)).
• Consider the Negative Binomial distribution
• Preprint: http://arxiv.org/abs/1412.6966