
(1)

Contribution à l’analyse de données non vectorielles

Nathalie Villa-Vialaneix

nathalie.villa@toulouse.inra.fr http://www.nathalievilla.org

Soutenance HDR, 13 novembre 2014 INRA, Toulouse

(2)

Overview of the topics

Two main topics: Functional Data Analysis (2002/...) and Graph/Network — inference (2012/...), mining, visualization, clustering (2006/...).

Applications in human sciences and in biology.

data: non-vectorial data, high dimension...

methods: machine learning & data mining, kernel methods, neural methods...

(5)

Functional Data Analysis

(6)

FDA framework and examples

Observations of a random variable X taking values in a space of functions X (typically, an infinite-dimensional Hilbert space such as L²): x_1, ..., x_n. The function x_i is observed at t_i1, ..., t_id_i.

Applications in: time series analysis, speech recognition, biochemistry (NIR spectra for instance), metabolomic data (NMR), weather data...

(7)

Typical issues in FDA

1. The variance operator used in the standard model does not have a continuous inverse: Γ_X = E(X ⊗ X) − E(X) ⊗ E(X) is a Hilbert-Schmidt operator ⇒ Γ_X⁻¹ is not bounded. As a consequence, Γ_X^(n) = (1/n) Σ_i x_i ⊗ x_i − x̄ ⊗ x̄ is ill-conditioned (and its inverse is a bad estimate of Γ_X⁻¹) ⇒ regularization or penalization techniques are needed.

2. The (x_i)_i are never perfectly observed and only a digitized (sometimes noisy) estimation is available ⇒ reconstruction techniques are needed to provide a functional representation of the data and to remove noise and measurement artefacts (translation, scaling, ..., of the functions).
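A small numerical illustration (not from the talk) of the first issue above: with n digitized curves that live in a low-dimensional subspace, the empirical covariance matrix on the grid is rank-deficient, and a Tikhonov/ridge-type regularization (one standard choice among the penalization techniques mentioned) restores invertibility. Grid size, sample size and the penalty ρ are arbitrary here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, grid = 30, 200                       # n curves observed on 200 grid points
t = np.linspace(0, 1, grid)
# smooth random curves: random combinations of a few sine basis functions
basis = np.array([np.sin((k + 1) * np.pi * t) for k in range(5)])
X = rng.normal(size=(n, 5)) @ basis

Xc = X - X.mean(axis=0)
Gamma_n = Xc.T @ Xc / n                 # empirical covariance on the grid (200 x 200)
print(np.linalg.matrix_rank(Gamma_n))  # far below 200: the matrix is singular

# Tikhonov / ridge regularization: Gamma_n + rho * I is invertible
rho = 1e-3
reg = Gamma_n + rho * np.eye(grid)
print(np.linalg.cond(reg) < np.linalg.cond(Gamma_n))
```

The regularized matrix has a much smaller condition number, which is exactly why penalized estimates of Γ_X⁻¹ behave better than the raw inverse.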

(9)

Overview of my contributions

Supervised learning framework

(X, Y) s.t. X ∈ X, Y ∈ ℝ (regression) or Y ∈ {−1, 1} (binary classification). Observations: (x_i, y_i)_{i=1,...,n}. Purpose: estimate ŷ for a new x; (x_i, y_i)_i is used to define a prediction function Φ_n.

Inverse methods use L(X|Y) to estimate L(Y|X):

- [2] (Ferré & Villa, Scandinavian Journal of Statistics, 2006): in the FIR (Functional Inverse Regression) model

Y = F(⟨X, β_1⟩, ..., ⟨X, β_d⟩, ε),

with F and (β_j)_j to be estimated, smooth estimation of the (β_j)_j combined with the estimation of F by a multi-layer perceptron.

- [3] (Hernández et al., Statistica Sinica, 2014): calibration problem in chemometrics: under the assumption that L(X|Y) is Gaussian, estimation of

f(x|y) ∝ exp( −Σ_{j≥1} (x_j − r_j(y))² / λ_j ),

which is used in a plug-in estimate of E(Y|X = x).

Kernel methods (less sensitive to high dimension, can include some functional pre-processing):

- [6] (Rossi & Villa, Neurocomputing, 2006): SVM for functional data analysis;

- [8] (Rossi & Villa-Vialaneix, Pattern Recognition Letters, 2011): derivative-based kernel methods for functional classification and regression.

[5, 12] (Rohart et al., Journal of Animal Science, 2012), (Villa-Vialaneix et al., Communications in Statistics, 2014): biomarker identification from metabolomic (NMR) data using functional approaches.

(14)

Kernels for functional data

Given K: X × X → ℝ s.t.

symmetry: K(x, x′) = K(x′, x);

positivity: ∀N ∈ ℕ, ∀(α_i) ∈ ℝ^N, ∀(x_i) ∈ X^N, Σ_{i,j} α_i α_j K(x_i, x_j) ≥ 0,

there exist a unique RKHS (H, ⟨., .⟩_H) and a map Ψ: X → H s.t. K(x, x′) = ⟨Ψ(x), Ψ(x′)⟩_H.

General form for FD

pre-processing P: X → D and ∀x, x′ ∈ X, Q(x, x′) = K(P(x), P(x′)).

1. projections: for V_d = span{ψ_1, ..., ψ_d}, P(x) = Σ_{j=1}^d ⟨x, ψ_j⟩ ψ_j (and K = K_d, a standard kernel on ℝ^d);

2. functional transformation: P(x) = D^q x, ...

3. FIR...
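A minimal sketch (not from the cited papers) of the projection-based construction in item 1, assuming a Fourier-type basis for V_d and a Gaussian RBF kernel as K_d on the coefficients; basis choice, d and gamma are illustrative:

```python
import numpy as np

def fourier_coeffs(x, t, d):
    """Coefficients of the digitized curve x (observed on a uniform grid
    t in [0, 1]) on the first d Fourier basis functions, by Riemann sums."""
    dt = t[1] - t[0]
    basis = [np.ones_like(t)]
    k = 1
    while len(basis) < d:
        basis.append(np.sqrt(2) * np.cos(2 * np.pi * k * t))
        basis.append(np.sqrt(2) * np.sin(2 * np.pi * k * t))
        k += 1
    return np.array([np.sum(x * b) * dt for b in basis[:d]])

def Q(x1, x2, t, d=5, gamma=1.0):
    """Functional kernel Q(x, x') = K_d(P(x), P(x')), K_d a Gaussian RBF."""
    c1, c2 = fourier_coeffs(x1, t, d), fourier_coeffs(x2, t, d)
    return np.exp(-gamma * np.sum((c1 - c2) ** 2))

t = np.linspace(0, 1, 200)
x1, x2 = np.sin(2 * np.pi * t), np.sin(2 * np.pi * t + 0.1)
print(Q(x1, x1, t), Q(x1, x2, t))   # Q(x, x) = 1 for a Gaussian K_d
```

Since Q is a standard kernel composed with a fixed pre-processing, it inherits symmetry and positivity from K_d.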

(15)

A consistent approach

[6] (Rossi & Villa, Neurocomputing, 2006)

SVM for functional data: (X, Y) ∈ X × {−1, 1} with kernel Q = K_d ∘ P_{V_d}, K_d a standard kernel on ℝ^d.

Selection of a = (d, K_d, C) ∈ ℕ × J_d × [0, C_d]: split the observations (x_1, y_1), ..., (x_n, y_n) into a training set (size l) and a test set (size n − l); train SVM_a on the training set and select the a minimizing the penalized error

ErrValid(SVM_a) + λ_d / √(n − l)

⇒ Φ_n := SVM_a.
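The selection step above can be sketched as follows (illustrative only: toy curves, an arbitrary grid of (d, C) values, λ_d taken as λ·d, and scikit-learn's SVC standing in for SVM_a):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 100)
# toy functional data: two classes of noisy digitized curves
y = rng.choice([-1, 1], size=80)
X = np.array([np.sin(2 * np.pi * t * (1.5 if yi > 0 else 1.0)) for yi in y])
X += 0.3 * rng.normal(size=X.shape)

l = 50                                   # training set size; n - l = validation size
lam = 1.0                                # base penalty weight (arbitrary)

def project(X, d):                       # P_{V_d}: first d sine-basis coefficients
    B = np.array([np.sin((k + 1) * np.pi * t) for k in range(d)])
    return X @ B.T / len(t)

best = None
for d in (2, 4, 8):                      # grid over a = (d, C)
    for C in (0.1, 1.0, 10.0):
        Z = project(X, d)
        clf = SVC(C=C, kernel="rbf").fit(Z[:l], y[:l])
        err = np.mean(clf.predict(Z[l:]) != y[l:])
        pen_err = err + lam * d / np.sqrt(len(y) - l)   # penalized validation error
        if best is None or pen_err < best[0]:
            best = (pen_err, d, C, clf)

print("selected (d, C):", best[1], best[2])
```

The penalty term λ_d/√(n − l) discourages large projection dimensions d when the validation sets are small, which is what assumptions (A5)-(A8) control in the consistency proof.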

(16)

Assumptions

[6] (Rossi & Villa, Neurocomputing, 2006)

Assumptions on the law of X:

(A1) X takes its values in a bounded subspace of X.

Assumptions on the parameters: ∀d ≥ 1,

(A2) J_d is a finite set;

(A3) ∃K_d ∈ J_d (kernels on ℝ^d) s.t. K_d is universal and ∃ν_d > 0: N(K_d, ε) = O(ε^{−ν_d});

(A4) C_d > 1;

(A5) Σ_{d≥1} |J_d| e^{−2λ_d²} < +∞.

Assumptions on the training/validation sets:

(A6) lim_{n→+∞} l = +∞;

(A7) lim_{n→+∞} (n − l) = +∞;

(A8) lim_{n→+∞} l log(n − l) / (n − l) = 0.

(17)

Convergence to the Bayes error

[6] (Rossi & Villa, Neurocomputing, 2006)

Theorem: universal consistency

Under assumptions (A1)-(A8), L_n → L* as n → +∞, where

L_n = P(Φ_n(X) ≠ Y),

L* = P(Φ*(X) ≠ Y) with Φ*(x) = 1 if P(Y = 1 | X = x) > 1/2, and −1 otherwise.

[8] (Rossi & Villa-Vialaneix, Pattern Recognition Letters, 2011): alternative kernels for digitized observations and derivative pre-processing, with a general consistency result in regression and binary classification.

(19)

Graph mining

(20)

Framework of this section

A graph (network) G = (V, E, W) with n vertices V = {x_1, ..., x_n};

a set of edges E, weighted by W_ij = W_ji ≥ 0 (W_ii = 0).

(21)

Visualization: a tool for graph mining

Standard approach for graph visualization: force-directed placement algorithms (FDP). Drawbacks:

slow (impractical for large graphs);

based on aesthetic criteria rather than on interpretability:

- trend: short and uniform-length edges;

- negative consequence: nodes with the largest degrees are grouped in the middle of the layout.

A more natural way to explore a graph:

1. highlight the macroscopic structure: find "communities" and the relations between them;

2. then focus on finer details in some communities.

(23)

Overview of my contributions

topographic maps: combine clustering and visualization using a prior structure (i.e., a map):

- batch kernel SOM [1] (Boulet et al., Neurocomputing, 2008) and on-line multiple relational SOM [4] (Olteanu & Villa-Vialaneix, Neurocomputing, 2015) (not restricted to graphs; can handle multiple graphs or labeled graphs);

- maps based on an organized modularity criterion [7] (Rossi & Villa-Vialaneix, Neurocomputing, 2010);

[9] (Rossi & Villa-Vialaneix, Journal de la SFdS, 2011): hierarchical approach (based on modularity maximization) for graph clustering and visualization;

applications:

- mining a network extracted from a corpus of medieval documents [10] (Rossi et al., Digital Medievalist, 2013);

- co-expression network analysis [13] (Villa-Vialaneix et al., PLoS ONE, 2013) (clustering in a co-expression network with ontology validation).

(28)

Using topographic maps to visualize a clustered graph

vertices are clustered in U units; units are arranged on a prior 2D-grid equipped with a topology

prior positionsof the units are used to produce a simplified representation

(29)

How to extend topographic maps to graphs?

Generic approach using a kernel/dissimilarity

[4] (Olteanu & Villa-Vialaneix, Neurocomputing, 2015), [1] (Boulet et al., Neurocomputing, 2008): graphs can be described by pairwise relations between nodes:

with a kernel (e.g., K_β = e^{−βL}, with L the graph Laplacian: heat kernel);

with a dissimilarity (e.g., shortest path length);

⇒ extension of SOM to kernel/dissimilarity data using prototypes of the form p_u = Σ_{i=1}^n γ_ui x_i.

Graph-specific approach

[7] (Rossi & Villa-Vialaneix, Neurocomputing, 2010): extension of the modularity quality criterion to a criterion taking the prior topology into account:

O = (1 / 2m) Σ_{i,j,k,l} M_ik S_kl M_jl B_ij,

with B_ij = W_ij − d_i d_j / (2m) the modularity matrix (d_i = Σ_j W_ij and m = (1/2) Σ_i d_i); M_ik = 1_{x_i ∈ C_k}; S encodes the topology of the prior map (similarity).
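The two graph-side ingredients above, the heat kernel K_β = e^{−βL} and the modularity matrix B, can be computed directly from the weight matrix W; a minimal sketch on a toy 4-node graph (β is arbitrary):

```python
import numpy as np
from scipy.linalg import expm

# small undirected weighted graph: symmetric adjacency W, zero diagonal
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

d = W.sum(axis=1)                 # degrees d_i = sum_j W_ij
m = d.sum() / 2                   # m = (1/2) sum_i d_i

# heat kernel K_beta = exp(-beta * L), L the (unnormalized) graph Laplacian
L = np.diag(d) - W
beta = 0.5
K = expm(-beta * L)               # symmetric positive-definite kernel matrix

# modularity matrix B_ij = W_ij - d_i * d_j / (2m); its rows sum to zero
B = W - np.outer(d, d) / (2 * m)

print(np.allclose(K, K.T), np.allclose(B.sum(axis=1), 0))
```

K can then be fed to a kernel SOM, while B enters the organized modularity criterion O.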

(31)

Practical aspects of kernel/relational SOM

this approach (and others) is implemented in the R package SOMbrero, provided with a WUI based on shiny;

it can also handle data described by multiple dissimilarities: the combination is optimized by including a gradient-descent-like step in the relational SOM algorithm;

it has been applied to graphs, labeled graphs and other non-vectorial data (school-to-work trajectories).

(33)

Network inference

(34)

Network inference

Data: large-scale gene expression data, stored in a matrix X = (X_ij) with n ≈ 30/50 individuals in rows and p ≈ 10⁴ variables (gene expressions) in columns.

What we want to obtain: a graph/network with

nodes: (selected) genes;

edges: strong links between gene expressions.

(35)

Overview of my contributions

application of network inference for understanding the determinants of human adipose tissue gene expression [11] (Viguerie et al., PLoS Genetics, 2012)

network inference with multiple samples [14] (Villa-Vialaneix et al., Quality Technology and Quantitative Management, 2014)

(36)

Motivation for multiple networks inference

Pan-European project Diogenes (http://www.diogenes-eu.org; with Nathalie Viguerie, INSERM): gene expressions (lipid tissues) from 204 obese women before and after a low-calorie diet (LCD).

Assumption: a common functioning exists regardless of the condition.

Which genes are linked independently of the condition, and which depending on it?

(37)

Consensus LASSO

GGM with L1 penalty: one condition

(X_i)_{i=1,...,n} are i.i.d. Gaussian random variables N(0, Σ);

X_j = β_j^T X_{−j} + ε;  β̂_j = argmin_{β_j} Σ_{i=1}^n (X_ij − β_j^T X_i^{−j})² + λ ‖β_j‖_{L1};

j ↔ j′ (genes j and j′ are linked) ⇔ β_jj′ ≠ 0.

Consensus LASSO: multiple conditions

argmin_{β_j} (1/2) β_j^T Σ̂_{\j\j} β_j − β_j^T Σ̂_{j\j} + λ ‖β_j‖_{L1} + µ Σ_c w_c ‖β_j^c − β_j^cons‖²_{L2},

with Σ̂_{\j\j} the block-diagonal matrix Diag(Σ̂¹_{\j\j}, ..., Σ̂^k_{\j\j}) (and similarly for Σ̂_{j\j});

w_c: real number used to weight the conditions;

µ: regularization parameter;

β_j^cons: whatever you want...?

(38)

Choice of a consensus

Case 1: a priori consensus

Using a fixed β_j^cons, the optimization problem is equivalent to minimizing the p following standard quadratic problems in ℝ^{k(p−1)} with L1 penalty:

(1/2) β_j^T B_1(µ) β_j − β_j^T B_2(µ) + λ ‖β_j‖_{L1}.

Case 2: learn the consensus

Using β_j^cons = Σ_{c=1}^k (n_c / n) β_j^c, the optimization problem is equivalent to minimizing the following standard quadratic problem with L1 penalty:

(1/2) β_j^T S_j(µ) β_j − β_j^T Σ̂_{j\j} + λ ‖β_j‖_{L1}.

Optimization by active set, combined with a bootstrap approach (BOLASSO type). R package therese.
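The one-condition estimator above (each gene regressed on all the others with an L1 penalty, edges read off the nonzero coefficients) can be sketched with scikit-learn. This is a plain neighborhood-selection illustration on simulated data, not the therese package; the sample sizes, the planted dependence and the penalty level are arbitrary:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p = 40, 10                         # few individuals, (here) few "genes"
X = rng.normal(size=(n, p))
X[:, 1] += 1.2 * X[:, 0]              # gene 1 driven by gene 0 -> expect edge 0-1

edges = set()
for j in range(p):
    mask = np.arange(p) != j
    # X_j = beta_j^T X_{-j} + eps, with L1 penalty lambda * ||beta_j||_1
    beta = Lasso(alpha=0.2).fit(X[:, mask], X[:, j]).coef_
    for jp, b in zip(np.arange(p)[mask], beta):
        if b != 0:                    # beta_jj' != 0  <=>  j <-> j'
            edges.add((min(j, jp), max(j, jp)))

print(sorted(edges))
```

The consensus variant stacks k such problems (one per condition) and adds the µ Σ_c w_c ‖β_j^c − β_j^cons‖² coupling term to shrink the condition-specific networks towards a common one.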

(39)

Future work

(40)

Research project: data mining and integration with SOM

On-going issues

stabilize results, improve quality, speed up training: aggregation, boosting...

improve interpretability in a multi-kernel/dissimilarity context: visualization, prototype sparsity...

targeted applications: multi-’omics integration and exploration; ncRNA typology

Collaborations

methodological aspects: Jérôme Mariette (PhD, MIAT, INRA), Madalina Olteanu (SAMM, Université Paris 1)

application to multi-’omics data: Nathalie Viguerie (INSERM, Diogenes project)

application to ncRNA: Christine Gaspin (MIAT, INRA)

(41)

Research project: data mining and integration using graphical approaches

On-going issues

integrating multi-’omics datain network inference and mining, possibly with different numbers of observations

taking temporal aspects into accountin network inference/clustering

Collaborations

methodological aspects: Valérie Sautron (PhD, GenPhySE, INRA)

applications: projects SusOStress & PigHeat (GenPhySE, INRA) on systems genetics in pigs (stress & heat resistance)

application: Nathalie Viguerie (INSERM, Diogenes project) & Ignacio Gonzáles (MIAT, INRA)... submitted article

(42)

Thank you for your attention...

... questions?

(43)

R. Boulet, B. Jouve, F. Rossi, and N. Villa.

Batch kernel SOM and related Laplacian methods for social network analysis.

Neurocomputing, 71(7-9):1257–1273, 2008.

L. Ferré and N. Villa.

Multi-layer perceptron with functional inputs: an inverse regression approach.

Scandinavian Journal of Statistics, 33(4):807–823, 2006.

N. Hernández, R.J. Biscay, N. Villa-Vialaneix, and I. Talavera.

A non parametric approach for calibration with functional data.

Statistica Sinica, 2014.

Forthcoming.

M. Olteanu and N. Villa-Vialaneix.

On-line relational and multiple relational SOM.

Neurocomputing, 147:15–30, 2015.


F. Rohart, A. Paris, B. Laurent, C. Canlet, J. Molina, M.J. Mercat, T. Tribout, N. Muller, N. Iannuccelli, N. Villa-Vialaneix, L. Liaubet, D. Milan, and M. San Cristobal.

Phenotypic prediction based on metabolomic data on the growing pig from three main European breeds.

Journal of Animal Science, 90(12), 2012.

F. Rossi and N. Villa.

Support vector machine for functional data classification.

Neurocomputing, 69(7-9):730–742, 2006.

F. Rossi and N. Villa-Vialaneix.

Optimizing an organized modularity measure for topographic graph clustering: a deterministic annealing approach.

Neurocomputing, 73(7-9):1142–1163, 2010.

F. Rossi and N. Villa-Vialaneix.

Consistency of functional learning methods based on derivatives.

Pattern Recognition Letters, 32(8):1197–1209, 2011.

(44)

F. Rossi and N. Villa-Vialaneix.

Représentation d’un grand réseau à partir d’une classification hiérarchique de ses sommets.

Journal de la Société Française de Statistique, 152(3):34–65, 2011.

F. Rossi, N. Villa-Vialaneix, and F. Hautefeuille.

Exploration of a large database of French notarial acts with social network methods.

Digital Medievalist, 9, 2013.

N. Viguerie, E. Montastier, J.J. Maoret, B. Roussel, M. Combes, C. Valle, N. Villa-Vialaneix, J.S. Iacovoni, J.A. Martinez, C. Holst, A. Astrup, H. Vidal, K. Clément, J. Hager, W.H.M. Saris, and D. Langin.

Determinants of human adipose tissue gene expression: impact of diet, sex, metabolic status and cis genetic regulation.

PLoS Genetics, 8(9):e1002959, 2012.

N. Villa-Vialaneix, N. Hernandez, A. Paris, C. Domange, N. Priymenko, and P. Besse.

On combining wavelets expansion and sparse linear models for regression on metabolomic data and biomarker selection.

Communications in Statistics - Simulation and Computation, 2014.

Forthcoming.

N. Villa-Vialaneix, L. Liaubet, T. Laurent, P. Cherel, A. Gamot, and M. San Cristobal.

The structure of a gene co-expression network reveals biological functions underlying eQTLs.

PLoS ONE, 8(4):e60045, 2013.

N. Villa-Vialaneix, M. Vignes, N. Viguerie, and M. San Cristobal.

Inferring networks from multiple samples with consensus LASSO.

Quality Technology and Quantitative Management, 11(1):39–60, 2014.
