What is a good (gene) network?


HAL Id: hal-00658814

https://hal.archives-ouvertes.fr/hal-00658814

Submitted on 11 Jan 2012

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

What is a good (gene) network?

Nathalie Villa-Vialaneix, Laurence Liaubet, Magali San Cristobal

To cite this version:

Nathalie Villa-Vialaneix, Laurence Liaubet, Magali San Cristobal. What is a good (gene) network? Journal of Animal Breeding and Genetics, Wiley, 2011, 128 (1), pp.1-2. ⟨10.1111/j.1439-0388.2010.00916.x⟩. ⟨hal-00658814⟩

We are fashion victims. Networking is fashionable, so last year we decided that it was time to start working on networks. In recent years, biology has evolved toward understanding how the relationships between a large number of elements (genes, proteins, ...) can influence the way a living organism functions. This question is well modeled by biological networks, which are a major topic of interest in the recent literature. As an example, the Leipzig WCGALP included several papers about biological networks (e.g. Tesson et al.; Jager et al.; Kadarmideen et al.; Reverter et al.). One of those was our work (Liaubet et al.) which, at the end of the talk, gave rise to THE question: "What is a good network?" This innocent and apparently simple question has haunted us ever since. The answer is complex, much too complex for a short editorial, but we can draw some lessons from our experience.

When working with biological networks, we are dealing with many different underlying questions: gene networks, protein networks, or, when speaking about the kind of relationships that they model, transcriptomic networks, regulation networks, interaction networks... We may consider the particular case of a gene co-expression network based on high-throughput transcriptomic data. In this field, prior biological knowledge is usually very limited and the annotation level is frustratingly low (in that kind of experiment in livestock species, about half of the genes typically have no functional or ontological annotation). With that restricted background, a network model can help to improve knowledge about the way genes interact and to highlight key genes involved in a given process. In Leipzig, we admired Trudy Mackay’s talk because the Drosophila model is so powerful.

However, network inference with that kind of data has to be handled with care. For example, our first attempt was to build co-expression networks of differential genes (genes whose expression varies according to a phenotype of interest) for a developmental trait. The results were very disappointing: the networks were unstable, had a similar structure whatever the kind of genes considered (differential or chosen at random), and no pertinent biological conclusion could be drawn. This was the perfect example of a bad network! In that first experiment, the main problem was the very small number of observations (about five) available for each species. However, it was the starting point for understanding the key features needed to obtain a good network. Note that the question “What is a good network?” can be divided into two sub-questions: What is a good network for biologists? What is a good network for statisticians?

Statisticians like robustness. The number of observations used to define the network is never large enough. A simple simulation study can illustrate the fact that at least 20-30 observations are needed to accurately estimate a correlation coefficient, and even more to infer a medium-size network (Schäfer and Strimmer, 2005, Bioinformatics, 21, 754-764). Furthermore, methods designed to deal with “small” sample sizes and a large number of variables are required: when using the common Graphical Gaussian Models, this can consist of regularization (Dobra et al., 2004, J. Multiv. Analysis, 90, 196-212; Meinshausen and Bühlmann, 2006, Annals of Statistics, 34, 3), shrinkage (Schäfer and Strimmer, 2005, Statist. Appl. Genet. Mol. Biol., 4, 32), bootstrap (Schäfer and Strimmer, 2005a), etc. But once the strength of the relation between two genes has been inferred by the chosen method, the subsequent question is: What are the important interactions? One option is to apply no further processing (e.g., keeping the complete network with edges weighted by the partial correlations), but the use of ad hoc or significance thresholds can be beneficial for the readability and further interpretability of the network.
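
The kind of simple simulation study mentioned above could look like the following sketch. It is only an illustration under our own assumptions: two Gaussian variables with a true correlation of 0.5, 2000 replicates per sample size, and hypothetical function names (`sample_correlation`, `correlation_spread`). It measures how much the estimated correlation fluctuates from one simulated data set to the next as the number of observations grows.

```python
import math
import random
import statistics


def sample_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_spread(n_obs, true_rho=0.5, n_rep=2000, seed=0):
    """Standard deviation of the estimated correlation over n_rep simulated data sets.

    true_rho, n_rep and seed are illustrative assumptions, not values from the editorial.
    """
    rng = random.Random(seed)
    noise_sd = math.sqrt(1.0 - true_rho ** 2)
    estimates = []
    for _ in range(n_rep):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        # y = rho * x + noise has unit variance and correlation rho with x
        ys = [true_rho * x + rng.gauss(0.0, noise_sd) for x in xs]
        estimates.append(sample_correlation(xs, ys))
    return statistics.stdev(estimates)


spreads = {n: correlation_spread(n) for n in (5, 10, 20, 30, 100)}
for n, sd in spreads.items():
    print(f"n = {n:3d}: spread (sd) of the estimated correlation = {sd:.2f}")
```

With about five observations the estimate varies so much between replicates that a single value says little about the true correlation, whereas the spread shrinks markedly from 20-30 observations onwards. Inferring a whole network involves many such coefficients at once, which is why even more observations are needed.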

Biologists like simple outputs to understand the data and recognize biological processes and molecular pathways. Moreover, they are on a quest for the “Grail of causality”, with the help of statistical models (Schadt et al., 2005, Nat. Genet., 37, 710-717; the Leipzig WCGALP papers by Rosa et al.; Tesson et al.; Mackay), or based on the properties of a gene network. In a good network one should be able to highlight the key genes in the biological process: for instance, hubs (genes that are connected with a large number of other genes) are straightforward candidates for being interesting genes, but they can sometimes be disappointing because they are much too obvious or can hide other interesting features.
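
As a minimal sketch of how hubs can be read off a thresholded network, one can keep the pairs of genes whose absolute weight reaches a cutoff and count the retained edges touching each gene. Everything specific here is an assumption made for illustration: the gene names and partial correlation values are invented, the cutoff of 0.3 is ad hoc, and calling a gene a hub when its degree is at least 3 is an arbitrary rule.

```python
from collections import Counter

# Hypothetical partial correlations between pairs of genes (illustrative values only).
partial_correlations = {
    ("geneA", "geneB"): 0.62,
    ("geneA", "geneC"): -0.55,
    ("geneA", "geneD"): 0.48,
    ("geneA", "geneE"): 0.41,
    ("geneB", "geneC"): 0.08,
    ("geneD", "geneE"): 0.35,
    ("geneE", "geneF"): -0.05,
}


def threshold_edges(weights, cutoff):
    """Keep the pairs whose absolute weight reaches the (ad hoc) cutoff."""
    return [pair for pair, w in weights.items() if abs(w) >= cutoff]


def degrees(edges):
    """Number of retained edges touching each gene."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return counts


edges = threshold_edges(partial_correlations, cutoff=0.3)
degree = degrees(edges)
# Arbitrary hub rule for this toy example: at least 3 retained edges.
hubs = [gene for gene, d in degree.most_common() if d >= 3]
print(edges)
print(hubs)
```

In this toy network only `geneA` qualifies as a hub. Note that the list of hubs depends directly on the chosen cutoff, which is one reason why hubs should be interpreted with some caution.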

Moreover, a good network could be divided into relevant groups of genes working together (Mao et al., 2009, BMC Bioinformatics, 10, 34) because it is useful to highlight the macro-structure of networks by separating modules. Hence, there is a huge need for statistical methods designed for graph mining, such as clustering (which isolates groups of highly interconnected genes). Even more interesting is to relate modules to a small subset of biological functions. At that point, biologists would benefit from easy-to-use bioinformatic tools enabling the collection of additional information on interesting genes, such as mapping and annotation.
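
To make the idea of "groups of highly interconnected genes" concrete, the sketch below scores a partition of a network with Newman's modularity, a standard quality measure for graph clustering, and searches for the best split into two groups. This is not the method of any paper cited here: the toy network, the node names and the exhaustive search (feasible only for a handful of nodes) are assumptions chosen to keep the example short.

```python
from itertools import combinations

# Hypothetical toy network: two tightly connected triangles joined by a single edge.
edges = [("g1", "g2"), ("g1", "g3"), ("g2", "g3"),
         ("g4", "g5"), ("g4", "g6"), ("g5", "g6"),
         ("g3", "g4")]


def modularity(edges, groups):
    """Newman modularity Q of a partition (list of sets of nodes) of an unweighted graph."""
    m = len(edges)
    q = 0.0
    for group in groups:
        # Edges with both endpoints inside the group
        internal = sum(1 for u, v in edges if u in group and v in group)
        # Sum of the degrees of the nodes of the group
        degree_sum = sum((u in group) + (v in group) for u, v in edges)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q


def best_bipartition(edges):
    """Exhaustively search the split into two non-empty groups with the highest modularity."""
    nodes = sorted({n for edge in edges for n in edge})
    first, rest = nodes[0], nodes[1:]
    best_q, best_groups = float("-inf"), None
    for size in range(len(rest)):  # the second group must stay non-empty
        for extra in combinations(rest, size):
            group_a = {first, *extra}
            group_b = set(nodes) - group_a
            q = modularity(edges, [group_a, group_b])
            if q > best_q:
                best_q, best_groups = q, [group_a, group_b]
    return best_q, best_groups


q, modules = best_bipartition(edges)
print(round(q, 3), [sorted(g) for g in modules])
```

On this toy network the search recovers the two triangles as modules, with a modularity of 5/14 (about 0.357). For networks of several hundred genes an exhaustive search is out of reach, and heuristic clustering methods are needed instead.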

With the list of key biological functions, finding what is expected is reassuring but should not be the final step of the analysis. Approaching the unknown is surely the most exciting part of untargeted approaches: In that field, using networks can provide indications to decipher the way a large group of genes work together and, by association with what is already known, to understand the role of each gene, even those that are not yet annotated or studied.

In conclusion, a standard approach to working with gene co-expression networks might entail the following: using a very large body of observations to select several hundred interesting genes, a gene network could be inferred through a robust approach. Then, by clustering the genes from the network structure, we could obtain a small number of groups. In the ideal case, each group would be related to a single biologically relevant function. Such a conclusion can be obtained, and it has indeed reassured us in our involvement in this fashionable research direction. Furthermore, from that conclusion, a few facts can be derived about a good network: it is a network built with a rigorous statistical methodology, that can be validated by biological facts, and whose analysis raises new scientific questions. It is the convergence between biologists' and statisticians' requirements. Finally, a good network is built by a good scientific and human collaboration network: so a good network is a network that makes everybody happy!

Nathalie VILLA-VIALANEIX

IMT Toulouse, France

Laurence LIAUBET and Magali SAN CRISTOBAL

INRA Toulouse, France

Email: magali.san-cristobal@toulouse.inra.fr
