
Transcription Network Feature Analysis

From the document Data Mining in Biomedicine Using Ontologies (pages 120-124)

GO-Based Gene Function and Network Characterization


to various forms of stress. SIN1 appears to occupy an important node in a network of pathways that safeguard cells against environmental affronts and subsequently allow the cells either to die or to recover from damage. PCBP2, which is as vital as SIN1 in shielding against apoptosis, is also expressed coordinately with genes that encode large numbers of cell-survival, as well as cell-death, factors.

5.7 Transcription Network Feature Analysis

Gene ontology can also be used for GO-enrichment analysis to identify various network features in different networks, such as a relevance network, an associate network, or a regulatory network [8, 41–48]. Here, we take the example of a regulatory network to show the application of ontology to the analysis of regulons.

A regulon is a set of genes that are regulated by the same transcription factor. The function of any regulon on a subnetwork can be summarized by finding significantly enriched GO terms. We conducted GO enrichment within the Arabidopsis network, reconstructed using a meta-analysis of microarray data.
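The enrichment step can be illustrated with a minimal sketch. It assumes a one-sided hypergeometric test with a Bonferroni correction, which is a common choice for GO enrichment but is not confirmed here as the procedure the authors used. The function names, the `annotations` mapping (GO term to gene set), and the `alpha` threshold are all hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the GO term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def enriched_terms(regulon, annotations, background, alpha=0.05):
    """Return (term, hits, adjusted p-value) for GO terms over-represented
    in the regulon, sorted by adjusted p-value (Bonferroni, an assumption)."""
    background = set(background)
    regulon = set(regulon) & background
    n_terms = len(annotations)
    results = []
    for term, genes in annotations.items():
        genes = set(genes) & background
        hits = len(regulon & genes)
        if hits == 0:
            continue
        p = hypergeom_enrichment_p(len(background), len(genes), len(regulon), hits)
        p_adj = min(1.0, p * n_terms)
        if p_adj <= alpha:
            results.append((term, hits, p_adj))
    return sorted(results, key=lambda r: r[2])
```

In practice the background would be all annotated genes on the array, and a less conservative correction (for example, a false-discovery-rate procedure) could be swapped in for Bonferroni.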

Figure 5.12 Classes of annotated genes that demonstrate expression profiles similar to both SIN1 and PCBP2. GO biological processes were used to identify functional classes of the 984 annotated coexpressed genes.

5.7.1 Time Delay in Transcriptional Regulation

Having successfully applied meta-analysis to construct a functional-linkage network, we applied it to study the regulatory relationship between a transcription factor and its targets. It has been shown that the activation of a regulator under stress conditions usually occurs earlier than the activation of its targets [49, 50]. A noticeable time difference exists among changes in the concentrations of the regulator mRNA, the regulator protein, and the mRNAs of its targets. Therefore, to infer a regulatory relationship from the microarray data, we developed a chemical kinetic model to fit the time lag between these events (Figure 5.13) [51].

5.7.2 Kinetic Model for Time Series Microarray

The regulator-protein concentration can be modeled by the following chemical kinetic equation, without considering posttranslational regulation:

$$\frac{dR_p}{dt} = K_{tran} R_m - K_p R_p \tag{5.22}$$

where $R_p$ is the regulator-protein concentration, $R_m$ is the regulator-mRNA concentration, $K_{tran}$ is the apparent rate of mRNA translation, and $K_p$ is the turnover rate of the regulator protein. Accordingly, the time course of the target mRNA concentration can be modeled as

$$\frac{dT_m}{dt} = B_t + f(R_p) - K_t T_m \tag{5.23}$$

Figure 5.13 Schematics of the transcriptional regulation process. (a) Steps of chemical reactions considered in the kinetic model and (b) schematics of the temporal curves of the regulator protein and target mRNA in response to regulator mRNA changes.


where $T_m$ is the concentration of the target mRNA, $B_t$ is the basal transcription rate of the target gene, $K_t$ is the turnover rate of the target mRNA, and $f(R_p)$ measures the regulated transcription rate. For simplicity, $f(R_p)$ takes the following form:

$$f(R_p) = K_{act} R_p \tag{5.24}$$
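With this linear form of $f$, the regulator-protein concentration can be eliminated from the two rate equations. The following is only a sketch in absolute concentrations. It assumes that the target equation has the form $dT_m/dt = B_t + f(R_p) - K_t T_m$, as the definitions of $B_t$, $K_t$, and $f(R_p)$ suggest, and that all rate constants are independent of time. It is not a reproduction of the authors' own fold-change formulation.

```latex
\begin{aligned}
\frac{dT_m}{dt} &= B_t + K_{act} R_p - K_t T_m
  \quad\Longrightarrow\quad
  K_{act} R_p = \frac{dT_m}{dt} + K_t T_m - B_t, \\[4pt]
\frac{d^2 T_m}{dt^2} &= K_{act}\frac{dR_p}{dt} - K_t\frac{dT_m}{dt}
  = K_{act} K_{tran} R_m - K_p K_{act} R_p - K_t\frac{dT_m}{dt}, \\[4pt]
\frac{d^2 T_m}{dt^2} &+ (K_p + K_t)\frac{dT_m}{dt} + K_p K_t T_m
  = K_{act} K_{tran} R_m + K_p B_t .
\end{aligned}
```

The left-hand side carries two decay rates, $K_p$ and $K_t$, which is what produces the delay between a change in regulator mRNA and the response of the target mRNA.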

Usually, what is reported in transcription-profiling experiments is not the absolute concentration of mRNA but rather a fold change relative to the basal transcription level of that gene. Thus, we define the relative changes of $R_m$ and $T_m$ as $R_m'$ and $T_m'$, expressed with respect to $R_m^{basal}$ and $T_m^{basal}$, the basal concentrations of the regulator mRNA and target mRNA, respectively. Combining the above equations leads to a second-order ordinary differential equation, (5.27).

To predict the targets of a specific regulator, we can solve (5.27) to obtain the theoretical target-behavior curve and then find the genes whose mRNA levels are similar to those of the theoretical curve; these genes are identified as the potential targets of that regulator.
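The target-prediction procedure can be sketched numerically. Instead of solving the second-order equation directly, this sketch integrates the two first-order rate equations with forward Euler. It rests on several assumptions: the target equation is $dT_m/dt = B_t + K_{act} R_p - K_t T_m$, the system starts at the steady state implied by the first regulator measurement, the regulator profile is linearly interpolated between samples, and "similar" is measured by Pearson correlation. The function names, rate constants, and expression values are hypothetical.

```python
from bisect import bisect_right
from math import sqrt


def interpolate(times, values, t):
    """Piecewise-linear interpolation of a sampled time course at time t."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_right(times, t) - 1
    frac = (t - times[i]) / (times[i + 1] - times[i])
    return values[i] + frac * (values[i + 1] - values[i])


def predict_target_curve(times, reg_mrna, k_tran, k_p, k_act, b_t, k_t, dt=0.01):
    """Theoretical target-mRNA level at each sampling time (forward Euler)."""
    # Assumed initial condition: steady state for the first regulator sample.
    r_p = k_tran * reg_mrna[0] / k_p
    t_m = (b_t + k_act * r_p) / k_t
    t, curve = times[0], []
    for t_sample in times:
        while t_sample - t > 1e-12:
            step = min(dt, t_sample - t)
            r_m = interpolate(times, reg_mrna, t)
            d_rp = k_tran * r_m - k_p * r_p       # regulator-protein equation
            d_tm = b_t + k_act * r_p - k_t * t_m  # assumed target-mRNA equation
            r_p += step * d_rp
            t_m += step * d_tm
            t += step
        curve.append(t_m)
    return curve


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def rank_candidates(theoretical, profiles):
    """Sort genes by similarity of their profile to the theoretical curve."""
    scores = {gene: pearson(theoretical, prof) for gene, prof in profiles.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical usage: sampling times in hours, regulator mRNA in arbitrary units.
times = [0, 1, 2, 5, 10, 24]
regulator = [1.0, 3.0, 3.0, 2.0, 1.0, 1.0]
curve = predict_target_curve(times, regulator, k_tran=1.0, k_p=0.5,
                             k_act=1.0, b_t=0.2, k_t=0.5)
```

Genes at the top of `rank_candidates(curve, profiles)` would be the candidate targets. In a real analysis the rate constants would have to be fitted or scanned rather than fixed by hand.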

5.7.3 Regulatory Network Reconstruction

The kinetic model for the time-lag problem, together with the meta-analysis technique for combining inferences from different microarray datasets, provided the basic elements for constructing gene regulatory subnetworks around transcription factors. We evaluated our method on an Arabidopsis gene expression dataset containing 497 arrays that measure responses to various stress conditions [19, 50, 52] and compared the results with the publicly available online database AgrisDB [53, 54]. In this experiment, wild-type Arabidopsis plants were subjected to stress treatments for various periods (1, 2, 5, 10, and 24 hours), and the extracted mRNA samples were hybridized to a cDNA microarray. For the meta-analysis, we used 9 separate tissue-specific microarray datasets, because gene expression is typically tissue-specific: each tissue has its own set of expressed genes, although there are overlaps among tissues. Partitioning the microarray data by tissue and combining the partitions through meta-analysis yields a roughly sevenfold (~7 times) improvement in the network over the one obtained with the causal-regression model [19], as shown in Table 5.4. This suggests that a relationship between a transcription factor and a target that is consistent across most tissues is a more robust prediction of gene regulation.
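The idea of keeping only relationships that recur across tissues can be sketched as simple vote counting. This is an illustrative stand-in: the text does not spell out the exact combination rule used in the meta-analysis, and the function name, the edge representation as `(regulator, target)` pairs, and the `min_tissues` parameter are hypothetical.

```python
from collections import Counter


def consensus_edges(tissue_networks, min_tissues):
    """Keep (regulator, target) edges predicted in at least `min_tissues`
    of the tissue-specific networks; returns edge -> number of tissues."""
    votes = Counter()
    for edges in tissue_networks:
        votes.update(set(edges))  # count each edge at most once per tissue
    return {edge: n for edge, n in votes.items() if n >= min_tissues}
```

With nine tissue-specific networks, `consensus_edges(networks, 7)` would mirror the criterion of at least 7 out of 9 tissues that is applied to the E2F regulon in Section 5.7.4.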

In Table 5.4, the first column shows the method used to build the network, the second column shows the network size (number of edges in the network), and the last column shows the confirmed edges from the Agris database.

5.7.4 GO-Enrichment Analysis

Using this global network, we predicted ~179 genes that are significantly regulated by the E2F transcription factor in at least 7 of the 9 tissues described in Section 5.7.3, and we identified new candidate genes. This transcription factor provides essential activities for coordinating the control of cellular proliferation and cell fate.

Figure 5.14 Distribution of putative genes regulated by the E2F transcription factor in Arabidopsis.

The description of these GO terms shows that the major categories of these processes include cell cycle, DNA replication, and DNA repair.

Table 5.4 Regulatory Network Construction for Arabidopsis, Using Two Different Techniques

Method          Network Size    Confirmed Pairs
Regression      ~40,000         16
Meta-analysis   ~12,000         35
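One way to read the roughly sevenfold improvement, assuming it refers to the fraction of predicted edges that are confirmed in AgrisDB and taking the approximate network sizes at face value, is the following ratio:

```latex
\frac{35 / 12{,}000}{16 / 40{,}000}
  = \frac{35 \times 40{,}000}{16 \times 12{,}000}
  = \frac{1{,}400{,}000}{192{,}000}
  \approx 7.3
```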
