Assigning function to natural allelic variation via dynamic modeling of gene network induction

(1)

HAL Id: hal-01976613

https://hal.archives-ouvertes.fr/hal-01976613

Submitted on 8 Jun 2021

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Distributed under a Creative Commons Attribution| 4.0 International License

dynamic modeling of gene network induction

Magali Richard, Florent Chuffart, Hélène Duplus-bottin, Fanny Pouyet, Martin Spichty, Etienne Fulcrand, Marianne Entrevan, Audrey Barthelaix,

Michael Springer, Daniel Jost, et al.

To cite this version:

Magali Richard, Florent Chuffart, Hélène Duplus-bottin, Fanny Pouyet, Martin Spichty, et al.. Assign-

ing function to natural allelic variation via dynamic modeling of gene network induction. Molecular

Systems Biology, EMBO Press, 2018, 14 (1), pp.e7803. �10.15252/msb.20177803�. �hal-01976613�

(2)

Assigning function to natural allelic variation via dynamic modeling of gene network induction

Magali Richard ¹ ^, ² ^,* , Florent Chuffart ¹ , Hélène Duplus-Bottin ¹ , Fanny Pouyet ¹ , Martin Spichty ¹ , Etienne Fulcrand ¹ , Marianne Entrevan ¹ , Audrey Barthelaix ¹ , Michael Springer ³ , Daniel Jost ² ^,** &

Gaël Yvert ¹ ^,***

Abstract

More and more natural DNA variants are being linked to physiolog- ical traits. Yet, understanding what differences they make on molecular regulations remains challenging. Important properties of gene regulatory networks can be captured by computational models. If model parameters can be “ personalized ” according to the genotype, their variation may then reveal how DNA variants operate in the network. Here, we combined experiments and computations to visualize natural alleles of the yeast GAL3 gene in a space of model parameters describing the galactose response network . Alleles altering the activation of Gal 3 p by galactose were discriminated from those affecting its activity (production/degra- dation or efficiency of the activated protein). The approach allowed us to correctly predict that a non-synonymous SNP would change the binding affinity of Gal 3 p with the Gal 80 p transcrip- tional repressor. Our results illustrate how personalizing gene regulatory models can be used for the mechanistic interpretation of genetic variants.

Keywords galactose; personalized medicine; SNP function; stochastic model;

yeast

Subject Categories Methods & Resources; Network Biology; Quantitative Biology & Dynamical Systems

DOI 10 . 15252 /msb. 20177803 | Received 7 June 2017 | Revised 15 December 2017 | Accepted 18 December 2017

Mol Syst Biol. ( 2018 ) 14 : e 7803

Introduction

In the past decade, countless DNA variants have been associated with physiological traits. A major challenge now is to understand how they operate at the molecular level. This is a difficult task because the mechanistic consequences resulting from each variant are not easy to identify. Even when the function of a gene is well

documented, investigators need to determine the tissues, cells, or organelles in which a mutant allele makes a biological difference, the developmental stage at which this may happen, the metabolic or regulatory network that may be involved, as well as possible molec- ular scenarios. A mutation may alter the regulation of transcription or mRNA splicing; the enzymatic activity of the target protein; its rate of production, maturation, or degradation; its intracellular localization; its binding affinity to an interacting partner or the specificity of its molecular interactions. In the vast majority of cases, information from the DNA sequence alone is not sufficient to delimit the perimeter of possible implications.

Systems biology has opened new opportunities to better predict the action of DNA variants. First, “omics” data that are gathered at various levels (DNA, transcripts, proteins, metabolites, etc.) estab- lish relations between target sequences and functional pathways.

Information about molecular and genetic interactions, expression profiles, chromatin landscapes, post-transcriptional and post-trans- lational regulations can be exploited to derive functional predictions of DNA variants. Various methods have been proposed to do this, such as Bayesian genetic mapping (Gaffney et al, 2012), visualiza- tion of SNPs on relational protein networks (Bauer-Mehren et al, 2009), prioritization based on negative selection (Levenstien &

Klein, 2011), inference of miRNA:RNA binding defects (Coronnello et al, 2012), or combinations of lncRNA eQTL-mapping with DnaseI-hypersensitivity maps (Guo et al, 2016). In addition, struc- tural data of biomolecules can also highlight functional perturba- tions in specific domains such as catalytic sites or interaction surfaces (Barenboim et al, 2008; Al-Numair & Martin, 2013).

Another alternative is to model the quantitative and dynamic properties of molecular reactions and to explore which feature(s) may be affected by a DNA variant. The functional consequences of mutations can then be inferred by considering their impact on speci- fic parameters of the model. In other words, assigning function to a DNA variant may be straightforward after it is linked to parameters of a model. This perspective may also, on the long term, generate developments in personalized medicine: If a model can be

1 Laboratoire de Biologie et de Modélisation de la Cellule, Ecole Normale Supérieure de Lyon, CNRS, Université Lyon 1, Université de Lyon, Lyon, France 2 Univ. Grenoble Alpes, CNRS, CHU Grenoble Alpes, Grenoble INP, TIMC-IMAG, Grenoble, France

3 Department of Systems Biology, Harvard Medical School, Boston, MA, USA

*Corresponding author. Tel: +33 4 56 52 00 68; E-mail: magali.richard@univ-grenoble-alpes.fr

**Corresponding author. Tel: + 33 4 56 52 00 69 ; E-mail: daniel.jost@univ-grenoble-alpes.fr

***Corresponding author. Tel: +33 4 72 72 80 00; E-mail: gael.yvert@ens-lyon.fr

(3)

personalized according to the patient’s genotype, then it can help predict disease progress or treatment outcome and therefore adapt medical care to the patient’s specificities. Such an approach nonetheless differs from machine-learning techniques, which can be efficient for prediction but where parameters are often not inter- pretable. For it to become reality, the model must be (i) informative on the biological trait of interest and (ii) identifiable (variation of one parameter cannot be exactly compensated by variation of another parameter) and sufficiently constrained (few parameters with limited degrees of freedom) so that fitted parameter values can inform on the patient’s specificities. These two requirements antago- nize each other regarding the complexity of the model to be used.

The former asks for completeness: The molecular control of the trait must be correctly covered by the model, describing known reactions as best as possible. The latter asks for simplicity: If too many param- eters are allowed to be adjusted to the data, then the validity of the personalized model is questionable and none of the adjustments are informative. It is therefore important to determine if and how personalizing model parameters can be productive.

For a given molecular network, individuals from natural popula- tions have different genotypes at several nodes (genes) of the network, as well as in numerous external factors that can affect network properties. Such external factors can modify, for example, global translation efficiencies, metabolic states, or pathways that cross-talk with the network of interest. Adapting model parameters to specific individuals is challenging when so many sources of varia- tion exist. A way to circumvent this difficulty is to study the network experimentally in the context of a more reduced and focused variation. If investigators have access to nearly isogenic individuals that differ only at specific genes of the network, they can then characterize the differences in network behavior that result from these specific allelic differences. The numerous exter- nal factors affecting the network can then be ignored or drastically simplified in the model because they are common to all individu- als. This way, the parameter space is constrained and only poten- tially informative parameters are allowed to be adjusted to fit individual-specific data.

Some model organisms such as the yeast Saccharomyces cere- visiae offer this possibility. They can be manipulated to generate single allelic changes, which provides an ideal framework to link DNA variants to model parameters. In particular, the gene regula- tory network controlling the yeast response to galactose (GAL network) is well characterized, both in vivo and in silico. This circuit controls galactose utilization by upregulating the expression of regulatory and metabolic genes in response to extracellular galactose (Sellick et al, 2008). Regulation is based on the tran- scriptional activator Gal4p, the galactose transporter Gal2p, a signal transducer Gal3p, and the transcriptional inhibitor Gal80p.

In addition, the galactokinase Gal1p involved in galactose metabo- lism is also a co-inducer of the response (Bhat & Hopper, 1992).

This system can display either a gradual induction (where the rate of transcription progressively increases in each cell according to the timing and intensity of the stimulus) or a binary induction (where some cells are rapidly activated and others not). This dual behavior has received a lot of attention, and important molecular features have been elucidated by experimental and theoretical approaches (Biggar & Crabtree, 2001; Hawkins & Smolke, 2006;

Song et al, 2010; Apostu & Mackey, 2012). In particular, the

dynamic response of a population of cells to galactose can be described by two quantities: (i) The inducibility of the network is defined as the proportion of activated cells in the population and (ii) the amplitude of the response refers to the expression level that is reached by induced cells. Regulatory feedback loops of the network are critical to the switch-like behavior. They were shown to feed back the dynamics of transcription bursts rather than the levels of expression (Hsu et al, 2012). They regulate the amplitude response by reducing noise in GAL gene expression (Ramsey et al, 2006), they control inducibility by fine-tuning the timing of the switch (Ramsey et al, 2006), and they participate to the memory of previous inductions (Acar et al, 2005; Kundu & Peterson, 2010).

As a consequence, bimodal distributions of expression of the GAL genes can be observed in isogenic populations exposed to interme- diate concentrations of inducer (Becskei et al, 2001; Venturelli et al, 2012; Peng et al, 2015), and this population heterogeneity can confer a growth advantage during the transition from glucose to galactose metabolism (diauxic shift) (Venturelli et al, 2015).

Interestingly, wild yeast isolates present diverse types of induction dynamics during the diauxic shift, ranging from strictly unimodal to transient bimodal distribution of expression levels (New et al, 2014; Wang et al, 2015). This indicates that natural genetic varia- tion can modify the network dynamics.

The GAL3 gene plays a central role in the network. Its protein product Gal3p is activated by binding to galactose and ATP and then binds as a dimer to Gal80p dimers to release the repression on Gal4p at target promoters (Sellick et al, 2008). The protein is enriched in the cytoplasm prior to stimulation and in the nucleus after the stim- ulation, although this cytonuclear transfer does not account for the dynamics of activation (Jiang et al, 2009; Egriboz et al, 2011).

Expression of GAL3 is itself under Gal4p/Gal80p control (positive feedback). In addition, the sequence of GAL3 differs between natu- ral isolates of S. cerevisiae and this allelic variation was recently associated with different sensitivities of the network to galactose (Lee et al, 2017). There are multiple ways that a GAL3 variant could affect the dynamics of induction: by modifying the production or degradation rates of the Gal3p protein or of its messenger RNA, by changing the affinity of Gal3p to galactose or ATP, by changing the capacity of Gal3p to dimerize, by changing the nucleocytoplasmic ratio of Gal3p molecules, or by changing the affinity of Gal3p to Gal80p. A GAL3 variant may also affect the background expression level of Gal3p prior to stimulation, which is known to be critical for network memory of prior stimulations (Stockwell & Rifkin, 2017).

Thus, it is difficult to predict the functional consequence of sequence variation in GAL3.

Using the yeast GAL3 gene as a model framework, we show here

that experimental acquisitions combined with network modeling are

efficient to predict the effect of sequence variants. The principle of

the approach is to link genetic variation to informative changes of

parameter values of the model. We show that replacing natural

GAL3 alleles can be sufficient to transform a gradual response into a

binary activation, and the approach allowed us to distinguish

between different types of GAL3 alleles segregating in S. cerevisiae

populations: those altering the activation of Gal3p by galactose and

those altering the strength with which activated Gal3p alleviates the

transcriptional inhibition operated by Gal80p. In particular, our

approach was efficient to associate a non-synonymous SNP with a

change of binding affinity for Gal80p.

(4)

Results

Natural variation in GAL 3 is sufficient to convert a gradual induction into a binary switch

We constructed a panel of yeast strains that were all isogenic to the reference laboratory strain BY, except for GAL3. At this locus, each strain carried an allele that was transferred from a natural strain of the Saccharomyces Genome Resequencing Project (Liti et al, 2009;

Appendix Fig S1). All strains of the panel also harbored a P

_GAL1

-GFP reporter of network activity, where the promoter of the GAL1 gene controlled the expression of a GFP fluorescent protein destabilized by a degradation signal (Mateus & Avery, 2000; Chuffart et al, 2016).

GAL1 is a paralogous gene of GAL3 (Hittinger & Carroll, 2007) and transcription at its promoter is commonly used as a proxy of GAL network activity (Acar et al, 2005; Venturelli et al, 2015; Wang et al,

2015). Using flow cytometry, we monitored the dynamics of network activation in each strain (Fig 1). This was done by first culturing cells for 3 h in a medium containing 2% raffinose, a sugar known to be neutral on network activity, adding galactose (0.5% final concen- tration), and quantifying fluorescence at multiple time points for 4 h.

Significant differences in the dynamics of activation were observed between the strains. Those harboring the GAL3

^NCYC361

, GAL3

^K11

, GAL3

^BY

, GAL3

^DBVPG1788

, GAL3

^DBVPG1853

, and GAL3

^JAY291

alleles displayed a gradual response and all cells of the population were induced and responded with similar rate of expression, maintaining population homogeneity (see example shown in Fig 1A). In contrast, strains harboring the GAL3

^Y12

and GAL3

^YJM978

alleles displayed a binary response, with a transient coexistence of induced (ON) and uninduced (OFF) cells in the population (example in Fig 1B).

We quantified induction using two metrics: the mean level of reporter expression in activated cells (response amplitude) and the

0 50 100 150 200 250 2

3 4 5

0 50 100 150 200 250 0%

25%

50%

100%

75%

GAL3

^BY

GAL3

^NCYC361

GAL3

^K11

GAL3

^Y12

GAL3

^DBVPG1788

GAL3

^DBVPG1853

GAL3

^YJM978

GAL3

^JAY291

0 2 4 6 8

0 10 20 30 40 60 80 100 130 160 205 250

0 2 4 6 8

0 10 20 30 40 60 80 100 130 160 205 250

Time (min)

P

_GAL1

GFP (a.u.) Time (min)

Time (min)

# cells histograms of GAL3

^NCYC361

# cells histograms GAL3

^Y12

Fraction of ON cells

P

_GAL1

GFP (a.u.)

P

GAL1

GFP of activated cells (a.u.)

Response amplitude

Inducibility A

B

C

D

OFF

OFF/ON threshold ON

OFF ON

Time (min)

Figure 1. Dynamic response to galactose in the context of GAL 3 variants.

Acquisitions were made on strains where the GAL 3 allele was replaced by the indicated natural alleles. These strains were otherwise isogenic, with a BY background.

A, B Flow cytometry data obtained on strains harboring the GAL 3

^NCYC361

allele (A) or the GAL 3

^Y12

allele (B). Cells were cultured in raffinose 2 % and induced at time 0 by adding galactose at a final concentration of 0.5%. a.u., arbitrary units. Gray dashed line, threshold used to distinguish ON cells from OFF cells.

C Amplitude of the response (mean expression) as a function of time for each GAL3 replacements strain. Error bars represent standard error of the mean (n = 6).

D Inducibility of the response (fraction of ON cells) as a function of time for each GAL3 replacement strain. Error bars represent standard error of the mean (n = 6).

(5)

proportion of activated cells in the population (inducibility of the network). We observed that the response amplitude varied little among the strains, all of them approaching steady state with compa- rable kinetics (Fig 1C). In contrast, inducibility of the network dif- fered between strains (Fig 1D). As expected, in strains showing a gradual response, the fraction of ON cells increased significantly during the first 2 h of induction, reaching full inducibility (all cells activated) by the end of the experiment. On the opposite, the strains showing a transient binary response displayed reduced inducibility over time. For instance, 21% of GAL3

^Y12

cells were still not induced after 250 min of stimulation. These results indicate that natural genetic variation in GAL3 is sufficient to modify the inducibility of the network and to convert a gradual response into a binary response, or vice versa.

A quantitative model of inducibility over time

To examine what functional properties of the GAL3 gene could determine a gradual or a binary response, we constructed a dynamic stochastic model of the network (Fig 2A). We based our quantita- tive model on the following current molecular knowledge, which derives from reference laboratory strains. In absence of galactose, a homodimer of the transcription factor Gal4p is constitutively bound to upstream activation sites (UAS) of promoter regions of GAL genes. However, transcription is inactive because of the homo- dimeric Gal80p inhibition of Gal4p (Peng & Hopper, 2002; Pilauri et al, 2005). When intracellular galactose binds Gal3p, it changes conformation and associates with Gal80p dimers (Lavy et al, 2012), thereby releasing Gal80p from promoters and allowing

Binary response A

B C

Graded response External galactose

galactose Gal3p

Gal3p*

+

Gal80p P

_GAL80

GAL80

P

_GAL3

GAL3

P

_GAL1

GFP -

P

_GAL1

GAL1 Gal1p

Gal1p*

galactose

+

5

4

3 2 100%

25%

0%

50%

75%

0 50 100 150 200 250

time (min) 5

4

3 2 100%

25%

0%

50%

75%

0 50 100 150 200 250

time (min)

OFF/ON threshold OFF/ON threshold

P

GAL1

GFP in single cell (a.u.) Fraction of ON cells

Fraction of ON cells P

GAL1

GFP in single cell (a.u.)

Figure 2 . In silico model of network induction.

A Schematic representation of the model used. Galactose-activated Gal 1 p and Gal 3 p proteins become Gal 1 p* and Gal 3 p*, respectively. Pointed and blunt arrows represent activation and inhibition, respectively. Positive and negative feedback loops are highlighted by + and signs. The central Gal4p activator is not shown because its dynamics is not included in the model.

B Example of a gradual response predicted by the model ([gal] = 0.5%, q

Gal3

= 140 and K

Gal

= 0.055). Thin violet lines represent stochastic simulations of network activation in individual cells. Dashed red line represents the threshold distinguishing ON from OFF cells. Green thick line indicates the fraction of ON cells as a function time.

C Example of a binary response predicted by the model ([gal] = 0 . 5 %, q

Gal3

= 40 , and K

Gal

= 0 . 055 ). Same color code as in (B).

(6)

Gal4p-mediated transcriptional activation. It was initially thought that activated Gal3p sequestered Gal80p in the cytoplasm, prevent- ing it from its inhibitory role in the nucleus (Peng & Hopper, 2002).

Later studies revised this view by showing that Gal3p molecules were not exclusively cytoplasmic (Jiang et al, 2009) and that forcing Gal3p to be mostly nuclear did not alter the kinetics of induction (Jiang et al, 2009). In addition, the slowness of the nucleocyto- plasmic translocation of Gal80p, which depends both on transport rates and on the Gal4p:Gal80p dissociation rate, contrasts with the fast induction of transcription (Egriboz et al, 2011). This implies a direct role of Gal3p in promoting the dissociation of Gal80p from UAS. In addition, the galactokinase Gal1p (a paralog of Gal3p) can also act as a co-inducer of the regulatory circuit, presumably using similar mechanisms as Gal3p (Venturelli et al, 2012).

Our model covers the mRNA and protein species of three major players of GAL network induction: GAL1, GAL3, and GAL80, as well as of the reporter gene. We considered that promoters of each GAL gene could switch between an ON state (full transcription) and an OFF state (leaky transcription) at rates that depended on the concen- tration of Gal80 dimers, activated Gal3p dimers, and activated Gal1p dimers. The model is provided (computer Code EV1), and a detailed description of it is given in Materials and Methods and in Appendix Text S1. Most of the parameters of the model were fixed at values obtained from previous studies (Appendix Tables S1 and S2).

Stochastic simulations reproduce the two types of induction observed experimentally

We first explored if our model captured the two types of responses of allele-replacement strains (i.e., binary and gradual). We ran stochastic simulations (Gillespie, 1977) that accounted for intrinsic and extrinsic sources of noise (see Appendix Text S1). We observed that tuning the parameters related to GAL3, while keeping all other parameters constant, was sufficient to modify inducibility and to obtain either a gradual (Fig 2B) or a binary (Fig 2C) response of the network at a given concentration of galactose. In the gradual system, the simulated single-cell trajectories were all similar; in the binary system, the simulated single-cell trajectories bifurcated, with a subset of cells having a stochastic lagging time before responding.

The single-cell value of this lag time is directly correlated with the number of potential inducer proteins (Gal1p and Galp3p) present in the cell just before induction (Appendix Fig S2). This is in very good agreement with recent single-cell experiments on galactose induc- tion (Stockwell & Rifkin, 2017). Note that a binary response is not necessarily a signature of steady-state bistability (Hermsen et al, 2011) but may represent a transient regime converging to a mono- stable ON state at equilibrium (see Appendix Text S1).

We then studied the response predicted by the model when stim- ulating the network with various concentrations of galactose while keeping model parameters constant (Appendix Fig S3). Inducibility increased with the concentration of galactose, with low concentra- tions causing a binary induction and high concentrations causing a gradual one.

Two parameters related to GAL 3 control network behavior

A detailed analysis of the model showed that inducibility of the system was mainly controlled by the average values of promoter

switching rates k

_on

and k

_off

at the time of induction (see Materials and Methods, Appendix Text S1, and Figs S2 and S4). Rates k

_off

depend only on GAL80 and are therefore invariant to GAL3 allelic variation. Rates k

_on

depend on GAL3 in two ways: via Gal3p*, the amount of galactose-activated Gal3p, and via K

₃

, which corresponds to an effective concentration encompassing the dissociation constants of the Gal3p-Gal80p interaction and of Gal3p dimerization (see Appendix Text S1). Gal3p* is determined by the level of Gal3p and by parameter K

_gal

, which represents the typical concentration of galactose needed to efficiently activate Gal3p. While K

_gal

was identi- fiable, several other GAL3-related parameters, such as those control- ling the level of Gal3p, were not and we grouped them into a meta-parameter, q

Gal3

, which we termed the strength of GAL3. q

Gal3

corresponds to the invert ratio between K

₃

and the mean concentra- tion of Gal3p at the time of induction, which depends on the leaky transcription rate, the translation rate and the degradation rates of GAL3 mRNA and protein product.

This formalism made the network sensitive to only two identifi- able GAL3-related parameters, K

_gal

and q

Gal3

. At a fixed concentra- tion of galactose induction, high q

Gal3

values correspond to high numbers of Gal3p dimers that can rapidly be activated to release Gal80 repression. The model predicted that high values of q

Gal3

would generate a gradual response (Appendix Fig S5A) because the number of potential activators was high enough in each cell to rapidly trigger the GAL1/GAL3-mediated positive feedback loop. In contrast, low values of q

Gal3

would generate a binary response (Appendix Fig S5B) because the number of activators is more stochastic, with many cells having too few initial Gal1p or Gal3p dimers to directly trigger the response. These cells need a lag time before fast activation (Fig 2B and C, and Appendix Fig S2). The other important parameter, K

_gal

, corresponds to a threshold of galac- tose concentration below which induction was limited and favoured a binary response, and above which induction was efficient and favoured a gradual response (Appendix Fig 3C). In summary, both q

Gal3

and K

_gal

values can determine whether the network adopts a gradual or a binary response at a given concentration of galactose induction.

Linking GAL 3 alleles to specific parameter values

We first examined if our model could capture a known functional

alteration of the GAL3 gene. The ATP and galactose binding pocket

of Gal3p was previously described (Lavy et al, 2012). It contains

an aromatic cage that encircles the adenine nucleotide of ATP. A

mutation targeting this cage should therefore reduce the affinity of

Gal3p for ATP. Galactose is localized more internally than ATP,

suggesting that the sequential activation of Gal3p starts first with

galactose binding and then ATP binding. We analyzed how para-

meters of our model should be affected by a mutation reducing

Gal3p:ATP affinity, and we found that K

_gal

should increase, that

q

Gal3

should decrease, and that the relative change to wild-type

values should be more pronounced for K

_gal

(Appendix Text S1). In

contrast, if ATP binds prior to galactose, then K

_gal

should also

increase but q

Gal3

should remain unaffected. Using Crispr/Cas9, we

replaced in the BY reference strain the W117 residue of the cage by

either an alanine or a threonine, and we monitored the dynamics

of transcriptional activation of the corresponding mutants

(Appendix Fig S6A–C). The non-conservative W117T mutation

(7)

fully abolished the response, even at very high concentrations of galactose. The W117A mutation profoundly reduced activation and caused a binary response. We then inferred parameters q

Gal3

and K

_gal

for the wild-type strain and for the W117A mutant. This was done by selecting a set of parameters that minimized a global chi- squared score of deviation between the measured and predicted fractions of induced cells at different times after induction and for the different galactose concentrations (for details, see Materials and Methods and Appendix Text S1). Inferred K

_gal

value was about 10 times higher for the mutant than for the wild type, and q

Gal3

was reduced by about threefold (Appendix Fig S6D). This fully agreed with the expected sequence of activation of Gal3p (galactose first, ATP second).

We then used our model to study natural GAL3 alleles. We measured the transcriptional response of the GAL3

^BY

, GAL3

^Y12

, and GAL3

^YJM978

strains at different galactose concentrations (0.05, 0.1, and 0.5%). This confirmed that inducibility increased with galactose concentration (Fig 3A). We used these experimental data as above to infer parameters q

Gal3

and K

_gal

for each of the three strains. To evaluate the usefulness of the inferred parameter values, we used

the fitted model to predict the behavior of each strain at a galactose concentration that was not used for model training (0.2%) (Fig 3B).

Finally, to test model predictions, we experimentally monitored GAL3

^BY

, GAL3

^Y12

, and GAL3

^YJM978

induction at 0.2% galactose.

Without any additional fitting procedure, we observed that inducibility (fraction of activated cells over time) differed between strains in a way that was entirely consistent with model predictions.

Thus, the differences among parameter values assigned to the dif- ferent natural GAL3 alleles are relevant outside the specific experi- mental conditions used for parameters estimation.

Natural GAL 3 alleles map to distinct locations of the parameter space

We sought to classify GAL3 alleles based on the parameter values assigned to them. We made experimental measurements on two additional strains (GAL3

^NCYC361

and GAL3

^DBVPG1788

), and we deter- mined best-fit q

Gal3

and K

_gal

values to them as for the three strains described above. Appendix Fig S7 shows these data and the corre- sponding fitted models. Figure 4A and B shows the obtained A

B

0 50 100 150 200 250

0 25 50 75 100

0 50 100 150 200 250

0 25 50 75 100

0 50 100 150 200 250

0 25 50 75 100

time (min)

fraction of ON cells (%)

time (min) time (min)

ρ

_Gal3

=91 K

_gal

=0.055%

ρ

_Gal3

=41 K

_gal

=0.075%

ρ

_Gal3

=73 K

_gal

=0.11%

GAL3

^BY

GAL3

^Y12

GAL3

^YJM978

0 50 100 150 200 250

0 25 50 75 100

0 50 100 150 200 250

0 25 50 75 100

0 50 100 150 200 250

0 25 50 75 100

time (min) time (min) time (min)

fraction of ON cells (%)

[gal]=0.5%

[gal]=0.2%

[gal]=0.1% [gal]=0.05%

inference inference inference

prediction prediction prediction

GAL3

^BY

GAL3

^Y12

GAL3

^YJM978

Figure 3. Strain-specific training of the model and validation.

A Model fitting. Each panel corresponds to one strain carrying the indicated GAL 3 allele. Inducibility was measured by flow cytometry (data points s.e.m., n = 6 ) after stimulating cells with three different concentrations of galactose (points colored according to the concentration). For each strain, these data were used to fit the GAL 3 -dependent parameters q

^Gal3

and K

Gal

. Inferred parameter values are shown. Lines in plain (resp. dashed and dotted) represent the inducibility predicted by the model at [gal] = 0 . 5 % (resp. 0 . 1 and 0 . 05 %).

B With the parameters inferred in (A), we use the model to predict the inducibility of each strain at a galactose concentration of 0 . 2 % (lines), and this prediction was

compared to experimental measures (dots s.e.m., n = 2 ).

(8)

parameters, q

Gal3

and K

_Gal

, normalized by the corresponding values of our reference strain GAL3

^BY

. Different data points represent results obtained by applying the inference process to models with different GAL3-independent parameters (see Appendix Text S1).

The fold change of a parameter between two different strains is indicative of the functional nature of the genetic variations between the two GAL3 alleles. In agreement with model predictions (Appendix Fig S3), we observed that more gradual strains

A B

C

1 2 3

K Gal /K Gal ( GAL3 BY )

GAL3

^Y12

GAL3

^YJM978

GAL3

^NCYC361

GAL3

^DBVPG1788

0.4 1 1.6 2.2

ρ Gal3 /ρ Gal3 ( GAL3 BY )

GAL3

^Y12

GAL3

^YJM978

GAL3

^NCYC361

GAL3

^DBVPG1788

GAL3

^Y12

GAL3

^YJM978

GAL3

^NCYC361

GAL3

^DBVPG1788

GAL3

^BY

0.5 1 1.5 2 2.5 3

ρ

Gal3

/ρ

Gal3

( GAL3

BY

)

K

_Gal

/K

_Gal

( GAL3

^BY

) [gal]=0.05%

0.5 1 1.5 2 2.5 3

ρ

Gal3

/ρ

Gal3

( GAL3

BY

)

K

_Gal

/K

_Gal

( GAL3

^BY

) [gal]=0.1%

0.5 1 1.5 2 2.5 3

ρ

Gal3

/ρ

Gal3

( GAL3

BY

)

K

_Gal

/K

_Gal

( GAL3

^BY

) [gal]=0.5%

Graded response

Binary response

D

1 10 100

0.01 0.1 1 10 100

ρ

Gal3

/ρ

Gal3

( GAL3

BY

)

ON BISTABLE OFF

1 10 100

0.01 0.1 1 10 100

ρ

Gal3

/ρ

Gal3

( GAL3

BY

)

ON BISTABLE OFF

1 10 100

0.01 0.1 1 10 100

ρ

Gal3

/ρ

Gal3

( GAL3

BY

)

ON BISTABLE OFF

[gal]=0.05% [gal]=0.1% [gal]=0.5%

K

_Gal

/K

_Gal

(GAL3

^BY

) K

_Gal

/K

_Gal

(GAL3

^BY

) K

_Gal

/K

_Gal

(GAL3

^BY

) GAL3

^Y12

GAL3

^YJM978

GAL3

^NCYC361

GAL3

^DBVPG1788

GAL3

^BY

Figure 4. GAL 3 alleles map to distinct locations of the parameter space.

A, B Parameter values obtained by fitting the model to experimental data collected on five strains at three concentrations of the inducer ([gal] = 0.05, 0.1, and 0.5%).

Six independent fits were performed (one per gray line). For each one, different values of GAL3-independent parameters were chosen (see Appendix Text S1), and parameters q

Gal3

(A) and K

gal

(B) were estimated for each strain. Dots represent their value for the indicated strain, relative to the value estimated for the GAL 3

^BY

strain. Error bars: uncertainty on parameter estimation for each inference (see Materials and Methods).

C Phenotypic landscape predicted by the model. At defined concentrations of the inducer ([gal]), the values of q

^Gal3

and K

gal

determine whether the response is gradual (brown) or binary (pink). The white zone is an intermediate region where the distinction between gradual and binary is unclear. Using parameters inferred in (A) and (B), alleles are mapped to the landscape (colored dots). Error bars: standard deviation of the six distinct estimations.

D Bifurcation diagram of the deterministic description of the network at steady state. Yellow: region of the (q

Gal3

, K

gal

) space where the system is bistable at the

indicated galactose concentration. Parameters: q

Gal1

= 100, q

Gal80

= 250, others as in Appendix Table S1. Symbols: positions of the five natural GAL3 alleles as in (C).

(9)

(GAL3

^NCYC361

and GAL3

^DBVPG1788

) display a high GAL3 strength q

Gal3

and a low “typical” galactose concentration K

_Gal

. Interestingly, we observed that q

Gal3

and K

_Gal

can be decorrelated. In particular, although both GAL3

^YJM978

and GAL3

^Y12

strains were binary respon- ders at all galactose concentrations tested, the model attributed this behavior to different functional effects: a low sensitivity to galactose (high K

_gal

) for the Gal3 protein originating from YJM978 and a reduced strength of the GAL3 gene originating from Y12. Thus, the induction specificities of the strains can be attributed to distinct GAL3-related parameters.

To address the direct relationship between the network proper- ties (gradual or binary response) and the GAL3-related parameters, we positioned each of the tested strains within a phenotypic land- scape according to their relative q

Gal3

and K

_Gal

parameters (Fig 4C).

According to our model, q

Gal3

and K

_Gal

parameters are sufficient to predict the behavior (gradual or binary) associated with a given GAL3 allele at a given concentration of galactose. As an illustration of these predictions, we specifically observed the dynamics of tran- scriptional activation of the network for the strain GAL3

^DBVPG1788

(Appendix Fig S8). The position of the GAL3

^DBVPG1788

allele on the phenotypic landscape corresponded to a transient binary activation at low concentration ([gal] = 0.05%) converted into a gradual response at higher concentration ([gal] = 0.1% and [gal] = 0.5%).

Importantly, mathematical analysis of the system at steady state indicated that all strains should reach a monostable ON state at equilibrium (Fig 4D and Appendix Text S1). Thus, although a binary regime of induction is observed in some strains, this is a tran- sient regime that should eventually convergence to monostability after a very long time (> 10 h).

Variation in induction dynamics is consistent with variation in diauxic shift decision

The physiological relevance of the GAL network regulation is to switch from the consumption of glucose (the preferred carbon source) to the consumption of galactose when glucose supply is running out. This diauxic switch is controlled not only by galactose induction but also by glucose-mediated repression. When both sugars are present, their relative concentration ratio determines whether cells activate the switch or not (New et al, 2014; Escalante- Chong et al, 2015; Venturelli et al, 2015). At some ratio values, only a fraction of the cells are induced, even at steady state. Given this dual regulation, the propensity of a strain to activate GAL metabo- lism can be quantified by measuring the fraction of induced cells after a prolonged period (8 h) of simultaneous induction (by galac- tose) and repression (by glucose). If this measurement is repeated at a given concentration of galactose and various concentrations of glucose, a useful score can be computed (called “decision thres- hold” hereafter): the concentration of glucose needed to maintain half the population of cells in the repressed (OFF) state (Fig 5A). A high decision threshold corresponds to an early activation of GAL genes during the diauxic shift.

A previous study identified GAL3 as an important genetic deter- minant for this decision: The concentration ratio at which cells turn GAL expression ON differs between strains harboring different natu- ral alleles of GAL3 (Lee et al, 2017). We asked if this variation, visi- ble in the steady-state exposure to both glucose and galactose, was correlated with the variation observed on the dynamics of network

induction upon exposure to galactose only. We chose four strains that showed different decision thresholds because of different GAL3 alleles (Lee et al, 2017; Fig 5B), and we monitored their dynamics of induction at three different concentrations of galactose (with no glucose). We then used our model to assign q

Gal3

and K

_Gal

parame- ter values to each strain. Experimental data and model fitting are shown in Fig 5B and Appendix Fig S9. We used the inferred param- eter values to visualize the four strains in the parameter space where binary and gradual responses upon stimulation at [gal] = 0.25% are delimited (Fig 5D). Remarkably, the properties of induction dynamics in absence of glucose were fully consistent with the decision threshold during diauxic shift from glucose to galac- tose. Strains having a low decision threshold, such as GAL3

^YJM421

, displayed a transient binary response, and strain GAL3

^BC187

had a high decision threshold and responded gradually. Coordinates of strains in the parameter space indicate that q

Gal3

values are highly informative on the decision threshold (Fig 5D). Thus, mapping allelic variation to dynamic parameters of induction is also useful to understand trade-offs that are observed at steady state.

A quantitative parameter change predicts a role of H 352 D SNP on Gal 3 :Gal 80 complex formation

We noticed that, at position 352 of the Gal3p protein, all natural strains harbored an aspartic acid, whereas the reference laboratory strain BY harbored a histidine. This aspartic acid was also conserved in Saccharomyces mikatae, Saccharomyces paradoxus, and Saccharomyces uvarum protein sequences (Cherry et al, 2012).

Given the prevalence of this aspartic acid, we hypothesized that a single H352D amino acid change could have consequences on Gal3p regulatory function.

To test this, we generated an artificial GAL3

^BY-H352D

allele by introducing the H352D mutation in the GAL3

^BY

strain and we moni- tored the dynamics of induction of the resulting strain. At similar concentrations of galactose, induction was faster for the modified strain than for the original strain (compare Fig 6A with 2A). We then used our model to make functional predictions. We fitted our model to experimental data of induction as described above for natural alleles. Induction dynamics of the modified strain were fully explained by preserving parameter K

_Gal

and increasing q

Gal3

(Fig 6B). This suggested that the H352D mutation did not affect acti- vation of Gal3p by galactose but rather the strength of Gal3p, which summarizes six biochemical features: the basal level of GAL3 tran- scription prior to induction, its translation and degradation rate, the degradation rate of its coding mRNA, its capacity to homodimerize, and the affinity of activated Gal3p for Gal80p.

How the implicated SNP could change either the leaky transcrip- tion level prior to induction or the transcription rate during induc- tion is difficult to imagine. In addition, the amino acid change was not surrounded by any particular peptide motif, nor was it located at the extremity of the protein. This did not support for an effect on translation or degradation rates. Thus, the most plausible interpreta- tion of the parameter change of the model was that the H352D modification would increase either the capacity of Gal3p* to dimer- ize or the affinity of the Gal3p* dimer for Gal80p.

To explore these possibilities, we analyzed the structure of the

heterotetramer [Gal3p*]

2

-[Gal80p]

2

that was previously solved

(Lavy et al, 2012). We made three important observations. First,

(10)

His352 is located at the binding interface of the Gal3p* dimer with the Gal80p dimer (Fig 6C), and distant from the pocket containing galactose and ATP. Secondly, it is spatially close to the Gal80p site where the acidic domain of Gal4p is known to bind (Thoden et al, 2008). Finally, the Gal80p dimer exhibits a positive electrostatic surface potential in the vicinity of Gal3p-His352, suggesting that the replacement of the neutral His352 by a negatively charged aspartic acid would stabilize the Gal3p*-Gal80p complex. Stabilization refers here to a gain in thermodynamic stability relative to the Gal4p- Gal80p complex, or in other words, to a decrease in Gibbs free energy change (DG

sub

) for the substitution of the Gal4p dimer by the Gal3p dimer as binding partner of the Gal80p dimer. A molecular dynamics simulation of the Asp352 mutant (in a model system of the Gal3p-Gal80 complex) indicates that two positively charged amino acids, Gal3p-Arg362 and Gal80p-Lys287, are able to form direct salt bridges with Asp352 (Fig 6D). These attractive interac- tions of Asp352 with its environment are, however, expected to be partially canceled out by repulsive interactions with the less proxi- mate, negatively charged amino acids Gal3p-Glu363 and Gal80p- Glu348 (Fig 6D). Also, the polar solution (water + counter ions) could partially reduce the stabilization effect of the H352D mutation because residue 352 is better solvated in the Gal3p dimer than in the Gal3p*-Gal80p tetramer. Thus, to quantify a possible stabiliza- tion effect of the H352D mutation, we computed the change in the

Gibbs free energy difference, ΔΔG

sub

= ΔG

subD352

ΔG

subH352

, with the aid of the thermodynamic cycle depicted in Fig 6E. The actual free energy calculations (see Materials and Methods) yielded ΔΔG

sub

= 2.8 0.9 kcal/mol, which indicates that the H352D mutation indeed increases the thermodynamic stability of the Gal3p*-Gal80p complex with respect to the Gal4p-Gal80p complex.

Thus, as predicted by the dynamic model of network induction, the H352D mutation increases the cellular response by facilitating the formation of the complex.

Discussion

We experimentally monitored the induction dynamics of the yeast GAL network in the context of natural genetic variation at the GAL3 gene. This revealed that GAL3 natural variation is sufficient to convert a gradual induction into a binary one. We built a stochastic model of the network and used it to link GAL3 alleles to functional network parameters. This approach discriminated alleles that increased the strength of activated Gal3p (e.g., of strains NCYC361 and DBVPG1788) from alleles that desensitized Gal3p to galactose activation (e.g., of strain YJM978). Alleles showing different glucose/galactose trade-offs at equilibrium displayed different dynamics of induction, and they were associated with different

A B

0.5 1 1.5 2 2.5 3

0.5 1 1.5 2 2.5

3

[gal]=0.25%

ρ

Gal3

/ρ

Gal3

( GAL3

BY

)

K

_Gal

/K

_Gal

( GAL3

^BY

) C

Decision threshold ([glu] %w/v )

F rac tion of ON c ells at 8 hours

Decision threshold

1 0.25 0.063 0.016 0.004

GAL3

^BY

GAL3

^BC187

GAL3

^YJM428

GAL3

^YJM421

D 8 hours of culture

Glucose concentration

galactose

[gal]=0.05, 0.1 or 0.5%

time [glu]

time (min)

Fraction of ON cells (%)

[gal]=0.1%

[gal]=0.5% [gal]=0.05%

GAL3

^BC187

GAL3

^YJM428

GAL3

^YJM421

[gal]

0.25% +

0 50 100 150 200 250

0 25 50 75 100

ρ _Gal3 , K _Gal

inference 50%

ρ

Gal3

Dec. Thr.

Figure 5 . Relationship between inducibility and diauxic shift decision threshold.

A Schematic representation of decision threshold measurement. The decision threshold corresponds to the concentration of glucose at which 50 % of the cells are induced in the presence of 0 . 25 % galactose. The blue curve is theoretical and shown to explain how the fraction of ON cells depends on glucose concentration.

B Decision thresholds for strains GAL 3

^BY

, GAL 3

^YJM421

, GAL 3

^YJM428

, and GAL 3

^BC187

at [gal] = 0 . 25 %.

C Schematic representation of GAL 3 induction parameters determination.

D Location of the GAL3 replacement strains in the phenotypic landscape of the model at [gal] = 0.25%. Error bars: as in Fig 4C. Inset: q

Gal3

values as a function of the

decision threshold, with dots corresponding to strains.

(11)

+ -

Δ G

^D352_sub

- Δ G

^H352_sub

= Δ G

^tetramer_alchem.

- Δ G

^dimer_alchem.

= -2.8 0.9 kcal/mol Δ G

^tetramer_alchem.

Δ G

^dimer_alchem.

Δ G

^D352_sub

[Gal3

^D352

]

₂

[Gal80]

₂

+ [Gal4]

₂

[Gal3

^D352

]

₂

+ [Gal80]

₂

[Gal4]

₂

Δ G

^H352_sub

[Gal3

^H352

]

₂

[Gal80]

₂

+ [Gal4]

₂

[Gal3

^H352

]

₂

+ [Gal80]

₂

[Gal4]

₂

D

E C

A [gal]=0.5%

Graded response

Binary response

0.5 1 1.5 2 2.5 3

ρ Gal3 /ρ Gal3 ( GAL3 BY )

K _Gal /K _Gal ( GAL3 ^BY )

0 50 100 150 200 250

0 25 50 75 100

time (min)

fraction of ON cells (%)

ρ

_Gal3

=145 K

_gal

=0.055%

GAL3

^BY-H352D

inference

B

[gal]=0.1%

[gal]=0.5%

[gal]=0.05%

Asp352 Lys287 Arg362

Glu348 Glu363

GAL3

^BY

GAL3

^BY-H352D

Figure 6 . Functional inference of the H 352 D variant of GAL 3 .

A Experimental acquisitions (dots s.e.m., n = 4 ) and model fitting (curves) of the induction dynamics of the GAL 3

^BY-H352D

strain.

B GAL 3

^BY

(blue dot) and GAL 3

^BY-H352D

(gray dot with standard deviation bars) strains localization in the phenotypic landscape of the model at [gal] = 0 . 5 %. Arrows:

phenotypic trajectory between the two alleles. Error bars: as in Fig 4C.

C Structure of the tetrameric complex [Gal3p*]

2

[Gal80p]

2

(PDB entry 3V2U). Residue His352 of one Gal3p unit is in the back side and not visible. The His352 residue of the other Gal3p unit is shown as green beads in the center; it is located at the binding interface of the Gal3p* dimer (white beads) and the Gal80p dimer (colored surface). Gal80p residues are colored according to their electrostatic surface potential from red (≤ 10 kT/e) to blue (≥ +10 kT/e). Yellow beads: The acidic activation domain of Gal 4 p was inserted in the complex by superimposition with crystal structure 3 BTS. A similar insertion in the other Gal 80 p unit is in the back side and not visible. Created with VMD software.

D Local stabilization of Gal 3 p-Asp 352 by residues Gal 3 p-Arg 362 and Gal 80 p-Lys 287 in the [Gal 3 ]

₂

[Gal 80 ]

₂

complex. Green and white labels refer to residues from Gal 3 p and Gal 80 p units, respectively. The figure shows a snapshot from a molecular dynamics simulation of the mutation H 352 D carried out for a model system of the complex (see Appendix Text S 2 ). Atoms within 15 Å of residue 352 are shown as thin sticks in white (Gal 3 p) or dark gray (Gal 80 p). Remaining atoms are shown as a solid surface. Created with VMD software.

E Thermodynamic cycle quantifying the energetic impact of the H352D mutation on the substitution of [Gal4p]

2

by [Gal3p]

2

as binding partner of [Gal80p]

2

(DG

sub

, horizontal arrows). This impact is measured as DDG = DG

subD352

DG

subH352

, which equals to DG

alchemtetramer

DG

alchemdimer

(vertical arrows) because free enthalpy

is a state function. These latter quantities correspond to the free enthalpy change for the alchemical (double) mutation of His > Asp in the Gal3p-Gal80p tetramer

and in the Gal3p dimer, respectively, which were computed by alchemical free energy calculations (see Appendix Text S2).

(12)

strength of activated Gal3p. Our approach also predicted a func- tional effect of a single non-synonymous SNP that was validated by atomistic simulations of the binding interface between Gal3p and Gal80p dimers. These results provide further details on the yeast GAL system and, perhaps more importantly, they constitute a proof- of-concept of the feasibility and usefulness of linking genetic vari- ants to model parameters.

Genetic variability of the yeast GAL network

Our in vivo and in silico analyses of the induction kinetics of yeast GAL activation reveal properties of this system and how it is sensi- tive to genetic variation. The fact that an allelic exchange in GAL3 was sufficient to convert a gradual response into a transient binary one is important because it shows that, although binary induction requires a polygenic architecture of transcriptional regulation, whether it takes place or not can be dictated by the genotype of a single gene.

Previously, several computational models of the network have been proposed, usually in an effort to understand the properties of the system at steady state (Acar et al, 2005; Apostu & Mackey, 2012; Venturelli et al, 2012). Particularly, they highlighted the important role of Gal3p-, Gal1p-, and Gal80p-mediated feedback loops. Our in silico analysis suggests that the gradual or binary kinetic response is mainly controlled by the initial number of repres- sors (Gal80p) and inducers (Gal1p and Gal3p), the efficacy of galac- tose to activate the inducers, and the efficiency of the activated inducers to release the effect of repressors. In particular, a low mean number of inducers at the time of induction may lead to high cell- to-cell variability in their actual number. Cells with few inducers (as compared to repressors) display a lag time before responding, lead- ing to a binary response pattern at the population level. This predic- tion from our model is fully consistent with recent experiments that tracked the induction of the network at the single-cell level and showed that the initial concentrations of Gal1p and Gal3p are predictive of the transient bimodal response (Stockwell & Rifkin, 2017). We also observed that feedback loops were important to control the strength of cell-to-cell variability before induction (Gal80-mediated negative feedback) and the duration of lag times (Gal3/Gal1-mediated positive feedbacks), which agrees with the previous observation that disabling the Gal80p and Gal3p feedback loops can transform a gradual response into a binary one (Ramsey et al, 2006). Our results on GAL3 genetic variants also complement previous genetic manipulations of the feedback loops, where their effect on bimodality was tested by modulating promoter activities (Acar et al, 2005; Ramsey et al, 2006; Venturelli et al, 2012; Peng et al, 2015). Here, we showed that a non-synonymous variant affecting Gal3p:Gal80p interaction directly affects the dynamics of transient bimodality. This is a novel experimentally based observa- tion that is totally coherent with the conclusions of Venturelli et al (2012) who showed computationally that steady-state bimodality of the network could rely on protein–protein binding affinities.

We also observed that genetic variations at GAL3 could affect its propensity to be activated by galactose/ATP binding. In particular, the GAL3

^YJM978

allele was associated with increased values of the K

_Gal

parameter (more galactose needed for its activation). This allele harbored three non-synonymous SNPs: M179I, R312I, and H352D.

As shown above, H352D is found in all natural alleles that we tested

and it therefore does not explain a change in K

_Gal

specifically for GAL3

^YJM978

. According to the structure of the Gal3p:Gal80p tetramer complexed with galactose and ATP (Lavy et al, 2012), the other two polymorphic sites do not map close to the pocket containing the ligands. Met179 is located at the surface of the complex, distant from any binding interface and distant from the bound galactose (30 A ˚ ) and ATP (25 A ˚ ). The mutational effect of the rather conservative amino acid change (methionine to isoleucine) on the K

_Gal

parameter might therefore be negligible. In contrast, the non-conservative argi- nine to isoleucine mutation at site 312 could influence K

_Gal

in several ways: First, the positively charged arginine contributes favorably to the binding of the negatively charged ATP through long-range elec- trostatic interactions. The charge-neutral Ile312 variant lacks this favorable interaction and may have lower affinity for ATP, thereby penalizing activation by the two ligands. Second, residues Arg312 of the two Gal3p units are in direct contact with each other, and the non-conservative R312I change may affect the dimerization of Gal3p. Lavy et al (2012) reported that, in absence of galactose, Gal3p is monomeric in solution and adopts an open conformation that differs from the conformation generating the Gal3p:Gal3p dimeric interface found upon interaction with Gal80p. If the R312I modification alters Gal3p dimerization, this could modify the overall activation by galactose because these processes are coupled.

We observed that genetic variation of the strength of activated Gal3p (q

Gal3

), estimated from the dynamic properties of network activation, was correlated with variation of the glucose/galactose trade-off at steady state. This implies that the two traits co-evolve in natural populations of S. cerevisiae. Given the relatively short time scale of network induction, mild differences in the dynamics of activation alone are unlikely to cause fitness differences unless environmental galactose concentrations are highly dynamic. In contrast, variation in the sensitivity of the network to the ratio of external sugars corresponds to the triggering of an adaptive metabolic process, which is highly related to fitness even for slow environmental changes. The induction dynamics that we observed on short time scales are probably not themselves under selection, but they provide valuable information on the molecular mechanism affecting a fitness-related trait operating on longer time scales.

The H352D variant is interesting in this regard. At this position in Gal3p, a histidine residue was found in all laboratory strains (BY4741, CEN.PK, D273-10B, FL100, FY1679, JK9-3d, SEY6210, W303, X2180-1A, YPH499), while nearly all natural isolates as well as distant species possess an aspartic acid. Our results showed the importance of this aspartic acid for interaction with Gal80p, which suggests that its conservation in wild population results from purify- ing selection. The presence of slightly deleterious mutations in labo- ratory strains is well known. Examples from the reference strain BY/S288c include mutations in AMN1 (Yvert et al, 2003), BUL2 (Kwan et al, 2011), ERC1 (Fehrmann et al, 2013), FLO8 (Kron, 1997), GPA1 (Yvert et al, 2003), and HAP1 (Gaisne et al, 1999).

These mutations likely resulted from a release of purifying selection caused by strong population bottlenecks when propagating yeast on petri dishes. As for the genes listed above, the implication for GAL3 is that most mechanistic studies refer to a “wild-type” protein that is in fact a slightly hypomorphic allele not found in nature.

We also noted cases where the specificities of a GAL3 allele in

the context of the BY strain did not reflect the properties of the

(13)

donor strain. An extreme example of this was the GAL3

^DBVPG1853

allele which improved the response of the BY strain (Fig 1D) while the DBVPG1853 strain itself did not respond at all to galactose (not shown), presumably because of genetic defects in other genes.

Another example was the GAL3

^Y12

allele which, when introduced in the BY strain, conferred a binary induction that was milder than the binary phenotype of the Y12 strain itself (Appendix Fig S10). This illustrates the importance of other loci than GAL3 that also contri- bute to the dynamics of network induction. Background-specific effects are common and should be taken into account when inter- preting the functional impact of natural alleles in their original strain context (Gerke et al, 2010).

Linking DNA variants to model parameters: feasibility and potential

We developed our approach using a model system, the yeast GAL network, which was an ideal context for investigation: Molecular players were well known, important network properties had been previously described, genetic engineering could be used to study the effect of a single gene in an otherwise isogenic background, and experimental measurements were relatively cheap. If network modeling had provided no added value in such a context, it would be hard to imagine how it could be useful in more complex frame- works. We report that it did: Observing different dynamics experi- mentally was not sufficient to make functional inferences, but combining data and modeling was. The concept is therefore fruitful and it is interesting now to consider how it can be extended to other biological systems.

First, it is important to realize that inference is based on the wealth of information contained in the dynamics of activation.

Evidently, studying the system at equilibrium would not be suffi- cient. Mapping DNA variants to model parameters is therefore promising for systems where time-course data are available.

Second, even in the simple context of our study, not all param- eters of the model were identifiable and it was necessary to aggre- gate several of them into a meta-parameter (q

Gal3

). We admit that this constitutes a limit of the approach: When the H352D SNP was linked to this meta-parameter, additional assumptions were needed to infer biochemical effects. Similar difficulties will likely be encountered in other systems and the identifiability and sensi- tivity analysis of the model are therefore crucial to determine the nature of biological information that can be retrieved by the approach.

Third, our method here was to infer function and then to validate a prediction by exploring the structural data of a protein complex.

Depending on the system under consideration and the data avail- able, it may be judicious to reverse the approach: scanning protein structures first in order to identify variants modifying binding affi- nities and then studying these variants specifically using experimen- tal measurements and model fitting. This way, a parameter change is first inferred from structural data and a dynamic model of the network then allows one to predict its phenotypic effect. The SAAP database (Al-Numair & Martin, 2013), which registers structurally relevant variants of human proteins, may constitute a very helpful resource to do this.

Fourth, while we based our approach on cell population distri- butions, tracking the response dynamics of individual cells over

time is also possible (Stockwell & Rifkin, 2017) and can provide more information on the network response. In other contexts, such methods had been very useful to infer parameters associated with individual cells (Llamosi et al, 2016). A variant may then be asso- ciated with one parameter by a whole distribution of values, which likely carries more information than a single scalar value as presented here. In addition, such time-lapse acquisitions can provide high temporal resolution of gene induction (Aymoz et al, 2016).

Fifth, additional work is now needed to extend the approach to more than one gene. At the level of an entire network, the overall genotype of the individual is a combination of alleles. The number of such combinatorial genotypes of the network segregating in natural populations can be very large and mapping this diversity to the parameter space would be very interesting. In particular, models accounting for genetic changes might predict and explain genetic interactions (epistasis) within the network. The challenge to achieve this will likely reside in the number of free parameters:

If the genotype is allowed to vary at too many genes, parameters cannot be constrained efficiently. Mapping variants one gene at a time, as we did here, and then in combination would maintain this necessary constraint while evaluating epistasis. A more difficult task would be to infer the contribution of genes that are external to the network while nonetheless affecting its behavior (e.g., by modifying widely transcription rates or the stability of proteins, or cross-talks with other networks). Studying these factors by our approach is only possible after they are identified and connected to the network. Their identification can be obtained by genetic mapping. For example, we recently identified a locus on yeast chromosome V that affects the variability of the GAL response at transient times of activation (Chuffart et al, 2016). Once identified, these factors must be integrated in the network model, which may be a complex task.

Network modeling is expected to help the development of personalized medicine and the fact that it is possible, in a yeast system, to personalize model parameters according to DNA vari- ants is encouraging. Can the approach described here be applied to human variants? This requires overcoming several difficulties that could be avoided in our framework. First, most regulatory networks of human systems are incompletely known. Second, most of these networks comprise numerous genes, implying many model parameters and, possibly, too many degrees of freedom for adjustments and identifiability issues. The first task is therefore a careful identifiability and sensitivity analysis of the model and, as much as possible, a reduction of its complexity. The work of Zhao et al (2015) is encouraging in this regard. The authors studied the mitochondrial outer membrane permeabilization network control- ling entry in apoptosis. Their model comprised ~50 parameters and ~20 molecular species, but the network critical behavior (bi- furcation point) was sensitive to less than half of the parameters.

Assigning function to natural allelic variation via dynamic modeling of gene network induction

HAL Id: hal-01976613

https://hal.archives-ouvertes.fr/hal-01976613

Submitted on 8 Jun 2021

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Distributed under a Creative Commons Attribution| 4.0 International License

dynamic modeling of gene network induction

Magali Richard, Florent Chuffart, Hélène Duplus-bottin, Fanny Pouyet, Martin Spichty, Etienne Fulcrand, Marianne Entrevan, Audrey Barthelaix,

Michael Springer, Daniel Jost, et al.

To cite this version:

Magali Richard, Florent Chuffart, Hélène Duplus-bottin, Fanny Pouyet, Martin Spichty, et al.. Assign-

ing function to natural allelic variation via dynamic modeling of gene network induction. Molecular

Systems Biology, EMBO Press, 2018, 14 (1), pp.e7803. �10.15252/msb.20177803�. �hal-01976613�

Assigning function to natural allelic variation via dynamic modeling of gene network induction

Magali Richard 1 , 2 ,* , Florent Chuffart 1 , Hélène Duplus-Bottin 1 , Fanny Pouyet 1 , Martin Spichty 1 , Etienne Fulcrand 1 , Marianne Entrevan 1 , Audrey Barthelaix 1 , Michael Springer 3 , Daniel Jost 2 ,** &

Gaël Yvert 1 ,***

Abstract

Keywords galactose; personalized medicine; SNP function; stochastic model;

yeast

Subject Categories Methods & Resources; Network Biology; Quantitative Biology & Dynamical Systems

DOI 10 . 15252 /msb. 20177803 | Received 7 June 2017 | Revised 15 December 2017 | Accepted 18 December 2017

Mol Syst Biol. ( 2018 ) 14 : e 7803

Introduction

Systems biology has opened new opportunities to better predict the action of DNA variants. First, “omics” data that are gathered at various levels (DNA, transcripts, proteins, metabolites, etc.) estab- lish relations between target sequences and functional pathways.

1 Laboratoire de Biologie et de Modélisation de la Cellule, Ecole Normale Supérieure de Lyon, CNRS, Université Lyon 1, Université de Lyon, Lyon, France 2 Univ. Grenoble Alpes, CNRS, CHU Grenoble Alpes, Grenoble INP, TIMC-IMAG, Grenoble, France

3 Department of Systems Biology, Harvard Medical School, Boston, MA, USA

*Corresponding author. Tel: +33 4 56 52 00 68; E-mail: magali.richard@univ-grenoble-alpes.fr

**Corresponding author. Tel: + 33 4 56 52 00 69 ; E-mail: daniel.jost@univ-grenoble-alpes.fr

***Corresponding author. Tel: +33 4 72 72 80 00; E-mail: gael.yvert@ens-lyon.fr

In addition, the galactokinase Gal1p involved in galactose metabo- lism is also a co-inducer of the response (Bhat & Hopper, 1992).

Song et al, 2010; Apostu & Mackey, 2012). In particular, the

Thus, it is difficult to predict the functional consequence of sequence variation in GAL3.

Using the yeast GAL3 gene as a model framework, we show here

that experimental acquisitions combined with network modeling are

efficient to predict the effect of sequence variants. The principle of

the approach is to link genetic variation to informative changes of

parameter values of the model. We show that replacing natural

GAL3 alleles can be sufficient to transform a gradual response into a

binary activation, and the approach allowed us to distinguish

between different types of GAL3 alleles segregating in S. cerevisiae

populations: those altering the activation of Gal3p by galactose and

those altering the strength with which activated Gal3p alleviates the

transcriptional inhibition operated by Gal80p. In particular, our

approach was efficient to associate a non-synonymous SNP with a

change of binding affinity for Gal80p.

Results

Natural variation in GAL 3 is sufficient to convert a gradual induction into a binary switch

We constructed a panel of yeast strains that were all isogenic to the reference laboratory strain BY, except for GAL3. At this locus, each strain carried an allele that was transferred from a natural strain of the Saccharomyces Genome Resequencing Project (Liti et al, 2009;

Appendix Fig S1). All strains of the panel also harbored a P

-GFP reporter of network activity, where the promoter of the GAL1 gene controlled the expression of a GFP fluorescent protein destabilized by a degradation signal (Mateus & Avery, 2000; Chuffart et al, 2016).

GAL1 is a paralogous gene of GAL3 (Hittinger & Carroll, 2007) and transcription at its promoter is commonly used as a proxy of GAL network activity (Acar et al, 2005; Venturelli et al, 2015; Wang et al,

Significant differences in the dynamics of activation were observed between the strains. Those harboring the GAL3

, GAL3

, GAL3

, GAL3

, GAL3

, and GAL3

alleles displayed a gradual response and all cells of the population were induced and responded with similar rate of expression, maintaining population homogeneity (see example shown in Fig 1A). In contrast, strains harboring the GAL3

and GAL3

alleles displayed a binary response, with a transient coexistence of induced (ON) and uninduced (OFF) cells in the population (example in Fig 1B).

We quantified induction using two metrics: the mean level of reporter expression in activated cells (response amplitude) and the

0 50 100 150 200 250 2

3 4 5

0 50 100 150 200 250 0%

25%

50%

100%

75%

GAL3

GAL3

GAL3

GAL3

GAL3

GAL3

GAL3

GAL3

0 2 4 6 8

0 10 20 30 40 60 80 100 130 160 205 250

0 2 4 6 8

Magali Richard ¹ ^, ² ^,* , Florent Chuffart ¹ , Hélène Duplus-Bottin ¹ , Fanny Pouyet ¹ , Martin Spichty ¹ , Etienne Fulcrand ¹ , Marianne Entrevan ¹ , Audrey Barthelaix ¹ , Michael Springer ³ , Daniel Jost ² ^,** &

Gaël Yvert ¹ ^,***