Characterization of amino acid interaction networks in proteins


HAL Id: hal-00431447

https://hal.archives-ouvertes.fr/hal-00431447

Submitted on 12 Nov 2009


Omar Gaci, Stefan Balev

To cite this version:

Omar Gaci, Stefan Balev. Characterization of amino acid interaction networks in proteins. Journées Ouvertes en Biologie, Informatique et Mathématiques 2008, Jun 2008, Lille, France. pp.59-60. ⟨hal-00431447⟩


Omar Gaci and Stefan Balev

LITIS, Le Havre University, 25 rue Ph. Lebon BP 540, 76058 Le Havre Cedex, France {Omar.Gaci, Stefan.Balev}@univ-lehavre.fr

Abstract:

The amino acid interaction network of a protein is a graph whose vertices are the protein's amino acids and whose edges are the interactions between them. Using a graph theory approach, we study the properties of these networks, in particular the degree distribution and the mean degree of the vertices. The results presented in this paper are the first steps of a new network approach to the protein folding problem.

Keywords:

protein structure, interaction network, scale-free network.

Proteins have complex, irregular structures [3]. In their natural environment, they adopt a compact native three-dimensional form. This process, called folding, is not fully understood. It results from interactions between the protein's amino acids, which form chemical bonds.

We use the following network model to describe a protein. We define a graph in which each vertex represents an amino acid participating in a secondary structure element (SSE). We ignore the remaining amino acids because the most important structure-determining interactions are those between amino acids belonging to the same SSE at the local level and between different SSEs at the global level. Two vertices are connected by an edge if the corresponding amino acids are in contact, that is, if the distance between their $C_\alpha$ atoms is less than 7 Å. We call this graph the SSE interaction network (SSE-IN).
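A minimal Python sketch of this construction follows. It assumes that the residues outside SSEs have already been filtered out and that each remaining residue comes with the coordinates of its $C_\alpha$ atom (for example, extracted beforehand from a PDB file). The input format and the name `build_sse_in` are hypothetical; this is an illustration, not the authors' code.

```python
import math
from itertools import combinations

CONTACT_CUTOFF = 7.0  # Å; residues are in contact if their C-alpha atoms are closer than this


def build_sse_in(residues, cutoff=CONTACT_CUTOFF):
    """Build an SSE-IN as an adjacency dict {residue_id: set of neighbour ids}.

    residues: hypothetical input format, a list of (residue_id, (x, y, z)) pairs,
    one per amino acid belonging to an SSE, with C-alpha coordinates in Å.
    """
    adj = {rid: set() for rid, _ in residues}
    # Compare every unordered pair of residues exactly once (O(n^2) pairs).
    for (a, pa), (b, pb) in combinations(residues, 2):
        if math.dist(pa, pb) < cutoff:  # strict inequality, as in the text
            adj[a].add(b)
            adj[b].add(a)
    return adj
```

The all-pairs loop is quadratic in the number of residues; for large structures such as 1AON, a spatial index (for example a k-d tree) would be a natural replacement.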

In this paper we study some of the properties of SSE-INs. The main advantage of this model is that it allows us to address different biological problems related to protein structure with graph theory tools. By ignoring details such as the type and the exact position of each amino acid, this abstract and compact description lets us focus on the structure and organization of the interactions. The characterization we propose is a first step towards a new approach to the protein folding problem.

The properties identified here, as well as others we plan to study, can give insight into the folding process. They can be used to guide a folding simulation along the topological pathway from the unfolded to the folded state.

We computed the cumulative degree distribution $P_k$ of the SSE-IN of all proteins in the PDB [2]. A sample of these results is presented in Fig. 1. For all studied networks we observe a power-law regime followed by a sharp cutoff,

$$P_k \sim k^{-(\alpha-1)} e^{-k/\alpha}.$$

Most vertices have relatively homogeneous degrees, close to the mean degree. Interestingly, the cutoff occurs near the mean degree, so only a small fraction of vertices are hubs with degree greater than the mean.
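The quantities plotted in Fig. 1 can be computed from an adjacency dict as in the sketch below. We assume here that the cumulative distribution $P_k$ is the fraction of vertices with degree at least $k$; the function name `degree_stats` is illustrative.

```python
from collections import Counter


def degree_stats(adj):
    """Return (mean degree, cumulative degree distribution) of a network.

    adj: adjacency dict {vertex: set of neighbours}, e.g. an SSE-IN.
    The distribution is returned as {k: P_k} for every observed degree k,
    where P_k is assumed to be the fraction of vertices with degree >= k.
    """
    degrees = [len(neighbours) for neighbours in adj.values()]
    n = len(degrees)
    if n == 0:
        return 0.0, {}
    mean_degree = sum(degrees) / n
    counts = Counter(degrees)
    cumulative = {}
    at_least_k = 0
    # Walk the observed degrees from largest to smallest, accumulating the tail count.
    for k in sorted(counts, reverse=True):
        at_least_k += counts[k]
        cumulative[k] = at_least_k / n
    return mean_degree, dict(sorted(cumulative.items()))
```

Plotting the returned distribution on log-log axes, with a marker at the mean degree, gives plots of the kind shown in Fig. 1.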

These properties characterize SSE-INs as truncated scale-free networks [1].

Since the mean degree acts as a threshold beyond which the cumulative degree distribution decreases exponentially, it is interesting to study how it evolves with the size of the network.
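One way to study this evolution is to group networks into size classes and average the mean degree within each class. The sketch below assumes the class boundaries of Fig. 2a (1-50, 50-100, 100-500, 500-1000, 1000-8055) and, since adjacent labels share their end values, assigns a network to the first class whose upper bound is at least its size. The names are hypothetical.

```python
from bisect import bisect_left
from statistics import mean

# Upper bounds of the size classes of Fig. 2a (assumed closed on the upper end).
CLASS_UPPER_BOUNDS = [50, 100, 500, 1000, 8055]


def mean_degree_by_size_class(networks):
    """Average mean degree per size class.

    networks: iterable of (size, mean_degree) pairs, one per SSE-IN.
    Returns {class_upper_bound: average mean degree}; empty classes and
    networks larger than the last bound are left out.
    """
    groups = {ub: [] for ub in CLASS_UPPER_BOUNDS}
    for size, z in networks:
        i = bisect_left(CLASS_UPPER_BOUNDS, size)  # first bound >= size
        if i < len(CLASS_UPPER_BOUNDS):
            groups[CLASS_UPPER_BOUNDS[i]].append(z)
    return {ub: mean(zs) for ub, zs in groups.items() if zs}
```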

Figure 1. Cumulative degree distributions $P_k$ versus degree $k$ (log-log axes), with the mean degree marked. Left: 1ES9 (size 50, mean degree 6.6). Right: 1AON (size 4988, mean degree 7.5).

Fig. 2a shows that the mean degree increases very slightly with the size of the network. Whatever the size of the network, the mean degree always lies between 5 and 8. This mean degree interval is a common property of all SSE-INs. To explain it, let us consider the structure of our networks. They are composed of densely connected subgraphs corresponding to SSEs. The number of edges connecting different subgraphs is relatively small, but these edges are the most important, since they correspond to the interactions that determine the tertiary structure.
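Separating intra-SSE from inter-SSE edges only requires knowing which SSE each vertex belongs to. The minimal sketch below assumes an edge list and a hypothetical mapping `sse_of` from residue id to SSE label, and computes the fraction of edges that connect different SSEs (the ratio of inter-SSE edges).

```python
def inter_sse_ratio(edges, sse_of):
    """Return the fraction of edges joining amino acids of two different SSEs.

    edges:  iterable of undirected (u, v) residue pairs, each listed once
    sse_of: dict residue id -> SSE label (hypothetical input format)
    """
    edges = list(edges)
    if not edges:
        return 0.0
    inter = sum(1 for u, v in edges if sse_of[u] != sse_of[v])
    return inter / len(edges)
```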

Figure 2. Network size distribution (proportion of networks, %) and mean degree as a function of network size: (a) SSE-IN, size classes 1-50, 50-100, 100-500, 500-1000, 1000-8055; (b) α-helix subnetworks, size classes 1-10, 10-20, 20-25, 25-270; (c) β-sheet subnetworks, size classes 1-10, 10-20, 20-50, 50-468. Each panel also shows the mean degree over all networks of that kind.

The mean degrees of the SSE subgraphs are shown in Fig. 2b and c. The evolution of the mean degree at the microscopic level (individual SSEs) is almost the same as at the macroscopic level (whole SSE-INs). Independently of the SSE size and type, the mean degree $z_{SSE}$ of each SSE subgraph is always bounded,

$$z_{\min} < z_{SSE} < z_{\max},$$

when the size of the subgraph is greater than 10. In the general case $z_{\min} = 5$ and $z_{\max} = 8$, but finer bounds can be found for a specific SSE size and type.
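The mean degree of each SSE subgraph counts only the edges inside that SSE. A minimal sketch, using the same hypothetical inputs as above (an edge list and a residue-to-SSE mapping `sse_of`), returns the size and $z_{SSE}$ of every subgraph so that the bounds can be checked on subgraphs larger than 10.

```python
from collections import defaultdict


def sse_mean_degrees(edges, sse_of):
    """Return {sse_label: (size, z_SSE)} for every SSE subgraph.

    edges:  iterable of undirected (u, v) residue pairs, each listed once
    sse_of: dict residue id -> SSE label (hypothetical input format)
    Only intra-SSE edges count towards z_SSE; inter-SSE edges are ignored.
    """
    size = defaultdict(int)
    for label in sse_of.values():
        size[label] += 1
    intra = defaultdict(int)
    for u, v in edges:
        if sse_of[u] == sse_of[v]:
            intra[sse_of[u]] += 1
    # Each intra-SSE edge adds 2 to the degree sum of its subgraph.
    return {label: (n, 2 * intra[label] / n) for label, n in size.items()}
```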

Using the bounds on $z_{SSE}$, it is not difficult to obtain more precise bounds on the mean degree $z$ of an SSE-IN:

$$\frac{z_{\min}}{1-r} < z < \frac{z_{\max}}{1-r},$$

where $r$ is the ratio of inter-SSE edges to the total number of edges. Larger proteins have more SSEs and hence more links between different SSEs, which explains why the mean degree increases with the size of the network. The ratio of inter-SSE edges is quite variable, but in the studied proteins it never exceeds 20%.
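The step behind this bound can be written out as follows, assuming every SSE subgraph satisfies the bounds on $z_{SSE}$ (the text states them for subgraphs of size greater than 10). Let the SSE-IN have $n$ vertices and $m$ edges, and let SSE subgraph $i$ have $n_i$ vertices, $m_i$ edges and mean degree $z_i = 2m_i/n_i$. Since each vertex belongs to exactly one SSE, $\sum_i n_i = n$, and by definition of $r$, $\sum_i m_i = (1-r)\,m$.

```latex
\begin{aligned}
z &= \frac{2m}{n} = \frac{2\sum_i m_i}{(1-r)\,n} = \frac{\sum_i n_i z_i}{(1-r)\,n},\\
z_{\min}\, n &< \sum_i n_i z_i < z_{\max}\, n
\quad\Longrightarrow\quad
\frac{z_{\min}}{1-r} < z < \frac{z_{\max}}{1-r}.
\end{aligned}
```

With $z_{\min} = 5$, $z_{\max} = 8$ and $r \le 0.2$, this gives $5 < z < 10$.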

References

[1] L. A. N. Amaral, A. Scala, M. Barthélemy, and H. E. Stanley. Classes of small-world networks. Proc. Natl. Acad. Sci. USA, 97(21), 2000.

[2] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne. The Protein Data Bank. Nucleic Acids Research, 28:235–242, 2000.

[3] C. Branden and J. Tooze. Introduction to protein structure. Garland Publishing, 1999.