2.4.1 SAALSA provides key information for manual expert curation

2.4.1.1 Integration of mappings and browsing alignments

SAALSA uses SSMap to generate protein features. As demonstrated in the first part, SSMap provides the largest proportion of correct mappings among the available mapping resources and, through SAALSA, helps reduce curation time. Fully integrating the mapping into the annotation interface also considerably reduces the time previously spent working out which residues in the sequence correspond to given structural features. More specifically, the ligand environment 3D viewer and the ligand environment tables directly display residue numbers in the UniProtKB sequence of interest.
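The residue-number translation that the viewer and tables rely on can be illustrated with a minimal Python sketch. It assumes a pairwise alignment given as two gapped rows plus the 1-based start position of each row in its full sequence. The function name `map_positions` is hypothetical and not part of SSMap or SAALSA, and the sketch works on sequential PDB sequence indices rather than PDB author numbering (insertion codes are ignored).

```python
def map_positions(uniprot_aln: str, pdb_aln: str,
                  uniprot_start: int, pdb_start: int) -> dict:
    """Map 1-based PDB sequence indices to 1-based UniProtKB positions.

    uniprot_aln and pdb_aln are the two gapped rows of a local alignment
    ('-' marks a gap); *_start are the 1-based positions of the first
    aligned residue in each full sequence.
    """
    if len(uniprot_aln) != len(pdb_aln):
        raise ValueError("alignment rows must have equal length")
    mapping = {}
    u, p = uniprot_start, pdb_start
    for u_res, p_res in zip(uniprot_aln, pdb_aln):
        if u_res != "-" and p_res != "-":
            mapping[p] = u  # both residues aligned: record the correspondence
        if u_res != "-":
            u += 1          # advance in the UniProtKB sequence
        if p_res != "-":
            p += 1          # advance in the PDB sequence
    return mapping
```

For instance, `map_positions("MKT-AV", "MKTQAV", 10, 1)` returns `{1: 10, 2: 11, 3: 12, 5: 13, 6: 14}`: PDB residue 4 (Q) faces a gap in the UniProtKB row and therefore receives no UniProtKB number.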

SAALSA can access any of the alignments stored in the SSMap database (down to 70% sequence identity) and display them in the web interface. Curators can therefore be sure that they see every relevant alignment. With this information, they can quickly reassign PDB chains that SSMap left unassigned or assigned incorrectly.
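The 70% identity cutoff can be sketched as a simple filter over stored alignments. This sketch assumes identity is computed the way BLAST reports it (identical columns over alignment length, gaps included); the `Alignment` record, the helper names and the chain identifier format are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    pdb_chain: str    # hypothetical chain identifier, e.g. "1abc_A"
    uniprot_aln: str  # gapped UniProtKB row ('-' = gap)
    pdb_aln: str      # gapped PDB row, same length as uniprot_aln


def percent_identity(aln: Alignment) -> float:
    """Identical aligned columns divided by alignment length, gaps included."""
    if not aln.uniprot_aln:
        return 0.0
    same = sum(1 for a, b in zip(aln.uniprot_aln, aln.pdb_aln) if a == b and a != "-")
    return 100.0 * same / len(aln.uniprot_aln)


def alignments_to_show(alns, min_identity=70.0):
    """Keep alignments at or above the cutoff, highest identity first."""
    kept = [a for a in alns if percent_identity(a) >= min_identity]
    return sorted(kept, key=percent_identity, reverse=True)
```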

2.4.1.2 Highlighting data inconsistencies and information of interest for annotation

The SAALSA interface offers several features that help curators evaluate whether a given piece of structural information is relevant to the annotation task.

First, the summary section of the main page gives a quick overview of all the structural information available for a given protein, even when a structure has not been automatically mapped to this UniProtKB entry (e.g. because its mapping to different UniProtKB entries is ambiguous).

Then, for PDB chains that are mapped to the UniProtKB entry, relations and inconsistencies between the structural data and the UniProtKB sequence are displayed both in the main page and in the specialized views for the annotation of ligand binding sites:

- In the main page and the residue-level mapping editor, alignments are represented consistently and in a way that best conveys the context of each local alignment. The main purpose is to let curators quickly check the validity of any mapping used to produce annotations. Instead of showing only the local alignments (the BLAST output), SAALSA also renders the unaligned regions. This is useful for checking alignment boundaries manually and, in rare cases, for determining that the boundaries found automatically by BLAST are incorrect. The resolved/unresolved status of each residue is also indicated on the alignments, so that one can see at a glance which residues have structural information and which lie in unstructured regions or flexible loops. Finally, sequence variations are displayed in red in the alignments. Highlighting sequence variations together with taxonomic indicators is essential for evaluating, and possibly modifying, the entry-level mapping of a given PDB entry.

- For the definition of ligand binding sites, several features were designed to help curators judge how much each residue contributes to a given structural environment. First, the 3D structure viewer clearly indicates which residues are mapped and which are not (Annex 6).

Second, in the associated table:

o a color gradient shows the relevance of the interaction of each residue with the ligand of interest;

o other ligands present in the vicinity of the ligand of interest, as well as residues mapped to another protein, are clearly indicated;

o sequence variations between the UniProtKB sequence and the PDB reconstructed sequence are highlighted in red. Indeed, a variation of even a single residue can be critical for the conformation of the other residues in the binding site.
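The alignment display described in the first item above (unaligned flanks, resolved/unresolved status, highlighted variations) can be approximated in plain text. This minimal sketch is an illustration, not SAALSA's rendering code: unresolved PDB residues are lowercased, identical columns are marked `|`, and variations are marked `x` where the web interface would use red. The function `render_alignment` and its 0-based `resolved` index set are hypothetical.

```python
def render_alignment(uni_seq, pdb_seq, uni_aln, pdb_aln, uni_start, pdb_start, resolved):
    """Return (UniProtKB row, match row, PDB row) including unaligned flanks.

    uni_start, pdb_start: 0-based index where the local alignment begins in
    each full sequence. resolved: set of 0-based PDB indices with coordinates.
    """
    # PDB residues without coordinates are shown in lowercase.
    shown = "".join(c if i in resolved else c.lower() for i, c in enumerate(pdb_seq))

    # Aligned block: walk the gapped rows, consuming PDB residues as we go.
    top, mid, bot = [], [], []
    p = pdb_start
    for u_res, p_res in zip(uni_aln, pdb_aln):
        top.append(u_res)
        if p_res == "-":
            bot.append("-")
            mid.append(" ")
        else:
            bot.append(shown[p])
            p += 1
            if u_res == "-":
                mid.append(" ")
            else:
                mid.append("|" if u_res == p_res else "x")  # 'x' = variation (red in the UI)
    uni_end = uni_start + sum(c != "-" for c in uni_aln)

    # Unaligned flanks are only displayed, not aligned; padding keeps equal widths.
    lw = max(uni_start, pdb_start)
    rw = max(len(uni_seq) - uni_end, len(pdb_seq) - p)
    top_row = uni_seq[:uni_start].rjust(lw) + "".join(top) + uni_seq[uni_end:].ljust(rw)
    mid_row = " " * lw + "".join(mid) + " " * rw
    bot_row = shown[:pdb_start].rjust(lw) + "".join(bot) + shown[p:].ljust(rw)
    return top_row, mid_row, bot_row
```

With `uni_seq="AAMKTAVGG"`, `pdb_seq="SMKSAVE"`, aligned rows `"MKTAV"`/`"MKSAV"` starting at indices 2 and 1, and `resolved={1, 2, 3, 4}`, the three rows are `"AAMKTAVGG"`, `"  ||x||  "` and `" sMKSAve "`: the T/S variation is flagged, and the unresolved S, V and E appear in lowercase.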
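The per-residue annotations of the ligand environment table can be sketched as follows. The text does not say how interaction relevance is measured, so this sketch assumes, purely for illustration, a linear score based on the minimum atom-atom distance to the ligand between hypothetical cutoffs of 3 Å and 6 Å, mapped to a white-to-blue gradient. `ContactResidue` and the helper names are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContactResidue:
    pdb_residue: str                # one-letter code observed in the PDB chain
    uniprot_residue: Optional[str]  # residue at the mapped UniProtKB position (None if unmapped)
    uniprot_pos: Optional[int]      # mapped UniProtKB position (None if unmapped)
    min_distance: float             # closest atom-atom distance to the ligand, in Å
    other_protein: bool = False     # residue lies on a chain mapped to another protein


def relevance(distance: float, close: float = 3.0, far: float = 6.0) -> float:
    """Hypothetical score: 1.0 at or below `close` Å, 0.0 at or beyond `far` Å, linear between."""
    return max(0.0, min(1.0, (far - distance) / (far - close)))


def gradient_hex(score: float) -> str:
    """White (score 0) to pure blue (score 1)."""
    level = round(255 * (1 - score))
    return f"#{level:02x}{level:02x}ff"


def table_row(r: ContactResidue) -> dict:
    return {
        "position": r.uniprot_pos,
        "color": gradient_hex(relevance(r.min_distance)),
        # A variation is flagged only when the residue is mapped and differs.
        "variation": r.uniprot_residue is not None and r.uniprot_residue != r.pdb_residue,
        "other_protein": r.other_protein,
    }
```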

2.4.1.3 Grouping information

Grouping information is also a way to distinguish features common to different structures from features specific to some of them. Features shared by all available structures are relevant, and the resulting annotations can be produced with high confidence. Conversely, sporadic data can usually be considered less important and left out when generating annotations. We grouped information as much as possible, in different contexts, to provide summarized information that can easily be checked manually.
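The distinction between shared and sporadic features can be made concrete by counting, for each feature, the fraction of structures in which it is observed. In this sketch the feature representation (any hashable value, such as a residue label) and the 25% cutoff for calling a feature sporadic are hypothetical choices, not values taken from SAALSA.

```python
from collections import Counter


def classify_features(observations, sporadic_max=0.25):
    """Split features by how often they recur across structures.

    observations: structure id -> set of hashable features.
    Returns (shared, intermediate, sporadic), each mapping feature -> support fraction.
    """
    n = len(observations)
    counts = Counter(f for feats in observations.values() for f in feats)
    shared, intermediate, sporadic = {}, {}, {}
    for feat, c in counts.items():
        frac = c / n
        if c == n:
            shared[feat] = frac        # seen in every structure: high confidence
        elif frac <= sporadic_max:
            sporadic[feat] = frac      # rarely seen: candidate for exclusion
        else:
            intermediate[feat] = frac  # left for manual review
    return shared, intermediate, sporadic
```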

In the main view, PDB chains are grouped by non-redundant alignment, which shortens the list of alignments to check and makes it easier to compare the different PDB sequences associated with the same UniProtKB entry. Grouping PDB reconstructed sequences also makes it easy to identify PDB structures that are not resolved over the same regions. In turn, this helps identify the structures most relevant for describing specific sequence features.
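Grouping PDB chains by non-redundant alignment amounts to bucketing chains under a shared key. This sketch assumes the key is the aligned UniProtKB range plus the PDB reconstructed sequence over that range, with unresolved residues written in lowercase (a hypothetical convention), so that chains resolved over different regions end up in different groups. `group_chains` is a hypothetical name.

```python
from collections import defaultdict


def group_chains(alignments):
    """Group chains whose alignments are identical.

    alignments: iterable of (chain_id, uni_start, uni_end, pdb_row) tuples, where
    pdb_row is the reconstructed PDB sequence over the aligned region, with
    unresolved residues in lowercase.
    """
    groups = defaultdict(list)
    for chain_id, uni_start, uni_end, pdb_row in alignments:
        groups[(uni_start, uni_end, pdb_row)].append(chain_id)
    return {key: sorted(chains) for key, chains in groups.items()}
```

Chains such as `1abc_A` and `1abc_B` with the same row `"MKTAV"` collapse into one group, whereas a chain with row `"MKTav"` (same sequence, last two residues unresolved) forms its own group.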

In the framework of ligand binding site definition, ligand names can be defined for each individual isoform sequence or, more globally, for the UniProtKB entry (Figure 20-b). When ligand environments are defined, individual environments are grouped using the positions of their residues in the UniProtKB sequence. Through this necessary step, we take advantage of multiple occurrences of a structural ligand environment to confirm non-redundant binding sites or to highlight possible alternative ligand binding sites.
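Grouping ligand environments by UniProtKB positions can be sketched as a greedy clustering. The text does not specify the grouping criterion, so this sketch swaps in an assumed one: an environment joins the first group for the same ligand whose accumulated positions overlap its own with a Jaccard index of at least 0.5 (a hypothetical threshold). Groups supported by several structures then point to confirmed sites, while several distinct groups for one ligand point to possible alternative sites.

```python
def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_environments(envs, min_overlap=0.5):
    """envs: list of (chain_id, ligand, set of UniProtKB positions).

    Returns a list of groups, each with the ligand name, the union of the
    positions of its members, and the chains supporting it.
    """
    groups = []
    for chain_id, ligand, positions in envs:
        for g in groups:
            if g["ligand"] == ligand and jaccard(g["positions"], positions) >= min_overlap:
                g["chains"].append(chain_id)
                g["positions"] |= positions  # the site definition grows with each member
                break
        else:
            # No compatible group: start a new candidate site (copy the input set).
            groups.append({"ligand": ligand, "positions": set(positions), "chains": [chain_id]})
    return groups
```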

Features automatically generated by SAALSA (sequence variations and residue modifications) also benefit from a simple grouping based on the positions of the related residues in the UniProtKB sequence. The identifiers of the non-redundant structures in which the residue modification or sequence variation is observed are listed with each annotation, along with links to the corresponding alignment and PDB chain list. This makes it easy, for example, to detect structural evidence for alternative disulfide bonds.
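The position-based grouping, and the way it exposes alternative disulfide bonds, can be sketched as follows. The observation tuples, the `"disulfide"` feature label and the `<pdb>_<chain>` identifier format are hypothetical; the sketch only shows that keying features on UniProtKB positions makes a cysteine bonded to different partners in different structures stand out.

```python
from collections import defaultdict


def group_features(observations):
    """observations: iterable of (feature_type, uniprot_positions, pdb_chain_id),
    e.g. ("disulfide", (45, 90), "1abc_A"). Returns feature -> sorted PDB entry ids."""
    grouped = defaultdict(set)
    for ftype, positions, chain_id in observations:
        pdb_id = chain_id.split("_")[0]  # collapse chains of one entry (hypothetical id format)
        grouped[(ftype, tuple(sorted(positions)))].add(pdb_id)
    return {key: sorted(ids) for key, ids in grouped.items()}


def alternative_disulfides(grouped):
    """Cysteine positions that pair with more than one partner across structures."""
    partners = defaultdict(set)
    for ftype, pos in grouped:
        if ftype == "disulfide" and len(pos) == 2:
            a, b = pos
            partners[a].add(b)
            partners[b].add(a)
    return {cys: sorted(p) for cys, p in partners.items() if len(p) > 1}
```

Here a bond 45-90 observed in `1abc` and a bond 45-120 observed in `2xyz` would yield `{45: [90, 120]}`, flagging Cys 45 as a candidate for alternative disulfide bonding.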

2.4.2 Finding a reasonable equilibrium between manual and
