Introduction to applied bioinformatics


PETRA MATOUŠKOVÁ

2021/2022 4/10

(2)

Schedule update?

Lectures: 1 2 3 4 | 5 6 7 8 9 (10)

Update: 1 2 3 4 +5 | 5 6 7 8 9 (10)

15.3.: 6+7; 29.3.: 8; Mon 4.4.: 9+10

Exam: 5.4. or 12.4.? Or after Easter?

(3)

„Protein bioinformatics III“

Retrieving protein sequences from databases (UniProt: FASTA format)

Computing amino-acid composition, molecular weight, isoelectric point, and other parameters (SMS)

Predicting protease cleavage sites (PeptideCutter)

Predicting elements of protein secondary structure, signal peptides, transmembrane helices

Finding 3-D structure

Finding all proteins that share a similar sequence

Finding evolutionary relationships between proteins, drawing proteins’ family trees

Computing the optimal alignment between two or more protein sequences
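As a small illustration of the "computing amino-acid composition" item, the sketch below counts residues in a protein sequence. It is not how SMS works internally; the function name `aa_composition` and the toy sequence are assumptions made for this example only.

```python
from collections import Counter


def aa_composition(sequence):
    """Return {residue: (count, percent)} for a one-letter protein sequence."""
    # Remove whitespace/line breaks (as found in FASTA bodies) and normalise case.
    seq = "".join(sequence.split()).upper()
    if not seq:
        return {}
    counts = Counter(seq)
    total = len(seq)
    return {aa: (n, 100.0 * n / total) for aa, n in sorted(counts.items())}


# Hypothetical toy sequence, not a real protein.
toy = "MKTAYIAKQR"
for aa, (n, pct) in aa_composition(toy).items():
    print(f"{aa}: {n} ({pct:.1f}%)")
```

The same function can be applied to the sequence part of a FASTA file downloaded from UniProt.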

(4)

Pairwise alignment

Global alignment – aligns full length sequence

Local alignment – aligns the parts of the sequences that fit best (e.g. comparison of similar domains, repetitive sequences…)

BLAST: part of the results is a pairwise alignment
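The difference between global and local alignment can be sketched with a simple dynamic-programming score. The code below is a minimal illustration, assuming a plain match/mismatch score and a linear gap penalty (real tools such as Needle use substitution matrices and affine gaps). The function name `alignment_score` and all parameter values are assumptions for this example.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score of a and b.

    local=False: global alignment (Needleman-Wunsch style), both sequences
                 are aligned over their full length.
    local=True:  local alignment (Smith-Waterman style), only the
                 best-matching parts are scored.
    """
    # First row: leading gaps cost nothing in local mode.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may restart anywhere
                best = max(best, score)
            curr.append(score)
        prev = curr
    return best if local else prev[-1]


# Toy sequences: a short motif embedded in a longer sequence.
long_seq, motif = "AAAWGHEAAA", "WGHE"
print("global:", alignment_score(long_seq, motif))
print("local: ", alignment_score(long_seq, motif, local=True))
```

The global score is penalised for the unaligned flanks, while the local score only reflects the shared motif. This is why local alignment suits domain comparisons.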

(5)

Pairwise alignment: global (Needle)

(6)

Pairwise alignment: local

(7)

Try pairwise alignment (global and local).

Compare „your“ sequence (human) with the corresponding sequence from mouse (Mus musculus).

How similar are these proteins?
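Alignment tools report the answer as percent identity. As a rough sketch of what that number means, the code below computes it from two already aligned sequences (with "-" for gaps). It divides by the alignment length, which is one common convention; tools may use a different denominator. The function name and the toy alignment are assumptions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same residue in both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences have a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("empty alignment")
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


# Hypothetical toy alignment: one gap and one substitution in 7 columns.
print(f"{percent_identity('MKT-AYI', 'MKTQAYV'):.1f}% identity")
```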

Practical part

(8)

Multiple sequence alignment (MSA)

= the alignment of more than two sequences

The goal of MSA is twofold:

Aligning corresponding regions of the sequences

Revealing positions that are conserved

The main steps to a useful MSA require

Choosing the right sequences

Choosing the right MSA method

Interpreting the alignment

(9)

„Evolution in a Nutshell“

Amino acids mutate randomly

Mutations are then selected (accepted) or counter-selected (rejected)

If a mutation is harmful, it is counter-selected

It disappears from the genome

You never see it

Mutations of important positions (such as active sites) are almost always harmful

You can recognize important positions because they (almost) never change between related sequences!

MSAs reveal these conserved positions
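The idea that an MSA reveals conserved positions can be sketched in a few lines: for every column, measure how often the most common residue occurs. The function names, the threshold and the toy alignment below are assumptions for illustration; the toy rows are variations on the GDSGGP motif, not real database entries.

```python
from collections import Counter


def conservation(msa):
    """Fraction of sequences sharing the most common residue, per column."""
    if len({len(row) for row in msa}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*msa):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))  # gaps count against conservation
    return scores


def conserved_positions(msa, threshold=1.0):
    """1-based column numbers whose conservation reaches the threshold."""
    return [i + 1 for i, s in enumerate(conservation(msa)) if s >= threshold]


toy_msa = ["GDSGGP", "GDSGGA", "GDSGSP", "ADSGGP"]  # hypothetical aligned fragments
print(conserved_positions(toy_msa))
```

Columns 2 to 4 (D, S, G) are identical in every row of the toy alignment, so they are the ones reported as fully conserved.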

(10)

An Example of Conserved Positions:

(The Serine Proteases Active Site)

Active Site

(11)

Using MSA:

(12)

Gathering Sequences with BLAST

The most convenient way to select your sequences for comparison is to use a BLAST server

➢ Homework 3, task 4: Find and download five similar sequences.

(13)

Gathering Sequences with BLAST

➢ Homework 3, task 4: Find and download five similar sequences.

(14)

Aligning Your Sequences

Aligning sequences correctly is very difficult

◦ It’s hard to align protein sequences with less than 25% identity (70% identity for DNA)

All methods are approximate

Most alignment methods use a progressive algorithm:

◦ Compares the sequences two by two

◦ Builds a guide tree

◦ Aligns the sequences in the order indicated by the tree
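The first two steps of the progressive approach (compare two by two, build a guide tree) can be sketched as follows. This is a simplified illustration, not the implementation of any particular tool: the pairwise comparison is replaced by a crude k-mer distance instead of a real pairwise alignment, the tree is built by average-linkage clustering (UPGMA style), and the final alignment step is omitted. All names and toy sequences are assumptions.

```python
from itertools import combinations


def kmer_distance(a, b, k=2):
    """1 - Jaccard similarity of k-mer sets (stand-in for an alignment distance)."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    union = ka | kb
    return 1.0 - len(ka & kb) / len(union) if union else 0.0


def guide_tree(seqs):
    """seqs: {name: sequence}. Returns nested tuples giving the join order."""
    if not seqs:
        raise ValueError("need at least one sequence")
    # Step 1: compare the sequences two by two.
    dist = {frozenset(pair): kmer_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}
    # Each cluster is keyed by its (sub)tree and stores its member names.
    clusters = {name: [name] for name in seqs}

    def average_distance(x, y):
        pairs = [(m, n) for m in clusters[x] for n in clusters[y]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    # Step 2: repeatedly join the two closest clusters.
    while len(clusters) > 1:
        x, y = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(*pair))
        clusters[(x, y)] = clusters.pop(x) + clusters.pop(y)
    return next(iter(clusters))


# Hypothetical toy sequences; the names are labels only.
toy = {"human": "MKTAYIAKQR", "mouse": "MKTAYIAKQK", "yeast": "MSSPLLWWQR"}
print(guide_tree(toy))
```

A progressive aligner would then align the sequences in the order given by the innermost tuples first, so the most similar pair is aligned before more distant sequences are added.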

(15)

Alignment: MultAlin

http://multalin.toulouse.inra.fr/multalin/

Substitution matrix: PAM/BLOSUM

(16)
(17)

Alignment: NCBI/COBALT

(18)

Alignment: Clustal Omega

(*) conserved amino acids

(:) amino acids with similar size and hydrophobicity

(.) amino acids with similar size or hydrophobicity
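The consensus symbols can be reproduced with a small sketch. The residue groups below are the "strong" and "weak" groups as commonly listed in Clustal documentation; treat them as an assumption and verify them against the tool's own documentation before relying on them. The function name and the toy alignment are hypothetical.

```python
# Assumed Clustal residue groups (verify against the Clustal documentation).
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
        "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def consensus_line(msa):
    """Return one symbol per column: '*', ':', '.' or a space."""
    symbols = []
    for column in zip(*msa):
        residues = set(column)
        if "-" in residues:
            symbols.append(" ")          # columns with gaps get no symbol
        elif len(residues) == 1:
            symbols.append("*")          # fully conserved
        elif any(residues <= set(group) for group in STRONG):
            symbols.append(":")          # strongly similar residues
        elif any(residues <= set(group) for group in WEAK):
            symbols.append(".")          # weakly similar residues
        else:
            symbols.append(" ")
    return "".join(symbols)


toy_msa = ["MKCLD", "MRSLE", "MKAV-"]  # hypothetical aligned fragments
for row in toy_msa:
    print(row)
print(consensus_line(toy_msa))
```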

(19)

Try multiple alignment.

Download five similar sequences from different organisms. (H3)
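Alignment servers expect the downloaded sequences in one multi-FASTA input. As a minimal sketch of that format, the parser below splits a multi-FASTA text into headers and sequences, which is a quick way to check that all five sequences are present. The function name, headers and sequences are made up for illustration.

```python
def parse_fasta(text):
    """Parse multi-FASTA text into {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]            # header line without the '>'
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {h: "".join(parts) for h, parts in records.items()}


# Hypothetical records, not real database entries.
example = """>seq1 toy protein, organism A
MKTAYIAKQR
QISFVKSH
>seq2 toy protein, organism B
MKTAYIAKQK
"""
for name, seq in parse_fasta(example).items():
    print(name, len(seq))
```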

Practical part

(20)

„advanced“ phylogeny analysis

(21)

„advanced“ phylogeny analysis

(22)

Try building the phylogenetic tree using phylogeny.org

Compare the trees

Practical part

(23)

3-D protein structure: PDB

practical demonstration

(24)

Try PDB.

Find out if your sequence has a 3D structure.

Practical part

(25)

Enzyme database: BRENDA

(26)

Protein interactions

(27)

Protein interactions

(28)

Look into the specific databases

Does your protein have any interaction partners?

Is your protein an enzyme? Find its E.C. number (homework)

(29)

„Protein bioinformatics III“

Retrieving protein sequences from databases (UniProt: FASTA format)

Computing amino-acid composition, molecular weight, isoelectric point, and other parameters (SMS)

Predicting protease cleavage sites (PeptideCutter)

Predicting elements of protein secondary structure, signal peptides, transmembrane helices

Finding 3-D structure

Finding all proteins that share a similar sequence

Finding evolutionary relationships between proteins, drawing proteins’ family trees

Computing the optimal alignment between two or more protein sequences

(30)

Homework 4

Work with „your“ protein.

1) Compare your sequence with the „same“ sequence from mouse. How identical are they?

2) Prepare a multiple alignment of the five sequences from Hw3 and take a screenshot of the phylogenetic tree.

3) Does your sequence have any isoforms (search in UniProt)? Align them.

4) Is there a 3D structure? Take a screenshot of one figure.

5) Is your protein an enzyme? Find its E.C. number.

➢ Compile everything in OneNote (or Word, or PDF)

For screenshots, use e.g. the Snipping Tool (Czech: „Výstřižky“)

(31)

Homework 4: example

The comparison was carried out over the full length of both proteins.

(32)

Protein bioinformatics I-III

SUMMARY AND EXAMPLES

(33)

Ex1: DHRS7

Find two human DHRS7 sequences: DHRS7B (AAH09679.1) and DHRS7C (AAI47025.1)

Run pairwise alignment. How identical are these two proteins?

(34)

Ex2: NQO1 isoforms

Find the sequences of the human NQO1 isoforms in UniProt and align them.

How many isoforms are there?

Compare the output with the description of each isoform. Is it correct?

Hint: UniProt ID: P15559; on the left, open the sequence section (3), download the FASTA files and align them using e.g. MultAlin. The description is below each isoform (i2: 140-173: Missing.)
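A "Missing" annotation such as 140-173 means that those residues of the canonical sequence (counted from 1, both ends included) are absent from the isoform, so the isoform should be 34 residues shorter and the alignment should show one gap of that length. The sketch below applies such a range to a sequence. The function name is hypothetical, and the toy sequence stands in for the real NQO1 sequence, which is not reproduced here.

```python
def apply_missing(canonical, start, end):
    """Remove residues start..end (1-based, inclusive) from a sequence."""
    if not (1 <= start <= end <= len(canonical)):
        raise ValueError("range must lie within the sequence")
    return canonical[:start - 1] + canonical[end:]


# Hypothetical toy sequence; residues 4-6 are removed.
canonical = "MKTAYIAKQR"
isoform = apply_missing(canonical, 4, 6)
print(isoform, len(canonical) - len(isoform))
```

Comparing the result with the downloaded isoform sequence is a quick check that the alignment and the UniProt description agree.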

(35)

Ex3: sequence identification

What is the proposed function of the unknown protein? (Ex3 in study materials)

What organism does it come from?

Does the „unknown sequence“ have any transmembrane helices?
