The PROSITE dictionary of sites and patterns in proteins, its current status


BAIROCH, Amos Marc. The PROSITE dictionary of sites and patterns in proteins, its current status. Nucleic Acids Research , 1993, vol. 21, no. 13, p. 3097-103

DOI : 10.1093/nar/21.13.3097 PMID : 8332530

Available at:

http://archive-ouverte.unige.ch/unige:36853


Amos Bairoch

Department of Medical Biochemistry, University of Geneva, 1 rue Michel Servet, 1211 Geneva 4, Switzerland

BACKGROUND

PROSITE is a compilation of sites and patterns found in protein sequences; it can be used to help determine the function of uncharacterized proteins translated from genomic or cDNA sequences. In some cases the sequence of an unknown protein is too distantly related to any protein of known structure for the resemblance to be detected by overall sequence alignment, but a relationship can be revealed by the occurrence in its sequence of a particular cluster of residue types, variously known as a pattern, motif, signature, or fingerprint. These motifs arise because specific regions of a protein that are important, for example, for its binding properties or its enzymatic activity are conserved in both structure and sequence. These structural requirements impose very tight constraints on the evolution of these small but important portions of a protein sequence. The use of protein sequence patterns to determine the function of proteins is rapidly becoming one of the essential tools of sequence analysis, as many authors have recognized [1,2]. While there have been a number of reviews of published patterns [3,4,5], no attempt had been made until very recently [6,7] to systematically collect biologically significant patterns or to discover new ones. Based on these observations, we decided in 1988 to actively pursue the development of a database of patterns that could be used to search sequences of unknown function. This database, called PROSITE, contains some patterns that have been published in the literature, but the majority have been developed in the last four years by the author.

LEADING CONCEPTS

The design of PROSITE follows four leading concepts:

Completeness. For such a compilation to be helpful in the determination of protein function, it is important that it contains as many biologically meaningful patterns as possible.

High specificity of the patterns. In the majority of cases we have chosen patterns that are specific enough that they do not detect too many unrelated sequences, yet they will detect most, if not all, sequences that clearly belong to the set in consideration.

Documentation. Each of the patterns is fully documented; the documentation includes a concise description of the protein family that it is designed to detect as well as a summary of the reasons leading to the development of the pattern.

Periodic reviewing. It is important that each pattern be periodically reviewed to ensure that it is still valid.

FORMAT

The PROSITE database is composed of two ASCII (text) files.

The first file (PROSITE.DAT) is a computer-readable file that contains all the information necessary for programs that make use of PROSITE to scan sequence(s) for the occurrence of the patterns. This file also includes, for each of the patterns described, statistics on the number of hits obtained while scanning for that pattern in the SWISS-PROT protein sequence data bank [8].

Cross-references to the corresponding SWISS-PROT entries are also present in the file. The second file (PROSITE.DOC), which we call the textbook, contains textual information that documents each pattern. A user manual (PROSUSER.TXT) is distributed with the database; it fully describes the format of both files. A sample textbook entry is shown in Figure 1 with the corresponding data from the pattern file.
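Because the pattern file is line-oriented, a very small parser is enough to read it. The following Python sketch groups the lines of one entry by their two-letter line code. It assumes the layout seen in the sample entry of Figure 1 (codes such as ID, AC, DE, PA, NR and DO followed by spaces and the data) and assumes that an entry ends with a `//` line; the function name `parse_entry` and the abbreviated sample text are illustrative only, and the user manual remains the authoritative description of the format.

```python
from collections import defaultdict


def parse_entry(text):
    """Group the lines of one pattern-file entry by their two-letter line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith("//"):
            continue  # skip blank lines and the assumed entry terminator
        code, _, value = line.partition(" ")
        fields[code].append(value.strip())
    # Join repeated codes (for example several NR lines) into one string per code.
    return {code: " ".join(values) for code, values in fields.items()}


# Abbreviated sample modelled on the entry of Figure 1 (DT, CC and DR lines omitted).
SAMPLE = """\
ID   DNA_POLYMERASE_B; PATTERN.
AC   PS00116;
DE   DNA polymerase family B signature.
PA   [YA]-[GLIVMSTAC]-D-T-D-[SG]-[LIVMFTC]-x-[LIVMSTAC].
NR   /TOTAL=42(42); /POSITIVE=41(41); /UNKNOWN=0(0); /FALSE_POS=1(1);
NR   /FALSE_NEG=2(2);
DO   PDOC00107;
//
"""

entry = parse_entry(SAMPLE)
print(entry["AC"])
print(entry["PA"])
```

Keeping every value as plain text leaves the interpretation of each line code (pattern, statistics, cross-references) to the calling program.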

CONTENT OF THE CURRENT RELEASE

Release 10.1 of PROSITE (April 1993) contains 635 documentation entries describing 803 different patterns. The list of these entries is provided in Appendix 1. The database requires about 2 Mb of disk storage space.

COMPUTER PROGRAMS THAT MAKE USE OF PROSITE

Many academic groups and commercial companies have developed computer programs that make use of PROSITE. We list here some of these programs (a full descriptive list is included with the database and is stored in a file called 'PROSITE.PRG').

ACADEMIC

Program         Operating system          Author

MacPattern      Apple Macintosh           Rainer Fuchs [9]
prosite.c       IBM 3090 400E and Unix    Klaus Hartmuth
ProSearch       Unix and DOS (AWK)        Lee Kolakowski [10]
cregex          Unix and DOS              Jack Leunissen
dbsite/mksite   Unix                      J.-M. Claverie
PROINDEX        VAX VMS                   Steve Clark
Quelsite        VAX VMS                   Claude Valencien
Scrutineer      VAX VMS and Unix          Peter Sibbald [11]
PATTERN         Unix                      Olivier Boulot
PIP and PIPL    VAX VMS and Unix          Roger Staden [12]
PATMAT          DOS and Unix              Steven Henikoff [13]
PROTOMAT        DOS and Unix              Steven Henikoff [14]


3098 Nucleic Acids Research, 1993, Vol. 21, No. 13

COMMERCIAL

Program       Package      Supplier               Operating system

MOTIF         GCG          Genetics Comp. Group   VAX VMS and Unix
QUEST         IG-Suite     IntelliGenetics        VAX VMS and Unix
PROMOT        -            OML                    VAX VMS and Unix
PROSITE       PC/Gene      IntelliGenetics        DOS
PROSITE       GeneWorks    IntelliGenetics        Apple Macintosh
Protean       LaserGene    DNASTAR                Apple Macintosh
PROTSITE      -            National Biosciences   DOS
SEQ/Pattern   ProExplore   Biostructure           Unix

FUTURE DEVELOPMENTS

There are a number of protein families as well as functional or structural domains that cannot be detected using patterns due to their extreme sequence divergence. Typical examples of important functional domains which are weakly conserved are the immunoglobulin domains, the SH2 and SH3 domains, and the fibronectin type I domain. In such domains there are only a few sequence positions which are well conserved. Any attempt to build a consensus pattern for such regions will either fail to pick up a significant proportion of the protein sequences that contain such a region (false negatives) or will pick up too many proteins that do not contain the region (false positives).
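The balance between false negatives and false positives can be expressed with two simple ratios. The Python sketch below is illustrative only: the terms "sensitivity" and "precision" and the function names are not taken from PROSITE itself, and the counts are those reported for the well-behaved sample pattern of Figure 1 (41 true positives, 1 false positive, 2 false negatives). For the weakly conserved domains discussed above, one of the two ratios would drop markedly whichever way a pattern is tuned.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of the sequences belonging to the family that the pattern detects."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos, false_pos):
    """Fraction of the detected sequences that really belong to the family."""
    return true_pos / (true_pos + false_pos)


# Counts taken from the sample entry of Figure 1.
tp, fp, fn = 41, 1, 2
print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"precision   = {precision(tp, fp):.3f}")
```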

Techniques based on the use of weight matrices [15,16,17] allow the detection of such proteins or domains. We plan to collaborate closely with Dr P. Bucher of the Swiss Cancer Research Institute (ISREC) in Lausanne and with Dr T.K. Attwood of the University of Leeds to develop such methods and to integrate them into PROSITE.
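To give a feel for the general idea of a weight matrix, the Python sketch below builds a log-odds matrix from a few aligned motif instances and slides it along a sequence. It is a generic illustration, not the specific methods of references [15,16,17]: the uniform background frequency, the pseudocount, the function names and the four-residue toy alignment are all assumptions made for the example.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_weight_matrix(aligned, pseudocount=1.0):
    """Return one {residue: log-odds score} dictionary per alignment column."""
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background frequency
    matrix = []
    for column in zip(*aligned):
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        matrix.append({
            aa: math.log(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return matrix


def best_window(matrix, sequence):
    """Slide the matrix along the sequence and return (best score, start index)."""
    width = len(matrix)
    scores = [
        (sum(col[res] for col, res in zip(matrix, sequence[i:i + width])), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scores)


# Toy alignment of a hypothetical four-residue motif (not real PROSITE data).
motif_examples = ["GDTD", "ADTD", "GDSD", "GDTE"]
matrix = build_weight_matrix(motif_examples)
score, start = best_window(matrix, "MKLGDTDSWAA")
print(start, round(score, 2))
```

Unlike a pattern, which either matches or does not, the matrix gives every window a graded score, so a region with one or two unusual residues can still rank above unrelated sequence.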

HOW TO OBTAIN PROSITE

PROSITE is distributed on magnetic tape and on CD-ROM by the EMBL Data Library. For all enquiries regarding the subscription and distribution of PROSITE one should contact:

EMBL Data Library
European Molecular Biology Laboratory
Postfach 10.2209, Meyerhofstrasse 1
6900 Heidelberg, Germany
Telephone: (+49 6221) 387 258
Telefax: (+49 6221) 387 519 or 387 306
Electronic network address: [email protected]

PROSITE can be obtained from the EMBL File Server [18].

Detailed instructions on how to make the best use of this service, and in particular on how to obtain PROSITE, can be obtained by sending to the network address [email protected] the following message:

HELP

HELP PROSITE

If you have access to a computer system linked to the Internet you can obtain PROSITE using FTP (File Transfer Protocol), from the following file servers:

EMBL anonymous FTP server
Internet address: ftp.EMBL-heidelberg.de (or 192.54.41.33)

NCBI Repository (National Library of Medicine, NIH, Washington D.C., U.S.A.)
Internet address: ncbi.nlm.nih.gov (130.14.20.1)

Basel Biozentrum Biocomputing server (EMBnet SWISS node)
Internet address: bioftp.unibas.ch (or 131.152.8.1)

ExPASy (Expert Protein Analysis System server, University of Geneva, Switzerland)
Internet address: expasy.hcuge.ch (129.195.254.61)

National Institute of Genetics (Japan) FTP server
Internet address: ftp.nig.ac.jp (133.39.16.66)

You can also browse through PROSITE using various Internet Gopher servers that specialize in biosciences (biogophers) [19]. Gopher is a distributed document delivery service that allows a neophyte user to access various types of data residing on multiple hosts in a seamless fashion.

The present distribution frequency is four releases per year.

No restrictions are placed on use or redistribution of the data.

REFERENCES

1. Doolittle R.F. (In) Of URFs and ORFs: a primer on how to analyze derived amino acid sequences, University Science Books, Mill Valley, California (1986).
2. Lesk A.M. (In) Computational Molecular Biology, Lesk A.M., Ed., pp17-26, Oxford University Press, Oxford (1988).
3. Barker W.C., Hunt T.L., George D.G. Protein Seq. Data Anal. 1:363-373(1988).
4. Hodgman T.C. Comput. Appl. Biosci. 5:1-13(1989).
5. Taylor W.R., Jones D.T. Curr. Opin. Struct. Biol. 1:327-333(1991).
6. Bork P. FEBS Lett. 257:191-195(1989).
7. Smith H.O., Annau T.M., Chandrasegaran S. Proc. Natl. Acad. Sci. USA 87:826-830(1990).
8. Bairoch A., Boeckmann B. Nucleic Acids Res. 20:2019-2022(1992).
9. Fuchs R. Comput. Appl. Biosci. 7:105-106(1991).
10. Kolakowski L.F. Jr., Leunissen J.A.M., Smith J.E. Biotechniques 13:919-921(1992).
11. Sibbald P.R., Sommerfeldt H., Argos P. Comput. Appl. Biosci. 7:535-536(1991).
12. Staden R. DNA Sequence 1:369-374(1991).
13. Wallace J.C., Henikoff S. Comput. Appl. Biosci. 8:249-254(1992).
14. Henikoff S., Henikoff J. Nucleic Acids Res. 19:6565-6572(1991).
15. Staden R. Nucleic Acids Res. 12:505-519(1984).
16. Gribskov M., Luethy R., Eisenberg D. Meth. Enzymol. 183:146-159(1990).
17. Attwood T.K., Eliopoulos E.E., Findlay J.B.C. Gene 98:153-159(1991).
18. Stoehr P.J., Omond R.A. Nucleic Acids Res. 17:6763-6764(1989).
19. Gilbert D. Trends Biochem. Sci. 18:107-108(1993).

a

{PDOC00107}
{PS00116; DNA_POLYMERASE_B}
{BEGIN}
*************************************
* DNA polymerase family B signature *
*************************************

Replicative DNA polymerases (EC 2.7.7.7) are the key enzymes catalyzing the accurate replication of DNA. They require either a small RNA molecule or a protein as a primer for the de novo synthesis of a DNA chain. On the basis of sequence similarities a number of DNA polymerases have been grouped together [1 to 7] under the designation of DNA polymerase family B. The polymerases that belong to this family are:

 - Higher eukaryotes polymerases alpha.
 - Higher eukaryotes polymerases delta.
 - Yeast polymerase I/alpha (gene POL1), polymerase II/epsilon (gene POL2), polymerase III/delta (gene POL3) and polymerase REV3.
 - Escherichia coli polymerase II (gene dinA or polB).
 - Polymerases of viruses from the herpesviridae family.
 - Polymerases from Adenoviruses.
 - Polymerases from Baculoviruses.
 - Polymerases from Chlorella viruses.
 - Polymerases from Poxviruses.
 - Bacteriophage T4 polymerase.
 - Podoviridae bacteriophages Phi-29, M2, and PZA polymerase.
 - Tectiviridae bacteriophage PRD1 polymerase.
 - Polymerases encoded on eukaryotic linear DNA plasmids such as Kluyveromyces lactis pGKL1 and pGKL2, Agaricus bitorquis pEM, Ascobolus immersus pAI2, Claviceps purpurea pCLK1 and Maize S-1.

Six regions of similarity (numbered from I to VI) are found in all or a subset of the above polymerases. The most conserved region (I) includes a perfectly conserved tetrapeptide which contains two aspartate residues. The function of this conserved region is not yet known, however it has been suggested [3] that it may be involved in binding a magnesium ion. We use this conserved region as a signature for this family of DNA polymerases.

-Consensus pattern: [YA]-[GLIVMSTAC]-D-T-D-[SG]-[LIVMFTC]-x-[LIVMSTAC]
-Sequences known to belong to this class detected by the pattern: ALL, except for yeast polymerase II/epsilon and Agaricus bitorquis pEM.
-Other sequence(s) detected in SWISS-PROT: chicken vitellogenin 2.
-Last update: May 1993 / Pattern and text revised.

[ 1] Jung G., Leavitt M.C., Hsieh J.-C., Ito J.
     Proc. Natl. Acad. Sci. U.S.A. 84:8287-8291(1987).
[ 2] Bernad A., Zaballos A., Salas M., Blanco L.
     EMBO J. 6:4219-4225(1987).
[ 3] Argos P.
     Nucleic Acids Res. 16:9909-9916(1988).
[ 4] Wang T.S.-F., Wong S.W., Korn D.
     FASEB J. 3:14-21(1989).
[ 5] Delarue M., Poch O., Tordo N., Moras D., Argos P.
     Protein Engineering 3:461-467(1990).
[ 6] Ito J., Braithwaite D.K.
     Nucleic Acids Res. 19:4045-4057(1991).
[ 7] Braithwaite D.K., Ito J.
     Nucleic Acids Res. 21:787-802(1993).
{END}

b

ID   DNA_POLYMERASE_B; PATTERN.
AC   PS00116;
DT   APR-1990 (CREATED); MAY-1993 (DATA UPDATE); MAY-1993 (INFO UPDATE).
DE   DNA polymerase family B signature.
PA   [YA]-[GLIVMSTAC]-D-T-D-[SG]-[LIVMFTC]-x-[LIVMSTAC].
NR   /RELEASE=24,28154;
NR   /TOTAL=42(42); /POSITIVE=41(41); /UNKNOWN=0(0); /FALSE_POS=1(1);
NR   /FALSE_NEG=2(2);
CC   /TAXO-RANGE=ABEPV; /MAX-REPEAT=1;
DR   P26019, DPOA_DROME, T; [illegible], [illegible], T; [illegible], DPOA_SCHPO, T;
DR   P27727, DPOA_TRYBB, T; P13382, DPOA_YEAST, T; P28339, DPOD_BOVIN, T;
DR   P28340, DPOD_HUMAN, T; P15436, DPOD_YEAST, T; P14284, [illegible], T;
DR   P21189, DPO2_ECOLI, T; P26811, DPOL_SULSO, T; P03261, DPOL_ADE02, T;
DR   P04495, DPOL_ADE05, T; [illegible], DPOL_ADE07, T; P06538, DPOL_ADE12, T;
DR   P03196, DPOL_EBV  , T; [illegible], DPOL_HCMVA, T; P04293, DPOL_HSV11, T;
DR   P07917, DPOL_HSV1A, T; P04292, DPOL_HSV1K, T; [illegible], DPOL_HSV1S, T;
DR   P07918, DPOL_HSV21, T; [illegible], DPOL_HSV6U, T; [illegible], DPOL_HSVEB, T;
DR   [illegible], DPOL_HSVI1, T; P24907, DPOL_HSVSA, T; P09252, DPOL_VZVD , T;
DR   [illegible], DPO1_KLULA, T; P05468, DPO2_KLULA, T; [illegible], DPOL_CLAPU, T;
DR   P10582, DPOL_MAIZE, T; P18131, DPOL_NPVAC, T; P20509, DPOL_VACCC, T;
DR   P06856, DPOL_VACCV, T; P21402, DPOL_FOWPV, T; [illegible], DPOL_BPPH2, T;
DR   P06950, DPOL_BPPZA, T; P19694, DPOL_BPM2 , T; P10479, DPOL_BPPRD, T;
DR   P04415, DPOL_BPT4 , T;
DR   P22374, DPOL_ASCIM, N; P21951, DPOE_YEAST, N;
DR   P02845, VIT2_CHICK, F;
DO   PDOC00107;

Figure 1. Sample data from PROSITE. a) A documentation (textbook) entry. b) The corresponding entry in the pattern file.
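The pattern notation shown in Figure 1 maps naturally onto a regular expression. The Python sketch below handles only the elements visible in the sample pattern: single residue letters, bracketed alternatives, `x` for any residue, and hyphens as separators. Any other notation is rejected rather than guessed at. The function name `pattern_to_regex` and the test peptide are made up for the illustration.

```python
import re


def pattern_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a compiled regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        if element == "x":
            parts.append(".")        # x matches any residue
        elif element.startswith("[") and element.endswith("]"):
            parts.append(element)    # [ABC] is already a regex character class
        elif len(element) == 1 and element.isalpha() and element.isupper():
            parts.append(element)    # a single fixed residue
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")
    return re.compile("".join(parts))


regex = pattern_to_regex("[YA]-[GLIVMSTAC]-D-T-D-[SG]-[LIVMFTC]-x-[LIVMSTAC].")
match = regex.search("MKRYGDTDSLFVEA")  # made-up peptide containing the motif
print(regex.pattern)
print(match.start(), match.group())
```

Scanning a sequence then reduces to calling `regex.search` or `regex.finditer` on it; a program scanning a whole sequence collection would compile each pattern once and reuse it.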

Appendix 1. List of patterns documentation entries in release 10.1 of PROSITE

Post-translational modifications

N-glycosylation site
Glycosaminoglycan attachment site
Tyrosine sulfatation site
cAMP- and cGMP-dependent protein kinase phosphorylation site
Protein kinase C phosphorylation site
Casein kinase II phosphorylation site
Tyrosine kinase phosphorylation site
N-myristoylation site
Amidation site
Aspartic acid and asparagine hydroxylation site
Vitamin K-dependent carboxylation domain
Phosphopantetheine attachment site
Prokaryotic membrane lipoprotein lipid attachment site
Prokaryotic N-terminal methylation site
Farnesyl group binding site (CAAX box)

Domains

Endoplasmic reticulum targeting sequence
Microbodies C-terminal targeting signal
Gram-positive cocci surface proteins 'anchoring' hexapeptide
Bipartite nuclear targeting sequence
Cell attachment sequence
ATP/GTP-binding site motif A (P-loop)
EF-hand calcium-binding domain
Actinin-type actin-binding domain signatures
Cofilin/tropomyosin-type actin-binding domain
Apple domain
Band 4.1 family domain signatures
Kringle domain signature
EGF-like domain cysteine pattern signature
Fibrinogen beta and gamma chains C-terminal domain signature
Type II fibronectin collagen-binding domain
Hemopexin domain signature
C-type lectin domain signature
Osteonectin domain signatures
Somatomedin B domain signature
Thyroglobulin type-1 repeat signature
'Trefoil' domain signature
Cellulose-binding domain, bacterial type
Cellulose-binding domain, fungal type
Chitin recognition or binding domain signature
Barwin domain signatures
WAP-type 'four-disulfide core' domain signature
Phorbol esters/diacylglycerol binding domain
C2 domain signature
ZP domain signature

DNA or RNA associated proteins

'Homeobox' domain signature
'Homeobox' antennapedia-type protein signature
'Homeobox' engrailed-type protein signature
'Paired box' domain signature
'POU' domain signatures
Zinc finger, C2H2 type, domain
Zinc finger, C3HC4 type, signature
Nuclear hormones receptors DNA-binding region signature
GATA-type zinc finger domain
Poly(ADP-ribose) polymerase zinc finger domain
Fungal Zn(2)-Cys(6) binuclear cluster domain

Leucine zipper pattern
Fos/jun DNA-binding basic domain signature
Myb DNA-binding domain repeat signatures
Myc-type, 'helix-loop-helix' putative DNA-binding domain signature
p53 tumor antigen signature
CBF/NF-Y subunits signatures
'Cold-shock' DNA-binding domain signature
CTF/NF-I signature
Ets-domain signatures
Fork head domain signatures
HSF-type DNA-binding domain signature
IRF family signature
LIM domain
SRF-type transcription factors DNA-binding and dimerization domain
TEA domain signature
Transcription factor TFIIB repeat signature
Transcription factor TFIID repeat signature
TFIIS cysteine-rich domain signature
DEAD and DEAH box families ATP-dependent helicases signature
Eukaryotic putative RNA-binding region RNP-1 signature
Fibrillarin signature
XPAC protein signatures
Bacterial regulatory proteins, araC family signature
Bacterial regulatory proteins, asnC family signature
Bacterial regulatory proteins, crp family signature
Bacterial regulatory proteins, gntR family signature
Bacterial regulatory proteins, lacI family signature
Bacterial regulatory proteins, luxR family signature
Bacterial regulatory proteins, lysR family signature
Bacterial regulatory proteins, merR family signature
Transcriptional antiterminators bglG family signature
Sigma-54 factors family signatures
Sigma-70 factors family signatures
Sigma-54 interaction domain signatures
Single-strand binding protein family signatures
Bacterial histone-like DNA-binding proteins signature
Histone H2A signature
Histone H2B signature
Histone H3 signature
Histone H4 signature
HMG1/2 signature
HMG-I and HMG-Y DNA-binding domain (AT-hook)
HMG14 and HMG17 signature
Bromodomain
Chromodomain
Regulator of chromosome condensation signatures
Protamine P1 signature
Nuclear transition protein 1 signature
Ribosomal protein L2 signature
Ribosomal protein L3 signature
Ribosomal protein L5 signature
Ribosomal protein L6 signatures
Ribosomal protein L9 signature
Ribosomal protein L11 signature
Ribosomal protein L13 signature
Ribosomal protein L14 signature
Ribosomal protein L15 signature
Ribosomal protein L16 signatures
Ribosomal protein L22 signature
Ribosomal protein L23 signature
Ribosomal protein L29 signature
Ribosomal protein L30 signature
Ribosomal protein L33 signature
Ribosomal protein L34 signature
Ribosomal protein L19e signature
Ribosomal protein L30e signature
Ribosomal protein L32e signature
Ribosomal protein L46e signature
Ribosomal protein S3 signatures
Ribosomal protein S4 signature
Ribosomal protein S5 signature
Ribosomal protein S7 signature
Ribosomal protein S8 signature
Ribosomal protein S9 signature
Ribosomal protein S10 signature
Ribosomal protein S11 signature
Ribosomal protein S12 signature
Ribosomal protein S13 signature
Ribosomal protein S14 signature
Ribosomal protein S15 signature
Ribosomal protein S16 signature
Ribosomal protein S17 signature
Ribosomal protein S18 signature
Ribosomal protein S19 signature
Ribosomal protein S4e signature
Ribosomal protein S6e signature
Ribosomal protein S17e signature
Ribosomal protein S19e signature
Ribosomal protein S24e signature
Ribosomal protein S26e signature
DNA mismatch repair proteins mutL/hexB/PMS1 signature
DNA mismatch repair proteins mutS family signature
RecF protein signatures
Small, acid-soluble spore proteins, alpha/beta type, signatures

Enzymes

Oxidoreductases

Zinc-containing alcohol dehydrogenases signature
Iron-containing alcohol dehydrogenases signature
Short-chain alcohol dehydrogenase family signature
Aldo/keto reductase family signatures
Histidinol dehydrogenase active site
L-lactate dehydrogenase active site
D-isomer specific 2-hydroxyacid dehydrogenases signatures
Hydroxymethylglutaryl-coenzyme A reductases signatures
3-hydroxyacyl-CoA dehydrogenase signature
Malate dehydrogenase active site signature
Malic enzymes signature
Isocitrate and isopropylmalate dehydrogenases signature
6-phosphogluconate dehydrogenase signature
Glucose-6-phosphate dehydrogenase active site
IMP dehydrogenase/GMP reductase signature
Bacterial quinoprotein dehydrogenases signatures
FMN-dependent alpha-hydroxy acid dehydrogenases active site
GMC oxidoreductases signatures
Eukaryotic molybdopterin oxidoreductases signature
Prokaryotic molybdopterin oxidoreductases signatures
Aldehyde dehydrogenases active sites
Glyceraldehyde 3-phosphate dehydrogenase active site
Fumarate reductase/succinate dehydrogenase FAD-binding site
Acyl-CoA dehydrogenases signatures
Glutamate/Leucine/Phenylalanine dehydrogenases active site
D-amino acid oxidase signature
Delta 1-pyrroline-5-carboxylate reductase signature
Dihydrofolate reductase signature
Tetrahydrofolate dehydrogenase/cyclohydrolase signatures
Pyridine nucleotide-disulphide oxidoreductases class-I active site
Pyridine nucleotide-disulphide oxidoreductases class-II active site
Respiratory-chain NADH dehydrogenase subunit 1 signatures
Respiratory-chain NADH dehydrogenase 30 Kd subunit signature
Respiratory-chain NADH dehydrogenase 49 Kd subunit signature
Respiratory-chain NADH dehydrogenase 51 Kd subunit signatures
Respiratory-chain NADH dehydrogenase 75 Kd subunit signatures
Nitrite reductases and sulfite reductase putative siroheme-binding sites
Uricase signature
Cytochrome c oxidase subunit I, copper B binding region signature
Cytochrome c oxidase subunit II, copper A binding region signature
Multicopper oxidases signatures
Peroxidases signatures
Catalase signatures
Glutathione peroxidases signatures
Lipoxygenases, putative iron-binding region signatures
Extradiol ring-cleavage dioxygenases signature
Intradiol ring-cleavage dioxygenases signature
Bacterial ring hydroxylating dioxygenases alpha-subunit signature
Bacterial luciferase subunits signature
Biopterin-dependent aromatic amino acid hydroxylases signature
Copper type II, ascorbate-dependent monooxygenases signatures
Tyrosinase signatures
Fatty acid desaturases signatures
Cytochrome P450 cysteine heme-iron ligand signature
Heme oxygenase signature
Copper/Zinc superoxide dismutase signatures
Manganese and iron superoxide dismutases signature
Ribonucleotide reductase large subunit signature
Ribonucleotide reductase small subunit signature
Nitrogenases component 1 alpha and beta subunits signatures
NifH/frxC family signatures
Nickel-dependent hydrogenases large subunit signatures
Glutamyl-tRNA reductase signature

Transferases

Thymidylate synthase active site
Methylated-DNA--protein-cysteine methyltransferase active site
N-6 Adenine-specific DNA methylases signature
N-4 cytosine-specific DNA methylases signature
C-5 cytosine-specific DNA methylases signatures
Serine hydroxymethyltransferase pyridoxal-phosphate attachment site
Phosphoribosylglycinamide formyltransferase active site
Aspartate and ornithine carbamoyltransferases signature
Transketolase signatures
Acyltransferases ChoActase/COT/CPT-II family signatures
Thiolases signatures
Chloramphenicol acetyltransferase active site
cysE/lacA/nodL acetyltransferases signature
Beta-ketoacyl synthases active site
Chalcone and stilbene synthases active site
Gamma-glutamyltranspeptidase signature
Transglutaminases active site
Phosphorylase pyridoxal-phosphate attachment site
UDP-glucoronosyl and UDP-glucosyl transferases signature
Purine/pyrimidine phosphoribosyl transferases signature
Glutamine amidotransferases class-I active site
Glutamine amidotransferases class-II active site
Thymidine phosphorylase signature
S-Adenosylmethionine synthetase signatures
Polyprenyl synthetases signatures
Riboflavin synthase alpha chain family Lum-binding site signature
Dihydropteroate synthase signatures
EPSP synthase active site
Aspartate aminotransferases pyridoxal-phosphate attachment site
Aminotransferases class-II pyridoxal-phosphate attachment site
Aminotransferases class-III pyridoxal-phosphate attachment site
Aminotransferases class-IV signature
Phosphoserine aminotransferase signature
Hexokinases signature
Galactokinase signature
GHMP kinases putative ATP-binding domain
Phosphofructokinase signature
pfkB family of carbohydrate kinases signatures
Phosphoribulokinase signature
Thymidine kinase cellular-type signature
Prokaryotic carbohydrate kinases signature
Protein kinases signatures
Pyruvate kinase active site signature
Phosphoglycerate kinase signature
Aspartokinase signature
ATP:guanido phosphotransferases active site
PTS HPr component phosphorylation sites signatures
PTS permeases phosphorylation sites signatures
Adenylate kinase signature
Nucleoside diphosphate kinases active site
Phosphoribosyl pyrophosphate synthetase signature
7,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase signature
Bacteriophage-type RNA polymerase family active site signature
Eukaryotic RNA polymerase II heptapeptide repeat
Eukaryotic RNA polymerases 30 to 40 Kd subunits signature
DNA polymerase family A signature
DNA polymerase family B signature
DNA polymerase family X signature
Galactose-1-phosphate uridyltransferase active site signature
CDP-alcohol phosphatidyltransferases signature
PEP-utilizing enzymes signatures
Rhodanese signatures

Hydrolases

Phospholipase A2 active sites signatures
Lipases, serine active site
Colipase signature
Carboxylesterases type-B active site
Pectinesterase signatures
Alkaline phosphatase active site
Histidine acid phosphatases signatures
5'-nucleotidase signatures
Fructose-1-6-bisphosphatase active site
Serine/threonine specific protein phosphatases signature
Tyrosine specific protein phosphatases active site
Inositol monophosphatase family signatures
Prokaryotic zinc-dependent phospholipase C signature
3'5'-cyclic nucleotide phosphodiesterases signature
cAMP phosphodiesterases class-II signature
Sulfatases signatures
AP endonucleases family 1 signatures
AP endonucleases family 2 signatures
Endonuclease III iron-sulfur binding region signature
Ribonuclease III family signature
Bacterial Ribonuclease P protein component signature
Ribonuclease T2 family histidine active sites
Pancreatic ribonuclease family signature
Beta-amylase signatures
Polygalacturonase active site
Clostridium cellulases repeated domain signature
Chitinases class I signatures
Alpha-lactalbumin/lysozyme C signature
Alpha-galactosidase signature
Alpha-L-fucosidase putative active site
Glycosyl hydrolases family 1 signatures
Glycosyl hydrolases family 2 signatures
Glycosyl hydrolases family 3 active site
Glycosyl hydrolases family 5 signature
Glycosyl hydrolases family 6 signatures
Glycosyl hydrolases family 9 active sites signatures
Glycosyl hydrolases family 10 active site
Glycosyl hydrolases family 11 active site signatures
Glycosyl hydrolases family 17 signature
Glycosyl hydrolases family 31 signatures
Glycosyl hydrolases family 32 active site
Alkylbase DNA glycosidases alkA family signature
Uracil-DNA glycosylase signature
S-adenosyl-L-homocysteine hydrolase signatures
Cytosol aminopeptidase signature
Aminopeptidase P and proline dipeptidase signature
Methionine aminopeptidase signature
Serine carboxypeptidases, active sites
Zinc carboxypeptidases, zinc-binding regions signatures
Serine proteases, trypsin family, active sites
Serine proteases, subtilase family, active sites
Serine proteases, V8 family, active sites
Prolyl endopeptidase family serine active site
ClpP proteases active sites
Eukaryotic thiol (cysteine) proteases active sites
Ubiquitin carboxyl-terminal hydrolase, putative active-site signature
Eukaryotic aspartyl proteases active site
Neutral zinc metallopeptidases, zinc-binding region signature
Matrixins cysteine switch
Insulinase family signature
recA signature
Proteasome subunits signature
Signal peptidases I signatures
Amidases signature
Asparaginase/glutaminase active site
Urease active site
ArgE/dapE/CPG2 family signatures
Dihydroorotase signatures
Beta-lactamases classes-A, -C, and -D active site
Beta-lactamases class B signatures
Arginase and agmatinase signatures
Adenosine and AMP deaminase signature
Inorganic pyrophosphatase signature
Acylphosphatase signatures
ATP synthase alpha and beta subunits signature
ATP synthase gamma subunit signature
ATP synthase delta (OSCP) subunit signature
ATP synthase a subunit signature
ATP synthase c subunit signature

E1-E2 ATPases phosphorylation site
Sodium and potassium ATPases beta subunits signatures
Cutinase, serine active site

Lyases

DDC/GAD/HDC pyridoxal-phosphate attachment site
Prokaryotic ornithine and lysine decarboxylases pyridoxal-phosphate attachment site
Orotidine 5'-phosphate decarboxylase active site
Phosphoenolpyruvate carboxylase active sites
Phosphoenolpyruvate carboxykinase (GTP) signature
Phosphoenolpyruvate carboxykinase (ATP) signature
Indole-3-glycerol phosphate synthase signature
Ribulose bisphosphate carboxylase large chain active site
Fructose-bisphosphate aldolase class-I active site
Fructose-bisphosphate aldolase class-II signatures
Malate synthase signature
Citrate synthase signature
KDPG and KHG aldolases active site signatures
Isocitrate lyase signature
DNA photolyases signatures
Eukaryotic-type carbonic anhydrases signature
Prokaryotic-type carbonic anhydrases signatures
Fumarate lyases signature
Aconitase family signature
Enolase signature
Serine/threonine dehydratases pyridoxal-phosphate attachment site
Enoyl-CoA hydratase/isomerase signature
Tryptophan synthase alpha chain signature
Tryptophan synthase beta chain pyridoxal-phosphate attachment site
Delta-aminolevulinic acid dehydratase active site
Dihydrodipicolinate synthetase signatures
Phenylalanine and histidine ammonia-lyases signature
Porphobilinogen deaminase cofactor-binding site
Guanylate cyclases signature
Chorismate synthase signatures
Ferrochelatase signature

Isomerases

Alanine racemase pyridoxal-phosphate attachment site
Aldose 1-epimerase putative active site
Cyclophilin-type peptidyl-prolyl cis-trans isomerase signature
FKBP-type peptidyl-prolyl cis-trans isomerase signatures
Triosephosphate isomerase active site
Xylose isomerase signatures
Phosphoglucose isomerase signatures
Phosphoglycerate mutase family phosphohistidine signature
Phosphoglucomutase & phosphomannomutase phosphohistidine signature
Methylmalonyl-CoA mutase signature
Eukaryotic DNA topoisomerase I active site
Prokaryotic DNA topoisomerase I active site
DNA topoisomerase II signature

Ligases

Aminoacyl-transfer RNA synthetases class-I signature
Aminoacyl-transfer RNA synthetases class-II signatures
WHEP-TRS domain signature
ATP-citrate lyase and succinyl-CoA ligases active site
Glutamine synthetase signatures
Ubiquitin-activating enzyme signature
Ubiquitin-conjugating enzymes active site
Formate--tetrahydrofolate ligase signatures
Adenylosuccinate synthetase active site
Argininosuccinate synthase signatures
Phosphoribosylglycinamide synthetase signature
ATP-dependent DNA ligase signatures

Others

Isopenicillin N synthetase signatures
Site-specific recombinases signatures
Thiamine pyrophosphate enzymes signature
Biotin-requiring enzymes attachment site
2-oxo acid dehydrogenases acyltransferase component lipoyl binding site
Putative AMP-binding domain signature

Electron transport proteins

Cytochrome c family heme-binding site signature
Cytochrome b5 family, heme-binding domain signature
Cytochrome b/b6 signatures
Cytochrome b559 subunits heme-binding site signature
Thioredoxin family active site
Glutaredoxin active site
Type-1 copper (blue) proteins signature
2Fe-2S ferredoxins, iron-sulfur binding region signature
4Fe-4S ferredoxins, iron-sulfur binding region signature
High potential iron-sulfur proteins signature
Rieske iron-sulfur protein signatures
Flavodoxin signature
Rubredoxin signature
Electron transfer flavoprotein alpha-subunit signature

Other transport proteins

Class I metallothioneins signature
Ferritin iron-binding regions signatures
Bacterioferritin signature
Transferrins signatures
Plant hemoglobins signature
Hemerythrins signature
Arthropod hemocyanins/insect LSPs signatures
ATP-binding proteins 'active transport' family signature
Binding-protein-dependent transport systems inner membrane component signature
Serum albumin family signature
Transthyretin signatures
Avidin/Streptavidin family signature
Eukaryotic cobalamin-binding proteins signature
Lipocalin signature
Cytosolic fatty-acid binding proteins signature
LBP/BPI/CETP family signature
Plant lipid transfer proteins signature
Uteroglobin family signatures
Mitochondrial energy transfer proteins signature
Sugar transport proteins signatures
Sodium symporters signatures
Prokaryotic sulfate- and thiosulfate-binding proteins signatures
Amino acid permeases signature
Aromatic amino acids permeases signature
Anion exchangers family signatures
Bacterial protein export pilT protein family signature
GltP/dctA family of transporters signatures
MIP family signature
Neurotransmitters transporters signatures
General diffusion gram-negative porins signature
Eukaryotic porin signature
Insulin-like growth factor binding proteins signature

Structural proteins

43 Kd postsynaptic protein signature
Actins signatures
Annexins repeated domain signature
Clathrin light chains signatures
Clusterin signatures
Connexins signatures
Crystallins beta and gamma 'Greek key' motif signature
Dynamin family signature
Intermediate filaments signature
Involucrin signature
Kinesin motor domain signature
Myelin basic protein signature
Myelin P0 protein signature
Myelin proteolipid protein signature
Neuromodulin (GAP-43) signatures
Profilin signature
Surfactant associated polypeptide SP-C palmitoylation sites
Synapsins signatures
Synaptobrevin signature
Synaptophysin/synaptoporin signature
Tropomyosins signature
Tubulin subunits alpha, beta, and gamma signature
Tubulin-beta mRNA autoregulation signal
Tau and MAP proteins repeated region signature
Neuraxin and MAP1B proteins repeated region signature
F-actin capping protein alpha subunit signatures
F-actin capping protein beta subunit signature
Vinculin family signatures


Amyloidogenic glycoprotein signatures
Cadherins extracellular repeated domain signature
Insect flexible cuticle proteins signature
Gas vesicles protein GVPa signatures
Gas vesicles protein GVPc repeated domain signature
Flagella basal body rod proteins signature
Plant viruses icosahedral capsid proteins 'S' region signature
Potexviruses and carlaviruses coat protein signature

Receptors

Neurotransmitter-gated ion-channels signature
G-protein coupled receptors signature
G-protein coupled receptors family 2 signatures
Visual pigments (opsins) retinal binding site
Bacterial rhodopsins retinal binding site
Receptor tyrosine kinase class II signature
Receptor tyrosine kinase class III signature
Receptor tyrosine kinase class V signatures
Growth factor and cytokines receptors family signatures
TNFR/NGFR family cysteine-rich region signature
Integrins alpha chain signature
Integrins beta chain cysteine-rich domain signature
Natriuretic peptides receptors signature
Photosynthetic reaction center proteins signature
Photosystem I psaA and psaB proteins signature
Phytochrome chromophore attachment site
Speract receptor repeated domain signature
TonB-dependent receptor proteins signature
Type-II membrane antigens family signature
Bacterial chemotaxis sensory transducers signature

Cytokines and growth factors

Granulins signature
HBGF/FGF family signature
PTN/MK heparin-binding protein family signatures
Nerve growth factor family signature
Platelet-derived growth factor (PDGF) family signature
Small cytokines (intercrine/chemokine) signatures
TGF-beta family signature
TNF family signature
Wnt-1 family signature
Interferon alpha and beta family signature
Granulocyte-macrophage colony-stimulating factor signature
Interleukin-1 signature
Interleukin-2 signature
Interleukin-6/G-CSF/MGF family signature
Interleukin-7 signature

Interleukin-10 signature
LIF/OSM family signature

Hormones and active peptides

Adipokinetic hormone family signature
Bombesin-like peptides family signature
Calcitonin/CGRP/IAPP family signature
Corticotropin-releasing factor family signature
Granins signatures
Gastrin/cholecystokinin family signature
Glucagon/GIP/secretin/VIP family signature
Glycoprotein hormones alpha chain signatures
Glycoprotein hormones beta chain signatures
Gonadotropin-releasing hormones signature
Insulin family signature
Natriuretic peptides signature
Neurohypophysial hormones signature
Pancreatic hormone family signature
Parathyroid hormone family signature
Pyrokinins signature
Somatotropin, prolactin and related hormones signatures
Tachykinin family signature
Thymosin beta-4 family signature
Cecropin family signature
Mammalian defensins signature
Insect defensins signature
Endothelins/sarafotoxins signature

Toxins

Plant thionins signature
Snake toxins signature
Myotoxins signature
Heat-stable enterotoxins signature
Aerolysin type toxins signature
Shiga/ricin ribosomal inactivating toxins active site signature
Channel forming colicins signature
Hok/gef family cell toxic proteins signature
Staphylococcal enterotoxins/Streptococcal pyrogenic exotoxins signatures
Thiol-activated cytolysins signature
Membrane attack complex components/perforin signature

Inhibitors

Pancreatic trypsin inhibitor (Kunitz) family signature
Bowman-Birk serine protease inhibitors family signature
Kazal serine protease inhibitors family signature
Soybean trypsin inhibitor (Kunitz) protease inhibitors family signature
Serpins signature
Potato inhibitor I family signature
Squash family of serine protease inhibitors signature
Cysteine proteases inhibitors signature
Tissue inhibitors of metalloproteinases signature
Cereal trypsin/alpha-amylase inhibitors family signature
Alpha-2-macroglobulin family thiolester region signature
Disintegrins signature
Lambdoid phages regulatory protein CIII signature

Others

Pentaxin family signature
Immunoglobulins and major histocompatibility complex proteins signature
Gram-negative pili assembly chaperone signature
Prion protein signatures
Cyclins signature
Proliferating cell nuclear antigen signature
Arrestins signature
Chaperonins cpn60 signature
Chaperonins cpn10 signature
Chaperonins TCP-1 signatures
Heat shock hsp70 proteins family signatures
Heat shock hsp90 proteins family signature
DnaJ domains signatures
Protein secY signatures
CDC48/PAS1/SEC18 family signature
Ubiquitin signature
Beta-transducin family Trp-Asp repeats signature
Ras GTPase-activating proteins signature
Guanine-nucleotide dissociation stimulators CDC25 family signature
Guanine-nucleotide dissociation stimulators CDC24 family signature
Stathmin family signature
SRP54-type proteins GTP-binding domain signature
GTP-binding elongation factors signature
Eukaryotic initiation factor 5A hypusine signature
Prokaryotic-type peptide chain release factors signature
Calreticulin family signatures
S-100/ICaBP type calcium binding protein signature
Hemolysin-type putative calcium-binding region signature
HlyD family secretion proteins signature
P-II protein signatures
14-3-3 proteins signatures
Caseins alpha/beta signature
Legume lectins signatures
Vertebrate galactoside-binding lectin signature
Lysosome-associated membrane glycoproteins signatures
Glycophorin A signature
Seminal vesicle protein I repeats signature
Seminal vesicle protein II repeats signature
Stress-induced proteins SRP1/TIP1 family signature
Tissue factor signature
HCP repeats signature
Bacterial ice-nucleation proteins octamer repeat
Cell cycle proteins ftsW/rodA/spoVE signature
Enterobacterial virulence outer membrane protein signatures
Staphylocoagulase repeat signature
11-S plant seed storage proteins signature
Dehydrins signature
Germin family signature
Small hydrophilic plant seed proteins signature
Pathogenesis-related proteins Bet v I family signature
Thaumatin family signature
