Article
Reference
The PROSITE dictionary of sites and patterns in proteins, its current status
BAIROCH, Amos Marc
BAIROCH, Amos Marc. The PROSITE dictionary of sites and patterns in proteins, its current status. Nucleic Acids Research , 1993, vol. 21, no. 13, p. 3097-103
DOI : 10.1093/nar/21.13.3097 PMID : 8332530
Available at:
http://archive-ouverte.unige.ch/unige:36853
Disclaimer: layout of this document may differ from the published version.
1 / 1
The PROSITE dictionary of sites and patterns in proteins, its current status
Amos Bairoch
Department of Medical Biochemistry, University of Geneva, 1 rue Michel Servet, 1211 Geneva 4, Switzerland
BACKGROUND
PROSITE is a compilation of sites and patterns found in protein sequences; it can be used as a method of determining the function of uncharacterized proteins translated from genomic or cDNA sequences. In some cases the sequence of an unknown protein is too distantly related to any protein of known structure to detect its resemblance by overall sequence alignment, but relationships can be revealed by the occurrence in its sequence of a particular cluster of residue types which is variously known as a pattern, motif, signature, or fingerprint. These motifs arise because specific region(s) of a protein which may be important, for example, for their binding properties or for their enzymatic activity are conserved in both structure and sequence. These structural requirements impose very tight constraints on the evolution of these small but important portion(s) of a protein sequence. The use of protein sequence patterns to determine the function of proteins is becoming very rapidly one of the essential tools of sequence analysis. This reality has been recognized by many authors [1,2]. While there have been a number of reviews of published patterns [3,4,5], no attempt had been made until very recently [6,7] to systematically collect biologically significant patterns or to discover new ones. Based on these observations, we decided in 1988, to actively pursue the development of a database of patterns which would be used to search against sequences of unknown function. This database, called PROSITE, contains some patterns which have been published in the literature, but the majority have been developed in the last four years by the author.
LEADING CONCEPTS
The design of PROSITE follows four leading concepts:
Completeness. For such a compilation
tobe helpful in the determination of protein function, it is important that it contains as many biologically meaningful patterns
aspossible.
High specificity of the patterns. In the majority of
cases wehave chosen patterns that are specific enough that they do
notdetect
toomany unrelated sequences, yet they will detect most, if not all, sequences that clearly belong
tothe
setin consideration.
Documentation. Each of the patterns is fully documented; the documentation includes
aconcise description of the protein family that it is designed
todetect
aswell
as asummary of the
reasonsleading
tothe development of the pattern.
Periodic reviewing. It is important that each pattern be periodically reviewed
toinsure that it is still valid.
FORMAT
The PROSITE database is composed of two ASCII (text) files.
The first file (PROSITE.DAT) is a computer-readable file that contains all the information necessary for programs that make use of PROSITE to scan sequence(s) for the occurrence of the patterns. This file also includes, for each of the patterns described, statistics on the number of hits obtained while scanning for that pattern in the SWISS-PROT protein sequence data bank [8].
Cross-references to the corresponding SWISS-PROT entries are also present in the file. The second file (PROSITE.DOC), which we call the textbook, contains textual information that documents each pattern. A user manual (PROSUSER.TXT) is distributed with the database; it fully describes the format of both files. A sample textbook entry is shown in Figure 1 with the corresponding data from the pattern file.
CONTENT OF THE CURRENT RELEASE
Release 10.1 of PROSITE (April 1993) contains 635 documentation entries describing 803 different patterns. The list of these entries is provided in Appendix 1. The database requires about 2 Mb of disk storage space.
COMPUTER PROGRAMS THAT MAKE USE OF PROSITE
Many academic groups and commercial companies have developed computer programs that make use of PROSITE. We list here some of these programs (a fuill descriptive list is included with the database and is stored in a file called 'PROSITE.PRG').
ACADEMIC
Program Operatingsystem Author
MacPattern prosite.c ProSearch cregex dbsite/mksite PROINDEX Quelsite Scrutineer PATTERN PIP andPIPI PATMAT PROTOMAT
AppleMcIntosh IBM3090400Eand Unix Unix andDOS(AWK) Unix and DOS Unix VAX VMS VAX VMS
VAX VMSand Unix Unix
VAXVMS and Unix OS and Unix OS andUnix
Rainer Fuchs[9]
KlausHartmuth LeeKolakowski [10]
Jack Leunissen J.-M. Claverie SteveClark ClaudeValencien PeterSibbald[11]
Olivier Boulot Roger Staden[12]
StevenHenikoff[13]
StevenHenikoff[14]
3098 Nucleic Acids Research, 1993, Vol. 21, No. 13
COMMERCIAL
Program Package Supplier Operatng system
MOTIF GCG Genetics Comp. Group VaxVMSandUnix
QUEST IG-Suite IntelliGenetics VaxVMS and Unix
PROMOT OML Vax VMSand Unix
PROSITE PC/Gene IntelliGenetics DOS
PROSITE GeneWorks IntelliGenetics AppleMcIntosh
Protean LaserGene DNASTAR Apple McIntosh
PROTSITE National Biosciences DOS
SEQ/Pattern ProExplore Biostructure Unix
FUTURE DEVELOPMENTS
There are a number of protein families as well as functional or structural domains that cannot be detected using patterns due to their extreme sequence divergence. Typical examples of important functional domains which are weakly conserved are the immunoglobulin domains, the SH2 and SH3 domains, and the fibronectin type I domain. In such domains there
areonly a few sequence positions which are well conserved. Any attempt to build a consensus pattern for such regions will either fail to pick up a significant proportion of the protein sequences that contain such a region (false negatives) or will pick up too many proteins that do not contain the region (false positives).
Techniques based on the use of weight matrices [15,16,17] allows the detection of such proteins or domains. We plan to collaborate closely with Dr P. Bucher of the Swiss Cancer Research Institute (ISREC) in Lausanne and with Dr T.K. Attwood of the University of Leeds to develop such methods and
tointegrate them into PROSITE.
HOW TO OBTAIN PROSITE
PROSrTE is distributed on magnetic tape and
onCD-ROM by the EMBL Data Library. For all enquiries regarding the subscription and distribution of PROSITE
oneshould
contact:EMBL Data Library
European Molecular Biology Laboratory Postfach 10.2209, Meyerhofstrasse 1 6900 Heidelberg, Germany
Telephone: (+49 6221) 387 258
Telefax: (+49 6221) 387 519 or 387 306
Electronic network address: [email protected] PROSITE can be obtained from the EMBL File Server [18].
Detailed instructions on how to make the best use of this service, and in particular on how to obtain PROSITE, can be obtained by sending to the network address [email protected] the following message:
HELP
HELP PROSITE
If you have access to a computer system linked to the Internet you can obtain PROSITE using FTP (File Transfer Protocol), from the following file servers:
EMBL anonymous FTP server
Internet address: fip.EMBL-heidelberg.de (or 192.54.41.33)
NCBI Repository (National Library of Medicine, NIH, Washington D.C., U.S.A.)
Internet address: ncbi.nlm.nih.gov (130.14.20.1)
Basel Biozentrum Biocomputing
server(EMBnet SWISS node) Internet address: bioftp. unibas. ch (or 131.152.8.1)
ExPASy (Expert Protein Analysis System server, University of Geneva, Switzerland)
Internet address: expasy.hcuge.ch (129.195.254.61) National Institute of Genetics (Japan) FTP
serverInternet address: ftp.nig.ac.jp (133.39.16.66)
You
canalso browse through PROSITE using various Internet Gopher servers that specialize in biosciences (biogophers) [19].
Gopher is a distributed document delivery service that allows
aneophyte
user to accessvarious types of data residing
onmultiple hosts in a seamless fashion.
The present distribution frequency is four releases per year.
No restrictions are placed on use or redistribution of the data.
REFERENCES
1. Doolittle R.F.(In)OfURFs andORFs: aprimeronhow toanalyzederived amino acid sequences.,University Science Books, MillValley,California, (1986).
2. Lesk A.M. (In) Computational Molecular Biology, Lesk A.M., Ed., ppl7-26, Oxford University Press, Oxford (1988).
3. Barker W.C., Hunt T.L., George D.G. Protein Seq. Data Anal.
1:363-373(1988).
4. HodgmanT.C.Comput. Appl. Biosci.5:1-13(1989).
5. Taylor W.R.,Jones D.T. Curr. Opin. Struct. Biol. 1:327-333(1991).
6. Bork P. FEBSLett. 257:191-195(1989).
7. SmithH.O., Annau T.M.,ChandrasegaranS. Proc.Natl.Acad. Sci.USA 87:826-830(1990).
8. Bairoch A., BoeckmannB. Nucleic AcidsRes.20:2019-2022(1992).
9. FuchsR. Comput. Appl. Biosci. 7:105-106(1991).
10. Kolakowski L.F. Jr., Leunissen J.A.M., Smith J.E. Biotechniques 13:919-921(1992).
11. Sibbald P.R., Sommerfeldt H., Argos P. Comput. Appl. Biosci.
7:535-536(1991).
12. StadenR. DNASequence 1:369-374(1991).
13. Wallace J.C., Henikoff S. Comput. Appl. Biosci. 8:249-254(1992).
14. Henikoff S., Henikoff J. Nucleic Acids Res. 19:6565-6572(1991).
15. Staden R. Nucleic Acids Res. 12:505-519(1984).
16. GribskovM.,Luethy R., Eisenberg D. Meth. Enzymol. 183:146-159(1990).
17. AttwoodT.K.,Eliopoulos E.E., Findlay J.B.C. Gene98:153-159(1991).
18. StoehrP.J., Omond R.A. Nucleic AcidsRes. 17:6763-6764(1989).
19. GilbertD. Trends Biochem. Sci. 18:107-108(1993).
a
(PDOC00107)
(PS00116; DNA POLYNERASEB)
(BEGIN) -
DNA otzerisef_1tySslrnature*
Replicative DNA polymerases (EC2.7.7.7) arethe keyenzymes cataLyzingthe accurate replication ofDNA. Theyrequireeither a sut RUA moleculeor a protein as aprimr for the denovosynthesisof aDNA chain. On thebasisof
sequence sImIlaritiesa nuwerofDNApolymera have been grouped together Cl to7] under the designation ofDNApolywrase fmilyB. Thepolyerases that belong to thIs
faellymare:
- Higher eukaryotespolymfrases alpha.
- Higher aukaryotespolymerises delta.
- Yeast polywrise I/alpha (gene POLI), polyerase Il/epsilon(gene POL2), polyerae" ll/delta (gene POL3) andpolyeraseREV3.
-EScherichia coli polymerase ll
(gene
diAor polC).Polymerses
of viruses framtheherpesviridie
family.-
Polywraes
framAdenoviruses.Polymerases fromBsculoviruses.
Polywrases
fromChlorella virutes.Polywrases
fromPoxvlruses.Bacteriophage
T4polywrase.
Podoviridie
becteriophages Phi-29, M2, andPZApolywrase.Tectiviridae
bacteriophage PRDI polymerase.Polymerases encoded on
eukiryotic
linearDNA plasmidssuchaslulyveromyces
lactis pGKL1 andpGKL2 AgaricusbitorquispEN, Ascobolus imersuspA 2, Cltvicepspurpuree pCLKI andMizeS-1.
Six regionsof similarity (nuileredfrom I to VI) are found in all or a subset of theabovepolymerises. The ost conserved region (I) includes a perfectly conserved tetr tide which contains twoispartate residues. The function of this consrvi region Isnot yet known, however it has been suggested C3l that it my be involved in bindingamagnsiu Ion. Weusethis conserved regionas asignature for thisfmilyofDNApolymerses.
-Consensus pattern: EYA]-
EGLIVNSTAC]
-D-T-D-ESG]- ELIVNFTC]
-x-CLIVNSTACI
Sequences known tobelongto this classdetected by thepttern: ALL, except foryeastpolywrase II/epsilon nd Agaricusbitorquis MEN.
-Other sequence(s) detected In SWISS-PROT: chicken
vitalI1
ogenin2.-Last update: Nay1993 /Patternand text revised.
E11 Jung
G.,
Leavitt N.C., Hsieh J.-C. Ito J.Proc. IltI. Aced. Sci. U.S.A. 84:8o87-8291C1987).
E2] BernadA., ZabillosA. SilasN., BlencoL.
ENBO J. 6:4219-4225(1967).
C3] Argos P.
Mucleic AcidsRes. 16:9909-9916(1968).
E4] Wang T.S.-F. Wong S.W., KornD.
FASEB J. 3:14-21(0989).
C 5] Delarue N., Poch 0., Todro N., Moras D., ArgosP.
ProteinEngineering 3:461-467(1990).
C 6] Ito J., BraithwaiteD.K.
Nucleic AcidsRes. 19:4045-4057(1991).
n7 BraithwaiteD.K., Ito J.
Nucleic Acids Res. 21:787-802(1993).
(END)
b
ACID DTDE PANR NR NRCC DR
DR
DRDR DRDR DRDR DRDR DR DRDR DRDR DRDO
DO
DNA POLYNERASE-B; PATTERN.
PS0O 16-
APR-1996 (CREATED)- MAY-1993 (DATAUPDATE); AY-1993 (INFO UPDATE).
DNA poltmrisefamIly Ssa ture.
CYA-CGLIVNSTAC -D-T-D-CSG]-CLIVNFTC]-x-CL VNSTAC].
/RELEASEn24 28154*
/TOTAL42(4fi-
/P6SITIVE=41(41); /UNKNOWN0(0); /FALSEPOS=1(1);/FALSE NEGu2(2)
/TAXO-ANMGEAEV-/AX-REPEATu1-
P26019, DP0 DR0NM, T; P0968, Dh T P280, D SCNPO, T
P27727, DOARYU, T; P13382, DPOAYEAST, T P28339, DmDmOVIN, P28340, DPDNUNAN P15436,DPODEAST, P14284 D EST T- P21189, DPOZECOLI, T- P26811, DPOL-ULSO, P03261 DPOL-ADE2: T-
P04495, DPOLADE05, T; DPOLADE07, T P06538, DPOLADEI2, T-
P03196, DPOL-ESV, T; , DPOL-NCNVA, T P04293, DPOLNSV11, T- P07917, DPOL NSVIA, P04292, DPOL NSV1K, ;P04, DPOLNSV1S, T;
P07918, DPOLNSV21, T; P2887, DPOLNSV6U, T; P , DPOL-NSVES, T;- P2M59, DPOL NSVI1, T; P24907, DPOLNSVSA, T; P09252, DPOLTZVD, 7;
P094, DPOIPLULA,T P05468, DPO2kLULA, T; P2233, DPOLCAU,
P10582, DPOL)IIZE, T; P18131, DPOLPVAC, P20509, DPOLVACCC, T;
P06856, DPOL VACCV, T- P21402, DPOL FOWPV, T 0, DPOLDPN2, T;
P06950, DPOL-BPPZA, T; P19694, DPOL2, P10479, DPOL RD, T;
P04415, DPOL-BPT4,T,
P22374, DPOL ASCIN, N; P21951, DPOE YEAST, N;
P02845 VIT2:CHICK, F;
PDOCO0O07; -
Figure1. Sampledata from PROSITE.a)Adocumentation (textbook) entry. b) Thecorresponding entryin thepatternfile.
Appendix 1. List ofpatternsdocumentationentries in release 10.1 ofPROS1TE Post-translational modifications
N-glycosylation site
Glycosaminoglycanattachment site Tyrosine sulfatation site
cAMP- andcGMP-dependent proteinkinasephosphorylation site Protein kinase Cphosphorylation site
Casein kinase IIphosphorylationsite Tyrosinekinase phosphorylationsite N-myristoylation site
Amidation site
Aspartic acidandasparaginehydroxylationsite VitaminK-dependentcarboxylation domain Phosphopantetheineattachment site
Prokaryoticmembranelipoprotein lipidattachment site Prokaryotic N-terminalmethylation site
Farnesylgroupbindingsite (CAAX box) Domains
Endoplasmicreticulumtargeting sequence Microbodies C-terminaltargeting signal
Gram-positivecocci surfaceproteins 'anchoring' hexapeptide Bipartitenucleartargetingsequence
Cell attachment sequence
ATP/GTP-binding site motif A(P-loop) EF-handcalcium-binding domain
Actinin-typeactin-binding domainsignatures Cofilin/tropomyosin-typeactin-binding domain Apple domain
Band 4.1family domain signatures Kringledomainsignature
EGF-likedomaincysteinepattern signature
Fibrinogenbeta and gammachainsC-terminaldomainsignature TypeIIfibronectincollagen-bindingdomain
Hemopexin domainsignature C-type lectin domainsignature Osteonectindomainsignatures Somatomedin B domainsignature Thyroglobulin type-1 repeatsignature 'Trefoil'domainsignature
Cellulose-binding domain,bacterialtype Cellulose-bindingdomain, fungaltype Chitinrecognitionorbinding domainsignature Barwin domainsignatures
WAP-type 'four-disulfide core' domainsignature Phorbolesters/diacylglycerolbinding domain C2 domainsignature
ZPdomainsignature
DNAorRNAassociatedproteins 'Homeobox' domainsignature
'Homeobox' antennapedia-typeprotein signature 'Homeobox' engrailed-typeprotein signature 'Paired box' domainsignature
'POU' domainsignatures Zincfinger, C2H2type, domain Zincfinger, C3HC4type, signature
Nuclear hormonesreceptorsDNA-bindingregion signature GATA-typezincfingerdomain
Poly(ADP-ribose) polymerasezinc fingerdomain FungalZn(2)-Cys(6)binuclear cluster domain
3100 Nucleic Acids Research, 1993, Vol. 21, No. 13
Appendix 1. continued Leucinezipper pattern
Fos/junDNA-binding basic domainsignature MybDNA-binding domainrepeatsignatures
Myc-type, 'helix-loop-helix' putative DNA-binding domainsignature p53tumorantigen signature
CBF/NF-Y subunits signatures
'Cold-shock' DNA-binding domain signature CTF/NF-Isignature
Ets-domain signatures Forkheaddomain signatures
HSF-typeDNA-binding domainsignature IRFfamilysignature
LIMdomain
SRF-typetranscription factorsDNA-binding and dimerization domain TEAdomainsignature
Transcription factor TFIIBrepeat signature Transcription factorTFIID repeatsignature TFIIS cysteine-richdomainsignature
DEADandDEAHbox familiesATP-dependenthelicasessignature EukaryoticputativeRNA-binding regionRNP-1 signature Fibrillarinsignature
XPACprotein signatures
Bacterial
regulaory
proteins, araC family signature Bacterialregulatoryproteins,asnC familysignature Bacterialregulatoryproteins,crpfamilysignature Bacterialregulatoryproteins, gntRfamilysignature Bacterialregulatoryproteins, laclfamily signature Bacterialregulatoryproteins,luxRfamily signature Bacterialregulatoryproteins, lysRfamilysignature Bacterialregulatoryproteins, merRfamilysignature Transcriptional antiterminatorsbglGfamilysignature Sigma-54 factorsfamily signaturesSigma-70 factorsfamilysignaus Sigma-54 interaction domainsignatures Single-strandbinding proteinfamilysignatures Bacterial histone-likeDNA-bindingproteins signature Histone H2Asignature
Histone H2Bsignature Histone H3signature HistoneH4signature HMGI/2signature
HMG-I and HMG-Y DNA-bindingdomain(AT-hook) HMG14andHMG17signature
Bromodomain Chromodomain
Regulator ofchromosomecondensationsignatures Protamine P1 signature
Nucleartransition protein1 signature Ribosomalprotein L2signature RibosomalproteinL3
signature
RibosomalproteinL5 signature RibosomalproteinL6signatures Ribosomalprotein L9signature RibosomalproteinLii signature Ribosomalprotein L13signaure RibosomalproteinL14signature Ribosomal protein L15signature Ribosomal protein L16signatures Ribosomal proteinL22signature RibosomalproteinL23signature RibosomalproteinL29signature RibosomalproteinL30signature Ribosomal proteinL33 signature Ribosomalprotein L34signature
RibosomalproteinL19esignature Ribosomalprotein L30esignaure
RibosomalproteinL32esignature Ribosomal proteinL46e signature Ribosomalprotein S3 signatures RibosomalproteinS4signature RibosomalproteinS5signature Ribosomalprotein S7 signature RibosomalproteinS8 signature Ribosomalprotein S9 signature RibosomalproteinS1Osignature
Ribosomalprotein SII signature Ribosomal proteinS12signature Ribosomal protein S13signature Ribosomalprotein S14signature Ribosomalprotein S15 signature RibosomalproteinS16signature RibosomalproteinS17signature RibosomalproteinS18signature Ribosomalprotein Si9 signature Ribosomal protein S4esignaure Ribosomalprotein S6esignature Ribosomal protein S17esignature Ribosomal proteinS19esignature Ribosomal protein S24e signature Ribosomalprotein S26esignature
DNAmismatch repair proteinsmutL/hexB/PMSI signature DNAmismatchrepair proteinsmutSfamily signature RecFproteinsignatures
Small, acid-soluble sporeproteins,alpha/betatype, signatures
Enzymes
Oxidoreductases
Zinc-containingalcoholdehydrogenases signature Iron-containing alcohol dehydrogenases signature Short-chain alcoholdehydrogenase family signature Aldo/keto reductasefamily signatures
Histidinol dehydrogenase activesite L-lactate dehydrogenase activesite
D-isomerspecific2-hydroxyaciddehydrogenases signatures Hydroxymethylglutaryl-coenzyme Areductasessignatures 3-hydroxyacyl-CoAdehydrogenasesignature
Malatedehydrogenase active sitesignature Malicenzymessignature
Isocitrate andisopropyimalate dehydrogenasessignature 6-phosphogluconate dehydrogenasesignature
Glucose-6-phosphate dehydrogenase active site IMPdehydrogenase/GMP reductase signature Bacterialquinoprotein dehydrogenases signatures
FMN-dependent alpha-hydroxy acid dehydrogenases active site GMC oxidoreductasessignatures
Eukaryoticmolybdopterin oxidoreductases signature Prokaryoticmolybdopterin oxidoreductasessignatures Aldehydedehydrogenasesactive sites
Glyceraldehyde 3-phosphatedehydrogenase active site Fumaratereductase/succinatedehydrogenaseFAD-binding site Acyl-CoAdehydrogenasessignatues
Glutamate/Leucine/Phenylalanine dehydrogenases active site D-aminoacid oxidase signature
Delta I-pyrroline-5-carboxylate reductasesignature Dihydrofolate reductase signature
Tetrahydrofolate dehydrogene/cyclohydrolase signatures Pyridine nucleotide-disulphide oxidoreductases class-I activesite Pyridinenucleotide-disulphide oxidoreductases class-II active site Respiratory-chainNADHdehydrogenase subunit 1 signatures Respiratory-chainNADHdehydrogenase 30Kdsubunit signature Respiratory-chainNADHdehydrogenase 49Kdsubunitsignature Respiratory-chainNADHdehydrogenase51 Kdsubunitsignatures Respiratory-chain NADHdehydrogenase 75 Kd subunitsignatures Nitritereductasesandsulfite reductase putative siroheme-binding sites Uricasesignature
CytochromecoxidasesubunitI, copper Bbinding regionsignature Cytochromecoxidase subunit II, copper A binding region signature Multicopperoxidases signatures
Peroxidasessignatures Catalasesignatures
Glutathioneperoxidases signatures
Lipoxygenases,putativeiron-bindingregionsignatures Extradiolring-cleavage dioxygenasessignature Intradiol ring-cleavagedioxygenasessignature
Bacterialring hydroxylatingdioxygenasesalpha-subunit signature Bacterialluciferase subunitssignatre
Biopterin-dependentaromaticamino acid hydroxylases signature CoppertypeII, ascorbate-dependentmonooxygenases signatures Tyrosinasesignatures
Fattyacid desaturasessignatures
Cytochrome P450 cysteine heme-ironligand signature Hemeoxygenasesignature
Copper/Zinc superoxidedismutase signatures Manganese andiron superoxide dismutases signature Ribonucleotide reductase large subunit signature Ribonucleotide reductasesmallsubunit signature
Nitrogenases component 1 alpha and beta subunits signatures NifH/frxC family signatures
Nickel-dependent hydrogenases large subunit signatures Glutamyl-tRNA reductasesignature
Transferases
Thymidylate synthaseactive site
Methylated-DNA--protein-cysteine methyltransferaseactive site N-6 Adenine-specific DNA methylases signature
N4cytosine-specific DNA methylases signature C-5 cytosine-specific DNA methylases signatures
Serine hydroxymethyltransferase pyridoxal-phosphate attachment site Phosphoribosylglycinamide formyltransferase active site
Aspartate andornithinecarbamoyltransferases signature Transketolase signatures
Acyltransferases ChoActase/COT/CPT-II family signatures Thiolases signatures
Chloramphenicol acetyltransferase activesite cysE/lacA/nodL acetyltransferases signature Beta-ketoacyl synthases active site Chalcone and stilbene synthasesactivesite Gamma-glutamyltranspeptidase signature Transglutaminases active site
Phosphorylasepyridoxal-phosphate attachmentsite UDP-glucoronosyl andUDP-glucosyl transferases signature Purine/pyrimidine phosphoribosyl transferases signature Glutamine amidotransferases class-I active site Glutamine amidotransferasesclass-Il active site Thymidine phosphorylase signature
S-Adenosylmethionine synthetase signatures Polyprenyl synthetases signatures
Riboflavinsynthase alphachainfamily Lum-binding sitesignature Dihydropteroate synthase signatures
EPSPsynthase active site
Aspartateaminotransferases pyridoxal-phosphate attachment site Aminotransferasesclass-Ilpyridoxal-phosphateattachment site Aminotransferases class-IIIpyridoxal-phosphateattachment site Aminotransferases class-IV signature
Phosphoserine aminotransferasesignature Hexokinasessignature
Galactokinasesignature
GHMP kinasesputative ATP-bindingdomain Phosphofructokinase signature
pfkB family of carbohydratekinasessignatures Phosphoribulokinase signature
Thymidinekinasecellular-type signature Prokaryotic carbohydrate kinasessignature Protein kinasessignatures
Pyruvatekinase active sitesignature Phosphoglycerate kinase signature Aspartokinasesignature
ATP:guanidophosphotransferasesactive site PTSHprcomponentphosphorylation sitessignatures PTS permeasesphosphorylation sitessignatures Adenylate kinase signature
Nucleosidediphosphatekinasesactivesite Phosphoribosylpyrophosphate synthetase signature
7,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase signature Bacteriophage-typeRNApolymerase familyactive sitesignature EukaryoticRNApolymeraseIIheptapeptiderepeat
EukaryoticRNApolymerases30to40 Kdsubunits signature DNApolymerase familyA signature
DNApolymerase familyB signature DNA polymerasefamilyXsignature
Galactose-l-phosphate uridyltransferase active site
signature
CDP-alcoholphosphatidyltransferases signaturePEP-utilizingenzymessignatures Rhodanesesignatures
Hydrolases
PhospholipaseA2 active sitessignatures Lipases, serine active site
Colipasesignature
Carboxylesterases type-Bactive site Pectinesterase signatures
Alkaline phosphatase active site Histidine acid phosphatases signatures 5'-nucleotidase signatures
Fructose-1-6-bisphosphataseactive site
Serine/threonine specific proteinphosphatasessignature Tyrosine specific protein phosphatases active site Inositol monophosphatasefamily signatures
Prokaryotic zinc-dependent phospholipase C signature 3'5'-cyclic nucleotide phosphodiesterases signature cAMPphosphodiesterases class-Il signature Sulfatasessignatures
AP endonucleases family 1 signatures APendonucleases family 2 signatures
Endonuclease HI iron-sulfur binding region signature RibonucleaseIIIfamily signature
Bacterial Ribonuclease P protein component signature Ribonuclease T2 family histidine active sites Pancreaticribonuclease family signature Beta-amylase signatures
Polygalacturonase active site
Clostridium cellulasesrepeateddomain signature Chitinasesclass Isignatures
Alpha-lactalbumin/lysozyme C signature Alpha-galactosidase signature
Alpha-L-fucosidase putative active site Glycosyl hydrolases family 1 signatures Glycosyl hydrolases family2signatures Glycosyl hydrolases family 3activesite Glycosyl hydrolases family 5signature Glycosyl hydrolases family 6 signatures
Glycosyl hydrolases family 9 active sites signatures Glycosylhydrolases family 10active site
Glycosyl hydrolases family 11 active sitesignatures Glycosyl hydrolases family 17signature
Glycosyl hydrolases family31 signatures Glycosyl hydrolases family 32 active site Alkylbase DNA glycosidases aLkA family signature Uracil-DNA glycosylase signature
S-adenosyl-L-homocysteine hydrolase signatures Cytosol aminopeptidase signature
Aminopeptidase Pandproline dipeptidase signature Methionineaminopeptidasesignature
Serinecarboxypeptidases, active sites
Zinccarboxypeptidases, zinc-binding regions signatures Serineproteases,trypsin family, active sites
Serineproteases, subtilasefamily, active sites Serineproteases,V8family, active sites Prolyl endopeptidase family serineactive site ClpPproteasesactive sites
Eukaryoticthiol(cysteine) proteases active sites
Ubiquitincarboxyl-terminal hydrolase, putativeactive-site signature Eukaryotic aspartylproteasesactive site
Neutral zincmetallopeptidases, zinc-binding region signature Matrixins cysteineswitch
Insulinase family signature recAsignature
Proteasomesubunitssignature Signal peptidasesIsignatures Amidases signature
Asparaginase/glutaminaseactivesite Ureaseactive site
ArgE/dapE/CPG2family signatures Dihydroorotasesignatures
Beta-lactamases classes-A, -C, and -Dactive site Beta-lactamases classBsignatures
Arginaseand agmatinase signatures Adenosine and AMP deaminasesignature Inorganic pyrophosphatasesignature Acylphosphatasesignatures
ATPsynthasealphaand beta subunits
signature
ATP synthase gamma subunit signature ATPsynthasedelta(OSCP) subunitsignature
ATPsynthaseasubunitsignature
ATPsynthasecsubunitsignature
3102 Nucleic Acids Research, 1993, Vol. 21, No. 13
Appendix1. confinued
E1-E2ATPasesphosphorylation site
Sodium andpotassiumATPases beta subunits signatus Cutinase, serine active site
Lyases
DDC/GAD/HDCpyridoxal-phosphate attachmentsite
Prokaryotic
orihine andlysine
decarboxylasespyridoxal-ps
eattachmentsite Orotidine5'-phosphatedecarboxylase active sitePhosphoenolpyruvatecarboxylase active sites Phosphoenolpyruvatecarboxykinase (GTP) signature Phosphoenolpyruvatecarboxykinase (ATP)signature Indole-3-glycerol phosphate synthasesignature
Ribulosebisphosphatecarboxylase large chain active site Fructose-bisphosphate aldolase class-I active site Fructose-bisphosphate aldolase class-H signatures Malate synthase signature
Citratesynthasesignature
KDPG and KHG aldolasesactive sitesignatures Isocitratelyasesignature
DNAphotolyases signatures
Eukaryotic-type carbonicanhydrases signature Prokaryotic-typecarbonicanhydrases signatures Fumaratelyases signature
Aconitasefamily signature Enolasesignature
Serine/threonine dehydratasespyridoxal-phosphateattachment site Enoyl-CoAhydratase/isomerasesignature
Tryptophansynthase alpha chainsignature
Tryptophansynthase beta chainpyridoxal-phosphateattachment site Delta-aminolevulinic aciddehydratase active site
Dihydrodipicolinate synthetase signatures
Phenylalanine and histidineammonia-lyasessignature Porphobilinogendeaminase cofactor-bindingsite Guanylatecyclases signature
Chorismate synthase signatures Ferrochelatasesignature Isomerases
Alanineracemasepyridoxal-phosphate attachmentsite Aldose I-epimeraseputativeactive site
Cyclophilin-typepeptidyl-prolyl cis-trans isomerase signature FKBP-typepeptidyl-prolyl cis-trans isomerase signatures Triosephosphate isomerase active site
Xylose isomerasesignatures Phosphoglucose isomerase signatures
Phosphoglyceratemutasefamily phosphohistidine signature
Phosphoglucomutase &phosphomannomutase phosphohistidinesignature Methylmalonyl-CoAmutasesignature
EukaryoticDNAtopoisomeraseIactive site ProkaryoticDNAtopoisomeraseIactive site DNAtopoisomeraseIIsignature
Amioacyl-ueansfer
RNAsynthetasesclass-Isigntr Aminoacyl-transferRNAsynthetasesclass-I signatures WHEP-TRSdomainsignatureATP-citratelyaseandsuccinyl-CoAligases active site Glutamine synthetase signatures
Ubiquitin-activating enzymesignature Ubiquitin-conjugatingenzymesactive site Formate--tetrahydrofolate ligasesignatures Adenylosuccinate synthetase activesite Argininosuccinate synthase signatures Phosphoribosylglycinamide synthetase signature ATP-dependentDNAligasesignatures Others
sopenicillinN synthetase
signatures
Site-specific recombinasessignatues
Thiamine pyrophosphateenzymessignature Biotin-requiringenzymes attachment site2-oxoaciddehydrogenasesacyltransferasecomponentlipoyl binding site PutativeAMP-bindingdomainsignature
Ekctron transportproteins
Cytochromecfamilyheme-binding sitesignature Cytochromeb5family, heme-binding domainsignatue
Cytochromeb/b6signatures
Cytochrome b559 subunitsheme-binding site signature Thioredoxinfamilyactivesite
Glutaredoxinactive site
Type-l copper(blue)proteins signature
2Fe-2Sferredoxins, iron-sulfurbinding region signature 4Fe-4S ferredoxins, iron-sulfurbinding region signature Highpotential iron-sulfurproteins signature
Rieske iron-sulfur protein signatures Flavodoxinsignature
Rubredoxinsignature
Electron transferflavoproteinalpha-subunit signature Other transportproteins
ClassImetallothioneinssignature Ferritin iron-binding regions signatures Bacterioferritin signature
Transfeffins signatures Planthemoglobins signature Hemerythrinssignature
Arthropodhemocyanins/insect LSPs signatures ATP-bindingproteins 'active transport' family signature
Binding-protein-ependent
transport systems umer membranecomponentsignatue
Serum albuminfamily signatureTransthyretin signatures
Avidin/Streptavidin family signature
Eukaryoticcobalamin-binding proteins signature
Lipocalin
signatureCytosolicfatty-acidbinding proteinssignature LBP/BPI/CETPfamily signature
Plantlipidtransferproteins signature Uteroglobin family signatures
Mitochondrialenergy transferproteinssignature Sugartransportproteins signatures
Sodium symporterssignatures
Prokaryotic sulfate- andthiosulfate-binding proteins signatures Aminoacidpenneases signature
Aromatic amino acidspermeasessignature Anion exchangersfamilysignatrs
BacterialproteinexportpilTproteinfamily signature GltP/dctAfamilyoftransporterssignatures MIPfamily signature
Neurotransmitters transporterssignatures Generaldiffusiongram-negative porins signature Eukaryoticporinsignature
Insulin-like growth factor binding proteins signature Strctura proteins
43Kdpostsynapticprotein signature Actinssignatures
Annexinsrepeated domainsignature Clathrinlightchainssignatures Clusterinsignatures
Connexinssignatures
Crystallinsbeta and gamma 'Greekkey'motifsignature Dynaminfamily signature
Intermediate filaments signature Involucrinsignature
Kinesinmotordomain signature Myelinbasicproteinsignature MyelinP0proteinsignature Myelin proteolipid proteinsignature Neuromodulin(GAP-43) signatures Profilinsignature
SurfactantassociatedpolypeptideSP-C palmitoylation sites Synapsins signatures
Synaptobrevinsignature
Synaptophysin/synaptoporinsignature Tropomyosins signature
Tubulinsubunits alpha, beta, and gammasignature Tubulin-betamRNAautoregulationsignal Tauand MAPproteinsrepeatedregionsignaure Neuraxin andMAP1Bproteins repeated region signature F-actincappingprotein alpha subunitsignatures F-actincapping proteinbetasubunit
signature
Vinculin family signaturesAmyloidogenicglycoprotein signatures
Cadherins extracellular repeateddomain signature Insectflexible cuticleproteins signature Gasvesiclesprotein GVPa signatures
Gas vesiclesproteinGVPc repeated domainsignature Flagella basal body rod proteins signature
Plant virusesicosahedral capsid proteins 'S'regionsignature Potexviruses and carlaviruses coatprotein signature Receptors
Neurotransmitter-gated ion-channels signature G-proteincoupledreceptors signature G-protein coupledreceptors family 2 signatures Visual pigments (opsins) retinalbinding site Bacterial rhodopsinsretinalbinding site Receptor tyrosinekinase class II signature Receptortyrosinekinaseclass III signature Receptortyrosinekinaseclass Vsignatures
Growth factor andcytokinesreceptorsfamilysignatures TNFR/NGFR family cysteine-rich region signature Integrins alpha chain signature
Integrins betachaincysteine-rich domain signature Natriuretic peptidesreceptors signature
Photosynthetic reaction center proteins signature PhotosystemIpsaA and psaBproteins signature Phytochrome chromophore attachmentsite Speractreceptorrepeated domain signature TonB-dependent receptor proteins signature Type-Ilmembrane antigens family signature Bacterialchemotaxis sensorytransducers signature Cytokines and growth factors
Granulinssignature HBGF/FGFfamily signature
PTN/MKheparin-binding protein family signatures Nervegrowthfactor familysignature
Platelet-derivedgrowth factor(PDGF)family signature Small cytokines(intercrine/chemokine) signatures TGF-betafamily signature
TNFfamily signature Wnt-l family signature
Interferon alphaand betafamilysignature
Granulocyte-macrophage colony-stimulating factor signature Interleukin-l signature
Interleukin-2 signature
Interleukin-6/G-CSF/MGF family signature Interleukin-7 signature
Interleukin-10 signature LIF/OSMfamily signature Hormonesandactivepeptides Adipokinetichormonefamily signature Bombesin-likepeptides family signature Calcitonin/CGRP/IAPP family signature Corticotropin-releasing factorfamily signature Graninssignatures
Gastrin/cholecystokinin familysignature Glucagon/GIP/secretin/VIPfamily signature Glycoproteinhormonesalpha chain signatures Glycoprotein hormonesbetachainsignatures Gonadotropin-releasing hormones signature Insulinfamily signature
Natriuretic peptidessignature Neurohypophysial hormones signature Pancreatichormonefamilysignature Parathyroid hormone family signature Pyrokininssignature
Somatotropin,prolactinand relatedhormonessignatures Tachykininfamilysignature
Thymosinbeta-4family signature Cecropin familysignature Mammaliandefensinssignature Insectdefensinssignature Endothelins/sarafotoxins signature Toxins
Plantthionins signature Snaketoxinssignature
Myotoxinssignature
Heat-stableenterotoxinssignature Aerolysintype toxins signature
Shiga/ricin ribosomalinactivatingtoxinsactive site signature Channelforming colicins signature
Hok/gef family cell toxic proteins signature
Staphyloccocal enterotoxins/Streptococcal pyrogenic exotoxinssignatures Thiol-activated cytolysins signature
Membraneattack complexcomponents/perforinsignature Inhibitors
Pancreatic
trypsin
inhibitor(Kunitz) family signature Bowman-Birk serineproteaseinhibitors family signature Kazalserine protease inhibitors family signatureSoybean
trypsin
inhibitor(Kunitz)proteaseinhibitorsfamilysignature Serpins signaturePotatoinhibitor Ifamily signature
Squash family of serineproteaseinhibitorssignature Cysteineproteasesinhibitors signature
Tissueinhibitorsof metalloproteinasessignature Cerealtrypsin/alpha-amylase inhibitors familysignature Alpha-2-macroglobulin family thiolesterregionsignature Disintegrins signature
Lambdoidphages regulatoryproteinCHI signature Others
Pentaxin family signature
Immunoglobulinsandmajorhistocompatbility complex proteins signature Gram-negative piliassemblychaperonesignature
Prion proteinsignatures Cyclinssignature
Proliferating cell nuclearantigensignature Arrestinssignature
Chaperoninscpn60 signature Chaperonins cpnlO signature Chaperonins TCP-1 signatures
Heatshockhsp7Oproteins family signatures Heatshockhsp9Oproteins familysignature DnaJdomains signatures
Protein secYsignatures
CDC48/PASl/SEC18 family signature Ubiquitin signature
Beta-transducinfamilyTrp-Asprepeatssignature RasGTPase-activatingproteins signature
Guanine-nucleotide dissociation stimulatorsCDC25 family signature Guanine-nucleotide dissociationstimulators CDC24 family signature Stathmin familysignature
SRP54-typeproteins GTP-binding domain signature GTP-bindingelongationfactors signature
Eukaryoticinitiation factor5Ahypusinesignature Prokaryotic-typepeptidechainrelease factors signature Calreticulinfamily signatures
S-100/ICaBPtype calciumbindingprotein signature Hemolysin-type putative calcium-binding regionsignature HlyDfamily secretion proteins signature
P-II proteinsignatures 14-3-3proteins signatures Caseinsalpha/beta signature Legumelectins signatures
Vertebrategalactoside-bindinglectinsignature
Lysosome-associatedmembraneglycoproteinssignatures GlycophorinAsignature
SeminalvesicleproteinIrepeatssignature Seminal vesicleproteinII repeatssignature Stress-inducedproteinsSRPl/T1Pl family signature Tissue factorsignature
HCP repeatssignature
Bacterial ice-nucleationproteins octamerrepeat Cell cycleproteins ftsW/rodA/spoVEsignature
Enterobacterialvirulenceoutermembraneproteinsignatures Staphylocoagulaserepeatsignature
11-Splantseedstorageproteins signature Dehydrinssignature
Germinfamily signature
Smallhydrophilic plant seed proteinssignature Pathogenesis-relatedproteins BetvI