
Part III: Tools for Bioinformatics

Chapter 7. Sequence Analysis, Pairwise Alignment, and Database Searching

7.4 Development of DNA Sequencing Methods

If you simply digest DNA into its four component bases and measure the quantity of each, you learn nothing about the DNA sequence. Modern DNA sequencing methods instead rely on controlled biochemical reactions that allow the base at each position in the sequence to be identified independently. The chemical cleavage method, developed by Allan Maxam and Walter Gilbert, relies on the specificity of chemical reagents (reactive substances) that break DNA chains at four specific types of sites. Some reagents break, or cleave, the chain specifically after G nucleotides, and others cleave specifically after C nucleotides.
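To see why base composition alone is not enough, note that two different sequences can have identical counts of A, C, G, and T. The short Python sketch below uses the standard library's `collections.Counter`; the function name `base_composition` and the two sequences are made up for illustration.

```python
from collections import Counter

def base_composition(seq: str) -> Counter:
    """Count how many times each nucleotide occurs in a DNA sequence."""
    return Counter(seq.upper())

# Two different (hypothetical) sequences with the same base composition.
seq1 = "ACGTACGT"
seq2 = "TTGGCCAA"

print(base_composition(seq1) == base_composition(seq2))  # True: same counts
print(seq1 == seq2)                                      # False: different order
```

Digesting and counting gives you only the `Counter`, which is identical for both sequences; the order of the bases is lost.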

There are also reagents that cleave less specifically: one cleaves after either A or G nucleotides, and another after either C or T nucleotides. The method Maxam and Gilbert designed was conceptually simple. It requires four samples of the same DNA. One type of reagent is mixed with each sample, in a quantity that causes each DNA chain in the sample to be broken only once, on average, at a random site of the type that reagent attacks. One end of the DNA is radioactively labeled and the other is not, so after cleavage only one piece of each broken chain is radioactive. DNA fragments of different sizes can be separated by using an electric current to drive them through a viscous medium called a gel. The larger the fragment, the more the gel slows it down, so after a fixed running time the radioactive pieces of different sizes are spread out at regular intervals down the gel. Figure 7-5 shows a partial autoradiogram of a DNA sequencing gel.

If each DNA chain is broken once after a random A, C, G, or T, the result is a uniform distribution of fragments that together map the entire DNA sequence. The sample a radioactive fragment comes from tells you the last base in its sequence: a band in both the G lane and the A+G lane marks a G, a band in the A+G lane alone marks an A, a band in both the C lane and the C+T lane marks a C, and a band in the C+T lane alone marks a T. Reading the bands in order of fragment size gives the sequence from end to end.

Figure 7-5. DNA sequencing gel. A partial autoradiogram: each set of four closely spaced lanes represents an individual sequencing experiment. The gel is read from bottom to top. Each band identifies the nucleotide present at one position in the sequence, depending on which of the four lanes it appears in. (Image courtesy of Dr. Dennis Dean, Virginia Tech.)
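The lane-reading logic of the chemical cleavage method can be sketched as a toy model in Python. It makes simplifying assumptions that are not from the original text: the labeled end is position 1, a cut "after" a base yields a labeled fragment whose length equals that base's position, and the real chemistry (which destroys the cleaved base) is ignored. The function names and lane labels `G`, `A+G`, `C+T`, and `C` are illustrative.

```python
def cleavage_lanes(seq: str) -> dict[str, set[int]]:
    """Simulate the four reactions: for each lane, the set of labeled
    fragment lengths (1-based positions of the bases the reagent cut after)."""
    targets = {"G": "G", "A+G": "AG", "C+T": "CT", "C": "C"}
    return {
        lane: {i + 1 for i, base in enumerate(seq) if base in bases}
        for lane, bases in targets.items()
    }

def read_gel(lanes: dict[str, set[int]]) -> str:
    """Read from the shortest fragment (bottom of gel) to the longest (top)."""
    n = max(max(lanes["A+G"], default=0), max(lanes["C+T"], default=0))
    read = []
    for length in range(1, n + 1):
        if length in lanes["G"]:
            read.append("G")      # band in G (and therefore also in A+G)
        elif length in lanes["A+G"]:
            read.append("A")      # band only in A+G
        elif length in lanes["C"]:
            read.append("C")      # band in C (and therefore also in C+T)
        elif length in lanes["C+T"]:
            read.append("T")      # band only in C+T
        else:
            read.append("N")      # no band: base cannot be called
    return "".join(read)

lanes = cleavage_lanes("GATTACA")
print(read_gel(lanes))  # GATTACA
```

Only two base-specific reactions are needed because the two less specific lanes resolve the remaining bases by elimination.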

Safari | Developing Bioinformatics Computer Skills -> 7.4 Development of DNA Sequencing Methods

http://safari.oreilly.com/main.asp?bookname=bioskills&snode=75 (1 of 5) [6/2/2002 1:21:47 PM]


The chain-terminator procedure developed by Frederick Sanger is the most commonly used sequencing chemistry in modern laboratories. This procedure takes advantage of an enzyme called DNA polymerase, which builds a strand of DNA complementary to an existing single strand. In Sanger's method, the DNA polymerase reaction is carried out in the presence of specific nucleotide analogues (dideoxynucleotides) that stop synthesis of the complementary strand when they are incorporated, because they lack the 3'-hydroxyl group needed to attach the next nucleotide. Four samples are prepared, each containing a small amount of one type of chain terminator. As in the Maxam and Gilbert method, this generates a uniform distribution of DNA fragments, each with a known end residue. In automated systems the chain terminators typically carry fluorescent labels, so after the fragments are separated by size, the fluorescence signal of each fragment identifies its final base, giving the sequence of the strand complementary to the original DNA.
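The chain-termination idea can be sketched as a toy Python model, assuming one reaction per terminator base (like the four samples described above) and ignoring the primer, dye chemistry, and electrophoresis. The new strand is synthesized antiparallel to the template, so the read is reverse-complemented at the end to recover the original. All names here are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def chain_termination(template: str) -> dict[str, set[int]]:
    """Simulate four terminator reactions on a template written 5'->3'.

    The new strand (synthesized 5'->3') is the reverse complement of the
    template. Each reaction records the lengths of the new strands that
    stopped by incorporating that reaction's terminator base.
    """
    new_strand = reverse_complement(template)
    return {
        base: {i + 1 for i, b in enumerate(new_strand) if b == base}
        for base in "ACGT"
    }

def read_sequence(reactions: dict[str, set[int]]) -> str:
    """Order all fragments by length and read the terminating base of each."""
    by_length = {n: base for base, lengths in reactions.items() for n in lengths}
    new_strand = "".join(by_length[n] for n in sorted(by_length))
    return reverse_complement(new_strand)  # back to the original template

reactions = chain_termination("GATTACA")
print(read_sequence(reactions))  # GATTACA
```

Sorting the fragment lengths plays the role of the gel or capillary, and the reaction each length came from plays the role of the fluorescent color.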

The chain-termination method is easily automated, and sequencing systems that use it and feed their data directly to a computer are readily available. Most genome sequence data is currently generated this way, though new sequencing methods that involve neither chain cleavage nor chain termination are in development. We discuss the process of sequencing data analysis and genome assembly further in Chapter 11.


7.4.1 The Chemical Composition of Proteins

Like DNA, proteins are polymers built from a common set of building blocks, but the building blocks are amino acids rather than nucleotides, and there are 20 of them rather than four. These 20 amino acids make up the standard chemical alphabet used to build proteins. Amino acids are small molecules that share a common motif: three substituent chemical groups (plus a hydrogen atom) arranged around a central carbon atom. One of the substituent groups is always an amino group; another is always a carboxylic acid group. To form the protein polymer, the amino group of one amino acid reacts with the carboxyl group of the next, forming a bond called the peptide bond. The third substituent on the central carbon, the side chain, varies from one amino acid to another, and it's this variability that lets a sequence of amino acids serve as a code for storing information. The sequence of amino acids in a protein is referred to as the protein's primary structure. Protein sequences can be subjected to many of the same analyses (described later) as DNA sequences. As we describe sequence analysis methods, we will point out ways in which these methods differ for proteins and DNA.
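Forming a peptide bond releases one molecule of water, so a linear chain of n amino acids has n - 1 peptide bonds. The Python sketch below uses that fact to estimate a peptide's mass. The one-letter codes are the standard amino acid abbreviations; the masses are approximate average molecular masses of the free amino acids as commonly tabulated, included only for illustration (they do not appear in the original text), and `peptide_mass` is a hypothetical helper.

```python
# Approximate average molecular masses (g/mol) of the 20 free amino acids,
# keyed by one-letter code. Rounded values, for illustration only.
AA_MASS = {
    "G": 75.07, "A": 89.09, "S": 105.09, "P": 115.13, "V": 117.15,
    "T": 119.12, "C": 121.16, "L": 131.17, "I": 131.17, "N": 132.12,
    "D": 133.10, "Q": 146.15, "K": 146.19, "E": 147.13, "M": 149.21,
    "H": 155.16, "F": 165.19, "R": 174.20, "Y": 181.19, "W": 204.23,
}
WATER = 18.02  # g/mol, released each time a peptide bond forms

def peptide_mass(sequence: str) -> float:
    """Estimate a linear peptide's mass: sum of the free amino acids minus
    one water per peptide bond (n residues form n - 1 bonds)."""
    sequence = sequence.upper()
    unknown = set(sequence) - set(AA_MASS)
    if unknown:
        raise ValueError(f"not standard amino acids: {sorted(unknown)}")
    bonds = max(len(sequence) - 1, 0)
    return sum(AA_MASS[aa] for aa in sequence) - bonds * WATER

print(len(AA_MASS))                  # 20
print(round(peptide_mass("GA"), 2))  # 146.14 (glycylalanine)
```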

7.4.2 Mechanisms of Molecular Evolution

The discovery of DNA as the molecular basis of heredity and evolution made it possible to understand the process of evolution in a whole new way. Darwin's theory of evolution by natural selection describes the observable process of evolution and speciation. However, it doesn't explain how information is passed from generation to generation, nor does it explain the mechanisms that give rise to, or that limit, variation within each generation.

Each of the two strands of the double-helical DNA molecule serves as a template for replication. Even though the molecular rules governing DNA replication are specific, replication doesn't always occur with perfect fidelity. When a piece of DNA is replicated incorrectly and the error is not corrected by the cell's repair machinery, the resulting change is called a mutation.
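Template-directed replication with occasional unrepaired errors can be sketched in Python as follows. This is a toy model under stated assumptions: each position independently receives a wrong base with probability `error_rate` (the 0.05 used here is arbitrary and far higher than real error rates), and strand orientation, proofreading, and repair are ignored. The function `replicate` is hypothetical.

```python
import random

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def replicate(template: str, error_rate: float, rng: random.Random) -> str:
    """Build the strand complementary to `template`, base by base.

    With probability `error_rate` per position, a mismatched base is
    incorporated instead of the correct partner.
    """
    new_strand = []
    for base in template:
        correct = PAIR[base]
        if rng.random() < error_rate:
            wrong = [b for b in "ACGT" if b != correct]
            new_strand.append(rng.choice(wrong))  # unrepaired mismatch
        else:
            new_strand.append(correct)
    return "".join(new_strand)

rng = random.Random(42)  # fixed seed so the run is reproducible
template = "ATGGCCATTGTAATGGGCCGC"
copy = replicate(template, error_rate=0.05, rng=rng)
mismatches = [i for i, (t, c) in enumerate(zip(template, copy)) if PAIR[t] != c]
print(copy)
print("mismatched positions:", mismatches)
```

Each mismatched position is a candidate point mutation; if the copy is itself used as a template, the change is carried forward.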

Mutations can occur in any part of an organism's DNA: within genes that code for proteins or functional RNA molecules, within regulatory sequences that govern when a gene is turned on, or out in the "middle of nowhere", in the regions between genes. Mutations can have dramatic effects on the organism's phenotype (its visible or measurable characteristics), or they can have no apparent effect. Over thousands or millions of years, mutations that are beneficial, or at least not harmful, to a species can become fixed in the population, meaning that the mutated form of the gene comes to be carried by essentially all individuals of that species. Over longer time scales, enough mutations may accumulate that new species develop.
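How a harmless mutation can become fixed by chance alone can be illustrated with the Wright-Fisher model from population genetics, which we swap in here as an assumption; the original text does not name a model. The sketch assumes a haploid population of constant size, no selection (a neutral mutation), and one mutant copy at the start. `drift_to_fixation` is a hypothetical helper.

```python
import random

def drift_to_fixation(pop_size: int, initial_copies: int,
                      rng: random.Random) -> tuple[int, int]:
    """Neutral Wright-Fisher drift of one mutant allele.

    Each generation, every individual independently inherits the mutant
    allele with probability equal to its current frequency. Runs until the
    allele is lost (0 copies) or fixed (pop_size copies) and returns
    (generations elapsed, final copy count).
    """
    copies = initial_copies
    generations = 0
    while 0 < copies < pop_size:
        freq = copies / pop_size
        copies = sum(1 for _ in range(pop_size) if rng.random() < freq)
        generations += 1
    return generations, copies

rng = random.Random(7)
outcomes = [drift_to_fixation(50, 1, rng)[1] for _ in range(1000)]
fixed = sum(1 for c in outcomes if c == 50)
# For a neutral mutation, theory predicts fixation in about 1 run in 50.
print(f"fixed in {fixed} of 1000 runs")
```

Most new mutations are simply lost; the few that are fixed replace the original form in every individual, which is what "fixed" means here.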

There are two classes of mutations: point mutations, in which a change affects a single nucleotide in the DNA sequence, and segmental mutations, which can affect anywhere from a few adjacent nucleotides to much longer stretches of DNA.

Point mutations usually result from a single mismatch, in which one nucleotide is mispaired with the template DNA as a new complementary strand is being built. Point mutations become significant only if they occur within a coding region or signal sequence, and then only if they cause a change in function. In coding regions, point mutations can be either synonymous, meaning that the mutated codon codes for the same amino acid as before, or nonsynonymous, meaning that it codes for a different amino acid (or a stop signal). The genetic code (shown back in Figure 2-3) is degenerate; that is, most amino acids are specified by more than one three-letter codon. The groups of codons that code for each amino acid are by no means random: many codons for the same amino acid differ by only a single nucleotide, most often at the third position. This arrangement acts as a fail-safe, because many single-nucleotide replication errors turn one codon into another codon for the same amino acid and are therefore synonymous.
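Whether a point mutation is synonymous can be checked mechanically against the genetic code. The Python sketch below builds the standard genetic code from the common compact encoding (codons enumerated in T, C, A, G order at each position, with `*` for stop) and classifies single-base changes. The function names are hypothetical, and only the standard code is assumed.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order at each position,
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_point_mutation(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base change within one codon (position 0, 1, or 2)."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[mutant]:
        return "synonymous"
    return "nonsynonymous"  # includes changes to a stop codon ('*')

def count_synonymous(codon: str) -> int:
    """How many of the 9 possible single-base changes keep the amino acid."""
    return sum(
        classify_point_mutation(codon, pos, b) == "synonymous"
        for pos in range(3)
        for b in "ACGT"
        if b != codon[pos]
    )

print(classify_point_mutation("GCT", 2, "C"))  # synonymous (Ala -> Ala)
print(classify_point_mutation("GCT", 0, "A"))  # nonsynonymous (Ala -> Thr)
print(count_synonymous("GCT"))                 # 3: every third-position change
```

For the alanine codon GCT, all three third-position changes are synonymous and all six first- and second-position changes are not, which is the third-position tolerance described above.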

Segmental mutations, which can result in insertion or deletion of long stretches of DNA, can occur by many different mechanisms, all of which involve mismatching of a strand of DNA either with the wrong partner or with a part of itself. Segmental mutations can result in duplications of whole genes or even large regions of chromosomes; some genetic events can even result in the duplication of entire genomes. Redundant gene copies produced by gene and chromosome duplication can be repurposed, through a slow process of mutational trial and error, to perform new functions in the cell. A detailed discussion of these mechanisms is given in the excellent book Fundamentals of Molecular Evolution; see the Bibliography.
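The outcomes of segmental mutations (not the mispairing mechanisms that cause them) can be modeled as simple string operations. In this Python sketch the sequence and coordinates are hypothetical, and indices are 0-based with exclusive end positions.

```python
def delete_segment(seq: str, start: int, end: int) -> str:
    """Remove seq[start:end] (a deletion)."""
    return seq[:start] + seq[end:]

def insert_segment(seq: str, pos: int, segment: str) -> str:
    """Insert `segment` before index `pos` (an insertion)."""
    return seq[:pos] + segment + seq[pos:]

def tandem_duplicate(seq: str, start: int, end: int) -> str:
    """Copy seq[start:end] and place the copy right after the original."""
    return insert_segment(seq, end, seq[start:end])

gene = "ATGAAACCCGGGTAA"  # hypothetical sequence
print(delete_segment(gene, 3, 6))      # ATGCCCGGGTAA
print(insert_segment(gene, 3, "TTT"))  # ATGTTTAAACCCGGGTAA
print(tandem_duplicate(gene, 3, 9))    # ATGAAACCCAAACCCGGGTAA
```

Applied to a span covering a whole gene, `tandem_duplicate` yields the redundant copy that later mutations can repurpose.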

Both types of mutation leave traces in the evolutionary record, that is, in the DNA sequences of living things. Because mutations tend to be preserved only if they are functionally useful (or at least not harmful), functionally important parts of sequences tend to be conserved (to remain constant throughout the evolutionary process), while noncoding or nonfunctional sequences diverge much more freely. This tendency to conserve functionally important sequences is the basis of the whole field of sequence analysis: it lets us draw evolutionary connections between genes that are related in sequence.
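One simple way to see conservation in aligned sequences is to measure, for each column, the fraction of sequences that share the most common residue. This Python sketch is one of several possible conservation measures, chosen for simplicity; the aligned sequences are made up, and gap characters (`-`) are counted like any other residue.

```python
from collections import Counter

def column_conservation(alignment: list[str]) -> list[float]:
    """For each column of an alignment (equal-length rows, '-' for gaps),
    return the fraction of rows sharing the column's most common residue."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores

aligned = ["ACGTTGCA", "ACGATGCA", "ACCTTGAA"]  # hypothetical aligned sequences
scores = column_conservation(aligned)
print([round(s, 2) for s in scores])  # [1.0, 1.0, 0.67, 0.67, 1.0, 1.0, 0.67, 1.0]
print(sum(s == 1.0 for s in scores))  # 5 fully conserved columns
```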

By comparative study of DNA sequences, and on a larger scale, of whole genomes, it's possible to develop quantitative methods for understanding when and how mutational events occurred, as well as how and why they were preserved in existing species and populations. Genomics and bioinformatics (the production of genome data and the development of tools for analyzing it) have made it possible to examine the evolutionary record and make increasingly quantitative statements about the evolutionary relationship of one species to another. Taxonomies can begin to be based not merely on anatomy but on quantitative measurements of differences between genetic sequences. Both point mutations and segmental mutations are explicitly modeled in the scoring schemes for comparing protein and DNA sequences discussed later in this chapter. A change in the identity of the residue (nucleotide or amino acid) at a given position in the sequence is scored using simple substitution scores (for example, a positive score for a match and a negative score for a mismatch) or a substitution matrix. Insertions and deletions are scored with penalties for gap opening and gap extension.
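The scoring scheme just described can be sketched in Python for an alignment that has already been made, rather than for finding the best alignment. The parameter values (match +1, mismatch -1, gap opening 5, gap extension 1) are arbitrary illustrations, and so is the convention that a gap of length k costs `gap_open + (k - 1) * gap_extend`; other programs define the affine penalty slightly differently. `score_alignment` is a hypothetical helper.

```python
def score_alignment(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap_open: int = 5, gap_extend: int = 1) -> int:
    """Score two already-aligned sequences of equal length ('-' marks a gap).

    Substitutions add `match` for identical residues, `mismatch` otherwise.
    A run of k gap characters in one sequence costs
    gap_open + (k - 1) * gap_extend.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    in_gap_a = in_gap_b = False
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            raise ValueError("column with gaps in both sequences")
        if x == "-":
            score -= gap_extend if in_gap_a else gap_open
            in_gap_a, in_gap_b = True, False
        elif y == "-":
            score -= gap_extend if in_gap_b else gap_open
            in_gap_a, in_gap_b = False, True
        else:
            score += match if x == y else mismatch
            in_gap_a = in_gap_b = False
    return score

print(score_alignment("ACCGT", "A--GT"))  # 1 - 5 - 1 + 1 + 1 = -3
print(score_alignment("ACCGT", "AC-GT"))  # 1 + 1 - 5 + 1 + 1 = -1
```

Charging more to open a gap than to extend it means one two-base deletion scores better than two separate one-base deletions, reflecting the view that a single segmental mutation is more likely than two independent ones.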


Last updated on 10/30/2001 Developing Bioinformatics Computer Skills, © 2002 O'Reilly


© 2002, O'Reilly & Associates, Inc.
