Immunodeficiency Syndrome (AIDS) 22.
In the next section, I will introduce the general retroviral structure.
1.1.3 The general structure of retroviruses
It all starts with the viral RNA. The positive-sense, single-stranded RNA genome is composed of different regulatory sequences and open reading frames (ORFs) 12 (figure 2), and carries a 5’ cap and a poly(A) tail.
The regulatory elements are located near the ends of the viral RNA and consist of the repeated (R) sequences, a unique 5’ sequence (U5) containing a cis-acting attachment (att) site, a unique 3’ sequence (U3), the primer binding site (PBS), the packaging signal (psi, ψ) and a polypurine tract (PPT) 12 (figure 2).
The two R regions are identical in sequence; the 5’ R lies immediately after the m7G5’ppp5’Gmp cap, which is the same type of cap found at the 5’ end of eukaryotic mRNAs. The U5 sequence is immediately downstream of the 5’ R sequence and contains the att sequence involved in proviral integration. These regions are followed by the PBS, where a specific tRNA primer hybridizes and primes synthesis of the minus-strand DNA (–sDNA). The next element in the RNA genome is the ψ region, which contains most of the sequences required for packaging the viral genome into viral particles. A major splice donor site, which gives rise to the different subgenomic mRNAs, often lies close to this element. Subgenomic mRNAs are generated by splicing from this donor to downstream acceptor sites: they share the same 5’ leader and the same 3’ end, but differ in the internal sequences that are removed, so that different ORFs are placed downstream of the leader. Generating several mRNAs from a single transcript allows a large amount of information to be condensed into a compact genome 12.
The PPT, positioned near the 3’ end of the viral genome immediately upstream of U3, consists of a run of purines (adenine and guanine) and is required to prime +sDNA synthesis.
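To make the notion of a "run of purines" concrete, the Python sketch below scans an RNA sequence for its longest uninterrupted stretch of A or G. The sequence is an invented toy example, not a real viral sequence, and a real PPT is defined by its function (an RNase H-resistant primer), not merely by base composition; longest-run scanning is only an illustration of the composition.

```python
# Toy sketch: locate the longest run of purines (A or G) in an RNA sequence.
# The sequence below is invented for illustration; it is not a real viral sequence.
PURINES = {"A", "G"}


def longest_purine_run(seq: str) -> tuple[int, int]:
    """Return (start, end) of the longest purine-only run; end is exclusive."""
    best_start, best_len = 0, 0
    run_start = None  # start index of the current purine run, if any
    for i, base in enumerate(seq.upper()):
        if base in PURINES:
            if run_start is None:
                run_start = i
            run_len = i - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_start = None  # a pyrimidine (C or U) ends the run
    return best_start, best_start + best_len


toy_rna = "UCUCAAAAGAAAAGGGGGGACUGGAAG"
start, end = longest_purine_run(toy_rna)
print(start, end, toy_rna[start:end])  # 4 20 AAAAGAAAAGGGGGGA
```

If two runs have the same length, the first one is kept, because the comparison uses a strict `>`.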
Finally, the U3 region, located between the PPT and the 3’ R, contains another att site together with a set of cis-regulatory sequences essential for viral gene expression.
Because synthesis of the viral DNA duplicates the ends of the RNA template, with U3 copied onto the 5’ end and U5 copied onto the 3’ end, the two ends of the resulting double-stranded DNA (dsDNA) are identical (U3-R-U5); they are called long terminal repeats (LTRs) 12.
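The net effect of this duplication can be sketched at the level of named segments rather than nucleotides. In the minimal Python model below (a simplification that assumes the segment order described above, with ψ placed before gag and accessory genes omitted), reverse transcription is reduced to its outcome: U3 is copied onto the 5’ end and U5 onto the 3’ end, so both ends become the same U3-R-U5 unit.

```python
# Segment-level sketch (simplified): the packaged RNA genome, 5' -> 3'.
RNA_GENOME = ["R", "U5", "PBS", "psi", "gag", "pol", "env", "PPT", "U3", "R"]


def proviral_dna(rna: list[str]) -> list[str]:
    """Net outcome of reverse transcription: U3 is duplicated onto the 5' end
    and U5 onto the 3' end, so both ends become U3-R-U5."""
    if rna[:2] != ["R", "U5"] or rna[-2:] != ["U3", "R"]:
        raise ValueError("expected a genome of the form R-U5 ... U3-R")
    return ["U3"] + rna + ["U5"]


dna = proviral_dna(RNA_GENOME)
ltr_5, ltr_3 = dna[:3], dna[-3:]
print(ltr_5, ltr_3, ltr_5 == ltr_3)  # ['U3', 'R', 'U5'] ['U3', 'R', 'U5'] True
```

The model deliberately ignores the intermediate steps (strand transfers, RNase H degradation) and only captures why the two ends of the DNA end up identical.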
The integrated provirus is found in the host genome flanked by the two LTRs 5. When the provirus is transcribed, transcription starts at the U3/R boundary of the 5’ LTR, so the 5’ U3 is not included, and the transcript ends at the R/U5 boundary of the 3’ LTR. In this way, the resulting viral RNA has the same genomic organization as the RNA template found in viral particles.
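A segment-level model can illustrate why the transcript of the provirus has the organization of the packaged genome. The Python sketch below assumes the simplified proviral layout U3-R-U5 ... U3-R-U5 and keeps everything from the first R (the U3/R boundary of the 5’ LTR) to the last R (the R/U5 boundary of the 3’ LTR); splicing, capping and polyadenylation are not modeled.

```python
# Segment-level sketch (simplified): an integrated provirus, 5' -> 3'.
PROVIRUS = ["U3", "R", "U5", "PBS", "psi", "gag", "pol", "env", "PPT", "U3", "R", "U5"]


def transcribe_provirus(provirus: list[str]) -> list[str]:
    """Keep the region from the U3/R boundary of the 5' LTR to the R/U5
    boundary of the 3' LTR, i.e. from the first R to the last R inclusive."""
    start = provirus.index("R")  # first R, inside the 5' LTR
    end = len(provirus) - 1 - provirus[::-1].index("R")  # last R, inside the 3' LTR
    return provirus[start : end + 1]


transcript = transcribe_provirus(PROVIRUS)
print(transcript)
# ['R', 'U5', 'PBS', 'psi', 'gag', 'pol', 'env', 'PPT', 'U3', 'R']
```

The output starts with R-U5 and ends with U3-R, which is exactly the layout of the genomic RNA packaged into virions.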
The viral proteins are encoded by three ORFs, namely the group antigen (gag), the polymerase (pol) and the envelope (env). These genes code for precursors that, once cleaved, give rise to more than one protein.
The gag ORF codes for the matrix (MA), the capsid (CA) and the nucleocapsid (NC) 12.
The pol gene products are the protease (PR), the reverse transcriptase (RT), the integrase (IN) and, in some cases, a dUTPase.
Finally, the precursor synthesized from the env gene is cleaved into the surface envelope protein (SU) and the transmembrane envelope protein (TM) 12.
Once processed from their precursors, the viral proteins form the mature virion, which can infect susceptible cells that express the appropriate receptors.
The core of a mature viral particle consists of the diploid RNA genome, condensed through its interaction with NC and surrounded by a shell of CA protein. This core is covered by the matrix protein, which is in turn surrounded by a host-derived lipid bilayer carrying the SU and TM proteins 12.
The viral core also contains the pol-derived enzymes needed for a new round of replication, namely PR, RT and IN 12.
Figure 2: Schematic view of the proviral genome structure of retroviruses. The retroviral proviral DNA is composed of untranslated regions that flank the ORFs for gag, pro, pol, env and, in some cases, accessory genes. The flanking LTRs contain U3 and U5 regions, as well as a repeat sequence (R). The 5’ region of the retroviral genome is followed by a PBS and a psi encapsidation signal. Adjacent to the last ORF, the viral RNA contains a PPT. ORF: open reading frame; LTR: long terminal repeat; U3 and U5: unique regions 3 and 5, respectively; att: attachment site; PBS: primer binding site; PPT: polypurine tract. Adapted from Fouty and Solodushko, 2011 23.
1.1.4 The reverse transcription process
Once the retroviral genome enters the cell, reverse transcription of the diploid single-stranded genome begins while the genome is still bound to the nucleocapsid (NC) protein within the viral core 24,25.
Reverse transcription requires several elements carried by the viral particle. The central component is the reverse transcriptase enzyme, which catalyzes four different reactions: RNA-dependent and DNA-dependent DNA polymerization, DNA strand displacement, and hydrolysis of the RNA strand in RNA–DNA heteroduplexes (RNase H activity) 26. The viral core additionally carries a specific set of transfer RNA (tRNA) molecules, various cellular messenger RNAs (mRNAs) from the cell in which the virus was produced, and small cellular RNAs such as 5S ribosomal RNA and 7S RNA 26.
Reverse transcription starts when the 3’ end of a specific tRNA, annealed to the PBS near the 5’ end of the viral RNA genome, is used as a primer (figure 3). DNA synthesis continues until the 5’ end of the RNA template is reached, producing a short DNA strand called minus-strand strong-stop DNA (–sssDNA) 27.
The next step takes advantage of the fact that the minus-strand strong-stop DNA contains the complement of the repeat (R) sequence, which is present at both termini of the viral genome and was copied into the new DNA during reverse transcription of the 5’ end of the viral RNA. After the RNase H activity of RT has degraded the RNA to which the newly synthesized DNA is annealed, this complementarity allows the short DNA to be transferred to the 3’ end of the RNA genome (first strand transfer). Elongation of the –sDNA then resumes along the genome, accompanied by RNase H degradation of the RNA template 27.
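The first strand transfer relies only on base complementarity between the copy of R in the minus-strand strong-stop DNA and the R at the 3’ end of the genome. The Python sketch below checks this with invented toy sequences (the R, U5, U3 and internal sequences are hypothetical, and the tRNA primer is ignored): it builds the strong-stop DNA as the reverse complement of the 5’ R-U5 region and verifies that its 3’ end is the exact antiparallel complement of the 3’ R.

```python
# Toy sketch of the first (minus-strand) transfer. All sequences are invented.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}  # DNA base paired with each RNA base
DNA_TO_RNA = {"T": "A", "A": "U", "C": "G", "G": "C"}  # RNA base paired with each DNA base


def reverse_transcribe(rna: str) -> str:
    """DNA complementary to an RNA template, written 5' -> 3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna))


def anneals_perfectly(dna: str, rna: str) -> bool:
    """True if dna (5'->3') is the exact antiparallel complement of rna (5'->3')."""
    return len(dna) == len(rna) and all(
        DNA_TO_RNA[d] == r for d, r in zip(dna, reversed(rna))
    )


# Hypothetical toy segments (not real viral sequence).
R = "GGUCUCUCUG"
U5 = "GUUAGACCAG"
INTERNAL = "AUCUGAGCCUGGGAGCUC"  # stands in for PBS ... PPT
U3 = "ACAAGUAGUG"
genome = R + U5 + INTERNAL + U3 + R  # 5' R-U5 ... U3-R 3'

# Strong-stop DNA: copy of the 5' R-U5 region; its 3' end is the copy of R.
strong_stop = reverse_transcribe(R + U5)
r_copy = strong_stop[-len(R):]

# Once RNase H removes the 5' RNA, r_copy can pair with the R at the 3' end.
print(anneals_perfectly(r_copy, genome[-len(R):]))  # True
```

Because the strong-stop DNA is synthesized toward the 5’ end of the RNA, U5 is copied first and R last, which is why the R copy sits at the 3’ end of the DNA, the end that must pair with the acceptor template.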
During RNA-dependent DNA synthesis, the PPT resists RNase H degradation, and this RNA fragment is then used as a primer for plus-strand DNA (+sDNA) synthesis, which extends until it reaches the U5 region of the –sDNA. Meanwhile, –sDNA synthesis continues, with gradual degradation of the RNA template.
In the following step, +sDNA synthesis proceeds until it has copied the PBS sequence from the tRNA primer, after which the RNA and tRNA primers are degraded. Removal of the tRNA exposes the PBS sequence at the 3’ end of the +sDNA, which is complementary to the 3’ end of the –sDNA; this allows the second strand transfer, in which the plus and minus strands anneal.
The resulting molecule is a circular DNA intermediate 27.
This point of the viral replication cycle can lead either to a non-productive, dead-end DNA molecule that contains a single LTR or to a productive DNA form flanked
by two LTRs, resulting from the strand displacement of the plus and minus
Embedded figure from “HIV-1 Reverse Transcription”, Cold Spring Harb Perspect Med 2012;2:a006882 (its Figure 1; genome map: R U5 PBS gag pol env PPT U3 R): Conversion of the single-stranded RNA genome of a retrovirus into double-stranded DNA. (A) The RNA genome of a retrovirus (light blue) with a tRNA primer base paired near the 5’ end. (B) RT has initiated reverse transcription, generating minus-strand DNA (dark blue), and the RNase H activity of RT has degraded the RNA template (dashed line). (C) Minus-strand transfer has occurred between the R sequences at both ends of the genome, allowing minus-strand DNA synthesis to continue (D), accompanied by RNA degradation. A purine-rich sequence (ppt), adjacent to U3, is resistant to RNase H cleavage and serves as the primer for the synthesis of plus-strand DNA (E). Plus-strand synthesis continues until the first 18 nucleotides of the tRNA are copied, allowing RNase H cleavage to remove the tRNA primer. Most retroviruses remove the entire tRNA; the RNase H of HIV-1 RT leaves the rA from the 3’ end of the tRNA attached to minus-strand DNA. Removal of the tRNA primer sets the stage for the second (plus-strand) transfer (F); extension of the plus and minus strands leads to the synthesis of the complete double-stranded linear viral DNA (G).
1.1.5 The classification of retroviruses