Chapter 1
INTRODUCTION
1.1.1 Retroviruses
Viruses lack their own transcription and translation machinery and therefore depend on that of the organisms they infect in order to replicate.
The feature common to all retroviruses, and the one that distinguishes them from other viruses, is that they reverse transcribe their RNA genome into DNA that can be inserted into the host genome 1 (figure 1). For this process, retroviruses encode an RNA-dependent DNA polymerase with an associated RNase H activity, and they package a host-encoded transfer RNA (tRNA) that serves as the primer for reverse transcription.
Retroviruses were discovered and first studied as disease-causing agents, and many decades of research have uncovered the molecular mechanisms governing their spread and replication and the progression of the diseases they cause.
Retroviruses can cause both rapidly and slowly progressing diseases, including various types of tumors and immunodeficiency 2,3.
A principal characteristic of these viruses is that they integrate into the host genome 4. This is why they later came to be used as vectors for gene delivery into cells 5-7.
If the integration event happens in the germline, the retroviral sequences can be passed from one generation to the next, a phenomenon called vertical transmission 8. Because the host species inherits these retroviral sequences in a Mendelian fashion, this feature is exploited to trace the evolution of host genes and to study speciation 9,10.
A subset of retroviruses leads to cell transformation and cancer; these are therefore designated oncoviruses 11. The investigation of the corresponding insertion sites within the host genome thus led to the identification of genes involved in cell growth and tumor promotion 12. Certain oncoviruses have an inherent transforming potential owing to the prior acquisition of host sequences 13. This characteristic is useful for studying the genetic regulation of cell growth 14.
Figure 1: The retroviral life cycle. The main steps of the retroviral replication cycle are depicted. Blue:
capsid; yellow: nucleocapsid; black bars within the nucleocapsid: RNA genome; orange bars: DNA genome.
Courtesy of Prof. Jeremy Luban (adapted).
1.1.2 The discovery of retroviruses and the reverse transcriptase enzyme
The first retroviruses were discovered at the beginning of the twentieth century as oncogenic agents affecting birds. Ellermann and Bang found that leukosis in poultry was caused by a factor present in ultrafiltered cell extracts 15, which was later called Avian Leukemia Virus. A few years later, Rous showed that an agent present in cell-free extracts was the intrinsic cause of sarcoma formation in the fowl 5,16. This retrovirus was named after its discoverer: the Rous Sarcoma Virus (RSV).
Almost thirty years later, a murine virus was found to cause leukemia in mice and was therefore named Murine Leukemia Virus (MLV) 17.
MLV belongs to the Gammaretrovirus genus of the Retroviridae family 17.
The concept of an RNA virus converting its genetic material into DNA and integrating it into the host genome had not yet been formulated. At that time, scientists only knew that the viral agent had an RNA genome rather than a DNA genome 5.
In the early 1960s, the molecular biologist Howard Temin, working with RSV, found that inhibiting DNA synthesis blocked viral replication 18. This led him to propose the provirus hypothesis, namely that retroviruses generate a DNA intermediate in the cells they infect.
In 1970, his team and, independently, David Baltimore, who was working on MLV, published data showing that RNA tumor virus particles, called virions, contain an RNA-dependent DNA polymerase activity: DNA synthesis decreased when the virion RNA was degraded.
This enzyme was later called reverse transcriptase 13,19,20.
In humans, certain types of acute leukemia were studied, and a possible viral cause was soon investigated. Robert Gallo identified a retrovirus with type C morphology (defined below), whose close relative had been found to induce leukemia in gibbon apes, as the cause of this human disease, and it was called Human T cell Leukemia Virus (HTLV) 21.
These findings provided the biochemical and molecular tools that ultimately allowed the identification of Human Immunodeficiency Virus type 1 (HIV-1) as the agent causing Acquired Immunodeficiency Syndrome (AIDS) 22.
In the next section, I will introduce the general retroviral structure.
1.1.3 The general structure of retroviruses
It all starts with the viral RNA. The positive-sense single-stranded RNA genome is composed of regulatory sequences and open reading frames (ORFs) 12 (figure 2), and carries a 5’ cap and a poly(A) tail.
The regulatory elements are located at the ends of the viral RNA and consist of repeated (R) sequences, a unique 5’ sequence (U5) containing a cis-acting attachment (att) site, a unique 3’ sequence (U3), the primer binding site (PBS), the psi packaging signal (ψ) and a polypurine tract (PPT) 12 (figure 2).
The R regions are identical repeats found at both ends of the genome; the 5’ R lies immediately after the m7G5’ppp5’Gmp cap, which mimics the eukaryotic mRNA 5’ cap. The U5 sequence is immediately downstream of the 5’ R sequence and contains the att sequence involved in proviral integration. These regions are followed by the PBS, where the specific tRNA primer hybridizes and initiates synthesis of the minus-strand DNA (–sDNA). The next sequence in the RNA genome is the ψ region, which contains most of the sequences required for packaging the viral genome into viral particles. A major splice donor site, which gives rise to different subgenomic mRNAs, often closely follows this element. Subgenomic RNAs are mRNA species generated by splicing of the full-length viral transcript: they share its 5’ leader and its 3’ end but lack different internal portions, so that different ORFs become available for translation. The generation of various mRNAs allows a large amount of coding information to be condensed into a small genome 12.
The PPT, located near the 3’ end of the viral genome immediately upstream of U3, consists of a stretch of purines (adenine and guanine) and is required to initiate plus-strand DNA (+sDNA) synthesis.
Finally, the U3 region, located upstream of the 3’ R and the poly(A) tail, contains another att site together with a set of cis-regulatory sequences essential for viral gene expression.
Because viral DNA synthesis duplicates the ends of the RNA template, so that U3 is copied to the 5’ end and U5 to the 3’ end, the two ends of the resulting double-stranded DNA (dsDNA) are identical U3-R-U5 sequences called Long Terminal Repeats (LTRs) 12.
The integrated provirus is found in the host genome flanked by its two LTRs 5. When the provirus is transcribed, transcription starts at the U3-R boundary of the 5’ LTR, so that the 5’ U3 region is not copied, and the RNA ends at the R-U5 boundary of the 3’ LTR. In this way, the resulting viral RNA has the same genomic organization as the RNA packaged in viral particles.
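The duplication of the genome ends into two LTRs, and the way transcription of the provirus restores the original RNA layout, can be checked with a toy model. The Python sketch below is an illustration under simplifying assumptions: each region is a symbolic label rather than a nucleotide sequence, gag, pol and env stand in for the whole coding region, and the function names `reverse_transcribe` and `transcribe_provirus` are hypothetical.

```python
# Toy model: each genomic region is a symbolic label, not a nucleotide sequence.
RNA_GENOME = ["R", "U5", "PBS", "psi", "gag", "pol", "env", "PPT", "U3", "R"]

def reverse_transcribe(rna):
    """Return the region layout of the linear dsDNA made by reverse transcription.

    The strand transfers duplicate the genome ends: U3 (from the 3' end) is
    added to the 5' end and U5 (from the 5' end) is added to the 3' end,
    so both ends become U3-R-U5, i.e. the LTRs.
    """
    assert rna[0] == "R" and rna[-1] == "R", "genome must start and end with R"
    u5 = rna[1]    # unique 5' region, just downstream of the 5' R
    u3 = rna[-2]   # unique 3' region, just upstream of the 3' R
    return [u3] + rna + [u5]

def transcribe_provirus(dna):
    """Return the region layout of the RNA transcribed from the provirus.

    Transcription starts at the U3-R boundary of the 5' LTR (the 5' U3 is not
    copied) and the RNA ends at the R-U5 boundary of the 3' LTR.
    """
    return dna[1:-1]

provirus = reverse_transcribe(RNA_GENOME)
ltr_5, ltr_3 = provirus[:3], provirus[-3:]
print(provirus)
# ['U3', 'R', 'U5', 'PBS', 'psi', 'gag', 'pol', 'env', 'PPT', 'U3', 'R', 'U5']
print(ltr_5, ltr_3, ltr_5 == ltr_3)
# ['U3', 'R', 'U5'] ['U3', 'R', 'U5'] True
print(transcribe_provirus(provirus) == RNA_GENOME)
# True
```

The round trip from RNA to provirus and back to RNA returns the original layout, which is the point made in the two preceding paragraphs.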
The viral proteins are encoded by three ORFs, namely the group antigen (gag), the polymerase (pol) and the envelope (env). These genes code for precursors that, once cleaved, give rise to several proteins.
The gag ORF codes for the matrix (MA), the capsid (CA) and the nucleocapsid (NC) 12.
The pol gene products are the protease (PR), the reverse transcriptase (RT), the integrase (IN) and, in some cases, a dUTPase.
Finally, the precursor synthesized from the env gene is cleaved into the surface envelope protein (SU) and the transmembrane envelope protein (TM) 12.
Once processed from their precursors, the viral proteins form the mature virion, which is able to infect susceptible cells that express the appropriate receptors.
The core of a mature viral particle consists of the diploid RNA genome condensed by the NC protein and surrounded by a shell of CA protein. This core is in turn covered by the matrix protein, which lies beneath a host-derived lipid bilayer containing the SU and TM proteins 12.
The viral core also contains the pol-derived proteins required for a new round of replication, namely PR, RT and IN 12.
Figure 2: Schematic view of the proviral genome structure of retroviruses. The retroviral proviral DNA is composed of untranslated regions that flank the ORFs for gag, pro, pol, env and, in some cases, accessory genes. The flanking LTRs contain U3 and U5 regions, as well as a repeat sequence (R). The 5’ LTR is followed by a PBS and a psi encapsidation signal. Adjacent to the last ORF, the viral RNA contains a PPT. ORF: open reading frame; LTR: long terminal repeat; U3 and U5: unique regions 3 and 5, respectively; att: attachment site; PBS: primer binding site; PPT: polypurine tract. Adapted from Fouty and Solodushko, 2011 23.
1.1.4 The reverse transcription process
Once the viral core has entered the cell, reverse transcription of the diploid single-stranded RNA genome, which is still bound to the nucleocapsid (NC) protein, begins 24,25.
Reverse transcription requires several components carried by the viral particle. The central one is the reverse transcriptase enzyme, which catalyzes four different reactions: RNA-dependent DNA polymerization, DNA-dependent DNA polymerization, DNA strand separation via its helicase function, and hydrolysis of the RNA in RNA-DNA heteroduplexes 26. The viral core additionally carries a specific collection of transfer RNA (tRNA) molecules, different cellular messenger RNAs (mRNAs) from the previously infected cell, and small cellular RNAs (5S ribosomal RNA and 7S RNA) 26.
Reverse transcription starts when the 3’ end of a specific tRNA anneals to the PBS near the 5’ end of the viral RNA genome and serves as a primer (figure 3). DNA synthesis continues until the 5’ end of the RNA strand is reached, resulting in a short DNA strand called minus-strand strong-stop DNA (–sssDNA) 27.
The next step exploits the fact that the minus-strand strong-stop DNA contains a copy of the repeat (R) sequence, which is present at both ends of the viral genome and was copied into the new DNA from the 5’ end of the viral RNA. Once the RNase H activity of RT has degraded the RNA to which the newly synthesized DNA is annealed, this copy of R is free to base-pair with the R sequence at the 3’ end of the RNA genome, which allows the short DNA fragment to be transferred there (first strand transfer). This marks the beginning of the elongation of the –sDNA chain, accompanied by degradation of the RNA template by RNase H 27.
During RNA-dependent DNA synthesis, the PPT escapes RNase H degradation, and this RNA fragment is then used as a primer for plus-strand DNA (+sDNA) synthesis, which extends through the U5 region of the –sDNA. In the meantime, the –sDNA continues to be polymerized, with concomitant gradual degradation of the RNA template.
In the following step, +sDNA synthesis proceeds until it has copied the PBS sequence carried by the tRNA primer, and the RNA and tRNA primers are then degraded. Removal of the tRNA exposes the PBS sequence on the +sDNA, which is complementary to the 3’ end of the –sDNA, and the second strand transfer takes place, in which the plus and minus strands anneal through these complementary sequences.
The resulting molecule is a circular DNA intermediate 27.
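Both strand transfers described above depend only on base complementarity between copies of R and of the PBS. The Python sketch below makes this explicit under stated assumptions: the region sequences are short, made-up placeholders (not real viral sequences), the tRNA, RNase H and the circular intermediate are not modelled, and the helper names `revcomp_dna` and `to_dna` are hypothetical.

```python
# Toy model of the two strand transfers of retroviral reverse transcription.
# Region sequences are made-up placeholders (hypothetical), far shorter than
# real viral regions; the tRNA primer and RNase H are not modelled explicitly.

COMPLEMENT = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}

def revcomp_dna(seq):
    """Reverse complement of an RNA or DNA sequence, written as DNA (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def to_dna(rna):
    """DNA with the same sense as an RNA segment (U -> T)."""
    return rna.replace("U", "T")

# Hypothetical genomic RNA regions, each written 5'->3'.
R, U5, PBS = "GGUCUC", "UGGUUA", "UGGCGC"
BODY, PPT, U3 = "AUGGGUGCGAGA", "AAAAGAAAAGGG", "CAGGAA"
genome = R + U5 + PBS + BODY + PPT + U3 + R

# Step 1: primed at the PBS, RT copies U5 and R up to the 5' end of the RNA,
# producing minus-strand strong-stop DNA.
minus_sss = revcomp_dna(R + U5)

# Step 2 (first strand transfer): the 3' end of the strong-stop DNA, which is
# complementary to R, can pair with the R at the 3' end of the genome.
first_transfer_ok = minus_sss.endswith(revcomp_dna(genome[-len(R):]))

# Step 3: minus-strand synthesis continues through U3, PPT, BODY and PBS.
minus_full = minus_sss + revcomp_dna(PBS + BODY + PPT + U3)

# Step 4: primed by the PPT, plus-strand strong-stop DNA copies U3, R and U5
# of the minus strand and the PBS part of the tRNA.
plus_sss = to_dna(U3 + R + U5 + PBS)

# Step 5 (second strand transfer): the PBS copy at the 3' end of the plus
# strand pairs with the PBS complement at the 3' end of the minus strand.
second_transfer_ok = revcomp_dna(plus_sss[-len(PBS):]) == minus_full[-len(PBS):]

# Step 6: both strands are extended, giving linear dsDNA with two LTRs.
plus_final = to_dna(U3 + R + U5 + PBS + BODY + PPT + U3 + R + U5)
minus_final = minus_full + revcomp_dna(U3 + R + U5)
ltr = to_dna(U3 + R + U5)

print(first_transfer_ok, second_transfer_ok)
print(plus_final.startswith(ltr) and plus_final.endswith(ltr))
print(minus_final == revcomp_dna(plus_final))
```

The final check confirms that the extended minus strand is the exact complement of the plus strand, so the product is a fully base-paired molecule with identical U3-R-U5 ends.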
At this point, the replication cycle can lead either to a non-productive, dead-end DNA molecule containing a single LTR, or to a productive DNA form flanked by two LTRs, resulting from the strand displacement of the plus and minus strands.
Figure 3: Conversion of the single-stranded RNA genome of a retrovirus into double-stranded DNA (genome layout: R U5 PBS gag pol env PPT U3 R). Adapted from "HIV-1 Reverse Transcription", Cold Spring Harb Perspect Med 2012;2:a006882.
1.1.5 The classification of retroviruses
Simple retroviruses comprise the Alpharetroviruses, Betaretroviruses and Gammaretroviruses 12.
Alpharetroviruses infect a large range of birds. They assemble at the cell membrane and possess a central spherical core (C-type morphology). They use the tRNA for tryptophan (tRNATrp) to prime reverse transcription. Well-studied members of this genus include Avian Leukosis Virus (ALV) and the previously mentioned RSV 12.
Members of the Betaretroviruses infect different mammalian species, including mice and primates. Morphologically, they can have either an asymmetric round core or a cylindrical one. They contain a dUTPase gene in frame with the pro gene and use tRNALys. The oncovirus Mouse Mammary Tumor Virus (MMTV) is a member of this genus 12.
Gammaretroviruses possess C-type virion morphology. They have two ORFs: the first encodes the gag, pro and pol gene products; the second encodes the envelope proteins. The tRNAs used by these retroviruses are mainly those for proline or glutamine. Well-documented oncogenic members of this genus include Murine Leukemia Virus (MLV), Feline Leukemia Virus (FeLV) and Gibbon Ape Leukemia Virus (GALV) 12.
The group of complex retroviruses is composed of Deltaretroviruses, Epsilonretroviruses, Lentiviruses and Spumaviruses. Deltaretroviruses and Epsilonretroviruses have a similar C-type virion morphology. Members of the first genus encode two accessory proteins named Rex and Tax, which are involved in the synthesis and processing of viral RNA, and use tRNAPro. Examples of this group are the oncovirus Human T-Lymphotropic Virus 1 (HTLV-1) and the closely related HTLV-2. The second genus uses the tRNA for histidine or arginine and additionally codes for three proteins called ORFA, ORFB and ORFC. The functions of these accessory proteins are not well understood, but in the case of the best-studied member, Walleye Dermal Sarcoma Virus (WDSV), ORFA has been shown to be a homologue of mammalian cyclin C, ORFB activates PKC and AKT signaling, and ORFC has oncolytic properties 29.
The AIDS-causing HIV-1 belongs to the genus Lentivirus, characterized by the conical shape of the core of the mature virion. Members of this genus owe their name (from Latin lentus, slow) to the long asymptomatic phase preceding the first symptoms 12. HIV-1 expresses six accessory proteins that will be discussed below. These gene products control transcription, gene expression and assembly, and counteract restriction factors encoded by the host 12. The primer used by lentiviruses is tRNALys3.
In Latin, spuma means foam. Spumaviruses induce vacuolization of infected cells, giving them a foamy histological appearance. Human foamy virus is a well-studied member of this group. The pol gene products are translated from a spliced transcript. Unlike other retroviruses, this genus is characterized by virions that carry large amounts of reverse-transcribed DNA. Accessory proteins shared by the members of this group include a transcriptional transactivator. The primer used is generally tRNALys 12.
1.1.6 The Acquired Immunodeficiency Syndrome (AIDS)
AIDS is a severe disease: according to the UNAIDS 2013 report on the global AIDS epidemic, more than 35 million people around the world are living with HIV 30.
In the early 1980s, young men with symptoms of severe immunodeficiency were hospitalized in Los Angeles, elsewhere in California and in New York 31,32.
As mentioned previously, biochemical and genetic tools for studying retroviruses already existed at that time, and researchers at the Institut Pasteur and in the United States used them to characterize the virus isolated from CD4+ T cells of AIDS patients. Barré-Sinoussi and colleagues isolated and described a virus that could experimentally infect T lymphocytes from cord blood 22 and called it Lymphadenopathy-Associated Virus (LAV).
Robert Gallo's team had suspected that the causative agent of AIDS was a T-cell-tropic retrovirus, but at that time attributed it to the human tumor retrovirus HTLV-I 33. The same team later called the virus HTLV-III.
In 1986, the virus was finally named Human Immunodeficiency Virus (HIV), in reference to the disease it causes 34.
HIV-1 is transmitted from person to person through sexual intercourse, injection with contaminated needles or blood transfusion 35. Mother-to-child transmission during delivery or through breastfeeding is another important route of spread 35.
The first events of HIV-1 infection appear to involve local spread among cells residing in the mucosa and the epithelium, such as dendritic cells (DCs), CD4+ T cells and macrophages 36-38. These primarily infected cells subsequently migrate to the lymphoid organs and disseminate the virus either by direct cell-to-cell contact or by releasing newly produced cell-free virions, which enter new cells 39.
When the HIV-1 gp120/gp41 glycoproteins interact with the lectin receptor DC-SIGN at the surface of DCs, the virus can be endocytosed and degraded, either within lysosomes or via the proteasome 40,41. Another route of entry into DCs is mediated by a host-derived glycosphingolipid present in the virion envelope that binds to an unknown receptor, SIGLEC-1 being a potential candidate 42. This interaction allows the virus to escape degradation and to reach immunological synapses, from which new target CD4+ T cells can be infected 12,43,44.
During the acute phase of infection, a large fraction of CD4+ T cells are infected, and large amounts of virions are produced and released 39. As CD8+ T cells fight the pathogen and high levels of type I interferon (IFN) and other cytokines are released, infected individuals commonly experience flu-like symptoms 45-47. The immune response mediated by cytotoxic T cells and antibody-producing B cells allows CD4+ T cell levels to partially recover over the following weeks 47. At that point, HIV-1 has already integrated into the host chromosomes and latent reservoirs have begun to be established; infected individuals can then remain entirely free of HIV-1-related symptoms for up to about ten years 47. In the meantime, however, the virus continues to replicate and spread through the various lymphoid organs.
At the terminal stage, the disease causes a massive depletion of CD4+ T cells, whose count falls below 200 cells per mm³ of blood, leading to immune suppression and the subsequent unavoidable infection by opportunistic