Patrocles: a database of polymorphic miR-mediated gene regulation in vertebrates
Denis Baurain
Samuel Hiard
Wouter Coppieters
Carole Charlier
Michel Georges
Polymorphic miR-mediated gene regulation
[Figure: miR biogenesis and action. In the nucleus, the pri-miR (from a host gene?) is cleaved by the Drosha complex into a pre-miR, which Exportin5 exports to the cytoplasm, where Dicer cuts it into a miR/miR* duplex; after unwinding by a helicase, the mature miR is loaded into the miRNP, which binds the 3’-UTR of the target mRNA.]
DNA Sequence Polymorphisms (DSPs) can act in three compartments:
  targets (1)
  miRs (100s)
  silencing machinery (overall effect)
Considerable sequence space is devoted to miR-mediated gene regulation (targets, miRs, silencing machinery).
DSPs in silencing components are likely to contribute to (complex) phenotypic variation, including disease.
Proof in animals: Texel sheep, Clop et al. (2006) Nat. Genet. 38:813-818
Suggestions in humans: Sethupathy & Collins (2008) TIG 24:489-497
http://www.patrocles.org/
Mining public databases for SNPs and other DSPs in the 3 sequence compartments
Patrocles - Overview
Data sources mined by Patrocles:
  miRBase: miRs, 8nt motifs
  UCSC: alignments
  Ensembl: 3’-UTRs, SNPs
  SymAtlas: gene expr.
  GEO: gene expr.
  DGV: CNVs
  HapMap: genotypes
  1000 genomes: allele freqs
  Literature: 8nt motifs, miR expr., CNVs, eQTL, machinery
Updates and Synchronization...
Currently, 7 species
human
chimp
mouse
rat
dog
cow
chicken
                                           mouse             human
3’-UTRs
  genes                                   21,911            24,319
  sequence space (nt)                 21,634,548        26,261,732
SNPs in 3’-UTRs
  total                                  126,589           136,159
  known ancestral allele         111,178 (87.8%)   114,305 (83.9%)
target site motifs
  X-octamers                                 540               540
  miR                                        484               676
  miR*                                       117               170
  L-octamers                                 466               683
  X OR L-octamers                            948             1,164
  X AND L-octamers                            58                59
target sites in 3’-UTRs
  X-targets                              267,644           323,833
  L-targets                              219,392           375,054
  X OR L-targets                         455,620           661,187
  X AND L-targets                         31,416            37,700
  conserved X AND L-targets                9,436    10,425 (27.7%)
  conserved X NOT L-targets               57,154    64,010 (22.4%)
  conserved L NOT X-targets               19,595     30,290 (9.0%)
  sequence space (nt)          2,674,395 (12.4%) 4,072,176 (15.5%)
Percentages: known ancestral allele, of all 3’-UTR SNPs; target-site sequence space, of 3’-UTR sequence space; conserved targets (human only), of the corresponding X AND L, X NOT L or L NOT X category.
Friedman et al. (2009) Genome Res. 19:92-105
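The X OR L rows of the summary table follow from inclusion-exclusion, |X OR L| = |X| + |L| - |X AND L|, and the human percentages for conserved targets match the size of the corresponding unconserved category. The short check below uses only numbers copied from the table; the helper names are illustrative, not part of Patrocles.

```python
"""Consistency checks on the mouse/human summary table (values copied from it)."""

# (mouse, human) counts from the table
COUNTS = {
    "X-octamers": (540, 540),
    "L-octamers": (466, 683),
    "X AND L-octamers": (58, 59),
    "X OR L-octamers": (948, 1_164),
    "X-targets": (267_644, 323_833),
    "L-targets": (219_392, 375_054),
    "X AND L-targets": (31_416, 37_700),
    "X OR L-targets": (455_620, 661_187),
}


def union_size(n_x: int, n_l: int, n_both: int) -> int:
    """Inclusion-exclusion: |X OR L| = |X| + |L| - |X AND L|."""
    return n_x + n_l - n_both


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal, as printed in the table."""
    return round(100 * part / whole, 1)


# The X OR L rows follow from the X, L and X AND L rows for both species.
for kind in ("octamers", "targets"):
    for i in (0, 1):  # 0 = mouse, 1 = human
        assert union_size(COUNTS[f"X-{kind}"][i], COUNTS[f"L-{kind}"][i],
                          COUNTS[f"X AND L-{kind}"][i]) == COUNTS[f"X OR L-{kind}"][i]

# Human conserved targets as a share of the matching category.
x_h, l_h, both_h = (COUNTS["X-targets"][1], COUNTS["L-targets"][1],
                    COUNTS["X AND L-targets"][1])
print(pct(10_425, both_h))        # conserved X AND L-targets
print(pct(64_010, x_h - both_h))  # conserved X NOT L-targets
print(pct(30_290, l_h - both_h))  # conserved L NOT X-targets
```

The three printed values reproduce the 27.7%, 22.4% and 9.0% shown for human.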
Targets – Methods
2 collections of 8nt motifs
  X-targets: 540 8nt motifs (mammals), conserved in 3’-UTRs, putative miR target sites
    Xie et al. (2005) Nature 434:338-345
  L-targets: 683 8nt motifs (human), rc(2-8nt)+A from mature miRs in miRBase, i.e. the reverse complement of miR nucleotides 2-8 followed by an A
    Lewis et al. (2005) Cell 120:15-20
2 collections of 7nt motifs (from L-targets)
  7mer-A1: rc(2-7nt)+A
  7mer-m8: rc(2-8nt)
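The motif definitions above can be made concrete with a short sketch. It assumes the mature miR is given 5'→3' in RNA letters and writes each motif 5'→3' on the target strand in DNA letters; the helper names are hypothetical and this is not Patrocles code. The input is hsa-miR-29b-2*, whose sequence is read off the 3'→5' alignment in this section's conserved-site example.

```python
"""Derive 8mer, 7mer-m8 and 7mer-A1 target-site motifs from a mature miR."""

# Watson-Crick complement from miR RNA letters to target DNA letters.
_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def revcomp_rna_to_dna(rna: str) -> str:
    """Reverse complement of an RNA stretch, returned in DNA letters."""
    return "".join(_COMPLEMENT[base] for base in reversed(rna.upper()))


def site_motifs(mature_mir: str) -> dict[str, str]:
    """Canonical site motifs for a mature miR given 5'->3'.

    miR positions are 1-based, so nt 2-8 is mature_mir[1:8].
    """
    m8 = revcomp_rna_to_dna(mature_mir[1:8])  # rc(2-8nt)
    a1 = revcomp_rna_to_dna(mature_mir[1:7])  # rc(2-7nt)
    return {
        "8mer": m8 + "A",     # rc(2-8nt)+A, as for L-targets
        "7mer-m8": m8,        # pairs with nt 2-8
        "7mer-A1": a1 + "A",  # pairs with nt 2-7, A opposite nt 1
    }


if __name__ == "__main__":
    mir = "CUGGUUUCACAUGGUGGCUUAG"  # hsa-miR-29b-2*, 5'->3'
    ancestral = "TTTGGTGAAACCAAC"   # human A allele context
    derived = "TTTGGTGGAACCAAC"     # human G allele context
    for name, motif in site_motifs(mir).items():
        print(name, motif, motif in ancestral, motif in derived)
```

On this example the 8mer is GAAACCAA; all three motifs occur in the ancestral (A) context and none survives in the derived (G) context.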
Targets - Concordance between X and L target site motifs
X-targets: 540 8mers, 577 7mers, 554 6mers
L-targets: 683 8mers, 1,265 7mers, 1,448 6mers
91%
40%
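The slide does not say how the 91% and 40% concordance figures are defined. One plausible reading, sketched here purely as an assumption, decomposes each 8mer collection into its distinct 7mer and 6mer substrings and reports the fraction of one collection's k-mers that also occur in the other. The motifs below are toy values, not Xie or Lewis motifs.

```python
"""Hypothetical k-mer concordance between two 8mer motif collections."""


def kmers(motifs: set[str], k: int) -> set[str]:
    """All distinct k-length substrings of the given motifs."""
    return {m[i:i + k] for m in motifs for i in range(len(m) - k + 1)}


def concordance(query: set[str], reference: set[str], k: int) -> float:
    """Fraction of the query's distinct k-mers also present in the reference."""
    q, r = kmers(query, k), kmers(reference, k)
    return len(q & r) / len(q)


if __name__ == "__main__":
    x_motifs = {"GAAACCAA", "TGCACTTA"}                # toy "X" collection
    l_motifs = {"GAAACCAA", "AGCACTTA", "CTACCTCA"}    # toy "L" collection
    for k in (8, 7, 6):
        print(k, len(kmers(x_motifs, k)), round(concordance(x_motifs, l_motifs, k), 2))
```

Shorter k-mers tolerate shifted or partially overlapping motifs, so concordance tends to rise as k decreases (0.5, 0.75 and 0.83 here).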
Targets - Conserved vs.
1. human (A)  ...TTTGGTG[A]AACCAAC...  => ancestral allele
   human (G)  ...TTTGGTG[G]AACCAAC...  => derived allele
   chimp      ...TTTGGTG[A]AACCAAC...  => sibling species
2. rat        ...TTTGGTG[A]AACAAAC...
   mouse      ...CTTGGTG[A]AACAAAC...
3. dog        ...TTTGGTG[A]AACTAAC...
   cow        ...TTTGGTG[A]AACTAAC...
8nt windows overlapping the SNP:
(3/3)                  TTTGGTG[A]
(3/3)                  TTGGTG[A]A
(3/3)                  TGGTG[A]AA
(3/3)                  GGTG[A]AAC
(2/3) not in dog/cow   gtg[a]aacc
(2/3) not in dog/cow   tg[a]aacca
(2/3) not in dog/cow   g[a]aaccaa  => hsa-miR-29b-2*
(2/3) not in dog/cow   [a]aaccaac
Targets
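The scan above lists every 8nt window that overlaps the SNP and checks it against the motif collection. A minimal sketch for the human A/G example follows; the motif map holds only the hsa-miR-29b-2* 8mer, the function names are hypothetical, and the (3/3)/(2/3) conservation counts across the three species groups are not modelled.

```python
"""Scan the 8nt windows overlapping a SNP for known target-site motifs."""


def overlapping_windows(seq: str, snp_index: int, k: int = 8) -> list[str]:
    """All k-length windows of seq that contain position snp_index (0-based)."""
    first = max(0, snp_index - k + 1)
    last = min(snp_index, len(seq) - k)
    return [seq[s:s + k] for s in range(first, last + 1)]


def hits(seq: str, snp_index: int, motifs: dict[str, str]) -> list[tuple[str, str]]:
    """(window, miR) pairs for windows found in a motif -> miR map."""
    return [(w, motifs[w]) for w in overlapping_windows(seq, snp_index) if w in motifs]


if __name__ == "__main__":
    motifs = {"GAAACCAA": "hsa-miR-29b-2*"}
    ancestral = "TTTGGTGAAACCAAC"                    # SNP at index 7 (A allele)
    derived = ancestral[:7] + "G" + ancestral[8:]    # G allele
    print(overlapping_windows(ancestral, 7))
    print(hits(ancestral, 7, motifs))  # site present on the ancestral allele
    print(hits(derived, 7, motifs))    # site lost on the derived allele
```

The eight windows printed for the ancestral allele are the same ones listed above, from TTTGGTGA to AAACCAAC.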
Pairing of hsa-miR-29b-2* with the human site (ancestral A vs. derived G allele):
     5'-...TTTGGTGAAACCAAC...-3'  human, A (ancestral allele)
               |||||||||
3'-GAUUCGGUGGUACACUUUGGUC-5'       hsa-miR-29b-2*
               |||.|||||
     5'-...TTTGGTGGAACCAAC...-3'  human, G (derived allele)
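The match lines in the alignment above distinguish Watson-Crick pairs (|) from the G:U wobble (.) created by the derived G allele. A minimal sketch that rebuilds them, assuming the mRNA stretch is given 5'→3' and the miR stretch 3'→5' so that characters pair position by position; the helper name is hypothetical.

```python
"""Rebuild a match line: '|' Watson-Crick pair, '.' G:U wobble, ' ' mismatch."""

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def match_line(mrna_5to3: str, mir_3to5: str) -> str:
    """Pair two equal-length stretches position by position."""
    if len(mrna_5to3) != len(mir_3to5):
        raise ValueError("stretches must have the same length")
    out = []
    for a, b in zip(mrna_5to3.upper().replace("T", "U"), mir_3to5.upper()):
        if (a, b) in WATSON_CRICK:
            out.append("|")
        elif (a, b) in WOBBLE:
            out.append(".")
        else:
            out.append(" ")
    return "".join(out)


if __name__ == "__main__":
    mir = "GAUUCGGUGGUACACUUUGGUC"  # hsa-miR-29b-2*, written 3'->5'
    mir_stretch = mir[12:21]         # miR nt 10 down to nt 2
    print(match_line("GTGAAACCA", mir_stretch))  # ancestral A allele
    print(match_line("GTGGAACCA", mir_stretch))  # derived G allele
```

The two printed lines are `|||||||||` and `|||.|||||`: the G allele turns an A:U pair into a G:U wobble and destroys the 8mer site.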
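Patrocles SNPs - Methods
pSNP classes by target site conservation and by the allele carrying the site:
site \ allele    anc    der     ?
conserved        DC     (CC)    P
not cons.        DNC    CNC     P
DC / DNC: site on the ancestral allele, destroyed by the derived allele (conserved / non-conserved site)
CC / CNC: site created by the derived allele (conserved / non-conserved site)
P: ancestral allele unknown
+S  +W7C / S7C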
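Read as a decision table, the classification above can be sketched as follows. This is one reading of the slide, not the Patrocles code: it takes as given whether the site is conserved and which allele carries it, and it leaves out the +S and W7C / S7C annotations, whose meaning the slide does not state.

```python
"""Assign a pSNP class from site conservation and the allele carrying the site."""

from typing import Optional


def psnp_class(conserved: bool, site_allele: Optional[str]) -> str:
    """site_allele: 'anc', 'der', or None when the ancestral allele is unknown.

    'anc' -> the derived allele destroys the site (DC / DNC)
    'der' -> the derived allele creates the site (CC / CNC)
    None  -> P
    """
    if site_allele is None:
        return "P"
    if site_allele == "anc":
        return "DC" if conserved else "DNC"
    if site_allele == "der":
        return "CC" if conserved else "CNC"
    raise ValueError(f"unexpected site_allele: {site_allele!r}")


if __name__ == "__main__":
    # e.g. a conserved site carried by the ancestral allele
    print(psnp_class(True, "anc"))
```

A destroyed conserved target site, as reported later for rs34542287, falls in the DC cell.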
Patrocles SNPs - Results
                      mouse                 human
pSNP class          Lewis        Xie      Lewis        Xie
DC+CC              496+65    951+102     959+58   1,546+50
DNC                 7,250      7,732     10,328      7,392
CNC                 7,573      8,545     11,244      9,006
P                   2,065      2,290      3,295      1,944
S                      56         37        837        741
total              17,505     19,657     26,719     20,679
# destructions (DC + DNC) ≈ # creations (CC + CNC)
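The balance between destructions and creations can be checked directly from the table. The sketch below assumes the column order mouse Lewis, mouse Xie, human Lewis, human Xie, as reconstructed from the flattened slide, and reads "496+65" as DC + CC.

```python
"""Destructions (DC + DNC) vs. creations (CC + CNC) per column of the pSNP table."""

# Column order as reconstructed: mouse Lewis, mouse Xie, human Lewis, human Xie.
COLUMNS = ("mouse Lewis", "mouse Xie", "human Lewis", "human Xie")
DC = (496, 951, 959, 1_546)
CC = (65, 102, 58, 50)
DNC = (7_250, 7_732, 10_328, 7_392)
CNC = (7_573, 8_545, 11_244, 9_006)


def balance() -> dict[str, tuple[int, int]]:
    """Column name -> (destructions, creations)."""
    return {name: (dc + dnc, cc + cnc)
            for name, dc, cc, dnc, cnc in zip(COLUMNS, DC, CC, DNC, CNC)}


if __name__ == "__main__":
    for name, (destroyed, created) in balance().items():
        print(f"{name}: {destroyed} vs. {created} (ratio {destroyed / created:.2f})")
```

All four ratios lie between 0.98 and 1.02, which is what the slide's "# destructions ≈ # creations" summarizes.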
Targets - Patrocles SNPs
Evidence for purifying selection
SNP shuffling in 3’-UTR sequence space with preservation of trinucleotide context
[Plot: human - DC]
Possible elimination of SNPs affecting conserved targets:
  22 to 35% in human
  53 to 67% in mouse
Chen & Rajewsky (2006) Nat. Genet. 38:1452-1456: depletion of SNPs in conserved miR target sites when compared to ...
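The shuffling control described above can be sketched as a permutation test: each observed SNP is moved to a random 3’-UTR position with the same trinucleotide context (the SNP base plus one neighbour on each side), the mean number of shuffled SNPs landing in target sites gives the expected count, and depletion is 1 - observed/expected (0.22 to 0.35 would correspond to the human figures). The sequence, site mask and SNP positions are toy inputs; strand handling, the number of rounds and the function names are assumptions, not the Patrocles settings.

```python
"""Context-preserving SNP shuffling to estimate depletion in target sites (toy data)."""

import random
from collections import defaultdict


def context_index(seq: str) -> dict[str, list[int]]:
    """Map each trinucleotide to the positions of its central base."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(1, len(seq) - 1):
        index[seq[i - 1:i + 2]].append(i)
    return index


def depletion(seq: str, in_site: list[bool], snps: list[int],
              rounds: int = 1000, seed: int = 0) -> float:
    """Return 1 - observed/expected number of SNPs falling in target sites.

    Each round moves every SNP to a random position sharing its trinucleotide
    context (possibly its own position). SNPs must not sit at either end.
    """
    rng = random.Random(seed)
    index = context_index(seq)
    observed = sum(in_site[p] for p in snps)
    landed = 0
    for _ in range(rounds):
        for p in snps:
            q = rng.choice(index[seq[p - 1:p + 2]])
            landed += in_site[q]
    expected = landed / rounds
    if expected == 0:
        raise ValueError("no shuffled SNP fell in a site; depletion undefined")
    return 1 - observed / expected


if __name__ == "__main__":
    seq = "CAGACCCCAGAC"                                 # toy 3'-UTR
    in_site = [8 <= i <= 10 for i in range(len(seq))]    # toy site at positions 8-10
    print(depletion(seq, in_site, snps=[2]))             # SNP outside the site
```

In the toy run the SNP shares its AGA context with a position inside the site but itself lies outside it, so depletion is 1.0; with real data the shuffling is repeated over all 3’-UTR SNPs of a species.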
Targets - Prioritization for lab validation
Most interesting pSNPs are
  pSNPs destroying conserved target sites
  pSNPs creating target sites in anti-targets
To yield a phenotype, target and miR have to be expressed in the same tissue (at the same time)
Co-expression plots for human and mouse
  target genes: SymAtlas
  miRs: Landgraf et al. (2007) Cell 129:1401-1414
Two different kinds of plots
  comparing miR and target
  comparing miR host gene (if any) and target
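The co-expression requirement can be sketched as a simple filter: keep a pSNP only if the miR (or its host gene) and the target are both above an expression threshold in at least one shared tissue. Tissue names, values, thresholds and the function name are hypothetical; SymAtlas and Landgraf et al. use different tissue panels and units, which would first have to be matched.

```python
"""Tissues where a miR (or host gene) and its target are both expressed (toy data)."""


def coexpressed_tissues(mir_expr: dict[str, float], target_expr: dict[str, float],
                        mir_min: float, target_min: float) -> list[str]:
    """Sorted tissues present in both profiles and above both thresholds."""
    shared = mir_expr.keys() & target_expr.keys()
    return sorted(t for t in shared
                  if mir_expr[t] >= mir_min and target_expr[t] >= target_min)


if __name__ == "__main__":
    mir = {"brain": 120.0, "liver": 2.0, "heart": 30.0}
    target = {"brain": 850.0, "liver": 400.0, "kidney": 90.0}
    print(coexpressed_tissues(mir, target, mir_min=10.0, target_min=100.0))
```

Only "brain" passes here: the miR is too low in liver, and heart and kidney are each missing from one profile.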
pSNPs - Co-expression plots
rs34542287 A/G (allele frequencies 0.985/0.015), destroyed conserved target site: miR-9 vs. actin-binding LIM protein 1, site ACCA[A]AGA
rs28399411 G/A (allele frequencies 0.994/0.006), destroyed conserved target site: miR-32 vs. axonal membrane protein GAP-43, site TGTGC[A]AT
Mature miR counts: [+] direct evidence; [–] gross tissue mapping
Host gene expression: [+] perfect matching of tissues; [–] indirect evidence
pri-miRs (stem-loops) from miRBase
pDSPs altering miR sequence
  SNPs (de-)stabilizing the interaction (seed, mature non-seed)
pDSPs altering miR concentration
  SNPs altering processing efficiency (anywhere in the stem-loop)
  CNVs encompassing miR genes
    human: http://projects.tcag.ca/variation/
    mouse: She et al. (2008) Nat. Genet. 40:909-914
    rat: Guryev et al. (2008) Nat. Genet. 40:538-545
  eQTL (or allelic imbalance) corresponding to host genes (only human)
    Morley et al. (2004) Nature 430:743-747
    Cheung et al. (2005) Nature 437:1365-1369
    Ge et al. (2005) Genome Res. 15:1584-1591
    Stranger et al. (2005) PLoS Genet. 1:e78
    Pant et al. (2006) Genome Res. 16:331-339
    Dixon et al. (2007) Nat. Genet. 39:1202-1207
    Goring et al. (2007) Nat. Genet. 39:1208-1216
    Spielman et al. (2007) Nat. Genet. 39:226-231
    Stranger et al. (2007) Nat. Genet. 39:1217-1224
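Deciding whether a CNV encompasses a miR gene reduces to an interval test on the same chromosome and assembly. A minimal sketch assuming 1-based inclusive coordinates and requiring full containment (partial overlaps would need a separate rule); coordinates, names and helpers are hypothetical.

```python
"""Flag CNVs that fully encompass a miR gene (hypothetical coordinates)."""

from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # inclusive
    name: str = ""


def encompasses(cnv: Interval, gene: Interval) -> bool:
    """True if the CNV covers the whole gene on the same chromosome."""
    return cnv.chrom == gene.chrom and cnv.start <= gene.start and gene.end <= cnv.end


def mirs_in_cnvs(cnvs: list[Interval], mirs: list[Interval]) -> dict[str, list[str]]:
    """miR name -> names of the CNVs encompassing it (simple quadratic scan)."""
    found: dict[str, list[str]] = {}
    for mir in mirs:
        covering = [c.name for c in cnvs if encompasses(c, mir)]
        if covering:
            found[mir.name] = covering
    return found


if __name__ == "__main__":
    cnvs = [Interval("chr1", 1_000, 50_000, "cnv_a"), Interval("chr2", 5_000, 6_000, "cnv_b")]
    mirs = [Interval("chr1", 20_000, 20_080, "mir_x"),
            Interval("chr2", 5_900, 5_990, "mir_y"),
            Interval("chr2", 5_950, 6_050, "mir_z")]  # extends past cnv_b
    print(mirs_in_cnvs(cnvs, mirs))
```

Because one CNV can cover several miR genes, counting affected miRs and counting CNVs give different numbers.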
Polymorphic miRs - Methods
[Figure: miR seed pairing with an 8nt target site (positions 1-8, A at position 1) on the targeted mRNA; pri-miR located in a host gene (?)]
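Classifying a SNP in a pre-miR by its position relative to the mature miR(s) can be sketched as below: seed, mature non-seed, or other (elsewhere in the stem-loop). Coordinates are 1-based on the stem-loop; taking the seed as mature nt 2-8 is an assumption, consistent with the rc(2-8nt) motifs and with the hsa-miR-125a example (+8), and the function name is hypothetical.

```python
"""Classify a SNP in a pre-miR stem-loop relative to its mature miR(s)."""


def premir_snp_class(snp_pos: int, matures: list[tuple[int, int]]) -> str:
    """Return 'seed', 'mature non-seed' or 'other'.

    snp_pos and each (start, end) mature interval are 1-based and inclusive on
    the stem-loop; several intervals allow for both miR and miR*.
    """
    for start, end in matures:
        if start <= snp_pos <= end:
            mature_nt = snp_pos - start + 1  # position within this mature miR
            return "seed" if 2 <= mature_nt <= 8 else "mature non-seed"
    return "other"


if __name__ == "__main__":
    matures = [(15, 36), (55, 76)]  # hypothetical 5p and 3p mature miRs
    for pos in (22, 15, 45, 57):
        print(pos, premir_snp_class(pos, matures))
```

Position 22 is mature nt 8 (seed), 15 is nt 1 (mature non-seed), 45 lies in the loop (other) and 57 is nt 3 of the second mature miR (seed).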
Polymorphic miRs – Results
                                 mouse   human
pre-miRs                           466     676
SNPs in pre-miRs
  total                             89     184
  seed                               4      12
  mature non-seed                    6      26
  other                             79     146
  affected miRs                     71     136
miRs in CNVs
  CNVs                               0     158
  affected miRs                      0     256
miRs hosted in eQTL genes
  eQTL                            n.d.      78
  affected miRs                   n.d.      85
e.g., hsa-miR-125a: a T/G SNP in the seed (+8) blocks processing of pri-miR to pre-miR
Duan et al. (2007) Hum. Mol. Genet. 16:1124-1131
Manually curated list of 52 gene products involved in RNA-mediated gene silencing, in 3 broad compartments:
1. miR biogenesis: 4 (+4)
2. RISC/mRNP: 12 (+2)
3. P-bodies: 27 (+3)
pDSPs altering machinery gene sequence
  SNPs (non-synonymous, stop/frameshift, splicing site)
pDSPs altering machinery gene product concentration
  CNVs encompassing machinery genes (human, mouse, rat)
  eQTL corresponding to machinery genes (human)
Silencing machinery – Methods
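Sorting coding SNPs in machinery genes into synonymous, non-synonymous and stop-gained changes amounts to translating the affected codon with and without the alternative allele. A minimal sketch using the standard genetic code on a forward-strand CDS; frameshifts (indels) and splice-site variants need transcript structure and are not handled, a stop-lost change is reported as non-synonymous, and the function name is hypothetical.

```python
"""Classify a single-nucleotide substitution in a coding sequence (standard code)."""

from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def snv_effect(cds: str, pos: int, alt: str) -> str:
    """cds: forward-strand coding sequence from the start codon; pos: 0-based."""
    start, offset = pos - pos % 3, pos % 3
    ref_codon = cds[start:start + 3].upper()
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop gained"
    return "non-synonymous"  # includes stop-lost changes in this sketch


if __name__ == "__main__":
    cds = "ATGTGGAAA"  # toy CDS: Met-Trp-Lys
    print(snv_effect(cds, 5, "A"))  # TGG -> TGA
    print(snv_effect(cds, 8, "G"))  # AAA -> AAG
    print(snv_effect(cds, 6, "G"))  # AAA -> GAA
```

The three calls report a stop gained, a synonymous change and a non-synonymous change.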
Silencing machinery – Results
                                       mouse   human
genes                                     51      52
SNPs in machinery genes
  total                                  127     237
  non-synonymous                          73     151
  stops / frameshifts                      2      45
  splicing sites                          52      42
  affected genes                          35      49
machinery genes in CNVs
  CNVs                                     0      17
  affected genes                           0      17
machinery genes identified as eQTL      n.d.      21