• Aucun résultat trouvé

Identification of long non-coding RNAs in insects genomes

N/A
N/A
Protected

Academic year: 2021

Partager "Identification of long non-coding RNAs in insects genomes"

Copied!
9
0
0

Texte intégral

(1)

HAL Id: hal-01240461

https://hal.inria.fr/hal-01240461

Submitted on 28 May 2020

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Identification of long non-coding RNAs in insects

genomes

Fabrice Legeai, Thomas Derrien

To cite this version:

Fabrice Legeai, Thomas Derrien. Identification of long non-coding RNAs in insects genomes. Current

Opinion in Insect Science, Elsevier, 2015, 7, pp.37 - 44. �10.1016/j.cois.2015.01.003�. �hal-01240461�

(2)

Identification

of

long

non-coding

RNAs

in

insects

genomes

Fabrice

Legeai

1,2

and

Thomas

Derrien

3

Thedevelopmentofhighthroughputsequencingtechnologies (HTS)hasallowedresearcherstobetterassessthecomplexity anddiversityofthetranscriptome.Amongthemanyclassesof non-codingRNAs(ncRNAs)identifiedthelastdecade,long non-codingRNAs(lncRNAs)representadiverseandnumerous repertoireofimportantncRNAs,reinforcingtheviewthatthey areofcentralimportancetothecellmachineryinallbranchesof life.AlthoughlncRNAshavebeeninvolvedinessentialbiological processessuchasimprinting,generegulationordosage compensationespeciallyinmammals,therepertoireoflncRNAs ispoorlycharacterizedformanynon-modelorganisms.Inthis review,wefirstfocusonwhatisknownaboutexperimentally validatedlncRNAsininsectsandthenreviewbioinformatic methodstoannotatelncRNAsinthegenomesofhexapods. Addresses

1INRA,UMR1349,InstituteofGenetics,EnvironmentandPlant

Protection,DomainedelaMotte,BP35327,35653LeRheucedex, France

2

IRISA/INRIAGenScale,CampusBeaulieu,35000Rennes,France

3CNRS,UMR6290,InstitutdeGe´ne´tiqueetDe´veloppementdeRennes,

Universite´ deRennes1,2AvenueduPr.Le´onBernard,35000Rennes, France

Correspondingauthor:Legeai,Fabrice(fabrice.legeai@rennes.inra.fr)

CurrentOpinioninInsectScience2015,7:37–44

ThisreviewcomesfromathemedissueonInsectgenomics EditedbySusanBrownandDenisTagu

ForacompleteoverviewseetheIssueandtheEditorial Availableonline13thJanuary2015

http://dx.doi.org/10.1016/j.cois.2015.01.003 2214-5745/#2015PublishedbyElsevierInc.

Introduction

WholetranscriptomesequencingexperimentsorRNAseq hasbecomeverypopularasameanstomonitorthe popu-lationofRNAsincellsandtoprovideauniquesnapshotof alltranscriptspresentataspecifictime-pointinaparticular cell type or tissue[1,2]. Beyond the classicalmessenger RNAs (mRNAs) that will be translated into proteins, RNAseq has shed light on the multiple classes of non-coding RNAs (ncRNAs) players pervasively transcribed fromthegenome.Recently,particularattentionhasbeen paidtotheclassoflongnon-codingRNAs(lncRNAs)since

theyhavebeenconnectedtovariousmechanismssuchas cisand/ortransregulationoftranscription,dosage compen-sation,imprintingandcompetingendogeneousRNA(see forrecentreviews[3,4,5,6]).

lncRNAsarearbitrarilydefinedastranscriptslongerthan 200 nucleotides that do not show any protein-coding capability and thus will not be translated into proteins [7]. Because of this vague definition, the catalog of lncRNAsrepresents aheterogeneousclass oftranscripts with mRNA-like characteristics, that is, transcribed by RNA polymerase II, 50 capped and often spliced [8]. Similar to thewell-studied lncRNA XIST in mammals [9],pioneerworkhasbeendoneinDrosophilamelanogaster through the identification of the ROX1/2 RNA genes involvedindosagecompensation[10].Morerecently,the modENCODEprojectannotatedthousandsoflncRNAs viaadeepexplorationof thefruitflytranscriptomes[11] reinforcing the viewthat ncRNAs are essential compo-nentstolinkgenotypetophenotyperelationships. How-ever, while the repertoire of annotated lncRNAs is regularly improved in phylogenetically distant species [12],manynon-modelorganismsstilldonotbenefitfrom theannotation of thesefunctionalelements ofthe gen-omes [13]. In this review, we first compiled available biologicalinformationonfunctionallyvalidatedlncRNAs ininsect genomeswithparticularemphasisonlncRNAs in fruitflyandhoneybeespeciesandthendiscussed the methodto annotatelncRNAsusingRNAseq.

Resources

and

known

functions

for

lncRNAs

in

insects

At least 72 active databasesare dedicated to collecting biological information about ncRNAs [14] including LNCipedia [15]and lncRNome[16],whichare specifi-cally devoted to catalog lncRNAs and describe their functions basedon literature.Evenif mostof themare stilllimitedtohumanormouseorganisms,more general-istdatabases suchas NonCode[17]aim atgatheringall ncRNAsequencesproducedfromexperimentalprotocols or derived from automatic computational scans, with emphasis on a dozen of model organisms, including Drosophilamelanogaster.Whilethefourthversionof Non-Code contains more than 200 thousands human and mouse lncRNAs, it registers only 3193 lncRNAs from fruitfly, all of them beingautomatically predicted from RNASeq data (see below) without further functional description.Ontheotherhand,theRNAfamilydatabase

(3)

Rfam[18]andlncRNAdb[19],databases,bothaccessible throughtheRNACentralrepository[20],compriseaset of experimentally verified ncRNAs which are also searched in other organisms using both sequence and structuralsimilaritystrategies.Interestingly,while RNA-Central stores a total of 40,182 lncRNAs only a few concerninsectmodels.

Amongthem, muchattention hasbeen focused ontwo particularinsectspecies:fruitflyandhoneybee.Thefirst

becauseitbenefitsfromoutstandingresourcesfor experi-mental investigation and validations, and the second because of its particular social behaviors and caste polyphenism,thelatterprocessesmainlyinvolve epige-netics [21]. In Table 1, we briefly describe the few well-studiedlongncRNAsforwhichfunctionshavebeen characterizedtodateinfruitflyandhoneybee.Weshow that these lncRNAs could be broadly classified into main biological mechanisms such as development (lncRNAbithorax),behavior(sphinxandNB-1)orneural 38 Insectgenomics

Table1

DescriptionoftheknownfruitflyandhoneybeelncRNAswhichhavebeenexperimentallyvalidatedandannotatedinspecificdatabases (Rfam,lncRNAdborFlyBase).

Gene Species Function Mode

Generegulation

hsr-v Drosophila melanogaster

Longnon-codingRNAsproducedbythehsw-vgeneareactivelyexpressedinnuclei,forming spotscalledperinuclearomega-speckles,inresponsetoheatshockstress.Thesespecklesare involvedintheredistributionandsequestrationofmultipleprocessingproteins,inparticular heterogeneousnuclearribonucleoproteins(hnRNPs),HP1orpolII,whichstronglyaffect multiplecellularnetworkssubsequenttoastress[22,23]

Trans

Epigeneticscontrolofgenesregulation rox1/2 Drosophila

melanogaster

InX0maleDrosophila,thetranscriptionofgeneslocatedontheXchromosomeisincreased relativetothelevelofXXfemalesbyamechanismknownasdosagecompensation.This mechanismisconnectedtotheacetylationofhistoneH4atlysine16inducedbyaprotein complexwhichinvolvestwolongnon-codingRNAsroX1androX2[24,25]

Trans

Development

bithorax Drosophila melanogaster

Thebithoraxcomplex(BX-C)playsakeyroleinDrosophiladevelopmentandcoversaregion largerthan300kbthatincludesonlyfourprotein-codinggenes.Non-codinggeneshave alreadybeenshowntobeconcomitantlyexpressedfromtheBX-Cdomaintoregulateincisthe BX-Cproteinsinspecificabdominalsegments[26,27]

Cis

Lnccov1/2 Apismellifera ThesetwotranscriptslackevidenceoffunctionalORFsandaredifferentiallyexpressedin queenandworkerovarioletranscriptomesattheembryonicstage.Temporalexpressionshows thatlnccov1mightbeinvolvedintheautophagiccelldeathofovariolesduringworker embryogenesis,andfluorescentinsituhybridization(FISH)indicatesperinuclearlocalizationin omegaspeckle-likestructures[28]

Unknown

Behavior

yar Drosophila melanogaster

yarisalongnon-codingRNA,highlyexpressedduringembryogenesisandlocatedinaneural geneclusterbetweenyellow(y)andachaete(ac)[29].Withthehelpofmutants,thefunctionof thislongnon-codingRNAhasbeenrecentlyrefinedasaregulatorofyandactranscription, affectingaswellthesleepbehavioroftheDrosophila[30]

Cis

sphinx Drosophila melanogaster

sphinxisalncRNAinvolvedintheregulationofmalecourtshipbehavior.Sequencevariations betweencloseDrosophilaspeciesadvocateforafunctionalroleandarapidadaptation.The mutagenesisofsphinxinDrosophilarevealstheemergenceofmale-malecourtshipbehavior, probablybythedisruptionofsomesensorycircuits,whichissupportedbyitsspecific expressioninchemosensoryorgans[31]

Unknown

Nb-1 Apismellifera Nb-1isa700nttranscript,whoselongestORFencodesaputative32aminoacidwithoutany sequenceconservation.ThislncRNAisexpressedinthehoneybeebrainwithvariation accordingtotheageofthecolonyworkerstestifyingtoitsputativeroleinpolyethism[32]

Unknown

Neuralexpression

Ks-1 Apismellifera Sawataetal.identifieda17knttranscriptthatisexpressedrestrictivelyinthemushroombody ofKenyoncellsinthehoneybeebrainandwhichaccumulatesinthenucleus.Thetranscript exhibitssevenputativeORFslongerthan67aminoacidswithoutanyconservationinarelated species(Apiscerana)norsimilaritywithknownproteins[33]

Unknown

AncR-1 Apismellifera AncR-1ispreferentiallyexpressedinthebrain,insexualtissuesandinsomesecretoryorgans andaccumulatesinnuclei.Itcontainsmultiplealternateisoforms,whicharederivedfroma 6.9kbpgenomiclocus[34]

Unknown

Kakusei Apismellifera Thekakuseiisa7000ntlongnon-codingRNAwithmultipleconstitutiveandinducible variants,theexpressionofwhichistransientlyup-regulatedbyneuralactivity.Itislocalized exclusivelyinneuralnucleiindiscretenuclearcompartments.Thisgenemayplayspecificroles inRNAmetabolisminthehoneybeebrains,irrespectiveofbehavioralexperience[35]

(4)

expression(kakusei).Inthesespecies,themodeofaction ofthelncRNAsinvolveseitherbroadtrans-regulationby epigeneticcontrolofchromatin(re)organizationorRNA sequestrationinanuclearcompartment(suchas omega-speckles),aswellasneighboringcis-regulationofspecific mRNA genes, both during development or following a stress.

In other insect model organisms, such as Acyrthosiphon pisum,Aedesgambiae,Anophelesgambiae,Danausplexippus orHeliconiusmelpomenenoin-depthfunctionalanalysisof putative lncRNA have been published to date. But, interestinglyin theredflourbeetleTriboliumcastaneum, numerousncRNAsareexpressedontheoppositestrand ofprotein-codinggeneslocalizedintheHoxcluster[36]. Similarly,Lietal.recentlyobservedthatsomencRNAsof intermediate sizesaretranscribedfromthesilk glandof Bombyx mori and may be involved in therepression of transcriptionbyepigeneticmodificationsofhistones[37]. Finally, in Nasonia vitripennis, where some individuals containapaternallytransmittedsupernumerary chromo-some(PaternalSexRatio,PSR),thepaternalchromatinis modifiedduringthefirstmitoticdivisionviaretentionof histoneH3inaphosphorylatedstate.Thisfirst exhaus-tive transcriptome study in testis tissue revealed the presence of four putative ncRNAs that are specific to individualshaving aPSR [38].

IncontrasttoshortncRNAssuchastRNAsormiRNAs, neither thesequence northe structure ofthelncRNA appeartobephylogeneticallyconservedthroughoutthe metazoa kingdom[39]orevenamongtheinsectseven when the biological processesinvolving lncRNA func-tions are similar between species. As a result, new specializedcomputationalpredictionprotocolsandtools arebeingdevelopedtodiscriminatecodingversus non-coding transcripts and to refine the functional annota-tionoflncRNAs(seenextsectionfordetails).Thanksto therecentadvancesinsequencingtechnologies (RNA-Seq), thesystematicidentificationoflongncRNAshas alreadybeenappliedtofewinsectshavinghigh-quality genome assembliesinordertocompletetherepertoire offunctional elements intheir genomes (Table 2).

Bioinformatics

workflows

for

the

systematic

identification

and

annotation

of

lncRNAs

In order to annotate lncRNAs, a typical workflow (Figure 1) could be applied to the growing number of assembled insect genomes with particular attention on thefollowingthreekey points.

RNASeq

protocols

Whole transcriptome sequencing (RNASeq) represents themethodofchoicetodiscovernewtranscriptsandalso toquantifyallRNAsinavarietyoforganisms,celltypes

Table2

Computational identificationoflong non-coding RNAsinfruitfly andmosquito genomes.Allthe methodsdescribed hereuseda dedicatedpipelinewithdifferentbiologicalmaterialasinput(columnmaterial)anddifferentprotocolsandscoringscheme(columns structuralannotationandcodingpotentialanalysis).

Reference Species Material Structuralannotation Codingpotentialanalysis Numberof identified

lncRNA Tupy[40] Drosophila

melanogaster

7972cDNA Nointersectionwithexisting annotations

KA/Ks,QRNAandno conservedcodonstructure

72 Hiller[41] Drosophila

melanogaster

Donorandacceptorsites identifiedbyintronscan

Evolutionarysignatureof conservedintronsstructure between15insectgenomes

N/A 129(partial) Young[42] Drosophila melanogaster RNA-Seqfrom modENCODEproject (30developmental points)

Readsmapping(TopHat), annotation(Cufflinks), comparisonwithexisting annotation(CuffCompare)

Codingpotentialcalculation (CPC),andanalysisof conservation(PhyloCSF) 1119 Brown[11] Drosophila melanogaster 12.9billionsof pairs-endssinglestrand polyA+RNASeqreads from74libraries(inc. distincttissues,whole bodylibrariesandcell lines)

Readsmapping(TopHat), annotationGRIT),comparison toFlybaseannotation Analysisofconservation (PhyloCSF) 1875(3085 transcripts) Padro´n[43] Anopheles gambiae RNA-Seq(223millions reads)

Readsmapping(TopHat), annotation(Cufflinks), comparisonwithexisting annotation(CuffCompare)

Codingpotentialcalculation (CPAT)

9863

Jenkins[44] Anopheles gambiae

Morethan500million alignablereads

Readsmapping(TopHat), annotation(Cufflinks), comparisonwithexisting annotation(CuffCompare)

SizeoftheORFandCDS, analysisofconservation (phyloCSF),norecognizable proteindomains

(5)

or tissues[1].Evenif RNASeq technologyhasbecome the ‘de facto’ standard for transcriptome profiling, it is worth noting that this technology undergoes rapid

evolution in terms of library preparation, sequencing platformsandsubsequentbioinformaticsanalysesleading toregularupdatesofguidelinesandstandards[45]. 40 Insectgenomics

Figure1

RNA –Seq Alignment RNA –Seq

Gene prediction genome

Spliced reads alignments (TopHat2, GSNAP)

Reconstruction of transcript models (Cufflinks, scripture, FluxSimulator)

RNA –Seq Alignment RNA –Seq Gene prediction Gene prediction annotations overlap putative IncRNA CPAT Pfam

SwissProt PhyloCSF Exons and Intronsof orthologs genes

CPC / Blast InterproScan

putative IncRNA putative IncRNA

putative IncRNA hexamer file

logitmodel selection of

non–coding sequences selection of

non–coding sequences

selection of non–coding sequences IncRNA filter :

* same strand as a mRNA * > 200bp stranded intersection of the annotations (intersectBed, CuffCompare) Genome annotation Annotation Merge (Cuffmerge) genome

Spliced reads alignments (TopHat2, GSNAP)

Reconstruction of transcript models (Cufflinks, scripture, FluxSimulator)

RNA –Seq Alignment

Coding Potential Calculation

IncRNA identification

Similarity and Proteic domains Evolutionary analyses

RNA –Seq

Gene prediction genome

Spliced reads alignments (TopHat2, GSNAP)

Reconstruction of transcript models (Cufflinks, scripture, FluxSimulator)

Current Opinion in Insect Science

Schematicviewofstandardizedworkflow.Steps1–4aredesignedtoeliminatetranscriptsfromknownandpotentialproteincodinggenes.Step 5isdesignedtoidentifyknownlncRNAs.(1)AlignRNASeqreadsusingasplice-awarealgorithmandpredictgenes.Thisstepcanbeexecutedin parallelforeachlibrary.(2)Mergeannotatedtranscriptsfromstep1andcomparetoknownproteincodinggenes.(3)Predictcodingpotentialof ORFsinremaininggenes.(4)CompareORFpredictionswithknownproteinsandproteindomains.(5)Alignremainingtranscriptwiththosefrom closelyrelatedorthologstoidentifysignalsofselection.

(6)

Given that first,lncRNAs tend to be rare compared to mRNAs and second, display both spatial and temporal expression patterns [8], the depth of sequencing, the number of different tissues/cell lines and time-points need to be considered in planning each experiment. For instance, it has been shown in human and fruitfly that testis tissue shows the highest number of tissue-specificlncRNAs(e.g.11%ofalllncRNAsinDrosophila) probablyreflectingmorerelaxedchromatinstructure[11]. In addition, many lncRNAs,are expressed antisenseto protein-coding genes [46,47],which they oftenregulate [48].ItisthusrecommendedtofavorstrandedRNASeq protocols (also known as directional transcriptome sequencing) that keep track of the strand of origin of thetranscript[49]ifthepurposeofthestudyincludesthe identificationofantisenselncRNAs.Forexample,using these stranded protocols, the modENCODE project recentlydiscovered402lncRNAloci(21%ofalllncRNA loci) located antisense to mRNA transcripts of protein codinggenesinD.melanogaster[11]whilethisproportion isslightlylower(15%)inthehumangenome[8].

Mapping

and

reconstructing

lncRNAs

When ahighquality referencegenomeisavailable,one would favoramap-first thenassemble strategy.To this end, the common but still critical point in RNASeq analysis consists of mapping themillions of sequenced readsontoareferencegenome.Fortunately,several soft-waresolutionsormappershavebeendevelopedinthelast few years that efficiently and rapidly align reads on a referencegenome[50–53].RegardingRNASeqreads,the task iseven more challengingsince the mappershould also handle spliced-readalignments,that is,reads map-pingoverexon/intronjunctions(seeforbenchmark[54]). Sincemanyof thelncRNAsaremultiexonic, theuseof spliced-readalignersiscriticaltopreciselydiscovernovel exonjunctionsasforlncRNAs.InD.melanogaster,Young etal.usedamorestringentmappingprotocolbyforcing read mapping onto exon junctions connecting mRNAs endsandknownadjacentintergenictranscripts[42]in order to exclude possible false positive intergenic lncRNAs [55]. The resultingmapping file isthen used bygraph-based approachessuchas cufflinks[56], scrip-ture[57]orFluxSimulator[58]toreconstructalltranscript models.Despiteintensiveworksinthisarea,these meth-ods are far from being perfect in terms of accuracy, sensitivity and specificity but seem to behave better for smaller genomes (nematode and fruitfly) compared to mammaliangenomes [59].

Measuring

the

coding

potential

Onceasetoftranscriptshasbeenassembled,itisimportant toestimatetheprotein-codingpotentialofthesequences. After having removed transcripts whose size is below a certaincutoff(often200ntforlncRNA),afirststepcould bethefilteringoftranscriptswhichoverlapmRNAsexons inthesenseorientationastheymorelikelycorrespondto

novelisoformsofaprotein-codinggene.Subsequently,two kinds of methods areavailable to classify codingversus non-codingtranscriptsusingcomputationalprogramsand/ orbiochemicalexperiments.Computationally,thecoding potentialcouldbedeterminedbymeasuringtheintrinsic properties of the sequences which correspond (non-ex-haustively)tofirst,thelengthoftheOpenReadingFrame (ORF),second,thecoverageoftheORFcomparedtothe lengthofthetranscripts,third,thebiasink-mer frequen-ciesbetweencodingandnon-codingsequencesandfourth, thepresenceofprotein-codingspecificmotifs[60,61].All thesefeaturesareoftenintegratedintomachinelearning algorithms such as a support vector machine (SVM) or randomforest(RF),whicharetrainedwithknownsetsof protein-coding and non-coding transcripts. Additionally, someothertoolsalsorequirethealignmentofthe candi-datelncRNAssequenceswithproteindatabases(suchas PFAM or Swiss-Prot) to search for evidence of trans-latability[62]orwithmultiplegenomesinorderto specifi-cally tag the selective pressureacting on mRNAs [63]. However, theformer method suffers from theinherent lackofproteinsequences(especiallyforinsectproteomes) andmaythereforeleadtofalsepositiveannotationofinsect lncRNAswhilethelatterimpliesthatlncRNAsare evolu-tionaryconservedwhichmaynotbethecasewithrespect to both the phylogenetic distances [39] and effective population sizes [64]. Finally, further work is needed todevelopprogramsthatsimulatenon-codingsequences intheabsenceofnon-codingtrainingsets.

At the experimental level, evidence for protein-coding capabilitycanbedirectlyobtainedbymassspectrometry data[65] althoughtheseexperimentsmaynotbe avail-ableforallinsectspecies.Anothercomplementary tech-nique,ribosomeprofiling,wasdevelopedbyIngoliaetal. and utilizes high throughput sequencing to map RNA regionsassociatedwithtranslatingribosomes[66].While some lncRNAs have been shown to be associatedwith ribosomessuggestingthattheyareinfactwrongly anno-tated, itisstillunclearwhethertheresultingshort pep-tidesarereallyfunctionalsincetheyarealsoreportedin the50UTRof protein-codingtranscripts.

Conclusion

and

future

directions

DespitegrowingevidencethatlncRNAsarekeyplayersin mammaliancells, onlyafewofthemhavebeen experi-mentally validated in insects,mostly in D. melanogaster. AsaproofofconceptforlncRNAfunctionality,the well-studiedRNAroxgenesareinvolvedindosage compen-sationsimilartotheXistgeneinmammals[9].

ManynewinsectlncRNAswillbediscoveredinthenext few years owing to both the availability of cheaper RNASeq protocols and the development of dedicated bioinformaticsprograms.Forinstance,whenareference genome is not available or when the quality of the genome assembly is relatively poor (which maybe the

(7)

case for several non-model organisms), de novo or ge-nome-independentassemblyapproaches[67–69]haveto beenvisagedtobetteridentifythelncRNAsrepertoire. Moreover, differentiating non-coding from coding tran-scriptsremainsachallenging task[70]andcanbe ham-pered by biological artifices. Indeed, at least two Drosophila lncRNAs (prg and polished rice) have been recentlyre-classifiedascodingforsmallpeptides[71,72]. Furthermore, bioinformatics tools are still missing that distinguishbifunctionalRNAssuchasthesteroid recep-tor activator gene (SRA) [73] harboring two different functions, one at the RNA level and another when translatedintoproteins.

Finally,itisobviousthatnewexperimentalmethodshave to be implemented to understand the function of these intriguingRNAs,suchas therecent andpromising tech-nologiesCHART[74]ordChIRP[75],thatcanbeapplied toidentifytheDNAbindingsitesoflncRNAs.Inparallel, computationalapproachesarerequiredtounveilfunctions oflncRNAsonalarge-scaleperspective.Asillustratedby the recent implementation of lncRNAtor [76], a new databaseintegratingfunctionalinformationaboutlncRNAs fromsixspecies,includingfruitfly.Theseattemptshaveto beextendedtonon-modelorganismsinordertoshedlight onthemanycomponentsofthegenome(codingand non-coding)thatareresponsibleforphenotypictraits.

Acknowledgements

TheauthorswishtothankSueBrownandDenisTagufortheircomments andcorrectionsonthemanuscript.

References

and

recommended

reading

Papersofparticularinterest,publishedwithintheperiodofreview, havebeenhighlightedas:

 ofspecialinterest ofoutstandinginterest

1. WangZ,GersteinM,SnyderM:RNA-Seq:arevolutionarytool fortranscriptomics.NatRevGenet2009,10:57-63.

2. DjebaliS,DavisCA,MerkelA,DobinA,LassmannT,MortazaviA, TanzerA,LagardeJ,LinW,SchlesingerFetal.:Landscapeof transcriptioninhumancells.Nature2012,488:101-108. 3.



UlitskyI,BartelDP:lincRNAs:genomics,evolution,and mechanisms.Cell2013,154:26-46.

4. 

FaticaA,BozzoniI:Longnon-codingRNAs:newplayersincell differentiationanddevelopment.NatRevGenet2013,15:7-21. 5.



GardiniA,ShiekhattarR.:Themanyfacesoflongnoncoding RNAs.FEBSJ2014http://dx.doi.org/10.1111/febs.13101. 6.



BonasioR,ShiekhattarR.:Regulationoftranscriptionbylong noncodingRNAs.AnnuRevGenet2014,48:433-455.

Thesefourarticlesareexcellentrecentreviewscoveringallthe mechan-ismsinvolvinglongnon-codingRNAsthathasbeendescribedtodate. 7. KapranovP,ChengJ,DikeS,NixDA,DuttaguptaR,WillinghamAT,

StadlerPF,HertelJ,Hackermu¨llerJ,HofackerILetal.:RNAmaps revealnewRNAclassesandapossiblefunctionforpervasive transcription.Science2007,316:1484-1488.

8. DerrienT,JohnsonR,BussottiG,TanzerA,DjebaliS,TilgnerH, GuernecG,MartinD,MerkelA,KnowlesDGetal.:TheGENCODE v7catalogofhumanlongnoncodingRNAs:analysisoftheir

genestructure,evolution,andexpression.GenomeRes2012, 22:1775-1789.

9. BrockdorffN,AshworthA,KayGF,McCabeVM,NorrisDP, CooperPJ,SwiftS,RastanS:TheproductofthemouseXistgene isa15kbinactiveX-specifictranscriptcontainingnoconserved ORFandlocatedinthenucleus.Cell1992,71:515-526. 10. MellerVH,WuKH,RomanG,KurodaMI,DavisRL:roX1RNA

paintstheXchromosomeofmaleDrosophilaandisregulated bythedosagecompensationsystem.Cell1997,88:445-457. 11. BrownJB,BoleyN,EismanR,MayGE,StoiberMH,DuffMO,

BoothBW,WenJ,ParkS,SuzukiAMetal.:Diversityand dynamicsoftheDrosophilatranscriptome.Nature2014http:// dx.doi.org/10.1038/nature12962.

12. NecsuleaA,SoumillonM,WarneforsM,LiechtiA,DaishT,ZellerU, BakerJC,Gru¨tznerF,KaessmannH:TheevolutionoflncRNA repertoiresandexpressionpatternsintetrapods.Nature2014, 505:635-640.

13. 

TaguD,ColbourneJK,NegreN:Genomicdataintegrationfor ecologicalandevolutionarytraitsinnon-modelorganisms. BMCGenomics2014,15:1-16.

Aninterestingclaimforimprovingdataandbioinformaticsresourcesto increaseourknowledgeonnon-modelorganisms.

14. HoeppnerMP,BarquistLE,GardnerPP:AnintroductiontoRNA databases.MethodsMolBiol2014,1097:107-123.

15. VoldersP-J,VerheggenK,MenschaertG,VandepoeleK, MartensL,VandesompeleJ,MestdaghP:Anupdateon LNCipedia:adatabaseforannotatedhumanlncRNA sequences.NucleicAcidsRes2014http://dx.doi.org/10.1093/ nar/gku1060.

16. BhartiyaD,PalK,GhoshS,KapoorS,JalaliS,PanwarB,JainS, SatiS,SenguptaS,SachidanandanCetal.:lncRNome:a comprehensiveknowledgebaseofhumanlongnoncoding RNAs.Database(Oxford)2013,2013:bat034.

17. XieC,YuanJ,LiH,LiM,ZhaoG,BuD,ZhuW,WuW,ChenR, ZhaoY:NONCODEv4:exploringtheworldoflongnon-coding RNAgenes.NucleicAcidsRes2014,42:D98-D103.

18. BurgeSW,DaubJ,EberhardtR,TateJ,BarquistL,NawrockiEP, EddySR,GardnerPP,BatemanA:Rfam11.0:10yearsofRNA families.NucleicAcidsRes2013,41:D226-D232.

19. AmaralPP,ClarkMB,GascoigneDK,DingerME,MattickJS: lncRNAdb:areferencedatabaseforlongnoncodingRNAs. NucleicAcidsRes2011,39:D146-D151.

20. 

TheRNACentralConsortium:RNAcentral:aninternational databaseofncRNAsequences.NucleicAcidsRes2014http:// dx.doi.org/10.1093/nar/gku991.

ThisdatabaseisoneofthemostcomprehensiveresourcesforlncRNA geneannotationinmultipleorganismsincludingmanyinsects. 21.



BonasioR:Emergingtopicsinepigenetics:ants,brains,and noncodingRNAs.AnnNYAcadSci2012,1260:14-23.

Arelevantanalysisofwhatarethefuturedirectionstobetterunderstand theimpactofepigeneticsinthecontextofpolyphenismandpolyethismof eusocialinsects.

22. LakhotiaSC:Fortyyearsofthe93DpuffofDrosophila melanogaster.JBiosci2011,36:399-423.

23. LakhotiaSC,MallikM,SinghAK,RayM:Thelargenoncoding hsrv-ntranscriptsareessentialforthermotoleranceand remobilizationofhnRNPs,HP1andRNApolymeraseIIduring recoveryfromheatshockinDrosophila.Chromosoma2012, 121:49-70.

24. SmithER,AllisCD,LucchesiJC:Linkingglobalhistone acetylationtothetranscriptionenhancementof X-chromosomalgenesinDrosophilamales.JBiolChem2001, 276:31483-31486.

25. DengX,MellerVH:roXRNAsarerequiredforincreased expressionofX-linkedgenesinDrosophilamelanogaster males.Genetics2006,174:1859-1866.

26. LipshitzHD,PeattieDA,HognessDS:Noveltranscriptsfromthe Ultrabithoraxdomainofthebithoraxcomplex.GenesDev 1987,1:307-322.

(8)

27. PeaseB,BorgesAC,BenderW:NoncodingRNAsofthe UltrabithoraxdomainoftheDrosophilabithoraxcomplex. Genetics2013,195:1253-1264.

28. HumannFC,HartfelderK:Representationaldifferenceanalysis (RDA)revealsdifferentialexpressionofconservedaswellas novelgenesduringcaste-specificdevelopmentofthehoney bee(ApismelliferaL.)ovary.InsectBiochemMolBiol2011, 41:602-612.

29. SoshnevAA,LiX,WehlingMD,GeyerPK:Contextdifferences revealinsulatorandactivatorfunctionsofaSu(Hw)binding region.PLoSGenet2008,4:e1000159.

30. SoshnevAA,IshimotoH,McAllisterBF,LiX,WehlingMD, KitamotoT,GeyerPK:AconservedlongnoncodingRNAaffects sleepbehaviorinDrosophila.Genetics2011,189:455-468. 31. ChenY,DaiH,ChenS,ZhangL,LongM:Highlytissuespecific

expressionofSphinxsupportsitsmalecourtshiprelatedrole inDrosophilamelanogaster.PLoSONE2011,6:e18853. 32. TadanoH,YamazakiY,TakeuchiH,KuboT:Age-and

division-of-labour-dependentdifferentialexpressionofanovel non-codingRNA,Nb-1,inthebrainofworkerhoneybees,Apis melliferaL..InsectMolBiol2009,18:715-726.

33. SawataM,DaisukeY,TakeuchiH,KamikouchiA,KazuakiO, KuboT:Identificationandpunctatenuclearlocalizationofa novelnoncodingIdentificationandpunctatenuclear localizationofanovelnoncodingRNA,Ks-1,fromthe honeybeebrain.RNA2002:772-785.

34. SawataM,TakeuchiH,KuboT:Identificationandanalysisofthe minimalpromoteractivityofanovelnoncodingnuclearRNA gene,AncR-1,fromthehoneybee(ApismelliferaL.).RNA2004 http://dx.doi.org/10.1261/rna.5231504.use.

35. KiyaT,KuniedaT,KuboT:Inducible-andconstitutive-type transcriptvariantsofkakusei,anovelnon-codingimmediate earlygene,inthehoneybeebrain.InsectMolBiol2008, 17:531-536.

36. ShippyTD,RonshaugenM,CandeJ,HeJ,BeemanRW,LevineM, BrownSJ,DenellRE:AnalysisoftheTriboliumhomeotic complex:insightsintomechanismsconstraininginsectHox clusters.DevGenesEvol2008,218:127-139.

37. LiD-D,LiuZ-C,HuangL,JiangQ-L,ZhangK,QiaoH-L,JiaoZ-J, YaoL-G,LiuR-Y,KanY-C:Theexpressionanalysisofsilk gland-enrichedintermediate-sizenon-codingRNAsin silkwormBombyxmori.InsectSci2014,21:429-438. 38. AkbariOS,AntoshechkinI,HayBA,FerreePM:Transcriptome

profilingofNasoniavitripennistestisrevealsnoveltranscripts expressedfromtheselfishBchromosome,paternalsexratio. G3(Bethesda,MD)2013,3:1597-1605.

39. KapustaA,FeschotteC:Volatileevolutionoflongnoncoding RNArepertoires:mechanismsandbiologicalimplications. TrendsGenet2014,30:439-452.

40. TupyJL,BaileyAM,DaileyG,Evans-holmM,SiebelCW,MisraS, CelnikerSE,RubinGM:Identificationofputativenoncoding polyadenylatedtranscriptsinDrosophilamelanogaster.Proc NatlAcadSciUSA2005,102:5495-5500.

41. HillerM,FindeissS,LeinS,MarzM,NickelC,RoseD,SchulzC, BackofenR,ProhaskaSJ,ReuterGetal.:Conservedintrons revealnoveltranscriptsinDrosophilamelanogaster.Genome Res2009,19:1289-1300.

42. 

YoungRS,MarquesAC,TibbitC,HaertyW,BassettAR,LiuJ-L, PontingCP:Identificationandpropertiesof1119candidate lincRNAlociintheDrosophilamelanogastergenome.Genome BiolEvol2012,4:427-442.

This paper describes a stringentmethodology to annotate and fully characterize oneofthefirstcataloguesoflong intergenicncRNAsin fruitfly.

43. 

Padro´nA,Molina-CruzA,QuinonesM,RibeiroJEM,RamphulU, RodriguesJ,ShenK,HaileA,RamirezJEL,Barillas-MuryC:In depthannotationoftheAnophelesgambiaemosquitomidgut transcriptome.BMCGenomics2014,15:636.

AfirstapplicationofsystematicpredictionoflncRNAsinaninsectspecies apartfromDrosophilamelanogaster.

44. JenkinsAM,WaterhouseRM,KopinAS:Longnon-codingRNA discoveryinAnophelesgambiaeusingdeepRNAsequencing. bioRxiv2014http://biorxiv.org/content/early/2014/07/26/007484. 45. BullardJH,PurdomE,HansenKD,DudoitS:Evaluationof

statisticalmethodsfor normalizationanddifferentialexpression inmRNA-Seqexperiments.BMCBioinform2010,11:94. 46. HeY,VogelsteinB,VelculescuVE,PapadopoulosN,KinzlerKW:

Theantisensetranscriptomesofhumancells.Science2008, 322:1855-1857.

47. MagistriM,FaghihiMA,StLaurentG,WahlestedtC:Regulation ofchromatinstructurebylongnoncodingRNAs:focuson naturalantisensetranscripts.TrendsGenet2012,28:389-396. 48. JohnssonP,AckleyA,VidarsdottirL,LuiW-O,CorcoranM,

Grande´rD,MorrisKV:Apseudogenelong-noncoding-RNA networkregulatesPTENtranscriptionandtranslationin humancells.NatStructMolBiol2013,20:440-446. 49. LevinJZ,YassourM,AdiconisX,NusbaumC,ThompsonDA,

FriedmanN,GnirkeA,RegevA:Comprehensivecomparative analysisofstrand-specificRNAsequencingmethods.Nat Methods2010,7:709-715.

50. Marco-SolaS,SammethM,oacuteRG,RibecaP:TheGEM mapper:fast,accurateandversatilealignmentbyfiltration. NatMethods2012http://dx.doi.org/10.1038/nmeth.2221. 51. TrapnellC,PachterL,SalzbergSL:TopHat:discoveringsplice

junctionswithRNA-Seq.Bioinformatics2009,25:1105-1111. 52. LiH,DurbinR:Fastandaccurateshortreadalignmentwith

Burrows–Wheelertransform.Bioinformatics2009,25:1754-1760. 53. DobinA,DavisCA,SchlesingerF,DrenkowJ,ZaleskiC,JhaS,

BatutP,ChaissonM,GingerasTR:STAR:ultrafastuniversal RNA-seqaligner.Bioinformatics2012,29:15-21.

54. Engstro¨mPG,SteijgerT,SiposB,GrantGR,KahlesA,AliotoT, BehrJ,BertoneP,BohnertR,CampagnaDetal.:Systematic evaluationofsplicedalignmentprogramsforRNA-seqdata. NatMethods2013http://dx.doi.org/10.1038/nmeth.2722. 55. VanBakelH,NislowC,BlencoweBJ,HughesTR:Most‘‘dark

matter’’transcriptsareassociatedwithknowngenes.PLoS Biol2010,8:e1000371.

56. TrapnellC,WilliamsBA,PerteaG,MortazaviA,KwanG,van BarenMJ,SalzbergSL,WoldBJ,PachterL:Transcriptassembly andquantificationbyRNA-Seqrevealsunannotated transcriptsandisoformswitchingduringcelldifferentiation. NatBiotechnol2010,28:511-515.

57. GuttmanM,GarberM,LevinJZ,DonagheyJ,RobinsonJ, AdiconisX,FanL,KoziolMJ,GnirkeA,NusbaumCetal.:Abinitio reconstructionofcelltype-specifictranscriptomesinmouse revealstheconservedmulti-exonicstructureoflincRNAs.Nat Biotechnol2010,28:503-510.

58. GriebelT,ZacherB,RibecaP,RaineriE,LacroixV,GuigoR, SammethM:ModellingandsimulatinggenericRNA-Seq experimentswiththefluxsimulator.NucleicAcidsRes2012, 40:10073-10083.

59. SteijgerT,AbrilJF,Engstro¨mPG,KokocinskiF,AbrilJF, AkermanM,AliotoT,AmbrosiniG,AntonarakisSE,BehrJetal.: Assessmentoftranscriptreconstructionmethodsfor RNA-seq.NatMethods2013http://dx.doi.org/10.1038/nmeth.2714. 60. WangL,ParkHJ,DasariS,WangS,KocherJP,LiW:CPAT:

coding-potentialassessmenttoolusinganalignment-free logisticregressionmodel.NucleicAcidsRes2013http:// dx.doi.org/10.1093/nar/gkt006.

ThispaperdescribesafastandeffectiveprogramcalledCPATwhichis abletodiscriminatebetweencodingandnon-codingsequencesusinga logisiticregressionmodelbuiltonasetofknowncodingandnon-coding transcripts.

61. LiA,ZhangJ,ZhouZ:PLEK:atoolforpredictinglong non-codingRNAsandmessengerRNAsbasedonanimproved k-merscheme.BMCBioinform2014,15:1-10.

62. KongL,ZhangY,YeZQ,LiuXQ,ZhaoSQ,WeiL,GaoG:CPC: assesstheprotein-codingpotentialoftranscriptsusing

(9)

sequencefeaturesandsupportvectormachine.NucleicAcids Res2007,35:W345-W349.

63. LinMF,JungreisI,KellisM:PhyloCSF:acomparativegenomics methodtodistinguishproteincodingandnon-codingregions. Bioinformatics2011,27:i275-i282.

64. 

HaertyW,PontingCP:MutationswithinlncRNAsareeffectively selectedagainstinfruitflybutnotinhuman.GenomeBiol2013, 14:R49.

By comparing sequence polymorphism in Drosophila and human populations,thispaperdemonstratesthedeleteriouseffectofmutations withinfruitflylncRNAsbutnotinhumanlncRNAsprobablyowingtothe differenceineffectivepopulationsizesbetweenthetwospecies. 65. Ba´nfaiB,JiaH,KhatunJ,WoodE,RiskB,GundlingWE,

KundajeA,GunawardenaHP,YuY,XieLetal.:Longnoncoding RNAsarerarelytranslatedintwohumancelllines.Genome Res2012,22:1646-1657.

66. IngoliaNT,GhaemmaghamiS,NewmanJRS,WeissmanJS: Genome-wideanalysisinvivooftranslationwithnucleotide resolutionusingribosomeprofiling.Science2009,

324:218-223.

67. MartinJA,WangZ:Next-generationtranscriptomeassembly. NatRevGenet2011,12:671-682.

68. BaoE,JiangT,GirkeT:BRANCH:boostingRNA-Seq assemblieswithpartialorrelatedgenomicsequences. Bioinformatics2013,29:1250-1259.

69. SacomotoGAT,KielbassaJ,ChikhiR,UricaruR,AntoniouP, SagotM-F,PeterlongoP,LacroixV:KISSPLICE:de-novocalling alternativesplicingeventsfromRNA-seqdata.BMCBioinform 2012,13(Suppl.6):S5.

70. DingerME,PangKC,MercerTR,MattickJS:Differentiating protein-codingandnoncodingRNA:challengesand ambiguities.PLoSCompBiol2008,4:e1000176.

71. KageyamaY,KondoT,HashimotoY:Codingvsnon-coding: translatabilityofshortORFsfoundinputativenon-coding transcripts.Biochimie2011,93:1981-1986.

72. CohenSM:Everythingoldisnewagain:(linc)RNAsmake proteins! EMBOJ2014,33:937-939.

73. Chooniedass-KothariS,EmberleyE,HamedaniMK,TroupS, WangX,CzosnekA,HubeF,MutaweM,WatsonPH,LeygueE: ThesteroidreceptorRNAactivatoristhefirstfunctionalRNA encodingaprotein.FEBSLett2004,566:43-47.

74. SimonMD,WangCI,KharchenkoPV,WestJA,ChapmanBA, AlekseyenkoAA,BorowskyML,KurodaMI,KingstonRE:The genomicbindingsitesofanoncodingRNA.ProcNatlAcadSci 2011,108:20497-20502.

75. 

QuinnJJ,IlikIA,QuK,GeorgievP,ChuC,AkhtarA,ChangHY: RevealinglongnoncodingRNAarchitectureandfunctions usingdomain-specificchromatinisolationbyRNA purification.NatBiotechnol2014:32.

This article presents domain-specific chromatin isolation by RNA purification(dChIRP),atechnique toanalyzeRNA–RNA,RNA–protein and RNA–chromatin interactions used to precisely characterize the chromosomalbindingofroX1androX2.

76. 

ParkC,YuN,ChoiI,KimW,LeeS:lncRNAtor:acomprehensive resourceforfunctionalinvestigationoflongnon-coding RNAs.Bioinformatics2014,30:2480-2485.

AnewdatabaseintegratingfunctionalinformationaboutlncRNAsfrom sixspecies,includingfruitfly.

Références

Documents relatifs

Indeed, various studies have demonstrated that more than 60% of human genes are regulated by miRNAs [12] that may act as tumor suppressors (TS-miR) or oncogenes (oncomiR) depending

Our estimates are valid for any cell-centered finite volume scheme, and, in a larger sense, for any locally conservative method such as the mimetic finite difference, covolume,

The phylogenetic tree of tyrosine recombi- nases (Supplementary Figure S1) was built using 204 pro- teins, including: 21 integrases adjacent to attC sites and matching the

Beyond the need of a clear definition of the two phases of the propulsion cycle, this study showed that the assumption on wheelchair locomotion usually admitted on

The projects were Human Resource Development for the Mekong Region (1993-2007) on vegetable research; Social Forestry Support Programme (1994-2002) for development of five

ceranae spore samples, including lncRNAs, long intergenic non-coding RNAs (lincRNAs), and sense lncRNAs.. Moreover, these lncRNAs share similar character- istics with those

Lagarri- gue 1 , 1 UMR PEGASE INRA, Agrocampus Ouest UMR PEGASE, Rennes, France; 2 UMR GenPhySE, INRA, INPT, ENVT, Universi- té de Toulouse, Castanet-Tolosan, France; 3

Considering the functional roles of sRNAs, they are categorized in two major classes: (i) cis-acting elements located/acting on untranslated regions (UTRs) of a translated mRNA