HAL Id: hal-01240461
https://hal.inria.fr/hal-01240461
Submitted on 28 May 2020
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Identification of long non-coding RNAs in insects
genomes
Fabrice Legeai, Thomas Derrien
To cite this version:
Fabrice Legeai, Thomas Derrien. Identification of long non-coding RNAs in insects genomes. Current
Opinion in Insect Science, Elsevier, 2015, 7, pp.37 - 44. �10.1016/j.cois.2015.01.003�. �hal-01240461�
Identification
of
long
non-coding
RNAs
in
insects
genomes
Fabrice
Legeai
1,2and
Thomas
Derrien
3Thedevelopmentofhighthroughputsequencingtechnologies (HTS)hasallowedresearcherstobetterassessthecomplexity anddiversityofthetranscriptome.Amongthemanyclassesof non-codingRNAs(ncRNAs)identifiedthelastdecade,long non-codingRNAs(lncRNAs)representadiverseandnumerous repertoireofimportantncRNAs,reinforcingtheviewthatthey areofcentralimportancetothecellmachineryinallbranchesof life.AlthoughlncRNAshavebeeninvolvedinessentialbiological processessuchasimprinting,generegulationordosage compensationespeciallyinmammals,therepertoireoflncRNAs ispoorlycharacterizedformanynon-modelorganisms.Inthis review,wefirstfocusonwhatisknownaboutexperimentally validatedlncRNAsininsectsandthenreviewbioinformatic methodstoannotatelncRNAsinthegenomesofhexapods. Addresses
1INRA,UMR1349,InstituteofGenetics,EnvironmentandPlant
Protection,DomainedelaMotte,BP35327,35653LeRheucedex, France
2
IRISA/INRIAGenScale,CampusBeaulieu,35000Rennes,France
3CNRS,UMR6290,InstitutdeGe´ne´tiqueetDe´veloppementdeRennes,
Universite´ deRennes1,2AvenueduPr.Le´onBernard,35000Rennes, France
Correspondingauthor:Legeai,Fabrice(fabrice.legeai@rennes.inra.fr)
CurrentOpinioninInsectScience2015,7:37–44
ThisreviewcomesfromathemedissueonInsectgenomics EditedbySusanBrownandDenisTagu
ForacompleteoverviewseetheIssueandtheEditorial Availableonline13thJanuary2015
http://dx.doi.org/10.1016/j.cois.2015.01.003 2214-5745/#2015PublishedbyElsevierInc.
Introduction
WholetranscriptomesequencingexperimentsorRNAseq hasbecomeverypopularasameanstomonitorthe popu-lationofRNAsincellsandtoprovideauniquesnapshotof alltranscriptspresentataspecifictime-pointinaparticular cell type or tissue[1,2]. Beyond the classicalmessenger RNAs (mRNAs) that will be translated into proteins, RNAseq has shed light on the multiple classes of non-coding RNAs (ncRNAs) players pervasively transcribed fromthegenome.Recently,particularattentionhasbeen paidtotheclassoflongnon-codingRNAs(lncRNAs)since
theyhavebeenconnectedtovariousmechanismssuchas cisand/ortransregulationoftranscription,dosage compen-sation,imprintingandcompetingendogeneousRNA(see forrecentreviews[3,4,5,6]).
lncRNAsarearbitrarilydefinedastranscriptslongerthan 200 nucleotides that do not show any protein-coding capability and thus will not be translated into proteins [7]. Because of this vague definition, the catalog of lncRNAsrepresents aheterogeneousclass oftranscripts with mRNA-like characteristics, that is, transcribed by RNA polymerase II, 50 capped and often spliced [8]. Similar to thewell-studied lncRNA XIST in mammals [9],pioneerworkhasbeendoneinDrosophilamelanogaster through the identification of the ROX1/2 RNA genes involvedindosagecompensation[10].Morerecently,the modENCODEprojectannotatedthousandsoflncRNAs viaadeepexplorationof thefruitflytranscriptomes[11] reinforcing the viewthat ncRNAs are essential compo-nentstolinkgenotypetophenotyperelationships. How-ever, while the repertoire of annotated lncRNAs is regularly improved in phylogenetically distant species [12],manynon-modelorganismsstilldonotbenefitfrom theannotation of thesefunctionalelements ofthe gen-omes [13]. In this review, we first compiled available biologicalinformationonfunctionallyvalidatedlncRNAs ininsect genomeswithparticularemphasisonlncRNAs in fruitflyandhoneybeespeciesandthendiscussed the methodto annotatelncRNAsusingRNAseq.
Resources
and
known
functions
for
lncRNAs
in
insects
At least 72 active databasesare dedicated to collecting biological information about ncRNAs [14] including LNCipedia [15]and lncRNome[16],whichare specifi-cally devoted to catalog lncRNAs and describe their functions basedon literature.Evenif mostof themare stilllimitedtohumanormouseorganisms,more general-istdatabases suchas NonCode[17]aim atgatheringall ncRNAsequencesproducedfromexperimentalprotocols or derived from automatic computational scans, with emphasis on a dozen of model organisms, including Drosophilamelanogaster.Whilethefourthversionof Non-Code contains more than 200 thousands human and mouse lncRNAs, it registers only 3193 lncRNAs from fruitfly, all of them beingautomatically predicted from RNASeq data (see below) without further functional description.Ontheotherhand,theRNAfamilydatabase
Rfam[18]andlncRNAdb[19],databases,bothaccessible throughtheRNACentralrepository[20],compriseaset of experimentally verified ncRNAs which are also searched in other organisms using both sequence and structuralsimilaritystrategies.Interestingly,while RNA-Central stores a total of 40,182 lncRNAs only a few concerninsectmodels.
Amongthem, muchattention hasbeen focused ontwo particularinsectspecies:fruitflyandhoneybee.Thefirst
becauseitbenefitsfromoutstandingresourcesfor experi-mental investigation and validations, and the second because of its particular social behaviors and caste polyphenism,thelatterprocessesmainlyinvolve epige-netics [21]. In Table 1, we briefly describe the few well-studiedlongncRNAsforwhichfunctionshavebeen characterizedtodateinfruitflyandhoneybee.Weshow that these lncRNAs could be broadly classified into main biological mechanisms such as development (lncRNAbithorax),behavior(sphinxandNB-1)orneural 38 Insectgenomics
Table1
DescriptionoftheknownfruitflyandhoneybeelncRNAswhichhavebeenexperimentallyvalidatedandannotatedinspecificdatabases (Rfam,lncRNAdborFlyBase).
Gene Species Function Mode
Generegulation
hsr-v Drosophila melanogaster
Longnon-codingRNAsproducedbythehsw-vgeneareactivelyexpressedinnuclei,forming spotscalledperinuclearomega-speckles,inresponsetoheatshockstress.Thesespecklesare involvedintheredistributionandsequestrationofmultipleprocessingproteins,inparticular heterogeneousnuclearribonucleoproteins(hnRNPs),HP1orpolII,whichstronglyaffect multiplecellularnetworkssubsequenttoastress[22,23]
Trans
Epigeneticscontrolofgenesregulation rox1/2 Drosophila
melanogaster
InX0maleDrosophila,thetranscriptionofgeneslocatedontheXchromosomeisincreased relativetothelevelofXXfemalesbyamechanismknownasdosagecompensation.This mechanismisconnectedtotheacetylationofhistoneH4atlysine16inducedbyaprotein complexwhichinvolvestwolongnon-codingRNAsroX1androX2[24,25]
Trans
Development
bithorax Drosophila melanogaster
Thebithoraxcomplex(BX-C)playsakeyroleinDrosophiladevelopmentandcoversaregion largerthan300kbthatincludesonlyfourprotein-codinggenes.Non-codinggeneshave alreadybeenshowntobeconcomitantlyexpressedfromtheBX-Cdomaintoregulateincisthe BX-Cproteinsinspecificabdominalsegments[26,27]
Cis
Lnccov1/2 Apismellifera ThesetwotranscriptslackevidenceoffunctionalORFsandaredifferentiallyexpressedin queenandworkerovarioletranscriptomesattheembryonicstage.Temporalexpressionshows thatlnccov1mightbeinvolvedintheautophagiccelldeathofovariolesduringworker embryogenesis,andfluorescentinsituhybridization(FISH)indicatesperinuclearlocalizationin omegaspeckle-likestructures[28]
Unknown
Behavior
yar Drosophila melanogaster
yarisalongnon-codingRNA,highlyexpressedduringembryogenesisandlocatedinaneural geneclusterbetweenyellow(y)andachaete(ac)[29].Withthehelpofmutants,thefunctionof thislongnon-codingRNAhasbeenrecentlyrefinedasaregulatorofyandactranscription, affectingaswellthesleepbehavioroftheDrosophila[30]
Cis
sphinx Drosophila melanogaster
sphinxisalncRNAinvolvedintheregulationofmalecourtshipbehavior.Sequencevariations betweencloseDrosophilaspeciesadvocateforafunctionalroleandarapidadaptation.The mutagenesisofsphinxinDrosophilarevealstheemergenceofmale-malecourtshipbehavior, probablybythedisruptionofsomesensorycircuits,whichissupportedbyitsspecific expressioninchemosensoryorgans[31]
Unknown
Nb-1 Apismellifera Nb-1isa700nttranscript,whoselongestORFencodesaputative32aminoacidwithoutany sequenceconservation.ThislncRNAisexpressedinthehoneybeebrainwithvariation accordingtotheageofthecolonyworkerstestifyingtoitsputativeroleinpolyethism[32]
Unknown
Neuralexpression
Ks-1 Apismellifera Sawataetal.identifieda17knttranscriptthatisexpressedrestrictivelyinthemushroombody ofKenyoncellsinthehoneybeebrainandwhichaccumulatesinthenucleus.Thetranscript exhibitssevenputativeORFslongerthan67aminoacidswithoutanyconservationinarelated species(Apiscerana)norsimilaritywithknownproteins[33]
Unknown
AncR-1 Apismellifera AncR-1ispreferentiallyexpressedinthebrain,insexualtissuesandinsomesecretoryorgans andaccumulatesinnuclei.Itcontainsmultiplealternateisoforms,whicharederivedfroma 6.9kbpgenomiclocus[34]
Unknown
Kakusei Apismellifera Thekakuseiisa7000ntlongnon-codingRNAwithmultipleconstitutiveandinducible variants,theexpressionofwhichistransientlyup-regulatedbyneuralactivity.Itislocalized exclusivelyinneuralnucleiindiscretenuclearcompartments.Thisgenemayplayspecificroles inRNAmetabolisminthehoneybeebrains,irrespectiveofbehavioralexperience[35]
expression(kakusei).Inthesespecies,themodeofaction ofthelncRNAsinvolveseitherbroadtrans-regulationby epigeneticcontrolofchromatin(re)organizationorRNA sequestrationinanuclearcompartment(suchas omega-speckles),aswellasneighboringcis-regulationofspecific mRNA genes, both during development or following a stress.
In other insect model organisms, such as Acyrthosiphon pisum,Aedesgambiae,Anophelesgambiae,Danausplexippus orHeliconiusmelpomenenoin-depthfunctionalanalysisof putative lncRNA have been published to date. But, interestinglyin theredflourbeetleTriboliumcastaneum, numerousncRNAsareexpressedontheoppositestrand ofprotein-codinggeneslocalizedintheHoxcluster[36]. Similarly,Lietal.recentlyobservedthatsomencRNAsof intermediate sizesaretranscribedfromthesilk glandof Bombyx mori and may be involved in therepression of transcriptionbyepigeneticmodificationsofhistones[37]. Finally, in Nasonia vitripennis, where some individuals containapaternallytransmittedsupernumerary chromo-some(PaternalSexRatio,PSR),thepaternalchromatinis modifiedduringthefirstmitoticdivisionviaretentionof histoneH3inaphosphorylatedstate.Thisfirst exhaus-tive transcriptome study in testis tissue revealed the presence of four putative ncRNAs that are specific to individualshaving aPSR [38].
IncontrasttoshortncRNAssuchastRNAsormiRNAs, neither thesequence northe structure ofthelncRNA appeartobephylogeneticallyconservedthroughoutthe metazoa kingdom[39]orevenamongtheinsectseven when the biological processesinvolving lncRNA func-tions are similar between species. As a result, new specializedcomputationalpredictionprotocolsandtools arebeingdevelopedtodiscriminatecodingversus non-coding transcripts and to refine the functional annota-tionoflncRNAs(seenextsectionfordetails).Thanksto therecentadvancesinsequencingtechnologies (RNA-Seq), thesystematicidentificationoflongncRNAshas alreadybeenappliedtofewinsectshavinghigh-quality genome assembliesinordertocompletetherepertoire offunctional elements intheir genomes (Table 2).
Bioinformatics
workflows
for
the
systematic
identification
and
annotation
of
lncRNAs
In order to annotate lncRNAs, a typical workflow (Figure 1) could be applied to the growing number of assembled insect genomes with particular attention on thefollowingthreekey points.
RNASeq
protocols
Whole transcriptome sequencing (RNASeq) represents themethodofchoicetodiscovernewtranscriptsandalso toquantifyallRNAsinavarietyoforganisms,celltypes
Table2
Computational identificationoflong non-coding RNAsinfruitfly andmosquito genomes.Allthe methodsdescribed hereuseda dedicatedpipelinewithdifferentbiologicalmaterialasinput(columnmaterial)anddifferentprotocolsandscoringscheme(columns structuralannotationandcodingpotentialanalysis).
Reference Species Material Structuralannotation Codingpotentialanalysis Numberof identified
lncRNA Tupy[40] Drosophila
melanogaster
7972cDNA Nointersectionwithexisting annotations
KA/Ks,QRNAandno conservedcodonstructure
72 Hiller[41] Drosophila
melanogaster
Donorandacceptorsites identifiedbyintronscan
Evolutionarysignatureof conservedintronsstructure between15insectgenomes
N/A 129(partial) Young[42] Drosophila melanogaster RNA-Seqfrom modENCODEproject (30developmental points)
Readsmapping(TopHat), annotation(Cufflinks), comparisonwithexisting annotation(CuffCompare)
Codingpotentialcalculation (CPC),andanalysisof conservation(PhyloCSF) 1119 Brown[11] Drosophila melanogaster 12.9billionsof pairs-endssinglestrand polyA+RNASeqreads from74libraries(inc. distincttissues,whole bodylibrariesandcell lines)
Readsmapping(TopHat), annotationGRIT),comparison toFlybaseannotation Analysisofconservation (PhyloCSF) 1875(3085 transcripts) Padro´n[43] Anopheles gambiae RNA-Seq(223millions reads)
Readsmapping(TopHat), annotation(Cufflinks), comparisonwithexisting annotation(CuffCompare)
Codingpotentialcalculation (CPAT)
9863
Jenkins[44] Anopheles gambiae
Morethan500million alignablereads
Readsmapping(TopHat), annotation(Cufflinks), comparisonwithexisting annotation(CuffCompare)
SizeoftheORFandCDS, analysisofconservation (phyloCSF),norecognizable proteindomains
or tissues[1].Evenif RNASeq technologyhasbecome the ‘de facto’ standard for transcriptome profiling, it is worth noting that this technology undergoes rapid
evolution in terms of library preparation, sequencing platformsandsubsequentbioinformaticsanalysesleading toregularupdatesofguidelinesandstandards[45]. 40 Insectgenomics
Figure1
RNA –Seq Alignment RNA –Seq
Gene prediction genome
Spliced reads alignments (TopHat2, GSNAP)
Reconstruction of transcript models (Cufflinks, scripture, FluxSimulator)
RNA –Seq Alignment RNA –Seq Gene prediction Gene prediction annotations overlap putative IncRNA CPAT Pfam
SwissProt PhyloCSF Exons and Intronsof orthologs genes
CPC / Blast InterproScan
putative IncRNA putative IncRNA
putative IncRNA hexamer file
logitmodel selection of
non–coding sequences selection of
non–coding sequences
selection of non–coding sequences IncRNA filter :
* same strand as a mRNA * > 200bp stranded intersection of the annotations (intersectBed, CuffCompare) Genome annotation Annotation Merge (Cuffmerge) genome
Spliced reads alignments (TopHat2, GSNAP)
Reconstruction of transcript models (Cufflinks, scripture, FluxSimulator)
RNA –Seq Alignment
Coding Potential Calculation
IncRNA identification
Similarity and Proteic domains Evolutionary analyses
RNA –Seq
Gene prediction genome
Spliced reads alignments (TopHat2, GSNAP)
Reconstruction of transcript models (Cufflinks, scripture, FluxSimulator)
Current Opinion in Insect Science
Schematicviewofstandardizedworkflow.Steps1–4aredesignedtoeliminatetranscriptsfromknownandpotentialproteincodinggenes.Step 5isdesignedtoidentifyknownlncRNAs.(1)AlignRNASeqreadsusingasplice-awarealgorithmandpredictgenes.Thisstepcanbeexecutedin parallelforeachlibrary.(2)Mergeannotatedtranscriptsfromstep1andcomparetoknownproteincodinggenes.(3)Predictcodingpotentialof ORFsinremaininggenes.(4)CompareORFpredictionswithknownproteinsandproteindomains.(5)Alignremainingtranscriptwiththosefrom closelyrelatedorthologstoidentifysignalsofselection.
Given that first,lncRNAs tend to be rare compared to mRNAs and second, display both spatial and temporal expression patterns [8], the depth of sequencing, the number of different tissues/cell lines and time-points need to be considered in planning each experiment. For instance, it has been shown in human and fruitfly that testis tissue shows the highest number of tissue-specificlncRNAs(e.g.11%ofalllncRNAsinDrosophila) probablyreflectingmorerelaxedchromatinstructure[11]. In addition, many lncRNAs,are expressed antisenseto protein-coding genes [46,47],which they oftenregulate [48].ItisthusrecommendedtofavorstrandedRNASeq protocols (also known as directional transcriptome sequencing) that keep track of the strand of origin of thetranscript[49]ifthepurposeofthestudyincludesthe identificationofantisenselncRNAs.Forexample,using these stranded protocols, the modENCODE project recentlydiscovered402lncRNAloci(21%ofalllncRNA loci) located antisense to mRNA transcripts of protein codinggenesinD.melanogaster[11]whilethisproportion isslightlylower(15%)inthehumangenome[8].
Mapping
and
reconstructing
lncRNAs
When ahighquality referencegenomeisavailable,one would favoramap-first thenassemble strategy.To this end, the common but still critical point in RNASeq analysis consists of mapping themillions of sequenced readsontoareferencegenome.Fortunately,several soft-waresolutionsormappershavebeendevelopedinthelast few years that efficiently and rapidly align reads on a referencegenome[50–53].RegardingRNASeqreads,the task iseven more challengingsince the mappershould also handle spliced-readalignments,that is,reads map-pingoverexon/intronjunctions(seeforbenchmark[54]). Sincemanyof thelncRNAsaremultiexonic, theuseof spliced-readalignersiscriticaltopreciselydiscovernovel exonjunctionsasforlncRNAs.InD.melanogaster,Young etal.usedamorestringentmappingprotocolbyforcing read mapping onto exon junctions connecting mRNAs endsandknownadjacentintergenictranscripts[42]in order to exclude possible false positive intergenic lncRNAs [55]. The resultingmapping file isthen used bygraph-based approachessuchas cufflinks[56], scrip-ture[57]orFluxSimulator[58]toreconstructalltranscript models.Despiteintensiveworksinthisarea,these meth-ods are far from being perfect in terms of accuracy, sensitivity and specificity but seem to behave better for smaller genomes (nematode and fruitfly) compared to mammaliangenomes [59].
Measuring
the
coding
potential
Onceasetoftranscriptshasbeenassembled,itisimportant toestimatetheprotein-codingpotentialofthesequences. After having removed transcripts whose size is below a certaincutoff(often200ntforlncRNA),afirststepcould bethefilteringoftranscriptswhichoverlapmRNAsexons inthesenseorientationastheymorelikelycorrespondto
novelisoformsofaprotein-codinggene.Subsequently,two kinds of methods areavailable to classify codingversus non-codingtranscriptsusingcomputationalprogramsand/ orbiochemicalexperiments.Computationally,thecoding potentialcouldbedeterminedbymeasuringtheintrinsic properties of the sequences which correspond (non-ex-haustively)tofirst,thelengthoftheOpenReadingFrame (ORF),second,thecoverageoftheORFcomparedtothe lengthofthetranscripts,third,thebiasink-mer frequen-ciesbetweencodingandnon-codingsequencesandfourth, thepresenceofprotein-codingspecificmotifs[60,61].All thesefeaturesareoftenintegratedintomachinelearning algorithms such as a support vector machine (SVM) or randomforest(RF),whicharetrainedwithknownsetsof protein-coding and non-coding transcripts. Additionally, someothertoolsalsorequirethealignmentofthe candi-datelncRNAssequenceswithproteindatabases(suchas PFAM or Swiss-Prot) to search for evidence of trans-latability[62]orwithmultiplegenomesinorderto specifi-cally tag the selective pressureacting on mRNAs [63]. However, theformer method suffers from theinherent lackofproteinsequences(especiallyforinsectproteomes) andmaythereforeleadtofalsepositiveannotationofinsect lncRNAswhilethelatterimpliesthatlncRNAsare evolu-tionaryconservedwhichmaynotbethecasewithrespect to both the phylogenetic distances [39] and effective population sizes [64]. Finally, further work is needed todevelopprogramsthatsimulatenon-codingsequences intheabsenceofnon-codingtrainingsets.
At the experimental level, evidence for protein-coding capabilitycanbedirectlyobtainedbymassspectrometry data[65] althoughtheseexperimentsmaynotbe avail-ableforallinsectspecies.Anothercomplementary tech-nique,ribosomeprofiling,wasdevelopedbyIngoliaetal. and utilizes high throughput sequencing to map RNA regionsassociatedwithtranslatingribosomes[66].While some lncRNAs have been shown to be associatedwith ribosomessuggestingthattheyareinfactwrongly anno-tated, itisstillunclearwhethertheresultingshort pep-tidesarereallyfunctionalsincetheyarealsoreportedin the50UTRof protein-codingtranscripts.
Conclusion
and
future
directions
DespitegrowingevidencethatlncRNAsarekeyplayersin mammaliancells, onlyafewofthemhavebeen experi-mentally validated in insects,mostly in D. melanogaster. AsaproofofconceptforlncRNAfunctionality,the well-studiedRNAroxgenesareinvolvedindosage compen-sationsimilartotheXistgeneinmammals[9].
ManynewinsectlncRNAswillbediscoveredinthenext few years owing to both the availability of cheaper RNASeq protocols and the development of dedicated bioinformaticsprograms.Forinstance,whenareference genome is not available or when the quality of the genome assembly is relatively poor (which maybe the
case for several non-model organisms), de novo or ge-nome-independentassemblyapproaches[67–69]haveto beenvisagedtobetteridentifythelncRNAsrepertoire. Moreover, differentiating non-coding from coding tran-scriptsremainsachallenging task[70]andcanbe ham-pered by biological artifices. Indeed, at least two Drosophila lncRNAs (prg and polished rice) have been recentlyre-classifiedascodingforsmallpeptides[71,72]. Furthermore, bioinformatics tools are still missing that distinguishbifunctionalRNAssuchasthesteroid recep-tor activator gene (SRA) [73] harboring two different functions, one at the RNA level and another when translatedintoproteins.
Finally,itisobviousthatnewexperimentalmethodshave to be implemented to understand the function of these intriguingRNAs,suchas therecent andpromising tech-nologiesCHART[74]ordChIRP[75],thatcanbeapplied toidentifytheDNAbindingsitesoflncRNAs.Inparallel, computationalapproachesarerequiredtounveilfunctions oflncRNAsonalarge-scaleperspective.Asillustratedby the recent implementation of lncRNAtor [76], a new databaseintegratingfunctionalinformationaboutlncRNAs fromsixspecies,includingfruitfly.Theseattemptshaveto beextendedtonon-modelorganismsinordertoshedlight onthemanycomponentsofthegenome(codingand non-coding)thatareresponsibleforphenotypictraits.
Acknowledgements
TheauthorswishtothankSueBrownandDenisTagufortheircomments andcorrectionsonthemanuscript.
References
and
recommended
reading
Papersofparticularinterest,publishedwithintheperiodofreview, havebeenhighlightedas:
ofspecialinterest ofoutstandinginterest
1. WangZ,GersteinM,SnyderM:RNA-Seq:arevolutionarytool fortranscriptomics.NatRevGenet2009,10:57-63.
2. DjebaliS,DavisCA,MerkelA,DobinA,LassmannT,MortazaviA, TanzerA,LagardeJ,LinW,SchlesingerFetal.:Landscapeof transcriptioninhumancells.Nature2012,488:101-108. 3.
UlitskyI,BartelDP:lincRNAs:genomics,evolution,and mechanisms.Cell2013,154:26-46.
4.
FaticaA,BozzoniI:Longnon-codingRNAs:newplayersincell differentiationanddevelopment.NatRevGenet2013,15:7-21. 5.
GardiniA,ShiekhattarR.:Themanyfacesoflongnoncoding RNAs.FEBSJ2014http://dx.doi.org/10.1111/febs.13101. 6.
BonasioR,ShiekhattarR.:Regulationoftranscriptionbylong noncodingRNAs.AnnuRevGenet2014,48:433-455.
Thesefourarticlesareexcellentrecentreviewscoveringallthe mechan-ismsinvolvinglongnon-codingRNAsthathasbeendescribedtodate. 7. KapranovP,ChengJ,DikeS,NixDA,DuttaguptaR,WillinghamAT,
StadlerPF,HertelJ,Hackermu¨llerJ,HofackerILetal.:RNAmaps revealnewRNAclassesandapossiblefunctionforpervasive transcription.Science2007,316:1484-1488.
8. DerrienT,JohnsonR,BussottiG,TanzerA,DjebaliS,TilgnerH, GuernecG,MartinD,MerkelA,KnowlesDGetal.:TheGENCODE v7catalogofhumanlongnoncodingRNAs:analysisoftheir
genestructure,evolution,andexpression.GenomeRes2012, 22:1775-1789.
9. BrockdorffN,AshworthA,KayGF,McCabeVM,NorrisDP, CooperPJ,SwiftS,RastanS:TheproductofthemouseXistgene isa15kbinactiveX-specifictranscriptcontainingnoconserved ORFandlocatedinthenucleus.Cell1992,71:515-526. 10. MellerVH,WuKH,RomanG,KurodaMI,DavisRL:roX1RNA
paintstheXchromosomeofmaleDrosophilaandisregulated bythedosagecompensationsystem.Cell1997,88:445-457. 11. BrownJB,BoleyN,EismanR,MayGE,StoiberMH,DuffMO,
BoothBW,WenJ,ParkS,SuzukiAMetal.:Diversityand dynamicsoftheDrosophilatranscriptome.Nature2014http:// dx.doi.org/10.1038/nature12962.
12. NecsuleaA,SoumillonM,WarneforsM,LiechtiA,DaishT,ZellerU, BakerJC,Gru¨tznerF,KaessmannH:TheevolutionoflncRNA repertoiresandexpressionpatternsintetrapods.Nature2014, 505:635-640.
13.
TaguD,ColbourneJK,NegreN:Genomicdataintegrationfor ecologicalandevolutionarytraitsinnon-modelorganisms. BMCGenomics2014,15:1-16.
Aninterestingclaimforimprovingdataandbioinformaticsresourcesto increaseourknowledgeonnon-modelorganisms.
14. HoeppnerMP,BarquistLE,GardnerPP:AnintroductiontoRNA databases.MethodsMolBiol2014,1097:107-123.
15. VoldersP-J,VerheggenK,MenschaertG,VandepoeleK, MartensL,VandesompeleJ,MestdaghP:Anupdateon LNCipedia:adatabaseforannotatedhumanlncRNA sequences.NucleicAcidsRes2014http://dx.doi.org/10.1093/ nar/gku1060.
16. BhartiyaD,PalK,GhoshS,KapoorS,JalaliS,PanwarB,JainS, SatiS,SenguptaS,SachidanandanCetal.:lncRNome:a comprehensiveknowledgebaseofhumanlongnoncoding RNAs.Database(Oxford)2013,2013:bat034.
17. XieC,YuanJ,LiH,LiM,ZhaoG,BuD,ZhuW,WuW,ChenR, ZhaoY:NONCODEv4:exploringtheworldoflongnon-coding RNAgenes.NucleicAcidsRes2014,42:D98-D103.
18. BurgeSW,DaubJ,EberhardtR,TateJ,BarquistL,NawrockiEP, EddySR,GardnerPP,BatemanA:Rfam11.0:10yearsofRNA families.NucleicAcidsRes2013,41:D226-D232.
19. AmaralPP,ClarkMB,GascoigneDK,DingerME,MattickJS: lncRNAdb:areferencedatabaseforlongnoncodingRNAs. NucleicAcidsRes2011,39:D146-D151.
20.
TheRNACentralConsortium:RNAcentral:aninternational databaseofncRNAsequences.NucleicAcidsRes2014http:// dx.doi.org/10.1093/nar/gku991.
ThisdatabaseisoneofthemostcomprehensiveresourcesforlncRNA geneannotationinmultipleorganismsincludingmanyinsects. 21.
BonasioR:Emergingtopicsinepigenetics:ants,brains,and noncodingRNAs.AnnNYAcadSci2012,1260:14-23.
Arelevantanalysisofwhatarethefuturedirectionstobetterunderstand theimpactofepigeneticsinthecontextofpolyphenismandpolyethismof eusocialinsects.
22. LakhotiaSC:Fortyyearsofthe93DpuffofDrosophila melanogaster.JBiosci2011,36:399-423.
23. LakhotiaSC,MallikM,SinghAK,RayM:Thelargenoncoding hsrv-ntranscriptsareessentialforthermotoleranceand remobilizationofhnRNPs,HP1andRNApolymeraseIIduring recoveryfromheatshockinDrosophila.Chromosoma2012, 121:49-70.
24. SmithER,AllisCD,LucchesiJC:Linkingglobalhistone acetylationtothetranscriptionenhancementof X-chromosomalgenesinDrosophilamales.JBiolChem2001, 276:31483-31486.
25. DengX,MellerVH:roXRNAsarerequiredforincreased expressionofX-linkedgenesinDrosophilamelanogaster males.Genetics2006,174:1859-1866.
26. LipshitzHD,PeattieDA,HognessDS:Noveltranscriptsfromthe Ultrabithoraxdomainofthebithoraxcomplex.GenesDev 1987,1:307-322.
27. PeaseB,BorgesAC,BenderW:NoncodingRNAsofthe UltrabithoraxdomainoftheDrosophilabithoraxcomplex. Genetics2013,195:1253-1264.
28. HumannFC,HartfelderK:Representationaldifferenceanalysis (RDA)revealsdifferentialexpressionofconservedaswellas novelgenesduringcaste-specificdevelopmentofthehoney bee(ApismelliferaL.)ovary.InsectBiochemMolBiol2011, 41:602-612.
29. SoshnevAA,LiX,WehlingMD,GeyerPK:Contextdifferences revealinsulatorandactivatorfunctionsofaSu(Hw)binding region.PLoSGenet2008,4:e1000159.
30. SoshnevAA,IshimotoH,McAllisterBF,LiX,WehlingMD, KitamotoT,GeyerPK:AconservedlongnoncodingRNAaffects sleepbehaviorinDrosophila.Genetics2011,189:455-468. 31. ChenY,DaiH,ChenS,ZhangL,LongM:Highlytissuespecific
expressionofSphinxsupportsitsmalecourtshiprelatedrole inDrosophilamelanogaster.PLoSONE2011,6:e18853. 32. TadanoH,YamazakiY,TakeuchiH,KuboT:Age-and
division-of-labour-dependentdifferentialexpressionofanovel non-codingRNA,Nb-1,inthebrainofworkerhoneybees,Apis melliferaL..InsectMolBiol2009,18:715-726.
33. SawataM,DaisukeY,TakeuchiH,KamikouchiA,KazuakiO, KuboT:Identificationandpunctatenuclearlocalizationofa novelnoncodingIdentificationandpunctatenuclear localizationofanovelnoncodingRNA,Ks-1,fromthe honeybeebrain.RNA2002:772-785.
34. SawataM,TakeuchiH,KuboT:Identificationandanalysisofthe minimalpromoteractivityofanovelnoncodingnuclearRNA gene,AncR-1,fromthehoneybee(ApismelliferaL.).RNA2004 http://dx.doi.org/10.1261/rna.5231504.use.
35. KiyaT,KuniedaT,KuboT:Inducible-andconstitutive-type transcriptvariantsofkakusei,anovelnon-codingimmediate earlygene,inthehoneybeebrain.InsectMolBiol2008, 17:531-536.
36. ShippyTD,RonshaugenM,CandeJ,HeJ,BeemanRW,LevineM, BrownSJ,DenellRE:AnalysisoftheTriboliumhomeotic complex:insightsintomechanismsconstraininginsectHox clusters.DevGenesEvol2008,218:127-139.
37. LiD-D,LiuZ-C,HuangL,JiangQ-L,ZhangK,QiaoH-L,JiaoZ-J, YaoL-G,LiuR-Y,KanY-C:Theexpressionanalysisofsilk gland-enrichedintermediate-sizenon-codingRNAsin silkwormBombyxmori.InsectSci2014,21:429-438. 38. AkbariOS,AntoshechkinI,HayBA,FerreePM:Transcriptome
profilingofNasoniavitripennistestisrevealsnoveltranscripts expressedfromtheselfishBchromosome,paternalsexratio. G3(Bethesda,MD)2013,3:1597-1605.
39. KapustaA,FeschotteC:Volatileevolutionoflongnoncoding RNArepertoires:mechanismsandbiologicalimplications. TrendsGenet2014,30:439-452.
40. TupyJL,BaileyAM,DaileyG,Evans-holmM,SiebelCW,MisraS, CelnikerSE,RubinGM:Identificationofputativenoncoding polyadenylatedtranscriptsinDrosophilamelanogaster.Proc NatlAcadSciUSA2005,102:5495-5500.
41. HillerM,FindeissS,LeinS,MarzM,NickelC,RoseD,SchulzC, BackofenR,ProhaskaSJ,ReuterGetal.:Conservedintrons revealnoveltranscriptsinDrosophilamelanogaster.Genome Res2009,19:1289-1300.
42.
YoungRS,MarquesAC,TibbitC,HaertyW,BassettAR,LiuJ-L, PontingCP:Identificationandpropertiesof1119candidate lincRNAlociintheDrosophilamelanogastergenome.Genome BiolEvol2012,4:427-442.
This paper describes a stringentmethodology to annotate and fully characterize oneofthefirstcataloguesoflong intergenicncRNAsin fruitfly.
43.
Padro´nA,Molina-CruzA,QuinonesM,RibeiroJEM,RamphulU, RodriguesJ,ShenK,HaileA,RamirezJEL,Barillas-MuryC:In depthannotationoftheAnophelesgambiaemosquitomidgut transcriptome.BMCGenomics2014,15:636.
AfirstapplicationofsystematicpredictionoflncRNAsinaninsectspecies apartfromDrosophilamelanogaster.
44. JenkinsAM,WaterhouseRM,KopinAS:Longnon-codingRNA discoveryinAnophelesgambiaeusingdeepRNAsequencing. bioRxiv2014http://biorxiv.org/content/early/2014/07/26/007484. 45. BullardJH,PurdomE,HansenKD,DudoitS:Evaluationof
statisticalmethodsfor normalizationanddifferentialexpression inmRNA-Seqexperiments.BMCBioinform2010,11:94. 46. HeY,VogelsteinB,VelculescuVE,PapadopoulosN,KinzlerKW:
Theantisensetranscriptomesofhumancells.Science2008, 322:1855-1857.
47. MagistriM,FaghihiMA,StLaurentG,WahlestedtC:Regulation ofchromatinstructurebylongnoncodingRNAs:focuson naturalantisensetranscripts.TrendsGenet2012,28:389-396. 48. JohnssonP,AckleyA,VidarsdottirL,LuiW-O,CorcoranM,
Grande´rD,MorrisKV:Apseudogenelong-noncoding-RNA networkregulatesPTENtranscriptionandtranslationin humancells.NatStructMolBiol2013,20:440-446. 49. LevinJZ,YassourM,AdiconisX,NusbaumC,ThompsonDA,
FriedmanN,GnirkeA,RegevA:Comprehensivecomparative analysisofstrand-specificRNAsequencingmethods.Nat Methods2010,7:709-715.
50. Marco-SolaS,SammethM,oacuteRG,RibecaP:TheGEM mapper:fast,accurateandversatilealignmentbyfiltration. NatMethods2012http://dx.doi.org/10.1038/nmeth.2221. 51. TrapnellC,PachterL,SalzbergSL:TopHat:discoveringsplice
junctionswithRNA-Seq.Bioinformatics2009,25:1105-1111. 52. LiH,DurbinR:Fastandaccurateshortreadalignmentwith
Burrows–Wheelertransform.Bioinformatics2009,25:1754-1760. 53. DobinA,DavisCA,SchlesingerF,DrenkowJ,ZaleskiC,JhaS,
BatutP,ChaissonM,GingerasTR:STAR:ultrafastuniversal RNA-seqaligner.Bioinformatics2012,29:15-21.
54. Engstro¨mPG,SteijgerT,SiposB,GrantGR,KahlesA,AliotoT, BehrJ,BertoneP,BohnertR,CampagnaDetal.:Systematic evaluationofsplicedalignmentprogramsforRNA-seqdata. NatMethods2013http://dx.doi.org/10.1038/nmeth.2722. 55. VanBakelH,NislowC,BlencoweBJ,HughesTR:Most‘‘dark
matter’’transcriptsareassociatedwithknowngenes.PLoS Biol2010,8:e1000371.
56. TrapnellC,WilliamsBA,PerteaG,MortazaviA,KwanG,van BarenMJ,SalzbergSL,WoldBJ,PachterL:Transcriptassembly andquantificationbyRNA-Seqrevealsunannotated transcriptsandisoformswitchingduringcelldifferentiation. NatBiotechnol2010,28:511-515.
57. GuttmanM,GarberM,LevinJZ,DonagheyJ,RobinsonJ, AdiconisX,FanL,KoziolMJ,GnirkeA,NusbaumCetal.:Abinitio reconstructionofcelltype-specifictranscriptomesinmouse revealstheconservedmulti-exonicstructureoflincRNAs.Nat Biotechnol2010,28:503-510.
58. GriebelT,ZacherB,RibecaP,RaineriE,LacroixV,GuigoR, SammethM:ModellingandsimulatinggenericRNA-Seq experimentswiththefluxsimulator.NucleicAcidsRes2012, 40:10073-10083.
59. SteijgerT,AbrilJF,Engstro¨mPG,KokocinskiF,AbrilJF, AkermanM,AliotoT,AmbrosiniG,AntonarakisSE,BehrJetal.: Assessmentoftranscriptreconstructionmethodsfor RNA-seq.NatMethods2013http://dx.doi.org/10.1038/nmeth.2714. 60. WangL,ParkHJ,DasariS,WangS,KocherJP,LiW:CPAT:
coding-potentialassessmenttoolusinganalignment-free logisticregressionmodel.NucleicAcidsRes2013http:// dx.doi.org/10.1093/nar/gkt006.
ThispaperdescribesafastandeffectiveprogramcalledCPATwhichis abletodiscriminatebetweencodingandnon-codingsequencesusinga logisiticregressionmodelbuiltonasetofknowncodingandnon-coding transcripts.
61. LiA,ZhangJ,ZhouZ:PLEK:atoolforpredictinglong non-codingRNAsandmessengerRNAsbasedonanimproved k-merscheme.BMCBioinform2014,15:1-10.
62. KongL,ZhangY,YeZQ,LiuXQ,ZhaoSQ,WeiL,GaoG:CPC: assesstheprotein-codingpotentialoftranscriptsusing
sequencefeaturesandsupportvectormachine.NucleicAcids Res2007,35:W345-W349.
63. LinMF,JungreisI,KellisM:PhyloCSF:acomparativegenomics methodtodistinguishproteincodingandnon-codingregions. Bioinformatics2011,27:i275-i282.
64.
HaertyW,PontingCP:MutationswithinlncRNAsareeffectively selectedagainstinfruitflybutnotinhuman.GenomeBiol2013, 14:R49.
By comparing sequence polymorphism in Drosophila and human populations,thispaperdemonstratesthedeleteriouseffectofmutations withinfruitflylncRNAsbutnotinhumanlncRNAsprobablyowingtothe differenceineffectivepopulationsizesbetweenthetwospecies. 65. Ba´nfaiB,JiaH,KhatunJ,WoodE,RiskB,GundlingWE,
KundajeA,GunawardenaHP,YuY,XieLetal.:Longnoncoding RNAsarerarelytranslatedintwohumancelllines.Genome Res2012,22:1646-1657.
66. IngoliaNT,GhaemmaghamiS,NewmanJRS,WeissmanJS: Genome-wideanalysisinvivooftranslationwithnucleotide resolutionusingribosomeprofiling.Science2009,
324:218-223.
67. MartinJA,WangZ:Next-generationtranscriptomeassembly. NatRevGenet2011,12:671-682.
68. BaoE,JiangT,GirkeT:BRANCH:boostingRNA-Seq assemblieswithpartialorrelatedgenomicsequences. Bioinformatics2013,29:1250-1259.
69. SacomotoGAT,KielbassaJ,ChikhiR,UricaruR,AntoniouP, SagotM-F,PeterlongoP,LacroixV:KISSPLICE:de-novocalling alternativesplicingeventsfromRNA-seqdata.BMCBioinform 2012,13(Suppl.6):S5.
70. DingerME,PangKC,MercerTR,MattickJS:Differentiating protein-codingandnoncodingRNA:challengesand ambiguities.PLoSCompBiol2008,4:e1000176.
71. KageyamaY,KondoT,HashimotoY:Codingvsnon-coding: translatabilityofshortORFsfoundinputativenon-coding transcripts.Biochimie2011,93:1981-1986.
72. CohenSM:Everythingoldisnewagain:(linc)RNAsmake proteins! EMBOJ2014,33:937-939.
73. Chooniedass-KothariS,EmberleyE,HamedaniMK,TroupS, WangX,CzosnekA,HubeF,MutaweM,WatsonPH,LeygueE: ThesteroidreceptorRNAactivatoristhefirstfunctionalRNA encodingaprotein.FEBSLett2004,566:43-47.
74. SimonMD,WangCI,KharchenkoPV,WestJA,ChapmanBA, AlekseyenkoAA,BorowskyML,KurodaMI,KingstonRE:The genomicbindingsitesofanoncodingRNA.ProcNatlAcadSci 2011,108:20497-20502.
75.
QuinnJJ,IlikIA,QuK,GeorgievP,ChuC,AkhtarA,ChangHY: RevealinglongnoncodingRNAarchitectureandfunctions usingdomain-specificchromatinisolationbyRNA purification.NatBiotechnol2014:32.
This article presents domain-specific chromatin isolation by RNA purification(dChIRP),atechnique toanalyzeRNA–RNA,RNA–protein and RNA–chromatin interactions used to precisely characterize the chromosomalbindingofroX1androX2.
76.
ParkC,YuN,ChoiI,KimW,LeeS:lncRNAtor:acomprehensive resourceforfunctionalinvestigationoflongnon-coding RNAs.Bioinformatics2014,30:2480-2485.
AnewdatabaseintegratingfunctionalinformationaboutlncRNAsfrom sixspecies,includingfruitfly.