Road to effective data curation for translational research


Wei Gu1,2,‡, Samiul Hasan3,‡, Philippe Rocca-Serra4,‡ and Venkata P. Satagopam1,2, venkata.satagopam@uni.lu

Translational research today is data-intensive and requires multi-stakeholder collaborations to generate and pool data together for integrated analysis. This leads to the challenge of harmonization of data from different sources with different formats and standards, which is often overlooked during project planning and thus becomes a bottleneck of the research progress. We report on our experience and lessons learnt about data curation for translational research garnered over the course of the European Translational Research Infrastructure & Knowledge management Services (eTRIKS) program (https://www.etriks.org), a unique, 5-year, cross-organizational, cross-cultural collaboration project funded by the Innovative Medicines Initiative of the EU. Here, we discuss the obstacles and suggest what steps are needed for effective data curation in translational research, especially for projects involving multiple organizations from academia and industry.

Introduction

Billions of dollars are spent annually on generating clinical and translational research data. Yet despite such significant levels of investment, the amount of data that are reused remains surprisingly low. Besides the legal constraints, two main reasons explain this deficit: first, the difficulties in finding and accessing datasets themselves, and second, the effort required to harmonize datasets both syntactically and semantically. The perpetual need for added curation efforts highlights a lack of interoperability between digital assets. This is symptomatic of the dichotomy plaguing the life sciences domain when it comes to data management and asset handling. On the one hand, there is an operational gap in terms of training life-science researchers in the necessary skills, tools and culture required to turn datasets into FAIR (findable, accessible, interoperable, reusable) resources [1]. And on the other hand, there is a cultural divide between the worlds of academia and industry, which translates into diverse and frequently divergent attitudes towards the adoption of data-management standards (Fig. 1).

This is compounded by an array of often competing standards specifications sponsored by different stakeholders, whose views of the world are not necessarily discordant, but are not entirely aligned, either. Hence, unless a life-science lingua franca emerges, realising the vision of a FAIR data exchange requires strategic thinking with a good understanding of existing standards resources and a clear commitment to using them.

Furthermore, funders, sponsors, program managers, data analysts and scientists need to fully appreciate the added burden of format conversions, language translations and mapping between terminologies. They will also have to bear in mind that standards are not static artefacts but evolving entities designed to reflect the growth of domain knowledge. Therefore, they need to plan for upgrades, migration and obsolescence strategies, leveraging learnings and practices well-entrenched in other domains in the software industry.

The Innovative Medicines Initiative (IMI) is the largest public–private partnership in the life-sciences domain in Europe, and it is focused on developing better and safer medicines for patients. Central to this strategy is a knowledge management (KM) environment that provides sustainable access to the data in an integrated manner. To meet this challenge, the IMI European Translational Research Infrastructure & Knowledge management Services (eTRIKS) project focused on building a sustainable KM platform and provided support at the level of data management to other IMI projects. Curation, as an essential part of data management, connects key components, such as vocabulary management and data hosting, in a longer ‘value chain’ that leads to a substantial improvement of the utility of data for research.

Features • PERSPECTIVE

1359-6446/© 2020 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Based on our learnings from eTRIKS, we outline the key steps to define, specify, roll out and enact robust, proven, standards-aware data-management plans (DMPs). These experiences are relevant to all data-intensive clinical and translational studies. This work builds on the eTRIKS standards starter pack (SSP) [2], which surveyed, evaluated and compiled a collection of data standards of relevance to the field.

Our aim is to share the practical knowledge thus gained to assist organizations to maximize the value of their own and public data and to practically assess when to implement data curation in translational-research operational processes. This involves outlining a comprehensive concept of data curation as an integral part of the global concept of a DMP. It covers data handling and processing from end to end, introducing the use of data standards from the study planning stages to data-set socialization through planned releases.

Experience of data curation in IMI eTRIKS-supported projects

eTRIKS directly engaged with IMI projects by providing support in data curation and training. This facilitated data access to consortium members via data-sharing platforms such as tranSMART [3]. The curation activities targeted compliance with standards as recommended by the SSP [2]. Curation therefore covered aspects of metadata structuring (the syntax) and elements of content annotation (the semantics). The IMI projects directly involved were: U-BIOPRED [4], OncoTrack [5], RA-MAP [6], ABIRISK [7], APPROACH (https://www.approachproject.eu/about-approach) and AETIONOMY [8].

Initially, eTRIKS adopted a ‘full support’ model by allocating experienced personnel to curate data for lead client projects. However, as if to confirm the aphorism ‘no battle plan ever survives contact with the enemy’, this soon resulted in having to make adjustments, often significant ones, owing to the following shareable obstacles:

Obstacle 1: underestimation of the time required to clear legal hurdles

The negotiation of material or data processing agreements between consortia turned out to be a time-consuming process. The complex details resulted in >20 months spent to grant data access for curation. This restriction was compounded further by apprehensiveness in sharing information about study variables, which resulted in further unanticipated delays.

It is crucial for future consortia to clear the path to data exchange between partners to prevent being bogged down in juridical no man’s land.

Obstacle 2: under-resourcing of the curation activities and sustainability issues

The number of curators allocated to eTRIKS only allowed us to realistically engage with five to eight lead projects, which would have fallen far short of eTRIKS’ engagement goal of 40 projects. The real costs and the scale of the gap between the curatorial resources and the number of IMI projects being funded were quickly and crudely exposed.

To provide sustainable curation support to subsequent IMI projects, an additional ‘lightweight’ support model was introduced. Under this remit, eTRIKS has developed data-curation guidelines and an IMI curator training course [9]. The guidelines and a series of training sessions were provided to supported projects to enable independent curation. Even this model often faced challenges owing to a lack of personnel in many projects.

We strongly suggest that future consortia/projects should plan sufficient curation resources in their proposals to ensure an efficient data flow from data production to analyses.

Obstacle 3: underestimation of the ‘cultural divide’ of standards

Industry-strength standards have a daunting complexity. This means that bringing data managers to the required levels of competence demands adequate training and resourcing. By contrast, data standards used in academia often have a different origin, with pragmatism, agility and flexibility at their core, and with loosely coupled implementation. These distinct attitudes tend to polarize practice and make finding common ground challenging. Having these two worlds exchange, coordinate and develop global standards consumes time, expertise, training and knowledge transfer for harmonization and stability.

We urge scientific leaders and researchers from both academia and industry to reach out to funders and sponsors regarding this issue, so that it will be acknowledged by funding agencies and study sponsors to promote convergence of efforts.

Drug Discovery Today

FIGURE 1

Fundamental data curation policies, technical infrastructure and cultural adoption are necessary to help public–private partnerships deliver new and better-structured clinical and omics data to support personalized medicine.

Obstacle 4: lack of a proper data-management plan

A DMP is vital for various activities involving the handling of data or information that is outlined in the protocol to be collected, analysed and shared. It also contains detailed information necessary for the curators to understand the data to be processed. Yet all too often, DMPs are missing or, when available, are too thinly written to bootstrap the curation process.

Because of the essential role of a proper DMP in efficient and effective data curation, we have made some suggestions in detail in a dedicated section ‘Support for effective curation within a data-management plan’.

Obstacle 5: lack of metadata descriptors

Although it relates to the DMP, this point is worth separate attention owing to its large practical impact on curation, especially for retrospective datasets for which a DMP might not be available. Assuming that curators have overcome the previous hurdles and obtained a dataset, the variables and their value sets are often so poorly annotated that it is near impossible to decipher them without further interactions with the primary researchers, who are typically pressed for time or unreachable. Mostly, it falls on the curators to understand the pre-curated (‘straight from the hose’) data. This affects the efficiency of curation and frequently results in erroneous interpretations.

We strongly recommend that a proper data dictionary be developed to document the metadata of each variable (name, label, type, unit, value ranges, controlled vocabularies, dependencies, etc.). This information should be collected as early as possible or, if possible, even before the project starts: a key step to make data FAIR by design.
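A data dictionary of this kind can be kept as a small machine-readable structure and used to check incoming records before curation starts. The following is a minimal sketch in Python, not an eTRIKS tool: the `VariableSpec` class, the `validate_record` helper, and the example variables and ranges are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VariableSpec:
    """One data-dictionary entry (hypothetical structure for illustration)."""
    name: str
    label: str
    dtype: type
    unit: Optional[str] = None
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    allowed_values: set = field(default_factory=set)  # controlled vocabulary


# Illustrative dictionary; variable names and ranges are made up.
DATA_DICTIONARY = {
    "age": VariableSpec("age", "Age at enrolment", int, unit="years",
                        min_value=0, max_value=120),
    "sex": VariableSpec("sex", "Sex", str,
                        allowed_values={"female", "male", "unknown"}),
}


def validate_record(record: dict, dictionary: dict) -> list:
    """Return a list of human-readable problems found in one record."""
    problems = []
    for key, value in record.items():
        spec = dictionary.get(key)
        if spec is None:
            problems.append(f"{key}: not described in the data dictionary")
            continue
        if not isinstance(value, spec.dtype):
            problems.append(f"{key}: expected {spec.dtype.__name__}")
            continue
        if spec.min_value is not None and value < spec.min_value:
            problems.append(f"{key}: below minimum {spec.min_value}")
        if spec.max_value is not None and value > spec.max_value:
            problems.append(f"{key}: above maximum {spec.max_value}")
        if spec.allowed_values and value not in spec.allowed_values:
            problems.append(f"{key}: value not in controlled vocabulary")
    # Variables declared in the dictionary but absent from the record.
    for name in dictionary:
        if name not in record:
            problems.append(f"{name}: missing from record")
    return problems
```

Calling `validate_record({"age": 130, "sex": "F"}, DATA_DICTIONARY)` would report an out-of-range age and a value outside the controlled vocabulary, which is the kind of question that otherwise has to go back to the primary researchers.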

Support for effective curation within a data-management plan

The major research funders have now evolved relatively (or somewhat) homogenous approaches to their DMPs. DMP templates are available from numerous sources, but the UK Digital Curation Centre [10] and ELIXIR DMP-wizard (https://ds-wizard.org/) provide resources that reflect broad current thinking. DMPs are now evolving from simple checklists to documents covering the entire data custody plan from collection to publication, ensuring compliance with the formats and annotation requirements of repositories (e.g., CONSORT, http://www.consort-statement.org/), regulators (e.g., CDISC, https://www.cdisc.org/standards/therapeutic-areas) and European law (EU General Data Protection Regulation). Key performance indicators are being developed to determine the level of adherence to principles such as the FAIR principles both in the EU (e.g., FAIR-DOM.org) [11] and in the United States (US National Institutes of Health DCCPC KC1 FAIR access) [1]. These efforts share a common objective: to select data standards, terminologies or ontologies that will be implemented in the data-capturing phase wherever possible, leading to an ideal limit of ‘free of free text’ datasets. The upfront declaration of standards, their name and version thus allows the availability of key provenance descriptors, which ought to be associated to metadata capture templates, which themselves ought to be identified, licensed and versioned. This followed the work by Dietrich et al. on DMPs [12]. In support of these tasks, efforts such as FAIRsharing [13] and CDISC Share are laudable initiatives.
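The upfront declaration described above can be illustrated with a small machine-readable fragment. The sketch below is only one possible shape, in Python: the field names (`standards`, `capture_templates`, `role`), the placeholder entries and the helper functions are assumptions made for this example and do not correspond to any published DMP schema.

```python
import json

# Hypothetical DMP fragment: field names are assumptions for this sketch,
# and the standard names, versions and identifiers are placeholders.
DMP_STANDARDS = {
    "standards": [
        {"name": "Example clinical standard", "version": "1.2",
         "role": "tabulation model"},
        {"name": "Example ontology", "version": "2020-01",
         "role": "terminology"},
    ],
    "capture_templates": [
        {"identifier": "template-001", "license": "CC-BY-4.0",
         "version": "1.0"},
    ],
}

# Provenance descriptors each entry is expected to carry.
REQUIRED = {
    "standards": ("name", "version"),
    "capture_templates": ("identifier", "license", "version"),
}


def missing_provenance(dmp: dict) -> list:
    """List entries that lack a required provenance descriptor."""
    gaps = []
    for section, keys in REQUIRED.items():
        for index, entry in enumerate(dmp.get(section, [])):
            for key in keys:
                if not entry.get(key):
                    gaps.append(f"{section}[{index}] lacks '{key}'")
    return gaps


def to_machine_readable(dmp: dict) -> str:
    """Serialize the declaration so that tools can read and act on it."""
    return json.dumps(dmp, indent=2, sort_keys=True)
```

A check such as `missing_provenance` could run whenever the DMP is revised, so that a standard declared without a version, or a template without a licence, is caught before data capture begins.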

However, in spite of this progress, many instances of DMPs seem to be too weak, or even an afterthought. DMPs need to be prepared at the early stages to provide a means to deliver prospective, design-driven, compliant data-collection blueprints that are digital artefacts in their own right, both machine readable and actionable. To reach this stage, it is essential that from inception, data controllers and data processors (including users and statisticians) should be closely involved, along with study designers. They should ascertain that the protocols embed sufficient safeguards to account for bias and document how lurking variables are dealt with; this is crucial, and the lack of such information in legacy or public datasets is usually a cause for their exclusion in meta-analyses and reviews, as the use of insufficiently described datasets carries a risk of misinterpreting signals.

Some data modalities present new challenges, such as the high-dimensional datasets generated by omics technologies. These challenges are based on the emergence of new platform and technology descriptions and signal-processing methodologies (e.g., normalization and batch-effect corrections). In addition, the computational workflows and information technology (IT) infrastructures for storing and executing the computations are evolving, adding to the complexities.

The DMP must be augmented with detailed data-access conditions and terms of use (such as the Data Use Ontology GA4GH standard (https://www.ga4gh.org/news/data-use-ontology-approved-as-a-ga4gh-technical-standard) and the European Genome-phenome Archive terms [14]) to be compliant with legal frameworks and to safeguard patients’ rights and privacy. Hence, data processors should provide (if they host the data) technical implementation plans to host (transfer), process, analyse and share data in a secure and scalable manner. They ought to make available a clear picture of the data flow and provide standard operating procedures for data access. Models such as Data Tag Suite (DATS) [15,16] and the data access components could be followed.

The IMI published the ‘Guidelines on FAIR Data Management in Horizon 2020’ [17] with a FAIR DMP template to harmonize and improve curation activities. Another good resource is the ‘Good Clinical Data Management Practices’, which aims to help implement sustainable curation activities [18].

Support for effective curation within a curation community

But beyond a DMP, what would benefit the translational-research community most is an open, precompetitive curation community. This ought to include academia, data managers, bioinformaticians, standards developers, IT developers, physicians, statisticians and curators. In the case of the IMI, this includes European Federation of Pharmaceutical Industries and Associations (EFPIA) partners, academia and small and medium enterprises. For that, members of the research community should establish and maintain the following key pillars.

People and organizations

Build up a network of people with diverse expertise (domain knowledge, IT skills, algorithms, legal and ethical requirements and coordination) and create a forum for all curation stakeholders to exchange ideas, requirements and resources. Despite initiatives like the Biocuration Society and the Pistoia Alliance, data created by the pharmaceutical industry are still treated as trade secrets, leading to a lot of wheel reinvention in every organization.

Projects and ideas

Seek out opportunities to apply for joint grants to improve curation methods, resources (e.g., metadata and reference-data repositories) and infrastructures.

Data sets

Release training datasets of variables, their value set and actual ‘dirty data’. As machine learning and artificial intelligence (AI) methodologies are gaining importance, it is crucial that a community is built around assembling training datasets that can establish the basis for tools geared to bootstrap curation efforts. To that effect, obtaining common data elements from the National Cancer Institute (NCI) (https://www.cancer.gov/), US National Institutes of Health, Federal Interagency Traumatic Brain Injury Research (FITBIR) (https://fitbir.nih.gov/) and many others, organized by clinical domain, would be essential to seed an AI engine.

Training

Provide training not only to curators, but also to principal investigators, clinical researchers, IT staff and legal experts to help them understand the importance of curation and other people’s views and roles in the whole data-management process related to curation.

Support for effective curation within infrastructure developments

Thanks to its widespread use of genomics and high-throughput or high-resolution imaging techniques, translational research sits firmly in the realm of big data science. Effective and efficient data curation requires now more than ever strong support from infrastructure, reference (meta)data resources, and tools so that FAIR principles can be implemented. The goals of such infrastructure are to reduce the overall time of curation at the same time as improving the quality of the outcome. Other benefits include lowering the barrier of curation and being able to trace and reproduce curation steps. A modern curation infrastructure requires care and proper set-up, following the stepwise process shown in Figure 2.
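One way to make curation steps traceable and reproducible is to run them as named, ordered functions and to record a fingerprint of the data after each step. The sketch below illustrates that idea in Python under simple assumptions (rows are plain dictionaries that serialize to JSON); the function names and the example cleaning steps are hypothetical and are not part of any tool named in this article.

```python
import hashlib
import json


def fingerprint(rows: list) -> str:
    """Stable SHA-256 digest of a list of dict rows."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_curation(rows: list, steps: list) -> tuple:
    """Apply (name, function) steps in order and record a provenance log."""
    log = [{"step": "input", "rows": len(rows), "sha256": fingerprint(rows)}]
    current = rows
    for name, func in steps:
        # Work on copies so the original input is left untouched.
        current = [func(dict(row)) for row in current]
        log.append({"step": name, "rows": len(current),
                    "sha256": fingerprint(current)})
    return current, log


# Hypothetical curation steps applied to made-up 'dirty' values.
def strip_whitespace(row: dict) -> dict:
    return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}


def harmonize_sex(row: dict) -> dict:
    mapping = {"f": "female", "m": "male"}
    value = str(row.get("sex", "")).lower()
    row["sex"] = mapping.get(value, value or "unknown")
    return row
```

Because each step is deterministic, rerunning the same steps on the same input yields the same sequence of digests, so a stored log can later be used to confirm that a curated dataset was produced by the documented procedure.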

We would like to emphasize that many of the functionalities mentioned here are already available piece by piece in existing software applications. For example, some of the components needed are already provided by ontology support tools (in Europe, https://www.ebi.ac.uk/spot/; in the United States, https://bioportal.bioontology.org/), OpenRefine (http://openrefine.org/) and many community versions of commercial tools, such as Pentaho-Kettle (https://github.com/pentaho/pentaho-kettle) and Talend Open Studio (https://github.com/Talend/tbd-studio-se). When a curation infrastructure is to be set up, one should reuse and integrate the existing tools as much as possible to avoid reinventing the same solutions.

Conclusion

This article delivers an honest review of the lessons learnt from our experiences in supporting several IMI projects with their data-curation needs. The outcome of these efforts has resulted in the reuse of data in several multi-party projects within IMI [4–8] and beyond. For example, the H2020-SYSCID project (http://www.syscid.eu/) reused the curated data related to autoimmune disease, and the NCER-PD project (https://parkinson.lu) reused the curated data related to Parkinson’s disease. Data curation remains the main technical challenge for the reuse of data in clinical and translational research.

FIGURE 2

Tools such as an efficient IT infrastructure are needed for effective and efficient data curation. The upper example conversation is a typical starting point for researchers (or data managers and data analysts) who try to perform data curation without the proper tools. The lower part is a list of functional components for an ideal curation infrastructure throughout the life cycle of data curation, from pre-curation to curation proper and post-curation.

Cleaning and scrubbing data is not only a time-consuming and tedious process, it is also an expensive one. Yet projects still consent to that expenditure, even though they are less inclined to agree to a more robust, ‘front-loaded’ approach to data management. Indeed, possibly the most effective way to wrestle the problem is to form a well-structured DMP at the start that covers all the steps from data generation or capture to data analysis, with the highest possible precision, definitions and discipline. The resources, roles and responsibilities of each party should be clearly defined and planned. Necessary infrastructure should be deployed and, if needed, developed to accelerate and improve the curation activities. A curation community is needed to sustain and continue developing knowledge, technologies and expertise. Last but not least, awareness of the importance of data curation and the risk of a lack of planning should be disseminated to the clinical and translational research field, so researchers can avoid making the same mistakes time and time again.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

We highly appreciate the essential contributions of the eTRIKS consortium members: Dorina Bratfalean, Adriano Barbosa-Silva, Serge Eifes, Francisco Capdevila Bonachela, Ibrahim Emam, Manfred Hendlich, Paul Houston, Chris Marshall, Paul Peeters, Kavita Rege, Fabien Richard, Martin Romacker, Andreas Tielmann, Michael Braxenthaler, Susanna-Assunta Sansone and Reinhard Schneider. This work has received support from the IMI Joint Undertaking under grant agreement no. 115446 (eTRIKS), the resources of which are composed of financial contribution from the EU’s Seventh Framework Programme (FP7/2007–2013) and The European Federation of Pharmaceutical Industries and Associations (EFPIA) companies’ in-kind contributions (www.imi.europa.eu).

References

1 Wilkinson, M.D. et al. (2016) The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3, 160018

2 Rocca-Serra, P. et al. (2016) eTRIKS Standards Starter Pack Release 1.1 April 2016. Zenodo. http://dx.doi.org/10.5281/zenodo.50825

3 Szalma, S. et al. (2010) Effective knowledge management in translational medicine. Brief. Bioinform. 8, 68

4 Wheelock, C.E. et al. (2013) Application of ’omics technologies to biomarker discovery in inflammatory lung diseases. Eur. Respir. J. 42, 802–825

5 Gu, W. et al. (2019) Data and knowledge management in translational research: implementation of the eTRIKS platform for the IMI OncoTrack consortium. BMC Bioinform. 20, 164

6 Cope, A.P. et al. (2018) The RA-MAP Consortium: a working model for academia–industry collaboration. Nat. Rev. Rheumatol. 14, 53–60

7 Bachelet, D. (2016) Occurrence of anti-drug antibodies against interferon-beta and natalizumab in multiple sclerosis: A collaborative cohort analysis. PLoS One 11, e0162752

8 Hofmann-Apitius, M. et al. (2015) Bioinformatics mining and modeling methods for the identification of disease mechanisms in neurodegenerative disorders. Int. J. Mol. Sci. 16, 29179–29206

9 Marchetti, G. and Jullian, N. (2017) eTRIKS Final Training Curriculum: Deliverable D6.6. eTRIKS

10 DCC (2013) Checklist for a Data Management Plan, v.4.0. Digital Curation Centre

11 Wolstencroft, K. et al. (2016) FAIRDOMHub: a repository and collaboration environment for sharing systems biology research. Nucleic Acids Res. 45, D404–D407

12 Dietrich, D. et al. (2012) De-mystifying the data management requirements of research funders. Issues Sci. Technol. Librariansh 70. http://dx.doi.org/10.5062/F44M92G2

13 Sansone, S.A. et al. (2019) FAIRsharing as a community approach to standards, repositories and policies. Nat. Biotechnol 37, 358–367

14 Lappalainen, I. et al. (2015) The European Genome-phenome Archive of human data consented for biomedical research. Nat. Genet. 47, 692–695

15 Sansone, S.A. et al. (2017) DATS, the data tag suite to enable discoverability of datasets. Sci. Data 4, 170059

16 Alter, G. et al. (2020) The Data Tags Suite (DATS) model for discovering data access and use requirements. Gigascience 9, giz165

17 European Commission (2016) H2020 Programme: Guidelines on FAIR Data Management in Horizon 2020. European Commission

18 Society for Clinical Data Management (2020) Good Clinical Data Management Practices (GCDMP). Society for Clinical Data Management
