Genomic Data

GenMiner: mining informative association rules from genomic data

Another delicate issue in association rule discovery is the choice of thresholds for selecting significant rules. Support and confidence are computed while rules are extracted from the dataset and are, in many cases, the only measures used to assess a rule's relevance. For genomic data, the minimum support threshold must be set low: if only a small set of genes is annotated with a very specific category, the support of rules containing this annotation will be quite low. Nevertheless, if these rules have a high confidence value, they reveal that this specific biological property is highly associated with an expression pattern of another gene annotation that appears in the consequent. However, an association rule with high support and confidence can be useless if the consequent itemset of the rule is highly frequent in the dataset and is thus associated with many other itemsets. In other words, associations among weakly correlated elements can be generated using the support-confidence framework [15]. GenMiner is based on the support-confidence framework, but other statistical measures that evaluate correlation (or independence) between the consequents and antecedents of rules can easily be integrated during the calculation phase or the interpretation phase, to filter out rules between weakly correlated gene patterns and to order the remaining rules.
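
The point about frequent consequents can be illustrated with a small sketch. It computes support and confidence as described above and adds lift as one possible correlation measure; lift is an assumption chosen here for illustration and is not named in the excerpt, and the toy gene annotations are hypothetical.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               transactions: List[Transaction]) -> float:
    """support(antecedent and consequent) / support(antecedent)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))


def lift(antecedent: FrozenSet[str], consequent: FrozenSet[str],
         transactions: List[Transaction]) -> float:
    """confidence / support(consequent); a value near 1 suggests independence."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))


# Hypothetical toy data: one "transaction" per gene (annotation + expression).
genes = [
    frozenset({"GO:ribosome", "expr:up"}),
    frozenset({"GO:ribosome", "expr:up"}),
    frozenset({"GO:kinase", "expr:up"}),
    frozenset({"GO:kinase", "expr:up"}),
    frozenset({"GO:transport", "expr:up"}),
]
antecedent = frozenset({"GO:ribosome"})
consequent = frozenset({"expr:up"})

rule_confidence = confidence(antecedent, consequent, genes)
rule_lift = lift(antecedent, consequent, genes)
```

In this toy dataset every gene carries `expr:up`, so the rule reaches maximal confidence while its lift indicates no correlation at all, which is the kind of rule a correlation filter would discard.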

Beyond studying genetic diversity: how can pedigree and genomic data help us assigning individuals to breeds?

H. Wilmot* (1), J. Bormann (2), N. Gengler (1); (1) ULiège - Gembloux Agro-Bio Tech, Gembloux, Belgium; (2) ASTA, Luxembourg, Luxembourg

‘Overcoming the Bottleneck’: Knowledge Architectures for Genomic Data Interpretation in Oncology

While most readers are doubtlessly unaware of the database/knowledgebase distinction, it came as no surprise to us. During fieldwork for this paper, numerous respondents invoked it to characterize the computerized resources they had developed to facilitate genomic data interpretation in oncology. Given oncology’s pioneering role in the development of ‘precision medicine’, recourse to the neologism ‘knowledgebase’ deserves additional investigation. What does it entail and how does it relate to the molecular reconfiguration of oncology practices? More specifically, how and to what extent does the replacement of ‘data’ with ‘knowledge’ in the portmanteau word reflect actual differences in the origin, kind, and content of the information in knowledgebases? Does the ‘data journey’ metaphor (Leonelli this volume; Leonelli 2016; Bates et al. 2016), often used to characterize the dynamics of data repositories, continue to appropriately describe how information elicited from journal articles or databases is incorporated and organized within knowledgebases? To begin to answer these questions we need to examine how knowledgebases are located within the sequence of activities that define genomics-driven oncology, from the initial sequencing of a patient’s tumor to treatment decisions. Knowledgebases are specifically geared for data interpretation and as such impinge directly on discussions about the actionability and clinical utility of genomic results, i.e. the establishment of predictive relations between molecular profiling results and specific drugs (Nelson et al. 2013). Oncologists perceive them as potential solutions to a major ‘bottleneck’ that threatens the viability of their endeavor.

Combining Individual Phenotypes of Feed Intake With Genomic Data to Improve Feed Efficiency in Sea Bass

for disease resistance in Atlantic salmon. In this study, they showed that the reliability of GEBV calculated with ssGBLUP for resistance to salmon rickettsial syndrome was about 0.41 when using 80% of the phenotyped and genotyped fish to predict the remaining 20%. Our results also showed that the reliability could be increased by phenotyping more fish, as we did not reach a plateau when increasing the number of fish in the training group. However, these reliability results must be taken with care, as the formula used to calculate the reliability is an approximation of the accuracy (Gunia et al., 2014). In order to estimate the true reliability of GEBV, the G3 fish phenotyped for DGC_aquarium and genotyped could be crossed to generate a G4 in a future experiment. Then, by phenotyping several G4 fish for DGC_aquarium, we could estimate a proxy of the true breeding value of G3 fish. Finally, the GEBV calculated previously could be correlated with these true breeding values to obtain a better estimate of the accuracy of the ssGBLUP model. Such a procedure has been implemented in rainbow trout for resistance to bacterial cold water disease (Vallejo et al., 2017), and showed that the predictive ability of genomic predictions was twice that of traditional pedigree BLUP. This confirms the importance of genomic data for the genetic improvement of traits that are difficult to record, such as disease resistance and FCR.

Mining association rule bases from integrated genomic data and annotations

The GenMiner approach was developed to address these weaknesses and fully exploit ARM capabilities. It enables the integration of gene annotations and gene expression profile data to discover intrinsic associations between them. Gene annotations can be integrated from any source of biological information (semantic sources, bibliographic databases, gene expression databases, etc.). It uses the novel NorDi method for discretizing gene expression measures and generating gene expression profiles. It takes advantage of the Close [23] algorithm, which can efficiently generate low-support, high-confidence, non-redundant association rules. When data is dense or correlated, as genomic data is, Close reduces both execution time and memory usage compared with Apriori, thus enabling the analysis of large datasets. Furthermore, it improves the relevance of the results by extracting a minimal set of rules containing only non-redundant ARs, thus reducing the number of ARs and facilitating their interpretation by the biologist. With these features, GenMiner is an ARM approach suited to biologists' requirements for genomic data analysis.
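
Close belongs to the family of closed-itemset miners, so the following sketch shows only the closure operator such algorithms build on: the closure of an itemset is the intersection of all transactions that contain it. This is not the Close algorithm itself or GenMiner's code; the function names and toy data are hypothetical.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def closure(itemset: FrozenSet[str],
            transactions: List[Transaction]) -> FrozenSet[str]:
    """Intersect all transactions containing `itemset`.

    If no transaction contains the itemset, it is returned unchanged.
    """
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return itemset
    result = covering[0]
    for t in covering[1:]:
        result = result & t
    return result


def is_closed(itemset: FrozenSet[str], transactions: List[Transaction]) -> bool:
    """An itemset is closed when it equals its own closure."""
    return closure(itemset, transactions) == itemset


# Hypothetical toy data: "a" never occurs without "b".
data = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "b", "d"}),
]
closure_of_a = closure(frozenset({"a"}), data)
```

Because `a` always co-occurs with `b` here, `{a}` and `{a, b}` describe the same set of transactions; keeping only the closed itemset `{a, b}` is what removes redundant rules in dense or correlated data.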

Methods for statistical inference on correlated data: application to genomic data

interactions, disease spreading and economics.
2.1 Application to biology
2.1.1 Molecular biology
For several decades, the major interest of computational biologists has been the structural and functional characterisation of important biomolecules such as DNA, RNA and proteins. For instance, knowing the 3D structure of a membrane protein helps in understanding its molecular mechanism and accelerates the development of pharmacological agents targeting it. However, solving three-dimensional structures is a hard experimental task, and the structural characterisation of biomolecules has so far proceeded quite slowly. Sequencing turns out to be much easier and cheaper, so we have witnessed an exponential increase in available sequences. Given sequencing data, the first step consists in searching for homologous sequences, i.e. phylogenetically related sequences sharing a common ancestor. Then, this set of homologous sequences is rearranged so as to create a Multiple Sequence Alignment (MSA), meaning a matrix of nucleotides or amino acids with different homologs on different lines and different sites in different columns. Sequence sites must be placed in the correct column according to some equivalence rule among species. The best existing alignment tools maximise complex global scores that depend on the single-site frequency of symbols. When these methods were introduced, the number of available sequences was extremely small, so it was entirely reasonable to ignore higher-order statistics, since the amount of data was insufficient to estimate joint probabilities. MSAs currently available in databases contain tens of thousands and even hundreds of thousands of sequences. Therefore, the deep statistical investigation of MSAs is now common practice and diverse approaches exist [24] [2] [25] [26] [3] [27] [28] [29].
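
The difference between single-site frequencies and joint (pairwise) probabilities can be made concrete with a short sketch. The four-sequence alignment and the function names are hypothetical and serve only as an illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy MSA: rows are homologs, columns are aligned sites.
msa = ["ACGT", "ACGA", "TCGT", "ACCA"]


def site_frequencies(alignment: List[str], column: int) -> Dict[str, float]:
    """Empirical frequency of each symbol in one column."""
    counts = Counter(seq[column] for seq in alignment)
    n = len(alignment)
    return {symbol: c / n for symbol, c in counts.items()}


def pair_frequencies(alignment: List[str], col_i: int,
                     col_j: int) -> Dict[Tuple[str, str], float]:
    """Empirical joint frequency of symbol pairs in two columns."""
    counts = Counter((seq[col_i], seq[col_j]) for seq in alignment)
    n = len(alignment)
    return {pair: c / n for pair, c in counts.items()}


f_first = site_frequencies(msa, 0)
f_first_last = pair_frequencies(msa, 0, 3)
```

For an alphabet of q symbols, a column has q single-site frequencies to estimate while a pair of columns has q² joint frequencies, which is why pairwise statistics only become reliable once alignments contain many sequences.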

Detection of loci under selection from temporal population genomic data through ABC random forest

We propose the use of ABC-RF to co-estimate demography and selection in temporal population genomics data. We ran ~50,000 simulations and calculated summary statistics to construct the reference table that was used to grow the random forests.

A simulation approach for analyzing genomic data using a package of specific FORTRAN90 functions

Collins, 2 Texas A&M University, College Station. Feed intake, growth, and pedigree data from the Midland Bull Test database were used to estimate parameters required for genetic evaluation of feed utilization traits. Length of the feeding period was 70 d, and test ADG was estimated as the slope of the regression of BW on test d. Records on DMI, ADG, and estimated mid-test BW raised to the power of 0.75 (MBW) from bulls (n = 2,346) and heifers (n = 221) representing 11 breeds (1,819 Angus) were included in a multiple trait animal model to estimate variance components using average information REML. The model for all traits included the fixed effects of contemporary group (n = 99) and a linear covariate for age at the start of test, and random animal genetic effects (n = 10,327). Heritability estimates for DMI, ADG, and MBW were 0.45 ± 0.09, 0.35 ± 0.07, and 0.54 ± 0.09, respectively, and genetic correlation estimates (SE < 0.13) among the traits were positive, ranging from 0.38 to 0.60. Phenotypic residual feed intake (RFI) was defined as the difference between DMI and expected DMI from regression on ADG and MBW. A 4 trait model including phenotypic RFI failed to converge because of the linear dependence with DMI, ADG,

Classification performance of support vector machines on genomic data utilizing feature space selection techniques

Many configurations (dataset, type of kernel function, C value) obtained excellent classification performance, inferring that many of the feature space selection techniq[r]

Methodologies to account for crossbred and genomic data in selection for feed efficiency

2. Introduction
Pig and poultry production schemes rely on crossbreeding, where the production animals are crossbred (CB) animals (Figure 1). Thus, the breeding goal for these species is to increase CB performance under commercial farming conditions, while selection of purebred (PB) animals is typically based on PB performance measured in a nucleus environment with high levels of biosecurity. The genetic difference between PB and CB performance is quantified by the purebred-crossbred genetic correlation (r_pc), which is below unity for many traits, with average reported values in the range of 0.6 to 0.8 for pigs (Wientjes and Calus, 2017) and poultry (Bos, 2020). When genetic correlations between PB and CB performance differ from unity, the genetic gain reached at the nucleus level is only partly transferred to the production level as a correlated response. One approach to overcome this issue is to account for CB information (i.e., genetic and phenotypic data) in the genetic evaluations of the PB lines, so that they are selected to improve CB rather than PB performance.
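
The role of r_pc in the partial transfer of gain can be written with the standard textbook expression for a correlated response to selection. The symbols i, h_PB and σ_A,CB are assumptions introduced here for illustration and do not appear in the excerpt.

```latex
% Correlated response in crossbred (CB) performance when selection
% acts on purebred (PB) phenotypes:
%   i             : selection intensity in the PB line
%   h_{PB}        : square root of the heritability of the PB trait
%   r_{pc}        : purebred-crossbred genetic correlation
%   \sigma_{A,CB} : additive genetic standard deviation of the CB trait
\[
  CR_{CB} = i \, h_{PB} \, r_{pc} \, \sigma_{A,CB}
\]
```

Under this expression, and all else being equal, r_pc = 0.7 yields 70% of the crossbred response that would be obtained with r_pc = 1.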

Joint inference of effective population size and genetic load from temporal population genomic data

allow demographic and selection parameters to be estimated while taking their interaction into account. Because these models are difficult to address under a likelihood framework, we resort to approximate Bayesian computation via random forests (ABC-RF). ABC-RF uses simulations to generate a training data set from which the RF can learn and then make inferences from real data.
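
The simulate-then-learn workflow can be sketched with the standard library alone. To stay dependency-free, this sketch replaces the random forest with a plain nearest-simulations (rejection) step, so it illustrates how a reference table is built and queried rather than ABC-RF itself. The drift model, the prior range, the summary statistic and all names are hypothetical.

```python
import random
from typing import List, Tuple


def simulate_drift_stat(ne: int, generations: int, n_loci: int,
                        rng: random.Random) -> float:
    """Wright-Fisher drift from p = 0.5; returns the mean squared
    allele-frequency change across loci as a summary statistic."""
    total = 0.0
    for _ in range(n_loci):
        p = 0.5
        for _ in range(generations):
            # Binomial sampling of 2 * ne gene copies.
            copies = sum(1 for _ in range(2 * ne) if rng.random() < p)
            p = copies / (2 * ne)
        total += (p - 0.5) ** 2
    return total / n_loci


def build_reference_table(n_sims: int, rng: random.Random,
                          generations: int = 5,
                          n_loci: int = 20) -> List[Tuple[int, float]]:
    """Draw Ne from a uniform prior and store (parameter, statistic) rows."""
    table = []
    for _ in range(n_sims):
        ne = rng.randint(10, 100)  # assumed prior range
        table.append((ne, simulate_drift_stat(ne, generations, n_loci, rng)))
    return table


def estimate_ne(table: List[Tuple[int, float]], observed_stat: float,
                n_keep: int = 20) -> float:
    """Average Ne over the simulations closest to the observed statistic."""
    closest = sorted(table, key=lambda row: abs(row[1] - observed_stat))[:n_keep]
    return sum(ne for ne, _ in closest) / len(closest)


rng = random.Random(42)
reference_table = build_reference_table(200, rng)
observed = simulate_drift_stat(50, 5, 20, rng)  # stands in for real data
ne_hat = estimate_ne(reference_table, observed)
```

In ABC-RF proper, the reference table would be used as the training set of a random forest that predicts the parameter from the summary statistics, instead of the distance-based selection used here.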

Comparative assessment of methods for estimating individual genome-wide homozygosity-by-descent from human genomic data.

Genealogical information for examinees was available for 3-4 ancestral generations in nearly all cases (and in some cases up to 6 generations), based on self-reported information and parish records. No inbreeding loops suggestive of a parental relationship of first cousin (F = 0.0625) or closer were seen in the genealogical data, confirming the strong influence of the local Catholic Church on the avoidance of inbreeding [26]. Despite this, cryptic inbreeding was still expected to be found due to the known effect of limited mate choice in isolated populations [2]. All individuals included in this study were classified into seven groups by grandparental birthplace cluster, based on a priori expectations of genome-wide homozygosity levels. This was based on a combination of information from genealogical and demographic sources (Table 1). The highest homozygosity estimates were expected in the village of Okljucna, which is a small and isolated outback settlement on the island. Secondly, Komiza is a larger village which is also isolated, but historically experienced more immigration than Okljucna. The third group included examinees from villages in the central highlands. The fourth group consisted of examinees all four of whose grandparents originated from the village of Vis, which historically had more connections with the mainland. The fifth group consisted of individuals of mixed origin (where at least one grandparent was from the island). Finally, the last two groups consisted of examinees all four of whose grandparents originated from the rest of Croatia, or even from other countries (Figure 1).
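
The value F = 0.0625 quoted for the offspring of first cousins follows from Wright's path-counting formula, F = Σ (1/2)^(n1 + n2 + 1) (1 + F_A), summed over the common ancestors of the two parents. The sketch below is an illustration of that formula only; the function name and the path encoding are hypothetical.

```python
from typing import List, Tuple

# Each path is (n1, n2, f_ancestor): generations from each parent up to
# the common ancestor, and that ancestor's own inbreeding coefficient.
Path = Tuple[int, int, float]


def inbreeding_coefficient(paths: List[Path]) -> float:
    """Wright's path-counting formula summed over all common ancestors."""
    return sum(0.5 ** (n1 + n2 + 1) * (1.0 + f_a) for n1, n2, f_a in paths)


# First cousins share two grandparents, each two generations above them.
first_cousins = [(2, 2, 0.0), (2, 2, 0.0)]
# Second cousins share two great-grandparents, three generations above.
second_cousins = [(3, 3, 0.0), (3, 3, 0.0)]

f_first = inbreeding_coefficient(first_cousins)
f_second = inbreeding_coefficient(second_cousins)
```

With non-inbred common ancestors, each first-cousin path contributes (1/2)^5 = 1/32, and the two shared grandparents together give 1/16 = 0.0625.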

Aggregation of experts: an application in the field of “interactomics” (detection of interactions on the basis of genomic data)

Background
Major technical advances have made genetic information of molecular origin easily available to the research community in recent decades. In this new context, where very large datasets from the lab are available, the challenge is progressively shifting from data acquisition to data management and use. Genetic mapping, the association of genetic polymorphisms with phenotypic variations, is one of the major goals targeted by geneticists, and strongly benefits from this recent data explosion. Despite remarkable successes, such as the discovery of mutations involved in breast cancers, we still need new approaches and new strategies to deal with situations that are more complex. These complex situations include those where several genes interact, making the relationship between the genomic pattern and the corresponding phenotypic variations difficult to identify. Although researchers have proposed many methods to tackle this difficult problem, and despite some successes, no gold standard method is currently available: some methods might be efficient while others fail in one set of situations, but the reverse might be true in another set of situations. Consequently, combining the performances of various methods seems an appealing approach. Since a large portion of the genetic determinism underlying many traits of interest in various organisms, including humans, is still unknown and uncharacterized, genetic mapping and positional cloning are a very active field of research [1]. A classical approach in this field is the use of genome-wide association studies (GWAS): dense molecular marker maps (most often, but not exclusively, large sets of single nucleotide polymorphisms (SNPs)) are used to scan the whole genome, and associations of markers with the trait of interest are sought. Although successful in many studies [2], this approach has not been successful in many other cases, even when complete genomic information (i.e. sequence data) was available. Several reasons might explain this situation, such as low power to detect effects of modest size or oversimplified statistical models [3]. If increasing the cohort sizes used for mapping is difficult or useless, a possible way to tackle this “missing heritability” problem might be to fit more elaborate models, such as those introducing epistatic or gene-environment interactions [4, 5]. Gene interactions are interplays between two or more genes with an impact on the expression of an organism’s phenotype. They are thought to be particularly important for discovering the genetic architecture underlying some genetic diseases [4, 5]. Consequently, there has been an increased interest in discovering combinations of markers that are strongly associated with a phenotype even when each individual marker has little or even no effect [6]. This approach faces at least two problems: first, modeling and

Computational analysis of genomic data provides insight into cell sensitivity to BCNU-Induced DNA damage

Using gene expression analysis methods, we determined that the transcription factor NF-Y plays a key role in the differential response among cell lines, which is consisten[r]

Learning from genomic data: efficient representations and algorithms.

by the proxy mutations that were created in these patients (Fig. 2.5a). The prognostic power of proxy genes in NetNorM comes from at least two types of information they capture. The first type of information captured by proxy mutations is the total number of mutations in a patient. Patients harbouring proxy mutations in a given proxy gene are significantly less mutated than those without proxy mutations (Welch t-test, P ≤ 1 × 10^−2). This results from the fact that patients with few mutations receive as many proxy mutations as needed to reach the target number of mutations k, and therefore proxy mutations have a higher probability of being set in patients with few mutations. The fact that NetNorM creates proxies for the total number of mutations raises the question of whether or not the total number of mutations can improve survival predictions made using the raw binary mutation profiles. To answer this question, we trained a model to predict survival from the raw binary mutation profiles concatenated with a feature, scaled to unit variance, which records the total number of mutations in each patient (Fig. A.4). According to our results, taking such a feature into account does not improve survival prediction performance compared to using the raw data alone. We therefore tested another feature which better mimics the proxies created by NetNorM, which we call ‘proxies’. This feature is equal to the total number of mutations in a patient for patients with fewer than k mutations, and is equal to 0 otherwise. We trained a survival prediction model on the raw data concatenated with the feature ‘proxies’, scaled to unit variance, where k is chosen by cross-validation. Interestingly, we find that using such a feature significantly improves the results obtained for OV, KIRC and LUAD compared to the raw data alone. In particular, the performance obtained for LUAD is on par with that obtained with NetNorM, suggesting that the feature ‘proxies’ summarises well the information leveraged by NetNorM. However, this is not the case for SKCM, since considering the feature ‘proxies’ does not improve over using the raw data alone. We draw two conclusions from these observations: first, NetNorM creates relevant proxies for the total number of mutations which, in combination with the binary mutation profiles, have predictive power; second, such proxies do not entirely explain the performance of NetNorM, at least for SKCM.
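
The ‘proxies’ feature defined above is simple enough to sketch directly: the total number of mutations for patients with fewer than k mutations, 0 otherwise, then scaled to unit variance. This is an illustration of the definition only, not the authors' code; the function names, the toy profiles and the fixed k are hypothetical (the text chooses k by cross-validation).

```python
from statistics import pstdev
from typing import List


def proxies_feature(mutation_profile: List[int], k: int) -> int:
    """Total mutation count if the patient has fewer than k mutations, else 0."""
    total = sum(mutation_profile)
    return total if total < k else 0


def scale_to_unit_variance(values: List[float]) -> List[float]:
    """Divide by the population standard deviation (unchanged if it is 0)."""
    sd = pstdev(values)
    return [v / sd for v in values] if sd > 0 else list(values)


# Hypothetical binary mutation profiles: one row per patient, one column per gene.
profiles = [
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
k = 3  # fixed here for illustration only

raw_feature = [proxies_feature(p, k) for p in profiles]
scaled_feature = scale_to_unit_variance(raw_feature)
```

The scaled column would then be concatenated to the raw binary profiles before fitting the survival model.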

ComBo: a visualization tool for comparative genomic data

The feature maps are synchronized with the dot plot to display identical ranges of sequences, allowing the user to focus on a subset of the alignments and compare features aroun[r]

Mining association rule bases from integrated genomic data and annotations (extended version)

To evaluate the efficiency and scalability of GenMiner, it was run on a dataset combining the Eisen et al. gene expression data [10] and annotations of these genes. Experimental results show that GenMiner can deal with such large datasets and that its memory usage, as well as the number of ARs generated, is significantly smaller than that of Apriori-based approaches. Moreover, ARs extracted by GenMiner are not constrained in their form and can contain both gene annotations and gene expression profiles in the antecedent and the consequent. The analysis of these ARs has shown important relationships supported by recent biological literature. These results show that GenMiner is a promising tool for finding meaningful relationships between gene expression patterns and gene annotations. Furthermore, it enables the integration of thousands of gene annotations from heterogeneous sources of information with related gene expression data. This is an essential feature, as the integration of different types of biological information is indispensable to fully understand the underlying biological processes. In addition, qualitative variables such as gender, tissue and age could easily be integrated in order to extract ARs among these features and gene expression patterns. In the future, we plan to integrate into GenMiner tools to filter, select, compare and visualize ARs during the interpretation phase to simplify these manipulations.

Joint inference of adaptive and demographic history from temporal population genomic data

Navascués (INRA, UM, UU), Joint inference of adaptive and demographic history, 7/10/2019. Conclusions & Perspectives[r]

Handling the heterogeneity of genomic and metabolic networks data within flexible workflows with the PADMet toolbox

The high diversity of input files and tools required to run any metabolic network reconstruction protocol represents an important drawback. Genomic data is often required and is provided in different formats: annotated genomes and/or protein sequences, possibly associated with trained Hidden Markov Models. In addition, most approaches require reference metabolic networks of a template organism. Dictionaries mapping the reference metabolic databases to the gene identifiers corresponding to the studied organism may be required. The main issue is that it is very difficult to ensure that the input files agree with one another. Such heterogeneity causes loss of information during the use of the protocols and generates uncertainty in the final metabolic model. Here we introduce the PADMet-toolbox, which allows genomic and metabolic network information to be reconciled. The toolbox centralizes all this information in a new graph-based format, PADMet (PortAble Database for Metabolism), and provides methods to import, update and export information. For the sake of illustration,

MicroScope—an integrated resource for community expertise of gene functions and comparative analysis of microbial genomic and metabolic data

An annotation service to researchers in microbiology
Interface for user data integration
Integration and analysis of genomic data into MicroScope are open and free of charge for the worldwide community of microbiologists. To standardize user submission and make it fully automated, we have developed a dedicated Web interface (https://www.genoscope.cns.fr/agc/microscope/about/services.php). The service is mainly used for the annotation of microbial genomes: both newly sequenced genomes (which will remain private until the genome publication and/or their submission to public databanks) and, for comparative analysis purposes, public prokaryotic genomes (Figure 2). Moreover, three other types of services are provided for the integration of (i) genome assemblies (bins) from metagenomic samples, (ii) RNA-seq data for quantitative transcriptomics and (iii) DNA sequencing (DNA-seq) data to identify genomic variations in evolved strains (Figure 3). To ease data integration and comparative studies, standardization of contextual data about genome sequences is essential. For metagenomes, we have added a dedicated form that follows the MIMS specifications (minimum information about a metagenome sequence [16]). When submitting assembled metagenomic data to MicroScope, users are invited to select the type of environment (e.g. soil; air; water; human-associated; plant-associated) and to complete the associated fields (e.g. collection date, environment biome, geographic location, etc.). These fields are dynamically loaded and