• Aucun résultat trouvé

Hierarchical Classification of Transposable Elements

N/A
N/A
Protected

Academic year: 2022

Partager "Hierarchical Classification of Transposable Elements"

Copied!
2
0
0

Texte intégral

(1)

Hierarchical Classification of Transposable Elements

Felipe Kenji Nakano1,2 and Ricardo Cerri3

1 KU Leuven, Campus KULAK - Department of Public Health and Primary Care, Etienne Sabbelaan 53, 8500 Kortrijk, Belgium

2 ITEC - imec, Etienne Sabbelaan 51, 8500 Kortrijk, Belgium

3 Federal University of Sao Carlos - Washington Luis, 13565-905 Sao Carlos, Brazil

Abstract. Transposable Elements (TEs) are DNA sequences capable moving within a genome. Such movement is associated with genetic vari- ability and functionality in genes. TEs classification is usually performed using homology tools which can present drawbacks such as ignoring the hierarchical relationships among TEs. In this work, we have investigated hierarchical classification (HC) applied to TEs classification. We have also proposed two new methods, and experiments show that hierarchical classification can be superior than homology.

Keywords: Hierarchical Classification·Transposable Elements

1 Introduction

Transposable Elements (TEs) are DNA sequences capable of changing their lo- cation within a genome. Such movement promotes genetic variability and modi- fications in the functionality of genes. Moreover, TEs can also make species more resistant, or even be related to cancer [1]. Thus, the correct identification of TEs is essential to understand their influence in the evolution of species.

TEs classification is usually performed using homology methods. Homology consists of finding parts of sequences that are similar to pre-annotated sequences.

They can present drawbacks, however, such as overlooking biochemical proper- ties, being limited to sub-groups of TEs, and ignoring hierarchical relationships.

As a countermeasure, recent works have studied hierarchical classification (HC) methods [2, 3]. They build predictive models which can distinguish be- tween any classes while incorporating hierarchical relationships. In this work, we provide a comparison between HC methods and the homology method BLASTn.

We also propose two new local methods:non-LeafLocalClassifier perParent Node (nLLCPN) andLocalClassifier perParentNode andBranch (LCPNB).

In a simple manner, LCPNB considers the mean of prediction probabilities, whereas nlLCPN follows the standard top-down approach [2, 3].

Copyright c2019 for this paper by its authors. Use permitted under Creative Com- mons License Attribution 4.0 International (CC BY 4.0)

(2)

2 Felipe Kenji Nakano and Ricardo Cerri

2 Experiments

We have collected data from public TE repositories (PGSB and REPBASE), and mapped them to Wicker’s Taxonomy, using K-mers as features [3].

In Table 2, we present a comparison between our methods, local baselines from the literature (SWV, SimplePrune and LCPN), and the homology method BLASTn. Parameters detail were presented in [3]

Our method is capable of providing superior results in many cases. More specifically, nlLCPN and LCPNB can provide the highest hF in all datasets, while being competitive in the other evaluation measures. Additionally, all meth- ods from machine learning were substantially superior to BLASTn in all cases.

SWV SimplePrune LCPN BLASTn nlLCPN LCPNB

PGSB

hP 0.84 0.84 0.84 0.80 0.90 0.90

hR 0.93 0.93 0.93 0.83 0.91 0.91

hF 0.88 0.88 0.88 0.81 0.90 0.91

REPBASE

hP 0.82 0.82 0.82 0.55 0.85 0.85

hR 0.89 0.89 0.89 0.53 0.87 0.88

hF 0.85 0.85 0.85 0.54 0.86 0.86

PGSB+REPBASE

hP 0.81 0.81 0.81 0.63 0.85 0.85

hR 0.89 0.89 0.89 0.63 0.87 0.87

hF 0.85 0.85 0.85 0.63 0.86 0.86

Table 1.mean hierarchical precision (hP), recall (hR) and f-measure (hF) measured in 10 runs

3 Conclusion

Since traditional methods were significantly outperformed, our results favour the applicability of machine learning in TEs classification. We have also proposed two methods which can be applied to other hierarchical problems. Future works might consider more datasets, and methods from the deep learning paradigm where features are learned automatically.

References

1. Bi´emont, C.: A brief history of the status of transposable elements: From junk dna to major players in evolution. Genetics186(4), 1085–1093 (2010)

2. Nakano, F.K., Mastelini, S.M., Barbon, S., Cerri, R.: Improving hierarchical classi- fication of transposable elements using deep neural networks. In: 2018 International Joint Conference on Neural Networks (IJCNN). pp. 1–8 (July 2018)

3. Nakano, F.K., Pinto, W.J., Pappa, G.L., Cerri, R.: Top-down strategies for hierar- chical classification of transposable elements with neural networks. In: 2017 Inter- national Joint Conference on Neural Networks (IJCNN) (May 2017)

Références

Documents relatifs

Given the predictive power of our model, we suggest that viewing biological organisms as distributed computing directed graphs learning an optimal environment to response

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des

In contrast, approximately 50% of the 1CTA cells remained Ki67 positive 72 ‐hr postinfection with the genotoxic Salmonella (Figure 4a), and this effect was not associated

Die Studie stellt dar, dass MI bei Jugendlichen mit DMT1 signifikante Auswirkungen auf die Adhärenz, das Selbstmanagement und das HbA1c haben kann.. Jedoch weisen die Autorinnen darauf

We decoupled electron-transfer dissociation (ETD) and collision-induced dissociation of charge-reduced species (CRCID) events to probe the lifetimes of intermediate radical species

To this end, the volume represents a contribution to the growing literature that seeks to shed light on the EU’s institutional development through comparative assessment.. The book

Maruyama and Hartl [18] have detected hybridization bands that are specific for each strain, indicating the existence of insertion site polymorphism for the

The laboratory testing confirms that although the products tested lost some of their initial physical properties (as expected), they generally held up well compared to the standard