Sequence comparison

Top PDF documents on "sequence comparison":

Sequence Covering Similarity for Symbolic Sequence Comparison

We have introduced the notion of sequence covering: given a set of reference sequences defining a dictionary of subsequences, any sequence can be 'optimally' covered by elements of that dictionary. This notion was originally introduced in the context of host intrusion detection. From it we have derived a pairwise distance measure for comparing two sequences and shown that this measure is a semimetric. Since the sequence covering similarity is somewhat complementary to other existing similarities defined for sequential data, one may conjecture that it brings complementary discriminant information. In particular, as efficient implementations exist using suffix trees or suffix arrays, this similarity could bring benefits in bioinformatics or in text-processing applications.
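The covering idea can be illustrated with a small greedy sketch (the paper computes an optimal covering via suffix structures; this greedy longest-match version and its example data are only an approximation for illustration):

```python
def covering_size(s, refs):
    """Greedily cover s with the longest pieces that occur as substrings of
    some reference sequence; a symbol absent from every reference becomes a
    singleton piece. Returns the number of pieces used."""
    pieces = 0
    i = 0
    while i < len(s):
        best = i + 1  # fall back to a singleton piece
        j = i + 1
        while j <= len(s) and any(s[i:j] in r for r in refs):
            best = j  # s[i:j] is found in the dictionary, try to extend
            j += 1
        pieces += 1
        i = best
    return pieces
```

For example, `covering_size("abab", ["ab"])` uses two pieces, while `covering_size("xyz", ["ab"])` falls back to three singletons; a covering-based similarity then compares such piece counts between two sequences.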

Application of Lempel-Ziv complexity to alignment-free sequence comparison of protein families

Two main categories of alignment-free methods have been proposed to overcome the limitations of the alignment-based sequence comparisons (reviewed by Vinga and Almeida, 2003). The first category is founded on the statistics of word frequency, whereas the second category includes methods that do not require resolving the sequence into fixed-length word segments. Among the latter, methods based on information theory make use of the algorithmic complexity (estimated through sequence compression) as the distance metric. Along the same lines, Otu and Sayood (2003) have proposed to rely on the Lempel-Ziv complexity to compute the distance between two DNA sequences. Because it is based on exact direct repeats, the LZ complexity works well with the small DNA alphabet. However, when applied to protein sequences, such an approach is expected to miss the subtle and overlapping similarities that characterize the larger and more complex amino-acid alphabet.
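The Lempel-Ziv complexity in question can be sketched as a phrase count over the exhaustive history of a string; a minimal, quadratic pure-Python version, together with one of the normalized distance variants in the spirit of Otu and Sayood (hedged: they propose several closely related measures):

```python
def lz_complexity(s):
    """Number of phrases in the LZ76 exhaustive history of s: extend the
    current phrase while it still occurs in the previously seen prefix."""
    n, i, c = len(s), 0, 0
    while i < n:
        j = i + 1
        while j <= n and s[i:j] in s[:j - 1]:
            j += 1
        c += 1  # close the current phrase at position j
        i = j
    return c

def lz_distance(s, q):
    """A normalized LZ-based distance: how many new phrases each sequence
    adds when appended to the other, relative to the larger complexity."""
    cs, cq = lz_complexity(s), lz_complexity(q)
    return max(lz_complexity(s + q) - cs, lz_complexity(q + s) - cq) / max(cs, cq)
```

The classic example string `"0001101001000101"` parses into 6 phrases (0 | 001 | 10 | 100 | 1000 | 101).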

Application of the Lempel-Ziv complexity to the alignment-free sequence comparison of protein families

…outperforms all other methods at the three lower levels of the SCOP classification, except the very slow SW local alignment at the family level. At the superfamily and fold levels, our seq…

Protein sequence comparison of human and non-human primate tooth proteomes

Table 1: List of the proteins common to the five proteomes. The list of proteins was retrieved from the comparison of the protein lists obtained from each dataset per genus with the Proteome Discoverer (PD) or PEAKS (PX) software. For each protein, the Uniprot accession numbers correspond to the master protein and to the first protein of the protein group identified by PD and by PX, respectively. The number of total identified peptides (and unique peptides) that allowed for protein identification is indicated. * For proteins identified with only one unique peptide, the HCD MS/MS spectra of the unique peptides are presented in Fig. S2. nd: not detected.

Aspects of biological sequence comparison

…sequences using affine gap costs, the time complexity of finding optimal alignments using concave gap costs, and the random permutation of a sequence preserving its d…
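Optimal alignment under affine gap costs (a gap of length k costing open + k·extend) is classically computed with Gotoh's three-matrix recurrence; a small sketch with arbitrary illustrative scores, not the scoring scheme of any particular paper:

```python
def affine_gap_score(a, b, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Global alignment score with affine gap cost gap_open + k*gap_extend
    for a gap of length k, via Gotoh's O(len(a)*len(b)) recurrence."""
    NEG = float("-inf")
    n, m = len(a), len(b)
    # M: best score ending in a substitution; X: gap in b; Y: gap in a
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + j * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            X[i][j] = max(max(M[i - 1][j], Y[i - 1][j]) + gap_open + gap_extend,
                          X[i - 1][j] + gap_extend)  # open vs. extend the gap
            Y[i][j] = max(max(M[i][j - 1], X[i][j - 1]) + gap_open + gap_extend,
                          Y[i][j - 1] + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])
```

With these scores, aligning "AC" against "AGC" costs one opened gap (-3) against two matches (+2), for a score of -1.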

Sequence-to-Sequence Predictive models: from Prosody to Communicative Gestures

ISIR, Sorbonne University, Paris, France. ABSTRACT: Communicative gestures and speech prosody are tightly linked. Our aim is to predict when gestures are performed based on prosody. We develop a model based on a seq2seq recurrent neural network with an attention mechanism. The model is trained on a corpus of natural dyadic interaction in which the speech prosody and the gestures have been annotated. Because the output of the model is a sequence, we use a sequence comparison technique to evaluate the model's performance. We find that the model can predict certain gesture classes. In our experiment, we also replace some input features with random values to find which prosody features are pertinent; we find that F0 is pertinent. Lastly, we also train the model on one…

Analysis of a set of Australian northern brown bandicoot expressed sequence tags with comparison to the genome sequence of the South American grey short tailed opossum

After assembly, the 445 ESTs that did not match an annotated sequence in the public databases yielded 375 unique sequences (consisting of 144 contigs and 231 singletons). These ranged from 106 to 1019 bp in length. Some of the shorter clones in this category may contain insufficient sequence to identify coding regions, and many of these sequences may represent the 3' or 5' untranslated region (UTR) of genes. To determine whether open reading frames (ORFs) were present in any of the unmatched ESTs, the web-based program ESTScan was used to scan these ESTs for potential ORFs [47]. This resulted in the identification of putative ORFs in only 14 of the 375 unmatched ESTs. These 14 ORFs ranged in length from 84 to 383 bp. It is possible that these sequences encode proteins that have not been identified in other species, or they may be composed of sequences too divergent to be recognized by sequence comparison programs. To identify divergent gene homologs among the 375 unmatched ESTs, the six-frame translation was also searched against all profiles in the Pfam database with a hidden Markov model search program (HMMER); however, no significant matches were identified.
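The six-frame ORF scan described above can be sketched as follows; this is a simplified stand-in for ESTScan (which additionally models coding statistics), and the ATG-to-stop definition and `min_len` threshold are illustrative assumptions:

```python
def find_orfs(dna, min_len=30):
    """Scan all six reading frames (three per strand) for open reading
    frames (ATG ... first in-frame stop codon) of >= min_len nucleotides."""
    stops = {"TAA", "TAG", "TGA"}
    comp = str.maketrans("ACGT", "TGCA")
    orfs = []
    for strand in (dna, dna.translate(comp)[::-1]):  # forward, reverse complement
        for frame in range(3):
            i = frame
            while i + 3 <= len(strand):
                if strand[i:i + 3] == "ATG":
                    for j in range(i + 3, len(strand) - 2, 3):
                        if strand[j:j + 3] in stops:
                            if j + 3 - i >= min_len:
                                orfs.append(strand[i:j + 3])
                            i = j  # resume scanning after this ORF
                            break
                i += 3
    return orfs
```

For instance, `find_orfs("ATGAAATAG", min_len=9)` finds the single 9-nt frame ATG-AAA-TAG on the forward strand.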

Mycobacterium abscessus multispacer sequence typing.

Methods. Bacterial isolates: the reference strains M. abscessus CIP104536T, M. abscessus DSMZ44567 (German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany), M. abscessus subsp. bolletii CIP108541T (herein referred to as "M. bolletii") and M. abscessus subsp. bolletii CIP108297T (herein referred to as "M. massiliense" [23]) were used in this study. In addition, a collection of 17 M. abscessus clinical isolates from the mycobacteria reference laboratory of the Méditerranée Infection Institute, Marseille, France, was also studied (Table 1). All of the mycobacteria were grown in 7H9 broth (Difco, Bordeaux, France) enriched with 10% OADC (oleic acid, bovine serum albumin, dextrose and catalase) at 37°C. For identification, DNA extraction and rpoB partial sequence-based identification were performed using the primers MYCOF and MYCOR2 (Table 1) as previously described [24]. In addition, the rpoB gene sequence retrieved from 48 sequenced M. abscessus genomes was also analysed (Additional file 1) (http://www.ncbi.nlm.nih.gov/).

Temporally coherent mesh sequence segmentations

6.1.4 Time-Varying vs. Global Segmentations. Figure 3 shows time-varying segmentations computed on the Balloon and Dancer sequences. Figure 6 shows a global segmentation computed on the Dancer sequence. By construction, global segmentations contain more clusters than time-varying segmentations, since no merging process is applied. Thresholds for both the time-varying and the global segmentation of the Dancer sequence share the same values, except for the eigengap threshold, which is slightly lower in the time-varying case. In general, threshold values for a sequence are easily found in a few trials. The computation time of the segmentation between two frames of the Dancer sequence is approximately 3 minutes with a (not optimized) Matlab implementation. Additional results appear in the accompanying video.

Segment tracking in a monocular image sequence

The multidisciplinary open archive HAL is intended for the deposit and dissemination of research-level scientific documents, whether published or not, originating from teaching and…

A Sequence-Filter Joint Optimization

Figure 4: Joint optimisation, starting from a "flat" sequence.

…proposed methods. In addition to low sidelobes, it is interesting to look at their distributions according to the method. A direct optimisation of the PSLR gives, as expected, a uniform distribution, with sidelobes at −58 dB, improved by 8 dB with the other initialisation. On the other hand, the indirect optimisation over the ISL provides remarkably good results, with sidelobes around the mainlobe reaching levels as low as −90 dB while the sidelobes at the edges are still acceptable, around −50 dB. To our knowledge, such sidelobe levels have not been reached in the literature. The particular shape obtained in the ISL case may moreover be very interesting in radar applications, as it may improve the detection of close and small targets that are usually hidden in strong clutter.
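Both figures of merit discussed here, PSLR and ISL, are functions of the sequence's aperiodic autocorrelation; a minimal sketch (the Barker-13 code used in the test is a standard textbook example, not a sequence from this paper):

```python
import math

def autocorr(seq):
    """Aperiodic autocorrelation r[k] = sum_n seq[n] * conj(seq[n+k])."""
    n = len(seq)
    return [sum(seq[i] * seq[i + k].conjugate() for i in range(n - k))
            for k in range(n)]

def pslr_db(seq):
    """Peak sidelobe level ratio: highest sidelobe vs. mainlobe, in dB."""
    r = autocorr([complex(x) for x in seq])
    peak = max(abs(v) for v in r[1:])
    return 20 * math.log10(peak / abs(r[0]))

def isl_db(seq):
    """Integrated sidelobe level: total sidelobe energy over mainlobe energy,
    in dB; lags k and -k contribute equally, hence the factor 2."""
    r = autocorr([complex(x) for x in seq])
    side = 2 * sum(abs(v) ** 2 for v in r[1:])
    return 10 * math.log10(side / abs(r[0]) ** 2)
```

The Barker-13 code has peak sidelobe magnitude 1 against a mainlobe of 13, hence a PSLR of 20·log10(1/13) ≈ −22.3 dB.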

What is ... an automatic sequence?

…which happens to consist of prefixes of the Thue–Morse sequence. In the limit, we obtain the Thue–Morse sequence itself, which is a fixed point of this morphism. Indeed, any morphism on a finite alphabet, where the image of each letter has length k ≥ 2 and there is some letter a whose image begins with a, has an infinite fixed point. Cobham showed in 1972 that the letters of this fixed point form a k-automatic sequence.
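The morphism-iteration construction is easy to demonstrate: iterating 0 → 01, 1 → 10 from the letter 0 converges to the Thue–Morse sequence, and the result agrees with its 2-automatic description (the parity of the binary digit sum of n):

```python
def morphism_fixed_point(rules, start, length):
    """Iterate a morphism whose image of `start` begins with `start` and has
    length k >= 2; successive images are prefixes of the infinite fixed point."""
    word = start
    while len(word) < length:
        word = "".join(rules[c] for c in word)
    return word[:length]

# Thue-Morse as the fixed point of the morphism 0 -> 01, 1 -> 10
tm = morphism_fixed_point({"0": "01", "1": "10"}, "0", 16)

# 2-automatic description: the nth letter is the bit-parity of n
tm_automatic = "".join(str(bin(n).count("1") % 2) for n in range(16))
```

Both constructions yield the prefix 0110100110010110.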

Empirical Evaluation of Sequence-to-Sequence Models for Word Discovery in Low-resource Settings

3 Institute of Informatics, Federal University of Rio Grande do Sul, Brazil. Contact: marcely.zanon-boito@univ-grenoble-alpes.fr. Abstract: Since Bahdanau et al. [1] first introduced attention for neural machine translation, most sequence-to-sequence models have made use of attention mechanisms [2, 3, 4]. While they produce soft-alignment matrices that can be interpreted as alignments between target and source languages, we lack metrics to quantify their quality, and it is unclear which approach produces the best alignments. This paper presents an empirical evaluation of three of the main sequence-to-sequence models for word discovery from unsegmented phoneme sequences: CNN-, RNN- and Transformer-based. This task consists in aligning word sequences in a source language with phoneme sequences in a target language, inferring from this a word segmentation on the target side [5]. Evaluating word segmentation quality can be seen as an extrinsic evaluation of the soft-alignment matrices produced during training. Our experiments in a low-resource scenario on the Mboshi and English languages (both aligned to French) show that RNNs surprisingly outperform CNNs and Transformers for this task. Our results are confirmed by an intrinsic evaluation of alignment quality through the use of Average Normalized Entropy (ANE). Lastly, we improve our best word discovery model by using an alignment entropy confidence measure that accumulates ANE over all the occurrences of a given alignment pair in the collection.
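The Average Normalized Entropy used for the intrinsic evaluation can be sketched as follows; this is an assumed formulation (per-target-position entropy of the soft-alignment row, normalized by the maximum entropy log of the row length, then averaged), and the paper's exact definition may differ in detail:

```python
import math

def average_normalized_entropy(attention):
    """attention: one row per target symbol, each row a probability
    distribution over source positions. ANE averages H(row) / log(len(row));
    values near 0 mean sharp alignments, values near 1 mean diffuse ones."""
    total = 0.0
    for row in attention:
        h = -sum(p * math.log(p) for p in row if p > 0)
        total += h / math.log(len(row))
    return total / len(attention)
```

A uniform attention row gives the maximum value 1.0, while a one-hot (perfectly confident) row contributes 0.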

6th-grade teaching sequence: the circumference of a circle

Arnaud Pousset, REP+ André Malraux, Montereau-Fault-Yonne. Work for Friday 27/03: … To be done in the exercises part of the notebook: …

Make text look like speech: disfluency generation using sequence-to-sequence neural networks

Since the production of disfluent text can be seen as the transformation of a fluent word sequence into a disfluent word sequence, we also studied different sequence models…

Causal Message Sequence Charts

Vis(H) = {M_i | i = 0, 1, ...}, where each M_i consists of an emission of a message m from p to r, then a sequence of i blocks of three messages: a message n from p to q, followed by a message o from q to r, then a message s from r to p, and at last the reception of message m on r. This MSC language is represented in Figure 15. This visual language can easily be defined with a CHMSC, by separating the emission and reception of m and iterating an MSC containing messages n, o, s an arbitrary number of times. Clearly, this Vis(H) is not finitely generated, nor is it the visual language of a causal HMSC. Assume for contradiction that there exists a causal HMSC G with Vis(G) = Vis(H). Let k be the number of messages of the biggest causal MSC which labels a transition of G. We know that M_{k+1} is in Vis(G), hence M_{k+1} ∈ Vis(⊚(ρ))…

Analyzing the sequence-structure relationship of a library of local structural prototypes

Time Sequence Summarization: Theory and Applications

Here we attempt to provide users with a tool that allows them to customize the way summarization is operated, i.e., to decide how sequence data is represented and how the resulting data is grouped. For this purpose we go back to fundamentals and rely on research in Cognitive Science. In the early 1970s, research on the role of organization in human memory coined the concept of Semantic memory in contrast with Episodic memory. Mainly influenced by the ideas of Reiff and Scheerer, who in 1959 distinguished two forms of primitive memory, namely remembrances and other memoria, Tulving proposed to distinguish Episodic memory from Semantic memory. Episodic memory [Tul93, Tul02], also called autobiographical memory, allows one to remember events personally experienced at specific points in time and space. For example, an instance of such memory is the name and place of the last conference one has attended. On the other hand, semantic memory is the system that allows one to store, organize and connect one's knowledge of the world and make it available for retrieval. This is a knowledge base that anyone can acquire through learning, training, etc., and that one can access quickly and without effort. In contrast with episodic memory, this knowledge does not refer to unique and concrete personal experiences but to factual knowledge of the world. Bousfield and Cohen [BC52] advanced the idea, later shared by Mandler [Man67], that individuals tend to organize or cluster information into groups and subgroups. Numerous data models have been proposed to represent semantic memory, including feature models, associative models, statistical models and network models. In the network model, knowledge is assumed to be represented as a set of nodes connected by links. The nodes may represent concepts, words, expressions, perceptual features or nothing at all. Links may be weighted, such that some links are stronger than others.

Discovering linguistic patterns using sequence mining

…there is no other frequent sequential pattern S′ such that S ⊑ S′ and sup(S) = sup(S′). For instance, with minsup = 2, the sequential pattern ⟨(NOUN)(NOUN)⟩ from Table 1 is not closed, whereas ⟨(NOUN)(de PREP)(NOUN)⟩ is closed. The constraint-based pattern paradigm [6] brings useful techniques to express the user's interest in order to focus on the most promising patterns. A very widely used constraint is the frequency: a sequence S is frequent if and only if sup(S) ≥ minsup, where minsup is a threshold given by a user. However, it is possible to define many other useful constraints, such as the gap constraint. A gap is a sequence of itemsets which may be skipped between two itemsets of a sequence S. g(M, N) represents a gap whose size is within the range [M, N], where M and N are integers. The range [M, N] is called a gap constraint. A sequential pattern satisfying the gap constraint [M, N] is denoted P[M,N]. It means there is a gap g(M, N) between every two neighboring…
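The gap constraint can be made concrete with a small checker (illustrative code, not the mining algorithm itself: it only tests whether one pattern occurs in one sequence with between M and N itemsets skipped between consecutive pattern itemsets, matching itemsets by containment):

```python
def occurs_with_gap(pattern, sequence, m, n):
    """True if `pattern` (a list of sets) occurs in `sequence` (a list of
    sets) with a gap of m..n itemsets between consecutive pattern itemsets.
    A pattern itemset matches a sequence itemset it is a subset of."""
    def match_from(p_idx, s_idx):
        if p_idx == len(pattern):
            return True
        # the first itemset may start anywhere; later ones obey the gap
        lo = s_idx + 1 + m if p_idx > 0 else 0
        hi = s_idx + 1 + n if p_idx > 0 else len(sequence) - 1
        for k in range(lo, min(hi, len(sequence) - 1) + 1):
            if pattern[p_idx] <= sequence[k] and match_from(p_idx + 1, k):
                return True
        return False
    return match_from(0, -1)
```

Mirroring the example above, ⟨(NOUN)(NOUN)⟩ occurs in ⟨(NOUN)(de PREP)(NOUN)⟩ under gap constraint [1, 1] (one itemset skipped), but not under [0, 0].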

Knowledge-based Sequence Mining with ASP

Generally speaking, we have demonstrated that the modeling capacities of ASP are appropriate to express complex sequence mining tasks, where choice rules define the search space, integrity constraints eliminate invalid candidate patterns, and preferences distinguish optimal outcomes. In view of the data-intense domain at hand, some effort had to be invested to develop economic encodings of generic sequence mining tasks. Experts can now benefit from our work and take advantage of it by stating further constraints or preferences, respectively, for extracting the patterns of their interest.

Acknowledgments: This work was partially funded by DFG grant SCHA 550/9.
