sequence family and others to a particular motif type. A famous one dedicated to proteins is ScanProsite, where motifs are built upon regular expressions that are searched either by querying a precomputed database or with the ps-scan algorithm. A number of tools are dedicated to RNA sequences, in response to the increasing need for structure exploration in the complex RNA world, boosted by the recent importance of non-coding RNA studies. For instance, RNAmotif, RNAbob, Hypasearch [10, 20] and Palingol have been designed for the description of patterns as a succession of stems and loops, usually offering the possibility of choosing either standard Watson-Crick pairing (A-U, G-C) or a pairing that also includes wobble pairs (A-U, G-C, G-U). A more recent tool in this category, Structator, significantly improves parsing time by making use of an index structure suited to the analysis of palindromic structures, the affix array. Patterns may also contain sequence information on words that have to be present at particular places in the stems or loops. RNAmotif is probably the most popular tool in this category.
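To make the regular-expression encoding concrete, here is a small sketch (ours, not ScanProsite's or ps-scan's code) that translates a PROSITE-style pattern into a Python regular expression; the example signature is the classical C2H2 zinc-finger pattern (PROSITE PS00028):

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles the common constructs: '-' separators, 'x' wildcards,
    x(n) / x(n,m) repetitions, [..] allowed sets and {..} forbidden sets."""
    parts = []
    for elem in pattern.rstrip(".").split("-"):
        # peel off an optional repetition suffix, e.g. x(2,4) or C(3)
        m = re.fullmatch(r"(.+?)\((\d+)(?:,(\d+))?\)", elem)
        rep = ""
        if m:
            elem = m.group(1)
            rep = "{%s,%s}" % m.group(2, 3) if m.group(3) else "{%s}" % m.group(2)
        if elem == "x":                      # any residue
            parts.append("." + rep)
        elif elem.startswith("{"):           # forbidden residues
            parts.append("[^" + elem[1:-1] + "]" + rep)
        else:                                # literal residue or [..] choice
            parts.append(elem + rep)
    return "".join(parts)

# the classical C2H2 zinc-finger signature (PROSITE PS00028)
sig = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."
print(prosite_to_regex(sig))  # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H
```

The translated expression can then be applied to protein sequences with `re.search`, which is essentially what a query against a precomputed database amounts to.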
ing duration constraints we can refine the patterns; for example, the expression p ∩ (true · ⟨q⟩[a,∞] · true) considers only q-bursts that last for at least a time units. The
problem of timed pattern matching has been introduced and solved in : find all the sub-segments of a multi-dimensional Boolean signal of bounded variability that match a timed regular expression. This work was automaton-free and proceeded inductively on the structure of the formula in an offline manner. An online version of that procedure has been developed in , based on a novel extension of Brzozowski derivatives  to timed regular expressions and dense-time signals. Both works, which have been implemented in the tool Montre , did not use the full syntax of timed regular expressions and hence did not match the expressive power of timed automata (see ). In this paper we explore an alternative automaton-based procedure whose scope of application is wider than the expressions used in [27,28], as it works with any timed language definable by a timed automaton and is agnostic about the upstream pattern specification language. Let us also mention another recent automaton-based approach, that of , a real-time extension of the Boyer-Moore pattern matching method. In contrast to our work, the procedure in  works on TA defined over time-event sequences and requires pre-computing the region graph from the TA specification. The same authors improve this result in , by using a more efficient variant of the Boyer-Moore algorithm and by replacing the region automaton with the more efficient zone-simulation graph.
* Corresponding author
Background: Nonribosomal peptides (NRPs), bioactive secondary metabolites produced by many
microorganisms, show a broad range of important biological activities (e.g. antibiotics, immunosuppressants, antitumor agents). NRPs are mainly composed of amino acids, but their primary structure is not always linear and can contain cycles or branchings. Furthermore, there are several hundred different monomers that can be incorporated into NRPs. The NORINE database, the first resource entirely dedicated to NRPs, currently stores more than 700 NRPs annotated with their monomeric peptide structure encoded by undirected labeled graphs. This opens the way to a systematic analysis of structural patterns occurring in NRPs. Such studies can investigate the functional role of some monomeric chains, or analyse NRPs that have been computationally predicted from the synthetase protein sequence. A basic operation in such analyses is the search for a given structural pattern in the database.
is not. This is due to traces of the first request made by TPF clients for each triple pattern of each query to decide the join ordering (TPF servers send the cardinalities of the concerned triples in their answers). Thus, when processing Q7, the client makes a first request for each triple pattern and then decides to begin with the first one. It then binds the resulting mappings into the ?book variable of the second triple pattern to retrieve the corresponding authors. This nested loop is deduced in the BGP. But as the output mappings of the first request of the second triple pattern intersect with the values of the inner loop, LIFT deduces a BGP with a self-join that is very unlikely and that can easily be filtered out in a post-processing step. Such a situation appears in 6 of the 29 queries: Q7, Q12−14, Q21 and Q25.
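The client-side nested-loop (bind) join described above can be sketched as follows; `tpf_request` is a hypothetical stand-in for the HTTP calls a TPF client issues, answering a triple pattern from a toy dataset together with its cardinality:

```python
DATA = [  # toy triple store standing in for a TPF server's dataset
    ("b1", "type", "Book"), ("b2", "type", "Book"),
    ("b1", "author", "a1"), ("b1", "author", "a3"), ("b2", "author", "a2"),
]

def tpf_request(s, p, o):
    """Hypothetical stand-in for one request to a TPF server: returns
    (cardinality, mappings) for a triple pattern ('?v' terms are variables)."""
    results = []
    for triple in DATA:
        binding = {}
        for term, val in zip((s, p, o), triple):
            if term.startswith("?"):
                binding[term] = val
            elif term != val:
                break
        else:
            results.append(binding)
    return len(results), results

patterns = [("?book", "type", "Book"), ("?book", "author", "?author")]

# 1. one first request per triple pattern: the server-reported cardinalities
#    decide the join ordering (smallest first)
cards = [tpf_request(*tp)[0] for tp in patterns]
first, second = (tp for _, tp in sorted(zip(cards, patterns)))

# 2. nested loop: bind each mapping of the first pattern into the second one
answers = []
for mu in tpf_request(*first)[1]:
    bound = tuple(mu.get(t, t) for t in second)
    for nu in tpf_request(*bound)[1]:
        answers.append({**mu, **nu})

print(answers)  # 3 answers: b1/a1, b1/a3, b2/a2
```

It is exactly this trace shape, one request per pattern followed by many bound requests, that a log-analysis tool can mistake for a self-join when the bound values overlap with the unrestricted mappings.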
In this paper we present a new compilation and optimization method based on program transformation. The principle is to separate the compilation of pattern matching from the optimization, in order to improve modularity and make extensions simpler. In a first step, the patterns are compiled using a simple but safe algorithm. Then, optimizations are performed directly on the generated code, using transformation rules. Separating optimization from compilation eases the compilation of extensions, such as new equational theories or the addition of or-patterns. Another contribution of this paper is to define the set of rules that constitutes the optimization, and to show their correctness as well as their effectiveness on real programs. The presented approach has been implemented and applied to Tom, a language extension which adds pattern-matching facilities to C and Java.
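As a toy illustration of this two-step principle (our own simplified model, not Tom's actual intermediate language), the sketch below first compiles patterns naively into flat clauses of tests, then applies one optimization rule, directly on the generated code, that factorizes consecutive clauses sharing their first test:

```python
def share(clauses):
    """One optimization rule, applied to generated code rather than patterns:
    consecutive clauses starting with the same test are merged so that the
    test is evaluated only once. A clause is (tuple_of_tests, action)."""
    tree, i = [], 0
    while i < len(clauses):
        tests, action = clauses[i]
        if not tests:                        # nothing left to test: a leaf
            tree.append(action)
            i += 1
            continue
        head, group = tests[0], []
        while i < len(clauses) and clauses[i][0] and clauses[i][0][0] == head:
            group.append((clauses[i][0][1:], clauses[i][1]))
            i += 1
        tree.append((head, share(group)))    # recurse on the shared suffixes
    return tree

# naive compilation of { Cons(Cons(_)) -> rhs_0 ; Cons(Nil) -> rhs_1 ;
#                        Nil -> rhs_2 }: one flat test chain per pattern
clauses = [
    ((("is", "x", "Cons"), ("is", "x.tail", "Cons")), "rhs_0"),
    ((("is", "x", "Cons"), ("is", "x.tail", "Nil")), "rhs_1"),
    ((("is", "x", "Nil"),), "rhs_2"),
]
print(share(clauses))
# [(('is', 'x', 'Cons'), [(('is', 'x.tail', 'Cons'), ['rhs_0']),
#                         (('is', 'x.tail', 'Nil'), ['rhs_1'])]),
#  (('is', 'x', 'Nil'), ['rhs_2'])]
```

The naive code evaluates the head test `is x Cons` twice; after the rule it is evaluated once, and further rules (dead-test elimination, or-pattern merging, etc.) can be applied to the same representation independently of how the patterns were compiled.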
Figure 3: A sample run of our prototype searching for matches between the Pond and Pesticides models presented in Examples 1 and 2.
patterns and models all originate from various works performed by ecologists, in particular master students who have modeled contrasting ecosystems. The models are representations of ecosystems from the south of France (Camargue), the Alps (Chamrousse) and Africa (Uganda, Karamoja). The patterns searched for are mainly species interactions such as predation, competition, symbiosis, etc. It is out of the scope of this paper to describe these interactions, but we would like to stress that they are all patterns and models that ecologists are actually interested in, and not arbitrary examples. In particular, we did not include the “pond” and “pesticides” models in this benchmark, because they were designed to illustrate this paper and have no ecological interest. For each search, we have defined a timeout of 3 minutes (180 seconds) after which Sat4j was interrupted. Among the 252 searches resulting from this benchmark, 194 (77%) returned an optimal solution before the timeout, and 58 (23%) were interrupted, resulting in a non-optimal solution, as summarized in Figure 4. Even if the search time is short, we can observe that an optimal solution is found in most cases. For the other ones, a solution, even if not optimal, is still found.
We then provide the split operator (Sec. 6). This applies our tiling transformations to filters defined on the full image, converting them into a series of filters that operate within image tiles, followed by filters across tiles, and then a final filter that assembles these two intermediate results and computes the final output. This tiling transformation exploits the linearity and associativity of IIR filters. Internally, our compiler also makes critical performance optimizations by minimizing back-and-forth communication between intra- and inter-tile computation; instead, it fuses computation by granularity level; e.g., we fuse all intra-tile stages because they have the same dependence pattern as the original non-tiled pipeline. This results in a compact internal graph of operations. We implement these transformations by mutating the internal Halide representation of the pipeline. Internally, our split operator mutates the original filters into a series of Halide functions corresponding to the intermediate operations. We introduce automatic scheduling (Sec. 7), i.e. automatically mapping all the generated operations efficiently onto the hardware. We identify common patterns across the different operations and use heuristics to ensure memory coalescing, minimal bank conflicts for GPU targets, ideal thread pools, and unrolling/vectorization options. We are able to do this without the hand-tuning or expensive autotuning required by general Halide pipelines because we can aggressively restrict our consideration to a much smaller space of “sensible” schedules for tiled recursive filters. Our heuristics schedule the entire pipeline automatically, which we show performs on par with manual scheduling. We also expose high-level scheduling operators to allow the user to easily write manual schedules, if needed, by exploiting the same restricted structure in the generated pipelines.
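The decomposition can be illustrated on a first-order causal filter y[n] = x[n] + a·y[n−1]. The plain-Python sketch below (not Halide code) shows the three stages such a split produces: data-parallel intra-tile passes, a short sequential scan across tile tails, and a final assembly pass; correctness relies exactly on linearity, since a carry-in c contributes a^(i+1)·c at offset i inside a tile:

```python
import random

a, T = 0.5, 4                        # feedback coefficient, tile size
random.seed(0)
x = [random.gauss(0, 1) for _ in range(16)]

def iir(xs, a, y0=0.0):
    """Reference first-order causal IIR: y[n] = x[n] + a*y[n-1]."""
    out, prev = [], y0
    for v in xs:
        prev = v + a * prev
        out.append(prev)
    return out

tiles = [x[i:i + T] for i in range(0, len(x), T)]

# 1. intra-tile pass: each tile is filtered independently with zero incoming
#    state (these passes are data-parallel, like the original pipeline stages)
intra = [iir(t, a) for t in tiles]

# 2. inter-tile pass: a small sequential scan over tile tails only; the true
#    value at a tile's end is its intra result plus a**T times the carry-in
tails, carry = [], 0.0
for z in intra:
    carry = z[-1] + (a ** T) * carry
    tails.append(carry)

# 3. assembly pass: add each tile's incoming carry, scaled by a**(i+1)
y_split = []
for t, z in enumerate(intra):
    c = tails[t - 1] if t > 0 else 0.0
    y_split.extend(v + (a ** (i + 1)) * c for i, v in enumerate(z))

assert all(abs(u - v) < 1e-9 for u, v in zip(y_split, iir(x, a)))
```

Only stage 2 is sequential, and it touches one value per tile rather than one per pixel, which is what makes the tiled version parallelize well.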
The works on the run-time verification of logical formulas are more recent and address the problem in a real-time context. The objective is to check whether the run of a system under scrutiny satisfies or violates some correctness properties. In , a technique is proposed to translate a correctness property ϕ into a monitor used to check the current execution of a system. The property is expressed in timed linear time temporal logic (TLTL), a natural counterpart of LTL in the timed setting . Run-time verification shares many similarities with model checking, but there are important differences: only one execution is checked (not all possible execution paths), the run is bounded (only finite traces are considered) and the techniques focus on on-line checking (incremental checks, disallowing multiple passes over the sequence of events). Nevertheless, Antescofo temporal patterns only deal with the history of past events to produce their output, while formulas in TLTL may express rules that require future information to be entirely evaluated. This led  to the development of an ad-hoc three-valued semantics (true, false and don’t know yet) which is not relevant for deciding whether a pattern matches the prefix of a trace. The problem is that TLTL is not totally suited to the task of pattern matching: it is both too expressive and sometimes too cumbersome. Antescofo temporal pattern sequences lead to formulas of the form ψ ∧ ⋄ϕ where ψ are formulas whose validity can be decided without having to look at future events and ϕ are formulas of the same form. So, An-
Abstract—Empowering software engineers often requires letting them write code transformations. However, existing automated or tool-supported approaches force developers to have a detailed knowledge of the internal representation of the underlying tool. While this knowledge is time-consuming to master, the syntax of the language, on the other hand, is already well known to developers and can serve as a strong foundation for pattern matching. Pattern languages with metavariables (that is, variables holding abstract syntax subtrees once the pattern has been matched) have been used to help programmers define program transformations at the language syntax level. The question raised is then the engineering cost of metavariable support. Our contribution is to show that, with a GLR parser, such patterns with metavariables can be supported by using a form of runtime reflexivity on the parser's internal structures. This approach allows one to directly implement such patterns on any parser generated by a parser generation framework, without asking the pattern writer to learn the AST structure and node types. As a use case for this approach we describe the implementation built on top of the SmaCC (Smalltalk Compiler Compiler) GLR parser generator framework. This approach has been used in production for source code transformations on a large scale. We also discuss perspectives for adapting this approach to other types of parsing technologies.
This thesis has two main motivations. Our first motivation is to get closer to the applications of PPM, especially in bioinformatics, which focuses on sequences of genetic material such as DNA, RNA, or protein sequences. For DNA sequences in particular, their transmission is studied. This transmission may, however, result in modifications of the sequence: some parts can be added, removed or swapped. As such, it is a problem to decide whether two sequences are related. More generally, deciding whether a trait (a characteristic of an individual), given as a sequence, is present in a DNA sequence is a problem. Nonetheless, not all modifications occur, as some are lethal to the descendant and thus cannot be observed. As such, there is a logic in a sequence, and it can thus be split into blocks. Moreover, we know that some blocks are conserved in order for the trait to be present. So for a trait to be present in a sequence, the sequence must contain blocks that appear in the same order as the blocks of the trait. The PPM arises when each block is labelled by a number. Additionally, the blocks cannot be ordered arbitrarily: there may be some dependencies between blocks (depending on the trait), such as blocks that cannot be in the same trait and blocks that need to be next to each other. We represent these dependencies with avoiding classes and bivincular patterns. Our second motivation is to gain a better understanding of certain objects. The first objects are separable permutations and wedge permutations, whose structures we explore. This can be used in random generation and combinatorics. The second object is bivincular patterns, a tool that is not yet well known. The study of bivincular patterns is a continuation of the study of permutation patterns, as bivincular patterns generalise permutation patterns.
Searching for motifs in graphs has become a crucial problem in the analysis of biological networks (e.g. protein-protein interaction, regulatory and metabolic networks). Roughly speaking, there exist two different views of graph motifs. Topological motifs (patterns occurring in the network) are the classical view [22, 23, 30–32] and computationally reduce to graph isomorphism, in the broad meaning of that term. These motifs have recently been identified as basic modules of molecular information processing. By way of contrast, functional motifs, introduced recently by Lacroix et al. , do not rely on the key concept of topology conservation but focus on the connectedness of the network vertices sought. This latter approach has been considered in subsequent papers [2, 6, 12]. Formally, searching for a functional motif reduces to the following graph problem (referred to hereafter as Graph Motif): Given a target vertex-colored graph G = (V, E) and a multiset of colors M of size k, find a subset V′ ⊆ V, |V′| = k, such that the subgraph induced by V′ is connected and the multiset of colors of the vertices in V′ equals M.
4.2.1 Choosing a mining strategy (from guideline G3) To implement our progressive pattern mining algorithm, we chose to start from the GSP  algorithm, available in the SPMF library . GSP is well known (many pattern mining algorithms were actually designed as variations of it) and it natively uses a breadth-first search. As a breadth-first Apriori-like sequential pattern mining algorithm, GSP computes the frequent event types in the data, then combines them to generate all the candidates of size 2. Each of these candidates is then checked to see whether it is frequent or not. Infrequent candidates are discarded, and frequent ones are combined to obtain the candidates of size 3. This process continues until a user-given size limit is reached, or until no candidates or frequent patterns are found. 4.2.2 Extracting episodes and occurrences (from G1 & G2) We modified GSP’s behavior to extract serial episodes instead of sequential patterns, by keeping track of all the pattern occurrences rather than the sequence ID they were discovered in. This change also led us to use an absolute support threshold instead of a relative one to determine whether a candidate is frequent. In addition to the support threshold, our algorithm takes the following parameters to provide boundaries for the pattern space:
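The modified level-wise loop of Sections 4.2.1–4.2.2 can be sketched as follows (a simplified model, not the SPMF implementation; it records every occurrence as a tuple of positions and prunes on the absolute occurrence count):

```python
from collections import defaultdict

def mine_serial_episodes(events, min_support, max_size):
    """Apriori-style breadth-first mining of serial episodes in one sequence.
    Every occurrence is kept as a tuple of increasing positions, and pruning
    uses the absolute number of occurrences (exponential in the worst case;
    a didactic sketch only)."""
    pos = defaultdict(list)
    for i, e in enumerate(events):
        pos[e].append(i)
    # level 1: frequent event types and their occurrence positions
    singles = {e: p for e, p in pos.items() if len(p) >= min_support}
    level = {(e,): [(i,) for i in p] for e, p in singles.items()}
    frequent = dict(level)
    for _ in range(max_size - 1):
        candidates = defaultdict(list)
        # candidate generation: extend each frequent pattern by a frequent event
        for pat, occs in level.items():
            for e, p in singles.items():
                for o in occs:
                    candidates[pat + (e,)].extend(
                        o + (j,) for j in p if j > o[-1])
        # pruning on the absolute support threshold
        level = {c: o for c, o in candidates.items() if len(o) >= min_support}
        if not level:
            break
        frequent.update(level)
    return frequent

eps = mine_serial_episodes("abcabcab", min_support=2, max_size=3)
print(sorted(p for p in eps if len(p) == 2))  # every pair except ('c', 'c')
```

The `max_size` argument plays the role of the user-given size limit, and the loop stops early when a level produces no frequent candidates, mirroring GSP's termination conditions.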
There are many avenues for future work. We primarily focused on the constraint brought by using a sliding window when taking into account the duration of the occurrences in time series rule matching. However, as with all methods relying on a distance measure, setting the right threshold is a very important and difficult task. This is even more important in our case, because we work from the shape, which is highly dependent on the expert's judgment. This is why we propose to explore interactive learning: learn the threshold by asking the expert to evaluate a subset of the occurrences found, then iterate until an acceptable threshold is found.
Agent’s initial knowledge regarding possible values of the premaster secret The actions of a TLS session that involve the premaster secret can be used in different orders (e.g. the client can generate the premaster secret before or after actually sending the first message to the server). There may even be different actions used in different types of sessions. The actions could, for example, contain the operating system of the device they are performed on. If there were a known weakness in the random number generation of one of the possible operating systems, the particular type of a TLS session would indeed have an influence on its security. However, since we assume a properly working random number generator, the confidentiality of the premaster secret does not depend on the particular order of actions or the type of session being used. Consequently, we need not differentiate between different types of TLS sessions, hence our model does not contain any such differentiation. This does not hold in general, as there may very well be weaknesses of a protocol, e.g. the generation of an insufficiently long random number, that occur only in specific types of sessions.
patterns. It defines a pattern, in an accurate and complete manner, as a formula with a graphical representation. A diagram in LePUS is a graph whose nodes correspond to variables and whose arcs are labeled with binary relations.
While many security patterns have been designed, still few works propose general development techniques for security patterns. A survey of security pattern approaches is given in . In the first approach of this kind , design patterns are usually represented by diagrams with notations such as UML objects, annotated with textual descriptions and examples of code. There are some well-proven approaches  based on Gamma et al. However, this kind of technique does not allow one to reach the high degree of pattern structure flexibility that is required to achieve our target. The framework promoted by LePUS  is interesting, but the degree of expressiveness it offers to design a pattern is too restrictive.
Chapter 6 — Conclusions and Future Work
multi-join queries while giving more opportunities to share execution plans with queries of type 1.
We tackle the problem of node failures during the dissemination of queries using a gossip protocol based on the concept of anti-entropy. This gossip protocol allows the detection and correction of inconsistencies in the knowledge of queries in the system through continuous gossiping. Basically, the gossip algorithm is deployed on top of the DHT, using the information stored in the Finger table of nodes for random gossip exchange. This level of determinism in the choice of which nodes to gossip with reduces redundant messages while achieving complete dissemination under node failures. Node failures during query execution are tackled through the collaboration between consumer and producer operators, eliminating unnecessary intermediate tuples that do not contribute to join results. Thus, significant savings in terms of network traffic are obtained. DHTJoin also provides an efficient solution to deal with overloaded nodes resulting from data skew. An overloaded node uses the information stored in its Finger table to choose underloaded nodes to which the tuples of the overloaded node are distributed. We show that, in this case, DHTJoin incurs only one additional message per joined tuple produced, thus keeping response time low.
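The anti-entropy dissemination can be sketched as follows (a toy simulation, not DHTJoin code; the Finger table is reduced to a fixed peer list on a 16-node ring, and failures are omitted for brevity):

```python
import random

class Node:
    """Anti-entropy sketch: each node periodically exchanges its set of known
    queries with a peer drawn from its finger table."""
    def __init__(self, nid, fingers):
        self.nid, self.fingers, self.queries = nid, fingers, set()

    def gossip_round(self, nodes):
        peer = nodes[random.choice(self.fingers)]
        # push-pull exchange: each side repairs what it is missing
        pull = peer.queries - self.queries
        push = self.queries - peer.queries
        self.queries |= pull
        peer.queries |= push

random.seed(1)
n = 16
# finger table of node i on a ring of 16 nodes: i+1, i+2, i+4, i+8 (mod 16)
nodes = [Node(i, [(i + 2 ** k) % n for k in range(4)]) for i in range(n)]
nodes[0].queries.add("Q1")          # a query submitted at node 0

rounds = 0
while any(not nd.queries for nd in nodes):
    for nd in nodes:
        nd.gossip_round(nodes)
    rounds += 1
print(rounds)  # full dissemination after a few rounds (O(log n) expected)
```

Restricting peers to the finger table, rather than any node in the system, is what gives the protocol its level of determinism: exchanges follow edges that already exist for routing, which limits redundant messages while keeping the gossip graph well connected.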