emmanuel.casseau@irisa.fr
Abstract
Reconfigurable computing is emerging as an important paradigm for satisfying present and future computing demands in both performance and flexibility. The extensible processor is a representative implementation of reconfigurable computing. In this context, the custom instruction enumeration problem is one of the most computationally difficult problems involved in custom instruction synthesis for extensible processors: it essentially amounts to enumerating the connected convex subgraphs of a given application graph. In this paper, we propose a provably optimal algorithm, in the sense of time complexity, for enumerating connected convex subgraphs in acyclic digraphs. The running time of the proposed algorithm is
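The enumeration task can be made concrete with a brute-force reference sketch (not the optimal algorithm of the excerpt): a vertex set S of a DAG is convex when no directed path between two vertices of S passes through a vertex outside S, and a candidate must additionally induce a weakly connected subgraph. All names below are illustrative.

```python
from itertools import combinations

def reachability(adj):
    """Transitive closure of a DAG given as {node: set(successors)}."""
    reach = {u: set(adj[u]) for u in adj}
    changed = True
    while changed:
        changed = False
        for u in adj:
            new = set()
            for v in reach[u]:
                new |= reach[v]
            if not new <= reach[u]:
                reach[u] |= new
                changed = True
    return reach

def is_convex(S, reach):
    """S is convex iff no u -> w -> v path with u, v in S has w outside S."""
    for w in reach:
        if w in S:
            continue
        if any(w in reach[u] for u in S) and any(v in reach[w] for v in S):
            return False
    return True

def is_connected(S, adj):
    """Weak connectivity of the subgraph induced by S."""
    und = {u: set() for u in S}
    for u in S:
        for v in adj[u]:
            if v in S:
                und[u].add(v)
                und[v].add(u)
    seen, stack = set(), [next(iter(S))]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(und[u] - seen)
    return seen == S

def enum_ccs(adj):
    """Enumerate all connected convex subgraphs by checking every subset."""
    reach = reachability(adj)
    nodes = list(adj)
    return [set(S) for k in range(1, len(nodes) + 1)
            for S in combinations(nodes, k)
            if is_connected(set(S), adj) and is_convex(set(S), reach)]

# Path a -> b -> c: any convex set containing a and c must also contain b.
print(enum_ccs({"a": {"b"}, "b": {"c"}, "c": set()}))
```

The exponential subset scan is exactly what dedicated enumeration algorithms avoid; it is shown only to pin down the definitions.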

Structure of the paper. This paper is structured as follows. In section II we provide some background on the Euclidean bipartite matching problem and on some notions in probability and group theory. We formally state the stochastic Stacker Crane Problem and the objectives of the paper in section III. In section IV we introduce and analyze the SPLICE algorithm, a polynomial-time, asymptotically optimal algorithm for the SCP, while in section V we present simulation results corroborating our findings. Finally, in section VI, we draw some conclusions and discuss directions for future work.

The results are shown in Tables 1 and 2, where the best results are indicated in bold. The spectral algorithms RoE, EoR and GKM tend to be outperformed by the other algorithms. To perform well, GKM needs θ1 to be positive and large (see [Ghosh et al., 2011]); whenever θ1 ≤ 0 or |θ1| is small, GKM tends to make a sign mistake causing a large error. Also, the analysis of RoE and EoR assumes that the task-worker graph is a random D-regular graph (so that the worker-worker matrix has a large spectral gap). Here this assumption is violated and the practical performance suffers noticeably, so this limitation is not only theoretical. KOS performs consistently well and seems immune to sign ambiguity; see instance (iii). Further, while the analysis of KOS also assumes that the task-worker graph is random D-regular, its practical performance does not seem sensitive to that assumption. The performance of S-EM is good except when sign estimation is hard (instance (iii), b = 1). This seems due to the fact that the initialization of S-EM (see the algorithm description) is not good in this case. Hence the limitation of b being of order √n is not only theoretical but practical as well. In fact (combining our results with the ideas of [Zhang et al., 2014]), this suggests a new algorithm where one uses EM with TE as the initial value of θ.
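The suggested combination can be sketched with a minimal one-coin EM for binary crowdsourced labels. This is an illustrative stand-in, not any of the excerpt's algorithms: initialization here is plain majority voting, whereas the text proposes initializing with TE, and all names are hypothetical.

```python
import numpy as np

def one_coin_em(A, iters=50):
    """One-coin EM for binary crowdsourcing. A is a (tasks x workers)
    answer matrix with entries +1/-1 and 0 for "no answer".
    Returns posteriors q = P(label = +1) and worker reliabilities p."""
    mask = A != 0
    counts = np.maximum(mask.sum(axis=0), 1)          # answers per worker
    # Majority-vote initialization (a hypothetical choice; the text
    # suggests initializing with TE instead).
    q = (A == 1).sum(axis=1) / np.maximum(mask.sum(axis=1), 1)
    for _ in range(iters):
        # M-step: reliability = expected agreement with the current labels.
        num = (A == 1).T.astype(float) @ q + (A == -1).T.astype(float) @ (1 - q)
        p = np.clip(num / counts, 1e-3, 1 - 1e-3)
        # E-step: posterior log-odds of label +1 under the one-coin model.
        w = np.log(p / (1 - p))
        q = 1.0 / (1.0 + np.exp(-(A @ w)))
    return q, p

truth = np.array([1, -1, 1, 1, -1])
A = np.column_stack([truth, truth, -truth])  # two reliable workers, one adversarial
q, p = one_coin_em(A)
print((q > 0.5).astype(int), np.round(p, 2))
```

The adversarial worker's reliability converges below 1/2, so the E-step automatically flips its votes; this is the mechanism behind the sign issues the excerpt discusses.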

r? A naive formulation uses nr + mr variables and yields an algorithm that is exponential in n and m even for constant r. Arora et al. [Proceedings of STOC, 2012, pp. 145–162] recently reduced the number of variables to 2r^2·2^r, and here we exponentially reduce the number of variables to 2r^2, and this yields our main algorithm. In fact, the algorithm that we obtain is nearly optimal (under the exponential time hypothesis), since an algorithm that runs in time (nm)^{o(r)} would yield a subexponential algorithm for 3-SAT [Proceedings of STOC, 2012, pp. 145–162]. Our main result is based on establishing a normal form for nonnegative matrix factorization, which in turn allows us to exploit algebraic dependence among a large collection of linear transformations with variable entries. Additionally, we demonstrate that nonnegative rank cannot be certified by even a very large submatrix of M; this property also follows from the intuition gained from viewing nonnegative rank through the lens of systems of polynomial inequalities.

U is unique (we can invert U mod x^{D+1} because U(0) = 1). When N1 < N2, we have that (D+1)/2 ≤ N2 (Proposition 4) and so, if the Padé approximant of A of type ((D+1)/2 − 1, (D+1)/2) exists, by Lemma 18 we can recover P_v from it. The existence of this Padé approximant is equivalent to the condition U_v(0) = 1, which means v_{N1+1} = 1. In the algorithm proposed in the conference version of this paper (Bender et al., 2016, Algorithm 3), the correctness of our algorithms relied on this condition. In that version, we ensured this property with a generic linear change of coordinates in the original polynomial f. In this paper, we drop this assumption: following Bostan et al. (2017, Theorem 7.2), when N1 < N2, we can compute v no matter the value of v_{N1+1}. This approach has a softly-linear arithmetic complexity and
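For illustration, a Padé approximant of a truncated power series can be computed with the classical extended Euclidean algorithm. This is the textbook quadratic method, not the softly-linear algorithm the excerpt refers to, and all helper names are hypothetical.

```python
from fractions import Fraction

def deg(p):
    """Degree of a coefficient list (low-to-high); -1 for the zero polynomial."""
    d = len(p) - 1
    while d >= 0 and p[d] == 0:
        d -= 1
    return d

def psub(a, b):
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0)
            for i in range(n)]

def pmul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def pdivmod(a, b):
    """Polynomial division: returns (quotient, remainder)."""
    a, db, lb = a[:], deg(b), b[deg(b)]
    q = [Fraction(0)] * max(deg(a) - db + 1, 1)
    while deg(a) >= db:
        k, c = deg(a) - db, a[deg(a)] / lb
        q[k] = c
        a = psub(a, [Fraction(0)] * k + [c * cb for cb in b])
    return q, a

def pade(series, m, n):
    """Type-(m, n) Pade approximant p/q of sum_i series[i] x^i, via the
    extended Euclidean algorithm on x^{m+n+1} and the truncated series."""
    N = m + n + 1
    r0 = [Fraction(0)] * N + [Fraction(1)]   # x^{m+n+1}
    r1 = [Fraction(c) for c in series[:N]]
    v0, v1 = [Fraction(0)], [Fraction(1)]
    while deg(r1) > m:
        quo, rem = pdivmod(r0, r1)
        r0, r1 = r1, rem
        v0, v1 = v1, psub(v0, pmul(quo, v1))
    return r1, v1                            # numerator, denominator

# exp(x) truncated: its (1,1) approximant is (1 + x/2)/(1 - x/2) up to scaling.
p, q = pade([Fraction(1), Fraction(1), Fraction(1, 2)], 1, 1)
print(p, q)
```

By construction q·A ≡ p (mod x^{m+n+1}); the approximant exists, and division by q is legitimate, exactly when q(0) ≠ 0, which mirrors the condition U_v(0) = 1 in the text.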

6. Conclusion and Perspectives
We have proved that the kl-UCB++ algorithm is both minimax- and asymptotically-optimal for the exponential distribution families described in Section 2. So far, this algorithm requires the horizon T as a parameter: to keep the proofs clear and simple, we have deferred to future work the analysis of an anytime variant. We believe, though, that obtaining such an extension should be possible using the tools developed in Degenne and Perchet (2016). In addition, we have focused in this paper on asymptotic optimality without trying to derive explicit finite-time bounds: we believe that this would have impaired the clarity and simplicity of the reasoning. But it is certainly a challenging and important objective to design a general strategy that would, in addition to minimax- and asymptotic optimality, also reach the important notion of finite-time instance near-optimality of Lattimore

An almost optimal approximate composable core-set. In [IMGR18], the authors designed composable core-sets of size O(k log k) with an approximation guarantee of Õ(k)^k for the determinant maximization problem. Moreover, they showed that the best approximation one can achieve is Ω(k^{k−o(k)}) for any polynomial-size core-set, proving that their algorithm is almost optimal. However, its complexity makes it less appealing in practice. First of all, the algorithm requires an explicit representation of the point set, which is not available for many DPP applications; a common case is that the DPP kernel is given by an oracle which returns the inner product between the points. In this setting, the algorithm needs to construct the associated Gram matrix and use an SVD decomposition to recover the point set, making the time and memory quadratic in the size of the point set. Secondly, even in the point-set setting, the algorithm is not efficient for large inputs, as it requires solving O(kn) linear programs, where n is the size of the point set.

Meet-irreducible elements
The meet-irreducible elements of the lattice are those whose height function admits exactly one local maximum in the interior of D. For each vertex in the interior of D, one can compute the height in the minimal and maximal tilings of D using Thurston's original algorithm. The possible values for h(v) vary in steps of 3 in the case of lozenges and in steps of 4 in the case of dominoes, so all the possibilities can easily be computed. For each pair defined by v and an admissible height, there exists a meet-irreducible element of the lattice, which can be computed using the generalized Thurston algorithm.

Abstract: We reexamine the work of Stumm and Walther on multistage algorithms for adjoint computation. We provide an optimal algorithm for this problem when there are two levels of checkpoints, in memory and on disk. Previously, optimal algorithms for adjoint computations were known only for a single level of checkpoints with no writing and reading costs; a well-known example is the binomial checkpointing algorithm of Griewank and Walther. Stumm and Walther extended that binomial checkpointing algorithm to the case of two levels of checkpoints, but they did not provide any optimality results. We bridge the gap by designing the first optimal algorithm in this context. We experimentally compare our optimal algorithm with that of Stumm and Walther to assess the difference in performance.
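For context, Griewank and Walther's single-level binomial result states that with c checkpoints and at most t forward re-evaluations of each step, the longest chain whose adjoint can be computed has length C(c+t, c). A small sketch of the resulting counting (function names are illustrative):

```python
from math import comb

def max_chain_length(c, t):
    """Maximal number of forward steps whose adjoint can be computed with
    c memory checkpoints and at most t re-evaluations of each step
    (Griewank & Walther's binomial result): beta(c, t) = C(c + t, c)."""
    return comb(c + t, c)

def min_repetitions(n, c):
    """Smallest t such that a length-n chain fits with c checkpoints."""
    t = 0
    while max_chain_length(c, t) < n:
        t += 1
    return t

print(max_chain_length(3, 2))   # C(5, 3) = 10
print(min_repetitions(100, 3))  # smallest t with C(3 + t, 3) >= 100, i.e. 7
```

The two-level setting of the abstract additionally charges for disk writes and reads, which is why the single-level formula above no longer gives the optimum there.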

2.2 Positioning
Problem statement. The basic problem of optimal broadcast in a CAN is that, as a CAN is a P2P network, each peer only has information about the zone it manages and the zones managed by its neighbours. Consequently, it is impossible to split the entire network into sub-spaces such that each zone belongs to exactly one sub-space: in Figure 1, the initiator has no knowledge about Z and cannot know that it must give the whole responsibility for zone Z to either D or F. Indeed, the initiator could decide that F is responsible for the lower half of Z, and that D is responsible for the upper half. In that case, Z would receive the message twice. It is possible to design an optimal algorithm based on sub-spaces, but this algorithm is inefficient because it almost never splits the space to be covered, and only one message is communicated at a time.

edges with negative objective function score; in this case we do not know the cardinality of the path beforehand. If the weight of the lightest decreasing path plus 1 is smaller than x_{ik}, we have identified a violated inequality (5). Since we compute the minimum-weight path in a directed acyclic graph with edge weights less than or equal to zero, we cannot apply Dijkstra's algorithm. Instead, we traverse all nodes in topological order, which is provided by sorting according to the order on the nodes defined above. A constraint of type (5) only cuts off the current solution if its x_{ik} value is greater than zero. In practice, a
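The traversal described above can be sketched as follows: with nodes indexed in topological order, a single relaxation pass over each node's outgoing edges computes minimum-weight paths even with non-positive weights, where Dijkstra's algorithm would fail. Names are illustrative and the separation-specific details are omitted.

```python
def dag_min_path(n, edges):
    """Minimum weight over paths ending at each node of a DAG whose nodes
    0..n-1 are already in topological order; edges are (u, v, w), w <= 0.
    A path may start anywhere, so every node starts at weight 0."""
    dist = [0] * n            # the empty path has weight 0
    pred = [None] * n
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
    for u in range(n):        # one pass in topological order suffices
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                pred[v] = u
    return dist, pred

# 0 -> 1 -> 3 and 0 -> 2 -> 3 with non-positive weights
dist, pred = dag_min_path(4, [(0, 1, -1), (0, 2, -4), (1, 3, -2), (2, 3, -1)])
print(dist)   # [0, -1, -4, -5]
```

Each edge is relaxed exactly once, so the pass runs in O(V + E) time, cheaper than Dijkstra even when the latter applies.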

1 Introduction
Optimal transport is an efficient and flexible tool to compare two probability distributions, which has been popularized in the computer vision community in the context of discrete histograms [Rubner et al., 2000]. The introduction of entropic regularization in [Cuturi, 2013] has made possible the use of the fast Sinkhorn–Knopp algorithm [Sinkhorn, 1964], which scales to high-dimensional data. Regularized optimal transport has thus been intensively used in machine learning, with applications such as geodesic PCA [Seguy and Cuturi, 2015], domain adaptation [Courty et al., 2015], data fitting [Frogner et al., 2015], training of Boltzmann machines [Montavon et al., 2016] and dictionary learning [Rolet et al., 2016, Schmitz et al., 2017].
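A minimal sketch of the Sinkhorn–Knopp iterations on discrete histograms: the entropic-regularized plan is a diagonal rescaling of the Gibbs kernel K = exp(−C/ε), and the two scalings are obtained by alternately matching the row and column marginals. Parameter values below are illustrative.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, iters=500):
    """Entropic-regularized OT between histograms a and b with cost C.
    Alternating diagonal scaling of K = exp(-C / eps)."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)              # match column marginals
        u = a / (K @ v)                # match row marginals
    return u[:, None] * K * v[None, :]  # transport plan diag(u) K diag(v)

a = np.array([0.5, 0.5])
b = np.array([0.25, 0.75])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
P = sinkhorn(a, b, C)
print(P.sum(axis=1), P.sum(axis=0))    # marginals converge to a and b
```

Smaller ε gives plans closer to unregularized optimal transport but slows convergence and risks underflow in K; log-domain stabilization is the usual remedy for high-dimensional use.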

Conjugate points for this problem can be computed by the algorithm of §II. Indeed, any optimal control is smooth outside isolated points called Π-singularities, where an instantaneous rotation of angle π occurs [6]. The norm of the control is thus (almost everywhere) maximal, and the equation of the mass is solved by m(t) = m_0 − βF_max t. As a result, though non-autonomous, the system is a particular case of a sub-Riemannian system for which the previous algorithm holds. Indeed, any smooth optimal control defines a singularity of the endpoint mapping where controls are taken on the sphere of radius F_max. Although the system is affine in the control, controls can easily be reparameterized so that the Legendre–Clebsch condition is satisfied. Test (6) is used in the normal case with free final time, and the rank is tested by a singular value decomposition of the n − 1 = 5 Jacobi fields computed by cotcot. An equivalent test is to look for zeros of the determinant of the projection of the Jacobi fields with the dynamics along the trajectory:

Recently, many researchers have extensively studied low-power systems based on Dynamic Voltage and Frequency Scaling. In [14], the authors proposed an energy-aware DVFS (EA-DVFS) algorithm that aims to exploit the slack time as much as possible to reduce the deadline miss rate. This is achieved by using a good tradeoff between the saved energy and the processor speed. The available energy mainly depends on the energy stored in the reservoir and the energy harvested from the renewable energy source. In case of insufficient available energy, the processor slows down the task execution; otherwise, the tasks are executed at maximum processor speed. The advantage of EA-DVFS is that it increases the percentage of feasibly executed tasks and reduces the required storage capacity in case of low overload. However, EA-DVFS still suffers from some drawbacks. First, the authors perform the energy availability test based only on the single current task. Second, the scheduler can continue its operation as long as the energy is sufficient to complete executing a task whose relative deadline is no more than the remaining operation time of the system at maximum processor speed [14]. For example, let us consider that the energy reservoir has only 1% energy and the system can execute the current task at full speed without exhausting the energy reservoir. Then, the EA-DVFS scheduler will run the task at maximum processor speed, which is not the correct behavior. Third, when using the task slacks, the proposed algorithm only considers the current task instead of taking into account all tasks in the ready queue. Hence, slack time is not fully exploited for reducing energy consumption.
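The 1%-energy example can be made concrete with a deliberately simplified caricature of the decision rule; this is an illustration of the criticism, not the actual algorithm of [14], and all names and values are hypothetical.

```python
def ea_dvfs_speed(energy_stored, current_task_energy_full_speed):
    """Caricature of the EA-DVFS decision criticized in the text: the
    availability test looks only at the single current task, so the
    processor runs at full speed whenever the reservoir covers that one
    task, regardless of the rest of the ready queue."""
    return 1.0 if energy_stored >= current_task_energy_full_speed else 0.5

# Reservoir at 1% of capacity, current task needs 0.8% at full speed:
# the rule still selects full speed, even though later tasks may starve.
print(ea_dvfs_speed(0.01, 0.008))
```

A queue-aware test would instead compare the stored plus harvested energy against the demand of all ready tasks, which is the direction the critique points toward.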

Actes des Journées Recherche en Imagerie et Technologies pour la Santé - RITS 2015
Optimal Spectral Histology of Human Normal Colon by Genetic Algorithm
Ihsen FARAH 1,2, Thi Nguyet Que NGUYEN 1,2, Audrey GROH 3, Dominique GUENOT 3, Pierre JEANNESSON 1,2, Cyril GOBINET 1,2∗

ABSTRACT
In this paper, we are interested in optimal sensor placement for signal extraction. Recently, a new criterion based on output signal-to-noise ratio has been proposed for sensor placement. However, to solve the optimization problem, a greedy approach is used over a grid, which is not optimal. To improve this method, we present an optimization approach to locate all the sensors at once. We further add a constraint to the problem that controls the average distances between the sensors. To solve our problem, we use an alternating optimization penalty method. As the associated cost function is non-convex, the proposed algorithm should be carefully initialized. We propose to initialize it with the result of the greedy method. Experimental results show the superiority of the proposed method over the greedy approach.
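The greedy-over-a-grid baseline can be sketched as follows. The score function below is a hypothetical stand-in for the output-SNR criterion (the real criterion depends on the signal model), and the penalized alternating-optimization refinement is not shown.

```python
import numpy as np

def greedy_placement(score, n_candidates, k):
    """Greedy grid-based placement: repeatedly add the candidate index
    that most improves the objective. Near-optimal only when the
    objective is (approximately) submodular, hence the text's remark
    that the greedy solution is not optimal in general."""
    chosen, remaining = [], list(range(n_candidates))
    for _ in range(k):
        best = max(remaining, key=lambda i: score(chosen + [i]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy 1-D setting: sensors on a grid, sources to be covered. The score is
# an illustrative placeholder, not the output-SNR criterion of the text.
grid = np.linspace(0.0, 1.0, 11)
sources = np.array([0.1, 0.5, 0.9])

def coverage(S):
    d = np.abs(sources[:, None] - grid[S][None, :])
    return -float(d.min(axis=1).sum())   # negative distance to nearest sensor

print(sorted(greedy_placement(coverage, len(grid), 3)))  # [1, 5, 9]
```

Because the greedy result is a feasible configuration, it is a natural initializer for the non-convex all-at-once optimization the abstract proposes.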

Research Report n° 8375 — September 2013 — 16 pages
Abstract: Structured peer-to-peer networks are powerful underlying structures for communication and storage systems in large-scale settings. In the context of the Content-Addressable Network (CAN), this paper addresses the following challenge: how to perform an efficient broadcast while the local view of the network is restricted to a set of neighbours? In existing approaches, either the broadcast is inefficient (there are duplicated messages) or it requires maintaining a particular structure among neighbours, e.g. a spanning tree. We define a new broadcast primitive for CAN that sends a minimum number of messages while covering the whole network, without any global knowledge. Currently, no other algorithm achieves those two goals in the context of CAN. In this sense, the contribution we propose in this paper is threefold. First, we provide an algorithm that sends exactly one message per recipient without building a global view of the network. Second, we prove the absence of duplicated messages and the coverage of the whole network when using this algorithm. Finally, we show the practical benefits of the algorithm through experiments.

Dynamic Voltage and Frequency Scaling (DVFS) is a promising and widely used energy-efficiency technique for addressing the main issues that arise when using a finite energy reservoir and an uncertain energy source in real-time embedded systems. This work investigates an energy management scheme for real-time task scheduling on variable-voltage processors located in sensor nodes and powered by ambient energy sources. We use the DVFS technique to decrease the energy consumption of sensors when the energy sources are limited. In particular, we develop a real-time scheduling framework with speed stretching, namely Energy Guarantee Dynamic Voltage and Frequency Scaling (EG-DVFS), and prove its optimality; it jointly accounts for the timing constraints and the energy state induced by the properties of the system components. EG-DVFS relies on the well-known ED-H scheduling algorithm combined with the DVFS technique. The sensor processing frequency is fine-tuned to further minimize energy consumption and to achieve energy autonomy of the system. Further, an exact feasibility

To cite this version: Escrig, Benoît. Splitting algorithm for DMT optimal cooperative MAC protocols in wireless mesh networks. (2011) Physical Communication, 4, pp. 218–226. ISSN 1874-4907
Open Archive Toulouse Archive Ouverte (OATAO)

Index Terms
Adaptation coefficient, blind equalization, CMA, exact line search, SISO and SIMO channels.
I. INTRODUCTION
An important problem in digital communications is the recovery of data symbols transmitted through a distorting medium. The constant modulus (CM) criterion is arguably the most widespread blind channel equalization principle [1], [2]. The CM criterion generally presents local extrema, often associated with different equalization delays, in the equalizer parameter space [3]. This shortcoming renders the performance of gradient-based implementations, such as the well-known constant modulus algorithm (CMA), very dependent on the equalizer impulse response initialization. Even when the absolute minimum is found, convergence can be severely slowed down for initial equalizer settings whose trajectories pass through the vicinity of saddle points [4], [5]. The constant value of the step-size parameter (or adaptation coefficient) must be carefully selected to ensure stable operation while balancing convergence rate and final accuracy (misadjustment, or excess mean square error). The stochastic gradient CMA drops the expectation operator and approximates the gradient of the criterion by a one-sample estimate, as in LMS-based algorithms. This rough approximation generally leads to slow convergence and poor misadjustment, even if the step size is carefully chosen.
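The stochastic-gradient CMA update described above can be sketched for a real-valued (BPSK) signal: at each sample, y = wᵀx, the dispersion error is e = y² − R2, and the one-sample gradient step is w ← w − μ·e·y·x with a fixed step size μ. The channel, initialization and parameter values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def cma(received, n_taps=5, mu=0.01, R2=1.0):
    """Stochastic-gradient CMA for a real signal, fixed step size mu.
    One-sample gradient estimate of the CM cost (|y|^2 - R2)^2."""
    w = np.zeros(n_taps)
    w[n_taps // 2] = 1.0                 # center-spike initialization
    y_out = []
    for n in range(n_taps, len(received)):
        x = received[n - n_taps:n][::-1]  # regressor, most recent sample first
        y = w @ x                        # equalizer output
        e = y * y - R2                   # constant-modulus error
        w = w - mu * e * y * x           # stochastic gradient step
        y_out.append(y)
    return np.array(y_out), w

s = rng.choice([-1.0, 1.0], size=5000)          # BPSK symbols
received = np.convolve(s, [1.0, 0.3])[:5000]    # mild ISI channel (illustrative)
y, w = cma(received)
disp = (y * y - 1.0) ** 2
print(disp[:500].mean(), disp[-500:].mean())    # dispersion shrinks as CMA adapts
```

The fixed μ trades convergence speed against residual dispersion, which is exactly the step-size tension the paragraph describes; exact line search, mentioned in the index terms, replaces the fixed μ with an optimal step at each iteration.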
