Reconfigurable computing systems are emerging as an important class of computing systems for satisfying present and future computing demands in performance and flexibility. Extensible processors are a representative implementation of reconfigurable computing. In this context, the custom instruction enumeration problem is one of the most computationally difficult problems involved in custom instruction synthesis for extensible processors. It essentially consists of enumerating the connected convex subgraphs of a given application graph. In this paper, we propose an algorithm for enumerating connected convex subgraphs in acyclic digraphs that is provably optimal in the sense of time complexity. The running time of the proposed algorithm is
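To make the enumerated objects concrete, the following sketch (a hypothetical helper, not the paper's enumeration algorithm) checks whether a node set induces a convex subgraph of a DAG: a set S is convex if no path between two nodes of S passes through a node outside S.

```python
def _reach(adj, starts):
    """Nodes strictly reachable from `starts` via adjacency dict `adj`."""
    seen, stack = set(), list(starts)
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def is_convex(dag, S):
    """True iff S induces a convex subgraph of the DAG `dag`
    (dag: {node: set of successors}). S is convex iff no node
    outside S is both reachable from S and able to reach S."""
    S = set(S)
    rev = {}
    for u, vs in dag.items():
        for v in vs:
            rev.setdefault(v, set()).add(u)
    down = _reach(dag, S) - S   # outside nodes reachable from S
    up = _reach(rev, S) - S     # outside nodes that can reach S
    return not (down & up)     # any outside node on an S-to-S path?

# a -> b -> c and a -> c: {a, c} is not convex (the path a->b->c exits the set)
dag = {"a": {"b", "c"}, "b": {"c"}, "c": set()}
assert not is_convex(dag, {"a", "c"})
assert is_convex(dag, {"a", "b"})
```

Connectivity would be checked separately; this only tests the convexity condition.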
Structure of the paper. This paper is structured as follows. In section II we provide some background on the Euclidean Bipartite Matching Problem and on some notions in probability and group theory. We formally state the stochastic Stacker Crane Problem and the objectives of the paper in section III. In section IV we introduce and analyze the SPLICE algorithm, a polynomial-time, asymptotically optimal algorithm for the SCP, while in section V we present simulation results corroborating our findings. Finally, in section VI, we draw some conclusions and discuss directions for future work.
The results are shown in Tables 1 and 2, where the best results are indicated in bold. The spectral algorithms RoE, EoR and GKM tend to be outperformed by the other algorithms. To perform well, GKM needs θ1 to be positive and large (see [Ghosh et al., 2011]); whenever θ1 ≤ 0 or |θ1| is small, GKM tends to make a sign mistake, causing a large error. Also, the analysis of RoE and EoR assumes that the task-worker graph is a random D-regular graph (so that the worker-worker matrix has a large spectral gap). Here this assumption is violated and the practical performance suffers noticeably, so this limitation is not only theoretical. KOS performs consistently well and seems immune to sign ambiguity; see instance (iii). Further, while the analysis of KOS also assumes that the task-worker graph is random D-regular, its practical performance does not seem sensitive to that assumption. The performance of S-EM is good except when sign estimation is hard (instance (iii), b = 1). This seems due to the fact that the initialization of S-EM (see the algorithm description) is not good in this case. Hence the limitation of b being of order √n is practical as well as theoretical. In fact (combining our results and the ideas of [Zhang et al., 2014]), this suggests a new algorithm where one uses EM with TE as the initial value of θ.
r? A naive formulation uses nr + mr variables and yields an algorithm that is exponential in n and m even for constant r. Arora et al. [Proceedings of STOC, 2012, pp. 145-162] recently reduced the number of variables to 2r^2·2^r, and here we exponentially reduce the number of variables to 2r^2; this yields our main algorithm. In fact, the algorithm that we obtain is nearly optimal (under the exponential time hypothesis), since an algorithm that runs in time (nm)^{o(r)} would yield a subexponential algorithm for 3-SAT [Proceedings of STOC, 2012, pp. 145-162]. Our main result is based on establishing a normal form for nonnegative matrix factorization, which in turn allows us to exploit algebraic dependence among a large collection of linear transformations with variable entries. Additionally, we demonstrate that nonnegative rank cannot be certified by even a very large submatrix of M; this property also follows from the intuition gained from viewing nonnegative rank through the lens of systems of polynomial inequalities.
U is unique (we can invert U mod x^{D+1} because U(0) = 1). When N1 < N2, we have that (D+1)/2 ≤ N2 (Proposition 4) and so, if the Padé approximant of A of type ((D+1)/2 − 1, (D+1)/2) exists, by Lemma 18 we can recover P_v from it. The existence of this Padé approximant is equivalent to the condition U_v(0) = 1, which means v_{N1+1} = 1. In the algorithm proposed in the conference version of this paper (Bender et al., 2016, Algorithm 3), the correctness of our algorithms relied on this condition. In that version, we ensured this property with a generic linear change of coordinates in the original polynomial f. In this paper, we drop this assumption. Following Bostan et al. (2017, Theorem 7.2), when N1 < N2, we can compute v no matter the value of v_{N1+1}. This approach has a softly-linear arithmetic complexity and
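For reference, the notion of Padé approximant invoked above can be stated as follows (a standard formulation; the indexing matches the types used in the text):

```latex
% Pad\'e approximant of type (m, n) of a power series A:
% a pair of polynomials (P, U) such that
\[
\deg P \le m, \qquad \deg U \le n, \qquad U(0) = 1,
\qquad A\,U \equiv P \pmod{x^{m+n+1}},
\]
% here with m = (D+1)/2 - 1 and n = (D+1)/2; the normalization
% U(0) = 1 is what makes U invertible modulo x^{D+1}.
```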
6. Conclusion and Perspectives
We have proved that the kl-UCB++ algorithm is both minimax- and asymptotically optimal for the exponential distribution families described in Section 2. So far, this algorithm requires the horizon T as a parameter: to keep the proofs clear and simple, we have deferred the analysis of an anytime variant to future work. We believe, though, that obtaining such an extension should be possible by using the tools developed in Degenne and Perchet (2016). In addition, we have focused in this paper on asymptotic optimality without trying to derive explicit finite-time bounds: we believe that this would have impaired the clarity and simplicity of the reasoning. But it is certainly a challenging and important objective to design a general strategy that would, in addition to minimax- and asymptotic optimality, also reach the important notion of finite-time instance near-optimality of Lattimore
An almost optimal approximate composable core-set. In [IMGR18], the authors designed composable core-sets of size O(k log k) with an approximation guarantee of Õ(k)^k for the determinant maximization problem. Moreover, they showed that the best approximation one can achieve with polynomial-size core-sets is Ω(k^{k−o(k)}), proving that their algorithm is almost optimal. However, its complexity makes it less appealing in practice. First of all, the algorithm requires an explicit representation of the point set, which is not available for many DPP applications; a common case is that the DPP kernel is given by an oracle which returns the inner product between the points. In this setting, the algorithm needs to construct the associated Gram matrix and use an SVD to recover the point set, making the time and memory quadratic in the size of the point set. Secondly, even in the point-set setting, the algorithm is not efficient for large inputs, as it requires solving O(kn) linear programs, where n is the size of the point set.
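The recovery step described for the oracle setting can be sketched as follows (a minimal illustration with synthetic data, not the paper's implementation): since the Gram matrix G = XXᵀ is symmetric positive semidefinite, an eigendecomposition yields a factor Y with YYᵀ = G, i.e. a point set with the same inner products, at memory cost quadratic in the number of points.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))   # 6 points in R^3 (hidden behind the oracle)
G = X @ X.T                       # what an inner-product oracle provides

w, V = np.linalg.eigh(G)          # G is symmetric PSD
w = np.clip(w, 0.0, None)         # clamp tiny negative round-off eigenvalues
Y = V * np.sqrt(w)                # rows of Y are points with Y Y^T = G

assert np.allclose(Y @ Y.T, G, atol=1e-8)
```

The recovered points Y agree with X only up to an orthogonal transformation, which is immaterial for determinant maximization.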
The meet-irreducible elements of the lattice are those whose height function admits exactly one local maximum in the interior of D. For each vertex in the interior of D, one can compute the height in the minimal and maximal tilings of D using Thurston's original algorithm. The possible values for h(v) vary 3 by 3 in the case of lozenges, 4 by 4 in the case of dominoes, so all the possibilities can easily be computed. For each pair defined by v and an admissible height, there exists a meet-irreducible element of the lattice, which can be computed using the generalized Thurston algorithm.
Abstract: We reexamine the work of Stumm and Walther on multistage algorithms for adjoint computation. We provide an optimal algorithm for this problem when there are two levels of checkpoints, in memory and on disk. Previously, optimal algorithms for adjoint computations were known only for a single level of checkpoints with no writing and reading costs; a well-known example is the binomial checkpointing algorithm of Griewank and Walther. Stumm and Walther extended that binomial checkpointing algorithm to the case of two levels of checkpoints, but they did not provide any optimality results. We bridge the gap by designing the first optimal algorithm in this context. We experimentally compare our optimal algorithm with that of Stumm and Walther to assess the difference in performance.
Problem statement The basic problem of optimal broadcast in a CAN is that, as a CAN is a P2P network, each peer only has information about the zone it manages and the zones managed by its neighbours. Consequently, it is impossible to split the entire network into sub-spaces such that each zone belongs to exactly one sub-space: in Figure 1, the initiator has no knowledge about Z and cannot know that it must give the whole responsibility for zone Z to either D or F. Indeed, the initiator could decide that F is responsible for the lower half of Z and that D is responsible for the upper half. In that case, Z would receive the message twice. It is possible to design an optimal algorithm based on sub-spaces, but this algorithm is inefficient because it almost never splits the space to be covered, and only one message is communicated at a time.
edges with negative objective function score; in this case we do not know the cardinality of the path beforehand. If the weight of the lightest decreasing path plus 1 is smaller than x_ik, we have identified a violated inequality (5). Since we compute a minimum-weight path in a directed acyclic graph with edge weights less than or equal to zero, we cannot apply Dijkstra's algorithm. Instead, we traverse all nodes in topological order, which is provided by sorting according to the order on the nodes defined above. A constraint of type (5) only cuts off the current solution if its x_ik value is greater than zero. In practice, a
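The separation step above relies on a standard fact: Dijkstra's algorithm is inapplicable with negative weights, but in a directed acyclic graph a single relaxation pass in topological order computes shortest paths for any edge signs. A minimal sketch on a generic DAG (not the paper's separation routine):

```python
import math

def dag_shortest_paths(n, edges, source):
    """Single-source shortest paths in a DAG, negative weights allowed.

    edges: list of (u, v, w) with nodes in range(n). One relaxation pass
    over a topological order suffices, regardless of edge-weight signs.
    """
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v, w in edges:
        adj[u].append((v, w))
        indeg[v] += 1
    # Kahn's algorithm yields a topological order
    order, queue = [], [u for u in range(n) if indeg[u] == 0]
    while queue:
        u = queue.pop()
        order.append(u)
        for v, _ in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)

    dist = [math.inf] * n
    dist[source] = 0
    for u in order:                  # relax outgoing edges in topological order
        if dist[u] < math.inf:
            for v, w in adj[u]:
                dist[v] = min(dist[v], dist[u] + w)
    return dist

# 0 -> 1 (-2), 1 -> 2 (-3), 0 -> 2 (-1): the shortest 0 -> 2 path goes via 1
assert dag_shortest_paths(3, [(0, 1, -2), (1, 2, -3), (0, 2, -1)], 0) == [0, -2, -5]
```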
Optimal transport is an efficient and flexible tool to compare two probability distributions; it was popularized in the computer vision community in the context of discrete histograms [Rubner et al., 2000]. The introduction of entropic regularization in [Cuturi, 2013] has made possible the use of the fast Sinkhorn-Knopp algorithm [Sinkhorn, 1964], which scales to high-dimensional data. Regularized optimal transport has thus been intensively used in machine learning, with applications such as Geodesic PCA [Seguy and Cuturi, 2015], domain adaptation [Courty et al., 2015], data fitting [Frogner et al., 2015], training of Boltzmann machines [Montavon et al., 2016] and dictionary learning [Rolet et al., 2016, Schmitz et al., 2017].
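A minimal sketch of the Sinkhorn-Knopp iterations for entropy-regularized transport (illustrative histograms, cost, and parameter values; not tied to any of the cited applications):

```python
import numpy as np

def sinkhorn(a, b, C, eps=1.0, n_iter=500):
    """Entropic-regularized OT between histograms a and b with cost C.

    Alternately rescale the rows and columns of K = exp(-C / eps) so that
    the resulting transport plan has marginals a and b (Cuturi, 2013).
    """
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)           # match column marginals
        u = a / (K @ v)             # match row marginals
    return u[:, None] * K * v[None, :]   # transport plan

# two histograms on three points of the line, squared-distance cost
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.2, 0.3, 0.5])
x = np.arange(3.0)
C = (x[:, None] - x[None, :]) ** 2
P = sinkhorn(a, b, C)
assert np.allclose(P.sum(axis=1), a, atol=1e-6)
assert np.allclose(P.sum(axis=0), b, atol=1e-6)
```

Smaller eps approximates unregularized transport more closely but slows convergence and can underflow K; stabilized log-domain variants address this.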
Conjugate points for this problem can be computed by the algorithm of §II.
Indeed, any optimal control is smooth outside isolated points called Π-singularities, where an instantaneous rotation of angle π occurs. The norm of the control is thus (almost everywhere) maximal, and the equation of the mass is solved by m(t) = m_0 − βF_max t. As a result, though non-autonomous, the system is a particular case of a sub-Riemannian system for which the previous algorithm holds. Indeed, any smooth optimal control defines a singularity of the endpoint mapping where controls are taken on the sphere of radius F_max. Although the system is affine in the control, controls can easily be reparameterized so that the Legendre-Clebsch condition is satisfied. Test (6) is used in the normal case with free final time, and the rank is tested by a singular value decomposition of the n − 1 = 5 Jacobi fields computed by cotcot. An equivalent test is to look for zeros of the determinant of the projection of Jacobi fields with the dynamics along the trajectory:
Recently, many researchers have extensively studied low-power systems based on Dynamic Voltage and Frequency Scaling. In , the authors proposed an energy-aware DVFS (EA-DVFS) algorithm that aims to exploit the slack time as much as possible to reduce the deadline miss rate. This is achieved through a good tradeoff between the saved energy and the processor speed. The available energy mainly depends on the energy stored in the reservoir and the energy harvested from the renewable energy source. In case of insufficient available energy, the processor slows down the task execution; otherwise, the tasks are executed at maximum processor speed. The advantage of EA-DVFS is that it increases the percentage of feasibly executed tasks and reduces the required storage capacity in case of low overload. However, EA-DVFS still suffers from some drawbacks. First, the authors perform the energy availability test based only on the single current task. Second, the scheduler can continue its operation as long as the energy is sufficient to complete executing a task whose relative deadline is no more than the remaining operation time of the system at maximum processor speed. For example, suppose that the energy reservoir holds only 1% of its energy and the system can execute the current task at full speed without exhausting the reservoir. Then the EA-DVFS scheduler will run the task at maximum processor speed, which is not the correct behavior. Third, when using the task slacks, the proposed algorithm only considers the current task instead of taking into account all tasks in the ready queue. Hence, slack time is not fully exploited for reducing energy consumption.
Actes des Journées Recherche en Imagerie et Technologies pour la Santé - RITS 2015
Optimal Spectral Histology of Human Normal Colon by Genetic Algorithm
Ihsen FARAH 1,2, Thi Nguyet Que NGUYEN 1,2, Audrey GROH 3, Dominique GUENOT 3, Pierre JEANNESSON 1,2, Cyril GOBINET 1,2 ∗
In this paper, we are interested in optimal sensor placement for signal extraction. Recently, a new criterion based on output signal-to-noise ratio has been proposed for sensor placement. However, to solve the optimization problem, a greedy approach is used over a grid, which is not optimal. To improve this method, we present an optimization approach to locate all the sensors at once. We further add a constraint to the problem that controls the average distances between the sensors. To solve our problem, we use an alternating optimization penalty method. As the associated cost function is non-convex, the proposed algorithm should be carefully initialized. We propose to initialize it with the result of the greedy method. Experimental results show the superiority of the proposed method over the greedy approach.
Research Report n° 8375 — September 2013 — 16 pages
Abstract: Structured peer-to-peer networks are powerful underlying structures for communication and storage systems in large-scale settings. In the context of the Content-Addressable Network (CAN), this paper addresses the following challenge: how to perform an efficient broadcast while the local view of the network is restricted to a set of neighbours? In existing approaches, either the broadcast is inefficient (there are duplicated messages) or it requires maintaining a particular structure among neighbours, e.g. a spanning tree. We define a new broadcast primitive for CAN that sends a minimum number of messages while covering the whole network, without any global knowledge. Currently, no other algorithm achieves those two goals in the context of CAN. In this sense, the contribution we propose in this paper is threefold. First, we provide an algorithm that sends exactly one message per recipient without building a global view of the network. Second, we prove the absence of duplicated messages and the coverage of the whole network when using this algorithm. Finally, we show the practical benefits of the algorithm through experiments.
Dynamic Voltage and Frequency Scaling (DVFS) is a promising and broadly used energy-efficient technique to overcome the main issues arising from a finite energy reservoir capacity and an uncertain energy source in real-time embedded systems. This work investigates an energy management scheme for real-time task scheduling on variable-voltage processors located in sensor nodes and powered by ambient energy sources. We use the DVFS technique to decrease the energy consumption of sensors when the energy sources are limited. In particular, we develop and prove an optimal real-time scheduling framework with speed stretching, namely Energy Guarantee Dynamic Voltage and Frequency Scaling (EG-DVFS), that jointly accounts for the timing constraints and the energy state incurred by the properties of the system components. EG-DVFS relies on the well-known ED-H scheduling algorithm combined with the DVFS technique. The sensor processing frequency is fine-tuned to further minimize energy consumption and to achieve energy autonomy of the system. Further, an exact feasibility
To cite this version: Escrig, Benoît. Splitting algorithm for DMT optimal cooperative MAC protocols in wireless mesh networks. (2011) Physical Communication, 4, pp. 218-226. ISSN 1874-4907
Open Archive Toulouse Archive Ouverte (OATAO)
Adaptation coefficient, blind equalization, CMA, exact line search, SISO and SIMO channels.
I. INTRODUCTION
An important problem in digital communications is the recovery of the data symbols transmitted through a distorting medium. The constant modulus (CM) criterion is arguably the most widespread blind channel equalization principle. The CM criterion generally presents local extrema, often associated with different equalization delays, in the equalizer parameter space. This shortcoming renders the performance of gradient-based implementations, such as the well-known constant modulus algorithm (CMA), very dependent on the equalizer impulse response initialization. Even when the absolute minimum is found, convergence can be severely slowed down for initial equalizer settings with trajectories in the vicinity of saddle points. The constant value of the step-size parameter (or adaptation coefficient) must be carefully selected to ensure stable operation while balancing convergence rate and final accuracy (misadjustment, or excess mean square error). The stochastic gradient CMA drops the expectation operator and approximates the gradient of the criterion by a one-sample estimate, as in LMS-based algorithms. This rough approximation generally leads to slow convergence and poor misadjustment, even if the step size is carefully chosen.
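The one-sample stochastic gradient update described above can be sketched as follows (illustrative channel and parameter values, assumed for the example; real BPSK symbols, so the dispersion constant is R = 1):

```python
import numpy as np

rng = np.random.default_rng(1)
n, L = 5000, 7                       # number of symbols, equalizer taps
s = rng.choice([-1.0, 1.0], n)       # BPSK symbols: constant modulus, R = 1
h = np.array([1.0, 0.4, -0.2])       # illustrative distorting channel
x = np.convolve(s, h)[:n] + 0.01 * rng.standard_normal(n)   # received signal

w = np.zeros(L); w[L // 2] = 1.0     # centre-tap initialization
mu, R = 1e-3, 1.0                    # fixed adaptation coefficient, dispersion constant
for k in range(L, n):
    xk = x[k - L:k][::-1]            # regressor, most recent sample first
    y = w @ xk                       # equalizer output
    w -= mu * (y * y - R) * y * xk   # one-sample estimate of the CM-cost gradient

# residual CM dispersion over the last outputs (lower = better equalized)
disp = np.mean([((w @ x[k - L:k][::-1]) ** 2 - R) ** 2
                for k in range(n - 1000, n)])
```

With the centre-tap start and a small fixed step size, the dispersion drops well below its unequalized level; a poor initialization or a larger mu would illustrate exactly the sensitivity and misadjustment issues discussed in the text.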