Title: Random Matrix Theory for AI: From Theory to Practice
Keywords: Machine learning, random matrix theory, concentration of measure, neural networks, GANs
Abstract: AI nowadays relies largely on using large data and enhanced machine learning methods, which consist in developing classification and inference algorithms leveraging large datasets of large sizes. These large dimensions induce many counter-intuitive phenomena, generally leading to a misunderstanding of the behavior of many machine learning algorithms, which are often designed with small-dimension intuitions. By taking advantage of (rather than suffering from) the multidimensional setting, random matrix theory (RMT) is able to predict the performance of many non-linear algorithms, as complex as some random neural networks, as well as many kernel methods such as Support Vector Machines, semi-supervised classification, principal component analysis, and spectral clustering. To characterize the performance of these algorithms theoretically, the underlying data model is often a Gaussian mixture model (GMM), which seems a strong assumption given the complex structure of real data (e.g., images). Furthermore, the performance of machine learning algorithms depends on the choice of data representation (or features) on which they are applied. Once again, considering data representations as Gaussian vectors seems to be quite a restrictive assumption. Relying on random matrix theory, this thesis aims at going beyond the simple GMM hypothesis by studying classical machine learning tools under the hypothesis of Lipschitzally transformed Gaussian vectors, also called concentrated random vectors, which are more generic than Gaussian vectors. This hypothesis is particularly motivated by the observation that one can use generative models (e.g., GANs) to design complex and realistic data structures such as images.
However, as Pafka and Kondor (2004) state, the empirical estimator of the covariance matrix often suffers from the “curse of dimensions”. In practice, the length of the stock returns' time series (T) used for the estimation is often not large enough compared to the number of stocks (N) one wishes to consider. As a result, the estimated covariance matrix is ill-conditioned. Typically, an ill-conditioned covariance matrix exhibits implausibly large off-diagonal elements. Michaud (1989) points out that inverting such a matrix amplifies the estimation errors tremendously. Furthermore, when N is bigger than T, the sample covariance matrix is not even invertible (see Ledoit and Wolf, 2003). Another limit of the empirical estimator of the covariance matrix is pointed out by DeMiguel and Nogales (2007), due to the fact that the empirical covariance matrix is
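To illustrate, a minimal NumPy sketch (the stock count N and sample length T below are arbitrary illustrative values, and synthetic Gaussian data stands in for actual returns) shows that when N > T the sample covariance matrix is rank-deficient and hence singular:

```python
import numpy as np

# A minimal sketch: with fewer observations (T) than assets (N), the sample
# covariance matrix is rank-deficient and therefore singular. N and T are
# arbitrary illustrative values.
rng = np.random.default_rng(0)
N, T = 100, 50                         # N stocks, only T return observations
returns = rng.normal(size=(T, N))      # stand-in for a matrix of stock returns

S = np.cov(returns, rowvar=False)      # N x N sample covariance

# After mean-centering, the rank is at most T - 1 < N, so S is not invertible.
print(np.linalg.matrix_rank(S), N)
```

Any attempt to invert S here (e.g., for portfolio weights) would fail or produce numerically meaningless results, which is exactly the situation described above.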
The Tom language is an extension of Java 1 that provides rule-based constructs. In particular, any Java program is a Tom program. We call this kind of extension a formal island, where the ocean consists of Java code and the island of algebraic patterns. In this sense, the Tom system takes as input a Tom file composed of a mix of Java and Tom constructs and transforms it into a Java file as output. The system has a pipeline structure where each module processes the given input and passes the result to the next one. In Fig. 1 we illustrate the order of the modules responsible for the different compilation phases:
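The pipeline structure can be sketched as follows; the module names (parser, checker, generator) are illustrative placeholders, not the actual Tom compilation modules:

```python
# A minimal sketch of a pipeline architecture like the one described above:
# each module processes its input and passes the result to the next one.
def parser(source):
    # Split the mixed input into a list of constructs.
    return {"constructs": source.split()}

def checker(ast):
    # Verify the parsed constructs before code generation.
    assert ast["constructs"], "empty input"
    return ast

def generator(ast):
    # Emit plain output from the checked constructs.
    return " ".join(ast["constructs"])

PIPELINE = [parser, checker, generator]

def compile_file(source):
    result = source
    for module in PIPELINE:
        result = module(result)
    return result

print(compile_file("class Foo { }"))
```

The advantage of this design is that each compilation phase can be developed and tested in isolation, since its only contract is the shape of the data it receives and produces.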
The third lever is transparency, that is, the disclosure of the decisions or individual actions of agents to all members of the concerned community. Such a tool has been tested in experimental economics for a variety of scenarios, and its positive effects on behavior have been demonstrated repeatedly (Masclet et al., 2003; Fehr and Gächter, 2000). It is sometimes shown to be more effective than a penalty (Foulon et al., 2002). It is often advocated as an auxiliary management tool, its main advantage being its very favorable cost-benefit ratio. However, some studies indicate that its effect varies from one community to another, being conditioned by the prior existence of a norm defining what constitutes virtuous behavior (d’Adda, 2011; Travers et al., 2011). Its effectiveness likewise depends on the weight carried by reputation. Transparency may take a variety of forms according to whether individual or collective behaviors are being disclosed, and whether by name or anonymously.
Editor: Léon Bottou
We introduce a generic scheme for accelerating gradient-based optimization methods in the sense of Nesterov. The approach, called Catalyst, builds upon the inexact accelerated proximal point algorithm for minimizing a convex objective function, and consists of approximately solving a sequence of well-chosen auxiliary problems, leading to faster convergence. One of the keys to achieving acceleration in theory and in practice is to solve these sub-problems with appropriate accuracy, by using the right stopping criterion and the right warm-start strategy. We give practical guidelines to use Catalyst and present a comprehensive analysis of its global complexity. We show that Catalyst applies to a large class of algorithms, including gradient descent, block coordinate descent, incremental algorithms such as SAG, SAGA, SDCA, SVRG, MISO/Finito, and their proximal variants. For all of these methods, we establish faster rates using the Catalyst acceleration, for strongly convex and non-strongly convex objectives. We conclude with extensive experiments showing that acceleration is useful in practice, especially for ill-conditioned problems.
For centuries, scientists have addressed such problems by deriving theoretical frameworks from first principles or have accumulated knowledge in order to model, analyze and understand the phenomenon under study. For example, practitioners know from past experience that elderly heart attack patients with low blood pressure are generally high risk. Similarly, meteorologists know from elementary climate models that one hot, high pollution day is likely to be followed by another. For an increasing number of problems, however, standard approaches start showing their limits. For example, identifying the genetic risk factors for heart disease, where knowledge is still very sparse, is nearly beyond the cognitive abilities of humans, given the high complexity and intricacy of the interactions that exist between genes. Likewise, for very fine-grained meteorological forecasts, a large number of variables need to be taken into account, which quickly goes beyond the capabilities of experts to put them all into a system of equations. To break this cognitive barrier and further advance science, machines of increasing speed and capacity have been built and designed since the mid-twentieth century to assist humans in their calculations. Amazingly however, alongside this progress in terms of hardware, developments in theoretical computer science, artificial intelligence and statistics have made machines become more than calculators. Recent advances have made them experts of their own kind, capable of learning from data and of uncovering by themselves the predictive structure of problems. Techniques and algorithms that have stemmed from the field of machine learning have indeed now become a powerful tool for the analysis of complex and large data, successfully assisting scientists in numerous breakthroughs of various fields of science and technology. Public and famous examples
be seen in Figure 1, a significant amount of sharing can be obtained for provenance within and across queries by using provenance circuits.
This suggests a practical way of computing provenance of queries over a relational database: inductively construct a provenance circuit over input tuples for each operation performed in a query, reusing parts of the circuit that have been constructed by subqueries. By constructing this circuit in the universal m-semiring, it then becomes easy to instantiate it to a wide variety of semirings and m-semirings.
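A simplified sketch of this idea (the gate representation and the two semirings below are illustrative, not the paper's universal m-semiring construction): build the circuit once over input tuples, then evaluate it under several semirings:

```python
# A sketch of a provenance circuit: gates are built once over input tuples,
# then the same circuit is "instantiated" in different semirings, given here
# as (plus, times) pairs. This is an illustration, not the paper's exact
# construction.
class Gate:
    def __init__(self, op, children=(), label=None):
        self.op, self.children, self.label = op, list(children), label

    def eval(self, semiring, leaves):
        plus, times = semiring
        if self.op == "input":
            return leaves[self.label]
        vals = [c.eval(semiring, leaves) for c in self.children]
        out = vals[0]
        for v in vals[1:]:
            out = plus(out, v) if self.op == "+" else times(out, v)
        return out

t1, t2, t3 = (Gate("input", label=n) for n in ("t1", "t2", "t3"))
# Provenance of a query joining t1 with t2, unioned with t3: t1*t2 + t3.
circuit = Gate("+", [Gate("*", [t1, t2]), t3])

bool_sr = (lambda a, b: a or b, lambda a, b: a and b)     # which-provenance
count_sr = (lambda a, b: a + b, lambda a, b: a * b)       # counting semiring

print(circuit.eval(bool_sr, {"t1": True, "t2": False, "t3": True}))
print(circuit.eval(count_sr, {"t1": 2, "t2": 3, "t3": 1}))
```

The key property this sketch mirrors is that the circuit is shared: the same gates serve every semiring, so the construction cost is paid once per query rather than once per instantiation.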
This work would not have been possible without the help of my co-authors. I would first like to thank Michael Eickenberg for the work we did together and for being always enthusiastic about new ideas. I would also like to thank Philippe Ciuciu for sharing his expertise on HRF estimation with me. The rest of the Parietal team also deserves a mention for coping with me during so much time: Gael Varoquaux (for bringing me to France five years ago), Régine Bricquet (a dedicated assistant makes a big difference), Elvis “amigo amigo computador” Dohmatob, Danilo Bzdok, Vincent “comme ta soeur” Michel, Aina “sobrasada” Frau, Fernando Yepes, Mehdi Rahim, Alexandre Abraham, Virgile Fritsch, Jean Kossaifi, Andres Hoyos, Loic Esteve, Yannick “horrible personne” Schwarz, Olivier Grisel, Salma Bougacha, Philippe Gervais, Benoit “petaflop” Da Mota, Bernard Ng, Viviana “reghishtrashion” Siless, Solveig Badillo, Nicolas Chauffert and Matthieu Kowalski. I’ve also had the pleasure to interact with people from the Unicog team, from which I would like to mention Valentina Borghesani, Manuela Piazza, Christophe Pallier, Elodie Cauvet, Evelyn Eger, Lucie Charles, Pedro Pinhero Chagas and Ramón Guevara. I’m also grateful to the scikit-learn crowd for teaching me so much about programming and machine learning: Andreas Mueller, Vlad Niculae, Lars Buitinck, Mathieu Blondel, Jake VanderPlas, Peter Prettenhofer and many others.
Previous works identify interesting properties for aggregation decomposition. A very relevant classification of aggregation functions, introduced in , is based on the size of the sub-aggregation (i.e., partial aggregation). This classification distinguishes between distributive and algebraic aggregations, which have sub-aggregates of fixed size, and holistic functions, for which there is no constant bound on the storage size needed to describe a sub-aggregation. Some algebraic properties, such as associativity and commutativity, are identified as sufficient conditions for aggregation decomposition [6, 28]. Compared to these works, our work provides a generic framework to identify the decomposability of any symmetric aggregation and to generate generic algorithms to process it in parallel. On the topic of sharing aggregation computation, [9, 11, 16, 29] focus on aggregate functions with varying selection predicates and group-by attributes. In dynamic data processing, [15, 19, 22] concentrate on windowed aggregate queries with different ranges and slides.  proposes a framework to manage partial aggregation results, and it has shown performance improvements compared to modern data analysis libraries, e.g., NumPy. Previous works focus on optimizing queries with aggregation functions having different group attributes, predicates, and windows (range and slide), while we concentrate on sharing computation results for completely different aggregation functions without these constraints (aggregation simply runs on the input dataset). Our solutions can be trivially extended to relational queries with aggregation functions by exploiting the contributions of previous works.
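As a concrete illustration of the distinction above: the average is algebraic rather than distributive, but it decomposes through a fixed-size partial state (sum, count) whose merge is associative and commutative, so partitions can be aggregated independently and merged in any order:

```python
# An illustrative sketch of algebraic aggregation decomposition: AVG
# decomposes through a fixed-size partial state (sum, count) with an
# associative, commutative merge, enabling parallel processing.
def partial(chunk):
    return (sum(chunk), len(chunk))

def merge(a, b):
    return (a[0] + b[0], a[1] + b[1])

def finalize(state):
    s, n = state
    return s / n

data = [1, 2, 3, 4, 5, 6]
chunks = [data[:3], data[3:]]            # e.g., two parallel workers
states = [partial(c) for c in chunks]
state = states[0]
for s in states[1:]:
    state = merge(state, s)
print(finalize(state))                   # same as sum(data) / len(data)
```

A holistic function such as the median admits no such fixed-size state: in general, describing a sub-aggregation requires retaining the whole partition.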
As we see from this description, the different p-values use overlapping bits (2–25, 3–26, etc.) of the same numbers. There is no reason to expect that they are independent, contrary to the requirements of the Kolmogorov–Smirnov test. This description also exhibits another problem that appears in many tests from the diehard suite. We use some asymptotic approximation, in this case the Poisson distribution, instead of the true distribution, ignoring the approximation error for which we have no upper bounds. Moreover, even if the error can be upper-bounded for the primary test, this upper bound does not translate easily into a bound on the error in the secondary test, where we use the approximate distribution for recalibrating the deviations. Sometimes even the parameters of the approximate distribution are only guessed. For example, in the description of one of the tests (named OQSO), Marsaglia writes about the distribution: “The mean is based on theory; sigma comes from extensive simulation”. For another one (called the “parking lot test”), even the mean is based on simulation: “Simulation shows that k should average 3523 with sigma 21.9 and is very close to normally distributed. Thus (k − 3523)/21.9 should be a standard normal variable, which, converted to a uniform variable, provides input to a KSTEST based on a sample of 10”. Here KSTEST is the Kolmogorov–Smirnov test for uniform distribution. The arising problem is described by Marsaglia as follows:
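The two-level scheme under discussion can be sketched as follows: collect p-values from repeated primary tests and apply a secondary Kolmogorov–Smirnov test against uniformity. The p-values below are synthetic; the cubed values stand in for dependent or mis-calibrated primary p-values of the kind the text warns about:

```python
import random

# A sketch of diehard-style two-level testing: a primary test is run many
# times, its p-values are collected, and a secondary Kolmogorov-Smirnov test
# checks that they look uniform. All data here is synthetic.
def ks_statistic(pvalues):
    """One-sample KS statistic D_n against the uniform distribution on [0, 1]."""
    xs = sorted(pvalues)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        d = max(d, (i + 1) / n - x, x - i / n)
    return d

random.seed(1)
good = [random.random() for _ in range(100)]        # behaves like uniform p-values
bad = [random.random() ** 3 for _ in range(100)]    # mis-calibrated: skewed toward 0

print(ks_statistic(good), ks_statistic(bad))
```

The sketch makes the text's point tangible: the secondary KS statistic is only meaningful if the primary p-values really are independent and exactly uniform, which the approximations described above do not guarantee.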
Keywords: Enteral nutrition · Dietary fiber · Diarrhea · Glycemia · Prebiotics · Synbiotics
Abstract Even though enteral nutrition has gained an indisputable position in the intensive care unit (ICU), the use of fiber-enriched formulas remains controversial. Health-promoting features of dietary fibers (DF) outside the ICU reach a good level of evidence: alongside functional properties, DF have metabolic effects, mediated by short-chain fatty acids from colonic fermentation; some DF, named prebiotics, are a specific substrate of beneficial bacteria in the intestinal microbiota. However, the number and variety of compounds amongst DF make comparisons difficult, with consequently conflicting results of ICU clinical trials. The most promising results come from soluble fibers that help improve glycemic control. Although DF may help improve digestive health, prevention studies on diarrhea (except for partially hydrolysed guar gum) or constipation are not convincing. Reduced intestinal barrier permeability, leading to fewer bacterial translocation episodes, is a feature of synbiotic (prebiotics and probiotics together) use. A consensus on the use of DF in enteral nutrition formulas in the ICU is therefore difficult to reach. Only larger studies using the same DF (including prebiotics) may result in new recommendations for clinical practice. To cite this journal: Réanimation 20 (2011).
Our datasets only contain points, so in order to create disks for the hitting set problem we have used two different strategies. In the first approach we create uniformly distributed disks in the unit square with uniformly distributed radii within the range [0, r]. Let us denote this test case by RND(r). In the second approach we added disks centered at each point of the dataset with a fixed radius of 0.001. Let us denote this test case by FIX(0.001). The results are shown in Table 2 for two values, r = 0.1 and r = 0.01. Our algorithm provides a 1.3-approximation on average. With small radii the solver seems to outperform our algorithm, but this is most likely because the problems become relatively simpler and various branch-and-bound heuristics become efficient. With bigger radii, and therefore a more complex constraint matrix, our algorithm clearly outperforms the IP solver. Our method obtains a hitting set for all point sets, while in some of the cases the IP solver was unable to compute a solution in reasonable time (we terminate the solver after 1 hour).
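The RND(r) instance generation can be sketched as below; the greedy baseline is only an illustration (not the algorithm evaluated above), and all sizes are arbitrary:

```python
import random

# A sketch of RND(r) instance generation: uniform points and uniform disks
# in the unit square with radii in [0, r], plus a simple greedy hitting set
# as a baseline. Sizes and r are arbitrary illustrative values.
random.seed(0)
points = [(random.random(), random.random()) for _ in range(200)]
disks = [(random.random(), random.random(), random.uniform(0.0, 0.1))
         for _ in range(100)]            # RND(0.1): random centers and radii

def covers(disk, p):
    cx, cy, r = disk
    return (p[0] - cx) ** 2 + (p[1] - cy) ** 2 <= r ** 2

# Greedy baseline: repeatedly pick the point hitting the most uncovered disks.
remaining = [d for d in disks if any(covers(d, p) for p in points)]
hitting_set = []
while remaining:
    best = max(points, key=lambda p: sum(covers(d, p) for d in remaining))
    hitting_set.append(best)
    remaining = [d for d in remaining if not covers(d, best)]

print(len(hitting_set))
```

Note that disks containing no point at all are excluded up front, since no hitting set can cover them; the loop then terminates because each chosen point removes at least one disk.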
the EVA for a candidate of age a is:

EVA(a) = P[hit | age a] − (hit rate / N) × E[L − a | age a]   (3)
Example: To see how EVA works, suppose that a candidate has a 20% chance of hitting in 10 accesses, a 30% chance of hitting in 20 accesses, and a 50% chance of being evicted in 32 accesses. We would expect to get 0.5 hits from this candidate, but these come at the cost of the candidate spending an expected 24 accesses in the cache. If the cache's hit rate were 40% and it had 16 lines, then a line would yield 0.025 hits per access on average, so this candidate would cost an expected 24 × 0.025 = 0.6 forgone hits. Altogether, the candidate yields an expected net 0.5 − 0.6 = −0.1 hits: its value added over the average candidate is negative! In other words, retaining this candidate would tend to lower the cache's hit rate, even though its chance of hitting (50%) is larger than the cache's hit rate (40%). It simply takes up space for too long to be worth the investment.
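The arithmetic of this example can be checked directly, using only the numbers given in the text:

```python
# Checking the worked EVA example above with the numbers from the text.
p_hit = {10: 0.20, 20: 0.30}     # P(hit after t accesses)
p_evict = {32: 0.50}             # P(evicted after t accesses)

expected_hits = sum(p_hit.values())                          # 0.5 hits
expected_time = (sum(t * p for t, p in p_hit.items())
                 + sum(t * p for t, p in p_evict.items()))   # 24 accesses

hit_rate, n_lines = 0.40, 16
# Each line yields hit_rate / n_lines = 0.025 hits per access on average.
cost = expected_time * hit_rate / n_lines                    # 0.6 forgone hits
eva = expected_hits - cost                                   # about -0.1
print(eva)
```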
The second Julesz approach to texture preattentive discrimination theory introduced the notion of textons (blobs, terminators, line crossings, etc.) [3]. The texton theory assumes that the density of local indicators (the textons) is responsible for texture preattentive discrimination: images with the same texton densities should not be discriminated. A main axiom of the texton theory is that texture perception is invariant to random shifts of the textons [3]. The shift invariance of this second Julesz theory can be made into a synthesis algorithm building a texture from initial shapes by random shifts. In the Julesz toy algorithm used in his discrimination experiments, this construction was a mere juxtaposition of simple shapes on a regular grid, with random shifts avoiding overlap. This random shift principle can be used to make realistic textures provided a linear superposition is authorized, by which the colors of overlapping objects are averaged. Textures obtained by the combined use of random shifts and linear superposition will be called random shift textures. We shall discuss thoroughly their relation to random phase textures.
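A toy sketch of a random shift texture with linear superposition (the image size, the number of copies, the square "texton", and the random intensities are all illustrative choices):

```python
import numpy as np

# A toy random shift texture: copies of a small shape ("texton") are placed
# at uniform random positions, and overlapping copies are averaged (linear
# superposition). Sizes, counts, and intensities are illustrative.
rng = np.random.default_rng(0)
size, n_shapes = 64, 200
shape = np.ones((5, 5))                    # the initial texton

acc = np.zeros((size, size))
count = np.zeros((size, size))
for _ in range(n_shapes):
    x = int(rng.integers(0, size - 5))
    y = int(rng.integers(0, size - 5))
    intensity = rng.uniform(0.0, 1.0)      # a random "color" for this copy
    acc[x:x + 5, y:y + 5] += intensity * shape
    count[x:x + 5, y:y + 5] += 1

# Average overlapping copies; uncovered pixels stay at 0.
texture = np.where(count > 0, acc / np.maximum(count, 1), 0.0)
print(texture.shape)
```

Replacing the averaging with a plain sum, or the random positions with a regular grid, recovers the other constructions mentioned in the text.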
Moreover, typical results of analysis-by-synthesis of real data have confirmed that the convergence error is generally very small when the IIX system successfully processes a signal. The whole approach has then been used in two other applications: as a limiting framework to evaluate the quality of the results of analysis-by-synthesis experiments, and as a criterion to define the range of acceptable movement times in an experiment dealing with rapid human movements. These latter two systematic methodologies provide automatic and robust ways of fixing thresholds in data analysis based on the delta-lognormal model.
show such similar features, such as common critical exponents, are said to sit in the same universality class, and their continuous limit shall be the same theory. The intuitive idea behind universality is that, if, in their definition, two models of random maps differ by features which seem to be details (irrelevant), those details should not be important in the continuous limit. For the previous examples, the fact that one looks at random maps with faces of degree 3 or 4 should not have a great influence on the continuous limit. On the contrary, some features of a model (decorations or constraints) may be crucial, as they can have an influence on the structures of the maps more likely to appear, so there is not only one universality class for random maps. For instance, the case of example 4.3 (which is the Ising model on random quadrangulations) shows very different behaviour at the critical point, whether it be for the value of the string susceptibility or the form of the function F̃_I^g. This means that the continuous limit of this model is different from the continuous limit of the two others. In the Ising model, some colored configurations have a greater weight than others, and there is an interplay between the map structure and the decorations. Thus, in the continuous limit, the colored map shall mimic the interaction between a matter field and the metric (i.e. gravity), which is different from mimicking purely gravity. The model I
Figure 6: Operational semantics of CCSL.
exists a rewriting sequence from t1 to t. The arrow =>* matches rewriting sequences of zero, one, or more steps. If the arrow =>! is used instead of =>*, then a solution must be a term that cannot be rewritten further. Maude has an efficient LTL model checker, supporting model checking of properties expressed in linear temporal logic. Given a Maude specification of a system and a set of atomic propositions defined on system states, the Maude LTL model checker is invoked by the built-in function modelCheck, which takes two arguments, an initial state and an LTL formula, and returns true if no counterexample is found, and otherwise a counterexample as a witness to the violation. An LTL formula is built out of atomic propositions, which need to be predefined, and the logical and temporal operators of LTL. A condition for doing LTL model checking in Maude is that the set of states reachable from the given initial state is finite.
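As a minimal analogue of this kind of finite-state check (for safety properties only, not full LTL, and with a toy transition system standing in for a Maude specification), the reachable state space can be explored exhaustively while testing an invariant in each state:

```python
from collections import deque

# A toy analogue of finite-state model checking: breadth-first search over
# the states reachable from the initial state, checking an invariant in each.
# The transition system (a counter mod 5) and its properties are illustrative.
def successors(state):
    return [(state + 1) % 5, (state + 2) % 5]

def check_invariant(initial, invariant):
    seen, queue = {initial}, deque([initial])
    while queue:
        s = queue.popleft()
        if not invariant(s):
            return s          # a counterexample state, like Maude's witness
        for t in successors(s):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return True               # no counterexample found

print(check_invariant(0, lambda s: s < 5))    # holds on all reachable states
print(check_invariant(0, lambda s: s != 3))   # violated: returns the state 3
```

As in Maude, the procedure only terminates because the set of reachable states is finite; an infinite state space would require abstraction or bounding.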
An increasing number of studies evaluate the potential benefit of external cuing devices for memory-impaired patients. Pitel et al. (2006) reported that following traumatic brain injury two patients were able to acquire limited procedural knowledge about how to use an electronic cuing device (e.g., programming appointments); however, neither of them used the device in everyday life. Wilson et al. (1997a, 2001) examined the effects of a reminder system termed Neuropage® on prospective memory in a large group of patients with different etiologies and different degrees of prospective memory impairment. Neuropage® is a communication system which enables one to send simple alphanumeric messages to a pager carried by the patient. The treatment program targeted individual prospective memory failures; therefore, the messages mostly contained information recalling a specific task that had to be done at a specific time. In a pilot study with 15 memory-impaired patients, Wilson et al. (1997a) found that the proportion of target tasks that were completed by the patients increased from 37 to 85% when the pager was introduced. In a single-case study, Evans et al. (1998) confirmed the beneficial effect of the pager system on execution of target tasks in a patient with a severe dysexecutive syndrome characterized by impaired attention to action, difficulty with action planning and ritual-like dysfunctional behaviors. After introduction of the pager, the patient showed a significant increase in the probability of carrying out the target actions on time. More recently, Wilson et al. (2001) performed a randomized, crossover study on 143 memory-impaired patients with various etiologies. The participants were randomly assigned to group A, which received the pager for 7 weeks immediately post-baseline, or group B, receiving the pager once the treatment phase of group A had finished.
The results confirmed the beneficial effect of the pager system, as the proportion of completed target tasks increased in both groups during the treatment phase, but not during baseline. In addition, the authors found that in some patients withdrawal of the pager (Linden and Coyette, 1995; Todd and Barrow, 2008), suggesting that
In [5] we proved that the achievability problem is undecidable. In this paper we present a way to relax the achievability problem in order to restore decidability. In particular, we require correctness only for the final configuration, while numerical constraints and conflicts are ignored for the intermediary states traversed during the deployment run. In many cases dealing with service deployment, violating capacity constraints during installation and configuration is not problematic, because the services become publicly available only at the end. More precisely, we split the achievability problem into three separate phases: the verification of the existence of a final correct configuration that includes the target component (Configuration problem), the synthesis of such a configuration (Generation problem), and the computation of a deployment run reaching such a configuration (Planning problem). In this last phase, we exploit the efficient poly-time algorithm developed for the simplified Aeolus model without numerical constraints and conflicts. For this reason, it may indeed happen that such constraints are violated during the execution of the deployment run.