Title: Random Matrix Theory for AI: From Theory to Practice
Keywords: Machine learning, random matrix theory, concentration of measure, neural networks, GANs
Abstract: AI nowadays relies largely on big data and enhanced machine learning methods, which consist in developing classification and inference algorithms leveraging large datasets of large dimensions. These large dimensions induce many counter-intuitive phenomena, generally leading to a misunderstanding of the behavior of many machine learning algorithms, which are often designed with small-dimension intuitions. By taking advantage of (rather than suffering from) the multidimensional setting, random matrix theory (RMT) is able to predict the performance of non-linear algorithms as complex as certain random neural networks, as well as many kernel methods such as Support Vector Machines, semi-supervised classification, principal component analysis, and spectral clustering. To characterize the performance of these algorithms theoretically, the underlying data model is often a Gaussian mixture model (GMM), which seems a strong assumption given the complex structure of real data (e.g., images). Furthermore, the performance of machine learning algorithms depends on the choice of data representation (or features) on which they are applied. Once again, treating data representations as Gaussian vectors seems quite a restrictive assumption. Relying on random matrix theory, this thesis aims to go beyond the simple GMM hypothesis by studying classical machine learning tools under the hypothesis of Lipschitz-transformed Gaussian vectors, also called concentrated random vectors, which are more generic than Gaussian vectors. This hypothesis is particularly motivated by the observation that one can use generative models (e.g., GANs) to design complex and realistic data structures such as images.


However, as Pafka and Kondor (2004) state, the empirical estimator of the covariance matrix often suffers from the “curse of dimensions”. In practice, the length of the stock returns’ time series (T) used for the estimation is often not large enough compared to the number of stocks (N) one wishes to consider. As a result, the estimated covariance matrix is ill conditioned. Typically, an ill-conditioned covariance matrix exhibits implausibly large off-diagonal elements. Michaud (1989) points out that inverting such a matrix amplifies the estimation errors tremendously. Furthermore, when N is larger than T, the sample covariance matrix is not even invertible (see Ledoit and Wolf, 2003). Another limit of the empirical estimator of the covariance matrix, pointed out by DeMiguel and Nogales (2007), is due to the fact that the empirical covariance matrix is
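This curse of dimensions is easy to reproduce numerically. The sketch below (dimensions chosen for illustration, not taken from the excerpt) draws i.i.d. Gaussian "returns" and checks both failure modes: for T < N the sample covariance matrix is singular, and for T only slightly above N it is invertible but severely ill conditioned.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100                      # number of stocks
T_small, T_large = 80, 120   # lengths of the return time series

# T < N: the sample covariance matrix is rank-deficient, hence not invertible
S_small = np.cov(rng.standard_normal((T_small, N)), rowvar=False)
print(np.linalg.matrix_rank(S_small))   # at most T_small - 1, i.e. below N

# T barely above N: invertible but ill conditioned, so inversion amplifies noise
S_large = np.cov(rng.standard_normal((T_large, N)), rowvar=False)
print(np.linalg.cond(S_large))          # far above 1.0, the true condition number
```

Here the true covariance is the identity, whose condition number is 1; the sample estimate's condition number is nevertheless in the hundreds, which is exactly the noise amplification Michaud warns about.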

1 Introduction
The Tom language is an extension of Java that provides rule-based constructs. In particular, any Java program is a Tom program. We call this kind of extension a formal island [1], where the ocean consists of Java code and the island of algebraic patterns. In this sense, the Tom system takes as input a Tom file composed of a mix of Java and Tom constructs and transforms it into a Java file as output. The system has a pipeline structure in which each module processes the given input and passes the result to the next one. In Fig. 1 we illustrate the order of the modules responsible for the different compilation phases:

Transparency
The third lever is transparency, that is, the disclosure of the decisions or individual actions of agents to all members of the concerned community. Such a tool has been tested in experimental economics for a variety of scenarios, and its positive effects on behavior have been demonstrated repeatedly (Masclet et al., 2003; Fehr and Gächter, 2000). It is sometimes shown to be more effective than a penalty (Foulon et al., 2002). It is often advocated as an auxiliary management tool, its main advantage being its very favorable cost-benefit ratio. However, some studies indicate that its effect varies from one community to another, being conditioned by the prior existence of a norm defining what constitutes virtuous behavior (d’Adda, 2011; Travers et al., 2011). Its effectiveness likewise depends on the weight carried by reputation. Transparency may take a variety of forms according to whether individual or collective behaviors are being disclosed, and whether by name or anonymously.

Editor: Léon Bottou
Abstract
We introduce a generic scheme for accelerating gradient-based optimization methods in the sense of Nesterov. The approach, called Catalyst, builds upon the inexact accelerated proximal point algorithm for minimizing a convex objective function, and consists of approximately solving a sequence of well-chosen auxiliary problems, leading to faster convergence. One of the keys to achieving acceleration in theory and in practice is to solve these sub-problems with appropriate accuracy by using the right stopping criterion and the right warm-start strategy. We give practical guidelines for using Catalyst and present a comprehensive analysis of its global complexity. We show that Catalyst applies to a large class of algorithms, including gradient descent, block coordinate descent, incremental algorithms such as SAG, SAGA, SDCA, SVRG, MISO/Finito, and their proximal variants. For all of these methods, we establish faster rates using the Catalyst acceleration, for strongly convex and non-strongly convex objectives. We conclude with extensive experiments showing that acceleration is useful in practice, especially for ill-conditioned problems.
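A minimal sketch of the Catalyst outer loop, assuming gradient descent as the inner solver and a fixed inner-iteration budget in place of the paper's stopping criteria; the function names, parameter values, and test problem below are illustrative, not the reference implementation.

```python
import numpy as np

def catalyst(grad_f, x0, mu, L, kappa, n_outer=50, n_inner=100):
    """Inexact accelerated proximal point sketch: each outer step approximately
    minimizes the auxiliary problem f(x) + kappa/2 ||x - y||^2 (warm-started),
    then extrapolates Nesterov-style between successive approximate solutions."""
    q = mu / (mu + kappa)
    alpha = np.sqrt(q)
    lr = 1.0 / (L + kappa)          # step size for the (L + kappa)-smooth subproblem
    x, y = x0.copy(), x0.copy()
    for _ in range(n_outer):
        z = x.copy()                # warm start at the previous iterate
        for _ in range(n_inner):    # inexact subproblem solve by gradient descent
            z -= lr * (grad_f(z) + kappa * (z - y))
        a2 = alpha * alpha
        alpha_next = 0.5 * (q - a2 + np.sqrt((q - a2) ** 2 + 4 * a2))
        beta = alpha * (1 - alpha) / (a2 + alpha_next)
        y = z + beta * (z - x)      # extrapolation step
        x, alpha = z, alpha_next
    return x

# usage: an ill-conditioned quadratic f(x) = 1/2 x^T A x - b^T x
A = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)
x_hat = catalyst(lambda x: A @ x - b, np.zeros(2), mu=1.0, L=10.0, kappa=1.0)
print(np.allclose(x_hat, x_star, atol=1e-5))
```

The subproblem adds kappa-strong convexity, so the inner gradient descent converges fast even when f itself is poorly conditioned; this is the mechanism behind the acceleration described in the abstract.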


As can be seen in Figure 1, a significant amount of sharing can be obtained for provenance within and across queries by using provenance circuits.
This suggests a practical way of computing provenance of queries over a relational database: inductively construct a provenance circuit over input tuples for each operation performed in a query, reusing parts of the circuit constructed by subqueries. By constructing this circuit in the universal m-semiring, it becomes easy to instantiate it to a wide variety of semirings and m-semirings.
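The instantiation step can be illustrated on a toy circuit; the gate names, the tiny query, and the two semirings below are hypothetical examples, not taken from the paper.

```python
def eval_gate(gate, circuit, plus, times, zero, one, leaves):
    """Evaluate a provenance circuit gate bottom-up in a given semiring
    (plus, times, zero, one); 'leaves' assigns a semiring value to each tuple."""
    kind, args = circuit[gate]
    if kind == "input":
        return leaves[args]
    vals = [eval_gate(g, circuit, plus, times, zero, one, leaves) for g in args]
    acc = one if kind == "times" else zero
    for v in vals:
        acc = times(acc, v) if kind == "times" else plus(acc, v)
    return acc

# Toy circuit: the result tuple is derived either by joining input tuples
# t1 and t2, or directly from tuple t3. Gates are shared, so the subquery's
# circuit ('g_join') is reused rather than rebuilt.
circuit = {
    "t1": ("input", "t1"), "t2": ("input", "t2"), "t3": ("input", "t3"),
    "g_join": ("times", ["t1", "t2"]),     # join  (semiring product)
    "g_out": ("plus", ["g_join", "t3"]),   # union (semiring sum)
}

# The same circuit instantiated in two different semirings:
n_derivations = eval_gate("g_out", circuit,
                          lambda a, b: a + b, lambda a, b: a * b, 0, 1,
                          {"t1": 1, "t2": 1, "t3": 1})
possible = eval_gate("g_out", circuit,
                     lambda a, b: a or b, lambda a, b: a and b, False, True,
                     {"t1": True, "t2": False, "t3": True})
print(n_derivations, possible)   # 2 derivations; True in the Boolean semiring
```

Swapping the semiring changes the interpretation (counting derivations, possibility, and so on) without rebuilding the circuit, which is the point of working in a universal structure.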

This work would not have been possible without the help of my co-authors. I would first like to thank Michael Eickenberg for the work we did together and for being always enthusiastic about new ideas. I would also like to thank Philippe Ciuciu for sharing his expertise on HRF estimation with me. The rest of the Parietal team also deserves a mention for coping with me for so long: Gael Varoquaux (for bringing me to France five years ago), Régine Bricquet (a dedicated assistant makes a big difference), Elvis “amigo amigo computador” Dohmatob, Danilo Bzdok, Vincent “comme ta soeur” Michel, Aina “sobrasada” Frau, Fernando Yepes, Mehdi Rahim, Alexandre Abraham, Virgile Fritsch, Jean Kossaifi, Andres Hoyos, Loic Esteve, Yannick “horrible personne” Schwarz, Olivier Grisel, Salma Bougacha, Philippe Gervais, Benoit “petaflop” Da Mota, Bernard Ng, Viviana “reghishtrashion” Siless, Solveig Badillo, Nicolas Chauffert and Matthieu Kowalski. I’ve also had the pleasure to interact with people from the Unicog team, from which I would like to mention Valentina Borghesani, Manuela Piazza, Christophe Pallier, Elodie Cauvet, Evelyn Eger, Lucie Charles, Pedro Pinhero Chagas and Ramón Guevara. I’m also grateful to the scikit-learn crowd for teaching me so much about programming and machine learning: Andreas Mueller, Vlad Niculae, Lars Buitinck, Mathieu Blondel, Jake VanderPlas, Peter Prettenhofer and many others.


Previous works identify interesting properties for aggregation decomposition. A very relevant classification of aggregation functions, introduced in [20], is based on the size of the sub-aggregation (i.e., partial aggregation). This classification distinguishes between distributive and algebraic aggregation, whose sub-aggregates have fixed sizes, and holistic functions, for which there is no constant bound on the storage size needed to describe a sub-aggregation. Some algebraic properties, such as associativity and commutativity, are identified as sufficient conditions for aggregation decomposition [6, 28]. Compared to these works, our work provides a generic framework to identify the decomposability of any symmetric aggregation and to generate generic algorithms to process it in parallel. On the side of sharing aggregation computation, [9, 11, 16, 29] focus on aggregate functions with varying selection predicates and group-by attributes. In dynamic data processing, [15, 19, 22] concentrate on windowed aggregate queries with different ranges and slides. [27] proposes a framework to manage partial aggregation results, and has shown performance improvements compared to a modern data analysis library, e.g., NumPy. Previous works focus on optimizing queries with aggregation functions having different group attributes, predicates, and windows (range and slide), while we concentrate on sharing computation results for completely different aggregation functions without these constraints (aggregation simply runs on the input dataset). Our solutions can be trivially extended to relational queries with aggregation functions by exploiting the contributions of previous works.
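The distinction between algebraic and holistic functions can be made concrete with a short sketch (names and data are illustrative): the average admits a fixed-size sub-aggregate (sum, count) that is mergeable across parallel chunks, while the median does not.

```python
from statistics import median

def avg_partial(chunk):
    """Fixed-size sub-aggregate for the average: the pair (sum, count)."""
    return (sum(chunk), len(chunk))

def avg_merge(p, q):
    """Merging partials is associative and commutative, so chunks can be
    processed in parallel and combined in any order."""
    return (p[0] + q[0], p[1] + q[1])

data = [3, 1, 4, 1, 5, 9, 2, 6]
chunks = [data[:4], data[4:]]             # e.g. two parallel workers
total = (0, 0)
for partial in map(avg_partial, chunks):
    total = avg_merge(total, partial)
print(total[0] / total[1])                # same as averaging 'data' directly

# A holistic function such as the median has no constant-size partial state:
# the chunk medians alone do not determine the global median.
print(median(data), median(median(ch) for ch in chunks))
```

The last line typically prints two different values, which is exactly why holistic functions resist the decomposition that makes distributive and algebraic aggregation parallelizable.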

As we see from this description, the different p-values use overlapping bits (2–25, 3–26, etc.) of the same numbers. There is no reason to expect that they are independent, contrary to the requirements of the Kolmogorov–Smirnov test. This description also exhibits another problem that appears in many tests from the diehard suite. We use some asymptotic approximation, in this case the Poisson distribution, instead of the true distribution, ignoring the approximation error, for which we have no upper bounds. Moreover, even if the error can be upper-bounded for the primary test, this upper bound does not translate easily into a bound for the error in the secondary test, where we use the approximate distribution for recalibrating the deviations. Sometimes even the parameters of the approximate distribution are only guessed. For example, in the description of one of the tests (named OQSO), Marsaglia writes about the distribution: “The mean is based on theory; sigma comes from extensive simulation”. For another one (called the “parking lot test”), even the mean is based on simulation: “Simulation shows that k should average 3523 with sigma 21.9 and is very close to normally distributed. Thus (k − 3523)/21.9 should be a standard normal variable, which, converted to a uniform variable, provides input to a KSTEST based on a sample of 10”. Here KSTEST is the Kolmogorov–Smirnov test for the uniform distribution. The arising problem is described by Marsaglia as follows:
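Returning to the overlapping-bits point above, the dependence is easy to check numerically; the sketch below (bit positions chosen for illustration, not Marsaglia's exact ones) measures the correlation between two 24-bit windows of the same 32-bit numbers that share 23 bits.

```python
import random

random.seed(1)
MASK = (1 << 24) - 1                     # 24-bit window mask
xs = [random.getrandbits(32) for _ in range(10000)]

a = [(x >> 1) & MASK for x in xs]        # one 24-bit window of each number
b = [(x >> 2) & MASK for x in xs]        # window shifted by one bit: 23 bits shared

def corr(u, v):
    """Plain Pearson correlation coefficient."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v)) / n
    var_u = sum((x - mu) ** 2 for x in u) / n
    var_v = sum((y - mv) ** 2 for y in v) / n
    return cov / (var_u * var_v) ** 0.5

print(corr(a, b))    # far from zero, so the windows are clearly dependent
```

A short calculation gives a theoretical correlation of about 0.5 for this one-bit shift, so statistics derived from such windows cannot be treated as independent samples for a Kolmogorov–Smirnov test.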

Keywords: Enteral nutrition · Dietary fiber · Diarrhea · Glycemia · Prebiotics · Synbiotics
Abstract: Even though enteral nutrition has gained an indisputable position in the intensive care unit (ICU), the use of fiber-enriched formulas remains controversial. Health-promoting features of dietary fibers (DF) outside the ICU reach a good level of evidence: alongside functional properties, DF have metabolic effects, mediated by short-chain fatty acids from colonic fermentation; some DF, named prebiotics, are a specific substrate of beneficial bacteria in the intestinal microbiota. However, the number and variety of compounds amongst DF make comparisons difficult, with consequently conflicting results of ICU clinical trials. The most promising results come from soluble fibers that help improve glycemic control. Although DF may help improve digestive health, prevention studies on diarrhea (except for partially hydrolysed guar gum) or constipation are not convincing. Reduced intestinal barrier permeability, leading to fewer bacterial translocation episodes, is a feature of synbiotic (prebiotics and probiotics together) use. A consensus on the use of DF in enteral nutrition formulas in the ICU is therefore difficult to reach. Only larger studies using the same DF (including prebiotics) may result in new recommendations for clinical practice. To cite this journal: Réanimation 20 (2011).

Our datasets only contain points, so in order to create disks for the hitting set problem we used two different strategies. In the first approach we create uniformly distributed disks in the unit square with uniformly distributed radii within the range [0, r]. Let us denote this test case by RND(r). In the second approach we add disks centered at each point of the dataset with a fixed radius of 0.001. Let us denote this test case by FIX(0.001). The results are shown in Table 2 for two values, r = 0.1 and r = 0.01. Our algorithm provides a 1.3-approximation on average. With a small radius the solver seems to outperform our algorithm, but this is most likely because the problems become relatively simpler and various branch-and-bound heuristics become efficient. With a bigger radius, and therefore a more complex constraint matrix, our algorithm clearly outperforms the IP solver. Our method obtains a hitting set for all point sets, while in some cases the IP solver was unable to compute a solution in reasonable time (we terminate the solver after 1 hour).
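A sketch of the RND(r) instance generation, paired with a naive greedy baseline (not the authors' algorithm) that restricts candidate hitting points to the disk centers; all names are illustrative.

```python
import random

def rnd_instance(n, r, seed=0):
    """RND(r): n disks (x, y, radius) with uniform centers in the unit square
    and uniform radii in [0, r]."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random(), rng.uniform(0.0, r)) for _ in range(n)]

def greedy_hitting_set(disks):
    """Naive greedy baseline: candidate points are the disk centers;
    repeatedly pick the point stabbing the most not-yet-hit disks."""
    def covers(p, d):
        return (p[0] - d[0]) ** 2 + (p[1] - d[1]) ** 2 <= d[2] ** 2
    candidates = [(x, y) for x, y, _ in disks]
    unhit, chosen = list(disks), []
    while unhit:   # terminates: each disk's own center covers it
        best = max(candidates, key=lambda p: sum(covers(p, d) for d in unhit))
        chosen.append(best)
        unhit = [d for d in unhit if not covers(best, d)]
    return chosen

disks = rnd_instance(200, 0.1)
H = greedy_hitting_set(disks)
print(len(H))   # every disk is stabbed by some point of H
```

With the larger radius r = 0.1, disks overlap heavily and few points suffice; with tiny radii the hitting set degenerates toward one point per disk, matching the observation that small-radius instances are structurally simpler.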

the EVA for a candidate of age a is:

EVA(a) = P(hit | age a) − (hit rate / N) × E[L − a | age a]    (3)
Example: To see how EVA works, suppose that a candidate has a 20% chance of hitting in 10 accesses, a 30% chance of hitting in 20 accesses, and a 50% chance of being evicted in 32 accesses. We would expect to get 0.5 hits from this candidate, but these come at the cost of the candidate spending an expected 24 accesses in the cache. If the cache's hit rate were 40% and it had 16 lines, then a line would yield 0.025 hits per access on average, so this candidate would cost an expected 24 × 0.025 = 0.6 forgone hits. Altogether, the candidate yields an expected net 0.5 − 0.6 = −0.1 hits; its value added over the average candidate is negative! In other words, retaining this candidate would tend to lower the cache's hit rate, even though its chance of hitting (50%) is larger than the cache's hit rate (40%). It simply takes up space for too much time to be worth the investment.
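The arithmetic of this example can be checked directly against formula (3); the variable names below are illustrative.

```python
# Candidate's outcome distribution (probability, accesses until hit/eviction);
# the numbers reproduce the worked example above.
hits    = [(0.2, 10), (0.3, 20)]     # 20% hit in 10 accesses, 30% hit in 20
evicted = [(0.5, 32)]                # 50% evicted in 32 accesses

expected_hits = sum(p for p, _ in hits)                   # 0.5
expected_time = sum(p * t for p, t in hits + evicted)     # 24 accesses in cache
hit_rate, n_lines = 0.40, 16
forgone = expected_time * (hit_rate / n_lines)            # 24 * 0.025 = 0.6 hits
eva = expected_hits - forgone
print(round(eva, 3))                 # -0.1: negative value added, so evict
```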

The second Julesz approach to texture preattentive discrimination theory introduced the notion of textons (blobs, terminators, line crossings, etc.) [3]. The texton theory assumes that the density of local indicators (the textons) is responsible for texture preattentive discrimination: images with the same texton densities should not be discriminated. A main axiom of the texton theory is that texture perception is invariant to random shifts of the textons [3]. The shift invariance of this second Julesz theory can be made into a synthesis algorithm building a texture from initial shapes by random shifts. In the Julesz toy algorithm used in his discrimination experiments, this construction was a mere juxtaposition of simple shapes on a regular grid, with random shifts avoiding overlap. This random shift principle can be used to make realistic textures provided a linear superposition is authorized, by which the colors of overlapping objects are averaged. Textures obtained by the combined use of random shifts and linear superposition will be called random shift textures. We shall discuss thoroughly their relation to random phase textures.
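A toy version of random-shift synthesis with linear superposition might look as follows; the patch shape and sizes are arbitrary choices, not taken from the text.

```python
import numpy as np

def random_shift_texture(patch, size, n_shifts, seed=0):
    """Toy random shift texture: drop 'patch' at uniformly random positions
    and average the contributions wherever shapes overlap (the linear
    superposition described in the text)."""
    rng = np.random.default_rng(seed)
    h, w = patch.shape
    canvas = np.zeros((size, size))
    counts = np.zeros((size, size))
    for _ in range(n_shifts):
        i = rng.integers(0, size - h + 1)
        j = rng.integers(0, size - w + 1)
        canvas[i:i+h, j:j+w] += patch
        counts[i:i+h, j:j+w] += 1
    return canvas / np.maximum(counts, 1)   # average where shapes overlapped

# the "texton": a smooth blob built from a Hann window
patch = np.outer(np.hanning(8), np.hanning(8))
tex = random_shift_texture(patch, size=64, n_shifts=200)
print(tex.shape)
```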

Moreover, typical results of analysis-by-synthesis of real data have confirmed that the convergence error was generally very small when the IIX system successfully processed a signal. The whole approach has then been used in two other applications: as a limiting framework to evaluate the quality of the results of analysis-by-synthesis experiments and as a criterion to define the range of acceptable movement times in an experiment dealing with rapid human movements. These latter two systematic methodologies provide automatic and robust ways for fixing thresholds in data analysis based on the delta-lognormal model.

show such similar features, such as common critical exponents, are said to sit in the same universality class, and their continuous limit shall be the same theory. The intuitive idea behind universality is that if, in their definition, two models of random maps differ by features which seem to be details (irrelevant), those details should not be important in the continuous limit. For the previous examples, the fact that one looks at random maps with faces of degree 3 or 4 should not have a great influence on the continuous limit. On the contrary, some features of a model (decorations or constraints) may be crucial, as they can have an influence on the structures of the maps more likely to appear, so there is not only one universality class for random maps. For instance, the case of example 4.3 (which is the Ising model on random quadrangulations) shows very different behaviour at the critical point, whether it be for the value of the string susceptibility or the form of the function F̃_g^I. This means that the continuous limit of this model is different from the continuous limit of the two others. In the Ising model, some colored configurations have a greater weight than others, and there is an interplay between the map structure and the decorations. Thus, in the continuous limit, the colored map shall mimic the interaction between a matter field and the metric (i.e. gravity), which is different from mimicking purely gravity.


Figure 6: Operational semantics of ccsl.
exists a rewriting sequence from t1 to t. The arrow =>* is used to restrict the length of rewriting sequences to zero, one or more steps. If the arrow =>! is used instead of =>*, then a solution must be a term that cannot be further rewritten. Maude has an efficient LTL model checker, supporting model checking of properties expressed in linear temporal logic. Given a Maude specification of a system and a set of atomic propositions defined on system states, the Maude LTL model checker is invoked by the built-in function modelCheck, which takes two arguments, an initial state and an LTL formula, and returns true if no counterexample is found, and otherwise a counterexample as a witness to the violation. An LTL formula is built out of atomic propositions that need to be predefined, and logical and temporal operators in LTL. A condition for doing LTL model checking in Maude is that the set of states reachable from the given initial state is finite.
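The =>* and =>! search modes can be illustrated with a naive string-rewriting sketch in place of Maude's term rewriting; the rules and code below are a toy illustration, not Maude syntax.

```python
def reachable(term, rules):
    """All terms reachable by zero or more rewrite steps (the =>* idea),
    assuming the reachable state space is finite, as LTL model checking
    in Maude requires."""
    seen, frontier = {term}, [term]
    while frontier:
        t = frontier.pop()
        for lhs, rhs in rules:
            start = t.find(lhs)
            while start != -1:               # rewrite at every redex position
                t2 = t[:start] + rhs + t[start + len(lhs):]
                if t2 not in seen:
                    seen.add(t2)
                    frontier.append(t2)
                start = t.find(lhs, start + 1)
    return seen

def normal_forms(term, rules):
    """Solutions in the sense of =>!: reachable terms with no remaining redex."""
    return {t for t in reachable(term, rules)
            if all(lhs not in t for lhs, _ in rules)}

rules = [("ab", "b"), ("ba", "b")]
print(sorted(reachable("aba", rules)))
print(normal_forms("aba", rules))            # {'b'}
```

Enumerating the finite reachable set is exactly the precondition mentioned above: over such a set one can then check whether every path satisfies a temporal property.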

An increasing number of studies evaluate the potential benefit of external cuing devices for memory-impaired patients. Pitel et al. (2006) reported that following traumatic brain injury two patients were able to acquire limited procedural knowledge about how to use an electronic cuing device (e.g., programming appointments); however, neither of them used the device in everyday life. Wilson et al. (1997a, 2001) examined the effects of a reminder system termed Neuropage® on prospective memory in a large group of patients with different etiologies and different degrees of prospective memory impairment. Neuropage® is a communication system which enables one to send simple alphanumeric messages to a pager carried by the patient. The treatment program targeted individual prospective memory failures; therefore, the messages mostly contained information recalling a specific task that had to be done at a specific time. In a pilot study with 15 memory-impaired patients, Wilson et al. (1997a) found that the proportion of target tasks completed by the patients increased from 37 to 85% when the pager was introduced. In a single-case study, Evans et al. (1998) confirmed the beneficial effect of the pager system on the execution of target tasks in a patient with a severe dysexecutive syndrome characterized by impaired attention to action, difficulty with action planning and ritual-like dysfunctional behaviors. After introduction of the pager, the patient showed a significant increase in the probability of carrying out the target actions on time. More recently, Wilson et al. (2001) performed a randomized, crossover study on 143 memory-impaired patients with various etiologies. The participants were randomly assigned to group A, which received the pager for 7 weeks immediately post-baseline, or group B, receiving the pager once the treatment phase of group A had finished.
The results confirmed the beneficial effect of the pager system, as the proportion of completed target tasks increased in both groups during the treatment phase, but not during baseline. In addition, the authors found that in some patients withdrawal of the pager (Linden and Coyette, 1995; Todd and Barrow, 2008), suggesting that

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

In [5] we have proved that the achievability problem is undecidable. In this paper we present a way to relax the achievability problem in order to restore decidability. In particular, we require correctness only for the final configuration, while numerical constraints and conflicts are ignored for the intermediary states traversed during the deployment run. In many cases dealing with service deployment, violating capacity constraints during installation and configuration is not problematic because the services become publicly available only at the end. More precisely, we split the achievability problem into three separate phases: the verification of the existence of a final correct configuration that includes the target component (Configuration problem), the synthesis of such a configuration (Generation problem), and the computation of a deployment run reaching such a configuration (Planning problem). In this last phase, we exploit the efficient poly-time algorithm developed for the simplified Aeolus model without numerical constraints and conflicts. For this reason, it could indeed happen that such constraints are violated during the execution of the deployment run.
