Keywords: Bayesian inference; Bayesian probit model; Gibbs sampling; Latent variable models; Marginal distribution; Mean-field variational methods
1 Introduction
Bayesian inference typically requires computing the posterior distribution of a collection of random variables (parameters or unknown observables). Save for stylized settings, this task often leads to intractable calculations. To address this problem, numerous simulation-based methods have been developed within the Bayesian paradigm, e.g. importance sampling (Ripley, 1987), Markov chain Monte Carlo (MCMC) algorithms (Casella and Robert, 1999), and particle filtering (Doucet et al., 2001).
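The keywords above point at a classic instance of these ideas: the Bayesian probit model, whose posterior becomes tractable by Gibbs sampling once a Gaussian latent variable is introduced (Albert & Chib, 1993). A minimal sketch, assuming a N(0, τ²I) prior on the coefficients; the function and parameter names are illustrative:

```python
import numpy as np
from scipy.stats import truncnorm

def probit_gibbs(X, y, n_iter=500, tau2=100.0, seed=0):
    """Gibbs sampler for a Bayesian probit model (Albert & Chib, 1993).

    Latent formulation: y_i = 1{z_i > 0}, z_i ~ N(x_i' beta, 1),
    with a N(0, tau2 * I) prior on beta (tau2 is an illustrative choice).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Posterior covariance of beta given z does not depend on z.
    B = np.linalg.inv(X.T @ X + np.eye(p) / tau2)
    L = np.linalg.cholesky(B)
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        mu = X @ beta
        # Sample latent z_i from N(mu_i, 1) truncated to (0, inf) if y_i = 1
        # and to (-inf, 0) if y_i = 0 (truncnorm takes standardized bounds).
        lo = np.where(y == 1, -mu, -np.inf)
        hi = np.where(y == 1, np.inf, -mu)
        z = truncnorm.rvs(lo, hi, loc=mu, scale=1.0, random_state=rng)
        # Sample beta | z ~ N(B X'z, B).
        beta = B @ (X.T @ z) + L @ rng.standard_normal(p)
        draws[t] = beta
    return draws
```

Both full conditionals are standard distributions, which is exactly what makes the latent-variable augmentation attractive here compared to working with the marginal posterior directly.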

Figure 9 – Schematic of the machine vision methodology for anode paste
The first specific objective is to develop a machine vision algorithm (i.e. image preprocessing, image analysis and feature selection) at laboratory scale. This method was developed with lab-scale anodes in the laboratory at Université Laval. Paste samples were prepared by varying the fabrication conditions. These variations included the use of different types of coke and pitch, variations of the dry aggregate particle size distribution, of the fineness of the fine particles (i.e. the Blaine number), of the amount of pitch, and of the mixing temperature of the paste. Each paste sample was imaged using a camera in the visible spectrum (i.e. RGB). The image texture characteristics that enabled the differentiation and classification of the different paste blends were identified. The image texture features were computed using advanced image texture analysis methods: the gray-level co-occurrence matrix (GLCM) and wavelet texture analysis (WTA). Multivariate latent variable statistical methods such as principal component analysis (PCA) and projection to latent structures (PLS) were used to analyse the image features.
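The GLCM features mentioned above can be sketched in a few lines of numpy; the quantization level, offset, and property list below are illustrative choices, not necessarily the ones used in the thesis:

```python
import numpy as np

def glcm_features(img, levels=8, dy=0, dx=1):
    """GLCM texture features (contrast, energy, homogeneity) for one
    grayscale image -- a minimal sketch of the method named in the text.

    img: 2-D array of uint8 gray levels; (dy, dx) with dy, dx >= 0 is the
    pixel-pair displacement over which co-occurrences are counted.
    """
    # Quantize gray levels into `levels` bins (cast first to avoid overflow).
    q = np.minimum(img.astype(int) * levels // 256, levels - 1)
    h, w = q.shape
    src = q[:h - dy, :w - dx]
    dst = q[dy:, dx:]
    # Count co-occurring gray-level pairs, symmetrize, and normalize.
    P = np.zeros((levels, levels))
    np.add.at(P, (src.ravel(), dst.ravel()), 1)
    P = P + P.T
    P /= P.sum()
    i, j = np.indices(P.shape)
    contrast = ((i - j) ** 2 * P).sum()
    energy = (P ** 2).sum()
    homogeneity = (P / (1.0 + np.abs(i - j))).sum()
    return np.array([contrast, energy, homogeneity])
```

Feature vectors computed this way over several offsets and channels would then feed the PCA/PLS stage the passage describes.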


factorization (Mnih & Salakhutdinov, 2008) by relying partially on latent Dirichlet allocation (LDA; Blei et al., 2003) and its recent autoencoding extensions (Srivastava & Sutton, 2017; Dieng et al., 2019). Thus, our approach adopts a variational autoencoder architecture as a generative deep latent variable model for both the ordinal matrix encoding the user/product scores and the document-term matrix encoding the reviews. Our approach is tested on simulated and
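The variational autoencoder named above maximizes a reconstruction term minus a KL penalty on the latent code. As a minimal numpy sketch (the encoder/decoder networks themselves are omitted, and the names are illustrative), the analytic KL of the encoder's Gaussian q(z|x) = N(mu, diag(exp(log_var))) against a N(0, I) prior, and the reparameterized draw, are:

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I); in an autodiff
    framework, gradients w.r.t. (mu, log_var) flow through this sample."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps
```

The same two pieces appear whether the decoder models the ordinal score matrix or the document-term matrix; only the reconstruction likelihood changes.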

ABSTRACT
Latent variable models in finance originate both from asset pricing theory and time series analysis. These two strands of literature appeal to two different concepts of latent structures, which are both useful to reduce the dimension of a statistical model specified for a multivariate time series of asset prices. In the CAPM or APT beta pricing models, the dimension reduction is cross-sectional in nature, while in time-series state-space models, dimension is reduced longitudinally by assuming conditional independence between consecutive returns, given a small number of state variables. In this paper, we use the concept of Stochastic Discount Factor (SDF) or pricing kernel as a unifying principle to integrate these two concepts of latent variables. Beta pricing relations amount to characterizing the factors as a basis of a vector space for the SDF. The coefficients of the SDF with respect to the factors are specified as deterministic functions of some state variables which summarize their dynamics. In beta pricing models, it is often said that only the factorial risk is compensated since the remaining idiosyncratic risk is diversifiable. Implicitly, this argument can be interpreted as a conditional cross-sectional factor structure, that is, a conditional independence between contemporaneous returns of a large number of assets, given a small number of factors, as in standard Factor Analysis. We provide this unifying analysis in the context of conditional equilibrium beta pricing as well as asset pricing with stochastic volatility, stochastic interest rates and other state variables. We address the general issue of econometric specifications of dynamic asset pricing models, which cover the modern literature on conditionally heteroskedastic factor models as well as equilibrium-based asset pricing models with an intertemporal specification of preferences and market fundamentals. We interpret various instantaneous causality relationships between state variables and market fundamentals as leverage effects and discuss their central role relative to the validity of standard CAPM-like stock pricing and preference-free option pricing.

VI. DISCUSSION
I have proposed a reformulation of the Ising distribution as a latent variable model, and used it to derive principled approximations by the simpler Cox distribution. In practical applications, Cox models (including the dichotomized Gaussian and Cox point processes) are often preferred to the corresponding maximum entropy distributions (Ising distribution, Gibbs point process) because they are easier to sample and to parametrize from a set of observed moments. This article establishes a simple analytical connection between the two families of models, and investigates under what conditions they can be used interchangeably.
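The claim that Cox-type models are "easier to sample" is concrete for the dichotomized Gaussian: draw a latent multivariate normal and threshold it, with no MCMC involved. A minimal sketch (function name and defaults are illustrative):

```python
import numpy as np

def sample_dichotomized_gaussian(mean, cov, n_samples, seed=0):
    """Sample binary vectors from a dichotomized Gaussian: draw a latent
    multivariate normal and threshold each coordinate at zero.

    This is the Cox-type model the text contrasts with the Ising
    (maximum entropy) distribution; sampling costs one covariance
    factorization per call, with no Markov chain to run or monitor.
    Marginally, P(x_i = 1) = Phi(mean_i / sqrt(cov_ii)).
    """
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(mean, cov, size=n_samples)
    return (z > 0).astype(int)
```

Matching the latent mean and covariance to observed first and second moments is the parametrization step the passage alludes to.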

When the probability of a given datapoint is defined as factorizing over every dimension, the implication is that the distribution over the observation is assumed to depend entirely on the latent variable. However, the latent variable has limited capacity and may model only the aspects that contribute most of the reconstruction loss (abstract concepts), while variations at a lower level of abstraction (details) may depend on other parts of the input. Some examples of such models are Gulrajani et al. (2016) for images, with a slightly more general discussion in Chen et al. (2016), and Bowman et al. (2015) for text.

learning problem. Since the PYPMoG is a very efficient density estimation tool, we can assume that the terrains are well-modeled in the feature space. Thus, the error must come from overlapping mass of the PDFs representing terrains, as this is the only way to confuse terrains in the feature space. We argue that measuring the overlapping mass of the PDFs is an excellent criterion to assess the quality of the data acquisition procedure and feature extraction. Of course, a natural solution to this overlap is to tackle the problem at the source and reduce the variance of the features. This could also be addressed by employing more discriminative sensors. Learning latent variable models, such as clustering terrains in a feature space, is a difficult task. The Pitman-Yor process mixture of Gaussians demonstrated excellent flexibility and accuracy on the previous surface identification tasks when compared to standard machine learning methods. The most convincing results are those of Table 3.4. For this experiment, the PYPMoG demonstrated its usefulness by learning a model almost twice as accurate as Expectation-Maximization with Minimum Description Length (EM+MDL). The main advantage of the PYPMoG over EM+MDL is its automatic model complexity selection, relying on advanced Bayesian nonparametric methods. The model was able to learn the right number of latent surfaces representing the data, and this translated into accurate models.
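The overlapping-mass criterion argued for above can be estimated by simple Monte Carlo. A one-dimensional, two-class sketch (an illustration of the criterion, not the PYPMoG machinery): since overlap(p, q) = ∫ min(p, q) dx, sampling from the equal-weight mixture m = (p + q)/2 gives overlap = E over x ~ m of min(p, q)/m.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def overlap_mass(mu1, s1, mu2, s2, n=200_000, seed=0):
    """Monte Carlo estimate of the overlapping mass between two 1-D
    Gaussian class-conditional densities p and q.

    Samples from the equal-weight mixture m = (p + q) / 2 and averages
    min(p, q) / m, an unbiased estimate of the integral of min(p, q).
    The value is 1 for identical densities and near 0 for disjoint ones.
    """
    rng = np.random.default_rng(seed)
    pick = rng.integers(0, 2, n)
    x = np.where(pick == 0,
                 rng.normal(mu1, s1, n),
                 rng.normal(mu2, s2, n))
    p = gaussian_pdf(x, mu1, s1)
    q = gaussian_pdf(x, mu2, s2)
    return np.mean(np.minimum(p, q) / (0.5 * (p + q)))
```

An overlap near 1 between two terrain PDFs would flag exactly the feature-space confusion the passage blames the classification error on.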


their corresponding rating y(u, i) is a noisy measurement of f(x_1(u), x_2(i)) for some unknown function f. In contrast to classical regression, the features x = (x_1(u), x_2(i)) are not observed (latent), making it challenging to apply standard regression methods. We suggest a two-step procedure to overcome this challenge: 1) estimate distances for the latent variables, and then 2) apply nonparametric regression. Applying this framework to matrix completion, we provide a prediction algorithm that is consistent for all Lipschitz functions. In fact, the analysis naturally leads to a variant of collaborative filtering, shedding insight into the widespread success of collaborative filtering. Assuming each entry is revealed independently with p = max(m^{-1+δ}, n^{-1/2+δ}) for δ > 0, we prove that the expected fraction of our estimates with error greater than ε is less than γ²/ε², plus a polynomially decaying term, where γ² is the variance of the noise. Experiments with the MovieLens and Netflix datasets suggest that our algorithm provides principled improvements over basic collaborative filtering and is competitive with matrix factorization methods. The algorithm and analysis naturally extend to higher-order tensor completion by simply flattening the tensor into a matrix. We show that our simple and principled approach is competitive with state-of-the-art tensor completion algorithms when applied to image inpainting data. Lastly, we conclude this thesis by proposing various related directions for future research.
Thesis Supervisor: Devavrat Shah
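The two-step procedure (estimate latent distances, then regress nonparametrically) can be illustrated with a toy user-user collaborative filtering variant. This is a sketch of the idea, not the thesis's algorithm; the function name and defaults are illustrative:

```python
import numpy as np

def predict_entry(R, u, i, k=3):
    """Toy two-step latent-distance estimate of the missing entry R[u, i].

    Step 1: estimate a distance from user u to every other user as the
    mean squared rating difference over commonly observed items.
    Step 2: nonparametric regression -- average the k nearest users'
    observed ratings of item i.  R uses np.nan for unobserved entries.
    """
    obs_u = ~np.isnan(R[u])
    dists = np.full(R.shape[0], np.inf)
    for v in range(R.shape[0]):
        if v == u or np.isnan(R[v, i]):
            continue  # skip the query user and users who did not rate i
        common = obs_u & ~np.isnan(R[v])
        if common.any():
            dists[v] = np.mean((R[u, common] - R[v, common]) ** 2)
    nearest = np.argsort(dists)[:k]
    nearest = nearest[np.isfinite(dists[nearest])]
    return np.nan if nearest.size == 0 else R[nearest, i].mean()
```

If f is Lipschitz, users whose observed ratings nearly coincide must have nearby latent features, which is why the squared-difference proxy in step 1 is a sensible distance estimate.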

literature for data clustering and classification [269-270]. The most commonly used LVMs are Principal Component Analysis (PCA) and Projection to Latent Structures (PLS) [271-272]. Briefly, PCA and PLS project a multivariate dataset from a high-dimensional space to a lower-dimensional space, allowing one to visualize and interpret the information contained in Big Data. These methods have been used in materials inspection based on acoustic features [282-283] as well as on vibration features [284-286], and very interesting results have been obtained. In addition, some authors [273-274, 282] have combined wavelet analysis with PCA, called Wavelet-PCA, to improve the results of control. In this thesis, a large number of features were computed from the acousto-ultrasonic and vibrational responses of the various materials in order to describe them. It was found that these features were highly collinear. Instead of performing feature selection, it was decided to use all of them together to maximize the chances of detecting and discriminating anode defects. To cope with collinearity, PCA was used to cluster anode parts or anode blocks based on their acousto-ultrasonic and modal responses. Projection to Latent Structures (PLS) regression was also used to correlate acousto-ultrasonic features and textural descriptors collected from corresponding X-ray CT-Scan images. This regression method is known as a strong multivariate statistical method that works better than traditional multiple linear regression (MLR) [275] on highly collinear datasets, as is the case in this thesis. Although multivariate and wavelet methods have generally been used to explore data for non-destructive evaluation, there has been little or even no attempt to consider their applications for the intricate cases highlighted in this project.
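The way PCA copes with collinearity, as used above to cluster anode samples, can be sketched via the SVD; the helper below is an illustrative implementation, not the thesis code:

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project a (samples x features) matrix onto its first principal
    components via the SVD of the centered data.

    Collinear features contribute to the same few directions of maximal
    variance, so the low-dimensional scores summarize them jointly instead
    of requiring a prior feature-selection step.  Returns the scores and
    the fraction of total variance explained by each retained component.
    """
    Xc = X - X.mean(axis=0)                       # center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T             # project onto loadings
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return scores, explained
```

Plotting the first two score columns against each other is the usual way such methods let one "visualize and interpret" a high-dimensional, collinear dataset.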


The Present Study
To date, no studies have explored the latent structure of RAN and whether one or several RAN factors are needed to predict academic outcomes, such as reading and arithmetic. Investigating this question at the beginning of formal schooling, when children start to learn to read and to calculate, is particularly interesting with respect to early screening and educational practices, which aim to identify children at risk of developing reading and/or arithmetic difficulties at an early stage. From a general cognitive perspective, the present study first investigates whether the performance collected on a large set of RANs (i.e., colors, objects, dice patterns, canonical finger-numeral configurations, vowels, consonants, and digits) is mainly underpinned by a domain-general factor or by multiple related factors. We will therefore examine the structure of RAN by investigating five different possible latent models.

The latent factors z of the model can be seen as topics that are present in the data. The parameters of each topic can be used as descriptions of the topic. P(t|z) represents the probabilities of the terms for the topic z, thus providing a way to find words that are representative of the topic. The most straightforward method to find these keywords is to use the words with the highest probability P(t|z). Unfortunately, this approach is somewhat flawed, as the histogram reflects the overall frequency of words, which means that generally common words tend to dominate P(t|z).
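A common remedy (an illustrative choice; the passage does not commit to one) is to rank terms by the lift P(t|z)/P(t) instead of the raw P(t|z), which down-weights globally frequent words:

```python
import numpy as np

def topic_keywords(p_t_given_z, p_t, vocab, top_n=3):
    """Rank candidate keywords for one topic by lift P(t|z) / P(t).

    Globally frequent terms have high P(t) and are down-weighted, so they
    no longer dominate the ranking the way they would under raw P(t|z).
    (An illustrative re-weighting; other corrections exist.)
    """
    lift = np.asarray(p_t_given_z) / np.asarray(p_t)
    order = np.argsort(lift)[::-1][:top_n]
    return [vocab[i] for i in order]
```

For example, a stopword with P(t|z) = 0.4 but P(t) = 0.5 gets lift 0.8 and falls below a rarer but topic-specific term with P(t|z) = 0.2 and P(t) = 0.01 (lift 20).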

are close to one another, on the chromosome. LD is rarely observed for SNPs distant by more than 500 kb [6]. We argued that the FHLCMs can offer several advantages for genetic data analysis, in particular for genome-wide association studies (GWASs). For instance, FHLCMs' hierarchical structure supported by latent variables (LVs) allows flexible information synthesis, thus efficiently reducing the data dimensionality. Indeed, in an FHLCM, the different layers provide several degrees of reduction, which allow zooming in through narrower and narrower regions in search of stronger associations with the disease. Another promising property of the FHLCMs lies in their ability to allow a simple test of direct dependence between an observed variable and a target variable such as the phenotype, conditional on the latent variable that is the parent of the observed variable. Note that the phenotype variable is not included in the FHLCM. In the context of GWASs, this test helps find the markers which are directly associated with the phenotype, i.e. causal markers, should there be any.

2. Besides, a heuristically good reason to suppose the existence of a latent variable is mainly therapeutic rather than methodological. This kind of hidden variable is often seen as a therapeutic target rather than an etiological node; that is, not something to find that would explain everything, but something to act upon that would dissolve everything. If a match is considered the cause of a fire in a building, rather than oxygen in the air, which is no less required to start a fire, it is because the match seems the most appropriate factor to act upon. Mackie (1974), Hesslow (1984), Gannett (1999), and Magnus (1992), among others, have shown the importance of pragmatic concerns in the search for a single target which might be called the cause of a disease (it is called the problem of causal selection). This kind of pragmatic interpretation of a latent variable as "what we have to act upon" may justify the otherwise objectionable assumption that there is actually a latent variable which explains and causes everything. There is, however, a question as to how the network approach is to be translated into the definition of therapeutic targets. For instance, while such a definition is obviously easy on the basis of the target article's Figure 1, one might ask what could be proposed on the basis of Figure 4.


In this paper, we tackle the non-convex problem of topic modelling, where agents have sensitive text data at their disposal that they cannot or do not want to share (e.g., text messages, emails, confidential reports). More precisely, we adapt the Latent Dirichlet Allocation (LDA) [5] model to decentralized networks. We combine recent work of [6] on online inference for latent variable models, which adapts online EM [7] with local Gibbs sampling in the case of intractable latent variable models (such as LDA), with recent advances on decentralized optimization [3, 4]. The method presented in [6] is particularly well adapted to a decentralized framework, as it consists of iteratively updating sufficient statistics, which can be done locally. After presenting our DeLeDA (for Decentralized LDA) algorithm, we give a brief sketch of a convergence proof. Then, we apply our new method to synthetic datasets and show that our method recovers the same parameters and performs similarly to the online method [6] after enough iterations.
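The local-update structure that makes the method well adapted to a decentralized framework can be illustrated by the communication step alone: agents iteratively average their sufficient statistics with their neighbors. A toy gossip sketch (not the DeLeDA update itself); on a regular graph the mixing matrix below is doubly stochastic, so all agents reach the global mean:

```python
import numpy as np

def gossip_average(stats, adjacency, n_rounds=100):
    """Decentralized averaging of local sufficient statistics by gossip.

    At each round every agent replaces its statistic with the average of
    its own and its neighbors' values; no agent ever sees raw data from
    another agent, only aggregated statistics.

    stats: (n_agents, dim) local statistics.
    adjacency: (n_agents, n_agents) symmetric 0/1 matrix, no self-loops.
    On a regular graph the mixing matrix is doubly stochastic and the
    iterates converge to the global mean at every agent.
    """
    n = stats.shape[0]
    W = adjacency + np.eye(n)              # include self in the average
    W = W / W.sum(axis=1, keepdims=True)   # row-stochastic mixing matrix
    x = stats.astype(float)
    for _ in range(n_rounds):
        x = W @ x
    return x
```

In a decentralized EM scheme, a step like this would be interleaved with each agent's local E-step (here, local Gibbs sampling) between statistic updates.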

GAZE LATENT SUPPORT VECTOR MACHINE FOR IMAGE CLASSIFICATION
Xin Wang, Nicolas Thome, Matthieu Cord
Sorbonne Universités, UPMC Univ Paris 06, UMR 7606, LIP6, F-75005, Paris, France
This paper deals with image categorization from weak supervision, e.g. global image labels. We propose to improve the region selection performed in latent variable models such as the Latent Support Vector Machine (LSVM) by leveraging human eye-movement features collected with an eye-tracker device. We introduce a new model, the Gaze Latent Support Vector Machine (G-LSVM), whose region selection during training is biased toward regions with a large gaze density ratio. For this purpose, the training objective is enriched with a gaze loss, from which we derive a convex upper bound, leading to a Concave-Convex Procedure (CCCP) optimization scheme. Experiments show that G-LSVM significantly outperforms LSVM in both object detection and action recognition on PASCAL VOC 2012. We also show that our G-LSVM is even slightly better than a model trained from bounding-box annotations, while gaze labels are much cheaper to collect.

Abstract
In this contribution, we propose a generic online (also sometimes called adaptive or recursive) version of the Expectation-Maximisation (EM) algorithm applicable to latent variable models of independent observations. Compared to the algorithm of Titterington (1984), this approach is more directly connected to the usual EM algorithm and does not rely on integration with respect to the complete data distribution. The resulting algorithm is usually simpler and is shown to achieve convergence to the stationary points of the Kullback-Leibler divergence between the marginal distribution of the observation and the model distribution at the optimal rate, i.e., that of the maximum likelihood estimator. In addition, the proposed approach is also suitable for conditional (or regression) models, as illustrated in the case of the mixture of linear regressions model.
Keywords: Latent data models, Expectation-Maximisation, adaptive algorithms, online estimation, stochastic approximation, Polyak-Ruppert averaging, mixture of regressions.
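A minimal instance of the scheme, for a two-component, unit-variance 1-D Gaussian mixture (an illustration of the stochastic-approximation idea with step size ρ_t = t^(-0.6); not the paper's general statement):

```python
import numpy as np

def online_em_gmm(stream, n_comp=2, rate=0.6, seed=0):
    """Online EM for a 1-D Gaussian mixture with unit variances.

    For each observation y_t: an E-step computes responsibilities under
    the current parameters; the running sufficient statistics are updated
    as s <- (1 - rho_t) * s + rho_t * s_hat(y_t) with rho_t = t**(-rate);
    an M-step then re-reads the parameters from the statistics.  The
    exponent `rate` in (0.5, 1] matches the step-size conditions usual
    for such stochastic-approximation schemes.
    """
    rng = np.random.default_rng(seed)
    w = np.full(n_comp, 1.0 / n_comp)      # mixture weights
    mu = rng.standard_normal(n_comp)       # component means
    s0, s1 = w.copy(), w * mu              # running sufficient statistics
    for t, y in enumerate(stream, start=1):
        # E-step: responsibilities of each component for y.
        logp = -0.5 * (y - mu) ** 2 + np.log(w)
        r = np.exp(logp - logp.max())
        r /= r.sum()
        # Stochastic-approximation update of the sufficient statistics.
        rho = t ** -rate
        s0 = (1 - rho) * s0 + rho * r
        s1 = (1 - rho) * s1 + rho * r * y
        # M-step: parameters as explicit functions of the statistics.
        w = s0 / s0.sum()
        mu = s1 / s0
    return w, mu
```

Each observation is touched once and discarded, which is the practical appeal of the online formulation over batch EM.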
