Récemment recherché

Aucun résultat trouvé

Étiquettes

Aucun résultat trouvé

Document

Aucun résultat trouvé

Accueil Écoles Thèmes

Connexion

Bayesian Nonparametric Mixtures Why and How?

Partager "Bayesian Nonparametric Mixtures Why and How?"

N/A

N/A

Protected

Année scolaire: 2021

Info

Protected

Academic year: 2021

Partager "Bayesian Nonparametric Mixtures Why and How?"

Copied!

2

0

0

2

0

0

Chargement.... (Voir le texte intégral maintenant)

Télécharger maintenant ( 2 Page )

Texte intégral

(1)

HAL Id: hal-01950664

https://hal.archives-ouvertes.fr/hal-01950664

Submitted on 11 Dec 2018

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Bayesian Nonparametric Mixtures Why and How?

Julyan Arbel

To cite this version:

Julyan Arbel. Bayesian Nonparametric Mixtures Why and How?. IFSS 2018 - 2nd Italian-French Statistics Seminar, Sep 2018, Grenoble, France. �hal-01950664�

(2)

Bayesian Nonparametric Mixtures Why and How?

www.julyanarbel.com, Inria, Mistis

Introduction

Bayesian nonparametric framework

◦ Massively many parameters

◦ Inference on curves: pdf, cdf, hazard, link. . .

◦ Mixtures, exchangeable data Xⁿ = (X₁, . . . , X_n) X₁, . . . X_n | P ∼

( P → 1 R

Θ k( · |θ)P(dθ) → 2 3 , Natural uncertainty quantification

, Flexibility, avoids over-fitting by regularization (prior)

, Adapt to data complexity , Underlying clustering

/ Justify prior, expert

/ Efficient posterior sampling / Quantify truncation error

What prior for P?

◦ Learn about data through posterior dist.

◦ Discrete random probability measure prior

◦ Random weights (p_i)_i and locations (θ_i)_i

P =

∞

X

i=1

p_iδ_θ_i

→ Dirichlet process DP(α, G₀) (Ferguson, 1973) Predictive: Chinese Restaurant Process

P(X_n₊₁ ∈ · | Xⁿ) = α

α + nG₀ + 1 α + n

k_n

X

j=1

n_jδ_X^∗

j

→ Or for varying P(X_n₊₁ new ) . . . y

2 Density Estimation

Ecological risk assessment (Arbel et al., 2016b)

◦ Data are species critical effect concentrations (CEC), possibly censored

◦ Estimation of species sensitivity distribution (SSD), the density of CEC

◦ Safe concentration which protects most of the species: 5th percentile of the SSD (HC5)

◦ Very moderate sample sizes, ∼ 10 − 50

→ BNP describes well variability of the data, without being prone to over-fitting

→ Species clustering as an outcome

0.0 0.2 0.4

−2.5 0.0 2.5 5.0

Data

Density

0.0 0.1 0.2 0.3 0.4 0.5

−1.5 −1.0 −0.5 0.0 0.5 1.0

Data

Model

BNP KDE Normal

3 Survival Analysis

Bayesian hazard mixture (Arbel et al., 2016c)

◦ Data are (remission) times possibly censored

◦ Prior on hazard rate h(t) for every time t

◦ Induces prior on survival function S (t)

→ Availability of post. mean, median, mode

→ Smooth estimator VS Kaplan–Meyer

→ Proper uncertainty quantification

Time (in weeks)

0 35 70

0.0 0.5 1.0

Time (in weeks)

0 35 70

0.0 0.5 1.0

Funding

This work is supported by the European Research Council (ERC) through StG “N-BNP” 306406.

1 Species Modeling

Data can be species, microbes, words, genes. . . Discovery probabilities (Arbel et al., 2016a)

◦ Estimation of `-discovery

D_` = P(X_n₊₁ is a species seen ` times)

◦ Comparison with Good-Turing estimator

→ Closed form posterior and estimators

→ Uncertainty quantif., unavailable for GT

→ 2nd order (fast) approximations

Diversity in ecology (Arbel et al., 2015, 2016d)

◦ Assess impact of pollution on microbial community via study of diversity

Div = − P

i p_i log p_i

→ Model detects an hormetic effect

→ Uncertainty quantification

→ Prediction across full range of covariates

●

●

●●

●●

●●

●

● ●

● ● ●

●●

●

●

●

●

●

●

Fuel concentration (g TPH/kg soil)

Shannon diversity

0 5 10 15 20 25

0 1 2 3 4 5

References & Collaborators

Arbel, J., Favaro, S., Nipoti, B., and Teh, Y. W. (2016a).

Bayesian nonparametric inference for discovery probabilities: credible intervals and large sample asymptotics.

Statistica Sinica.

Arbel, J., Kon Kam King, G., and Prünster, I. (2016b).

Bayesian nonparametric modelling of species sampling distributions. In preparation.

Arbel, J., Lijoi, A., and Nipoti, B. (2016c). Full Bayesian inference with hazard mixture models. Computational Statistics & Data Analysis.

Arbel, J., Mengersen, K., Raymond, B., Winsley, T., and King, C. (2015). Application of a Bayesian nonparametric model to derive toxicity estimates based on the response of Antarctic microbial communities to fuel contaminated soil. Ecology and Evolution.

Arbel, J., Mengersen, K., and Rousseau, J. (2016d). Bayesian nonparametric dependent model for partially replicated data: the influence of fuel spills on species diversity.

Annals of Applied Statistics.

Arbel, J. and Prünster, I. (2016). A moment-matching Ferguson & Klass algorithm. Statistics and Computing. Ferguson, T. (1973). A Bayesian analysis of some

nonparametric problems. The Annals of Statistics.

Miller, J. W. and Harrison, M. T. (2014). Inconsistency of Pitman-Yor process mixtures for the number of components. The Journal of Machine Learning Research.

Wade, S. and Ghahramani, Z. (2015). Bayesian cluster analysis: Point estimation and credible balls. arXiv.

Open Questions

◦ How to best use underlying clustering?

(Wade and Ghahramani, 2015)

◦ Find consistent estimator of number of clusters: posterior inconsistent (Miller and Harrison, 2014), what about posterior mode?

◦ Devise efficient posterior sampling, trunca- tion error (Arbel and Prünster, 2016)

Références

Télécharger maintenant ( PDF - 2 Page - 499.54 KB )

Documents relatifs

Allele-Specic QTL fine-mapping with PLASMA

Applied to RNA-Seq data from 524 kidney tumor samples, PLASMA achieves a greater power at 50 samples than conventional QTL-based fine-mapping at 500 samples: with over 17% of

A novel conserved mechanism for plant NLR protein pairs: the "integrated decoy" hypothesis

Within a pair, distinct NLR proteins: (i) form homo- and hetero-complexes that are involved in important regulation pro- cesses both prior to and following pathogen recognition,

Dried fluid spots for peste des petits ruminants virus load evaluation allowing for non-invasive diagnosis and genotyping

In addition by comparing dried spots on filter paper made of whole blood and nasal swabs from animals sampled in the field at different clinical stages of the disease, we dem-

Within-Host Dynamics of the Emergence of Tomato Yellow Leaf Curl Virus Recombinants

Following co-infection of plants with Tomato yellow leaf curl virus (TYX) and Tomato leaf curl Comoros virus (TOX), the frequency of parental and recombinant genomes was

Using species distribution models to optimize vector control in the framework of the tsetse eradication campaign in Senegal

Using species distribution models to optimize vector control in the framework of the tsetse eradication campaign in Senegal..

L'impact de la certification GlobalGAP sur les producteurs de litchis à Madagascar

Plus précisément, nous testons l’hypothèse selon laquelle la mise en conformité à la norme GlobalGAP est à l’origine d’un plus gros volume de litchis vendus aux exportateurs

Average-Case Performance of Rollout Algorithms for Knapsack Problems

Beyond simple bounds derived from base policies, the only theoretical results given explicitly for rollout algorithms are average-case results for the breakthrough problem

Bottomonium spectroscopy and radiative transitions involving the χ[subscript bJ](1P,2P) states at BaBar

Two methods have been used to disentangle the P-wave spin states in the hard transitions: inclusive converted photon searches, used in a recent BABAR analysis [20] ; and

Téléchargez tous les documents en téléchargeant vos documents d'étude.

Votre document sera enrichi, partagé sur 123dok FR pour vous aider à étudier.

Documents relatifs

A l’attention des utilisateurs professionnels (ex : Centres hospitaliers, Cliniques)

A l’attention des utilisateurs professionnels (ex : Centres hospitaliers, Cliniques)

3

0

0

Fori Antiqui y Furs del Reino y la ciudad de Valencia : análisis de la cohesión textual del texto latino y su traducción.

Fori Antiqui y Furs del Reino y la ciudad de Valencia : análisis de la cohesión textual del texto latino y su traducción.

19

0

0

Computation of Cournot-Nash equilibria by entropic regularization

Computation of Cournot-Nash equilibria by entropic regularization

22

0

0

Oxidation kinetics and hydrogen uptake of titanium alloys in pressurized water reactors (PWR) primary water

Oxidation kinetics and hydrogen uptake of titanium alloys in pressurized water reactors (PWR) primary water

3

0

0

Case report : A suspicion of cortico-cerebral necrosis in a Belgian Blue herd after ingestion of moulded silage

Case report : A suspicion of cortico-cerebral necrosis in a Belgian Blue herd after ingestion of moulded silage

1

0

0

Vers une spécification des modèles de simulation de systèmes complexes

Vers une spécification des modèles de simulation de systèmes complexes

32

0

0

Impact of the Little Ice Age cooling and 20th century climate change on peatland vegetation dynamics in central and northern Alberta using a multi-proxy approach and high-resolution peat chronologies

Impact of the Little Ice Age cooling and 20th century climate change on peatland vegetation dynamics in central and northern Alberta using a multi-proxy approach and high-resolution peat chronologies

44

0

0

Détection et exploitation d'ombre de bâti sur les images de très haute résolution spatiale (IKONOS) application au milieu urbain (Sherbrooke)

Détection et exploitation d'ombre de bâti sur les images de très haute résolution spatiale (IKONOS) application au milieu urbain (Sherbrooke)

150

0

0