• Aucun résultat trouvé

Unsupervised learning algorithms for boundary layer study

N/A
N/A
Protected

Academic year: 2021

Partager "Unsupervised learning algorithms for boundary layer study"

Copied!
2
0
0

Texte intégral

(1)

HAL Id: meteo-02465168

https://hal-meteofrance.archives-ouvertes.fr/meteo-02465168

Submitted on 3 Feb 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Unsupervised learning algorithms for boundary layer study

Thomas Rieutord

To cite this version:

Thomas Rieutord. Unsupervised learning algorithms for boundary layer study. Journée scientifique du SIRTA, Jun 2018, Palaiseau, France. �meteo-02465168�

(2)

Unsupervised learning algorithms for boundary layer study

Thomas Rieutord* CNRM (UMR 3589), Météo France & CNRS, Toulouse

*thomas.rieutord@meteo.fr

Introduction 1

Abstract

K-means clustering 2

DBSCAN clustering 4

Agglomerative clustering 3

pattern 1

pattern 2 pattern 3

predictors

(atm. variables)

obs.

distance intra-cluster

distance inter-cluster

Application to boundary layer: boundary layer classification

problem-dependent

?

questions

Minimise intra-cluster distance

Which predictors?

Which dissimilarity?

A C

B

N

Nested clusters

(identify smaller scale structures?)

Converges toward a local minimum

Initialisation highly influences the result

Choice of the number of groups?

Gives hierarchical structure anyway, regarless whether it is relevant

Small changes in data can lead to different dendrogram

Choice of linkage?

Fast convergence (few 10 iterations)

Different strategies of initialisation

Can highlight a "natural"

number of groups

No parameter to tune

Relative humidity Absolute temperature

Graphical summary of results in dendrogram

Automatically find the number of groups

Clusters can be of any shape Resilient to outliers

(can even identify them)

Edge points connected to more than one cluster can change assignation depending on their ordering

Clusters must be of similar density Choice of the parameters and ?

Results Results

Results

Conclusion

Conclusion Conclusion

Points attributed to the closest centroid

Centroid updated to better represent

the group

Initialisation of centroids

Minimum of intra-cluster variance reached

The 2 most similar groups are merged

Distances between groups are updated All points are considered

as a group of 1 element

One group contains all.

Result is the dendrogram (merging tree)

All methods look for high density areas with a

dissimilarity metric

Here: squared gap Here: normalized

Target:

K-means clusters have the good borders but they are not consistent with visual examination. More work is required on initialisation (start from meaningful centroids?) and predictors.

Results Prospects

Agglomerative clustering finds well the visible layers.

Moreover, the dendrogram gives an even more precise information (still to be examined) and many improvements are possibles.

Results Prospects

DBSCAN clustering gives 7 clusters, which is too much.

It appears to be very sensitive to settings values which are hard to correctly set. More advanced variants (e.g. OPTICS) might be better.

Results Prospects Presented at Journée scientifique du SIRTA, 15/06/2018

Unsupervised learning aims to derive high level information from data without reference. This work shows an example of how it can be used to derive user information from field campaign measurements.

Three algorithms have been tested on their ability to make a good boudary layer classification: K-means, Agglomerative and DBSCAN.

Data are from radiosoundings in the 2nd IOP of the Passy-2015 field experiment (alpine valley, wintertime). One can see a stable layer, a mixed layer and the free atmosphere.

Agglomerative gives the best results and has promising prospects. K- means and DBSCAN give clusters not corresponding to visual examination, but both have many ways of improvement.

Prohibitive cost when large dataset

More than neighbours?

NO YES

NO YES

Core point same cluster as

its neighbours.

Any core point in the neighborhood?

Edge point same cluster as

its neighbours.

Outlier No cluster attribution.

For all points in the dataset

Neighbours = points closer than

Références

Documents relatifs

Second, a spatially weighted SVM (sw-SVM) is used to learn for each time interval a sparse spatial filter optimizing directly the classification performance [11].. Finally, the

Purpose: To identify the genomic location of previously uncharacterized canine retina-expressed expressed sequence tags (ESTs), and thus identify potential candidate genes for

mètres environ de profondeur, des masses de fer carbonaté (sidérose), qui se trouvent placées à la base d’un filon de ma­ tières fibreuses ou micacées d’un

Suite à l’identification des paramètres présentée en Figure 4, nous illustrons notre démarche en com- parant les résultats d’une simulation comportant un assemblage boulonné 3D,

It is clear that the two strategies will lead to a very different set of acquisition guidelines and to the selection of different candidates, and so a company pursuing

power expansion for the potential has been obtained, with coefficients depending only on the frequency. Moreover, considerations about the asymptotic behavior of

The DIP turnover times of around 10 days measured at the N-gyre station (ALOHA, Fig. 3n) suggests a higher P deficiency than in the South Pacific gyre (T DIP around 200 days and

- Dans son rapport annuel 2001-2002 sur l’état et les besoins de l’éducation, le Conseil supérieur de l’éducation propose à la société québécoise deux grandes