• Aucun résultat trouvé

funFEM: an R package for functional data clustering

N/A
N/A
Protected

Academic year: 2021

Partager "funFEM: an R package for functional data clustering"

Copied!
3
0
0

Texte intégral

(1)

HAL Id: hal-01383951

https://hal.archives-ouvertes.fr/hal-01383951

Submitted on 20 Oct 2016

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

funFEM: an R package for functional data clustering

Charles Bouveyron, Julien Jacques

To cite this version:

Charles Bouveyron, Julien Jacques. funFEM: an R package for functional data clustering. Quatrième Rencontres R, 2015, Grenoble, France. �hal-01383951�

(2)

funFEM: an R package for functional data clustering

C. Bouveyrona and J. Jacquesb

aLaboratoire MAP5, UMR CNRS 8145

Université Paris Descartes

45 rue des Saints Pères, 75006 Paris, France charles.bouveyron@parisdescartes.fr

bLaboratoire ERIC

Université de Lyon - Lumière

5 av. Pierre Mendès France, 69755 Bron, France julien.jacques@univ-lyon2.fr

Mots clefs: clustering, functional data.

1

Introduction

A new model-based clustering algorithm for times series (or more generally functional data), called FunFEM, has been proposed in [1]. It is based on a functional mixture model which allows the clustering of the data in a discriminative functional subspace. This model presents the advantage to be parsimonious and to allow the visualization of the clustered systems. This paper presents the funFEM package for R which implements this new clustering algorithm.

2

The discriminative functional mixture model

Let {x1, ..., xn} be the observed curves we want to cluster. In practice, the functional expressions

of these curves are not known and we only have access to the discrete observations xi(tis) at

a finite set of ordered times {tis : s = 1, . . . , mi}. The first step in functional data analysis

usually consists in recovering the functional nature of data, and for this we assume that the observed curves can be decomposed in a finite basis of function

xi(t) = p

X

j=1

γijψj(t).

The functional mixture model proposed in [1] assumes that the basis expansion coefficient γi = (γi1, . . . , γip)t of curve xi follows a mixture of Gaussians:

p(γ) =

K

X

k=1

πkφ(γ; U µk, UtΣkU + Ξ), (1)

where πk is the prior probability of the kth group, φ is the standard Gaussian density function,

U is a p × d matrix mapping the coefficient γ into the discriminative subspace (of dimension d < p), µk and Σk are the mean vector and covariance matrice (for cluster k) of the mapping

(3)

3

The funFEM package

Model inference is based on an EM-like algorithm, including an additional step between the traditional E and M steps in which the orientation matrix U is updated. This algorithm is implemented in the funFEM package for R, available on the CRAN.

The main function, funFEM has only two mandatory arguments: the functional data, defined as a functional object of the fda package, and the number of clusters (or a vector of). The outputs of funFEM are: the posterior probabilities and the estimated clusters, the model parameter estimation and several model selection criteria. These latter can be used in order to choose the optimal number of clusters.

The use of funFEM is now illustrated on the Velov dataset (available in the funFEM package), which contains one week of loading curves (i.e. the proportion of available bikes) of the bike sharing system stations of Lyon (called Velov). The number of Velov stations in Lyon is 345, and the curves are sampled approximatively each hour. Due to the periodic nature of the curves, a Fourier basis is considered. Below is given the R script used to launch the clustering analysis with funFEM.

# Load the velov data and smoothing R> library(funFEM)

R> data(velov)

R> basis <- create.fourier.basis(c(0, 181), nbasis=25) R> fdobj <- smooth.basis(1:181,t(velov$data),basis)$fd # Clustrering with FunFEM

R> res = funFEM(fdobj,K=6,model="AkjBk",init="kmeans",lambda=0,disp=TRUE) Figure 1 presents the resulting mean curves of each cluster as well as the geographical repartition of clusters. Such results typically allow us to analyse the bike sharing system of Lyon.

0.2 0.4 0.6 0.8 propor tion of a v ailab le bik es

Dim−15 Dim−21 Lun−03 Lun−09 Lun−15 Lun−21 Mar−03 Mar−09 Mar−15 Mar−21 Mer−03 Mer−09 Mer−15 Mer−21 Jeu−03 Jeu−09 Jeu−15 Jeu−21 Ven−03 Ven−09 Ven−15 Ven−21 Sam−03 Sam−09 Sam−15 Sam−21 Dim−03 Dim−09 Dim−15 Dim−21

45.72 45.76 45.80 45.84 4.75 4.80 4.85 4.90 lon lat

Figure 1: Mean profiles of the 6 clusters (left) and position of clusters in Lyon (right)

References

[1] Bouveyron, C., Côme E., and Jacques, J. (2014). The discriminative functional mixture model for the analysis of bike sharing systems. Preprint HAL n◦01024186

Figure

Figure 1: Mean profiles of the 6 clusters (left) and position of clusters in Lyon (right)

Références

Documents relatifs

The package BeyondBenford compares the goodness of fit of Benford’s and Blondeau Da Silva’s (BDS’s) digit distributions in a given dataset.. It first enables to check whether the

This flattened approach can also be used for smaller inversions, and there is no limit in terms of depth or imbrication: if there is, in a given tradition, variation in the order

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des

To address the upcoming problem of exploring extremely large sets of electricity con- sumption curves, we proposed in this work a new co-clustering methodology for functional

In this section we will illustrate how the vrmlgen package allows users to translate 3D point data into interactive scatter plots in VRML or LiveGraphics3D format using the

pathogenesis-related protein 1; GO: Gene ontology; HIF1A: Hypoxia-inducible factor 1-alpha; IL8: Interleukin 8; IL24: Interleukin 24; JUN: Jun proto-oncogene; JUNB: Transcription

The ZIP employ two different process : a binary distribution that generate structural

We compare the use of the classic estimator for the sample mean and SCM to the FP estimator for the clustering of the Indian Pines scene using the Hotelling’s T 2 statistic (4) and