End-to-end Anomaly Detection, Correction and Prediction of Missing Values in Historical Daily Temperature Timeseries *

Alioscha-Perez M.1, Oveneke M.C.1, Diaz A.1, Bertrand C.2, and Sahli H.1,3

1 Vrije Universiteit Brussel, ETRO Department (ETRO-VUB), Brussels, Belgium
{maperezg,mcovenek,aberengu,hsahli}@etrovub.be
2 Royal Meteorological Institute of Belgium (RMI), Brussels, Belgium
3 Interuniversity Microelectronics Center (IMEC), Heverlee, Belgium

Abstract. In this paper, we present a deep learning solution to detect and correct anomalous values in historical temperature timeseries that are likely associated with human and weather instrument errors. Our solution consists of joint peak detection and end-to-end sequence prediction involving synchronous measurements of individual meteorological stations along with their neighboring peers. We designed our models so that the false positive rate (FPR) of the anomaly detection is minimized and the accuracy maximized, so that the historical records are corrected as little as possible. The method was applied to temperature records of 24 meteorological stations in Belgium, and automatically corrected more than 80% of all errors in both max/min daily temperature records by modifying less than 15% of all the timeseries values, with an overall detection accuracy of 90%. The corrected temperature timeseries yielded a perfect match with respect to error-free signals in several climate indicators. Our method can potentially be applied to other historical timeseries such as precipitation.

1 Introduction

Studies of extreme climate rely extensively on historical records documenting past weather conditions, which are usually available in the form of weather ledgers or logbooks registering the state of the weather at different locations and times. However, human errors and instrument malfunctions are known to be a major source of the errors already present in many weather registries. To address this problem, we present a deep learning solution for anomaly detection, automated correction and prediction of missing values in historical temperature records. Our solution outputs cleaned timeseries in which most anomalies are replaced by predicted values. In addition, it optionally allows manual revision (i.e., human in the loop) of some or all of the corrections, if wanted or necessary.

* Supported by the Belgian Science Policy Office (Belspo) through the Bel-Hornet project.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2 Our Approach

Our approach consists of joint peak detection and end-to-end sequence prediction involving synchronous measurements of individual meteorological stations along with their neighboring peers. Following a Neural Architecture Search (NAS) approach, our solution minimizes the false positive rate (FPR) of the anomaly detection and maximizes the overall accuracy, so that the historical records are corrected as little as possible. This keeps the historical timeseries as close to the original as possible, while still correcting most of the erroneous values.
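As an illustration only of the two quantities optimized above, the following Python sketch flags a station value as anomalous when it departs strongly from the median of its neighbors' synchronous measurements, and then computes the FPR and the accuracy against known labels. The median-deviation rule, the 5 °C threshold, the toy data and all function names are assumptions made for this example; the rule is a simple stand-in and not the deep learning model described in this paper.

```python
from statistics import median


def flag_anomalies(station, neighbors, threshold=5.0):
    """Flag days where the station deviates from the neighbors' median
    by more than `threshold` degrees (hypothetical rule, not the paper's model)."""
    flags = []
    for day, value in enumerate(station):
        peer_median = median(series[day] for series in neighbors)
        flags.append(abs(value - peer_median) > threshold)
    return flags


def fpr_and_accuracy(predicted, actual):
    """Return (false positive rate, accuracy) for boolean anomaly labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    tn = sum(not p and not a for p, a in zip(predicted, actual))
    # FPR = FP / (FP + TN): share of valid records wrongly flagged.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    accuracy = (tp + tn) / len(actual)
    return fpr, accuracy


# Toy daily maximum temperatures (°C); day 2 of the station is a spurious peak.
station = [12.1, 13.0, 31.0, 12.8, 11.9]
neighbors = [
    [11.8, 12.7, 13.1, 12.5, 12.0],
    [12.4, 13.3, 13.6, 13.0, 12.2],
    [12.0, 12.9, 13.2, 12.6, 11.7],
]
truth = [False, False, True, False, False]

flags = flag_anomalies(station, neighbors)
fpr, accuracy = fpr_and_accuracy(flags, truth)
print(flags, fpr, accuracy)
```

A low FPR matters here because every false positive means overwriting a valid historical record.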

Results: Our study included more than 170,000 daily extreme temperature values from 24 meteorological stations in Belgium, split into two time periods used for training and testing, respectively. We assessed the performance of our solution on several indices [3] widely adopted for climate studies [2].

Our results are detailed in Table 1, and confirm a perfect match between the error-free signal and the signal obtained with our solution.

Table 1. Results for climate indicators frequently used in climate analysis, for all 24 meteorological stations under study; values are per year and per station.

Indicator         Raw timeseries   Error-free timeseries   Proposed solution
Hot days          9                5                       5
Summer days       31               27                      27
Tropical nights   79               76                      76
Frost days        47               48                      48
Icy days          7                7                       7
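The indicators in Table 1 are threshold counts over the daily maximum (TX) and minimum (TN) temperatures. A minimal Python sketch follows, assuming the thresholds commonly used for such indices (summer days TX > 25 °C, tropical nights TN > 20 °C, frost days TN < 0 °C, icy days TX < 0 °C) and, as a further assumption, hot days TX > 30 °C. The exact definitions used in this study are not stated here, and the function name and sample values are purely illustrative.

```python
def climate_indicators(tmax, tmin):
    """Count threshold-based indicators from daily TX/TN series in °C.
    The thresholds are assumed conventions, not taken from the paper."""
    return {
        "hot_days": sum(t > 30.0 for t in tmax),
        "summer_days": sum(t > 25.0 for t in tmax),
        "tropical_nights": sum(t > 20.0 for t in tmin),
        "frost_days": sum(t < 0.0 for t in tmin),
        "icy_days": sum(t < 0.0 for t in tmax),
    }


# Six illustrative days of daily maximum and minimum temperatures.
tmax = [31.5, 26.0, 18.2, -1.0, 4.3, 25.0]
tmin = [21.0, 15.4, 9.8, -6.5, -0.2, 12.0]

indicators = climate_indicators(tmax, tmin)
print(indicators)
```

Because each indicator is a count of threshold crossings, a single spurious peak in the raw series can add or remove a day, which is consistent with the raw and error-free columns differing in Table 1.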

Conclusions: We presented a solution for the automatic detection and correction of anomalies, likely associated with human and instrument errors, that are present in daily historical temperature records. In addition, our solution can predict some of the missing values in the timeseries under study. In future work, we will explore ways to optimize our NAS model via SVRG [1].

Bibliography

[1] Alioscha-Perez, M., Oveneke, M.C., Dongmei, J., Sahli, H.: Multiple kernel learning via multi-epochs SVRG. In: 9th NIPS Workshop on Optimization for Machine Learning, vol. 12 (2016)

[2] Frich, P., Alexander, L.V., Della-Marta, P., Gleason, B., Haylock, M., Tank, A.K., Peterson, T.: Observed coherent changes in climatic extremes during the second half of the twentieth century. Climate Research 19(3), 193–212 (2002)

[3] Zhang, X., Alexander, L., Hegerl, G.C., Jones, P., Tank, A.K., Peterson, T.C., Trewin, B., Zwiers, F.W.: Indices for monitoring changes in extremes based on daily temperature and precipitation data. Wiley Interdisciplinary Reviews: Climate Change 2(6), 851–870 (2011)
