Professional Documents
Culture Documents
Dataset created for machine learning and deep learning training and teaching
purposes.
Can for instance be used for classification, regression, and forecasting tasks.
Complex enough to demonstrate realistic issues such as overfitting and unbalanced
data, while still remaining intuitively accessible.
Klein Tank, A.M.G. and Coauthors, 2002. Daily dataset of 20th-century surface
air temperature and precipitation series for the European Climate Assessment.
Int. J. of Climatol., 22, 1441-1453.
Data and metadata available at http://www.ecad.eu
-----------
Data collection selection and processing
----------
The initial meteorological data was retrieved from ECA&D [1] a project that makes
available daily
observations at meteorological stations throughout Europe and the Mediterranean. 18
different European
cities or places were selected of which multiple daily observations were available
through the
years 2000 to 2010. Those where Basel (Switzerland), Budapest (Hungary), Dresden,
Düsseldorf,
Kassel, München (all Germany), De Bilt and Maastricht (the Netherlands), Heathrow
(UK), Ljubljana (Slovenia),
Malmo and Stockholm (Sweden), Montélimar, Perpignan and Tours (France), Oslo
(Norway), Roma (Italy), and
Sonnblick (Austria).
After collecting the data, very basic cleaning of the data was performed. Columns
with > 5% invalid
entries (“-9999”) were removed, columns with <= 5% invalid entries where kept but
invalid entries
were replaced by mean values. This resulted in 165 variables (or features) over the
course of 3654 days.
Finally, we transformed several data units to achieve more similar ranges of the
present values.
This makes the data more suitable to be used for machine learning or deep learning
even without
additional processing. We deliberately did not chose to fully standardize the data
because we
wanted to keep the presented units and values as intuitively accessible as
possible. Temperature are
now given in degree Celsius, wind speed and gust in m/s, humidity in fraction of
100%, sea level
pressure in 1000 hPa, global radiation in 100 W/m2, precipitation amounts in
centimeter, sunshine in hours.
-----------
Physical units of the variables:
----------
CONVERTED to:
| Feature (type) | Column name | Description | Physical Unit
|
|------------------|----------------------|-----------------------|----------------
-|
| mean temperature | _temp_mean | mean daily temperature| in 1 °C
|
| max temperature | _temp_max | max daily temperature | in 1 °C
|
| min temperature | _temp_min | min daily temperature | in 1 °C
|
| cloud_cover | _cloud_cover | cloud cover | oktas
|
| wind_speed | _wind_gust | wind gust | in 1 m/s
|
| wind_gust | _wind_speed | wind speed | in 1 m/s
|
| humidity | _humidity | humidity | in 1 %
|
| pressure | _pressure | pressure | in 1000 hPa
|
| global_radiation | _global_radiation | global radiation | in 100 W/m2
|
| precipitation | _precipitation | daily precipitation | in 10 mm
|
| sunshine | _sunshine | sunshine hours | in 0.1 hours
|