WEATHER PREDICTION
USING PYTHON
IMPLEMENTATION
Project
Libraries
The Libraries
The libraries used are the most popular ones for data analysis, plotting,
and mathematical operations (pandas, matplotlib, numpy). In addition, there
are libraries for advanced data visualization (like folium) and libraries
specific to ARIMA models (like statsmodels).
DATA SET
The dataset is open source and was taken from Kaggle.
To select the cities in the dataset, the following lines of code are used:
city_data = data.drop_duplicates(['City'])
city_data.head()
To Know the longitude/latitude
If we want to plot these cities on a world map, we need the latitude and
longitude in a slightly different form. These few lines of code obtain them
by geocoding each city name:
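from geopy.geocoders import Nominatim

# Create the geocoder once and reuse it for every city
locator = Nominatim(user_agent="myGeocoder")
LAT = []
LONG = []
for city in city_data.City.tolist():
    location = locator.geocode(city)
    # geocode() returns None when the city is not found
    LAT.append(location.latitude if location else None)
    LONG.append(location.longitude if location else None)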
Preprocessing, Advanced Visualization, Stationarity
I’ve chosen to isolate Chicago and consider the data of that city to be my
dataset.
The target is the AverageTemperature column, which holds the average
temperature for each month. We have data from 1743 to 2013.
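Isolating one city can be sketched as follows. This is a minimal example, not the original code: the sample rows are made up, and the names data and chicago_data are assumptions. Only the City, dt and AverageTemperature columns come from the text.

```python
import pandas as pd

# Hypothetical sample rows standing in for the Kaggle table
data = pd.DataFrame({
    "dt": ["2012-01-01", "2012-02-01", "2012-01-01", "2012-02-01"],
    "AverageTemperature": [-1.5, 0.2, 25.1, None],
    "City": ["Chicago", "Chicago", "Lagos", "Lagos"],
})

# Keep only the Chicago rows and work on an independent copy
chicago_data = data[data["City"] == "Chicago"].copy()

# Parse the date strings so rows can be filtered by year later on
chicago_data["dt"] = pd.to_datetime(chicago_data["dt"])
chicago_data = chicago_data.sort_values("dt").reset_index(drop=True)

print(chicago_data[["dt", "AverageTemperature"]])
```

Taking a copy keeps later column changes from touching the full table.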
With this line we identify the NaN values and display them with a pie chart:
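A minimal sketch of this step, assuming the Chicago subset is a pandas DataFrame with an AverageTemperature column. The helper names nan_summary and plot_nan_pie and the sample values are hypothetical and are not the original line.

```python
import pandas as pd


def nan_summary(frame, column="AverageTemperature"):
    """Count missing and valid entries in one column (hypothetical helper)."""
    missing = int(frame[column].isna().sum())
    return pd.Series({"NaN": missing, "Valid": len(frame) - missing})


def plot_nan_pie(summary):
    """Draw the counts as a pie chart (hypothetical helper)."""
    # Imported here so the counting step does not need a plotting backend
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.pie(summary.values, labels=summary.index.tolist(), autopct="%1.1f%%")
    ax.set_title("NaN vs. valid values")
    return ax


# Made-up values, only to exercise the helpers
sample = pd.DataFrame({"AverageTemperature": [1.0, None, 3.5, None, 7.2]})
summary = nan_summary(sample)
print(summary)
```

Calling plot_nan_pie(summary) and then showing the figure with matplotlib displays the chart.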
SCATTER PLOT
Using this dataset it is possible to obtain a scatter plot like this one:
Used Functions
• get_timeseries(start_year,end_year) extracts the portion of the dataset
between the two years
• plot_timeseries(start_year,end_year) plots the timeseries extracted by
get_timeseries in a readable way
• plot_from_data(data, time, display_options) plots the data
(AverageTemperature) with respect to the time (dt) in a readable way. The
display options allow showing the ticks and changing the colors
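The three functions listed above could look roughly like this. It is a sketch under assumptions, not the original implementation: chicago_data is a made-up stand-in for the Chicago subset, the year range is treated as inclusive on both ends, and the display option keys "color" and "ticks" are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the Chicago subset, with a parsed dt column
chicago_data = pd.DataFrame({
    "dt": pd.to_datetime(["1990-01-01", "1995-07-01", "2000-01-01", "2005-07-01"]),
    "AverageTemperature": [-4.0, 23.5, -3.1, 24.0],
})


def get_timeseries(start_year, end_year):
    """Return the rows whose year lies in [start_year, end_year]."""
    years = chicago_data["dt"].dt.year
    mask = (years >= start_year) & (years <= end_year)
    return chicago_data[mask].reset_index(drop=True)


def plot_from_data(data, time, display_options=None):
    """Plot AverageTemperature against dt; the option keys are assumed."""
    import matplotlib.pyplot as plt  # imported lazily; only plotting needs it

    options = display_options or {}
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.plot(time, data, color=options.get("color", "navy"))
    if not options.get("ticks", True):
        ax.set_xticks([])  # hide the x ticks when requested
    ax.set_xlabel("dt")
    ax.set_ylabel("AverageTemperature")
    return ax


def plot_timeseries(start_year, end_year):
    """Extract the requested years and plot them."""
    subset = get_timeseries(start_year, end_year)
    ax = plot_from_data(subset["AverageTemperature"], subset["dt"])
    ax.set_title(f"{start_year} to {end_year}")
    return ax


print(get_timeseries(1992, 2001))
```

Keeping the extraction separate from the plotting lets the same subset feed both the charts and the model.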
ARIMA MODEL
To account for this non-stationarity, we will consider ARIMA models, which
include a differencing term.
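The effect of differencing can be illustrated on a toy series. The values below are made up and are not taken from the temperature dataset. They only show how subtracting consecutive values removes a trend.

```python
import numpy as np

# Made-up series with a linear trend: its mean changes over time,
# so it is not stationary
t = np.arange(10)
series = 5.0 + 2.0 * t

# First-order differencing: y'[t] = y[t] - y[t-1]
first_diff = np.diff(series, n=1)

# A quadratic trend needs two rounds of differencing to flatten
quadratic = 0.5 * t ** 2
second_diff = np.diff(quadratic, n=2)

print(first_diff)
print(second_diff)
```

In statsmodels, the ARIMA class takes order=(p, d, q), where d is the number of differencing rounds applied before the ARMA part is fitted.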
Statistical Models
Conclusions
These methods are extremely easy to adopt, as they do not require the
computational power that Deep Learning methods (RNN, CNN, …) do.
Nonetheless, the predictions fall within the error range defined by the
dataset itself. It is important to note that we have only examined
monthly average values, while it may be interesting to consider daily values
too and produce daily predictions.