Performing Analysis of
Meteorological Data
Punam Seal Aug 23 · 9 min read
Terms such as data science and data analytics all refer to extracting useful
information from the plentiful datasets now available across a wide
range of fields of natural and human activity. The ability to
leverage data to improve understanding has always been important, and it
is becoming more so as data becomes more readily available. Ever
wondered how news channels forecast the weather?
The answer is data science, which works in the background throughout
the weather prediction process. For individuals and organizations alike,
knowing the weather accurately matters a great deal.
OVERVIEW:
In this blog, I analyze a type of data that is easy to find on the net: the
Weather Dataset, which provides historical data on many meteorological
parameters such as pressure, temperature, humidity, wind speed, and
visibility. The dataset has hourly temperatures recorded over roughly
10 years, from 2006-04-01 00:00:00.000 +0200 to
2016-09-09 23:00:00.000 +0200. It corresponds to Finland, a country in
Northern Europe.
You can download the dataset from this Google Drive link:
https://drive.google.com/open?id=1ScF_1a-bkHi1qe8Rn78uxK6_5QwUD9Bu
GOAL:
Our goal is to transform the raw data into information and then into
knowledge: clean the data, then analyze it to test the given hypothesis.
The null hypothesis H0 is: “Do the apparent temperature and humidity,
compared monthly across 10 years of the data, indicate an increase due to
global warming?”
TOOLS USED:
DATA ANALYSIS:
import pandas as pd

# Load the hourly weather history (adjust the path to where you saved the CSV)
data = pd.read_csv(r"C:\Users\Admin\Downloads\weatherHistory.csv")
data
head() returns the first n rows (observe the index values). The default
number of rows to display is five, but you may pass a custom number.
tail() returns the last n rows (observe the index values). The default
number of rows to display is five, but you may pass a custom number.
data.head()
data.tail()
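As a small illustration of passing a custom number, here is a sketch on a made-up six-row frame. The name `toy` and its values are hypothetical and not taken from the weather dataset.

```python
import pandas as pd

# Hypothetical toy frame; the values are illustrative only.
toy = pd.DataFrame({"Humidity": [0.89, 0.86, 0.89, 0.83, 0.83, 0.85]})

first_three = toy.head(3)  # rows with index 0, 1, 2
last_two = toy.tail(2)     # rows with index 4, 5

print(first_three)
print(last_two)
```

Note that the index values are carried along, which is why tail() on the full dataset shows indices near the end of the frame.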
data.describe()
data.shape
Out:(96453, 11)
data.info()
data.isnull().sum()
9. While analyzing the data, you will often want to see the unique
values in a particular column, which can be done using the unique()
function. Here I have taken the Humidity column as an example, so we
can see all of its unique values.
data['Humidity'].unique()
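To make the behaviour concrete, here is a minimal sketch on a hypothetical five-value column (the `toy` frame is made up for illustration). It also shows nunique(), which returns only the count of distinct values.

```python
import pandas as pd

# Hypothetical toy column; the values are illustrative only.
toy = pd.DataFrame({"Humidity": [0.89, 0.86, 0.89, 0.83, 0.83]})

distinct = toy["Humidity"].unique()     # NumPy array, in order of first appearance
n_distinct = toy["Humidity"].nunique()  # number of distinct non-null values

print(distinct, n_distinct)
```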
Resampling Data
resample() is a convenience method for frequency conversion and
resampling of time series, so it is primarily used for time-series data.
The object must have a datetime-like index (DatetimeIndex, PeriodIndex, or
TimedeltaIndex), or the caller must pass the label of a datetime-like
series/index to the on / level keyword parameter.
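The code that builds resampled_data_monthly_mean is not shown in the text. Below is a minimal sketch of one way it could be produced. It assumes the timestamp column is named "Formatted Date" (an assumption; check your CSV header) and that monthly means at month-start frequency are wanted. The helper name `monthly_mean` is hypothetical.

```python
import pandas as pd

def monthly_mean(df, date_col="Formatted Date"):
    """Return monthly means of apparent temperature and humidity.

    `date_col` is an assumed column name; adjust it to match the CSV.
    """
    out = df.copy()
    # Parse the timestamps (which carry UTC offsets) into a single UTC-based index.
    out[date_col] = pd.to_datetime(out[date_col], utc=True)
    out = out.set_index(date_col)
    cols = ["Apparent Temperature (C)", "Humidity"]
    # "MS" is the month-start frequency: one row per calendar month.
    return out[cols].resample("MS").mean()

# Tiny synthetic sample (values are made up for illustration).
sample = pd.DataFrame({
    "Formatted Date": ["2006-04-15 10:00:00.000 +0200",
                       "2006-04-15 11:00:00.000 +0200",
                       "2006-05-15 12:00:00.000 +0200"],
    "Apparent Temperature (C)": [7.0, 9.0, 15.0],
    "Humidity": [0.8, 0.9, 0.6],
})
monthly = monthly_mean(sample)
print(monthly)
```

On the real data this would be called as resampled_data_monthly_mean = monthly_mean(data). Because the timestamps are converted to UTC, readings taken just after local midnight on the first day of a month fall into the previous month.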
resampled_data_monthly_mean.head()
resampled_data_monthly_mean.tail()
14. sns.distplot() plots the distribution of the resampled monthly means
for a particular column, here Apparent Temperature (C).
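import warnings
import seaborn as sns
import matplotlib.pyplot as plt

# Silence all warnings, including seaborn's deprecation notice for distplot
warnings.filterwarnings("ignore")
sns.distplot(resampled_data_monthly_mean['Apparent Temperature (C)'])
plt.show()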
Similarly, the same function plots the distribution of the Humidity
column.
sns.distplot(resampled_data_monthly_mean['Humidity'])
plt.show()
A seaborn regression plot draws a scatterplot of two variables, x and y,
then fits the regression model y ~ x and plots the resulting regression line
with a 95% confidence interval for that regression. Using this, I plotted
the relation between Apparent Temperature and Humidity.
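The plotting call for this step is not shown in the text. Here is a minimal sketch, assuming sns.regplot() was the function meant (lmplot() would fit the same description). The helper name `plot_regression` and the demo values are hypothetical.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

def plot_regression(df, x="Humidity", y="Apparent Temperature (C)"):
    """Scatter y against x with a fitted line and confidence band."""
    fig, ax = plt.subplots()
    # ci=95 is regplot's default; it is spelled out here for clarity.
    sns.regplot(data=df, x=x, y=y, ci=95, ax=ax)
    return ax

# Hypothetical monthly means, for illustration only.
demo = pd.DataFrame({
    "Humidity": [0.85, 0.80, 0.72, 0.65, 0.60, 0.63, 0.70, 0.78],
    "Apparent Temperature (C)": [-2.0, 1.5, 7.0, 13.0, 18.5, 17.0, 11.0, 4.0],
})
ax = plot_regression(demo)
```

With the real data this would be plot_regression(resampled_data_monthly_mean) followed by plt.show().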
sns.lineplot(data = resampled_data_monthly_mean)
plt.xlabel('Year')
plt.title("Variation of Apparent Temperature (C) and Humidity with time")
plt.show()
17. seaborn.pairplot() plots multiple pairwise bivariate distributions in
a dataset. It shows the relationship for each of the C(n, 2) pairs of
variables in a DataFrame as a matrix of plots, with univariate plots on
the diagonal. Using this function, I plotted the overall resampled mean
data with the kind parameter set to 'scatter'.
sns.pairplot(resampled_data_monthly_mean,kind = 'scatter')
plt.show()
Humidity_data
def label_color(month):
    # Map a month number (1-12) to a (name, color) pair
    if month == 1:
        return 'January','red'
    elif month == 2:
        return 'February','purple'
    elif month == 3:
        return 'March', 'orange'
    elif month == 4:
        return 'April','green'
    elif month == 5:
        return 'May','darkblue'
    elif month == 6:
        return 'June','violet'
    elif month == 7:
        return 'July','yellow'
    elif month == 8:
        return 'August','pink'
    elif month == 9:
        return 'September','black'
    elif month == 10:
        return 'October','brown'
    elif month == 11:
        return 'November','grey'
    else:
        return 'December','blue'
def sns_month(month):
    label = label_color(month)[0]
    plt.title('Apparent Temperature (C) & Humidity for {}'.format(label))
    plt.xlabel('Year')
    # Keep only the rows of the monthly means that fall in the given calendar month
    data = resampled_data_monthly_mean[resampled_data_monthly_mean.index.month == month]
    sns.lineplot(data = data, marker = 'o')
    plt.show()

for month in range(1,13):
    sns_month(month)
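The twelve plots above are read by eye. One way to put a number on each of them is a least-squares slope per calendar month. This is a sketch of an additional check, not part of the original analysis. The helper name `yearly_slope` and the synthetic data are hypothetical, and a slope alone says nothing about statistical significance or cause.

```python
import numpy as np
import pandas as pd

def yearly_slope(monthly, column, month):
    """Least-squares slope (units of `column` per year) for one calendar month."""
    sel = monthly.loc[monthly.index.month == month, column].dropna()
    years = sel.index.year.to_numpy(dtype=float)
    # Centre the years on the first one to keep the fit well conditioned.
    slope, _intercept = np.polyfit(years - years[0], sel.to_numpy(dtype=float), 1)
    return slope

# Synthetic monthly frame with a built-in trend of +0.1 per year (illustration only).
idx = pd.date_range("2006-01-01", "2016-12-01", freq="MS")
demo = pd.DataFrame(
    {"Apparent Temperature (C)": 0.1 * (idx.year.to_numpy() - 2006) + idx.month.to_numpy()},
    index=idx,
)
slopes = {m: yearly_slope(demo, "Apparent Temperature (C)", m) for m in range(1, 13)}
print(slopes)
```

With the real data, pass resampled_data_monthly_mean and either column name. A slope near zero for a month is consistent with no clear change over the period.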
GITHUB LINK:
https://github.com/punamseal14/Suven-Consultants-and-Technology-Tasks/blob/master/Performing%20Analysis%20of%20Meteorological%20Data/main.ipynb
CONCLUSION: