
#program

import pandas as pd

import seaborn as sns

import matplotlib.pyplot as plt

# Load the weather data; replace 'weather.csv' with the path to your own CSV file

df = pd.read_csv('weather.csv')

# Encode the Yes/No columns 'RainToday' and 'RainTomorrow' as 1/0 (any other value becomes NaN)

df['RainToday'] = df['RainToday'].map({'No': 0, 'Yes': 1})

df['RainTomorrow'] = df['RainTomorrow'].map({'No': 0, 'Yes': 1})

# Descriptive Statistics

descriptive_stats = df[['MinTemp', 'MaxTemp', 'Rainfall', 'Evaporation']].describe()

print(descriptive_stats)

# Time Series Visualization for selected columns

time_series_columns = ['MinTemp', 'MaxTemp', 'Rainfall', 'Evaporation', 'Sunshine', 'WindGustSpeed',
                       'Humidity9am', 'Humidity3pm', 'Pressure9am', 'Pressure3pm', 'Temp9am', 'Temp3pm']

plt.figure(figsize=(12, 8))

sns.lineplot(data=df[time_series_columns])

plt.title('Time Series Visualization of Selected Weather Variables')

plt.xlabel('Data Points')

plt.ylabel('Values')

plt.show()

# Correlation Analysis

correlation_matrix = df[time_series_columns].corr()
plt.figure(figsize=(10, 8))

sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f")

plt.title('Correlation Matrix')

plt.show()

# Rainfall Distribution

plt.figure(figsize=(10, 6))

sns.histplot(df['Rainfall'], kde=True)

plt.title('Rainfall Distribution')

plt.xlabel('Rainfall')

plt.ylabel('Frequency')

plt.show()

# Seasonal Analysis

# The dataset has no season column, so 'Rainfall' is used as the grouping key (an assumption);
# this produces one group per distinct rainfall value

seasonal_data = df.groupby('Rainfall')[time_series_columns].mean()

seasonal_data.plot(kind='bar', figsize=(12, 8))

plt.title('Seasonal Analysis of Selected Weather Variables')

plt.xlabel('Season')

plt.ylabel('Average Values')

plt.show()

Output:
          MinTemp     MaxTemp    Rainfall  Evaporation
count  366.000000  366.000000  366.000000   366.000000
mean     7.265574   20.550273    1.428415     4.521858
std      6.025800    6.690516    4.225800     2.669383
min     -5.300000    7.600000    0.000000     0.200000
25%      2.300000   15.025000    0.000000     2.200000
50%      7.450000   19.650000    0.000000     4.200000
75%     12.500000   25.500000    0.200000     6.400000
max     20.900000   35.800000   39.800000    13.800000
Explanation:

Overview:
This Python script performs analysis and visualization of weather data using the pandas, seaborn,
and matplotlib libraries. The key components include basic statistics, time series visualization,
correlation analysis, rainfall distribution, and seasonal analysis.
Approach and Methodologies:
• Data Reading:
  • The script begins by reading the weather dataset from a CSV file ('weather.csv') using pandas.
• Data Preprocessing:
  • The categorical variables 'RainToday' and 'RainTomorrow' are mapped to numerical values ('No' to 0, 'Yes' to 1) for further analysis.
• Basic Statistics:
  • Basic statistics (count, mean, standard deviation, min, 25%, 50%, 75%, max) are calculated for specific columns ('MinTemp', 'MaxTemp', 'Rainfall', 'Evaporation').
• Time Series Visualization:
  • The selected weather variables are plotted as lines against the row index ('Data Points') to observe trends and patterns across the records.
• Correlation Analysis:
  • A correlation matrix and heatmap are generated to analyze the relationships between different weather variables.
• Rainfall Distribution:
  • A histogram with a kernel density estimate is created to visualize the distribution of rainfall values.
• Seasonal Analysis:
  • Because the dataset contains no season column, the script groups the rows by the 'Rainfall' column as a stand-in and plots the average value of each selected weather variable per group.
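The grouping step can be illustrated on a small synthetic sample. This is only a sketch: the sample values, the bin edges, and the 'RainLevel' column with its 'dry', 'light', and 'heavy' labels are hypothetical and do not come from weather.csv or from the script. It shows that grouping directly on 'Rainfall' yields one group per distinct value, and how binning with pd.cut could give a small number of categories instead.

```python
import pandas as pd

# Small synthetic sample; the values are illustrative, not taken from weather.csv
sample = pd.DataFrame({
    "Rainfall": [0.0, 0.0, 0.2, 1.4, 5.0, 39.8],
    "MaxTemp": [25.0, 30.0, 20.0, 18.0, 15.0, 12.0],
})

# Grouping directly on Rainfall gives one group per distinct rainfall value
per_value = sample.groupby("Rainfall")["MaxTemp"].mean()

# Hypothetical bins: dry (0 mm), light (up to 1 mm), heavy (above 1 mm)
bins = [-0.1, 0.0, 1.0, float("inf")]
labels = ["dry", "light", "heavy"]
sample["RainLevel"] = pd.cut(sample["Rainfall"], bins=bins, labels=labels)

# Average MaxTemp per rainfall category instead of per distinct value
per_level = sample.groupby("RainLevel", observed=True)["MaxTemp"].mean()

print(per_value)
print(per_level)
```

Neither grouping is a true seasonal analysis; a column identifying the month or season would be needed for that.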
Challenges:
• Data Quality:
  • The script assumes a clean dataset and does not handle potential data quality issues such as missing values.
• Seasonal Analysis Assumption:
  • The seasonal analysis assumes the 'Rainfall' column is suitable for this purpose. Grouping by it produces one group per distinct rainfall value rather than per season, so additional domain knowledge might be necessary.
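A basic data quality check could look like the sketch below. The sample frame is synthetic and the variable names (missing_before, unmapped_count) are hypothetical; the original script performs no such check. It counts missing values per column and shows that map() returns NaN for any value not present in the mapping dictionary, such as a lowercase 'yes'.

```python
import pandas as pd

# Synthetic sample with a missing entry and an inconsistently cased label
sample = pd.DataFrame({
    "RainToday": ["No", "Yes", None, "yes"],
    "Sunshine": [7.5, None, 3.2, 9.1],
})

# Missing values per column before any transformation
missing_before = sample.isna().sum()

# map() yields NaN for values absent from the dictionary (here None and "yes")
mapped = sample["RainToday"].map({"No": 0, "Yes": 1})
unmapped_count = int(mapped.isna().sum())

print(missing_before)
print("Rows left unmapped:", unmapped_count)
```

Comparing the count of missing values before and after the mapping is one way to notice labels that were silently dropped.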
