
Mumbai University

TPCT’s, TERNA ENGINEERING COLLEGE (TEC), NAVI MUMBAI

PART A
(PART A: TO BE REFERRED BY STUDENTS)

Experiment No. 09
A.1 Aim:
Case study on weather prediction

A.2 Prerequisite: Knowledge of Python.

A.3 Outcome:
After successful completion of this experiment, students will be able to analyze a case study.

Ever wondered how news channels predict the weather accurately? The answer is data science, which works in the background throughout the weather prediction process. For individuals and organizations alike, knowing the weather accurately is extremely valuable.
Many businesses are directly or indirectly linked to climatic conditions. For instance, agriculture relies on weather forecasts to plan when to plant, irrigate and harvest. Similarly, other sectors such as construction and airport control authorities depend on weather forecasts. With their help, businesses can plan more accurately and with fewer disruptions.

Advantages of Weather Forecasting

Below are the essential benefits of weather forecasting:
● People are warned in advance about what the weather will be like on a particular day.

● It helps people take proper precautions to protect themselves and their families in case of adverse events.
● Organizations can work better with the help of accurate weather predictions.
● It delivers forecasts in visual formats, such as maps and charts, which many companies prefer.
● Weather forecasting greatly benefits the agriculture sector, for example in timing the buying and selling of livestock.
● It also helps farmers decide when to plant crops and pastures and when to irrigate. For example, a system that reports that the soil is dry but advises against irrigating because rain is expected within a few hours is an interesting use case.
● It supports inventory management, selling strategies and crop forecasts.

● It provides the business with valuable information that the business can use to make
decisions about future business strategies.
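The irrigation use case in the list above is essentially a decision rule that combines a soil reading with a forecast. The sketch below illustrates that idea only; the function name and all thresholds (`DRY_SOIL_THRESHOLD`, `RAIN_PROB_THRESHOLD`, `RAIN_WINDOW_HOURS`) are hypothetical values chosen for the example, not taken from any real advisory system.

```python
# Hypothetical thresholds, chosen only for illustration.
DRY_SOIL_THRESHOLD = 0.20   # volumetric soil moisture below which the soil counts as dry
RAIN_PROB_THRESHOLD = 0.70  # forecast probability of rain treated as "rain is likely"
RAIN_WINDOW_HOURS = 6       # how soon the rain must arrive to replace irrigation


def should_irrigate(soil_moisture: float, rain_probability: float, hours_until_rain: float) -> bool:
    """Return True if the field should be irrigated now (hypothetical rule)."""
    if soil_moisture >= DRY_SOIL_THRESHOLD:
        return False  # soil is not dry, so there is no need to irrigate
    rain_is_coming = (rain_probability >= RAIN_PROB_THRESHOLD
                      and hours_until_rain <= RAIN_WINDOW_HOURS)
    # Dry soil but rain expected soon: skip irrigation and let the rain do the work.
    return not rain_is_coming


print(should_irrigate(0.12, 0.85, 3))   # False: dry, but rain expected in 3 hours
print(should_irrigate(0.12, 0.10, 24))  # True: dry and no rain expected
```

A real system would take the rain probability and timing from a forecast service and calibrate the thresholds per crop and soil type.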
Weather forecasts are made by collecting as much data as possible about the current state of the atmosphere (particularly temperature, humidity and wind) and using an understanding of atmospheric processes to determine how the atmosphere will evolve.

1. Predictive Modeling and Machine Learning

Weather models are at the heart of forecasting; they are used both to produce forecasts and to reconstruct historical data. Over the last decade, however, machine learning has increasingly been applied in atmospheric science.
Machine learning takes weather data and learns relationships between the available inputs (predictors) and the quantities to be forecast. ML can help improve physically grounded models, and combining both approaches can yield more accurate results. In practice, forecasts are produced on large computer systems using a combination of physical models, ML and measured data.
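One common way to combine a physical model with ML is statistical post-processing (known in meteorology as Model Output Statistics): a model learns how the physical forecast systematically differs from what was actually observed and corrects future forecasts. The sketch below is a minimal illustration on synthetic data; the bias pattern, the numbers and the variable names are made up, and a plain linear fit (`np.polyfit`) stands in for more elaborate ML models.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 2000

# Synthetic "truth": observed daily maximum temperatures in deg C.
observed = rng.uniform(5, 30, size=n_days)
# Synthetic physical-model forecast: systematically scaled and shifted, plus random error.
physical_forecast = 0.8 * observed + 4.0 + rng.normal(0, 1.0, size=n_days)

train, test = slice(0, 1500), slice(1500, n_days)

# Learn the correction observed ~ a * forecast + b from the training period only.
a, b = np.polyfit(physical_forecast[train], observed[train], deg=1)
corrected = a * physical_forecast[test] + b


def rmse(pred, truth):
    """Root mean squared error between two arrays."""
    return float(np.sqrt(np.mean((pred - truth) ** 2)))


raw_rmse = rmse(physical_forecast[test], observed[test])
corrected_rmse = rmse(corrected, observed[test])
print(f"raw RMSE: {raw_rmse:.2f}  corrected RMSE: {corrected_rmse:.2f}")
```

The correction is fitted only on past (training) days and evaluated on later days, mirroring how it would be used operationally.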

Over the last few years, data scientists have come to realize that, for the foreseeable future, they will need ML and predictive models to deliver forecasts that are as accurate as possible. As the saying goes, Artificial Intelligence (AI) is the next step in guarding against storms!

2. Data – A Crucial Part of Weather Predictions

Accurate decisions require the right data. Every observation must be recorded together with its location and the time at which it was taken.

Today, many devices are IoT-enabled, with gyroscopes, barometers and all sorts of other sensors built in, so readings tagged with their location are widely available. Mobile phones have therefore revolutionized the weather analytics industry.
Weather data has to be used within minutes, because what matters most is not what happened in the past but what is happening now and what will happen next. To produce meaningful information, data therefore has to flow in, be processed and be refreshed quickly, within minutes.
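The two requirements above (every reading tagged with location and time, and only recent data kept) can be illustrated with a small time-windowed buffer. This is a minimal sketch under stated assumptions: the `Reading` fields, the station IDs and coordinates, and the 10-minute window are all hypothetical, and readings are assumed to arrive in timestamp order.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Deque, Optional


@dataclass
class Reading:
    station_id: str        # hypothetical device/station identifier
    lat: float             # where the reading was taken
    lon: float
    timestamp: datetime    # when the reading was taken
    temperature_c: float


class RecentReadings:
    """Keeps only readings younger than `max_age`; older readings are discarded.

    Assumes readings are added in timestamp order.
    """

    def __init__(self, max_age: timedelta) -> None:
        self.max_age = max_age
        self._buffer: Deque[Reading] = deque()

    def add(self, reading: Reading) -> None:
        self._buffer.append(reading)
        self._evict(reading.timestamp)

    def _evict(self, now: datetime) -> None:
        # The oldest readings sit at the left end; drop them once they leave the window.
        while self._buffer and now - self._buffer[0].timestamp > self.max_age:
            self._buffer.popleft()

    def mean_temperature(self, now: datetime) -> Optional[float]:
        self._evict(now)
        if not self._buffer:
            return None
        return sum(r.temperature_c for r in self._buffer) / len(self._buffer)


window = RecentReadings(max_age=timedelta(minutes=10))
t0 = datetime(2024, 1, 1, 12, 0)
window.add(Reading("A", 19.07, 72.88, t0, 28.0))
window.add(Reading("B", 19.03, 73.02, t0 + timedelta(minutes=5), 30.0))
window.add(Reading("A", 19.07, 72.88, t0 + timedelta(minutes=12), 32.0))
print(window.mean_temperature(t0 + timedelta(minutes=12)))  # 31.0
```

At minute 12 the first reading is 12 minutes old and has dropped out of the 10-minute window, so only the last two readings are averaged.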

3. Weather Data – An Aid for Many Events

● Prediction of Floods and Natural Disasters – Floods and other natural disasters can be predicted with models built on weather data. This requires collecting data such as the surrounding road conditions and the area's rainfall for the year.
● Sports – In sports such as cricket, rainfall can delay a match or even force it to be abandoned midway. Weather forecasts can help schedule matches in advance to reduce the chance of interruptions.
● Predict Asthma Attacks – Weather data can help predict serious medical events such as asthma attacks. Some inhalers have sensors that gather data to ensure patients use them properly. Combined with data on temperature, humidity, air quality and dust levels in the areas where the patient spends the most time, this information can help reduce the chances of an attack by predicting where asthma is likely to be triggered.
● Predict Car Sales – Car dealers can even use weather data to anticipate sales under particular climatic conditions. For example, in the rainy season people are reluctant to travel in the rain but still have to go out for work or other reasons, and some end up buying a car.

4. Satellite Imagery and Sensor Data

Today, the primary data source in atmospheric science is satellite imagery, and that does not just mean pretty pictures!

Satellite imagery comes in many forms. Some satellites capture black-and-white images, some are useful for identifying and measuring clouds, and others measure winds over the oceans.

Many data scientists rely on satellite imagery to generate short-term forecasts, to check whether a forecast was correct and to validate models.

Machine learning is also used here for pattern matching. If the model recognizes a pattern that has appeared in the past, what followed that pattern can be used to predict what will happen next.
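A simple form of this pattern matching is analog (nearest-neighbour) forecasting: search the historical record for the window that looks most like the latest observations and use the value that followed it as the forecast. The sketch below uses Euclidean distance and a made-up temperature series; the function name and data are illustrative assumptions, not a specific operational method.

```python
import numpy as np


def analog_forecast(history: np.ndarray, recent: np.ndarray) -> float:
    """Return the value that followed the past window most similar to `recent`."""
    w = len(recent)
    if len(history) <= w:
        raise ValueError("history must be longer than the pattern being matched")
    best_start, best_dist = 0, np.inf
    # Slide a window over the history; stop early enough that a "next value" exists.
    for start in range(len(history) - w):
        window = history[start:start + w]
        dist = np.linalg.norm(window - recent)  # Euclidean distance between patterns
        if dist < best_dist:
            best_start, best_dist = start, dist
    return float(history[best_start + w])


# Hypothetical daily temperatures with a repeating pattern.
history = np.array([10, 12, 15, 11, 10, 12, 15, 11, 10], dtype=float)
print(analog_forecast(history, np.array([10.0, 12.0, 15.0])))  # 11.0
```

Real analog methods compare whole fields (for example satellite images or pressure maps) and usually average several close analogs rather than taking just one.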

Sensor data from reliable equipment is mostly used to make local-level predictions and to ground-truth weather models, that is, to check model output against actual measurements.
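Ground-truthing usually comes down to comparing forecasts with station measurements using a few standard error scores. The sketch below computes mean bias, mean absolute error (MAE) and root mean squared error (RMSE); the four forecast/observation pairs are invented for illustration.

```python
import numpy as np


def verification_scores(forecast, observed):
    """Compare model forecasts against sensor observations."""
    forecast = np.asarray(forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    error = forecast - observed
    return {
        "bias": float(error.mean()),                # > 0 means the model runs high on average
        "mae": float(np.abs(error).mean()),         # typical size of an error
        "rmse": float(np.sqrt((error ** 2).mean())),  # penalizes large errors more
    }


scores = verification_scores([21.0, 23.5, 19.0, 25.0], [20.0, 24.0, 18.0, 23.0])
print(scores)  # {'bias': 0.875, 'mae': 1.125, 'rmse': 1.25}
```

A positive bias like this one suggests the model tends to forecast slightly too high at this station.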

PART B
(PART B: TO BE COMPLETED BY STUDENTS)

(Students must submit the soft copy as per the following segments within two hours of the practical. The soft copy must be uploaded on Blackboard, or emailed to the concerned lab-in-charge faculty at the end of the practical in case there is no Blackboard access available.)
Roll No.: 65 Name: Ketki Kulkarni
Class: BE-A Batch: A4
Date of Experiment: Date of Submission:
Grade:

B.1 Observations and learning:


Code:

# -*- coding: utf-8 -*-
"""ADS_EXP_9_Case_Study.ipynb

Automatically generated by Colaboratory.

Original file is located at
    https://colab.research.google.com/drive/1Q_2FRTOYiAxJmnrwQMTa4bBrXqYxaHHW
"""

# Case study on weather prediction
# The machine learning models used are:
# 1. K-Nearest Neighbour (KNN)
# 2. Support Vector Machine (SVM)
# 3. Gradient Boosting
# 4. Extreme Gradient Boosting (XGBoost)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy

import re
import warnings

import missingno as mso
from scipy import stats
from scipy.stats import ttest_ind, pearsonr
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingClassifier
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Load the Seattle weather dataset
# (columns: date, precipitation, temp_max, temp_min, wind, weather)
data = pd.read_csv("seattle-weather.csv")
data.head()
data.shape

warnings.filterwarnings('ignore')

# Class balance of the target variable
sns.countplot(x="weather", data=data, palette="hls")

# Percentage of days in each weather class ("{:.2f}" = two decimal places)
for label in ["rain", "sun", "drizzle", "snow", "fog"]:
    share = (data.weather == label).mean() * 100
    print("Percent of {}: {:.2f}%".format(label.capitalize(), share))

data[["precipitation", "temp_max", "temp_min", "wind"]].describe()

# Distribution of each numeric feature
sns.set(style="darkgrid")
fig, axs = plt.subplots(2, 2, figsize=(10, 8))
sns.histplot(data=data, x="precipitation", kde=True, ax=axs[0, 0], color='green')
sns.histplot(data=data, x="temp_max", kde=True, ax=axs[0, 1], color='red')
sns.histplot(data=data, x="temp_min", kde=True, ax=axs[1, 0], color='skyblue')
sns.histplot(data=data, x="wind", kde=True, ax=axs[1, 1], color='orange')

# Feature distributions per weather class
plt.figure(figsize=(10, 8))
sns.boxplot(x="precipitation", y="weather", data=data, palette="YlOrBr")
plt.figure(figsize=(10, 8))
sns.boxplot(x="temp_max", y="weather", data=data, palette="inferno")
plt.figure(figsize=(10, 8))
sns.boxplot(x="wind", y="weather", data=data, palette="inferno")

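plt.figure(figsize=(10, 8))
sns.boxplot(x="temp_min", y="weather", data=data, palette="YlOrBr")

"""**HEATMAP:**
"""

# Correlation between numeric columns only (date and weather are not numeric)
plt.figure(figsize=(12, 7))
sns.heatmap(data.corr(numeric_only=True), annot=True, cmap='coolwarm')

# Pairwise relationships: Pearson correlation and two-sample t-test
data.plot("precipitation", "temp_max", style='o')
print("Pearson correlation:", data["precipitation"].corr(data["temp_max"]))
print("T Test and P value:", stats.ttest_ind(data["precipitation"], data["temp_max"]))

data.plot("wind", "temp_max", style='o')
print("Pearson correlation:", data["wind"].corr(data["temp_max"]))
print("T Test and P value:", stats.ttest_ind(data["wind"], data["temp_max"]))

data.plot("temp_max", "temp_min", style='o')

# Missing values per column
data.isna().sum()
plt.figure(figsize=(10, 8))
axz = plt.subplot(1, 2, 2)
mso.bar(data.drop(["date"], axis=1), ax=axz, fontsize=12)

# Remove outliers with the 1.5 * IQR rule, applied to the numeric features only
df = data.drop(["date"], axis=1)
num_cols = ["precipitation", "temp_max", "temp_min", "wind"]
Q1 = df[num_cols].quantile(0.25)
Q3 = df[num_cols].quantile(0.75)
IQR = Q3 - Q1
outlier = ((df[num_cols] < (Q1 - 1.5 * IQR)) | (df[num_cols] > (Q3 + 1.5 * IQR))).any(axis=1)
df = df[~outlier].copy()

# Square-root transform to reduce the right skew of precipitation and wind
df["precipitation"] = np.sqrt(df["precipitation"])
df["wind"] = np.sqrt(df["wind"])
sns.set(style="darkgrid")
fig, axs = plt.subplots(2, 2, figsize=(10, 8))
sns.histplot(data=df, x="precipitation", kde=True, ax=axs[0, 0], color='green')
sns.histplot(data=df, x="temp_max", kde=True, ax=axs[0, 1], color='red')
sns.histplot(data=df, x="temp_min", kde=True, ax=axs[1, 0], color='skyblue')
sns.histplot(data=df, x="wind", kde=True, ax=axs[1, 1], color='orange')
df.head()

# Encode weather labels as integers (LabelEncoder numbers the remaining classes alphabetically)
lc = LabelEncoder()
df["weather"] = lc.fit_transform(df["weather"])
df.head()

# Features: precipitation, temp_max, temp_min, wind. Keep them as floats;
# casting to int would truncate the square-root-transformed values.
x = df.loc[:, df.columns != "weather"].astype(float).values
y = df["weather"].values
df.weather.unique()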

# Hold out 10% of the rows for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=2)

"""**K-NEAREST NEIGHBOR CLASSIFIER:**
"""

knn = KNeighborsClassifier()
knn.fit(x_train, y_train)
print("KNN Accuracy: {:.2f}%".format(knn.score(x_test, y_test) * 100))

"""**SUPPORT VECTOR MACHINE - CLASSIFIER:**
"""

svm = SVC()
svm.fit(x_train, y_train)
print("SVM Accuracy: {:.2f}%".format(svm.score(x_test, y_test) * 100))

"""**GRADIENT BOOSTING CLASSIFIER:**
"""

gbc = GradientBoostingClassifier(subsample=0.5, n_estimators=450, max_depth=5, max_leaf_nodes=25)
gbc.fit(x_train, y_train)
print("Gradient Boosting Accuracy: {:.2f}%".format(gbc.score(x_test, y_test) * 100))

"""**EXTREME GRADIENT BOOSTING OR XGBCLASSIFIER:**
"""

xgb = XGBClassifier()
xgb.fit(x_train, y_train)
print("XGB Accuracy: {:.2f}%".format(xgb.score(x_test, y_test) * 100))

# Predict the weather for one new day.
# Feature order: sqrt(precipitation), temp_max, temp_min, sqrt(wind)
sample = np.array([[1.140175, 8.9, 2.8, 2.469818]])

# inverse_transform maps the integer prediction back to its weather label
ot = xgb.predict(sample)
print("The weather is:", lc.inverse_transform(ot)[0].capitalize())

ot1 = gbc.predict(sample)
print("The weather is:", lc.inverse_transform(ot1)[0].capitalize())

Output:

Weather prediction using data science and machine learning involves leveraging historical
weather data, along with other relevant data sources, to develop predictive models that can
forecast various meteorological variables such as temperature, humidity, wind speed,
precipitation, and more. Data science and machine learning techniques are used to analyze and
process large volumes of weather data, identify patterns, and make predictions based on
historical and real-time data.

The process of weather prediction using data science and machine learning typically involves the
following steps:

1. Data Collection: Historical weather data, including meteorological variables such as temperature, humidity, wind speed, precipitation, and pressure, is collected from various sources such as weather stations, satellites, and other relevant data sources. Real-time weather data is also collected continuously to incorporate the latest information into the prediction models.

2. Data Preprocessing: The collected weather data is cleaned, preprocessed, and transformed to ensure its quality and usability. This step involves handling missing values, outlier detection, and feature engineering to extract relevant features from the data. Data normalization and scaling may also be applied to bring the features to a consistent scale.
3. Exploratory Data Analysis (EDA): EDA is performed to gain insights into the data and understand the patterns and trends in the weather data. Statistical analysis and data visualization techniques are used to identify patterns, correlations, and relationships within the data.

4. Model Development: Various machine learning algorithms such as linear regression, decision trees, random forests, gradient boosting, support vector machines (SVM), and deep learning models such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs) are used to develop predictive models. The preprocessed weather data is used to train and evaluate these models, and hyperparameter tuning may be applied to optimize the model's performance.

5. Model Evaluation: The developed models are evaluated to assess their accuracy and reliability, using metrics such as mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R-squared) for continuous quantities such as temperature, or accuracy, precision, and recall for categorical outcomes such as the weather type. Cross-validation techniques may also be used to validate the model's performance on unseen data.

6. Model Deployment: Once the best-performing model is identified, it can be deployed in an operational weather prediction system. The model can be integrated into existing weather prediction workflows, where it receives real-time weather data, processes it using the trained machine learning model, and generates weather forecasts for different locations and timeframes.

7. Model Monitoring and Update: The deployed model needs to be continuously monitored
to ensure its performance and accuracy. Regular updates may be needed to incorporate
new data, improve the model, and adapt to changing weather patterns and dynamics.
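Steps 2, 4 and 5 can be chained in scikit-learn so that scaling, training and evaluation happen together. The experiment's listing imports `StandardScaler` and `classification_report` but never uses them, even though distance-based models such as KNN and SVM are sensitive to feature scale. The sketch below shows one way to combine them; it uses synthetic data from `make_classification` as a stand-in for four weather features and three weather classes, so its scores say nothing about the Seattle dataset.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for 4 numeric weather features and 3 weather classes.
X, y = make_classification(n_samples=600, n_features=4, n_informative=3, n_redundant=0,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Preprocessing + model in one object: the scaler is fitted only on training folds.
model = make_pipeline(StandardScaler(), SVC())

# Model evaluation: 5-fold cross-validated accuracy on the training portion.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# Final fit, then per-class precision, recall and F1 on the held-out data.
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```

Putting the scaler inside the pipeline avoids leaking statistics from the validation folds into training.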

B.2 Conclusion:
Hence, a case study on weather prediction has been performed. Weather prediction is a
complex task that involves the analysis of a large amount of data, including atmospheric
conditions, historical weather patterns, and other meteorological factors. Accurate weather
prediction is crucial for a wide range of applications, including agriculture, transportation, energy
management, and disaster preparedness.
B.3 Question of Curiosity
Q1: How can fraud detection be done?

Answer: Fraud detection leverages machine learning, statistical analysis, and behavior
monitoring to identify the patterns and strategies used by criminals to commit fraud. When
precursors to fraud are identified, the system can stop fraudulent activity before any damage
occurs.

For fraud detection to work, the system must first study instances of known fraud. Some of the
most common fraud detection learning models are:

Supervised Classification:

Supervised learning works by training an algorithm to detect fraud from historical data. Training uses existing datasets in which each record is already labeled (for example, as fraudulent or legitimate). Using this labeled past data, researchers can also measure how well an algorithm performs at detecting fraud.

Supervised learning allows researchers to control what the fraud detection system learns and
provides a simple framework for testing and debugging the machine learning process.
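A minimal supervised sketch, assuming transactions have already been turned into numeric features and labeled (1 = fraud). The data is synthetic (`make_classification` with about 3% positives), and logistic regression is used only as a simple example classifier. Because fraud is rare, the sketch reports precision and recall rather than accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic labeled transactions: class 1 = fraud, roughly 3% of rows.
X, y = make_classification(n_samples=5000, n_features=6, n_informative=4,
                           weights=[0.97, 0.03], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# class_weight="balanced" gives the rare fraud class more weight during training.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

# Accuracy is misleading when fraud is rare, so report precision and recall for class 1.
precision = precision_score(y_test, pred)
recall = recall_score(y_test, pred)
print(f"precision: {precision:.3f}  recall: {recall:.3f}")
```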

Unsupervised Classification:

Unsupervised learning sorts unlabeled data into clusters based on the relationships between data points. This can reveal hidden relationships, including precursors to fraudulent activity.

This methodology eliminates the need for data to be labeled, which can be time-consuming and
expensive. The drawback is that the algorithm may learn unnecessary patterns in the process that
do not help detect fraud.
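As an unsupervised example, the sketch below uses scikit-learn's `IsolationForest`, an anomaly detector (a technique chosen here for illustration, not a clustering method) that isolates unusual points without any fraud labels. The two features (amount and hour of day), the synthetic transactions and the `contamination` value are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)

# Unlabeled synthetic transactions: [amount, hour_of_day].
normal = np.column_stack([rng.normal(50, 15, 500), rng.normal(14, 3, 500)])
odd = np.array([[950.0, 3.0], [1200.0, 2.0]])  # very large amounts in the middle of the night
X = np.vstack([normal, odd])

# contamination = assumed share of anomalies; no fraud labels are used anywhere.
iso = IsolationForest(contamination=0.01, random_state=0).fit(X)
labels = iso.predict(X)         # -1 = anomaly, 1 = normal
scores = -iso.score_samples(X)  # higher = more anomalous

print("flagged rows:", np.where(labels == -1)[0])
```

Flagged rows still need review: as noted above, an unsupervised model can also flag patterns that are unusual but not fraudulent.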

Different analytic models can be used to identify the predictors of fraud based on the past actions
of criminals through statistical analysis. With this data, fraud prevention systems can assign
certain behaviors a risk score.
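One simple statistical risk score compares a new transaction with the customer's own history, for example how many standard deviations the amount lies above their usual spend (a z-score). The sketch below is a hypothetical single-signal score; real systems combine many such signals.

```python
import statistics
from typing import List


def amount_risk_score(past_amounts: List[float], new_amount: float) -> float:
    """Standard deviations by which `new_amount` exceeds the customer's usual spend (0 if below)."""
    mean = statistics.mean(past_amounts)
    stdev = statistics.stdev(past_amounts)  # sample standard deviation; needs >= 2 amounts
    if stdev == 0:
        return 0.0 if new_amount <= mean else float("inf")
    return max(0.0, (new_amount - mean) / stdev)


history = [40.0, 55.0, 50.0, 45.0, 60.0]
print(amount_risk_score(history, 52.0))   # low score: close to the usual spend
print(amount_risk_score(history, 300.0))  # very high score: far above the usual spend
```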

Q2: Illustrate the data science lifecycle for house price prediction.

Answer: Data science lifecycle for house price prediction:



1. Problem Definition: Define the problem of predicting house prices based on relevant
features.
2. Data Collection: Gather relevant data, such as historical house sale records, location data,
and housing features.
3. Data Preprocessing: Clean, transform, and preprocess the data by handling missing
values, correcting errors, and normalizing data.
4. Exploratory Data Analysis (EDA): Conduct EDA to gain insights into the data, visualize
data distributions, and identify patterns.
5. Feature Engineering: Engineer relevant features or variables to capture important
information for house price prediction.
6. Model Selection: Choose appropriate machine learning algorithms, such as linear
regression or decision trees, for building predictive models.
7. Model Training: Train the selected model using preprocessed data, by fitting the model to
training data and tuning parameters for optimization.
8. Model Evaluation: Evaluate the performance of the trained model using metrics such as
MSE, RMSE, R-squared, and cross-validation.
9. Model Deployment: Deploy the trained model into a production environment, such as a
web application or API.
10. Model Monitoring and Maintenance: Continuously monitor and update the deployed
model for accuracy and relevance over time.
11. Model Interpretation: Interpret the model to understand the importance of different
features in predicting house prices.
12. Reporting: Present results and insights to relevant stakeholders in a clear and understandable manner.

The data science lifecycle for house price prediction involves defining the problem,
collecting data, preprocessing data, conducting EDA, engineering features, selecting and
training a model, evaluating model performance, deploying the model, monitoring and
maintaining the model, interpreting model results, and reporting findings to stakeholders.
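A compressed sketch of the modelling stages of this lifecycle (training, evaluation and interpretation) is shown below. It uses synthetic data with three hypothetical features (area, bedrooms, building age) and prices in arbitrary currency units; linear regression is used because it is one of the algorithms named in step 6 and its coefficients are easy to interpret.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 300

# Stand-in for data collection, preprocessing and feature engineering: synthetic features.
area = rng.uniform(400, 2000, n)   # hypothetical carpet area, sq ft
bedrooms = rng.integers(1, 5, n)   # 1 to 4 bedrooms
age = rng.uniform(0, 30, n)        # building age, years
price = 5000 * area + 200000 * bedrooms - 30000 * age + rng.normal(0, 300000, n)
X = np.column_stack([area, bedrooms, age])

# Model selection and training.
X_train, X_test, y_train, y_test = train_test_split(X, price, test_size=0.2, random_state=1)
model = LinearRegression().fit(X_train, y_train)

# Model evaluation on held-out data.
pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, pred))
r2 = r2_score(y_test, pred)
print(f"RMSE: {rmse:,.0f}  R^2: {r2:.3f}")

# Model interpretation: estimated price change per unit of each feature.
for name, coef in zip(["area", "bedrooms", "age"], model.coef_):
    print(f"{name}: {coef:,.0f}")
```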

Q3: What is customer segmentation? What are its different types?

Answer: Customer segmentation is the process of dividing a customer base into distinct groups or segments based on similar characteristics, behaviors, or preferences. The goal of customer segmentation is to better understand and cater to the unique needs and preferences of different customer groups, in order to optimize marketing strategies, product development, and customer relationship management.

There are various types of customer segmentation, including:

1. Demographic segmentation: Dividing customers based on demographic characteristics such as age, gender, income, occupation, education, and marital status. This type of segmentation is commonly used in marketing and advertising to tailor messages and offers to specific demographic groups.

2. Geographic segmentation: Segmenting customers based on their geographic location, such as country, region, city, or postal code. This type of segmentation can be useful for businesses that have different product offerings or marketing strategies based on regional or local preferences.

3. Psychographic segmentation: Dividing customers based on their lifestyles, interests, hobbies, values, opinions, attitudes, and behaviors. This type of segmentation focuses on understanding the psychological and behavioral aspects of customers, and is often used to develop targeted marketing campaigns.

4. Behavioral segmentation: Segmenting customers based on their behaviors, actions, usage patterns, purchasing habits, loyalty, and engagement with a brand or product. This type of segmentation is based on actual customer behaviors and can provide insights into customer preferences, needs, and motivations.

5. Purchase history segmentation: Segmenting customers based on their past purchase behaviors, such as frequency, recency, monetary value, and product preferences. This type of segmentation is commonly used in e-commerce and retail industries to identify high-value customers, loyal customers, and dormant customers.

6. B2B segmentation: Segmenting business customers based on firmographics, such as industry, company size, revenue, location, and purchasing patterns. This type of segmentation is often used in B2B marketing to customize product offerings, pricing, and marketing strategies for different business customers.

7. Hybrid segmentation: Combining multiple types of segmentation methods to create more refined and targeted customer segments. For example, combining demographic, geographic, and psychographic segmentation to create segments of "young, urban, tech-savvy professionals" or "senior, suburban, environmentally conscious homeowners".

Customer segmentation allows businesses to better understand and cater to the unique needs and
preferences of different customer groups, and to tailor their marketing strategies, product
development, and customer relationship management efforts accordingly.
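Purchase history segmentation is often done with RFM features (recency, frequency, monetary value) and a clustering algorithm. The sketch below applies k-means to a tiny hypothetical customer table; the column names, values and the choice of three clusters are assumptions for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-customer summary: days since last order, orders per year, total spend.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6, 7, 8],
    "recency_days": [5, 8, 3, 200, 180, 250, 30, 25],
    "frequency": [40, 35, 50, 2, 3, 1, 12, 10],
    "monetary": [5200, 4800, 6100, 150, 300, 90, 1500, 1300],
})

features = customers[["recency_days", "frequency", "monetary"]]
scaled = StandardScaler().fit_transform(features)  # put R, F and M on comparable scales

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
customers["segment"] = kmeans.fit_predict(scaled)
print(customers.sort_values("segment"))
```

The cluster numbers themselves are arbitrary; a marketer would inspect each cluster and name it (for example high-value, occasional, dormant).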
