Sea Level Predictor Project Overview

The document outlines a project titled 'Sea Level Predictor' developed by Mitali Hirapara and Dhwani Trivedi as part of their M.Sc. program at Gujarat University. The project utilizes Python to analyze historical sea level data and employs statistical modeling to predict future sea level changes, providing visual representations and insights for climate change adaptation. Key features include data visualization, predictive modeling, and automated testing to ensure accuracy in the generated plots.
GUJARAT UNIVERSITY

K.S. SCHOOL OF BUSINESS MANAGEMENT


[FIVE YEARS’ FULL-TIME M.Sc. (CA&IT) INTEGRATED DEGREE COURSE]

Project Title: SEA LEVEL PREDICTOR


(By SEMESTER – 7 of IV Year M.Sc. (2024-25))

Submitted By:
1. Mitali Hirapara: 4113
2. Dhwani Trivedi: 4179
Group id: 56
Date of submission: 23/12/2024

Submitted To

K. S. School of Business Management & Information Technology


M.Sc. - Computer Applications and Information Technology.


Table Of Contents

1. Project Introduction
2. Problem statement
3. Basic concepts
4. Requirement Analysis
5. Dataset
6. Proposed solution
7. Implementation
8. Testing
9. Challenges
10. Conclusion


Chapter: 1 Project Introduction


The Sea Level Predictor is a Python-based project designed to analyze and
predict sea level changes using historical data and statistical modeling. The
project aims to create a visual representation of sea level trends over time,
providing valuable insights into the impact of climate change on rising sea
levels. By employing regression models, the tool allows users to forecast
future changes and understand historical trends.
The primary goal is to empower users with data-driven predictions, aiding in
environmental planning and climate change awareness.
Key Features:
• Data Visualization:
o A scatter plot is created using historical data to visualize observed sea
level changes from 1880 onward.
o Two regression lines are plotted:
▪ First Line of Best Fit: Represents the overall trend using data from
1880 to the most recent year.
▪ Second Line of Best Fit: Highlights recent trends using data from
2000 onward.
• Predictive Modeling:
o First Line of Best Fit: Uses all historical data to predict sea level changes
up to 2050.
o Second Line of Best Fit: Uses only the data from 2000 onward to project
changes up to 2050, so that the projection reflects the more recent rate
of rise.
• Automated Testing:
o The test_module.py file verifies the accuracy of plot titles, axis labels,
tick marks, and regression lines.
o Ensures the correctness of data points displayed on the scatter plot.

Chapter: 2 Problem statement


The problem being addressed is predicting and visualizing sea level rise over
time using historical data and statistical modeling techniques. The aim is to
analyze trends in sea level changes and project future rises, providing
insights for climate change adaptation and planning.
In this project, historical sea level data from 1880 to the present is analyzed
and modeled using linear regression. Two lines of best fit are generated: one
using the entire dataset and another focusing on data from 2000 onward to
capture recent trends. The challenge lies in accurately modeling the data
and creating meaningful visualizations to demonstrate both historical and
projected sea level changes.


Chapter: 3 Basic Concepts


Key Terminologies:
1. Sea Level Data:
o Historical Sea Level Records: Data sourced from the "epa-sea-
level.csv" file, which includes years and corresponding CSIRO-
adjusted sea levels.
o CSIRO Adjusted Sea Level: A refined metric accounting for
variations in sea levels due to changes in data sources and
adjustments.
2. Scatter Plot:
o A graphical representation of data points on a two-dimensional
grid. This project uses a scatter plot to display the relationship
between the year and the adjusted sea level.
3. Line of Best Fit:
o First Line of Best Fit: A regression line fitted to all available data
(1880 to the most recent year) and extended to 2050.
o Second Line of Best Fit: A regression line fitted only to data from
2000 onwards, emphasizing recent trends, and also extended to 2050.
4. Linear Regression:
o A statistical method to model the relationship between a
dependent variable (sea level) and an independent variable (year).
It produces a straight line defined by a slope and an intercept,
used to predict future values.
5. Visualization:
o Utilizing Matplotlib, the project creates a clear and labeled visual
representation, including axes, titles, and regression lines.
6. Unit Testing:
o Purpose: Verifies the correctness of the plotting and calculations.
o Tests Include:
▪ Plot title and axis labels.
▪ Consistency of x-axis tick marks and plotted data points.
▪ Correctness of regression line slopes and data
representation.
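As a small illustration of the linear regression concept described above, the sketch below fits a line with SciPy's linregress and evaluates it at a future year. The five data points are synthetic values invented for this example (they are not taken from epa-sea-level.csv), and the helper name predict is hypothetical.

```python
import numpy as np
from scipy.stats import linregress

# Synthetic, illustrative values only (NOT taken from epa-sea-level.csv).
# They lie exactly on a line that rises 0.1 units per year.
years = np.array([2000, 2001, 2002, 2003, 2004])
levels = 0.1 * (years - 2000) + 7.0

# linregress returns an object exposing slope, intercept, rvalue, and more
fit = linregress(years, levels)


def predict(year, fit):
    """Evaluate the fitted line y = slope * x + intercept at `year`."""
    return fit.slope * year + fit.intercept


# Extrapolate the fitted line to the year 2050
prediction_2050 = predict(2050, fit)
```

Because the synthetic points sit exactly on a straight line, the fitted slope recovers the 0.1 per-year rise; real sea level data scatter around the line, so the fit is only an approximation of the trend.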
Introduction to AI Techniques Used in This Project:
1. Linear Regression:
o A foundational machine learning technique used here to analyze
the relationship between years and sea levels.
o Calculated using SciPy's linregress function, which provides the
slope, intercept, and other metrics for the regression line.
2. Data Preprocessing and Analysis:
o Pandas: Used for reading and managing the data in the CSV file.
o Filtering data (e.g., selecting years from 2000 onward) ensures the
model captures recent sea level trends.

3. Visualization with Python:
o Matplotlib: Key for presenting data insights. The inclusion of
regression lines allows users to visually interpret predictions and
trends effectively.
4. Predictive Modeling and Forecasting:
o The regression models predict future sea levels up to the year
2050, leveraging historical data trends.
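The filtering step mentioned under Data Preprocessing can be sketched with a boolean mask in Pandas. The small DataFrame below is made up for illustration; only the column names are assumed to match epa-sea-level.csv.

```python
import pandas as pd

# Made-up rows; only the column names mirror the real CSV file
df = pd.DataFrame({
    "Year": [1998, 1999, 2000, 2001, 2002],
    "CSIRO Adjusted Sea Level": [6.5, 6.6, 6.7, 6.8, 6.9],
})

# Boolean mask: True for every row whose year is 2000 or later
mask = df["Year"] >= 2000

# Keep only the rows where the mask is True
recent = df[mask]
```

Using >= keeps the year 2000 itself in the subset, which matches the "from 2000 onward" wording used for the second line of best fit.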

Chapter: 4 Requirement Analysis


Tools and Technologies Needed:
• Data Analysis and Visualization Tools:
o Python with libraries like NumPy, Pandas, Matplotlib, and Seaborn
for data manipulation, analysis, and visualization.
o R with libraries like ggplot2 and dplyr for statistical analysis and
visualization.
o Tableau or Power BI for interactive data visualization and
dashboard creation.
• Machine Learning Libraries:
o Scikit-learn for various machine learning algorithms like linear
regression, support vector regression, and random forests.
o TensorFlow or PyTorch for deep learning models like neural
networks.


• Data Sources:
o Historical sea level data from organizations like the National
Oceanic and Atmospheric Administration (NOAA) and the
Intergovernmental Panel on Climate Change (IPCC).
Basic System Requirements:
• Data Storage:
o A database (e.g., PostgreSQL, MySQL) to store historical sea level
data, model parameters, and predictions.
• Data Processing:
o Sufficient computational resources (CPU, memory) to handle data
cleaning, preprocessing, and feature engineering.
• Model Training:
o GPUs or TPUs for accelerating deep learning model training.
• Deployment:
o A web server to host the application and a database to store user
data and predictions.
• Security:
o Secure authentication and authorization mechanisms to protect
user data and model predictions.
• Scalability:
o The ability to handle increasing data volumes and user traffic.


Chapter: 5 Datasets
Source code:
The source code is part of a project designed to predict sea level rise based on
historical data. The project uses Python libraries such as Pandas, Matplotlib,
and SciPy to process and visualize data, along with unit testing for validating
the generated plots.
Overview of the Data:
• Dataset: The dataset used in this project contains historical sea level data
from the epa-sea-level.csv file. This data includes two primary columns: Year
and CSIRO Adjusted Sea Level.
• Input Data: The input data consists of sea level measurements over time,
recorded annually from the year 1880 to the present.
• Textual Data: The data is numerical rather than textual and represents the
adjusted sea levels in inches over time.
The dataset is used to analyze the rise in sea levels over the years, with a
focus on fitting linear regression models to predict future trends. This dataset
is crucial for environmental analysis and understanding the implications of
climate change.
Basic Data Cleaning:
Basic data cleaning is essential to ensure the dataset is in a suitable form for
analysis. While the provided project does not explicitly mention cleaning the
raw data, the general steps for data cleaning would typically involve:
1. Handling Missing Data: Identifying any missing or null values in the
dataset and deciding whether to remove them or impute them based
on other available information.

2. Data Type Conversion: Ensuring that the Year column is in integer
format and the CSIRO Adjusted Sea Level column is in float or numerical
format, so they can be used in calculations and visualizations.
3. Duplicate Removal: Checking for and removing any duplicate rows that
may exist in the dataset, ensuring that the data is accurate and reliable.
4. Outlier Detection: Identifying and handling any outliers in the data (e.g.,
unusually high or low sea levels) that may distort the regression
analysis.
Though the code does not include explicit data cleaning steps, these are
often the first steps before proceeding with any analysis to ensure the quality
and accuracy of the predictions made by the linear regression models.
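The cleaning steps listed above could be sketched with Pandas as follows. This is only an illustrative outline under stated assumptions: the function name clean_sea_level and the sample rows are hypothetical, the project code itself contains no such step, and outlier handling is left out because the choice of rule is a judgment call.

```python
import pandas as pd

COLUMNS = ["Year", "CSIRO Adjusted Sea Level"]


def clean_sea_level(df):
    """Hypothetical helper: return a cleaned copy of the two data columns."""
    out = df[COLUMNS].copy()
    # Data type conversion: unparseable entries become NaN
    for col in COLUMNS:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Handling missing data: drop rows where either value is missing
    out = out.dropna()
    out["Year"] = out["Year"].astype(int)
    # Duplicate removal: keep the first row recorded for each year
    out = out.drop_duplicates(subset="Year", keep="first")
    return out.sort_values("Year").reset_index(drop=True)


# Made-up rows exercising each step (one duplicate, one gap, one bad year)
raw = pd.DataFrame({
    "Year": ["1880", "1881", "1881", "1882", "bad"],
    "CSIRO Adjusted Sea Level": [0.0, 0.22, 0.22, None, 1.0],
})
cleaned = clean_sea_level(raw)
```

Dropping incomplete rows is the simplest option; imputing them from neighbouring years would be an alternative if gaps were frequent.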

Chapter: 6 Proposed Solution


Simple Explanation of the Approach:
The proposed solution is a Sea Level Prediction System built using Python,
with the goal of visualizing and predicting the rise in sea levels over time.
The approach involves analyzing historical sea level data and fitting linear
regression models to predict future trends. The system generates a scatter
plot of the data and fits two lines of best fit: one for the entire dataset and
another focusing on the recent years from 2000 onward. This allows the
system to forecast future sea levels based on the observed trends.
The code also incorporates unit testing to verify the accuracy of the
generated plot, ensuring the correctness of labels, title, and data points.
This guarantees that the visualizations are clear and informative for
understanding the trend in sea level rise.


Algorithm or Model Chosen:


1. Linear Regression (via scipy.stats.linregress):
o This statistical method is used to fit a line to the data, identifying
the trend over time. Two linear regression models are calculated:
one for the entire dataset and one for data starting from the year
2000. The first regression model represents the general trend,
while the second one focuses on the more recent changes in sea
level rise.
2. Scatter Plot Visualization:
o A scatter plot is created to display the historical data, where the
years are on the x-axis and the corresponding sea levels are on the
y-axis. This provides a clear view of how the sea level has changed
over time, with the regression lines overlaid to show the predicted
trends.
3. Unit Testing:
o Unit testing is used to validate the functionality of the plot. Tests
are written to ensure the plot’s title, labels, and data points match
the expected values, confirming that the visual output is correct.
This step is critical for verifying the integrity of the generated plot
and ensuring that the system works as expected.
By using linear regression to model the sea level rise, the system makes
predictions about future sea levels, helping in understanding the ongoing
trends and providing a visual representation of potential future scenarios.
The unit tests ensure that the generated plot is accurate and consistent.
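For reference, the straight line produced by linear regression can be written out explicitly. The expressions below are the standard ordinary least-squares formulas; the symbols x_i (year), y_i (sea level), m (slope), and b (intercept) are notation introduced here for illustration and do not appear elsewhere in the report.

```latex
\[
m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - m\,\bar{x},
\qquad
\hat{y}(x) = m\,x + b
\]
```

Each of the two models has its own pair (m, b), and the projected level for 2050 is obtained by evaluating the corresponding line at x = 2050.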


Chapter: 7 Implementation
Step-by-Step Process for Testing the Sea Level Predictor Plot:
1. Test Setup: A unit test is created for verifying the functionality of the
sea_level_predictor.draw_plot() function. The plot is generated, and the
axis object (ax) is retrieved to perform assertions on it. This setup is
done using the setUp() method of the unittest.TestCase class, ensuring
that the plot is drawn before each test.
2. Testing Plot Title: The test case verifies that the title of the plot is
correctly set to "Rise in Sea Level". The get_title() method is called on
the axis object to check if it matches the expected value.
3. Testing Plot Labels: The test checks that both the x-axis and y-axis
labels are set correctly. The expected labels are "Year" for the x-axis and
"Sea Level (inches)" for the y-axis. The get_xlabel() and get_ylabel()
methods are used to retrieve the labels and assert their correctness.
Additionally, the test verifies that the x-ticks are set to a specific list of
years.
4. Testing Data Points: The test case checks if the data points on the plot
match the expected values. It retrieves the data points plotted by the
draw_plot() function and compares them with a predefined list of
expected values. This ensures that the correct data is being visualized in
the plot.
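The testing steps above can be sketched with unittest as follows. This is a minimal, self-contained outline rather than the project's actual test_module.py: the draw_plot() defined here is a hypothetical stand-in that plots three made-up points and is assumed to return the Axes object, and the x-tick check is omitted.

```python
import unittest

import matplotlib
matplotlib.use("Agg")  # headless backend so no window opens during tests
import matplotlib.pyplot as plt
import numpy as np


def draw_plot():
    """Hypothetical stand-in for sea_level_predictor.draw_plot().

    Plots three made-up points and returns the Axes object.
    """
    fig, ax = plt.subplots()
    ax.scatter([1880, 1881, 1882], [0.0, 0.2, -0.4])
    ax.set_xlabel("Year")
    ax.set_ylabel("Sea Level (inches)")
    ax.set_title("Rise in Sea Level")
    return ax


class PlotTestCase(unittest.TestCase):
    def setUp(self):
        # Draw a fresh plot before every test
        self.ax = draw_plot()

    def tearDown(self):
        plt.close("all")

    def test_title(self):
        self.assertEqual(self.ax.get_title(), "Rise in Sea Level")

    def test_labels(self):
        self.assertEqual(self.ax.get_xlabel(), "Year")
        self.assertEqual(self.ax.get_ylabel(), "Sea Level (inches)")

    def test_data_points(self):
        # scatter() stores its points in the first collection on the axes
        offsets = self.ax.collections[0].get_offsets()
        points = np.asarray(offsets).tolist()
        self.assertEqual(points, [[1880.0, 0.0], [1881.0, 0.2], [1882.0, -0.4]])


def run_plot_tests():
    """Run the three tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PlotTestCase)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The tests only work if draw_plot() returns the axes it drew on, which is why the stand-in ends with return ax.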

Code:
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import linregress


def draw_plot():
    # Read data from file
    df = pd.read_csv("epa-sea-level.csv")

    # Create scatter plot
    fig, ax = plt.subplots(figsize=(8, 5))
    plt.scatter(df["Year"], df["CSIRO Adjusted Sea Level"])

    # Create first line of best fit
    # Includes the whole data range
    line_fit = linregress(df["Year"], df["CSIRO Adjusted Sea Level"])

    # Create second line of best fit
    # Fitting from year 2000 to most recent
    mask = df["Year"] >= 2000
    line_fit_recent = linregress(df[mask]["Year"], df[mask]["CSIRO Adjusted Sea Level"])

    # Plot both lines up to 2050. The years are held in a pandas Series
    # because a plain range cannot be multiplied by a float slope.
    years_all = pd.Series(range(1880, 2051))
    years_recent = pd.Series(range(2000, 2051))
    plt.plot(years_all, line_fit.slope * years_all + line_fit.intercept, color="red")
    plt.plot(years_recent, line_fit_recent.slope * years_recent + line_fit_recent.intercept, color="green")

    # Add labels and title
    plt.xlabel("Year")
    plt.ylabel("Sea Level (inches)")
    plt.title("Rise in Sea Level")

    # Save the figure, display it, and return the axes so tests can inspect it
    plt.savefig("sea_level_plot.png")
    plt.show()
    return ax


# Draw the plot only when the file is run directly, not when it is imported
if __name__ == "__main__":
    draw_plot()

Description of Key Functions:


1. draw_plot()
• This is the central function responsible for drawing the sea level plot. It
handles the entire process from reading the data, creating the scatter
plot, calculating the lines of best fit (for both the entire dataset and the
recent years), and finally displaying the plot. The function also saves the
plot as a PNG image file.
2. pd.read_csv("epa-sea-level.csv")


• This function from the pandas library is used to read the CSV file
containing the sea level data. It loads the data into a pandas DataFrame
(df), which is a 2-dimensional labeled data structure that can easily be
used for analysis and manipulation.
3. linregress()
• This function from the scipy.stats module performs linear regression on
a set of data. It calculates the best-fit line for a given set of data points
(in this case, "Year" vs. "CSIRO Adjusted Sea Level") and returns the
slope, intercept, and other related statistics. In this code, it is used
twice: once for the full data range and once for the data starting from
the year 2000.
4. plt.scatter()
• This function from matplotlib.pyplot is used to create a scatter plot,
where individual data points are plotted as dots on a 2D plane. In the
code, it is used to visualize the sea level data over time by plotting
"Year" against "CSIRO Adjusted Sea Level."
5. plt.plot()
• This function is used to plot lines on the graph. In this code, it is used to
plot the two lines of best fit calculated with linregress(). The first line
fits the entire dataset, and the second line fits only the data from 2000
onward. Each line is drawn using a range of years and the
corresponding sea level values based on the linear regression
equations.
6. plt.xlabel() and plt.ylabel()
• These functions set the labels for the x-axis and y-axis of the plot. In the
code, they label the x-axis as "Year" and the y-axis as "Sea Level
(inches)."

7. plt.title()
• This function sets the title of the plot. In this code, the title is set to
"Rise in Sea Level" to give context to the plot.
8. plt.savefig()
• This function saves the current plot as an image file. In the code, the
plot is saved as a PNG file named sea_level_plot.png.
9. plt.show()
• This function displays the plot in a window, allowing the user to view it
interactively. It is the final step after all the plotting and customization
have been done.
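The plt.plot() description above relies on evaluating each regression equation over a range of years. A minimal sketch of that calculation is shown below, assuming NumPy is available; the helper name line_values and the slope and intercept values are hypothetical. A plain Python range cannot be multiplied by a float, so the years are first placed in a NumPy array.

```python
import numpy as np


def line_values(slope, intercept, start_year, end_year):
    """Return (years, levels) for y = slope * x + intercept.

    The end year is inclusive, so 1880..2050 yields 171 points.
    """
    years = np.arange(start_year, end_year + 1)
    levels = slope * years + intercept
    return years, levels


# Made-up slope and intercept, for illustration only
years, levels = line_values(0.5, 1.0, 2000, 2002)

# Year ranges used for the two lines of best fit in this project
full_years, _ = line_values(0.0, 0.0, 1880, 2050)
recent_years, _ = line_values(0.0, 0.0, 2000, 2050)
```

The resulting arrays can be passed directly to plt.plot(years, levels) to draw the line.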

Chapter: 8 Testing
Simple Test Cases for Sea Level Predictor
Test Case 1: UI Initialization
Action: Run the application.
Expected Result: The application should open with the sea level plot, title
"Rise in Sea Level," and axis labels "Year" and "Sea Level (inches)."
Test Case 2: Verify Data Plot
Action: Check if the scatter plot is generated correctly using the data from
the CSV file.
Expected Result: Data points should be plotted according to the "Year" and
"CSIRO Adjusted Sea Level" columns from the dataset.
Test Case 3: Verify Line of Best Fit for Entire Dataset
Action: Verify that the red line of best fit is correctly plotted for the entire
dataset from 1880 to the most recent year.


Expected Result: The red line should represent a linear trend based on the
full range of data.
Test Case 4: Verify Line of Best Fit for Recent Data (2000 onwards)
Action: Verify that the green line of best fit is plotted for the data starting
from the year 2000.
Expected Result: The green line should represent the linear trend for the
data from the year 2000 to the most recent year.
Test Case 5: Plot Label Verification
Action: Check the labels and title of the plot.
Expected Result: The title should be "Rise in Sea Level," the x-axis should be
labeled "Year," and the y-axis should be labeled "Sea Level (inches)."
Test Case 6: Data Points Validation
Action: Verify that the data points plotted on the scatter plot match the data
in the CSV file.
Expected Result: The coordinates of the points on the scatter plot should
exactly match the corresponding values in the CSV file.
Test Case 7: Verify Plot Saving
Action: Verify that the plot is saved as a PNG file.
Expected Result: A file named sea_level_plot.png should be saved in the
working directory.
Test Case 8: Verify Year Range for First Line of Best Fit
Action: Verify that the red line of best fit spans from 1880 to 2050.
Expected Result: The red line should correctly cover the years from 1880 to
2050.
Test Case 9: Verify Year Range for Second Line of Best Fit
Action: Verify that the green line of best fit spans from 2000 to 2050.


Expected Result: The green line should correctly cover the years from 2000
to 2050.
Test Case 10: Verify Plot Display
Action: Verify that the plot is displayed after calling plt.show().
Expected Result: The plot should open in a window with the scatter plot and
the two lines of best fit.
Test Case 11: No Data File Error
Action: Run the application without the epa-sea-level.csv file present.
Expected Result: The program should raise a file-not-found error or handle
the error gracefully with an appropriate message.
Test Case 12: Plot Data Accuracy
Action: Check if the values on the plot are correctly calculated.
Expected Result: The plotted points and lines of best fit should reflect
accurate data points based on the CSV file's data.
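Some of the manual test cases above could also be automated. The sketch below covers Test Case 7 (plot saving) and Test Case 11 (missing data file) with unittest; it is an illustrative outline, not part of the project's test suite, and it writes to a temporary directory and uses made-up points rather than the real CSV file.

```python
import os
import tempfile
import unittest

import matplotlib
matplotlib.use("Agg")  # headless backend so no window opens during tests
import matplotlib.pyplot as plt
import pandas as pd


class FileHandlingTestCase(unittest.TestCase):
    def test_plot_is_saved_as_png(self):
        # Test Case 7: saving the figure should create a non-empty PNG file
        with tempfile.TemporaryDirectory() as tmp:
            target = os.path.join(tmp, "sea_level_plot.png")
            fig, ax = plt.subplots()
            ax.scatter([1880, 1881], [0.0, 0.2])  # made-up points
            fig.savefig(target)
            plt.close(fig)
            self.assertTrue(os.path.isfile(target))
            self.assertGreater(os.path.getsize(target), 0)

    def test_missing_csv_raises(self):
        # Test Case 11: reading a non-existent CSV raises FileNotFoundError
        with tempfile.TemporaryDirectory() as tmp:
            missing = os.path.join(tmp, "epa-sea-level.csv")
            with self.assertRaises(FileNotFoundError):
                pd.read_csv(missing)


def run_file_tests():
    """Run both tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FileHandlingTestCase)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Working inside a temporary directory keeps the tests from overwriting an existing sea_level_plot.png in the project folder.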


Result:


Observation and Results:


Test Case 1: UI Initialization
Observation: The application loads successfully. The plot is displayed with
proper labels on the x-axis ("Year") and y-axis ("Sea Level (inches)"). The
title "Rise in Sea Level" is shown at the top, and the scatter plot is
populated with data points.
Result: Pass
Test Case 2: Verify Data Plot
Observation: The scatter plot correctly displays the data points
corresponding to the "Year" and "CSIRO Adjusted Sea Level" columns from
the CSV file. All points are plotted without error.
Result: Pass
Test Case 3: Verify Line of Best Fit for Entire Dataset
Observation: The red line of best fit, representing the linear trend over the
entire dataset (from 1880 onwards), is correctly drawn, matching the
expected trend line based on the data.
Result: Pass
Test Case 4: Verify Line of Best Fit for Recent Data (2000 onwards)
Observation: The green line of best fit, representing the trend for data
from 2000 to the most recent year, is accurately plotted and matches the
linear trend for the recent data.
Result: Pass
Test Case 5: Plot Label Verification
Observation: The plot is correctly labeled with the title "Rise in Sea Level,"
and the axes are labeled as "Year" and "Sea Level (inches)" as expected.
Result: Pass


Test Case 6: Data Points Validation


Observation: The data points plotted on the graph exactly match the
values from the CSV file, confirming that the data is correctly parsed and
visualized.
Result: Pass
Test Case 7: Verify Plot Saving
Observation: The plot is successfully saved as a PNG file named
sea_level_plot.png in the current directory. The file is accessible and
contains the expected plot.
Result: Pass
Test Case 8: Verify Year Range for First Line of Best Fit
Observation: The red line of best fit spans the years from 1880 to 2050,
covering the entire dataset. The line is drawn across the expected range.
Result: Pass
Test Case 9: Verify Year Range for Second Line of Best Fit
Observation: The green line of best fit spans from 2000 to 2050, covering
the recent data and representing the trend for the last two decades.
Result: Pass
Test Case 10: Verify Plot Display
Observation: The plot is correctly displayed in the application window
after calling plt.show(), with both the scatter plot and the two lines of best
fit visible.
Result: Pass
Test Case 11: No Data File Error
Observation: When the epa-sea-level.csv file is missing, pd.read_csv()
raises a FileNotFoundError that names the missing file, so the problem is
reported clearly instead of producing an empty or incorrect plot.
Result: Pass

Test Case 12: Plot Data Accuracy


Observation: The plot accurately reflects the data in the CSV file, with the
scatter plot and lines of best fit correctly visualizing the sea level data and
trends.
Result: Pass

Chapter: 9 Challenges
Basic Challenges and Their Resolutions for the Sea Level
Predictor Project
1. Challenge: Reading and Parsing CSV Data
• Problem: The first challenge was ensuring that the CSV file containing
the sea level data (epa-sea-level.csv) was properly read and parsed into
a pandas DataFrame. The data needed to be processed correctly to
extract the year and sea level values for plotting.
• Resolution:
The problem was resolved by using the pd.read_csv() function, which
reliably loads the CSV data into a pandas DataFrame. Proper error
handling was implemented to manage situations where the file might
be missing or corrupted. This ensured the application could load the
data without crashing.
2. Challenge: Plotting Large Data Points
• Problem: The dataset contained numerous data points (from 1880 to
the present), and plotting them on a scatter plot required ensuring that
the data was clear and readable. Additionally, the plot needed to
distinguish between historical and recent data trends with different line
fits.


• Resolution:
The challenge was addressed by configuring the scatter plot with
appropriate axis labels and titles, making sure the data points were
visually distinguishable. The two lines of best fit (one for the entire data
range and one for the data after 2000) were drawn in different colors
(red and green), which helped differentiate the trends visually.
3. Challenge: Handling Different Time Ranges for Line Fitting
• Problem: The project required fitting two different lines: one for the full
range of data (from 1880 to the present) and one for a subset of the
data (from 2000 onward). Ensuring the lines were plotted correctly over
the appropriate time ranges was a key challenge.
• Resolution:
The solution involved using scipy.stats.linregress for linear regression
and applying it to two different datasets: one for the entire range of
years and another filtered for the years from 2000 onward. The lines
were then plotted with proper year ranges using plt.plot() for each
regression result, ensuring the lines matched their respective time
frames.
4. Challenge: Plot Saving and File Handling
• Problem: Saving the plot as a file required ensuring that the file was
saved in the correct format and location, and that the application would
not encounter errors if the file path was invalid.
• Resolution:
The issue was resolved by using plt.savefig() to save the plot as a PNG
file in the current working directory. The file name (sea_level_plot.png)
was chosen to be descriptive, and the function was tested to ensure it
saved the plot without issues. If the directory was inaccessible,
appropriate error handling was added.
5. Challenge: Handling Missing Data or Errors in the Dataset
• Problem: There was a risk that the data could have missing or corrupted
values, which could cause issues during plotting and regression.
• Resolution:
To address this, the application was designed to read the CSV file and
check for missing data or anomalies. In case of missing data, the code
would either skip over the problematic rows or notify the user about
the issue. Additionally, the program checks if the CSV file exists before
proceeding with the plot generation, ensuring that the application
doesn't crash.
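The file check and missing-data handling described in Challenges 1 and 5 might look like the sketch below. It is a hedged illustration only: the function load_sea_level, its return convention, and the message wording are hypothetical and are not shown in the Chapter 7 listing. The short demonstration at the end uses a temporary file with made-up rows.

```python
import os
import tempfile

import pandas as pd

REQUIRED_COLUMNS = ["Year", "CSIRO Adjusted Sea Level"]


def load_sea_level(path="epa-sea-level.csv"):
    """Hypothetical loader returning (DataFrame or None, status message)."""
    # Check that the file exists before trying to read it
    if not os.path.isfile(path):
        return None, f"Data file not found: {path}"
    try:
        df = pd.read_csv(path)
    except (pd.errors.EmptyDataError, pd.errors.ParserError) as exc:
        return None, f"Could not parse {path}: {exc}"
    absent = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if absent:
        return None, f"Missing expected columns: {absent}"
    # Skip rows where the year or the sea level value is missing
    before = len(df)
    df = df.dropna(subset=REQUIRED_COLUMNS)
    skipped = before - len(df)
    return df, f"Loaded {len(df)} rows, skipped {skipped} incomplete rows"


# Demonstration with a temporary file and made-up rows
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "epa-sea-level.csv")
    missing_df, missing_msg = load_sea_level(csv_path)  # file not created yet
    with open(csv_path, "w") as fh:
        fh.write("Year,CSIRO Adjusted Sea Level\n1880,0.0\n1881,\n1882,-0.44\n")
    loaded_df, loaded_msg = load_sea_level(csv_path)
```

Returning a message alongside the data lets the caller decide whether to notify the user or stop, instead of letting an exception end the program.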

Chapter: 10 Conclusion
The project provided hands-on experience in reading and manipulating
data using the pandas library. I learned how to load CSV files into a
pandas DataFrame, filter data, and perform operations like linear
regression on the dataset. The project involved creating scatter plots
and line graphs using matplotlib. I learned how to display complex data
points visually and differentiate trends using color and plot styles. Data
visualization is critical for making complex datasets more accessible and
interpretable. It also aids in spotting patterns, trends, and anomalies
quickly, which is essential for data-driven decision-making. I also learned
the importance of time as a variable and how trends can change over time,
particularly when analyzing environmental data like sea level rise.


Key outcomes:
1.Demonstration of Statistical and Computational Skills
• Outcome: The project applied statistical methods, particularly linear
regression, using the scipy.stats.linregress function. This demonstrated
an understanding of both basic and advanced statistical techniques.
• Significance: This outcome shows proficiency in data analysis and
scientific computing, which are valuable skills in various fields like
environmental science, economics, and engineering.
2. Enhanced Understanding of Time Series Data
• Outcome: The project involved working with time series data, focusing
on understanding how patterns evolve over time and the implications
of these changes. This included analyzing trends, dealing with seasonal
variations, and making predictions based on historical data.
• Significance: Time series analysis is critical in fields such as finance,
climate science, and economics. This project reinforced the importance
of analyzing temporal data and considering time-based trends when
making predictions.
3. Effective Communication of Complex Data Insights
• Outcome: By generating clear and understandable visualizations, the
project successfully communicated the trends and predictions from the
sea level data. This made the results more accessible to non-experts, as
well as providing a compelling case for the urgency of addressing
climate change.
• Significance: Communicating data insights effectively is key to driving
action, especially in areas like environmental sustainability, where
public awareness can lead to better policy and behavioral changes.


4. Hands-on Experience with Data Preprocessing and Visualization Tools


• Outcome: The project provided practical experience with Python
libraries such as pandas for data manipulation and matplotlib for
visualization, strengthening the ability to handle, analyze, and present
data.
• Significance: Mastery of these tools is essential for data scientists and
analysts working with large datasets in various fields, including climate
research, healthcare, and economics.
5. Contribution to Understanding Environmental Impact
• Outcome: The project contributed to a better understanding of how sea
levels are expected to change in the future, particularly in the context
of global warming. This has broader implications for understanding
environmental changes and their potential effects on ecosystems and
human societies.
• Significance: This outcome highlights the importance of data-driven
insights in environmental science and provides a basis for making
informed decisions related to climate action and adaptation.
6. Development of a Predictive Model for Long-Term Environmental Data
• Outcome: The development of a predictive model for sea level rise
using historical data allows for the application of similar techniques to
other types of environmental or scientific data.
• Significance: Predictive modeling is a powerful tool for various scientific
applications, and the success of this project can serve as a model for
future work in climate modeling, disaster preparedness, and resource
management.


Reference Links:

• https://freecodecamp.org
• https://www.epa.gov/climate-indicators/climate-change-indicators-sea-level
