Sea Level Predictor Project Overview
Submitted By:
1. Mitali Hirapara: 4113
2. Dhwani Trivedi: 4179
Group id: 56
Date of submission: 23/12/2024
Submitted To
GUJARAT UNIVERSITY
K.S. SCHOOL OF BUSINESS MANAGEMENT
[FIVE YEARS’ FULL-TIME M.Sc. (CA&IT) INTEGRATED DEGREE COURSE]
Table Of Contents
2. Problem statement
3. Basic concepts
4. Requirement Analysis
5. Dataset
6. Proposed solution
7. Implementation
8. Testing
9. Challenges
10. Conclusion
• The test_module.py file verifies the accuracy of plot titles, axis labels,
tick marks, and regression lines.
• Ensures the correctness of data points displayed on the scatter plot.
5. Visualization:
o Utilizing Matplotlib, the project creates a clear and labeled visual
representation, including axes, titles, and regression lines.
6. Unit Testing:
o Purpose: Verifies the correctness of the plotting and calculations.
o Tests Include:
▪ Plot title and axis labels.
▪ Consistency of x-axis tick marks and plotted data points.
▪ Correctness of regression line slopes and data
representation.
Introduction to AI Techniques Used in This Project:
1. Linear Regression:
o A foundational machine learning technique used here to analyze
the relationship between years and sea levels.
o Calculated using SciPy's linregress function, which provides the
slope, intercept, and other metrics for the regression line.
2. Data Preprocessing and Analysis:
o Pandas: Used for reading and managing the data in the CSV file.
o Filtering data (e.g., selecting years after 2000) ensures the model
captures recent sea level trends.
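As a minimal sketch of the two techniques above, the snippet below fits a line with scipy.stats.linregress on all rows and again on the rows from 2000 onward, then evaluates the recent line at a future year. The small DataFrame holds made-up illustration values, not the project's CSV data, and the helper name fit_trend is hypothetical.

```python
import pandas as pd
from scipy.stats import linregress

# Made-up illustration data (NOT the real epa-sea-level.csv values).
df = pd.DataFrame({
    "Year": [1990, 1995, 2000, 2005, 2010],
    "CSIRO Adjusted Sea Level": [5.0, 5.5, 6.0, 7.0, 8.0],
})


def fit_trend(frame):
    """Return (slope, intercept) of the least-squares line for the frame."""
    result = linregress(frame["Year"], frame["CSIRO Adjusted Sea Level"])
    return result.slope, result.intercept


# Fit on the full range, then on the filtered subset (years >= 2000).
slope_all, intercept_all = fit_trend(df)
recent = df[df["Year"] >= 2000]
slope_recent, intercept_recent = fit_trend(recent)

# A prediction is simply the fitted line evaluated at a future year.
predicted_2050 = slope_recent * 2050 + intercept_recent
print(round(slope_all, 3), round(slope_recent, 3), round(predicted_2050, 2))
```

With this toy data the recent subset has a steeper slope than the full range, which is the kind of difference the two fits are meant to expose.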
• Data Sources:
o Historical sea level data from organizations like the National
Oceanic and Atmospheric Administration (NOAA) and the
Intergovernmental Panel on Climate Change (IPCC).
Basic System Requirements:
• Data Storage:
o A database (e.g., PostgreSQL, MySQL) to store historical sea level
data, model parameters, and predictions.
• Data Processing:
o Sufficient computational resources (CPU, memory) to handle data
cleaning, preprocessing, and feature engineering.
• Model Training:
o GPUs or TPUs for accelerating deep learning model training.
• Deployment:
o A web server to host the application and a database to store user
data and predictions.
• Security:
o Secure authentication and authorization mechanisms to protect
user data and model predictions.
• Scalability:
o The ability to handle increasing data volumes and user traffic.
Chapter.5 Datasets
Source code:
The source code is part of a project designed to predict sea level rise from
historical data. The project uses the Python libraries Pandas, Matplotlib,
and SciPy to process and visualize the data, along with unit tests that
validate the generated plots.
Overview of the Data:
• Dataset: The dataset used in this project contains historical sea level data
from the epa-sea-level.csv file. This data includes two primary columns: Year
and CSIRO Adjusted Sea Level.
• Input Data: The input data consists of sea level measurements over time,
recorded annually from the year 1880 to the present.
• Textual Data: The data is numerical rather than textual and represents the
adjusted sea levels in inches over time.
The dataset is used to analyze the rise in sea levels over the years, with a
focus on fitting linear regression models to predict future trends. This dataset
is crucial for environmental analysis and understanding the implications of
climate change.
Basic Data Cleaning:
Basic data cleaning is essential to ensure the dataset is in a suitable form for
analysis. While the provided project does not explicitly mention cleaning the
raw data, the general steps for data cleaning would typically involve:
1. Handling Missing Data: Identifying any missing or null values in the
dataset and deciding whether to remove them or impute them based
on other available information.
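The missing-data step could look roughly like the sketch below. It is not taken from the project code; the rows are made-up values with one gap, and both options (dropping and interpolating) are shown only as possibilities.

```python
import pandas as pd

# Made-up rows with one missing sea level value (illustration only).
raw = pd.DataFrame({
    "Year": [2000, 2001, 2002, 2003],
    "CSIRO Adjusted Sea Level": [7.0, float("nan"), 7.4, 7.6],
})

# Count missing values per column before deciding how to handle them.
missing_counts = raw.isna().sum()

# Option 1: drop rows where the sea level is missing.
dropped = raw.dropna(subset=["CSIRO Adjusted Sea Level"])

# Option 2: impute the gap by linear interpolation between neighbouring years.
imputed = raw.copy()
imputed["CSIRO Adjusted Sea Level"] = (
    imputed["CSIRO Adjusted Sea Level"].interpolate()
)

print(missing_counts["CSIRO Adjusted Sea Level"], len(dropped))
```

Dropping is the simpler choice when only a few rows are affected; interpolation keeps every year in the series at the cost of introducing estimated values.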
Chapter.7 Implementation
Step-by-Step Process for Testing the Sea Level Predictor Plot:
1. Test Setup: A unit test is created for verifying the functionality of the
sea_level_predictor.draw_plot() function. The plot is generated, and the
axis object (ax) is retrieved to perform assertions on it. This setup is
done using the setUp() method of the unittest.TestCase class, ensuring
that the plot is drawn before each test.
2. Testing Plot Title: The test case verifies that the title of the plot is
correctly set to "Rise in Sea Level". The get_title() method is called on
the axis object to check if it matches the expected value.
3. Testing Plot Labels: The test checks that both the x-axis and y-axis
labels are set correctly. The expected labels are "Year" for the x-axis and
"Sea Level (inches)" for the y-axis. The get_xlabel() and get_ylabel()
methods are used to retrieve the labels and assert their correctness.
Additionally, the test verifies that the x-ticks are set to a specific list of
years.
4. Testing Data Points: The test case checks if the data points on the plot
match the expected values. It retrieves the data points plotted by the
draw_plot() function and compares them with a predefined list of
expected values. This ensures that the correct data is being visualized in
the plot.
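The test steps above can be sketched as follows. This is an illustration, not the project's test_module.py: the draw_plot() defined here is a stand-in with three made-up points so the example runs on its own, and the class name SeaLevelPlotTest is hypothetical. The x-tick check is left out because the expected list of years is not given in this report.

```python
import unittest

import matplotlib
matplotlib.use("Agg")  # headless backend so no window is needed
import matplotlib.pyplot as plt


def draw_plot():
    """Stand-in for sea_level_predictor.draw_plot() using made-up points."""
    fig, ax = plt.subplots()
    ax.scatter([1880, 1881, 1882], [0.0, 0.22, -0.44])
    ax.set_title("Rise in Sea Level")
    ax.set_xlabel("Year")
    ax.set_ylabel("Sea Level (inches)")
    return ax


class SeaLevelPlotTest(unittest.TestCase):
    def setUp(self):
        # Draw a fresh plot before every test and keep its Axes object.
        self.ax = draw_plot()

    def tearDown(self):
        plt.close("all")

    def test_plot_title(self):
        self.assertEqual(self.ax.get_title(), "Rise in Sea Level")

    def test_plot_labels(self):
        self.assertEqual(self.ax.get_xlabel(), "Year")
        self.assertEqual(self.ax.get_ylabel(), "Sea Level (inches)")

    def test_plot_data_points(self):
        # The scatter plot is the first collection on the Axes.
        actual = self.ax.collections[0].get_offsets().tolist()
        expected = [[1880.0, 0.0], [1881.0, 0.22], [1882.0, -0.44]]
        self.assertEqual(actual, expected)


# Run the tests without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SeaLevelPlotTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Note that tests of this kind only work if draw_plot() returns the Axes object, so the plotting function has to end with a return statement rather than only displaying the figure.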
Code:
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import linregress
def draw_plot():
    # Read data from file
    df = pd.read_csv("epa-sea-level.csv")
    plt.savefig('sea_level_plot.png')
    plt.show()
    # return plt.gca()

draw_plot()
2. pd.read_csv()
• This function from the pandas library is used to read the CSV file
containing the sea level data. It loads the data into a pandas DataFrame
(df), which is a 2-dimensional labeled data structure that can easily be
used for analysis and manipulation.
3. linregress()
• This function from the scipy.stats module performs linear regression on
a set of data. It calculates the best-fit line for a given set of data points
(in this case, "Year" vs. "CSIRO Adjusted Sea Level") and returns the
slope, intercept, and other related statistics. In this code, it is used
twice: once for the full data range and once for the data starting from
the year 2000.
4. plt.scatter()
• This function from matplotlib.pyplot is used to create a scatter plot,
where individual data points are plotted as dots on a 2D plane. In the
code, it is used to visualize the sea level data over time by plotting
"Year" against "CSIRO Adjusted Sea Level."
5. plt.plot()
• This function is used to plot lines on the graph. In this code, it is used to
plot the two lines of best fit calculated with linregress(). The first line
fits the entire dataset, and the second line fits only the data from 2000
onward. Each line is drawn using a range of years and the
corresponding sea level values based on the linear regression
equations.
6. plt.xlabel() and plt.ylabel()
• These functions set the labels for the x-axis and y-axis of the plot. In the
code, they label the x-axis as "Year" and the y-axis as "Sea Level
(inches)."
7. plt.title()
• This function sets the title of the plot. In this code, the title is set to
"Rise in Sea Level" to give context to the plot.
8. plt.savefig()
• This function saves the current plot as an image file. In the code, the
plot is saved as a PNG file named sea_level_plot.png.
9. plt.show()
• This function displays the plot in a window, allowing the user to view it
interactively. It is the final step after all the plotting and customization
have been done.
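Taken together, the functions described above could be combined roughly as in the sketch below. This is one possible arrangement under stated assumptions, not a copy of the project code: the function takes a DataFrame and an output path as parameters (both hypothetical choices that make the example testable), it uses the headless Agg backend so plt.show() is omitted, and the demonstration data is made up.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless run, so plt.show() is left out
import matplotlib.pyplot as plt
import pandas as pd
from scipy.stats import linregress


def draw_plot(df, out_path="sea_level_plot.png"):
    """Scatter the data, add both best-fit lines, label, save, return Axes."""
    x = df["Year"]
    y = df["CSIRO Adjusted Sea Level"]

    fig, ax = plt.subplots()
    ax.scatter(x, y)

    # First line: fit on all years, drawn from the first year to 2050.
    fit_all = linregress(x, y)
    years_all = pd.Series(range(int(x.min()), 2051))
    ax.plot(years_all, fit_all.slope * years_all + fit_all.intercept, "r")

    # Second line: fit on years >= 2000 only, drawn from 2000 to 2050.
    recent = df[df["Year"] >= 2000]
    fit_recent = linregress(recent["Year"], recent["CSIRO Adjusted Sea Level"])
    years_recent = pd.Series(range(2000, 2051))
    ax.plot(years_recent,
            fit_recent.slope * years_recent + fit_recent.intercept, "g")

    ax.set_xlabel("Year")
    ax.set_ylabel("Sea Level (inches)")
    ax.set_title("Rise in Sea Level")
    fig.savefig(out_path)
    return ax


# Made-up, perfectly linear data for demonstration only.
years = list(range(1880, 2014))
demo = pd.DataFrame({
    "Year": years,
    "CSIRO Adjusted Sea Level": [0.06 * (yr - 1880) for yr in years],
})
out_file = os.path.join(tempfile.mkdtemp(), "sea_level_plot.png")
ax = draw_plot(demo, out_path=out_file)
```

Returning the Axes object instead of only showing the figure is what allows a test module to inspect the title, labels, and plotted lines afterwards.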
Chapter.8 Testing
Simple Test Cases for Sea Level Predictor
Test Case 1: UI Initialization
Action: Run the application.
Expected Result: The application should open with the sea level plot, title
"Rise in Sea Level," and axis labels "Year" and "Sea Level (inches)."
Test Case 2: Verify Data Plot
Action: Check if the scatter plot is generated correctly using the data from
the CSV file.
Expected Result: Data points should be plotted according to the "Year" and
"CSIRO Adjusted Sea Level" columns from the dataset.
Test Case 3: Verify Line of Best Fit for Entire Dataset
Action: Verify that the red line of best fit is correctly plotted for the entire
dataset from 1880 to the most recent year.
Expected Result: The red line should represent a linear trend based on the
full range of data.
Test Case 4: Verify Line of Best Fit for Recent Data (2000 onwards)
Action: Verify that the green line of best fit is plotted for the data starting
from the year 2000.
Expected Result: The green line should represent the linear trend for the
data from the year 2000 to the most recent year.
Test Case 5: Plot Label Verification
Action: Check the labels and title of the plot.
Expected Result: The title should be "Rise in Sea Level," the x-axis should be
labeled "Year," and the y-axis should be labeled "Sea Level (inches)."
Test Case 6: Data Points Validation
Action: Verify that the data points plotted on the scatter plot match the data
in the CSV file.
Expected Result: The coordinates of the points on the scatter plot should
exactly match the corresponding values in the CSV file.
Test Case 7: Verify Plot Saving
Action: Verify that the plot is saved as a PNG file.
Expected Result: A file named sea_level_plot.png should be saved in the
working directory.
Test Case 8: Verify Year Range for First Line of Best Fit
Action: Verify that the red line of best fit spans from 1880 to 2050.
Expected Result: The red line should correctly cover the years from 1880 to
2050.
Test Case 9: Verify Year Range for Second Line of Best Fit
Action: Verify that the green line of best fit spans from 2000 to 2050.
Expected Result: The green line should correctly cover the years from 2000
to 2050.
Test Case 10: Verify Plot Display
Action: Verify that the plot is displayed after calling plt.show().
Expected Result: The plot should open in a window with the scatter plot and
the two lines of best fit.
Test Case 11: No Data File Error
Action: Run the application without the epa-sea-level.csv file present.
Expected Result: The program should raise a file-not-found error or handle
the error gracefully with an appropriate message.
Test Case 12: Plot Data Accuracy
Action: Check if the values on the plot are correctly calculated.
Expected Result: The plotted points and lines of best fit should reflect
accurate data points based on the CSV file's data.
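Some of the manual checks above can also be automated. The sketch below covers Test Cases 7, 8, and 9 under stated assumptions: the plot is a stand-in with two made-up lines rather than the project's real output, the helper name line_year_span is hypothetical, and the PNG is written to a temporary directory instead of the working directory.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so no window is needed
import matplotlib.pyplot as plt


def line_year_span(ax, index):
    """Return (first_year, last_year) covered by the index-th line on ax."""
    xdata = ax.get_lines()[index].get_xdata()
    return int(xdata[0]), int(xdata[-1])


# Stand-in plot with two made-up lines covering the ranges the tests expect.
fig, ax = plt.subplots()
ax.plot(range(1880, 2051),
        [0.06 * (yr - 1880) for yr in range(1880, 2051)], "r")
ax.plot(range(2000, 2051),
        [7 + 0.17 * (yr - 2000) for yr in range(2000, 2051)], "g")

# Test Cases 8 and 9: year range of each line of best fit.
span_all = line_year_span(ax, 0)
span_recent = line_year_span(ax, 1)

# Test Case 7: the PNG exists after saving.
with tempfile.TemporaryDirectory() as tmp:
    png_path = os.path.join(tmp, "sea_level_plot.png")
    fig.savefig(png_path)
    saved = os.path.isfile(png_path)
plt.close(fig)

print(span_all, span_recent, saved)
```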
Result:
Chapter.9 Challenges
Basic Challenges and Their Resolutions for the Sea Level
Predictor Project
1. Challenge: Reading and Parsing CSV Data
• Problem: The first challenge was ensuring that the CSV file containing
the sea level data (epa-sea-level.csv) was properly read and parsed into
a pandas DataFrame. The data needed to be processed correctly to
extract the year and sea level values for plotting.
• Resolution:
The problem was resolved by using the pd.read_csv() function, which
reliably loads the CSV data into a pandas DataFrame. Proper error
handling was implemented to manage situations where the file might
be missing or corrupted. This ensured the application could load the
data without crashing.
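The error handling itself is not shown in this report, so the following is only a sketch of one way it could be done. The function name load_sea_level_data and the choice to return None with a printed message are assumptions; the exception classes are the ones pandas raises for empty or malformed CSV files.

```python
import pandas as pd


def load_sea_level_data(path="epa-sea-level.csv"):
    """Return the CSV as a DataFrame, or None with a message if loading fails."""
    required = {"Year", "CSIRO Adjusted Sea Level"}
    try:
        df = pd.read_csv(path)
    except FileNotFoundError:
        print(f"Data file not found: {path}")
        return None
    except (pd.errors.EmptyDataError, pd.errors.ParserError) as exc:
        print(f"Could not parse {path}: {exc}")
        return None

    # A readable file can still lack the columns the plot depends on.
    missing = required - set(df.columns)
    if missing:
        print(f"{path} is missing columns: {sorted(missing)}")
        return None
    return df


# A missing file yields a message and None instead of a traceback.
result = load_sea_level_data("no-such-file.csv")
```

Callers can then check for None before plotting, which keeps the failure message in one place.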
2. Challenge: Plotting a Large Number of Data Points
• Problem: The dataset contained numerous data points (from 1880 to
the present), so the scatter plot had to remain clear and readable.
Additionally, the plot needed to distinguish between historical and
recent data trends with different line fits.
• Resolution:
The challenge was addressed by configuring the scatter plot with
appropriate axis labels and titles, making sure the data points were
visually distinguishable. The two lines of best fit (one for the entire data
range and one for the data after 2000) were drawn in different colors
(red and green), which helped differentiate the trends visually.
3. Challenge: Handling Different Time Ranges for Line Fitting
• Problem: The project required fitting two different lines: one for the full
range of data (from 1880 to the present) and one for a subset of the
data (from 2000 onward). Ensuring the lines were plotted correctly over
the appropriate time ranges was a key challenge.
• Resolution:
The solution involved using scipy.stats.linregress for linear regression
and applying it to two different datasets: one for the entire range of
years and another filtered for the years from 2000 onward. The lines
were then plotted with proper year ranges using plt.plot() for each
regression result, ensuring the lines matched their respective time
frames.
4. Challenge: Plot Saving and File Handling
• Problem: Saving the plot as a file required ensuring that the file was
saved in the correct format and location, and that the application would
not encounter errors if the file path was invalid.
• Resolution:
The issue was resolved by using plt.savefig() to save the plot as a PNG
file in the current working directory. The file name (sea_level_plot.png)
was chosen to be descriptive, and the function was tested to ensure it
Chapter.10 Conclusion
The project provided hands-on experience in reading and manipulating
data using the pandas library. I learned how to load CSV files into a
pandas DataFrame, filter data, and perform operations like linear
regression on the dataset. The project involved creating scatter plots
and line graphs using matplotlib. I learned how to display complex data
points visually and differentiate trends using color and plot styles. Data
visualization is critical for making complex datasets more accessible and
interpretable. It also aids in spotting patterns, trends, and anomalies
quickly, which is essential for data-driven decision-making. I also learned
the importance of time as a variable and how trends can change over time,
particularly when analyzing environmental data like sea level rise.
Key outcomes:
1. Demonstration of Statistical and Computational Skills
• Outcome: The project applied statistical methods, particularly linear
regression, using the scipy.stats.linregress function. This demonstrated
an understanding of both basic and advanced statistical techniques.
• Significance: This outcome shows proficiency in data analysis and
scientific computing, which are valuable skills in various fields like
environmental science, economics, and engineering.
2. Enhanced Understanding of Time Series Data
• Outcome: The project involved working with time series data, focusing
on understanding how patterns evolve over time and the implications
of these changes. This included analyzing long-term trends in annual
measurements and making predictions based on historical data.
• Significance: Time series analysis is critical in fields such as finance,
climate science, and economics. This project reinforced the importance
of analyzing temporal data and considering time-based trends when
making predictions.
3. Effective Communication of Complex Data Insights
• Outcome: By generating clear and understandable visualizations, the
project successfully communicated the trends and predictions from the
sea level data. This made the results more accessible to non-experts, as
well as providing a compelling case for the urgency of addressing
climate change.
• Significance: Communicating data insights effectively is key to driving
action, especially in areas like environmental sustainability, where
public awareness can lead to better policy and behavioral changes.
Reference Link:
• https://freecodecamp.org
• https://www.epa.gov/climate-indicators/climate-change-indicators-sea-level