on
Predictions for successful Launch Sites
Complete at
Coursera: IBM Data Science
Duration
3rd May, 2023 to 19th June, 2023
Submitted By
Sahil Banerjee
V/VII Semester
Enrollment No.: 21E1EBADM40P033
I hereby declare that the Industrial Training Report on Predictions for Successful Launch Sites, completed
at Coursera: IBM Data Science, is an authentic record of my own work, carried out as a requirement of
Industrial Training as part of the V semester syllabus during the period from 3rd May, 2023 to 19th June,
2023, and submitted at the Department of Artificial Intelligence and Data Science, Engineering College
Bikaner, for the award of the degree of B.Tech. in Artificial Intelligence and Data Science by Bikaner
(Signature of student)
Sahil Banerjee
21EEBAD033
Date: ____________________
Table Of Contents
Certificate
Student declaration
1. Introduction
3. Python
4. Project (Capstone)
5. Project (Visualized)
6. References
INTRODUCTION
In this project, Python was the main tool, used for machine learning, data visualization, exploratory
data analysis and more. The later sections discuss how Python works and how it is used in data
science.
Python is a popular programming language. It was created by Guido van Rossum, and released in
1991.
It is used for:
In 2021 (and again twice in 2022), security updates were expedited, since all Python versions were
insecure (including 2.7[55]) because of security issues leading to possible remote code execution[56]
and web-cache poisoning.[57] In 2022, Python 3.10.4 and 3.9.12 were expedited[58] and 3.8.13,
because of many security issues.[59] When Python 3.9.13 was released in May 2022, it was announced
that the 3.9 series (joining the older series 3.8 and 3.7) would only receive security fixes in the
future.[60] On 7 September 2022, four new releases were made due to a potential denial-of-service
attack: 3.10.7, 3.9.14, 3.8.14, and 3.7.14.
Data wrangling, sometimes referred to as data munging, is the process of transforming and mapping
data from one "raw" data form into another format with the intent of making it more appropriate and
valuable for a variety of downstream purposes such as analytics. The goal of data wrangling is to assure
quality and useful data. Data analysts typically spend most of their time in the process of data
wrangling compared to the actual analysis of the data.
In this course I went through the main steps of a machine learning workflow: splitting the data,
training on it, testing on it, and visualizing the results. Machine learning (ML) is a field of study in
artificial intelligence concerned with the development and study of statistical algorithms that can learn
from data and generalize to unseen data, and thus perform tasks without explicit instructions. Recently,
generative artificial neural networks have been able to surpass many previous approaches in performance.
Machine learning approaches have been applied to many fields including large language models,
computer vision, speech recognition, email filtering, agriculture, and medicine, where it is too costly
to develop algorithms to perform the needed tasks. ML is known in its application across business
problems under the name predictive analytics. Although not all machine learning is statistically based,
computational statistics is an important source of the field's methods.
The project that I built in this course is about predicting the correct launch site for a company
named Space Y, in which I worked as a data scientist. As a data scientist, my role was to collect
unstructured data from various sources, such as Wikipedia and the official SpaceX sites. In this
project I strengthened my skills in languages such as Python, SQL and R.
The following points summarize the practical knowledge I gained in this project:
1. Master the most up-to-date practical skills and knowledge that data scientists use in their daily
roles
2. Import and clean data sets, analyze and visualize data, and build machine learning models and
pipelines
3. Learn the tools, languages, and libraries used by professional data scientists, including Python
and SQL
This project applies several libraries for cleaning and visualizing data, including matplotlib, folium,
numpy, pandas and many more.
Pandas is a powerful and versatile library that simplifies data manipulation tasks in Python. Pandas
is built on top of the NumPy library and is particularly well-suited for working with tabular data, such
as spreadsheets or SQL tables. Its versatility and ease of use make it an essential tool for data analysts,
scientists, and engineers working with structured data in Python.
Data visualization is the graphical representation of information and data in a pictorial or graphical
format (Visualization of Data could be: charts, graphs, and maps). Data visualization tools provide an
accessible way to see and understand trends, patterns in data, and outliers. Data visualization tools and
technologies are essential to analyzing massive amounts of information and making data-driven
decisions. The idea of using pictures to understand data has been around for centuries. General
types of data visualization are charts, tables, graphs, maps, and dashboards.
Deep learning is a branch of machine learning based on artificial neural networks (ANNs); networks with
many layers are known as deep neural networks (DNNs). It is capable of learning complex patterns and
relationships within data, so we don't need to explicitly program everything. It has become increasingly
popular in recent years due to advances in processing power and the availability of large datasets. These
neural networks are inspired by the structure and function of the human brain's biological neurons, and
they are designed to learn from large amounts of data.
1. Deep Learning is a subfield of Machine Learning that involves the use of neural networks to
model and solve complex problems. Neural networks are modeled after the structure and
function of the human brain and consist of layers of interconnected nodes that process and
transform data.
2. The key characteristic of Deep Learning is the use of deep neural networks, which have
multiple layers of interconnected nodes. These networks can learn complex representations of
data by discovering hierarchical patterns and features in the data. Deep Learning algorithms
can automatically learn and improve from data without the need for manual feature
engineering.
3. Deep Learning has achieved significant success in various fields, including image recognition,
natural language processing, speech recognition, and recommendation systems. Some of the
popular Deep Learning architectures include Convolutional Neural Networks (CNNs),
Recurrent Neural Networks (RNNs), and Deep Belief Networks (DBNs).
4. Training deep neural networks typically requires a large amount of data and computational
resources. However, the availability of cloud computing and the development of specialized
hardware, such as Graphics Processing Units (GPUs), has made it easier to train deep neural
networks.
PYTHON
• When learning a new programming language, it is customary to start with a "hello world"
example. As simple as it is, this one line of code will ensure that we know how to print a string
in output and how to execute code within cells in a notebook.
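The one-line example referred to above can be as simple as the following sketch. The greeting text is an assumption; any string would do.

```python
# Print a greeting; the string itself is arbitrary
greeting = "Hello, Python!"
print(greeting)
```

Running this in a notebook cell confirms both that the cell executes and that strings are printed to the output.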
There are two popular versions of the Python programming language in use today: Python 2 and Python
3. The Python community has decided to move on from Python 2 to Python 3, and many popular
libraries have announced that they will no longer support Python 2.
Since Python 3 is the future, in this course we will be using it exclusively. How do we know that our
notebook is executed by a Python 3 runtime? We can look in the top-right hand corner of this notebook
and see "Python 3".
We can also ask Python directly and obtain a detailed answer. Try executing the following code:
import sys
print(sys.version)
In addition to writing code, note that it's always a good idea to add comments to your code. It will help
others to understand what you were trying to accomplish (the reason why you wrote a given snippet
of code). Not only does this help other people understand your code, it can also serve as a reminder to
you when you come back to it weeks or months later.
To write comments in Python, use the number symbol # before writing your
comment. When you run your code, Python will ignore everything past the # on a given line.
print('Hello, Python!')  # This line prints a string
# print('Hi')
After executing the cell above, you should notice that "This line prints a string" did not appear in the
output, because it was a comment (and thus ignored by Python).
The second line was also not executed because print('Hi') was preceded by the number sign (#) as well!
Since this isn't an explanatory comment from the programmer, but an actual line of code, we might
say that the programmer commented out that second line of code.
• Errors in Python
Everyone makes mistakes. For many types of mistakes, Python will tell you that you have made a
mistake by giving you an error message. It is important to read error messages carefully to really
understand where you made a mistake and how you may go about correcting it.
For example, if you spell print as frint, Python will display an error message. Give it a try:
The error message tells you where the error occurred (more useful in large notebook cells or scripts), and
what kind of error it was (here, a NameError).
Here, Python attempted to run the function frint, but could not determine what frint is since it's not a
built-in function and it has not been previously defined by us either.
You'll notice that if we make a different type of mistake, by forgetting to close the string, we'll obtain
a different error (i.e., a SyntaxError). Try it below:
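Both kinds of error can be reproduced safely with the sketch below. The try/except blocks and the use of compile() are additions made here so that the example runs to completion; in a notebook you would normally just type the faulty line and read the message.

```python
# Misspelling print as frint raises a NameError at run time
try:
    frint("Hello, Python!")
except NameError as err:
    name_error_message = str(err)
    print("NameError:", name_error_message)

# An unclosed string literal is a SyntaxError; compile() lets us trigger it
# from a string instead of breaking this script
try:
    compile('print("Hello, Python!)', "<example>", "exec")
    syntax_error_caught = False
except SyntaxError as err:
    syntax_error_caught = True
    print("SyntaxError:", err.msg)
```

The exact wording of the SyntaxError message differs between Python versions, so only the error type should be relied on.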
➢ Does Python know about your error before it runs your code?
Python is what is called an interpreted language. Compiled languages examine your entire program at
compile time and can warn you about a whole class of errors prior to execution. In contrast, Python
checks only the syntax of a script before running it and then executes it statement by statement, so an
error such as a misspelled function name surfaces only when that statement is reached. Python will stop
executing the entire program when it encounters an error (unless the error is expected and handled by
the programmer, a more advanced subject that we'll cover later on in this course).
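A minimal sketch of this behaviour is shown below. The list named executed is a hypothetical helper used only to record which statements ran.

```python
executed = []

try:
    executed.append("first line")      # runs normally
    frint("This causes a NameError")   # execution stops here
    executed.append("third line")      # never reached
except NameError:
    # Handling the error keeps the rest of the program alive
    executed.append("error handled")

print(executed)
```

Without the try/except, the program would end at the frint line and nothing after it would run.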
Python is an object-oriented language. There are many different types of objects in Python. Let's start
with the most common object types: strings, integers and floats. Anytime you write words (text) in
Python, you're using character strings (strings for short). The most common numbers, on the other
hand, are integers (e.g. -1, 0, 100) and floats, which represent real numbers (e.g. 3.14, -42.0).
# Integer
11
# Float
2.14
# String
➢ Integers
We can verify this is the case by using, you guessed it, the type() function:
# Print the type of -1
type(-1)
int
➢ Floats
Floats represent real numbers; they are a superset of the integers, since they also include "numbers with
decimals". There are some limitations when it comes to machines representing real numbers, but
floating point numbers are a good representation in most cases. You can learn more about the specifics
of floats for your runtime environment, by checking the value of sys.float_info. This will also tell you
what's the largest and smallest number that can be represented with them.
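As a small illustration of the point about sys.float_info and the limits of floating point numbers, the sketch below prints the extreme values for the current runtime and shows one rounding effect. The 1e-9 tolerance is an arbitrary choice made for this example.

```python
import sys

# Largest and smallest positive normalized floats on this runtime
print("max:", sys.float_info.max)
print("min:", sys.float_info.min)

# Floats are binary approximations, so some decimal sums are inexact
total = 0.1 + 0.2
print(total == 0.3)               # False on IEEE 754 doubles
print(abs(total - 0.3) < 1e-9)    # compare with a tolerance instead
```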
Once again, we can test some examples with the type() function:
type(0.5)
float
type(2)
int
"Michael Jackson"
A string can also be a combination of special characters:
# Special characters in string
'@#2_#]&*^%$'
• Indexing
It is helpful to think of a string as an ordered sequence. Each element in the sequence can be accessed
using an index represented by the array of numbers
• Negative Indexing
A negative index counts elements from the end of the string.
The last element is given by the index -1.
# Print the last element in the string
print(name[-1])
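The sketch below puts positive and negative indexing side by side. It assumes that name holds the string "Michael Jackson" shown earlier.

```python
# Sample string; assumed to be the value of `name` used in this section
name = "Michael Jackson"

first = name[0]     # index 0 is the first character
seventh = name[6]   # index 6 is the seventh character
last = name[-1]     # index -1 is the last character

print(first, seventh, last)
# Negative and positive indices address the same positions
print(name[-1] == name[len(name) - 1])
```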
• Slicing
We can obtain multiple characters from a string using slicing; for example, we can obtain the elements at
indices 0 to 3 and at indices 8 to 11.
When taking a slice, the first number is the index of the first element to include (counting from 0), and
the second number is the index at which the slice stops; the element at that index is not included.
# Take the slice on variable name with only index 0 to index 3
name[0:4]
• Stride
We can also input a stride value as follows, with the '2' indicating that we are selecting every second
element:
# Get every second element. The elements on index 0, 2, 4 ...
name[::2]
We can also incorporate slicing with the stride. In this case, we select the first five elements and then
use the stride
# Get every second element in the range from index 0 to index 4
name[0:5:2]
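The slicing and stride rules can be checked with the short sketch below, again assuming that name holds "Michael Jackson".

```python
name = "Michael Jackson"

print(name[0:4])    # indices 0 to 3 -> 'Mich'
print(name[8:12])   # indices 8 to 11 -> 'Jack'
print(name[::2])    # indices 0, 2, 4, ... -> 'McalJcsn'
print(name[0:5:2])  # indices 0, 2, 4 -> 'Mca'
```

In every case the stop index is excluded, which is why name[0:4] returns four characters rather than five.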
• Concatenate Strings
We can concatenate or combine strings by using the addition symbols, and the result is a new string
that is a combination of both
# Concatenate two strings (the second string is only an illustration)
statement = name + " is the best"
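import re
s1 = "Michael Jackson is the best"  # sample text, assumed for illustration
pattern = r"Jackson"                # pattern to look for, also assumed
# Use the search() function to search for the pattern in the string
result = re.search(pattern, s1)
if result:
    print("Match found!")
else:
    print("Match not found.")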
Regular expressions (RegEx) are patterns used to match and manipulate strings of text. There are
several special sequences in RegEx that can be used to match specific characters or patterns.
Table 1.1
Sequence | Meaning | Example
\d | Matches any digit character (0-9) | "123" matches "\d\d\d"
\D | Matches any non-digit character | "hello" matches "\D\D\D\D\D"
\w | Matches any word character (a-z, A-Z, 0-9, and _) | "hello_world" matches "\w\w\w\w\w\w\w\w\w"
\W | Matches any non-word character | "@#$%" matches "\W\W\W\W"
\s | Matches any whitespace character (space, tab, newline, etc.) | "hello world" matches "\w\s\w\w\w\w\w"
\S | Matches any non-whitespace character | "hello_world" matches "\S\S\S\S\S\S\S\S\S"
\b | Matches the boundary between a word character and a non-word character | "cat" matches "\bcat\b" in "The cat sat on the mat"
\B | Matches any position that is not a word boundary | "cat" matches "\Bcat\B" in "concatenate" but not in "The cat sat on the mat"
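The special sequences can be verified with Python's re module, as in the sketch below. It uses re.search, which looks for the pattern anywhere in the text. The word "concatenate" in the \B check is chosen here because "cat" sits in the middle of it, with word characters on both sides.

```python
import re

# (pattern, text) pairs; re.search finds the pattern anywhere in the text
checks = [
    (r"\d\d\d", "123"),
    (r"\D\D\D\D\D", "hello"),
    (r"\W\W\W\W", "@#$%"),
    (r"\w\s\w\w\w\w\w", "hello world"),
    (r"\bcat\b", "The cat sat on the mat"),
]
results = {pattern: bool(re.search(pattern, text)) for pattern, text in checks}
for pattern, found in results.items():
    print(pattern, "->", found)

# \B requires that there is no word boundary at that position
inside_word = re.search(r"\Bcat\B", "concatenate") is not None
whole_word = re.search(r"\Bcat\B", "The cat sat on the mat") is not None
print(inside_word, whole_word)
```

Note that \Bcat\B does not match a word that merely starts with "cat", because the start of a word is itself a word boundary.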
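import re
pattern = r"\d\d\d\d\d\d\d\d\d\d"       # ten consecutive digits
text = "My phone number is 1234567890"  # sample text, assumed for illustration
match = re.search(pattern, text)
if match:
    print("Phone number found:", match.group())
else:
    print("No match")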
The regular expression pattern is defined as r"\d\d\d\d\d\d\d\d\d\d", which uses the \d special sequence
to match any digit character (0-9), and the \d sequence is repeated ten times to match ten consecutive
digits
A simple example of using the \W special sequence in a regular expression pattern with Python code:
import re
pattern = r"\W"  # Matches any non-word character
text = "Hello, world!"
matches = re.findall(pattern, text)
print("Matches:", matches)
APPLICATION PROGRAMMING INTERFACE
• REST APIs
REST APIs function by sending a request, which is communicated via an HTTP message. The HTTP
message usually contains a JSON file with instructions for the operation we would like the
service or resource to perform. In a similar manner, the API returns a response via an HTTP message,
and this response is usually contained within a JSON file.
In this lab, we will use the NBA API to determine how well the Golden State Warriors performed
against the Toronto Raptors. We will use the API to determine the number of points the Golden State
Warriors won or lost by for each game. So if the value is three, the Golden State Warriors won by three
points. Similarly, if the Golden State Warriors lost by two points, the result will be negative two. The
API will handle a lot of the details, such as endpoints and authentication.
It's quite simple to use the nba_api package to make a request for a specific team. We don't require a
JSON file; all we require is an id. This information is stored locally in the API. We import the module
teams.
To make things easier, we can convert the list of team dictionaries to a table. First, we use the function
one_dict to create a single dictionary: the keys are the common keys of each team, and each value is a
list whose elements are the corresponding values for each team. We then convert the dictionary to a
dataframe in which each row contains the information for a different team.
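A minimal sketch of such a helper is given below. The function name one_dict follows the text; its body and the two sample team records are assumptions made for illustration and do not come from the real API.

```python
def one_dict(list_dict):
    """Turn a list of dictionaries into one dictionary of lists."""
    keys = list_dict[0].keys()
    out_dict = {key: [] for key in keys}
    for dict_ in list_dict:
        for key, value in dict_.items():
            out_dict[key].append(value)
    return out_dict

# Hypothetical sample records standing in for the API's team list
teams = [
    {"id": 1, "full_name": "Team A", "city": "City A"},
    {"id": 2, "full_name": "Team B", "city": "City B"},
]
dict_nba_team = one_dict(teams)
print(dict_nba_team)
```

Passing the resulting dictionary to the pandas DataFrame constructor would then give a table with one row per team and one column per key.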
HTTP AND REQUESTS
• Overview of http
When you, the client, use a web page your browser sends an HTTP request to the server where the
page is hosted. The server tries to find the desired resource, which by default is "index.html". If your request is
successful, the server will send the object to the client in an HTTP response. This includes information
like the type of the resource, the length of the resource, and other information.
The figure below represents the process. The circle on the left represents the client, the circle on the
right represents the Web server. The table under the Web server represents a list of resources stored in
the web server: in this case an HTML file, a PNG image, and a txt file.
The HTTP protocol allows you to send and receive information through the web including webpages,
images, and other web resources. In this lab, we will provide an overview of the Requests library for
interacting with the HTTP protocol.
A uniform resource locator (URL) is the most popular way to find resources on the web. We can break
the URL into three parts.
Scheme: This is the protocol; for this lab it will always be http://
Internet address or Base URL: This is used to find the location; some examples are
www.ibm.com and www.gitlab.com
Route: This is the location of the resource on the web server, for example /index.html
You may also hear the term Uniform Resource Identifier (URI); URLs are actually a subset of URIs.
Another popular term is endpoint; this is the URL of an operation provided by a Web server.
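The parts of a URL can be separated with the standard library, as the sketch below shows. The example URL is assembled here from the base URL and resource name mentioned in the text and is not a link used in the course.

```python
from urllib.parse import urlsplit

# Example URL built from the base URL and resource mentioned in the text
url = "http://www.ibm.com/index.html"
parts = urlsplit(url)

print(parts.scheme)   # protocol
print(parts.netloc)   # Internet address (base URL)
print(parts.path)     # location of the resource on the server
```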
• Requests
The process can be broken into the Request and Response process. The request using the GET method
is partially illustrated below. In the start line we have the GET method, which is an HTTP method, along
with the location of the resource /index.html and the HTTP version. The Request header passes additional
information with an HTTP request.
When an HTTP request is made, an HTTP method is sent, this tells the server what action to perform.
A list of several HTTP methods is shown below. We will go over more examples later.
• Response
The figure below represents the response; the response start line contains the version number
HTTP/1.0, a status code (200) meaning success, followed by a descriptive phrase (OK). The response
header contains useful information. Finally, we have the response body containing the requested file,
an HTML document. It should be noted that some requests have headers.
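To make the structure of the two messages concrete, the sketch below builds the text of a GET request and takes apart a response using plain string operations. The host name, header values and HTML body are hypothetical, and in practice a library such as Requests handles these details for you.

```python
# Build the text of a GET request: start line, headers, blank line
request_lines = [
    "GET /index.html HTTP/1.0",   # method, resource location, HTTP version
    "Host: www.ibm.com",          # request header (hypothetical host)
    "",
    "",
]
request_message = "\r\n".join(request_lines)

# A hypothetical response in the same text format
response_message = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<!DOCTYPE html><html><body>Hello</body></html>"
)

# Split the response into start line, headers and body
head, body = response_message.split("\r\n\r\n", 1)
start_line, *header_lines = head.split("\r\n")
version, status_code, phrase = start_line.split(" ", 2)
headers = dict(line.split(": ", 1) for line in header_lines)

print(version, status_code, phrase)
print(headers)
print(body)
```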
Fig. 1.1
GENERATING MAPS WITH PYTHON
• Introduction
In this section, we will learn how to create maps for different objectives. To do that, we will part ways
with Matplotlib and work with another Python visualization library, namely Folium. What is nice about
Folium is that it was developed for the sole purpose of visualizing geospatial data. While other libraries
are available to visualize geospatial data, such as plotly, they might have a cap on how many API calls
you can make within a defined time frame. Folium, on the other hand, is completely free.
• Introduction to Folium
Folium is a powerful Python library that helps you create several types of Leaflet maps. The fact that
the Folium results are interactive makes this library very useful for dashboard building.
• Folium builds on the data wrangling strengths of the Python ecosystem and the mapping
strengths of the Leaflet.js library. Manipulate your data in Python, then visualize it on a
Leaflet map via Folium.
• Folium makes it easy to visualize data that's been manipulated in Python on an interactive
Leaflet map. It enables both the binding of data to a map for choropleth visualizations as well
as passing Vincent/Vega visualizations as markers on the map.
• The library has a number of built-in tilesets from OpenStreetMap, Mapbox, and Stamen, and
supports custom tilesets with Mapbox or Cloudmade API keys. Folium supports both
GeoJSON and TopoJSON overlays, as well as the binding of data to those overlays to create
choropleth maps with color-brewer color schemes.
• Choropleth Maps: A choropleth map is a thematic map in which areas are shaded or
patterned in proportion to the measurement of the statistical variable being displayed on the
map, such as population density or per-capita income. The choropleth map provides an easy
way to visualize how a measurement varies across a geographic area, or it shows the level of
variability within a region. Below is a choropleth map of the US depicting the population per
square mile by state.
Fig 1.2
VISUALIZING DATA WITH MATPLOTLIB
The primary plotting library we will explore in the course is Matplotlib. As mentioned on their website:
• Matplotlib.Pyplot
One of the core aspects of Matplotlib is matplotlib.pyplot. It is Matplotlib's scripting layer which we
studied in detail in the videos about Matplotlib. Recall that it is a collection of command-style
functions that make Matplotlib work like MATLAB. Each pyplot function makes some change to a
figure: e.g., creates a figure, creates a plotting area in a figure, plots some lines in a plotting area,
decorates the plot with labels, etc. In this lab, we will work with the scripting layer to learn how to
generate line plots. In future labs, we will get to work with the Artist layer as well to experiment
firsthand with how it differs from the scripting layer.
A line chart or line plot is a type of plot which displays information as a series of data points called
'markers' connected by straight line segments. It is a basic type of chart common in many fields. Use
a line plot when you have a continuous data set. These are best suited for trend-based visualizations of
data over a period of time.
PROJECT (CAPSTONE)
In this capstone, we will predict if the Falcon 9 first stage will land successfully. SpaceX advertises
Falcon 9 rocket launches on its website with a cost of 62 million dollars; other providers cost upward
of 165 million dollars each. Much of the savings is because SpaceX can reuse the first stage. Therefore,
if we can determine if the first stage will land, we can determine the cost of a launch. This information
can be used if an alternate company wants to bid against SpaceX for a rocket launch. In this lab, you
will collect data from an API and make sure it is in the correct format. The following is an example
of a successful launch.
Fig. 2.1
Fig 2.2
Below we will define a series of helper functions that will help us use the API to extract information
using identification numbers in the launch data. From the rocket column
we would like to learn the booster name.
Fig. 2.2
From cores we would like to learn the outcome of the landing, the type of the landing, the number of
flights with that core, whether gridfins were used, whether the core is reused, whether legs were used,
the landing pad used, the block of the core (a number used to separate versions of cores), the
number of times this specific core has been reused, and the serial of the core.
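The kind of helper described above can be sketched as follows. The launch record and all of its field names are hypothetical, modeled only on the description in the text, and the function name get_core_data is likewise an assumption; the real API response may be structured differently.

```python
# Hypothetical launch record; the field names are assumptions modeled on
# the description above, not a verified API schema
launch = {
    "cores": [
        {
            "core": "core-id-123",
            "flight": 2,
            "gridfins": True,
            "legs": True,
            "reused": True,
            "landing_success": True,
            "landing_type": "ASDS",
            "landpad": "pad-id-456",
        }
    ]
}

def get_core_data(record):
    """Collect the landing-related fields of each core in one launch."""
    rows = []
    for core in record["cores"]:
        rows.append({
            "Outcome": f"{core['landing_success']} {core['landing_type']}",
            "Flights": core["flight"],
            "GridFins": core["gridfins"],
            "Reused": core["reused"],
            "Legs": core["legs"],
            "LandingPad": core["landpad"],
        })
    return rows

core_rows = get_core_data(launch)
print(core_rows)
```

The block, reuse count and serial of a core would presumably need a second lookup keyed by the core's identification number, which is left out of this sketch.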
Fig 2.3
Here we will be performing web scraping to collect Falcon 9 historical launch records from a
Wikipedia page titled List of Falcon 9 and Falcon Heavy launches
Fig 2.4
The following table would be obtained for the details:
Fig 2.5
Next, we want to collect all relevant column names from the HTML table header
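Collecting the column names from a table header can be sketched with the standard library's html.parser, used here in place of whatever scraping library the course relied on. The class name HeaderParser and the small HTML fragment are hypothetical stand-ins for the downloaded Wikipedia page.

```python
from html.parser import HTMLParser

class HeaderParser(HTMLParser):
    """Collect the text of every <th> cell in an HTML document."""

    def __init__(self):
        super().__init__()
        self.in_header = False
        self.column_names = []

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            self.in_header = True
            self.column_names.append("")

    def handle_endtag(self, tag):
        if tag == "th":
            self.in_header = False

    def handle_data(self, data):
        if self.in_header:
            self.column_names[-1] += data

# Hypothetical table fragment standing in for the downloaded page
html = """
<table>
  <tr><th>Flight No.</th><th>Launch site</th><th>Payload</th></tr>
  <tr><td>1</td><td>CCAFS</td><td>Dragon</td></tr>
</table>
"""
parser = HeaderParser()
parser.feed(html)
column_names = [name.strip() for name in parser.column_names if name.strip()]
print(column_names)
```

Empty header cells are dropped by the final list comprehension, which is usually what is wanted when the names become dataframe columns.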
3. Data Wrangling
In this section we will perform some exploratory data analysis (EDA) to find patterns in the data
and determine what the label for training supervised models should be.
In the data set, there are several different cases where the booster did not land successfully. Sometimes
a landing was attempted but failed due to an accident. The outcomes are recorded as follows:
True Ocean means the mission outcome was a successful landing in a specific region of the ocean.
False Ocean means the mission outcome was an unsuccessful landing in a specific region of the ocean.
True RTLS means the mission outcome was a successful landing on a ground pad.
False RTLS means the mission outcome was an unsuccessful landing on a ground pad.
True ASDS means the mission outcome was a successful landing on a drone ship.
False ASDS means the mission outcome was an unsuccessful landing on a drone ship.
We will convert those outcomes into training labels, where 1 means the booster landed successfully
and 0 means it did not.
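The conversion of outcomes into labels can be sketched as below. The sample outcome list and the function name landing_class are hypothetical, and the rule of treating every outcome that starts with "True" as a success is an assumption based on the naming scheme of the outcomes.

```python
# Hypothetical sample of landing outcomes in the outcome format used here
outcomes = ["True ASDS", "False Ocean", "True RTLS", "False ASDS", "True Ocean"]

def landing_class(outcome):
    """Return 1 for a successful landing and 0 otherwise."""
    return 1 if outcome.startswith("True") else 0

labels = [landing_class(outcome) for outcome in outcomes]
success_rate = sum(labels) / len(labels)
print(labels)
print(success_rate)
```

Because the labels are 0 and 1, their mean is directly the share of successful landings in the sample.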
Each launch aims at a dedicated orbit, and here are some common orbit types:
• LEO: Low Earth orbit (LEO) is an Earth-centred orbit with an altitude of 2,000 km (1,200 mi)
or less (approximately one-third of the radius of Earth),[1] or with at least 11.25 periods per
day (an orbital period of 128 minutes or less) and an eccentricity less than 0.25.[2] Most of the
manmade objects in outer space are in LEO.[1]
• VLEO: Very Low Earth Orbits (VLEO) can be defined as orbits with a mean altitude below
450 km. Operating in these orbits can provide a number of benefits to Earth observation
spacecraft, as the spacecraft operates closer to the observation.[2]
• GTO: A geosynchronous transfer orbit is used to reach geosynchronous orbit, a high Earth orbit
that allows satellites to match Earth's rotation. Located at 22,236 miles (35,786 kilometers)
above Earth's equator, this position is a valuable spot for monitoring weather, communications
and surveillance. “Because the satellite orbits at the same speed that the Earth is turning, the
satellite seems to stay in place over a single longitude, though it may drift north to south,”
NASA wrote on its Earth Observatory website.
• SSO (or SO): A Sun-synchronous orbit, also called a heliosynchronous orbit, is a nearly
polar orbit around a planet in which the satellite passes over any given point of the planet's
surface at the same local mean solar time.
• MEO: Geocentric orbits ranging in altitude from 2,000 km (1,200 mi) to just below
geosynchronous orbit at 35,786 kilometers (22,236 mi). Also known as an intermediate circular
orbit. These are "most commonly at 20,200 kilometers (12,600 mi), or 20,650 kilometers
(12,830 mi), with an orbital period of 12 hours".
• PO: A polar orbit is one in which a satellite passes above or nearly above both poles of
the body being orbited (usually a planet such as the Earth).
Fig 2.5
4. First stage landing prediction using machine learning
SpaceX advertises Falcon 9 rocket launches on its website with a cost of 62 million dollars; other
providers cost upward of 165 million dollars each. Much of the savings is because SpaceX can reuse
the first stage. Therefore, if we can determine if the first stage will land, we can determine the cost of
a launch. This information can be used if an alternate company wants to bid against SpaceX for a
rocket launch.
Some of the source code used for the landing prediction is shown below:
Fig 2.6
Fig 2.7
The confusion matrix comparing whether the first stage was predicted to land with whether it actually landed is shown below:
Fig 2.8
Examining the confusion matrix, we see that logistic regression can distinguish between the different
classes. We see that the major problem is false positives.
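How the cells of a confusion matrix are counted can be sketched in plain Python as follows. The true labels and predictions are hypothetical values chosen so that the errors are false positives; they are not the results of the project's model.

```python
# Hypothetical true labels and predictions (1 = landed, 0 = did not land)
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 1, 1, 0, 1, 0]

def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 0:
            counts["TN"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1
        else:
            counts["FN"] += 1
    return counts

counts = confusion_counts(y_true, y_pred)
accuracy = (counts["TP"] + counts["TN"]) / len(y_true)
print(counts, accuracy)
```

A false positive here is a launch that was predicted to land but did not, which is the costly kind of mistake when the prediction is used to estimate launch cost.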
The launch success rate may depend on many factors such as payload mass, orbit type, and so on. It
may also depend on the location and proximities of a launch site, i.e., the initial position of rocket
trajectories. Finding an optimal location for building a launch site certainly involves many factors and
hopefully we could discover some of the factors by analyzing the existing launch site locations.
PROJECT(VISUALISED)
In this module I show what my project looks like and how it works. The project
was built as a Plotly Dash app and was hosted on a local server for a better view of
the visuals and graphs.
Fig 3.1
Fig 3.2
Fig 3.2
The working and efficiency of my project are shown visually below:
Fig 3.4
Fig 3.5
Fig 3.6
Fig 3.7
REFERENCES