
DATA SCIENCE CAPSTONE PROJECT

By ABDUL SAMAD

Outline
• Executive Summary
• Introduction
• Methodology
• Results
• Conclusion
• Appendix
Executive Summary
• This project collects data from the SpaceX API and Wikipedia to build a model that predicts whether a Falcon 9 first stage will land successfully.
• We plotted many charts to explore the data and tested several classification models; K-Nearest Neighbors performed best for our problem.
Introduction
• Our company wants to bid against SpaceX, which offers the cheapest rocket launches on the market with Falcon 9.
• Much of SpaceX's saving comes from reusing the first stage, so we want to determine the cost of the first stage and, from that, calculate the cost of a launch.
• Our question is: how can we determine whether the first stage will land successfully before the launch?
Methodology
Data collection methodology:
• Launch information was requested from the SpaceX API, and the result of each launch was scraped from Wikipedia
Perform data wrangling:
• Change the types of some columns, limit the launch dates in the dataset, replace missing data with the column mean, and convert all categorical independent variables into 0/1 columns using one-hot encoding
Perform exploratory data analysis (EDA) using visualization and SQL
Perform interactive visual analytics using Folium and Plotly Dash
Perform predictive analysis using classification models
• How to build, tune, and evaluate classification models
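The one-hot encoding step in the data wrangling above can be sketched in plain Python. It mirrors what pandas' `get_dummies` does for a single column: each category becomes its own 0/1 column named `<column>_<category>`. The function name `one_hot` and the toy rows are hypothetical illustrations, not the project's code.

```python
def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category.

    New columns are named "<column>_<category>", in sorted category order.
    The input rows are not modified.
    """
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every other column unchanged
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical toy rows
rows = [
    {"FlightNumber": 1, "Orbit": "LEO"},
    {"FlightNumber": 2, "Orbit": "GTO"},
    {"FlightNumber": 3, "Orbit": "LEO"},
]
encoded = one_hot(rows, "Orbit")
print(encoded[1])  # {'FlightNumber': 2, 'Orbit_GTO': 1, 'Orbit_LEO': 0}
```

Numeric columns such as `FlightNumber` pass through untouched; only the named categorical column is expanded.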
Methodology
Data collection – SpaceX API

• Request rocket launch data from the SpaceX API and save it in a dataframe
• Take the subset of features we need
• Reformat some features and limit the launch dates
• Create a dict with the names of all the features we want as keys
• Use the provided helper functions to extract data from the above dataframe into our dict
• Convert the dict to a dataframe and save it as a CSV file for the next step
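The dict-of-columns pattern described above can be sketched as follows. This is a minimal, assumption-laden version: the records are hypothetical, already-flattened stand-ins for the API's JSON (the real records reference rockets, payloads and launchpads by ID, which the helper functions resolve), the field names and the cutoff date are illustrative, and the CSV is written with the standard `csv` module rather than `pd.DataFrame(data).to_csv(...)` so the sketch runs without pandas.

```python
import csv
import io
from datetime import date

# Hypothetical, flattened launch records (illustrative values only)
launches = [
    {"flight_number": 1, "date_utc": "2006-03-24T22:30:00.000Z",
     "rocket": "Falcon 1", "payload_mass_kg": 20.0, "launch_site": "Kwajalein Atoll"},
    {"flight_number": 6, "date_utc": "2010-06-04T18:45:00.000Z",
     "rocket": "Falcon 9", "payload_mass_kg": None, "launch_site": "CCSFS SLC 40"},
    {"flight_number": 120, "date_utc": "2021-01-20T13:02:00.000Z",
     "rocket": "Falcon 9", "payload_mass_kg": 15600.0, "launch_site": "KSC LC 39A"},
]

# The feature names we want become the keys of the dict
FEATURES = ["FlightNumber", "Date", "BoosterVersion", "PayloadMass", "LaunchSite"]


def collect(records, cutoff):
    """Build a dict of columns from the launches dated on or before cutoff."""
    data = {name: [] for name in FEATURES}
    for rec in records:
        launch_date = date.fromisoformat(rec["date_utc"][:10])  # keep the date part only
        if launch_date > cutoff:  # limit the dates of launches
            continue
        data["FlightNumber"].append(rec["flight_number"])
        data["Date"].append(launch_date.isoformat())
        data["BoosterVersion"].append(rec["rocket"])
        data["PayloadMass"].append(rec["payload_mass_kg"])
        data["LaunchSite"].append(rec["launch_site"])
    return data


def to_csv(data):
    """Serialize the column dict as CSV text (None becomes an empty field)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(FEATURES)
    writer.writerows(zip(*(data[name] for name in FEATURES)))
    return buffer.getvalue()


data = collect(launches, cutoff=date(2020, 11, 13))  # hypothetical cutoff date
print(to_csv(data))
```

Missing payload masses are kept as empty fields here; filling them in belongs to the data wrangling step.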
Data collection – Web scraping

• Extract the page title
• Extract the column names from the table header
• Create a dictionary with those columns and some extra columns
• Loop through the tables, and through each row in each table, adding the rows to the above dictionary
• Convert the dict to a dataframe and save it as a CSV file for the next stage
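The scraping steps above can be sketched with the standard library's `html.parser` (a swap-in; the parsing library the project used is not stated here). The toy page, the class name `LaunchTableParser`, and the derived `Date`/`Time` extra columns (split from a hypothetical "Date and time" cell) are all assumptions, and the parser assumes `<th>` cells appear only in the header row.

```python
from html.parser import HTMLParser

# Toy page standing in for the Wikipedia launch list (illustrative rows)
PAGE = """
<html><head><title>List of Falcon 9 launches (toy page)</title></head><body>
<table>
  <tr><th>Flight No.</th><th>Date and time</th><th>Launch site</th></tr>
  <tr><td>1</td><td>4 June 2010, 18:45</td><td>CCAFS</td></tr>
  <tr><td>2</td><td>8 December 2010, 15:43</td><td>CCAFS</td></tr>
</table></body></html>
"""


class LaunchTableParser(HTMLParser):
    """Collects the page <title>, the <th> column names and the <td> rows."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.columns = []   # header names from <th> cells
        self.rows = []      # one list of cell strings per data row
        self._in_title = False
        self._cell = None   # text of the <th>/<td> currently open, else None
        self._row = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell = ""

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._cell is not None:
            self._cell += data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "th" and self._cell is not None:
            self.columns.append(self._cell.strip())
            self._cell = None
        elif tag == "td" and self._cell is not None:
            self._row.append(self._cell.strip())
            self._cell = None
        elif tag == "tr" and self._row:  # header rows leave _row empty
            self.rows.append(self._row)
            self._row = []


def rows_to_dict(columns, rows):
    """Dict keyed by the scraped columns plus two hypothetical extra columns."""
    launch_dict = {name: [] for name in columns}
    launch_dict["Date"], launch_dict["Time"] = [], []
    for row in rows:
        for name, value in zip(columns, row):
            launch_dict[name].append(value)
        date_part, _, time_part = row[columns.index("Date and time")].partition(",")
        launch_dict["Date"].append(date_part.strip())
        launch_dict["Time"].append(time_part.strip())
    return launch_dict


parser = LaunchTableParser()
parser.feed(PAGE)
parser.close()
launch_dict = rows_to_dict(parser.columns, parser.rows)
print(parser.title)
print(launch_dict)
```

Real Wikipedia tables also use `<th>` for row headers and contain footnote links, so a production parser needs more cleanup than this sketch does.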
Data wrangling

• Check the number of nulls and the type of each column
• Count the launches at each launch site and of each orbit type for a better understanding of our dataset
• Create a landing outcome label from the Outcome column: 0 for a failed landing and 1 for a successful one
• Save the dataframe as a CSV file for further steps
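The wrangling steps above (null and type checks, mean imputation, value counts, and the 0/1 landing label) can be sketched in plain Python. The toy rows are hypothetical, and the set of outcome strings treated as failures (`BAD_OUTCOMES`) is an assumption about how the Outcome values were mapped to 0.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy rows; None marks a missing payload mass
rows = [
    {"LaunchSite": "CCAFS SLC 40", "Orbit": "LEO", "PayloadMass": 525.0, "Outcome": "None None"},
    {"LaunchSite": "CCAFS SLC 40", "Orbit": "ISS", "PayloadMass": None, "Outcome": "False Ocean"},
    {"LaunchSite": "KSC LC 39A", "Orbit": "GTO", "PayloadMass": 5300.0, "Outcome": "True ASDS"},
    {"LaunchSite": "VAFB SLC 4E", "Orbit": "PO", "PayloadMass": 9600.0, "Outcome": "True RTLS"},
]

# Number of nulls and the (non-null) value types of each column
null_counts = {col: sum(row[col] is None for row in rows) for col in rows[0]}
column_types = {col: {type(row[col]).__name__ for row in rows if row[col] is not None}
                for col in rows[0]}

# Replace missing payload masses with the mean of the known values
payload_mean = mean(row["PayloadMass"] for row in rows if row["PayloadMass"] is not None)
for row in rows:
    if row["PayloadMass"] is None:
        row["PayloadMass"] = payload_mean

# Launches per site and per orbit type
site_counts = Counter(row["LaunchSite"] for row in rows)
orbit_counts = Counter(row["Orbit"] for row in rows)

# Assumed set of outcomes counted as a failed (or not attempted) landing
BAD_OUTCOMES = {"False ASDS", "False Ocean", "False RTLS", "None ASDS", "None None"}
landing_class = [0 if row["Outcome"] in BAD_OUTCOMES else 1 for row in rows]

print(null_counts, site_counts, landing_class)
```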


EDA with data visualization
Bar charts were plotted to visually compare how the categories of a variable affect the success rate (e.g. success rate vs. orbit type).
Scatter plots were plotted to visually inspect the relationship between two variables (e.g. flight number vs. launch site).
A line chart was plotted to see how a variable changed over time (e.g. the yearly launch success trend).
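The bar chart and the line chart described above both plot a success rate, i.e. the mean of the 0/1 class label within each group. The sketch below computes that quantity per orbit type and per year; the toy records are hypothetical, and the plotting call itself (for example with matplotlib or seaborn) is omitted.

```python
from collections import defaultdict

# Hypothetical (orbit, year, class) records; class 1 = landed, 0 = did not land
LAUNCHES = [
    ("LEO", 2013, 0), ("ISS", 2013, 1), ("GTO", 2014, 0),
    ("LEO", 2017, 1), ("GTO", 2017, 1), ("GTO", 2017, 0),
]


def success_rate_by(records, key):
    """Mean class label per group, with groups in sorted order."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for record in records:
        group = key(record)
        totals[group] += 1
        successes[group] += record[2]
    return {group: successes[group] / totals[group] for group in sorted(totals)}


by_orbit = success_rate_by(LAUNCHES, key=lambda record: record[0])  # bar chart: orbit vs. rate
by_year = success_rate_by(LAUNCHES, key=lambda record: record[1])   # line chart: year vs. rate
print(by_orbit)
print(by_year)
```

Sorting the groups matters for the line chart, since the years must be plotted in order.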
EDA with SQL
• The purpose was to better understand our dataset.
Queries were written to retrieve:
• Names of all unique launch sites
• Payload mass carried by boosters launched by NASA
• Launch sites beginning with the string 'CCA'
• Average payload mass carried by booster version F9 v1.1
• Date of the first successful landing outcome on a ground pad
• Names of boosters that landed successfully on a drone ship and carried a payload greater than 4000 kg and less than 6000 kg
• Total number of successful and failed mission outcomes
• Names of booster versions that carried the maximum payload mass
• Month, booster version, and launch site of failed drone ship landings in 2015
• Ranking of the count of successful landing outcomes between 2010-06-04 and 2017-03-20
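The queries listed above can be sketched against an in-memory SQLite table using Python's `sqlite3` module. The table name `SPACEXTBL`, its column names and the toy rows are hypothetical, and the aggregations (for example `SUM` for the NASA payload) are assumptions about how each question was answered; the database the project actually queried is not stated here.

```python
import sqlite3

# Toy table: the table name, column names and rows are hypothetical/illustrative
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE SPACEXTBL (
        Date TEXT, Booster_Version TEXT, Launch_Site TEXT, PAYLOAD_MASS__KG_ REAL,
        Customer TEXT, Mission_Outcome TEXT, Landing_Outcome TEXT)"""
)
conn.executemany(
    "INSERT INTO SPACEXTBL VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("2010-06-04", "F9 v1.0 B0003", "CCAFS LC-40", 0.0, "SpaceX", "Success", "Failure (parachute)"),
        ("2013-09-29", "F9 v1.1 B1003", "VAFB SLC-4E", 500.0, "Customer A", "Success", "Uncontrolled (ocean)"),
        ("2014-04-18", "F9 v1.1", "CCAFS LC-40", 2296.0, "NASA (CRS)", "Success", "Controlled (ocean)"),
        ("2015-01-10", "F9 v1.1 B1012", "CCAFS LC-40", 2395.0, "NASA (CRS)", "Success", "Failure (drone ship)"),
        ("2015-12-22", "F9 FT B1019", "CCAFS LC-40", 2034.0, "Customer B", "Success", "Success (ground pad)"),
        ("2016-05-06", "F9 FT B1023", "CCAFS LC-40", 4696.0, "Customer C", "Success", "Success (drone ship)"),
        ("2017-02-19", "F9 FT B1031", "KSC LC-39A", 2490.0, "NASA (CRS)", "Success", "Success (ground pad)"),
    ],
)

QUERIES = {
    # Names of all unique launch sites
    "unique_sites": "SELECT DISTINCT Launch_Site FROM SPACEXTBL ORDER BY Launch_Site",
    # Payload mass carried by boosters launched for NASA (summed)
    "nasa_payload": "SELECT SUM(PAYLOAD_MASS__KG_) FROM SPACEXTBL WHERE Customer = 'NASA (CRS)'",
    # Launches from sites whose name begins with 'CCA'
    "cca_sites": "SELECT * FROM SPACEXTBL WHERE Launch_Site LIKE 'CCA%'",
    # Average payload mass of F9 v1.1 boosters (prefix match on the version)
    "avg_f9_v11": "SELECT AVG(PAYLOAD_MASS__KG_) FROM SPACEXTBL WHERE Booster_Version LIKE 'F9 v1.1%'",
    # Date of the first successful ground-pad landing (ISO dates sort correctly as text)
    "first_ground_pad": "SELECT MIN(Date) FROM SPACEXTBL WHERE Landing_Outcome = 'Success (ground pad)'",
    # Boosters that landed on a drone ship with 4000 kg < payload < 6000 kg
    "drone_ship_4000_6000": """SELECT Booster_Version FROM SPACEXTBL
        WHERE Landing_Outcome = 'Success (drone ship)'
          AND PAYLOAD_MASS__KG_ > 4000 AND PAYLOAD_MASS__KG_ < 6000""",
    # Number of mission outcomes of each kind
    "mission_outcomes": "SELECT Mission_Outcome, COUNT(*) FROM SPACEXTBL GROUP BY Mission_Outcome",
    # Booster versions that carried the maximum payload mass
    "max_payload_boosters": """SELECT Booster_Version FROM SPACEXTBL
        WHERE PAYLOAD_MASS__KG_ = (SELECT MAX(PAYLOAD_MASS__KG_) FROM SPACEXTBL)""",
    # Month, booster version and launch site of failed drone-ship landings in 2015
    "failed_drone_ship_2015": """SELECT substr(Date, 6, 2), Booster_Version, Launch_Site
        FROM SPACEXTBL
        WHERE Landing_Outcome = 'Failure (drone ship)' AND substr(Date, 1, 4) = '2015'""",
    # Successful landing outcomes between two dates, ranked by count
    "rank_success_outcomes": """SELECT Landing_Outcome, COUNT(*) AS n FROM SPACEXTBL
        WHERE Landing_Outcome LIKE 'Success%'
          AND Date BETWEEN '2010-06-04' AND '2017-03-20'
        GROUP BY Landing_Outcome ORDER BY n DESC, Landing_Outcome""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)
```

Storing dates as ISO `YYYY-MM-DD` text is what makes `MIN(Date)`, `BETWEEN` and `substr` work without date functions.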
Build an interactive map with Folium
• Circle markers were added to mark the location of every launch site.
• Marker clusters were added to show the total number of launches at each site, with successful and failed launches labeled when a site is clicked on the map.
• Markers were added to mark the proximities of each site, and a polyline was added to connect a chosen proximity to its launch site, labeled with the distance in km.
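The distance label on each polyline needs the great-circle distance between two latitude/longitude points. A common way to compute it is the haversine formula, sketched below; the Earth radius of 6371 km (mean radius) is an assumption, and the proximity point in the usage lines is hypothetical. The resulting string could then be used as the label or popup of a folium marker; the folium calls are not shown.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed; other values are also used)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_label(site, point):
    """Text label for a polyline from a launch site to a proximity, e.g. '0.51 km'."""
    return f"{haversine_km(*site, *point):.2f} km"


site = (28.563197, -80.576820)           # CCAFS SLC-40, approximate coordinates
coastline_point = (28.56367, -80.57163)  # hypothetical nearby proximity
print(distance_label(site, coastline_point))
```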
Build a Dashboard with Plotly Dash
• A pie chart shows the total number of successful launches across all sites, or failed vs. successful launches when a specific launch site is selected.
• A scatter plot shows the correlation between payload mass (kg) and success rate.
• A pie chart is a good way to show totals across all launches, and a scatter plot is well suited to showing correlation. The payload mass range of the scatter plot can be changed interactively, giving a better view of which payload masses perform best.
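The filtering behind the two dashboard charts can be sketched as plain functions, which keeps the logic testable apart from Dash. The records and function names are hypothetical; in the dashboard these functions would sit behind callbacks wired to a site dropdown and a payload range slider (for example `dcc.Dropdown` and `dcc.RangeSlider`), and that wiring, the component IDs and the Plotly figure code are left out.

```python
# Hypothetical launch records: (launch_site, payload_mass_kg, class) with class 1 = success
LAUNCHES = [
    ("CCAFS LC-40", 500.0, 0),
    ("CCAFS LC-40", 2500.0, 1),
    ("KSC LC-39A", 5300.0, 1),
    ("KSC LC-39A", 3100.0, 1),
    ("VAFB SLC-4E", 9600.0, 0),
]


def pie_counts(site):
    """Counts behind the pie chart.

    site == "ALL": number of successful launches per site.
    Otherwise: success vs. failure counts for the selected site.
    """
    if site == "ALL":
        counts = {}
        for launch_site, _, outcome in LAUNCHES:
            counts[launch_site] = counts.get(launch_site, 0) + outcome
        return counts
    outcomes = [outcome for launch_site, _, outcome in LAUNCHES if launch_site == site]
    return {"Success": sum(outcomes), "Failure": len(outcomes) - sum(outcomes)}


def scatter_points(site, payload_range):
    """(payload, class) points for the selected site within the chosen payload range."""
    low, high = payload_range
    return [
        (payload, outcome)
        for launch_site, payload, outcome in LAUNCHES
        if (site == "ALL" or launch_site == site) and low <= payload <= high
    ]


def success_rate(points):
    """Share of successful launches among the plotted points (0.0 if there are none)."""
    return sum(outcome for _, outcome in points) / len(points) if points else 0.0


print(pie_counts("ALL"))
print(success_rate(scatter_points("ALL", (2000, 6000))))
```

Comparing `success_rate` across different payload ranges is the numeric counterpart of dragging the slider to see which payload masses perform best.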
Predictive analysis (Classification)
• Use the Class column of the dataset from part 2 as the target y
• Use all one-hot encoded features of the dataset from part 3 as X
• Convert the target y to a NumPy array
• Standardize the data in X
• Split the dataset into training and test sets
• Choose the best parameters and fit 4 models
• Calculate the accuracy of each model
• View the confusion matrix of each model
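The workflow above can be sketched end to end for one of the models, K-Nearest Neighbors (reported as the best model in the Executive Summary), in pure Python. The toy data, the candidate k values, the seeded split, and choosing k by leave-one-out accuracy (a simple stand-in for a cross-validated parameter search) are all assumptions. With scikit-learn, `StandardScaler`, `train_test_split`, `GridSearchCV`, `KNeighborsClassifier` and `confusion_matrix` cover the same steps; the functions below are hand-written equivalents, not those APIs.

```python
import math
import random
from collections import Counter


def standardize(X):
    """Z-score every column: (value - column mean) / column std (population std)."""
    columns = list(zip(*X))
    means = [sum(col) / len(col) for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) for col, m in zip(columns, means)]
    # A constant column (std 0) is mapped to 0.0 instead of dividing by zero
    return [[(v - m) / s if s else 0.0 for v, m, s in zip(row, means, stds)] for row in X]


def train_test_split(X, y, test_size=0.2, seed=2):
    """Shuffle row indices with a fixed seed and hold out test_size of the rows."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(X) * test_size)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [X[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])


def knn_predict(X_train, y_train, x, k):
    """Majority label among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred):
    """[[TN, FP], [FN, TP]] for labels 0 (did not land) and 1 (landed)."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def choose_k(X_train, y_train, candidates=(1, 3, 5)):
    """Pick k by leave-one-out accuracy on the training set."""
    def loo_accuracy(k):
        preds = [
            knn_predict(X_train[:i] + X_train[i + 1:], y_train[:i] + y_train[i + 1:], X_train[i], k)
            for i in range(len(X_train))
        ]
        return accuracy(y_train, preds)
    return max(candidates, key=loo_accuracy)  # the first k wins ties


# Hypothetical toy data: two features per launch, two well-separated groups
X = [[1, 500], [2, 600], [3, 550], [4, 700], [5, 650],
     [20, 5000], [21, 5200], [22, 4800], [23, 5100], [24, 4900]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

X_std = standardize(X)  # standardize first, then split, following the order listed above
X_train, X_test, y_train, y_test = train_test_split(X_std, y)
best_k = choose_k(X_train, y_train)
y_pred = [knn_predict(X_train, y_train, x, best_k) for x in X_test]
print("k =", best_k, "accuracy =", accuracy(y_test, y_pred))
print("confusion matrix:", confusion_matrix(y_test, y_pred))
```

Fitting the scaler on the whole dataset before splitting, as listed above, lets test-set statistics leak into training; fitting it on the training split only is the stricter alternative.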
Results

• Exploratory data analysis results
• Interactive data analytics demo in screenshots
• Predictive analysis results
Flight Number vs. Launch Site
Payload vs. Launch Site
Success rate vs. Orbit type
Flight Number vs. Orbit type
Payload vs. Orbit type
Launch success yearly trend
