Progress Report: Loading of Required Dataset

PROGRESS REPORT
Project Group Number: 23
Project Title: Titanic Survivors Prediction.
Name of Project Supervisor: Ms. Poonam Dahiya.
Submitted By: Rishabh, Karan, Piyush.
S.No.  Name of the student  Enroll. No.  Branch  Mobile No.  E-mail address
1
2      Karan Sehgal         02215002816  E.C.E   8826608475  Karansehagl151998@gmail.com
3      Rishabh Rawat        04215002816  E.C.E   8750644313  rishabhrawat570@gmail.com
MAHARAJA SURAJMAL INSTITUTE OF TECHNOLOGY
C-4, Janakpuri, New Delhi-110058
Affiliated to Guru Gobind Singh Indraprastha University, Delhi
1. Work done till date

● Loading of required dataset.
The data is first downloaded using the urlretrieve() function of the
urllib library (urllib.request in Python 3). The downloaded file is then
read as a CSV file.
Syntax - urlretrieve(url, filename)
The above command downloads the file and stores it at the location
given in the filename argument. The dataset is then read using the
read_csv() function of the pandas library into a DataFrame, a data
structure suited to high-level data manipulation.
Syntax - dataframe = pd.read_csv(csv_file)
We can then take a look at the resulting DataFrame by printing the
output of dataframe.head(), which returns the first five rows.
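The two steps above can be sketched together as follows. This is a minimal sketch, not the project's actual script: the helper name load_dataset, the file names, and the three-row sample are assumptions made for illustration, and a local file:// URL stands in for the real dataset URL so that the example runs offline.

```python
import tempfile
from pathlib import Path
from urllib.request import urlretrieve

import pandas as pd


def load_dataset(url, filename):
    """Download the CSV at `url` to `filename` and read it into a DataFrame."""
    urlretrieve(url, filename)    # saves the file at the given local path
    return pd.read_csv(filename)  # parses the CSV into a DataFrame


# Stand-in for the real download (assumption): a tiny local CSV exposed
# through a file:// URL. Replace `source_url` with the dataset's real URL.
workdir = Path(tempfile.mkdtemp())
sample = workdir / "sample.csv"
sample.write_text("PassengerId,Survived,Age\n1,0,22\n2,1,38\n3,1,26\n")
source_url = sample.as_uri()

dataframe = load_dataset(source_url, str(workdir / "titanic.csv"))
print(dataframe.head())  # first five rows (here, all three)
```

Wrapping both calls in one function keeps the download location and the file that is parsed in sync.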
● Preprocessing of data.
Preprocessing of data includes data cleaning and data transformation.
Data cleaning handles irrelevant or missing parts of the data. For
instance, if there is missing data, we can either fill it using the
mean or the most probable value, or ignore the tuple containing it,
which is feasible if the dataset is large. Data transformation means
getting the data into a form suitable for applying the algorithm. It
includes scaling the data to bring the values of each column (feature)
into the same range, so that no feature can bias the result purely
because of its magnitude, and converting data types, for example
converting categorical features to numerical ones so that their
information also contributes to the final result.
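A small illustration of the three operations described above (mean filling, scaling, and categorical encoding) is given here. It is only a sketch: the four-row sample, the column names, and the choice of min-max scaling and integer mapping are assumptions, since the report does not state which scaling or encoding method was used.

```python
import pandas as pd

# Hypothetical sample with one missing numeric value and one categorical column.
df = pd.DataFrame({
    "Age": [22.0, None, 38.0, 30.0],
    "Fare": [7.25, 71.28, 8.05, 53.10],
    "Sex": ["male", "female", "female", "male"],
})

# Data cleaning: fill the missing age with the column mean.
df["Age"] = df["Age"].fillna(df["Age"].mean())

# Data transformation 1: min-max scaling brings each numeric column into [0, 1].
for col in ["Age", "Fare"]:
    col_min, col_max = df[col].min(), df[col].max()
    df[col] = (df[col] - col_min) / (col_max - col_min)

# Data transformation 2: encode the categorical column as integers.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

print(df)
```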
2. Difficulties and problems encountered
The roadblocks encountered during the project, and their workarounds,
are explained below:
❖ Missing values in the dataset
➢ Some important pieces of information were found missing
while working on the dataset, including values in the
passengers' age column.
➢ The first workaround was to take the average of all
passengers' ages and fill it in wherever the age was NULL.
This did not give good results and was not a reliable
estimate.
➢ After considering various solutions, a better approach was
implemented: taking the average age for each category of
passenger, i.e., calculating the average age separately for
children, women, men, young boys and young girls (all
salutations present in the data were included), and then
filling each NULL value according to the passenger's
category.
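The salutation-based filling described above could look roughly like the following. This is a sketch under stated assumptions: the column names Name and Age, the "Surname, Title. Given names" name format, the regular expression, and the sample rows are all illustrative and are not taken from the project's code.

```python
import pandas as pd

# Hypothetical rows; names follow a "Surname, Title. Given names" pattern.
df = pd.DataFrame({
    "Name": [
        "Doe, Mr. John",
        "Roe, Mr. Richard",
        "Poe, Mr. Peter",
        "Low, Miss. Anna",
        "Ray, Miss. Clara",
        "Kim, Master. Tom",
        "Lee, Master. Sam",
    ],
    "Age": [22.0, 34.0, None, 26.0, None, 2.0, None],
})

# Extract the salutation: the text between the comma and the first period.
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()

# Mean age per salutation, aligned row by row with the original frame.
title_mean_age = df.groupby("Title")["Age"].transform("mean")

# Fill only the missing ages; known ages are left untouched.
df["Age"] = df["Age"].fillna(title_mean_age)

print(df[["Title", "Age"]])
```

Compared with a single global mean, this keeps a child with the salutation "Master" from being assigned an adult's age.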
❖ Incompatible chunks of data
➢ Many rows in the dataset had more than 90% NULL or
missing values and did not contribute to the accuracy of
the prediction.
➢ Filling those missing/NULL values was the goal at first, but
after some evaluation it became clear that dropping those
rows was the better option.
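One way to drop rows that are more than 90% empty is shown here. It is a minimal sketch: the ten-column sample frame and the variable names are assumptions, and only the 90% threshold comes from the text.

```python
import numpy as np
import pandas as pd

# Hypothetical 10-column frame: the second row is entirely empty,
# the third row has a single gap.
columns = [f"c{i}" for i in range(10)]
df = pd.DataFrame(
    [
        list(range(10)),
        [np.nan] * 10,
        [np.nan] + list(range(1, 10)),
    ],
    columns=columns,
)

# Fraction of missing values in each row.
missing_fraction = df.isna().mean(axis=1)

# Keep only rows where at most 90% of the values are missing.
cleaned = df[missing_fraction <= 0.9].reset_index(drop=True)

print(cleaned)
```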
❖ Redundant information in the dataset
➢ Many columns in the dataset made no contribution to the
end result.
➢ Dropping those columns was the best way to reduce the
computational overhead. These columns included:
● Cabin,
● name,
● survived,
● passengerId, and
● ticket
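Dropping the listed columns can be sketched as below. The two-row sample and the capitalised column names are assumptions. Setting Survived aside as the label before removing it from the feature set is also an assumption about how the column is used, since it is the value being predicted.

```python
import pandas as pd

# Hypothetical two-row sample with assumed column names.
df = pd.DataFrame({
    "PassengerId": [1, 2],
    "Survived": [0, 1],
    "Pclass": [3, 1],
    "Name": ["Doe, Mr. John", "Roe, Mrs. Jane"],
    "Age": [22.0, 38.0],
    "Ticket": ["T1", "T2"],
    "Cabin": [None, "C85"],
})

# Assumption: keep the prediction target separately before dropping it.
target = df["Survived"]

# Remove the columns listed above from the feature set.
features = df.drop(columns=["Cabin", "Name", "Survived", "PassengerId", "Ticket"])

print(features.columns.tolist())
```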
3. Work plan for next two weeks

Include the summary of the work which will probably be completed in next two weeks.
4. Target date of project completion
Evaluation (To be filled by the supervisor)
Percentage of work completed till date: …………………………..
Evaluation Criteria
S.No.  Name of the student  Enroll. No.  Branch  Regularity (02)  Progress of work done (06)  Timely submission of progress report (02)  Total Marks (10)
1
2
3
Date: (Name & Signature of Supervisor)
