
SHRI KRISHNASWAMY COLLEGE FOR WOMEN

ANNA NAGAR, CHENNAI-600 040


(AFFILIATED TO UNIVERSITY OF MADRAS)
Accredited By NAAC with B+ Grade & 2(f) Status under UGC Act

PLAN FOR TAXIS ENLARGEMENT


EXPLORATORY DATA ANALYSIS
A Mini Project Report submitted in partial fulfilment of the requirement for the
award of Degree of
BACHELOR OF COMPUTER SCIENCE
by

VEERALAKSHMI. M

REGISTER NUMBER:222008660

Under the Guidance of

Ms.K.KANIMOZHI, MCA.,
Assistant Professor,
DEPARTMENT OF COMPUTER SCIENCE

APRIL 2023
SHRI KRISHNASWAMY COLLEGE FOR WOMEN
ANNA NAGAR, CHENNAI-600 040
(AFFILIATED TO UNIVERSITY OF MADRAS)
Accredited By NAAC with B+ Grade & 2(f) Status under UGC Act

Certified that this report titled PLAN FOR TAXIS ENLARGEMENT EXPLORATORY DATA ANALYSIS
is a bonafide record of the project work done by Sri.VEERALAKSHMI.M, Register Number:
222008660, under our supervision and guidance, towards partial fulfillment of the
requirement for award of the Degree of B.Sc Computer Science of Shri Krishnaswamy College
for Women.

GUIDE IN-CHARGE HEAD OF THE DEPARTMENT

Submitted for the Viva-voce examination to be held on _

INTERNAL EXAMINER EXTERNAL EXAMINER

DATE:
DECLARATION

I hereby declare that the project work entitled “PLAN FOR TAXIS ENLARGEMENT
EXPLORATORY DATA ANALYSIS” is a record of an original work done by me under the
guidance of Ms.K.KANIMOZHI,MCA., SET, Assistant Professor, Department of Computer
Science, SHRI KRISHNASWAMY COLLEGE FOR WOMEN and this project work is
submitted in the partial fulfillment of the requirements for the award of the degree of Bachelor
of Computer Science.

DATE:

PLACE: SIGNATURE OF THE STUDENT


CONTENTS

ACKNOWLEDGEMENT
ABSTRACT
1 INTRODUCTION
2 SYSTEM ANALYSIS
  2.1 SYSTEM REQUIREMENTS SPECIFICATION
  2.2 EXISTING VS PROPOSED
  2.3 SOFTWARE DESCRIPTION
3 SYSTEM DESIGN
  3.1 DATA FLOW DIAGRAM
4 SYSTEM IMPLEMENTATION
  4.1 MODULE DESCRIPTION
  4.2 SCREEN SHOTS
5 SYSTEM TESTING
  UNIT TESTING
  BLACK BOX TESTING
  WHITE BOX TESTING
6 SOURCE CODE
7 CONCLUSION
8 FUTURE SCOPE
9 REFERENCES
ACKNOWLEDGEMENT
I thank the Almighty Lord for his blessings by which I have completed the project
successfully.

I am grateful to Thiru. M.A.K. Balakrishnan, Chairman, Shri Krishnaswamy College
for Women, Anna Nagar, and to Mr. K.B.K. Raj Mohan, Mr. K.B. Arun and Mr. K.B. Krishnanand,
Directors, Shri Krishnaswamy College for Women, for providing me this opportunity to carry out
this project.

I express my gratitude to our Dean, Dr. R. Revathi, M.A., M.Phil., Ph.D., Shri
Krishnaswamy College for Women, Anna Nagar, Chennai, for constant support.

I express my gratitude to Dr. Anita Rajendran, M.Com., M.A., MBA., Ph.D.,
Principal, Shri Krishnaswamy College for Women, Anna Nagar, Chennai, for allowing me to
execute the work.

I express my thanks to Dr. R. Nirmala, M.C.A., M.Phil., Ph.D., Head, Department of
Computer Science, Shri Krishnaswamy College for Women, Anna Nagar, Chennai, for her
guidance.

I express my thanks to Ms. K. Kanimozhi, MCA., SET, Assistant Professor, Department
of Computer Science, Shri Krishnaswamy College for Women, Anna Nagar, Chennai, for her
support, technical backing and guidance.

I thank all my Department Faculty members for their valuable comments and support. I
also thank my family members for their constant support and encouragement
throughout the entire program.
ABSTRACT

The project “Plan for Taxis Enlargement Exploratory Data Analysis” examines the
measures that can be implemented to support the growth (enlargement) of taxi services. A taxi,
also known as a taxicab or simply a cab, is a type of vehicle for hire with one driver, used by
one or several passengers. A taxi transports passengers between their preferred locations. Today
we can find plenty of taxi companies such as Uber, Rapido and Ola. Taxi associations face
challenges because of the wide adoption of ride-hailing apps such as Ola, Uber and Lyft. To keep
up, a sound strategy must be in place. The objective of the project is to identify the
parameters to be targeted for the extension of taxis and to offer suggestions for that extension.
Using data analytics, we can find possible ways to enlarge taxi services.
INTRODUCTION

A typical taxi company faces the common problem of efficiently assigning cabs to
passengers so that the service is smooth and hassle-free. Today we can find plenty of taxi
companies such as Uber, Rapido and Ola. Taxi associations face challenges because of the wide
adoption of ride-hailing apps such as Ola, Uber and Lyft. To keep up, a sound strategy
must be in place. The taxis data set contains data on several taxi trips and their duration
in New York City. This project identifies the parameters to be targeted for the extension of
taxis and offers suggestions for that extension. Using data analytics, we can find possible ways
to enlarge taxi services.

SYSTEM ANALYSIS

2.1 SYSTEM REQUIREMENTS

HARDWARE REQUIREMENTS

The hardware requirements may serve as the basis for the implementation of the system
and should therefore be a complete and consistent specification of the whole system. They show
what the system does and not how it should be implemented.

PROCESSOR      AMD Athlon Silver 3050U with Radeon Graphics, 2.30 GHz
RAM            4 GB
SYSTEM TYPE    64-bit operating system, x64-based processor

SOFTWARE REQUIREMENTS

The software requirements are the specification of the system. They should include both the
definition and a specification of requirements. They state what the system should do rather
than how it should do it. The software requirements provide a basis for creating the software
requirements specification. They are useful in estimating cost, planning team activities,
performing tasks and tracking the team’s progress throughout the development activity.

SOFTWARE                Google Colab
OPERATING SYSTEM        Windows 11
PROGRAMMING LANGUAGE    Python
2.2 EXISTING VS PROPOSED

EXISTING SYSTEM

• The parameters for the place where people book more taxis and the place where the company
charges a higher fare are clearly defined.
• The existing approaches are not adequate to predict the parameters for taxis enlargement.
• It uses variables of little value (tolls, color) in the data visualization for the taxis
enlargement plan.

DISADVANTAGE

• Insufficient.
• Variables of little value are used in the data visualization for taxis enlargement.

PROPOSED SYSTEM

• Our method is adequate to predict taxis enlargement.
• The unusable parameters are removed, and new parameters to focus on for taxis enlargement
(tips, payment method) are used for data visualization.
• The project results in a successful implementation for taxis enlargement.

ADVANTAGE

• Suggestions for taxis enlargement.
• Easy to interpret.
2.3 SOFTWARE DESCRIPTION

GOOGLE COLAB

Colab is a free Jupyter notebook environment that runs entirely in the cloud. Most
importantly, it does not require a setup and the notebooks that you create can be simultaneously
edited by your team members - just the way you edit documents in Google Docs.

Fig.1 Google Colaboratory

PYTHON

• Python has built-in mathematical libraries and functions, making it easier to solve
mathematical problems and to perform data analysis. Python is a programming language
widely used by data scientists.
• Python is an open-source, interpreted, high-level language that supports
object-oriented programming well. It is one of the best languages used by data scientists for
various data science projects and applications.
• Python provides rich functionality for mathematics, statistics and scientific
computing, along with strong libraries for data science applications. One of the
main reasons why Python is widely used in the scientific and research communities is
its ease of use and simple syntax, which make it easy for people to adopt.
• In terms of application areas, ML scientists prefer Python as well, in areas ranging from
fraud detection algorithms and network security to applications such as Natural Language
Processing (NLP) and sentiment analysis. Developers opt for Python because it provides a
large collection of libraries that help to solve complex business problems easily and to
build strong systems and data applications.

PYTHON LIBRARIES

SEABORN

• Seaborn is a Python data visualization library based on matplotlib.
• It provides a high-level interface for drawing attractive and informative statistical
graphics.

SYSTEM DESIGN

3.1 DATA FLOW DIAGRAM

Fig.2 Data flow diagram


SYSTEM IMPLEMENTATION

4.1 MODULES DESCRIPTION

• Dataset collection
• Data preprocessing
• Exploratory Data Analysis

DATASET
The built-in taxis data set contains data on several taxi trips and their duration in
New York City.

The data set has 6433 rows and 14 columns:

• pickup
• dropoff
• passengers
• distance
• fare
• tip
• tolls
• total
• color
• payment
• pickup_zone
• dropoff_zone
• pickup_borough
• dropoff_borough

A snapshot of our dataset is depicted.

Fig.3 taxis data set

DATA PREPROCESSING

• Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly
formatted, duplicate, or incomplete data within a dataset. When combining multiple data
sources, there are many opportunities for data to be duplicated or mislabeled. If data is
incorrect, outcomes and algorithms are unreliable, even though they may look correct.

Fig.4 Data cleaning process
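
The cleaning step described above can be sketched in a few lines of pandas. This is a minimal illustration, not the project's own code: the `sample` rows and the `clean` helper are hypothetical, and only the column names are borrowed from the taxis data set.

```python
import pandas as pd

# Hypothetical rows that mimic a few columns of the taxis data set.
sample = pd.DataFrame(
    {
        "passengers": [1, 1, 2, 1, 3],
        "fare": [7.0, 7.0, 12.5, 9.0, 20.0],
        "payment": ["credit card", "credit card", None, "cash", "cash"],
        "pickup_borough": ["Manhattan", "Manhattan", "Queens", None, "Brooklyn"],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy without exact duplicate rows and without incomplete rows."""
    deduplicated = df.drop_duplicates()   # remove repeated records
    complete = deduplicated.dropna()      # remove rows with any missing value
    return complete.reset_index(drop=True)


cleaned = clean(sample)
print(sample.shape, "->", cleaned.shape)
```

Comparing the shape before and after cleaning shows how many rows were lost, which helps judge whether the removal is negligible for the analysis.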

EXPLORATORY DATA ANALYSIS

• Exploratory Data Analysis (EDA) is used to visualize the datasets.
• The dataset can be visualized with charts such as pie charts, bar charts, box plots and
histograms.

Fig.5 Exploratory Data Analysis

IMPLEMENTATION

• The data should be imported and checked, and duplicate and missing values eliminated, i.e.
the data should be cleaned before moving to data visualization.
• The Seaborn library is imported and the taxis dataset is loaded. The head() function displays
the first 5 rows (by default) and the attributes are noted. Parameters such as place,
payment method, tips and fare are examined to determine:
  o The place where people book more taxis.
  o The passengers' payment method, as an indication of their financial status.
  o The destination where the customers give more tips.
  o The place where the company charges a higher fare.
• The shape attribute is used to check the size of the taxis dataset before and after using
the drop_duplicates() function; the change in size is negligible, i.e. it does not affect the
next process.
• The dtypes attribute is used to check whether every attribute holds its
respective data type, and it does.
• The isnull().sum() call is used to find the number of missing values in the dataset.
Since it returns values of around 45, which is considered negligible,
the dataset is ready for data visualization.
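
As a rough illustration of these checks, the sketch below runs them on a small stand-in frame. The rows in `df` and the `missing_report` helper are assumptions made for this example; with the real data the same calls would be applied to the frame loaded from Seaborn.

```python
import pandas as pd

# Hypothetical stand-in for the taxis DataFrame (a few columns only).
df = pd.DataFrame(
    {
        "distance": [1.6, 0.8, 3.2, 7.7],
        "fare": [7.0, 5.0, 12.5, 27.0],
        "payment": ["credit card", "cash", None, "credit card"],
        "dropoff_borough": ["Manhattan", "Manhattan", None, "Queens"],
    }
)


def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Count missing values per column and express them as a share of all rows."""
    missing = frame.isnull().sum()
    report = pd.DataFrame({"missing": missing, "share": missing / len(frame)})
    return report[report["missing"] > 0]  # keep only columns that have gaps


print(df.head())           # first five rows by default
print(df.shape)            # (rows, columns); an attribute, not a method call
print(df.dtypes)           # data type of each column
print(missing_report(df))  # missing values per affected column
```

Expressing the missing counts as a share of the rows makes the "negligible" judgement explicit instead of relying on the raw count alone.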

DATA VISUALIZATION
• A line plot is used to determine the place where people book more taxis. From the plot,
we interpret that people book more taxis from Manhattan. (The taxis dataset covers taxi data
in a few locations in New York.)
• A count plot is used to gauge the average financial status of the passengers based on
the payment method. From the plot, we interpret that most passengers use credit cards
for the payment.
• A bar plot is used to determine the destination where the customers give more tips and
the destination where the company charges a higher fare. Based on this plot, we interpret
that passengers travelling to Staten Island give more tips and are charged a high fare. (So, we
can implement a discount plan for regular customers in a way that neither side is adversely
affected.)
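
The quantities read off these plots can also be computed as plain numbers, which is a useful cross-check on a visual reading. The sketch below is illustrative only: the `trips` rows are made up, and only the column names follow the taxis dataset. It relies on the fact that a Seaborn bar plot shows the mean of the y variable per category by default.

```python
import pandas as pd

# Hypothetical trips; the values are invented for illustration.
trips = pd.DataFrame(
    {
        "pickup_borough": ["Manhattan", "Manhattan", "Queens", "Manhattan", "Brooklyn"],
        "dropoff_borough": ["Manhattan", "Queens", "Queens", "Brooklyn", "Manhattan"],
        "payment": ["credit card", "cash", "credit card", "credit card", "cash"],
        "fare": [8.0, 20.0, 12.0, 15.0, 10.0],
        "tip": [2.0, 0.0, 3.0, 4.0, 0.0],
    }
)

# Number of trips that start in each borough (a direct count of bookings).
pickups_per_borough = trips["pickup_borough"].value_counts()

# Number of trips per payment method (the quantity a count plot draws).
payments = trips["payment"].value_counts()

# Mean tip and mean fare per destination (the bar heights of a default bar plot).
by_destination = trips.groupby("dropoff_borough")[["tip", "fare"]].mean()

print(pickups_per_borough)
print(payments)
print(by_destination)
```

Sorting `by_destination` by tip or fare would give the ranking of destinations that the interpretation relies on.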

INTERPRETATION

• Passengers book more taxis from the Manhattan borough, and their payments are made via
credit card.
• Trips to Staten Island are charged higher fares, and the passengers give more tips.

GENERAL SUGGESTION

• Taxi companies can increase their offers and discounts for places where high fares are
charged, where people book more taxis and give more tips.
• Taxi companies can get suggestions from a data analyst to find the parameters to be
targeted for the growth of their business.

UML DIAGRAMS

• UML, which stands for Unified Modeling Language, is a way to visually represent the
architecture, design, and implementation of complex software systems. When you’re
writing code, there are thousands of lines in an application, and it’s difficult to keep track
of the relationships and hierarchies within a software system. UML diagrams divide that
software system into components and subcomponents.
• UML diagrams can help engineering teams:
  o Bring new team members or developers switching teams up to speed quickly.
  o Navigate source code.
  o Plan out new features before any programming takes place.
  o Communicate with technical and non-technical audiences more easily.

ACTIVITY DIAGRAM

An activity diagram is basically a flowchart that represents the flow from one activity to
another. An activity can be described as an operation of the system. The control flow is
drawn from one operation to another. This flow can be sequential, branched, or concurrent.

Fig.6 Activity diagram

4.2 SCREEN SHOTS

DATA SET

LINE PLOT DIAGRAM

COUNT PLOT DIAGRAM

BAR PLOT DIAGRAM

SYSTEM TESTING

• The purpose of testing is to discover errors. Testing is the process of trying to discover
every conceivable fault or weakness in a work product. It provides a way to check the
functionality of components, sub-assemblies and/or a finished product.
• It is the process of exercising software with the intent of ensuring that the software
system meets its requirements and user expectations and does not fail in an unacceptable
manner. There are various types of tests. Each test type addresses a specific testing
requirement.

TYPES OF TESTS

UNIT TESTING
• Unit testing is a software development process in which the smallest testable parts of
an application, called units, are individually and independently scrutinized for proper
operation.
• This testing methodology is done during the development process by the software
developers and sometimes QA staff. The main objective of unit testing is to isolate
written code to test and determine if it works as intended.
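
To make the idea concrete, the sketch below unit-tests one small helper with Python's standard unittest module. The helper `mean_tip_by_borough` and its list-of-dictionaries input are hypothetical and are not part of the project's source code; they only stand in for "the smallest testable part" of an analysis.

```python
import unittest


def mean_tip_by_borough(trips):
    """Average the 'tip' of each trip, grouped by 'dropoff_borough'.

    `trips` is a list of dicts. This helper is a hypothetical unit under test.
    """
    totals = {}
    counts = {}
    for trip in trips:
        borough = trip["dropoff_borough"]
        totals[borough] = totals.get(borough, 0.0) + trip["tip"]
        counts[borough] = counts.get(borough, 0) + 1
    return {borough: totals[borough] / counts[borough] for borough in totals}


class MeanTipByBoroughTest(unittest.TestCase):
    def test_averages_each_borough_separately(self):
        trips = [
            {"dropoff_borough": "Queens", "tip": 2.0},
            {"dropoff_borough": "Queens", "tip": 4.0},
            {"dropoff_borough": "Bronx", "tip": 1.0},
        ]
        self.assertEqual(mean_tip_by_borough(trips), {"Queens": 3.0, "Bronx": 1.0})

    def test_empty_input_gives_empty_result(self):
        self.assertEqual(mean_tip_by_borough([]), {})


# Run the two test methods in isolation from the rest of the program.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MeanTipByBoroughTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test method exercises the helper on its own small input, so a failure points directly at the unit rather than at the surrounding analysis.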

BLACK BOX TESTING

• Black box testing involves testing a system with no prior knowledge of its internal
workings. A tester provides an input, and observes the output generated by the system
under test.
• This makes it possible to identify how the system responds to expected and unexpected
user actions, its response time, usability issues and reliability issues.

WHITE BOX TESTING

• White box testing is an approach that allows testers to inspect and verify the inner
workings of a software system: its code, infrastructure, and integrations with external
systems.
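
The difference between the two approaches can be illustrated on one small function. In this sketch the function `fare_band` and its thresholds are hypothetical, chosen only to give the code a few branches. The black box checks use inputs and outputs alone, while the white box checks pick their inputs by reading the code so that every branch and boundary is exercised.

```python
def fare_band(fare):
    """Classify a fare; the function and its thresholds are hypothetical."""
    if fare < 0:
        raise ValueError("fare cannot be negative")
    if fare < 10:
        return "low"
    if fare < 30:
        return "medium"
    return "high"


def black_box_checks():
    """Use only inputs and observed outputs, as a user of the function would."""
    assert fare_band(5.0) == "low"      # expected input
    assert fare_band(50.0) == "high"    # expected input
    try:
        fare_band(-1.0)                 # unexpected input
    except ValueError:
        return True                     # the function rejected it as it should
    return False


def white_box_checks():
    """Choose inputs from the code itself so each branch and boundary runs."""
    cases = {0: "low", 9.99: "low", 10: "medium", 29.99: "medium", 30: "high"}
    return all(fare_band(fare) == band for fare, band in cases.items())


print(black_box_checks(), white_box_checks())
```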
SOURCE CODE
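import seaborn as sns

# Load the built-in taxis dataset
df = sns.load_dataset('taxis')

# Inspect the first and last rows and the overall size
df.head(5)
df.tail()
df.shape

# Drop rows with missing values and compare the size
ddf = df.dropna()
ddf.shape

# Check the data type of each column and the missing values per column
df.dtypes
df.isnull().sum()

# Look at the distinct values of the columns of interest
df.pickup_borough.unique()
df.payment.unique()
df.passengers.unique()
df.tip.unique()
df.dropoff_borough.unique()
df.fare.max()
df.pickup.unique()

DATA VISUALIZATION

LINE PLOT

sns.lineplot(x="pickup_borough", y="passengers", data=df)

COUNT PLOT

sns.countplot(x="payment", hue="pickup_borough", data=df)

BAR PLOT

sns.barplot(x="dropoff_borough", y="tip", data=df)
sns.barplot(x="dropoff_borough", y="fare", data=df)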

CONCLUSION

• As the world becomes more data-driven, many companies are using data analysis to
improve their decision-making capabilities. Businesses can use the information they
gather from data analysis to learn more about their customers, target audience,
competitors and changes in their industry.
• Thus, in this project, the taxis dataset has been imported and cleaned for data visualization.
A line plot, a count plot and bar plots are used to find ways to support taxis
enlargement.
• The exploratory data analysis results show that our interpretations can be applied
effectively to taxis enlargement.

FUTURE SCOPE

• The future of the taxi industry is digital, whether we like it or not, and cab firms need to
invest in digitalization while they can.
• A prime example of a successful online taxi app is Uber. It can be used globally to order
a cab immediately by tapping one button.
• In future work, it is planned to move to the next stage of the data science process, i.e.
building models using various machine learning techniques and finding more possible parameters
for taxis enlargement.

REFERENCES

[1] https://www.lucidchart.com/blog/types-of-UML-diagrams
[2] https://www.tutorialspoint.com/uml/uml_activity_diagram.htm
[3] www.javatpoint.com
[4] https://devopedia.org/exploratory-data-analysis
[5] https://colab.research.google.com/
[6] https://medium.com/analytics-vidhya/exploratory-data-analysis-of-nyc-taxi-trip-duration-dataset-using-python-257fdef2749e
[7] https://www.techtarget.com/searchsoftwarequality/definition/unit-testing
[8] https://www.imperva.com/learn/application-security/black-box-testing/
[9] https://www.imperva.com/learn/application-security/white-box-testing/
