VEERALAKSHMI. M
REGISTER NUMBER:222008660
Ms.K.KANIMOZHI, MCA.,
Assistant Professor,
DEPARTMENT OF COMPUTER SCIENCE
APRIL 2023
SHRI KRISHNASWAMY COLLEGE FOR WOMEN
ANNA NAGAR, CHENNAI-600 040
(AFFILIATED TO UNIVERSITY OF MADRAS)
Accredited By NAAC with B+ Grade & 2(f) Status under UGC Act
DATE:
DECLARATION
I hereby declare that the project work entitled “PLAN FOR TAXIS ENLARGEMENT
EXPLORATORY DATA ANALYSIS” is a record of an original work done by me under the
guidance of Ms.K.KANIMOZHI,MCA., SET, Assistant Professor, Department of Computer
Science, SHRI KRISHNASWAMY COLLEGE FOR WOMEN and this project work is
submitted in the partial fulfillment of the requirements for the award of the degree of Bachelor
of Computer Science.
DATE:
ACKNOWLEDGEMENT
ABSTRACT
1 INTRODUCTION
2 SYSTEM ANALYSIS
3 SYSTEM DESIGN
4 SYSTEM IMPLEMENTATION
5 SYSTEM TESTING
6 SOURCE CODE
7 CONCLUSION
8 FUTURE SCOPE
9 REFERENCES
ACKNOWLEDGEMENT
I thank the Almighty Lord for his blessings by which I have completed the project
successfully.
I express my gratitude to our Dean, Dr. R. Revathi, M.A., M.Phil., Ph.D., Shri
Krishnaswamy College for Women, Anna Nagar, Chennai, for the constant support.
I express my gratitude to Dr. Anita Rajendran, M.Com., M.A., MBA., Ph.D.,
Principal, Shri Krishnaswamy College for Women, Anna Nagar, Chennai, for allowing me to
execute the work.
I thank all my Department Faculty members for their valuable comments and support. I
also thank my family members for their constant support and encouragement
throughout the entire program.
ABSTRACT
The title of the project, “Plan for Taxis Enlargement Exploratory Data Analysis”,
relates to the operations that can be implemented to enlarge a taxi business. A taxi, also
known as a taxicab or simply a cab, is a type of vehicle for hire with a driver, used by one or
several passengers. A taxi transports passengers between their preferred locations. Today there
are plenty of taxi companies, such as Uber, Rapido and Ola. Taxi associations face challenges
due to the high adoption of ride-hailing apps like Ola, Uber and Lyft. To step up the
game, a sound strategy must be in place. The objective of the project is to identify the
parameters to be targeted for taxi enlargement and to offer suggestions for it.
Using data analytics, we can find possible ways to enlarge taxi services.
INTRODUCTION
A typical taxi company faces the common problem of efficiently assigning cabs to
passengers so that the service is smooth and hassle free. Today there are plenty of taxi
companies, such as Uber, Rapido and Ola. Taxi associations face challenges due to the high
adoption of ride-hailing apps like Ola, Uber and Lyft. To step up the game, a sound strategy
must be in place. The taxis data set contains data on several taxi trips in New York City and
their durations. This project identifies the parameters to be targeted for taxi enlargement and
offers suggestions for it. Using data analytics, we can find possible ways to enlarge taxi services.
SYSTEM ANALYSIS
HARDWARE REQUIREMENTS
The hardware requirements may serve as the basis for the implementation of the system
and should therefore be a complete and consistent specification of the whole system. They show
what the system does, not how it should be implemented.
SOFTWARE REQUIREMENTS
The software requirements are the specification of the system. They should include both a
definition and a specification of requirements. They state what the system should do rather
than how it should do it. The software requirements provide a basis for creating the software
requirements specification, which is useful in estimating cost, planning team activities, performing
tasks and tracking the team’s progress throughout the development activity.
EXISTING SYSTEM
The parameters for the place where people book the most taxis and the place where the
company charges the highest fare are clearly defined.
The existing approaches are not adequate to predict the parameters for taxi enlargement.
They use uninformative variables (tolls, color) in the data visualization for the taxi enlargement plan.
DISADVANTAGE
Insufficient
Use of uninformative variables in the data visualization for taxi enlargement.
PROPOSED SYSTEM
ADVANTAGE
GOOGLE COLAB
Colab is a free Jupyter notebook environment that runs entirely in the cloud. Most
importantly, it requires no setup, and the notebooks that you create can be edited
simultaneously by your team members, just the way you edit documents in Google Docs.
PYTHON
Python has built-in mathematical libraries and functions, making it easier to solve
mathematical problems and to perform data analysis. Python is a programming language
widely used by data scientists.
Python is an open-source, interpreted, high-level language that supports
object-oriented programming well. It is one of the best languages used by data scientists for
various data science projects and applications.
Python provides strong support for mathematics, statistics and scientific
functions, along with many libraries for data science applications. One of the
main reasons why Python is widely used in the scientific and research communities is
its ease of use and simple syntax, which make it easy for people to adopt.
In terms of application areas, ML scientists prefer Python as well. For
applications like Natural Language Processing (NLP) and sentiment analysis, developers
opted for Python because it provides a large collection of libraries that help to solve
complex business problems easily and to build strong systems and data applications.
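As a small illustration of the built-in mathematical support mentioned above, the following sketch uses only Python's standard `math` and `statistics` modules. The fare values are hypothetical and are not taken from the taxis dataset.

```python
import math
import statistics

# Hypothetical fare amounts (in dollars) for five taxi trips.
fares = [7.0, 5.0, 27.0, 6.5, 13.0]

mean_fare = statistics.mean(fares)      # arithmetic mean
median_fare = statistics.median(fares)  # middle value after sorting
spread = statistics.stdev(fares)        # sample standard deviation

print(f"mean={mean_fare:.2f} median={median_fare:.2f} stdev={spread:.2f}")
print("mean rounded up:", math.ceil(mean_fare))
```

The median is much lower than the mean here because one long trip pulls the mean upwards, which is the kind of observation exploratory data analysis is meant to surface.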
PYTHON LIBRARIES
SEABORN
SYSTEM DESIGN
DATASET
The built-in taxis data set of the Seaborn library contains data on several taxi trips in
New York City and their durations.
A snapshot of our dataset is depicted.
DATA PREPROCESSING
Fig.4 Data cleaning process
EXPLORATORY DATA ANALYSIS
IMPLEMENTATION
The data should be imported and checked, and duplicate and missing values should be
handled, i.e. the data should be cleaned before moving to data visualization.
The Seaborn library is imported and the taxis dataset is loaded. The head() function is used
to display the first 5 rows (by default) and the attributes are noted. Parameters such as place,
payment method, tips and fare are examined to determine:
o The place where people book the most taxis.
o The passengers' payment method, as an indicator of their financial status.
o The destination where customers give the highest tips.
o The place where the company charges the highest fare.
The shape attribute is used to check the shape of the taxis dataset before and after calling
the drop_duplicates() function. The change in shape is negligible, i.e. it does not affect the
next process.
The dtypes attribute is used to check whether each attribute holds its
expected data type, and it does.
The isnull().sum() call is used to count the missing values in the dataset.
Since the reported count is around 45, which is considered
negligible, the dataset is ready for data visualization.
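The cleaning checks described above can be sketched as follows. This is a minimal illustration, not the project's original notebook: it assumes pandas is available and uses a tiny hypothetical sample whose column names follow the taxis dataset. In Colab the full data would instead be loaded with `sns.load_dataset("taxis")`, which needs network access.

```python
import pandas as pd

# Hypothetical sample with a few taxis-style columns (values are made up).
df = pd.DataFrame({
    "pickup_borough": ["Manhattan", "Manhattan", "Queens", None, "Manhattan"],
    "dropoff_borough": ["Manhattan", "Manhattan", "Brooklyn", "Queens", "Manhattan"],
    "payment": ["credit card", "credit card", "cash", None, "credit card"],
    "fare": [7.0, 7.0, 27.0, 6.5, 13.0],
    "tip": [2.15, 2.15, 0.0, 1.0, 3.0],
})

print(df.head())                  # first 5 rows by default

rows_before = df.shape[0]         # shape is an attribute, not a method
df = df.drop_duplicates()         # remove fully identical rows
rows_after = df.shape[0]
print("rows before/after:", rows_before, rows_after)

print(df.dtypes)                  # data type held by each column

missing_per_column = df.isnull().sum()         # missing values per column
total_missing = int(missing_per_column.sum())  # overall count
print(missing_per_column)
print("total missing:", total_missing)
```

Note that `isnull().sum()` returns one count per column; summing that result again gives a single total for the whole dataset.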
DATA VISUALIZATION
A line plot is used to determine the place where people book the most taxis. From the plot,
we interpret that people book the most taxis from Manhattan. (The taxis dataset covers taxi data
for a few locations in New York.)
A count plot is used to gauge the average financial status of the passengers based on
the payment method. From the plot, we interpret that most passengers pay by
credit card.
A bar plot is used to determine the destination where customers give the highest tips and
the destination where the company charges the highest fare. Based on this plot, we interpret
that passengers travelling to Staten Island give more tips and are charged higher fares. (So, we can
implement a discount plan for regular customers in such a way that neither side is
adversely affected.)
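The quantities that these plots display can also be computed directly, which gives a numeric check on what is read off the charts. The sketch below assumes pandas and uses a small hypothetical sample with taxis-style column names, so the numbers are illustrative only. A Seaborn bar plot shows the mean of the y variable per category by default, which corresponds to the `groupby(...).mean()` call here.

```python
import pandas as pd

# Hypothetical sample; column names follow the taxis dataset, values are made up.
df = pd.DataFrame({
    "pickup_borough": ["Manhattan", "Manhattan", "Queens", "Manhattan", "Brooklyn"],
    "dropoff_borough": ["Manhattan", "Queens", "Staten Island", "Manhattan", "Brooklyn"],
    "payment": ["credit card", "credit card", "cash", "credit card", "cash"],
    "fare": [7.0, 20.0, 40.0, 9.0, 12.0],
    "tip": [2.0, 4.0, 8.0, 1.0, 0.0],
})

# Number of trips per pickup borough: where people book the most taxis.
trips_per_borough = df["pickup_borough"].value_counts()

# Number of trips per payment method: what a count plot of "payment" displays.
payment_counts = df["payment"].value_counts()

# Mean tip and mean fare per drop-off borough: what the two bar plots display.
mean_by_dropoff = df.groupby("dropoff_borough")[["tip", "fare"]].mean()

print("busiest pickup borough:", trips_per_borough.idxmax())
print("most common payment:", payment_counts.idxmax())
print(mean_by_dropoff.sort_values("fare", ascending=False))
```

Counting trips per borough is one direct way to check a claim about where bookings are highest, alongside the plots.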
INTERPRETATION
Passengers book the most taxis from the Manhattan borough, and their payments are mostly
made by credit card.
Passengers travelling to Staten Island are charged higher fares and give more tips.
GENERAL SUGGESTION
Taxi companies can increase their offers and discounts in places with high fares,
where people book more taxis and give more tips.
Taxi companies can consult a data analyst to find the parameters to be targeted for the
growth of their business.
UML DIAGRAMS
UML, which stands for Unified Modeling Language, is a way to visually represent the
architecture, design, and implementation of complex software systems. When you’re
writing code, there are thousands of lines in an application, and it’s difficult to keep track
of the relationships and hierarchies within a software system. UML diagrams divide that
software system into components and subcomponents.
UML diagrams can help engineering teams:
ACTIVITY DIAGRAM
An activity diagram is basically a flowchart that represents the flow from one activity to
another. An activity can be described as an operation of the system. The control flow is
drawn from one operation to another. This flow can be sequential, branched, or concurrent.
4.2 SCREEN SHOTS
DATA SET
LINE PLOT DIAGRAM
COUNT PLOT DIAGRAM
BAR PLOT DIAGRAM
SYSTEM TESTING
The purpose of testing is to discover errors. Testing is the process of trying to discover
every conceivable fault or weakness in a work product. It provides a way to check the
functionality of components, sub-assemblies and/or a finished product.
It is the process of exercising software with the intent of ensuring that the software
system meets its requirements and user expectations and does not fail in an unacceptable
manner. There are various types of tests. Each test type addresses a specific testing
requirement.
TYPES OF TESTS
UNIT TESTING
Unit testing is a software development process in which the smallest testable parts of
an application, called units, are individually and independently scrutinized for proper
operation.
This testing methodology is done during the development process by the software
developers and sometimes QA staff. The main objective of unit testing is to isolate
written code to test and determine if it works as intended.
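To make the idea of isolating a unit concrete, here is a minimal sketch using Python's standard `unittest` module. The function `mean_by_key` is a hypothetical helper written for this illustration and is not part of the project's code. It computes a per-group mean, similar to what the bar plots summarize.

```python
import unittest


def mean_by_key(rows, key, value):
    """Return {group: mean of `value`} for a list of dict rows (hypothetical helper)."""
    totals, counts = {}, {}
    for row in rows:
        group = row[key]
        totals[group] = totals.get(group, 0.0) + row[value]
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


class MeanByKeyTest(unittest.TestCase):
    def test_means_are_computed_per_group(self):
        rows = [
            {"dropoff_borough": "Queens", "tip": 2.0},
            {"dropoff_borough": "Queens", "tip": 4.0},
            {"dropoff_borough": "Bronx", "tip": 1.0},
        ]
        result = mean_by_key(rows, "dropoff_borough", "tip")
        self.assertEqual(result, {"Queens": 3.0, "Bronx": 1.0})

    def test_empty_input_gives_empty_result(self):
        self.assertEqual(mean_by_key([], "dropoff_borough", "tip"), {})


# Run the two test cases and keep the result object for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MeanByKeyTest)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

Each test exercises the function on its own, with small hand-made inputs, so a failure points directly at the unit under test rather than at the surrounding notebook.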
BLACK BOX TESTING
Black box testing involves testing a system with no prior knowledge of its internal
workings. A tester provides an input, and observes the output generated by the system
under test.
This makes it possible to identify how the system responds to expected and unexpected
user actions, its response time, usability issues and reliability issues.
SOURCE CODE
DATA VISUALIZATION
import seaborn as sns
df = sns.load_dataset("taxis")  # load the built-in taxis dataset
LINE PLOT
sns.lineplot(x="pickup_borough", y="passengers", data=df)
COUNT PLOT
sns.countplot(x="payment", hue="pickup_borough", data=df)
BAR PLOT
sns.barplot(x="dropoff_borough", y="tip", data=df)
sns.barplot(x="dropoff_borough", y="fare", data=df)
CONCLUSION
As the world becomes more data-driven, many companies are using data analysis to
improve their decision-making capabilities. Businesses can use the information they
gather from data analysis to learn more about their customers, target audience,
competitors and changes in their industry.
Thus, in this project, the taxis dataset has been imported and cleaned for data visualization.
A line plot, a count plot and bar plots are used to find ways to improve taxi
enlargement.
The results of the exploratory data analysis show that our interpretations can be applied
effectively to taxi enlargement.
FUTURE SCOPE
The future of the taxi industry is digital, whether we like it or not, and cab firms need to
invest in digitalization while they can.
A prime example of a successful online taxi app is Uber. It can be used globally to order
a cab immediately by tapping one button.
In future work, we plan to move on to the next stage of the data science process, i.e.
building models using various machine learning techniques and finding more possible parameters
for taxi enlargement.
REFERENCES
[1] https://www.lucidchart.com/blog/types-of-UML-diagrams
[2] https://www.tutorialspoint.com/uml/uml_activity_diagram.htm
[3] www.javatpoint.com
[4] https://devopedia.org/exploratory-data-analysis
[5] https://colab.research.google.com/
[6] https://medium.com/analytics-vidhya/exploratory-data-analysis-of-nyc-taxi-trip-duration-dataset-using-python-257fdef2749e
[7] https://www.techtarget.com/searchsoftwarequality/definition/unit-testing
[8] https://www.imperva.com/learn/application-security/black-box-testing/
[9] https://www.imperva.com/learn/application-security/white-box-testing/