Bike Sharing Python Report
In recent years, bike-sharing systems have been widely deployed in many big cities, providing an economical and healthy lifestyle. With the prevalence of bike-sharing systems, many companies have joined the bike-sharing market, leading to increasingly fierce competition. To be competitive, bike-sharing companies and app developers need to make strategic decisions and predict the popularity of bike-sharing apps. However, existing work mostly focuses on predicting the popularity of a single app; the popularity contest among different apps has not yet been explored. We develop CompetitiveBike, a system that predicts the popularity contest among bike-sharing apps by leveraging multi-source data. We extract two novel types of features, coarse-grained and fine-grained competitive features, and utilize a Random Forest model to forecast future competitiveness. In addition, we view mobile app competition as a long-term event and generate an event storyline to enrich our competitive analysis. We collected data about two bike-sharing apps and two food ordering & delivery apps from 11 app stores and Sina Weibo and conducted extensive experimental studies; the results demonstrate the effectiveness and generality of our approach.
Chapter 1
Introduction
A user can rent a bike at a nearby bike station and return it at another station near his/her destination. The worldwide prevalence of bike-sharing systems has inspired a great deal of active research on topics such as bike demand prediction, bike rebalancing optimization, and bike lane planning.
As bike-sharing apps become increasingly popular, many companies join the market, leading to fierce competition. To thrive in this competitive market, it is vital for bike-sharing companies and app developers to understand their competitors and then make strategic decisions for mobile app development and evolution. It is therefore both significant and necessary to predict and compare the future popularity of different bike-sharing apps.
Problem statement
The problem of competitive analysis and popularity prediction over bike-sharing apps can be stated as follows: given multi-source data (i.e., app store data and microblogging data) about Mobike and Ofo, we want to predict which app will be more popular in the future.
Objective
The bike offers advantages over motorized vehicles: for example, it can support a government program for reducing carbon emissions as well as minimizing noise levels in the city. Moreover, the bike serves as alternative transportation for tourists who want to explore the city they are visiting. Seeing the potential of large-scale applications in bike rentals, some cities have installed a "Bike Sharing System", a bike rental system that uses digital signage for authentication and security when a user makes a transaction to borrow a bike. However, these devices are very expensive to install. This paper discusses the design and implementation of a bike-sharing system (BSS) that uses simple digital signage technology. It consists of a Liquid Crystal Display (LCD) as the main display and a Near Field Communication (NFC) interaction module, which are developed separately. The NFC module serves as a reader, and the LCD is used as a user interface for several functions, including i) the registration process, ii) renting bikes, iii) returning bikes, and iv) topping up balances and other information. Based on the results of the functionality test, the developed system works well. It is expected that the system can be deployed widely so that communities in other Indonesian cities can benefit from it, because our proposed prototype offers a more affordable price. The scenarios for implementing our proposed BSS are also discussed in detail in this paper.
Bike-sharing programs continue to spread widely around the world, offering many benefits, i.e., increased accessibility, low cost, a positive environmental impact, improved public health, and increased mobility. As of 2017, more than 1,173 cities around the world had implemented bike-sharing programs, with around 1.9 million active bikers. Unfortunately, the rapid global growth of bike sharing has not been matched in Indonesia, where the bike-sharing idea is still new and can hardly be found in Indonesian cities. Given Indonesia's status as the eighth-largest economy in the world in 2017, the bike-sharing business is expected to grow along with the global trend. The article examined a pilot bike-sharing project at a well-known state university in Indonesia to illustrate the feasibility of the bike-sharing business, applying a mobile application with the cloud computing paradigm and a business model canvas (BMC). The outcome of the paper is expected to serve as a reference for growing the bike-sharing business in Indonesia.
Bike-Sharing, a New Public Transportation Mode: State of the Practice & Prospects
Bike-sharing, as a new green public transportation mode, has been developed in several western cities. In this paper we systematically describe the development process and the state of the art of bike-sharing programs around the world. We investigated the bicycle characteristics, business models, operating mechanisms, and current problems of the bike-sharing programs of three metropolitan cities in Europe, and we also discussed the development mode and applicability of bike-sharing programs. We then provide guidelines on where and how to establish a bike-sharing program in a developing country, taking into consideration the weather and topography of the target city. Finally, we discuss how to integrate bike sharing with existing transport modes and the future of bike-sharing programs.
Methodology
In this project, we present a system developed to facilitate the collection and use of bike-sharing-system data for research, notably to develop and compare bike usage forecasting algorithms. We collected internal and external data for six different applications and developed a system providing short- and long-term predictions of bike and slot availability for bike-sharing stations in real time. To provide the best predictions, we developed and compared the performance of two types of algorithms: the first is based on the k-means algorithm and the second on the SVM algorithm. Our study demonstrates their applicability, showing better accuracy for short-term predictions with the k-means algorithm.
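The comparison above can be sketched in code. This is a minimal illustration, not the project's actual pipeline: the station features, targets, and hyperparameters below are synthetic placeholders, and scikit-learn is assumed as the modeling library.

```python
# Sketch: compare a k-means-based predictor with an SVM regressor for
# short-term bike-availability forecasting on synthetic station data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
# Features: hour of day and day of week; target: bikes available at a station.
X = rng.uniform([0, 0], [24, 7], size=(300, 2))
y = 10 + 5 * np.sin(X[:, 0] / 24 * 2 * np.pi) + rng.normal(0, 1, 300)

X_train, X_test = X[:240], X[240:]
y_train, y_test = y[:240], y[240:]

# k-means predictor: cluster the feature space, then predict the mean
# availability of the cluster each test point falls into.
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X_train)
cluster_means = np.array([y_train[km.labels_ == c].mean() for c in range(8)])
y_pred_km = cluster_means[km.predict(X_test)]

# SVM predictor: support vector regression on the same features.
svr = SVR(kernel="rbf").fit(X_train, y_train)
y_pred_svr = svr.predict(X_test)

print("k-means MAE:", round(mean_absolute_error(y_test, y_pred_km), 2))
print("SVM MAE:", round(mean_absolute_error(y_test, y_pred_svr), 2))
```

Either family can win depending on the data; the point is that both reduce to fitting on historical availability and scoring on a held-out window.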
(1) This work is the first to study the problem of competitive analysis and popularity contest of bike-sharing apps. We use two indicators for the comparison.
(2) To predict the popularity contest, we extract features from different aspects, including the inherent descriptive information of apps, users' sentiment, and comparative opinions. With this information, we further extract two novel types of features, coarse-grained and fine-grained competitive features, and choose the Random Forest algorithm for prediction.
(3) To provide competitive analysis, we utilize a topic model to analyze the topics in competing apps and apply the minimum-weight dominating set algorithm to select representative microblogs. We also generate an event storyline to present and visualize the popularity contest.
(4) To evaluate CompetitiveBike, we collect data about different bike-share apps from mobile app stores. With the data collected, we conduct extensive experiments from different perspectives. We find that the k-means model performs well on competitive relationship prediction as well as competitive intensity prediction. A combination of the coarse-grained and fine-grained competitive features improves performance in popularity contest prediction, and combining data from app stores and microblogging also improves performance.
The system framework consists of three layers: data preparation, feature extraction, and competitive analysis and popularity prediction.
Data Preparation.
Multi-source data are complementary and can reflect mobile app popularity contest from
different perspectives. We obtain app statistics and reviewers’ sentiment from app store data,
and microblogging statistics and comparative information from microblogging data.
Feature Extraction.
To effectively extract and quantify the factors affecting the mobile app popularity contest, we extract features from different perspectives, including the inherent descriptive information of apps, users' sentiment, and comparative opinions. With this information, we further extract two novel sets of features: coarse-grained and fine-grained competitive features.
With these two extracted feature sets, we train a model to predict the popularity contest. Moreover, we view mobile app competition as a long-term event and generate the event storyline to support competitive analysis, which can present descriptive information regarding the popularity contest.
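The training step described above can be sketched as follows. This is an illustrative sketch only, assuming scikit-learn: the feature values are random placeholders standing in for the real coarse-grained (e.g., download counts, ratings) and fine-grained (e.g., sentiment, comparative-opinion) features.

```python
# Sketch: train a Random Forest on concatenated coarse-grained and
# fine-grained competitive features to predict the popularity-contest winner.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n = 200
coarse = rng.normal(size=(n, 4))   # placeholder coarse-grained features
fine = rng.normal(size=(n, 6))     # placeholder fine-grained features
X = np.hstack([coarse, fine])      # combined feature set
# Synthetic label: 1 means "app A wins" in that time window.
y = (X[:, 0] + X[:, 4] + rng.normal(0, 0.5, n) > 0).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[:160], y[:160])                 # train on the first 160 windows
accuracy = clf.score(X[160:], y[160:])    # evaluate on the held-out windows
print("held-out accuracy:", round(accuracy, 2))
```

Concatenating the two feature sets mirrors the report's finding that combining coarse-grained and fine-grained features improves prediction.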
Chapter 2
History of organization
Chapter 3
Chapter 4
A system requirement specification is one of the main steps in the development process. It follows the resource analysis phase, whose task is to determine what a particular software product does. The focus at this stage is on the users of the system, not on the system's solutions. The resulting requirement specification document states the intention of the software and the properties and constraints of the desired system.
The SRS constitutes the agreement between clients and developers regarding the contents of the software product that is going to be developed. The SRS should accurately and completely represent the system requirements, as it makes a major contribution to the overall project plan. The software being developed may be part of a larger system or may be a complete standalone system in its own right. If the software is a system component, the SRS should state the interfaces between the system and the software portion.
OVERVIEW OF PYTHON
Python interpreters are available for many operating systems. A global community of
programmers develops and maintains CPython, an open source reference implementation. A
non-profit organization, the Python Software Foundation, manages and directs resources for
Python and CPython development.
Python was conceived in the late 1980s by Guido van Rossum at Centrum Wiskunde &
Informatica (CWI) in the Netherlands as a successor to the ABC language (itself inspired by
SETL), capable of exception handling and interfacing with the Amoeba operating system. Its
implementation began in December 1989. Van Rossum shouldered sole responsibility for the
project, as the lead developer, until July 12, 2018, when he announced his "permanent
vacation" from his responsibilities as Python's Benevolent Dictator For Life, a title the
Python community bestowed upon him to reflect his long-term commitment as the project's
chief decision-maker. In January 2019, active Python core developers elected Brett Cannon, Nick Coghlan, Barry Warsaw, Carol Willing, and Van Rossum to a five-member Steering Council to lead the project; he now shares leadership as one of its members.
Python 2.0 was released on 16 October 2000 with many major new features, including a
cycle-detecting garbage collector and support for Unicode.
Python 3.0 was released on 3 December 2008. It was a major revision of the language that is
not completely backward-compatible. Many of its major features were backported to Python
2.6.x and 2.7.x version series. Releases of Python 3 include the 2to3 utility, which automates
(at least partially) the translation of Python 2 code to Python 3.
Python 2.7's end-of-life date was initially set at 2015 then postponed to 2020 out of concern
that a large body of existing code could not easily be forward-ported to Python 3.
OpenCV
OpenCV (Open source computer vision) is a library of programming functions mainly aimed
at real-time computer vision. Originally developed by Intel, it was later supported by Willow
Garage then Itseez (which was later acquired by Intel). The library is cross-platform and free
for use under the open-source BSD license.
Officially launched in 1999, the OpenCV project was initially an Intel Research initiative to
advance CPU-intensive applications, part of a series of projects including real-time ray
tracing and 3D display walls. The main contributors to the project included a number of
optimization experts in Intel Russia, as well as Intel's Performance Library Team. In the early
days of OpenCV, the goals of the project were described as:
Advance vision research by providing not only open but also optimized code for basic vision
infrastructure. No more reinventing the wheel. Disseminate vision knowledge by providing a
common infrastructure that developers could build on, so that code would be more readily
readable and transferable.
The second major release of OpenCV was in October 2009. OpenCV 2 includes major
changes to the C++ interface, aiming at easier, more type-safe patterns, new functions, and
better implementations for existing ones in terms of performance (especially on multi-core
systems). Official releases now occur every six months and development is now done by an
independent Russian team supported by commercial corporations.
Tensorflow
TensorFlow is a free and open-source software library for dataflow and differentiable programming
across a range of tasks. It is a symbolic math library, and is also used for machine learning
applications such as neural networks. It is used for both research and production at Google.
TensorFlow was developed by the Google Brain team for internal Google use. It was released under
the Apache License 2.0 on November 9, 2015.
TensorFlow is Google Brain's second-generation system. Version 1.0.0 was released on February 11, 2017. While the reference implementation runs on single devices, TensorFlow can run on
multiple CPUs and GPUs (with optional CUDA and SYCL extensions for general-purpose computing
on graphics processing units). TensorFlow is available on 64-bit Linux, macOS, Windows, and
mobile computing platforms including Android and iOS.
Its flexible architecture allows for the easy deployment of computation across a variety of platforms
(CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices.
TensorFlow computations are expressed as stateful dataflow graphs. The name TensorFlow derives
from the operations that such neural networks perform on multidimensional data arrays, which are
referred to as tensors. During the Google I/O Conference in June 2016, Jeff Dean stated that 1,500
repositories on GitHub mentioned TensorFlow, of which only 5 were from Google.
In December 2017, developers from Google, Cisco, RedHat, CoreOS, and CaiCloud introduced
Kubeflow at a conference. Kubeflow allows operation and deployment of TensorFlow on Kubernetes.
In March 2018, Google announced TensorFlow.js for machine learning in JavaScript.
In January 2019, Google announced TensorFlow 2.0, which became officially available in September 2019.
In May 2019, Google announced TensorFlow Graphics for deep learning in computer graphics.
TensorFlow is an end-to-end open-source platform for machine learning. TensorFlow is a rich system for managing all aspects of a machine learning system; however, this project focuses on using a particular TensorFlow API to develop and train machine learning models. See the TensorFlow documentation for complete details on the broader TensorFlow system.
TensorFlow APIs are arranged hierarchically, with the high-level APIs built on the low-level APIs. Machine learning researchers use the low-level APIs to create and explore new machine learning algorithms. In this project, we use a high-level API named tf.keras to define and train machine learning models and to make predictions. tf.keras is the TensorFlow variant of the open-source Keras API.
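A minimal tf.keras workflow, as described above, looks like the following. This is a sketch assuming TensorFlow 2.x; the data and layer sizes are arbitrary placeholders.

```python
# Sketch: define, train, and evaluate a tiny tf.keras model.
import numpy as np
import tensorflow as tf

# Synthetic binary-classification data: 100 samples with 3 features each.
X = np.random.rand(100, 3).astype("float32")
y = (X.sum(axis=1) > 1.5).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),                       # declare input shape
    tf.keras.layers.Dense(8, activation="relu"),      # hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),   # binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=5, verbose=0)                  # train briefly
loss, acc = model.evaluate(X, y, verbose=0)
print("training-set accuracy:", round(acc, 2))
```

The same Sequential/compile/fit pattern scales from this toy example to the models used for real prediction tasks.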
The following figure shows the hierarchy of TensorFlow toolkits:
NUMPY
NumPy is a Python package which stands for ‘Numerical Python’. It is the core library for
scientific computing, which contains a powerful n-dimensional array object, provide tools for
integrating C, C++ etc. It is also useful in linear algebra, random number capability etc.
NumPy array can also be used as an efficient multi-dimensional container for generic data.
Now, let me tell you what exactly is a python numpy array.
NumPy is the fundamental package for scientific computing with Python. It contains among
other things:
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional
container of generic data. Arbitrary data-types can be defined. This allows NumPy to
seamlessly and speedily integrate with a wide variety of databases.
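The n-dimensional array described above works as in this short sketch:

```python
# Sketch: a NumPy n-dimensional array and a few core operations.
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # 2-D array (2 rows, 3 columns)
print(a.shape)        # (2, 3)
print(a.sum(axis=0))  # column sums: [5 7 9]
print(a.mean())       # mean of all elements: 3.5
b = a * 2             # elementwise (vectorized) arithmetic
print(b[1, 2])        # 12
```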
Pandas
In 2008, developer Wes McKinney started developing pandas out of the need for a high-performance, flexible tool for data analysis.
Prior to pandas, Python was mostly used for data munging and preparation and contributed very little to data analysis itself. Pandas solved this problem. Using pandas, we can accomplish five typical steps in the processing and analysis of data, regardless of its origin: load, prepare, manipulate, model, and analyse.
Python with pandas is used in a wide range of academic and commercial domains, including finance, economics, statistics, and analytics. Key features include:
A fast and efficient DataFrame object with default and customized indexing.
Tools for loading data into in-memory data objects from different file formats.
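The load-prepare-manipulate steps above can be sketched with a DataFrame. The column names here are illustrative, not the report's actual dataset:

```python
# Sketch of load/prepare/manipulate with a pandas DataFrame.
import pandas as pd

# Load: in practice this would be pd.read_csv("trips.csv");
# inline data keeps the sketch self-contained.
df = pd.DataFrame({
    "season": ["spring", "spring", "summer", "summer"],
    "temp": [14.0, 16.5, 28.0, 30.5],
    "count": [120, 150, 310, 290],
})

# Prepare / manipulate: filter rows, then group and aggregate.
warm = df[df["temp"] > 15]                       # boolean filtering
by_season = df.groupby("season")["count"].mean() # grouped aggregation
print(by_season)
```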
Matplotlib
Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+. There is also a procedural "pylab" interface based on a state machine (like OpenGL), designed to closely resemble that of MATLAB, though its use is discouraged. SciPy makes use of Matplotlib.
Matplotlib was originally written by John D. Hunter; it has since gained an active development community and is distributed under a BSD-style license. Michael Droettboom was nominated as Matplotlib's lead developer shortly before John Hunter's death in August 2012 and was later joined by Thomas Caswell.
Matplotlib 2.0.x supports Python versions 2.7 through 3.6. Python 3 support started with Matplotlib 1.2. Matplotlib 1.4 is the last version to support Python 2.6. Matplotlib has pledged not to support Python 2 past 2020 by signing the Python 3 Statement.
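A minimal example of the object-oriented API mentioned above; the demand curve is synthetic and the non-interactive Agg backend is used so the sketch runs without a display:

```python
# Sketch: a simple Matplotlib line plot saved to a file.
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

hours = np.arange(24)
rentals = 100 + 50 * np.sin(hours / 24 * 2 * np.pi)  # illustrative demand

fig, ax = plt.subplots()             # object-oriented API
ax.plot(hours, rentals, marker="o")
ax.set_xlabel("Hour of day")
ax.set_ylabel("Bike rentals")
ax.set_title("Hourly rental demand (synthetic)")
fig.savefig("rentals.png")
```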
Flask
Flask is a web application framework written in Python. It is developed by Armin Ronacher, who leads an international group of Python enthusiasts named Pocco. Flask is based on the Werkzeug WSGI toolkit and the Jinja2 template engine, both of which are Pocco projects.
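A minimal Flask application looks like the following sketch; the route and response text are illustrative, and Flask's built-in test client is used here instead of running a live server:

```python
# Sketch: a minimal Flask app with one route.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Bike sharing prediction service"

# In production one would call app.run(); the test client
# exercises the route in-process instead.
client = app.test_client()
response = client.get("/")
print(response.status_code, response.get_data(as_text=True))
```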
3.2 Hardware and Software Requirements
Hardware Requirements
• Ram : 4 GB or above
Software Requirements
• IDE : Pycharm
3.3 Functional and Non-functional requirements
3.3.1 Functional requirements
Functional requirements capture the intended behaviour of the system. This behaviour may
be expressed as services, tasks or functions the system is required to perform.
Dataset gathering
Gathering data is the most important step in solving any supervised machine learning problem. In this module, we gather the information related to the application.
Data cleansing
In this process, data is prepared for analysis by removing or modifying records that are incorrect, incomplete, irrelevant, duplicated, or improperly formatted.
Data Preprocessing
In this module, raw data is transformed into an understandable format. Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviours or trends, and is likely to contain many errors.
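The cleansing and preprocessing modules above can be sketched with pandas. The columns and values are illustrative placeholders, not the project's dataset:

```python
# Sketch: cleansing (duplicates, missing values) and preprocessing (scaling).
import pandas as pd

raw = pd.DataFrame({
    "temp": [14.0, None, 28.0, 28.0],   # one missing, one duplicated row
    "count": [120, 150, 310, 310],
})

clean = raw.drop_duplicates()                 # remove the duplicated row
clean = clean.fillna(clean["temp"].mean())    # fill the missing temperature
# Min-max scale 'temp' into [0, 1] so features share a comparable range.
t = clean["temp"]
clean["temp_scaled"] = (t - t.min()) / (t.max() - t.min())
print(clean)
```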
Classification
This module performs the data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. We use the SVM algorithm for classification.
Prediction
This module describes which characteristics and features of the data have the most significant impact on the model's outcomes; we use the k-means algorithm for prediction.
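The two modules above can be sketched together. This assumes scikit-learn, and the two well-separated point clouds are synthetic placeholders (e.g., low- vs. high-demand days):

```python
# Sketch: SVM for the classification module, k-means for grouping.
import numpy as np
from sklearn.svm import SVC
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
low = rng.normal(loc=0.0, scale=0.5, size=(50, 2))    # e.g. low-demand days
high = rng.normal(loc=3.0, scale=0.5, size=(50, 2))   # e.g. high-demand days
X = np.vstack([low, high])
y = np.array([0] * 50 + [1] * 50)

svm = SVC(kernel="rbf").fit(X, y)                 # classification
print("SVM predicts:", svm.predict([[3.1, 2.9]])) # a point near 'high'

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # grouping
print("cluster sizes:", np.bincount(km.labels_))
```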
Visualization
In this module, the analysis and prediction results are presented to the user as charts and graphs.
Performance Requirements:
The product should support the end users' requirements. It must be capable of processing large numbers of files provided as input, must be interactive, and should keep delays to a minimum, so that every action-response of the system occurs without noticeable delay.
Chapter 5
Design
Three-tier architecture
The main reasons for considering a three-tier architecture for the application are as follows:
Flexibility: each tier can be developed, modified, and scaled independently.
Security: the architecture is more secure, since the client cannot access the database directly.
Fig.1 3-tier Architecture
Presentation Tier-
The presentation tier is the front-end layer in the 3-tier system and consists of the user interface. This user interface is often a graphical one, accessible through a web browser or web-based application, and displays content and information useful to an end user. This tier is often built on web technologies such as HTML5, JavaScript, and CSS, or other popular web development frameworks, and communicates with the other layers through API calls.
Application Tier-
The application tier contains the functional business logic which drives an application’s core
capabilities. It’s often written in Java, .NET, C#, Python, C++, etc.
Data Tier-
The data tier comprises the database/data storage system and the data access layer. Examples of such systems are MySQL, Oracle, PostgreSQL, Microsoft SQL Server, MongoDB, etc. Data is accessed by the application layer via API calls.
System Architecture
Detailed Design
Detailed design starts after the system design phase is completed and the system design has
been certified through the review. The goal of this phase is to develop the internal logic of
each of the modules identified during system design.
In system design, the focus is on identifying the modules, whereas during detailed design the focus is on designing the logic for those modules. In other words, in system design the issue is what components are needed, while in detailed design the issue is how the components can be implemented in software.
The design activity is often divided into two separate phases: system design and detailed design. System design, also called top-level design, focuses on deciding which modules are needed for the system, the specifications of these modules, and how the modules should be interconnected. In the second phase, the internal design of the modules, i.e., how the specifications of each module can be satisfied, is decided. This design level is often called detailed design or logic design.
Basic Notation:
Process: any process that changes the data, producing an output. It might perform
computations, or sort data based on logic, or direct the data flow based on business rules. A
short label is used to describe the process, such as “Submit payment.”
Data store: files or repositories that hold information for later use, such as a database table or
a membership form. Each data store receives a simple label, such as “Orders.”
External entity: an outside system that sends or receives data, communicating with the
system being diagrammed. They are the sources and destinations of information entering or
leaving the system. They might be an outside organization or person, a computer system or a
business system. They are also known as terminators, sources and sinks or actors. They are
typically drawn on the edges of the diagram
Data flow: the route that data takes between the external entities, processes, and data stores. It portrays the interface between the other components and is shown with arrows, typically labeled with a short data name, like "Billing details".
Fig: Level 0 DFD (the User registers and exchanges data with the Application)
Fig: Level 1 DFD (the User logs in, uploads the dataset, and performs prediction and visualization)
A use case diagram is a graph of actors, a set of use cases enclosed by a system boundary, and communication associations between the actors and the use cases. The use case diagram describes how a system interacts with outside actors; each use case represents a piece of functionality that the system provides to its users. A use case is shown as an ellipse containing the name of the use case, and an actor is shown as a stick figure with the name of the actor below the figure.
Use cases are used during the analysis phase of a project to identify and partition system functionality. They separate the system into actors and use cases. Actors represent roles played by users of the system. Those users can be humans, other computers, pieces of hardware, or even other software systems.
Fig: Use case diagram (actor: User; use cases: Register, Login, Upload dataset, Prediction, Visualization, Logout)
A sequence diagram shows object interactions arranged in time sequence. It depicts the objects and classes involved in the scenario and the sequence of messages exchanged between the objects needed to carry out the functionality of the scenario. Sequence diagrams are sometimes called event diagrams or event scenarios.
UML sequence diagrams are used to represent or model the flow of messages, events, and actions between the objects or components of a system. Time is represented in the vertical direction, showing the sequence of interactions of the header elements, which are displayed horizontally at the top of the diagram. Sequence diagrams are used primarily to design, document, and validate the architecture, interfaces, and logic of the system by describing the sequence of actions that need to be performed to complete a task or scenario. UML sequence diagrams are useful design tools because they provide a dynamic view of the system behavior.
Purpose
The sequence diagram is used primarily to show the interactions between objects in the
sequential order that those interactions occur. One of the primary uses of sequence diagrams
is in the transition from requirements expressed as use cases to the next and more formal
level of refinement.
Fig: User sequence diagram (the user registers with credentials, the system validates them, the user logs in, the homepage is rendered, prediction results are fetched, visualizations are sent, and the user logs out)
Basic Notations
Initial Activity
This shows the starting point or first activity of the flow. It is denoted by a solid circle.
Final Activity
The end of the activity diagram is shown by a bull's-eye symbol, also called the final activity.
Activity
An activity is represented by a rounded rectangle containing the name of the activity.
Decisions
A decision is represented by a diamond, with the outgoing branches labeled with their conditions.
Control Flow
The workflow is depicted with an arrow. It shows the direction of the workflow in the activity diagram.
Fig: Activity diagram (the user registers and logs in to the application; if the credentials are valid, the user uploads the dataset, then prediction and visualization follow; otherwise control returns to login)
An Entity Relationship Diagram depicts the various relationships among entities, considering each object as an entity. Entity relationships are described by their dependence on each other, as well as the extent of the relationship between the data stores. The ER diagram is a notation used to conduct the data modeling activity.
Fig: ER diagram (User [user_id, User_name] Uploads Dataset [Dataset_id, Description]; Dataset Has Prediction [id]; User Views Prediction)
Chapter 6
Coding
Chapter 7
Testing
Testing is the major process involved in software quality assurance (QA). It is an iterative process in which test data is prepared and used to test the modules individually. System testing makes sure that all components of the system function properly as a unit, by actually trying to force the system to fail.
Test cases should be planned before testing begins. As testing progresses, it shifts focus in an attempt to find errors in integrated clusters of modules and in the entire system. The philosophy behind testing is to find errors. Testing is the stage of implementation aimed at ensuring that the system works accurately and efficiently before deployment.
Testing is done for each module. After all the modules are tested, they are integrated and the final system is tested with test data specially designed to show that the system will operate successfully under all conditions. Procedure-level testing is done first: by giving improper inputs, the errors that occur are noted and eliminated. System testing is thus a confirmation that all is correct and an opportunity to show the user that the system works. The final step is validation testing, which determines whether the software functions as the user expects. The end user, rather than the system developer, conducts this test; most software developers use a process called alpha and beta testing to uncover errors that only the end user seems able to find.
This is the final step in the system life cycle. Here we deploy the tested, error-free system into the real-life environment and make the necessary changes so that it runs in an online fashion. System maintenance is done every month or year, based on company policy, checking for errors such as runtime errors and long-run errors, and performing other maintenance such as table verification and reports.
During requirement analysis and design, the output is a document that is usually textual and non-executable. After the coding phase, computer programs are available that can be executed for testing purposes. This implies that testing has to uncover not only errors introduced during coding, but also errors introduced during the previous phases.
Fig. 7.1 The Testing process
Unit Testing
Integration Testing
Validation Testing
System Testing
Acceptance Testing
Unit testing focuses verification effort on the smallest unit of software design, the module, and is also known as "module testing". The modules are tested separately. This testing is carried out during the programming stage itself. In this testing step, each module is found to be working satisfactorily with regard to the expected output from the module.
Integration testing is a systematic technique for constructing tests to uncover errors associated with the interfaces. In this project, all the modules are combined and then the entire program is tested as a whole. In the integration-testing step, all the errors uncovered are corrected before the next testing steps.
Validation testing aims to uncover functional errors, that is, to check whether the functional characteristics conform to the specification.
Once individual module testing is completed, the modules are assembled to perform as a system. Then top-down testing, which proceeds from upper-level to lower-level modules, is done to check whether the entire system performs satisfactorily.
After unit and integration testing are over, the system as a whole is tested. There are two general strategies for system testing. They are:
Code Testing
Specification Testing
This strategy examines the logic of the program. A path is a specific combination of conditions
handled by the program. Using this strategy, every path through the program is tested.
This strategy examines the specifications stating what the program should do and how it should perform under various conditions. Test cases are developed for each condition of the developed system and processed. It is found that the developed system performs according to its specified requirements. The system is used experimentally to ensure that the software will run according to its specification and in the way the user expects.
Specification testing is done by entering various types of test data. The system is checked with both valid and invalid data and is found to be working properly as per the requirements.
When the system has no major problems with its accuracy, it passes through a final acceptance test. This test confirms that the system meets the original goals, objectives, and requirements established during analysis. If the system fulfils all the requirements, it is finally acceptable and ready for operation.
7.6 Test Plan
A software project test plan is a document that describes the objectives, scope, approach, and focus of a software testing effort. Preparing a test plan is a useful way to think through the effort needed to validate the acceptability of a software product. The completed document helps people outside the test group understand the 'why and how' of product validation. Different test plans are used at different levels of testing.
Each module is tested for correctness, checking whether it meets all the expected results. Conditional loops in the code are properly terminated so that they do not enter an infinite loop. Proper validations are done to avoid any errors related to data entry from the user.
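A module test of the kind described above can be sketched with Python's unittest. The `validate_row` helper is hypothetical, standing in for one of the project's real modules:

```python
# Sketch: a unittest-style module test for a hypothetical validation helper.
import unittest

def validate_row(row):
    """Reject rows with missing or negative rental counts (illustrative)."""
    return row.get("count") is not None and row["count"] >= 0

class TestValidateRow(unittest.TestCase):
    def test_valid_row(self):
        self.assertTrue(validate_row({"count": 120}))

    def test_missing_count(self):
        self.assertFalse(validate_row({}))

    def test_negative_count(self):
        self.assertFalse(validate_row({"count": -5}))

# Run the three test cases programmatically.
result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(TestValidateRow))
print("tests run:", result.testsRun, "failures:", len(result.failures))
```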
Test Cases
Test Case Number | Testing Scenario | Expected Result | Result
Experimental results
Screenshots
Conclusion
In this project, we focus on the problem of competitive analysis and popularity prediction over Mobike and Ofo. We propose CompetitiveBike to predict the popularity contest between Mobike and Ofo by leveraging app store data and microblogging data. Specifically, we first extract features from different perspectives, including the inherent descriptive information of apps, users' sentiment, and comparative opinions. With this information, we further extract two sets of novel features: coarse-grained and fine-grained competitive features. Finally, we generate the event storyline to provide competitive analysis and present the popularity contest. We collected data about two bike-sharing apps and two food ordering & delivery apps from 11 mobile app stores and Sina Weibo and conducted extensive experimental studies; the results demonstrate the effectiveness and generality of our approach.
References
[1] J. Larsen, "Bike-Sharing Programs Hit the Streets in Over 500 Cities Worldwide", Earth Policy Institute, 25 April 2013.
[2] "In Focus: The Last Mile and Transit Ridership", Institute for Local Government, January 2011.
[3] J. May, "Cycling Proficiency With James May", The Daily Telegraph, 21 October 2010.
[4] P.P. De Vassimon, Performance Evaluation for Bike-Sharing Systems: a Benchmarking among 50 Cities, 2016.
[5] L. Chen et al., "Bike Sharing Station Placement Leveraging Heterogeneous Urban Open Data", ACM UbiComp, pp. 571-575, 2015.