Bike Sharing Python Report
In recent years, bike-sharing systems have been widely deployed in many big cities, providing an economical and healthy lifestyle. With the prevalence of bike-sharing systems, many companies have joined the bike-sharing market, leading to increasingly fierce competition. To be competitive, bike-sharing companies and app developers need to make strategic decisions and predict the popularity of bike-sharing apps. However, existing work mostly focuses on predicting the popularity of a single app; the popularity contest among different apps has not yet been explored. We develop CompetitiveBike, a system that predicts the popularity contest among bike-sharing apps by leveraging multi-source data. We extract two novel types of features, coarse-grained and fine-grained competitive features, and utilize a Random Forest model to forecast future competitiveness. In addition, we view mobile app competition as a long-term event and generate an event storyline to enrich our competitive analysis. We collected data about two bike-sharing apps and two food ordering & delivery apps from 11 app stores and Sina Weibo and conducted extensive experimental studies; the results demonstrate the effectiveness and generality of our approach.
Chapter 1
Introduction
A user can rent a bike at a nearby bike station and return it at another station near his/her destination. The worldwide prevalence of bike-sharing systems has inspired a great deal of active research on topics such as bike demand prediction, bike rebalancing optimization, and bike lane planning.
As bike-sharing apps become increasingly popular, many companies join the market, leading to fierce competition. To thrive in this competitive market, it is vital for bike-sharing companies and app developers to understand their competitors and then make strategic decisions for mobile app development and evolution. It is therefore both significant and necessary to predict and compare the future popularity of different bike-sharing apps.
Problem statement
The problem of competitive analysis and popularity prediction over bike-sharing apps can be stated as follows: given multi-source data (i.e., app store data and microblogging data) about Mobike and Ofo, we want to predict which app will be more popular in the future.
Objective
The bike offers advantages over motorized vehicles: for example, it can support a government program for reducing carbon emissions as well as minimizing noise levels in the city. Moreover, the bike serves as alternative transportation for tourists who want to explore the city they are visiting. Seeing the potential of large-scale applications in bike rentals, some cities have installed a "Bike Sharing System", a bike rental system that uses digital signage for authentication and security when a user makes a transaction to borrow a bike. However, these devices are very expensive to install. This paper discusses the design and implementation of a bike-sharing system (BSS) that uses simple digital signage technology. It consists of a Liquid Crystal Display (LCD) as the main display and a Near Field Communication (NFC) interaction module, which are developed separately. The NFC module serves as a reader, and the LCD is used as a user interface for several functions, including i) the registration process, ii) renting bikes, iii) returning bikes, and iv) topping up balances and other information. Based on the results of the functionality test, the developed system works well. It is expected that the system can be deployed widely so that communities in other Indonesian cities can benefit from it, because our proposed prototype offers a more affordable price. The scenarios for implementing our proposed BSS are also discussed in detail in this paper.
Bike-sharing programs continue to spread widely around the world, offering many benefits, i.e., increased accessibility, low cost, a positive environmental impact, improved public health, and increased mobility. As of 2017, more than 1,173 cities around the world had implemented bike-sharing programs, with around 1.9 million active bikers. Unfortunately, the rapid global growth of bike sharing has not been matched in Indonesia, where the bike-sharing idea is still new and can hardly be found in Indonesian cities. Given Indonesia's status as the eighth-largest economy in the world in 2017, the bike-sharing business is expected to grow along with the global trend. The article examined a pilot bike-sharing project at a well-known state university in Indonesia to illustrate the feasibility of the bike-sharing business, applying a mobile application with the cloud computing paradigm and a business model canvas (BMC). The outcome of the paper is expected to serve as a reference for growing the bike-sharing business in Indonesia.
Bike-Sharing, a New Public Transportation Mode: State of the Practice & Prospects
Bike-sharing, as a new green public transportation mode, has been developed in several western cities. In this paper we systematically describe the development process and the state of the art of bike-sharing programs around the world. We investigated the bicycle characteristics, business models, operating mechanisms, and current problems of the bike-sharing programs of three metropolitan cities in Europe, and we also discussed the development mode and applicability of bike-sharing programs. We then provide guidelines on where and how to establish a bike-sharing program in a developing country, taking into consideration the weather and topography of the target city. Finally, we discuss how to integrate bike sharing with existing transport modes and the future of bike-sharing programs.
Methodology
In this project, we present a system developed to facilitate the collection and use of bike-sharing-system data for research, notably to develop and compare bike usage forecasting algorithms. We collected internal and external data for six different applications and developed a system providing short- and long-term predictions of bike and slot availability for bike-sharing stations in real time. To provide the best predictions, we developed and compared the performance of two types of algorithms: the first is based on the k-means algorithm and the second on the SVM algorithm. Our study demonstrates their applicability, showing better accuracy for short-term predictions with the k-means algorithm.
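The comparison above can be sketched in code. This is a minimal illustration, not the project's actual pipeline: the station features, targets, and hyperparameters below are synthetic placeholders, and scikit-learn is assumed as the modeling library.

```python
# Sketch: compare a k-means-based predictor with an SVM regressor for
# short-term bike-availability forecasting on synthetic station data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
# Features: hour of day and day of week; target: bikes available at a station.
X = rng.uniform([0, 0], [24, 7], size=(300, 2))
y = 10 + 5 * np.sin(X[:, 0] / 24 * 2 * np.pi) + rng.normal(0, 1, 300)

X_train, X_test = X[:240], X[240:]
y_train, y_test = y[:240], y[240:]

# k-means predictor: cluster the feature space, then predict the mean
# availability of the cluster each test point falls into.
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X_train)
cluster_means = np.array([y_train[km.labels_ == c].mean() for c in range(8)])
y_pred_km = cluster_means[km.predict(X_test)]

# SVM predictor: support vector regression on the same features.
svr = SVR(kernel="rbf").fit(X_train, y_train)
y_pred_svr = svr.predict(X_test)

print("k-means MAE:", round(mean_absolute_error(y_test, y_pred_km), 2))
print("SVM MAE:", round(mean_absolute_error(y_test, y_pred_svr), 2))
```

Either family can win depending on the data; the point is that both reduce to fitting on historical availability and scoring on a held-out window.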
(1) This work is the first to study the problem of competitive analysis and popularity contest of bike-sharing apps. We use two indicators for the comparison.
(2) To predict the popularity contest, we extract features from different aspects, including the inherent descriptive information of apps, users' sentiment, and comparative opinions. With this information, we further extract two novel types of features, coarse-grained and fine-grained competitive features, and choose the Random Forest algorithm for prediction.
(3) To provide competitive analysis, we utilize a topic model to analyze the topics in competing apps and apply the minimum-weight dominating set algorithm to select representative microblogs. We also generate an event storyline to present and visualize the popularity contest.
(4) To evaluate CompetitiveBike, we collect data about different bike-share apps from mobile app stores. With the data collected, we conduct extensive experiments from different perspectives. We find that the k-means model performs well on competitive relationship prediction as well as competitive intensity prediction. A combination of the coarse-grained and fine-grained competitive features improves performance in popularity contest prediction, and combining data from app stores and microblogging also improves performance.
The system framework consists of three layers: data preparation, feature extraction, and competitive analysis and popularity prediction.
Data Preparation.
Multi-source data are complementary and can reflect mobile app popularity contest from
different perspectives. We obtain app statistics and reviewers’ sentiment from app store data,
and microblogging statistics and comparative information from microblogging data.
Feature Extraction.
To effectively extract and quantify the factors affecting the mobile app popularity contest, we extract features from different perspectives, including the inherent descriptive information of apps, users' sentiment, and comparative opinions. With this information, we further extract two novel sets of features: coarse-grained and fine-grained competitive features.
With these two extracted feature sets, we train a model to predict the popularity contest. Moreover, we view mobile app competition as a long-term event and generate the event storyline to support competitive analysis, which can present descriptive information regarding the popularity contest.
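The training step described above can be sketched as follows. This is an illustrative sketch only, assuming scikit-learn: the feature values are random placeholders standing in for the real coarse-grained (e.g., download counts, ratings) and fine-grained (e.g., sentiment, comparative-opinion) features.

```python
# Sketch: train a Random Forest on concatenated coarse-grained and
# fine-grained competitive features to predict the popularity-contest winner.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n = 200
coarse = rng.normal(size=(n, 4))   # placeholder coarse-grained features
fine = rng.normal(size=(n, 6))     # placeholder fine-grained features
X = np.hstack([coarse, fine])      # combined feature set
# Synthetic label: 1 means "app A wins" in that time window.
y = (X[:, 0] + X[:, 4] + rng.normal(0, 0.5, n) > 0).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[:160], y[:160])                 # train on the first 160 windows
accuracy = clf.score(X[160:], y[160:])    # evaluate on the held-out windows
print("held-out accuracy:", round(accuracy, 2))
```

Concatenating the two feature sets mirrors the report's finding that combining coarse-grained and fine-grained features improves prediction.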
Chapter 2
History of organization
Chapter 3
Chapter 4
A system requirement specification is one of the main steps in the development process. It follows the resource analysis phase, whose task is to determine what a particular software product does. The focus at this stage is on the users of the system, not on the system's solutions. The resulting requirement specification document states the intention of the software and the properties and constraints of the desired system.
The SRS constitutes the agreement between clients and developers regarding the contents of the software product that is going to be developed. The SRS should accurately and completely represent the system requirements, as it makes a major contribution to the overall project plan. The software being developed may be part of a larger system or may be a complete standalone system in its own right. If the software is a system component, the SRS should state the interfaces between the system and the software portion.
OVERVIEW OF PYTHON
Python interpreters are available for many operating systems. A global community of
programmers develops and maintains CPython, an open source reference implementation. A
non-profit organization, the Python Software Foundation, manages and directs resources for
Python and CPython development.
Python was conceived in the late 1980s by Guido van Rossum at Centrum Wiskunde &
Informatica (CWI) in the Netherlands as a successor to the ABC language (itself inspired by
SETL), capable of exception handling and interfacing with the Amoeba operating system. Its
implementation began in December 1989. Van Rossum shouldered sole responsibility for the
project, as the lead developer, until July 12, 2018, when he announced his "permanent
vacation" from his responsibilities as Python's Benevolent Dictator For Life, a title the
Python community bestowed upon him to reflect his long-term commitment as the project's
chief decision-maker. In January 2019, active Python core developers elected Brett Cannon, Nick Coghlan, Barry Warsaw, Carol Willing, and Van Rossum to a five-member Steering Council to lead the project; he now shares leadership as one of its members.
Python 2.0 was released on 16 October 2000 with many major new features, including a
cycle-detecting garbage collector and support for Unicode.
Python 3.0 was released on 3 December 2008. It was a major revision of the language that is
not completely backward-compatible. Many of its major features were backported to Python
2.6.x and 2.7.x version series. Releases of Python 3 include the 2to3 utility, which automates
(at least partially) the translation of Python 2 code to Python 3.
Python 2.7's end-of-life date was initially set at 2015 then postponed to 2020 out of concern
that a large body of existing code could not easily be forward-ported to Python 3.
OpenCV
OpenCV (Open source computer vision) is a library of programming functions mainly aimed
at real-time computer vision. Originally developed by Intel, it was later supported by Willow
Garage then Itseez (which was later acquired by Intel). The library is cross-platform and free
for use under the open-source BSD license.
Officially launched in 1999, the OpenCV project was initially an Intel Research initiative to
advance CPU-intensive applications, part of a series of projects including real-time ray
tracing and 3D display walls. The main contributors to the project included a number of
optimization experts in Intel Russia, as well as Intel's Performance Library Team. In the early
days of OpenCV, the goals of the project were described as:
Advance vision research by providing not only open but also optimized code for basic vision
infrastructure. No more reinventing the wheel. Disseminate vision knowledge by providing a
common infrastructure that developers could build on, so that code would be more readily
readable and transferable.
The second major release of OpenCV was in October 2009. OpenCV 2 includes major
changes to the C++ interface, aiming at easier, more type-safe patterns, new functions, and
better implementations for existing ones in terms of performance (especially on multi-core
systems). Official releases now occur every six months and development is now done by an
independent Russian team supported by commercial corporations.
Tensorflow
TensorFlow is a free and open-source software library for dataflow and differentiable programming
across a range of tasks. It is a symbolic math library, and is also used for machine learning
applications such as neural networks. It is used for both research and production at Google.
TensorFlow was developed by the Google Brain team for internal Google use. It was released under
the Apache License 2.0 on November 9, 2015.
TensorFlow is Google Brain's second-generation system. Version 1.0.0 was released on February 11, 2017. While the reference implementation runs on single devices, TensorFlow can run on
multiple CPUs and GPUs (with optional CUDA and SYCL extensions for general-purpose computing
on graphics processing units). TensorFlow is available on 64-bit Linux, macOS, Windows, and
mobile computing platforms including Android and iOS.
Its flexible architecture allows for the easy deployment of computation across a variety of platforms
(CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices.
TensorFlow computations are expressed as stateful dataflow graphs. The name TensorFlow derives
from the operations that such neural networks perform on multidimensional data arrays, which are
referred to as tensors. During the Google I/O Conference in June 2016, Jeff Dean stated that 1,500
repositories on GitHub mentioned TensorFlow, of which only 5 were from Google.
In December 2017, developers from Google, Cisco, RedHat, CoreOS, and CaiCloud introduced
Kubeflow at a conference. Kubeflow allows operation and deployment of TensorFlow on Kubernetes.
In March 2018, Google announced TensorFlow.js for machine learning in JavaScript.
In January 2019, Google announced TensorFlow 2.0, which became officially available in September 2019.
In May 2019, Google announced TensorFlow Graphics for deep learning in computer graphics.
TensorFlow is an end-to-end open-source platform for machine learning. TensorFlow is a rich system for managing all aspects of a machine learning system; however, this project focuses on using a particular TensorFlow API to develop and train machine learning models. See the TensorFlow documentation for complete details on the broader TensorFlow system.
TensorFlow APIs are arranged hierarchically, with the high-level APIs built on the low-level APIs. Machine learning researchers use the low-level APIs to create and explore new machine learning algorithms. In this project, we use a high-level API named tf.keras to define and train machine learning models and to make predictions. tf.keras is the TensorFlow variant of the open-source Keras API.
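A minimal tf.keras workflow, as described above, looks like the following. This is a sketch assuming TensorFlow 2.x; the data and layer sizes are arbitrary placeholders.

```python
# Sketch: define, train, and evaluate a tiny tf.keras model.
import numpy as np
import tensorflow as tf

# Synthetic binary-classification data: 100 samples with 3 features each.
X = np.random.rand(100, 3).astype("float32")
y = (X.sum(axis=1) > 1.5).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),                       # declare input shape
    tf.keras.layers.Dense(8, activation="relu"),      # hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),   # binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=5, verbose=0)                  # train briefly
loss, acc = model.evaluate(X, y, verbose=0)
print("training-set accuracy:", round(acc, 2))
```

The same Sequential/compile/fit pattern scales from this toy example to the models used for real prediction tasks.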
The following figure shows the hierarchy of TensorFlow toolkits:
NUMPY
NumPy is a Python package which stands for ‘Numerical Python’. It is the core library for
scientific computing, which contains a powerful n-dimensional array object, provide tools for
integrating C, C++ etc. It is also useful in linear algebra, random number capability etc.
NumPy array can also be used as an efficient multi-dimensional container for generic data.
Now, let me tell you what exactly is a python numpy array.
NumPy is the fundamental package for scientific computing with Python. It contains among
other things:
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional
container of generic data. Arbitrary data-types can be defined. This allows NumPy to
seamlessly and speedily integrate with a wide variety of databases.
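The n-dimensional array described above works as in this short sketch:

```python
# Sketch: a NumPy n-dimensional array and a few core operations.
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # 2-D array (2 rows, 3 columns)
print(a.shape)        # (2, 3)
print(a.sum(axis=0))  # column sums: [5 7 9]
print(a.mean())       # mean of all elements: 3.5
b = a * 2             # elementwise (vectorized) arithmetic
print(b[1, 2])        # 12
```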
Pandas
In 2008, developer Wes McKinney started developing pandas out of the need for a high-performance, flexible tool for data analysis.
Prior to pandas, Python was mostly used for data munging and preparation and contributed very little to data analysis itself. Pandas solved this problem. Using pandas, we can accomplish five typical steps in the processing and analysis of data, regardless of its origin: load, prepare, manipulate, model, and analyse.
Python with pandas is used in a wide range of academic and commercial domains, including finance, economics, statistics, and analytics. Key features include:
A fast and efficient DataFrame object with default and customized indexing.
Tools for loading data into in-memory data objects from different file formats.
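The load-prepare-manipulate steps above can be sketched with a DataFrame. The column names here are illustrative, not the report's actual dataset:

```python
# Sketch of load/prepare/manipulate with a pandas DataFrame.
import pandas as pd

# Load: in practice this would be pd.read_csv("trips.csv");
# inline data keeps the sketch self-contained.
df = pd.DataFrame({
    "season": ["spring", "spring", "summer", "summer"],
    "temp": [14.0, 16.5, 28.0, 30.5],
    "count": [120, 150, 310, 290],
})

# Prepare / manipulate: filter rows, then group and aggregate.
warm = df[df["temp"] > 15]                       # boolean filtering
by_season = df.groupby("season")["count"].mean() # grouped aggregation
print(by_season)
```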
Matplotlib
Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+. There is also a procedural "pylab" interface based on a state machine (like OpenGL), designed to closely resemble that of MATLAB, though its use is discouraged. SciPy makes use of Matplotlib.
Matplotlib was originally written by John D. Hunter; it has since gained an active development community and is distributed under a BSD-style license. Michael Droettboom was nominated as Matplotlib's lead developer shortly before John Hunter's death in August 2012 and was later joined by Thomas Caswell.
Matplotlib 2.0.x supports Python versions 2.7 through 3.6. Python 3 support started with Matplotlib 1.2. Matplotlib 1.4 is the last version to support Python 2.6. Matplotlib has pledged not to support Python 2 past 2020 by signing the Python 3 Statement.
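A minimal example of the object-oriented API mentioned above; the demand curve is synthetic and the non-interactive Agg backend is used so the sketch runs without a display:

```python
# Sketch: a simple Matplotlib line plot saved to a file.
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

hours = np.arange(24)
rentals = 100 + 50 * np.sin(hours / 24 * 2 * np.pi)  # illustrative demand

fig, ax = plt.subplots()             # object-oriented API
ax.plot(hours, rentals, marker="o")
ax.set_xlabel("Hour of day")
ax.set_ylabel("Bike rentals")
ax.set_title("Hourly rental demand (synthetic)")
fig.savefig("rentals.png")
```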
Flask
Flask is a web application framework written in Python. It is developed by Armin Ronacher, who leads an international group of Python enthusiasts named Pocco. Flask is based on the Werkzeug WSGI toolkit and the Jinja2 template engine, both of which are Pocco projects.
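A minimal Flask application looks like the following sketch; the route and response text are illustrative, and Flask's built-in test client is used here instead of running a live server:

```python
# Sketch: a minimal Flask app with one route.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Bike sharing prediction service"

# In production one would call app.run(); the test client
# exercises the route in-process instead.
client = app.test_client()
response = client.get("/")
print(response.status_code, response.get_data(as_text=True))
```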
3.2 Hardware and Software Requirements
Hardware Requirements
• Ram : 4 GB or above
Software Requirements
• IDE : Pycharm
3.3 Functional and Non-functional requirements
3.3.1 Functional requirements
Functional requirements capture the intended behaviour of the system. This behaviour may
be expressed as services, tasks or functions the system is required to perform.
Dataset gathering
Gathering data is the most important step in solving any supervised machine learning problem. In this module, we gather the information related to the application.
Data cleansing
In this process, data is prepared for analysis by removing or modifying records that are incorrect, incomplete, irrelevant, duplicated, or improperly formatted.
Data Preprocessing
In this module, raw data is transformed into an understandable format. Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviours or trends, and is likely to contain many errors.
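The cleansing and preprocessing modules above can be sketched with pandas. The columns and values are illustrative placeholders, not the project's dataset:

```python
# Sketch: cleansing (duplicates, missing values) and preprocessing (scaling).
import pandas as pd

raw = pd.DataFrame({
    "temp": [14.0, None, 28.0, 28.0],   # one missing, one duplicated row
    "count": [120, 150, 310, 310],
})

clean = raw.drop_duplicates()                 # remove the duplicated row
clean = clean.fillna(clean["temp"].mean())    # fill the missing temperature
# Min-max scale 'temp' into [0, 1] so features share a comparable range.
t = clean["temp"]
clean["temp_scaled"] = (t - t.min()) / (t.max() - t.min())
print(clean)
```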
Classification
This module performs the data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. We use the SVM algorithm for classification.
Prediction
This module describes which characteristics and features of the data have the most significant impact on the model's outcomes; we use the k-means algorithm for prediction.
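The two modules above can be sketched together. This assumes scikit-learn, and the two well-separated point clouds are synthetic placeholders (e.g., low- vs. high-demand days):

```python
# Sketch: SVM for the classification module, k-means for grouping.
import numpy as np
from sklearn.svm import SVC
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
low = rng.normal(loc=0.0, scale=0.5, size=(50, 2))    # e.g. low-demand days
high = rng.normal(loc=3.0, scale=0.5, size=(50, 2))   # e.g. high-demand days
X = np.vstack([low, high])
y = np.array([0] * 50 + [1] * 50)

svm = SVC(kernel="rbf").fit(X, y)                 # classification
print("SVM predicts:", svm.predict([[3.1, 2.9]])) # a point near 'high'

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # grouping
print("cluster sizes:", np.bincount(km.labels_))
```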
Visualization
In this module, the analysis and prediction results are presented to the user as charts and graphs.
Performance Requirements:
The product should support the end users' requirements. It must be capable of processing large numbers of files provided as input, must be interactive, and should keep delays to a minimum, so that every action-response of the system occurs without noticeable delay.
Chapter 5
Design
Three-tier architecture
The main reasons for considering a three-tier architecture for the application are as follows:
Flexibility: each tier can be developed, modified, and scaled independently.
Security: the architecture is more secure, since the client cannot access the database directly.
Fig.1 3-tier Architecture
Presentation Tier-
The presentation tier is the front-end layer in the 3-tier system and consists of the user interface. This user interface is often a graphical one, accessible through a web browser or web-based application, and displays content and information useful to an end user. This tier is often built on web technologies such as HTML5, JavaScript, and CSS, or other popular web development frameworks, and communicates with the other layers through API calls.
Application Tier-
The application tier contains the functional business logic which drives an application’s core
capabilities. It’s often written in Java, .NET, C#, Python, C++, etc.
Data Tier-
The data tier comprises the database/data storage system and the data access layer. Examples of such systems are MySQL, Oracle, PostgreSQL, Microsoft SQL Server, MongoDB, etc. Data is accessed by the application layer via API calls.
System Architecture
Detailed Design
Detailed design starts after the system design phase is completed and the system design has
been certified through the review. The goal of this phase is to develop the internal logic of
each of the modules identified during system design.
In system design, the focus is on identifying the modules, whereas during detailed design the focus is on designing the logic for those modules. In other words, in system design the issue is what components are needed, while in detailed design the issue is how the components can be implemented in software.
The design activity is often divided into two separate phases: system design and detailed design. System design, also called top-level design, focuses on deciding which modules are needed for the system, the specifications of these modules, and how the modules should be interconnected. In the second phase, the internal design of the modules, i.e., how the specifications of each module can be satisfied, is decided. This design level is often called detailed design or logic design.
Basic Notation:
Process: any process that changes the data, producing an output. It might perform
computations, or sort data based on logic, or direct the data flow based on business rules. A
short label is used to describe the process, such as “Submit payment.”
Data store: files or repositories that hold information for later use, such as a database table or
a membership form. Each data store receives a simple label, such as “Orders.”
External entity: an outside system that sends or receives data, communicating with the
system being diagrammed. They are the sources and destinations of information entering or
leaving the system. They might be an outside organization or person, a computer system or a
business system. They are also known as terminators, sources and sinks or actors. They are
typically drawn on the edges of the diagram
Data flow: the route that data takes between the external entities, processes, and data stores. It portrays the interface between the other components and is shown with arrows, typically labeled with a short data name, like "Billing details".
Fig: Level 0 DFD (the User registers and exchanges data with the Application)
Fig: Level 1 DFD (the User logs in, uploads the dataset, and performs prediction and visualization)
A use case diagram is a graph of actors, a set of use cases enclosed by a system boundary, and communication associations between the actors and the use cases. The use case diagram describes how a system interacts with outside actors; each use case represents a piece of functionality that the system provides to its users. A use case is shown as an ellipse containing the name of the use case, and an actor is shown as a stick figure with the name of the actor below the figure.
Use cases are used during the analysis phase of a project to identify and partition system functionality. They separate the system into actors and use cases. Actors represent roles played by users of the system. Those users can be humans, other computers, pieces of hardware, or even other software systems.
Fig: Use case diagram (actor: User; use cases: Register, Login, Upload dataset, Prediction, Visualization, Logout)
A sequence diagram shows object interactions arranged in time sequence. It depicts the objects and classes involved in the scenario and the sequence of messages exchanged between the objects needed to carry out the functionality of the scenario. Sequence diagrams are sometimes called event diagrams or event scenarios.
UML sequence diagrams are used to represent or model the flow of messages, events, and actions between the objects or components of a system. Time is represented in the vertical direction, showing the sequence of interactions of the header elements, which are displayed horizontally at the top of the diagram. Sequence diagrams are used primarily to design, document, and validate the architecture, interfaces, and logic of the system by describing the sequence of actions that need to be performed to complete a task or scenario. UML sequence diagrams are useful design tools because they provide a dynamic view of the system behavior.
Purpose
The sequence diagram is used primarily to show the interactions between objects in the
sequential order that those interactions occur. One of the primary uses of sequence diagrams
is in the transition from requirements expressed as use cases to the next and more formal
level of refinement.
Fig: User sequence diagram (the user registers with credentials, the system validates them, the user logs in, the homepage is rendered, prediction results are fetched, visualizations are sent, and the user logs out)
Basic Notations
Initial Activity
This shows the starting point or first activity of the flow. It is denoted by a solid circle.
Final Activity
The end of the activity diagram is shown by a bull's-eye symbol, also called the final activity.
Activity
An activity is represented by a rounded rectangle containing the name of the activity.
Decisions
A decision is represented by a diamond, with the outgoing branches labeled with their conditions.
Control Flow
The workflow is depicted with an arrow. It shows the direction of the workflow in the activity diagram.
Fig: Activity diagram (the user registers and logs in to the application; if the credentials are valid, the user uploads the dataset, then prediction and visualization follow; otherwise control returns to login)
An Entity Relationship Diagram depicts the various relationships among entities, considering each object as an entity. Entity relationships are described by their dependence on each other, as well as the extent of the relationship between the data stores. The ER diagram is a notation used to conduct the data modeling activity.
Fig: ER diagram (User [user_id, User_name] Uploads Dataset [Dataset_id, Description]; Dataset Has Prediction [id]; User Views Prediction)
Chapter 6
Coding
Chapter 7
Testing
Testing is the major process involved in software quality assurance (QA). It is an iterative process in which test data is prepared and used to test the modules individually. System testing makes sure that all components of the system function properly as a unit, by actually trying to force the system to fail.
Test cases should be planned before testing begins. As testing progresses, it shifts focus in an attempt to find errors in integrated clusters of modules and in the entire system. The philosophy behind testing is to find errors. Testing is the stage of implementation aimed at ensuring that the system works accurately and efficiently before deployment.
Testing is done for each module. After all the modules are tested, they are integrated and the final system is tested with test data specially designed to show that the system will operate successfully under all conditions. Procedure-level testing is done first: by giving improper inputs, the errors that occur are noted and eliminated. System testing is thus a confirmation that all is correct and an opportunity to show the user that the system works. The final step is validation testing, which determines whether the software functions as the user expects. The end user, rather than the system developer, conducts this test; most software developers use a process called alpha and beta testing to uncover errors that only the end user seems able to find.
This is the final step in the system life cycle. Here we deploy the tested, error-free system into the real-life environment and make the necessary changes so that it runs in an online fashion. System maintenance is done every month or year, based on company policy, checking for errors such as runtime errors and long-run errors, and performing other maintenance such as table verification and reports.
During requirement analysis and design, the output is a document that is usually textual and non-executable. After the coding phase, computer programs are available that can be executed for testing purposes. This implies that testing has to uncover not only errors introduced during coding, but also errors introduced during the previous phases.
Fig. 7.1 The Testing process
Unit Testing
Integration Testing
Validation Testing
System Testing
Acceptance Testing
Unit testing focuses verification effort on the smallest unit of software design, the module, and is also known as "module testing". The modules are tested separately. This testing is carried out during the programming stage itself. In this testing step, each module is found to be working satisfactorily with regard to the expected output from the module.
Integration testing is a systematic technique for constructing tests to uncover errors associated with the interfaces. In this project, all the modules are combined and then the entire program is tested as a whole. In the integration-testing step, all the errors uncovered are corrected before the next testing steps.
Validation testing aims to uncover functional errors, that is, to check whether the functional characteristics conform to the specification.
Once individual module testing is completed, the modules are assembled to perform as a system. Then top-down testing, which proceeds from upper-level to lower-level modules, is done to check whether the entire system performs satisfactorily.
After unit and integration testing are over, the system as a whole is tested. There are two general strategies for system testing. They are:
Code Testing
Specification Testing
This strategy examines the logic of the program. A path is a specific combination of conditions
handled by the program. Using this strategy, every path through the program is tested.
This strategy examines the specifications stating what the program should do and how it should perform under various conditions. Test cases are developed for each condition of the developed system and processed. It is found that the developed system performs according to its specified requirements. The system is used experimentally to ensure that the software will run according to its specification and in the way the user expects.
Specification testing is done by entering various types of test data. The system is checked with both valid and invalid data and is found to be working properly as per the requirements.
When the system has no major problems with its accuracy, it passes through a final acceptance test. This test confirms that the system meets the original goals, objectives, and requirements established during analysis. If the system fulfils all the requirements, it is finally acceptable and ready for operation.
7.6 Test Plan
A software project test plan is a document that describes the objectives, scope, approach, and focus of a software testing effort. Preparing a test plan is a useful way to think through the effort needed to validate the acceptability of a software product. The completed document helps people outside the test group understand the 'why and how' of product validation. Different test plans are used at different levels of testing.
Each module is tested for correctness, checking whether it meets all the expected results. Conditional loops in the code are properly terminated so that they do not enter an infinite loop. Proper validations are done to avoid any errors related to data entry from the user.
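A module test of the kind described above can be sketched with Python's unittest. The `validate_row` helper is hypothetical, standing in for one of the project's real modules:

```python
# Sketch: a unittest-style module test for a hypothetical validation helper.
import unittest

def validate_row(row):
    """Reject rows with missing or negative rental counts (illustrative)."""
    return row.get("count") is not None and row["count"] >= 0

class TestValidateRow(unittest.TestCase):
    def test_valid_row(self):
        self.assertTrue(validate_row({"count": 120}))

    def test_missing_count(self):
        self.assertFalse(validate_row({}))

    def test_negative_count(self):
        self.assertFalse(validate_row({"count": -5}))

# Run the three test cases programmatically.
result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(TestValidateRow))
print("tests run:", result.testsRun, "failures:", len(result.failures))
```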
Test Cases
Test Case Number | Testing Scenario | Expected Result | Result
Experimental results
Screenshots
Conclusion
In this project, we focus on the problem of competitive analysis and popularity prediction over Mobike and Ofo. We propose CompetitiveBike to predict the popularity contest between Mobike and Ofo by leveraging app store data and microblogging data. Specifically, we first extract features from different perspectives, including the inherent descriptive information of apps, users' sentiment, and comparative opinions. With this information, we further extract two sets of novel features: coarse-grained and fine-grained competitive features. Finally, we generate the event storyline to provide competitive analysis and present the popularity contest. We collected data about two bike-sharing apps and two food ordering & delivery apps from 11 mobile app stores and Sina Weibo and conducted extensive experimental studies; the results demonstrate the effectiveness and generality of our approach.
References
[1] J. Larsen, "Bike-Sharing Programs Hit the Streets in Over 500 Cities Worldwide", Earth Policy Institute, 25 April 2013.
[2] "In Focus: The Last Mile and Transit Ridership", Institute for Local Government, January 2011.
[3] J. May, "Cycling Proficiency With James May", The Daily Telegraph, 21 October 2010.
[4] P.P. De Vassimon, Performance Evaluation for Bike-Sharing Systems: a Benchmarking among 50 Cities, 2016.
[5] L. Chen et al., "Bike Sharing Station Placement Leveraging Heterogeneous Urban Open Data", ACM UbiComp, pp. 571-575, 2015.