
How Does a Bike-Share Navigate Speedy Success?


The task
• How do annual members and casual riders use
Cyclistic bikes differently?

• What does data from the last 12 months show about:
– usage patterns
– mix of regular/casual riders
– distances cycled
– types of bikes used
Summary
• We need to investigate incentivising casual users to return the
bikes as soon as possible. The distance travelled is the same,
but the time taken is more than double.
– Is there a lack of trust that there will be a bike available when
needed?
– How can this impact the running/maintenance costs of the
bikes if fewer bikes can stretch further?
• Can the seasonal pattern be incentivised?
– Make the summer months cheaper with a membership
PREPARE
Preparation Tasks
• The data was provided at this location:
https://divvy-tripdata.s3.amazonaws.com/index.html
• The data used is the monthly data for the last 12
months (2020/07 – 2021/06)
• The data was extracted into a single folder and then
imported into a Microsoft Access Database
Import Function
Import Specification
• The import specification was defined to ensure the formatting would be
correct for all fields. Fortunately, the files all had the same structure.
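The check that all files share the same structure can also be scripted before import. The following is a minimal Python sketch, not the Access import specification itself; the function name `check_same_header` and the assumption that the monthly files are CSVs sitting in one folder are illustrative.

```python
import csv
from pathlib import Path


def check_same_header(folder):
    """Return the header shared by every CSV in `folder`.

    Raises ValueError if a file is empty or its header differs from
    the first file's header. Returns None if the folder has no CSVs.
    """
    expected = None
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            header = next(csv.reader(handle), None)
        if header is None:
            raise ValueError(f"{path.name} is empty")
        if expected is None:
            expected = header  # first file defines the reference header
        elif header != expected:
            raise ValueError(f"{path.name} has a different header: {header}")
    return expected
```

Running this over the extraction folder before the import would surface a structural mismatch early, rather than part-way through loading the database.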
Does the data ROCCC?

Reliable: There are some issues with the data from 15th Dec 2020
Original: The data has been masked but is from the original source
Comprehensive: There are 4.46 million records, which is comprehensive
Current: The data runs until the end of June 2021; it was decided to take
the most recent data
Cited: The source can be found on slide 4
Licensing, privacy,
security and accessibility

• This information is taken directly from the case study:


– For the purposes of this case study, the datasets are appropriate and will enable
you to answer the business questions. The data has been made available by
Motivate International Inc. under this license. This is public data that you can
use to explore how different customer types are using Cyclistic bikes. But note
that data-privacy issues prohibit you from using riders’ personally identifiable
information. This means that you won’t be able to connect pass purchases to
credit card numbers to determine if casual riders live in the Cyclistic service
area or if they have purchased multiple single passes.
PROCESS
Data Organisation
• 4.46 million records are a lot to deal with, so a series of
databases has been set up to do the heavy lifting.
• TripData – The raw data in a single table
• TripAnalysis – The cleaning and analysis steps
Processing the Data
• Six calculations are performed on the data to enable
downstream analysis
– Day of the week – for categorisation
– Month of the year – for categorisation
– Level of “start” detail – for data integrity checks
– Level of “end” detail – for data integrity checks
– Time Taken – How long was the ride
– Distance – How far between start and end point
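As an illustration of the categorisation and time-taken calculations, here is a small Python sketch. It is not the Access implementation used in the deck; the function name `derive_fields`, the output keys, and the `YYYY-MM-DD HH:MM:SS` timestamp format are assumptions. The two "level of detail" checks are left out because the slides do not define them.

```python
from datetime import datetime

# Fixed English names so the result does not depend on the system locale.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed input format


def derive_fields(started_at, ended_at):
    """Derive day of week, month and time taken (minutes) for one ride."""
    start = datetime.strptime(started_at, TIMESTAMP_FORMAT)
    end = datetime.strptime(ended_at, TIMESTAMP_FORMAT)
    return {
        "day_of_week": DAY_NAMES[start.weekday()],  # based on the start time
        "month": start.month,                       # 1 = January ... 12 = December
        "minutes_taken": (end - start).total_seconds() / 60,
    }
```

Basing the day and month on the start time is a design choice; a ride that crosses midnight would otherwise be ambiguous.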
Distance Calculation
• Given the rides are all within
Chicago, it may be a bit excessive
to use this formula for distance
calculation, but this will give the
most accurate result
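The formula on the slide is not reproduced in this text. A common choice for distance between two latitude/longitude points is the haversine (great-circle) formula, so the sketch below assumes that; the function name `haversine_km` and the mean Earth radius of 6371 km are also assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Note that this measures the straight-line distance between the start and end points, not the route actually cycled, so a round trip that ends where it started has a distance of zero.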
Strange Results
rides excluded

• On 15th December 2020, there are 378 rides with an end date of November 2020.
This may have been a short-term issue, as it all happened on the same day.
• In total there are 9,872 rides where the time ending is before the time starting
spread throughout the complete dataset. There could be a systemic issue that
should be explored.
• There are 369 rides where the time taken is 0, but the distance is greater than 0.
This could also point to an issue.
• In February 2021, there are 90 rides which are within seconds of 25 hours long.
Perhaps there was a fault in the process for handling unreturned bikes? This
distorted the results because February is a low-usage month anyway.
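The checks behind these exclusions can be sketched in Python as follows. This is an illustration rather than the queries used in the deck; the function name `flag_ride`, the flag labels, the timestamp format, and the one-minute tolerance used for "within seconds of 25 hours" are all assumptions.

```python
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed input format


def flag_ride(started_at, ended_at, distance_km):
    """Return a list of data-quality flags for one ride (empty if none apply)."""
    start = datetime.strptime(started_at, TIMESTAMP_FORMAT)
    end = datetime.strptime(ended_at, TIMESTAMP_FORMAT)
    taken = end - start
    flags = []
    if taken < timedelta(0):
        # Covers the 15th December rides and the wider end-before-start set.
        flags.append("ends_before_start")
    if taken == timedelta(0) and distance_km > 0:
        flags.append("zero_time_nonzero_distance")
    # The one-minute tolerance around 25 hours is an assumption.
    if abs(taken - timedelta(hours=25)) <= timedelta(minutes=1):
        flags.append("about_25_hours")
    return flags
```

Returning flags instead of dropping rows keeps the excluded rides countable, which helps when reporting how many rides each rule removed.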
ANALYZE
Final Data Set
• Rather than running the analysis on all 4+ million
records, a summary query has been written:
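The summary query itself is not reproduced in this text. As a rough Python equivalent, the sketch below groups rides and keeps only counts and averages per group. The grouping keys (rider type, month, day of week) are inferred from the charts that follow, and the function name `summarise` and the dictionary keys are hypothetical.

```python
from collections import defaultdict


def summarise(rides):
    """Aggregate rides into counts and averages per group.

    `rides` is an iterable of dicts with the hypothetical keys
    rider_type, month, day_of_week, minutes and km.
    """
    totals = defaultdict(lambda: {"rides": 0, "minutes": 0.0, "km": 0.0})
    for ride in rides:
        key = (ride["rider_type"], ride["month"], ride["day_of_week"])
        group = totals[key]
        group["rides"] += 1
        group["minutes"] += ride["minutes"]
        group["km"] += ride["km"]
    return {
        key: {
            "rides": group["rides"],
            "avg_minutes": group["minutes"] / group["rides"],
            "avg_km": group["km"] / group["rides"],
        }
        for key, group in totals.items()
    }
```

The resulting table has one row per group instead of one row per ride, which is what makes the later charting practical.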
Example Impact
• On the strange results slide, the 4th issue (a spike in
February 2021) was analysed and a partial reason found. Excluding
just these 90 rides reduced the February average ride times by
5 and 3 minutes for the 2 customer types
SHARE
Trends in the data

Charts: Minutes per ride; Km per ride

Casual riders take a lot longer to complete their journey, even though the journey is
of similar length
Trends in the data
• Members use the bikes
at a very similar level
every day (perhaps
slightly less on Sundays)
• Casual users are very
weekend focussed
Trends in the data
• Casual users use the
bikes more than
members in June. In
winter months, the
members use the bikes
significantly more
How are the bikes used differently?

• Casual users spend much more time with the bike for
the same actual use
• Over the winter months, our regular users use the
bikes far more
ACT
Next steps
• There are some additional questions that arise from
the data
– What is the full operating model of the solution?
• What is the error handling process?
• Why is there such a difference in the quality of the
locations?
• Is there an auto-ending facility if the bike is not
returned properly (giving rise to the 25-hour issue)?
Conclusions
• We need to investigate incentivising casual users to return the
bikes as soon as possible. The distance travelled is the same,
but the time taken is more than double.
– Is there a lack of trust that there will be a bike available when
needed?
– How can this impact the running/maintenance costs of the
bikes if fewer bikes can stretch further?
• Can the seasonal pattern be incentivised?
– Make the summer months cheaper with a membership
