fwd initiative

Project-Bike Share Data


Walk-through
Shout out
● Omar Khaled Mohy El-din
● Mohamed A. Amr
● Mohamed Ghazala
● Mostafa Mohamed Mohamed Imam
● Shady Ahmed Fawzy Ahmed Abdulmajeed
● Michael Bassily
● Ahmed Nagib
● Seif El-Banhawy
● Rana Adel
Shout out
● Ahmed Saber
● Ahmed Elsayed
● Ahmed Adel Ibrahim Mohamed Hassan
● Alaa Abaas
● Mohamed Ahmed Abdelwahab
● Ahmed Negm
● Fady Osama Fayez
Agenda

Project Overview
● Important Lessons to review
● Project Details
- Technical requirements
- The Datasets
- Statistics Computed
- The Files
● Workspace & Submission

Tips & Code Walkthrough
● Know The Data
● Interactive Experience: the get_filters function
● Tips on the rest of the functions
Important Lesson - Conditionals and Loops

Review the following concepts:

1. Good and Bad Examples
2. Truth Value Testing
3. For loops
4. While loops
5. Break, Continue
6. Pay attention to Indentation
Important Lesson - Functions

Review the following concepts:

1. Defining Functions
2. Print vs. Return in Functions
3. Variable Scope

Click on the image to watch the video …

And check this link for a discussion around this on Discourse
Important Lesson - Scripting & All Pandas Lesson

Review the following concepts:

1. Editing script
2. Scripting with Raw input
3. Errors and Exceptions
4. Handling errors
5. Accessing error messages
6. Debugging practice
7. Using a main block

Click on the image to watch the video ...
Important Lesson - All lessons are important EXCEPT:

NumPy lesson:
The only lesson you can skip is the NumPy lesson: it isn't used directly in the project, although it is important in general. So, if you are short on time, you can skip it and come back to it after submitting the first project.
Don’t Panic…. You are here to Learn…
Project Overview

In this project you will use Python to explore data related to bike share systems for three major cities in the United States: Chicago, New York City, and Washington. You will write code to:
1. Import the data.
2. Answer interesting questions about it by computing descriptive statistics.
3. Take in raw input through a script to create an interactive experience in the terminal that presents these statistics.
Project Details

What software do I need to complete this project locally?

1. Python 3, NumPy, and pandas, installed using Anaconda
2. A text editor, like Sublime or Atom
3. A terminal application
Project Details
The Datasets:
1. Randomly selected data for the first six months of 2017 are provided for all three cities. All three of the
data files contain the same core six (6) columns:
a. Start Time (e.g., 2017-01-01 00:07:57)
b. End Time (e.g., 2017-01-01 00:20:53)
c. Trip Duration (in seconds - e.g., 776)
d. Start Station (e.g., Broadway & Barry Ave)
e. End Station (e.g., Sedgwick St & North Ave)
f. User Type (Subscriber or Customer)
2. The Chicago and New York City files also have the following two columns:
a. Gender
b. Birth Year
Project Details
Statistics Computed:
1. Popular times of travel (i.e., occurs most often in the start time):
a. most common month
b. most common day of week
c. most common hour of day
2. Popular stations and trip:
a. most common start station
b. most common end station
c. most common trip from start to end (i.e., most frequent combination of start station and end station)
3. Trip duration:
a. total travel time
b. average travel time
4. User info:
a. counts of each user type
b. counts of each gender (only available for NYC and Chicago)
c. earliest, most recent, most common year of birth (only available for NYC and Chicago)
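As a rough illustration of the trip duration statistics in item 3, here is a minimal sketch on invented rows. Only the column name Trip Duration comes from the datasets slide; the values and variable names are made up for the example.

```python
import pandas as pd

# Invented durations in seconds; the real values come from the city CSV files.
df = pd.DataFrame({'Trip Duration': [776, 300, 1250, 674]})

total_travel_time = df['Trip Duration'].sum()    # total travel time in seconds
mean_travel_time = df['Trip Duration'].mean()    # average travel time in seconds

print(f'Total travel time: {total_travel_time} seconds')
print(f'Average travel time: {mean_travel_time:.1f} seconds')
```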
Project Details
The Files:

Inputs:
1. chicago.csv
2. new_york_city.csv
3. washington.csv
4. Raw input

Script: bikeshare.py

Outputs:
Interactive script displaying statistics and data upon request

All four of these files are zipped up in the Bikeshare file in the resource tab
Workspace & Submission
Before You Submit:

Check the Rubric:


Your project will be evaluated by a Udacity reviewer according to this Project Rubric. Be sure to review it
thoroughly before you submit. Your project "meets specifications" only if it meets specifications in all the
criteria.
Workspace & Submission - How to deal with the workspace

Watch this video


Before Doing Anything in your Project script - 1
Concept 4 - Understand Your data:

Some useful pandas methods:

● df.head()

● df.columns

● df.describe()

● df.info()

● df['column_name'].value_counts()

● df['column_name'].unique()
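To see what each of these returns, here is a minimal sketch on a tiny hand-made dataframe. The rows are invented for illustration and only mimic the project's column names; with the real data you would load a city file with pd.read_csv first.

```python
import pandas as pd

# Invented sample rows that mimic the project's column names.
df = pd.DataFrame({
    'Start Time': ['2017-01-01 00:07:57', '2017-02-03 08:15:00', '2017-02-04 09:30:00'],
    'Trip Duration': [776, 300, 1250],
    'User Type': ['Subscriber', 'Customer', 'Subscriber'],
})

print(df.head())        # first rows of the dataframe
print(df.columns)       # column labels
print(df.describe())    # summary statistics for the numeric columns
df.info()               # dtypes and non-null counts (prints directly)

user_type_counts = df['User Type'].value_counts()   # number of rows per category
user_types = df['User Type'].unique()                # distinct values in the column
print(user_type_counts)
print(user_types)
```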
Before Doing Anything in your Project script - 1
Learn about your data by solving the practice problems:
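1. Practice Problem 1 - Let's understand the following lines
# load data file into a dataframe
df = pd.read_csv(filename, parse_dates=['Start Time'])
# extract the hour from the Start Time column to create an hour column
df['hour'] = df['Start Time'].dt.hour
# find the most popular hour
popular_hour = df['hour'].mode()[0]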

2. Practice Problem 2 - Let's get closer...

import pandas as pd
filename = 'chicago.csv'

# load data file into a dataframe
df = pd.read_csv(filename)

# print value counts for each user type
user_types = df['User Type'].value_counts().to_frame()
print(user_types)
Before Doing Anything in your Project script- 1
Pay special attention to Practice Problem 3: extracting month and day of week from Start Time to create new columns. Try the following approaches:
a. dt.month_name()
b. dt.day_name()
c. How to filter the dataframe (the empty dataframe issue)

Please do this practice problem and have a close look at the approaches illustrated in this reply
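The sketch below shows one way the name-based columns and the filter could fit together, and how a letter-case mismatch produces the empty dataframe. The rows are invented, and the variables month and day stand in for whatever the user typed.

```python
import pandas as pd

# Invented sample rows; the real project reads these from a city CSV file.
df = pd.DataFrame({'Start Time': pd.to_datetime([
    '2017-01-01 00:07:57',   # a Sunday in January
    '2017-02-06 08:15:00',   # a Monday in February
    '2017-02-07 09:30:00',   # a Tuesday in February
])})

# Names come back capitalized, e.g. 'January' and 'Sunday'.
df['month'] = df['Start Time'].dt.month_name()
df['day_of_week'] = df['Start Time'].dt.day_name()

month, day = 'february', 'monday'   # lowercase, as collected from the user

# Comparing lowercase input with capitalized names matches nothing.
empty = df[df['month'] == month]

# Title-casing the input first gives the expected rows.
filtered = df[(df['month'] == month.title()) & (df['day_of_week'] == day.title())]
print(len(empty), len(filtered))
```

Printing df.head() right after creating the new columns is a quick way to check how months and days are spelled before writing the filter.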
Before Doing Anything in your Project script- 2

This topic is super important:

Build Gradually and Test Frequently


Interactive Experience - 5 Positions
Inputs:
● Raw input (city - month - day)
● Display raw data
● Restart the program

Script: bikeshare.py

Outputs:
Interactive script that answers questions about the dataset

Points for Input collection:


1. The City input: Would you like to see data for Chicago, New York, or Washington?
2. Month input (If they chose month): Which month - January, February, March, April, May, or June?
3. Day input (If they chose day): Which day - Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, or Sunday?
4. Ask the user whether to display raw data in samples of 5 rows
5. After filtering the dataset, users will see the statistical result of the data, and choose to start again or exit.
Interactive Experience - Script overview

Don’t forget to add this function
The First Two Functions

get_filters()
Input: raw user input
Output:
● City that matches one in the CITY_DATA dictionary
● Month that is a string of the month name or the string “all”
● Day that is a string of the day name or the string “all”

Month: January, February, March, April, May, June, All
Day: Saturday, Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, All

CITY_DATA = { 'chicago': 'chicago.csv',
              'new york city': 'new_york_city.csv',
              'washington': 'washington.csv' }

load_data(city, month, day)
● Read the city’s relevant data
● Create new columns for day_of_week and month
● Filter the dataframe
Output: filtered dataframe df, passed on to the rest of the functions
Interactive Experience

Inputs: Raw input (City - Timeframe - Which month / Which day)
Script: bikeshare.py
Outputs: Interactive script that answers questions about the dataset

Remember:
Any time you ask users for input, there is a chance they may not enter what you expect, so your code should handle unexpected input well without failing. You need to anticipate raw input errors like:
1. Improper upper or lower case (unify the input with the .lower() method)
2. Typos (use while loops to repeat the question)
3. Users misunderstanding the question (use while loops to repeat the question)
Use the tips provided in the sections of the Scripting lesson in this course to make sure your code does not fail with an execution error due to unexpected raw input.
The get_filters() function

First function step by step: click on the link and read the reply.

Inputs: Raw input (City - Timeframe - Which month / Which day)
Outputs: return(city, month, day)

Step 1 - Script Setting Up:

Import the required libraries at the top of the script, as per best practices.

import time
import pandas as pd
import numpy as np

Assign a dictionary that maps the city names to the corresponding file paths in the file system, to access later within the code.

CITY_DATA = { 'chicago': 'chicago.csv',
              'new york city': 'new_york_city.csv',
              'washington': 'washington.csv' }

Or (make things simpler):

CITY_DATA = { 'ch': 'chicago.csv',
              'ny': 'new_york_city.csv',
              'w': 'washington.csv' }
The get_filters() function

Inputs: Raw input (City - Timeframe - Which month / Which day)
Script: bikeshare.py
Outputs: return(city, month, day)

Function Purpose - Docstring:


"""
Asks user to specify a city, month, and day to analyze.
Returns:
(str) city - name of the city to analyze
(str) month - name of the month to filter by, or "all" to apply no month filter
(str) day - name of the day of week to filter by, or "all" to apply no day filter
"""
The get_filter() function (1)

Collection Validation

User Input for City:


● Ask the user: which city to display data about?
● USER INPUT COLLECTION:
■ Keep the city selection easy to minimize the risk of erroneous input (consider replacing long city names with shorter abbreviations like “ch” for Chicago, “ny” for New York City, etc.)
■ If the input is not valid, prompt the user again until the input is accepted.
■ Don’t forget to unify the input format with .lower()
● Pseudo code:
def get_filters():
    print('Hello! Let\'s explore some US bikeshare data!')
    # Get the user input, and don't forget to add .lower() at the end to make it letter-case agnostic.
The get_filter() function (1)

Collection Validation

User Input for City:


● USER INPUT VALIDATION:
○ We now have a city selection that we want to validate against the cities in the CITY_DATA dictionary.
○ While loop to the rescue:
● Pseudo code: that's one approach; keep on for a second approach that collects and validates the input in one step.
while city not in CITY_DATA:
    print('That\'s invalid input.')  # tell the user that the input is not right.
    # Get the user input again and don't forget to add .lower()
    city = input('Would you like to see data for Chicago, New York City, or Washington? ').lower()
The get_filter() function (1)

Collection Validation

User Input for City: (suggestions)


● Simply amend the city names in the original dictionary to avoid potential errors from lengthy names:

CITY_DATA = { 'ch': 'chicago.csv',
              'ny': 'new_york_city.csv',
              'w': 'washington.csv' }
The get_filter() function (1) “Warning: Indentation errors”

Collection Validation

User Input for City: (you can use either this method or the one on the previous slide)
● USER INPUT Collection and VALIDATION in one step:
● Example:
rest_type = ['Italian', 'Asian', 'American']
while True:
    user_sel = input("Please input your preferred cuisine from 'Italian', 'Asian' or 'American': ")
    if user_sel.title() in rest_type:
        print('We are working on your choice!')
        break
    else:
        print('Please select value from the provided list!\n')
The get_filters() function (2)

Collection Validation

User Input for Time filters:

Month: January, February, March, April, May, June, All

● USER INPUT COLLECTION (Month):
○ First, what is the expected value for a month: ('january', 'february', 'march', 'april', 'may', 'june', 'all').
● Make a list to validate against (notice all the elements are lowercase letters):

Months = ['january', 'february', 'march', 'april', 'may', 'june', 'all']

● Do the same steps you have done with the city part
The get_filter() function (2)

Collection Validation

User Input for Time filters:


● USER INPUT COLLECTION (day):
○ First, what is the expected value for a day: ('monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday', 'all').
○ A list can be created for all possible choices to use during the validation step

Follow the same steps as in the city and month selection in the previous slides
The get_filter() function

Outputs:
Now you can set the return statement: return city, month, day

● Don't forget to test your script after you finish writing the get_filters() function.
● Also take EXTRA CARE with the INDENTATION
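One possible way to test get_filters() without typing answers by hand is to replace input() with scripted answers using unittest.mock.patch from the standard library. The get_filters() below is only a compact sketch to have something to test: the prompt wording and the MONTHS and DAYS list names are assumptions, not the project's required text.

```python
from unittest.mock import patch

CITY_DATA = {'chicago': 'chicago.csv',
             'new york city': 'new_york_city.csv',
             'washington': 'washington.csv'}
# Hypothetical names for the validation lists.
MONTHS = ['january', 'february', 'march', 'april', 'may', 'june', 'all']
DAYS = ['monday', 'tuesday', 'wednesday', 'thursday', 'friday',
        'saturday', 'sunday', 'all']


def get_filters():
    """Ask for a city, month, and day until each answer is valid."""
    city = input('City (chicago, new york city, washington): ').lower()
    while city not in CITY_DATA:
        city = input('Invalid city, try again: ').lower()

    month = input('Month (january to june, or all): ').lower()
    while month not in MONTHS:
        month = input('Invalid month, try again: ').lower()

    day = input('Day (monday to sunday, or all): ').lower()
    while day not in DAYS:
        day = input('Invalid day, try again: ').lower()

    # The return sits at function level, not inside any loop.
    return city, month, day


# Scripted answers: a typo first, then valid values in mixed case.
answers = ['Chicgo', 'Chicago', 'MARCH', 'all']
with patch('builtins.input', side_effect=answers):
    result = get_filters()

print(result)
```

Trying several answer lists (all valid, mixed case, repeated typos) covers the cases a real user is likely to produce.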
Common Errors
1. Syntax error
2. Indentation error
3. Key error:
KeyError: 'new york'
4. Name error:
NameError: name 'city' is not defined
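To make the last two errors concrete, here is a small sketch that triggers each one on purpose and catches it. The user_city value and the undefined_city name are invented for the demonstration.

```python
CITY_DATA = {'chicago': 'chicago.csv',
             'new york city': 'new_york_city.csv',
             'washington': 'washington.csv'}

user_city = 'new york'   # what a user might type instead of 'new york city'

# A direct lookup with a key that is not in the dictionary raises KeyError.
try:
    filename = CITY_DATA[user_city]
except KeyError as error:
    filename = None
    print(f'KeyError: {error}')

# Checking membership first avoids the exception altogether.
is_valid = user_city in CITY_DATA
print(is_valid)

# Using a variable before it has been assigned raises NameError.
try:
    print(undefined_city)   # hypothetical name that was never assigned
except NameError as error:
    print(f'NameError: {error}')
```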
Focus on The Process before the Product

Build gradually …..


Test frequently
Data loading & Filtering the dataframe

Inputs: City - Month - Day
Load and filter the data as in Practice Problem 3
Outputs: Dataframe

The load_data(city, month, day) function:

● Refer to Concept 9, "Practice Problem #3", in the classroom.
● This function should return a dataframe df.
● Pay special attention to the filtering step, especially filtering by day.
● A brief explanation of the function: click on the link and read the replies.
Data loading & Filtering the dataframe Common Errors

1. Column name spelling and letter case.

2. Creating a new column for month: if the month column holds numbers from 1 to 6, that affects the filtering approach. You can use dt.month_name() instead to get months as names.
3. Creating a new column day_of_week: when working outside the classroom with a recent pandas version, weekday_name raises an error. Use .dt.day_name() instead. This gives you a column populated with day names with uppercase initials.

Error (1): (Google is your friend)

In this function you will need to extract month and day of week from Start Time to create new columns. The classroom uses df['day_of_week'] = df['Start Time'].dt.weekday_name, but if you work locally, use df['day_of_week'] = df['Start Time'].dt.day_name() instead.

Click on the image:


Data loading & Filtering the dataframe Common Errors

1. Notice that all the choices are in lowercase letters, because the month name you got from the get_filters() function is all lowercase.
2. You can avoid this lengthy step by using dt.month_name() to get months as names, which makes filtering easier:
if month != 'all':
    # filter by month to create the new filtered dataframe
    df = df[df['month'] == month.title()]
Data loading & Filtering the dataframe Common Errors

1. The most common cause of an empty dataframe: the day input collected from the get_filters() function is lowercase, so it should be converted with `.title()` to get uppercase initials, as in the dataframe column.
2. If you use abbreviated day names, you should change the filtering approach; read about .str.startswith()
3. Finally, check the indentation of the return statement: it belongs at the function level, not under the if block.
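For the abbreviation case in item 2, a minimal sketch of prefix matching with .str.startswith() could look like this. The rows and the two-letter abbreviation are invented; note that a one-letter abbreviation such as 't' would match both Tuesday and Thursday.

```python
import pandas as pd

# Invented rows; 'day_of_week' holds full, capitalized day names.
df = pd.DataFrame({'day_of_week': ['Monday', 'Tuesday', 'Thursday', 'Monday']})

day = 'tu'   # lowercase abbreviation collected from the user

# An exact comparison finds nothing because 'Tu' is not a full day name.
exact = df[df['day_of_week'] == day.title()]

# Matching on the prefix keeps the rows whose day name starts with 'Tu'.
by_prefix = df[df['day_of_week'].str.startswith(day.title())]
print(len(exact), len(by_prefix))
```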
Data loading & Filtering the dataframe

Common source of errors in load_data(city, month, day): Empty Dataframe

Build gradually and check the dataframe before filtering to know how months and days display in your data
Drawbacks of not Testing Frequently (Google is your friend)

Common source of errors in the time_stats(df):


● Make sure that you use the exact name of the columns of the dataframes

Load data and time stats


Drawbacks of not Testing Frequently - Empty Dataframe - IndexError: Index out of bounds

Common source of errors in the time_stats(df):


● The dataframe is sliced incorrectly in load_data(). Review your work on get_filters() and load_data().
● The following referenced topic has two links; please open and read them:
https://nfpdiscussions.udacity.com/t/bikeshare-project-2021/134756/4
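The sketch below reproduces the failure on invented rows: a filter that matches nothing leaves an empty dataframe, and indexing the result of mode() then fails. The exact exception depends on the pandas version (IndexError in older releases, KeyError in newer ones), so the example catches both.

```python
import pandas as pd

# Invented rows with capitalized month names, as produced by dt.month_name().
df = pd.DataFrame({'month': ['January', 'February'], 'hour': [8, 17]})

# Filtering with a lowercase name matches nothing and leaves an empty dataframe.
filtered = df[df['month'] == 'january']

# mode() on an empty column returns an empty Series, so [0] fails.
try:
    popular_hour = filtered['hour'].mode()[0]
except (IndexError, KeyError) as error:
    popular_hour = None
    print(f'{type(error).__name__}: {error}')

# Checking .empty right after filtering surfaces the problem much earlier.
if filtered.empty:
    print('No rows left after filtering; check the filter values.')
```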
Most Frequent route station_stats(df)

Common source of errors in the station_stats(df):


● Make sure that you use the exact name of a column of the dataframes
● Calculating the most frequent route: Check this out
● Check the references at the last slide
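One possible way to find the most frequent route is to count each (start, end) pair with groupby and take the largest group. This is a sketch on invented trips, not necessarily the approach in the linked topic; only the two station column names come from the datasets slide.

```python
import pandas as pd

# Invented trips using the project's station column names.
df = pd.DataFrame({
    'Start Station': ['A St', 'A St', 'B Ave', 'A St'],
    'End Station':   ['B Ave', 'B Ave', 'A St', 'C Rd'],
})

# Count each (start, end) pair and take the most frequent one.
route_counts = df.groupby(['Start Station', 'End Station']).size()
start, end = route_counts.idxmax()
trips = route_counts.max()
print(f'Most frequent trip: {start} -> {end} ({trips} trips)')
```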
Washington with no Gender or Birth Year user_stats(df)

Common source of errors in user_stats(df):


● Make sure that you use the exact name of the columns of the dataframes
● A reviewer's comments on it (please have a look)
● Approaches to deal with this issue:
○ Approach 1
○ Approach 2
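As one illustration (not necessarily either of the linked approaches), the statistic can be skipped when the column is missing by checking df.columns first. The helper name gender_counts and the sample rows are invented for this sketch.

```python
import pandas as pd


def gender_counts(df):
    """Return gender counts, or None when the dataframe has no Gender column."""
    if 'Gender' in df.columns:
        return df['Gender'].value_counts()
    return None


# Invented rows: one dataframe with a Gender column, one without (like Washington).
chicago_like = pd.DataFrame({'User Type': ['Subscriber', 'Customer'],
                             'Gender': ['Male', 'Female']})
washington_like = pd.DataFrame({'User Type': ['Subscriber', 'Customer']})

for name, frame in [('chicago-like', chicago_like), ('washington-like', washington_like)]:
    counts = gender_counts(frame)
    if counts is None:
        print(f'{name}: no gender data available')
    else:
        print(f'{name}:\n{counts}')
```

The same membership check works for the Birth Year column.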
Interactive Raw Data display

Inputs: Raw input (Yes/No)
Script: bikeshare.py (a function to be added)
Outputs: Raw data display and ask again.

The display_raw_data(city) function:


Your script also needs to prompt the user whether they would like to see the raw data. If the user answers 'yes,'
then the script should print 5 rows of the data at a time, then ask the user if they would like to see 5 more rows of
the data. The script should continue prompting and printing the next 5 rows at a time until the user chooses 'no,'
they do not want any more raw data to be displayed. (From here you can extract your docstring for the function)
Interactive Raw Data display

The display_raw_data(city) function (Check the references at the last slide):


1. Ask for input:
# You can print a message like
print('\n Raw data is available to check... \n')

# Then follow a similar process of getting user input and acting on it, using the input function
display_raw = input('Would you like to have a look at the raw data? Type yes or no: ').lower()

2. Display 5 rows and Loop to repeat asking if user still needs to view more data (Using the argument chunksize in the call to pd.read_csv)

Helpful resource to read:

Different approaches to display raw data
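A minimal sketch of the chunksize idea follows. To keep it runnable on its own, it reads an invented in-memory CSV and takes scripted answers instead of calling input(); in the real script the function would take the city, open CITY_DATA[city], and ask the user directly. The source, answers, and rows parameters are assumptions made for this example.

```python
import io

import pandas as pd


def display_raw_data(source, answers, rows=5):
    """Print `rows` rows at a time for as long as the next answer is 'yes'."""
    shown = 0
    # chunksize makes read_csv return an iterator of dataframes.
    for chunk in pd.read_csv(source, chunksize=rows):
        if next(answers, 'no').strip().lower() != 'yes':
            break
        print(chunk)
        shown += len(chunk)
    return shown


# Invented 12-row CSV standing in for a city file.
csv_text = 'Trip Duration\n' + '\n'.join(str(n) for n in range(12))

# Scripted answers stand in for input(): yes, yes, then no.
shown = display_raw_data(io.StringIO(csv_text), iter(['yes', 'Yes', 'no']))
print(shown)
```

Reading in chunks avoids loading the whole file just to show a few rows, and the loop ends on its own when the file runs out.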


Final step
The main() function:
def main():
    while True:
        city, month, day = get_filters()
        df = load_data(city, month, day)

        time_stats(df, month, day)
        station_stats(df)
        trip_duration_stats(df)
        user_stats(df)
        display_raw_data(city)

        restart = input('\nWould you like to restart? Enter yes or no.\n')
        if restart.lower() != 'yes':
            break


if __name__ == '__main__':
    main()
Way to Go!
1. Study All The lessons… The 8 lessons … Don’t skip any
2. Solve the problem sets before starting the project
3. Build gradually and test frequently
4. Build gradually and test frequently
5. Build gradually and test frequently
6. Build gradually and test frequently
7. Build gradually and test frequently
8. Build gradually and test frequently
9. Go to the referenced topics in the next slide for a trick in each function
10. Reach out on the question hub.
References For Helpful Discourse Topics (Project 1)
1. First function step by step.
2. Load data and time stats
3. Most frequent station (station_stats)
4. Trip duration time (optional)
5. Testing scenario and Using different formats when testing
6. User_stats Washington error and a reviewer's comments on it (please have a look)
7. Different approaches to display raw data
I know it's a lot of work and mental effort but enjoy building your
pythonista/pythoneer coding Brain!!
© 2020, Udacity. All rights reserved.
