Professional Documents
Culture Documents
to review
- The Datasets
Submission The get_filter function
- Statistics Computed Submission
- The Files
Agenda
to review
- The Datasets
Submission The get_filter function
- Statistics Computed Submission
- The Files
Important Lesson - Conditionals and Loops
1. Defining Functions
2. Print vs. Return in Functions
3. Variable Scope
Numpy lesson:
The only lesson you can skip is the Numpy Lesson as it isn’t directly used in the project yet it’s
important in general. So, if you are out of time, you can skip it and get back to it after submitting
the first project.
Don’t Panic…. You are here to Learn…
Agenda
to review
- The Datasets
Submission The get_filter function
- Statistics Computed Submission
- The Files
Project Overview
Using Python to explore data related to bike share systems for three major cities in the
United States—Chicago, New York City, and Washington. You will write code to:
1. Import the data
2. Answer interesting questions about it by computing descriptive statistics.
3. You will also write a script that takes in raw input to create an interactive experience in the
terminal to present these statistics.
Project Details
Inputs:
1. chicago.csv
2. new_york_city.csv bikeshare.py Outputs :
3. Washington.csv
+ Interactive script displaying
4. Raw input statistics and Data upon request
All four of these files are zipped up in the Bikeshare file in the resource tab
Workspace & Submission
Before You Submit:
to review
- The Datasets
Submission The get_filter function
- Statistics Computed Submission
- The Files
Before Doing Anything in your Project script - 1
Concept 4 - Understand Your data:
● df.head()
● df.columns
● df.describe()
● df.info()
● df['column_name'].value_counts()
● df['column_name'].unique()
Before Doing Anything in your Project script - 1
Learn about your data by solving the practice problems:
1. Practice Problem 1 - Let’s Understand the following lines
# load data file into a dataframe
df = pd.read_csv(filename, parse_dates=['Start Time']
)
# find the most popular hour
popular_hour = df['hour'].mode()[0]
to review
- The Datasets
Submission The get_filter function
- Statistics Computed Submission
- The Files
Interactive Experience - 5 Positions
Inputs:
● Raw input (city - month bikeshare.py Outputs :
- day)
● Display raw data Interactive script that answers
questions about the dataset
● Restart the program
Don’t forget to
add this function
The First Two Functions
get_filters() Raw User input load_data(city, month,
day)
● Read the city’s relevant data
CITY_DATA = { 'chicago': 'chicago.csv',
Filtered Dataframe
'new york city': ● Create new columns for
'new_york_city.csv',
'washington':
'washington.csv' }
day_of_week and month. df
● Filter the dataframe
Output:
Rest of the
Day:
Month: ● Saturday ● City that matches one in the
● January ● Sunday CITY_DATA dictionary functions
● February ● Monday
● March ● Tuesday ● Month that is a string of the
● April ● Wednesday month name or the string “all”
● Thursday
● May
● Friday
● Day that is a string of the day
● June
● All ● All name or the string “all”
Interactive Experience
Inputs: Outputs :
bikeshare.py
Raw input (City - Timeframe
- Which month /Which day) Interactive script that answers
questions about the dataset
Remember:
Any time you ask users for input, there is a chance they may not enter what you expect, so your code should handle
unexpected input well without failing. You need to anticipate raw input errors like:
1. Using improper upper or lower case (Unify the input with the .lower() method)
2. Typos
While loops to repeat the question
3. Users misunderstanding.
Use the tips provided in the sections of the Scripting lesson in this course to make sure your code does not fail with an
execution error due to unexpected raw input.
The get_filter() function
Inputs:
Bikeshare.py Outputs :
Raw input (City - Timeframe
- Which month /Which day) return(city, month, day)
Collection Validation
Collection Validation
Collection Validation
Collection Validation
User Input for City: (you can use either this method or the one on the previous slide)
● USER INPUT Collection and VALIDATION in one step:
● Example:
rest_type = ['Italian' , 'Asian' , 'American']
while True:
user_sel = input("Please input your preferred cuisine from 'Italian', 'Asian' or
'American' :")
if user_sel.title() in rest_type:
print('We are working on your choice!')
break
else:
print('Please select value from the provided list!\n'
)
The get_filter() function (2)
Month:
● January
● February
● March
● April
● May
Collection Validation ● June
User Input for Time filters:
● All
● Make a list to Validate against (Notice all the elements are lower case letters):
● Do the same steps you have done with the city part
The get_filter() function (2)
Collection Validation
Follow the same steps as in the city and month selection in the previous slides
The get_filter() function
Outputs:
Now, you can set the return statement: return city, month, day
● Don't forget testing your script after you are done writing the
get_filters() function.
● Also Take EXTRA CARE of the INDENTATION
Common Errors
1. Syntax error:
2. Indentation error:
3. Key error:
KeyError: ‘new york’
4. Name error
NameError: name “city” is not defined
Focus on The Process before the Product
to review
- The Datasets
Submission The get_filter function
- Statistics Computed Submission
- The Files
Data loading & Filtering the dataframe
Load and filter the
Inputs: data as in practice Outputs :
City - Month - Day problem 3 - Dataframe
Continued …...
● A brief explanation of the function
Click on the link and read the replies
Data loading & Filtering the dataframe Common Errors
In this function you will need to extract month and day of week from Start Time to create new columns as per the
classroom df['day_of_week'] = df['Start Time'].dt.weekday_name . But if you work locally use
df['day_of_week'] = df['Start Time'].dt.day_name()instead.
1. Notice all the choices are in lowercase letters because you got the month name from the get_filters function all
lowercase.
2. You can avoid this lengthy step by using dt.month_name()to get months as names so, you can filter easier
if month != 'all':
# filter by month to create the new filtered dataframe
df = df[df['month'] == month.title()
Data loading & Filtering the dataframe Common Errors
1. The most common cause for getting empty dataframe. The day input collected from the get_filters() function is
lowercase, so, it should be piped with the `.title()` to get it with uppercase initials as in the dataframe column.
2. Using day name as abbreviation, you should change the filtering approach, read about .str.startswith()
3. Finally the indentation of the return clause, it’s not under the if, it under the function scope.
Data loading & Filtering the dataframe
# Then we follow a similar process of getting user input and taking action base on it, using the
input function
display_raw = May you want to have a look on the raw data? Type yes or no
2. Display 5 rows and Loop to repeat asking if user still needs to view more data (Using the argument chunksize in the call to pd.read_csv)