ASSIGNMENT 03
SENTIMENT
ANALYSIS
Due Date:
Friday, November 17, 2023 23:55:00 ET
ACADEMIC DISHONESTY
Assignments will be run through a similarity checking software to check for code that looks very similar to that of other students. Sharing or
copying code in any way is considered plagiarism (Academic dishonesty) and may result in a mark of 0 on the assignment and/or reported to
If you want to store a PDF version of this assignment, press Ctrl+p on Windows or Command+p on Mac and the print window will appear. Then,
select Save As PDF from the Destination dropdown. Then click Save.
Example 2 in Section 3.3: This example was missing the keyword "perfectly" in the result. This has now been corrected.
1 LEARNING OUTCOMES
2 BACKGROUND
3 TASKS
4 FUNCTIONAL SPECIFICATION
5 NON-FUNCTIONAL SPECIFICATION
6 STARTER CODE
1 Learning Outcomes
1 Using functions
4 Text processing
7 Exceptions in Python
2 Background
With the emergence of social media sites such as Facebook, Reddit, Twitter (also known as X), LinkedIn, and WhatsApp, more and more
data is being produced and made accessible online in a textual format. This textual data, such as Tweets or Facebook posts, can be hard
to process but is incredibly important for organizations as it offers a snapshot of the public’s feelings (or sentiment) about a
topic at the current point in time. Having a live view of your customers’ current sentiment about your products or the public’s view of your
Twitter is a social media site that allows users to post “tweets”, short (typically under 280 characters) messages. It is commonly used
by people to “tweet” aspects of their daily lives and current opinions about a variety of topics. This “flow of tweets” has become a way
to study or at least guess at how people feel about various aspects of the world, their own lives, or a specific topic. For example,
analysis of tweets has been used to try to determine how certain geographical regions may be voting or their opinion on a recently
announced product.
This is accomplished by analyzing the content (the words and phrases) of tweets. For example, analysis of keywords or phrases in
tweets can be used to determine how popular or unpopular a movie might be. This is often referred to as sentiment analysis.
In this assignment, you will be performing a sentiment analysis on a dataset of Tweets collected in February 2023 relating to a
business, product, or security. The end goal is to produce a report that summarizes the sentiment of the tweets contained in the
dataset.
3 Tasks
In this assignment, you will write a Python module, called sentiment_analysis.py (this is the name of the file that you should use) and a
main program, main.py , that uses the module to analyze Twitter information. In the module sentiment_analysis.py , you will create a
number of functions (as specified in the Functional Specifications) that will perform simple sentiment analysis on Twitter data.
sentiment_analysis.py should only contain your function definitions and have no code outside of these functions.
The Twitter data contains comments (“tweets”) from individuals related to a given keyword. The objective is to determine the average
sentiment for the dataset, the number of positive/negative/neutral tweets, and the top 5 countries that are most positive about this
1 Read in and process a set of keywords and tweets from a given file.
2 Clean the tweets to remove any punctuation and convert them to all lowercase letters.
3 Process each individual tweet to determine a score, a “sentiment score”, for the individual tweet.
4 Analyze these scores to determine an overall average sentiment, an average sentiment for favorited/liked tweets, an average
sentiment for retweeted tweets, the number of positive/negative/neutral tweets, and find the top 5 countries by average
sentiment.
5 Report this information back to the user by outputting a new file containing the report.
3.1 - Read
STARTER CODE
Before you start coding, please note that starter code is available in Section 6. It is highly recommended that you use this code as
Your program will have to read in two files, keywords.tsv (a tab-separated file) and tweets.csv (a comma-separated file). The exact
names of these files will be specified by the user, but the content will always be in the same format.
3.1.1 - keywords.tsv
This Tab-Separated Values (TSV) file will contain a list of one or more keywords as well as the score each keyword contributes to the overall
sentiment of a tweet. Each line of this file will start with the keyword in all lower case, followed by a single tab character, and then an
An example of this file, named keywords.tsv, can be found here. This file contains the AFINN-111 wordlist of common keywords and
scores used for sentiment analysis. An example is shown below of the first 13 lines:
abandon -2
abandoned -2
abandons -2
abducted -2
abduction -2
abductions -2
abhor -3
abhorred -3
abhorrent -3
abhors -3
abilities 2
ability 2
aboard 1
Each keyword is separated from its corresponding score by a single tab (\t) character. A score of 5 would mean that this is a very
positive/happy word. A score of -5 would mean that this is a very negative/unhappy word.
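As a rough sketch of how lines in this format could be turned into a dictionary, the following splits each line on the tab character. The helper name parse_keyword_lines is hypothetical (it is not one of the required functions), and the sketch assumes every non-empty line is well formed: one keyword, one tab, one integer score.

```python
def parse_keyword_lines(lines):
    """Build a {keyword: score} dictionary from TSV-formatted lines.

    Hypothetical helper for illustration only. Assumes each non-empty
    line is exactly "keyword<TAB>integer score".
    """
    keywords = {}
    for line in lines:
        line = line.strip("\n")
        if line == "":
            continue  # skip blank lines, e.g. a trailing empty line
        word, score = line.split("\t")
        keywords[word] = int(score)  # scores are stored as integers
    return keywords


# A few lines taken from the keywords.tsv excerpt above.
sample = ["abandon\t-2\n", "abhor\t-3\n", "ability\t2\n", "aboard\t1\n"]
print(parse_keyword_lines(sample))
```

The same loop could be run over the lines of an open file, since iterating over a file object also yields one line at a time.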
IMPORTANT!
The AFINN-111 wordlist is just one real life example of a wordlist for sentiment analysis. Your program should work for any
3.1.2 - tweets.csv
This Comma-Separated Values (CSV) file will contain a list of tweets as well as associated metadata about the tweet and the tweeter,
such as their location (only if known), the number of times the tweet has been favorited/liked or retweeted, the date the tweet was
Each line in the file contains information about only one tweet. Each field on a line is separated (delimited) by a comma. The fields are
Created At, Tweet Text, Username, Retweet Count, Favorite Count, Language, Country, State/Province, City, Latitude, Longitude
An example line from the file adidas.csv that can be found here:
Feb 10 21:00:45 2023,Adidas says Kanye West split could cost company $1.3B as Yeezy shoes go unsold https://t.co/eviVODm3ig,D
Field Name | Key In Dictionary | Data Type | Description
Created At | date | String | The date this tweet was posted to Twitter in the format MMM DD HH:MM:SS YYYY. This can be read in as a string.
Tweet Text | text | String | The text that was tweeted by the user. Note that this text may be unclean and contain odd characters, punctuation, and hyperlinks.
Username | user | String | The username of the user who made the tweet. Always one word with no spaces.
Retweet Count | retweet | Integer | The number of times this tweet has been retweeted. Always a non-negative integer value.
Favorite Count | favorite | Integer | The number of times this tweet has been favorited/liked. Always a non-negative integer value.
Language | lang | String | The language code representing the language this user has set in their profile. In most cases, this will be "en" as only English tweets were selected for inclusion in the dataset,
Country | country | String | If known, the country that the user resides in will be listed here. If it is not known, the string value "NULL" will be given.
State/Province | state | String | If known, the state or province that the user resides in will be listed here. If it is not known, the string value "NULL" will be given.
City | city | String | If known, the city that the user resides in will be listed here. If it is not known, the string value "NULL" will be given.
Latitude | lat | Float/String | If possible, an estimate of the user’s current latitude on the earth will be given here as a floating-point value. If the latitude could not be estimated, this will be the string value "NULL".
Longitude | lon | Float/String | If possible, an estimate of the user’s current longitude on the earth will be given here as a floating-point value. If the longitude could not be estimated, this will be the string value "NULL".
Note that not all of these fields will be used in our analysis, but they must be read in by your program as described in the Functional
Specification.
3.2 - Clean
The text of the tweets in each dataset is not “clean”. That is to say that it contains characters that must be removed before we can
1) all characters except for English letters and spaces should be removed,
Java, Python, C++; endless possibilities await in the world of coding! http://t.co/ASD32S4S
3.3 - Process
The sentiment score for an individual tweet is calculated by comparing each word in the provided wordlist (e.g. keywords.tsv) to the
words contained in the cleaned tweet text. Each time a keyword is encountered, that keyword’s score is added to the sentiment score.
The keywords must be an exact match to count. For example, the keyword “friend” should not match “friendly” and vice versa.
Examples:
If given the following already cleaned tweet and the provided AFINN-111 wordlist the sentiment score would be 12:
beautiful sunrise friendly smilesjoy setbacks frustrated call from best friend lifted spirits surprise gift added excitement m
This score is calculated by adding the scores for the keywords found in the tweet from the keywords.tsv file:
Any tweet with a positive score (>0) would be classified as a positive tweet. Any tweet with a negative score (<0) would be classified as
a negative tweet. Any tweet with a score of zero (0) would be classified as a neutral tweet.
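The classification rule above maps directly onto a three-way comparison. The following is a minimal sketch of the classify function named in Section 4.1; the returned strings "positive" and "negative" come from that section, while "neutral" for a zero score is assumed to follow the same pattern.

```python
def classify(score):
    """Classify a sentiment score by its sign (sketch, not a reference solution)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # assumed label for a score of exactly zero


print(classify(12))   # positive
print(classify(-3))   # negative
print(classify(0))    # neutral
```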
If a keyword is encountered multiple times in a tweet, it should be counted multiple times such as in this example with a sentiment
in her best dreams the day unfolded perfectly her best friend surprised her with the best present imaginable
Keep in mind that the keyword list can be different depending on the keyword file the user provides.
3.4 - Analyze
After the sentiment score has been calculated for each tweet individually, statistics need to be calculated for the dataset as a whole.
3 The total number of positive, negative, and neutral tweets based on the tweet’s sentiment score (tweets with a positive score
are positive, negative score are negative, and neutral if they have a score of zero).
5 The average sentiment score of only the tweets with at least one favorite/like.
7 The average sentiment score of only the tweets with at least one retweet.
8 The average sentiment score for each country listed in the dataset (used to calculate the top 5 countries).
9 The top 5 countries in the dataset based on their average sentiment score.
All floating-point values should be rounded to two decimal places. These statistics will be returned in a dictionary as described in the
Functional Specification.
If there are no tweets with retweets in the dataset, then a string value of "NAN" should be returned for the average sentiment score of
tweets with at least one retweet. Similarly, if there are no tweets with any favorites in the dataset, then a string value of "NAN" should
be returned for the average sentiment score of tweets with at least one favorite/like.
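One way to handle the "NAN" case is to check for an empty list before dividing. The sketch below uses a hypothetical helper, average_or_nan, which is not one of the required functions; it assumes the scores for the relevant tweets (for example, only those with at least one retweet) have already been collected into a list.

```python
def average_or_nan(scores):
    """Return the mean of scores rounded to two decimals, or "NAN" if empty.

    Hypothetical helper for illustration; scores is a list of integers.
    """
    if len(scores) == 0:
        return "NAN"  # nothing to average, so avoid dividing by zero
    return round(sum(scores) / len(scores), 2)


print(average_or_nan([3, -1, 2]))  # 1.33
print(average_or_nan([]))          # NAN
```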
3.5 - Report
After the analysis has been performed, the calculated statistics must be returned to the user in the form of a plain text file (.txt file) with
Where all values shown contained in [ ] should be replaced with a real value as shown in the example below:
Top five countries by average sentiment: United States, United Kingdom, United Arab Emirates, Taiwan, Sweden
The other text contained in the file, such as “Average sentiment of all tweets: ” or “Number of favorited tweets: ”, must be exactly as
shown, including the colon and the space. The items in the report must be in exactly this order.
The list of the top 5 countries should be on one line and each country should be separated by a comma and space as shown above.
They should be ordered by average sentiment (highest to lowest). There must not be an extra comma at the end of the list (commas
should only appear between country names). The "NULL" value should not be included in the list. If there are fewer than 5 countries in
4 Functional Specification
4.1 - sentiment_analysis.py
IMPORTANT!
All of your function names and the order of the parameters they take must be exactly as specified in this part. Naming your
functions differently will result in the autograder being unable to grade your assignment (this will result in a grade penalty).
Your sentiment_analysis.py file must only contain function definitions. You must not call a function, ask for input, or give output
outside of these function definitions. Running the sentiment_analysis.py file should result in no output of any kind as your program
The module sentiment_analysis must contain the functions described in this section and they must be used in some way in your
program to read, clean, process, analyze, or report on the tweets in the given dataset. Each function and its parameters must have the
read_keywords(keyword_file_name)
This function should read the Tab-Separated Values (TSV) keywords file previously described (in Section 3.1.1). keyword_file_name is a
string containing the name of the file. You can safely assume that if the file exists, it will be in the current working directory (the
The function should return a dictionary with a key for each keyword in the file and a corresponding value equal to the score listed for
Example:
wonderful 4
unfair -2
trusted 2
tired -2
the dictionary produced should have the following values and keys:
{
'wonderful': 4,
'unfair': -2,
'trusted': 2,
'tired': -2
}
Exceptions:
If an IOError occurs, such as the file not existing, this function should print the text:
where [keyword_file_name] should be replaced with the value of keyword_file_name and the function should return an empty
dictionary.
clean_tweet_text(tweet_text)
This function should take a string, tweet_text, which contains a single tweet from the dataset and return a copy of the string that only
contains English letters and spaces. All letters should also be made lowercase.
More details and an example are given previously in Section 3.2. Clean.
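A minimal sketch of this cleaning step is shown below, under the assumption that "English letters" means the 26 unaccented letters a-z and A-Z. It compares characters against those ranges explicitly, because str.isalpha() would also accept accented and non-Latin letters.

```python
def clean_tweet_text(tweet_text):
    """Keep only English letters and spaces, then lowercase the result (sketch)."""
    kept = []
    for ch in tweet_text:
        # Explicit ranges: isalpha() would also keep letters such as "é".
        if ("a" <= ch <= "z") or ("A" <= ch <= "Z") or ch == " ":
            kept.append(ch)
    return "".join(kept).lower()


raw = "Java, Python, C++; endless possibilities await in the world of coding! http://t.co/ASD32S4S"
print(clean_tweet_text(raw))
# java python c endless possibilities await in the world of coding httptcoasdss
```

Note that hyperlinks are not removed as a unit: only their letters survive, which matches the cleaned text shown in the read_tweets example.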
read_tweets(tweet_file_name)
This function should read the Comma-Separated Values (CSV) tweet file previously described (in Section 3.1.2). tweet_file_name is a
string containing the name of the file. You can safely assume that if the file exists, it will be in the current working directory (the directory
The function should return a list of dictionaries. There should be one dictionary for each line contained in the tweet_file_name file. The
keys of the dictionary should be the key names given in the table in Section 3.1.2 and the values the corresponding values for that field
in the file.
The function clean_tweet_text should be used to clean the text of the tweets before they are copied into the dictionary.
Example:
If tweet_file_name contains the following two lines (note that word wrapping is used in this document to show each line on multiple
lines but in the file there is only a line break at the end of each line):
2023-02-10 17:20,Did an Air Canada flight spot the Chinese spy balloon over B.C. on Jan. 31? https://t.co/KOzRJFoORh https://t
2023-02-10 17:16,@AdamJPfeffer @AirCanada Your lucky Air Canada got you there. Lol,tekmacrogersco1,0,0,en,Canada,Ontario,NULL,
[
{
'city': 'NULL',
'country': 'NULL',
'date': '2023-02-10 17:20',
'favorite': 12,
'lang': 'en',
'lat': 'NULL',
'lon': 'NULL',
'retweet': 2,
'state': 'NULL',
'text': 'did an air canada flight spot the chinese spy balloon over bc on jan httpstcokozrjfoorh h
'user': 'CTVNews'
},
{
'city': 'NULL',
'country': 'Canada',
'date': '2023-02-10 17:16',
'favorite': 0,
'lang': 'en',
'lat': 50.000678,
'lon': -86.000977,
'retweet': 0,
'state': 'Ontario',
'text': 'adamjpfeffer aircanada your lucky air canada got you there lol',
'user': 'tekmacrogersco1'
}
]
Note that favorite and retweet should have integer values and not strings, and lat and lon should be floating point values unless they
are given as “NULL” in the file. Any field with a “NULL” value given in the file should simply have a string value of 'NULL' in the dictionary.
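The conversion of one line into a dictionary could be sketched as follows. The helper names parse_tweet_line and to_float_or_null are hypothetical, and the sketch assumes every line splits into exactly 11 comma-separated fields (that is, the tweet text itself contains no commas). The sample line is modeled on the second example above; its last two fields are taken from the expected dictionary, since the line is cut off in this document.

```python
def clean_tweet_text(tweet_text):
    """Keep only English letters and spaces, lowercased (same idea as Section 3.2)."""
    kept = [ch for ch in tweet_text
            if ("a" <= ch <= "z") or ("A" <= ch <= "Z") or ch == " "]
    return "".join(kept).lower()


def to_float_or_null(value):
    """Hypothetical helper: keep the string "NULL", otherwise convert to float."""
    return "NULL" if value == "NULL" else float(value)


def parse_tweet_line(line):
    """Hypothetical helper: turn one CSV line into a tweet dictionary.

    Assumes exactly 11 comma-separated fields in the documented order.
    """
    fields = line.strip("\n").split(",")
    date, text, user, retweet, favorite, lang, country, state, city, lat, lon = fields
    return {
        "date": date,
        "text": clean_tweet_text(text),
        "user": user,
        "retweet": int(retweet),
        "favorite": int(favorite),
        "lang": lang,
        "country": country,
        "state": state,
        "city": city,
        "lat": to_float_or_null(lat),
        "lon": to_float_or_null(lon),
    }


line = ("2023-02-10 17:16,@AdamJPfeffer @AirCanada Your lucky Air Canada got you there. Lol,"
        "tekmacrogersco1,0,0,en,Canada,Ontario,NULL,50.000678,-86.000977\n")
print(parse_tweet_line(line))
```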
Exceptions:
If an IOError occurs, such as the file not existing, this function should print the text:
where [tweet_file_name] should be replaced with the value of tweet_file_name and the function should return an empty list.
calc_sentiment(tweet_text, keyword_dict)
This function should calculate the sentiment score for an individual tweet based on the text contained in that tweet as described in
Section 3.3. Process. tweet_text is a string value containing the already cleaned text of an individual tweet. keyword_dict is a keyword
dictionary created by the read_keywords function to be used for calculating the sentiment score.
The function should return an integer value equal to the sentiment score for the given tweet.
calc_sentiment("in her best dreams the day unfolded perfectlyher best friend surprised her with the best pr
Output:
10
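A minimal sketch of the scoring logic follows, assuming the cleaned tweet can be split into words on whitespace. Looking each word up in the dictionary gives exact matching for free, and a keyword that appears several times is added several times. The keyword dictionary used here is the small four-word example from read_keywords, not the AFINN-111 list, and the sample tweet is made up for illustration.

```python
def calc_sentiment(tweet_text, keyword_dict):
    """Sum the scores of every word in tweet_text that is a keyword (sketch)."""
    score = 0
    for word in tweet_text.split():
        if word in keyword_dict:  # exact match only: "tireless" is not "tired"
            score += keyword_dict[word]
    return score


keywords = {"wonderful": 4, "unfair": -2, "trusted": 2, "tired": -2}
tweet = "a wonderful day with a trusted friend but i am tired tired"
print(calc_sentiment(tweet, keywords))  # 4 + 2 - 2 - 2 = 2
```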
classify(score)
This function takes a sentiment score, score, and classifies it as positive, negative, or neutral. If the score is greater than zero, the
function should return the string "positive", if the score is less than zero it should return the string "negative", if it is equal to zero
make_report(tweet_list, keyword_dict)
This function takes a list of tweets, tweet_list, created by the read_tweets function and a keyword dictionary, keyword_dict, created
by the read_keywords function and performs the analysis described in Section 3.4.
The function should return a dictionary that contains the following keys and values:
Key | Data Type | Description
avg_favorite | Float/String | The average sentiment value of all tweets that have been favorited/liked at least once. The string value "NAN" should be output if num_favorite is zero.
avg_retweet | Float/String | The average sentiment value of all tweets that have been retweeted at least once. The string value "NAN" should be output if num_retweet is zero.
avg_sentiment | Float/String | The average sentiment value of all tweets in the tweet list. The string value "NAN" should be output if num_tweets is zero.
num_favorite | Integer | The number of tweets in the tweet list that have been favorited/liked at least once.
num_negative | Integer | The number of tweets in the tweet list that would be classified as negative by the classify function.
num_neutral | Integer | The number of tweets in the tweet list that would be classified as neutral by the classify function.
num_positive | Integer | The number of tweets in the tweet list that would be classified as positive by the classify function.
num_retweet | Integer | The number of tweets in the tweet list that have been retweeted at least once.
num_tweets | Integer | The total number of tweets in the given tweet list.
top_five | String | A string containing the top 5 countries found in the tweet list based on the average sentiment of tweets for that country. They should be ordered by average sentiment (highest to lowest). Each country listed in the string should be separated by a comma followed by a space. Make sure you don't have an extra comma at the end of the country list. Note that the value "NULL" should not appear in this list. If there are less than 5 countries in the dataset, there will be less than 5 countries in this list.
All floating-point values (e.g. the average sentiment scores) should be rounded to two decimal places using Python’s round function.
The order of the items in the dictionary does not matter but the keys must be named exactly as listed above.
If an average value cannot be calculated, for example due to there being no tweets in the dataset that are favorited, the average value
should be the string "NAN". In all other cases it should be the correct floating point value rounded to two decimal places.
Example Output:
The following is an example of a report dictionary that could be produced by this function:
{
'avg_favorite': 0.1,
'avg_retweet': 0.16,
'avg_sentiment': -0.08,
'num_favorite': 258,
'num_negative': 150,
'num_neutral': 250,
'num_positive': 134,
'num_retweet': 74,
'num_tweets': 534,
'top_five': 'United States, United Kingdom, United Arab Emirates, Taiwan, Sweden'
}
Note that the last value for top_five is a string and not a list. This string should list the top five countries in order of average
sentiment.
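The construction of the top_five string could be sketched as below. The helper name top_countries and its input format (a list of (country, score) pairs, one per tweet) are assumptions made for illustration. How ties between countries should be ordered is not stated in this document; this sketch simply keeps the order in which tied countries were first seen, because sorted is stable.

```python
def top_countries(country_scores, limit=5):
    """Return the highest-averaging countries as one comma-separated string.

    Hypothetical helper: country_scores is a list of (country, score) pairs.
    """
    totals = {}
    counts = {}
    for country, score in country_scores:
        if country == "NULL":
            continue  # unknown locations are not countries
        totals[country] = totals.get(country, 0) + score
        counts[country] = counts.get(country, 0) + 1

    averages = {}
    for country in totals:
        averages[country] = totals[country] / counts[country]

    # Sort country names by their average sentiment, highest first.
    ranked = sorted(averages, key=lambda name: averages[name], reverse=True)
    # join() places ", " only between names, so there is no trailing comma.
    return ", ".join(ranked[:limit])


pairs = [("Canada", 2), ("Canada", 4), ("NULL", 10), ("Japan", 5), ("Brazil", -1)]
print(top_countries(pairs))  # Japan, Canada, Brazil
```

With fewer than five countries in the data, the slice ranked[:limit] just returns all of them, so the resulting string is shorter rather than padded.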
Hints:
1 There are several ways to sort your countries by average sentiment depending on how you have them stored. If they are stored
in a dictionary, with the keys being the country names and the values the average sentiment for that country, you can take
advantage of the sorted function. The following are some resources that may help with sorting a dictionary by values:
2 If you have a list of countries (or any string values) and wish to join the values into a string separated by a comma (or other
write_report(report, output_file)
This function creates the report file described in Section 3.5. As input, it takes report, the dictionary created by the make_report
function, and output_file, the name of the file to write the report to. The report should be formatted exactly as described in Section 3.5
If writing to the file was successful, this function should print the text:
This text should not be printed if an exception occurred when opening or writing to the file.
Exceptions:
Should an IOError occur when opening or writing to the output_file, this function should print the text:
4.2 - main.py
IMPORTANT!
All specified functions should be defined in sentiment_analysis.py and not main.py .
The program in main.py should ask the user for the file names of the keyword file and tweet file that data will be read from, as well as
the name of the report file that will be created. It must use the functions defined in the sentiment_analysis.py module to perform the
Additionally, main.py should check that the input from the user is valid and raise an exception in the following cases:
1 If the keywords filename does not end in the .tsv extension an Exception with the text "Must have tsv file
2 If the tweet filename does not end in the .csv extension an Exception with the text "Must have csv file extension!"
should be raised.
3 If the report filename does not end in the .txt extension an Exception with the text "Must have txt file extension!"
should be raised.
4 If either read_keywords or read_tweets returns an empty dictionary or empty list an Exception with the text "Tweet list
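The extension checks in the list above could be sketched with str.endswith. The helper name check_extension and the sample file names are hypothetical. The message is passed in explicitly so that the exact text from the list can be used for each case; only the two messages that appear in full above are shown.

```python
def check_extension(file_name, extension, message):
    """Raise an Exception with the given message if file_name lacks the extension.

    Hypothetical helper for illustration only.
    """
    if not file_name.endswith(extension):
        raise Exception(message)


# A valid name passes silently.
check_extension("tweets.csv", ".csv", "Must have csv file extension!")

# An invalid name raises; it is caught here only to show the message.
try:
    check_extension("report.pdf", ".txt", "Must have txt file extension!")
except Exception as error:
    print(error)  # Must have txt file extension!
```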
User input is shown in red. The report should be written to the file name given by the user (in this case report.txt) and not shown.
Your prompts to the user should contain the same text as shown above. Note that the last line, "Wrote report to report.txt" is
1 Your code must be written for Python 3 and work in Python 3.9.
2 You may not use any modules or third-party libraries not described in this document. Standard built-in functions such as the
String, file, and math functions are fine. You should not have to import anything other than your sentiment_analysis module.
3 You must document your code with brief comments. Each file should contain a comment at the top of the file with your name,
student number, and a brief description of what is contained in that file. At least one comment should also be given for each
function that describes its purpose, parameters, and values returned. You should also include any additional comments to
4 Your program must be efficient and terminate within a reasonable time limit. All gradescope test cases must terminate within
5 Assignments are to be done individually and must be your own original work. You may not show or otherwise share your code
for this assignment with others. Software will be used to detect academic dishonesty (cheating). If you have any questions
about what is or is not academic dishonesty, please consult the document on academic dishonesty and ask any questions to
6 You must follow Python style and coding conventions and good programming techniques, for example:
Follow conventions (either camelCase or snake_case) for naming variables and constants. This must be done consistently
Try to follow the PEP 8 style guide for Python code where possible.
Do not use global variables unless they are constant (never change) and do not have functions access variables outside of
their scope.
Do not use recursion inappropriately or in a way that would eventually cause your program to crash. Your main()
method should only be called once and not from another function (should only be called from the bottom of main.py).
7 All of your code should be contained in the files main.py and sentiment_analysis.py . Only submit these files and no others and
ensure the filenames match exactly. It is your responsibility to ensure you have submitted the correct files.
8 sentiment_analysis.py must only contain function definitions. No code should be outside of a function in this file. Running
sentiment_analysis.py directly should result in no output and should not wait for any input.
9 main.py should not contain any specified functions, only functions in sentiment_analysis.py will be graded by the autograder.
10 All function names, key names, and outputs should follow the specifications given in this document exactly. Not following the
specifications may lead to test cases failing. It is your responsibility to ensure you have followed them correctly.
11 Frequently backup your work remotely (e.g. using OneDrive) in a way that is secure and private. No extension will be given for
lost or corrupted files.
6 Starter Code
The following starter code has been provided for you. You are free to use this code in your solution. You should keep the function
headers (the names and the parameters the same). Keep the functions in the files shown below and do not use global variables.
6.1 - sentiment_analysis.py
"""
Starter code for sentiment_analysis.py
Your function headers must match this file exactly.
You should only have function definitions in this file.
No code should be outside of a function in this file.
"""

def read_keywords(keyword_file_name):
    # Add your code here
    # Should return a dict of keywords.
    pass  # placeholder so the file runs until your code is added

def clean_tweet_text(tweet_text):
    # Add your code here
    # Should return a string with the clean tweet text.
    pass  # placeholder

def classify(score):
    # Add your code here
    # Should return a string.
    pass  # placeholder

def read_tweets(tweet_file_name):
    # Add your code here
    # Should return a list with a dictionary for each tweet.
    pass  # placeholder
6.2 - main.py
"""
Starter code for main.py
This file should take input from the user and call the
functions in sentiment_analysis.py
"""

def main():
    # Add code for main() here.
    # This should get input from the user and call the
    # required functions from sentiment_analysis.py
    pass  # placeholder so the file runs until your code is added

main()
main.py
sentiment_analysis.py
3 Several tests that will automatically run when you upload your files. It is important to review the results of these testcases as
this will give you an idea of how well your program is working. You may resubmit any number of times up until the due date.
4 It is recommended that you create your own test cases to check that the code is working properly for a multitude of different
scenarios (some example datasets have been provided for you at the bottom of this document).
5 Assignments will not be accepted by email or by any other form than a Gradescope submission.
7.2 - Marking
The assignment will be marked as a combination of your auto-graded tests and manual grading of your code logic, comments,
Marks will be deducted for failing to follow any of the specifications in this document (both functional and nonfunctional), not
documenting your code with comments, using poor formatting or style, or naming your files incorrectly.
For this assignment Gradescope will show you the result of all testcases (there are no hidden test cases). As such it is your
responsibility to ensure they all pass. TAs will not manually grade your code.
Submit to Gradescope often and ensure all testcases pass before the due date.
Assume the autograder is correct and marked your testcases accurately until told otherwise.
Marking Scheme
[119 marks] Auto-graded test cases which check your code for correctness and adherence to the specifications given in this
document. For this assignment these MUST pass; TAs will not manually grade your code.
[6 marks] Comments. One comment at the top of each file with your details and a description of the file, one comment describing
[15 marks] Style and Variable Names. Using consistent and clear variable names, avoiding global variables, defining functions
Penalties
Filename Issues: If one or more files are not named correctly. They must be exactly as specified, including capitalization.
Code outside a function in sentiment_analysis.py: If the program has input or output in sentiment_analysis.py outside of a
Functions in main.py: If the functions specified in the assignment such as read_tweets are defined in main.py and not
sentiment_analysis.py. It is fine if functions not specified in the assignment are in main.py, this also does not apply to the main()
function in main.py.
Function or Key Name Incorrect: If the name of a function or a dictionary key does not match the specification exactly, including
capitalization.
Hardcoding: Hardcoding is writing code that is not easily modified or reused. Hardcoded code cannot properly adapt to user input
and only works for set values or cases. Hardcoding your program to only pass the Gradescope test cases, and not work properly
Late assignments will only be accepted up to 4 days late and only if you have enough late coupons remaining (at least one for each day
late). If you submit one day late, you will need to use 1 late coupon. 2 days late, 2 late coupons. 3 days late, 3 late coupons, and so on. If
you have insufficient late coupons remaining or submit more than 4 days late, you will receive a zero grade on this assignment.
It is your responsibility to track your late coupon use. Any values shown on OWL should be considered an estimate and may not be
accurate or up to date.
REMEMBER!
You have 4 coupons (for the entire semester) that will be automatically applied when you submit late.
It is the student's responsibility to ensure the work was submitted and posted in Gradescope.
Please check this page back whenever an announcement is posted regarding this assignment.
ONLY the 2 mentioned files are to be submitted. Otherwise, marks will be deducted if you submit anything more than these
2 files.
Do NOT submit a PDF or screenshot of your code (this will result in a zero grade)!
Example 1:
In this example, there is only one long tweet. Note that words like "adventurous" and "frustration" do not match the keywords
"adventure" and "frustrated" and do not alter the score. This is as intended. Your program should only count exact matches (after
capitalization and punctuation are removed).
Example 2:
In this example, there is only one tweet with a lot of punctuation. Once it is removed the word smile is left and this is the only word
that impacts the score.
Example 3:
In this example, the word happy is repeated many times. Each time a word is repeated it should count towards the overall score.
Example 4:
In this example, there are 20 tweets in the CSV file. Note that as there are only 4 different countries, the report only lists 4 countries in
the top 5.
Example 5:
In this example, there are 40 tweets in the CSV file. In this case some tweets have NULL values for country, state, or city as well as the
lat/long. NULL should not be considered a country for the top 5 list.
Example 6:
In this example, the full AFINN-111 wordlist is used on the tweets from example 5.
Real World Datasets (your code will not be tested on these tweets)
The following are real-life examples of tweets taken from X (Twitter) and word lists used in real-life sentiment analysis. As such they
may be more complex, longer, and could contain inappropriate language. The autograder will test your code with smaller and more
basic tweet datasets, but these datasets below are included if you would like to try your code on real world data.
You may find the following built-in Python functions and methods useful.
String Functions/Methods
File Functions/Methods
Type Conversion