Learning outcome:
Explain and select analytic techniques for business intelligence and big data analysis.
Apply data visualization tools and predictive analytics to summarize and analyze business data.
Important note:
You should note that there might not be a single correct answer to the questions. Your answers to
these questions may be different from each other and could all be equally valid.
This is an individual assignment. Copying some or all of another student’s assignment is plagiarism.
Discussing your assignments with other students and seeking their comments and advice is
acceptable but it is not acceptable for two students to hand in assignments that are substantially the
same. When you collaborate on an individual assignment, it is important that the final product is
your own work.
Task
In this assignment, you need to perform exploratory analysis to investigate the world’s top-ten highest-paid
athletes as listed by Forbes since 1990. The sample dataset “Forbes Richest Athletes (1990-2020).csv”
consists of 301 observations of the following 7 features. The dataset has no records for 2001 because
the reporting period changed from the full calendar year to June-to-June.
(Source: https://www.kaggle.com/datasets/parulpandey/forbes-highest-paid-athletes-19902019).
1. name – Name
2. nationality – Nationality
3. current_rank – Current worldwide ranking
4. previous_year_rank – Worldwide ranking in last year
5. sport – Type of sport
6. year – Year
7. earnings – Earnings (in millions of US dollars)
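A first sanity check on the loaded data might look like the minimal sketch below. It assumes the CSV columns are named exactly as in the list above (the real file may use different headers, in which case rename them first). The helper name describe_gaps and the toy data frame are hypothetical and serve only to make the sketch runnable on its own.

```r
# Column names as listed in the brief (assumed to match the CSV headers).
expected_cols <- c("name", "nationality", "current_rank",
                   "previous_year_rank", "sport", "year", "earnings")

# Hypothetical helper: report the row count, any expected columns that are
# absent, and any years in the given range that have no records.
describe_gaps <- function(df, years = 1990:2020) {
  list(
    n_obs         = nrow(df),
    missing_cols  = setdiff(expected_cols, names(df)),
    missing_years = setdiff(years, unique(df$year))
  )
}

# With the real file one would start from something like:
# athletes <- read.csv("Forbes Richest Athletes (1990-2020).csv")
# describe_gaps(athletes)

# Toy stand-in (made-up rows) so the sketch runs without the file.
toy  <- data.frame(year = c(1990, 1991, 1993))
gaps <- describe_gaps(toy, years = 1990:1993)
print(gaps$missing_years)
```

On the real dataset, the same call is a quick way to confirm the 301 observations and the gap in 2001 before any analysis begins.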
You must use R to perform exploratory analysis on the Forbes richest athletes (1990-2020) dataset. You
must define your own research questions (or hypotheses) and use summary statistics and data visualization
to find the answers to your research questions. For example, you may hypothesize that USA athletes
dominate sports earnings. To collect evidence to verify the hypothesis, you could use a stacked column chart to
present the total earnings of athletes by nationality and year, and then draw your conclusion about the
hypothesis.
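The example hypothesis could be explored along the lines of the sketch below, which uses only base R. The data frame holds made-up values purely for illustration, and the column names are assumed to match the feature list in this brief. It is one possible approach, not a model answer.

```r
# Toy data: made-up values, only to illustrate the chart.
athletes <- data.frame(
  nationality = c("USA", "USA", "Brazil", "USA", "Germany", "Brazil"),
  year        = c(1990, 1990, 1990, 1991, 1991, 1991),
  earnings    = c(10, 5, 4, 12, 6, 3)
)

# Total earnings per nationality (rows) and year (columns).
totals <- xtabs(earnings ~ nationality + year, data = athletes)

# A matrix passed to barplot() is drawn as stacked columns by default.
barplot(totals, legend.text = TRUE,
        xlab = "Year", ylab = "Total earnings (US$ millions)",
        main = "Top-ten athlete earnings by nationality")

# Supporting statistic: share of each year's total earned by USA athletes.
usa_share <- totals["USA", ] / colSums(totals)
print(round(usa_share, 2))
```

Pairing the chart with a summary statistic such as the yearly share makes the conclusion easier to justify in the written interpretation.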
You must pre-process the data and select appropriate visualization methods in the analysis. You may need
to handle the missing data, re-code the variables, and perform data aggregation. You may use any
appropriate approach to handle the missing data and make reasonable assumptions in the analysis, if
necessary. You must justify your methods and assumptions made. You must analyze the statistics and
graphical output in detail and write up your interpretation.
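The three pre-processing steps mentioned above (missing data, re-coding, aggregation) might be sketched in base R as follows. The rows are made up, and the specific problems shown (non-numeric rank entries, inconsistent capitalisation of sport labels) are assumptions for illustration; inspect the real file to see which issues actually occur and justify your own choices.

```r
# Toy data with made-up values and hypothetical data-quality problems.
athletes <- data.frame(
  name               = c("Athlete A", "Athlete B", "Athlete C", "Athlete A"),
  previous_year_rank = c("", "1", "not ranked", "2"),
  sport              = c("Boxing", "basketball", "Basketball", "boxing"),
  year               = c(1990, 1990, 1991, 1991),
  earnings           = c(20, 8, 10, 25),
  stringsAsFactors   = FALSE
)

# Missing data: treat any non-numeric previous_year_rank entry as NA.
athletes$previous_year_rank <-
  suppressWarnings(as.integer(athletes$previous_year_rank))

# Re-coding: harmonise sport labels so spelling variants are counted together.
athletes$sport <- tolower(trimws(athletes$sport))

# Aggregation: total earnings per sport.
by_sport <- aggregate(earnings ~ sport, data = athletes, FUN = sum)
print(by_sport)
```

Treating unknown ranks as NA rather than dropping the rows keeps the earnings figures available for other analyses; whichever choice you make should be stated as an assumption in the report.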
The following two references should be a good start for preparing this assignment:
Submission Details
Your completed work should be uploaded to OLE before the deadline (Monday, 6 March), as follows:
1. Analysis report – “Assignment 1”
2. R program (or R markdown) – “R program”