BIA B452F Assignment 1

Weighting: 30% (Deadline: 6 March 2023, Monday)

Learning outcomes:
• Explain and select analytic techniques for business intelligence and big data analysis.
• Apply data visualization tools and predictive analytics to summarize and analyze business data.

Important notes:
• There might not be a single correct answer to the questions. Different students' answers may
differ from each other and could all be equally valid.
• This is an individual assignment. Copying some or all of another student’s assignment is plagiarism.
• Discussing your assignment with other students and seeking their comments and advice is
acceptable, but it is not acceptable for two students to hand in assignments that are substantially the
same. When you collaborate on an individual assignment, it is important that the final product is
your own work.

Task
In this assignment, you need to perform exploratory analysis to investigate the world’s top-ten highest-paid
athletes as listed by Forbes since 1990. The sample dataset “Forbes Richest Athletes (1990-2020).csv”
consists of 301 observations of the following 7 features. The dataset has no records for 2001 because
the reporting period changed from the full calendar year to June-to-June.
(Source: https://www.kaggle.com/datasets/parulpandey/forbes-highest-paid-athletes-19902019)

1. name – Name
2. nationality – Nationality
3. current_rank – Current worldwide ranking
4. previous_year_rank – Worldwide ranking in the previous year
5. sport – Type of sport
6. year – Year
7. earnings – Earnings (in US$ millions)
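
As a first step, the seven features listed above can be loaded and inspected before any research question is framed. The following is a minimal sketch only: it assumes the CSV headers match the feature names in this brief (the actual file may use different headers), and it uses a few invented rows in place of the real file so that it runs on its own. The helper `missing_years()` is a hypothetical name introduced here for illustration.

```r
# Toy CSV text standing in for the real file. The values are invented
# for illustration only; they are NOT Forbes figures.
csv_text <- "name,nationality,current_rank,previous_year_rank,sport,year,earnings
A,USA,1,2,boxing,2000,50
B,Germany,2,1,auto racing,2000,45
A,USA,1,1,boxing,2002,60"

athletes <- read.csv(text = csv_text, stringsAsFactors = FALSE)
# With the real file this would be something like:
# athletes <- read.csv("Forbes Richest Athletes (1990-2020).csv")

str(athletes)               # column names and types
summary(athletes$earnings)  # min, quartiles, mean, max
colSums(is.na(athletes))    # number of missing values per column

# Hypothetical helper: years in an expected range that have no observations.
missing_years <- function(df, from, to) setdiff(from:to, unique(df$year))
print(missing_years(athletes, 2000, 2002))
```

Applied to the real dataset over 1990 to 2020, a check of this kind should report the 2001 gap described above.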
You must use R to perform exploratory analysis on the Forbes richest athletes (1990-2020) dataset. You
must define your own research questions (or hypotheses) and use summary statistics and data visualization
to answer them. For example, you may hypothesize that USA athletes
dominate sports earnings. To collect evidence for or against the hypothesis, you could use a stacked column chart to
present the total earnings of athletes by nationality and year, and then draw your conclusion about the
hypothesis.
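
The example hypothesis above could be explored along the following lines. This is a sketch in base R under stated assumptions, not a model answer: the data frame holds invented toy values, and the column names are assumed to match the feature list in this brief.

```r
# Toy data with the same column names as the assignment dataset.
# The values are invented for illustration only; they are NOT Forbes figures.
athletes <- data.frame(
  name        = c("A", "B", "C", "D", "E", "F"),
  nationality = c("USA", "USA", "Brazil", "USA", "Brazil", "Germany"),
  year        = c(1990, 1990, 1990, 1991, 1991, 1991),
  earnings    = c(10, 8, 5, 12, 6, 7)
)

# Total earnings per nationality (rows) and year (columns).
totals <- xtabs(earnings ~ nationality + year, data = athletes)

# Stacked column chart: one bar per year, one segment per nationality.
barplot(
  totals,
  legend.text = TRUE,
  xlab = "Year",
  ylab = "Total earnings (US$ millions)",
  main = "Total earnings by nationality and year (toy data)"
)

# Share of each year's total earned by USA athletes.
usa_share <- totals["USA", ] / colSums(totals)
print(round(usa_share, 2))
```

The chart gives the visual impression, while the share table gives a number that can be quoted when accepting or rejecting the hypothesis. A `ggplot2` version with `geom_col()` would serve equally well.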
You must pre-process the data and select appropriate visualization methods in the analysis. You may need
to handle the missing data, re-code the variables, and perform data aggregation. You may use any
appropriate approach to handle the missing data and make reasonable assumptions in the analysis, if
necessary. You must justify your methods and assumptions made. You must analyze the statistics and
graphical output in detail and write up your interpretation.
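
The three pre-processing steps named above (handling missing data, re-coding variables, and aggregating) might look like the sketch below. It rests on assumptions that should be checked against the real file: that `previous_year_rank` may contain blanks or non-numeric text, and that `sport` labels may differ in case or spacing. The rows are invented toy values, and treating non-numeric ranks as missing is only one of several defensible choices.

```r
# Toy rows; invented values, not taken from the Forbes file.
raw <- data.frame(
  name               = c("A", "B", "C", "D"),
  previous_year_rank = c("1", "not ranked", NA, "4"),
  sport              = c("Basketball", "basketball", "Golf ", "golf"),
  year               = c(2019, 2019, 2020, 2020),
  earnings           = c(90, 80, 60, 62),
  stringsAsFactors   = FALSE
)

clean <- raw

# 1. Missing data: keep purely numeric ranks, treat everything else as NA.
#    grepl() returns FALSE for NA input, so existing NAs stay NA.
rank_chr <- clean$previous_year_rank
rank_chr[!grepl("^[0-9]+$", rank_chr)] <- NA
clean$previous_year_rank <- as.integer(rank_chr)

# 2. Re-coding: normalise sport labels (trim whitespace, lower case).
clean$sport <- tolower(trimws(clean$sport))

# 3. Aggregation: total earnings per sport and year.
by_sport <- aggregate(earnings ~ sport + year, data = clean, FUN = sum)
print(by_sport)
```

Whichever rule is chosen for missing ranks, the report should state it and explain why, since the brief asks for methods and assumptions to be justified.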

The following two references should be a good start for preparing this assignment:

• “Who earned the most in Sports in 2020” at
https://www.kaggle.com/code/parulpandey/who-earned-the-most-in-sports-in-2020/notebook
• “Assignment 1 Sample Analysis” on OLE
(Note: The sample analysis only illustrates how to write up an analysis report on using R to perform
the exploratory analysis of credit card usage. The program and analysis are not directly applicable
to the given problem. You are expected to provide more in-depth discussion of the findings in your
analysis.)
Write a report to present and discuss the findings of your exploratory analysis. You are strongly
recommended to use R Markdown to prepare the report. The report must include an overview of the problem,
a descriptive analysis of the data, your hypotheses, R programs/outputs, and analysis.
This individual assignment will be graded based on the following components (for further details please
see rubrics on OLE):
1. Descriptive analysis (20 marks)
2. Research questions and data analysis (60 marks)
3. Organization and writing skills (20 marks)

Submission Details
Your completed work should be uploaded to OLE before the deadline (6 March, Monday), as follows:
1. Analysis report – “Assignment 1”
2. R program (or R Markdown) – “R program”

Marks will be deducted for any non-compliance with the submission requirements.