
MCD2080 Business Statistics EXCEL Exercise 1

Descriptive Statistics Presentation Using Excel

This assignment is marked out of 80.25 marks.


It is designed to test your understanding of Descriptive Statistics (including Graphical Presentations using Excel
and interpretations) as well as Regression Analysis (including generating regression output using Excel and
interpretations).

Due Date: by 5 pm Sunday of Teaching Week 6 (3rd April 2022)

DATA AND IMPORTANCE OF THIS EXERCISE

The exercise covers the topics and practice exercises from the lectures in Weeks 1–4 and the tutorial workshops in
Weeks 1–5.
The aim of this exercise is for you to become familiar with the Excel tools and functions that you will need in future
studies at the university and beyond. It also gives you an opportunity to interpret the descriptive statistics and
regression analysis you learn during the first few weeks of this unit.
This is a very important task in learning basic Excel skills.

Data Background: Insurance fraud


Insurance fraud is usually a major business challenge in the insurance industry. In the car insurance industry, up to
20% of claims involve some form of fraud, which costs insurers a great deal and leads to higher premiums for everybody.
Investigating these potential frauds is very expensive: it involves checking documentation and undertaking costly
physical inspections.
Insurers have to minimise these investigation costs. The insurer is interested in carrying out preliminary descriptive
statistics and basic analysis before doing a thorough analysis.
The insurer has a dataset of 5,000 historical claims contained in the file “MCD2080 Exer T1 22.xlsx”. The
data includes the following variables:
• Fraud Detected: 1 if fraud was detected in the claim, 0 if not.
• Claim Amount: Amount that was claimed after a vehicle accident.
• Education: Education level/status of the customer.
• Marital status: Whether the customer is married, single or divorced.
• Months Since Last Claim: Months since the last claim by the customer.

OUTPUT AND RESULTS


Please answer the following questions and use Microsoft Excel where necessary.

• You are required to use Excel to generate the output.


• It is recommended to create your charts and tables in the “Results (output)” worksheet provided.
− To produce the descriptive (summary) statistics, use the Excel functions you have learnt during the
tutorials (or lectures).
− However, it is also recommended that you understand how to use Excel’s Data Analysis ToolPak to produce
summary statistics.
• Make sure to save and organise your Excel outputs (tables and charts) so that you can answer the questions on
Moodle.
• Keep your work safe until you get final grading and feedback.
• Organise your tables and charts well; poor presentation will result in a loss of marks.
HOW TO ANSWER QUESTIONS AND UPLOAD TABLES AND CHARTS

Use the link in the Week 2 Tutorial section labelled “Excel Assignment” and follow the instructions. There is a
set of questions, and you should answer all before submitting your work.
Below is just a guide on what you are expected to do in answering the questions.

For the FULL INSTRUCTIONS please refer to the QUESTIONS on the next page.

There are 10 questions in this exercise.


o Question 1(a-e): Requires you to provide the correct answer out of the multiple choices provided.

For questions 1 to 10 use the data set provided (MCD2080 Exer T1 22.xlsx)

o Question 2: Requires you to rename and upload (pdf) “Table 2a & Table 2b”. Upload the tables ONLY;
otherwise, you lose marks. Hint: copy the tables to a Word document and then save as pdf.
o Question 3: Requires you to rename and upload (pdf) “Graph 3a & 3b”. Name and upload the graphs ONLY;
otherwise, you lose marks. Hint: copy the graphs to a Word document and then save as pdf.
o Question 4: Requires you to rename and upload (pdf) “Table 4”. Name and upload the table ONLY; otherwise,
you lose marks. Hint: copy this table to a Word document, organise it carefully and then save as pdf.
o Question 5: Requires you to interpret the data distribution listed in the question. Be careful not to delete the
box provided to type your answers.
o Question 6: Requires you to construct pivot tables that you will upload on Moodle question 8 (pdf). Name
and upload Tables 6a), 6b) & 6c) ONLY; otherwise, you lose marks.
Hint: copy the tables to a Word document, organise them and then save as pdf.
o Question 7: Requires you to provide interpretations of some proportions/probabilities from the pivot tables in
question 6. These are short answers where you choose the correct “word(s)”.
o Question 8a): Requires you to rename and upload (pdf) “Table 8”. Name and upload the table ONLY;
otherwise, you lose marks. Hint: copy the table to a Word document and then save as pdf.
o Question 8b-g): Requires you to interpret/explain the results as asked in b – g.
o Question 9a): Requires you to rename and upload (pdf) “Table 9”. Name and upload the table ONLY;
otherwise, you lose marks. Hint: copy the table to a Word document and then save as pdf.
o Question 9b-d): Requires you to use 9a) to interpret/explain the results as asked in b - d.

You can start answering the questions on Moodle BUT DO NOT SUBMIT the exercise UNTIL you have answered
ALL questions to the best of your ability.
You can go to any question AT ANY TIME using the “NEXT” and “PREVIOUS” buttons.
Caution: Once you click SUBMIT on the Exercise (QUIZ), you will not be able to change any answer; your
submission will be final.
To score high marks, you need to read all the instructions and questions carefully.

Questions
1. State the data type for each of the variables below, i.e., categorical (nominal or ordinal), numerical
(continuous or discrete). Explain your reasoning in each case.
a) Education
b) Fraud Detected
c) Marital status
d) Months Since Last Claim
e) Claim Amount
[5 marks]

Use the data set provided (MCD2080 Assignment T1 22.xlsx) to answer the rest of the questions
2. In this question, obtain some descriptive statistics on the amount claimed for the vehicle by the customer,
separately for claims with fraud detected and claims with no fraud detected.
To achieve that, sort or filter the data by ‘Fraud Detected’.
o In the Results (Output) worksheet, in cell A1 type the heading ‘Fraud Not Detected’ and in cell C1, ‘Fraud
Detected’.
o Copy/paste the values for amount claimed with no fraud detected in column A (i.e. under ‘Fraud Not
Detected’) and for amount claimed with fraud detected in column C (i.e. under ‘Fraud Detected’).
o Use Excel’s Pivot Table function to create a percentage frequency (count) distribution table of these two
variables. [Hint: use Count and NOT Sum]
o For the Fraud Not Detected variable, use class intervals (groups) starting at $1,500 and ending at $40,000
with a width of $5,000. For the Fraud Detected variable, use class intervals starting at $1,900 and ending
at $20,000 with a width of $2,000.
o Show percentages correct to 2 decimal places.
o Use the Tabular Format report layout and label your table accurately and informatively.
Name them Table 2a and Table 2b and upload to Moodle as required.
[6 marks]
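If you would like to cross-check the percentages in your pivot tables outside Excel, the grouping logic can be sketched in a few lines of Python. This is only an illustration, not part of the required Excel output: the function name `percentage_frequency` and the claim amounts are made up, and the sketch assumes each class interval includes its lower bound and excludes its upper bound, which may differ slightly from how Excel labels its groups.

```python
def percentage_frequency(values, start, end, width):
    """Return (lower, upper, percent) for each class interval [lower, upper)."""
    n_bins = -(-(end - start) // width)  # ceiling division: number of intervals
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)
        if v < start or idx >= n_bins:
            raise ValueError(f"{v} falls outside the class intervals")
        counts[idx] += 1  # Count, not Sum
    total = len(values)
    return [
        (start + i * width, start + (i + 1) * width, round(100 * c / total, 2))
        for i, c in enumerate(counts)
    ]


# Hypothetical claim amounts ($); NOT taken from the assignment data set.
claims = [1600, 2500, 7000, 7200, 12000, 18000, 18500, 39000]
table = percentage_frequency(claims, start=1500, end=40000, width=5000)
for lower, upper, pct in table:
    print(f"{lower:>6} to under {upper:>6}: {pct:6.2f}%")
```

The percentages across all class intervals should add up to 100%, which is a quick check you can also apply to Table 2a and Table 2b.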
3. Use the table from Question 2 to construct a percentage frequency histogram of amount claimed for all vehicles.
Make sure you Hide/remove all field buttons from the graph and ensure that the histogram is appropriately
labelled.
Name them Graph 3a and Graph 3b and upload to Moodle as required.
(Marks will be deducted for poor presentation of your graph) [8 marks]

4. Obtain the indicated summary measures in the following table (Table 4), for the amount claimed with no fraud
detected and the amount claimed with fraud detected. Report values to 2 decimal places.
Name it Table 4 and upload to Moodle as required.
Table 4: Table of Summary Measures for Vehicle Claim Amounts ($)
Statistics | Fraud not detected in claim | Fraud detected in claim | Excel Function Used for claim with fraud detected (including cell references)
Mean | | |
Median | | |
Standard Deviation | | |
Minimum | | |
Maximum | | |
Range | | |
Count | | |
First Quartile | | |
Third Quartile | | |
Interquartile Range (IQR) | | |
Coefficient of Variation (CV) | | |
[8.75 marks]
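The measures in Table 4 can also be computed outside Excel as a sanity check. The sketch below uses Python's standard `statistics` module on made-up values. It assumes the sample standard deviation (n - 1 in the denominator) and the "inclusive" quartile method, which is expected to agree with Excel's QUARTILE.INC; if the tutorials use a different quartile function, the quartiles may differ slightly. You still need to report the Excel functions and cell references in Table 4.

```python
import statistics

# Hypothetical claim amounts ($); NOT taken from the assignment data set.
amounts = [2000, 4000, 6000, 8000, 10000]

# Quartiles with the "inclusive" method (assumed to match QUARTILE.INC).
q1, q2, q3 = statistics.quantiles(amounts, n=4, method="inclusive")

mean = statistics.mean(amounts)
stdev = statistics.stdev(amounts)  # sample standard deviation

summary = {
    "Mean": mean,
    "Median": statistics.median(amounts),
    "Standard Deviation": stdev,
    "Minimum": min(amounts),
    "Maximum": max(amounts),
    "Range": max(amounts) - min(amounts),
    "Count": len(amounts),
    "First Quartile": q1,
    "Third Quartile": q3,
    "Interquartile Range (IQR)": q3 - q1,
    "Coefficient of Variation (CV)": stdev / mean,
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The coefficient of variation is the standard deviation divided by the mean, so it has no units and can be compared between the two groups of claims.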

5. Refer to your results from Questions 2, 3 & 4 to describe the distribution of the fraudulent claim amounts.
Specifically, discuss:
a) Central location,
b) Variation, and
c) Shape.
[12.5 marks]

6. Use Excel’s Pivot Table function to create a frequency distribution table of the variable “Fraud
Detected” (columns) by the variable “Months Since Last Claim” (rows).
To do this, first group the variable “Months Since Last Claim” into 4 groups of 10 months each.
Use the “Tabular Form” report layout.
(Hint: For this question use the original data and make sure your table values are Count and
NOT Sum. Also, label variable categories as defined (see data background), or you will lose marks.)
From this table create and label the following three tables, rounding off percentage values to 2 decimal
places:
o Table 6a: the frequency distribution with values shown as “% of Grand Total”.
o Table 6b: the frequency distribution with values shown as “% of Column Total”.
o Table 6c: the frequency distribution with values shown as “% of Row Total”.
Upload these tables to Moodle as required.
[7.5 marks]
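The three "Show Values As" options in Question 6 differ only in the total each count is divided by. The following Python sketch illustrates this with a handful of made-up records; the helper `group_label` and the data are hypothetical, and the 10-month grouping simply mirrors the instruction above.

```python
from collections import Counter

# Hypothetical (months since last claim, fraud detected) records;
# NOT taken from the assignment data set.
records = [(2, 0), (5, 0), (7, 0), (9, 1), (10, 0),
           (12, 0), (15, 0), (19, 0), (11, 1), (18, 1)]


def group_label(months, width=10):
    """Label the group of `width` months that `months` falls into, e.g. '10-19'."""
    lower = (months // width) * width
    return f"{lower}-{lower + width - 1}"


# Cell counts: rows are month groups, columns are Fraud Detected (0 or 1).
counts = Counter((group_label(m), fraud) for m, fraud in records)

row_totals, col_totals = Counter(), Counter()
for (row, col), c in counts.items():
    row_totals[row] += c
    col_totals[col] += c
grand_total = sum(counts.values())

# Same counts, three different denominators.
pct_of_grand = {k: round(100 * c / grand_total, 2) for k, c in counts.items()}
pct_of_column = {k: round(100 * c / col_totals[k[1]], 2) for k, c in counts.items()}
pct_of_row = {k: round(100 * c / row_totals[k[0]], 2) for k, c in counts.items()}

print(pct_of_grand)
print(pct_of_column)
print(pct_of_row)
```

Read this way, "% of Grand Total" gives joint proportions, "% of Column Total" gives proportions within each Fraud Detected category, and "% of Row Total" gives proportions within each months group.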

7. Refer to Tables 6a - c in Question 6 to answer the following:


a) What proportion of claims made 12 months ago are fraudulent?
b) (i) Overall, what is the probability of randomly selecting a claim that is not fraudulent?
(ii) What is the proportion of claims which are fraudulent and in the second-lowest group of months since
the last claim?
c) If a randomly selected claim is not fraudulent, what is the probability that it is within the highest group of
months since the last claim?
d) If we want to investigate whether the variable “Fraud Detected” is independent of the variable “Months Since
Last Claim”:
(i) the probability of a non-fraudulent claim, given that it belongs to the group of 10-19 months since the
last claim, should be compared with which other probability? What are these two
probabilities?
(ii) based on these probabilities, state whether the variables “Fraud Detected” and “Months Since Last Claim”
are independent, and explain why.
[7 marks]
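As a general reminder, two events A and B are independent when P(A | B) = P(A). The Python sketch below shows how a conditional and a marginal proportion can be read from a table of counts. The counts are made up for illustration and say nothing about the assignment data; with sample data the two values are rarely exactly equal, so you still need to judge and explain how close they are.

```python
# Hypothetical cell counts keyed by (months group, fraud detected);
# NOT taken from the assignment data set.
counts = {("0-9", 0): 3, ("0-9", 1): 1, ("10-19", 0): 4, ("10-19", 1): 2}

grand_total = sum(counts.values())

# Marginal proportion: P(not fraudulent)
not_fraud_total = sum(c for (group, fraud), c in counts.items() if fraud == 0)
p_not_fraud = not_fraud_total / grand_total

# Conditional proportion: P(not fraudulent | group is 10-19)
group_total = sum(c for (group, fraud), c in counts.items() if group == "10-19")
p_not_fraud_given_group = counts[("10-19", 0)] / group_total

difference = abs(p_not_fraud_given_group - p_not_fraud)
print(f"P(not fraudulent)         = {p_not_fraud:.4f}")
print(f"P(not fraudulent | 10-19) = {p_not_fraud_given_group:.4f}")
print(f"Absolute difference       = {difference:.4f}")
```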

For the next questions (8 and 9) use the original data in the “Data-student” worksheet.

8. In this question, use Excel’s Data Analysis ToolPak to:


a) Run a linear regression to estimate the proportion of claims with fraud. Use a 99% confidence level.
Name it Table 8 and upload it to Moodle as required.
Use the regression output produced in part a) to:
b) State and interpret the proportion estimated in part a).
c) In plain language, state and interpret the 99% confidence interval of the proportion.
d) Explain the concept of “repeated sampling” and provide a “mathematical” interpretation of the 99%
confidence interval using the confidence interval in the output produced.
e) Without calculations, compare the precision of the 90% and 99% confidence intervals.
State the reason(s) supporting your answer.
f) Apart from the confidence level, state and explain other factors that affect the precision of this
confidence interval.
[12 marks]
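Because “Fraud Detected” is coded 0/1, its sample mean equals the sample proportion of claims with fraud. The Python sketch below uses that fact to show how a confidence interval for a proportion is built and how its width changes with the confidence level. It is not a replacement for the Excel regression output: the function `proportion_ci` and the data are hypothetical, and the sketch uses a normal (z) approximation, whereas Excel's regression output is based on the t distribution. For large samples the two are very close.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def proportion_ci(indicator, confidence):
    """Return (estimate, lower, upper) for the mean of a 0/1 indicator."""
    n = len(indicator)
    p_hat = mean(indicator)                # sample proportion
    std_error = stdev(indicator) / sqrt(n)  # sample std dev over root n
    # Two-sided critical value from the standard normal (an approximation).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p_hat, p_hat - z * std_error, p_hat + z * std_error


# Hypothetical sample: 20 fraudulent claims out of 100;
# NOT taken from the assignment data set.
fraud_detected = [1] * 20 + [0] * 80

p_hat, low99, high99 = proportion_ci(fraud_detected, 0.99)
_, low90, high90 = proportion_ci(fraud_detected, 0.90)

print(f"Estimate: {p_hat:.4f}")
print(f"99% CI: ({low99:.4f}, {high99:.4f}), width {high99 - low99:.4f}")
print(f"90% CI: ({low90:.4f}, {high90:.4f}), width {high90 - low90:.4f}")
```

Running the sketch with different sample sizes or confidence levels is one way to explore the factors asked about in parts e) and f).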
9. Use Excel’s Data Analysis ToolPak to:
a) Run a linear regression to analyse the relationship between claims with fraud and the Months Since
the Last Claim.
Name it Table 9 and upload it to Moodle as required.
Use the regression output from part a) and answer the following questions.
b) Write down the estimated linear regression equation.
c) State and interpret the estimated y-intercept. How valid is the estimate?
d) State and interpret the estimated slope.
[11.5 marks]
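For reference, the slope and intercept of a simple linear regression can be computed from the standard least-squares formulas: slope = Sxy / Sxx and intercept = mean(y) - slope × mean(x). The Python sketch below applies them to a tiny made-up data set. The numbers are purely illustrative and unrelated to the assignment data; your answers must come from the Excel output in Table 9.

```python
# Hypothetical data; NOT taken from the assignment data set.
months = [0, 10, 20, 30]  # x: Months Since Last Claim
fraud = [1, 1, 0, 0]      # y: Fraud Detected (0/1)

n = len(months)
x_bar = sum(months) / n
y_bar = sum(fraud) / n

# Sums of cross-products and squares about the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(months, fraud))
s_xx = sum((x - x_bar) ** 2 for x in months)

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

equation = (
    f"Predicted Fraud Detected = {intercept:.2f} "
    f"+ ({slope:.2f}) x Months Since Last Claim"
)
print(equation)
```

In the Excel regression output, the same two quantities appear in the Coefficients column, in the Intercept row and the row for the X variable.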

Hint on writing equation for Question 9b:
