
LAB 5 Joey Martinez

05/16/16
Math 311
In the following lab we, acting as the insurance company, investigate the possibility of
insurance fraud. A wholesale furniture retailer was the victim of a fire that destroyed a
warehouse several years ago. The warehouse contained the majority of the retailer’s stock, and the
retailer has filed a claim with the insurance company, stating that the fire was an accident. The typical
procedure for an insurance company receiving such a claim is to appraise the furniture in terms of “lost”
profit; that is, to assess total losses in the form of a Gross Profit Factor (GPF), a percentage calculated
per item by the formula $\text{GPF} = \frac{\text{Profit}}{\text{Sales Price}} \times 100\%$.
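As a small illustration of the per-item formula, the sketch below computes the GPF of a single invoice. The function name and the dollar figures are hypothetical and are not taken from the retailer's invoices.

```python
def gross_profit_factor(profit: float, sales_price: float) -> float:
    """Return the GPF of one item as a percentage of its sales price."""
    if sales_price == 0:
        raise ValueError("sales price must be nonzero")
    return profit / sales_price * 100.0


# Hypothetical invoice: an item sold for $400 that earned $200 in profit.
example_gpf = gross_profit_factor(200.0, 400.0)
print(f"GPF = {example_gpf:.1f}%")

# An item sold at a loss has a negative GPF (hypothetical figures again).
loss_gpf = gross_profit_factor(-50.0, 100.0)
print(f"GPF = {loss_gpf:.1f}%")
```

An item sold at a loss yields a negative GPF under this definition.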

Our main value of interest is the mean GPF for all items in the warehouse. Unfortunately, the
eventual selling prices and profit values for items in the warehouse cannot be known, since they were
destroyed in the fire. So, it follows that the mean GPF is also unknown. A method to estimate the mean
GPF is to collect data on similar, recently sold items. Given that the retailer has had 3,005 sales in the
previous sales year, we asked the retailer to provide invoices for these sales for an estimate. Instead, the
retailer opted to collect a sample of 253 invoices and computed the mean GPF of these to be x̅=50.8%.
Hence, the retailer’s desired compensation from the insurance company is based on the computed average GPF
for these invoices. However, according to experienced claims adjusters, the average GPF for fire claims
rarely exceeds 48%. So, we must investigate whether the retailer deliberately selected a sample
containing invoices with higher profit margins or sales prices. If so, it was possibly done in an attempt to
increase the average GPF, where even a 1% boost in the GPF yields an approximate $16,000 increase in
profit.

As part of our investigation, we have hired a CPA firm to independently estimate the retailer’s
true mean GPF. In its assessment, the CPA firm obtained information on all 3,005 invoices for the sales
made by the retailer before the fire and, in effect, has access to the population of invoices for items sold.
The CPA firm has been gracious enough to send a copy of the data to our insurance company. We devote the
remainder of this report to the analysis of the data given and to testing the claim of the retailer.

A) Given our set of data, we calculate our simple descriptive statistics of the GPF as follows:

Descriptive Statistics: Profit margin

Variable          N  N*    Mean  SE Mean   StDev   Minimum      Q1  Median
Profit margin  3005   0  48.901    0.252  13.829  -202.510  44.380  50.210

Variable           Q3  Maximum
Profit margin  56.060  148.000

We also wish to provide a graphical representation of the distribution of this data, given
below.

[Figure: Histogram of Profit margin with fitted normal curve (Mean 48.90, StDev 13.83, N 3005).
Horizontal axis: Profit margin; vertical axis: Frequency.]

The most immediate feature of this distribution is its apparent symmetry. Although there is a
skew to the left, the majority of our values appear to hover about the population mean of µ=48.901%.
We also note that our population standard deviation is given by σ=13.83%. The population
data is quite dense about the mean GPF, with the middle 50% of the profit margins falling between
44.38% and 56.06%, indicating that the claim of the retailer is not implausible. However, despite the
shape of our histogram, a qq-plot reveals that our distribution is not quite as normal as it appears.
Consider the following graph:
We reemphasize the left skew of the population, and also acknowledge the oscillatory behavior
of the data about the normality line in our qq-plot. This indicates that, although the histogram of the
distribution appears normal, the population is not completely normal; in fact, it contains an entire group
of outliers, as can be recognized in the qq-plot above and determined by the 1.5×IQR outlier test.
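The 1.5×IQR rule can be checked directly from the quartiles reported in part A. The sketch below uses only those summary values (the raw invoice data is not reproduced here), so it can confirm that the minimum and maximum fall outside the fences but cannot count the outliers.

```python
# Quartiles and extremes as reported in the descriptive statistics of part A.
q1, q3 = 44.380, 56.060
minimum, maximum = -202.510, 148.000

# Interquartile range and the usual 1.5*IQR fences.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"IQR = {iqr:.2f}")
print(f"fences = ({lower_fence:.2f}, {upper_fence:.2f})")
print("minimum is an outlier:", minimum < lower_fence)
print("maximum is an outlier:", maximum > upper_fence)
```

Any profit margin below the lower fence or above the upper fence would be flagged by this test.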

Next, let’s suppose that we were unable to access the raw data provided above. We now test
the claims of the retailer in the following sections. From now on we consider the sample of size n=253
drawn by the retailer, with a mean GPF of x̅=50.8%. Let the claim that the mean GPF is 48% be given and
let the population standard deviation be known to be 13%, under a “normal” population distribution.

B) Based on the information above, the probability of a single randomly selected item having a GPF
greater than 50.8% is given by the following.

[Figure: Distribution Plot of Gross Profit Factor (in %), Normal, Mean=48, StDev=13.
Horizontal axis: X = GPF%; vertical axis: Density. Shaded area to the right of 50.8: 0.4147.]

As indicated by the distribution above, we calculate the probability of a randomly selected item
being above the desired value to be P(GPF>50.8)=.4147. This means that there is a 41.47% chance of
selecting an item at random that has a GPF greater than 50.8%; so, the case for our retailer does not
seem too unbelievable at this stage.
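The same tail probability can be reproduced outside Minitab. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`, under the stated assumptions of a normal population with mean 48 and standard deviation 13; the variable names are ours.

```python
from statistics import NormalDist

# Assumed population model from the text: Normal(mean=48, sd=13).
gpf = NormalDist(mu=48, sigma=13)

# Upper-tail probability that a single item's GPF exceeds 50.8%.
p_above = 1 - gpf.cdf(50.8)
print(f"P(GPF > 50.8) = {p_above:.4f}")
```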

C) Suppose we now wish to calculate the probability that the mean GPF of a sample of size n=253
exceeds 50.8%. By the Central Limit Theorem, since n=253>30, the distribution of sample means is
approximately normal, with mean µx̄=µ=48% and standard deviation
σx̄=σ/√n=13/√253≈.82.

Thus we calculate the probability of getting a mean GPF value of greater than 50.8%, under the
given sampling distribution values, as follows:

[Figure: Distribution Plot, Normal, Mean=48, StDev=0.82.
Horizontal axis: X; vertical axis: Density. Shaded area to the right of 50.8: 0.0003193.]

Hence, by the above, the probability of the mean GPF being greater than 50.8% is only
P(Mean GPF>50.8)=.0003193, or approximately .03193%.
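As a cross-check on the sampling-distribution calculation, the sketch below recomputes the standard error and the tail probability with Python's standard library. It assumes the same values as the text (µ=48, σ=13, n=253). Using the unrounded standard error rather than .82 gives a slightly smaller probability, still roughly .0003.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 48, 13, 253

# Standard error of the sample mean under the Central Limit Theorem.
se = sigma / sqrt(n)

# Tail probability with the unrounded standard error.
p_mean_above = 1 - NormalDist(mu=mu, sigma=se).cdf(50.8)

# Tail probability with the standard error rounded to 0.82, as in the plot.
p_mean_above_rounded = 1 - NormalDist(mu=mu, sigma=0.82).cdf(50.8)

print(f"standard error = {se:.4f}")
print(f"P(mean > 50.8), unrounded SE = {p_mean_above:.7f}")
print(f"P(mean > 50.8), SE = 0.82    = {p_mean_above_rounded:.7f}")
```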

D) We now wish to construct a 95% confidence interval about our sample mean GPF of x̅=50.8%.
Consider the following:

One-Sample Z
The assumed standard deviation = 13

  N    Mean  SE Mean            95% CI
253  50.800    0.817  (49.198, 52.402)

The 95% confidence interval above was constructed using Minitab software. In the context of
our problem, this means that we are 95% confident that the true population mean GPF is
contained within the interval (49.198, 52.402); that is, about 95% of intervals constructed this way
under repeated sampling would capture the true mean. These results give some
credence to the claim of the retailer, at least for now.
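The Minitab interval can be reproduced by hand from x̅ ± z·σ/√n. The sketch below is one way to do so with Python's standard library, assuming the known σ=13 used throughout this report; the variable names are ours.

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n, conf = 50.8, 13, 253, 0.95

# Critical value z_{alpha/2} for a two-sided interval (about 1.96 at 95%).
z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)

# Margin of error and interval endpoints.
margin = z_star * sigma / sqrt(n)
ci = (x_bar - margin, x_bar + margin)

print(f"z* = {z_star:.3f}, margin = {margin:.3f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```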

E) At this stage, we wish to begin testing the claims of our retailer and insurance company. Let the
null and alternative hypotheses be given as H0: µ=48% and Ha: µ≠48%. Suppose we
wish to test this claim at the α=.05 significance level. Then our critical values are given by
±z.025=±1.96 and our test statistic is calculated to be

$z = \frac{50.8 - 48}{13/\sqrt{253}} = 3.426$.

Hence, since |z|>z.025, we have enough evidence to reject our null hypothesis that the
true population mean GPF is 48% at the α=.05 significance level; that is, the chance of a sample mean
as large as 50.8% under the null hypothesis is approximately p=.0003, or .03% (about .0006 when
both tails are counted).
However, this is very suspicious, considering that the average GPF for fire claims rarely exceeds
48%.
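The test statistic and its tail probabilities can be verified with a short calculation. The sketch below assumes the same inputs as the test above (x̅=50.8, µ0=48, σ=13, n=253); the helper name `one_sample_z` is ours and is not a Minitab function. It returns both the upper-tail probability and the two-sided p-value that matches Ha: µ≠48%.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(x_bar, mu0, sigma, n):
    """Return (z, upper-tail p, two-sided p) for a Z test with known sigma."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    p_upper = 1 - std_normal.cdf(z)
    p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_upper, p_two_sided


z, p_upper, p_two_sided = one_sample_z(50.8, 48, 13, 253)
print(f"z = {z:.3f}")
print(f"upper-tail p = {p_upper:.4f}, two-sided p = {p_two_sided:.4f}")
print("reject H0 at alpha = .05:", p_two_sided < 0.05)
```

Either p-value is far below α=.05, so the decision to reject H0 does not depend on which one is reported.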
So, we contrast this initial test with one that uses a randomly generated sample of size 253 of our
own. The descriptive statistics of the generated sample are as follows:
Descriptive Statistics: Random Sample
Variable         N  N*    Mean  SE Mean   StDev  Minimum      Q1  Median      Q3
Random Sample  253   0  48.112    0.872  13.871  -47.400  44.145  49.000  55.240

Variable       Maximum
Random Sample   95.140

We still retain the same hypotheses and level of significance as before; however, we calculate
our new results using our generated data as follows:
One-Sample Z: Random Sample
Test of μ = 48 vs ≠ 48
The assumed standard deviation = 13

Variable         N    Mean   StDev  SE Mean            95% CI     Z      P
Random Sample  253  48.112  13.871    0.817  (46.510, 49.714)  0.14  0.891

Now, under these circumstances, we must fail to reject the null hypothesis at the 5% significance
level, due to insufficient evidence. These results place further scrutiny on the claim of the
retailer, since the values obtained from our own randomly generated sample are far more
consistent with a mean GPF of 48%.
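To illustrate this step, the sketch below first recomputes the reported test from its summary statistics (x̅=48.112, σ=13, n=253), which should agree with Minitab's Z=0.14 and P=0.891. It then shows how a simple random sample of 253 invoices might be drawn. The `population` list is synthetic stand-in data generated for illustration only, not the CPA firm's file, and the seed and helper name are our own choices.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def two_sided_z_test(x_bar, mu0, sigma, n):
    """Return (z, two-sided p-value) for a one-sample Z test with known sigma."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Recompute the reported test from the summary statistics above.
z_reported, p_reported = two_sided_z_test(48.112, 48, 13, 253)
print(f"reported sample: z = {z_reported:.2f}, p = {p_reported:.3f}")

# Sketch of drawing a simple random sample of 253 invoices.
# ASSUMPTION: `population` is synthetic stand-in data, not the real invoices.
rng = random.Random(311)  # fixed seed so the sketch is repeatable
population = [rng.gauss(48.9, 13.8) for _ in range(3005)]
sample = rng.sample(population, 253)  # sampling without replacement

z_sim, p_sim = two_sided_z_test(mean(sample), 48, 13, len(sample))
print(f"synthetic sample: mean = {mean(sample):.3f}, z = {z_sim:.2f}, p = {p_sim:.3f}")
```

Sampling without replacement with every invoice equally likely is what distinguishes this sample from one selected by the retailer.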

F) Using the same generated sample from part E, we construct a 95% confidence interval, now
with a sample mean of 48.112%. The desired confidence interval is given by
One-Sample Z: Random Sample
The assumed standard deviation = 13

Variable         N    Mean   StDev  SE Mean            95% CI
Random Sample  253  48.112  13.871    0.817  (46.510, 49.714)

At the (1-α)=.95 confidence level, the sample mean of 50.8%, reported by our retailer, is not
captured by our interval. Our results suggest instead that, with 95% confidence, our true
population mean GPF is captured by the interval (46.510, 49.714), which does contain the
benchmark value of 48%. This is further evidence that places the claim of the retailer into question.
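For reference, the interval reported by Minitab can be worked out by hand from the known-σ formula, using the values already stated (x̅=48.112, σ=13, n=253, z.025=1.96):

```latex
\[
\bar{x} \pm z_{.025}\,\frac{\sigma}{\sqrt{n}}
  = 48.112 \pm 1.96 \cdot \frac{13}{\sqrt{253}}
  \approx 48.112 \pm 1.602
  = (46.510,\ 49.714)
\]
```

Since 50.8 > 49.714, the retailer's reported mean lies above the upper endpoint, while 46.510 < 48 < 49.714.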

G) In conclusion, given our evidence from this report, we must side with the insurance company.
That is, it is more than likely that the retailer has committed insurance fraud or, at the very least,
has unintentionally misreported its mean GPF. Our hypothesis testing revealed that the chance of
observing a value as extreme as what the retailer claims is p=.0003, or p≈0, under the null hypothesis that µ=48%.
This is a red flag that the reported mean GPF given by the retailer is highly unlikely to occur. We
also reinforce this claim by our 95% confidence interval, constructed from our own randomly
collected sample data of 253 items. At the 95% confidence level, our interval failed to contain
the claimed sample mean of 50.8%, adding more suspicion about this reported value. Thus,
given this evidence against the retailer, it is more than likely that there has been a misreporting
of the average GPF from items destroyed in the fire.
