You are on page 1of 4

Two Beautiful Plots For Business Client

Background on Patrick’s Midas Automotive Shops


Figure 1: A Midas automotive mechanic shop.
Patrick owns five Midas automotive repair shops (one of them is shown in Figure 1). To keep track of
each
shop, he visits one shop every day. He suspects that each shop is more productive when he is there
as opposed
to when he is not. He hires you to consult for him as he wants to make a data-driven decision about
his visit
schedule to each of the five shops.
Patrick gives you a file which a previously hired data analyst (note: he fired that data analyst for not
making
the insights easy enough to understand.) has cleaned up and put into csv format. It contains 10 weeks
worth
of data:
library(tidyverse)
library(causact)
library(greta)
carsDF = readr::read_csv("carsFixed.csv")
carsDF %>%
group_by(shopID) %>%
summarize(numberOfObservations = n(),
numberOfBossVisits = sum(boss))
## # A tibble: 5 x 3
## shopID numberOfObservations numberOfBossVisits
## <dbl> <int> <dbl>
## 1 1 50 10
## 2 2 50 5
## 3 3 50 15
## 4 4 50 5
## 5 5 50 15
where the data description for carsDF is given below:
• observation: Unique identifier for an observation. In total, there are 250 observations representing
the
previous 10 weeks of business (5 shops × 10 weeks × 5
days
week = 250 observations).
• shopID: A unique shop identifier for each of the five shops.
• boss: A binary variable equal to one when the boss worked at the given store on that day. Note: The
boss did not visit each shop equally.
• carsFixed: This represents the number of cars that the shop fixed on that given workday.
Here are some additional assumptions you should make about Patrick’s business:
1
• For all shops, each additional car that gets fixed yields $50 of profit on average.
• All shops are open 5 days per week for 50 weeks each year - 250 days per year total.
• If you show Patrick a greek letter (e.g. α, β, etc.) in your graphs, he will fire you on the spot and
ignore you completely.
• If you use the words random variable in your analysis, he will fire you on the spot and ignore you
completely.
• If you show Patrick a continuous density function, he will fire you on the spot and ignore you
completely.
• If you use the word Bayesian in your analysis, he will fire you on the spot and ignore you completely.
• Patrick does understand the concept of probability and is a pretty good gambler, so he likes making
decisions when having knowledge of outcome probabilities.
• Patrick really likes simple plots that are insightful.
The Model You Should Use for Interpreting The Data
While you should not show the details of this model to Patrick, this is the generatie DAG you should
use for
your analysis (see Figure @ref(fig:catModel1):
graph = dag_create() %>%
dag_node("Cars Fixed","K",
data = carsDF$carsFixed,
rhs = poisson(lambda)) %>%
dag_node("Exp Cars Fixed - Shop Level","lambda",
rhs = exp(alpha_shop + beta_shop * x),
child = "K") %>%
dag_node("Intercept - Shop Level","alpha_shop",
rhs = normal(alpha,sigma_alpha),
child = "lambda") %>%
dag_node("Boss Effect - Shop Level","beta_shop",
rhs = normal(beta,sigma_beta),
child = "lambda") %>%
dag_node("Intercept - Midas Level","alpha",
rhs = normal(3,2),
child = "alpha_shop") %>%
dag_node("Std Dev - Midas Level","sigma_alpha",
rhs = uniform(0,2),
child = "alpha_shop") %>%
dag_node("Exp Boss Effect - Midas Level","beta",
rhs = normal(0,1),
child = "beta_shop") %>%
dag_node("Std Dev Boss Effect","sigma_beta",
rhs = uniform(0,2),
child = "beta_shop") %>%
dag_node("Boss Present","x",
data = carsDF$boss,
child = "lambda") %>%
dag_plate("Observation","i",
nodeLabels = c("K","lambda","x")) %>%
dag_plate("Shop","j",
nodeLabels = c("beta_shop","alpha_shop"),
data = carsDF$shopID,
addDataNode = TRUE)
2
Observation i [250]
Shop j [5]
Intercept - Midas Level
alpha ~ normal(3,2)
Std Dev - Midas Level
sigma_alpha ~ uniform(0,2)
ExpBossEffect-MidasLevel
beta ~ normal(0,1)
Std Dev Boss Effect
sigma_beta ~ uniform(0,2)
Cars Fixed
K ~ poisson(lambda)
ExpCarsFixed-ShopLevel
lambda = exp(alpha_shop[j] + beta_shop[j] * x)
Boss Present
x
Shop
j
Intercept - Shop Level
alpha_shop ~ normal(alpha,sigma_alpha)
Boss Effect - Shop Level
beta_shop ~ normal(beta,sigma_beta)

Figure 2: Graphical model for Midas shop observations.


HW Assignment and Scoring
Create two beautifully formatted ggplot graphs and submit them in one pdf file with one graph per
page.
Each graph’s purpose should be obvious from the title, subtitle, annotations, color choices, legends,
axes,
labels, etc.
The purpose of the first graph is to convey the historical data that you are relying on to work with
Patrick. This should be an informative graph where a reader (i.e. Patrick) could easily imagine the
following
accompanying narrative:
Graph #1 depicts the last 10 weeks of data for each of the five shops. For some shops, your effect
is seemingly dramatic; they seem to fix more cars when you are there, but for other shops, we
will need to rely on our statistical model to help us understand the effect of your presence at
those shops. Additionally, your past visit schedule did not include visiting shops equally and some
shops appear much busier than others.
The purpose of graph #2 is to convey the probable values (90% credible interval and median value)
for the
expected additional number of cars fixed per day when the boss visits. This will test your ability to
work
with the exp link function and apply it to the posterior distribution. Note: link function and exponential
functions are not terms Patrick would enjoy seeing.
Note: His trustworthy brother Michael manages shop3 and Patrick thinks his trips there are more fun
than
work-related, so he does not want to give them up completely, but is willing to reduce them.
Scoring (12 points Total)
For each graph, you will earn 1 point for each of the following criteria:
• graph’s purpose is clear to the audience
• the graph is beautifully formatted
• the graph has helpful labels
• use of color draws attention to a helpful insight from the graph
• default ggplot color scheme is not used
• text annotation is used to help reader extract some meaning from the graph
Inaccuracies in your representation of the data in graph #1 or the modelling for estimates in graph #2
will
result in reduction of 10% - 40% of the score earned above.
Hints
1. The exponential function is built-in to R. If you need a refresher on this function, see this link:
https://www.khanacademy.org/math/algebra/introduction-to-exponential-functions/exponentialvs-
linear-growth/v/exponential-growth-functions . The exponential function is a common function
to use to map linear predictors (i.e. α + βx) that can range from −∞ to +∞ to a variable that is
between 0 to +∞. Since the rate parameter (λ) of a Poisson distribution is restricted to be between
0 and +∞, the exponential function is often useful to turn a linear predictor into a rate parameter
(i.e. 0 ≤ exp(α + βx) <= ∞). This is similar to the use of the invLogit function to get from linear
predictor to probability. Here you will use the exp function to get from linear predictor to a rate
parameter for the Poisson distribution. In other words, you will use your posterior distribution for
α and β values to calculate the rate parameter. This type of modelling trick is so common, it gets a
special name, and is known as using a generalized linear model.
2. The rate parameter, λ, of a Poisson distribution represents the expected number of cars being fixed
on
the average day. You can use the posterior distribution for αshop and βshop to estimate λ and hence,
estimate the number of cars per day, on average, that will get fixed for shop j (this calculation changes
when the boss is there versus when the boss is not). Remember to use the exponential function when
doing this calculation.

You might also like