
ANALYTICS REPORT

TO: E & J GALLO WINERY

FROM: CHRISTIAN JAMES FARLIN

SUBJECT: CASE 2: WINE DATA FOLLOW-UP ANALYSIS

DATE: 03 DECEMBER 2022

Introduction

I analyzed a dataset of 13,090 observations covering four types of wine (Red, White, Rose,
and Sparkling). Each observation records the wine's rating, number of ratings, year produced,
country of origin, price, region, winery, and type. I used this dataset to determine the
strength and predictability of the relationship between rating and price through hypothesis
tests and visualizations.
I computed correlations for each wine type, fit a multivariable regression on the overall
wine data, and fit a simple regression for each wine type to reach my summary and
recommendation.
Although each wine type has at least a moderate positive correlation between rating and
price, their diverse regression coefficients necessitated individual testing. I determined that the
low-price high-rating relationship evidenced in the white and rose categories is a key indicator
that should prompt E & J Gallo Winery to increase the production and sale of these wine types.
Although the sparkling wines have the strongest correlation between rating and price, the lower-
price high-value appeal of White and Rose wines will reach the largest portion of E & J Gallo
Winery’s consumer base: customers who cannot afford expensive wine.

Data Analysis

Correlations and Scatterplots

The Red, White, and Rose wine types all have moderate positive correlations
between rating and price, with correlation values of approximately 0.45, 0.47, and 0.43
respectively (the Multiple R values in the appendix). However, there is a strong positive
correlation between the rating and price of sparkling wine, as evidenced by its correlation
value of 0.73.
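As a minimal sketch of how a per-type correlation like these could be computed, the snippet below calculates the Pearson correlation between rating and price for each wine type. The sample rows and the `pearson_r` helper are hypothetical illustrations, not the report's dataset or tooling.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical (rating, price) pairs, NOT taken from the wine dataset.
sample = {
    "Red": [(3.6, 12.0), (3.9, 25.0), (4.1, 30.0), (4.4, 95.0)],
    "Sparkling": [(3.7, 15.0), (3.9, 22.0), (4.2, 48.0), (4.5, 80.0)],
}

# One correlation per wine type, mirroring the per-type analysis.
correlations = {
    wine_type: pearson_r([r for r, _ in rows], [p for _, p in rows])
    for wine_type, rows in sample.items()
}

for wine_type, r in correlations.items():
    print(f"{wine_type}: r = {r:.2f}")
```

Values near 0.5 would be read as moderate and values above roughly 0.7 as strong, which is the convention used in this section.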

After building scatterplots to visualize the relationship between rating (x) and price (y) for
each of the four wine types, I observed that price rises with rating in every case.
However, the data takes the shape of an exponential curve (\(y = e^{x}\)). Thus, there is a
relationship between rating and price for each wine type, but a straight line does not appear
to be the best fit for the data.
Overall Regression Model

Regression Equation

\(\widehat{\text{Price}} = 11139.97 + 88.51(\text{Rating}) - 0.0023(\text{NumberOfRatings}) - 5.68(\text{Year})\)

Notes:

• Price is the dependent or response variable, whereas rating, number of ratings, and year
are the independent or explanatory variables.
• All numeric values are rounded to two decimal places unless rounding
renders the numeric value useless, such as the number of ratings coefficient.
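To illustrate how the regression equation produces a prediction, here is a small sketch that plugs one wine into it. The coefficients are the unrounded values from the overall regression output in the appendix; the example wine (rating 4.0, 500 ratings, produced in 2018) and the function name are hypothetical.

```python
# Coefficients copied from the overall multivariable regression output (appendix).
INTERCEPT = 11139.97378
B_RATING = 88.51270089
B_NUMBER_OF_RATINGS = -0.002250351
B_YEAR = -5.679980276

def predict_price(rating, number_of_ratings, year):
    """Predicted price from the linear model: intercept plus each coefficient times its variable."""
    return (
        INTERCEPT
        + B_RATING * rating
        + B_NUMBER_OF_RATINGS * number_of_ratings
        + B_YEAR * year
    )

# Hypothetical wine, chosen only to illustrate the arithmetic.
example_price = predict_price(rating=4.0, number_of_ratings=500, year=2018)
print(f"Predicted price: {example_price:.2f}")
```

For this hypothetical wine the model predicts a price of about 30.70. The large intercept is offset almost entirely by the year term, which is why the rounded coefficients should not be used for predictions.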

R² Interpretation

26% of the variation in price is accounted for by the variation in rating, number of ratings, and
year.
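The 26% figure can be reproduced from the ANOVA table in the appendix. The sketch below recomputes R² as the regression sum of squares divided by the total sum of squares, and the adjusted R² from the number of observations and predictors. The variable names are my own.

```python
# Values copied from the ANOVA table of the overall regression output (appendix).
ss_regression = 17959202.25
ss_total = 68994972.14
n_observations = 13090
n_predictors = 3  # Rating, NumberOfRatings, Year

# Share of the variation in price explained by the model.
r_squared = ss_regression / ss_total

# Adjusted R-squared penalizes the model for the number of predictors used.
adjusted_r_squared = 1 - (1 - r_squared) * (n_observations - 1) / (
    n_observations - n_predictors - 1
)

print(f"R squared: {r_squared:.4f}")
print(f"Adjusted R squared: {adjusted_r_squared:.4f}")
```

Both values round to 0.26, matching the R Square and Adjusted R Square rows of the output.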

Rating Significance Test

• Null Hypothesis H0: Wine rating does not significantly impact the price of the wine.
• Numeric Null Hypothesis H0: \(b_{\text{Rating}} = 0\)
• Alternative Hypothesis HA: Wine rating significantly impacts the price of the wine.
• Numeric Alternative Hypothesis HA: \(b_{\text{Rating}} \neq 0\)

Hypothesis Testing Explanation

Because the wine rating p-value (reported as 0, meaning it is too small for the software to
display) is less than our significance level of 0.05, we reject the Null Hypothesis (H0) in
favor of the Alternative Hypothesis (HA). Thus, we conclude that the explanatory variable
“wine rating” significantly impacts the price of the wine.
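As a rough illustration of where this p-value comes from, the sketch below recomputes the t statistic as the coefficient divided by its standard error, then approximates the two-sided p-value. The normal approximation is my assumption (reasonable with roughly 13,000 degrees of freedom); the regression output itself would have used the t distribution.

```python
from math import erfc, sqrt

def t_statistic(coefficient, standard_error):
    """Test statistic for H0: coefficient = 0."""
    return coefficient / standard_error

def two_sided_p_normal(t_stat):
    """Two-sided p-value under a standard normal approximation (an assumption)."""
    return erfc(abs(t_stat) / sqrt(2))

# Rating coefficient and standard error from the overall regression output (appendix).
rating_t = t_statistic(88.51270089, 1.945987493)
rating_p = two_sided_p_normal(rating_t)

print(f"t statistic: {rating_t:.2f}")
print(f"approximate p-value: {rating_p}")
print("reject H0" if rating_p < 0.05 else "fail to reject H0")
```

With a t statistic above 45, the p-value is smaller than a floating-point number can represent, which is why the output reports it as 0.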

Interpretation of Coefficient (If Significant)

As the wine rating increases by one star, the price of the wine increases by $88.51, on average
and with all other explanatory variables held constant.

Number of Ratings Significance Test

• Null Hypothesis H0: The number of ratings does not significantly impact the price of the
wine.
• Numeric Null Hypothesis H0: \(b_{\text{NumberOfRatings}} = 0\)
• Alternative Hypothesis HA: The number of ratings significantly impacts the price of the
wine.
• Numeric Alternative Hypothesis HA: \(b_{\text{NumberOfRatings}} \neq 0\)
Hypothesis Testing Explanation

Because the number of ratings p-value of 0.0026 is less than our significance level of
0.05, we reject the Null Hypothesis (H0) in favor of the Alternative Hypothesis (HA). Thus,
we conclude that the explanatory variable “number of ratings” significantly impacts the price
of the wine.
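The same conclusion can be cross-checked with the 95% confidence interval, since an interval that excludes zero corresponds to rejecting H0 at the 0.05 level. The sketch below rebuilds the interval from the coefficient and standard error in the appendix. The critical value of 1.96 is a normal approximation that I am assuming here.

```python
# Number-of-ratings coefficient and standard error from the appendix output.
coefficient = -0.002250351
standard_error = 0.000746691

# Assumed critical value: normal approximation for a two-sided 95% interval.
Z_CRITICAL = 1.96

lower_95 = coefficient - Z_CRITICAL * standard_error
upper_95 = coefficient + Z_CRITICAL * standard_error

# If zero lies outside the interval, the coefficient is significant at 0.05.
excludes_zero = lower_95 > 0 or upper_95 < 0

print(f"95% CI: ({lower_95:.6f}, {upper_95:.6f})")
print(f"Interval excludes zero: {excludes_zero}")
```

The rebuilt interval agrees with the Lower 95% and Upper 95% columns of the output to about six decimal places, and both bounds are negative.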

Interpretation of Coefficient (If Significant)

As the number of ratings increases by 1 rating, the price of the wine decreases by $0.0023, on
average and with all other explanatory variables held constant.

Equivalently, as the number of ratings increases by 100 ratings, the price of the wine
decreases by $0.23, on average and with all other explanatory variables held constant.

Note: Both statements are correct. The first is the direct statistical reading of the
coefficient, but the second makes more sense in business terms because the value can be
expressed to two decimal places, as U.S. dollars are recorded.

Year Produced Significance Test

• Null Hypothesis H0: The year that the wine was produced does not significantly impact
the price of the wine.
• Numeric Null Hypothesis H0: \(b_{\text{Year}} = 0\)
• Alternative Hypothesis HA: The year that the wine was produced significantly impacts
the price of the wine.
• Numeric Alternative Hypothesis HA: \(b_{\text{Year}} \neq 0\)

Hypothesis Testing Explanation

Because the year produced p-value of 1.2906 × 10^-216 is less than our significance level
of 0.05, we reject the Null Hypothesis (H0) in favor of the Alternative Hypothesis (HA).
Thus, we conclude that the explanatory variable “year produced” significantly impacts the
price of the wine.

Interpretation of Coefficient (If Significant)

As the year that the wine was produced increases by one year—or is produced closer to
the current date—the price of the wine decreases by $5.68, on average and with all other
explanatory variables held constant.
How Could This Model Be Improved?

This multivariable regression model could be improved by fitting a non-linear regression
curve. Although the linear regression line is useful, it is not the best fit. Because the
rating-price data of each wine type follows an exponential shape, a curved regression,
ideally a parabolic or exponential curve, would likely fit better.
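One common way to fit an exponential curve is to regress the natural log of price on rating and then exponentiate the prediction. The sketch below shows that approach on hypothetical data generated from an exact exponential. It is an illustration of the technique under those assumptions, not a model fit to the wine data.

```python
from math import exp, log

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical ratings and prices that grow exponentially with rating.
ratings = [3.0, 3.5, 4.0, 4.5, 5.0]
prices = [5.0 * exp(0.9 * (r - 3.0)) for r in ratings]

# Fit a straight line to log(price), which is equivalent to an exponential curve in price.
log_intercept, log_slope = fit_line(ratings, [log(p) for p in prices])

def predict_price(rating):
    """Back-transform the log-scale prediction into a price."""
    return exp(log_intercept + log_slope * rating)

print(f"Each additional star multiplies the predicted price by {exp(log_slope):.2f}")
print(f"Predicted price at rating 4.0: {predict_price(4.0):.2f}")
```

Under this form, the coefficient is read as a percentage change in price per star rather than a fixed dollar amount, which suits data where expensive wines spread out much more than inexpensive ones.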

What Other Variables Might We Add?

A key variable omitted from the existing data is the volume of bottles sold per wine
type, ideally broken down further by the price points at which each wine type is sold. Testing
the correlation between price and bottles sold would be extremely beneficial to E & J Gallo
Winery, as it would indicate which wines perform better. E & J Gallo Winery could then
consider increasing the production of wines that sell in higher quantities and reducing the
production of wines that sell in lower quantities.
Other key variables that could enhance the existing analysis would be more time-centric
variables. Which types of wine sell more in a specific month or season? Perhaps E & J Gallo
Winery would use the results of time-centric analysis to determine that the production of
Sparkling Wine should be bolstered to prepare for the influx of Sparkling Wine sales in the
summer, or that Red Wine reaches peak sales in February.
Lastly, a simple variable that would be immensely useful to E & J Gallo Winery would
be the profit margin, which is closely related to a manufacturer’s return on cost. Which wines
earn the most profit relative to operating costs? Through the analysis of this explanatory
variable, E & J Gallo Winery could determine, for example, that more Red Wine should be
placed into store inventories due to its profitability.

Effect of Rating on Price by Wine Type

Each estimate below comes from a simple regression of price on rating alone (see the
appendix), so rating is the only explanatory variable in each model.

Red Wine: As the red wine rating increases by one star, the price of red wine increases by
$124.31, on average.

White Wine: As the white wine rating increases by one star, the price of white wine increases by
$54.40, on average.

Rose Wine: As the rose wine rating increases by one star, the price of rose wine increases by
$25.50, on average.

Sparkling Wine: As the sparkling wine rating increases by one star, the price of sparkling wine
increases by $190.19, on average.
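To make these per-type effects concrete, the sketch below evaluates each simple regression at the same rating. The intercepts and rating coefficients are copied from the simple regression outputs in the appendix; the rating of 4.0 and the function name are hypothetical choices for illustration.

```python
# (intercept, rating coefficient) for each wine type, from the appendix outputs.
SIMPLE_MODELS = {
    "Red": (-444.4222315, 124.306849),
    "White": (-187.088626, 54.39879516),
    "Rose": (-82.82078777, 25.49864359),
    "Sparkling": (-712.6377734, 190.1903445),
}

def predict_price_by_type(wine_type, rating):
    """Predicted price from the wine type's own simple regression line."""
    intercept, slope = SIMPLE_MODELS[wine_type]
    return intercept + slope * rating

# Compare all four wine types at one hypothetical rating.
predictions = {
    wine_type: predict_price_by_type(wine_type, 4.0) for wine_type in SIMPLE_MODELS
}

for wine_type, price in predictions.items():
    print(f"{wine_type}: predicted price at 4.0 stars = {price:.2f}")
```

At 4.0 stars the lines predict roughly 52.81 for Red, 30.51 for White, 19.17 for Rose, and 48.12 for Sparkling. Because every intercept is negative, these straight lines produce negative prices at low ratings, which is another sign that a linear fit is imperfect here.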
Analysis Takeaways and Recommendations

Is There a Relationship Between Correlation Strength and Magnitude of Impact That Wine
Rating Has on Price?

Interestingly, the correlations between rating and price for the Red, White, and Rose wines
are all moderately positive, whereas the correlation for Sparkling Wine is strong and
positive. However, the magnitude of the impact of rating on price differs drastically across
the wine types. For example, Red and White wines have nearly identical correlations (about
0.45 and 0.47), yet one additional star is associated with a $124.31 price increase for Red
wine and only a $54.40 increase for White wine. Therefore, there is not a strong relationship
between correlation strength and rating impact: the correlation describes how tightly prices
track ratings, while the regression coefficient describes how much the price changes per star.
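One way to see why correlation and rating impact can diverge is the simple-regression identity that the slope equals the correlation multiplied by the ratio of the price standard deviation to the rating standard deviation. The sketch below backs out those standard deviations from the appendix outputs. The helper name is mine, and treating Multiple R as the correlation assumes a simple regression with a positive slope, which holds for all four outputs.

```python
from math import sqrt

# (Multiple R, rating coefficient, total sum of squares, observations),
# copied from the simple regression outputs in the appendix.
OUTPUTS = {
    "Red": (0.451251386, 124.306849, 62493124.36, 8658),
    "White": (0.465742847, 54.39879516, 3598318.587, 3759),
    "Rose": (0.433450447, 25.49864359, 101664.1931, 394),
    "Sparkling": (0.731381124, 190.1903445, 1527772.616, 279),
}

def implied_spreads(multiple_r, slope, ss_total, n_obs):
    """Return (price_sd, rating_sd) implied by slope = r * price_sd / rating_sd."""
    price_sd = sqrt(ss_total / (n_obs - 1))
    rating_sd = multiple_r * price_sd / slope
    return price_sd, rating_sd

spreads = {
    wine_type: implied_spreads(*values) for wine_type, values in OUTPUTS.items()
}

for wine_type, (price_sd, rating_sd) in spreads.items():
    print(f"{wine_type}: price SD = {price_sd:.2f}, rating SD = {rating_sd:.2f}")
```

Red and White wines have similar correlations, but red wine prices are far more spread out (a standard deviation of about 85 versus about 31), so the same one-star change corresponds to a much larger price change for Red wine.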

Should E & J Gallo Winery Use This Model to Predict Price?

I advise against using this model to predict wine prices due to its ill-fitting
regression line. The R² value indicates the strength of the model, and a poorly fitting
regression line yields a low R². In this case, the low R² value of 0.26 reveals that another
model should be built to make better predictions about wine prices. I suggest fitting a
curved regression, such as a parabolic or exponential curve. I expect that one of these curved
regressions would yield a stronger model with a higher R² value, which would be of much
greater use to E & J Gallo Winery.

How Can E & J Gallo Winery Use This Information to Provide a Set of Diversely Priced Wines
for Customers?

I noticed that higher ratings bolster the sale of lower-priced wines, regardless of wine
type. This inference is essential, as it reveals a key component of E & J Gallo Winery’s
consumer base: their ability or willingness to buy wine. Since the lower-priced wines appear to
have a highly elastic relationship with ratings (that is, their prices tend to increase more
when ratings increase than the prices of higher-priced wines do), E & J Gallo Winery should
consider producing a marketing campaign highlighting the value of lower-priced wines.
The expenses that the winery would incur on producing this marketing campaign would
quickly be “erased” by the increased revenues generated by their customers—whose ideal wine
types are high-quality, low-price, high-value wines.
If wine is more affordable to consumers, then the number of ratings of each wine would
drastically increase. A reasonably priced wine with hundreds or even thousands of high
consumer ratings would influence more customers to purchase Gallo wines. Highly rated, lower-
priced wines will reap higher profits than higher priced wines with fewer ratings. Higher priced
wines are largely inaccessible to E & J Gallo Winery’s consumer base, who are not able to buy
expensive wines as much or as often as the lower priced wines. If fewer customers can afford to
purchase expensive wines, then there will be fewer ratings posted about these wines, which will
dissuade potential consumers from purchasing those wines.
Thus, I recommend that E & J Gallo Winery use this data and analysis to produce more
lower-priced wines and promote their value through marketing campaigns. If E & J Gallo
Winery needs more resources to expand the production of these wines, then I recommend
diverting resources from the production of higher-priced wines that are bought in lower
quantities.

Discussion of Tableau

My Tableau dashboard shows how rating affects price across the wine types. This
dashboard includes two visualizations: a price vs rating scatterplot and an overlapping bar graph
that juxtaposes the prices and ratings of each wine type. To maximize the usefulness of this
dashboard, please follow these three instructions.
First, once you open the dashboard, please refresh your page. Tableau attempts to stretch
the visualizations to fit across your page, but this is remedied by clicking on the refresh
symbol next to the top of your search bar.
Second, please use the two dropdowns at the top of the dashboard to filter the data to
your preferences. You can filter the data by year produced and the country of origin.
Third, please hover your cursor over each of the bar graph bars of each wine type. If you
hover over the “Red Wine” bar, you will see the “Red Wine” data points on the scatterplot.
A key takeaway that I observed while analyzing my visualizations was the price-ratings
juxtaposition in the bar graph. The ratings for white and rose wines greatly exceed their
prices, whereas the rating of sparkling wine matches its price. I suggest that this is an
indicator that E & J Gallo Winery can use to increase its prices of white and rose wines. I
also suggest that the bar graph bars for the rating and price of sparkling wines reveal that
the price of sparkling wine is appropriate with respect to its rating. Customers will not
purchase expensive wines with low ratings.
Additionally, the red wine values appear to have both the greatest outliers from the price
vs rating scatterplot and datapoints that most closely align with the regression line. This diversity
in red wine prices and ratings—with respect to the general trends of the other three wine types—
necessitates further investigation.

Conclusion
I deduced that the low-priced, highly rated White and Rose wines present an invaluable
opportunity to E & J Gallo Winery: it can increase the price of a popular item while still
retaining a profitable majority of its customer base. I also deduced that E & J Gallo Winery
can increase the quantity of wine available for consumption to meet the high demand. I propose
that E & J Gallo Winery implement a marketing campaign focused on introducing the high-value,
lower-priced wines to new customers. The marketing campaign and the increases in revenue and
quantities sold can be used to justify the price increases of lower-priced wine to consumers
and the revamped production to shareholders.
Companies can rarely increase the prices of their goods or services without losing
substantial numbers of their buyers, so I recommend that E & J Gallo Winery use this
opportunity to moderately increase prices and strengthen the production of white and rose wines
to maximize profits. I also recommend that E & J Gallo Winery request or craft a more complex
multivariable regression model that uses a best-fit exponential curve, as that should yield a
stronger model that can better predict the price of a specific wine (given the rating, the
number of ratings, and the year produced) than the linear regression model. My final
recommendation is to further explore the unique qualities of the red wine data, not only
because of its volume but also because of its behavior in my scatterplot regression
visualization.
Thank you for this opportunity! I hope that this deliverable is helpful. If you have any
questions about my existing analysis, or are interested in future projects and collaborations,
please e-mail me here: christianjfarlin@arizona.edu .

Appendix
Overall Multivariable Regression Output
SUMMARY OUTPUT: All 4 Wine Types

Regression Statistics
Multiple R          0.510193356
R Square            0.260297261
Adjusted R Square   0.260127682
Standard Error      62.4502063
Observations        13090

ANOVA
             df      SS             MS             F              Significance F
Regression   3       17959202.25    5986400.748    1534.963426    0
Residual     13086   51035769.9     3900.028267
Total        13089   68994972.14

                  Coefficients    Standard Error  t Stat          P-value         Lower 95%       Upper 95%
Intercept         11139.97378     360.1814014     30.92878681     8.8508E-203     10433.9659      11845.98165
Rating            88.51270089     1.945987493     45.4847224      0               84.69828268     92.32711909
NumberOfRatings   -0.002250351    0.000746691     -3.013765482    0.002585211     -0.003713974    -0.000786729
Year              -5.679980276    0.177407303     -32.01660921    1.2906E-216     -6.027724363    -5.332236188
Christian Farlin
Red Wine Simple Regression
SUMMARY OUTPUT: Red Wine

Regression Statistics
Multiple R          0.451251386
R Square            0.203627813
Adjusted R Square   0.203535811
Standard Error      75.82554439
Observations        8658

ANOVA
             df      SS             MS             F              Significance F
Regression   1       12725338.25    12725338.25    2213.289689    0
Residual     8656    49767786.11    5749.513183
Total        8657    62493124.36

                  Coefficients    Standard Error  t Stat          P-value         Lower 95%       Upper 95%
Intercept         -444.4222315    10.31104326     -43.10157765    0               -464.6343312    -424.2101318
Rating            124.306849      2.642262295     47.04561286     0               119.1273858     129.4863122

White Wine Simple Regression


SUMMARY OUTPUT: White Wine

Regression Statistics
Multiple R          0.465742847
R Square            0.216916399
Adjusted R Square   0.216707966
Standard Error      27.38629444
Observations        3759

ANOVA
             df      SS             MS             F              Significance F
Regression   1       780534.3113    780534.3113    1040.699756    9.2151E-202
Residual     3757    2817784.276    750.0091233
Total        3758    3598318.587

                  Coefficients    Standard Error  t Stat          P-value         Lower 95%       Upper 95%
Intercept         -187.088626     6.454738385     -28.98469541    6.9807E-167     -199.7437578    -174.4334943
Rating            54.39879516     1.686267829     32.25987842     9.2151E-202     51.09270585     57.70488446
Rose Wine Simple Regression
SUMMARY OUTPUT: Rose Wine

Regression Statistics
Multiple R          0.433450447
R Square            0.18787929
Adjusted R Square   0.185807554
Standard Error      14.51280194
Observations        394

ANOVA
             df      SS             MS             F              Significance F
Regression   1       19100.59642    19100.59642    90.68686561    1.77449E-19
Residual     392     82563.5967     210.6214202
Total        393     101664.1931

                  Coefficients    Standard Error  t Stat          P-value         Lower 95%       Upper 95%
Intercept         -82.82078777    10.04519824     -8.244813671    2.5227E-15      -102.5699901    -63.07158542
Rating            25.49864359     2.677594965     9.522965169     1.77449E-19     20.23440056     30.76288662

Sparkling Wine Simple Regression


SUMMARY OUTPUT: Sparkling Wine

Regression Statistics
Multiple R          0.731381124
R Square            0.534918349
Adjusted R Square   0.533239353
Standard Error      50.64704122
Observations        279

ANOVA
             df      SS             MS             F              Significance F
Regression   1       817233.6047    817233.6047    318.5943416    5.85621E-48
Residual     277     710539.0113    2565.122785
Total        278     1527772.616

                  Coefficients    Standard Error  t Stat          P-value         Lower 95%       Upper 95%
Intercept         -712.6377734    43.42264702     -16.41166125    9.48059E-43     -798.1180786    -627.1574683
Rating            190.1903445     10.65539211     17.84921123     5.85621E-48     169.2145121     211.166177
