Data Science at The Warriors

Case Synopsis
During the last decade, internal and external factors have motivated many organizations to
develop data science capabilities. This case follows the establishment of a data science team at the
Golden State Warriors, a San Francisco-based National Basketball Association (NBA) team. The
main protagonist, Ray Yocke, the Senior Director of Business Analytics and Strategy, faces several
decisions while setting up the team, such as the structure of the data science team, the necessary
skills of its initial employees, and how to prioritize projects.
After describing the initial team creation, the case focuses on the development of the team’s
first large project: a ticket timing model to predict when customers are likely to purchase game
tickets. The marketing team could then use the model’s output to send more personalized emails
to target customers when they are most likely to buy a ticket. The case and supplemental data
analyses first describe the team’s initial exploratory data analysis (EDA). Next, Yocke must pick
which of two possible target (or outcome) variables to model:
(1) the number of days before a game a customer is likely to purchase a ticket or
(2) the customer “type” based on their purchasing behavior, as identified by in-depth
interviews by a third-party marketing firm.
There are multiple pros and cons for each of these two outcome variables; at the heart of this
debate is determining which model will be most helpful in reaching the end user. To do so, the
team needs to look at what data are available and which models have the best predictive capability.
Unfortunately, before the team could launch and test one of its models on customers, the effort
was halted when the NBA temporarily suspended the season due to the COVID-19 pandemic.
After a year-long pause, during which Yocke’s team further refined the model, it had to determine
how to evaluate the model when the Warriors started their next season in a new stadium (the
Chase Center) in San Francisco. At this stage, the central debate is whether the team should run
an experiment or directly pursue a full launch.
Case assignments
Opening
The Warriors are an NBA team based in California. Over the past few years, the team has
started to invest heavily in developing its data science capabilities. The case walked you through
this rapid growth period and described one of their earliest and most important projects: the
development of the ticket timing model.
• How did the Warriors do in 2018? Please feel free to use Exhibit 1 of the case to support your
answer.
Why invest in Data Science
One of the core case considerations is why organizations should invest in data science.
• What was the main factor behind the Warriors’ leadership team’s decision to create a data
science team?
• The Warriors were in the midst of great success, so why did they invest in data science?
To answer the above questions, consider Internal factors and External factors.
Tip for External factors: In San Francisco, who do the Warriors have to compete against to fill
their stadium? If the Warriors’ fans are not at the stadium, what are they doing?
Getting Started
This section aims to discuss the aspects of building a data science team. For this case, the
primary focus is on the initial team design, project selection, and why Yocke may have chosen to
kick off the data science initiatives with a quick-win project.
Now that the Warriors’ leadership has decided to invest in a data science team, the leadership
faces its first decision: how to structure the new DS team. One option is to embed data scientists
into the various business areas. The alternative is to build a core data science team that serves
each of the business areas.
• What are the relative strengths of these organizational designs (Embedded Data Science
Team vs. Centralized Data Science Team)?
• If the leadership team decides to embed the first few data scientists in the marketing unit,
would that be the right choice?
• If the leadership team decided to create a brand-new centralized data science team, which
roles would be included, and which areas would be covered?
Let’s dig into Yocke’s strategy of selecting quick-win projects first. The focus of these
questions is to highlight the balance (or imbalance) between showing quick wins and building the
necessary data science infrastructure.
• What do you think of Yocke’s approach of focusing on quick wins? What would you do
differently?
• If you solely prioritize quick wins, what will happen over time?
Please feel free to use Exhibit 4 of the case to support your answer.
Exploratory Data Analysis
One project that Yocke prioritized was the ticket timing model. As the case discussed, the
project had the potential to be highly impactful and very feasible. To get a better sense of what it
takes to turn this idea into an actual data science product, we are going to dive into the data.
For this case, there is supplementary code in the Excel data file. This code is intended to serve
as a guide, but its use is not mandatory: it is R code, and we would like to keep our hands-on
data analysis in Python.
Some leading questions we would like to consider in the analysis are presented below:
• How many rows or columns are in the data?
• Is there missing data?
• What do the columns represent?
• What do the rows represent?
• What are the summary statistics for the variable that captures the number of days before
tickets are purchased?
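As a starting point for the questions above, here is a minimal pandas sketch. It uses a tiny made-up table instead of the real case file, and the column Customer_ID and the type labels "A" and "B" are placeholders assumed for illustration; only Days_Before_Game and Customer_Type are names taken from this assignment.

```python
import pandas as pd

# Tiny stand-in for the case data (the real file has about 1,000 customers).
# Customer_ID and the labels "A"/"B" are hypothetical placeholders.
df = pd.DataFrame({
    "Customer_ID": [1, 2, 3, 4, 5, 6],
    "Days_Before_Game": [30, 2, 14, None, 1, 45],
    "Customer_Type": ["A", "B", "A", "A", "B", "A"],
})

n_rows, n_cols = df.shape                          # rows and columns
missing_per_column = df.isna().sum()               # missing values per column
column_types = df.dtypes                           # a first hint at what columns hold
days_summary = df["Days_Before_Game"].describe()   # count, mean, std, quartiles

print(f"{n_rows} rows, {n_cols} columns")
print(missing_per_column)
print(column_types)
print(days_summary)
```

With the real data, the stand-in table would be replaced by loading the Excel file into a DataFrame; the remaining lines should carry over once the column names are adjusted.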
Please feel free to explore the data, and do not limit the analysis to the questions above.
They are there only to guide the analysis and to serve as a warm-up exercise. Continuing
with your findings:
• What is the narrative that the data is telling you? Did something surprise you? Why or
why not? What is the most interesting lesson you have learned from the data?
• Do you think the marketing firm that labeled the customers did a good job? What evidence
do you have to support your conclusion?
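One possible way to gather evidence on the labeling question is to compare purchase timing across the labeled types: if the labels are meaningful, the groups should look different. The sketch below assumes made-up data and placeholder labels "A" and "B"; it is one check among many, not a definitive test of label quality.

```python
import pandas as pd

# Made-up data; "A" and "B" are placeholder customer types.
df = pd.DataFrame({
    "Customer_Type": ["A", "A", "A", "B", "B", "B"],
    "Days_Before_Game": [40, 35, 50, 2, 5, 38],
})

# Summary of purchase timing within each labeled type.
by_type = df.groupby("Customer_Type")["Days_Before_Game"].agg(
    ["count", "mean", "median", "min", "max"]
)
print(by_type)

# Overlapping ranges suggest that at least some labels do not match behavior.
ranges_overlap = by_type.loc["B", "max"] > by_type.loc["A", "min"]
print("Ranges overlap:", ranges_overlap)
```

Group summaries like these pair well with a plot of the distribution per type, which makes overlap easier to judge than a table does.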
EDA insights. Recall some of the fundamentals you may have covered in other courses, or
what we have discussed in our sessions:
• Why do we always initiate the development process with EDA?

Prediction
This section focuses on training predictive models. Let’s start with some questions that are
worth considering when developing models.
• In addition to the data in Exhibit 5, what other data would be useful for modeling each of
the following outcome metrics:
a. customer types,
b. how many days before a game a customer is most likely to purchase a ticket?
• How would you improve ticket sales using:
a. the predicted customer types,
b. a prediction of when a customer is most likely to purchase a ticket?
This point is important because, before we start programming, we must have a clear idea of
what we expect the model to return and how we will configure its inputs. At the end of the day,
we configure the model based on our understanding of the business and the data, but it is the
computer that actually fits the model, isn't it?
As developers, we have more and more tools that make programming a little easier. Is anyone
still using Stack Overflow, or have you migrated for good to tools such as ChatGPT? And more
important than that, as managers we should just hire someone to do the modeling for us.
• How would you measure the impact of the ticket timing model?
• Would you recommend the Warriors take a year to run an experiment before deploying
the model? What are the pros and cons of running an experiment?
In the case, Yocke had two different outcome variables: days before a game and customer type.
Predicting the Days_Before_Game variable

Since the Days_Before_Game variable can be modeled as continuous, there are many plausible
models, the simplest being a linear regression model. Here, compare a single-variable (simple)
linear regression against a multiple linear regression.
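The comparison between a simple and a multiple linear regression can be sketched as follows. The data are synthetic and the feature names distance_miles and past_purchases are hypothetical (they are not taken from Exhibit 5); the fit uses ordinary least squares through NumPy, which is one reasonable choice among several.

```python
import numpy as np

# Synthetic stand-in data; feature names are hypothetical.
rng = np.random.default_rng(0)
n = 200
distance_miles = rng.uniform(1, 60, n)
past_purchases = rng.integers(0, 20, n).astype(float)
days_before_game = (
    5 + 0.4 * distance_miles - 0.5 * past_purchases + rng.normal(0, 3, n)
)

def fit_ols(features, y):
    """Least-squares fit with an intercept; returns (coefficients, in-sample R^2)."""
    X = np.column_stack([np.ones(len(y)), features])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    r2 = 1 - residuals.var() / y.var()
    return coef, r2

# One predictor versus two predictors.
coef_simple, r2_simple = fit_ols(distance_miles, days_before_game)
coef_multi, r2_multi = fit_ols(
    np.column_stack([distance_miles, past_purchases]), days_before_game
)
print(f"simple:   R^2 = {r2_simple:.3f}")
print(f"multiple: R^2 = {r2_multi:.3f}")
```

In-sample R^2 never decreases when a predictor is added, so a fairer comparison would score both models on data held out from training.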
Predicting the Customer_Type variable

The Customer_Type variable is treated as a categorical variable in the analysis. While there are
several suitable models for categorical data, a decision tree model is recommended because it
trains quickly and is easy to interpret. You may experiment with other models, but keep it
simple.
Evaluate the model’s ability to predict the outcome for data that wasn’t used to train it. To do
this, five new customers who were not part of the original 1,000 customers are provided as
input to the model.
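A minimal sketch of this workflow with scikit-learn is shown below: fit a shallow decision tree, then predict the type of five customers the model has not seen. Everything about the data is an assumption made for illustration: the features, the labels "Type A" and "Type B", and the rule used to generate them.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the 1,000 training customers; names are hypothetical.
rng = np.random.default_rng(1)
n = 1000
distance_miles = rng.uniform(1, 60, n)
past_purchases = rng.integers(0, 20, n)
X_train = np.column_stack([distance_miles, past_purchases])
# Hypothetical labeling rule so the example has a learnable pattern.
y_train = np.where(distance_miles > 30, "Type A", "Type B")

# A shallow tree keeps the model easy to read and explain.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

# Five new customers that were not used for training.
X_new = np.array([[5, 2], [45, 10], [35, 0], [12, 18], [58, 7]])
predicted_types = tree.predict(X_new)

print(export_text(tree, feature_names=["distance_miles", "past_purchases"]))
print(predicted_types)
```

Five customers are enough to see the model in action but too few to estimate accuracy reliably; holding out part of the original 1,000 customers would give a steadier estimate.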
• Which outcome variable should Yocke’s team focus on? How would you use this model?
What are the strengths and weaknesses of the regression model? What about a decision
tree?
Recall that the selection needs to create an alignment between the business problem and the
modeling approach. In this case, Yocke decided to model the Customer_Type because he believed
it was an easier modeling task and it would be more likely to be adopted by the marketing team.
Based on your results:
• Which model would you select?
Evaluation
This section focuses on how to evaluate the real-world impact of the selected model.
Suppose you are Yocke, the 2023-24 season is just a few weeks away, the team has
moved into its new stadium, and everyone is eager to sell out the first few games.
• Should Yocke push for wide-scale deployment of the model, or should he slow
things down to run an experiment? What factors should influence this decision?
• Is there anything to be worried about? What could go wrong with using a new
model to determine when ticket emails should be sent?
• How easy is it to run this experiment? Are you worried that interference could
bias the results?
• If launching the model would increase sales by X%, are you willing to wait
to see the reward?
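If an experiment is run, one simple way to read the result is to compare purchase rates between a control group (emails on the usual schedule) and a treatment group (emails timed by the model). The sketch below uses a two-proportion z-test with made-up counts; it assumes customers are randomized individually and do not influence one another, which is exactly the interference concern raised above.

```python
from math import erf, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (lift, z, two-sided p-value) for group B versus group A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF written with the error function.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_b - p_a, z, p_value

# Made-up counts: A = control, B = model-timed emails.
lift, z, p_value = two_proportion_z_test(conv_a=400, n_a=5000, conv_b=460, n_b=5000)
print(f"lift = {lift:.3f}, z = {z:.2f}, p = {p_value:.3f}")
```

The same function can be rerun with different group sizes to see how many customers an experiment would need before a lift of a given size becomes detectable.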
Assignment instructions
Submit your completed assignment via MS Teams in the space created for it before the deadline
specified below. Only one member of the working group should make the submission.
Ensure that the submission is labeled with all last names and the title of the case study. Late
submissions will incur a penalty unless prior arrangements have been made.
If you have any questions or need clarification on the assignment, do not hesitate to reach out.
Deadline: The deadline for this assignment is April 9, 2024, at 11:59 p.m. Late submissions will
not be accepted unless prior arrangements have been made.
Document Format: The assignment must be typed and submitted as a PDF file.
Cover Page: No cover page is required. However, the header of the first page must include the
following information:
• Title of the Case Study
• Students’ Names
Structure: Organize your analysis following the sections described in the Case
assignments section.
Font: Book Antiqua, 11-point.
Margins: 1 inch on all sides (suggested).
Length: Your analysis should not exceed 16 pages.
Properly cite any sources used within the text and include a bibliography or works cited page.
