By Clarissa Franklin, Rajat Malhotra, Dani Diehl, and Raksha Pai

Over the course of the past half year, we have been working closely with The Kroger Co. and its asset protection team, using analytics to drive insights from data that lead to better overall understanding and decisions regarding a problem common to the entire retail industry: retail shrinkage, or shrink. The produce department, in particular, is susceptible to loss and represents a disproportionate amount of shrink relative to the entire Kroger enterprise. Addressing shrinkage within the produce department will help Kroger reduce costs and improve profitability. The goal of this capstone project is to use data science and analytics to better understand the relationship between inventory, sales, produce freshness, customer satisfaction, and shrink.

Exploratory Data Analysis
Our first step was to better understand the business and determine the need. Due to the lack of bar codes, the variety of products and vendors, and the perishable nature of the products, produce department data can be extremely problematic. We needed to understand the departmental structure at Kroger, its financial data, how Kroger measures and records produce inventory, and how to calculate shrinkage at a granular level. This information played a crucial role in determining which questions to ask next, which direction to take the analysis, and ultimately which recommendations to provide.

Good practice dictates starting any analytics project with exploratory data analysis (EDA) to better understand your data. As a first step, we analyzed data at the overall store level, focusing on the big picture and looking at trends that affected each location as a whole. Our questions included:
■■ Which stores were performing the best and worst in terms of shrink results?
■■ Do these stores have any clear physical relationship?
■■ How do average shrink results vary depending on store type, produce square footage, number of deliveries per week, and seasonal considerations?
■■ Are there correlations between produce freshness and wastage?

This analysis yielded some interesting takeaways. First, stores that carry more value-based items had higher shrink than more upscale stores. Second, on average, stores that received six to seven deliveries per week had better shrink results than those with fewer deliveries. Perhaps these stores order less per delivery and carry less on the floor, anticipating that another delivery will arrive soon. Third, on a seasonal basis, shrink as a percentage of sales tends to be lowest from February through May. This may be due to a difference in product mix or other seasonal effects.

However, while the preliminary analysis displays general patterns of shrink performance across store characteristics, the purpose of the EDA was simply to uncover major trends and answer relevant questions, not to determine causal relationships.

Digging Deeper
Our next step was to analyze more granular, item-level data on sales, inventory, and cost factors. We uploaded over 300 item-level data files, calculated shrink at the item level based on the given information, and calculated aggregate statistics at the subcommodity level.

As an enterprise, Kroger anticipates which commodities and subcommodities result in the most waste based on the experience and expertise of its employees. Some store managers claim there are items that are restocked solely to be thrown away again. Through data analysis, we can measure the wastefulness of each product more accurately and use quantitative evidence to support employee experience and expertise.

Where is Kroger’s shrink in the produce department coming from? Our first objective was to determine which commodities contributed the most to Kroger’s total shrink.

Dashboard: Which commodities contributed the most to Kroger’s total shrink?
[Figure: Tableau dashboard of masked commodities, Commodity 001 through Commodity 0028]

The resulting dashboard allows Kroger to drill down into each commodity to find the true problem subcommodity. By reviewing the results for each commodity and subcommodity, Kroger can better determine what is happening in stores, investigate results that appear problematic, and make informed decisions based on the available data.

Dashboard: Which products experience a high shrink percentage?
[Figure: Tableau dashboard of masked commodities, Commodity 001 through Commodity 0028]

We then addressed whether certain products were being restocked solely to be thrown away, as well as the shrink cost per commodity. As depicted in the Tableau visual above, we were able to identify certain products that experience a high shrink percentage in the produce department. While clearly problematic, this finding also creates a valuable business opportunity for data-driven decisions. Based on this analysis, Kroger has the opportunity to reconsider best practices and improve results at every location where the product is available, as well as at all new locations where the product will be stocked moving forward.

Similarly, we created a second dashboard showing the percentage of waste within each commodity and subcommodity, allowing Kroger to review and compare the performance of each group of products. For example, after determining that a large portion of a certain product goes to waste, data mining revealed the primary problem by narrowing it down to a few select items with extraordinarily high shrink-to-sales ratios. This may lead to critical business decisions, including adjusting delivery sizes or frequency for these products to better match inventory with demand and reduce shrink. The Tableau dashboard can be a crucial tool allowing store managers to visually observe trends in their produce departments.

The final dashboard, shown below, depicts a map of stores in a Kroger division, color coded by shrink results, with red representing the worst-performing stores and grey representing the best. Using the dashboard, regional managers will be able to visualize how the division is performing and which areas need attention. Store managers can compare performance to that of neighboring locations and immediately identify problem commodities and subcommodities. Employees can click on individual commodities or subcommodities and visually identify strong performance and problem areas. This dashboard allows for easy comparison between stores and much faster identification of problems in the field, providing visibility into performance at a granular level and offering high-value information to Kroger.

Dashboard: Which stores in this division perform the best?
[Figure: map of stores in a Kroger division, color coded by shrink, with commodity and subcommodity selectors (Commodity Category 1 through 18 and 1a through 18a)]

Predictive Modeling
In addition to exploratory data analysis, we developed predictive models for the Kroger data to help answer key questions and provide additional visibility into Kroger operations. While exploratory analysis shows what has happened in the past, a predictive model estimates the average expected result (shrink percentage) for a given set of conditions. Additionally, a well-fitted predictive model quantifies the impact of a change in one factor (such as moving a store from a high-risk neighborhood to a low-risk neighborhood) when all other conditions remain equal.

The first predictive model was a multiple linear regression, which models the relationship between several explanatory variables (including store type, store area, delivery schedule, customer satisfaction scores, and more) and the response variable, produce shrink. The resulting regression describes how the mean response varies with changes in the explanatory variables. The model predicts whether shrink will increase or decrease when a given variable changes and quantifies the expected magnitude of that change. It also shows which variables significantly affect shrink, allowing Kroger to better understand the factors driving shrink rates.

We modeled the shrink rate, expressed in basis points (hundredths of a percentage point), rather than the monetary value of shrink, so conclusions would translate more easily from one store to the next.

Explanatory variables included produce department square footage, store employee turnover rate, produce department area as a percentage of overall store area, number of produce deliveries per week, the store’s risk tier as determined by the asset protection team, store type, inventory, charges, ratio of sales to inventory, net sales, and customer satisfaction score.

The first four explanatory variables were found to have no significant effect on the shrink percentage. The remaining explanatory variables were all found to be significant and are discussed in more detail below. Each effect is expressed “ceteris paribus,” that is, with all other things equal. Because company records are confidential, actual figures are replaced with X, Y, or Z, and the values of X, Y, and Z differ from factor to factor.
■■ Risk Tier. Based on various metrics at the discretion of the asset protection team, there are four risk tier categories: low risk, medium risk, high risk, and max risk. Our model found no significant differences between shrink rates at low- and medium-risk stores. Compared to low- and medium-risk stores, high-risk stores are expected to have X basis points more shrink, and max-risk stores are expected to have Y basis points more shrink.
■■ Store Type. Our analysis included five store types, categorized 1–5, where store type 1 corresponded to the most upscale stores and store type 5 to the least upscale stores. Moving from store type 1 toward type 5, each one-step change in store type corresponds to a reduction in shrink of X basis points. For example, consider a type 1 store with 600 basis points of shrink. If all other explanatory variables (risk tier, inventory, and so forth) are held constant but the store type changes to type 2, the expected average shrink would be (600 − X) basis points. If the store type changed to type 3, it would be (600 − 2X) basis points, and so on.
■■ Inventory, Net Sales, and Charges. These explanatory variables are expressed in thousands of dollars per period. Shrink increases by X basis points for each additional $1 million in inventory, decreases by Y basis points for each additional $1 million in net sales, and increases by Z basis points for each additional $1 million in charges.
■■ Sales Per Inventory. Sales per inventory is the ratio of sales during a given period ($) to the value of the inventory on display at the end of the period ($). For example, if there were $800 in sales during the period and $1,000 of inventory was on the shelves, the sales per inventory figure would be 0.8. For each unit increase in sales per inventory, the average shrink is expected to decline by X basis points.
■■ Customer Satisfaction. The produce freshness score from customer feedback was used as a proxy for customer satisfaction. For each 1 percent increase in the average customer satisfaction score, the average shrink decreases by X basis points.

Decision Tree: How do variables interact with one another to classify a store into various levels of shrink?
[Figure: "Rules From DT Shrink," decision rules that split stores on sales/inventory ratio, turnover, $ per square foot, customer ratings, GM%, store type, and store inventory relative to zone inventory, ending in high- or low-shrink outcomes]
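The regression findings above can be made concrete with a small simulation. The sketch below is a minimal illustration, not the team's model: it assumes NumPy, generates synthetic store data (not Kroger data), and uses invented coefficient values in place of the confidential X, Y, and Z. Only a subset of the significant variables is included. Store type enters as a single numeric column, and risk tier enters as two indicator columns so that low- and medium-risk stores share a baseline, mirroring the findings described above. The model is fit by ordinary least squares with `numpy.linalg.lstsq`.

```python
import numpy as np

# Hypothetical "true" effects in basis points. The article masks the real
# coefficients as X, Y, and Z, so these numbers are invented for illustration.
TRUE_INTERCEPT = 650.0
TRUE_STORE_TYPE = -20.0     # per one-step move from type 1 toward type 5
TRUE_HIGH_RISK = 40.0       # high-risk tier vs. low/medium-risk baseline
TRUE_MAX_RISK = 90.0        # max-risk tier vs. low/medium-risk baseline
TRUE_SALES_PER_INV = -30.0  # per one-unit increase in sales/inventory ratio
TRUE_SATISFACTION = -2.0    # per one-point increase in freshness score


def simulate_stores(n, seed=0):
    """Generate synthetic store-level data (hypothetical, not Kroger data)."""
    rng = np.random.default_rng(seed)
    store_type = rng.integers(1, 6, size=n)  # store types 1..5
    risk_tier = rng.choice(np.array(["low", "medium", "high", "max"]), size=n)
    sales_per_inv = rng.uniform(0.5, 3.0, size=n)
    satisfaction = rng.uniform(60.0, 95.0, size=n)
    shrink_bps = (TRUE_INTERCEPT
                  + TRUE_STORE_TYPE * store_type
                  + TRUE_HIGH_RISK * (risk_tier == "high")
                  + TRUE_MAX_RISK * (risk_tier == "max")
                  + TRUE_SALES_PER_INV * sales_per_inv
                  + TRUE_SATISFACTION * satisfaction
                  + rng.normal(0.0, 10.0, size=n))  # unexplained noise
    return store_type, risk_tier, sales_per_inv, satisfaction, shrink_bps


def design_matrix(store_type, risk_tier, sales_per_inv, satisfaction):
    """Intercept, numeric store type, and risk-tier dummies (low/medium = baseline)."""
    return np.column_stack([
        np.ones(len(store_type)),
        store_type,
        (risk_tier == "high").astype(float),
        (risk_tier == "max").astype(float),
        sales_per_inv,
        satisfaction,
    ])


def fit_ols(X, y):
    """Ordinary least squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


store_type, risk_tier, spi, sat, y = simulate_stores(2000)
coef = fit_ols(design_matrix(store_type, risk_tier, spi, sat), y)
names = ["intercept", "store_type", "high_risk", "max_risk", "sales_per_inv", "satisfaction"]
for name, value in zip(names, coef):
    print(f"{name:>14}: {value:8.2f}")

# Ceteris paribus: the same store moved from type 1 to type 2, all else fixed.
base = design_matrix(np.array([1]), np.array(["low"]), np.array([1.0]), np.array([80.0]))
step = design_matrix(np.array([2]), np.array(["low"]), np.array([1.0]), np.array([80.0]))
delta = (step @ coef - base @ coef)[0]
print(f"expected change in shrink from type 1 to type 2: {delta:.2f} bps")
```

Because store type is a single numeric column, the fitted model implies the same shift for every one-step change, which is exactly the "(600 − X), (600 − 2X)" pattern in the Store Type bullet. Dummy-coding store type instead would let each type carry its own offset, at the cost of four coefficients instead of one.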
Collaborating with Tomorrow’s Industry Leaders
By Ed Tonkon, President, Zebra Retail Solutions

The annual Retail Industry Leaders Association (RILA) Asset Protection Conference provides attendees with exceptional opportunities to connect with peers on the most pressing issues facing the industry while exploring the innovative technologies currently transforming asset protection. The event also provides a unique opportunity to collaborate with tomorrow’s leaders through an innovative program.

Founded under the guidance and direction of Lisa LaBruno, senior vice president of retail operations, the RILA Student Mentor Program was established to integrate the skills and insights of a prominent retail chain, a retail solution provider, and academia as part of a semester-long project. Each year the program focuses on a major area of interest for loss prevention professionals, with the students presenting their findings at RILA’s annual Asset Protection Conference.

Participating students are selected from The University of Texas at Austin, McCombs School of Business, where they are pursuing their master’s degrees in business analytics. The master’s program produces data scientists and was recently ranked second of its kind globally. Michael Hasler, PhD, director of the program at McCombs, has supported the RILA Student Mentor Program for the entirety of the collaboration and has developed it into a capstone project for his students.

Over the past seven years, retail participants have included JCPenney, 7-Eleven, The Home Depot, and most recently The Kroger Co. under the leadership of Mike Lamb, LPC, vice president of asset protection. Aaron Medley, the senior manager of asset protection analytics, was also very involved with the students throughout the project as they applied their advanced analytical skills to over 25 million rows of data.

Along with the respective retailers, I’ve had the privilege of serving as co-mentor for the program each year and have seen it become a valuable opportunity for all involved. In addition to presenting at the conference, students have the opportunity to become fully immersed in this top retail loss prevention event, engaging with both retailers and solution providers at the show. The retailer also receives key insights from these data scientists that it can leverage to improve business operations. I’m personally inspired by the results of RILA’s college student program as the students work with industry mentors to complete these research projects.

Decision Tree Model
The second predictive model was a decision tree. We used decision trees to understand the relationship between shrink and factors such as sales, inventory, produce freshness, customer ratings, and dollars per produce square foot. Store-level shrink (%) was divided into three categories: low, medium, and high.

A decision tree with the shrink category as the target variable was plotted to understand how the variables interact with one another to classify a store into the various categories of shrink. We identified the decision rules leading to each shrink category, which can help Kroger make data-driven decisions.

Sell-through was revealed as the most important factor in categorizing shrink, followed by employee turnover rate and customer ratings. For example, if a store’s sell-through is below average compared to other stores, its employee turnover is much higher than average, and it is a type 1 or type 2 location (upscale stores), the store will likely experience high shrink.

A dashboard was then created to identify problem areas at the individual store level, sorted in high-to-low shrink order, with red indicating poor performance and yellow signifying good performance for factors such as sales, inventory, customer ratings, and so forth.

Recommendations
■■ Maintain good sell-through.
■■ Introduce an employee benefits program, which could decrease attrition.
■■ Stores with lower general manager percentages need to concentrate more on produce freshness.
■■ Be mindful of store inventory levels with respect to peer stores.
■■ Clearing at the right time is important.

The third decision tree shows low-shrink stores that have above-average performance across the columns. This dashboard is interactive: clicking on a store reveals its problem products and how they vary across peer stores and peer commodities.
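The decision tree approach described under Decision Tree Model, in which store-level shrink (%) is binned into low, medium, and high categories and a tree learns if-then rules for each category, can be illustrated with a from-scratch Python sketch. This is not the team's implementation, and the article does not say which software they used. The sketch grows a small CART-style tree using Gini impurity and a depth limit. The feature names (`sell_through`, `turnover`, `ratings`), the tertile cut points, and the synthetic data are all hypothetical; the simulated effects only follow the directions the article reports (lower sell-through, higher turnover, and lower ratings mean more shrink).

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, features):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for feature in features:
        values = sorted({row[feature] for row in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (feature, threshold), score
    return best


def build_tree(rows, labels, features, depth=0, max_depth=3, min_size=10):
    """Grow a small tree; each leaf holds the majority shrink category."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(rows) < min_size:
        return majority
    split = best_split(rows, labels, features)
    if split is None:
        return majority
    feature, threshold = split
    left = [(r, y) for r, y in zip(rows, labels) if r[feature] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[feature] > threshold]
    children = {}
    for side, part in (("left", left), ("right", right)):
        children[side] = build_tree([r for r, _ in part], [y for _, y in part],
                                    features, depth + 1, max_depth, min_size)
    return {"feature": feature, "threshold": threshold, **children}


def predict(tree, row):
    """Follow the splits down to a leaf and return its category."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


def print_rules(tree, path=()):
    """Print every root-to-leaf path as an if-then rule."""
    if not isinstance(tree, dict):
        print(" AND ".join(path) or "(all stores)", "->", tree)
        return
    feature, threshold = tree["feature"], tree["threshold"]
    print_rules(tree["left"], path + (f"{feature} <= {threshold:.2f}",))
    print_rules(tree["right"], path + (f"{feature} > {threshold:.2f}",))


# Synthetic stores (hypothetical relationship, for illustration only).
random.seed(0)
stores, shrink_pct = [], []
for _ in range(240):
    row = {"sell_through": random.uniform(0.5, 1.0),
           "turnover": random.uniform(0.1, 0.8),
           "ratings": random.uniform(3.0, 5.0)}
    shrink_pct.append(12 - 8 * row["sell_through"] + 3 * row["turnover"]
                      - 0.5 * row["ratings"] + random.gauss(0, 0.3))
    stores.append(row)

# Bin store-level shrink (%) into low / medium / high at the tertiles.
ordered = sorted(shrink_pct)
low_cut, high_cut = ordered[len(ordered) // 3], ordered[2 * len(ordered) // 3]
labels = ["low" if s < low_cut else "high" if s >= high_cut else "medium"
          for s in shrink_pct]

tree = build_tree(stores, labels, ["sell_through", "turnover", "ratings"])
print_rules(tree)
accuracy = sum(predict(tree, r) == y for r, y in zip(stores, labels)) / len(stores)
print(f"training accuracy: {accuracy:.2f}")
```

Reading the printed root-to-leaf paths as rules is analogous to turning decision-tree splits into recommendations such as "maintain good sell-through." For real data, a library implementation with cross-validated depth would be the more usual choice than this hand-rolled version.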

Decision Tree: What are the problem areas at the individual store level?
[Figure: table of 20 masked stores listing store type and Shrink, Sales, Inventory, Ratings, Turnover, GM%, and $/Produce SQFT, each rated Above Average, Average, or Below Average]

Decision Tree: Which low-shrink stores have above-average performance?
[Figure: the same layout for 24 masked low-shrink stores]

Due to the nature of the retail industry, accurate data collection can be difficult and impractical when considering the bottom line. For example, determining the precise quantity of gala apples present in each store on each date may not be cost-effective relative to the benefit of having such detailed records. Produce is particularly daunting because of the multitude of vendors delivering the product. There may be five different vendors providing strawberries, tracked by different SKUs. For other items such as potatoes, it’s nearly impossible to determine the original vendor once the product is on display, leading to overlap or confusion between different items and different vendors. Due to these and related issues, the physical inventory conducted every cycle faces similar limitations.

Another common issue with produce is substitution between categories. For example, a store may order organic grapes but receive regular grapes. This can lead to inconsistencies in billing for inventory and sales. Further, an item can be modified in some manner, leading to a different label when the item is finally sold. For example, whole pineapple is cut and sold as precut fruit to customers. Similar issues happen with juice bars, prepared foods, and the deli.

Kroger is aware of these issues and makes corrections to its financial records at the commodity level. However, these corrections cannot fully capture all the unique scenarios that arise in produce retail, which leaves noise in the data. It is also not uncommon to see negative inventory figures as a result. To compensate for these issues, we rolled the data up to higher levels, joining items or subcommodities together or averaging over longer periods of time. This reduces the noise and extracts a stronger signal from the data, but it also conservatively limits our analysis and recommendations.

We elected to analyze the data at a level of aggregation we were comfortable with, rather than risk overfitting our models to extremely noisy item- or cycle-level data. Cross-validation and out-of-sample testing during our analysis supported these decisions.

Experience at the RILA Conference
Our presentation was scheduled for the afternoon of day one of the Retail Industry Leaders Association (RILA) conference in Orlando. As part of the conference, we had the opportunity to attend a plethora of events, ranging from talks by distinguished speakers from the retail industry to networking events where we could meet people and exchange information about innovations in the retail industry.

We presented our findings to a crowd filled with experts from the retail industry. The presentation was well received, and it was a proud moment for the team. The conference also had an exhibition hall with solution providers showcasing their products. Donning different caps for each event, we loved learning something new from each and every person we met. Overall, it was a great learning experience, and we enjoyed the conference thoroughly.

Pictured left to right are Ed Tonkon of Zebra Retail Solutions; Rajat Malhotra, Raksha Pai, Dani Diehl, and Clarissa Franklin of The University of Texas; and Aaron Medley of Kroger.

Final Thoughts
Our project benefited immensely from a close partnership with the Kroger team and their willingness to provide support and in-depth information. Many of the most rewarding aspects of our work were only possible because Kroger was willing to provide us with weekly sales data for each item at every store. The success of every analytics project depends on the quality of the data, and we were fortunate to have access to such detailed records.

“We were delighted to support this initiative, both for the benefit of the students and the value we derived from their efforts and expertise,” said Mike Lamb, LPC, vice president of asset protection with Kroger. “We know that in order to stay ahead of the ever-changing environment that affects shrinkage and waste, the benefit of analyzing our data in a meaningful and thoughtful way allows us to take a proactive versus reactive approach in mitigating our shrink. This not only serves to improve shareholder value from a profitability point of view but also enhances our in-stock position and product freshness focus.”

Analytical, data-driven investigations are important to the entire retail industry. Shrink exists in all sectors of retail, and this type of project benefits the entire industry. Through accurate root-cause analysis, we can help individual stakeholders identify different pain points and achieve a level of detail that was not previously possible. In addition, these methods provide tools that allow greater visibility into the business, making retail processes more efficient. With the diversity of products and vendors, seasonal variations, and highly perishable goods, the grocery industry can be extremely complex. This project demonstrated that signals can be found even in noisy produce data and suggests that similar methods can be applied to other retail sectors. Investing in analytics can benefit retailers in many ways: reducing losses, being better prepared for future issues, understanding the retail customer, and running more efficiently as a whole.

Finally, we would like to thank each and every person who helped with this project, including our sponsors at Kroger, Aaron Medley, Jason McClure, and Mike Lamb: thank you for supporting our team and answering our endless questions regarding the data and work at Kroger. Ed Tonkon, our sponsor from Zebra Technologies: we appreciate your time and thank you for your ideas, feedback, and comments. Ellen Jackson, our sponsor at RILA: thank you for listening in on every call and coordinating our trip to the conference in Orlando. Lastly, Dr. Mike Hasler and Dr. Ramesh Rajagopalan at The University of Texas: thank you for guiding us through this project, providing us with critical feedback, and helping us provide the most value through our work.

CLARISSA FRANKLIN, RAJAT MALHOTRA, DANI DIEHL, and RAKSHA PAI are students pursuing their master’s degrees in business analytics in the McCombs School of Business at The University of Texas at Austin.
