Regression Analysis
OVERVIEW: The premise is that changes in the value of a main variable (for example, the sales of
Product A) are closely associated with changes in some other variable(s) (for example, the cost of
Product B). So, if future values of these other variables (cost of Product B) can be estimated, they can
be used to forecast the main variable (sales of Product A).
BASIC IDEA: Regression analysis is a statistical technique for quantifying the relationship between
variables. In simple regression analysis, there is one dependent variable (e.g. sales) to be forecast and
one independent variable. The values of the independent variable are typically those assumed to
"cause" or determine the values of the dependent variable. Thus, if we assume that the amount of
advertising dollars spent on a product determines the amount of its sales, we could use regression
analysis to quantify the precise nature of the relationship between advertising and sales. For forecasting
purposes, knowing the quantified relationship between the variables allows us to provide forecasting
estimates.
PROCEDURE: The simplest regression analysis models the relationship between two variables using the
following equation: Y = a + bX, where Y is the dependent variable and X is the independent variable.
Notice that this simple equation denotes a "linear" relationship between X and Y. So this form would be
appropriate if, when you plotted a graph of Y and X, you tended to see the points roughly form along a
straight line (as compared to having a curvilinear relationship).
When you have several past concurrent observations of Y and X, regression analysis provides a means to
calculate the values of a and b, which are assumed to be constant. Since you will then know a and b, if
you can provide an estimate of X in some future period, you can calculate a future value of Y from the
above equation.
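The calculation of a and b can be sketched as follows. This is a minimal illustration of the standard least-squares formulas, not necessarily the routine used to produce the results in this document. The function name fit_simple_regression and the observations x_obs and y_obs are made up for illustration.

```python
def fit_simple_regression(x, y):
    """Least-squares estimates of a and b in Y = a + bX."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of X, and of cross-deviations of X and Y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("X must take at least two distinct values")
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# Hypothetical past concurrent observations (not the data used in this document)
x_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_obs = [3.1, 4.9, 7.2, 8.8, 11.0]

a, b = fit_simple_regression(x_obs, y_obs)

# Given an estimate of X for a future period, Y follows from Y = a + bX
future_x = 6.0
forecast = a + b * future_x
print(f"a = {a:.2f}, b = {b:.2f}, forecast = {forecast:.2f}")
```

With more observations the same formulas apply unchanged; only the sums get longer.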
EXAMPLE: We can illustrate regression analysis using data from 1976-1989 for both the annual value of
sales of semiconductors (in $M) and a likely leading indicator of these sales, namely Producers' durable
equipment investment (in $B). First, a graph of the relationship between these two variables suggests
that they might be related in a linear fashion:
We can run a regression of sales on durable equipment investment, which has an equation that looks
like this:
Sales = a + b (Investment)
Using this data, we get the following results:
Sales = -5513.7 + 52.3 (Investment)
t=4.4 t=10.7
The last line gives the "t-statistics" for the estimates of a (which is -5513.7) and b (which is 52.3). These
t-statistics allow you to test whether the estimates are significantly different from 0. If an estimate is not
significantly different from 0, then it should be treated as 0 when using the regression equation for a
forecast (see below). This is a distinct advantage of regression analysis because it allows you to assess
whether the linear relationship might have occurred by chance (most of the other techniques don't do this!).
As a quick rule of thumb, if the number of observations is between 5 and 20, a t greater than 3 in
magnitude indicates that your estimate is significantly different from 0 at roughly the 95% confidence level.
If the number of observations is greater than 20, a t greater than 2 is good enough. In the above example,
our estimates of a and b are both significantly different from 0.
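For readers who want to see where a t-statistic comes from, here is a minimal sketch. It assumes the usual textbook formulas for the standard errors of a and b in a simple regression; the document itself does not state which formulas its software used. The function names and the sample data are hypothetical.

```python
import math


def t_statistics(x, y):
    """Return (t_a, t_b) for the least-squares fit Y = a + bX."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("X must take at least two distinct values")
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)
    if s2 == 0:
        raise ValueError("perfect fit: standard errors are zero")
    se_b = math.sqrt(s2 / sxx)
    se_a = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))
    return a / se_a, b / se_b


def passes_rule_of_thumb(t, n_obs):
    """Apply the rule of thumb above: |t| > 3 for 5-20 observations, |t| > 2 beyond 20."""
    if n_obs < 5:
        raise ValueError("the rule of thumb is stated only for 5 or more observations")
    threshold = 3.0 if n_obs <= 20 else 2.0
    return abs(t) > threshold


# Hypothetical observations, for illustration only
t_a, t_b = t_statistics([1.0, 2.0, 3.0, 4.0, 5.0], [3.1, 4.9, 7.2, 8.8, 11.0])
print(passes_rule_of_thumb(t_a, 5), passes_rule_of_thumb(t_b, 5))

# The example's t-statistics with its 14 annual observations (1976-1989)
print(passes_rule_of_thumb(4.4, 14), passes_rule_of_thumb(10.7, 14))
```

The check uses the absolute value of t because an estimate can be significantly below 0 as well as above it.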
To make a forecast, let's assume that the leading indicator estimate for Producers' durable
equipment investment for next year is 370. Plugging this into the above equation, our forecast would
be:
Sales = -5513.7 + 52.3 (370) = 13,837
Note that if our estimate of "a" was not statistically significant (i.e. t<3), then we would calculate next
period's forecast of Sales as:
Sales = 0 + 52.3 (370) = 19,351
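The two forecast calculations above can be reproduced with a short sketch. The coefficients, t-statistic, and investment figure come from the example; the function name forecast_sales and the t value of 1.0 used for the "not significant" case are hypothetical.

```python
def forecast_sales(a, b, investment, t_a, t_threshold=3.0):
    """Forecast Sales = a + b * Investment, treating a as 0 if it is not significant."""
    intercept = a if abs(t_a) > t_threshold else 0.0
    return intercept + b * investment


# Estimates from the example: a = -5513.7 (t = 4.4), b = 52.3
with_intercept = forecast_sales(-5513.7, 52.3, 370, t_a=4.4)

# Hypothetical case: same estimates, but a is not statistically significant
without_intercept = forecast_sales(-5513.7, 52.3, 370, t_a=1.0)

print(round(with_intercept), round(without_intercept))
```

Comparing the two results shows how much the forecast depends on whether the intercept is kept.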
COMMENTS:
• More complicated relationships between variables can be readily modeled. For example, several
independent variables can be incorporated into the analysis or curvilinear relationships can be handled.
• When the independent variable is "time", you get an estimate of the trend line for a time series. For the
semiconductor data, the resulting equation is: Sales = -66841.3 + 898.8 (time). This suggests that the
underlying "trend" of this data is approximately $898.8M per year (sales are measured in $M).
• Many analysts plot data using the logarithm of the dependent variable, for example log(sales) versus
time. If this approach results in a graph with points that fall roughly along a straight line, a reasonable
model for approximating the data is: log(sales) = a + b(time). This presents no problem, since you then
run a regression of log(sales) on time and interpret the estimate for b as the trend change in log(sales)
per period, which for natural logarithms approximates the proportional growth rate of sales.
• Forecasting accuracy heavily depends on the accuracy of the estimates for the independent variable.
• A consistent relationship between the variables is assumed when making forecasts. This, of course,
may not be the case in many situations.
• Regression analysis provides statistical tests and confidence intervals for the actual forecasts, which
most quantitative techniques lack.
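The log(sales) = a + b(time) model mentioned in the comments can be sketched as follows. The sales series below is invented (exact 10% growth per period) so that the result is easy to check, natural logarithms are assumed, and the function name fit_log_trend is hypothetical.

```python
import math


def fit_log_trend(sales):
    """Regress ln(sales) on time = 0, 1, 2, ... and return (a, b)."""
    n = len(sales)
    if n < 2 or any(s <= 0 for s in sales):
        raise ValueError("need at least two strictly positive observations")
    time = list(range(n))
    log_sales = [math.log(s) for s in sales]
    mean_t = sum(time) / n
    mean_y = sum(log_sales) / n
    stt = sum((t - mean_t) ** 2 for t in time)
    sty = sum((t - mean_t) * (y - mean_y) for t, y in zip(time, log_sales))
    b = sty / stt
    a = mean_y - b * mean_t
    return a, b


# Made-up series growing exactly 10% per period
sales = [100.0 * 1.1 ** t for t in range(8)]
a, b = fit_log_trend(sales)

# b is the trend in ln(sales); exp(b) - 1 converts it to a growth rate per period
growth_rate = math.exp(b) - 1.0
print(f"b = {b:.4f}, implied growth = {growth_rate:.1%}")
```

For small values, b and the growth rate are close, which is why b is often read directly as the growth rate.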
Seasonality
In a perfect world, every product in a warehouse would sell at exactly the same rate throughout the
year. In reality, certain products naturally experience peaks and dips in sales due to factors like holidays
and weather changes. This is a phenomenon known as seasonal variation.
Seasonality in forecasting requires business owners and supply chain managers to identify which goods
have seasonal patterns and which do not. And for the goods that do fluctuate in popularity, the
challenge is in determining when they will see the highest and lowest demand.
Types of Seasonality
Not all seasonality is driven by the calendar alone. The main types include:
Man-made seasonality - This is demand caused by external factors. For example, the annual South by
Southwest (SXSW) festival in Austin resulted in 12,000 individual hotel reservations booked during the
event in 2019. Meanwhile, June has historically been associated with a spike in weddings, leading to a
predictable increase in business for caterers, wedding venues, and wedding planners.
Natural seasonality - This seasonality is caused by natural factors, such as the weather or changing
seasons. For example, vacation resorts see a surge in guests during the spring and summer months in
the northern US, and autumn and winter in the south. Meanwhile, the winter months are generally a
quieter period for the real estate sector.
Determine which products are seasonal - To detect seasonality, look for a recurring pattern in the demand
for your products over time. Review historical data within a specific time frame and ask yourself: can you
see a similar demand pattern year-over-year, month-over-month, or week-over-week? Next, calculate the
correlation between the demand patterns of successive years, months, or weeks. If the pattern repeats
consistently, you have reliable seasonal demand.
Know why and when spikes in demand happen - Next, understand how spikes in demand relate to
specific periods of time. Is it due to the weather? A holiday, perhaps? Is it caused by
man-made or natural factors? Understanding how these variables influence your customers' buying
habits will go a long way towards maximizing peaks in demand and accurately estimating seasonality.
Measure the size of these spikes relative to baseline demand - Once you know why and when
popularity for certain goods fluctuates, you then need to measure the seasonal peaks in demand and
compare them with average (i.e., baseline) demand levels. Knowing this will help you plan both inventory
and labor needs accordingly.
Determine the reliability of your forecasts - The dynamic nature of the marketplace means that
forecasting seasonal demand cannot always be 100% accurate. However, there are ways to improve the
accuracy of your forecasts, such as by identifying demand outliers and understanding their effect on
your calculations. This will help in determining the underlying causes and levels of error in previous
forecasts and help in creating seasonally adjusted data. For products or companies that have been
around for several years, also consider going back as many years as you can, so that you have as much
historical data to work with as possible. For newer products or product lines, it may be necessary to
cross-reference them with the syndicated market data of similar products.
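The detection and measurement steps above can be sketched in a few lines. This is one simple way to do it, assuming a Pearson correlation between the demand profiles of two years and a seasonal index defined as period average divided by overall average; the text does not prescribe a specific method. The quarterly demand figures and function names are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length demand profiles."""
    n = len(u)
    if n != len(v) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        raise ValueError("a flat profile has no seasonal pattern to correlate")
    return cov / math.sqrt(var_u * var_v)


def seasonal_indices(years):
    """Average demand in each period divided by the overall (baseline) average."""
    n_periods = len(years[0])
    if any(len(year) != n_periods for year in years):
        raise ValueError("every year must cover the same number of periods")
    period_means = [sum(year[p] for year in years) / len(years)
                    for p in range(n_periods)]
    baseline = sum(period_means) / n_periods
    return [m / baseline for m in period_means]


# Hypothetical quarterly demand for one product over two years
year_1 = [80.0, 100.0, 120.0, 100.0]
year_2 = [88.0, 110.0, 132.0, 110.0]

# A correlation near 1 suggests the same pattern repeats year over year
correlation = pearson(year_1, year_2)

# An index of 1.2 means that quarter runs 20% above baseline demand
indices = seasonal_indices([year_1, year_2])
print(round(correlation, 3), [round(i, 2) for i in indices])
```

Dividing actual demand by the matching index gives a simple form of seasonally adjusted data.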
Using the S Curve to Forecast Sales of a New Product
The S curve is one of the most important concepts when it comes to the Product Life Cycle (PLC) or the
Product Evolutionary Cycle (PEC). It is widely used in marketing. It is called the S curve because its
shape resembles the letter S.
The S curve applies to any business or startup where things move very slowly at first, then gain
momentum and continue to grow, and finally reach a stage where productivity or sales decline and the
market becomes saturated. An S-curve equation lets you estimate how large sales will eventually become
and whether sales have reached the inflection point.
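The text does not give a specific S-curve equation. A common choice is the logistic function, which is assumed in the sketch below. Note that a logistic curve levels off at the saturation level rather than declining, so it covers only the slow start, rapid growth, and saturation stages. All parameter values and function names are hypothetical.

```python
import math


def logistic_sales(t, saturation, growth_rate, midpoint):
    """Logistic S curve: sales approach `saturation` as t grows."""
    return saturation / (1.0 + math.exp(-growth_rate * (t - midpoint)))


def past_inflection(t, midpoint):
    """For a logistic curve the inflection point is at t = midpoint."""
    return t > midpoint


# Hypothetical new product: saturation at 10,000 units per period,
# with the fastest growth occurring in period 6
saturation, growth_rate, midpoint = 10_000.0, 0.9, 6.0

for period in (1, 3, 6, 9, 12):
    units = logistic_sales(period, saturation, growth_rate, midpoint)
    print(period, round(units), past_inflection(period, midpoint))
```

At the inflection point sales equal half the saturation level, which gives a quick check on whether growth has started to slow.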