1 INTRODUCTION
List of Figures
Figure 2-1. Dayton Power & Light All Data View
Figure 2-3. Dayton Power & Light Month View: February 2017
Figure 2-4. Dayton Power & Light Month View: July 2017
Figure 2-5. Commonwealth Edison Month View: February 2017
Figure 2-6. Commonwealth Edison Month View: July 2017
Figure 2-7. Dayton Power & Light Week View: February 2017
Figure 2-9. Commonwealth Edison Week View: February 2017
Figure 2-10. Commonwealth Edison Week View: July 2017
Figure 2-11. Dayton Power & Light Two-Day View: Storm
Figure 2-13. Dayton Power & Light Average Daily Load versus Average Daily Temperature
Figure 2-14. Commonwealth Edison Average Daily Load versus Average Daily Temperature
Figure 2-15. Dayton Power & Light Average Load @ 2AM versus Average Daily Temperature
Figure 2-16. Dayton Power & Light Average Load @ 2PM versus Average Daily Temperature
Figure 2-17. Commonwealth Edison Load @ 2AM versus Average Daily Temperature
Figure 2-18. Commonwealth Edison Load @ 2PM versus Average Daily Temperature
Figure 4-3. Four Region Split of the Weather Response of Loads to Temperatures
Figure 4-4. Dayton Power & Light Week View: February 2017
Figure 4-5. Dayton Power & Light Week View: July 2017
Figure 4-6. Commonwealth Edison Week View: February 2017
Figure 4-7. Commonwealth Edison Week View: July 2017
Figure 4-10. Commonwealth Edison: Simple Exponential Smoothing Model Fit All Data
Figure 4-11. Commonwealth Edison: Simple Exponential Smoothing Model Out-of-Sample Fit
Figure 4-12. Commonwealth Edison: Simple Exponential Smoothing Model One Month Ahead
Figure 4-13. Commonwealth Edison: Simple Exponential Smoothing Model One Year Ahead
Figure 4-15. Commonwealth Edison: Double Exponential Smoothing Model One Year Ahead
Figure 4-16. Commonwealth Edison: Triple Exponential Smoothing Model One Month Ahead
Figure 4-17. Commonwealth Edison: Triple Exponential Smoothing Model One Year Ahead
Figure 4-18. Autocorrelation Function (ACF) for Two Highly Correlated Time Series
Figure 4-19. Autocorrelation Function (ACF) for a Non-Correlated Time Series
Figure 4-20. ACF and PACF for Commonwealth Edison Average Daily Load
Figure 4-21. ACF and PACF After One-Time Seasonal Differencing
Figure 4-22. Commonwealth Edison: Seasonally Differenced, AR(1) Model Fit All Data
Figure 4-23. Commonwealth Edison: Seasonally Differenced, AR(1) Model One Month Ahead
Figure 4-24. Commonwealth Edison: Seasonally Differenced, AR(1) Model One Year Ahead
Figure 4-26. Estimated Nonlinear Weather Response Function for Commonwealth Edison
Figure 4-27. Estimated Nonlinear Weather Response Function for Dayton Power & Light
Figure 4-28. Basic Elements of a Feed Forward Neural Network Used for Classification
Figure 4-31. Replacing the Step Activation Function with a Sigmoid Activation Function
Figure 4-32. Feed Forward Neural Network with Sigmoid Activation Function
Figure 4-33. Collapsing the Sum Function with the Sigmoid Activation Function
Figure 4-34. The Input, Hidden and Output Layers of a Neural Network Model
Figure 4-35. Multiple Nodes in the Hidden Layer and the Regression Model Equivalent
Figure 4-36. Estimated Nonlinear Weather Response Function for Commonwealth Edison
Figure 4-37. Estimated Nonlinear Weather Response Function for Dayton Power & Light
Figure 5-3. Dayton Power & Light: Day Type Weekday versus Weekend Binary Variables
Figure 5-4. Commonwealth Edison: Day Type Weekday versus Weekend Binary Variables
Figure 5-5. Dayton Power & Light: Monthly Binary Variables
Figure 5-7. Dayton Power & Light: Seasonal Binary Variables
Figure 5-9. Dayton Power & Light: Month/Day-of-the-Week Interaction Binary Variables
Figure 5-11. Dayton Power & Light: Season/Day-of-the-Week Interaction Binary Variables
Figure 5-12. Commonwealth Edison: Season/Day-of-the-Week Interaction Binary Variables
Figure 5-13. Estimated Bin Weather Response Function for Dayton Power & Light
Figure 5-14. Estimated Bin Weather Response with Weekend Offset for Dayton Power & Light
Figure 5-15. Estimated Bin Weather Response Function for Commonwealth Edison
Figure 5-16. Estimated Bin Weather Response with Weekend Offset for Commonwealth Edison
Figure 5-17. Estimated Uncapped Spline Weather Response with Weekend Offset for Dayton Power & Light
Figure 5-18. Estimated Capped Spline Weather Response with Weekend Offset for Dayton Power & Light
Figure 5-19. Estimated Uncapped Spline Weather Response with Weekend Offset for Commonwealth Edison
Figure 5-20. Estimated Capped Spline Weather Response with Weekend Offset for Commonwealth Edison
Figure 5-21. Estimated Polynomial Weather Response with Weekend Offset for Dayton Power & Light
Figure 5-22. Estimated Polynomial Weather Response with Weekend Offset for Commonwealth Edison
Figure 5-23. Estimated Neural Net Weather Response with Weekend Offset for Dayton Power & Light
Figure 5-24. Estimated Neural Net Weather Response with Weekend Offset for Commonwealth Edison
List of Tables
Table 2-1. Dayton Power & Light Data Summary Statistics
Table 2-2. Dayton Power & Light Monthly Data Summary Statistics
Table 2-3. Dayton Power & Light Day-of-the-Week Data Summary Statistics
Table 2-6. Commonwealth Edison Day-of-the-Week Data Summary Statistics
1 An energy management system (EMS) is a system of computer-aided tools used by operators of electric utility
grids to monitor, control, and optimize the performance of the generation and transmission system.
2 A Rolodex is a rotating file device used to store business information. The Rolodex was invented in 1956 by
Danish engineer Hildaur Neilsen, the chief engineer of Arnold Neustadter’s company, Zephyr American, a
stationery manufacturer in New York.
When statistical and mathematical forecasting frameworks were introduced to the load forecasting
problem, the goal was twofold: (a) improve forecast accuracy, and (b) speed up the load forecast process.
As the operation of electric grids grew in complexity, there was a need to update the load forecast more
frequently than once a day. The one- to two-hour manual load forecast process was not fast enough when
storm fronts moved through a service territory. The evolution of statistical and mathematical load
forecasting went hand-in-hand with the proliferation of desktop computing. Desktop computing was the
key to speeding up the forecast process. Improved accuracy was yet to be established because the load
forecasters with 20-plus years of experience were very good at what they did. Only when the next
generation of forecasters came on board with little to no experience did the statistical and mathematical
frameworks prove their worth.
1.1 PURPOSE
Why this guide? The motivation behind this guide is a request by numerous clients over the years for the
“recipe book” for building powerful short-term load forecast models. This guide is a partial “recipe book,”
providing the full list of possible ingredients with guidance as to when to use which combination of
ingredients. It is impossible to dial up the specific recipe that would work for your loads without first
going through the data analysis and trial-and-error process we go through whenever we start developing
a short-term load forecast model. The Itron forecasting team – Dr. Stuart J. McMenamin, Eric Fox, Rich
Simons, Mark Quan, Andy Sukenik, Christine Fordham, David Simons, Jennifer Blanco, John Pritchard, Jeff
Fordham, Casey Allred, David Fabiszak, Oleg Moskatov, Michael Russo, Gregory Kim, Leigh O’Connor,
James Lischio, Paige Schaefer, Shannon Ashburn and myself – have more than 20 years of collective
experience developing a wide range of load forecasting models. This guide is an attempt to cast a net
over the short-term load forecast modeling experience.
The focus of the guide is the within-day and day-ahead load forecasts that system operators and energy
traders rely on for scheduling, dispatching, procuring and selling generation to meet demand. The
information presented here is based on 20-plus years of working in the trenches with system operators
1.2 ACKNOWLEDGEMENTS
This brings me to the true heroes of operational forecasting, the forecasters that work the control rooms
and trading desks at places like the New York ISO, ISO New England, the Midcontinent ISO, Electric
Reliability Council of Texas (ERCOT), Independent Electricity System Operator (IESO), the California ISO,
the Australian Energy Market Operator, Western Power, British Gas, CEZ Prodej, EDF Luminus, Engie and
Uniper Benelux. Each of these organizations has a dedicated staff tasked with the responsibility of
saying, “This is the load forecast we should operate against.” This decision has very little to do with model
techniques and model specifications, but everything to do with answering an impossible list of questions:
Will the weather forecast be right? Will the industrial load shed the 500 MW they
contracted for during a load shedding event and, if they do reduce their load, will that 500
MW come back at the end of the load event, or will they send the employees home for the
day? Will the storm front hit at 1 p.m., 2 p.m., 3 p.m., or will it bypass our load centers
altogether? Will the cloud cover dissipate in time for the rooftop solar PV generation to
kick in enough to keep from hitting a system peak? What will happen to the load when
the solar eclipse, the World Cup, Christmas, or any other event you care to name occurs? And a thousand other
questions, the answers to which impact what they will publish as the official forecast.
Over the years, I have been fortunate to work with these staffs. The common thread is their desire to do
better than the day before. These are the unsung heroes of operational forecasting: Arthur Maniaci of
the New York ISO, Andrew Trachsell of the IESO, Calvin Opheim of ERCOT, Yok Potts and Huaitao Zhang
of the Midcontinent ISO, Gary Klein and Rebecca Webb of the California ISO, Jack Fox of the Australian
Energy Market Operator, Rick Morris of Western Power, Jason Blackmore of British Gas, David Metten
and Rigo D’Exelle of EDF Luminus, Gijs Berg of Engie, Alexandr Cerny, Marcel Prošek, and Filip Tichý of CEZ
Prodej, Barry van de Merbel and Marco Sinke of Uniper Benelux NV. This guide is dedicated to this
talented and unsung group of load forecasters.
1.3 OUTLINE
We begin not with techniques, but with the hard work of data review and analysis. Most professional literature
about load forecasting focuses on statistical techniques and pays very little attention to the data. In
practice, we have found the path to a powerful forecast model is through a very thorough analysis of the
A Practitioner’s Guide to Short-term Load Forecast Modeling Data Review and Analysis|2-1
transmission exchanges with other control areas, less (3) transmission and distribution losses. In energy
retail settings, load is based on on-premise metering. This metering is further segmented between
customers with interval-based metering versus non-interval metered customers. The latter require
bespoke load profiles or load profile models to spread their monthly, bi-monthly, or possibly annual
metered consumption values to 15-minute, 30-minute, or hourly load values. Load data based on SCADA
metering can be updated as frequently as every five minutes. On-premise interval-based metering has in
the past been collected daily. However, as advanced meter data collection infrastructures expand, on-
premise interval-metered data will eventually be collected virtually in real time. Further, with mass
deployment of smart meters, the demand of all customers will ultimately be measured using interval-
based metering. As a result, sometime soon even retail forecasting applications will leverage real-time
metering. When that happens, the question of whether it is better to model each individual load or
aggregations of individual loads will need to be analyzed. Based on experience, which approach provides
the most accurate forecast is not always obvious.
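To make the load-profile idea concrete, here is a minimal sketch of spreading one metered total across intervals in proportion to a profile shape. The function name, the 24-hour shape, and the 900 kWh monthly total are all hypothetical values chosen for illustration; an actual profile would come from a bespoke load profile or a load profile model as described above.

```python
from typing import List


def spread_to_intervals(metered_total_kwh: float, profile_weights: List[float]) -> List[float]:
    """Allocate one metered total across intervals in proportion to the profile weights."""
    total_weight = sum(profile_weights)
    if total_weight <= 0:
        raise ValueError("profile weights must sum to a positive value")
    return [metered_total_kwh * w / total_weight for w in profile_weights]


# Hypothetical 24-hour shape: low overnight, flat daytime, evening peak, late-evening taper.
daily_shape = [0.6] * 6 + [1.0] * 12 + [1.4] * 4 + [0.8] * 2

# Repeat the shape over an assumed 30-day billing month (720 hourly intervals).
monthly_profile = daily_shape * 30

# Spread an assumed 900 kWh monthly read across the hourly intervals.
hourly_kwh = spread_to_intervals(900.0, monthly_profile)
```

The allocation preserves the metered total by construction, so the hourly values always sum back to the original read.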
understanding what type of load is being modeled. These statistics and what can be gleaned from them
are as follows.
◼ Average Load. The average load is one indicator of the number of and potential mix of customers
(e.g., residential, commercial, industrial, transportation, and agriculture) underlying the load data.
Large average load values are associated with aggregations of many individual customers or a few
very large nonresidential customers. In contrast, small average load values are associated with
individual customer loads or a relatively small aggregation of small customer loads.
◼ Maximum Load. The maximum load is the system peak. Like the average load, this value
also provides insight into the potential mix of customers being modeled. When the maximum load
occurs can also provide useful information such as whether the peak is driven by weather or
underlying industrial processes. Weather-driven peaks vary across a week and month. Industrial
processes tend to be relatively stable, which results in a relatively repeatable peak load across
working days and potentially across months.
◼ Minimum Load. Because minimum loads usually happen right before dawn, they tend to be
the least impacted by weather. As a result, long-run trends in minimum loads are strong
indicators of the direction of growth – up or down – of non-weather sensitive or baseline loads.
Also, graphing the daily sequence of minimum loads is a good way of capturing the load change
associated with a remapping of the SCADA measurement points that are used to define a load,
whether intentional or not. For example, if one of the generation SCADA points goes missing,
then the resulting calculated load will have an unexpected shift downward of the minimum load.
◼ Load Factor. Load factor is defined as the ratio of the average load to maximum load. Values that
are close to 1.0 indicate a relatively flat load shape. This would be the signature of a large
industrial load that runs flat out on a 24x7 basis. Smaller load factors are usually associated with
weather-sensitive loads.
◼ Summer Energy Fraction. The summer energy fraction is defined as the fraction of annual energy
that occurs in the summer months. Summer months for the northern hemisphere are defined as
June, July, and August. The summer months for the southern hemisphere are defined as
December, January, and February. This statistic is used primarily to indicate air conditioning
driven weather sensitivity. Higher fractions tend to be associated with high saturations of air
conditioning equipment.
◼ Winter Energy Fraction. The winter energy fraction is defined as the fraction of annual energy
that occurs in the winter months. Winter months for the southern hemisphere are defined as
June, July, and August. The winter months for the northern hemisphere are defined as December,
January, and February. This statistic is used to indicate space heating driven weather sensitivity.
Higher fractions tend to be associated with high penetrations of electric space heating.
◼ Standard Deviation. This provides a measure of load volatility. Not all load volatility is bad. Large
standard deviations driven by day-of-the-week and seasonal variation are expected. The hard
part of load volatility is the unexplained or random load deviations that can be associated with
non-systematic load behavior (e.g., a steel crucible switching on and off at random times) and
incorrect or missing measurements.
◼ Coefficient of Variation. The coefficient of variation is defined as the ratio of the standard
deviation to the average load and provides an indication of how much the load swings around the
average. The smaller the coefficient of variation, the more stable the load curve.
◼ Day-of-the-Week Average Values. These values provide insight as to the mix of customer loads
that are being modeled. Residential loads tend to have higher average values on weekends than
weekdays. Commercial loads tend to be about the same Monday through Friday with lower loads
on Saturday and Sunday. Industrial loads can run flat out every day or have a weekday versus
weekend day swing like commercial loads.
◼ Monthly Average Values. Variation in the monthly average values can indicate the level of
weather sensitivity embedded in the load. It can also catch big seasonal operational changes like
agriculture irrigation pump loads.
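The statistics listed above can be sketched in a few lines of Python. This is an illustrative sketch, not the guide's own tooling: the function names are hypothetical, the input is assumed to be a list of (timestamp, MW) pairs covering whole years of hourly data, the population standard deviation is an assumption (the text does not say which form is used), and the example series is synthetic.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Hourly = List[Tuple[datetime, float]]


def load_summary(hourly: Hourly, southern_hemisphere: bool = False) -> Dict[str, float]:
    """Summary statistics for an hourly load series (assumes whole years of data)."""
    loads = [mw for _, mw in hourly]
    avg, peak, sd = mean(loads), max(loads), pstdev(loads)
    total = sum(loads)  # with hourly intervals, the sum of MW values is energy in MWh
    jja = sum(mw for ts, mw in hourly if ts.month in (6, 7, 8))
    djf = sum(mw for ts, mw in hourly if ts.month in (12, 1, 2))
    # June-August is summer in the northern hemisphere and winter in the southern.
    summer, winter = (djf, jja) if southern_hemisphere else (jja, djf)
    return {
        "average_load": avg,
        "maximum_load": peak,
        "minimum_load": min(loads),
        "load_factor": avg / peak,
        "summer_energy_fraction": summer / total,
        "winter_energy_fraction": winter / total,
        "standard_deviation": sd,
        "coefficient_of_variation": sd / avg,
    }


def day_of_week_averages(hourly: Hourly) -> Dict[int, float]:
    """Average load by day of week, keyed 0 (Monday) through 6 (Sunday)."""
    buckets: Dict[int, List[float]] = {d: [] for d in range(7)}
    for ts, mw in hourly:
        buckets[ts.weekday()].append(mw)
    return {d: mean(v) for d, v in buckets.items() if v}


# Synthetic year: 1000 MW on weekdays, 800 MW on weekends, plus 200 MW in June-August.
start = datetime(2017, 1, 1)
hourly = []
for h in range(365 * 24):
    ts = start + timedelta(hours=h)
    mw = 1000.0 if ts.weekday() < 5 else 800.0
    if ts.month in (6, 7, 8):
        mw += 200.0
    hourly.append((ts, mw))

stats = load_summary(hourly)
dow = day_of_week_averages(hourly)
```

Monthly average values follow the same pattern as `day_of_week_averages`, with `ts.month` as the bucket key.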
The summary statistics for these two load zones are presented in the following series of tables.
◼ The first thing to notice is that the average Commonwealth Edison load is almost six times larger
than the average Dayton Power & Light load. In terms of peak load, the Commonwealth Edison
peak is a little over six times larger than the Dayton Power & Light peak. Based on the average load
factors, the Dayton Power & Light load is relatively flatter than the Commonwealth Edison load.
Further, the Commonwealth Edison load has more apparent air conditioning load, as measured by
a larger summer energy fraction of 27.8% versus 26.7% for Dayton Power & Light.
◼ A comparison of the average monthly load to the overall average load suggests that Dayton Power
& Light is relatively more weather sensitive in the winter months than the Commonwealth Edison
load. In contrast, Commonwealth Edison is relatively more weather sensitive in the summer
months than the Dayton Power & Light load.
◼ Relative to Commonwealth Edison, the Dayton Power & Light load has a bigger drop in average
load consumption on weekends versus weekdays. This suggests Commonwealth Edison has a
larger portion of commercial and industrial loads operating at least six days a week.
TABLE 2-1. DAYTON POWER & LIGHT DATA SUMMARY STATISTICS
TABLE 2-2. DAYTON POWER & LIGHT MONTHLY DATA SUMMARY STATISTICS
TABLE 2-3. DAYTON POWER & LIGHT DAY-OF-THE-WEEK DATA SUMMARY STATISTICS
TABLE 2-4. COMMONWEALTH EDISON DATA SUMMARY STATISTICS
TABLE 2-5. COMMONWEALTH EDISON MONTHLY DATA SUMMARY STATISTICS
TABLE 2-6. COMMONWEALTH EDISON DAY-OF-THE-WEEK DATA SUMMARY STATISTICS
◼ All Data View. The all data view puts all the load data in one graph. From this type of graph you can
pick off obvious things like:
─ Long-run growth trends by looking at how the minimum values either rise or lower over time,
─ Redefinition of the load that would manifest itself as a jump up or down in the overall load,
─ Seasonality of the load and apparent air conditioning and electric space heating loads, and
─ Stability of the load.
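As a rough companion to eyeballing the all data view, the daily minimum series can be extracted and scanned for a step change that might indicate a redefinition of the load. The sketch below is a crude heuristic, not the guide's method: the function names, the 14-day comparison window, and the 10% threshold are all assumptions, and the example data are synthetic.

```python
from datetime import date, datetime, timedelta
from statistics import median
from typing import Dict, List, Tuple


def daily_minimums(hourly: List[Tuple[datetime, float]]) -> Dict[date, float]:
    """Return the minimum load observed on each calendar day."""
    mins: Dict[date, float] = {}
    for ts, mw in hourly:
        d = ts.date()
        mins[d] = min(mw, mins.get(d, mw))
    return mins


def flag_level_shifts(daily_min: Dict[date, float], window: int = 14,
                      threshold: float = 0.10) -> List[date]:
    """Flag days where the median daily minimum over the next `window` days differs
    from the median over the previous `window` days by more than `threshold`."""
    days = sorted(daily_min)
    values = [daily_min[d] for d in days]
    flagged = []
    for i in range(window, len(values) - window + 1):
        before = median(values[i - window:i])
        after = median(values[i:i + window])
        if before > 0 and abs(after - before) / before > threshold:
            flagged.append(days[i])
    return flagged


# Synthetic 60 days: a 1000 MW base with a daytime bump; from day 30 onward the
# calculated load drops by 150 MW, mimicking a measurement point going missing.
start = datetime(2017, 1, 1)
hourly = []
for h in range(60 * 24):
    ts = start + timedelta(hours=h)
    mw = 1000.0 + (200.0 if 8 <= ts.hour < 20 else 0.0)
    if h >= 30 * 24:
        mw -= 150.0
    hourly.append((ts, mw))

flagged = flag_level_shifts(daily_minimums(hourly))
```

Because the comparison uses windows on both sides, a cluster of days around the break gets flagged rather than a single date; any flag is only a prompt to look at the graph.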
The all data views for Dayton Power & Light and Commonwealth Edison are shown below. The hourly load
data are in red. The blue line is the 60-day centered moving average. Here are some observations about
these charts.
◼ Load growth (up or down) tends to manifest itself as a trend in the minimum load or bottom of
the graph. Both Dayton Power & Light and Commonwealth Edison appear to have had relatively
flat load growth over the years 2013 through 2017.
◼ Stable loads tend to have relatively clean seasonal swings. Consider the 60-day centered moving
average that was fitted to the load data. The magnitude of the actual load swings around the centered
moving average provides insight as to the potential instability or stability of the loads. In this case,
the Dayton Power & Light load appears to swing more than the Commonwealth Edison load. This
may result in models of Commonwealth Edison loads having better in-sample fit statistics than
models of Dayton Power & Light loads.
◼ Both loads show significant weather sensitivity in both the summer and winter months. It will be
important that the load forecast models capture this weather sensitivity.
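A centered moving average and a simple measure of how far actual loads swing around it can be sketched as follows. The text does not say exactly how the 60-day window is computed, so this sketch assumes a symmetric window of `half_width` points on each side of the current point; for hourly data, `half_width = 30 * 24` would give roughly a 60-day span. The function names and the swing measure are illustrative assumptions.

```python
from typing import List, Optional


def centered_moving_average(values: List[float], half_width: int) -> List[Optional[float]]:
    """Average of the 2 * half_width + 1 points centered on each position.
    Positions too close to either end to have a full window are returned as None."""
    n = len(values)
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)  # running totals make each window sum O(1)
    out: List[Optional[float]] = [None] * n
    width = 2 * half_width + 1
    for i in range(half_width, n - half_width):
        out[i] = (prefix[i + half_width + 1] - prefix[i - half_width]) / width
    return out


def relative_swing(values: List[float], cma: List[Optional[float]]) -> float:
    """Mean absolute deviation of actuals from the moving average, as a fraction of it."""
    ratios = [abs(v - m) / m for v, m in zip(values, cma) if m is not None and m != 0]
    return sum(ratios) / len(ratios)


# Tiny illustrative series; real use would pass hourly loads with half_width = 30 * 24.
loads = [10.0, 20.0, 10.0, 20.0, 10.0]
cma = centered_moving_average(loads, half_width=1)
swing = relative_swing(loads, cma)
```

Comparing `relative_swing` across two load zones gives a rough numeric counterpart to the visual comparison of how tightly each load hugs its moving average.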
FIGURE 2-2. COMMONWEALTH EDISON ALL DATA VIEW
Presented below are the month views for February and July 2017. Observations about these charts
include the following.
◼ The month views for both load zones are consistent with a double peak in the winter months
and a single peak during the summer months. The winter double peak is consistent with the
story that, in the morning, residential lights and space heating drive the system peak. As the
population leaves home for the day, the lights and space heating loads drop off, which is reflected
in the lowering of the system load. Then, in the evening, residential lighting, cooking, and space
heating again drive the system shape. During summer months, air conditioning loads dominate
the load shape from the morning to the evening. With air conditioning filling in the valley, the
result is a single peak shape.
◼ The graphs also show a clear day-of-the-week load pattern that reflects a different mix of
equipment operating on weekdays than weekend days.
◼ While both loads show a ramp down in loads between Friday and Saturday, the ramp is much
steeper with Dayton Power & Light. It will be important that the load forecast models capture
this ramping.
FIGURE 2-3. DAYTON POWER & LIGHT MONTH VIEW: FEBRUARY 2017
FIGURE 2-4. DAYTON POWER & LIGHT MONTH VIEW: JULY 2017
FIGURE 2-6. COMMONWEALTH EDISON MONTH VIEW: JULY 2017
Presented below are week views drawn from February and July 2017. Observations about these charts
include the following.
◼ The day-of-the-week variation in loads is more apparent during the winter months than in the
summer months. This reflects the fact that air conditioning loads run almost continuously
throughout the week while space heating tends to operate only when customers are home.
◼ For the selected weeks, the Dayton Power & Light loads appear to swing more with changing
weather patterns than the Commonwealth Edison loads. This might be attributed to Dayton
Power & Light experiencing significantly more weather variation than Commonwealth Edison. It may
also reflect the fact that, because Commonwealth Edison has a higher baseload, the weather-driven
swings have less of an apparent impact.
◼ The July Friday afternoon ramp down in loads for Dayton Power & Light could reflect the impact
of an afternoon storm or reflect the usual shutdown of business starting in the afternoon.
Additional weeks would need to be reviewed to determine if the ramp down is the norm or a one-
off event that was triggered by a storm.
◼ Although it is not labeled as such, the week shown begins on Sunday, July 2. That places
Independence Day on the following Tuesday. It is clear from the Commonwealth Edison graph
that many businesses shut down on Monday, July 3 and Tuesday, July 4. The actual holiday
schedule adopted by businesses in Dayton Power & Light’s territory is not as obvious.
FIGURE 2-7. DAYTON POWER & LIGHT WEEK VIEW: FEBRUARY 2017
FIGURE 2-8. DAYTON POWER & LIGHT WEEK VIEW: JULY 2017
FIGURE 2-10. COMMONWEALTH EDISON WEEK VIEW: JULY 2017
Presented below are two-day views for two summer days, where one of the days had a storm roll
through the load zone. Observations about these charts include the following.
◼ For Dayton Power & Light, the storm hit right after noon on the first day shown. The net impact
of the storm was to cut cooling loads as temperatures dropped. After the load drop, air conditioning
did pick back up, but not enough to lift the afternoon peak above the morning peak.
◼ For Commonwealth Edison, the storm also hit right after noon on the first day. Unlike Dayton Power & Light, the load
drop was temporary as the air conditioning kicked up later in the afternoon to drive the daily
peak.
FIGURE 2-11. DAYTON POWER & LIGHT TWO-DAY VIEW: STORM
Hourly graphs provide insight into the type of explanatory variables that will be needed to capture growth
trends, seasonal trends, day-of-the-week load variation, holiday impacts, and other non-weather sensitive
variations in load. To visualize the weather sensitive portion of loads, we rely on scatter plots between
loads and temperatures. The types of scatter plots we find useful are illustrated below. Between the
hourly graphs and the scatter plots, a forecast analyst should have a good feel for the type of load that is
being modeled.
Daily Average Load versus Average Temperature. This type of scatter plot provides a good understanding
of the weather-sensitivity of the loads, and whether the sensitivity holds in both hot weather (i.e., air
conditioning loads) and cold weather (i.e., electric space heating loads). Space cooling will
manifest itself as increasing load as temperatures rise above some base level like 65°F or 18°C. The
presence of electric space heating would show up as a rise in loads as temperatures fall below the base
level. A load with both space cooling and space heating would have an almost U-shaped scatter plot. A
scatter plot that is flat or nearly flat when temperatures fall below the base level suggests that no
significant electric space heating is embedded in the hourly loads. In a similar fashion, flat or nearly flat
loads when temperatures rise above the base temperature indicate that little to no air conditioning occurs.
Other key information that can be gleaned from the scatter plot is whether there is a pronounced
difference between weekday and weekend loads. If there is a difference, you would expect to see
essentially two scatters with a gap between the two. The higher scatter plot typically represents the
weekday loads and the lower scatter the weekend loads; however, it is plausible in a load that is
dominated by residential customers that the higher points represent weekend loads when residential
customers are home all day. The difference between the two scatters provides a rough estimate of the
difference in non-weather sensitive loads between weekdays and weekends. Further, it is common to see
differences between the slopes on the weather-sensitive parts of the scatter, which means the space
cooling (space heating) response to temperature change is different between weekdays and weekend
days. One possible reason is that commercial buildings are closed on weekends and so their space
conditioning equipment may sit idle even when it is hot. This would lead to lower weather-sensitivity on
weekend days. In contrast, residential customers may be more likely to run their space conditioning
equipment during the day on weekends when they are home. At the same time, they turn off their space
conditioning equipment while they are away at work. This could lead to higher weather sensitivity on
weekend days. The main thing to figure out is whether the slopes look different. This will impact how the
weather sensitive portion of the load is modeled.
The scatter plots for Dayton Power & Light and Commonwealth Edison are presented below. In these
plots, the average daily load is plotted against the average daily temperature. The blue dots represent
weekdays and the green dots represent weekends and holidays. For Dayton Power & Light, the average
temperature is in degrees Celsius. For CE, the average temperature is in degrees Fahrenheit. The
following observations are drawn from these scatter plots.
◼ Both load zones shown have a distinct weekday versus weekend day pattern. The Dayton Power
& Light load has a more distinct break between the weekday and weekend day response, which
indicates most commercial and industrial loads are off on weekends. There is more overlap
between weekdays and weekends in Commonwealth Edison’s load, suggesting that a larger share of
commercial and industrial loads keeps running on weekends.
◼ The weekday cooling slope for Commonwealth Edison is approximately equal to 268 MWh per
degree. In other words, for every degree that the temperature is above 70°, average daily loads rise by 268 MWh.
To compute this slope, we select two weekday points, one at 70° (e.g., 10,482 MWh) and one at
90° (e.g., 15,853 MWh) and then calculate the slope as (15,853 MWh – 10,482 MWh) over (90° –
70°) = 268 MWh/degree. The weekend slope is slightly lower at approximately 257 MWh per
degree. The lower slope is consistent with the story that less commercial and industrial space
cooling equipment is active on weekends than weekdays. On the space heating side of the scatter
plot (i.e., when temperatures fall below 62°), the weekday space heating slope is approximately
72 MWh/degree. The weekend space heating slope is approximately 77 MWh/degree. The fact
that load response to cold temperatures is lower than the response to hot temperatures reflects
the mix of space conditioning equipment that is in place. The main fuel used to cool off
conditioned spaces is electricity via air conditioners. In contrast, the main fuels used to heat up
conditioned spaces are gas and oil. Electricity use associated with space heating tends to be in
delivery systems like forced-air fans. The bulk of space heating is completed with fossil fuels and
not electricity.
◼ The weekday cooling slope for Dayton Power & Light is approximately equal to 88 MWh per
degree Celsius (49 MWh/degree Fahrenheit). To compute this slope, we select two weekday
points, one at 26° (e.g., 2,523 MWh) and one at 20° (e.g., 1,993 MWh) and then calculate the
slope as (2,523 MWh – 1,993 MWh) over (26° – 20°) = 88 MWh/degree. The weekend slope is
slightly lower at approximately 79 MWh per degree Celsius (44 MWh/degree Fahrenheit). The
lower slope is consistent with the story that there is less commercial and industrial space cooling
equipment active on weekends than weekdays. On the space heating side of the scatter plot (i.e.,
when temperatures fall below 10° Celsius), the weekday space heating slope is approximately 35
MWh/degree. The weekend space heating slope is approximately 33 MWh/degree. Like CE, the
bulk of space heating is done with fossil fuels, which means the space heating slopes are not as
steep as the space cooling slopes.
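The two-point slope arithmetic used in the bullets above can be reproduced with a short sketch. The function names are illustrative and not part of the guide; the input points are the example weekday values quoted in the text.

```python
def two_point_slope(temp_lo, load_lo, temp_hi, load_hi):
    """Slope in MWh per degree between two (temperature, load) points."""
    return (load_hi - load_lo) / (temp_hi - temp_lo)


def per_celsius_to_per_fahrenheit(slope_per_c):
    """A 1 degree C change spans 1.8 degrees F, so the per-degree slope shrinks by 1.8."""
    return slope_per_c / 1.8


# Commonwealth Edison weekday cooling slope (temperatures in Fahrenheit).
ce_weekday = two_point_slope(70, 10_482, 90, 15_853)

# Dayton Power & Light weekday cooling slope (temperatures in Celsius).
dpl_weekday = two_point_slope(20, 1_993, 26, 2_523)
dpl_weekday_f = per_celsius_to_per_fahrenheit(dpl_weekday)

print(round(ce_weekday, 1), round(dpl_weekday, 1), round(dpl_weekday_f, 1))
```

A two-point slope is only a rough reading of the scatter. It depends on which two points are picked, which is why the text treats the results as approximate.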
FIGURE 2-13. DAYTON POWER & LIGHT AVERAGE DAILY LOAD VERSUS AVERAGE DAILY TEMPERATURE
FIGURE 2-14. COMMONWEALTH EDISON AVERAGE DAILY LOAD VERSUS AVERAGE DAILY TEMPERATURE
Hour Specific Scatter Plots. A scatter of daily energy versus average daily temperatures provides insight
as to whether a load is weather-sensitive or not. Hour specific scatter plots are used to explore how the
weather response varies across hours of the day. For example, morning loads may exhibit a bigger space
heating response than the afternoon. Along those same lines, afternoon loads may exhibit a bigger space
cooling response than the morning. Hour specific scatter plots that plot the load for an hour against the
temperature for that hour can help reveal these response differences.
The 2AM and 2PM scatter plots for Dayton Power & Light and Commonwealth Edison are presented
below. In these plots the hourly load is plotted against the average daily temperature. The blue dots
represent weekdays and the green dots represent weekends and holidays. For Dayton Power & Light, the
average temperature is in degrees Celsius. For CE, the average temperature is in degrees Fahrenheit. The
following observations are drawn from these scatter plots.
◼ For both load zones, the relationship between 2AM loads and average daily temperatures is
fuzzier than the relationship between 2PM loads and average daily temperatures. Also, there is
significant overlap between 2AM weekday and weekend loads. In contrast, there is a significant
gap between the weekday and weekend 2PM load response.
◼ The Commonwealth Edison space cooling slopes for 2AM and 2PM are:
─ @2AM Weekday ~ 190 MWh/ °F
─ @2AM Weekend ~ 190 MWh/ °F
─ @2PM Weekday ~ 314 MWh/ °F
─ @2PM Weekend ~ 388 MWh/ °F
◼ The Commonwealth Edison space heating slopes for 2AM and 2PM are:
─ @2AM Weekday ~ 71 MWh/ °F
─ @2AM Weekend ~ 71 MWh/ °F
─ @2PM Weekday ~ 71 MWh/ °F
─ @2PM Weekend ~ 58 MWh/ °F
◼ The Dayton Power & Light space cooling slopes for 2AM and 2PM are:
─ @2AM Weekday ~ 67 MWh/ °C
─ @2AM Weekend ~ 67 MWh/ °C
─ @2PM Weekday ~ 109 MWh/ °C
─ @2PM Weekend ~ 108 MWh/ °C
◼ The Dayton Power & Light space heating slopes for 2AM and 2PM are:
─ @2AM Weekday ~ 29 MWh/ °C
─ @2AM Weekend ~ 28 MWh/ °C
─ @2PM Weekday ~ 71 MWh/ °C
─ @2PM Weekend ~ 58 MWh/ °C
◼ From a load model building perspective, the key findings that need to be incorporated into a
forecast model specification are as follows.
─ Both loads are weather sensitive.
─ The weather sensitivity leads to a nonlinear response between loads and temperatures.
─ The slopes implied between weekdays and weekends differ.
─ The heating and cooling slopes differ; that is, the weather response is not symmetrical.
─ The heating and cooling slopes differ by time of day.
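One way to estimate slopes like those listed above, rather than reading them off two points, is to fit a least squares line to the observations above a base temperature, separately for each hour and day type. The sketch below is an assumption about how this could be done, not the method used to produce the figures in this guide. The observations and the 70° base are made up for illustration.

```python
def ols_slope(xs, ys):
    """Ordinary least squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def cooling_slope(points, base_temp):
    """Fit a line to the (temperature, load) points above base_temp."""
    hot = [(t, load) for t, load in points if t > base_temp]
    return ols_slope([t for t, _ in hot], [load for _, load in hot])


# Made-up (average daily temperature, 2PM load) pairs, split by day type.
weekday_2pm = [(60, 11000), (72, 11600), (80, 14000), (88, 16400), (95, 18500)]
weekend_2pm = [(60, 9800), (75, 11250), (85, 13750), (92, 15500)]

weekday_slope = cooling_slope(weekday_2pm, base_temp=70)
weekend_slope = cooling_slope(weekend_2pm, base_temp=70)
print(weekday_slope, weekend_slope)
```

Repeating the same fit for the points below a heating base temperature, and for other hours, gives the full set of slopes by hour, day type, and season that a model specification would need to accommodate.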
FIGURE 2-15. DAYTON POWER & LIGHT AVERAGE LOAD @ 2AM VERSUS AVERAGE DAILY TEMPERATURE
FIGURE 2-16. DAYTON POWER & LIGHT AVERAGE LOAD @ 2PM VERSUS AVERAGE DAILY TEMPERATURE
FIGURE 2-17. COMMONWEALTH EDISON LOAD @ 2AM VERSUS AVERAGE DAILY TEMPERATURE
FIGURE 2-18. COMMONWEALTH EDISON LOAD @ 2PM VERSUS AVERAGE DAILY TEMPERATURE
3 DATA CLEANING
The odds are very good that the load data you are modeling will need to be cleaned. It is common to have
load data with missing values, bad measurements, load redefinition, or a thousand other things that would
render a numerical value that simply does not make sense.
Why do we care about cleaning the load data? We care because the modeling approaches we utilize work
on a basic principle of finding model specifications that make the sum of the squared forecast errors as
small as possible. We will elaborate on why minimizing the sum of squared forecast errors makes sense
when we discuss the alternative forecast frameworks. The challenge with bad data is they lead to big
forecast errors. When you square these forecast errors, they get even bigger. In some cases, the squared
errors from bad data points will drive the estimated model specification. In other words, a few bad apples
spoil the whole basket, which in this case is the model. To avoid the possible skewing of the model
specification, we need to eliminate the forecast errors associated with bad data.
When you encounter missing or bad data, you have a couple of options to consider. If your estimation data
span several years, your first option is to remove the days that contain one or more missing or bad data
values. For example, say you have three years of load data, which corresponds to roughly 1,095 daily
observations. Even if you removed 10% of the days due to bad data, you are left with about 985
observations, which is plenty of data to accomplish the task.
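A minimal sketch of the first option, dropping any day that contains a missing or bad value, is shown below. The data layout (a dictionary of date labels mapped to 24 hourly values, with `None` marking a missing read) and the function name are assumptions made for illustration.

```python
def drop_bad_days(daily_loads, is_bad=lambda value: value is None):
    """Keep only the days whose hourly values all pass the is_bad check."""
    return {
        day: hours
        for day, hours in daily_loads.items()
        if not any(is_bad(value) for value in hours)
    }


# Made-up hourly loads for three days; the second day has one missing read.
hourly = {
    "2017-02-01": [1500.0] * 24,
    "2017-02-02": [1500.0] * 10 + [None] + [1500.0] * 13,
    "2017-02-03": [1480.0] * 24,
}
clean = drop_bad_days(hourly)
print(sorted(clean))
```

The `is_bad` argument can be swapped for any of the validation tests a forecaster prefers, so the same filter removes days with spikes or repeat values as well as missing reads.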
◼ Load spikes. In most cases, these are easily seen with a quick review of the hourly data. Up data
spikes usually are associated with poor data reads. Down data spikes can occur because one or
more of the SCADA metering points was missing, leading to a lower than expected load value.
Also, if the load data is adjusted for daylight saving time, you will often see a 0.0 value at the hour
at which the clocks spring forward in the spring. Sometimes, you will see a double counting of
load at the hour when the clocks fall back in autumn.
◼ Repeat Values. Some systems will fill missing load values by repeating the last good data read.
Since the repeated value was good, the repeated values will pass validation unless a test that
is designed specifically to identify repeat values is used. Repeat values are hard to see unless the
graph is zoomed in on a week or even a day.
◼ Load Shifts. Load shifts can be either up or down. A drop in load due to a major customer shutting
down will result in a load shift. Conversely, a new large customer going online will result in a shift
up in the load. Known load shifts like these can be easily modeled. Another load shift that is
common when working with SCADA measurements is a redefinition of a load zone. There are many
valid reasons for a load zone redefinition. If these reasons are understood and persistent, then
they are easily modeled. However, there are times when a redefinition is inadvertent. In these
cases, the load shift will not persist but will cause large forecast errors. Visual inspection is the
best way to catch these inadvertent load shifts.
◼ Missing Data. Missing data are hard to test, but easy to see in a graph. In most cases, the gaps
are one or two periods in length and can be readily filled. In other cases, days of data can be lost.
◼ Maximum and Minimum Tolerances. This test is looking for data points that exceed minimum
and maximum acceptable tolerances. A quick visual review of the historical data will suggest
reasonable bounds. More sophisticated tests allow the bounds to vary by month or season.
◼ Spike Detection. The goal of a spike detection test is to identify data points that lie (either on the
high side or the low side) outside a reasonable distance from the surrounding data points. An
example of a spike detection test is to compare the current data value to a data value that is n
times bigger (or smaller) than the average of the surrounding data points. Here n is a user-
supplied tolerance. A value of 3 for n means if the current data is more than three times greater
(smaller) than the average of the surrounding data points then the data point should be flagged
as a data spike.
◼ Unexpected Repeat Values. Most hourly and sub-hourly loads do not repeat themselves from
one data read to the next. The exception could be an industrial load that runs flat out. A repeat
value test compares the current data point to the prior k data points. If the k data points are the
same, then they are flagged as repeat data. The value of k is the number of repeat data values
that are acceptable before marking the data as erroneous.
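The three validation tests described above (tolerances, spike detection, and repeat values) can be sketched as simple flagging functions. This is one possible reading of the descriptions, not a reference implementation: the window of two neighbors on each side, the fixed bounds, and the choice to flag only the point that completes a run of repeats are all assumptions.

```python
def tolerance_flags(loads, lower, upper):
    """Flag values that fall outside the acceptable [lower, upper] band."""
    return [not (lower <= value <= upper) for value in loads]


def spike_flags(loads, n=3.0, window=2):
    """Flag values more than n times larger, or more than n times smaller,
    than the average of up to `window` neighbors on each side."""
    flags = []
    for i, value in enumerate(loads):
        neighbors = loads[max(0, i - window):i] + loads[i + 1:i + 1 + window]
        if not neighbors:
            flags.append(False)
            continue
        average = sum(neighbors) / len(neighbors)
        flags.append(value > n * average or value < average / n)
    return flags


def repeat_flags(loads, k=3):
    """Flag a value when it equals each of the k values read before it."""
    return [
        i >= k and all(loads[i - j] == loads[i] for j in range(1, k + 1))
        for i in range(len(loads))
    ]


def flagged(flags):
    """Positions of the flagged values."""
    return [i for i, flag in enumerate(flags) if flag]


# Made-up hourly loads with an up spike, a run of repeats, and a zero read.
loads = [100, 102, 101, 400, 103, 104, 104, 104, 104, 0, 105]
print(flagged(tolerance_flags(loads, lower=50, upper=300)))
print(flagged(spike_flags(loads, n=3.0, window=2)))
print(flagged(repeat_flags(loads, k=3)))
```

In practice the bounds, `n`, and `k` would be tuned against a visual review of the historical data, and the tolerance bounds could vary by month or season as noted above.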
◼ Manual Fill. The most straightforward and yet tedious approach is to replace each erroneous
data value with a user-defined value. It is straightforward because all you need to do is type in a
value. It is tedious because you must decide what value to type.
◼ Linear Interpolation. If the erroneous data lies within a series of good data points, then the
erroneous data are replaced with an average of surrounding data points. The average can be
5 Savitzky, A. and Golay, J.J.E. (1964) “Smoothing and Differentiation of Data by Simplified Least Squares
Procedures”, Analytical Chemistry. 36 (8): 1627-39.
A Practitioner’s Guide to Short-term Load Forecast Modeling Load Forecast Methods |4-1
own set of calendar and weather conditions they rely on, there are steps common across most algorithms.
These steps are listed below.
Step 1. Filter on how many months back a similar day can be found. This first high-level data filter
recognizes that economic and technology conditions vary over time. For example, hourly load data from
ten years ago will not be impacted by high saturations of on-premise solar generation. If the forecast day
is for a time with a high saturation of on-premise solar generation, then the ten-year-old data is virtually
useless. Limiting how many months back the similar day lookup algorithm can look for a similar day
ensures the load days that are found have shapes that are consistent with current economic and
technology conditions.
Step 2. Filter on how many days before/after the calendar day of the forecast day. This second high-
level filter attempts to find days that are close in the sense of the calendar to the forecast day. For
example, if the forecast day is April 15, then the load forecasters might opt to find only days that range
from March 15 to May 15.
Step 3. Filter the historical load data based on calendar conditions for the forecast day. The purpose of
this step is to limit the number of similar days to days with similar calendar conditions as the forecast day.
Calendar conditions cover the day-of-the-week, month, or season, and whether a holiday falls on the
forecast day. Examples of other calendar conditions that could be considered include school holidays,
daylight saving observance, whether a change in the production schedule for a large customer is expected,
whether a strike will occur, and whether a major sporting event is to take place. Essentially any special
event that has happened in the past and had a noticeable impact on the load can be used to filter the
historical load data. Examples of how the filtering on calendar conditions can work follow.
◼ If the forecast day is a Monday, the similar day lookup algorithm would filter on all historical days
that are Mondays.
◼ If the forecast day is Easter Sunday, the similar day lookup algorithm would filter on all Easter
Sundays.
◼ If the forecast day is a Tuesday during the spring school break, the similar day lookup algorithm
would filter workdays during the spring school break. These days would then be further filtered to
be either just Tuesdays or Tuesdays, Wednesdays, and Thursdays. The latter is an example of how
a similar day lookup algorithm could vary between different load forecasters.
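Steps 1 through 3 can be sketched as three successive filters on a list of historical dates. The function name, the parameter defaults, the 31-day month approximation, and the use of day-of-the-week as the only calendar condition are assumptions for illustration; a production algorithm would also handle holidays and other special events.

```python
from datetime import date


def calendar_candidates(history_days, forecast_day, months_back=24, days_window=30):
    """Return the historical days that pass the age, season, and day-of-week filters."""
    max_age_days = months_back * 31  # generous month length, an approximation
    forecast_doy = forecast_day.timetuple().tm_yday
    keep = []
    for day in history_days:
        # Step 1: not older than months_back, and strictly before the forecast day.
        age = (forecast_day - day).days
        if age <= 0 or age > max_age_days:
            continue
        # Step 2: within days_window of the forecast day's position in the year.
        gap = abs(day.timetuple().tm_yday - forecast_doy)
        gap = min(gap, 365 - gap)  # wrap around the year end
        if gap > days_window:
            continue
        # Step 3: same day of the week as the forecast day.
        if day.weekday() != forecast_day.weekday():
            continue
        keep.append(day)
    return keep


forecast_day = date(2017, 4, 17)  # a Monday
history_days = [
    date(2017, 4, 10),  # Monday one week earlier
    date(2017, 4, 11),  # Tuesday, fails the day-of-week filter
    date(2016, 4, 18),  # Monday one year earlier
    date(2017, 1, 16),  # Monday, but too far away in the calendar
    date(2014, 4, 14),  # Monday, but older than months_back
]
candidates = calendar_candidates(history_days, forecast_day)
print(candidates)
```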
Step 4. Rank similar days based on weather conditions. Steps 1 through Step 3 result in a list of several
similar days. The purpose of this step is to select among this list the historical day(s) that have weather
conditions “most like” the forecast day. At a minimum, the maximum and minimum temperatures for
the day are needed to indicate whether the loads will have significant space heating or space cooling. Other
weather concepts such as humidity, cloud cover, and precipitation can be incorporated. How “most like”
is defined when comparing the weather of a historical day to the forecasted weather for the forecast day
is one area where similar day lookup algorithms vary. The most straightforward approach is to use a
weighted sum of the absolute differences between the forecast day weather and the weather of the
historical days. For example, a similar day lookup algorithm could use the following weighted sum to rank
the historical days.
$$\text{WeatherRank}_d = \sum_{i=1}^{k} w_i \left| W_i^{F} - W_i^{d} \right|$$
Here, the weather rank for historical day (d) is the weighted sum of the absolute differences of the various
weather conditions that are forecasted for the forecast day (F) and the observed conditions for day (d).
In this example, the similar day lookup algorithm utilizes (k) different measures of weather conditions.
Once each historical day is scored the days are sorted by their weather rank. The historical days most like
the forecast day will have the smallest weather rank.
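The weather ranking step can be sketched as follows: each historical day is scored with a weighted sum of absolute differences between its observed weather and the forecast weather, and the days are sorted from smallest to largest score. The weather measures, weights, and values below are made up for illustration.

```python
def weather_rank(forecast_wx, observed_wx, weights):
    """Weighted sum of absolute differences across the weather measures."""
    return sum(
        weight * abs(forecast_wx[name] - observed_wx[name])
        for name, weight in weights.items()
    )


# Hypothetical weights: temperatures count fully, humidity much less.
weights = {"max_temp": 1.0, "min_temp": 1.0, "humidity": 0.1}
forecast_wx = {"max_temp": 30.0, "min_temp": 18.0, "humidity": 60.0}

# Made-up observed weather for three candidate historical days.
history_wx = {
    "2017-07-10": {"max_temp": 31.0, "min_temp": 19.0, "humidity": 70.0},
    "2017-07-17": {"max_temp": 25.0, "min_temp": 17.0, "humidity": 55.0},
    "2017-07-24": {"max_temp": 30.5, "min_temp": 18.0, "humidity": 62.0},
}

ranked = sorted(
    history_wx,
    key=lambda day: weather_rank(forecast_wx, history_wx[day], weights),
)
print(ranked)
```

The weights control how much each weather measure matters, and they need to account for the different units of the measures. Choosing them is part of the judgement the forecast analyst brings to the algorithm.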
At this point, the load forecaster can either use the historical day with the best weather rank as the load
forecast for the forecast day or take a weighted average of say the best three days and use the average
as the load forecast. There may also be some scaling applied to the historical day(s) to ensure the load
forecast is in sync with the most recent load trends. The steps taken once the similar days have been
found are part of the secret sauce of each similar day lookup algorithm. Further, the parameters (e.g.,
months back, days forward, and days back) that guide each step in the similar day lookup algorithm are
controlled by the load forecast analyst. The parameter values depend largely on the forecast analysts’
previous success forecasting days like the forecast day. As a result, most similar day lookup algorithms
facilitate the mechanics of making a load forecast based on expert judgement. An expert with strong local
knowledge will be hard to beat.
This day-of-the-week rotation algorithm is useful when forecasting large industrial and agriculture loads
that follow relatively stable production schedules. In these cases, the operating behavior of the most
recent week is a strong predictor of the operating behavior over the short-term forecast horizon. Where
this variation of a rotation algorithm breaks down is when there is a significant and unpredictable change
in production schedules. But with limited to no knowledge of the production schedules, day-of-the-week
rotation is a good way to go.
With this basic idea of rotating historical days onto the forecast calendar, it is easy to envision and
implement the following variations of the basic rotation algorithm.
◼ Repeat Prior Hour. Under this variation, the forecast values are set equal to the last load
measurement. This variation makes sense for very near-term forecast horizons of up to eight to
twelve hours ahead and where there is real-time load measurement flowing in at least hourly or
more frequently. Further, it only makes sense for flat line loads since the forecast that is produced
is a flat line launching off the last load value. This algorithm would not make sense for the
Commonwealth Edison and Dayton Power & Light load shapes, which are anything but flat lines.
The forecast power of the repeat prior hour rotation algorithm is that the very near-term forecast
is synced to the most recent measurements. Its forecast risk is that the last data measurement
might be erroneous, leading to a near-term forecast that simply repeats the erroneous data
values. High reliance on real-time data reads requires powerful data validation routines to avoid
the risk of projecting off a bad data value.
◼ Repeat Prior Day. Rather than replicating all the days from the prior week, this variation simply
repeats the last day over all the days in the forecast horizon. With a flat load shape, repeating
the last day will lead to the same forecast as replicating all the days from the prior week, but with
fewer calculations. This has the added advantage that if there is a significant change in the
production schedule, repeating the last day will lead to a forecast error for the first forecast day,
but then the next several days the forecast will adjust to the new load level, which effectively
reduces the chances of several days of significant forecast errors. In contrast, the day-of-the-
week rotation algorithm will lead to seven days of forecast errors before the algorithm catches
up to the new load levels. Like the repeat prior hour algorithm, this algorithm is not well suited
for loads that vary by day of the week like the Commonwealth Edison and Dayton Power & Light
loads.
◼ Repeat Same Day Type Last Week. This rotation algorithm is like the day-of-the-week rotation
algorithm presented above, but with the twist that it first averages the weekday (i.e., Monday,
Tuesday, Wednesday, Thursday, and Friday) load shapes from last week and the weekend (i.e.,
Saturday and Sunday) load shapes prior to rotating the two day-type shapes onto the forecast
calendar. The result would be a load forecast where Saturday and Sunday have the same
weekend load shape forecast, and the other days of the week have the same weekday load shape
forecast. This is a two-day type rotation algorithm. A four-day type rotation algorithm would
treat Mondays and Fridays as two distinct day types. This would work well for industrial and
agriculture loads with a distinct day-of-the-week load pattern. It could be applied to
Commonwealth Edison and Dayton Power & Light loads, but the load forecast will be subject to
errors if the weather pattern from the last week differs significantly from the forecasted weather.
◼ Repeat Day-of-the-Week/Day Type Last Month. The tweak here is the load shapes that are
rotated forward are the average load shapes from the prior month or rolling n weeks. Under this
rotation algorithm, the forecast analyst decides first whether the load shapes should vary by day-
of-the-week or by day-type. Second, they decide on how many weeks back, n, should be used to
construct the average load shapes. A twist would be to use a weighted average instead of a simple
average of the prior n weeks of data to construct the average load shapes. In this case, the
forecast analyst would place a higher weight on the most recent data.
◼ Repeat Day-of-the-Week/Day Type Last Year. The idea here is to use the load shapes from
roughly the same time of year as last year. The goal is to capture seasonal variation in the loads
that is repeatable from one year to the next. This would make sense with agriculture loads that
have significant pump loads or crop drying loads that take place approximately the same time
each year.
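Three of the rotation variations described above can be sketched in a few lines. The representation is an assumption: last week's data is a dictionary keyed by weekday index (0 for Monday through 6 for Sunday), and each load shape is a list of values. The shapes below have only two values each to keep the example short, where a real shape would have 24 or more.

```python
def repeat_prior_day(last_day_shape, horizon_days):
    """Repeat the most recent day's shape over every day of the horizon."""
    return [list(last_day_shape) for _ in range(horizon_days)]


def day_of_week_rotation(last_week, forecast_weekdays):
    """Use last week's shape for the same day of the week."""
    return [list(last_week[weekday]) for weekday in forecast_weekdays]


def day_type_rotation(last_week, forecast_weekdays):
    """Average last week's weekday and weekend shapes, then rotate by day type."""
    def average(shapes):
        return [sum(values) / len(values) for values in zip(*shapes)]

    weekday_avg = average([last_week[weekday] for weekday in range(5)])
    weekend_avg = average([last_week[weekday] for weekday in (5, 6)])
    return [
        list(weekend_avg if weekday >= 5 else weekday_avg)
        for weekday in forecast_weekdays
    ]


# Made-up two-value shapes for last week, keyed by weekday index.
last_week = {
    0: [100, 120], 1: [102, 122], 2: [104, 124], 3: [106, 126], 4: [108, 128],
    5: [80, 90], 6: [70, 80],
}
horizon = [5, 6, 0]  # forecast Saturday, Sunday, Monday

flat = repeat_prior_day(last_week[4], horizon_days=3)
by_day = day_of_week_rotation(last_week, horizon)
by_type = day_type_rotation(last_week, horizon)
print(flat, by_day, by_type)
```

The last month and last year variations follow the same pattern, with the averages taken over more weeks or over the same weeks of the prior year.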
Like the similar day lookup algorithm, the power of a rotation algorithm relies on the expertise of the
forecast analyst. And like similar day lookup algorithms, the rotation algorithms facilitate the mechanics
of making a load forecast based on expert judgement.
6 See https://en.wikipedia.org/wiki/Decision_tree_learning for a brief overview of decision tree learning and a list
of references to obtain more details.
FIGURE 4-1. WEATHER RESPONSE OF LOADS TO TEMPERATURES
What we would like is a function based on these data that we can use to forecast the loads given a forecast
of the average temperature. Decision tree regression is a form of supervised machine learning that will
return such a function. Supervised learning means there is a known outcome that the algorithm is trying
to predict. In this case, the known outcome is the hourly load data. The term “regression” is used in the
machine learning literature to distinguish between “classification” algorithms where the known outcome
is a categorical variable (e.g., won or lost, blue or red) versus “regression” algorithms where the known
outcome is continuous (e.g., loads). In most cases, regression algorithms return an average of the known
outcome. If it helps, think of the decision tree regression as finding a set of load shapes that, when they
are averaged, form the forecast load shape.
At a high-level, decision tree regression starts by splitting the data of our training sample into regions. In
this example, we will start with two regions. Region S is defined as all days where the average temperature
is less than 15°. Region R is defined as all days where the average temperature is greater than or equal
to 15°. We can test how well this data split works by computing the squared differences of the observed
loads from the average load within each region. Formally,
$$\text{Region S Score} = \sum_{d \in \text{Region S}} \left(\text{Load}_d - \widehat{\text{Load}}_S\right)^2$$

$$\text{Region R Score} = \sum_{d \in \text{Region R}} \left(\text{Load}_d - \widehat{\text{Load}}_R\right)^2$$

The Region S score is the sum of squared errors over all days (d) where the average temperature is less
than 15°. $\widehat{\text{Load}}_S$ is the average load across all days (d) that fall into Region S. This is depicted as the purple
line in the figure below. The Region R score is the sum of squared errors over all days (d) where the
average temperature is greater than or equal to 15°. $\widehat{\text{Load}}_R$ is the average load across all days (d) that fall
into Region R. This is depicted as the green line in the figure below.
The resulting decision tree regression provides the following forecast function:
IF (Average Temperature < 15) THEN Load Forecast = $\widehat{\text{Load}}_S$
ELSE Load Forecast = $\widehat{\text{Load}}_R$
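The two-region scores and the resulting forecast function can be sketched as below. The daily observations are made up for illustration, and the function names are not from the guide.

```python
def mean_and_sse(loads):
    """Average load and sum of squared deviations from that average."""
    mean = sum(loads) / len(loads)
    return mean, sum((load - mean) ** 2 for load in loads)


def fit_two_regions(days, cut=15.0):
    """Split (average temperature, load) days at `cut` and score each region."""
    region_s = [load for temp, load in days if temp < cut]
    region_r = [load for temp, load in days if temp >= cut]
    mean_s, score_s = mean_and_sse(region_s)
    mean_r, score_r = mean_and_sse(region_r)
    return {"mean_s": mean_s, "score_s": score_s,
            "mean_r": mean_r, "score_r": score_r, "cut": cut}


def forecast_load(avg_temp, fit):
    """IF temperature < cut THEN Region S average ELSE Region R average."""
    return fit["mean_s"] if avg_temp < fit["cut"] else fit["mean_r"]


# Made-up (average daily temperature, average daily load) observations.
days = [(2, 1800), (8, 1700), (12, 1600), (18, 1800), (24, 2200), (28, 2600)]
fit = fit_two_regions(days, cut=15.0)
print(fit, forecast_load(10, fit), forecast_load(20, fit))
```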
What if you want to set the cut point (or binary split) in an optimal fashion? The Greedy algorithm solves
the following optimization problem.7
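$$\min_{\nabla} \; \sum_{d \in \text{Region } S_{\nabla}} \left(\text{Load}_d - \widehat{\text{Load}}_{S_{\nabla}}\right)^2 + \sum_{d \in \text{Region } R_{\nabla}} \left(\text{Load}_d - \widehat{\text{Load}}_{R_{\nabla}}\right)^2$$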
Here, we are finding the binary split (∇) that minimizes the sum of the squared errors in both regions. The
Greedy algorithm performs a grid search over possible values for the binary split (∇). The value of the
binary split that leads to the smallest sum of squared differences is taken to be the optimal value.
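A minimal sketch of the grid search is shown below: for each candidate cut point, the days are split into two regions, the sum of squared errors is computed in each, and the cut point with the smallest total is kept. The observations and the grid of candidate cut points are made up for illustration.

```python
def sse(loads):
    """Sum of squared deviations of the loads from their average."""
    mean = sum(loads) / len(loads)
    return sum((load - mean) ** 2 for load in loads)


def best_split(days, candidates):
    """Return (cut, total SSE) for the candidate cut with the smallest total SSE."""
    best = None
    for cut in candidates:
        left = [load for temp, load in days if temp < cut]
        right = [load for temp, load in days if temp >= cut]
        if not left or not right:
            continue  # a split must leave data on both sides
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (cut, total)
    return best


# Made-up (average daily temperature, average daily load) observations.
days = [(2, 1800), (8, 1700), (12, 1600), (18, 1800), (24, 2200), (28, 2600)]
cut, total = best_split(days, candidates=range(5, 30, 5))
print(cut, total)
```

With these made-up points the search prefers a cut at 20° over the 15° cut used earlier, because the day at 18° has a load closer to the cooler days than to the hot ones.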
It is easy to extend the size of the decision tree regression by adding additional cut points. For example,
we could define Region X as all days where the average temperature is less than 0°. We could also define
Region Z as all days where the average temperatures are greater than 25°. This would give us four regions:
◼ Region X: average temperature less than 0°
◼ Region S: average temperature greater than or equal to 0° and less than 15°
◼ Region R: average temperature greater than or equal to 15° and less than or equal to 25°
◼ Region Z: average temperature greater than 25°
An example of a decision tree regression with four regions is depicted below. For each region, we would
compute the average load across all days that fall into that region. The forecast function would then pull
the appropriate average load depending on which region the forecast day falls into.
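A four-region version of the forecast function can be sketched as follows. The region boundaries follow the cut points named in the text (0°, 15°, and 25°), with the assumption that Region S covers 0° up to 15° and Region R covers 15° through 25°. The observations are made up for illustration.

```python
def region_of(avg_temp):
    """Map an average temperature to its region label."""
    if avg_temp < 0:
        return "X"
    if avg_temp < 15:
        return "S"
    if avg_temp <= 25:
        return "R"
    return "Z"


def fit_region_means(days):
    """Average load of the (average temperature, load) days in each region."""
    totals = {}
    for temp, load in days:
        name = region_of(temp)
        running_sum, count = totals.get(name, (0.0, 0))
        totals[name] = (running_sum + load, count + 1)
    return {name: running_sum / count for name, (running_sum, count) in totals.items()}


def forecast_load(avg_temp, region_means):
    """Pull the average load for the region the forecast day falls into."""
    return region_means[region_of(avg_temp)]


# Made-up (average daily temperature, average daily load) observations.
days = [(-5, 2100), (-2, 2000), (5, 1800), (12, 1600),
        (18, 1800), (24, 2200), (28, 2600), (31, 2800)]
region_means = fit_region_means(days)
print(region_means, forecast_load(27, region_means))
```

If a region has no historical days, `forecast_load` raises a `KeyError` for that region, which mirrors the sparse-region risk of dividing the data too finely.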
FIGURE 4-3. FOUR REGION SPLIT OF THE WEATHER RESPONSE OF LOADS TO TEMPERATURES
The above idea can be extended to allow the weather response (i.e., the average load) to vary not only
with the average temperature, but also by day-of-the-week. This leads to further segmentation of regions
now with multiple dimensions instead of the single dimension of average temperature. While the idea of
defining regions based on more dimensions is attractive, you run the risk that for some dimensions, there
will be little to no data observations populating those regions. Mechanically this is not too much of a
problem, but what if one of those sparse regions happens to be the type of day we are trying to forecast
(e.g., an extremely hot or cold day)? There is no way to extrapolate the average load from surrounding
regions to imply an average load for the region we are forecasting.
The beauty of decision tree regression is that it provides an estimated forecast function from the data
without imposing a specific functional form. What does that mean? If you look at the last graph, the
nonlinear response between loads and temperatures is approximated by the average loads for each
region. We arrived there without saying the response function is linear, or a polynomial, or a logit
function, etc. We simply estimated the function non-parametrically. This is extremely useful and
powerful if there is sufficient data to cover all regions. If we define the regions either with too narrow cut
points or with too many features (i.e., explanatory variables like weekend versus weekday), then we run
the risk of having a forecast day that is not covered by one of the existing regions. To avoid this situation,
we must limit the number of regions by balancing the granularity of the cut points with the number of
slope offsets for things like day-of-the-week, season, etc. Further, introducing other weather concepts
like humidity, cloud cover, and wind speed further constrains the number of regions.
In the discussion on statistical approaches to day-ahead load forecasting, we will show that an alternative
parametric forecast function could look like,
$$\text{Load}_d = \beta_0 + \beta_1 T_d + \beta_2 T_d^2 + \beta_3 T_d^3 + e_d$$

where $T_d$ is the average temperature on day (d).
Here, we are imposing a specific functional form on the load response to weather. Specifically, we are
stating that the weather response can be approximated by a third order polynomial. We can extend this
functional form to allow the weather response to be different between weekdays and weekends as
follows:
Other interaction terms can be included. The strength of the parametric function is the ability to
extrapolate to days not in the history. For example, if the day we are forecasting has an average
temperature that was not seen in the history, we can use the above estimated equation to produce a load
forecast. Extrapolation is a little bit harder to do in a decision tree regression model.
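To illustrate the extrapolation point, the sketch below fits a third order polynomial in average temperature by ordinary least squares and then forecasts a temperature outside the historical range. Everything specific here is an assumption: the data come from a made-up cubic (`true_load`), the fit uses the normal equations with a small hand-written solver, and the temperature is divided by 10 only to keep the arithmetic well conditioned. The weekday versus weekend extension would add a weekend indicator interacted with each polynomial term, which is not shown.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def features(temp):
    """Constant, linear, squared, and cubed terms of the rescaled temperature."""
    x = temp / 10.0  # rescaling is a numerical convenience, not part of the model
    return [1.0, x, x * x, x ** 3]


def fit_cubic(days):
    """Least squares coefficients from the normal equations (X'X) b = X'y."""
    rows = [features(temp) for temp, _ in days]
    loads = [load for _, load in days]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * load for row, load in zip(rows, loads)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, temp):
    """Evaluate the fitted polynomial at the given temperature."""
    return sum(b * f for b, f in zip(coefs, features(temp)))


def true_load(temp):
    """Hypothetical load response used only to generate example data."""
    return 2000 - 40 * temp + 2 * temp ** 2 + 0.02 * temp ** 3


# History covers -5 to 30 degrees; the forecast day is hotter than any of them.
days = [(temp, true_load(temp)) for temp in range(-5, 31, 5)]
coefs = fit_cubic(days)
print(round(predict(coefs, 35), 1))
```

A decision tree fitted to the same history would return the average load of its hottest region for the 35° day, while the polynomial continues the fitted curve. The flip side is that a polynomial can extrapolate badly when the true response flattens or bends outside the observed range.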
Effectively, two load forecast approaches were introduced. The first approach, which is demonstrated by
the various rotation algorithms, “learns” the historical relationship between loads from one hour to the
next, from one day to the next, from one week to the next, from one month to the next, and from one
year to the next. This relationship will be referred to in the next section on statistical modeling approaches
as a reliance on autoregressive terms. The concept of “learning” in the context of a rotation algorithm is
based on the forecast analyst’s perception of the autoregressive relationship. Based on this perception,
the forecast analyst selects the variation of the rotation algorithm that is expected to perform the best.
The second forecast approach, which is demonstrated by the similar day lookup and decision tree
regression algorithms, “learns” the historical relationship between loads and calendar and weather
conditions. That “learned” relationship is then used to forecast loads given forecasted calendar and
weather conditions. In this case, the “learning” is embodied in the forecast analyst’s selection of the
calendar and weather conditions that are used to segment the historical load data.
The main ideas that the reader should take from this section are as follows.
◼ Leveraging historical load patterns can provide powerful load forecasts, but how the historical
load patterns are leveraged is a critical decision point.
◼ Combining information about calendar and weather conditions with historical loads is critical to
accurately forecast weather-sensitive loads.
◼ There is no substitute for forecast experience when deciding on:
─ how to leverage historical load patterns, and
─ which calendar and weather conditions need to be considered when generating a load
forecast.
The next two sections introduce the main statistical approaches used in day-ahead load forecasting. We
will see that the “learning” aspect of load forecasting moves from the logic of an algorithm to the
process of parameter estimation. Even though statistical optimization appears to relieve the forecast
analyst of the burden of “learning,” the reality is that every equation requires a forecast analyst to
define the elements of the equation. There is no substitute for ongoing “learning” on the part of a
forecast analyst if the goal is to build powerful load forecast models.
are blind to forecasted weather conditions and, as a result, they are susceptible to missing key turning
points in loads.
Before presenting the details behind the univariate model frameworks, the modeler needs to make two
key decisions about how the univariate models will be configured.
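The first type of lag structure is referred to as hour-ahead. The hour-ahead lag structure can be
described generally as:
$$\text{LoadForecast}_{d,i} = F(\text{Load}_{d,i-1}, \text{Load}_{d,i-2}, \text{Load}_{d,i-3}, \ldots, \text{Load}_{d,i-k})$$
Here, the load forecast for day (d), interval (i) is a function of the actual loads for day (d), interval (i-1)
back to interval (i-k). For example, the load forecast for today at 10AM is a function of observed loads for
today at 9AM, 8AM, 7AM, 6AM, 5AM, and 4AM.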
The second type of lag structure is referred to as day-ahead. With this structure, the load for an hour is a
function of the same hour, but from the previous day(s) leading up to the forecast day. The day-ahead
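lag structure can be described generally as:
$$\text{LoadForecast}_{d,i} = G(\text{Load}_{d-1,i}, \text{Load}_{d-2,i}, \text{Load}_{d-3,i}, \ldots, \text{Load}_{d-p,i})$$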
Here, the load forecast for day (d) and time interval (i) is a function of the actual loads for day (d-1) through
day (d-p) for time interval (i). For example, the load forecast for today at 10AM is a function of observed
loads for yesterday at 10AM, the 10AM loads from two days back, and the 10AM loads from three days
back.
A third type of lag structure is referred to as a mixed lag structure and is a combination of both hour-
ahead and day-ahead lag structures. The mixed lag structure can be described generally as:
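$$
\begin{aligned}
\text{LoadForecast}_{d,i} ={}& F(\text{Load}_{d,i-1}, \text{Load}_{d,i-2}, \text{Load}_{d,i-3}, \ldots, \text{Load}_{d,i-k}) \\
&+ G(\text{Load}_{d-1,i}, \text{Load}_{d-2,i}, \text{Load}_{d-3,i}, \ldots, \text{Load}_{d-p,i})
\end{aligned}
$$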
Here, the load forecast for day (d) and time interval (i) is a function of actual loads for the prior (k) intervals
and the prior (p) days for the same interval (i).
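As a rough sketch of what the mixed lag structure means in data terms, the Python snippet below collects the inputs that would feed F and G for one forecast point. The array layout `load[day, interval]`, the function name `mixed_lag_inputs`, and the example values are assumptions made for illustration. For simplicity the sketch does not reach back across midnight into the prior day when (i) is smaller than (k); a production implementation would need to handle that case.

```python
import numpy as np


def mixed_lag_inputs(load, d, i, k, p):
    """Return the hour-ahead and day-ahead lags for day d, interval i.

    load is a 2-D array indexed as load[day, interval].
    """
    if i < k or d < p:
        raise ValueError("not enough history within the array for the requested lags")
    hour_lags = [load[d, i - j] for j in range(1, k + 1)]  # Load[d, i-1] ... Load[d, i-k]
    day_lags = [load[d - j, i] for j in range(1, p + 1)]   # Load[d-1, i] ... Load[d-p, i]
    return hour_lags, day_lags


# Hypothetical week of hourly loads; each value simply encodes day * 24 + interval.
load = np.arange(7 * 24, dtype=float).reshape(7, 24)
hour_lags, day_lags = mixed_lag_inputs(load, d=6, i=10, k=6, p=3)
print(hour_lags)  # the six intervals preceding interval 10 on day 6
print(day_lags)   # interval 10 on days 5, 4, and 3
```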
4.2.2 One or Many Load Forecast Equations
The second decision point is whether there should be:
1. a single load forecast equation that is used to forecast all time intervals in a day, or
2. a separate load forecast equation for each interval of the day.
Consider the Commonwealth Edison and Dayton Power & Light load data presented in the following
figures. If the decision is to use a single equation to forecast all time intervals of a day, that equation must
have sufficient freedom to allow for a positive relationship for the transition from 7AM to 8AM, and a
negative relationship for the transition from 7PM to 8PM. In other words, the forecast equation must
work with the actual hour-ahead relationship, which varies between positive and negative as you progress
through the day. There are ways of building such an equation by allowing each explanatory variable to
have a parameter specific to each hour of the day. Consider the following example where hourly load on
day (d) and hour (i) is a function of the prior hour of load.
$$
\begin{aligned}
\text{Load}_{d,i} ={}& \beta_{1,0}\,\text{Hour1} + \beta_{2,0}\,\text{Hour2} + \beta_{3,0}\,\text{Hour3} + \cdots + \beta_{24,0}\,\text{Hour24} \\
&+ \beta_{1}\,\text{Hour1}\,\text{Load}_{d,i-1} + \beta_{2}\,\text{Hour2}\,\text{Load}_{d,i-1} + \beta_{3}\,\text{Hour3}\,\text{Load}_{d,i-1} + \cdots \\
&+ \beta_{24}\,\text{Hour24}\,\text{Load}_{d,i-1} + e_{d,i}
\end{aligned}
$$
Here, the explanatory variables Hour1 to Hour24 are a set of hourly binary variables that take on a value
of 1.0 or 0.0. For example, if (i) is two, then the Hour2 binary variable will take on a value 1.0 and all the
other hourly binary variables will be set equal to 0.0. To allow the relationship between current hour
loads and the prior hour loads to vary across the hours of the day, we introduce a set of 24 model
parameters, one for each hour of the day. Further, we are allowing the average load to vary by hour of
the day with a second set of 24 model parameters that represent 24 distinct intercept terms. This leads
to a model with 48 distinct model parameters. If the lag structure is to be extended to include the prior
two hours, then another set of 24 model parameters would need to be added. In fact, each explanatory
variable that is added to the model would require a separate set of 24 model parameters. What appears
to be a very simple model approach quickly becomes a mess. Further, if an explanatory variable does not
have an hourly parameter offset, then the estimated impact of that variable is constrained to be the same
across all hours of the day.
The alternative is to define separate hour-ahead equations for each time interval of the day. With hourly
load data, you end up with 24 equations. Following the above example, the equivalent 24, hour-ahead
equations would look like the following.
$$\text{Load}_{d,15} = \beta_0 + \beta_1\,\text{Load}_{d,14} + e_{d,15}$$
Again, a complete set of 48 model parameters is estimated, but these 24 hourly equations are much
cleaner in structure than the single equation with hourly binary offsets. Further, in a paper by
Ramanathan et al.,8 it is shown that a single equation with hourly binary offsets is a constrained version
of a set of separate hourly equations. As such, the single equation can perform at best as well as the
unconstrained 24 hourly equations. Based on their findings, all the univariate and multivariate models
presented in this handbook utilize separate hourly equations. This is the approach we take in practice.
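The relationship between the two specifications can be checked numerically. The Python sketch below uses synthetic hourly loads (the daily shape, noise level, and variable names are assumptions made for illustration) and fits both the single equation with 24 hourly intercepts and 24 hourly slopes and the 24 separate hourly equations. When every term in the single equation carries its own hourly parameter, the least-squares coefficients coincide; the constraint arises as soon as some explanatory variable is left without an hourly offset.

```python
import numpy as np

rng = np.random.default_rng(1)
n_days, n_hours = 120, 24

# Synthetic hourly loads (assumed): a stylized daily shape plus noise.
shape = 100.0 + 30.0 * np.sin(2.0 * np.pi * (np.arange(n_hours) - 6) / n_hours)
load = (shape + rng.normal(0.0, 5.0, size=(n_days, n_hours))).ravel()

y, x = load[1:], load[:-1]                    # current hour and prior hour
hour = np.arange(1, load.size) % n_hours      # hour-of-day index (0..23) of each y

# Single equation: 24 hourly binaries plus 24 binaries interacted with the prior-hour load.
dummies = (hour[:, None] == np.arange(n_hours)).astype(float)
X_single = np.hstack([dummies, dummies * x[:, None]])
beta_single, *_ = np.linalg.lstsq(X_single, y, rcond=None)

# Separate equations: one intercept and one slope per hour.
beta_separate = np.empty((n_hours, 2))
for h in range(n_hours):
    rows = hour == h
    X_h = np.column_stack([np.ones(rows.sum()), x[rows]])
    beta_separate[h], *_ = np.linalg.lstsq(X_h, y[rows], rcond=None)

print(np.allclose(beta_single[:n_hours], beta_separate[:, 0], atol=1e-6))  # intercepts agree
print(np.allclose(beta_single[n_hours:], beta_separate[:, 1], atol=1e-6))  # slopes agree
```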
8 Ramanathan, Ramu, Robert Engle, Clive W.J. Granger, Farshid Vahid-Araghi, and Casey Brace. “Short-Run
Forecasts of Electricity Loads and Peaks,” International Journal of Forecasting, Volume 13, Issue 2, June 1997,
Pages 161-174.
FIGURE 4-4. DAYTON POWER & LIGHT WEEK VIEW: FEBRUARY 2017
FIGURE 4-5. DAYTON POWER & LIGHT WEEK VIEW: JULY 2017
FIGURE 4-6. COMMONWEALTH EDISON WEEK VIEW: FEBRUARY 2017
4.2.3 Moving Average
The most straightforward approach is to project future load values as a moving average of the prior N
days of load data. This approach was the mainstay for demand response program evaluation, where a
moving average of days without load shed activity formed the comparison shape in an ex post
performance evaluation of a demand response event. This method has elements common with some of
the variations of the rotation algorithm. This method can be described mathematically as follows:
$$\text{LoadForecast}_{d+1,i} = \frac{\sum_{n=1}^{N} \text{Load}_{d-n,i}}{N}$$
Where,
$\text{LoadForecast}_{d+1,i}$ is the day-ahead (d+1) load forecast for time interval (i)
$\text{Load}_{d-n,i}$ is the measured load on day (d-n) for time interval (i)
The choice of the number of days to include in the moving average is a key decision point for the forecast
analyst. Too many days back and you run the risk that the resulting shape does not reflect the seasonality
of the forecast period. This is especially risky rolling into and out of the spring and fall seasons. Too few
days and you run the risk that the forecast shape perpetuates short-run weather-driven load trends
leading to missing turning points in weather. The art of this approach is in the selection of how many
days, N, are included in the moving average.
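A minimal Python sketch of the simple moving average follows. The function name `moving_average_forecast`, the `history[day, interval]` layout, and the example values are assumptions for illustration; the rows of `history` are taken to be the measured days available to the average, with the most recent day last.

```python
import numpy as np


def moving_average_forecast(history, n_days):
    """Average the most recent n_days of loads, interval by interval."""
    history = np.asarray(history, dtype=float)
    if n_days > history.shape[0]:
        raise ValueError("n_days exceeds the available history")
    return history[-n_days:].mean(axis=0)


# Three hypothetical days with two intervals each, oldest day first.
history = np.array([[10.0, 20.0], [20.0, 40.0], [30.0, 60.0]])
forecast = moving_average_forecast(history, n_days=2)
print(forecast)  # [25. 50.]
```

Re-running the function over a range of values for `n_days` against held-out days is one simple way to explore the trade-off described above.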
A more sophisticated moving average would place weights on each observation to create a weighted
moving average such as:
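$$\text{LoadForecast}_{d+1,i} = \frac{\sum_{n=1}^{N} \text{Weight}_{d-n}\,\text{Load}_{d-n,i}}{\sum_{n=1}^{N} \text{Weight}_{d-n}}$$
Where,
$\text{Weight}_{d-n}$ is the weight placed on the measured load for day (d-n)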
Dividing by the sum of the weights corrects for the case that the weights do not sum to 1.0. In general,
we would expect the largest weights to be placed on the most recent days with the most distant days
having the smallest weights.
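The weighted version can be sketched the same way. In the Python snippet below, the function name, the convention that `weights[0]` applies to the most recent day, and the example values are assumptions for illustration.

```python
import numpy as np


def weighted_moving_average_forecast(history, weights):
    """Weighted average of the most recent days; weights[0] applies to the most recent day."""
    history = np.asarray(history, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if len(weights) > history.shape[0]:
        raise ValueError("more weights than days of history")
    recent_first = history[::-1][: len(weights)]
    # Dividing by the sum of the weights handles weights that do not sum to 1.0.
    return (weights[:, None] * recent_first).sum(axis=0) / weights.sum()


# Three hypothetical days with two intervals each, oldest day first.
history = np.array([[10.0, 20.0], [20.0, 40.0], [30.0, 60.0]])
forecast = weighted_moving_average_forecast(history, weights=[3.0, 1.0])
print(forecast)
```

With weights of 3 and 1, the most recent day contributes three quarters of the forecast. Setting a weight to zero drops that day from the average entirely, which is how a scheme that uses only selected days can be expressed.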
Introducing weights to the moving average opens this method to a range of weighting schemes. For
example, day-of-the-week weights could be designed such that the forecast for a Monday is based on the
weighted average of just the prior Monday loads, a forecast for a Tuesday is based on the weighted
average of just the prior Tuesday loads, and so on. Day type weights that weight working days differently
than non-working days can be designed to provide forecasts that differ between weekdays and weekend
days.
The downside of a weighted moving average is that the forecast analyst now has two major decisions to make:
(1) how many days back (N) to include in the average, and (2) what weighting scheme to use. Without a
clear objective function, the forecast analyst is left with trial and error in deciding how best to configure
the moving average.
Let us start with writing down the exponential smoothing forecasting formula. We will demonstrate how
this formula leads to a weighting scheme that follows an exponential function in a bit.
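$$\text{SmoothLoad}_{d+1,i} = \alpha\,\text{Load}_{d,i} + (1-\alpha)\,\text{SmoothLoad}_{d,i}$$
Here,
$\text{SmoothLoad}_{d,i}$ is the one-day ahead forecast of load for day (d) and time interval (i)
$\text{Load}_{d,i}$ is the measured load for day (d) and time interval (i)
$\alpha$ is the smoothing parameter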
This base model states that the one-day ahead forecast ($\text{SmoothLoad}_{d+1,i}$) is a weighted sum of the last
measured load ($\text{Load}_{d,i}$) and the one-day ahead forecast that was made for day (d) ($\text{SmoothLoad}_{d,i}$). In
other words, our forecast for tomorrow is the same as the forecast for today plus some adjustment
associated with the most recent measurements. It helps to recognize that exponential smoothing is
designed to predict the average or mean of a time series, but where that mean evolves over time. My
current prediction of the average is $\text{SmoothLoad}_{d,i}$. I now have a new observation and I need to decide
how that new observation changes my estimate of the average of the series. The smoothing
parameter, α, tells me how much weight I put on the latest measurement. A smoothing parameter value
close to 1.0 means my estimate of the mean of the time series should be very close to the most recent
load data. A smoothing parameter value close to 0.0 says my estimate of the mean should change only
slightly given the latest load measurement. The main takeaway is that what we are forecasting is the
mean of the load data series. My estimate of the mean is informed by measurement data with the
smoothing parameter determining how much weight is placed on the most recent measurement data.
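The updating logic described here is short enough to write out directly. The Python sketch below is an illustration only; the function name `exponential_smoothing`, the starting value, and the sample loads are assumptions.

```python
def exponential_smoothing(loads, alpha, start):
    """Return the sequence of one-step ahead forecasts.

    forecasts[t] is the forecast made after observing loads[t]:
    alpha times the newest load plus (1 - alpha) times the previous forecast.
    """
    smooth = start
    forecasts = []
    for load in loads:
        smooth = alpha * load + (1.0 - alpha) * smooth
        forecasts.append(smooth)
    return forecasts


loads = [100.0, 120.0, 110.0]  # hypothetical measured loads for one interval, oldest first
print(exponential_smoothing(loads, alpha=0.5, start=100.0))  # [100.0, 110.0, 110.0]
print(exponential_smoothing(loads, alpha=1.0, start=100.0))  # forecast equals the latest load
print(exponential_smoothing(loads, alpha=0.0, start=100.0))  # forecast never leaves the start value
```

The two extreme settings make the role of the smoothing parameter concrete: at 1.0 the estimate of the mean is simply the most recent measurement, and at 0.0 the measurements are ignored.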
Why do we refer to this method as exponential smoothing? Let us write down the sequence of day-
ahead forecasts for say the prior seven days plus tomorrow.
By substituting in the result from the prior day, we can rewrite these equations as:
SmoothLoadd−5,i
= αLoadd−6,i + (1 − α)αLoadd−7,i + (1 − α)(1 − α)αLoadd−8,i
+ (1 − α)(1 − α)(1 − α)SmoothLoadd−8,i
SmoothLoadd−4,i
= αLoadd−5,i + (1 − α)αLoadd−6,i + (1 − α)(1 − α)αLoadd−7,i
+ (1 − α)(1 − α)(1 − α)αLoadd−8,i + (1 − α)(1 − α)(1 − α)(1 − α)SmoothLoadd−8,i
SmoothLoadd−3,i
= αLoadd−4,i + (1 − α)αLoadd−5,i + (1 − α)(1 − α)αLoadd−6,i
+ (1 − α)(1 − α)(1 − α)αLoadd−7,i + (1 − α)(1 − α)(1 − α)(1 − α)αLoadd−8,i
+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)SmoothLoadd−8,i
SmoothLoadd−2,i
= αLoadd−3,i + (1 − α)αLoadd−4,i + (1 − α)(1 − α)αLoadd−5,i
+ (1 − α)(1 − α)(1 − α)αLoadd−6,i + (1 − α)(1 − α)(1 − α)(1 − α)αLoadd−7,i
+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)αLoadd−8,i
+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)SmoothLoadd−8,i
SmoothLoadd−1,i
= αLoadd−2,i + (1 − α)αLoadd−3,i + (1 − α)(1 − α)αLoadd−4,i
+ (1 − α)(1 − α)(1 − α)αLoadd−5,i + (1 − α)(1 − α)(1 − α)(1 − α)αLoadd−6,i
+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)αLoadd−7,i
+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)αLoadd−8,i
+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)SmoothLoadd−8,i
A Practitioner’s Guide to Short-term Load Forecast Modeling Load Forecast Methods |4-21
SmoothLoadd+1,i
= αLoadd,i + (1 − α)αLoadd−1,i + (1 − α)(1 − α)αLoadd−2,i
+ (1 − α)(1 − α)(1 − α)αLoadd−3,i + (1 − α)(1 − α)(1 − α)(1 − α)αLoadd−4,i
+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)αLoadd−5,i
+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)αLoadd−6,i
+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)αLoadd−7,i
+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)αLoadd−8,i
+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1
− α)SmoothLoadd−8,i
This last forecast equation is a bit of a mess. To clean things up, we gather all the load terms under one
summation symbol and use the fact that an expression raised to the 0.0 power will return a value of one.
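This gives:
$$\text{SmoothLoad}_{d+1,i} = (1-\alpha)^{9}\,\text{SmoothLoad}_{d-8,i} + \sum_{n=0}^{8} \alpha(1-\alpha)^{n}\,\text{Load}_{d-n,i}$$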
What this says is the one-day ahead forecast is the sum of the one-day ahead forecast that was made nine
days ago and the weighted sum of measured loads over the past eight days. If we had started with the
sequence of load forecast equations that started eight days back instead of seven days back, the above
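equation would look like:
$$\text{SmoothLoad}_{d+1,i} = (1-\alpha)^{10}\,\text{SmoothLoad}_{d-9,i} + \sum_{n=0}^{9} \alpha(1-\alpha)^{n}\,\text{Load}_{d-n,i}$$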
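In practice, the forecast equation starts with some initial or a priori forecast value, call it
$\text{StartingSmoothLoad}_0$. With N observed loads leading up to day (d), this gives us:
$$\text{SmoothLoad}_{d+1,i} = (1-\alpha)^{N}\,\text{StartingSmoothLoad}_0 + \sum_{n=0}^{N-1} \alpha(1-\alpha)^{n}\,\text{Load}_{d-n,i}$$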
How does this work? Suppose that the smoothing parameter equals 0.5 and the number of observations
leading up to day (d) is 25 (i.e., N=25). With this information we can compute the forecast for d+1. The
first part of the forecast equation is 0.0, because $(1 - 0.5)^{25} \approx 0.0$. In other words, with a smoothing
parameter of 0.5 by the time we are 25 days into the future, our a priori estimate of the mean of the time
series has zero weight.
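These numbers are easy to verify. The Python sketch below computes the implied weights for a smoothing parameter of 0.5 and 25 observations, and confirms that the weighted-sum form reproduces the day-by-day recursion. The load values and the starting value of 200 are made up for the check, and the sketch uses exactly N terms (n = 0 to N-1) so that the weights plus the weight left on the starting value sum to one.

```python
import numpy as np

alpha, n_obs = 0.5, 25
n = np.arange(n_obs)                      # n = 0 is the most recent day
weights = alpha * (1.0 - alpha) ** n      # weight placed on the load from n days back
start_weight = (1.0 - alpha) ** n_obs     # weight left on the a priori starting value

print(weights[:4])                        # 0.5, 0.25, 0.125, 0.0625
print(start_weight)                       # roughly 3e-08, effectively zero
print(weights.sum() + start_weight)       # the weights account for everything else

# Cross-check against the recursion using hypothetical loads (oldest first).
rng = np.random.default_rng(0)
loads_oldest_first = rng.uniform(150.0, 250.0, size=n_obs)
start = 200.0
smooth = start
for load in loads_oldest_first:
    smooth = alpha * load + (1.0 - alpha) * smooth

closed_form = start_weight * start + np.sum(weights * loads_oldest_first[::-1])
print(abs(smooth - closed_form) < 1e-9)
```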
The second piece is the weighted average of the prior 25 days of loads. The table below shows the weights
that are implied by a smoothing parameter of 0.5. Note the dates in the data are sorted from the most
recent read to the oldest read. Applying the weights to the measured load data leads to a weighted
average load of 195. To summarize we have:
FIGURE 4-8. EXPONENTIAL SMOOTHING WEIGHTS (α = 0.5)
Why is it called Exponential Smoothing? The figure that follows the table shows how the weights decay
over time. This decaying behavior is referred to as a geometric progression, which is a discrete version of
an exponential function.9 Therefore, part one of why this forecast method is referred to as exponential
smoothing has to do with the fact that the weights decay along an exponential path.
Part two of why this method is referred to as exponential smoothing has to do with the smoothing of the
raw time series. Consider taking a simple moving average of any chaotic time series. The resulting
averaged time series will be smoother than the raw time series. If you focus on the second element of
the exponential smoothing forecast equation ($\sum_{n=0}^{N-1} \alpha(1-\alpha)^{n}\,\text{Load}_{d-n,i}$), we have a moving average of
the raw load data. The result of this moving average will be smoother than the raw load data. As a result,
9 https://en.wikipedia.org/wiki/Exponential_smoothing
we have a forecast framework that smooths through the raw load data using a moving average that has
exponentially decaying weights. Exponential smoothing for short.
There are several variations of exponential smoothing that allow for trends and seasonal variations in the
loads. Double exponential smoothing is designed to handle load data that is trending.
Here,
$\text{SmoothLoad}_{d,i}$ is the one-day ahead forecast of load for day (d) and time interval (i)
$\text{Trend}_{d,i}$ is the estimated load trend for day (d) and time interval (i)
Triple exponential smoothing is designed to handle load data that show a trend and seasonal swings.
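$$\text{SmoothLoad}_{d+1,i} = \alpha\,\frac{\text{Load}_{d,i}}{I_{d-L}} + (1-\alpha)\left[\text{SmoothLoad}_{d,i} + \text{Trend}_{d-1,i}\right]$$
$$I_{d} = \beta\,\frac{\text{Load}_{d,i}}{\text{SmoothLoad}_{d,i}} + (1-\beta)\,I_{d-L}$$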
Here,
$\text{SmoothLoad}_{d,i}$ is the one-day ahead forecast of load for day (d) and time interval (i)
$\text{Trend}_{d,i}$ is the estimated load trend for day (d) and time interval (i)
There are several software platforms that offer simple exponential smoothing and associated variations.
Which variation makes sense for your load data requires experimentation and analysis. There is no
theoretical high ground as to the right exponential smoothing model to use. The forecast analyst must rely
on their experience to select the version that works best.
The average daily load for Commonwealth Edison is plotted in Figure 4-9. As can be seen from this plot,
there is a noticeable seasonal swing in the average daily load, but no obvious growth trend. We first try
fitting a simple exponential smoothing model to these data. We use all the historical data from January
1, 2013 through December 31, 2016 to estimate the model coefficient. We then use the estimated model
to predict 2017 loads. While this may seem like we are doing a 365 day-ahead forecast for December 31,
2017, the software is generating a sequence of one-day ahead forecasts because the observed data for
each prior day is used to predict the next day. In general, it is hard to trick forecasting software to produce
something other than a sequence of one day ahead or one hour ahead forecasts. That is why you need
to be careful when interpreting the performance of an out-of-sample test like the one run here.
To really see how the model would perform in forecast mode, we re-estimate the model using all of the data
through December 31, 2017 and then forecast 2018, where no load data are available. In this case, the estimated
value for the smoothing parameter, α, is 1.228. Shown in Figure 4-10 is an in-sample fit of the estimated
exponential smoothing model. A quick review suggests the model does a good job in fitting the data. It
appears to capture the major seasonal swings while all the time tracking the data.
Figure 4-11 shows the in-sample fit for the last week of December 2017. The point to notice is that the
exponential smoothing model shadows, with a one-day lag, the actual load data. This is the challenge
with any model that solely relies on the load data from prior days or hours to forecast the future. It is like
driving backwards without any rearview mirrors to help guide you. All you can do is hope that if the road
has been bending to the left that it will continue to bend to the left because if it does not, you will crash
because you will still be steering the car to the left. The seductive nature of a model that relies on
autoregressive terms is that the in-sample fit looks so good. It would take hard work to craft a non-
autoregressive dependent model that fits as tightly as a one-hour lag. As a result, most modelers tend to
opt for the easy road and start their models with autoregressive terms. They obtain great in-sample fits,
but find themselves bewildered when their model fails time and time again to forecast turning points in
the load.
The limitation of the simple exponential smoothing model for forecast horizons of more than one hour
ahead can be seen from Figure 4-12, which shows the true one-month-ahead forecast. Once the model
moves past the last actual load value, there is no new information. As a result, the forecast remains fixed
at its last one-step ahead forecast value. This limitation plays itself out for all of 2018 where the forecast
is the same for every day, as shown in Figure 4-13.
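The flat forecast follows directly from the recursion: with no new measurements to feed in, there is nothing to update. A small Python sketch, with a hypothetical function name and made-up loads, shows the behavior.

```python
def smooth_then_forecast(loads, alpha, start, horizon):
    """Run simple exponential smoothing over the history, then forecast `horizon` steps ahead."""
    smooth = start
    for load in loads:
        smooth = alpha * load + (1.0 - alpha) * smooth
    # Beyond the last actual load there is no new information,
    # so every step of the horizon repeats the last one-step ahead forecast.
    return [smooth] * horizon


forecast = smooth_then_forecast([100.0, 120.0, 110.0], alpha=0.5, start=100.0, horizon=5)
print(forecast)  # [110.0, 110.0, 110.0, 110.0, 110.0]
```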
To improve the 2018 load forecast, we next estimate the double exponential smoothing model. The
estimated coefficients are 1.229 on the smoothing parameter and 0.000 on the linear trend term. In other
words, there is effectively a very small growth trend in the Commonwealth Edison data. The forecast
results from the double exponential smoothing model are shown in Figure 4-14 and Figure 4-15. Despite
what appears to be no estimated trend, both figures show a slight upward trend, although one would be
hard put to say this is a great forecasting model.
Finally, we estimate the triple exponential smoothing model. The estimated coefficients are 1.183 on the
smoothing parameter, 0.001 on the linear trend term, and -0.008 on the seasonal term. The forecast
results from the triple exponential smoothing model are shown in Figure 4-16 and Figure 4-17. The
addition of the seasonal component appears to capture the day-of-the-week swings in the loads. For a
one-month-ahead forecast, the triple exponential smoothing model provides the most realistic forecast.
FIGURE 4-9. COMMONWEALTH EDISON: AVERAGE DAILY LOAD
FIGURE 4-10. COMMONWEALTH EDISON: SIMPLE EXPONENTIAL SMOOTHING MODEL FIT ALL DATA
FIGURE 4-11. COMMONWEALTH EDISON: SIMPLE EXPONENTIAL SMOOTHING MODEL OUT-OF-SAMPLE FIT
FIGURE 4-12. COMMONWEALTH EDISON: SIMPLE EXPONENTIAL SMOOTHING MODEL ONE MONTH AHEAD
FIGURE 4-13. COMMONWEALTH EDISON: SIMPLE EXPONENTIAL SMOOTHING MODEL ONE YEAR AHEAD
FIGURE 4-14. COMMONWEALTH EDISON: DOUBLE EXPONENTIAL SMOOTHING MODEL ONE MONTH AHEAD
FIGURE 4-15. COMMONWEALTH EDISON: DOUBLE EXPONENTIAL SMOOTHING MODEL ONE YEAR AHEAD
FIGURE 4-16. COMMONWEALTH EDISON: TRIPLE EXPONENTIAL SMOOTHING MODEL ONE MONTH AHEAD
FIGURE 4-17. COMMONWEALTH EDISON: TRIPLE EXPONENTIAL SMOOTHING MODEL ONE YEAR AHEAD
Does exponential smoothing make sense for operational load forecasting? For very near-term forecast
horizons of one or two hours ahead, it could prove useful for stable loads. Any load that contains many
turning points during the day, whether those are weather-driven turning points or not, is not a good
candidate for exponential smoothing. In practice, we reserve exponential smoothing for long-term energy
and peak forecasting where it is important to capture long-run growth trends and seasonal cycles in the
data.
Two of the three pieces of an ARIMA model – autoregressive and integrated – are easy to understand.
The third piece – moving average – takes a little bit to wrap your head around. Fortunately, for load
10 Box, George and Jenkins, Gwilym (1970), Time Series Analysis: Forecasting and Control. San Francisco: Holden-
Day.
forecasting, the first two pieces are really about all we need. The ARIMA model can be written generally
as:
$$\text{Load}_d = \text{ARIMA}(P, I, Q)$$
Here,
P is the number of autoregressive terms
I is the number of times the raw time series needs to be differenced to get to a stationary time
series
Q is the number of moving average terms
The beauty of ARIMA modeling is that anyone familiar with this approach knows exactly what model is
being used to forecast loads if they know the values of P, I, and Q. Easy, right? It will be after we walk
through the math. Let us start with a simple ARIMA(P,0,0) model. Here, we are saying the model contains
(P) autoregressive terms, no differencing, and no moving average terms. Formally we have:
$$\text{Load}_{d,i} = \sum_{p=1}^{P} \rho_p\,\text{Load}_{d-p,i} + e_{d,i}$$
In this case, we are saying the load on day (d) for time interval (i) is a weighted average of the previous
(P) days of loads at time interval (i) plus a random error (e). The weights are the unknown parameters
(ρp ).
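For example, an ARIMA(3,0,0) model can be written as:
$$\text{Load}_{d,i} = \rho_1\,\text{Load}_{d-1,i} + \rho_2\,\text{Load}_{d-2,i} + \rho_3\,\text{Load}_{d-3,i} + e_{d,i}$$
Here, the loads today for interval (i) are a function of the loads for the previous three days at time interval
(i). This model will be familiar to those used to using linear regression. To estimate the model parameters
($\rho_p$), we use the same optimization problem used to find the model parameters in the linear regression
case.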
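To make the link to linear regression concrete, the Python sketch below estimates a three-lag autoregressive model by ordinary least squares. The series is simulated with known coefficients so the estimates can be compared against them. The function name `fit_ar`, the true coefficients, and the zero-mean series with no intercept are assumptions made for the example.

```python
import numpy as np


def fit_ar(series, p):
    """Estimate AR(p) coefficients by least squares: regress the series on its own p lags."""
    series = np.asarray(series, dtype=float)
    y = series[p:]
    # Column j-1 holds the series lagged by j periods, aligned with y.
    X = np.column_stack([series[p - j : len(series) - j] for j in range(1, p + 1)])
    rho, *_ = np.linalg.lstsq(X, y, rcond=None)
    return rho


# Simulate a stationary series with known (assumed) coefficients.
rng = np.random.default_rng(0)
true_rho = np.array([0.5, 0.2, 0.1])
series = np.zeros(5000)
for t in range(3, len(series)):
    last_three = series[t - 3 : t][::-1]          # most recent value first
    series[t] = true_rho @ last_three + rng.normal()

rho_hat = fit_ar(series, p=3)
next_value = rho_hat @ series[-1:-4:-1]           # one-step ahead forecast
print(np.round(rho_hat, 2))
print(next_value)
```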
Trends. One of the challenges of ARIMA modeling is that the series we are modeling needs to be stationary
in order to achieve reasonable forecasts. For a series to be stationary, it cannot be trending upwards or
downwards. Consider the following simple example:
$$\text{Load}_d = 1.2\,\text{Load}_{d-1}$$
Here, we have an ARIMA(1,0,0) model and the estimated coefficient for ($\rho_1$) is 1.2. The table below shows
the 10 day-ahead load forecasts using this model. Starting with the last observed load value of 1,000, for
each day in the forecast period the load forecast grows by 20%. By the tenth day in the forecast horizon,
the load forecast calls for a load that is over six times larger than the starting value ($1.2^{10} \approx 6.19$).
This is an example of a nonstationary time series.
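The compounding can be reproduced with a short loop. In the Python sketch below, the starting value of 1,000 is an assumption; the growth factor after ten days does not depend on it.

```python
rho_1 = 1.2
last_observed = 1000.0   # assumed starting value for illustration

forecasts = []
load = last_observed
for day in range(1, 11):
    load = rho_1 * load              # each forecast is 20% above the prior day
    forecasts.append(load)
    print(day, round(load, 1))

growth_factor = forecasts[-1] / last_observed   # equals 1.2 ** 10
print(round(growth_factor, 2))
```

Because the coefficient is greater than one, the forecast never settles down; it keeps growing for as long as the horizon is extended.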
Unlike exponential smoothing, where a trend term was added to the model structure to control for
non-stationary time series, there is no place to put a trend in an ARIMA model. To overcome this limitation,
the time series approach utilizes the difference operator to create a new detrended time series. The
detrended time series is then modeled as an ARMA(P,Q) process. In forecast mode, the forecast from the
ARMA(P,Q) model is then adjusted by undoing the original differencing. A first order difference can be
written as:
$$\text{Load1stOrderDifference}_{d,i} = \text{Load}_{d,i} - \text{Load}_{d-1,i}$$
A second order difference can be written as:
$$\text{Load2ndOrderDifference}_{d,i} = \text{Load}_{d,i} - 2\,\text{Load}_{d-1,i} + \text{Load}_{d-2,i}$$
The key thing to notice is that each subsequent differencing is applied to the differenced series from the
prior step. Once the differencing has been completed, you would then fit an ARMA(P,Q) model to the
differenced series.
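A small numerical example may help here. The Python sketch below, using a made-up five-day load series, shows that differencing the first-differenced series gives the same result as the second order formula, and that the original series can be recovered by undoing the differencing.

```python
import numpy as np

load = np.array([100.0, 103.0, 108.0, 115.0, 124.0])  # hypothetical daily loads for one interval

first_diff = np.diff(load)           # Load[d] - Load[d-1]
second_diff = np.diff(first_diff)    # differencing applied to the already differenced series

# The same second order difference written out directly.
explicit = load[2:] - 2.0 * load[1:-1] + load[:-2]

# Undoing a first order difference: start from the first level and accumulate the changes.
rebuilt = np.concatenate([[load[0]], load[0] + np.cumsum(first_diff)])

print(first_diff)    # [3. 5. 7. 9.]
print(second_diff)   # [2. 2. 2.]
print(np.array_equal(second_diff, explicit), np.array_equal(rebuilt, load))
```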
How many times should the original time series be differenced before an ARMA(P,Q) can be fitted to it?
This is where the Box-Jenkins modeling approach comes into play.11 In their book, Box-Jenkins present a
three-step iterative approach to building ARMA models.
Step 1. Stationarity and Seasonality. The first step is to determine whether the time series needs to be
differenced to form a stationary series. ARIMA modeling considers two types of non-stationarity:
linear trends and seasonality. Seasonality, which load data are subject to, suggests that today’s
loads are highly correlated with loads at the same time last week or last year. In most cases, it is easy to
glean from a graph whether a load contains a linear trend and seasonality.
Box-Jenkins also suggest looking at an autocorrelation plot to assess whether a time series contains a
trend and seasonality. The idea is that a non-stationary time series will have an autocorrelation plot that
dies off very slowly. Essentially if there is a trend, it is very likely that today’s load is highly correlated to
the load for the prior several days. To capture this idea mathematically, we form the sequence of
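autocorrelation coefficients. For example, the autocorrelation coefficient for a one-day lag is computed
as:
$$r_1 = \frac{\sum_{n=2}^{N} (\text{Load}_n - \overline{\text{Load}})(\text{Load}_{n-1} - \overline{\text{Load}})}{\sum_{n=1}^{N} (\text{Load}_n - \overline{\text{Load}})^2}$$
Here, $r_1$ is the autocorrelation coefficient for a one-day lag, $\text{Load}_n$ is the load at time (n), and $\overline{\text{Load}}$ is
the mean load over the sample period (N). In the denominator is the variance of the raw time series. The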
numerator contains the autocovariance. By dividing the autocovariance by the variance, it places the
values for the autocorrelation coefficients on a scale of -1 to 1. A positive value for the autocorrelation
coefficient indicates current loads are positively correlated to the prior day load. If yesterday’s loads were
high, then it is expected today’s load will also be high. A negative value suggests that if yesterday’s loads
were high, then it is expected today’s loads will be low.
11 Box-Jenkins, Ibid
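The autocorrelation coefficient for an h-step lag would be written generically as follows:
$$r_h = \frac{\sum_{n=h+1}^{N} (\text{Load}_n - \overline{\text{Load}})(\text{Load}_{n-h} - \overline{\text{Load}})}{\sum_{n=1}^{N} (\text{Load}_n - \overline{\text{Load}})^2}$$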
A plot of the autocorrelation coefficients is usually presented as a bar chart with each subsequent bar
representing an additional lag. An example of the autocorrelation function (ACF) plot for two, highly
autocorrelated time series, one positively correlated and the second negatively correlated, is presented
in Figure 4-18. In this example, the Box-Jenkins approach would suggest differencing the data until the
ACF plot no longer displays a slowly decaying series of autocorrelations. An example of an ACF plot that
does not display a slowly decaying series of autocorrelations is presented in
Figure 4-19.
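The differencing check can be sketched in Python as follows. The trending series here is simulated (a random walk with drift, chosen purely for illustration), and the function name `acf` is an assumption. The sketch computes the autocorrelation coefficients for the raw series and for its first difference.

```python
import numpy as np


def acf(series, max_lag):
    """Autocorrelation coefficients r_1 ... r_max_lag: autocovariance over variance."""
    x = np.asarray(series, dtype=float)
    dev = x - x.mean()
    denom = np.sum(dev**2)
    return np.array([np.sum(dev[h:] * dev[:-h]) / denom for h in range(1, max_lag + 1)])


# Simulated trending series (assumed): daily changes average +1 with random noise.
rng = np.random.default_rng(0)
steps = 1.0 + rng.normal(0.0, 1.0, size=500)
trending = np.cumsum(steps)

acf_raw = acf(trending, max_lag=10)
acf_diff = acf(np.diff(trending), max_lag=10)

print(np.round(acf_raw, 2))    # starts near 1 and dies off very slowly
print(np.round(acf_diff, 2))   # close to zero at every lag after one round of differencing
```

For the raw series the bars stay high across all ten lags, which is the slowly decaying pattern that signals a need to difference. After differencing, the coefficients hover near zero.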
FIGURE 4-18. AUTOCORRELATION FUNCTION (ACF) FOR TWO HIGHLY CORRELATED TIME SERIES
FIGURE 4-19. AUTOCORRELATION FUNCTION (ACF) FOR A NON-CORRELATED TIME SERIES
Step 2. Identify P and Q. Once a stationary time series has been achieved through differencing, the next
step is to identify potential values for P and Q, the orders of the autoregressive and moving average terms.
Here, Box-Jenkins suggest reviewing both the ACF plot of the differenced series and the Partial
Autocorrelation Function (PACF). The PACF provides information as to the explanatory power of each
subsequent autoregressive lag given all the lagged terms preceding it have had their say. For example,
the PACF for the third order autoregressive lag is the contribution of that third order lag given we have
already accounted for the impact of the first and second order lags. In principle, an AR(P) process (a) will
have an ACF that decays smoothly over time, and (b) the PACF values will drop off toward zero after P
lags. The rough rule of thumb for when a PACF value is close to zero or insignificant is when the value
falls within $\pm 2/\sqrt{N}$, where N is the number of observations used to estimate the PACF values.
There is a neat trick to estimating the values for a PACF plot by computing a sequence of regressions
where each regression in the sequence adds an additional autoregressive term. The estimated coefficient
on the additional autoregressive term is the estimated partial autocorrelation. The following is an
example of the sequence of regressions that are used to estimate the partial autocorrelations for up to a
six-period lag.
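$$\text{Load}_{d,i} = \rho'_1 \text{Load}_{d-1,i} + e_{d,i}$$
$$\text{Load}_{d,i} = \rho_1 \text{Load}_{d-1,i} + \rho'_2 \text{Load}_{d-2,i} + e_{d,i}$$
$$\text{Load}_{d,i} = \rho_1 \text{Load}_{d-1,i} + \rho_2 \text{Load}_{d-2,i} + \rho'_3 \text{Load}_{d-3,i} + e_{d,i}$$
$$\text{Load}_{d,i} = \rho_1 \text{Load}_{d-1,i} + \rho_2 \text{Load}_{d-2,i} + \rho_3 \text{Load}_{d-3,i} + \rho'_4 \text{Load}_{d-4,i} + e_{d,i}$$
$$\text{Load}_{d,i} = \rho_1 \text{Load}_{d-1,i} + \rho_2 \text{Load}_{d-2,i} + \rho_3 \text{Load}_{d-3,i} + \rho_4 \text{Load}_{d-4,i} + \rho'_5 \text{Load}_{d-5,i} + e_{d,i}$$
$$\text{Load}_{d,i} = \rho_1 \text{Load}_{d-1,i} + \rho_2 \text{Load}_{d-2,i} + \rho_3 \text{Load}_{d-3,i} + \rho_4 \text{Load}_{d-4,i} + \rho_5 \text{Load}_{d-5,i} + \rho'_6 \text{Load}_{d-6,i} + e_{d,i}$$
The estimated partial autocorrelations are given by the following six parameters: $\rho'_1, \rho'_2, \rho'_3, \rho'_4, \rho'_5, \rho'_6$.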
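The regression trick can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `pacf_by_regression` is hypothetical, the series is a simulated AR(2) process rather than actual load data, and the series is demeaned so that each regression can be run without an intercept.

```python
import numpy as np


def pacf_by_regression(series, max_lag):
    """Estimate partial autocorrelations with a sequence of regressions.

    Regression p uses lags 1..p as explanatory variables; the coefficient
    on the last (p-th) lag is the estimated partial autocorrelation.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n = len(x)
    pacf = []
    for p in range(1, max_lag + 1):
        y = x[p:]
        # column k holds the series lagged by k periods, aligned with y
        lags = np.column_stack([x[p - k:n - k] for k in range(1, p + 1)])
        coef, *_ = np.linalg.lstsq(lags, y, rcond=None)
        pacf.append(coef[-1])
    return np.array(pacf)


# Simulated AR(2) series (an assumption for illustration only)
rng = np.random.default_rng(42)
n_obs = 5000
noise = rng.normal(size=n_obs)
load = np.zeros(n_obs)
for t in range(2, n_obs):
    load[t] = 0.5 * load[t - 1] + 0.3 * load[t - 2] + noise[t]

pacf = pacf_by_regression(load, max_lag=6)
threshold = 2.0 / np.sqrt(n_obs)          # rough significance band
for lag, value in enumerate(pacf, start=1):
    flag = "significant" if abs(value) > threshold else "near zero"
    print(f"lag {lag}: {value: .3f} ({flag})")
```

For an AR(2) process the first two partial autocorrelations should stand out and the remaining ones should sit close to the band, which is the drop-off pattern used to pick P.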
Armed with an ACF and PACF plot, we are now ready to identify the values for P and Q of the ARIMA(P,I,Q)
time series. This is where the art of time series modeling comes into play. Box-Jenkins suggest the
following guidelines for selecting values for P and Q.
◼ If the ACF decays smoothly, then that suggests an AR(P) process. The length of the AR(P) process
is then selected by reviewing the PACF plot. The point right before the PACF bars drop off suggests
the value for P.
◼ If the ACF drops off suddenly instead of decaying slowly, then that suggests an MA(Q) process. In
this case, the value for Q is set equal to the number of ACF bars before the drop off.
◼ If the ACF alternates between negative and positive values but still decays to zero, this suggests
an AR(P) process. The point right before the PACF bars go to zero suggests the value for P.
◼ If the ACF starts to decay after several periods it suggests a mixed ARMA process. The MA(Q) would
be set to right before the ACF starts to decay. The AR(P) would be selected by finding the point
at which the PACF bars start to fall off.
◼ If all the ACF and PACF values are close to zero, then you have an ARMA(0,0) model or random
noise.
◼ If the ACF bars follow a seasonal (including daily) cycle, then seasonal differencing should be
applied.
◼ If the ACF effectively does not decay, then the series needs to be differenced.
Step 3. Estimate. At this step, the initial ARMA(P,Q) model is estimated and the model's in-sample and
out-of-sample performance is evaluated. Depending on the results, the modeler iterates through the steps
until a satisfactory model is found. In fact, there are several statistical tests and selection criteria that
can help decide when a satisfactory model is found. Some of the most common are as follows.
◼ The Dickey-Fuller Test tests the null hypothesis that a unit root is present in an autoregressive
model.12 The simplest version of this test is whether the coefficient of an AR(1) equals 1.0 (i.e.,
unit root). Ultimately what the Dickey-Fuller Test tells you is whether the time series you are
trying to model needs to be differenced.
◼ The Bayesian Information Criterion (BIC) can be used to help select the best value for P and Q.13
The BIC can be written as:
$$\text{BIC} = k \ln(N) - 2 \ln(\hat{L})$$
Here, $\hat{L}$ is the maximized value of the likelihood function of the estimated model. When the
model errors are normally distributed, this is equivalent, up to a constant, to:
$$\text{BIC} = N \ln(\text{SSE}/N) + k \ln(N)$$
The penalty term is the log of the total number of observations (N) times the number of parameters
that are to be estimated (k), which in this case is given by the order of P and Q. As the number of
parameters grows the penalty gets bigger. At the same time, as the order of P and Q grows the Sum
of Squared Errors (SSE) of the estimated model is reduced, which lowers the first term. The preferred
order of P and Q is the one with the smallest BIC, which is reached when the incremental reduction in
the SSE is outweighed by the penalty, ln(N), of having an additional parameter to estimate.
◼ The Akaike Information Criterion (AIC) is an alternative criterion that can be used to help select
the best value for P and Q.14 The AIC is very similar to the BIC, but places a different weight on
the number of parameters. The AIC is defined as:
$$\text{AIC} = 2k - 2 \ln(\hat{L})$$
which, for normally distributed errors, is equivalent up to a constant to $N \ln(\text{SSE}/N) + 2k$.
The art to ARIMA modeling is knowing when to stop. This is where experience coupled with the BIC and
AIC comes into play.
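To make the selection criteria concrete, the sketch below fits AR(P) models of increasing order by least squares and scores each with the Gaussian-error forms of the criteria, N ln(SSE/N) + k ln(N) for the BIC and N ln(SSE/N) + 2k for the AIC. The helper names (`ar_sse`, `bic`, `aic`), the simulated AR(2) series, and the choice to compare orders 1 through 6 on a common sample are all assumptions for illustration.

```python
import numpy as np


def ar_sse(x, p, max_p):
    """Fit an AR(p) by least squares; return (SSE, observations used).

    The first max_p observations are dropped for every order so that
    all candidate models are compared on the same sample.
    """
    n = len(x)
    y = x[max_p:]
    lags = np.column_stack([x[max_p - k:n - k] for k in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(lags, y, rcond=None)
    resid = y - lags @ coef
    return float(resid @ resid), len(y)


def bic(sse, n, k):
    return n * np.log(sse / n) + k * np.log(n)


def aic(sse, n, k):
    return n * np.log(sse / n) + 2 * k


# Simulated AR(2) series (hypothetical data, not utility loads)
rng = np.random.default_rng(3)
n_obs = 3000
noise = rng.normal(size=n_obs)
series = np.zeros(n_obs)
for t in range(2, n_obs):
    series[t] = 0.5 * series[t - 1] + 0.3 * series[t - 2] + noise[t]

max_p = 6
sse_by_order, bic_by_order, aic_by_order = {}, {}, {}
for p in range(1, max_p + 1):
    sse, n_used = ar_sse(series, p, max_p)
    sse_by_order[p] = sse
    bic_by_order[p] = bic(sse, n_used, p)
    aic_by_order[p] = aic(sse, n_used, p)

best_p_bic = min(bic_by_order, key=bic_by_order.get)
best_p_aic = min(aic_by_order, key=aic_by_order.get)
print("order chosen by BIC:", best_p_bic)
print("order chosen by AIC:", best_p_aic)
```

The SSE never rises as terms are added, so the penalty is what stops the search. Because ln(N) exceeds 2 for any realistic sample, the BIC never selects a larger order than the AIC.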
12 Dickey, D. A., Fuller, W. A. (1979), Distribution of the Estimators for Autoregressive Time Series with a Unit Root.
Journal of the American Statistical Association. 74 (366): 427-431
13 Schwarz, Gideon E. (1978). Estimating the Dimension of a Model. Annals of Statistics, 6 (2): 461-464.
14 Akaike, H. (1973). Information Theory and an Extension of the Maximum Likelihood Principle. In Petrov, B. N.
and Csaki, F. (eds.), 2nd International Symposium on Information Theory, Tsahkadsor, Armenia, USSR, September 2-8,
1971, Budapest: Akademiai Kiado, pp. 267-281.
seasonally differenced time series. When this model is estimated, the result is an estimated coefficient
on the AR(1) term of 0.760.
FIGURE 4-20. ACF AND PACF FOR COMMONWEALTH EDISON AVERAGE DAILY LOAD
The forecast model performance is shown in Figure 4-22 through Figure 4-24. The first figure shows that
like the exponential smoothing model, an ARMA model will fit tightly to the in-sample data because of
the heavy reliance on autoregressive terms. Like the triple exponential smoothing model, the resulting
ARMA model captures the day-of-the-week load pattern. The last figure shows the challenge with
modeling frameworks that rely heavily on autoregressive terms. Without new information in the form of
new load measurements, the forecast is fixed throughout the forecast horizon. Since we only seasonally
differenced and did not take first order differences, the forecast does not exhibit the growth trend that
came with the double and triple exponential smoothing models.
FIGURE 4-22. COMMONWEALTH EDISON: SEASONALLY DIFFERENCED, AR(1) MODEL FIT ALL DATA
FIGURE 4-23. COMMONWEALTH EDISON: SEASONALLY DIFFERENCED, AR(1) MODEL ONE MONTH AHEAD
FIGURE 4-24. COMMONWEALTH EDISON: SEASONALLY DIFFERENCED, AR(1) MODEL ONE YEAR AHEAD
4.3.1 Regression
Most statistically based load forecast frameworks use some form of optimization to estimate the
parameters of the model. The most common optimization problem is characterized as finding values for
the model parameters (i.e., the unknowns) such that some objective function is either at a maximum or
minimum value. The following example will illustrate the basic idea. Consider a vector of two variables:
(1) a dependent variable or outcome, and (2) an explanatory variable. Further, assume that the
relationship between the dependent variable and the explanatory variable is roughly linear, which can
be expressed as follows:
$$Y_n = m X_n + b + e_n$$
Here,
$Y_n$ is a vector (a column) of observations on the dependent variable where the elements of the
vector are indexed by (n)
$X_n$ is a vector of observations of the explanatory variable
m is a parameter (a single number) that represents how much the dependent variable will
change with an incremental change in the explanatory variable (i.e., the slope)
b is a parameter (a single number) that represents the intercept
$e_n$ is a vector of model errors
This equation should look familiar to the reader since most of us were taught that Y=mX+b represents a
line. The one element that might look strange is the model error. The following figure illustrates the
relationship between Y, the predicted value from a trend line, and e. In this figure, the dependent
variable (Y) is measured on the vertical axis and the explanatory variable (X) is measured on the
horizontal axis. The red dots on the plot represent the (Yn, Xn) observations. The black line is the trend
line that Microsoft Excel® fit to these data. The green vertical lines represent the difference between
the observed Y and what the Excel trend line predicts for Y. In other words, the green lines represent
the model errors. The error values are computed as:
$$e_n = Y_n - \hat{Y}_n$$
Where $\hat{Y}_n = m X_n + b$ is the value of Y that the trend line predicts for observation (n).
FIGURE 4-25. FITTING A LINE CHART VIEW
Recall, the most common optimization problem is characterized as finding values for the model
parameters (e.g., b and m) such that some objective function is either at a maximum or minimum value.
For the data presented in the figure above, this is equivalent to finding the line (i.e., values for b and m)
that “best” fits the data. Since there are an infinite number of lines that fit these data, we need a rule or
definition for selecting the “best” line. Consider defining “best” as the line that leads to the smallest
sum of errors. In this case, the optimization problem can be expressed as:
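$$\text{Find } b, m \text{ such that } \sum_{n=1}^{N} e_n = \sum_{n=1}^{N} (Y_n - m X_n - b) \text{ is as small as possible}$$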
Here the optimization problem is to find values for the parameters (b and m) such that the sum of
model errors is as small as possible. In this case, the objective function is defined as the sum of the
model errors. The problem with this objective function is that there are several lines that will lead to
the same value of the objective function, namely 0.0. That is, there are several ways of setting b and m
so that the sum of the positive errors exactly offsets the sum of the negative errors. Although this
objective function narrows the set of candidate lines, it does not lead to a single unique
line. As a result, we are still left with how to select the “best” line among the set of lines that lead to the
same value of the objective function.
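A small numerical check makes the non-uniqueness concrete. In the sketch below, the five data points and the list of trial slopes are made up for illustration. For each slope, the intercept is set so that the line passes through the point of means, and the sum of the model errors is then computed.

```python
import numpy as np

# Hypothetical observations (not from the guide)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])


def sum_of_errors(m, b):
    """Sum of the model errors e_n = Y_n - m*X_n - b."""
    return float(np.sum(y - m * x - b))


slopes = [-3.0, 0.0, 2.0, 10.0]
error_sums = []
for m in slopes:
    # choose b so the line passes through (mean of X, mean of Y)
    b = y.mean() - m * x.mean()
    error_sums.append(sum_of_errors(m, b))
    print(f"m = {m:5.1f}, b = {b:7.2f}, sum of errors = {error_sums[-1]:.1e}")
```

Every one of these very different lines produces a sum of errors of zero (up to floating point rounding), so this objective function cannot single out a "best" line.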
The problem with the above objective function is positive errors cancel negative errors. To avoid this
problem, we need to transform the errors in such a way that positive and negative errors do not negate
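each other. One such transformation is to use the absolute value function. This leads to the following
objective function.
$$\text{Find } b, m \text{ such that } \sum_{n=1}^{N} |e_n| = \sum_{n=1}^{N} |Y_n - m X_n - b| \text{ is as small as possible}$$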
Here, the task is to find values for the parameters (b and m) such that the sum of the absolute model
errors is as small as possible. This is an admirable objective function, but it is computationally challenging
to solve. Most optimization algorithms are based on differential calculus. The core idea is that the point
at which a function is at a local minimum (maximum) is where the first order derivative equals 0.0. To
determine whether that point is a minimum or maximum requires evaluating the sign of the second order
derivative. The problem with minimizing the sum of the absolute errors is that the objective function is
not differentiable wherever an error equals zero. As a result, optimization algorithms that rely on computing
the first and second order derivatives cannot be employed directly to find the “best” values for b and m.
The advantage of the absolute value function is that positive and negative errors do not negate each other.
What is needed is a transformation of the errors that preserves this idea but leads to an objective function
that is continuous and everywhere differentiable. It turns out that such a transformation does exist in the
form of squaring the errors. This leads to the following objective function.
$$\text{Find } b, m \text{ such that } \sum_{n=1}^{N} e_n^2 = \sum_{n=1}^{N} (Y_n - m X_n - b)^2 \text{ is as small as possible}$$
Here, the task is to find values for the parameters (b and m) that lead to the smallest sum of squared
model errors. This quadratic objective function not only meets the criteria of being continuous and
everywhere differentiable, but it is also easy to work with. What we mean by “easy to work with” is the
first and second order derivatives are easy to derive by hand. If we can derive the derivatives by hand,
then it is easy to write a computer program that computes the derivatives.
Define the optimization problem as:
$$\text{Minimize w.r.t. } b, m; \quad L = \sum_{n=1}^{N} e_n^2 = \sum_{n=1}^{N} (Y_n - b - m X_n)^2$$
Here, we want to minimize the value of the objective function (L) with respect to (w.r.t.) the unknown
parameters b and m.
To find the estimated value for the intercept term (b), we take the first order derivative with respect to
(b) which is written as follows:
$$\frac{\partial L}{\partial b} = \sum_{n=1}^{N} 2 (Y_n - b - m X_n)(-1) = 0$$
After some rearranging, we have the equation for the optimal value of b as:
$$\hat{b} = \frac{\sum_{n=1}^{N} Y_n}{N} - \hat{m} \frac{\sum_{n=1}^{N} X_n}{N} = \bar{Y} - \hat{m} \bar{X}$$
Here, the estimated value for the intercept ($\hat{b}$) is the difference between the mean of Y ($\bar{Y}$) and the mean
of X ($\bar{X}$) times the estimated slope ($\hat{m}$). One way of thinking about ($\hat{b}$) is it represents the portion of the
mean of Y that is not explained by the mean of X times its weight or slope. In the extreme case where X
does not explain any movement in Y (i.e., $\hat{m} = 0$), then ($\hat{b}$) is equal to the mean of Y ($\bar{Y}$).
Next, we find the estimated value for the slope (m) by taking the first order derivative with respect to (m),
which is written as follows:
$$\frac{\partial L}{\partial m} = \sum_{n=1}^{N} 2 (Y_n - b - m X_n)(-X_n) = 0$$
Substituting ($\bar{Y} - m \bar{X}$) in for b in the above equation leads to the solution for the estimated value of m
as:
$$\hat{m} = \frac{\sum_{n=1}^{N} (Y_n - \bar{Y})(X_n - \bar{X})}{\sum_{n=1}^{N} (X_n - \bar{X})(X_n - \bar{X})}$$
Here, the estimated slope is equal to the variation of X and Y around their respective means over the
variation of X around its mean squared. If X and Y are positively correlated, then the value in the
numerator will be positive and the estimated value for m will be positive. If X and Y are negatively
correlated, then the value in the numerator will be negative and the estimated value for m will be
negative. If the joint variation of Y and X is greater than the variation of X with itself, then the estimated
value for m will be greater than 1.0 in absolute value. Essentially, the estimated value for m captures the
extent to which variations in X around its mean explains or is correlated with variations of Y around its
mean.
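The two closed-form expressions can be checked numerically. The sketch below is an illustration under stated assumptions: the function name `ols_line` and the five data points are hypothetical, and NumPy's `polyfit` is used only as an independent least squares fit to compare against.

```python
import numpy as np


def ols_line(x, y):
    """Closed-form least squares estimates of slope (m) and intercept (b)."""
    x_bar, y_bar = x.mean(), y.mean()
    # joint variation of X and Y over the variation of X with itself
    m_hat = np.sum((y - y_bar) * (x - x_bar)) / np.sum((x - x_bar) ** 2)
    # portion of the mean of Y not explained by the mean of X
    b_hat = y_bar - m_hat * x_bar
    return m_hat, b_hat


# Hypothetical observations (not from the guide)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

m_hat, b_hat = ols_line(x, y)
slope_check, intercept_check = np.polyfit(x, y, 1)

print(f"closed form: m = {m_hat:.4f}, b = {b_hat:.4f}")
print(f"np.polyfit : m = {slope_check:.4f}, b = {intercept_check:.4f}")
```

Both routes give the same line, which is what one would expect if the quadratic objective function has a single optimal solution.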
The beauty of the quadratic objective function is that there is one and only one optimal solution. This
means there are no other values for b and m that will lead to a smaller objective function value. As it
turns out, the process of finding the parameters that minimizes the sum of squared errors has a name
first coined in the scientific literature by Adrien-Marie Legendre as least squares. Later, Carl Friedrich
Gauss laid claim to the invention of least squares and an academic war ensued. It was not until Francis
Galton applied the least squares technique to a study of the size of the offspring of sweet pea seeds that the
term regression was used. In his 1886 paper, Regression Towards Mediocrity in Hereditary Stature, he
wrote:
“It appeared from these experiments that the offspring did not tend to resemble their parent seeds in
size, but to be always more mediocre than they – to be smaller than the parents, if the parents were
large; to be larger than the parents, if the parents were small.”
Put differently, the size of the offspring regressed toward the mean. In general, when a forecast model is
referred to as a regression, it means the unknown parameters are estimated using the method of least
squares described by Legendre and Gauss.
Unfortunately, there are many variations on the theme in the load forecasting literature, which can cause
confusion. Linear regression and linear least squares are two names for the same thing. Simple linear
regression usually means there is one explanatory variable in addition to the intercept term. By the way,
the intercept term is an explanatory variable. A multivariate linear regression allows for multiple
explanatory variables. Simple linear regression and multivariate linear regression are the same thing with
the same least squares objective function.
Where things become confused in the machine learning literature is around the labels “regression” and
“linear”. A large portion of the machine learning literature is written around solving classification
problems where the dependent variable or the thing that is forecasted takes on one of a finite set of
values, categories, or classes. For example, a credit card transaction is fraudulent or not. When machine
learning is applied to continuous data, the problem is labeled as a “regression” problem. In this case, the
outcome that is returned is in the form of an average value, which harkens back to Galton’s description.
The point is, when the term regression is used in machine learning literature, it does not necessarily mean
the method of least squares.
The term “linear” is perhaps the most misunderstood term in both the load forecasting and the machine
learning literature. In the context of a linear regression or a linear multivariate regression, the term
“linear” refers to the parameters and not to the explanatory variables. The confusion arises when the
author mistakenly infers that the term “linear” refers to the explanatory variables. An example of a linear
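regression that is linear in the parameters, but nonlinear in the explanatory variables, follows.
$$\text{Load}_{d,i} = \beta_0 + \beta_1 \text{Temperature}_{d,i} + \beta_2 \text{Temperature}_{d,i}^2 + \beta_3 \text{Temperature}_{d,i}^3 + e_{d,i}$$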
Here, load on day (d) and time interval (i) is modeled as a third order polynomial in temperature. This
equation is highly nonlinear in temperature, but linear in the model parameters (𝛽0 , 𝛽1 , 𝛽2 , 𝛽3 ). The
fact that this equation is linear in the model parameters means the method of least squares will lead to a
unique solution for the four parameters by setting the first order derivatives equal to 0.0.
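By contrast, the following equation is nonlinear in the model parameters.
$$\text{Load}_{d,i} = \beta_0 \beta_1 + \frac{\beta_2 \text{Temperature}_{d,i}}{\beta_3 \text{WindSpeed}_{d,i}} + \beta_4^{\beta_5} \text{Temperature}_{d,i} + e_{d,i}$$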
Here, we introduce three types of nonlinearities in the parameters. The first is a simple multiplication of
β0 and β1 . The reason this is a problem for least squares is there are infinite combinations of β0 and
β1 that, when multiplied together, give the same result. In other words, there is no one unique solution.
The second is a ratio of β2 and β3 . Again, there is no one unique solution. Finally, we have one
parameter β4 raised to the power of a second parameter β5 . This also leads to multiple solutions.
The problem of “linearity” is not a problem with the explanatory variables, which is commonly the knock
against linear regression posited in the machine learning literature, but rather with the model parameters.
Fortunately, there are an unlimited number of ways of expressing the nonlinear response between loads
and temperatures that do not violate the need for the equation to be linear in the model parameters. The
fact that we can construct forecast models that are nonlinear in the explanatory variables is the key to
powerful load forecasting models.
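As a hedged illustration of a model that is nonlinear in temperature but linear in the parameters, the sketch below fits a third order polynomial in temperature by ordinary least squares. The temperature range, the "true" coefficients, and the noise level are invented for the example, and `cubic_design` is a hypothetical helper name.

```python
import numpy as np


def cubic_design(temperature):
    """Explanatory variables: intercept, T, T^2, T^3 (nonlinear in T)."""
    t = np.asarray(temperature, dtype=float)
    return np.column_stack([np.ones_like(t), t, t ** 2, t ** 3])


# Hypothetical data: a curved load response to temperature
rng = np.random.default_rng(7)
temperature = rng.uniform(0.0, 100.0, size=500)
true_beta = np.array([9000.0, -150.0, 1.5, 0.002])    # assumed values
X = cubic_design(temperature)
load = X @ true_beta + rng.normal(scale=10.0, size=temperature.size)

# The model is linear in the four parameters, so least squares applies
beta_hat, *_ = np.linalg.lstsq(X, load, rcond=None)
fitted = X @ beta_hat

print("estimated parameters:", beta_hat)
print("largest gap to the true curve:",
      float(np.max(np.abs(fitted - X @ true_beta))))
```

The curvature lives entirely in the constructed explanatory variables. The estimation step is the same least squares problem as for a straight line, only with four columns instead of two.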
Why least squares? Turns out the solution to the least squares problem has some very attractive features
that can be summarized by the acronym BLUE. Formally, the Gauss-Markov Theorem states that in a
linear regression model in which the errors have expectation zero, are uncorrelated, and have equal
variances, the Best Linear Unbiased Estimator (BLUE) of the model parameters is given by the Ordinary
Least Squares (OLS) estimator.15 Here,
◼ Best means the model parameter estimates have the lowest variance among all linear unbiased estimators, which means test statistics
that rely on measures of distance between a null hypothesis and the estimated parameter will be
as tight as possible.
◼ Linear is the requirement that the equation estimated is linear in the model parameters.
◼ Unbiased means the estimated model parameters will be centered around the true model
coefficients.
◼ Estimator means the value that is derived is an estimate of the true parameter.
◼ Ordinary in Ordinary Least Squares means each observation is given equal weight.
From a practical point of view, this means the estimated parameters we obtain using least squares are
very good estimates of the true model coefficients. In the age of desktop computing where estimating
the parameters of a linear regression takes next to zero effort, this may not seem like a big deal. But in
the pre-computer age, where estimating the model parameters would take days of effort and a ream of
paper, knowing that least squares was the right thing to do was extremely comforting.
Another benefit of linear regression is that it is relatively easy to interpret the results. Let us return to the
general conclusion of Francis Galton that the offspring tended to regress toward the mean. Consider the
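following linear regression.
$$\text{Load}_d = \beta_1 \text{Monday}_d + \beta_2 \text{Tuesday}_d + \beta_3 \text{Wednesday}_d + \beta_4 \text{Thursday}_d + \beta_5 \text{Friday}_d + \beta_6 \text{Saturday}_d + \beta_7 \text{Sunday}_d + e_d$$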
Here, the Load on day (d) is given by the weighted sum of seven day-of-the-week binary variables:
Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday. In this case, the binary variables
take on one of two values—1.0 if the day happens to be the day represented by the binary variable,
otherwise 0.0. For example:
◼ April 23, 2018 is a Monday. For this day, the Monday binary variable has a value of 1.0. The
remaining six binary variables have a value of 0.0.
◼ April 24, 2018 is a Tuesday. For this day, the Tuesday binary variable has a value of 1.0. The
remaining six binary variables have a value of 0.0.
15 The Gauss-Markov Theorem is named after Carl Friedrich Gauss and Andrey Markov.
◼ April 25, 2018 is a Wednesday. For this day, the Wednesday binary variable has a value of 1.0.
The remaining six binary variables have a value of 0.0.
◼ April 26, 2018 is a Thursday. For this day, the Thursday binary variable has a value of 1.0. The
remaining six binary variables have a value of 0.0.
◼ April 27, 2018 is a Friday. For this day, the Friday binary variable has a value of 1.0. The remaining
six binary variables have a value of 0.0.
◼ April 28, 2018 is a Saturday. For this day, the Saturday binary variable has a value of 1.0. The
remaining six binary variables have a value of 0.0.
◼ April 29, 2018 is a Sunday. For this day, the Sunday binary variable has a value of 1.0. The
remaining six binary variables have a value of 0.0.
In effect, each day in the estimation has one and only one day-of-the-week binary variable with a value of
1.0. The other day-of-the-week binary variables will have a value of 0.0.
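To estimate the parameters for this model, we write the least squares optimization problem as follows:
$$\text{Minimize w.r.t. } \beta_1, \ldots, \beta_7; \quad L = \sum_{d=1}^{D} (\text{Load}_d - \beta_1 \text{Monday}_d - \beta_2 \text{Tuesday}_d - \beta_3 \text{Wednesday}_d - \beta_4 \text{Thursday}_d - \beta_5 \text{Friday}_d - \beta_6 \text{Saturday}_d - \beta_7 \text{Sunday}_d)^2$$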
Here, the least squares optimization problem is to find the parameters (β1 , β2 , β3 , β4 , β5 , β6 , β7 ) that
minimize the value of the objective function, L. To do this, we set the first order derivatives with respect
to each parameter equal to 0. The first order equations look like:
δL
= ∑D
d=1 2(Loadd − β1 Mondayd − β2 Tuesdayd − β3 Wednesdayd − β4 Thursdayd −
δβ1
β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Mondayd )=0
δL
= ∑D
d=1 2(Loadd − β1 Mondayd − β2 Tuesdayd − β3 Wednesdayd − β4 Thursdayd −
δβ2
β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Tuesdayd )=0
δL
= ∑D
d=1 2(Loadd − β1 Mondayd − β2 Tuesdayd − β3 Wednesdayd − β4 Thursdayd −
δβ3
β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Wednesdayd )=0
δL
= ∑D
d=1 2(Loadd − β1 Mondayd − β2 Tuesdayd − β3 Wednesdayd − β4 Thursdayd −
δβ4
β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Thursdayd )=0
A Practitioner’s Guide to Short-term Load Forecast Modeling Load Forecast Methods |4-50
δL
= ∑D
d=1 2(Loadd − β1 Mondayd − β2 Tuesdayd − β3 Wednesdayd − β4 Thursdayd −
δβ5
β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Fridayd )=0
δL
= ∑D
d=1 2(Loadd − β1 Mondayd − β2 Tuesdayd − β3 Wednesdayd − β4 Thursdayd −
δβ6
β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Saturdayd )=0
δL
= ∑D
d=1 2(Loadd − β1 Mondayd − β2 Tuesdayd − β3 Wednesdayd − β4 Thursdayd −
δβ7
β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Sundayd )=0
This seems like a mess, but let us work through the equation for the Monday coefficient (β1 ). First
multiply through by the term (−Mondayd ). This gives us:
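$$\frac{\partial L}{\partial \beta_1} = \sum_{d=1}^{D} 2 (-\text{Load}_d \text{Monday}_d + \beta_1 \text{Monday}_d^2 + \beta_2 \text{Tuesday}_d \text{Monday}_d + \beta_3 \text{Wednesday}_d \text{Monday}_d + \beta_4 \text{Thursday}_d \text{Monday}_d + \beta_5 \text{Friday}_d \text{Monday}_d + \beta_6 \text{Saturday}_d \text{Monday}_d + \beta_7 \text{Sunday}_d \text{Monday}_d) = 0$$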
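Even more of a mess, but we can clean things up quite a bit by noting that the product of the Monday
binary variable with any other day-of-the-week binary variable is 0.0 for every day (d), because only one
of the two can equal 1.0 on a given day. The cross-product terms drop out, leaving:
$$\frac{\partial L}{\partial \beta_1} = \sum_{d=1}^{D} 2 (-\text{Load}_d \text{Monday}_d + \beta_1 \text{Monday}_d^2) = 0$$
$$\beta_1 \sum_{d=1}^{D} \text{Monday}_d^2 = \sum_{d=1}^{D} \text{Load}_d \text{Monday}_d$$
Pulling the sum on the left-hand side over to the right gives us: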
$$\hat{\beta}_1 = \frac{\sum_{d=1}^{D} \text{Load}_d \text{Monday}_d}{\sum_{d=1}^{D} \text{Monday}_d^2}$$
The numerator contains the sum of all the loads that fell on a Monday; that is, the days when the Monday
binary takes on a value of 1.0. The denominator is the number of Mondays in the estimation period. As
a result, the least squares estimate of the Monday coefficient is the average Monday load. The estimated
coefficients for the other six day-of-the-week coefficients are presented below.
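$$\hat{\beta}_2 = \frac{\sum_{d=1}^{D} \text{Load}_d \text{Tuesday}_d}{\sum_{d=1}^{D} \text{Tuesday}_d^2}$$
$$\hat{\beta}_3 = \frac{\sum_{d=1}^{D} \text{Load}_d \text{Wednesday}_d}{\sum_{d=1}^{D} \text{Wednesday}_d^2}$$
$$\hat{\beta}_4 = \frac{\sum_{d=1}^{D} \text{Load}_d \text{Thursday}_d}{\sum_{d=1}^{D} \text{Thursday}_d^2}$$
$$\hat{\beta}_5 = \frac{\sum_{d=1}^{D} \text{Load}_d \text{Friday}_d}{\sum_{d=1}^{D} \text{Friday}_d^2}$$
$$\hat{\beta}_6 = \frac{\sum_{d=1}^{D} \text{Load}_d \text{Saturday}_d}{\sum_{d=1}^{D} \text{Saturday}_d^2}$$
$$\hat{\beta}_7 = \frac{\sum_{d=1}^{D} \text{Load}_d \text{Sunday}_d}{\sum_{d=1}^{D} \text{Sunday}_d^2}$$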
In this case, where the regression model includes the set of seven day-of-the-week binary variables, the
least squares solution is the average load by day-of-the-week. In fact, you would achieve the same
solution if you were to sort the daily load data by day-of-the-week and then compute the average by day-
of-the-week. The estimated coefficients of the regression are the mean values. Understanding that linear
regression is a way of estimating the average value of loads for different subsets of the data is important
for building powerful forecasting models. With this understanding, the modeler’s task is to construct
explanatory variables that create subsets of data that capture the key load variations across days, seasons,
solar and weather patterns, and special event days. The trick to powerful forecast models is the selection
of explanatory variables.
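The equivalence between the day-of-the-week regression and simple averages is easy to verify numerically. The following sketch uses simulated daily loads (the base load levels and noise are assumptions, not utility data) and compares the least squares coefficients against the average load computed directly for each day of the week.

```python
import numpy as np

# Hypothetical daily loads: 20 weeks, index 0 = Monday ... 6 = Sunday
rng = np.random.default_rng(1)
n_days = 7 * 20
day_of_week = np.arange(n_days) % 7
base_load = np.array([100.0, 105.0, 106.0, 104.0, 102.0, 90.0, 85.0])
load = base_load[day_of_week] + rng.normal(scale=3.0, size=n_days)

# Seven binary variables: column j is 1.0 when the day is day j, else 0.0
binaries = (day_of_week[:, None] == np.arange(7)).astype(float)

# Least squares estimates of the seven coefficients (no intercept term)
beta_hat, *_ = np.linalg.lstsq(binaries, load, rcond=None)

# Average load by day-of-the-week, computed directly
daily_means = np.array([load[day_of_week == j].mean() for j in range(7)])

for name, coef, mean in zip(
        ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"],
        beta_hat, daily_means):
    print(f"{name}: coefficient = {coef:8.3f}, average = {mean:8.3f}")
```

The two columns match to rounding error, which is the sense in which each regression coefficient here is an average over a subset of the data.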
The Intercept Term. In the original definition of an equation, Y = mX + b, we include an intercept term.
To make this obvious, rewrite the equation as $Y = mX + b \cdot \text{Intercept}$. Here, the explanatory variable,
Intercept, takes on a value of 1.0 for all observations.
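Let us carry this idea of having one of the explanatory variables be given by the intercept variable in the
day-of-the-week regression model presented above. Start by adding the intercept variable to our
day-of-the-week regression model as follows:
$$\text{Load}_d = \beta_0 \text{Intercept}_d + \beta_1 \text{Monday}_d + \beta_2 \text{Tuesday}_d + \beta_3 \text{Wednesday}_d + \beta_4 \text{Thursday}_d + \beta_5 \text{Friday}_d + \beta_6 \text{Saturday}_d + \beta_7 \text{Sunday}_d + e_d$$
Here, the explanatory variable called Intercept has a value of 1.0 for all days (d) in the estimation period
(D). To estimate the parameters for this model, we write the least squares optimization problem as
follows:
$$\text{Minimize w.r.t. } \beta_0, \ldots, \beta_7; \quad L = \sum_{d=1}^{D} (\text{Load}_d - \beta_0 \text{Intercept}_d - \beta_1 \text{Monday}_d - \beta_2 \text{Tuesday}_d - \beta_3 \text{Wednesday}_d - \beta_4 \text{Thursday}_d - \beta_5 \text{Friday}_d - \beta_6 \text{Saturday}_d - \beta_7 \text{Sunday}_d)^2$$
Setting the first order derivative with respect to each parameter equal to 0 gives: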
D
δL
= ∑ 2(Loadd − β0 Intercept d − β1 Mondayd − β2 Tuesdayd − β3 Wednesdayd
δβ0 d=1
− β4 Thursdayd − β5 Fridayd − β6 Saturdayd
− β7 Sundayd )(−Intecept d ) = 0
δL
= ∑D
d=1 2(Loadd − β0 Intercept d − β1 Mondayd − β2 Tuesdayd − β3 Wednesdayd −
δβ1
β4 Thursdayd − β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Mondayd )=0
δL
= ∑D
d=1 2(Loadd − β0 Intercept d − β1 Mondayd − β2 Tuesdayd − β3 Wednesdayd −
δβ2
β4 Thursdayd − β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Tuesdayd )=0
δL
= ∑D
d=1 2(Loadd − β0 Intercept d − β1 Mondayd − β2 Tuesdayd − β3 Wednesdayd −
δβ3
β4 Thursdayd − β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Wednesdayd )=0
A Practitioner’s Guide to Short-term Load Forecast Modeling Load Forecast Methods |4-53
δL
= ∑D
d=1 2(Loadd − β0 Intercept d − β1 Mondayd − β2 Tuesdayd − β3 Wednesdayd −
δβ4
β4 Thursdayd − β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Thursdayd )=0
δL
= ∑D
d=1 2(Loadd − β0 Intercept d − β1 Mondayd − β2 Tuesdayd − β3 Wednesdayd −
δβ5
β4 Thursdayd − β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Fridayd )=0
δL
= ∑D
d=1 2(Loadd − β0 Intercept d − β1 Mondayd − β2 Tuesdayd − β3 Wednesdayd −
δβ6
β4 Thursdayd − β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Saturdayd )=0
δL
= ∑D
d=1 2(Loadd − β0 Intercept d − β1 Mondayd − β2 Tuesdayd − β3 Wednesdayd −
δβ7
β4 Thursdayd − β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Sundayd )=0
Solving for the estimated coefficient on the intercept variable, we begin by taking the first order derivative
with respect to (β0 ) and multiplying through by the intercept variable. This gives:
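$$\beta_0 \sum_{d=1}^{D} \text{Intercept}_d^2 = \sum_{d=1}^{D} (\text{Load}_d - \beta_1 \text{Monday}_d - \beta_2 \text{Tuesday}_d - \beta_3 \text{Wednesday}_d - \beta_4 \text{Thursday}_d - \beta_5 \text{Friday}_d - \beta_6 \text{Saturday}_d - \beta_7 \text{Sunday}_d) \text{Intercept}_d$$
Because the intercept variable equals 1.0 for every day, the sum on the left-hand side equals D, which gives:
$$\hat{\beta}_0 = \frac{1}{D} \sum_{d=1}^{D} (\text{Load}_d - \hat{\beta}_1 \text{Monday}_d - \hat{\beta}_2 \text{Tuesday}_d - \hat{\beta}_3 \text{Wednesday}_d - \hat{\beta}_4 \text{Thursday}_d - \hat{\beta}_5 \text{Friday}_d - \hat{\beta}_6 \text{Saturday}_d - \hat{\beta}_7 \text{Sunday}_d)$$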
Since one of the day-of-the-week binary variables takes on a value of 1.0 for every observation in the
estimation data set, we need to know what the values of the other model parameters are before we can
solve for the coefficient on the intercept term.
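$$\sum_{d=1}^{D} 2 (-\text{Load}_d \text{Monday}_d + \beta_0 \text{Intercept}_d \text{Monday}_d + \beta_1 \text{Monday}_d^2) = 0$$
$$\beta_1 = \frac{\sum_{d=1}^{D} (\text{Load}_d \text{Monday}_d - \beta_0 \text{Intercept}_d \text{Monday}_d)}{\sum_{d=1}^{D} \text{Monday}_d^2}$$
$$\hat{\beta}_1 = \frac{\sum_{d=1}^{D} (\text{Load}_d \text{Monday}_d - \hat{\beta}_0 \text{Monday}_d)}{\sum_{d=1}^{D} \text{Monday}_d^2}$$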
In a similar fashion the estimated values for the other Day-of-the-Week coefficients look like:
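$$\hat{\beta}_2 = \frac{\sum_{d=1}^{D} (\text{Load}_d \text{Tuesday}_d - \hat{\beta}_0 \text{Tuesday}_d)}{\sum_{d=1}^{D} \text{Tuesday}_d^2}$$
$$\hat{\beta}_3 = \frac{\sum_{d=1}^{D} (\text{Load}_d \text{Wednesday}_d - \hat{\beta}_0 \text{Wednesday}_d)}{\sum_{d=1}^{D} \text{Wednesday}_d^2}$$
$$\hat{\beta}_4 = \frac{\sum_{d=1}^{D} (\text{Load}_d \text{Thursday}_d - \hat{\beta}_0 \text{Thursday}_d)}{\sum_{d=1}^{D} \text{Thursday}_d^2}$$
$$\hat{\beta}_5 = \frac{\sum_{d=1}^{D} (\text{Load}_d \text{Friday}_d - \hat{\beta}_0 \text{Friday}_d)}{\sum_{d=1}^{D} \text{Friday}_d^2}$$
$$\hat{\beta}_6 = \frac{\sum_{d=1}^{D} (\text{Load}_d \text{Saturday}_d - \hat{\beta}_0 \text{Saturday}_d)}{\sum_{d=1}^{D} \text{Saturday}_d^2}$$
$$\hat{\beta}_7 = \frac{\sum_{d=1}^{D} (\text{Load}_d \text{Sunday}_d - \hat{\beta}_0 \text{Sunday}_d)}{\sum_{d=1}^{D} \text{Sunday}_d^2}$$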
Given these equations, let us back into the value for the coefficient on the intercept variable starting with:
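$$\beta_0 \sum_{d=1}^{D} \text{Intercept}_d^2 = \sum_{d=1}^{D} (\text{Load}_d - \beta_1 \text{Monday}_d - \beta_2 \text{Tuesday}_d - \beta_3 \text{Wednesday}_d - \beta_4 \text{Thursday}_d - \beta_5 \text{Friday}_d - \beta_6 \text{Saturday}_d - \beta_7 \text{Sunday}_d) \text{Intercept}_d$$
$$\beta_0 D = \sum_{d=1}^{D} \text{Load}_d - \beta_1 \sum_{d=1}^{D} \text{Monday}_d - \beta_2 \sum_{d=1}^{D} \text{Tuesday}_d - \beta_3 \sum_{d=1}^{D} \text{Wednesday}_d - \beta_4 \sum_{d=1}^{D} \text{Thursday}_d - \beta_5 \sum_{d=1}^{D} \text{Friday}_d - \beta_6 \sum_{d=1}^{D} \text{Saturday}_d - \beta_7 \sum_{d=1}^{D} \text{Sunday}_d$$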
Note, the sum across all days (D) in the estimation period of the Monday binary variable returns the total
number of Mondays in the estimation period. We will call this number Mondays. We can do the same
for the other days-of-the-week. This simplifies the equation to:
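$$\beta_0 D = \sum_{d=1}^{D} \text{Load}_d - \beta_1 \text{Mondays} - \beta_2 \text{Tuesdays} - \beta_3 \text{Wednesdays} - \beta_4 \text{Thursdays} - \beta_5 \text{Fridays} - \beta_6 \text{Saturdays} - \beta_7 \text{Sundays}$$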
The next step is to substitute in the values for each of the day-of-the-week parameters that we get from
the first order derivatives. This gives:
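$$\begin{aligned}
\beta_0 D = \sum_{d=1}^{D} \text{Load}_d &- \frac{\sum_{d=1}^{D} (\text{Load}_d \text{Monday}_d - \hat{\beta}_0 \text{Monday}_d)}{\sum_{d=1}^{D} \text{Monday}_d^2} \, \text{Mondays} \\
&- \frac{\sum_{d=1}^{D} (\text{Load}_d \text{Tuesday}_d - \hat{\beta}_0 \text{Tuesday}_d)}{\sum_{d=1}^{D} \text{Tuesday}_d^2} \, \text{Tuesdays} \\
&- \frac{\sum_{d=1}^{D} (\text{Load}_d \text{Wednesday}_d - \hat{\beta}_0 \text{Wednesday}_d)}{\sum_{d=1}^{D} \text{Wednesday}_d^2} \, \text{Wednesdays} \\
&- \frac{\sum_{d=1}^{D} (\text{Load}_d \text{Thursday}_d - \hat{\beta}_0 \text{Thursday}_d)}{\sum_{d=1}^{D} \text{Thursday}_d^2} \, \text{Thursdays} \\
&- \frac{\sum_{d=1}^{D} (\text{Load}_d \text{Friday}_d - \hat{\beta}_0 \text{Friday}_d)}{\sum_{d=1}^{D} \text{Friday}_d^2} \, \text{Fridays} \\
&- \frac{\sum_{d=1}^{D} (\text{Load}_d \text{Saturday}_d - \hat{\beta}_0 \text{Saturday}_d)}{\sum_{d=1}^{D} \text{Saturday}_d^2} \, \text{Saturdays} \\
&- \frac{\sum_{d=1}^{D} (\text{Load}_d \text{Sunday}_d - \hat{\beta}_0 \text{Sunday}_d)}{\sum_{d=1}^{D} \text{Sunday}_d^2} \, \text{Sundays}
\end{aligned}$$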
We can further simplify this expression by recognizing the summations in the denominators count the
number of days of each day-of-the-week. Cancelling out the number of days by day-of-the-week results
in:
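$$\begin{aligned}
\beta_0 D = \sum_{d=1}^{D} \text{Load}_d &- \sum_{d=1}^{D} \text{Load}_d \text{Monday}_d + \beta_0 \sum_{d=1}^{D} \text{Monday}_d - \sum_{d=1}^{D} \text{Load}_d \text{Tuesday}_d + \beta_0 \sum_{d=1}^{D} \text{Tuesday}_d \\
&- \sum_{d=1}^{D} \text{Load}_d \text{Wednesday}_d + \beta_0 \sum_{d=1}^{D} \text{Wednesday}_d - \sum_{d=1}^{D} \text{Load}_d \text{Thursday}_d + \beta_0 \sum_{d=1}^{D} \text{Thursday}_d \\
&- \sum_{d=1}^{D} \text{Load}_d \text{Friday}_d + \beta_0 \sum_{d=1}^{D} \text{Friday}_d - \sum_{d=1}^{D} \text{Load}_d \text{Saturday}_d + \beta_0 \sum_{d=1}^{D} \text{Saturday}_d \\
&- \sum_{d=1}^{D} \text{Load}_d \text{Sunday}_d + \beta_0 \sum_{d=1}^{D} \text{Sunday}_d
\end{aligned}$$
Now collecting terms leads to:
$$\begin{aligned}
&\beta_0 (D - \text{Mondays} - \text{Tuesdays} - \text{Wednesdays} - \text{Thursdays} - \text{Fridays} - \text{Saturdays} - \text{Sundays}) \\
&\quad = \sum_{d=1}^{D} \text{Load}_d - \sum_{d=1}^{D} \text{Load}_d (\text{Monday}_d + \text{Tuesday}_d + \text{Wednesday}_d + \text{Thursday}_d + \text{Friday}_d + \text{Saturday}_d + \text{Sunday}_d)
\end{aligned}$$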
One last rearrangement and we have the expression for the least squares estimate for the coefficient on
the intercept explanatory variable. However, we have a significant problem. The value of the total
number of days in the estimation period (D) exactly equals the sum of the total number of Mondays,
Tuesdays, Wednesdays, Thursdays, Fridays, Saturdays, and Sundays. In other words, the last step requires
dividing by zero which does not work.
The Problem of Multicollinearity. Where did things go wrong? Notice that to solve for the estimated
value for each of the parameters on the day-of-the-week binary variables, we need to know the estimated
value for the coefficient on the intercept variable. However, to solve for the value for the coefficient on
the intercept variable, we need to know the estimated values for the seven day-of-the-week coefficients.
This is a problem. Effectively, we have eight unknowns (the coefficients that need to be estimated) and
really seven equations. Although we have written down eight first order derivatives, one of these eight
equations can always be written as a combination of the other seven first-order derivatives. With eight
unknowns and seven equations, there is no unique solution. This problem is referred to in the literature
as multicollinearity. It arose when we introduced the intercept term: for every observation (d) in the
estimation period, the intercept variable is a linear combination of the day-of-the-week binary
variables. To see this, consider the following seven days of data. If for each day you sum the values of
the seven day-of-the-week variables, you obtain 1.0, which is exactly the value of the intercept variable.
In other words, the intercept variable is a linear combination of the seven day-of-the-week binary
variables.
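The linear dependence can be checked numerically. The following is a minimal sketch, assuming a hypothetical 28-day estimation period that starts on a Monday; the variable names are illustrative and not taken from the guide. It compares the rank of the design matrix with and without the Sunday binary variable.

```python
import numpy as np

# Hypothetical estimation period: 28 consecutive days starting on a Monday.
n_days = 28
day_of_week = np.arange(n_days) % 7          # 0 = Monday, ..., 6 = Sunday

# Seven day-of-the-week binary variables, one column per day type.
dow_binaries = (day_of_week[:, None] == np.arange(7)).astype(float)
intercept = np.ones((n_days, 1))

# On every day the seven binaries sum to exactly 1.0, the intercept value.
row_sums = dow_binaries.sum(axis=1)

# Design matrix with the intercept plus all seven binaries (eight columns).
X_full = np.hstack([intercept, dow_binaries])
rank_full = np.linalg.matrix_rank(X_full)

# Dropping the Sunday binary leaves seven independent columns.
X_dropped = X_full[:, :-1]
rank_dropped = np.linalg.matrix_rank(X_dropped)

print(rank_full, X_full.shape[1])        # rank is below the column count
print(rank_dropped, X_dropped.shape[1])  # rank equals the column count
```

With eight columns but only seven independent ones, the normal equations have no unique solution, which is the numerical counterpart of the divide-by-zero step above.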
In the pre-computer and early computer days, multicollinearity was the bane of forecast modelers
because they would do all this work only to find out there was no unique solution. With advanced
computing power, multicollinearity is no longer an issue for solving for a solution because most least
squares solution algorithms are programmed to search for multicollinearity and take steps to remove one
or more explanatory variables until the problem goes away.
It would be better if the modeler understood the problem of multicollinearity and took steps to avoid the
problem in the first place. How would we avoid this issue? Consider dropping the Sunday binary variable
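from the day-of-the-week regression equation. This gives us the following model:
$$
\begin{aligned}
\text{Load}_d = {} & \beta_0\,\text{Intercept}_d + \beta_1\,\text{Monday}_d + \beta_2\,\text{Tuesday}_d + \beta_3\,\text{Wednesday}_d \\
& + \beta_4\,\text{Thursday}_d + \beta_5\,\text{Friday}_d + \beta_6\,\text{Saturday}_d + e_d
\end{aligned}
$$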
By dropping the Sunday binary variable, the intercept variable is no longer a linear combination of the
remaining day-of-the-week binary variables. Now see if by dropping the Sunday binary variable, we have
eliminated the problem of multicollinearity.
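To estimate the parameters for this model, we write the least squares optimization problem as follows:
$$
\begin{aligned}
\text{Minimize w.r.t. } \beta:\quad L = \sum_{d=1}^{D} \big( & \text{Load}_d - \beta_0\,\text{Intercept}_d - \beta_1\,\text{Monday}_d - \beta_2\,\text{Tuesday}_d - \beta_3\,\text{Wednesday}_d \\
& - \beta_4\,\text{Thursday}_d - \beta_5\,\text{Friday}_d - \beta_6\,\text{Saturday}_d \big)^2
\end{aligned}
$$
Setting the first-order derivative with respect to the intercept coefficient equal to zero gives:
$$
\begin{aligned}
\frac{\partial L}{\partial \beta_0} = \sum_{d=1}^{D} 2 \big( & \text{Load}_d - \beta_0\,\text{Intercept}_d - \beta_1\,\text{Monday}_d - \beta_2\,\text{Tuesday}_d - \beta_3\,\text{Wednesday}_d \\
& - \beta_4\,\text{Thursday}_d - \beta_5\,\text{Friday}_d - \beta_6\,\text{Saturday}_d \big)(-\text{Intercept}_d) = 0
\end{aligned}
$$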
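Expanding gives us:
$$
\begin{aligned}
D\beta_0 = \sum_{d=1}^{D} \text{Load}_d
&- \beta_1 \sum_{d=1}^{D} \text{Monday}_d - \beta_2 \sum_{d=1}^{D} \text{Tuesday}_d - \beta_3 \sum_{d=1}^{D} \text{Wednesday}_d \\
&- \beta_4 \sum_{d=1}^{D} \text{Thursday}_d - \beta_5 \sum_{d=1}^{D} \text{Friday}_d - \beta_6 \sum_{d=1}^{D} \text{Saturday}_d
\end{aligned}
$$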
Plugging in the first-order derivatives for the six day-of-the-week binary variable coefficients leads to:
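$$
\begin{aligned}
D\beta_0 = \sum_{d=1}^{D} \text{Load}_d
&- \frac{\sum_{d=1}^{D} (\text{Load}_d\,\text{Monday}_d - \hat{\beta}_0\,\text{Monday}_d)}{\sum_{d=1}^{D} \text{Monday}_d^2}\,\text{Mondays} \\
&- \frac{\sum_{d=1}^{D} (\text{Load}_d\,\text{Tuesday}_d - \hat{\beta}_0\,\text{Tuesday}_d)}{\sum_{d=1}^{D} \text{Tuesday}_d^2}\,\text{Tuesdays} \\
&- \frac{\sum_{d=1}^{D} (\text{Load}_d\,\text{Wednesday}_d - \hat{\beta}_0\,\text{Wednesday}_d)}{\sum_{d=1}^{D} \text{Wednesday}_d^2}\,\text{Wednesdays} \\
&- \frac{\sum_{d=1}^{D} (\text{Load}_d\,\text{Thursday}_d - \hat{\beta}_0\,\text{Thursday}_d)}{\sum_{d=1}^{D} \text{Thursday}_d^2}\,\text{Thursdays} \\
&- \frac{\sum_{d=1}^{D} (\text{Load}_d\,\text{Friday}_d - \hat{\beta}_0\,\text{Friday}_d)}{\sum_{d=1}^{D} \text{Friday}_d^2}\,\text{Fridays} \\
&- \frac{\sum_{d=1}^{D} (\text{Load}_d\,\text{Saturday}_d - \hat{\beta}_0\,\text{Saturday}_d)}{\sum_{d=1}^{D} \text{Saturday}_d^2}\,\text{Saturdays}
\end{aligned}
$$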
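Cancelling the day counts and collecting terms gives:
$$
\begin{aligned}
&\beta_0\,(D - \text{Mondays} - \text{Tuesdays} - \text{Wednesdays} - \text{Thursdays} - \text{Fridays} - \text{Saturdays}) \\
&\quad = \sum_{d=1}^{D} \text{Load}_d - \sum_{d=1}^{D} \text{Load}_d\,\text{Monday}_d - \sum_{d=1}^{D} \text{Load}_d\,\text{Tuesday}_d - \sum_{d=1}^{D} \text{Load}_d\,\text{Wednesday}_d \\
&\qquad - \sum_{d=1}^{D} \text{Load}_d\,\text{Thursday}_d - \sum_{d=1}^{D} \text{Load}_d\,\text{Friday}_d - \sum_{d=1}^{D} \text{Load}_d\,\text{Saturday}_d
\end{aligned}
$$
If we let
$$
\text{Sundays} = D - \text{Mondays} - \text{Tuesdays} - \text{Wednesdays} - \text{Thursdays} - \text{Fridays} - \text{Saturdays}
$$
$$
\sum_{d=1}^{D} \text{Load}_d\,\text{Sunday}_d = \sum_{d=1}^{D} \text{Load}_d - \sum_{d=1}^{D} \text{Load}_d\,\text{Monday}_d - \dots - \sum_{d=1}^{D} \text{Load}_d\,\text{Saturday}_d
$$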
We can write the final solution for the estimated coefficient on the intercept variable as:
$$
\hat{\beta}_0 = \frac{\sum_{d=1}^{D} \text{Load}_d\,\text{Sunday}_d}{\text{Sundays}}
$$
In this case, the estimated coefficient on the intercept variable equals the average load for the day-of-
the-week that was left out of the original regression equation, specifically Sunday.
The estimated coefficients for the six day-of-the-week binary variables are found by setting their
respective first order derivatives equal to 0. For the Monday binary variable, the calculations are:
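$$
\begin{aligned}
\frac{\partial L}{\partial \beta_1} = \sum_{d=1}^{D} 2 \big( & \text{Load}_d - \beta_0\,\text{Intercept}_d - \beta_1\,\text{Monday}_d - \beta_2\,\text{Tuesday}_d - \beta_3\,\text{Wednesday}_d \\
& - \beta_4\,\text{Thursday}_d - \beta_5\,\text{Friday}_d - \beta_6\,\text{Saturday}_d \big)(-\text{Monday}_d) = 0
\end{aligned}
$$
Because no two day-of-the-week binary variables equal 1.0 on the same day, cross products such as
Monday_d × Tuesday_d are zero for every day, which leaves:
$$
\frac{\partial L}{\partial \beta_1} = \sum_{d=1}^{D} 2 \left( -\text{Load}_d\,\text{Monday}_d + \beta_0\,\text{Intercept}_d\,\text{Monday}_d + \beta_1\,\text{Monday}_d^2 \right) = 0
$$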
The estimated value for the coefficient on the Monday binary variable is then expressed as:
$$
\hat{\beta}_1 = \frac{\sum_{d=1}^{D} \text{Monday}_d\,(\text{Load}_d - \hat{\beta}_0)}{\text{Mondays}}
$$
Recall, the estimated coefficient on the intercept variable was equal to the average load for the day-of-
the-week variable that was left out of the equation, in this case Sunday. The estimated coefficient on the
Monday binary variable is the average difference between the Monday loads and the average Sunday
load.
In the same fashion, the estimated coefficients for the remaining day-of-the-week binary variables are
expressed as:
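$$
\hat{\beta}_2 = \frac{\sum_{d=1}^{D} \text{Tuesday}_d\,(\text{Load}_d - \hat{\beta}_0)}{\text{Tuesdays}}
$$
$$
\hat{\beta}_3 = \frac{\sum_{d=1}^{D} \text{Wednesday}_d\,(\text{Load}_d - \hat{\beta}_0)}{\text{Wednesdays}}
$$
$$
\hat{\beta}_4 = \frac{\sum_{d=1}^{D} \text{Thursday}_d\,(\text{Load}_d - \hat{\beta}_0)}{\text{Thursdays}}
$$
$$
\hat{\beta}_5 = \frac{\sum_{d=1}^{D} \text{Friday}_d\,(\text{Load}_d - \hat{\beta}_0)}{\text{Fridays}}
$$
$$
\hat{\beta}_6 = \frac{\sum_{d=1}^{D} \text{Saturday}_d\,(\text{Load}_d - \hat{\beta}_0)}{\text{Saturdays}}
$$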
The estimated coefficients on the day-of-the-week binary variables capture the average load on a
Monday, Tuesday, Wednesday, Thursday, Friday, and Saturday that is not already accounted for by the
intercept variable. In this way, the intercept variable acts like an anchor. It describes the base behavior
and the day-of-the-week binary variables describe how the base behavior changes by day-of-the-week.
In general, intercept variables are included in models to exploit this anchoring effect. The intercept
variable captures behavior that is not accounted for by all the other variables included in the model.
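The anchoring result can be verified with a small numerical experiment. The sketch below uses made-up daily loads (the `typical_load` values and noise level are assumptions for illustration only) and estimates the model with the Sunday binary variable left out. The intercept estimate should match the average Sunday load, and the Monday coefficient should match the difference between the average Monday and average Sunday loads.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 10 weeks of daily loads (MW), 0 = Monday, ..., 6 = Sunday.
n_days = 70
day_of_week = np.arange(n_days) % 7
typical_load = np.array([1000.0, 1020.0, 1030.0, 1025.0, 990.0, 850.0, 800.0])
load = typical_load[day_of_week] + rng.normal(0.0, 20.0, n_days)

# Intercept plus Monday..Saturday binaries; the Sunday binary is left out.
binaries = (day_of_week[:, None] == np.arange(6)).astype(float)
X = np.hstack([np.ones((n_days, 1)), binaries])

beta_hat, *_ = np.linalg.lstsq(X, load, rcond=None)

sunday_mean = load[day_of_week == 6].mean()
monday_mean = load[day_of_week == 0].mean()

print(beta_hat[0], sunday_mean)                # intercept vs. average Sunday load
print(beta_hat[1], monday_mean - sunday_mean)  # Monday coefficient vs. difference
```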
In the above example, we demonstrate that a solution to the multicollinearity problem was to remove
one of the day-of-the-week binary variables. It does not matter which binary variable is removed. Any
one of them will ensure that the intercept variable is not a linear combination of the remaining day-of-
the-week binary variables. Which day-of-the-week binary variable is left out is a matter of preference. In
general, hourly loads tend to be lowest on Sundays. By leaving Sunday out, the estimated coefficients on
the remaining binary variables will tend to be positive. This does not change the forecast performance of
the model, but it does make the estimated coefficients look “pretty”. If optics are important, then drop
out the day-of-the-week with the lowest average load.
Does multicollinearity show up in other cases? The most common cases of multicollinearity arise when
day-of-the-week, monthly, and seasonal binary variables are used. In all three cases, the solution is the
same. Drop out one of the day-of-the-week, month, and season binaries and leave the intercept variable
in the model. Also, if you had a model with no intercept variable, but all seven day-of-the-week and all
12 monthly binary variables, you will have an issue with multicollinearity because the sum of the seven
day-of-the-week binary variables will equal the sum of the twelve monthly binary variables. A good habit
to follow is to always remove one of the day-of-the-week, one of the month, and one of the season binary
variables from the model specification.
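The no-intercept case can be checked the same way. This is a minimal sketch, assuming a hypothetical estimation period covering every day of 2017; it compares the rank of a design matrix holding all 7 day-of-the-week and all 12 monthly binaries against one that follows the habit of dropping one binary from each group and keeping the intercept.

```python
import numpy as np
from datetime import date, timedelta

# Hypothetical estimation period: every day of 2017.
days = [date(2017, 1, 1) + timedelta(days=i) for i in range(365)]
dow = np.array([d.weekday() for d in days])      # 0 = Monday, ..., 6 = Sunday
month = np.array([d.month - 1 for d in days])    # 0 = January, ..., 11 = December

dow_binaries = (dow[:, None] == np.arange(7)).astype(float)
month_binaries = (month[:, None] == np.arange(12)).astype(float)

# No intercept, but all 7 day-of-the-week and all 12 monthly binaries.
X_all = np.hstack([dow_binaries, month_binaries])
rank_all = np.linalg.matrix_rank(X_all)

# Intercept plus 6 day-of-the-week and 11 monthly binaries.
X_safe = np.hstack([np.ones((365, 1)), dow_binaries[:, :6], month_binaries[:, :11]])
rank_safe = np.linalg.matrix_rank(X_safe)

print(rank_all, X_all.shape[1])    # fewer independent columns than columns
print(rank_safe, X_safe.shape[1])  # every column is independent
```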
Can linear regressions really be non-linear in the explanatory variables? Just to prove the point that
linear regressions can be used to address the non-linear response between loads and weather, we will
solve for the parameters for the following equation.
Here,
Weekend_d is a binary variable that takes on a value of 1.0 if the day (d) is a non-working day,
otherwise 0
Weekend × T_d is the average temperature for day (d) interacted with (i.e., multiplied by) the
weekend binary variable
Weekend × T_d² is the average temperature for day (d) squared interacted with (i.e., multiplied
by) the weekend binary variable
Weekend × T_d³ is the average temperature for day (d) cubed interacted with (i.e., multiplied by)
the weekend binary variable
In this case, the model is nonlinear in temperature, but is linear in the model parameters. The regression
estimates for Commonwealth Edison and Dayton Power & Light are shown in Figure 4-26 and Figure 4-27.
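To make the "nonlinear in temperature, linear in the parameters" point concrete, the sketch below fits a cubic temperature model with weekend interactions by ordinary least squares. The column layout, the temperature rescaling, and the `beta_true` values are assumptions chosen for illustration; they are not the specification or the estimates behind the figures.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical daily data: average temperature (deg F) and a weekend flag.
n_days = 200
temp = rng.uniform(0.0, 100.0, n_days)
weekend = (np.arange(n_days) % 7 >= 5).astype(float)

# Rescale temperature so the cubic terms stay numerically well behaved.
t = (temp - 50.0) / 50.0

# Columns: intercept, T, T^2, T^3, Weekend, Weekend*T, Weekend*T^2, Weekend*T^3.
X = np.column_stack([
    np.ones(n_days), t, t**2, t**3,
    weekend, weekend * t, weekend * t**2, weekend * t**3,
])

# Made-up "true" parameters used only to generate a noiseless test series.
beta_true = np.array([900.0, 50.0, 300.0, 40.0, -80.0, -10.0, -30.0, -5.0])
load = X @ beta_true

# Ordinary least squares: the model is cubic in T but linear in beta.
beta_hat, *_ = np.linalg.lstsq(X, load, rcond=None)
print(np.round(beta_hat, 3))
```

Because every column of `X` is a fixed transformation of the data, the same least squares machinery used for the day-of-the-week model applies unchanged.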
FIGURE 4-26. ESTIMATED NONLINEAR WEATHER RESPONSE FUNCTION FOR COMMONWEALTH EDISON
FIGURE 4-27. ESTIMATED NONLINEAR WEATHER RESPONSE FUNCTION FOR DAYTON POWER & LIGHT
4.3.2 Neural Network Models
At the heart, or perhaps at the brain, of the neural network efforts has been the goal to build a model,
software, and hardware that could mimic the function of a brain. While work in this area ebbed and
flowed from the late 1940s through the early 1980s, it was not until the late 1980s that neural networks
gained traction across many industries. Today neural network models are employed for a wide range of
classification, pattern recognition and forecasting problems. This section presents an overview of neural
network models and their application to short-term load forecasting.
We begin the discussion on neural network models with a decision framework. Consider the forecast
problem: Is today going to be a peak load day? Since there are only two answers to this question, either
Yes or No, it can be characterized as a classification problem. An example of a decision framework that
can be applied to answer this question is presented in Figure 4-28. Reading from left to right, we start
with three inputs to the decision: (a) the forecasted average temperature for the day, (b) a binary variable
that indicates whether the forecast day lands on a Saturday or Sunday, and (c) a second binary variable
that indicates whether the forecast day is a holiday or not. Each of these inputs is assigned a weight and
then the weighted sum (βX) is computed. This weighted sum (βX) is then passed to a decision rule where
the weighted sum is compared to a threshold value (T). The decision rule is based on the following test:
(βX ≥ T). If the test is true, then the rule returns a 1.0 indicating it is a peak day. If the test is false, the
decision rule returns a 0.0 indicating it is not a peak day.
FIGURE 4-28. BASIC ELEMENTS OF A FEED FORWARD NEURAL NETWORK USED FOR CLASSIFICATION
Shown in Figure 4-29 is this decision framework populated with sample data. Here, the average
temperature forecast is 67, the forecast day is a weekend, and no holiday is in effect on this day. The
weight applied to the temperature forecast is 5 MW/degree. The weight applied to the weekend binary
variable is -1 MW. The weight on the holiday binary variable is -2 MW. The weighted sum is computed
as:
βX = (67 × 5) + (1 × −1) + (0 × −2) = 334
With a threshold value of (T=300), the decision rule returns a value of 1.0.
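The worked example can be reproduced in a few lines. The function names below are illustrative; the inputs, weights, and threshold are the ones given in the text.

```python
def weighted_sum(inputs, weights):
    """Return the weighted sum (beta X) of the inputs."""
    return sum(x * w for x, w in zip(inputs, weights))


def step_activation(signal, threshold):
    """Return 1.0 if the signal reaches the threshold, otherwise 0.0."""
    return 1.0 if signal >= threshold else 0.0


# Values from the worked example: temperature, weekend flag, holiday flag.
inputs = [67, 1, 0]
weights = [5, -1, -2]
threshold = 300

beta_x = weighted_sum(inputs, weights)
is_peak_day = step_activation(beta_x, threshold)
print(beta_x, is_peak_day)  # 334 1.0
```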
The decision rule presented above is called a step activation function. The purpose of an activation
function is to convert an input signal (e.g., βX) to an output signal (e.g., 1 or 0). It should be noted that if
the activation function was described as taking the input signal and simply multiplying it by a numeric
value (e.g., 1.0 x βX) to derive the output signal, we would have something very close to a linear regression
model, with the caveat that the output signal would be a MW value and not a categorical value. The fact
that the activation function falls into a class of nonlinear functions is what distinguishes neural network
models from linear regression models. A nonlinear activation function does not eliminate the hard part
of modeling, which is determining the list of inputs (e.g., average temperature, weekend binary variable,
holiday binary variable). The model developer, not the neural network, determines the list of explanatory
variables that are used.
The step activation function used in the prior examples is called out in Figure 4-30. Graphically, the
function is 0 up to the threshold and then 1 thereafter. The challenge with step functions is they are not very
easy to work with because they are discontinuous and non-differentiable. The big breakthrough with
neural network computing came when the step activation function was replaced with a sigmoid activation
function as shown in Figure 4-31. The sigmoid function provided the same flavor of producing a switch
based on a threshold, but with a continuous and differentiable functional form. This meant optimization
algorithms that rely on taking derivatives of the activation function could be utilized. This set the stage
for handling large scale problems.
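The contrast between the two activation functions can be sketched as follows. The logistic form of the sigmoid and the `steepness` parameter are assumptions made for illustration; the logistic form returns values between 0 and 1, while a tanh-shaped sigmoid would return values between -1 and 1. The point of the sketch is that the sigmoid has a well-defined slope everywhere, which the step function lacks at the threshold.

```python
import numpy as np


def step(signal, threshold):
    """Step activation: 0 below the threshold, 1 at or above it."""
    return np.where(signal >= threshold, 1.0, 0.0)


def sigmoid(signal, threshold, steepness=0.1):
    """Logistic sigmoid centred on the threshold; output lies between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-steepness * (signal - threshold)))


def sigmoid_slope(signal, threshold, steepness=0.1):
    """Derivative of the sigmoid with respect to the signal."""
    s = sigmoid(signal, threshold, steepness)
    return steepness * s * (1.0 - s)


signals = np.array([250.0, 300.0, 334.0])
print(step(signals, 300.0))           # hard switch at the threshold
print(sigmoid(signals, 300.0))        # smooth switch around the threshold
print(sigmoid_slope(signals, 300.0))  # slope used by gradient-based optimizers
```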
FIGURE 4-30. STEP ACTIVATION FUNCTION
FIGURE 4-31. REPLACING THE STEP ACTIVATION FUNCTION WITH A SIGMOID ACTIVATION FUNCTION
Replacing the step activation function with the sigmoid activation function leads to the feed forward
neural network shown in Figure 4-32. The term feed forward describes how the input data feeds forward
through the weighting to the activation function. The activation function in turn feeds forward an output
signal which in this case is a number ranging between -1 and 1.
FIGURE 4-32. FEED FORWARD NEURAL NETWORK WITH SIGMOID ACTIVATION FUNCTION
The next step in the evolution of our description of a feed forward neural network is to collapse the
weighting of the inputs into the sigmoid activation function, which is shown in Figure 4-33.
FIGURE 4-33. COLLAPSING THE SUM FUNCTION WITH THE SIGMOID ACTIVATION FUNCTION
At this point we can introduce some concepts that are unique to the neural network modeling paradigm.
In Figure 4-34 we introduce the input layer, the hidden layer, and the output layer. The input layer
contains the set of explanatory variables that are driving the load forecast. This is equivalent to the set of
explanatory variables included in a regression model. The hidden layer serves two purposes. First, it
weights the explanatory variables. Second, it transforms the weighted sum into a predicted value on a
scale of -1 to 1. The output layer contains the forecast values. In this case, there is one output in the
output layer.
FIGURE 4-34. THE INPUT, HIDDEN AND OUTPUT LAYERS OF A NEURAL NETWORK MODEL
It turns out that there can be more than one sigmoid activation function, or node, contained in the
hidden layer. The figure shows one node in the hidden layer. Further, each node can be fed by a
different set of explanatory variables contained in the input layer, as well as have a different functional
form for the activation function. Finally, the predicted values from the hidden layer can be transformed
by another set of weights to produce a forecast of loads. This final iteration of a feed forward neural
network with multiple sigmoid activation functions all leading to a load forecast is commonly used for
short-term load forecasting. This is depicted in Figure 4-35.
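A forward pass through such a network can be sketched in a few lines. The two-node layout, the logistic sigmoid, the input scaling, and every weight below are illustrative assumptions, not estimated values; in practice the weights come out of the estimation process.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))


def forecast_load(x, hidden_weights, hidden_bias, output_weights, output_bias):
    """Feed inputs forward through one hidden layer to a load forecast (MW)."""
    hidden = sigmoid(hidden_weights @ x + hidden_bias)   # one value per node
    return output_weights @ hidden + output_bias         # linear output layer


# Inputs: scaled average temperature, weekend flag, holiday flag.
x = np.array([0.67, 1.0, 0.0])

# Illustrative (not estimated) weights for two hidden nodes.
hidden_weights = np.array([[4.0, -0.5, -1.0],
                           [-4.0, -0.5, -1.0]])
hidden_bias = np.array([-3.0, 1.0])
output_weights = np.array([600.0, 400.0])
output_bias = 800.0

forecast = forecast_load(x, hidden_weights, hidden_bias, output_weights, output_bias)
print(forecast)
```

The output layer here is a plain weighted sum of the node outputs, which is why the final step resembles a regression on the transformed inputs.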
Variations on this base feed forward neural network would include multiple layers of nodes in the hidden
layer. In this case, the output of one or more nodes feed forward to another set of nodes, which then
feed forward either to another layer of nodes or to the output layer. Neural network models that have
multiple layers are referred to as multi-layer neural networks. Finally, other continuous and differentiable
function forms can be used instead of the sigmoid function.
FIGURE 4-35. MULTIPLE NODES IN THE HIDDEN LAYER AND THE REGRESSION MODEL EQUIVALENT
Some useful properties of the sigmoid function. In addition to being very easy to work with, the sigmoid
function offers some very useful modeling features.
First, sigmoid functions working together can approximate any continuous non-linear function on a
bounded range, provided enough of them are used. The non-linear function that is of interest in load
forecasting is the nonlinear response between
loads and weather. Like a third order polynomial function in temperature, a simple neural network with
temperature included on two sigmoid activation functions will do a very good job in approximating the
nonlinear load response to weather. The estimated weather response functions for Commonwealth
Edison and Dayton Power & Light are shown in Figure 4-36 and Figure 4-37. Here, we are plotting the
predicted value from a single layer neural network with two sigmoid activation functions that include as
explanatory variables average temperature and a weekend binary variable. A third linear activation
function includes the weekend binary variable, which allows the intercept to be different between
weekdays and weekend days.
FIGURE 4-36. ESTIMATED NONLINEAR WEATHER RESPONSE FUNCTION FOR COMMONWEALTH EDISON
FIGURE 4-37. ESTIMATED NONLINEAR WEATHER RESPONSE FUNCTION FOR DAYTON POWER & LIGHT
Second, the sigmoid function will automatically interact with the explanatory variables included in the
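function. To see this, recall the following properties of the exponential function:
$$
e^{\beta X} = e^{\beta_1 \text{Temperature} + \beta_2 \text{Weekend}} = e^{\beta_1 \text{Temperature}} \times e^{\beta_2 \text{Weekend}}
$$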
On the left-hand side we have included (βX), which in this case is β1 Temperature + β2 Weekend. The
right-hand side expression is mathematically equivalent. If the parameters (β1 , β2) are non-zero, then the
temperature and weekend explanatory variables interact or multiply each other. This means as a modeler
you enter in temperature and whatever you want temperature to interact with into the sigmoid activation
function. The comparable linear regression specification would look like:
4.3.3 Support Vector Regression
The support vector machine is a classification method designed to handle the case where the dependent
variable is categorical.16 Support vector regression requires a few tweaks of the support vector machine
framework to allow for a dependent variable that is continuous and not categorical. Load forecasting falls
within the class of support vector regression. At the heart of a support vector regression is a forecast
equation that defines the relationship between the target variable (e.g., load) and a set of factors
(explanatory variables). The forecast equation can be written generally as:
$$
\text{Load}_d = \beta X_d + e_d
$$
Here,
This equation should look familiar since it is your run-of-the-mill load forecast equation. The main
difference between support vector regression and linear regression is the optimization problem that is
used to estimate the model parameters (β). Linear regression solves the optimization problem of finding
the parameters that minimize the sum of the squared errors. This is written generally as:
$$
\text{Minimize w.r.t. } \beta:\quad \sum_{d=1}^{D} (\text{Load}_d - \beta X_d)^2
$$
To find the solution to this quadratic programming problem, all we need to do is compute the derivative
of the above objective function with respect to (w.r.t.) each parameter and set the results equal to 0.
Generally, the solution can be written as:
$$
\hat{\beta} = (X'X)^{-1} X'\,\text{Load}
$$
16 The concepts presented here come from a variety of sources. The machine learning lectures from MIT, Georgia
Tech, and UC Irvine found on www.youtube.com are the most useful.
Where, β̂ is a vector of (K) estimated model parameters.
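The closed-form solution can be computed directly. The sketch below uses synthetic data (the explanatory variables and coefficients are made up for illustration) and solves the normal equations, then compares the result with NumPy's least squares routine.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: intercept, scaled temperature, weekend flag.
n_days = 120
X = np.column_stack([
    np.ones(n_days),
    rng.uniform(-1.0, 1.0, n_days),
    (np.arange(n_days) % 7 >= 5).astype(float),
])
load = X @ np.array([1000.0, 150.0, -120.0]) + rng.normal(0.0, 25.0, n_days)

# Normal equations: solve (X'X) beta = X'Load rather than forming the inverse.
beta_normal = np.linalg.solve(X.T @ X, X.T @ load)

# The same estimate from NumPy's least squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, load, rcond=None)
print(beta_normal)
print(beta_lstsq)
```

Solving the linear system instead of explicitly inverting X'X is a common numerical choice; the estimate is the same.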
Support vector regression takes a different path to solving for the parameter values. This path introduces
two new concepts to the problem. The first is regularization which is based on the idea that an over
specified regression model could overfit the sample data and render the resulting estimated model
useless for forecasting purposes. In the world of machine learning with large datasets the potential for
overfitting is high. In load forecasting applications, the problem arises in the case where one or more data
outliers cause large residuals. Since the objective of the regression is to find parameters that minimize
the sum of squared errors, the estimated model parameters will be skewed toward reducing the errors
associated with the outliers. This shows up as one or more of the parameters having larger (in absolute
value) numerical values than what would be the case if the outliers did not exist. To avoid this problem,
the least squares objective function is modified to include a penalty on large parameter estimates. The
revised objective function is called ridge regression. Formally, we have:
$$
\text{Minimize w.r.t. } \beta:\quad \sum_{d=1}^{D} (\text{Load}_d - \beta X_d)^2 + C \sum_{k=2}^{K} \beta_k^2
$$
Here, C is the cost associated with large parameter values. The cost is summed over all parameters except
for the parameter associated with the intercept term (k=1). The effect of the cost term is to shrink in
absolute size the estimated parameters. With large data sets, this cost function provides insurance
against possible overfitting.
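The shrinking effect of the cost term can be illustrated with the closed-form ridge estimate. This is a sketch under stated assumptions: the data are synthetic, the function name `ridge_estimate` is hypothetical, and the intercept is assumed to sit in the first column of X so that it can be excluded from the penalty as described above.

```python
import numpy as np


def ridge_estimate(X, load, cost):
    """Ridge estimate with no penalty on the intercept (first column of X)."""
    penalty = np.eye(X.shape[1])
    penalty[0, 0] = 0.0                      # k = 1 (the intercept) is excluded
    return np.linalg.solve(X.T @ X + cost * penalty, X.T @ load)


rng = np.random.default_rng(3)
n_days = 60
temp = rng.uniform(-1.0, 1.0, n_days)
X = np.column_stack([np.ones(n_days), temp, temp**2])
load = 900.0 + 100.0 * temp + 250.0 * temp**2 + rng.normal(0.0, 30.0, n_days)

beta_ols = ridge_estimate(X, load, cost=0.0)     # C = 0 reproduces least squares
beta_ridge = ridge_estimate(X, load, cost=10.0)  # C > 0 shrinks the slopes

print(beta_ols)
print(beta_ridge)
```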
Another way of thinking about shrinking the parameters is to define the following objective function.
$$
\text{Minimize w.r.t. } \beta:\quad \frac{1}{2} \sum_{k=1}^{K} \beta_k^2
$$
Here, we are seeking the value of the parameters that minimize the sum of the squared parameters.
Alternatively, you can think of the objective function as minimizing the square of the Euclidean norm or
length of the parameter vector. To avoid the obvious solution of setting all the parameters equal to 0, we
need to constrain the problem in some way. We do this by defining the following two constraints.
Constraint 1: $\text{Load}_d - \beta X_d \le \epsilon$
Constraint 2: $\beta X_d - \text{Load}_d \le \epsilon$
Here, ε is an acceptable error margin that forces the values for the parameters to be non-zero. In load
forecasting applications, ε would be set equal to an acceptable forecast error (e.g., 250 MW). Now the
optimization problem is to find parameters such that the resulting forecast equation predicts loads within
an acceptable forecast error tolerance.
This leads to the concept of constrained optimization. Previously, our concept of acceptable forecast
error tolerance was defined by minimizing the sum of the squared forecast errors. We have now transformed
the problem into finding the smallest parameters possible that ensure the resulting forecast errors are
acceptable. The solution to Constraint 1 and 2 forms the support vectors around our notion of acceptable
forecast error. Further, the data points that lead to predicted errors that are less than ε do not matter.
What matters are the data points where the constraints are exactly equal to the margin. What this means
is the estimated parameters are determined by a handful of the data observations, thus reducing the
potential for overfitting.
To be fair, some additional parameters need to be added to handle the case where there is no feasible
set of parameters that meet Constraint 1 and 2. Here, we introduce the concept of a soft margin as
follows:
$$
\text{Minimize w.r.t. } \beta:\quad \frac{1}{2} \lVert \beta \rVert^2 + C \sum_{d=1}^{D} (\tau_d + \acute{\tau}_d)
$$
Subject to
Constraint 3: $\tau_d \ge 0, \ \forall d$
Constraint 4: $\acute{\tau}_d \ge 0, \ \forall d$
Here, τ_d and τ́_d are slack parameters that allow the margin, ε, not to bind for every observation. By imposing
a cost (C) on these parameters, we are ensured their values do not get so big as to render Constraint 1
and 2 meaningless. Otherwise, the slack parameters would be set wide enough to capture all observations
including data outliers. This would render the resulting forecast solution almost useless.
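The soft margin objective can be evaluated for any candidate parameter vector. The sketch below assumes the standard relaxed form of Constraint 1 and 2, in which each slack parameter absorbs the part of the forecast error that falls outside the margin; that relaxed form is an assumption here, since it is not written out above. The function name and the three-day data set are hypothetical.

```python
import numpy as np


def soft_margin_objective(beta, X, load, epsilon, cost):
    """Evaluate 0.5 * ||beta||^2 + C * sum(tau + tau') for a candidate beta."""
    error = load - X @ beta
    tau = np.maximum(0.0, error - epsilon)         # under-forecast beyond the margin
    tau_prime = np.maximum(0.0, -error - epsilon)  # over-forecast beyond the margin
    return 0.5 * beta @ beta + cost * np.sum(tau + tau_prime)


# Three hypothetical days with a single explanatory variable.
X = np.array([[1.0], [2.0], [3.0]])
load = np.array([2.0, 4.5, 5.0])
beta = np.array([2.0])

# Errors are 0.0, 0.5 and -1.0, so with epsilon = 0.25 the slacks are 0, 0.25 and 0.75.
objective = soft_margin_objective(beta, X, load, epsilon=0.25, cost=10.0)
print(objective)  # 12.0
```

Observations whose error lies inside the margin contribute nothing to the objective, which is the sense in which only a handful of data points determine the estimated parameters.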
There is an interesting feature of support vector machines that has to do with addressing the possibility
that a nonlinear relationship exists between loads and one or more of the features or explanatory
variables. It turns out that there is a set of functions, called kernel functions, that can be used to capture
these nonlinearities without having to go through the tedious task of keying up the nonlinearities (e.g.,
temperature, temperature squared, temperature cubed). Further, the use of these kernel functions
reduces the number of calculations that are needed to estimate the parameters. In the world of load
forecasting this may not sound like a big-time saver. However, in the world of pattern recognition where
there could be thousands of interactions, any trick that will save on computer calculations is a very big
deal. Having a function do all the work is extremely useful. The most common kernel functions are:
Radial Basis Function Kernel: $K(X) = \exp\left(-\frac{\lVert X'X \rVert^2}{2\sigma^2}\right)$, where σ² is a parameter
The kernel functions take a matrix of explanatory variables (X) as an input. These raw variables are then
used to create other variables through nonlinear transformations of the raw data. The nonlinearity aspect
of the kernel functions is often cited as the reason support vector regression is well suited for load
forecasting which has an obvious nonlinear response between loads and weather.
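For readers who want to experiment, the sketch below writes out the three kernels in their common textbook two-argument form, where each kernel compares the explanatory variables of two observations. The function names and the parameters `degree`, `constant`, `sigma`, and `scale` are assumptions for illustration and may differ from the notation used in this guide.

```python
import numpy as np


def polynomial_kernel(x, z, degree=3, constant=1.0):
    """Polynomial kernel: (x'z + constant) ** degree."""
    return (x @ z + constant) ** degree


def rbf_kernel(x, z, sigma=1.0):
    """Radial basis function kernel: exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma**2))


def sigmoid_kernel(x, z, scale=1.0, constant=0.0):
    """Sigmoid kernel: tanh(scale * x'z + constant)."""
    return np.tanh(scale * (x @ z) + constant)


# Two hypothetical days: scaled temperature and a weekend flag.
day_a = np.array([0.5, 1.0])
day_b = np.array([0.1, 0.0])

print(polynomial_kernel(day_a, day_b))
print(rbf_kernel(day_a, day_b))
print(sigmoid_kernel(day_a, day_b))
```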
Consider the polynomial kernel function. We have shown earlier that is plausible to capture the nonlinear
weather response in a regression by including weather, weather squared, and weather cubed. Both the
polynomial kernel and the polynomial regression are designed to address the problem in a very similar
way. In effect, the support vector regression when the polynomial kernel function is applied looks like a
constrained version of a polynomial regression.
Now consider the sigmoid kernel. This in fact describes one node in a neural network. If weather is
included in the set of explanatory variables, then the support vector regression that utilizes a sigmoid
kernel is, in effect, a constrained version of a neural network model.
What do we gain with support vector regression over linear regression and neural networks? First
consider forecasting a large system load. Further, assume we do not use any of the kernel functions, but
instead we use the same set of explanatory variables (X) in both a linear regression and a support vector
regression. The only difference between the models will be the optimization problem used to estimate
the parameters. It is unclear if the resulting support vector regression forecast will necessarily outperform
the linear regression forecast. Consider the actual equation we are using to forecast loads.
Here, the number of days into the forecast horizon is indexed by (h)
$\text{Load}^{f}_{d+h}$ is the load forecast derived from the estimated linear regression model
$\text{Load}^{f'}_{d+h}$ is the load forecast derived from the estimated support vector regression model
The only difference between these two forecast equations is the value of the estimated parameters. We
know from ordinary least squares theory that the regression parameters, (β̂), are unbiased estimates of
the true parameter values. I have not been able to find theoretical proof that the support vector
regression parameter estimates, (β′), are also unbiased estimates of the true parameter values. The two
parameter estimates will be different. How different will depend on the size of ε and C.
As for the relative forecast performance, it is hard to imagine a constrained version of the parameters
outperforming the unconstrained version. So how does the machine learning literature justify the use of
support vector regression over linear regression? First, it is argued that linear regression cannot handle
the nonlinear response between loads and weather. We have shown that claim is false. Second, it is
argued that the support vector regression parameter estimates are more robust to data outliers than the
regression parameter estimates. This is a valid argument if steps are not taken to remove data outliers
from the estimation data set. Fortunately, there are well established validation methods that identify
data outliers and remove them from the regression estimation process.
So where does that leave us? I expect, for large well-behaved loads where data outliers are not an issue,
that unconstrained regression and neural networks will yield better forecast performance, as measured
by the size of the forecast errors, than their constrained counterparts. In the world of individual customer
load forecasting, the robustness of the support vector regression could prove useful in handling noisy
customer loads. In this world, a robust smooth load forecast takes precedence over forecast accuracy.
However, it is fair to say that support vector regression needs to prove itself in practice (not a static,
academic data set) to determine its operational merits.
5 EXPLANATORY VARIABLES BASED ON CALENDAR CONDITIONS
The “art” of developing powerful operational load forecast models is identifying and constructing the list
of explanatory variables that are to be included in the model specifications. When building explanatory
variables, there is no better place to think about than your own home. A typical residential home uses
electricity to drive a vast array of electric equipment. The non-exhaustive list of electric equipment that
can be found in a home includes the following:
◼ Table lamps, desk lamps, recessed lighting, task lighting, ambient lighting, outdoor lighting, garage
door opener lighting, entryway lighting, night lights, appliance lights, oven lights, BBQ lights,
decorative neon lights, …
◼ Big screen TVs, small screen TVs, surround sound systems, gaming consoles, radios, stereos, Blu-
ray Disc players, record players, DVRs, CD players, electronic pin ball machines, …
◼ Desktop computers, laptop computers, tablets, phone chargers, smart devices, …
◼ Microwaves, electric ovens and ranges, toasters, coffee makers, electric tea kettles, electric can
openers, …
◼ Primary refrigerator, secondary refrigerator, freezer, wine cooler…
◼ Clothes washers and dryers, …
◼ Dishwashers, garbage disposals, garbage compactors, …
◼ Space heating and air conditioning, ceiling fans, room fans, whole house fans, humidifiers, …
◼ Pool pump, spa heater, …
◼ Alarm systems, internet systems, garage door openers, vacuum cleaners, …
◼ Power tools, power tool chargers, electric lawn and garden equipment, …
◼ Car battery trickle chargers, engine block heating, electric vehicle chargers, golf cart chargers, …
With this dizzying array of equipment, the idea that we can somehow accurately forecast the load of an
individual household is daunting. In fact, forecasting the loads of an individual home is a hard task because
we will never have all the information we need to predict when someone is going to turn on the TV, the
lights in the guest bedroom, run the dishwasher, microwave some popcorn, plug in their phone to charge,
or any of dozens of actions they can take that will use electricity. Fortunately, in the world of short-term
operational forecasting, we usually forecast an aggregation of loads across dozens, if not millions, of
households. At the aggregate, the hourly load nuances of an individual house are averaged away leaving
a relatively smooth and predictable load pattern.
In general, the aggregate residential hourly load pattern can be decomposed into six major components.
◼ Calendar Conditions. This component captures the daily load variation associated with people
going to and from work, going to and from school, celebrating holidays, doing household chores,
and other repeatable behaviors or actions that impact electricity consumption.
◼ Solar Conditions. When the sun rises and sets will have a big influence on lighting loads. In
addition, air conditioning loads can be impacted by prevailing solar conditions via internal
temperature gains driven by sunlight penetrating the house core.
◼ Weather Conditions. Space heating and cooling is driven by prevailing weather conditions.
◼ Economic Conditions. Overall economic conditions influence electricity
consumption through their effect on available income levels.
◼ Distributed Energy Resources. The introduction of rooftop solar PV, demand response, other on-
site generation and, in the future, on-site storage, if not metered separately from the whole house
load, impacts what is measured as the demand for power. As a result, historical load patterns can
change significantly with the introduction of distributed energy resources.
◼ Electric Vehicle Charging. Electric vehicle charging is a new type of load pattern that is anticipated
to impact evening and night time loads.
5.1 CALENDAR CONDITIONS
This section presents explanatory variables designed to capture load variation associated with calendar
conditions. The workhorse of explanatory variable functional forms is the binary variable. A binary
variable has one of two values. Traditionally, the values are 1.0 if the item being modeled is active and
0.0 otherwise. For example, we can construct a Monday binary variable that will take on a value of 1.0 if
the observation falls on a Monday and 0.0 otherwise.
Is it possible for a binary variable to take on values other than 1.0 and 0.0? Statistically, any combination
of two distinct values would work, but using 1.0 and 0.0 eases the interpretation
of the estimated coefficient attached to the binary variable. For example, in a model that includes just
seven day-of-the-week binary variables defined on a scale of 1.0 and 0.0 (i.e., Monday, Tuesday,
Wednesday, Thursday, Friday, Saturday and Sunday) and no intercept term, the estimated coefficient on the
Monday binary variable would be equal to the average Monday load.
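The interpretation above can be checked with a minimal sketch, assuming a model with only the seven day-of-the-week binaries and no intercept. Because exactly one binary is active on any day, the least-squares problem separates by day, and each coefficient reduces to the mean load for that day. The load values and the function names are hypothetical and serve only as an illustration.

```python
from collections import defaultdict
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def day_of_week_binaries(d):
    """Return the seven 1.0/0.0 day-of-the-week binaries for date d."""
    return [1.0 if d.weekday() == i else 0.0 for i in range(7)]

def fit_day_of_week_binaries(dates, loads):
    """Least-squares coefficients for Load = sum_k beta_k * Day_k (no intercept).

    With mutually exclusive 1.0/0.0 binaries, X'X is diagonal (day counts)
    and X'y holds the load totals per day, so beta_k = total_k / count_k.
    """
    xtx = defaultdict(float)  # diagonal of X'X
    xty = defaultdict(float)  # X'y
    for d, load in zip(dates, loads):
        for k, x in enumerate(day_of_week_binaries(d)):
            xtx[k] += x * x
            xty[k] += x * load
    return {DAY_NAMES[k]: xty[k] / xtx[k] for k in xtx if xtx[k] > 0}

# Hypothetical daily loads for two weeks starting Monday, 6 February 2017.
dates = [date(2017, 2, 6 + i) for i in range(14)]
loads = [1500, 1520, 1510, 1530, 1490, 1300, 1250,
         1540, 1500, 1530, 1510, 1470, 1320, 1270]
coefficients = fit_day_of_week_binaries(dates, loads)
```

In this sketch the Monday coefficient is simply the mean of the two Monday loads. If an intercept were added, one of the seven binaries would have to be dropped, and the remaining coefficients would become differences from the omitted day.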
The following are examples of binary variables commonly used in load forecast models.
Day-of-the-Week Binary Variables. These variables capture the average load variation across the days-
of-the-week.
Figure 5-1 and Figure 5-2 present examples of the in-sample model fit when just day-of-the-week binary
variables are used. Both figures show that the predicted values from the estimated models follow a fixed
and repeatable weekly pattern. While the day-of-the-week binary variables capture the average variation
of loads across the week, they do not capture the seasonal swings in loads.
FIGURE 5-1. DAYTON POWER & LIGHT: DAY-OF-THE-WEEK BINARY VARIABLES
FIGURE 5-2. COMMONWEALTH EDISON: DAY-OF-THE-WEEK BINARY VARIABLES
Day Type Binary Variables. These variables capture the average load variation between normal working
days and non-working days.
Weekday: takes on a value of 1.0 if the day is a normal work day (e.g., Monday, Tuesday,
Wednesday, Thursday, Friday) and not a holiday, 0.0 otherwise
Weekend: takes on a value of 1.0 if the day is a normal non-work day (e.g., Saturday, Sunday) or
a holiday, 0.0 otherwise
TWT: takes on a value of 1.0 if the day is a Tuesday, Wednesday, Thursday and not a holiday, 0.0
otherwise
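As a minimal sketch of how the three day type binaries defined above could be constructed from a date, consider the following. The holiday set and the function name are hypothetical; a production model would use the utility's own holiday calendar.

```python
from datetime import date

# Hypothetical holiday list used only for illustration.
HOLIDAYS = {date(2017, 5, 29), date(2017, 7, 4)}

def day_type_binaries(d, holidays=HOLIDAYS):
    """Return the Weekday, Weekend, and TWT binaries for date d."""
    is_holiday = d in holidays
    dow = d.weekday()  # Monday = 0 ... Sunday = 6
    weekday = 1.0 if dow <= 4 and not is_holiday else 0.0
    weekend = 1.0 if dow >= 5 or is_holiday else 0.0
    twt = 1.0 if dow in (1, 2, 3) and not is_holiday else 0.0
    return {"Weekday": weekday, "Weekend": weekend, "TWT": twt}
```

Note that Weekday and Weekend are complements of each other under these definitions, so a model with an intercept can include only one of them.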
Figure 5-3 and Figure 5-4 present examples of the in-sample model fit when the day type variables Weekday
(Weekdays) and Weekend (Weekend days) are used. Both figures show that the predicted values from
the estimated models follow a fixed and repeatable weekly pattern. A comparison of the model fits to
the day-of-the-week models shows how this simple day type model specification is a constrained version
limited to two types of shapes, a weekday shape (which is the average shape across all weekdays) and the
weekend shape (which is the average shape across all Saturdays and Sundays). In contrast, the day-of-
the-week model produces seven shapes, one for each day-of-the-week. Both models fail to capture
seasonal swings in loads.
FIGURE 5-3. DAYTON POWER & LIGHT: DAY TYPE WEEKDAY VERSUS WEEKEND BINARY VARIABLES
FIGURE 5-4. COMMONWEALTH EDISON: DAY TYPE WEEKDAY VERSUS WEEKEND BINARY VARIABLES
Month Binary Variables. These variables capture the average load variation across months of the year.
January: takes on a value of 1.0 if the day is in the month of January, 0.0 otherwise
February: takes on a value of 1.0 if the day is in the month of February, 0.0 otherwise
March: takes on a value of 1.0 if the day is in the month of March, 0.0 otherwise
April: takes on a value of 1.0 if the day is in the month of April, 0.0 otherwise
May: takes on a value of 1.0 if the day is in the month of May, 0.0 otherwise
June: takes on a value of 1.0 if the day is in the month of June, 0.0 otherwise
July: takes on a value of 1.0 if the day is in the month of July, 0.0 otherwise
August: takes on a value of 1.0 if the day is in the month of August, 0.0 otherwise
September: takes on a value of 1.0 if the day is in the month of September, 0.0 otherwise
October: takes on a value of 1.0 if the day is in the month of October, 0.0 otherwise
November: takes on a value of 1.0 if the day is in the month of November, 0.0 otherwise
December: takes on a value of 1.0 if the day is in the month of December, 0.0 otherwise
Figure 5-5 and Figure 5-6 present examples of the in-sample model fit when the monthly binary variables
are used. Both figures show that the predicted values from the estimated models follow a fixed and
repeatable monthly pattern. A comparison of the model fits to the day-of-the-week and day type model
specifications shows that the load shape is the same across all days in a month but varies by month. The
monthly model does a good job at capturing the seasonal swing in loads, but does not capture the day-
of-the-week load variation.
FIGURE 5-5. DAYTON POWER & LIGHT: MONTHLY BINARY VARIABLES
FIGURE 5-6. COMMONWEALTH EDISON: MONTHLY BINARY VARIABLES
Season Binary Variables. These variables capture the average load variation across seasons.
Winter: takes on a value of 1.0 if the day is in a winter month, 0.0 otherwise
Spring: takes on a value of 1.0 if the day is in a spring month, 0.0 otherwise
Summer: takes on a value of 1.0 if the day is in a summer month, 0.0 otherwise
Fall: takes on a value of 1.0 if the day is in a fall month, 0.0 otherwise
Figure 5-7 and Figure 5-8 present examples of the in-sample model fit when the seasonal binary variables
are used. Both figures show that the predicted values from the estimated models follow a fixed and
repeatable seasonal pattern. A comparison of the model fits to the day-of-the-week and day type model
specifications shows that the load shape is the same across all days in a season but varies by season. The
seasonal model does a good job at capturing the seasonal swing in loads but does not capture the day-of-
the-week load variation.
FIGURE 5-8. COMMONWEALTH EDISON: SEASONAL BINARY VARIABLES
Month Interaction Terms with Day-of-the-Week Variables. These variables capture the average load
variation across months/seasons by day-of-the-week. There is no separate set of interaction
terms for Sundays because the model also includes the monthly binary variables. The estimated
coefficients on the monthly binaries represent the average Sunday load for each month, and the estimated
coefficients on the following interaction terms represent the load difference relative to that Sunday base.
Note that the choice of leaving Sunday out is purely for aesthetic reasons. On average, Sunday loads are lower
than the loads on the other days of the week. As a result, the estimated parameters on the day-of-the-week
interaction terms will tend to be positive, meaning the load is on average higher than the corresponding
Sunday load. Model performance is identical regardless of which day-of-the-week is set aside.
JanuaryWednesday: January binary variable x Wednesday binary variable
AprilFriday: April binary variable x Friday binary variable
AugustMonday: August binary variable x Monday binary variable
NovemberWednesday: November binary variable x Wednesday binary variable
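The month and day-of-the-week interaction scheme can be sketched as follows, assuming Sunday is the omitted base day. The variable names follow the MonthDay pattern used above (e.g., JanuaryWednesday); the coefficient values and function names are hypothetical.

```python
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]  # same order as date.weekday()

def month_dow_variables(d):
    """Return the non-zero month and month x day-of-week binaries for date d."""
    month = MONTHS[d.month - 1]
    day = DAYS[d.weekday()]
    row = {month: 1.0}            # monthly binary carries the Sunday base load
    if day != "Sunday":           # Sunday is the omitted base day
        row[month + day] = 1.0    # e.g. "JanuaryWednesday"
    return row

def predicted_load(d, coefficients):
    """Sum coefficient * value over the active variables for date d."""
    return sum(coefficients.get(name, 0.0) * x
               for name, x in month_dow_variables(d).items())

# Hypothetical estimates: average January Sunday load and Wednesday offset.
example_coefficients = {"January": 1400.0, "JanuaryWednesday": 150.0}
sunday_load = predicted_load(date(2017, 1, 1), example_coefficients)
wednesday_load = predicted_load(date(2017, 1, 4), example_coefficients)
```

Under this design the full specification has 12 monthly binaries plus 12 x 6 = 72 interaction terms, for 84 calendar variables in total.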
Figure 5-9 and Figure 5-10 present examples of the in-sample model fit when the month, day-of-the-week
interaction binary variables are used. Both figures show that the predicted values from the estimated
models follow a monthly pattern, but within each month there is a weekly load pattern.
FIGURE 5-9. DAYTON POWER & LIGHT: MONTH/DAY-OF-THE-WEEK INTERACTION BINARY VARIABLES
FIGURE 5-10. COMMONWEALTH EDISON: MONTH/DAY-OF-THE-WEEK INTERACTION BINARY VARIABLES
Season Interaction Terms with Day-of-the-Week Variables. These variables capture the average load
variation across seasons by day-of-the-week. There is no separate set of interaction terms
for Sundays because the model also includes the seasonal binary variables. The estimated
coefficients on the seasonal binaries represent the average Sunday load for each season. The
estimated coefficients on the following interaction terms represent the load difference relative to the
Sunday base. As with the month interaction terms, it does not matter which day-of-the-week is set aside.
WinterSaturday: Winter binary variable x Saturday binary variable
Figure 5-11 and Figure 5-12 present examples of the in-sample model fit when the season, day-of-the-
week interaction binary variables are used. Both figures show that the predicted values from the
estimated models follow a seasonal pattern but within each season there is a weekly load pattern.
FIGURE 5-11. DAYTON POWER & LIGHT: SEASON/DAY-OF-THE-WEEK INTERACTION BINARY VARIABLES
FIGURE 5-12. COMMONWEALTH EDISON: SEASON/DAY-OF-THE-WEEK INTERACTION BINARY VARIABLES
Holiday Variables. These variables capture the average load change relative to a non-holiday day. The
modeling challenge with holidays is the behavior of residential, commercial, and industrial customers
depends on the holiday. For example, in the United States the behavior of retail stores on Christmas is
different than Memorial Day. It is not unusual for most retail stores to be closed for a portion of Christmas
while most retail stores remain open on Memorial Day. Other modeling challenges arise with holidays
that fall on a specific calendar date rather than on the same day of the week each year.
For a holiday such as Christmas, which occurs on December 25 but can fall on any of the seven days of the
week, the load impact is harder to predict than for, say, a bank holiday that always falls on the first Monday
of August. The challenge with a holiday like Christmas is the potential for a holiday spillover effect on the
day(s) leading into and out of the holiday. For example, the load impact of a Christmas that lands on a
Sunday may spill over to the following Monday. On the other hand, a Christmas that lands on a
Tuesday may impact loads on the Monday before and potentially the following Wednesday.
Holidays that fall on the same day of the week year after year can be modeled with a binary variable
that takes on a value of 1.0 if that day is the holiday, 0.0 otherwise. Examples of these types of holiday
binary variables include the following.
MemorialDay: takes on a value of 1.0 if the day is Memorial Day in the U.S., 0.0 otherwise
MayBankHoliday: takes on a value of 1.0 if the day is the first Monday in May, 0.0 otherwise
AugustBankHoliday: takes on a value of 1.0 if the day is the first Monday in August, 0.0 otherwise
It should be noted that each holiday is treated as a separate binary variable. This is because the underlying
behavior (e.g., retail stores are open versus closed) and prevailing weather conditions vary by holiday
type. This means a holiday that lands in August will capture the load reduction associated with less air
conditioning. Conversely, a holiday that lands in November will capture the load reduction associated
with less space heating. These impacts will be very different. If, instead of allowing each holiday to be a
separate binary variable, we used a single holiday binary variable that took on a value of 1.0 whenever
there is a holiday and 0.0 otherwise, we would constrain the load impact to be the same across all
holidays. That means the estimated coefficient on the holiday variable will be the average impact across
bank holidays in May, bank holidays in August, Christmas, etc. In effect, we are mixing impacts of hot days
and cold days, holidays where retail stores are open, and holidays when retail stores are closed. Consider
the following four holidays.
Now consider using a single holiday binary variable that takes on a value of 1.0 if the day is May 30, August
4, November 13, or February 14. This gives the following simple model:
$$\text{Load}_d = \beta\,\text{Holiday}_d$$
In this case, the estimated coefficient for the holiday binary variable is going to be equal to the average
load across the four days, β̂ = 1625.
The alternative approach is to create four separate holiday binary variables, one for each holiday. In this
case, the load forecast model looks like:
$$\text{Load}_d = \beta_1 \text{Holiday}_d^{1} + \beta_2 \text{Holiday}_d^{2} + \beta_3 \text{Holiday}_d^{3} + \beta_4 \text{Holiday}_d^{4}$$
where the superscript indexes the four holidays in the order listed above.
In this case, the estimated coefficients will be:
β̂1 = 1300
β̂2 = 2500
β̂3 = 1800
β̂4 = 900
As can be seen, the constrained version of the holiday binary variable provides a single estimate of the
holiday impact, which in this case leads to under forecasting of August 4 and November 13, and over
forecasting of May 30 and February 14. This simple example illustrates why we prefer separate holiday
binary variables.
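The arithmetic of this example can be reproduced with a short sketch. It assumes the four separate coefficients are listed in the same order as the four dates (May 30, August 4, November 13, February 14), which is consistent with the over and under forecasting pattern described above. Since each separate binary is active on exactly one day here, its coefficient equals that day's load.

```python
# Assumed mapping of the four estimated coefficients to the four holidays.
holiday_loads = {
    "May 30": 1300.0,
    "August 4": 2500.0,
    "November 13": 1800.0,
    "February 14": 900.0,
}

# A single pooled holiday binary (no intercept) estimates the average load.
pooled_beta = sum(holiday_loads.values()) / len(holiday_loads)

# Forecast error of the pooled model: forecast minus actual.
pooled_error = {name: pooled_beta - load for name, load in holiday_loads.items()}
over_forecast = [name for name, err in pooled_error.items() if err > 0]
under_forecast = [name for name, err in pooled_error.items() if err < 0]
```

With separate binaries the in-sample error on each of the four days is zero, whereas the pooled coefficient of 1625 misses every one of them.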
School Holiday Variables. In some locations there is a noticeable change in aggregate loads between days
when primary and secondary schools are in session versus when they are out of session. In these
cases, we construct a set of school holiday binary variables that capture the load shift that occurs when
schools are out of session. For example, let the school schedule for a year be defined as follows:
WinterSchoolHoliday: takes on a value of 1.0 if the day is from December 15 through January 15,
0.0 otherwise
SpringSchoolHoliday: takes on a value of 1.0 if the day is from April 1 through April 30, 0.0
otherwise
SummerSchoolHoliday: takes on a value of 1.0 if the day is from July 1 through August 15, 0.0
otherwise
FallSchoolHoliday: takes on a value of 1.0 if the day is from October 1 through October 15, 0.0
otherwise
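A minimal sketch of the school holiday binaries for the example schedule above follows. The one detail worth care is the winter window, which wraps across the new year. The function names are hypothetical.

```python
from datetime import date

# (start month, start day), (end month, end day) from the example schedule.
SCHOOL_HOLIDAYS = {
    "WinterSchoolHoliday": ((12, 15), (1, 15)),
    "SpringSchoolHoliday": ((4, 1), (4, 30)),
    "SummerSchoolHoliday": ((7, 1), (8, 15)),
    "FallSchoolHoliday": ((10, 1), (10, 15)),
}

def in_window(d, start, end):
    """True if date d falls inside the (month, day) window, inclusive."""
    month_day = (d.month, d.day)
    if start <= end:
        return start <= month_day <= end
    # Window wraps over the new year (e.g. December 15 through January 15).
    return month_day >= start or month_day <= end

def school_holiday_binaries(d):
    """Return the four 1.0/0.0 school holiday binaries for date d."""
    return {name: 1.0 if in_window(d, start, end) else 0.0
            for name, (start, end) in SCHOOL_HOLIDAYS.items()}
```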
DayLightSavingsObservance = 1.0 if date falls within daylight saving time, 0.0 otherwise
Raw Sunrise = Hour and minute at which the sun rises, not adjusted for daylight saving time
Raw Sunset = Hour and minute at which the sun sets, not adjusted for daylight saving time
Minutes_of_Light = [(Sunset Hour – Sunrise Hour – 1) * 60] + [60 – Sunrise Minute] + [Sunset Minute]
Fraction of Hour Ending 06:00 that was Dark = MIN(MAX(Time of Sunrise – 5,0),1)
Fraction of Hour Ending 07:00 that was Dark = MIN(MAX(Time of Sunrise – 6,0),1)
Fraction of Hour Ending 18:00 that was Dark = MIN(MAX(18 - Time of Sunset,0),1)
Fraction of Hour Ending 19:00 that was Dark = MIN(MAX(19 - Time of Sunset,0),1)
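The solar variables above can be sketched as follows. The sketch computes minutes of light from total minutes since midnight, which avoids handling the hour and minute parts separately, and it assumes sunrise and sunset times are supplied as decimal hours (e.g., 5.5 for 05:30) for the fraction-dark variables. The function names are hypothetical.

```python
def minutes_of_light(sunrise_hour, sunrise_minute, sunset_hour, sunset_minute):
    """Minutes between sunrise and sunset on the same clock convention."""
    sunrise_total = sunrise_hour * 60 + sunrise_minute
    sunset_total = sunset_hour * 60 + sunset_minute
    return sunset_total - sunrise_total

def fraction_dark_morning(hour_ending, sunrise):
    """Fraction of the hour ending at hour_ending that falls before sunrise."""
    return min(max(sunrise - (hour_ending - 1), 0.0), 1.0)

def fraction_dark_evening(hour_ending, sunset):
    """Fraction of the hour ending at hour_ending that falls after sunset."""
    return min(max(hour_ending - sunset, 0.0), 1.0)
```

For example, a sunrise at 05:30 leaves half of the hour ending 06:00 dark and none of the hour ending 07:00 dark.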
Temperature Bin Variables. The most straightforward way of estimating the non-linear response
between loads and weather is the use of temperature bin variables. As an example, let the observed
average daily temperatures range between -20 and 35° Celsius. We can define five-degree temperature
bins as follows:
Bin2 = 1.0 if -15 <= average temperature < -10, 0.0 otherwise
Bin3 = 1.0 if -10 <= average temperature < -5, 0.0 otherwise
The temperature bin variables would then be placed in a regression such as:
$$\text{Load}_d = \sum_{j=1}^{10} \beta_j\,\text{Bin}_d^{j} + e_d$$
In this case, the estimated values for the temperature bin variables (βj ) would be the average load when
temperatures fall within that bin range. Together these estimated coefficients form the nonlinear
response between loads and temperatures. The estimated weather response function is then given by
the following equation:
$$\text{EstimatedWeatherResponse}_d = \sum_{j=1}^{10} \hat{\beta}_j\,\text{Bin}_d^{j}$$
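A minimal sketch of the bin approach follows. It assumes ten bins in which Bin1 collects all temperatures below -15° and Bin10 collects all temperatures of 25° and above; these edge conventions, the sample data, and the function names are assumptions made for illustration. Because the bins are mutually exclusive and the model has no intercept, each estimated coefficient is the mean load in that bin.

```python
def temperature_bin(avg_temp, low=-15.0, width=5.0, n_bins=10):
    """Return the 1-based bin index for an average daily temperature."""
    if avg_temp < low:
        return 1                      # Bin1: everything below the first edge
    index = 2 + int((avg_temp - low) // width)
    return min(index, n_bins)         # last bin is open-ended on the hot side

def bin_binaries(avg_temp, n_bins=10):
    """Return the list of 1.0/0.0 bin variables Bin1..BinN."""
    j = temperature_bin(avg_temp, n_bins=n_bins)
    return [1.0 if k == j else 0.0 for k in range(1, n_bins + 1)]

def fit_bin_means(temps, loads, n_bins=10):
    """Coefficient per bin = average load in that bin (None if bin is empty)."""
    totals = [0.0] * n_bins
    counts = [0] * n_bins
    for temp, load in zip(temps, loads):
        j = temperature_bin(temp, n_bins=n_bins) - 1
        totals[j] += load
        counts[j] += 1
    return [totals[j] / counts[j] if counts[j] else None for j in range(n_bins)]

# Hypothetical observations straddling the 25 degree bin edge.
temps = [22.0, 24.0, 26.0, 30.0]
loads = [10800.0, 11000.0, 13000.0, 13400.0]
bin_coefficients = fit_bin_means(temps, loads)
```

In this sketch a forecast temperature just below 25° and one just above it receive different coefficients, which is the step behavior inherent in any bin specification.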
To allow the weather response to be different between weekdays and weekends, a set of weekend temperature
bin variables, defined as follows, can be added to the model.
Bin1Weekend = Bin1 * Weekend binary variable
$$\text{Load}_d = \sum_{j=1}^{10} \beta_j\,\text{Bin}_d^{j} + \sum_{j=11}^{20} \beta_j\,\text{BinWeekend}_d^{j-10} + e_d$$
Examples of the bin-based estimated weather response functions without and with weekend offsets for
the Dayton Power & Light and Commonwealth Edison loads are shown in Figure 5-13 through Figure 5-16.
FIGURE 5-13. ESTIMATED BIN WEATHER RESPONSE FUNCTION FOR DAYTON POWER & LIGHT
FIGURE 5-14. ESTIMATED BIN WEATHER RESPONSE WITH WEEKEND OFFSET FOR DAYTON POWER & LIGHT
FIGURE 5-15. ESTIMATED BIN WEATHER RESPONSE FUNCTION FOR COMMONWEALTH EDISON
FIGURE 5-16. ESTIMATED BIN WEATHER RESPONSE WITH WEEKEND OFFSET FOR COMMONWEALTH EDISON
The temperature bin approach described above represents a nonparametric estimate of the nonlinear
response. It is nonparametric in that a functional form like a third order polynomial is not imposed. The
advantage of a nonparametric approach is that the estimated response function is data driven. The
disadvantage is in the number of bin variables that need to be constructed. This is further complicated if
it is desired to estimate a separate response function between weekdays and weekend days. In this case,
10 additional temperature bin variables interacted with a weekend binary variable would need to be
created and then added to the regression. That would result in a total of 20 variables to estimate the
weather response. Each additional interaction term would add another 10 explanatory variables. It would
be easy to quickly have more explanatory variables than observations.
Another challenge with the nonparametric approach occurs at the boundary between two bins. For
example, consider the weekday bins between 20° and 25° and 25° plus for CE. If the temperature forecast
is for 25°, the load forecast would be a little under 11,000 MWh. If the temperature forecast is for 26°,
the load forecast jumps up to a little over 13,000 MWh. This knife edge difference between 25° and 26°
will lead to load forecasts that can jump significantly from one iteration to the next. An approach that
allows the load to vary slightly with each degree change in temperature would generate a better load
forecast. This leads to the following parametric functions to estimate the nonlinear weather response
function.
Temperature Spline Variables. Temperature spline variables are like the temperature bin variables in
that separate variables are designed to handle different temperature ranges. There are two types of
spline variables: uncapped and capped. An example of uncapped spline temperature variables follows:
Here, HDD stands for heating degree day and CDD stands for cooling degree day. In this example, we
define two HDD variables—one that takes on positive values when temperatures fall below 65° and the
second that takes on positive values when temperatures fall below 45°. The temperature values of 65
and 45 are determined by the forecast analyst and are commonly referred to as temperature cut points.
The HDD variables are intended to capture the increase in electric space heating when temperatures drop.
In addition, two CDD variables are defined—one that takes on positive values when temperature go above
65° and the second that takes on positive values when temperatures go above 85°. The two CDD variables
are designed to capture the load increase associated with air conditioning loads. A simple regression
specification that would use these variables is as follows:
In this case, the estimated coefficients on the CDD spline variables represent the increase in load
associated with a one degree increase in temperatures. In a similar fashion, the estimated coefficients on
the HDD spline variables represent the increase in load associated with a one degree decrease in
temperatures.
The estimated weather response function is then given by the following equation:
$$\text{EstimatedWeatherResponse}_d = \hat{\beta}_0 + \hat{\beta}_1 \text{HDD2}_d + \hat{\beta}_2 \text{HDD1}_d + \hat{\beta}_3 \text{CDD1}_d + \hat{\beta}_4 \text{CDD2}_d$$
The order in which the explanatory variables are placed in a regression model does not matter. We have
written this equation with the extremely cold and hot temperature splines on opposite ends for
presentation purposes only.
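The uncapped spline variables described above can be sketched as follows, using the cut points of 65° and 45° for heating and 65° and 85° for cooling. The max(., 0) construction is the standard degree-day form and is assumed here; the coefficient dictionary and function names are hypothetical.

```python
def uncapped_splines(temp, hdd_cuts=(65.0, 45.0), cdd_cuts=(65.0, 85.0)):
    """Return the uncapped HDD and CDD spline variables for a temperature."""
    return {
        "HDD1": max(hdd_cuts[0] - temp, 0.0),  # positive below 65
        "HDD2": max(hdd_cuts[1] - temp, 0.0),  # positive below 45
        "CDD1": max(temp - cdd_cuts[0], 0.0),  # positive above 65
        "CDD2": max(temp - cdd_cuts[1], 0.0),  # positive above 85
    }

def weather_response(temp, beta):
    """Evaluate intercept + sum of coefficient * spline for a temperature."""
    splines = uncapped_splines(temp)
    return beta["Intercept"] + sum(beta[name] * value
                                   for name, value in splines.items())

# Hypothetical coefficients, for illustration only.
example_beta = {"Intercept": 1000.0, "HDD1": 10.0, "HDD2": 5.0,
                "CDD1": 20.0, "CDD2": 8.0}
```

With uncapped variables, both HDD1 and HDD2 keep growing below 45°, so the slope in that range is the sum of the two heating coefficients; the same holds for CDD1 and CDD2 above 85°.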
Capped Spline Variables. Examples of capped spline temperature variables are:
In this case, the inner temperature splines (CappedHDD1 and CappedCDD1) are capped by the difference
in the base and extreme cut points. The outer splines remain uncapped to handle the case when
forecasted temperatures fall outside of the range of temperatures used in model estimation. A simple
regression specification that would use the capped spline variables is as follows:
The estimated weather response function is then given by the following equation:
Weekend interaction terms can be introduced in a fashion like the way it was done with the bin approach.
These interaction terms will allow the estimated weather response to differ between weekdays and
weekend days. A simple regression specification that includes weekend interactions is as follows:
Note the addition of the weekend binary variable as a separate explanatory variable allows the intercept
term to differ between weekdays and weekend days. As a result, the estimated average load for a
weekday is free to be different than the estimated average load for a weekend day.
The estimated weather response function for a weekend day is given by:
In this case, the estimated weekend day response is the sum of the weekday response plus the weekend
intercept and weather slope offsets. For example, the average weekend day load is equal to:
Here, the estimated parameter, β̂5 , represents how much weekend day loads are different on average
than weekday loads.
Relative to the bin approach, the use of temperature spline variables reduces the number of explanatory
variables substantially. Further, by allowing the outer spline variables to be uncapped, the estimated
model will produce a reasonable forecast when forecasted temperatures lie outside of the in-sample data
range.
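A minimal sketch of the capped spline variables and their weekend interactions follows. It assumes the same cut points as the uncapped example (base of 65°, extremes of 45° and 85°), so each inner spline is capped at 20°. The names HDD2 and CDD2 for the outer splines, the interaction names with a Weekend suffix, and the function names are assumptions made for illustration.

```python
def capped_splines(temp, base=65.0, cold_cut=45.0, hot_cut=85.0):
    """Return capped inner splines and uncapped outer splines."""
    return {
        # Inner splines stop growing once the extreme cut point is reached.
        "CappedHDD1": min(max(base - temp, 0.0), base - cold_cut),
        "CappedCDD1": min(max(temp - base, 0.0), hot_cut - base),
        # Outer splines stay uncapped to cover out-of-sample temperatures.
        "HDD2": max(cold_cut - temp, 0.0),
        "CDD2": max(temp - hot_cut, 0.0),
    }

def with_weekend_offsets(splines, weekend):
    """Add the weekend intercept offset and weekend x spline interactions."""
    row = dict(splines)
    row["Weekend"] = weekend
    for name, value in splines.items():
        row[name + "Weekend"] = value * weekend
    return row
```

With capped inner splines, only one variable changes in each temperature range, so the coefficient on an outer spline is the slope in that range by itself rather than an increment on top of the inner slope. The weekend version uses nine weather-related variables in this sketch, compared with 20 for the bin approach.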
Examples of the estimated capped and uncapped weather response functions with weekend offsets for
the Dayton Power & Light and Commonwealth Edison loads are shown in Figure 5-17 through Figure 5-20.
As can be seen, there is little to no difference between the capped and uncapped approaches.
FIGURE 5-17. ESTIMATED UNCAPPED SPLINE WEATHER RESPONSE WITH WEEKEND OFFSET FOR DAYTON POWER
& LIGHT
FIGURE 5-18. ESTIMATED CAPPED SPLINE WEATHER RESPONSE WITH WEEKEND OFFSET FOR DAYTON POWER &
LIGHT
FIGURE 5-19. ESTIMATED UNCAPPED SPLINE WEATHER RESPONSE WITH WEEKEND OFFSET FOR
COMMONWEALTH EDISON
FIGURE 5-20. ESTIMATED CAPPED SPLINE WEATHER RESPONSE WITH WEEKEND OFFSET FOR COMMONWEALTH
EDISON
Polynomial Temperature Variables. The challenge with the temperature spline approach is twofold.
First, the forecast analyst needs to decide how many cut points to use and at what values to
set them. Trial and error can be used to find a good balance between the number of cut points and their
associated values. Alternatively, the inflection points from an estimated weather response function using
a neural network model can be used to determine the number of cut points and their associated values.
Second, linear splines provide a robust but constrained estimate of the weather response function. It is
robust in that linear splines do a good job in cutting through noisy load data to capture the underlying
weather response. It is constrained in that the response is linear from one cut point to the next. In
some cases, like individual customer modeling, this is a desirable outcome that leads to stable smooth
load forecasts. In other cases, like large system load modeling, the added precision of a more flexible
functional form can significantly reduce load forecast errors. We next introduce a functional form that is
more flexible than temperature splines.
A common nonlinear functional form is a polynomial of order p. For estimating the nonlinear response
between loads and temperatures, we find that a third or possibly fourth order polynomial is sufficient. A
simple regression with a third-order polynomial can be written as follows:
The estimated polynomial expression will span both the heating and cooling side of the weather response
function. The estimated weather response function is given by:
Examples of the estimated polynomial weather response functions with weekend offsets for the Dayton
Power & Light and Commonwealth Edison loads are shown in Figure 5-21 and Figure 5-22.
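As a rough sketch of the polynomial approach, the explanatory variables are simply powers of temperature, and the estimated response is the intercept plus the weighted sum of those powers. The function names and any coefficient values passed in are hypothetical; the order of three matches the third-order case discussed above.

```python
def polynomial_terms(temp, order=3):
    """Return [T, T^2, ..., T^order] for use as explanatory variables."""
    return [temp ** p for p in range(1, order + 1)]

def polynomial_response(temp, intercept, betas):
    """Evaluate intercept + beta_1*T + beta_2*T^2 + ... for a temperature."""
    terms = polynomial_terms(temp, order=len(betas))
    return intercept + sum(b * x for b, x in zip(betas, terms))
```

Unlike the spline approach, no cut points need to be chosen, but a polynomial can behave erratically when evaluated outside the range of temperatures used in estimation.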
FIGURE 5-21. ESTIMATED POLYNOMIAL WEATHER RESPONSE WITH WEEKEND OFFSET FOR DAYTON POWER &
LIGHT
FIGURE 5-22. ESTIMATED POLYNOMIAL WEATHER RESPONSE WITH WEEKEND OFFSET FOR COMMONWEALTH
EDISON
Neural Network Weather Response. In Chapter 5 we introduced neural network models. The specific
model specification described utilizes sigmoid activation functions. A neural network model specification
that mirrors the polynomial regression can be written as:
Here, the explanatory variables, Hd1 and Hd2 , are two sigmoid nodes in the hidden layer. These sigmoid
nodes can be written as:
$$H_d^1 = \frac{1}{1 + e^{-(\alpha_0 + \alpha_1 \text{Temperature}_d)}}$$
$$H_d^2 = \frac{1}{1 + e^{-(\delta_0 + \delta_1 \text{Temperature}_d)}}$$
A neural network model with weekend intercept and weather slope offsets would be written as:
Where the two sigmoid nodes in the hidden layer are written as:
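$$H_d^1 = \frac{1}{1 + e^{-(\alpha_0 + \alpha_1 \text{Weekend}_d + \alpha_2 \text{Temperature}_d)}}$$
$$H_d^2 = \frac{1}{1 + e^{-(\delta_0 + \delta_1 \text{Weekend}_d + \delta_2 \text{Temperature}_d)}}$$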
Here, the left-hand side is mathematically equivalent to the right-hand side. In this case, if the estimated
parameters are non-zero then temperatures and the weekend binary mathematically interact. As a result,
we do not need to include the interaction terms explicitly in the specification of the sigmoid nodes.
Examples of the estimated neural network weather response functions with weekend offsets for the
Dayton Power & Light and Commonwealth Edison loads are shown in Figure 5-23 and Figure 5-24.
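The two-node specification with weekend and temperature inputs can be sketched as a forward pass. The sigmoid nodes follow the form given above; the linear output layer that combines an intercept with the two node values is an assumption, as are the parameter layout and function names. Parameter values would come from model estimation and are not shown.

```python
import math

def sigmoid_node(bias, weights, inputs):
    """Sigmoid activation: 1 / (1 + exp(-(bias + sum of weight * input)))."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))

def neural_net_load(temperature, weekend, params):
    """Forward pass for two hidden sigmoid nodes and a linear output layer.

    params["alpha"] and params["delta"] each hold
    [bias, weekend weight, temperature weight]; params["beta"] holds
    [output intercept, weight on node 1, weight on node 2] (assumed layout).
    """
    inputs = [weekend, temperature]
    h1 = sigmoid_node(params["alpha"][0], params["alpha"][1:], inputs)
    h2 = sigmoid_node(params["delta"][0], params["delta"][1:], inputs)
    b0, b1, b2 = params["beta"]
    return b0 + b1 * h1 + b2 * h2
```

Because the weekend binary and temperature enter the same exponent, a change in the weekend input shifts where each node sits on its S-curve, which in turn changes the node's sensitivity to temperature.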
FIGURE 5-23. ESTIMATED NEURAL NET WEATHER RESPONSE WITH WEEKEND OFFSET FOR DAYTON POWER &
LIGHT
FIGURE 5-24. ESTIMATED NEURAL NET WEATHER RESPONSE WITH WEEKEND OFFSET FOR COMMONWEALTH
EDISON
do capture the timing of significant weather-driven load changes. In most cases, a combination of
coincident and daily weather summaries works well for all but significant weather events.
Here are a couple of things to keep in mind about coincident weather data. Except for precipitation,
hourly meteorological measurements represent the value at the time the measurement is taken. For
example, the temperature value for 8AM could represent the temperature measurement made at
7:55AM. That value may or may not be representative of the temperatures that prevailed over the hour
ending at 8AM. It could be the case that the temperature started at 55° at 07:00 and rose by 5° every
15 minutes, reaching 72° at the 7:55AM measurement point. Averaging the readings at 7:00, 7:15, 7:30,
7:45, and 7:55 (55°, 60°, 65°, 70°, and 72°), the prevailing temperature over the hour from 7AM to 8AM
was 64.4°. If the load data represents the consumption for the period 07:00 to 08:00, using the
coincident temperature of 72° measured at 7:55AM overstates how warm it was during that hour. Aligning
weather measurements with load consumption is critical when using coincident weather data. The second
thing to consider is how the weather service provider handles time changes associated with the
observance of daylight saving time. If the weather service provider does not shift the weather data with
the observance of daylight saving time, then it is important to design coincident weather variables that
do shift.
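The size of the mismatch in the temperature example can be checked with a short Python sketch. The reading times and values are taken from the example; treating the hour's prevailing temperature as the simple average of those five readings is an assumption made for illustration.

```python
# Readings from the example: 55 degrees at 7:00, rising 5 degrees every
# 15 minutes, with the reported (coincident) value of 72 taken at 7:55.
readings = {"07:00": 55.0, "07:15": 60.0, "07:30": 65.0, "07:45": 70.0, "07:55": 72.0}

coincident_temp = readings["07:55"]
hour_average_temp = sum(readings.values()) / len(readings)
overstatement = coincident_temp - hour_average_temp

print(f"coincident: {coincident_temp:.1f}, hour average: {hour_average_temp:.1f}, "
      f"overstatement: {overstatement:.1f}")
```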
coincident and daily summary weather variables. These in turn are used to construct the temperature
spline and polynomial explanatory variables used in the models. Listed below are common composite
variables that bind temperatures with some measure of humidity. Humidity matters because air
conditioners work harder to cool a space when the air is humid. As a result, a hot dry day (e.g., 85° with
55% relative humidity) will have a lower air conditioning load than a hot, humid day (e.g., 85° with 95%
relative humidity).
NOAA Heat Index.17 The National Oceanic and Atmospheric Administration (NOAA) index combines
temperatures with relative humidity.
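$$
\begin{aligned}
HI_{d,h}^{F} = {} & -42.379 + 2.04901523\,T_{d,h}^{F} + 10.14333127\,RH_{d,h}^{\%} - 0.22475541\,T_{d,h}^{F}\,RH_{d,h}^{\%} \\
& - 0.00683783\,(T_{d,h}^{F})^2 - 0.05481717\,(RH_{d,h}^{\%})^2 + 0.00122874\,(T_{d,h}^{F})^2\,RH_{d,h}^{\%} \\
& + 0.00085282\,T_{d,h}^{F}\,(RH_{d,h}^{\%})^2 - 0.00000199\,(T_{d,h}^{F})^2\,(RH_{d,h}^{\%})^2
\end{aligned}
$$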
Where,
$HI_{d,h}^{F}$ is the NOAA heat index in degrees Fahrenheit for day (d) and hour (h)
$T_{d,h}^{F}$ is the dry-bulb temperature in degrees Fahrenheit for day (d) and hour (h)
$RH_{d,h}^{\%}$ is relative humidity in percentage for day (d) and hour (h)
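A minimal Python sketch of the NOAA heat index follows. It uses the regression coefficients as published by NOAA (the Rothfusz regression); the function name is hypothetical. The regression is intended for hot, humid conditions, so values computed for mild temperatures should be treated with caution.

```python
def noaa_heat_index(temp_f, rh_pct):
    """NOAA heat index (degrees Fahrenheit) from dry-bulb temperature (F)
    and relative humidity (percent), using the Rothfusz regression."""
    t, rh = temp_f, rh_pct
    return (-42.379
            + 2.04901523 * t
            + 10.14333127 * rh
            - 0.22475541 * t * rh
            - 0.00683783 * t ** 2
            - 0.05481717 * rh ** 2
            + 0.00122874 * t ** 2 * rh
            + 0.00085282 * t * rh ** 2
            - 0.00000199 * t ** 2 * rh ** 2)

humid_day = noaa_heat_index(90.0, 70.0)
dry_day = noaa_heat_index(90.0, 40.0)
print(f"90F at 70% RH: {humid_day:.1f}   90F at 40% RH: {dry_day:.1f}")
```

At the same dry-bulb temperature, the more humid hour returns the higher index, which is the effect the composite variable is meant to carry into the model.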
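Temperature-Humidity Index.18 The temperature-humidity index combines dry-bulb temperatures with
relative humidity. The formula is:
$$THI_{d,h}^{F} = T_{d,h}^{F} - \left[0.55 - \left(0.55 \times \frac{RH_{d,h}^{\%}}{100}\right)\right] \times \left(T_{d,h}^{F} - 58\right)$$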
Where,
$THI_{d,h}^{F}$ is the temperature-humidity index in degrees Fahrenheit for day (d) and hour (h)
$T_{d,h}^{F}$ is the dry-bulb temperature in degrees Fahrenheit for day (d) and hour (h)
17 http://www.srh.noaa.gov/images/ffc/pdf/ta_htindx.pdf
18 http://www.progressivedairy.com/dairy-basics/cow-comfort/12307-how-do-i-determine-how-do-i-calculate-temperature-humidity-index-thi
$RH_{d,h}^{\%}$ is relative humidity in percentage for day (d) and hour (h)
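As a quick check of the temperature-humidity index, the sketch below evaluates it for the two days used earlier to motivate humidity variables (85° at 55% and at 95% relative humidity). The function name is hypothetical.

```python
def temperature_humidity_index(temp_f, rh_pct):
    """THI (degrees Fahrenheit) from dry-bulb temperature (F) and relative humidity (percent)."""
    return temp_f - (0.55 - 0.55 * rh_pct / 100.0) * (temp_f - 58.0)

thi_drier = temperature_humidity_index(85.0, 55.0)   # hot, drier day
thi_humid = temperature_humidity_index(85.0, 95.0)   # hot, humid day
print(f"85F at 55% RH: {thi_drier:.2f}   85F at 95% RH: {thi_humid:.2f}")
```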
Carl Schoen Heat Index.19 The Carl Schoen heat index combines dry-bulb temperatures with dew point
temperatures. The formula is:
$$HI_{d,h}^{C} = T_{d,h}^{C} - \left(1.0799 \times e^{0.03755\,T_{d,h}^{C}}\right) \times \left[1 - e^{0.0801\,(DP_{d,h}^{C} - 14)}\right]$$
Where,
$HI_{d,h}^{C}$ is the Carl Schoen heat index in degrees Celsius for day (d) and hour (h)
$T_{d,h}^{C}$ is the dry-bulb temperature in degrees Celsius for day (d) and hour (h)
$DP_{d,h}^{C}$ is the dew point temperature in degrees Celsius for day (d) and hour (h)
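A minimal Python sketch of the Carl Schoen heat index, as written above, is shown below (the function name is hypothetical). One property worth noting is that the bracketed term is zero at a dew point of 14°C, so the index equals the dry-bulb temperature there and rises above it as the dew point increases.

```python
import math

def schoen_heat_index(temp_c, dew_point_c):
    """Carl Schoen heat index (degrees Celsius) from dry-bulb and dew point temperatures (C)."""
    return temp_c - (1.0799 * math.exp(0.03755 * temp_c)) * (
        1.0 - math.exp(0.0801 * (dew_point_c - 14.0)))

neutral = schoen_heat_index(30.0, 14.0)  # dew point of 14C leaves the temperature unchanged
muggy = schoen_heat_index(30.0, 20.0)    # higher dew point raises the index
print(f"30C with 14C dew point: {neutral:.2f}   30C with 20C dew point: {muggy:.2f}")
```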
Summer Simmer Index.20 The Summer Simmer Index combines dry-bulb temperatures and relative
humidity. The formula is:
$$SSI_{d,h}^{F} = 1.98\left(T_{d,h}^{F} - \left[0.55 - 0.0055\,RH_{d,h}^{\%}\right]\left[T_{d,h}^{F} - 58\right]\right) - 56.83$$
Where,
$SSI_{d,h}^{F}$ is the Summer Simmer Index in degrees Fahrenheit for day (d) and hour (h)
$T_{d,h}^{F}$ is the dry-bulb temperature in degrees Fahrenheit for day (d) and hour (h)
$RH_{d,h}^{\%}$ is relative humidity in percentage for day (d) and hour (h)
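Australian Apparent Temperature.21 The Australian apparent temperature combines dry-bulb temperatures,
water vapour pressure, and wind speed. The formula is:
$$AT_{d,h}^{C} = T_{d,h}^{C} + 0.33\,WVP_{d,h}^{hPa} - 0.7\,WS_{d,h}^{m/s} - 4.0$$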
19 Schoen, Carl. “A New Empirical Model of the Temperature-Humidity Index”, Journal of Applied Meteorology
and Climatology, 44, 1413-1420, September 2005.
20 http://summersimmer.com/
21 http://www.bom.gov.au/products/IDS65004.shtml
Where,
$AT_{d,h}^{C}$ is the Australian apparent temperature in degrees Celsius for day (d) and hour (h)
$T_{d,h}^{C}$ is dry-bulb temperature in degrees Celsius for day (d) and hour (h)
$WVP_{d,h}^{hPa}$ is water vapour pressure in hPa for day (d) and hour (h)
$WS_{d,h}^{m/s}$ is wind speed (meters/second) for day (d) and hour (h)
Further,
$$WVP_{d,h}^{hPa} = \frac{RH_{d,h}^{\%}}{100} \times 6.105 \times e^{\left(\frac{17.27\,T_{d,h}^{C}}{237.7 + T_{d,h}^{C}}\right)}$$
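The two-step calculation (water vapour pressure first, then apparent temperature) can be sketched in Python as follows. The function names are hypothetical, and the inputs in the example are illustrative values rather than data from the guide.

```python
import math

def water_vapour_pressure_hpa(temp_c, rh_pct):
    """Water vapour pressure (hPa) from dry-bulb temperature (C) and relative humidity (percent)."""
    return rh_pct / 100.0 * 6.105 * math.exp(17.27 * temp_c / (237.7 + temp_c))

def apparent_temperature_c(temp_c, rh_pct, wind_ms):
    """Australian apparent temperature (C); wind speed is in meters per second."""
    wvp = water_vapour_pressure_hpa(temp_c, rh_pct)
    return temp_c + 0.33 * wvp - 0.7 * wind_ms - 4.0

at_example = apparent_temperature_c(30.0, 50.0, 2.0)
print(f"30C, 50% RH, 2 m/s wind: {at_example:.2f}")
```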
Canadian Humidex.22 The Canadian humidex combines dry-bulb and dew point temperatures. The
formula is:
$$Humidex_{d,h}^{C} = T_{d,h}^{C} + 0.5555 \times \left(6.11 \times e^{5417.7530 \times \left[\frac{1}{273.16} - \frac{1}{DP_{d,h}^{C} + 273.16}\right]} - 10\right)$$
Where,
$Humidex_{d,h}^{C}$ is the Canadian humidex in degrees Celsius for day (d) and hour (h)
$T_{d,h}^{C}$ is dry-bulb temperature in degrees Celsius for day (d) and hour (h)
$DP_{d,h}^{C}$ is the dew point temperature in degrees Celsius for day (d) and hour (h)
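A minimal Python sketch of the humidex formula, using the constants exactly as written above, is shown below. The function name is hypothetical and the example inputs are illustrative.

```python
import math

def canadian_humidex(temp_c, dew_point_c):
    """Canadian humidex (C) from dry-bulb and dew point temperatures (C)."""
    vapour_term = 6.11 * math.exp(
        5417.7530 * (1.0 / 273.16 - 1.0 / (dew_point_c + 273.16)))
    return temp_c + 0.5555 * (vapour_term - 10.0)

humidex_example = canadian_humidex(30.0, 20.0)
print(f"30C with 20C dew point: {humidex_example:.2f}")
```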
22 Masterton, J.M. and F.A. Richardson. “Humidex: A Method of Quantifying Human Discomfort due to Excessive
Heat and Humidity”, Environment Canada, Atmospheric Environment Service, Ontario, Canada, 1979
5.6 INCORPORATING WIND SPEED
The heat indices presented above work well for capturing the impact of temperatures and humidity on air
conditioning loads. The next series of indices are designed to address the impact of temperatures and
wind speed on space heating loads.
Environment Canada Wind Chill (Celsius). Environment Canada’s wind chill combines temperatures and
wind speed. The formula is:
Where,
$WCI_{d,h}^{C}$ is Environment Canada’s wind chill in degrees Celsius for day (d) and hour (h)
$T_{d,h}^{C}$ is dry-bulb temperature in degrees Celsius for day (d) and hour (h)
$WS_{d,h}^{km/h}$ is wind speed (kilometers/hour) for day (d) and hour (h)
Environment Canada Wind Chill (Fahrenheit). Environment Canada’s wind chill combines temperatures
and wind speed. The formula is:
Where,
$WCI_{d,h}^{F}$ is Environment Canada’s wind chill in degrees Fahrenheit for day (d) and hour (h)
$T_{d,h}^{F}$ is dry-bulb temperature in degrees Fahrenheit for day (d) and hour (h)
$WS_{d,h}^{mph}$ is wind speed (miles/hour) for day (d) and hour (h)
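The wind chill formulas themselves are not reproduced in this copy of the text. The Python sketch below therefore uses the widely published wind chill index adopted by Environment Canada and the U.S. National Weather Service in 2001; treating these as the formulas intended here is an assumption, and the function names are hypothetical.

```python
def wind_chill_celsius(temp_c, wind_kmh):
    """Wind chill (C) from dry-bulb temperature (C) and wind speed (km/h).
    Assumed form: the standard 2001 wind chill index."""
    v = wind_kmh ** 0.16
    return 13.12 + 0.6215 * temp_c - 11.37 * v + 0.3965 * temp_c * v

def wind_chill_fahrenheit(temp_f, wind_mph):
    """Wind chill (F) from dry-bulb temperature (F) and wind speed (mph).
    Assumed form: the same index expressed in U.S. customary units."""
    v = wind_mph ** 0.16
    return 35.74 + 0.6215 * temp_f - 35.75 * v + 0.4275 * temp_f * v

wc_c = wind_chill_celsius(-10.0, 20.0)
wc_f = wind_chill_fahrenheit(0.0, 15.0)
print(f"-10C at 20 km/h: {wc_c:.1f}   0F at 15 mph: {wc_f:.1f}")
```

The index is generally applied only to cold temperatures with a noticeable wind; a common practice is to fall back to the dry-bulb temperature outside that range.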
data and form the weather variables using the average weather. A word of caution…if you decide to use
weighted weather, then it is recommended to do the following.
1. First, form all the weather variables you need separately using the weather data for each weather
station. For example, create the HDD and CDD variables for each station.
2. Form the weighted weather variables by weighting the weather variables from step 1 across the
weather stations.
Why is this approach recommended? Consider two weather stations with temperatures of 60° for station 1
and 70° for station 2. Further, assume the cut point for the heating and cooling degree day variables is
65°. Averaging the temperatures results in an average temperature of 65°. The HDD and CDD variables
based on the average temperature are:
$$\text{HDD} = \max(65 - 65,\, 0) = 0 \qquad \text{CDD} = \max(65 - 65,\, 0) = 0$$
In other words, the average temperature across the two stations implies no heating or cooling takes place.
If we follow the above procedure, we first compute HDD and CDD variables for each station and then
average the results. This gives:
$$\text{HDD}_1 = \max(65 - 60,\, 0) = 5 \qquad \text{CDD}_1 = \max(60 - 65,\, 0) = 0$$
$$\text{HDD}_2 = \max(65 - 70,\, 0) = 0 \qquad \text{CDD}_2 = \max(70 - 65,\, 0) = 5$$
The average HDD and CDD variables that go into the model are then HDD=2.5 and CDD=2.5. In this case,
some space heating and space cooling took place.
The point is that any averaging of weather data across stations will flatten the data. To avoid this, it is
best to compute the weather variables by station and then average the results.
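The two orders of operation can be compared with a minimal Python sketch. It assumes equal station weights of 0.5 and the usual degree day definitions (HDD = max(cut point − temperature, 0), CDD = max(temperature − cut point, 0)); the function and variable names are hypothetical.

```python
def hdd(temp, cut_point=65.0):
    """Heating degree day: degrees below the cut point, floored at zero."""
    return max(cut_point - temp, 0.0)

def cdd(temp, cut_point=65.0):
    """Cooling degree day: degrees above the cut point, floored at zero."""
    return max(temp - cut_point, 0.0)

station_temps = [60.0, 70.0]
weights = [0.5, 0.5]  # assumed equal station weights

# Order 1: average the temperatures, then form the degree day variables.
avg_temp = sum(w * t for w, t in zip(weights, station_temps))
hdd_from_avg_temp = hdd(avg_temp)
cdd_from_avg_temp = cdd(avg_temp)

# Order 2 (recommended): form the variables by station, then average them.
hdd_by_station = sum(w * hdd(t) for w, t in zip(weights, station_temps))
cdd_by_station = sum(w * cdd(t) for w, t in zip(weights, station_temps))

print(hdd_from_avg_temp, cdd_from_avg_temp, hdd_by_station, cdd_by_station)
```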
5.2 COMPUTING A WEIGHTED AVERAGE WIND SPEED
Often, when we are developing weather variables, we introduce wind speed either as a standalone
explanatory variable or as a contributing term in a wind chill formula. An example of the latter is the
Environment Canada wind chill. In cases where you have data for a single weather station, the calculations
are straightforward. But what is the right wind speed to include in the wind chill formula when the wind
speed data are derived from a weighted average of two or more weather stations? It turns out that a
simple average of the wind speeds across the weather stations can lead to very misleading results.
Consider two weather stations: the first station is blowing 20 mph with a wind direction of northeast
(45°), and the second station is blowing 20 mph with a wind direction of northwest (315°). A simple
average of the two wind speeds is 20 mph. While it is tempting to use the 20 mph as the wind speed for
the weighted average of the two stations, it would not represent what is really happening. Specifically,
the 20-mph wind blowing northeast would be working against the 20-mph wind blowing northwest,
leaving a weighted average wind speed of 14.14 mph.
We can extend this idea to wind direction. Let us say we know that the load impact of storms flowing in
from the northwest has a different signature from storms flowing in from the northeast. We would like
to use this information by interacting the wind speed variable with a wind direction variable. In the simple
example presented above, the weighted average wind direction would be 180° (computed as [45° +
315°]/2). In other words, the weighted average wind direction would be due south, which is clearly wrong.
The weighted average wind direction is due north (0°).
How did we derive the true weighted average wind speed and wind direction? Below are the steps to
take to compute a weighted average wind speed and wind direction. The calculations will use the wind
speed and wind direction data in the following table.
The formula for converting wind direction from degrees to radians is:
$$\text{Radians} = \text{Degrees} \times \frac{\pi}{180}$$
Where π is the mathematical constant pi.
The results of the conversion from degrees to radians are shown in the following table.
Step 2. For Each Station, Compute the East-West and North-South Vector
Here, the weather stations are indexed by (s) and the time interval is indexed by (i). You would compute
the east-west and north-south vectors separately for each hour of wind speed and wind direction data.
Where,
$\text{StationWgts}_s$ is the user-supplied weather station weight for weather station (s). These
calculations assume that the sum of the weather station weights equals 1.0. If not, then the
weights need to be normalized to sum to 1.0 prior to this step. In the example, each station is
assigned a weight of 25%.
$$\text{WeightedAverageWindSpeed}_i = \sqrt{\text{EastWestVectorWeightedSum}_i^2 + \text{NorthSouthVectorWeightedSum}_i^2}$$
a. First compute the arctangent given the east-west and north-south vectors from Step 3 above.
$$\text{WeightedWindDirectionRadians}_i = \text{ATAN}\left(\frac{\text{EastWestVectorWeightedSum}_i}{\text{NorthSouthVectorWeightedSum}_i}\right)$$
$$\text{WeightedWindDirectionRadians}_i = \text{ATAN}\left(\frac{\text{EastWestVectorWeightedSum}_i}{0.0001 + \text{NorthSouthVectorWeightedSum}_i}\right)$$
This alternative formula controls for possible errors associated with dividing by 0.0.
$$\text{WeightedWindDirectionDegrees}_i = \text{WeightedWindDirectionRadians}_i \times \frac{180}{\pi}$$
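The steps above can be sketched in Python as follows. Two things in the sketch are assumptions rather than the text's own method: the east-west and north-south components are taken as speed × sin(direction) and speed × cos(direction), and the two-argument `math.atan2` is swapped in for ATAN of the ratio because it avoids the division by zero and resolves the quadrant. The function name is hypothetical.

```python
import math

def weighted_average_wind(speeds, directions_deg, weights):
    """Return (speed, direction in degrees) of the weighted average wind vector."""
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize weights to sum to 1.0

    # Weighted sums of the east-west and north-south components.
    east_west = sum(w * s * math.sin(math.radians(d))
                    for w, s, d in zip(weights, speeds, directions_deg))
    north_south = sum(w * s * math.cos(math.radians(d))
                      for w, s, d in zip(weights, speeds, directions_deg))

    speed = math.hypot(east_west, north_south)
    # Map the angle into the 0-360 range; a direction within rounding error
    # of due north may come back as 360.0 rather than 0.0.
    direction = math.degrees(math.atan2(east_west, north_south)) % 360.0
    return speed, direction

# Two-station example from the text: 20 mph at 45 degrees and 20 mph at 315 degrees.
speed, direction = weighted_average_wind([20.0, 20.0], [45.0, 315.0], [0.5, 0.5])
print(f"speed: {speed:.2f} mph, direction: {direction:.1f} degrees")
```

For the two-station example this reproduces the 14.14 mph speed and the due-north direction discussed earlier.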
6 ALTERNATIVE MODEL SPECIFICATIONS
This chapter presents a set of alternative model specifications. The specifications range from the most
basic model of computing the overall average of the loads to models that allow the load shapes to vary
by combination of day-of-the-week and month and weather. Logical extensions of these specifications
include:
For very near-term forecast horizons of one day or less, the model specifications can be extended to
include autoregressive terms. These terms can be in the form of:
◼ Prior hour loads (e.g., load at 8AM is a function of loads at 7AM, 6AM, and 5AM), or
◼ Prior day loads (e.g., load at 8AM is a function of 8AM loads from the prior seven days)
A Practitioner’s Guide to Short-term Load Forecast Modeling Alternative Model Specifications |6-1
This list of alternative model specifications is not intended to be an exhaustive list of all possible model
specifications. Consider them as a launching point for your own models.
Where,
$\text{Load}_{d,h}$ is the hourly load for day (d) and hour (h)
$\beta_h^{0}$ is the coefficient for the $\text{Intercept}_d$ variable for hour (h)
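$$
\begin{aligned}
\text{Load}_{d,h} = {} & \{\beta_h^{0}\,\text{Intercept}_d\} + \{\beta_h^{1}\,\text{HDD}_d + \beta_h^{2}\,\text{CDD}_d + \beta_h^{3}\,\text{HDD}_d\,\text{Weekend}_d + \beta_h^{4}\,\text{CDD}_d\,\text{Weekend}_d \\
& + \beta_h^{5}\,\text{LagHDD}_d + \beta_h^{6}\,\text{LagCDD}_d + \beta_h^{7}\,\text{LagHDDWeekend}_d + \beta_h^{8}\,\text{LagCDDWeekend}_d\}
\end{aligned}
$$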
Where,
$\text{LagHDD}_d$ is the weighted average of the heating degree days for the prior two days, with
weights of 0.75 on the one-day lag and 0.25 on the two-day lag
$\text{LagCDD}_d$ is the weighted average of the cooling degree days for the prior two days, with
weights of 0.75 on the one-day lag and 0.25 on the two-day lag
$\text{HDD}_d\,\text{Weekend}_d$ is the heating degree day interacted with a weekend binary variable
$\text{CDD}_d\,\text{Weekend}_d$ is the cooling degree day interacted with a weekend binary variable
$\text{LagHDDWeekend}_d$ is the lagged heating degree day interacted with a weekend binary variable
$\text{LagCDDWeekend}_d$ is the lagged cooling degree day interacted with a weekend binary variable
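The lagged degree day variables can be built with a few lines of Python. The sketch below is illustrative only: the function name is hypothetical, the input is assumed to be a list of daily HDD (or CDD) values in date order, and the first two days are left undefined because they lack a full two-day history.

```python
def lagged_degree_days(daily_values, w_one_day=0.75, w_two_day=0.25):
    """Weighted average of the prior two days' degree days for each day.
    Days without two prior observations are returned as None."""
    lagged = [None] * len(daily_values)
    for d in range(2, len(daily_values)):
        lagged[d] = w_one_day * daily_values[d - 1] + w_two_day * daily_values[d - 2]
    return lagged

daily_hdd = [10.0, 20.0, 30.0, 0.0]
lag_hdd = lagged_degree_days(daily_hdd)
print(lag_hdd)
```

Multiplying the result by the weekend binary for the same day gives the corresponding lagged weekend interaction variable.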
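$$
\begin{aligned}
\text{Load}_{d,h} = {} & \beta_h^{0}\,\text{Intercept}_d + \beta_h^{1}\,\text{Sunday}_d + \beta_h^{2}\,\text{Monday}_d + \beta_h^{3}\,\text{Tuesday}_d + \beta_h^{4}\,\text{Thursday}_d \\
& + \beta_h^{5}\,\text{Friday}_d + \beta_h^{6}\,\text{Saturday}_d
\end{aligned}
$$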
Where,
$\text{Sunday}_d$ is a binary variable that takes on a value of 1.0 if the day (d) is a Sunday, otherwise 0.0
$\text{Monday}_d$ is a binary variable that takes on a value of 1.0 if the day (d) is a Monday, otherwise 0.0
$\text{Tuesday}_d$ is a binary variable that takes on a value of 1.0 if the day (d) is a Tuesday, otherwise 0.0
$\text{Thursday}_d$ is a binary variable that takes on a value of 1.0 if the day (d) is a Thursday, otherwise 0.0
$\text{Friday}_d$ is a binary variable that takes on a value of 1.0 if the day (d) is a Friday, otherwise 0.0
$\text{Saturday}_d$ is a binary variable that takes on a value of 1.0 if the day (d) is a Saturday, otherwise 0.0
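A minimal Python sketch of the day-of-the-week binaries follows. Wednesday has no binary in the specification above, so it acts as the base day absorbed by the intercept. The function and constant names are hypothetical.

```python
from datetime import date

# Binaries in the specification; Wednesday is omitted and serves as the base day.
DAY_BINARIES = ["Sunday", "Monday", "Tuesday", "Thursday", "Friday", "Saturday"]
# date.weekday() returns 0 for Monday through 6 for Sunday.
WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

def day_of_week_binaries(day):
    """Return a dict of the six day-of-the-week binary variables for a date."""
    name = WEEKDAY_NAMES[day.weekday()]
    return {label: 1.0 if label == name else 0.0 for label in DAY_BINARIES}

sunday_row = day_of_week_binaries(date(2017, 2, 5))      # a Sunday
wednesday_row = day_of_week_binaries(date(2017, 2, 1))   # a Wednesday (base day)
print(sunday_row)
print(wednesday_row)
```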
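$$
\begin{aligned}
\text{Load}_{d,h} = {} & \{\beta_h^{0}\,\text{Intercept}_d + \beta_h^{1}\,\text{Sunday}_d + \beta_h^{2}\,\text{Monday}_d + \beta_h^{3}\,\text{Tuesday}_d + \beta_h^{4}\,\text{Thursday}_d \\
& + \beta_h^{5}\,\text{Friday}_d + \beta_h^{6}\,\text{Saturday}_d\} \\
& + \{\beta_h^{7}\,\text{HDD}_d + \beta_h^{8}\,\text{CDD}_d + \beta_h^{9}\,\text{HDD}_d\,\text{Weekend}_d + \beta_h^{10}\,\text{CDD}_d\,\text{Weekend}_d \\
& + \beta_h^{11}\,\text{LagHDD}_d + \beta_h^{12}\,\text{LagCDD}_d + \beta_h^{13}\,\text{LagHDDWeekend}_d + \beta_h^{14}\,\text{LagCDDWeekend}_d\}
\end{aligned}
$$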
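$$
\begin{aligned}
\text{Load}_{d,h} = {} & \{\beta_h^{0}\,\text{Intercept}_d + \beta_h^{1}\,\text{Sunday}_d + \beta_h^{2}\,\text{Monday}_d + \beta_h^{3}\,\text{Tuesday}_d + \beta_h^{4}\,\text{Thursday}_d \\
& + \beta_h^{5}\,\text{Friday}_d + \beta_h^{6}\,\text{Saturday}_d\} \\
& + \{\beta_h^{7}\,\text{NightHDD}_d + \beta_h^{8}\,\text{NightCDD}_d + \beta_h^{9}\,\text{NightHDD}_d\,\text{Weekend}_d + \beta_h^{10}\,\text{NightCDD}_d\,\text{Weekend}_d \\
& + \beta_h^{11}\,\text{MorningHDD}_d + \beta_h^{12}\,\text{MorningCDD}_d + \beta_h^{13}\,\text{MorningHDD}_d\,\text{Weekend}_d + \beta_h^{14}\,\text{MorningCDD}_d\,\text{Weekend}_d \\
& + \beta_h^{15}\,\text{AfternoonHDD}_d + \beta_h^{16}\,\text{AfternoonCDD}_d + \beta_h^{17}\,\text{AfternoonHDD}_d\,\text{Weekend}_d + \beta_h^{18}\,\text{AfternoonCDD}_d\,\text{Weekend}_d \\
& + \beta_h^{19}\,\text{EveningHDD}_d + \beta_h^{20}\,\text{EveningCDD}_d + \beta_h^{21}\,\text{EveningHDD}_d\,\text{Weekend}_d + \beta_h^{22}\,\text{EveningCDD}_d\,\text{Weekend}_d \\
& + \beta_h^{23}\,\text{LagHDD}_d + \beta_h^{24}\,\text{LagCDD}_d + \beta_h^{25}\,\text{LagHDDWeekend}_d + \beta_h^{26}\,\text{LagCDDWeekend}_d\}
\end{aligned}
$$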
Where,
$\text{NightHDD}_d$ is the heating degree day over the night TOU period (12AM to 5AM)
$\text{MorningHDD}_d$ is the heating degree day over the morning TOU period (6AM to 11AM)
$\text{AfternoonHDD}_d$ is the heating degree day over the afternoon TOU period (12PM to 5PM)
$\text{EveningHDD}_d$ is the heating degree day over the evening TOU period (6PM to 12AM)
$\text{NightCDD}_d$ is the cooling degree day over the night TOU period (12AM to 5AM)
$\text{MorningCDD}_d$ is the cooling degree day over the morning TOU period (6AM to 11AM)
$\text{AfternoonCDD}_d$ is the cooling degree day over the afternoon TOU period (12PM to 5PM)
$\text{EveningCDD}_d$ is the cooling degree day over the evening TOU period (6PM to 12AM)
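$$
\begin{aligned}
\text{Load}_{d,h} = {} & \beta_h^{0}\,\text{Intercept}_d + \beta_h^{1}\,\text{January}_d + \beta_h^{2}\,\text{February}_d + \beta_h^{3}\,\text{March}_d + \beta_h^{4}\,\text{May}_d + \beta_h^{5}\,\text{June}_d \\
& + \beta_h^{6}\,\text{July}_d + \beta_h^{7}\,\text{August}_d + \beta_h^{8}\,\text{September}_d + \beta_h^{9}\,\text{October}_d + \beta_h^{10}\,\text{November}_d \\
& + \beta_h^{11}\,\text{December}_d
\end{aligned}
$$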
Where,
$\text{January}_d$ is a binary variable that takes on a value of 1.0 if the day (d) falls in January, otherwise 0.0
$\text{February}_d$ is a binary variable that takes on a value of 1.0 if the day (d) falls in February, otherwise 0.0
$\text{March}_d$ is a binary variable that takes on a value of 1.0 if the day (d) falls in March, otherwise 0.0
$\text{May}_d$ is a binary variable that takes on a value of 1.0 if the day (d) falls in May, otherwise 0.0
$\text{June}_d$ is a binary variable that takes on a value of 1.0 if the day (d) falls in June, otherwise 0.0
$\text{July}_d$ is a binary variable that takes on a value of 1.0 if the day (d) falls in July, otherwise 0.0
$\text{August}_d$ is a binary variable that takes on a value of 1.0 if the day (d) falls in August, otherwise 0.0
$\text{September}_d$ is a binary variable that takes on a value of 1.0 if the day (d) falls in September, otherwise 0.0
$\text{October}_d$ is a binary variable that takes on a value of 1.0 if the day (d) falls in October, otherwise 0.0
$\text{November}_d$ is a binary variable that takes on a value of 1.0 if the day (d) falls in November, otherwise 0.0
$\text{December}_d$ is a binary variable that takes on a value of 1.0 if the day (d) falls in December, otherwise 0.0
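$$
\begin{aligned}
\text{Load}_{d,h} = {} & \beta_h^{0}\,\text{Intercept}_d \\
& + \beta_h^{1}\,\text{January}_d + \beta_h^{2}\,\text{January}_d\,\text{Weekend}_d + \beta_h^{3}\,\text{February}_d + \beta_h^{4}\,\text{February}_d\,\text{Weekend}_d \\
& + \beta_h^{5}\,\text{March}_d + \beta_h^{6}\,\text{March}_d\,\text{Weekend}_d + \beta_h^{7}\,\text{April}_d + \beta_h^{8}\,\text{April}_d\,\text{Weekend}_d \\
& + \beta_h^{9}\,\text{May}_d + \beta_h^{10}\,\text{May}_d\,\text{Weekend}_d + \beta_h^{11}\,\text{June}_d + \beta_h^{12}\,\text{June}_d\,\text{Weekend}_d \\
& + \beta_h^{13}\,\text{July}_d + \beta_h^{14}\,\text{July}_d\,\text{Weekend}_d + \beta_h^{15}\,\text{August}_d + \beta_h^{16}\,\text{August}_d\,\text{Weekend}_d \\
& + \beta_h^{17}\,\text{September}_d + \beta_h^{18}\,\text{September}_d\,\text{Weekend}_d + \beta_h^{19}\,\text{October}_d + \beta_h^{20}\,\text{October}_d\,\text{Weekend}_d \\
& + \beta_h^{21}\,\text{November}_d + \beta_h^{22}\,\text{November}_d\,\text{Weekend}_d + \beta_h^{23}\,\text{December}_d
\end{aligned}
$$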
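$$
\begin{aligned}
\text{Load}_{d,h} = {} & \beta_h^{0}\,\text{Intercept}_d \\
& + \beta_h^{1}\,\text{January}_d + \beta_h^{2}\,\text{January}_d\,\text{Weekend}_d + \beta_h^{3}\,\text{February}_d + \beta_h^{4}\,\text{February}_d\,\text{Weekend}_d \\
& + \beta_h^{5}\,\text{March}_d + \beta_h^{6}\,\text{March}_d\,\text{Weekend}_d + \beta_h^{7}\,\text{April}_d + \beta_h^{8}\,\text{April}_d\,\text{Weekend}_d \\
& + \beta_h^{9}\,\text{May}_d + \beta_h^{10}\,\text{May}_d\,\text{Weekend}_d + \beta_h^{11}\,\text{June}_d + \beta_h^{12}\,\text{June}_d\,\text{Weekend}_d \\
& + \beta_h^{13}\,\text{July}_d + \beta_h^{14}\,\text{July}_d\,\text{Weekend}_d + \beta_h^{15}\,\text{August}_d + \beta_h^{16}\,\text{August}_d\,\text{Weekend}_d \\
& + \beta_h^{17}\,\text{September}_d + \beta_h^{18}\,\text{September}_d\,\text{Weekend}_d + \beta_h^{19}\,\text{October}_d + \beta_h^{20}\,\text{October}_d\,\text{Weekend}_d \\
& + \beta_h^{21}\,\text{November}_d + \beta_h^{22}\,\text{November}_d\,\text{Weekend}_d + \beta_h^{23}\,\text{December}_d \\
& + \{\beta_h^{24}\,\text{HDD}_d + \beta_h^{25}\,\text{CDD}_d + \beta_h^{26}\,\text{HDD}_d\,\text{Weekend}_d + \beta_h^{27}\,\text{CDD}_d\,\text{Weekend}_d \\
& + \beta_h^{28}\,\text{LagHDD}_d + \beta_h^{29}\,\text{LagCDD}_d + \beta_h^{30}\,\text{LagHDDWeekend}_d + \beta_h^{31}\,\text{LagCDDWeekend}_d\}
\end{aligned}
$$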
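$$
\begin{aligned}
\text{Load}_{d,h} = {} & \beta_h^{0}\,\text{Intercept}_d \\
& + \beta_h^{1}\,\text{January}_d + \beta_h^{2}\,\text{January}_d\,\text{Weekend}_d + \beta_h^{3}\,\text{February}_d + \beta_h^{4}\,\text{February}_d\,\text{Weekend}_d \\
& + \beta_h^{5}\,\text{March}_d + \beta_h^{6}\,\text{March}_d\,\text{Weekend}_d + \beta_h^{7}\,\text{April}_d + \beta_h^{8}\,\text{April}_d\,\text{Weekend}_d \\
& + \beta_h^{9}\,\text{May}_d + \beta_h^{10}\,\text{May}_d\,\text{Weekend}_d + \beta_h^{11}\,\text{June}_d + \beta_h^{12}\,\text{June}_d\,\text{Weekend}_d \\
& + \beta_h^{13}\,\text{July}_d + \beta_h^{14}\,\text{July}_d\,\text{Weekend}_d + \beta_h^{15}\,\text{August}_d + \beta_h^{16}\,\text{August}_d\,\text{Weekend}_d \\
& + \beta_h^{17}\,\text{September}_d + \beta_h^{18}\,\text{September}_d\,\text{Weekend}_d + \beta_h^{19}\,\text{October}_d + \beta_h^{20}\,\text{October}_d\,\text{Weekend}_d \\
& + \beta_h^{21}\,\text{November}_d + \beta_h^{22}\,\text{November}_d\,\text{Weekend}_d + \beta_h^{23}\,\text{December}_d \\
& + \{\beta_h^{24}\,\text{NightHDD}_d + \beta_h^{25}\,\text{NightCDD}_d + \beta_h^{26}\,\text{NightHDD}_d\,\text{Weekend}_d + \beta_h^{27}\,\text{NightCDD}_d\,\text{Weekend}_d \\
& + \beta_h^{28}\,\text{MorningHDD}_d + \beta_h^{29}\,\text{MorningCDD}_d + \beta_h^{30}\,\text{MorningHDD}_d\,\text{Weekend}_d + \beta_h^{31}\,\text{MorningCDD}_d\,\text{Weekend}_d \\
& + \beta_h^{32}\,\text{AfternoonHDD}_d + \beta_h^{33}\,\text{AfternoonCDD}_d + \beta_h^{34}\,\text{AfternoonHDD}_d\,\text{Weekend}_d + \beta_h^{35}\,\text{AfternoonCDD}_d\,\text{Weekend}_d \\
& + \beta_h^{36}\,\text{EveningHDD}_d + \beta_h^{37}\,\text{EveningCDD}_d + \beta_h^{38}\,\text{EveningHDD}_d\,\text{Weekend}_d + \beta_h^{39}\,\text{EveningCDD}_d\,\text{Weekend}_d \\
& + \beta_h^{40}\,\text{LagHDD}_d + \beta_h^{41}\,\text{LagCDD}_d + \beta_h^{42}\,\text{LagHDDWeekend}_d + \beta_h^{43}\,\text{LagCDDWeekend}_d\}
\end{aligned}
$$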
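$$
\begin{aligned}
\text{Load}_{d,h} = {} & \{\beta_h^{0}\,\text{Intercept}_d \\
& + \beta_h^{1}\,\text{January}_d + \beta_h^{2}\,\text{January}_d\,\text{Sunday}_d + \beta_h^{3}\,\text{January}_d\,\text{Monday}_d + \beta_h^{4}\,\text{January}_d\,\text{Friday}_d + \beta_h^{5}\,\text{January}_d\,\text{Saturday}_d \\
& + \beta_h^{6}\,\text{February}_d + \beta_h^{7}\,\text{February}_d\,\text{Sunday}_d + \beta_h^{8}\,\text{February}_d\,\text{Monday}_d + \beta_h^{9}\,\text{February}_d\,\text{Friday}_d + \beta_h^{10}\,\text{February}_d\,\text{Saturday}_d \\
& + \beta_h^{11}\,\text{March}_d + \beta_h^{12}\,\text{March}_d\,\text{Sunday}_d + \beta_h^{13}\,\text{March}_d\,\text{Monday}_d + \beta_h^{14}\,\text{March}_d\,\text{Friday}_d + \beta_h^{15}\,\text{March}_d\,\text{Saturday}_d \\
& + \beta_h^{16}\,\text{April}_d + \beta_h^{17}\,\text{April}_d\,\text{Sunday}_d + \beta_h^{18}\,\text{April}_d\,\text{Monday}_d + \beta_h^{19}\,\text{April}_d\,\text{Friday}_d + \beta_h^{20}\,\text{April}_d\,\text{Saturday}_d \\
& + \beta_h^{21}\,\text{May}_d + \beta_h^{22}\,\text{May}_d\,\text{Sunday}_d + \beta_h^{23}\,\text{May}_d\,\text{Monday}_d + \beta_h^{24}\,\text{May}_d\,\text{Friday}_d + \beta_h^{25}\,\text{May}_d\,\text{Saturday}_d \\
& + \beta_h^{26}\,\text{June}_d + \beta_h^{27}\,\text{June}_d\,\text{Sunday}_d + \beta_h^{28}\,\text{June}_d\,\text{Monday}_d + \beta_h^{29}\,\text{June}_d\,\text{Friday}_d + \beta_h^{30}\,\text{June}_d\,\text{Saturday}_d \\
& + \beta_h^{31}\,\text{July}_d + \beta_h^{32}\,\text{July}_d\,\text{Sunday}_d + \beta_h^{33}\,\text{July}_d\,\text{Monday}_d + \beta_h^{34}\,\text{July}_d\,\text{Friday}_d + \beta_h^{35}\,\text{July}_d\,\text{Saturday}_d \\
& + \beta_h^{36}\,\text{August}_d + \beta_h^{37}\,\text{August}_d\,\text{Sunday}_d + \beta_h^{38}\,\text{August}_d\,\text{Monday}_d + \beta_h^{39}\,\text{August}_d\,\text{Friday}_d + \beta_h^{40}\,\text{August}_d\,\text{Saturday}_d \\
& + \beta_h^{41}\,\text{September}_d + \beta_h^{42}\,\text{September}_d\,\text{Sunday}_d + \beta_h^{43}\,\text{September}_d\,\text{Monday}_d + \beta_h^{44}\,\text{September}_d\,\text{Friday}_d + \beta_h^{45}\,\text{September}_d\,\text{Saturday}_d \\
& + \beta_h^{46}\,\text{October}_d + \beta_h^{47}\,\text{October}_d\,\text{Sunday}_d + \beta_h^{48}\,\text{October}_d\,\text{Monday}_d + \beta_h^{49}\,\text{October}_d\,\text{Friday}_d + \beta_h^{50}\,\text{October}_d\,\text{Saturday}_d \\
& + \beta_h^{51}\,\text{November}_d + \beta_h^{52}\,\text{November}_d\,\text{Sunday}_d + \beta_h^{53}\,\text{November}_d\,\text{Monday}_d + \beta_h^{54}\,\text{November}_d\,\text{Friday}_d + \beta_h^{55}\,\text{November}_d\,\text{Saturday}_d \\
& + \beta_h^{56}\,\text{December}_d\,\text{Sunday}_d + \beta_h^{57}\,\text{December}_d\,\text{Monday}_d + \beta_h^{58}\,\text{December}_d\,\text{Friday}_d + \beta_h^{59}\,\text{December}_d\,\text{Saturday}_d\}
\end{aligned}
$$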
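$$
\begin{aligned}
\text{Load}_{d,h} = {} & \{\beta_h^{0}\,\text{Intercept}_d \\
& + \beta_h^{1}\,\text{January}_d + \beta_h^{2}\,\text{January}_d\,\text{Sunday}_d + \beta_h^{3}\,\text{January}_d\,\text{Monday}_d + \beta_h^{4}\,\text{January}_d\,\text{Friday}_d + \beta_h^{5}\,\text{January}_d\,\text{Saturday}_d \\
& + \beta_h^{6}\,\text{February}_d + \beta_h^{7}\,\text{February}_d\,\text{Sunday}_d + \beta_h^{8}\,\text{February}_d\,\text{Monday}_d + \beta_h^{9}\,\text{February}_d\,\text{Friday}_d + \beta_h^{10}\,\text{February}_d\,\text{Saturday}_d \\
& + \beta_h^{11}\,\text{March}_d + \beta_h^{12}\,\text{March}_d\,\text{Sunday}_d + \beta_h^{13}\,\text{March}_d\,\text{Monday}_d + \beta_h^{14}\,\text{March}_d\,\text{Friday}_d + \beta_h^{15}\,\text{March}_d\,\text{Saturday}_d \\
& + \beta_h^{16}\,\text{April}_d + \beta_h^{17}\,\text{April}_d\,\text{Sunday}_d + \beta_h^{18}\,\text{April}_d\,\text{Monday}_d + \beta_h^{19}\,\text{April}_d\,\text{Friday}_d + \beta_h^{20}\,\text{April}_d\,\text{Saturday}_d \\
& + \beta_h^{21}\,\text{May}_d + \beta_h^{22}\,\text{May}_d\,\text{Sunday}_d + \beta_h^{23}\,\text{May}_d\,\text{Monday}_d + \beta_h^{24}\,\text{May}_d\,\text{Friday}_d + \beta_h^{25}\,\text{May}_d\,\text{Saturday}_d \\
& + \beta_h^{26}\,\text{June}_d + \beta_h^{27}\,\text{June}_d\,\text{Sunday}_d + \beta_h^{28}\,\text{June}_d\,\text{Monday}_d + \beta_h^{29}\,\text{June}_d\,\text{Friday}_d + \beta_h^{30}\,\text{June}_d\,\text{Saturday}_d \\
& + \beta_h^{31}\,\text{July}_d + \beta_h^{32}\,\text{July}_d\,\text{Sunday}_d + \beta_h^{33}\,\text{July}_d\,\text{Monday}_d + \beta_h^{34}\,\text{July}_d\,\text{Friday}_d + \beta_h^{35}\,\text{July}_d\,\text{Saturday}_d \\
& + \beta_h^{36}\,\text{August}_d + \beta_h^{37}\,\text{August}_d\,\text{Sunday}_d + \beta_h^{38}\,\text{August}_d\,\text{Monday}_d + \beta_h^{39}\,\text{August}_d\,\text{Friday}_d + \beta_h^{40}\,\text{August}_d\,\text{Saturday}_d \\
& + \beta_h^{41}\,\text{September}_d + \beta_h^{42}\,\text{September}_d\,\text{Sunday}_d + \beta_h^{43}\,\text{September}_d\,\text{Monday}_d + \beta_h^{44}\,\text{September}_d\,\text{Friday}_d + \beta_h^{45}\,\text{September}_d\,\text{Saturday}_d \\
& + \beta_h^{46}\,\text{October}_d + \beta_h^{47}\,\text{October}_d\,\text{Sunday}_d + \beta_h^{48}\,\text{October}_d\,\text{Monday}_d + \beta_h^{49}\,\text{October}_d\,\text{Friday}_d + \beta_h^{50}\,\text{October}_d\,\text{Saturday}_d \\
& + \beta_h^{51}\,\text{November}_d + \beta_h^{52}\,\text{November}_d\,\text{Sunday}_d + \beta_h^{53}\,\text{November}_d\,\text{Monday}_d + \beta_h^{54}\,\text{November}_d\,\text{Friday}_d + \beta_h^{55}\,\text{November}_d\,\text{Saturday}_d \\
& + \beta_h^{56}\,\text{December}_d\,\text{Sunday}_d + \beta_h^{57}\,\text{December}_d\,\text{Monday}_d + \beta_h^{58}\,\text{December}_d\,\text{Friday}_d + \beta_h^{59}\,\text{December}_d\,\text{Saturday}_d\} \\
& + \{\beta_h^{60}\,\text{HDD}_d + \beta_h^{61}\,\text{CDD}_d + \beta_h^{62}\,\text{HDD}_d\,\text{Weekend}_d + \beta_h^{63}\,\text{CDD}_d\,\text{Weekend}_d \\
& + \beta_h^{64}\,\text{LagHDD}_d + \beta_h^{65}\,\text{LagCDD}_d + \beta_h^{66}\,\text{LagHDDWeekend}_d + \beta_h^{67}\,\text{LagCDDWeekend}_d\}
\end{aligned}
$$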
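$$
\begin{aligned}
\text{Load}_{d,h} = {} & \{\beta_h^{0}\,\text{Intercept}_d \\
& + \beta_h^{1}\,\text{January}_d + \beta_h^{2}\,\text{January}_d\,\text{Sunday}_d + \beta_h^{3}\,\text{January}_d\,\text{Monday}_d + \beta_h^{4}\,\text{January}_d\,\text{Friday}_d + \beta_h^{5}\,\text{January}_d\,\text{Saturday}_d \\
& + \beta_h^{6}\,\text{February}_d + \beta_h^{7}\,\text{February}_d\,\text{Sunday}_d + \beta_h^{8}\,\text{February}_d\,\text{Monday}_d + \beta_h^{9}\,\text{February}_d\,\text{Friday}_d + \beta_h^{10}\,\text{February}_d\,\text{Saturday}_d \\
& + \beta_h^{11}\,\text{March}_d + \beta_h^{12}\,\text{March}_d\,\text{Sunday}_d + \beta_h^{13}\,\text{March}_d\,\text{Monday}_d + \beta_h^{14}\,\text{March}_d\,\text{Friday}_d + \beta_h^{15}\,\text{March}_d\,\text{Saturday}_d \\
& + \beta_h^{16}\,\text{April}_d + \beta_h^{17}\,\text{April}_d\,\text{Sunday}_d + \beta_h^{18}\,\text{April}_d\,\text{Monday}_d + \beta_h^{19}\,\text{April}_d\,\text{Friday}_d + \beta_h^{20}\,\text{April}_d\,\text{Saturday}_d \\
& + \beta_h^{21}\,\text{May}_d + \beta_h^{22}\,\text{May}_d\,\text{Sunday}_d + \beta_h^{23}\,\text{May}_d\,\text{Monday}_d + \beta_h^{24}\,\text{May}_d\,\text{Friday}_d + \beta_h^{25}\,\text{May}_d\,\text{Saturday}_d \\
& + \beta_h^{26}\,\text{June}_d + \beta_h^{27}\,\text{June}_d\,\text{Sunday}_d + \beta_h^{28}\,\text{June}_d\,\text{Monday}_d + \beta_h^{29}\,\text{June}_d\,\text{Friday}_d + \beta_h^{30}\,\text{June}_d\,\text{Saturday}_d \\
& + \beta_h^{31}\,\text{July}_d + \beta_h^{32}\,\text{July}_d\,\text{Sunday}_d + \beta_h^{33}\,\text{July}_d\,\text{Monday}_d + \beta_h^{34}\,\text{July}_d\,\text{Friday}_d + \beta_h^{35}\,\text{July}_d\,\text{Saturday}_d \\
& + \beta_h^{36}\,\text{August}_d + \beta_h^{37}\,\text{August}_d\,\text{Sunday}_d + \beta_h^{38}\,\text{August}_d\,\text{Monday}_d + \beta_h^{39}\,\text{August}_d\,\text{Friday}_d + \beta_h^{40}\,\text{August}_d\,\text{Saturday}_d \\
& + \beta_h^{41}\,\text{September}_d + \beta_h^{42}\,\text{September}_d\,\text{Sunday}_d + \beta_h^{43}\,\text{September}_d\,\text{Monday}_d + \beta_h^{44}\,\text{September}_d\,\text{Friday}_d + \beta_h^{45}\,\text{September}_d\,\text{Saturday}_d \\
& + \beta_h^{46}\,\text{October}_d + \beta_h^{47}\,\text{October}_d\,\text{Sunday}_d + \beta_h^{48}\,\text{October}_d\,\text{Monday}_d + \beta_h^{49}\,\text{October}_d\,\text{Friday}_d + \beta_h^{50}\,\text{October}_d\,\text{Saturday}_d \\
& + \beta_h^{51}\,\text{November}_d + \beta_h^{52}\,\text{November}_d\,\text{Sunday}_d + \beta_h^{53}\,\text{November}_d\,\text{Monday}_d + \beta_h^{54}\,\text{November}_d\,\text{Friday}_d + \beta_h^{55}\,\text{November}_d\,\text{Saturday}_d \\
& + \beta_h^{56}\,\text{December}_d\,\text{Sunday}_d + \beta_h^{57}\,\text{December}_d\,\text{Monday}_d + \beta_h^{58}\,\text{December}_d\,\text{Friday}_d + \beta_h^{59}\,\text{December}_d\,\text{Saturday}_d\} \\
& + \{\beta_h^{60}\,\text{NightHDD}_d + \beta_h^{61}\,\text{NightCDD}_d + \beta_h^{62}\,\text{NightHDD}_d\,\text{Weekend}_d + \beta_h^{63}\,\text{NightCDD}_d\,\text{Weekend}_d \\
& + \beta_h^{64}\,\text{MorningHDD}_d + \beta_h^{65}\,\text{MorningCDD}_d + \beta_h^{66}\,\text{MorningHDD}_d\,\text{Weekend}_d + \beta_h^{67}\,\text{MorningCDD}_d\,\text{Weekend}_d \\
& + \beta_h^{68}\,\text{AfternoonHDD}_d + \beta_h^{69}\,\text{AfternoonCDD}_d + \beta_h^{70}\,\text{AfternoonHDD}_d\,\text{Weekend}_d + \beta_h^{71}\,\text{AfternoonCDD}_d\,\text{Weekend}_d \\
& + \beta_h^{72}\,\text{EveningHDD}_d + \beta_h^{73}\,\text{EveningCDD}_d + \beta_h^{74}\,\text{EveningHDD}_d\,\text{Weekend}_d + \beta_h^{75}\,\text{EveningCDD}_d\,\text{Weekend}_d \\
& + \beta_h^{76}\,\text{LagHDD}_d + \beta_h^{77}\,\text{LagCDD}_d + \beta_h^{78}\,\text{LagHDDWeekend}_d + \beta_h^{79}\,\text{LagCDDWeekend}_d\}
\end{aligned}
$$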
Where,
$\text{Winter}_d$ is a binary variable that takes on a value of 1.0 if the day (d) falls within the December,
January, or February months, otherwise 0.0
$\text{Summer}_d$ is a binary variable that takes on a value of 1.0 if the day (d) falls within the June, July,
or August months, otherwise 0.0
$\text{Fall}_d$ is a binary variable that takes on a value of 1.0 if the day (d) falls within the September,
October, or November months, otherwise 0.0
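$$
\begin{aligned}
\text{Load}_{d,h} = {} & \beta_h^{0}\,\text{Intercept}_d + \beta_h^{1}\,\text{Winter}_d + \beta_h^{2}\,\text{Summer}_d + \beta_h^{3}\,\text{Fall}_d + \beta_h^{4}\,\text{Weekend}_d \\
& + \beta_h^{5}\,\text{Winter}_d\,\text{Weekend}_d + \beta_h^{6}\,\text{Summer}_d\,\text{Weekend}_d + \beta_h^{7}\,\text{Fall}_d\,\text{Weekend}_d
\end{aligned}
$$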
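$$
\begin{aligned}
\text{Load}_{d,h} = {} & \beta_h^{0}\,\text{Intercept}_d + \beta_h^{1}\,\text{Winter}_d + \beta_h^{2}\,\text{Summer}_d + \beta_h^{3}\,\text{Fall}_d + \beta_h^{4}\,\text{Weekend}_d \\
& + \beta_h^{5}\,\text{Winter}_d\,\text{Weekend}_d + \beta_h^{6}\,\text{Summer}_d\,\text{Weekend}_d + \beta_h^{7}\,\text{Fall}_d\,\text{Weekend}_d \\
& + \{\beta_h^{8}\,\text{HDD}_d + \beta_h^{9}\,\text{CDD}_d + \beta_h^{10}\,\text{HDD}_d\,\text{Weekend}_d + \beta_h^{11}\,\text{CDD}_d\,\text{Weekend}_d \\
& + \beta_h^{12}\,\text{LagHDD}_d + \beta_h^{13}\,\text{LagCDD}_d + \beta_h^{14}\,\text{LagHDDWeekend}_d + \beta_h^{15}\,\text{LagCDDWeekend}_d\}
\end{aligned}
$$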
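$$
\begin{aligned}
\text{Load}_{d,h} = {} & \beta_h^{0}\,\text{Intercept}_d + \beta_h^{1}\,\text{Winter}_d + \beta_h^{2}\,\text{Summer}_d + \beta_h^{3}\,\text{Fall}_d + \beta_h^{4}\,\text{Weekend}_d \\
& + \beta_h^{5}\,\text{Winter}_d\,\text{Weekend}_d + \beta_h^{6}\,\text{Summer}_d\,\text{Weekend}_d + \beta_h^{7}\,\text{Fall}_d\,\text{Weekend}_d \\
& + \{\beta_h^{8}\,\text{NightHDD}_d + \beta_h^{9}\,\text{NightCDD}_d + \beta_h^{10}\,\text{NightHDD}_d\,\text{Weekend}_d + \beta_h^{11}\,\text{NightCDD}_d\,\text{Weekend}_d \\
& + \beta_h^{12}\,\text{MorningHDD}_d + \beta_h^{13}\,\text{MorningCDD}_d + \beta_h^{14}\,\text{MorningHDD}_d\,\text{Weekend}_d + \beta_h^{15}\,\text{MorningCDD}_d\,\text{Weekend}_d \\
& + \beta_h^{16}\,\text{AfternoonHDD}_d + \beta_h^{17}\,\text{AfternoonCDD}_d + \beta_h^{18}\,\text{AfternoonHDD}_d\,\text{Weekend}_d + \beta_h^{19}\,\text{AfternoonCDD}_d\,\text{Weekend}_d \\
& + \beta_h^{20}\,\text{EveningHDD}_d + \beta_h^{21}\,\text{EveningCDD}_d + \beta_h^{22}\,\text{EveningHDD}_d\,\text{Weekend}_d + \beta_h^{23}\,\text{EveningCDD}_d\,\text{Weekend}_d \\
& + \beta_h^{24}\,\text{LagHDD}_d + \beta_h^{25}\,\text{LagCDD}_d + \beta_h^{26}\,\text{LagHDDWeekend}_d + \beta_h^{27}\,\text{LagCDDWeekend}_d\}
\end{aligned}
$$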
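$$
\begin{aligned}
\text{Load}_{d,h} = {} & \beta_h^{0}\,\text{Intercept}_d \\
& + \beta_h^{1}\,\text{Winter}_d + \beta_h^{2}\,\text{Winter}_d\,\text{Sunday}_d + \beta_h^{3}\,\text{Winter}_d\,\text{Monday}_d + \beta_h^{4}\,\text{Winter}_d\,\text{Friday}_d + \beta_h^{5}\,\text{Winter}_d\,\text{Saturday}_d \\
& + \beta_h^{6}\,\text{Spring}_d + \beta_h^{7}\,\text{Spring}_d\,\text{Sunday}_d + \beta_h^{8}\,\text{Spring}_d\,\text{Monday}_d + \beta_h^{9}\,\text{Spring}_d\,\text{Friday}_d + \beta_h^{10}\,\text{Spring}_d\,\text{Saturday}_d \\
& + \beta_h^{11}\,\text{Summer}_d + \beta_h^{12}\,\text{Summer}_d\,\text{Sunday}_d + \beta_h^{13}\,\text{Summer}_d\,\text{Monday}_d + \beta_h^{14}\,\text{Summer}_d\,\text{Friday}_d + \beta_h^{15}\,\text{Summer}_d\,\text{Saturday}_d \\
& + \beta_h^{16}\,\text{Fall}_d\,\text{Sunday}_d + \beta_h^{17}\,\text{Fall}_d\,\text{Monday}_d + \beta_h^{18}\,\text{Fall}_d\,\text{Friday}_d + \beta_h^{19}\,\text{Fall}_d\,\text{Saturday}_d
\end{aligned}
$$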
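$$
\begin{aligned}
\text{Load}_{d,h} = {} & \beta_h^{0}\,\text{Intercept}_d \\
& + \beta_h^{1}\,\text{Winter}_d + \beta_h^{2}\,\text{Winter}_d\,\text{Sunday}_d + \beta_h^{3}\,\text{Winter}_d\,\text{Monday}_d + \beta_h^{4}\,\text{Winter}_d\,\text{Friday}_d + \beta_h^{5}\,\text{Winter}_d\,\text{Saturday}_d \\
& + \beta_h^{6}\,\text{Spring}_d + \beta_h^{7}\,\text{Spring}_d\,\text{Sunday}_d + \beta_h^{8}\,\text{Spring}_d\,\text{Monday}_d + \beta_h^{9}\,\text{Spring}_d\,\text{Friday}_d + \beta_h^{10}\,\text{Spring}_d\,\text{Saturday}_d \\
& + \beta_h^{11}\,\text{Summer}_d + \beta_h^{12}\,\text{Summer}_d\,\text{Sunday}_d + \beta_h^{13}\,\text{Summer}_d\,\text{Monday}_d + \beta_h^{14}\,\text{Summer}_d\,\text{Friday}_d + \beta_h^{15}\,\text{Summer}_d\,\text{Saturday}_d \\
& + \beta_h^{16}\,\text{Fall}_d\,\text{Sunday}_d + \beta_h^{17}\,\text{Fall}_d\,\text{Monday}_d + \beta_h^{18}\,\text{Fall}_d\,\text{Friday}_d + \beta_h^{19}\,\text{Fall}_d\,\text{Saturday}_d \\
& + \{\beta_h^{20}\,\text{HDD}_d + \beta_h^{21}\,\text{CDD}_d + \beta_h^{22}\,\text{HDD}_d\,\text{Weekend}_d + \beta_h^{23}\,\text{CDD}_d\,\text{Weekend}_d \\
& + \beta_h^{24}\,\text{LagHDD}_d + \beta_h^{25}\,\text{LagCDD}_d + \beta_h^{26}\,\text{LagHDDWeekend}_d + \beta_h^{27}\,\text{LagCDDWeekend}_d\}
\end{aligned}
$$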
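$$
\begin{aligned}
\text{Load}_{d,h} = {} & \beta_h^{0}\,\text{Intercept}_d \\
& + \beta_h^{1}\,\text{Winter}_d + \beta_h^{2}\,\text{Winter}_d\,\text{Sunday}_d + \beta_h^{3}\,\text{Winter}_d\,\text{Monday}_d + \beta_h^{4}\,\text{Winter}_d\,\text{Friday}_d + \beta_h^{5}\,\text{Winter}_d\,\text{Saturday}_d \\
& + \beta_h^{6}\,\text{Spring}_d + \beta_h^{7}\,\text{Spring}_d\,\text{Sunday}_d + \beta_h^{8}\,\text{Spring}_d\,\text{Monday}_d + \beta_h^{9}\,\text{Spring}_d\,\text{Friday}_d + \beta_h^{10}\,\text{Spring}_d\,\text{Saturday}_d \\
& + \beta_h^{11}\,\text{Summer}_d + \beta_h^{12}\,\text{Summer}_d\,\text{Sunday}_d + \beta_h^{13}\,\text{Summer}_d\,\text{Monday}_d + \beta_h^{14}\,\text{Summer}_d\,\text{Friday}_d + \beta_h^{15}\,\text{Summer}_d\,\text{Saturday}_d \\
& + \beta_h^{16}\,\text{Fall}_d\,\text{Sunday}_d + \beta_h^{17}\,\text{Fall}_d\,\text{Monday}_d + \beta_h^{18}\,\text{Fall}_d\,\text{Friday}_d + \beta_h^{19}\,\text{Fall}_d\,\text{Saturday}_d \\
& + \{\beta_h^{20}\,\text{NightHDD}_d + \beta_h^{21}\,\text{NightCDD}_d + \beta_h^{22}\,\text{NightHDD}_d\,\text{Weekend}_d + \beta_h^{23}\,\text{NightCDD}_d\,\text{Weekend}_d \\
& + \beta_h^{24}\,\text{MorningHDD}_d + \beta_h^{25}\,\text{MorningCDD}_d + \beta_h^{26}\,\text{MorningHDD}_d\,\text{Weekend}_d + \beta_h^{27}\,\text{MorningCDD}_d\,\text{Weekend}_d \\
& + \beta_h^{28}\,\text{AfternoonHDD}_d + \beta_h^{29}\,\text{AfternoonCDD}_d + \beta_h^{30}\,\text{AfternoonHDD}_d\,\text{Weekend}_d + \beta_h^{31}\,\text{AfternoonCDD}_d\,\text{Weekend}_d \\
& + \beta_h^{32}\,\text{EveningHDD}_d + \beta_h^{33}\,\text{EveningCDD}_d + \beta_h^{34}\,\text{EveningHDD}_d\,\text{Weekend}_d + \beta_h^{35}\,\text{EveningCDD}_d\,\text{Weekend}_d \\
& + \beta_h^{36}\,\text{LagHDD}_d + \beta_h^{37}\,\text{LagCDD}_d + \beta_h^{38}\,\text{LagHDDWeekend}_d + \beta_h^{39}\,\text{LagCDDWeekend}_d\}
\end{aligned}
$$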
Guideline 1. There is no one-size-fits-all model. All loads are different, which means each load requires
a model designed specifically for the characteristics of that load.
Guideline 2. The best model today will more than likely need to evolve in six months to a year to reflect
underlying changes in customer and technology mix. To remain accurate and current, the load forecast
model specification must evolve over time. A good habit is to:
A Practitioner’s Guide to Short-term Load Forecast Modeling Guidelines for Building Load Forecast Models |7-1
Guideline 3. Once you have a good load forecast model, future variations are designed around recent
events where the model performed poorly. In most cases, this will lead to minor modifications to the
existing model specification.
Guideline 4. The best in-sample and out-of-sample model fit statistics are the mean absolute deviation
(MAD) and the mean absolute percentage error (MAPE). These statistics are in a language a non-modeler,
like a control room operator, can understand.
The MAD presents the average absolute model error in the same units that the load being forecasted is
in. For example, the MAD may be 200 MW. This is a value a control room operator can relate to because
they know what that means in terms of whether they have sufficient generation online to meet that error.
The MAPE is useful when you are comparing load forecast errors across load zones that could differ
significantly in size. For example, it makes more sense to compare the MAPE of the DAY load model
with the MAPE of the Commonwealth Edison load model than it does to compare their MADs.
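Both statistics are simple to compute. The Python sketch below uses made-up loads for a small and a large zone (the numbers are illustrative assumptions, not DAY or Commonwealth Edison data) to show why the MAPE is the better basis for cross-zone comparisons: the two models have the same MAD in MW but very different percentage errors.

```python
def mean_absolute_deviation(actual, predicted):
    """Average absolute error, in the same units as the load (e.g., MW)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_absolute_percentage_error(actual, predicted):
    """Average absolute error as a percentage of the actual load."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical hourly loads (MW) for a small zone and a large zone.
small_actual, small_pred = [2000.0, 2500.0], [1800.0, 2700.0]
large_actual, large_pred = [20000.0, 25000.0], [19800.0, 25200.0]

small_mad = mean_absolute_deviation(small_actual, small_pred)
large_mad = mean_absolute_deviation(large_actual, large_pred)
small_mape = mean_absolute_percentage_error(small_actual, small_pred)
large_mape = mean_absolute_percentage_error(large_actual, large_pred)

print(f"small zone: MAD={small_mad:.0f} MW, MAPE={small_mape:.1f}%")
print(f"large zone: MAD={large_mad:.0f} MW, MAPE={large_mape:.1f}%")
```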
Guideline 5. Remember, a model returns the average load value for user-defined segments of the
historical load data. These segments are defined by the set of explanatory variables included in the model.
It is up to the modeler to determine which segmentation scheme makes the most sense for the load data
being modeled.
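The segment-average interpretation in Guideline 5 can be seen in the simplest case, a model with an intercept and a single weekend binary. The Python sketch below fits that model by ordinary least squares using the closed-form formulas for one explanatory variable; the loads are made-up values and the function name is hypothetical. The fitted intercept equals the average weekday load, and the intercept plus the weekend coefficient equals the average weekend load.

```python
def fit_intercept_and_slope(y, x):
    """Ordinary least squares for y = b0 + b1 * x with a single explanatory variable."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical loads for one hour of the day across five days.
loads = [100.0, 110.0, 105.0, 80.0, 90.0]
weekend = [0.0, 0.0, 0.0, 1.0, 1.0]

b0, b1 = fit_intercept_and_slope(loads, weekend)
weekday_mean = sum(l for l, w in zip(loads, weekend) if w == 0.0) / 3
weekend_mean = sum(l for l, w in zip(loads, weekend) if w == 1.0) / 2
print(f"intercept={b0:.1f} (weekday mean {weekday_mean:.1f}), "
      f"intercept+weekend={b0 + b1:.1f} (weekend mean {weekend_mean:.1f})")
```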
Guideline 6. Building accurate load forecast models is about eliminating any possible time series pattern
in the load model errors. The goal is to define a set of explanatory variables so that when you look at a
graph of the model errors, what you see is a chaotic time series.
Guideline 7. If a graph of the model errors shows a long run trend tilt (either up or down), that suggests
a time trend needs to be included as one of the explanatory variables.
Guideline 8. If a graph of the model errors shows a shift (up or down) in the residuals for a specific range
of data (e.g., a month or week or more), then some form of binary variable that takes on a value of 1.0
when the load shift occurred is needed.
Guideline 9. Three to four years of historical load data is usually sufficient. The goal is to forecast
tomorrow. The goal is most definitely not explaining what happened 10 years ago.
Guideline 10. R2 and t-statistics do not provide as strong a measure of load forecast performance as a
comparison of the in-sample and out-of-sample MAD and MAPE.
Guideline 11. Start with one of the model templates presented in Chapter 6 and fine tune from there.
Guideline 12. Spend time getting the list of weather stations and weather concepts right.
Guideline 13. Spend time cleaning the load and weather data.
Guideline 14. Put in a process that cleans load and weather data when the data arrive.
Guideline 15. Use autoregressive load terms sparingly. It is better to build the best model you can without
autoregressive load terms. Add autoregressive terms after you have built your best model.
Guideline 16. Take time to talk with the control room operators about different days, weeks, or events
that occurred. Translate their knowledge about what happened into explanatory variables.
Guideline 17. Graph the load data, graph the weather data, graph the model actual versus predicted,
graph the model errors, graph the load forecasts, graph the weather forecasts. Bury yourself with graphs
of the data.
Guideline 18. Add explanatory variables to your model in logical groups. For example, start with variables
that explain load variation by day-of-the-week, month, and/or season. Next add holiday and special event
days. Then add weather variables. At each step, record the MAD and MAPE to ensure you are receiving
the expected improvement.
Guideline 19. When you add a variable to a model, anticipate what that variable will do to the model fit.
Often it is helpful to focus on a particular week and watch how the predicted value either improves or not
as you add successive explanatory variables.
Guideline 20. Model building is 99% hard thinking coupled with trial and error and 1% inspiration.
Inspiration only comes after hard thinking and trial and error. There are no shortcuts to building accurate
load forecast models.
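As a rough illustration of Guidelines 10 and 18, the sketch below computes MAD and MAPE for an in-sample and an out-of-sample period. It assumes MAD means the mean absolute deviation between actual and predicted load; the function names and the load values are hypothetical and only show the bookkeeping.

```python
def mad(actual, predicted):
    """Mean absolute deviation between actual and predicted loads."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual loads must be nonzero)."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical hourly loads (MW) and model predictions, for illustration only.
in_sample_actual = [1000.0, 1100.0, 1200.0, 1250.0]
in_sample_pred = [990.0, 1120.0, 1190.0, 1275.0]
out_sample_actual = [1050.0, 1150.0]
out_sample_pred = [1020.0, 1190.0]

for label, act, pred in [
    ("in-sample", in_sample_actual, in_sample_pred),
    ("out-of-sample", out_sample_actual, out_sample_pred),
]:
    print(f"{label}: MAD = {mad(act, pred):.2f} MW, MAPE = {mape(act, pred):.2f}%")
```

Recording both pairs of statistics after each group of variables is added makes it easy to see whether an in-sample improvement carries over to the out-of-sample period.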
8 INCORPORATING BEHIND THE METER SOLAR PV GENERATION
A growing load forecasting problem is the treatment of behind-the-meter solar PV generation. This
section summarizes the basic information needed to develop a solar generation forecast given a cloud
cover forecast.
Solar Panel Basics. The most commonly used solar generation technology for homes and businesses is the
solar photovoltaic (PV) panel. PV cells convert light (photons) to electricity (voltage). The first practical
PV cell was developed in 1954 when scientists at Bell Telephone Laboratories noticed that silicon created
an electric charge when exposed to sunlight. When light strikes a PV cell, a certain portion of the light is
absorbed within the silicon material. The absorbed energy knocks electrons loose, allowing them to flow
freely. The “freed” electrons form a current. The amount of electricity produced by a solar panel depends
on panel size and panel efficiency.
◼ Panel size is typically measured in units of Watts/m² of peak output. Peak output is the amount
of electricity the panel would produce when the panel receives the maximum amount of sunlight
possible and at optimal ambient temperatures (25°C or 77°F). Typically, a solar panel is composed
of 40 or so solar cells that, in total, generate approximately 150 Watts/m² of peak electricity
output. From the perspective of generation forecasting, panel size is treated as a known
exogenous forecast driver to the forecast framework. For example, the size of a solar plant is
given by the total peak MW output of the plant. Embedded solar is measured typically in total
MWs of installed rooftop panels for a given geographic area.
◼ Panel efficiency measures the percentage of solar energy hitting the panel that is converted into
electricity. Solar panel efficiencies for a typical residential or commercial application range
between 10% and 20%. For example, suppose the solar energy reaching the panel surface is 1,000
Watts/m2. A panel with a maximum output rating of 150 Watts/m2 would have an efficiency of
15% (computed as 150 Watts Out/m2 over 1,000 Watts In/m2).
Efficiency and panel size ratings are under ideal conditions. Factors that impact a solar panel’s actual
electricity output include (a) panel orientation and tilt, (b) panel temperature and (c) shade.
◼ Panel orientation and tilt are important factors that impact how much direct sunlight reaches the
panel surface for any location and time. For installations in the Northern Hemisphere, the ideal
orientation is south facing, while in the Southern Hemisphere, the ideal orientation is north facing.
Further, under ideal circumstances the panel will tilt throughout the day to track the path of the
sun through the sky. Because the costs of solar tracking systems are high, they tend to be only
found in large commercial installations and solar plants. This implies most residential installations
do not operate under peak conditions.
◼ Solar panels are most efficient at temperatures around 13°C or 55°F. The hotter the temperature,
the less efficient a panel is in converting sun energy into electricity. For temperatures above 25°C
(77°F), the efficiency of a rooftop solar panel will degrade 0.48% per degree Celsius (0.27% per
degree Fahrenheit).
◼ Shade lowers the output of a panel by blocking the amount of solar energy reaching the panel
surface. Trees, clouds, nearby structures, and even dirt and snow buildup can effectively block
the amount of solar energy reaching the panels. Clouds reduce solar panel output by reducing
the amount of solar energy striking the panel. At 100% cloud cover, roughly 80% of the solar
energy is reflected out to space and 20% filters through to Earth’s surface.
Unlike solar plants where the panel orientation, tilt, and shading (other than cloud cover) are known, the
specifics of each rooftop installation are unknown. In this case, an average operating efficiency of 15%
should account for all likely combinations of panel orientation, tilt, and shade that a collection of
residential and commercial installations would have. From the perspective of forecasting embedded solar
generation, the impact of temperature and cloud cover will be treated as separate factors influencing
solar panel output on an hour-by-hour basis.
Solar Insolation. Given the basics of solar panel technology, the big question is, how do we predict the
amount of solar energy that will reach the surface of a solar panel for any location and time? Solar
insolation measures how much solar energy (Watts/m²) reaches Earth’s surface on a cloudless day and
is the key input to solar generation forecasting. For any given point on Earth, the amount of solar energy
will vary not only throughout the day as the sun tracks across the sky, but also throughout the year as the
sun cycles between the Tropic of Capricorn and the Tropic of Cancer.
Fortunately, Johannes Kepler’s three laws of planetary motion, combined with Sir Isaac Newton’s
explanation of the motion of planets, give us everything we need to predict solar insolation for any
location and time. The detailed calculations for estimating solar insolation are presented below.
Three parameters are needed to track the position of the sun in the sky for a specific location, date, and
time: (a) solar declination angle, (b) solar hour angle, and (c) solar altitude angle. In addition, the sunrise
hour angle and sunset hour angle are used to determine the time of sunrise and sunset. The formulas for
each parameter are presented below.
Solar Declination Angle. Solar declination angle is the angle between the sun’s rays and a plane passing
through the equator. From the perspective of the Northern Hemisphere, the solar declination angle has
a maximum value of +23.45° on June 21 and a minimum value of −23.45° on December 21. The annual cycle
for the solar declination angle is depicted in Figure 8-1.
Figure 8-1. Solar Declination Angle
The solar declination angle is computed as:
$$\text{SolarDeclinationAngle}_d = 23.45^\circ \times \sin\left(360^\circ \times \frac{284 + d}{365}\right)$$
Where d is the number of the day in the year (e.g., d =1 to 365). The solar declination angle for January
1, 2010 through December 31, 2012 is shown in Figure 8-1.
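The declination formula can be sketched in a few lines of Python. The function name is an illustrative choice, and the day number is assumed to run from 1 (January 1) to 365.

```python
import math


def solar_declination_deg(day_of_year):
    """Solar declination angle in degrees for day_of_year (1 = January 1)."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))


print(round(solar_declination_deg(46), 1))   # February 15
print(round(solar_declination_deg(172), 2))  # June 21
```

February 15 (day 46) gives a declination of about −13.3°, and June 21 (day 172) gives a value very close to the +23.45° maximum.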
Solar Hour Angle. Solar hour angle measures the position of the sun relative to solar noon at a given
location and time. The solar hour angle will be 0.0 at local solar noon—this is the point the sun is highest
in the sky. The solar hour angle will be negative before local solar noon and positive after local solar noon.
The solar hour angle changes 15° each hour or 1° every four minutes. For five-minute modeling, this
means the solar hour angle changes 1.25° every five minutes. At local solar hour 08:00, the solar hour
angle will be equal to −60°. At local solar hour 16:00, the solar hour angle will be equal to 60°.
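The hour-angle rule described above amounts to 15° per hour measured from solar noon. A minimal sketch, assuming local solar time is supplied as a decimal hour (the function name is illustrative):

```python
def solar_hour_angle_deg(local_solar_hour):
    """Solar hour angle in degrees: 0 at solar noon, changing 15 degrees per hour."""
    return 15.0 * (local_solar_hour - 12.0)


for hour in (8.0, 12.0, 16.0):
    print(hour, solar_hour_angle_deg(hour))
```

For five-minute modeling, the same function can be called with hours such as 12 + 5/60.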
Solar Altitude Angle. Solar altitude angle is the angle between the sun’s rays and a horizontal plane as
the sun traverses the sky between sunrise and sunset. At the sun’s zenith (solar noon), this angle will vary
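across the year. The solar altitude angle can be calculated for any location and time as follows:
$$\sin(\alpha) = \sin(\text{Latitude}) \times \sin(\text{SolarDeclinationAngle}) + \cos(\text{Latitude}) \times \cos(\text{SolarDeclinationAngle}) \times \cos(\text{SolarHourAngle})$$
Where α is the solar altitude angle, Latitude is the latitude of the location, and the solar declination
angle and solar hour angle are as defined above.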
We can use this equation to determine the solar altitude angle for solar noon for a specific day and
location. For example, to calculate the solar altitude angle for solar noon on February 15 in Honolulu,
Hawaii (latitude 21.3069°N; longitude 157.8583°W), the formula is as follows.
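First, we need to compute the solar declination angle for February 15, the 46th day of the year.
$$\text{SolarDeclinationAngle}_{\text{Feb-15}} = 23.45^\circ \times \sin\left(360^\circ \times \frac{284 + 46}{365}\right) = -13.3^\circ$$
At solar noon the solar hour angle is 0°. Combining this with the latitude and the declination angle gives:
$$\sin(\alpha) = 0.823175$$
which corresponds to a solar altitude angle of approximately 55.4°.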
In the Northern Hemisphere, as Earth moves toward the summer solstice, the solar altitude angle will
steepen. If we redo the calculations for solar noon on June 21, we have a solar altitude angle of 87.9°.
The solar altitude angle for February 15 and June 21 are depicted in Figure 8-2 and Figure 8-3.
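The Honolulu example can be checked with a short script. This is a sketch that assumes the standard relationship sin(α) = sin(latitude)·sin(declination) + cos(latitude)·cos(declination)·cos(hour angle); the function names are illustrative.

```python
import math


def solar_declination_deg(day_of_year):
    """Solar declination angle in degrees (1 = January 1)."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))


def sin_solar_altitude(latitude_deg, day_of_year, hour_angle_deg=0.0):
    """Sine of the solar altitude angle for a latitude, day, and solar hour angle."""
    lat = math.radians(latitude_deg)
    decl = math.radians(solar_declination_deg(day_of_year))
    hour = math.radians(hour_angle_deg)
    return (math.sin(lat) * math.sin(decl)
            + math.cos(lat) * math.cos(decl) * math.cos(hour))


def solar_altitude_deg(latitude_deg, day_of_year, hour_angle_deg=0.0):
    """Solar altitude angle in degrees (negative when the sun is below the horizon)."""
    return math.degrees(math.asin(sin_solar_altitude(latitude_deg, day_of_year, hour_angle_deg)))


HONOLULU_LATITUDE = 21.3069  # degrees north
print(round(sin_solar_altitude(HONOLULU_LATITUDE, 46), 6))   # solar noon, February 15
print(round(solar_altitude_deg(HONOLULU_LATITUDE, 46), 1))
print(round(solar_altitude_deg(HONOLULU_LATITUDE, 172), 1))  # solar noon, June 21
```

The hour angle defaults to 0° (solar noon); passing other hour angles traces the altitude across the day.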
FIGURE 8-3. SOLAR ALTITUDE ANGLE (JUNE 21ST)
To determine the time difference between solar noon and sunrise and sunset, it is useful to compute the
sunrise and sunset solar hour angles. To compute the time difference, all you need to do is multiply solar
hour angle at sunrise (or sunset) by four minutes per degree. The sunrise solar hour angle will be the
point at which the solar altitude angle equals 0.0. Using the information above, we can compute the solar
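hour angle at the time of sunrise and sunset for February 15. In this case, we know the solar altitude
angle at the time of sunrise and sunset is 0°, so SIN(α) = 0. Setting the solar altitude equation to zero
and solving for the solar hour angle gives:
$$\cos(\text{SolarHourAngle}) = -\tan(\text{Latitude}) \times \tan(\text{SolarDeclinationAngle})$$
For Honolulu on February 15, this yields a solar hour angle of approximately −84.7° at sunrise and
+84.7° at sunset.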
This implies the sun rises approximately 5 hours and 39 minutes (computed as 84.7 times 4 minutes per
degree) before solar noon and sets approximately 5 hours and 39 minutes after solar noon. If we repeat
the calculations for June 21, we have the sun rising approximately 6 hours and 39 minutes before solar
noon and setting 6 hours and 39 minutes after solar noon.
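A minimal sketch of the sunrise and sunset calculation follows. It assumes the hour angle at zero altitude satisfies cos(h) = −tan(latitude)·tan(declination), which results from setting the solar altitude to zero. The function names are illustrative, and the calculation does not apply at polar latitudes where the sun may not rise or set.

```python
import math


def solar_declination_deg(day_of_year):
    """Solar declination angle in degrees (1 = January 1)."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))


def sunset_hour_angle_deg(latitude_deg, day_of_year):
    """Hour angle in degrees where the solar altitude is zero.

    The sunset hour angle is positive; the sunrise hour angle is its negative.
    """
    lat = math.radians(latitude_deg)
    decl = math.radians(solar_declination_deg(day_of_year))
    return math.degrees(math.acos(-math.tan(lat) * math.tan(decl)))


def minutes_from_solar_noon(latitude_deg, day_of_year):
    """Minutes between solar noon and sunrise (or sunset), at 4 minutes per degree."""
    return 4.0 * sunset_hour_angle_deg(latitude_deg, day_of_year)


for label, day in (("February 15", 46), ("June 21", 172)):
    minutes = round(minutes_from_solar_noon(21.3069, day))
    print(f"{label}: {minutes // 60} h {minutes % 60} min either side of solar noon")
```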
What time is it? It is important to note that local solar time is not the same as local standard time. For
example, solar noon as measured by a sundial will not always occur at the same time every day. The
difference between the time of solar noon and noon standard time can be up to +/-16 minutes. This also
accounts for the asymmetry in the times of sunrise and sunset. The difference between local standard
time and local solar time is referred to as the equation of time. It arises from two features of Earth’s
motion:
◼ Angle of Obliquity. The plane of Earth’s equator is inclined to the plane of Earth’s orbit
around the sun, and
◼ Elliptical Orbit. The orbit of Earth around the sun is an ellipse and not a circle.
Because of the angle of obliquity, solar time changes throughout the year as the sun moves above and
below the equator. The elliptical orbit means the distance between Earth and the sun is at minimum near
December 31 and is at maximum near July 1.
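The equation of time can be approximated as:
$$\text{EquationOfTime} = 9.87 \sin(2B) - 7.53 \cos(B) - 1.5 \sin(B)$$
$$B = \frac{360^\circ}{365} \times (d - 81)$$
Here, the equation of time is measured in minutes and d is the number of days since the start of the year.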
The equation of time accounts for the physical aspects of why solar noon is not the same as noon clock
time. The time correction factor (in minutes) accounts for the variation of local solar time within a given
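time zone due to the longitudinal variations within the time zone and incorporates the equation of time.
$$\text{TimeCorrectionFactor} = 4 \times (\text{Longitude} - \text{LocalStandardTimeMeridian}) + \text{EquationOfTime}$$
Here, longitude is set by the location of the solar panel, with both longitudes measured in degrees east
(so western longitudes are negative). The factor of four minutes is based on Earth rotating
1° every four minutes. In the case where the local longitude equals the local standard meridian, the time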
correction factor is simply equal to the equation of time.
Given these equations, we can then compute local solar time by adjusting the local standard time as
follows:
$$\text{LocalSolarTime} = \text{LocalClockTime} + \frac{\text{TimeCorrectionFactor}}{60}$$
Using the example above, we can compute the time of local solar noon for February 15 as follows. Given
February 15 is the 46th day of the year, we have:
$$B = \frac{360^\circ}{365} \times (46 - 81) = -34.5205^\circ$$
Given that the local standard time meridian for Hawaii is 150°W, the time correction factor is roughly
−46 minutes (about −31 minutes from the longitude offset and about −15 minutes from the equation of time).
This implies solar noon will take place at roughly 12:46 PM local clock time, not adjusted for daylight
saving. From the perspective of forecasting solar generation, it is important to recognize this distinction
in time. Specifically, if the generation forecast is to be in local clock time, then adjustments for daylight
savings need to be made to the engineering estimates of solar generation to keep the engineering
estimates in line with metered generation.
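The clock time of solar noon can be sketched as below. Several pieces are assumptions rather than values taken from this guide: the equation of time uses the commonly cited approximation 9.87·sin(2B) − 7.53·cos(B) − 1.5·sin(B), longitudes are entered in degrees east (west is negative), and Hawaii's standard time meridian is taken as 150°W (UTC−10). Function names are illustrative.

```python
import math


def equation_of_time_minutes(day_of_year):
    """Approximate equation of time in minutes (assumed standard approximation)."""
    b = math.radians(360.0 / 365.0 * (day_of_year - 81))
    return 9.87 * math.sin(2.0 * b) - 7.53 * math.cos(b) - 1.5 * math.sin(b)


def time_correction_minutes(longitude_deg_east, meridian_deg_east, day_of_year):
    """Time correction factor in minutes: 4 minutes per degree plus the equation of time."""
    offset = 4.0 * (longitude_deg_east - meridian_deg_east)
    return offset + equation_of_time_minutes(day_of_year)


def clock_hour_of_solar_noon(longitude_deg_east, meridian_deg_east, day_of_year):
    """Local clock time (decimal hours) at which local solar time equals 12:00.

    Rearranges LocalSolarTime = LocalClockTime + TimeCorrectionFactor / 60.
    """
    correction = time_correction_minutes(longitude_deg_east, meridian_deg_east, day_of_year)
    return 12.0 - correction / 60.0


def format_clock(decimal_hour):
    """Format a decimal hour as HH:MM, rounded to the nearest minute."""
    hours, minutes = divmod(round(decimal_hour * 60), 60)
    return f"{hours:02d}:{minutes:02d}"


# Honolulu on February 15 (day 46); meridian value is an assumption (UTC-10).
solar_noon = clock_hour_of_solar_noon(-157.8583, -150.0, 46)
print(format_clock(solar_noon))
```

Any daylight saving adjustment would be applied on top of this clock time, in line with the note above about keeping engineering estimates aligned with metered generation.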
Solar Flux. Now that we know where the sun is in the sky for any location and time, we can determine
how much solar energy will strike a horizontal surface (i.e., solar panel). Scientists know that solar
radiation strikes Earth’s outer atmosphere on average at a rate of 1,367 Watts/m². This is commonly
referred to as the solar constant. To account for seasonal variation due to the annual cycle in the distance
between Earth and the sun, the actual solar radiation (solar flux) hitting Earth’s atmosphere on any day
of the year can be calculated as follows:
$$\text{SolarFlux}_d = \text{SolarConstant} \times \left[1 + 0.034 \cos\left(\frac{360^\circ \times d}{365.25}\right)\right]$$
Here, d indexes the day of the year. A depiction of the annual cycle of solar flux is shown in Figure 8-4.
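The solar flux formula translates directly into code. A minimal sketch, with an illustrative function name and the solar constant of 1,367 Watts/m² quoted above:

```python
import math

SOLAR_CONSTANT_W_PER_M2 = 1367.0


def solar_flux(day_of_year):
    """Solar flux at the top of the atmosphere (Watts/m2) for a given day of the year."""
    angle = math.radians(360.0 * day_of_year / 365.25)
    return SOLAR_CONSTANT_W_PER_M2 * (1.0 + 0.034 * math.cos(angle))


for label, day in (("February 15", 46), ("June 21", 172)):
    print(label, round(solar_flux(day), 1))
```

The flux peaks near the start of the year, when Earth is closest to the sun, and is lowest around the start of July.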
FIGURE 8-4. SOLAR FLUX
Computing Solar Insolation. The amount of solar energy hitting a horizontal plane on Earth’s surface for
any location and time of day can then be computed as follows:
$$\text{SolarInsolation}_d^i = \text{SolarFlux}_d \times \cos(\phi_d^i)$$
Here, the time interval of the day (d) is indexed by (i) and φ is the solar zenith angle. The solar zenith
angle is computed as 90° minus the solar altitude angle. Given this, we can rewrite the equation for solar
energy that hits a horizontal plane (i.e., a solar panel) on Earth’s surface for any location and time as
follows:
$$\text{SolarInsolation}_d^i = \text{SolarFlux}_d \times \cos(\alpha_d^i - 90^\circ)$$
Using the example data from above, the amount of solar insolation at solar noon on February 15 is 1,152
Watts/m2 and 1,321 Watts/m2 on June 21. The pattern of solar insolation for the weeks of February 14
and June 20 are depicted in Figure 8-5 and Figure 8-6.
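The clear-sky insolation for the Honolulu example can be sketched by combining the earlier pieces. Because cos(α − 90°) equals sin(α), the code multiplies the solar flux by the sine of the altitude angle. The function names are illustrative, and setting insolation to zero when the sun is below the horizon is an added assumption.

```python
import math


def solar_declination_deg(day_of_year):
    """Solar declination angle in degrees (1 = January 1)."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))


def solar_flux(day_of_year, solar_constant=1367.0):
    """Solar flux at the top of the atmosphere in Watts/m2."""
    return solar_constant * (1.0 + 0.034 * math.cos(math.radians(360.0 * day_of_year / 365.25)))


def clear_sky_insolation(latitude_deg, day_of_year, hour_angle_deg=0.0):
    """Clear-sky insolation on a horizontal surface in Watts/m2."""
    lat = math.radians(latitude_deg)
    decl = math.radians(solar_declination_deg(day_of_year))
    hour = math.radians(hour_angle_deg)
    sin_altitude = (math.sin(lat) * math.sin(decl)
                    + math.cos(lat) * math.cos(decl) * math.cos(hour))
    # Assumption: no insolation when the sun is below the horizon.
    return solar_flux(day_of_year) * max(sin_altitude, 0.0)


for label, day in (("February 15", 46), ("June 21", 172)):
    print(label, round(clear_sky_insolation(21.3069, day)))
```

Evaluating the function over a grid of hour angles (15° per hour) for each day of a week produces a daily profile of the kind plotted in the figures.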
FIGURE 8-5. SOLAR INSOLATION WEEK OF FEBRUARY 14TH
Engineering Model of Solar Generation. Given an estimate of how much solar energy is delivered to
Earth’s surface for any location and time, we can construct an engineering estimate of how much
electricity is generated using the following relationship.
$$\text{SolarGeneration}_d^i = \text{SolarInsolation}_d^i \times \text{SolarPanelCapacity}_d \times \text{SolarPanelEfficiency}_d^i$$
Here,
$\text{SolarGeneration}_d^i$ is the electricity generated on day (d), time interval (i), in Watts Out
$\text{SolarInsolation}_d^i$ is the solar energy delivered to the panel in Watts In/m²
$\text{SolarPanelCapacity}_d$ is the installed panel area on day (d) in m²
$\text{SolarPanelEfficiency}_d^i$ is the solar panel efficiency in Watts Out/Watts In
To help fix ideas, assume solar insolation at noon of June 12 is 1,000 Watts/m², installed capacity is 2.5
kW, and the solar panel efficiency is 15%. If we assume 150 Watts/m² for the average panel size, we can
say the installed capacity is approximately 16.67 m² (computed as 2,500 Watts over 150 Watts/m²). With
these numbers, we have:
$$\text{SolarGeneration} = 1{,}000 \times 16.67 \times 0.15 \approx 2{,}500 \text{ Watts}$$
Factoring in Temperature Impacts. The hotter a solar panel becomes, the less efficient it is in converting
sun energy into useful electricity. This leads to the following adjustment to the solar panel efficiency.
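$$\text{SolarPanelEfficiency}_d^i = \text{RatedEfficiency} \times \left(1 - \left[\max\left(\text{Temp}_d^i - \text{ThresholdTemp},\, 0\right) \times \nabla\right]\right)$$
Here,
$\text{SolarPanelEfficiency}_d^i$ is the solar panel operating efficiency for day (d), time interval (i)
$\text{RatedEfficiency}$ is the panel efficiency under ideal conditions
$\text{Temp}_d^i$ is the temperature for day (d), time interval (i)
$\text{ThresholdTemp}$ is the temperature above which the efficiency of the panel degrades
$\nabla$ is the fractional loss in efficiency per degree above the threshold temperature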
Factoring in Cloud Cover. Cloud cover lowers the output of a solar panel by reducing the amount of solar
energy reaching the panel. While the exact impact of cloud cover on a location is difficult to measure, we
can assume that at 100% cloud cover, only about 20% of the solar flux reaches Earth’s surface. That is, the
cloud albedo is 80% at 100% cloud cover. We can use this information to adjust the engineering estimate
of solar insolation by incorporating the following relationship.
$$\text{CloudAlbedo}_d^i = \text{CloudCoverPercentage}_d^i \times 80\%$$
$$\text{SolarInsolation}_d^i = \text{SolarFlux}_d \times \cos(\alpha_d^i - 90^\circ) \times (1 - \text{CloudAlbedo}_d^i)$$
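The final engineering model of solar generation can then be written as follows, where SolarInsolation
denotes the clear-sky value so that the cloud adjustment is applied only once:
$$\begin{aligned}\text{SolarGeneration}_d^i = {} & \text{SolarInsolation}_d^i \times (1 - \text{CloudAlbedo}_d^i) \times \text{SolarPanelCapacity}_d \\ & \times \text{RatedEfficiency} \times \left(1 - \left[\max\left(\text{Temp}_d^i - \text{ThresholdTemp},\, 0\right) \times \nabla\right]\right)\end{aligned}$$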
In practice, lining up cloud cover with the metered generation is difficult. In many cases, the only available
cloud cover values are for the closest weather station, which may be miles away. This will lead to poor
model fits and skewed values for the adjustment parameter. To mitigate this impact, it is best to estimate
the model using only days that were cloudless or very nearly cloudless. With these selected days, the
adjustment parameter is then free to synchronize the model to observed generation output. The impact
of cloud cover will then be given by the cloud albedo assumption of 80%.
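The full engineering model can be sketched as a single function. The defaults are the figures quoted in this section (a 25°C threshold, a 0.48% efficiency loss per degree Celsius, and an 80% cloud albedo at full cloud cover). The function and parameter names are illustrative, and the input is assumed to be clear-sky insolation so that the cloud adjustment is applied once.

```python
def solar_generation_watts(clear_sky_insolation, panel_area_m2, rated_efficiency,
                           temp_c, cloud_cover_fraction,
                           threshold_temp_c=25.0, degrade_per_deg_c=0.0048,
                           max_cloud_albedo=0.80):
    """Engineering estimate of solar generation in Watts for one time interval.

    clear_sky_insolation  -- Watts/m2 reaching the panel on a cloudless day
    panel_area_m2         -- installed panel area (capacity) in m2
    rated_efficiency      -- Watts out per Watt in under ideal conditions
    cloud_cover_fraction  -- 0.0 (clear) to 1.0 (fully overcast)
    """
    cloud_albedo = cloud_cover_fraction * max_cloud_albedo
    temp_penalty = max(temp_c - threshold_temp_c, 0.0) * degrade_per_deg_c
    efficiency = rated_efficiency * (1.0 - temp_penalty)
    return clear_sky_insolation * (1.0 - cloud_albedo) * panel_area_m2 * efficiency


# 2.5 kW of installed capacity at 150 Watts/m2 of peak output.
area = 2500.0 / 150.0
print(round(solar_generation_watts(1000.0, area, 0.15, temp_c=25.0, cloud_cover_fraction=0.0)))
print(round(solar_generation_watts(1000.0, area, 0.15, temp_c=35.0, cloud_cover_fraction=0.0)))
print(round(solar_generation_watts(1000.0, area, 0.15, temp_c=25.0, cloud_cover_fraction=1.0)))
```

The three calls show the rated output, the effect of a 10°C temperature excess, and the effect of full cloud cover on the same installation.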
$$\text{IndoorTemperatureIncrease}(^\circ\text{C}) \cong \frac{0.009\,^\circ\text{C}}{\text{Watt}/\text{m}^2}$$
$$\text{IndoorTemperatureIncrease}(^\circ\text{F}) \cong \frac{0.017\,^\circ\text{F}}{\text{Watt}/\text{m}^2}$$
Given estimates of solar insolation, we can then adjust the temperature data used in the energy
forecasting models using the above relationship. That is, if the temperature forecast is 25°C and it is a
clear sunny day in the middle of summer with solar insolation of roughly 1,000 Watts/m², the effective
indoor temperature is more like 34°C (25°C plus 0.009 × 1,000). It is effective
temperature we want to use in the model. The effective temperatures then form the basis of computing
heating degree day and cooling degree day model variables.
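A minimal sketch of the effective temperature adjustment follows. It uses the 0.009°C per Watt/m² factor given above. The function names are illustrative, and the 18°C degree-day base in the example is a hypothetical value chosen only to show the calculation.

```python
def effective_temperature_c(outdoor_temp_c, insolation_w_per_m2, gain_c_per_w_m2=0.009):
    """Outdoor temperature adjusted for the indoor temperature increase from insolation."""
    return outdoor_temp_c + gain_c_per_w_m2 * insolation_w_per_m2


def cooling_degrees(temp_c, base_temp_c):
    """Cooling degrees above the base temperature (zero if at or below the base)."""
    return max(temp_c - base_temp_c, 0.0)


effective = effective_temperature_c(25.0, 1000.0)
print(effective)
# Hypothetical base temperature of 18 degrees C, for illustration only.
print(cooling_degrees(effective, base_temp_c=18.0))
```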
The worldwide statistics represent both utility solar installations, where the electricity generated feeds
directly to the grid, and non-utility installations (referred to elsewhere as embedded solar generation),
where the generation offsets on-site consumption. From the perspective of load forecasting, the non-
utility installations are of critical interest since these installations directly impact measured load. Since
short-term load forecast models are based on measured load, the following examples illustrate how
embedded solar generation can impact a load forecast. In these examples, assume the demand for
electricity at noon, regardless of how it is sourced, is 1,300 KW.
No Embedded Solar Generation. Under this first example, there is no embedded solar generation. As a
result, metered demand, which is the load that a system operator sees, equals actual demand. That is,
$$\text{MeteredDemand}_d^{\text{Noon}} = \text{Demand}_d^{\text{Noon}}$$
23 Global Market Outlook for Photovoltaics 2013-2017, European Photovoltaic Industry Association
24 www.seia.org
Now consider developing a forecasting model of demand for electricity. If we have a year’s worth of
measured demand, we could fit the following regression model.
$$\text{MeteredDemand}_d^{\text{Noon}} = \beta_1 \text{Constant}_d^{\text{Noon}} + e_d^{\text{Noon}}$$
Here, metered demand is regressed on a variable that takes on a value of 1.0 for every observation. In
this case of no embedded solar generation, the estimated coefficient on the constant variable will be
equal to the average metered demand, or 1,300 KW. As a result, the forecast from the estimated model
will provide a forecast of actual demand for electricity.
With Constant Embedded Solar Generation. Now assume that 100 KW of embedded solar generation is
produced every day at noon. We can rewrite metered demand as follows:
$$\text{MeteredDemand}_d^{\text{Noon}} = \text{Demand}_d^{\text{Noon}} - \text{SolarGeneration}_d^{\text{Noon}}$$
Because metered demand will be 100 KW lower, the estimated coefficient from regressing the new lower
metered demand on the constant variable will lead to a different estimated coefficient. Specifically, the
estimated coefficient will be equal to 1,200 KW, which is the new lower average metered demand. In this
case, the resulting forecast model will under predict actual demand for electricity by 100 KW.
From the perspective of system operations, the fact that the forecast model under predicts actual demand
for electricity is not a concern, since they can rely on the 100 KW of solar generation being there all the
time.
With Volatile Embedded Solar Generation. Unfortunately, solar generation is not this reliable. We can
introduce uncertainty into the amount of solar generation that is available by assuming that half the time
cloud cover is thick enough to drive the solar generation to 0. The other days are perfectly clear and the
solar generation is 100 KW. This means half the time the load is 1,200 KW and the other half of the time
the noon load is 1,300 KW. If the cloudy and sunny days are equal in number, the estimated coefficient
will be equal to the average load. Specifically, the estimated coefficient on the constant variable will be
equal to 1,250 KW.
The variability in solar generation means that the statistical model that was fitted to metered demand will
under predict loads on cloudy days and over predict loads on sunny days. From the perspective of system
operations, this means they will need spinning reserves available to cover the load variability and
subsequent load forecast error introduced by the volatile embedded solar generation.
Accounting for Average Solar Generation. Is it possible to improve the accuracy of the load forecast?
Assume we can obtain a perfect forecast of cloud cover, and hence we can accurately predict how much
solar generation is going to be available tomorrow. It seems reasonable to adjust the baseline load
forecast with the forecast of solar generation. Specifically, our forecast of actual demand can be
constructed as:
$$\text{Demand}_d^{\text{Noon}} = \text{PredictedMeteredDemand}_d^{\text{Noon}} + \text{PredictedSolarGeneration}_d^{\text{Noon}}$$
On a sunny day, the forecast of demand will be equal to the predicted value of 1,250 KW from the model
of metered demand plus 100 KW of solar generation, or 1,350 KW. On a cloudy day, the forecast of
demand will be equal to the predicted value of 1,250 KW from the model of metered demand plus 0 KW
of solar generation. Unfortunately, both forecasts of actual demand are in error. On a sunny day, this
approach over predicts actual demand by the amount of 50 KW, which is equal to the average amount of
solar generation that took place over the period that was used to estimate the coefficient of the model of
metered demand. On a cloudy day, this approach under predicts by 50 KW, which again is the average
amount of solar generation that took place over the period that was used to estimate the coefficient of
the model of metered demand.
In the current example, the average solar generation over the estimation period was 50 KW. As a result,
the estimated coefficient of the metered demand model embodies this average. Since 50 KW is already
accounted for by the metered demand model, we need to add the difference between the predicted solar
generation for the day in question and the average solar generation already accounted for by the model
coefficient. This results in the following calculation:
$$\text{Demand}_d^{\text{Noon}} = \text{PredictedMeteredDemand}_d^{\text{Noon}} + SG_d^{\text{Noon}} + \left(AvgSG^{\text{Noon}} - SG_d^{\text{Noon}}\right)$$
In the above equation, SG represents the actual level of solar generation on day (d) at noon. AvgSG is the
average solar generation over the model estimation period at noon. The third part in the above equation
corrects for how much of the current day solar generation is already embedded in the model coefficients.
Using the current example where AvgSG equals 50 KW, we have the two cases of a sunny day when SG =
100 KW and a cloudy day when SG = 0 KW. The calculations are:
Sunny Day
$$\text{Demand}_d^{\text{Noon}} = 1{,}250 + 100 + (50 - 100) = 1{,}300$$
Cloudy Day
$$\text{Demand}_d^{\text{Noon}} = 1{,}250 + 0 + (50 - 0) = 1{,}300$$
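The noon example can be reproduced with a few lines of Python. The sketch assumes ten days that alternate between sunny and cloudy (a hypothetical sample), and it uses the fact that regressing on a constant returns the sample mean. Variable and function names are illustrative.

```python
ACTUAL_DEMAND_KW = 1300.0

# Hypothetical estimation sample: sunny days (100 KW of solar) alternate with cloudy days (0 KW).
solar_generation = [100.0, 0.0] * 5
metered_demand = [ACTUAL_DEMAND_KW - sg for sg in solar_generation]

# Regressing metered demand on a constant yields the sample mean as the coefficient.
beta_constant = sum(metered_demand) / len(metered_demand)
avg_solar = sum(solar_generation) / len(solar_generation)


def naive_demand_forecast(solar_kw):
    """Predicted metered demand plus predicted solar generation."""
    return beta_constant + solar_kw


def corrected_demand_forecast(solar_kw):
    """Adds back only the solar generation already embedded in the coefficient."""
    return beta_constant + solar_kw + (avg_solar - solar_kw)


for label, sg in (("sunny", 100.0), ("cloudy", 0.0)):
    print(label, naive_demand_forecast(sg), corrected_demand_forecast(sg))
```

The naive forecast misses actual demand by 50 KW in each direction, while the corrected forecast recovers 1,300 KW on both day types.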
These examples illustrate the potential for additional forecast error arising from embedded solar
generation. It is important to recognize that the coefficients of the short-term forecast model embody
the average impact of solar generation on loads. This means that in areas where there has been significant
penetration of embedded solar generation the short-term forecast will tend to under forecast loads on
cloudy days and over forecast loads on sunny days. Ignoring the problem is not an option. There are three
practical approaches to dealing with the impact of embedded solar generation.
◼ Error Correction. The error correction approach implements what many system operators do
initially when faced with the problem of solar PV generation. Namely, they make ex post
adjustments of the load forecast to account for forecasted values of solar PV generation. On
sunny days, the adjustment is to lower the load forecast and on cloudy days, the load forecast is
adjusted upward. The key advantage of the error correction approach is the existing load forecast
model can continue to be used without any changes. All that is needed is a means of forecasting
solar PV generation.
◼ Reconstituted Loads. Under the reconstituted loads approach, the historical time series of
measured load is reconstituted by adding back estimates of solar PV generation. The load forecast
model is then re-estimated against the reconstituted loads. The subsequent reconstituted load
forecasts are then adjusted ex post by subtracting away forecasts of solar PV generation to form
a forecast of measured loads. The advantage of this approach is any inherent bias that might be
imposed on the estimated coefficients of a model of measured loads is controlled for by
estimating the model coefficients against a time series of demand for power regardless of how it
is sourced. The disadvantage is a historical time series of solar PV generation needs to be
developed and maintained to estimate the load forecast model coefficients. Further, this
approach assumes that the historical solar PV generation time series is accurate. This may not
necessarily be true, in which case this approach places too high a weight on the solar PV
generation values.
◼ Model Direct. Under this approach, the weight placed on the solar PV generation data is
estimated directly by including these data as an explanatory variable in the load forecast models.
The estimated coefficient on the solar PV generation variable is the weight. Also, in principle, by
including solar PV generation as an explanatory variable, the coefficients on the remaining
explanatory variables should not be biased. This approach also provides a direct forecast of
measured loads that accounts for solar PV generation, thus avoiding any ex post processing of the
load forecast. Like the reconstituted load approach, this approach requires developing and
maintaining an historical time series of solar PV generation.
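The three approaches can be compared on a toy data set. This is a sketch under strong assumptions: true noon demand is a constant 1,300 KW, the historical solar generation values are hypothetical and known exactly, and the load model contains only a constant (plus solar generation in the model direct case). The function names are illustrative.

```python
ACTUAL_DEMAND_KW = 1300.0
solar_history = [100.0, 0.0, 60.0, 20.0, 80.0, 40.0]  # hypothetical KW at noon
metered_history = [ACTUAL_DEMAND_KW - sg for sg in solar_history]


def mean(values):
    return sum(values) / len(values)


def simple_ols(x, y):
    """Intercept and slope from regressing y on a constant and a single variable x."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


def error_correction_forecast(solar_forecast):
    """Existing model (mean of metered load), adjusted ex post for solar versus its average."""
    return mean(metered_history) - (solar_forecast - mean(solar_history))


def reconstituted_forecast(solar_forecast):
    """Model fitted to metered load plus solar, then solar forecast subtracted."""
    reconstituted = [m + sg for m, sg in zip(metered_history, solar_history)]
    return mean(reconstituted) - solar_forecast


def model_direct_forecast(solar_forecast):
    """Solar generation enters the load model as an explanatory variable."""
    intercept, slope = simple_ols(solar_history, metered_history)
    return intercept + slope * solar_forecast


for solar in (100.0, 0.0):
    print(solar, error_correction_forecast(solar),
          reconstituted_forecast(solar), model_direct_forecast(solar))
```

With exact solar data, all three approaches give the same forecast of measured load. They diverge in practice when the historical solar series is noisy, which is when the estimated weight in the model direct approach matters.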