A Practitioner’s Guide to Short-Term Load Forecast Modeling
Authored by:
Dr. Frank A. Monforte
Frank.Monforte@itron.com
www.itron.com/forecasting
July 21, 2018

Dr. Frank A. Monforte’s forecast modeling expertise includes
authoring the load forecasting models used to support real-time
system operations for the North American system operators
California ISO, New York ISO, Midwest ISO, ERCOT, and IESO,
and for the Australian system operators AEMO and Western Power.
Recent efforts include developing methods for incorporating the
impact of rooftop solar PV generation into a real-time load forecast.
Dr. Monforte founded the annual ISO/TSO Forecasting Summit that
brings together ISO/TSO forecasters from around the world to
discuss forecasting challenges unique to their organizations.
TABLE OF CONTENTS
1 INTRODUCTION
1.1 PURPOSE
1.2 ACKNOWLEDGEMENTS
1.3 OUTLINE
2 DATA REVIEW AND ANALYSIS
2.1 SOURCES OF HISTORICAL LOAD DATA
2.1.1 Handbook Data
2.2 WAYS TO LOOK AT HISTORICAL LOAD DATA
2.2.1 Tabular Review
2.2.2 Graphical Review
3 DATA CLEANING
3.1 OUTLIER DETECTION
3.1.1 Visual Inspection
3.1.2 Validation Tests
3.2 CLEANING APPROACHES
4 LOAD FORECAST METHODS
4.1 LOAD FORECAST ALGORITHMS
4.1.1 Like Day or Similar Day Lookup Algorithms
4.1.2 Rotation Algorithms
4.1.3 Decision Tree Regression
4.1.4 Forecast Algorithm Summary
4.2 UNIVARIATE FRAMEWORKS
4.2.1 Lag Structures
4.2.2 One or Many Load Forecast Equations
4.2.3 Moving Average
4.2.4 Exponential Smoothing
Exponential Smoothing Working Example
4.2.5 ARIMA Models
ARIMA Working Example
4.3 MULTIVARIATE FRAMEWORKS
4.3.1 Regression
4.3.2 Neural Network Models
4.3.3 Support Vector Regression
5 EXPLANATORY VARIABLES BASED ON CALENDAR CONDITIONS
5.1 CALENDAR CONDITIONS
5.2 SUNRISE & SUNSET CONDITIONS
5.3 FUNCTIONAL FORMS FOR HANDLING THE NON-LINEAR RESPONSE BETWEEN LOADS AND WEATHER
5.4 DAILY VERSUS COINCIDENT HOURLY VERSUS DAILY SUMMARY VARIABLES
5.5 COMBINED HOURLY TEMPERATURES AND HUMIDITY VARIABLES
5.6 INCORPORATING WIND SPEED
5.1 COMPUTING A WEIGHTED AVERAGE WEATHER DATA
5.2 COMPUTING A WEIGHTED AVERAGE WIND SPEED
6 ALTERNATIVE MODEL SPECIFICATIONS
6.1.1 Model Template: Constant
6.1.2 Model Template: Constant with Daily Temperature
6.1.3 Model Template: Day-of-the-Week
6.1.4 Model Template: Day-of-the-Week with Daily Temperature
6.1.5 Model Template: Day-of-the-Week with Time-of-Use Temperatures
6.1.6 Model Template: Month
6.1.7 Model Template: Day Type
6.1.8 Model Template: Day Type with Daily Temperature
6.1.9 Model Template: Day Type with Time-of-Use Temperatures
6.1.10 Model Template: Extended Day Type
6.1.11 Model Template: Extended Day Type with Daily Temperature
6.1.12 Model Template: Extended Day Type with Time-of-Use Temperatures
6.1.13 Model Template: Season
6.1.14 Model Template: Season Day Type
6.1.15 Model Template: Season Day Type with Daily Weather
6.1.16 Model Template: Season Day Type with Time-of-Use Temperatures
6.1.17 Model Template: Extended Season
6.1.18 Model Template: Extended Season with Daily Weather
6.1.19 Model Template: Extended Season with Time-of-Use Temperatures
7 GUIDELINES FOR BUILDING LOAD FORECAST MODELS
8 INCORPORATING BEHIND THE METER SOLAR PV GENERATION
8.1 INCORPORATING THE IMPACT OF SUNSHINE ON ELECTRICITY DEMAND
8.2 INCORPORATING THE IMPACT OF SOLAR PV GENERATION ON LOADS
List of Figures
Figure 2-1. Dayton Power & Light All Data View
Figure 2-2. Commonwealth Edison All Data View
Figure 2-3. Dayton Power & Light Month View: February 2017
Figure 2-4. Dayton Power & Light Month View: July 2017
Figure 2-5. Commonwealth Edison Month View: February 2017
Figure 2-6. Commonwealth Edison Month View: July 2017
Figure 2-7. Dayton Power & Light Week View: February 2017
Figure 2-8. Dayton Power & Light Week View: July 2017
Figure 2-9. Commonwealth Edison Week View: February 2017
Figure 2-10. Commonwealth Edison Week View: July 2017
Figure 2-11. Dayton Power & Light Two-Day View: Storm
Figure 2-12. Commonwealth Edison Two-Day View: Storm
Figure 2-13. Dayton Power & Light Average Daily Load versus Average Daily Temperature
Figure 2-14. Commonwealth Edison Average Daily Load versus Average Daily Temperature
Figure 2-15. Dayton Power & Light Average Load @ 2AM versus Average Daily Temperature
Figure 2-16. Dayton Power & Light Average Load @ 2PM versus Average Daily Temperature
Figure 2-17. Commonwealth Edison Load @ 2AM versus Average Daily Temperature
Figure 2-18. Commonwealth Edison Load @ 2PM versus Average Daily Temperature
Figure 4-1. Weather Response of Loads to Temperatures
Figure 4-2. Binary Split of the Weather Response
Figure 4-3. Four Region Split of the Weather Response of Loads to Temperatures
Figure 4-4. Dayton Power & Light Week View: February 2017
Figure 4-5. Dayton Power & Light Week View: July 2017
Figure 4-6. Commonwealth Edison Week View: February 2017
Figure 4-7. Commonwealth Edison Week View: July 2017
Figure 4-8. Exponential Smoothing Weights (α = 0.5)
Figure 4-9. Commonwealth Edison: Average Daily Load
Figure 4-10. Commonwealth Edison: Simple Exponential Smoothing Model Fit All Data
Figure 4-11. Commonwealth Edison: Simple Exponential Smoothing Model Out-of-Sample Fit
Figure 4-12. Commonwealth Edison: Simple Exponential Smoothing Model One Month Ahead
Figure 4-13. Commonwealth Edison: Simple Exponential Smoothing Model One Year Ahead
Figure 4-14. Commonwealth Edison: Double Exponential Smoothing Model One Month Ahead
Figure 4-15. Commonwealth Edison: Double Exponential Smoothing Model One Year Ahead
Figure 4-16. Commonwealth Edison: Triple Exponential Smoothing Model One Month Ahead
Figure 4-17. Commonwealth Edison: Triple Exponential Smoothing Model One Year Ahead
Figure 4-18. Autocorrelation Function (ACF) for Two Highly Correlated Time Series
Figure 4-19. Autocorrelation Function (ACF) for a Non-Correlated Time Series
Figure 4-20. ACF and PACF for Commonwealth Edison Average Daily Load
Figure 4-21. ACF and PACF After One-Time Seasonal Differencing
Figure 4-22. Commonwealth Edison: Seasonally Differenced, AR(1) Model Fit All Data
Figure 4-23. Commonwealth Edison: Seasonally Differenced, AR(1) Model One Month Ahead
Figure 4-24. Commonwealth Edison: Seasonally Differenced, AR(1) Model One Year Ahead
Figure 4-25. Fitting a Line Chart View
Figure 4-26. Estimated Nonlinear Weather Response Function for Commonwealth Edison
Figure 4-27. Estimated Nonlinear Weather Response Function for Dayton Power & Light
Figure 4-28. Basic Elements of a Feed Forward Neural Network used for Classification
Figure 4-29. An Example of the Feed Forward Calculations
Figure 4-30. Step Activation Function
Figure 4-31. Replacing the Step Activation Function with a Sigmoid Activation Function
Figure 4-32. Feed Forward Neural Network with Sigmoid Activation Function
Figure 4-33. Collapsing the Sum Function with the Sigmoid Activation Function
Figure 4-34. The Input, Hidden and Output Layers of a Neural Network Model
Figure 4-35. Multiple Nodes in the Hidden Layer and the Regression Model Equivalent
Figure 4-36. Estimated Nonlinear Weather Response Function for Commonwealth Edison
Figure 4-37. Estimated Nonlinear Weather Response Function for Dayton Power & Light
Figure 5-1. Dayton Power & Light: Day-of-the-Week Binary Variables
Figure 5-2. Commonwealth Edison: Day-of-the-Week Binary Variables
Figure 5-3. Dayton Power & Light: Day Type Weekday versus Weekend Binary Variables
Figure 5-4. Commonwealth Edison: Day Type Weekday versus Weekend Binary Variables
Figure 5-5. Dayton Power & Light: Monthly Binary Variables
Figure 5-6. Commonwealth Edison: Monthly Binary Variables
Figure 5-7. Dayton Power & Light: Seasonal Binary Variables
Figure 5-8. Commonwealth Edison: Seasonal Binary Variables
Figure 5-9. Dayton Power & Light: Month/Day-of-the-Week Interaction Binary Variables
Figure 5-10. Commonwealth Edison: Month/Day-of-the-Week Interaction Binary Variables
Figure 5-11. Dayton Power & Light: Season/Day-of-the-Week Interaction Binary Variables
Figure 5-12. Commonwealth Edison: Season/Day-of-the-Week Interaction Binary Variables
Figure 5-13. Estimated Bin Weather Response Function for Dayton Power & Light
Figure 5-14. Estimated Bin Weather Response with Weekend Offset for Dayton Power & Light
Figure 5-15. Estimated Bin Weather Response Function for Commonwealth Edison
Figure 5-16. Estimated Bin Weather Response with Weekend Offset for Commonwealth Edison
Figure 5-17. Estimated Uncapped Spline Weather Response with Weekend Offset for Dayton Power & Light
Figure 5-18. Estimated Capped Spline Weather Response with Weekend Offset for Dayton Power & Light
Figure 5-19. Estimated Uncapped Spline Weather Response with Weekend Offset for Commonwealth Edison
Figure 5-20. Estimated Capped Spline Weather Response with Weekend Offset for Commonwealth Edison
Figure 5-21. Estimated Polynomial Weather Response with Weekend Offset for Dayton Power & Light
Figure 5-22. Estimated Polynomial Weather Response with Weekend Offset for Commonwealth Edison
Figure 5-23. Estimated Neural Net Weather Response with Weekend Offset for Dayton Power & Light
Figure 5-24. Estimated Neural Net Weather Response with Weekend Offset for Commonwealth Edison
Figure 8-1. Solar Declination Angle
Figure 8-2. Solar Altitude Angle (February 15th)
Figure 8-3. Solar Altitude Angle (June 21st)
Figure 8-4. Solar Flux
Figure 8-5. Solar Insolation Week of February 14th
Figure 8-6. Solar Insolation Week of June 20th
List of Tables
Table 2-1. Dayton Power & Light Data Summary Statistics
Table 2-2. Dayton Power & Light Monthly Data Summary Statistics
Table 2-3. Dayton Power & Light Day-of-the-Week Data Summary Statistics
Table 2-4. Commonwealth Edison Data Summary Statistics
Table 2-5. Commonwealth Edison Monthly Data Summary Statistics
Table 2-6. Commonwealth Edison Day-of-the-Week Data Summary Statistics
Table 4-1. Exponential Smoothing Weights (α = 0.5)
1 INTRODUCTION
Accurate short-term load forecasts are essential for operating the electric grid as efficiently as possible.
This handbook presents alternative frameworks for developing short-term load forecasts with emphasis
placed on the approach Itron has deployed for most of the North American and Australian Independent
System Operators.
Prior to diving into the specifics of each framework, it is helpful to understand how short-term load
forecasting has evolved with the mass adoption of computers. There was a time, not too long ago, when
computers were very expensive and scarce. The available computing power was dedicated to solving the
least-cost dispatch problem at the heart of an energy management system.¹ During these pre-desktop
computing days, most short-term load forecasts were developed by hand using the load shapes of prior
days as guidance. Often the historical data were stored as index cards in a Rolodex² with the load shape
of each historical day drawn on a card. Key information such as the date, day-of-the-week, whether it was
a holiday or not, and the day’s maximum and minimum temperatures would be recorded on the card. The
load forecaster would open the day’s newspaper to obtain the next day’s temperature forecast. By
combining the day-ahead weather forecast with years of experience, the load forecaster would spin
through the Rolodex to find a day or two that best matched the load forecaster’s expectation for the
next day. The forecaster would then draw the next day’s forecasted load shape on a new card and pass
this card to the control room operators. As archaic as this process may seem, today’s computing power
and algorithms would be hard-pressed to outperform 20-plus years of load forecasting experience and
system knowledge. If we could somehow bundle that experience with today’s computing power we might,
just might, do as well as the generation of load forecasters that had nothing more to work with than a
pencil, a sheet of paper, and years of experience.

¹ An energy management system (EMS) is a system of computer-aided tools used by operators of electric
utility grids to monitor, control, and optimize the performance of the generation and transmission system.
² A Rolodex is a rotating file device used to store business information. The Rolodex was invented in 1956
by Danish engineer Hildaur Neilsen, the chief engineer of Arnold Neustadter’s company, Zephyr American, a
stationery manufacturer in New York.

When statistical and mathematical forecasting frameworks were introduced to the load forecasting
problem, the goal was twofold: (a) improve forecast accuracy, and (b) speed up the load forecast process.
As the operation of electric grids grew in complexity, there was a need to update the load forecast more
frequently than once a day. The one- to two-hour manual load forecast process was not fast enough when
storm fronts flowed through a service territory. The evolution of statistical and mathematical load
forecasting went hand-in-hand with the proliferation of desktop computing. Desktop computing was the
key to speeding up the forecast process. Improved accuracy was yet to be established because the load
forecasters with 20 plus years of experience were very good at what they did. Only when the next
generation of forecasters came on board with little to no experience did the statistical and mathematical
frameworks prove their worth.
1.1 PURPOSE
Why this guide? The motivation behind this guide is a request by numerous clients over the years for the
“recipe book” to building powerful short-term load forecast models. This guide is a partial “recipe book,”
providing the full list of possible ingredients with guidance as to when to use which combination of
ingredients. It is impossible to dial up the specific recipe that would work for your loads without first
going through the data analysis and trial-and-error process we go through whenever we start developing
a short-term load forecast model. The Itron forecasting team – Dr. Stuart J. McMenamin, Eric Fox, Rich
Simons, Mark Quan, Andy Sukenik, Christine Fordham, David Simons, Jennifer Blanco, John Pritchard, Jeff
Fordham, Casey Allred, David Fabiszak, Oleg Moskatov, Michael Russo, Gregory Kim, Leigh O’Connor,
James Lischio, Paige Schaefer, Shannon Ashburn and myself – have more than 20 years of collective
experience developing a wide range of load forecasting models. This guide is an attempt to cast a net
over the short-term load forecast modeling experience.
The focus of the guide is the within-day and day-ahead load forecasts that system operators and energy
traders rely on for scheduling, dispatching, procuring and selling generation to meet demand. The
information presented here is based on 20 plus years of working in the trenches with system operators
and energy traders in Australia, Europe and North America. Unlike the academic world, there is no time
for re-dos. When the forecasts go wrong in these environments, bad things can happen. Real-time
operational forecasting is not for the faint hearted.
1.2 ACKNOWLEDGEMENTS
This brings me to the true heroes of operational forecasting, the forecasters that work the control rooms
and trading desks at places like the New York ISO, ISO New England, the Midcontinent ISO, Electric
Reliability Council of Texas (ERCOT), Independent Electricity System Operator (IESO), the California ISO,
the Australian Energy Market Operator, Western Power, British Gas, CEZ Prodej, EDF Luminus, Engie and
Uniper Benelux. Each of these organizations has a dedicated staff that are tasked with the responsibility
to say this is the load forecast we should operate against. This decision has very little to do with model
techniques and model specifications, but everything to do with answering an impossible list of questions:
Will the weather forecast be right? Will the industrial load shed the 500 MW they
contracted for during a load shedding event and, if they do reduce their load, will that 500
MW come back at the end of the load event, or will they send the employees home for the
day? Will the storm front hit at 1 p.m., 2 p.m., 3 p.m., or will it bypass our load centers
altogether? Will the cloud cover dissipate in time for the rooftop solar PV generation to
kick in enough to keep from hitting a system peak? What will happen to the load when
the solar eclipse, World Cup, Christmas, or (name the event) occurs? And a thousand other
questions, the answers to which impact what they will publish as the official forecast.
Over the years, I have been fortunate to work with these staffs. The common thread is their desire to do
better than the day before. These are the unsung heroes of operational forecasting: Arthur Maniaci of
the New York ISO, Andrew Trachsell of the IESO, Calvin Opheim of ERCOT, Yok Potts and Huaitao Zhang
of the Midcontinent ISO, Gary Klein and Rebecca Webb of the California ISO, Jack Fox of the Australian
Energy Market Operator, Rick Morris of Western Power, Jason Blackmore of British Gas, David Metten
and Rigo D’Exelle of EDF Luminus, Gijs Berg of Engie, Alexandr Cerny, Marcel Prošek, and Filip Tichý of CEZ
Prodej, Barry van de Merbel and Marco Sinke of Uniper Benelux NV. This guide is dedicated to this
talented and unsung group of load forecasters.
1.3 OUTLINE
We begin not with techniques, but with the hard work of data review and analysis. Most professional literature
about load forecasting focuses on statistical techniques and pays very little attention to the data. In
practice, we have found the path to a powerful forecast model is through a very thorough analysis of the
data. The first section outlines an approach for reviewing load data. This is followed by a section on data
cleaning approaches and philosophies. With the preliminaries complete, we introduce the load forecast
techniques that are the bread and butter of the industry. We introduce machine learning frameworks
that can be used to augment today’s operational load forecast tools. This is followed by a section on
defining a set of explanatory variables that can be used in a load forecast model. This includes treatment
of calendar conditions, holidays, and weather conditions. We then introduce a series of load forecast
model recipes. This is followed by a set of model building guidelines. In Chapter 8, we introduce the
newly developing topic of incorporating solar PV generation into a load forecast model.

2 DATA REVIEW AND ANALYSIS
Forecasting is the application of analysis to historical data to project predictable load patterns
into the future. As simple as this may sound, the ability of an analyst to look at the data and see
predictable load patterns is a lost art. Perhaps “lost art” is an overstatement because it implies that there
was a time when analysts spent hours looking at load data to discern predictable load patterns. In fact, for many
years, unless you were willing to work your way through a foot-tall stack of 80-column wide printouts,
there was no easy way to view load data. With today’s graphical software packages, there is no excuse
for not looking at yearly, monthly, weekly, and daily load data graphs. The art that was lost or perhaps
was never found to begin with is understanding what you are looking at when you view a load data graph.
2.1 SOURCES OF HISTORICAL LOAD DATA

There are two primary sources of historical load data. In real-time operations, the load data are based on
Supervisory Control and Data Acquisition (SCADA) metering that monitors and controls generation,
transmission, and distribution points on the grid. In this case, what is called load is a result of a calculation
that sums (1) SCADA metered generation plus (2) SCADA metered net inflow across distribution or
transmission exchanges with other control areas, less (3) transmission and distribution losses. In energy
retail settings, load is based on on-premise metering. This metering is further segmented between
customers with interval-based metering versus non-interval metered customers. The latter require
bespoke load profiles or load profile models to spread their monthly, bi-monthly, or possibly annual
metered consumption values to 15-minute, 30-minute, or hourly load values. Load data based on SCADA
metering can be updated as frequently as every five minutes. On-premise interval-based metering has in
the past been collected daily. However, as advanced meter data collection infrastructures expand, on-
premise interval-metered data will eventually be collected virtually in real time. Further, with mass
deployment of smart meters, the demand of all customers will ultimately be measured using interval-
based metering. As a result, sometime soon even retail forecasting applications will leverage real-time
metering. When that happens, the question of whether it is better to model each individual load or
aggregations of individual loads will need to be analyzed. Based on experience, which approach provides
the most accurate forecast is not always obvious.
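As a rough illustration of how a load profile spreads a non-interval meter read into hourly values, the sketch below allocates one billing-period total in proportion to a set of profile weights. The function name `spread_consumption`, the 24-hour shape, and the 720 kWh total are all hypothetical and are not taken from the handbook data; a production profile model would typically vary the shape by day type and weather.

```python
def spread_consumption(total_kwh, profile_weights):
    """Allocate a metered total across intervals in proportion to profile weights."""
    weight_sum = sum(profile_weights)
    if weight_sum <= 0:
        raise ValueError("profile weights must sum to a positive value")
    return [total_kwh * w / weight_sum for w in profile_weights]


# Hypothetical 24-hour residential shape (relative weights, not real data),
# repeated for a 30-day billing period.
daily_shape = [0.6, 0.5, 0.5, 0.5, 0.6, 0.8, 1.1, 1.2, 1.0, 0.9, 0.9, 0.9,
               0.9, 0.9, 1.0, 1.1, 1.3, 1.6, 1.7, 1.6, 1.4, 1.2, 0.9, 0.7]
profile = daily_shape * 30

# One monthly meter read of 720 kWh spread to 720 hourly values.
hourly_kwh = spread_consumption(720.0, profile)
print(len(hourly_kwh), round(sum(hourly_kwh), 6))
```

Because the weights are normalized inside the function, the hourly values always add back to the metered total, whatever scale the profile is expressed in.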
2.1.1 Handbook Data

The examples presented in this handbook are based on hourly load data downloaded from the PJM
website.3 The load data for Commonwealth Edison and Dayton Power & Light are used to demonstrate
how the load modeling problem can vary between a large, geographically dispersed load zone
(Commonwealth Edison) and a small, geographically compact load zone (Dayton Power & Light). Historical
weather data were downloaded from Weather Underground.4 For Commonwealth Edison, the daily
weather data for O’Hare International Airport (KORD) and Rockford, Illinois (KRFD) are used. For Dayton
Power & Light, the daily weather data for Cox-Dayton International Airport (KDAY) are used.
3 The source for the hourly load data is (http://www.pjm.com/markets-and-operations/ops-analysis/historical-load-data.aspx).
4 The source for the historical weather data is (https://www.wunderground.com/).
2.2 WAYS TO LOOK AT HISTORICAL LOAD DATA

There are two primary ways to look at load data. Tabular approaches display the load data values in the
time order that the data were collected. Graphical approaches display the same data, but in a way that
allows visual identification of within day, across day, across month, and across year patterns.
2.2.1 Tabular Review

There is no way to put this nicely, but looking at pages and pages of hourly load data is mind numbing. To
make the process more productive, there are a handful of summary statistics that are useful in
understanding what type of load is being modeled. These statistics and what can be gleaned from them
are as follows.
◼ Average Load. The average load is one indicator of the number of and potential mix of customers
(e.g., residential, commercial, industrial, transportation, and agriculture) underlying the load data.
Large average load values are associated with aggregations of many individual customers or a few
very large nonresidential customers. In contrast, small average load values are associated with
individual customer loads or a relatively small aggregation of small customer loads.
◼ Maximum Load. The maximum load is the load at the system peak. Like the average load, this value
also provides insight to the potential mix of customers being modeled. When the maximum load
occurs can also provide useful information such as whether the peak is driven by weather or
underlying industrial processes. Weather-driven peaks vary across a week and month. Industrial
processes tend to be relatively stable, which results in a relatively repeatable peak load across
working days and potentially across months.
◼ Minimum Load. Because minimum loads usually happen right before dawn, they tend to be
the least impacted by weather. As a result, long-run trends in minimum loads are strong
indicators of the direction of growth – up or down – of non-weather sensitive or baseline loads.
Also, graphing the daily sequence of minimum loads is a good way of capturing the load change
associated with a remapping of the SCADA measurements points that are used to define a load,
whether intentional or not. For example, if one of the generation SCADA points goes missing,
then the resulting calculated load will have an unexpected shift downward of the minimum load.
◼ Load Factor. Load factor is defined as the ratio of the average load to maximum load. Values that
are close to 1.0 indicate a relatively flat load shape. This would be the signature of a large
industrial load that runs flat out on a 24x7 basis. Smaller load factors are usually associated with
weather-sensitive loads.
◼ Summer Energy Fraction. The summer energy fraction is defined as the fraction of annual energy
that occurs in the summer months. Summer months for the northern hemisphere are defined as
June, July, and August. The summer months for the southern hemisphere are defined as
December, January, and February. This statistic is used primarily to indicate air conditioning
driven weather sensitivity. Higher fractions tend to be associated with high saturations of air
conditioning equipment.
◼ Winter Energy Fraction. The winter energy fraction is defined as the fraction of annual energy
that occurs in the winter months. Winter months for the southern hemisphere are defined as
June, July, and August. The winter months for the northern hemisphere are defined as December,
January, and February. This statistic is used to indicate space heating driven weather sensitivity.
Higher fractions tend to be associated with high penetrations of electric space heating.
◼ Standard Deviation. This provides a measure of load volatility. Not all load volatility is bad. Large
standard deviations driven by day-of-the-week and seasonal variation are expected. The hard
part of load volatility is the unexplained or random load deviations that can be associated with
non-systematic load behavior (e.g., a steel crucible switching on and off at random times) and
incorrect or missing measurements.
◼ Coefficient of Variation. The coefficient of variation is defined as the ratio of the standard
deviation to the average load and provides an indication of how much the load swings around the
average. The smaller the coefficient of variation, the more stable the load curve.
◼ Day-of-the-Week Average Values. These values provide insight as to the mix of customer loads
that are being modeled. Residential loads tend to have higher average values on weekends than
weekdays. Commercial loads tend to be about the same Monday through Friday with lower loads
on Saturday and Sunday. Industrial loads can run flat out every day or have a weekday versus
weekend day swing like commercial loads.
◼ Monthly Average Values. Variation in the monthly average values can indicate the level of
weather sensitivity embedded in the load. It can also catch big seasonal operational changes like
agriculture irrigation pump loads.
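The statistics listed above can be computed in a few lines. The following is a minimal sketch using only the Python standard library, assuming the load history is a list of `(timestamp, load)` pairs covering whole years. The function name `summarize_load`, the dictionary keys, and the synthetic 120/100 load pattern are illustrative assumptions, and the population standard deviation is one reasonable choice among several.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev


def summarize_load(records, summer_months=(6, 7, 8), winter_months=(12, 1, 2)):
    """Compute tabular-review statistics for (timestamp, load) pairs.

    The default month sets are the northern-hemisphere definitions; swap
    them for a southern-hemisphere load zone.
    """
    loads = [load for _, load in records]
    total_energy = sum(loads)
    average = mean(loads)
    peak = max(loads)
    std_dev = pstdev(loads)

    by_weekday = defaultdict(list)  # 0 = Monday ... 6 = Sunday
    by_month = defaultdict(list)    # 1 = January ... 12 = December
    for ts, load in records:
        by_weekday[ts.weekday()].append(load)
        by_month[ts.month].append(load)

    def energy_fraction(months):
        in_season = sum(load for ts, load in records if ts.month in months)
        return in_season / total_energy

    return {
        "average": average,
        "maximum": peak,
        "minimum": min(loads),
        "load_factor": average / peak,
        "summer_energy_fraction": energy_fraction(summer_months),
        "winter_energy_fraction": energy_fraction(winter_months),
        "standard_deviation": std_dev,
        "coefficient_of_variation": std_dev / average,
        "weekday_average": {d: mean(v) for d, v in sorted(by_weekday.items())},
        "monthly_average": {m: mean(v) for m, v in sorted(by_month.items())},
    }


# Synthetic year of hourly data (not real load data): 120 on weekdays, 100 on weekends.
start = datetime(2017, 1, 1)
records = []
for h in range(8760):
    ts = start + timedelta(hours=h)
    records.append((ts, 120.0 if ts.weekday() < 5 else 100.0))

stats = summarize_load(records)
print(round(stats["load_factor"], 3), round(stats["coefficient_of_variation"], 3))
```

The seasonal fractions are only meaningful when the input spans complete calendar years, since a partial year would over- or under-weight one season.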
The summary statistics for these two load zones are presented in the following series of tables.
◼ The first thing to notice is that the average Commonwealth Edison load is almost six times larger
than the average Dayton Power & Light load. In terms of peak load, the Commonwealth Edison
peak is a little over six times larger than the Dayton Power & Light peak. Based on the average load
factors, the Dayton Power & Light load is relatively flatter than the Commonwealth Edison load.
Further, the Commonwealth Edison load has more apparent air conditioning load as measured by
a larger summer energy fraction of 27.8% versus 26.7% for Dayton Power & Light.
◼ A comparison of the average monthly load to the overall average load suggests that Dayton Power
& Light is relatively more weather sensitive in the winter months than the Commonwealth Edison
load. In contrast, Commonwealth Edison is relatively more weather sensitive in the summer
months than the Dayton Power & Light load.
◼ Relative to Commonwealth Edison, the Dayton Power & Light load has a bigger drop in average
load consumption on weekends versus weekdays. This suggests Commonwealth Edison has a
larger portion of commercial and industrial loads operating at least six days a week.
TABLE 2-1. DAYTON POWER & LIGHT DATA SUMMARY STATISTICS
TABLE 2-2. DAYTON POWER & LIGHT MONTHLY DATA SUMMARY STATISTICS
TABLE 2-3. DAYTON POWER & LIGHT DAY-OF-THE-WEEK DATA SUMMARY STATISTICS
TABLE 2-4. COMMONWEALTH EDISON DATA SUMMARY STATISTICS
TABLE 2-5. COMMONWEALTH EDISON MONTHLY DATA SUMMARY STATISTICS
TABLE 2-6. COMMONWEALTH EDISON DAY-OF-THE-WEEK DATA SUMMARY STATISTICS
2.2.2 Graphical Review

The phrase “a picture is worth a thousand words” rings true when it comes to building load forecast
models. The most common hourly graphs and what information can be gleaned from each view are
summarized below.
◼ All Data View. The all data view puts all the load data in one graph. From this type of graph you can
pick off obvious things like:
─ Long-run growth trends by looking at how the minimum values either rise or fall over time,
─ Redefinition of the load that would manifest itself as a jump up or down in the overall load,
─ Seasonality of the load and apparent air conditioning and electric space heating loads, and
─ Stability of the load.
The all data views for Dayton Power & Light and Commonwealth Edison are shown below. The hourly load
data are in red. The blue line is the 60-day centered moving average. Here are some observations about
these charts.
◼ Load growth (up or down) tends to manifest itself as a trend in the minimum load or bottom of
the graph. Both Dayton Power & Light and Commonwealth Edison appear to have had relatively
flat load growth over the years 2013 through 2017.
◼ Stable loads tend to have relatively clean seasonal swings. Consider the 60-day centered moving
average that was fitted to the load data. The magnitude of the actual load swings around the centered
moving average provides insight as to the potential instability or stability of the loads. In this case,
the Dayton Power & Light load appears to swing more than the Commonwealth Edison load. This
may result in models of Commonwealth Edison loads having better in-sample fit statistics than
models of Dayton Power & Light loads.
◼ Both loads show significant weather sensitivity in both the summer and winter months. It will be
important that the load forecast models capture this weather sensitivity.
FIGURE 2-1. DAYTON POWER & LIGHT ALL DATA VIEW
FIGURE 2-2. COMMONWEALTH EDISON ALL DATA VIEW
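The centered moving average and the swing around it, as discussed in the observations above, can be sketched as follows. The guide does not say how the 60-day average in the charts was computed, so the window treatment here is an assumption: each point is averaged with `half_width` neighbours on either side, which for hourly data and a 60-day window means `half_width = 30 * 24`. The function names and the small demo series are hypothetical.

```python
from itertools import accumulate


def centered_moving_average(values, half_width):
    """Average each point with `half_width` neighbours on either side.

    Returns None where the full window does not fit inside the series.
    """
    prefix = [0.0] + list(accumulate(values))  # running totals for fast window sums
    n = len(values)
    smoothed = [None] * n
    for i in range(half_width, n - half_width):
        window_sum = prefix[i + half_width + 1] - prefix[i - half_width]
        smoothed[i] = window_sum / (2 * half_width + 1)
    return smoothed


def mean_abs_swing(values, smoothed):
    """Mean absolute deviation of the load from its moving average."""
    gaps = [abs(v - s) for v, s in zip(values, smoothed) if s is not None]
    if not gaps:
        raise ValueError("series is shorter than the moving-average window")
    return sum(gaps) / len(gaps)


# Tiny made-up series; for hourly loads use half_width = 30 * 24 instead of 1.
steady = [10, 20, 30, 40, 50]
choppy = [0, 3, 0, 3, 0]
print(mean_abs_swing(steady, centered_moving_average(steady, 1)))
print(mean_abs_swing(choppy, centered_moving_average(choppy, 1)))
```

Comparing the swing statistic between two load zones is only fair after scaling by the average load, since a larger system will swing more in absolute terms.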
◼ Month View. This type of view reveals:
─ Weekday patterns (e.g., weekend versus weekday load swings),
─ Holiday impacts, and
─ Weather-sensitive runs in loads.
Presented below are the month views for February and July 2017. Observations about these charts
include the following.
◼ The monthly patterns for both load zones are consistent with a double peak in the winter months
and a single peak during the summer months. The winter double peak is consistent with the
story that, in the morning, residential lights and space heating drive the system peak. As the
population leaves home for the day, the lights and space heating loads drop off, which is reflected
in the lowering of the system load. Then, in the evening, residential lighting, cooking, and space
heating again drive the system shape. During summer months, air conditioning loads dominate
the load shape from the morning to the evening. With air conditioning filling in the valley the
result is a single peak shape.
◼ The graphs also show a clear day-of-the-week load pattern that reflects a different mix of
equipment operating on weekdays than weekend days.
◼ While both loads show a ramp down in loads between Friday and Saturday, the ramp is much
steeper with Dayton Power & Light. It will be important that the load forecast models capture
this ramping.
FIGURE 2-3. DAYTON POWER & LIGHT MONTH VIEW: FEBRUARY 2017
FIGURE 2-4. DAYTON POWER & LIGHT MONTH VIEW: JULY 2017
FIGURE 2-5. COMMONWEALTH EDISON MONTH VIEW: FEBRUARY 2017
FIGURE 2-6. COMMONWEALTH EDISON MONTH VIEW: JULY 2017
◼ Weekly View. This type of view reveals:
─ Day-of-the-week differences,
─ Specific holiday impacts, and
─ Impacts of special event days like hurricanes, Super Bowls, etc.
Presented below are week views drawn from February and July 2017. Observations about these charts
include the following.
◼ The day-of-the-week variation in loads is more apparent during the winter months than in the
summer months. This reflects the fact that air conditioning loads run almost continuously
throughout the week while space heating tends to operate only when customers are home.
◼ For the selected weeks, the Dayton Power & Light loads appear to swing more with changing
weather patterns than the Commonwealth Edison loads. This might be attributed to Dayton
Power & Light experiencing significantly more weather variation than Commonwealth Edison. It may also
reflect the fact that, because Commonwealth Edison has a higher baseload, the weather-driven swings
have less of an apparent impact.
◼ The July Friday afternoon ramp down in loads for Dayton Power & Light could reflect the impact
of an afternoon storm or reflect the usual shutdown of business starting in the afternoon.
Additional weeks would need to be reviewed to determine if the ramp down is the norm or a one-
off event that was triggered by a storm.
◼ Although it is not labeled as such, the week shown begins on Sunday, July 2. That places
Independence Day on the following Tuesday. It is clear from the Commonwealth Edison graph
that many businesses shut down on Monday, July 3 and Tuesday, July 4. The actual holiday
schedule adopted by businesses in Dayton Power & Light’s territory is not as obvious.
FIGURE 2-7. DAYTON POWER & LIGHT WEEK VIEW: FEBRUARY 2017
FIGURE 2-8. DAYTON POWER & LIGHT WEEK VIEW: JULY 2017
FIGURE 2-9. COMMONWEALTH EDISON WEEK VIEW: FEBRUARY 2017
FIGURE 2-10. COMMONWEALTH EDISON WEEK VIEW: JULY 2017
◼ Day View. This type of view reveals:
─ Potential impact of behind-the-meter solar PV generation,
─ Demand response impacts, and
─ Impact of acute meteorological events like thunderstorms and heavy fog.
Presented below are two-day views for two summer days, where one of the days had a storm roll
through the load zone. Observations about these charts include the following.
◼ For Dayton Power & Light, the storm hit right after noon on the first day shown. The net impact
of the storm was to cut cooling loads as temperatures dropped. After the load drop, air conditioning
did pick back up, but not enough to lift the afternoon peak above the morning peak.
◼ For Commonwealth Edison, the storm also hit right after noon on the first day. Unlike Dayton Power & Light, the load
drop was temporary as the air conditioning kicked up later in the afternoon to drive the daily
peak.
FIGURE 2-11. DAYTON POWER & LIGHT TWO-DAY VIEW: STORM
FIGURE 2-12. COMMONWEALTH EDISON TWO-DAY VIEW: STORM
Hourly graphs provide insight into the type of explanatory variables that will be needed to capture growth
trends, seasonal trends, day-of-the-week load variation, holiday impacts, and other non-weather sensitive
variations in load. To visualize the weather sensitive portion of loads, we rely on scatter plots between
loads and temperatures. The type of scatter plots we find useful are illustrated below. Between the
hourly graphs and the scatter plots, a forecast analyst should have a good feel for the type of load that is
being modeled.
Daily Average Load versus Average Temperature. This type of scatter plot provides a good understanding
of the weather-sensitivity of the loads, and whether the sensitivity holds in both hot weather (i.e., air
conditioning loads) and cold weather (i.e., electric space heating loads). Space cooling will
manifest itself as increased load as temperatures rise above some base level like 65°F or 18°C. The
presence of electric space heating would show up as a rise in loads as temperatures fall below the base
level. A load with both space cooling and space heating would have an almost U-shaped scatter plot. A
scatter plot that is flat or nearly flat when temperatures fall below the base level suggests that no
significant electric space heating is embedded in the hourly loads. In a similar fashion, flat or nearly flat
loads when temperatures rise above the base temperature indicate that little to no air conditioning occurs.
Other key information that can be gleaned from the scatter plot is whether there is a pronounced
difference between weekday and weekend loads. If there is a difference, you would expect to see
essentially two scatters with a gap between the two. The higher scatter plot typically represents the
weekday loads and the lower scatter the weekend loads; however, it is plausible in a load that is
dominated by residential customers that the higher points represent weekend loads when residential
customers are home all day. The difference between the two scatters provides a rough estimate of the
difference in non-weather sensitive loads between weekdays and weekends. Further, it is common to see
differences between the slopes on the weather-sensitive parts of the scatter, which means the space
cooling (space heating) response to temperature change is different between weekdays and weekend
days. One possible reason is that commercial buildings are closed on weekends and so their space
conditioning equipment may sit idle even when it is hot. This would lead to lower weather-sensitivity on
weekend days. In contrast, residential customers may be more likely to run their space conditioning
equipment during the day on weekends when they are home. At the same time, they turn off their space
conditioning equipment while they are away at work. This could lead to higher weather sensitivity on
weekend days. The main thing to figure out is whether the slopes look different. This will impact how the
weather sensitive portion of the load is modeled.
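To build this kind of scatter plot, the hourly data first have to be collapsed to one point per day and split by day type. The sketch below shows one way to do that with the standard library, assuming rows of `(timestamp, load, temperature)`. The function name `daily_scatter_points`, the holiday set, and the synthetic week of data are hypothetical; plotting the two returned lists in different colors is left to whatever charting tool is at hand.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def daily_scatter_points(hourly, holidays=frozenset()):
    """Collapse hourly (timestamp, load, temperature) rows into daily points.

    Returns two lists of (avg_temperature, avg_load) tuples: one for
    weekdays, one for weekends and holidays.
    """
    buckets = defaultdict(list)
    for ts, load, temp in hourly:
        buckets[ts.date()].append((load, temp))

    weekday_pts, weekend_pts = [], []
    for day, rows in sorted(buckets.items()):
        avg_load = sum(r[0] for r in rows) / len(rows)
        avg_temp = sum(r[1] for r in rows) / len(rows)
        is_off_day = day.weekday() >= 5 or day in holidays
        (weekend_pts if is_off_day else weekday_pts).append((avg_temp, avg_load))
    return weekday_pts, weekend_pts


# Synthetic week starting Monday, July 3, 2017 (made-up loads and temperatures).
start = datetime(2017, 7, 3)
hourly = []
for h in range(7 * 24):
    ts = start + timedelta(hours=h)
    temp = 70.0 + ts.day
    off_day = ts.weekday() >= 5 or ts.date() == date(2017, 7, 4)
    load = (80.0 if off_day else 100.0) + 2.0 * temp
    hourly.append((ts, load, temp))

weekday_pts, weekend_pts = daily_scatter_points(hourly, holidays={date(2017, 7, 4)})
print(len(weekday_pts), len(weekend_pts))
```

Treating holidays as off days keeps them out of the weekday scatter, where they would otherwise show up as low outliers.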
The scatter plots for Dayton Power & Light and Commonwealth Edison are presented below. In these
plots, the average daily load is plotted against the average daily temperature. The blue dots represent
weekdays and the green dots represent weekends and holidays. For Dayton Power & Light, the average
temperature is in degrees Celsius. For Commonwealth Edison, the average temperature is in degrees Fahrenheit. The
following observations are drawn from these scatter plots.
◼ Both load zones shown have a distinct weekday versus weekend day pattern. The Dayton Power
& Light load has a more distinct break between the weekday and weekend day response, which
indicates most commercial and industrial loads are off on weekends. There is more overlap
between weekdays and weekends in Commonwealth Edison’s load, suggesting that a larger share of
its commercial and industrial loads keep operating on weekends.
◼ The weekday cooling slope for Commonwealth Edison is approximately equal to 268 MWh per
degree. In other words, for every degree that is above 70°, average daily loads rise by 268 MWh.
To compute this slope, we select two weekday points, one at 70° (e.g., 10,482 MWh) and one at
90° (e.g., 15,853 MWh) and then calculate the slope as (15,853 MWh – 10,482 MWh) over (90° –
70°) = 268 MWh/degree. The weekend slope is slightly lower at approximately 257 MWh per
degree. The lower slope is consistent with the story that less commercial and industrial space
cooling equipment is active on weekends than weekdays. On the space heating side of the scatter
plot (i.e., when temperatures fall below 62°), the weekday space heating slope is approximately
72 MWh/degree. The weekend space heating slope is approximately 77 MWh/degree. The fact
that load response to cold temperatures is lower than the response to hot temperatures reflects
the mix of space conditioning equipment that is in place. The main fuel used to cool off
conditioned spaces is electricity via air conditioners. In contrast, the main fuels used to heat up
conditioned spaces are gas and oil. Electricity use associated with space heating tends to be in
delivery systems like forced-air fans. The bulk of space heating is completed with fossil fuels and
not electricity.
◼ The weekday cooling slope for Dayton Power & Light is approximately equal to 88 MWh per
degree Celsius (49 MWh/degree Fahrenheit). To compute this slope, we select two weekday
points, one at 26° (e.g., 2,523 MWh) and one at 20° (e.g., 1,993 MWh) and then calculate the
slope as (2,523 MWh – 1,993 MWh) over (26° – 20°) = 88 MWh/degree. The weekend slope is
slightly lower at approximately 79 MWh per degree Celsius (44 MWh/degree Fahrenheit). The
lower slope is consistent with the story that there is less commercial and industrial space cooling
equipment active on weekends than weekdays. On the space heating side of the scatter plot (i.e.,
when temperatures fall below 10° Celsius), the weekday space heating slope is approximately 35
MWh/degree. The weekend space heating slope is approximately 33 MWh/degree. Like Commonwealth Edison, the
bulk of space heating is done with fossil fuels, which means the space heating slopes are not as
steep as the space cooling slopes.
FIGURE 2-13. DAYTON POWER & LIGHT AVERAGE DAILY LOAD VERSUS AVERAGE DAILY TEMPERATURE
FIGURE 2-14. COMMONWEALTH EDISON AVERAGE DAILY LOAD VERSUS AVERAGE DAILY TEMPERATURE
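The two-point slope arithmetic in the bullets above can be checked with a short script. The point values are the ones quoted in the text; the function and variable names are illustrative. The Celsius-to-Fahrenheit conversion divides by 1.8 because a change of one degree Celsius equals a change of 1.8 degrees Fahrenheit.

```python
def two_point_slope(temp_lo, load_lo, temp_hi, load_hi):
    """Load change per degree between two points picked off the scatter plot."""
    return (load_hi - load_lo) / (temp_hi - temp_lo)


# Commonwealth Edison weekday cooling slope, in MWh per degree Fahrenheit.
ce_weekday_cooling = two_point_slope(70, 10_482, 90, 15_853)

# Dayton Power & Light weekday cooling slope, in MWh per degree Celsius,
# then restated per degree Fahrenheit.
dpl_weekday_cooling_c = two_point_slope(20, 1_993, 26, 2_523)
dpl_weekday_cooling_f = dpl_weekday_cooling_c / 1.8

print(ce_weekday_cooling, dpl_weekday_cooling_c, dpl_weekday_cooling_f)
```

A two-point slope depends heavily on which points are chosen, so it is best read as a rough gauge of weather sensitivity rather than an estimated model coefficient.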
Hour Specific Scatter Plots. A scatter of daily energy versus average daily temperatures provides insight
as to whether a load is weather-sensitive or not. Hour specific scatter plots are used to explore how the
weather response varies across hours of the day. For example, morning loads may exhibit a bigger space
heating response than the afternoon. Along those same lines, afternoon loads may exhibit a bigger space
cooling response than the morning. Hour specific scatter plots that plot the load for an hour against the
temperature for that hour can help reveal these response differences.
The 2AM and 2PM scatter plots for Dayton Power & Light and Commonwealth Edison are presented
below. In these plots the hourly load is plotted against the average daily temperature. The blue dots
represent weekdays and the green dots represent weekends and holidays. For Dayton Power & Light, the
average temperature is in degrees Celsius. For Commonwealth Edison, the average temperature is in degrees Fahrenheit. The
following observations are drawn from these scatter plots.
◼ For both load zones, the relationship between 2AM loads and average daily temperatures is
fuzzier than the relationship between 2PM loads and average daily temperatures. Also, there is
significant overlap between 2AM weekday and weekend loads. In contrast, there is a significant
gap between the weekday and weekend 2PM load response.
◼ The Commonwealth Edison space cooling slopes for 2AM and 2PM are:
─ @2AM Weekday ~ 190 MWh/ °F
─ @2AM Weekend ~ 190 MWh/ °F
─ @2PM Weekday ~ 314 MWh/ °F
─ @2PM Weekend ~ 388 MWh/ °F
◼ The Commonwealth Edison space heating slopes for 2AM and 2PM are:
─ @2AM Weekday ~ 71 MWh/ °F
─ @2AM Weekend ~ 71 MWh/ °F
─ @2PM Weekday ~ 71 MWh/ °F
─ @2PM Weekend ~ 58 MWh/ °F
◼ The Dayton Power & Light space cooling slopes for 2AM and 2PM are:
─ @2AM Weekday ~ 67 MWh/ °C
─ @2AM Weekend ~ 67 MWh/ °C
─ @2PM Weekday ~ 109 MWh/ °C
─ @2PM Weekend ~ 108 MWh/ °C
◼ The Dayton Power & Light space heating slopes for 2AM and 2PM are:
─ @2AM Weekday ~ 29 MWh/ °C
─ @2AM Weekend ~ 28 MWh/ °C
─ @2PM Weekday ~ 71 MWh/ °C
─ @2PM Weekend ~ 58 MWh/ °C
◼ From a load model building perspective, the key findings that need to be incorporated into a
forecast model specification are as follows.
─ Both loads are weather sensitive.
─ The weather sensitivity leads to a nonlinear response between loads and temperatures.
─ The slopes implied between weekdays and weekends differ.
─ The heating and cooling slopes differ; that is, the weather response is not symmetrical.
─ The heating and cooling slopes differ by time of day.
FIGURE 2-15. DAYTON POWER & LIGHT AVERAGE LOAD @ 2AM VERSUS AVERAGE DAILY TEMPERATURE
FIGURE 2-16. DAYTON POWER & LIGHT AVERAGE LOAD @ 2PM VERSUS AVERAGE DAILY TEMPERATURE
FIGURE 2-17. COMMONWEALTH EDISON LOAD @ 2AM VERSUS AVERAGE DAILY TEMPERATURE
FIGURE 2-18. COMMONWEALTH EDISON LOAD @ 2PM VERSUS AVERAGE DAILY TEMPERATURE
3 DATA CLEANING
The odds are very good that the load data you are modeling will need to be cleaned. It is common to have
load data with missing values, bad measurements, load redefinition, or a thousand other things that would
render a numerical value that simply does not make sense.
Why do we care about cleaning the load data? We care because the modeling approaches we utilize work
on a basic principle of finding model specifications that make the sum of the squared forecast errors as
small as possible. We will elaborate on why minimizing the sum of squared forecast errors makes sense
when we discuss the alternative forecast frameworks. The challenge with bad data is they lead to big
forecast errors. When you square these forecast errors, they get even bigger. In some cases, the squared
errors from bad data points will drive the estimated model specification. In other words, a few bad apples
spoil the whole basket, which in this case is the model. To avoid the possible skewing of the model
specification, we need to eliminate the forecast errors associated with bad data.
When you encounter missing or bad data, you have a couple of options to consider. If your estimation data
span several years, your first option is to remove the days that contain one or more missing or bad data
values. For example, say you have three years of load data, which corresponds to roughly 1,095 daily
observations. Even if you removed 10% of the days due to bad data, you are left with about 985
observations, which is plenty of data to accomplish the task.

Conversely, if you have a very limited range of data or the bad data falls on a critical day, like a holiday for
which there are very few observations, then you need to fix the data. Discussed below are different
approaches to fixing the data. Which approach to use will depend on how many data points are bad.
3.1 OUTLIER DETECTION

There are two approaches to detecting bad data: (1) visual inspection and (2) validation tests. When
building models, my preference is visual inspection because it is difficult to define validation tests that
catch an unexpected shift in the load arising from redefinition that does not push the data out of
reasonable bounds. Validation tests, on the other hand, are extremely valuable in real-time forecasting
when load data are flowing in hourly or sub-hourly and there is no time to visually inspect the data.
3.1.1 Visual Inspection

Visual inspection uses the same hourly graphs and scatter plots described above and is good at catching
the following data defects.
◼ Load spikes. In most cases, these are easily seen with a quick review of the hourly data. Up data
spikes usually are associated with poor data reads. Down data spikes can occur because one or
more of the SCADA metering points was missing, leading to a lower than expected load value.
Also, if the load data are adjusted for daylight saving time, you will often see a 0.0 value at the hour
at which the clocks spring forward in the spring. Sometimes, you will see a double counting of
load at the hour when the clocks fall back in autumn.
◼ Repeat Values. Some systems will fill missing load values by repeating the last good data read.
Since the original value was good, the repeated values will pass validation unless a validation test that
is designed specifically to identify repeat values is used. Repeat values are hard to see unless the
graph is zoomed in on a week or even a day.
◼ Load Shifts. Load shifts can be either up or down. A drop in load due to a major customer shutting
down will result in a load shift. Conversely, a new large customer going online will result in a shift
up in the load. Known load shifts like these can be easily modeled. Another load shift that is
common when working with SCADA measurements is a redefinition of a load zone. There are many
valid reasons for a load zone redefinition. If these reasons are understood and persistent, then
they are easily modeled. However, there are times when a redefinition is inadvertent. In these
cases, the load shift will not persist but will cause large forecast errors. Visual inspection is the
best way to catch these inadvertent load shifts.
◼ Missing Data. Missing data are hard to test, but easy to see in a graph. In most cases, the gaps
are one or two periods in length and can be readily filled. In other cases, days of data can be lost.
For these cases there is nothing to do but ignore these data gaps. Often there will be a missing
value at the daylight saving time switch in the spring.
3.1.2 Validation Tests

There are two key advantages of utilizing validation tests to identify erroneous data. First, no matter how
diligent you are, it is tedious to look at multiple years of hourly or sub-hourly data. Validation tests can
spin through the data quickly and catch most problems. Second, in real-time operations there is not
sufficient time to review the data as it flows in. Here, data validation is critical to preventing erroneous
data from flowing into the forecast process. The following are the most common data validation tests.
◼ Maximum and Minimum Tolerances. This test is looking for data points that exceed minimum
and maximum acceptable tolerances. A quick visual review of the historical data will suggest
reasonable bounds. More sophisticated tests allow the bounds to vary by month or season.
◼ Spike Detection. The goal of a spike detection test is to identify data points that lie (either on the
high side or the low side) outside a reasonable distance from the surrounding data points. An
example of a spike detection test is to compare the current data value to a data value that is n
times bigger (or smaller) than the average of the surrounding data points. Here n is a user-
supplied tolerance. A value of 3 for n means if the current data is more than three times greater
(smaller) than the average of the surrounding data points then the data point should be flagged
as a data spike.
◼ Unexpected Repeat Values. Most hourly and sub-hourly loads do not repeat themselves from
one data read to the next. The exception could be an industrial load that runs flat out. A repeat
value test compares the current data point to the prior k data points. If the k data points are the
same, then they are flagged as repeat data. The value of k is the number of repeat data values
that are acceptable before marking the data as erroneous.
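The three validation tests above can be sketched in a few lines of Python. This is a minimal illustration rather than a production routine: the function names, the tolerance values, the two-point neighbor window, and the sample loads are all assumptions made for the example, and the spike test assumes strictly positive loads.

```python
from statistics import mean

def flag_out_of_tolerance(loads, low, high):
    """Flag readings below the minimum or above the maximum tolerance."""
    return [not (low <= x <= high) for x in loads]

def flag_spikes(loads, n=3.0, window=2):
    """Flag readings more than n times larger (or smaller) than the average
    of up to `window` neighbors on each side. Assumes positive loads."""
    flags = []
    for i, x in enumerate(loads):
        neighbors = loads[max(0, i - window):i] + loads[i + 1:i + 1 + window]
        avg = mean(neighbors)
        flags.append(x > n * avg or x < avg / n)
    return flags

def flag_repeats(loads, k=3):
    """Flag every reading in a run where a value equals each of the prior k values."""
    flags = [False] * len(loads)
    for i in range(k, len(loads)):
        if all(loads[j] == loads[i] for j in range(i - k, i)):
            for j in range(i - k, i + 1):
                flags[j] = True
    return flags

# Hypothetical hourly loads (MW): a dropped read, a stuck meter, and a bad high read.
loads = [100.0, 102.0, 0.0, 101.0, 103.0, 103.0, 103.0, 103.0, 500.0, 104.0]
out_of_bounds = flag_out_of_tolerance(loads, low=50.0, high=400.0)
spikes = flag_spikes(loads)
repeats = flag_repeats(loads)
```

In this sample, the 0.0 and 500.0 readings fail both the tolerance and spike tests, while the run of four 103.0 readings is caught only by the repeat test.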
3.2 CLEANING APPROACHES

Once the historical load data have been reviewed for erroneous or missing values, the question arises as
to how to correct the data. The following are three broad data cleaning approaches.
◼ Manual Fill. The most straightforward and yet tedious approach is to replace each erroneous
data value with a user-defined value. It is straightforward because all you need to do is type in a
value. It is tedious because you must decide what value to type.
◼ Linear Interpolation. If the erroneous data lie within a series of good data points, then the
erroneous data are replaced with an average of surrounding data points. The average can be
computed using linear weights or polynomial weights as suggested by A. Savitzky and M.J.E.
Golay.5 In cases where the erroneous data leave long gaps in the data, it is best to remove that
day(s) of data from the model estimation process in lieu of filling the data.
◼ Fill with Model. The idea here is to use the predicted value from a statistical model that has been
fitted to the non-erroneous data. Ironically, if there is sufficient data to fit a powerful statistical
model by removing the erroneous data, there is no reason from a forecasting perspective to fill
the bad data.
In general, linear interpolation works well for filling small data gaps arising from missing values, data
spikes, and repeat data values. Bigger data gaps are best removed from model estimation. Load shifts
require model specifications that are designed to address the shift.
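As a rough sketch of the linear interpolation approach, the Python function below fills short runs of missing values (marked as None) from the good values on either side and leaves longer gaps untouched so those days can be dropped from estimation. The function name, the max_gap threshold of two periods, and the sample data are assumptions for illustration only.

```python
def fill_short_gaps(loads, max_gap=2):
    """Linearly interpolate runs of None that are at most max_gap long.
    Longer gaps, and gaps at either end of the series, are left as None."""
    filled = list(loads)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # index of the first good value after the gap (or n)
        gap = end - start
        if start == 0 or end == n or gap > max_gap:
            continue  # nothing to anchor on, or gap too long to fill
        left, right = filled[start - 1], filled[end]
        step = (right - left) / (gap + 1)
        for j in range(gap):
            filled[start + j] = left + step * (j + 1)
    return filled

# Hypothetical hourly loads with a one-period, a two-period, and a three-period gap.
raw = [100.0, None, 104.0, None, None, 110.0, None, None, None, 120.0]
cleaned = fill_short_gaps(raw, max_gap=2)
```

Values flagged as spikes or unexpected repeats can be set to None first and then passed through the same function.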
5 Savitzky, A. and Golay, M.J.E. (1964) “Smoothing and Differentiation of Data by Simplified Least Squares
Procedures”, Analytical Chemistry. 36 (8): 1627-39.

4 LOAD FORECAST METHODS
We introduce two main load forecast frameworks: algorithms and statistical models. Within statistical
model frameworks, we further segment the framework between univariate models that use only the
historical load data to forecast future load values and multi-variate models that combine historical load
with a broad set of explanatory data that capture the calendar and meteorological factors that drive loads.
We begin with the most common load forecast algorithms.
4.1 LOAD FORECAST ALGORITHMS

Load forecast algorithms implement user configurable logic to select one or more historical load shapes
that best represent the day that is to be forecasted and use those historical load shapes to craft the load
forecast. We present three types of algorithms. The first two algorithms have been actively used by the
industry for so long that it is hard to pinpoint who to credit for developing these algorithms. The third
algorithm comes from the domain of machine learning algorithms. Because it is an algorithm only recently
brought to the field of short-term load forecasting, the track record on its relative forecast performance
is insufficient.
4.1.1 Like Day or Similar Day Lookup Algorithms

Arguably the most pervasive short-term load forecast method is the like day or similar day lookup
algorithm. In general, a similar day lookup algorithm utilizes a database of historical load shapes where
each day is assigned the calendar and weather conditions that prevailed on that day. These conditions
are used to find (i.e., filter) historical days that are “like” the calendar and weather conditions of the
forecast day. Although there is no single similar day lookup algorithm and each load forecaster has their
own set of calendar and weather conditions they rely on, there are steps common across most algorithms.
These steps are listed below.
Step 1. Filter on how many months back a similar day can be found. This first high-level data filter
recognizes that economic and technology conditions vary over time. For example, hourly load data from
ten years ago will not be impacted by high saturations of on-premise solar generation. If the forecast day
is for a time with a high saturation of on-premise solar generation, then the ten-year-old data is virtually
useless. Limiting how many months back the similar day lookup algorithm can look for a similar day
ensures the load days that are found have shapes that are consistent with current economic and
technology conditions.
Step 2. Filter on how many days before/after the calendar day of the forecast day. This second high-
level filter attempts to find days that are close in the sense of the calendar to the forecast day. For
example, if the forecast day is April 15, then the load forecasters might opt to find only days that range
from March 15 to May 15.
Step 3. Filter the historical load data based on calendar conditions for the forecast day. The purpose of
this step is to limit the number of similar days to days with similar calendar conditions as the forecast day.
Calendar conditions cover the day-of-the-week, month, or season, and whether a holiday falls on the
forecast day. Examples of other calendar conditions that could be considered include school holidays,
daylight saving observance, whether a change in the production schedule for a large customer is expected,
whether a strike will occur, and whether a major sporting event is to take place. Essentially any special
event that has happened in the past and had a noticeable impact on the load can be used to filter the
historical load data. Examples of how the filtering on calendar conditions can work follow.
◼ If the forecast day is a Monday, the similar day lookup algorithm would filter on all historical days
that are Mondays.
◼ If the forecast day is Easter Sunday, the similar day lookup algorithm would filter on all Easter
Sundays.
◼ If the forecast day is a Tuesday during the spring school break, the similar day lookup algorithm
would filter workdays during the spring school break. These days would then be further filtered to
be either just Tuesdays or Tuesdays, Wednesdays, and Thursdays. The latter is an example of how
a similar day lookup algorithm could vary between different load forecasters.
Step 4. Rank similar days based on weather conditions. Steps 1 through Step 3 result in a list of several
similar days. The purpose of this step is to select among this list the historical day(s) that have weather
conditions “most like” the forecast day. At a minimum, the maximum and minimum temperatures for
the day are needed to indicate whether the loads will have significant space heating or space cooling. Other
weather concepts such as humidity, cloud cover, and precipitation can be incorporated. How “most like”
is defined when comparing the weather of a historical day to the forecasted weather for the forecast day
is one area where similar day lookup algorithms vary. The most straightforward approach is to use a
weighted sum of the absolute differences between the forecast day weather and the weather of the
historical days. For example, a similar day lookup algorithm could use the following weighted sum to rank
the historical days.
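$$
\begin{aligned}
\text{WeatherRank}_d ={}& \text{Weight}_1\,|\text{MaxTemp}_F - \text{MaxTemp}_d| + \text{Weight}_2\,|\text{MinTemp}_F - \text{MinTemp}_d| \\
&+ \text{Weight}_3\,|\text{AvgCloudCover}_F - \text{AvgCloudCover}_d| + \cdots \\
&+ \text{Weight}_k\,|\text{Precipitation}_F - \text{Precipitation}_d|
\end{aligned}
$$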
Here, the weather rank for historical day (d) is the weighted sum of the absolute differences of the various
weather conditions that are forecasted for the forecast day (F) and the observed conditions for day (d).
In this example, the similar day lookup algorithm utilizes (k) different measures of weather conditions.
Once each historical day is scored the days are sorted by their weather rank. The historical days most like
the forecast day will have the smallest weather rank.
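The weather rank calculation can be sketched as follows. This is only an illustration of the weighted sum of absolute differences: the weights, the weather measures (cloud cover is expressed in oktas here), the candidate dates, and the weather values are hypothetical, and the candidate days are assumed to have already passed the filters in Steps 1 through 3.

```python
def weather_rank(forecast_wx, day_wx, weights):
    """Weighted sum of absolute differences between the forecast day's
    weather and a historical day's weather. Smaller means more alike."""
    return sum(w * abs(forecast_wx[name] - day_wx[name])
               for name, w in weights.items())

# Hypothetical weights and weather values for illustration only.
weights = {"max_temp": 2.0, "min_temp": 1.0, "cloud_cover": 0.5}
forecast_day = {"max_temp": 31.0, "min_temp": 20.0, "cloud_cover": 2.0}
candidates = {
    "2017-07-10": {"max_temp": 30.0, "min_temp": 21.0, "cloud_cover": 4.0},
    "2017-07-17": {"max_temp": 35.0, "min_temp": 24.0, "cloud_cover": 2.0},
    "2017-07-24": {"max_temp": 31.5, "min_temp": 19.5, "cloud_cover": 8.0},
}

# Score every candidate day, then sort so the most similar day comes first.
scores = {day: weather_rank(forecast_day, wx, weights)
          for day, wx in candidates.items()}
ranked = sorted(scores, key=scores.get)
```

With these numbers the first candidate scores 4.0, the third 4.5, and the second 12.0, so the first candidate is the day "most like" the forecast day.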
At this point, the load forecaster can either use the historical day with the best weather rank as the load
forecast for the forecast day or take a weighted average of say the best three days and use the average
as the load forecast. There may also be some scaling applied to the historical day(s) to ensure the load
forecast is in sync with the most recent load trends. The steps taken once the similar days have been
found are part of the secret sauce of each similar day lookup algorithm. Further, the parameters (e.g.,
months back, days forward, and days back) that guide each step in the similar day lookup algorithm are
controlled by the load forecast analyst. The parameter values depend largely on the forecast analysts’
previous success forecasting days like the forecast day. As a result, most similar day lookup algorithms
facilitate the mechanics of making a load forecast based on expert judgement. An expert with strong local
knowledge will be hard to beat.
4.1.2 Rotation Algorithms

The basic idea of a rotation algorithm is to rotate historical load shapes onto the forecast calendar. The
most basic rotation algorithm takes the load from last Monday and uses that as the forecast for next
Monday. It then takes the load from last Tuesday and uses that as the forecast for next Tuesday. Last
Wednesday becomes the forecast for next Wednesday, and so on and so forth. You can think of this as a
rotation by day-of-the-week.
This day-of-the-week rotation algorithm is useful when forecasting large industrial and agriculture loads
that follow relatively stable production schedules. In these cases, the operating behavior of the most
recent week is a strong predictor of the operating behavior over the short-term forecast horizon. Where
this variation of a rotation algorithm breaks down is when there is a significant and unpredictable change
in production schedules. But with limited to no knowledge of the production schedules, day-of-the-week
rotation is a good way to go.
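A minimal sketch of the day-of-the-week rotation is shown below. It assumes the history is held as a dictionary that maps each date to its list of interval loads; the function name and the sample data are hypothetical.

```python
from datetime import date, timedelta

def rotate_by_weekday(history, forecast_dates):
    """Forecast each date with the load shape of the most recent
    historical day that falls on the same day of the week."""
    earliest = min(history)
    forecast = {}
    for day in forecast_dates:
        source = day - timedelta(days=7)
        # Step back a week at a time until a day with history is found.
        while source not in history and source >= earliest:
            source -= timedelta(days=7)
        if source in history:
            forecast[day] = list(history[source])
    return forecast

# Hypothetical history: Monday 2018-07-09 through Sunday 2018-07-15,
# each day a flat 24-hour shape whose level depends on the weekday.
start = date(2018, 7, 9)
history = {start + timedelta(days=n): [100.0 + 10 * n] * 24 for n in range(7)}
next_week = [start + timedelta(days=7 + n) for n in range(7)]
forecast = rotate_by_weekday(history, next_week)
```

Each forecast day simply inherits the shape of the same weekday from the prior week, which is why an unannounced change in the production schedule takes a full week to work its way into the forecast.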
With this basic idea of rotating historical days onto the forecast calendar, it is easy to envision and
implement the following variations of the basic rotation algorithm.
◼ Repeat Prior Hour. Under this variation, the forecast values are set equal to the last load
measurement. This variation makes sense for very near-term forecast horizons of up to eight to
twelve hours ahead and where there is real-time load measurement flowing in at least hourly or
more frequently. Further, it only makes sense for flat line loads since the forecast that is produced
is a flat line launching off the last load value. This algorithm would not make sense for the
Commonwealth Edison and Dayton Power & Light load shapes, which are anything but flat lines.
The forecast power of the repeat prior hour rotation algorithm is that the very near-term forecast
is synced to the most recent measurements. Its forecast risk is that the last data measurement
might be erroneous, leading to a near-term forecast that simply repeats the erroneous data
values. High reliance on real-time data reads requires powerful data validation routines to avoid
the risk of projecting off a bad data value.
◼ Repeat Prior Day. Rather than replicating all the days from the prior week, this variation simply
repeats the last day over all the days in the forecast horizon. With a flat load shape, repeating
the last day will lead to the same forecast as replicating all the days from the prior week, but with
fewer calculations. This has the added advantage that if there is a significant change in the
production schedule, repeating the last day will lead to a forecast error for the first forecast day,
but over the next several days the forecast will adjust to the new load level, which effectively
reduces the chances of several days of significant forecast errors. In contrast, the day-of-the-
week rotation algorithm will lead to seven days of forecast errors before the algorithm catches
up to the new load levels. Like the repeat prior hour algorithm, this algorithm is not well suited
for loads that vary by day of the week like the Commonwealth Edison and Dayton Power & Light
loads.
◼ Repeat Same Day Type Last Week. This rotation algorithm is like the day-of-the-week rotation
algorithm presented above, but with the twist that it first averages the weekday (i.e., Monday,
Tuesday, Wednesday, Thursday, and Friday) load shapes from last week and the weekend (i.e.,
Saturday and Sunday) load shapes prior to rotating the two day-type shapes onto the forecast
calendar. The result would be a load forecast where Saturday and Sunday have the same
weekend load shape forecast, and the other days of the week have the same weekday load shape
forecast. This is a two-day-type rotation algorithm. A four-day-type rotation algorithm would
also treat Mondays and Fridays as two distinct day types. This would work well for industrial and
agriculture loads with a distinct day-of-the-week load pattern. It could be applied to
Commonwealth Edison and Dayton Power & Light loads, but the load forecast will be subject to
errors if the weather pattern from the last week differs significantly from the forecasted weather.
◼ Repeat Day-of-the-Week/Day Type Last Month. The tweak here is the load shapes that are
rotated forward are the average load shapes from the prior month or rolling n weeks. Under this
rotation algorithm, the forecast analyst decides first whether the load shapes should vary by day-
of-the-week or by day-type. Second, they decide on how many weeks back, n, should be used to
construct the average load shapes. A twist would be to use a weighted average instead of a simple
average of the prior n weeks of data to construct the average load shapes. In this case, the
forecast analyst would place a higher weight on the most recent data.
◼ Repeat Day-of-the-Week/Day Type Last Year. The idea here is to use the load shapes from
roughly the same time of year as last year. The goal is to capture seasonal variation in the loads
that is repeatable from one year to the next. This would make sense with agriculture loads that
have significant pump loads or crop drying loads that take place approximately the same time
each year.
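The day-type variations can be sketched with one small helper that averages historical shapes by day type before rotating them forward. The sketch assumes a two-day-type split (weekday versus weekend) and a simple, unweighted average; the function names and the three-interval sample shapes are hypothetical. Passing one week of history gives the last-week variation, and passing the prior n weeks gives the last-month variation.

```python
from datetime import date, timedelta

def day_type(day):
    """Two day types: Monday through Friday versus Saturday and Sunday."""
    return "weekend" if day.weekday() >= 5 else "weekday"

def average_shapes(history):
    """Average the load shapes in history (date -> list of loads) by day type."""
    groups = {}
    for day, shape in history.items():
        groups.setdefault(day_type(day), []).append(shape)
    return {kind: [sum(interval) / len(shapes) for interval in zip(*shapes)]
            for kind, shapes in groups.items()}

def rotate_by_day_type(history, forecast_dates):
    """Assign each forecast date the average shape of its day type."""
    shapes = average_shapes(history)
    return {day: shapes[day_type(day)] for day in forecast_dates}

# Hypothetical three-interval shapes for Monday 2018-07-09 through Sunday 2018-07-15.
start = date(2018, 7, 9)
week_shapes = [
    [100.0, 200.0, 150.0], [101.0, 201.0, 151.0], [102.0, 202.0, 152.0],
    [103.0, 203.0, 153.0], [104.0, 204.0, 154.0],   # Monday to Friday
    [80.0, 120.0, 100.0], [90.0, 130.0, 110.0],     # Saturday, Sunday
]
last_week = {start + timedelta(days=n): s for n, s in enumerate(week_shapes)}
monday, saturday = date(2018, 7, 16), date(2018, 7, 21)
day_type_forecast = rotate_by_day_type(last_week, [monday, saturday])
```

A four-day-type version would only need a different day_type function, and a weighted average would replace the simple mean inside average_shapes.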
Like the similar day lookup algorithm, the power of a rotation algorithm relies on the expertise of the
forecast analyst. And like similar day lookup algorithms, the rotation algorithms facilitate the mechanics
of making a load forecast based on expert judgement.
4.1.3 Decision Tree Regression

This section introduces a machine learning algorithm for day-ahead load forecasting called decision tree
regression.6 Consider modeling the relationship between loads (dependent variable) and temperatures
(explanatory variable or feature) that is depicted in the following scatter plot. Here, load is plotted on the
vertical axis and average temperature (°C) is plotted on the horizontal axis.
6 See https://en.wikipedia.org/wiki/Decision_tree_learning for a brief overview of decision tree learning and a list
of references to obtain more details.
FIGURE 4-1. WEATHER RESPONSE OF LOADS TO TEMPERATURES
What we would like is a function based on these data that we can use to forecast the loads given a forecast
of the average temperature. Decision tree regression is a form of supervised machine learning that will
return such a function. Supervised learning means there is a known outcome that the algorithm is trying
to predict. In this case, the known outcome is the hourly load data. The term “regression” is used in the
machine learning literature to distinguish between “classification” algorithms where the known outcome
is a categorical variable (e.g., won or lost, blue or red) versus “regression” algorithms where the known
outcome is continuous (e.g., loads). In most cases, regression algorithms return an average of the known
outcome. If it helps, think of the decision tree regression as finding a set of load shapes that, when they
are averaged, form the forecast load shape.
At a high-level, decision tree regression starts by splitting the data of our training sample into regions. In
this example, we will start with two regions. Region S is defined as all days where the average temperature
is less than 15°. Region R is defined as all days where the average temperature is greater than or equal
to 15°. We can test how well this data split works by computing the squared differences of the observed
loads from the average load within each region. Formally,
$$\text{Region S Score} = \sum_{d \in \text{Region S}} \left(\text{Load}_d - \widehat{\text{Load}}_S\right)^2$$
$$\text{Region R Score} = \sum_{d \in \text{Region R}} \left(\text{Load}_d - \widehat{\text{Load}}_R\right)^2$$
The Region S score is the sum of squared errors over all days (d) where the average temperature is less
than 15°. $\widehat{\text{Load}}_S$ is the average load across all days (d) that fall into Region S. This is depicted as the purple
line in the figure below. The Region R score is the sum of squared errors over all days (d) where the
average temperature is greater than or equal to 15°. $\widehat{\text{Load}}_R$ is the average load across all days (d) that fall
into Region R. This is depicted as the green line in the figure below.
FIGURE 4-2. BINARY SPLIT OF THE WEATHER RESPONSE
The resulting decision tree regression provides the following forecast function:
IF (Average Temperature < 15) THEN Load Forecast = $\widehat{\text{Load}}_S$
ELSE Load Forecast = $\widehat{\text{Load}}_R$
What if you want to set the cut point (or binary split) in an optimal fashion? The Greedy algorithm solves
the following optimization problem.7
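$$\min_{\nabla} \; \sum_{d \in \text{Region } S^{\nabla}} \left(\text{Load}_d - \widehat{\text{Load}}_S^{\nabla}\right)^2 + \sum_{d \in \text{Region } R^{\nabla}} \left(\text{Load}_d - \widehat{\text{Load}}_R^{\nabla}\right)^2$$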
Here, we are finding the binary split (∇) that minimizes the sum of the squared errors in both regions. The
Greedy algorithm performs a grid search over possible values for the binary split (∇). The value of the
binary split that leads to the smallest sum of squared differences is taken to be the optimal value.
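The grid search for a single binary split can be sketched as below. This is a bare-bones illustration for one feature (average temperature), not a full decision tree implementation; the function names, the candidate grid of whole degrees, and the sample data are assumptions made for the example.

```python
def sse(values):
    """Sum of squared differences from the average of `values`."""
    avg = sum(values) / len(values)
    return sum((v - avg) ** 2 for v in values)

def best_split(temps, loads, candidates):
    """Grid search: return (cut, score) with the smallest combined score,
    where Region S holds days with temp < cut and Region R holds the rest."""
    best = None
    for cut in candidates:
        region_s = [load for t, load in zip(temps, loads) if t < cut]
        region_r = [load for t, load in zip(temps, loads) if t >= cut]
        if not region_s or not region_r:
            continue  # a split must leave data on both sides
        score = sse(region_s) + sse(region_r)
        if best is None or score < best[1]:
            best = (cut, score)
    return best

def tree_forecast(temp, cut, temps, loads):
    """Average load of the region that the forecast temperature falls into."""
    region = [load for t, load in zip(temps, loads) if (t < cut) == (temp < cut)]
    return sum(region) / len(region)

# Hypothetical daily average temperatures (°C) and loads (MW).
temps = [5.0, 8.0, 11.0, 14.0, 17.0, 20.0, 23.0, 26.0]
loads = [1000.0, 1010.0, 990.0, 1005.0, 1400.0, 1420.0, 1390.0, 1410.0]
cut, score = best_split(temps, loads, candidates=range(6, 27))
cool_day = tree_forecast(10.0, cut, temps, loads)
hot_day = tree_forecast(30.0, cut, temps, loads)
```

Note that a 30° day, which is hotter than anything in this sample, receives the same forecast as any other day in Region R because the function can only return a region average.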
It is easy to extend the size of the decision tree regression by adding additional cut points. For example,
we could define Region X as all days where the average temperature is less than 0°. We could also define
Region Z as all days where the average temperature is greater than or equal to 25°. This would give us four regions:
Region X when average temperatures < 0°
Region S when average temperatures >= 0° and < 15°
Region R when average temperatures >= 15° and < 25°
Region Z when average temperatures >= 25°
An example of a decision tree regression with four regions is depicted below. For each region, we would
compute the average load across all days that fall into that region. The forecast function would then pull
the appropriate average load depending on which region the forecast day falls into.
7 See https://en.wikipedia.org/wiki/Greedy_algorithm for a high-level description of the algorithm and references
for more details.
FIGURE 4-3. FOUR REGION SPLIT OF THE WEATHER RESPONSE OF LOADS TO TEMPERATURES
The above idea can be extended to allow the weather response (i.e., the average load) to vary not only
with the average temperature, but also by day-of-the-week. This leads to further segmentation of regions,
now with multiple dimensions instead of the single dimension of average temperature. While the idea of
defining regions based on more dimensions is attractive, you run the risk that some regions will have
little to no data populating them. Mechanically this is not too much of a
problem, but what if one of those sparse regions happens to be the type of day we are trying to forecast
(e.g., an extremely hot or cold day)? There is no way to extrapolate the average loads of surrounding
regions to imply an average load for the region we are forecasting.
The beauty of decision tree regression is that it provides an estimated forecast function from the data
without imposing a specific functional form. What does that mean? If you look at the last graph, the
nonlinear response between loads and temperatures is approximated by the average loads for each
region. We arrived there without saying the response function is linear, or a polynomial, or a logit
function, etc. We simply estimated the function non-parametrically. This is extremely useful and
powerful if there is sufficient data to cover all regions. If we define the regions either with too narrow cut
points or with too many features (i.e., explanatory variables like weekend versus weekday), then we run
the risk of having a forecast day that is not covered by one of the existing regions. To avoid this situation,
we must limit the number of regions by balancing the granularity of the cut points with the number of
slope offsets for things like day-of-the-week, season, etc. Further, introducing other weather concepts
like humidity, cloud cover, and wind speed further constrains the number of regions.
In the discussion on statistical approaches to day-ahead load forecasting, we will show that an alternative
parametric forecast function could look like,
$$\text{Load}_d = \beta_0 + \beta_1 \text{AvgTemp}_d + \beta_2 \text{AvgTemp}_d^2 + \beta_3 \text{AvgTemp}_d^3$$
Here, we are imposing a specific functional form on the load response to weather. Specifically, we are
stating that the weather response can be approximated by a third order polynomial. We can extend this
functional form to allow the weather response to be different between weekdays and weekends as
follows:
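$$
\begin{aligned}
\text{Load}_d ={}& \beta_0 + \beta_1 \text{AvgTemp}_d + \beta_2 \text{AvgTemp}_d^2 + \beta_3 \text{AvgTemp}_d^3 + \beta_4 \text{Weekend}_d \\
&+ \beta_5 \text{AvgTemp}_d \text{Weekend}_d + \beta_6 \text{AvgTemp}_d^2 \text{Weekend}_d + \beta_7 \text{AvgTemp}_d^3 \text{Weekend}_d
\end{aligned}
$$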
Other interaction terms can be included. The strength of the parametric function is the ability to
extrapolate to days not in the history. For example, if the day we are forecasting has an average
temperature that was not seen in the history, we can use the above estimated equation to produce a load
forecast. Extrapolation is a little bit harder to do in a decision tree regression model.
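To make the parametric form concrete, the sketch below builds the eight regressors of the weekday/weekend polynomial and evaluates the equation for a given set of coefficients. The coefficient values are hypothetical placeholders chosen for easy arithmetic, not estimates from any data set, and the function names are assumptions for the example.

```python
def design_row(avg_temp, weekend):
    """Regressors in the order beta_0 ... beta_7: the cubic polynomial in
    average temperature, then the same four terms multiplied by Weekend."""
    base = [1.0, avg_temp, avg_temp ** 2, avg_temp ** 3]
    return base + [weekend * x for x in base]

def predict(betas, avg_temp, weekend):
    """Evaluate the parametric forecast function for one day."""
    return sum(b * x for b, x in zip(betas, design_row(avg_temp, weekend)))

# Hypothetical coefficients (not estimated from data).
betas = [1200.0, -20.0, 1.0, 0.0, -100.0, 2.0, 0.0, 0.0]

weekday_load = predict(betas, avg_temp=10.0, weekend=0)
weekend_load = predict(betas, avg_temp=10.0, weekend=1)
# A temperature outside the historical range still yields a forecast.
extreme_load = predict(betas, avg_temp=40.0, weekend=0)
```

Because the function is defined for any temperature, it returns a value for the 40° day even if no such day appears in the history, which is the extrapolation property described above. Whether that extrapolated value is reasonable still depends on how well the polynomial fits near the edge of the data.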
4.1.4 Forecast Algorithm Summary

Presented above are three load forecast algorithms that implement user configurable logic to select one
or more historical load shapes that best represent the day that is to be forecasted. The first two
algorithms, similar day lookup and rotation, have been around for at least the past 50 years. The third
algorithm, decision tree regression, is a relative newcomer to the field of day-ahead load forecasting, yet
it shares many of the same ideas that are embedded in a similar day lookup algorithm.
Effectively, two load forecast approaches were introduced. The first approach, which is demonstrated by
the various rotation algorithms, “learns” the historical relationship between loads from one hour to the
next, from one day to the next, from one week to the next, from one month to the next, and from one
year to the next. This relationship will be referred to in the next section on statistical modeling approaches
as a reliance on autoregressive terms. The concept of “learning” in the context of a rotation algorithm is
based on the forecast analyst’s perception of the autoregressive relationship. Based on this perception,
the forecast analyst selects the variation of the rotation algorithm that is expected to perform the best.
The second forecast approach, which is demonstrated by the similar day lookup and decision tree
regression algorithms, “learns” the historical relationship between loads and calendar and weather
conditions. That “learned” relationship is then used to forecast loads given forecasted calendar and
weather conditions. In this case, the “learning” is embodied in the forecast analyst’s selection of the
calendar and weather conditions that are used to segment the historical load data.
The main ideas that the reader should take from this section are as follows.
◼ Leveraging historical load patterns can provide powerful load forecasts, but how the historical
load patterns are leveraged is a critical decision point.
◼ Combining information about calendar and weather conditions with historical loads is critical to
accurately forecast weather-sensitive loads.
◼ There is no substitute for forecast experience when deciding on:
─ how to leverage historical load patterns, and
─ which calendar and weather conditions need to be considered when generating a load
forecast.
The next two sections introduce the main statistical approaches used in day-ahead load forecasting. We
will see that the “learning” aspect of load forecasting moves from the logic of an algorithm to the
process of parameter estimation. Even though statistical optimization appears to relieve the forecast
analyst of the burden of “learning”, the reality is that every equation requires a
forecast analyst to define the elements of the equation. There is no substitute for
ongoing “learning” on the part of a forecast analyst if the goal is to build powerful load forecast models.
4.2 UNIVARIATE FRAMEWORKS

In this section, we introduce forecast frameworks that use historical load data only to forecast loads. In
principle, rotation algorithms could be added to this section, but we consider rotation more as an
algorithm and less a statistical framework. The salient feature of the methods introduced in this section
is the ability to identify near-term trends in the data series and project these trends into the forecast
horizon. Because of the heavy reliance on historical data, they tend to shadow historical load values. They
are most powerful for very near-term forecast horizons of up to one to two hours ahead, where the most
recent load trends are expected to hold form. Because of the reliance on just historical load data, they
are blind to forecasted weather conditions and, as a result, they are susceptible to missing key turning
points in loads.
Before presenting the details behind the univariate model frameworks, the modeler needs to make two
key decisions about how the univariate models will be configured.
4.2.1 Lag Structures

The first decision point of load forecasting with univariate frameworks is the type of lag structure that
should be used. The first lag structure is referred to as hour-ahead. With this lag structure, the load for
an hour is a function of the previous hour(s) load(s). The hour-ahead lag structure can be described
generally as:
$$\text{LoadForecast}_{d,i} = F(\text{Load}_{d,i-1}, \text{Load}_{d,i-2}, \text{Load}_{d,i-3}, \ldots, \text{Load}_{d,i-k})$$
Here, the load forecast for day (d), interval (i) is a function of the actual loads for day (d), interval (i-1)
back to interval (i-k). For example, the load forecast for today at 10AM is a function of observed loads for
today at 9AM, 8AM, 7AM, 6AM, 5AM, and 4AM.
The second type of lag structure is referred to as day-ahead. With this structure, the load for an hour is a
function of the same hour, but from the previous day(s) leading up to the forecast day. The day-ahead
lag structure can be described generally as:
$$\text{LoadForecast}_{d,i} = G(\text{Load}_{d-1,i}, \text{Load}_{d-2,i}, \text{Load}_{d-3,i}, \ldots, \text{Load}_{d-p,i})$$
Here, the load forecast for day (d) and time interval (i) is a function of the actual loads for day (d-1) through
day (d-p) for time interval (i). For example, the load forecast for today at 10AM is a function of observed
loads for yesterday at 10AM, the 10AM loads from two days back, and the 10AM loads from three days
back.
A third type of lag structure is referred to as a mixed lag structure and is a combination of both hour-
ahead and day-ahead lag structures. The mixed lag structure can be described generally as:
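$$
\begin{aligned}
\text{LoadForecast}_{d,i} ={}& F(\text{Load}_{d,i-1}, \text{Load}_{d,i-2}, \text{Load}_{d,i-3}, \ldots, \text{Load}_{d,i-k}) \\
&+ G(\text{Load}_{d-1,i}, \text{Load}_{d-2,i}, \text{Load}_{d-3,i}, \ldots, \text{Load}_{d-p,i})
\end{aligned}
$$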
Here, the load forecast for day (d) and time interval (i) is a function of actual loads for the prior (k) intervals
and the prior (p) days for the same interval (i).
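The three lag structures can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the handbook's implementation: the nested-list layout `loads[d][i]`, the zero-based hourly interval index, the function names, and the toy data are all hypothetical choices made for the example.

```python
INTERVALS_PER_DAY = 24  # assumption: hourly data, intervals indexed 0..23


def hour_ahead_lags(loads, d, i, k):
    """Return the k loads preceding (d, i), most recent first.

    Lags that reach before interval 0 roll back into the prior day.
    """
    t = d * INTERVALS_PER_DAY + i  # position in the flattened series
    if t - k < 0:
        raise ValueError("not enough history for k hour-ahead lags")
    return [
        loads[(t - j) // INTERVALS_PER_DAY][(t - j) % INTERVALS_PER_DAY]
        for j in range(1, k + 1)
    ]


def day_ahead_lags(loads, d, i, p):
    """Return the loads for interval i on the p prior days, most recent first."""
    if d - p < 0:
        raise ValueError("not enough history for p day-ahead lags")
    return [loads[d - j][i] for j in range(1, p + 1)]


def mixed_lags(loads, d, i, k, p):
    """Combine both lag sets: inputs to F(...) followed by inputs to G(...)."""
    return hour_ahead_lags(loads, d, i, k) + day_ahead_lags(loads, d, i, p)


# Toy data (hypothetical): the load on day d, interval i is 100*d + i,
# so each value reveals which day and interval it came from.
loads = [[100 * d + i for i in range(INTERVALS_PER_DAY)] for d in range(7)]

hour_lags = hour_ahead_lags(loads, d=5, i=10, k=6)
day_lags = day_ahead_lags(loads, d=5, i=10, p=3)
print(hour_lags)  # [509, 508, 507, 506, 505, 504]
print(day_lags)   # [410, 310, 210]
```

The functions F and G themselves are left open here; the sketch only assembles the inputs each lag structure would hand to them.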
4.2.2 One or Many Load Forecast Equations
The second decision point is whether there should be:
1. a single load forecast equation that is used to forecast all time intervals in a day, or
2. a separate load forecast equation for each interval of the day.
Consider the Commonwealth Edison and Dayton Power & Light load data presented in the following
figures. If the decision is to use a single equation to forecast all time intervals of a day, that equation must
have sufficient freedom to allow for a positive relationship for the transition from 7AM to 8AM, and a
negative relationship for the transition from 7PM to 8PM. In other words, the forecast equation must
work with the actual hour-ahead relationship, which varies between positive and negative as you progress
through the day. There are ways of building such an equation by allowing each explanatory variable to
have a parameter specific to each hour of the day. Consider the following example where hourly load on
day (d) and hour (i) is a function of the prior hour of load.
$$
\begin{aligned}
\text{Load}_{d,i} ={}& \beta_{1,0}\,\text{Hour1} + \beta_{2,0}\,\text{Hour2} + \beta_{3,0}\,\text{Hour3} + \cdots + \beta_{23,0}\,\text{Hour23} + \beta_{24,0}\,\text{Hour24} \\
&+ \beta_{1}\,\text{Hour1}\,\text{Load}_{d,i-1} + \beta_{2}\,\text{Hour2}\,\text{Load}_{d,i-1} + \beta_{3}\,\text{Hour3}\,\text{Load}_{d,i-1} + \beta_{4}\,\text{Hour4}\,\text{Load}_{d,i-1} \\
&+ \cdots + \beta_{23}\,\text{Hour23}\,\text{Load}_{d,i-1} + \beta_{24}\,\text{Hour24}\,\text{Load}_{d,i-1} + e_{d,i}
\end{aligned}
$$
Here, the explanatory variables Hour1 to Hour 24 are a set of hourly binary variables that take on a value
of 1.0 or 0.0. For example, if (i) is two, then the Hour2 binary variable will take on a value 1.0 and all the
other hourly binary variables will be set equal to 0.0. To allow the relationship between current hour
loads and the prior hour loads to vary across the hours of the day, we introduce a set of 24 model
parameters, one for each hour of the day. Further, we are allowing the average load to vary by hour of
the day with a second set of 24 model parameters that represent 24 distinct intercept terms. This leads
to a model with 48 distinct model parameters. If the lag structure is to be extended to include the prior
two hours, then another set of 24 model parameters would need to be added. In fact, each explanatory
variable that is added to the model would require a separate set of 24 model parameters. What appears
to be a very simple model approach quickly becomes a mess. Further, if an explanatory variable does not
have an hourly parameter offset, then the estimated impact of that variable is constrained to be the same
across all hours of the day.
The alternative is to define separate hour-ahead equations for each time interval of the day. With hourly
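load data, you end up with 24 equations. Following the above example, the equivalent 24 hour-ahead
equations would look like the following, where each equation has its own estimated values for β0 and β1.
$$
\begin{aligned}
\text{Load}_{d,1} &= \beta_0 + \beta_1\,\text{Load}_{d-1,24} + e_{d,1} \\
\text{Load}_{d,2} &= \beta_0 + \beta_1\,\text{Load}_{d,1} + e_{d,2} \\
&\;\;\vdots \\
\text{Load}_{d,24} &= \beta_0 + \beta_1\,\text{Load}_{d,23} + e_{d,24}
\end{aligned}
$$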
Again, a complete set of 48 model parameters is estimated, but these 24 hourly equations are much
cleaner in structure than the single equation with hourly binary offsets. Further, in a paper by
Ramanathan et al.,8 it is shown that a single equation with hourly binary offsets is a constrained version
of a set of separate hourly equations. As such, the single equation can at best perform as well as the
unconstrained 24 hourly equations. Based on their findings, all the univariate and multivariate models
presented in this handbook utilize separate hourly equations. This is the approach we take in practice.
8 Ramanathan, Ramu, Robert Engle, Clive W.J. Granger, Farshid Vahid-Araghi, and Casey Brace. “Short-Run
Forecasts of Electricity Loads and Peaks,” International Journal of Forecasting, Volume 13, Issue 2, June 1997,
Pages 161-174.
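As a rough sketch of the separate-equation approach, the code below fits one intercept and one slope per hour of the day by ordinary least squares, regressing each hour's load on the prior hour's load. The synthetic load series, the zero-based hour index, and the function names are assumptions made for illustration; they are not taken from the handbook or from the Ramanathan et al. paper.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_hourly_equations(series):
    """Fit 24 separate hour-ahead equations to a flat hourly load series.

    Returns a list of 24 (b0, b1) pairs, one pair per hour of the day.
    """
    params = []
    for hour in range(24):
        # Rows where the dependent variable falls on this hour of the day.
        rows = [t for t in range(1, len(series)) if t % 24 == hour]
        prior = [series[t - 1] for t in rows]   # regressor: prior-hour load
        current = [series[t] for t in rows]     # dependent: current-hour load
        params.append(fit_line(prior, current))
    return params


# Synthetic data (hypothetical): a daily load shape plus persistent noise.
random.seed(0)
series = []
deviation = 0.0
for t in range(24 * 60):
    shape = 100 + 30 * math.sin(2 * math.pi * (t % 24) / 24)
    deviation = 0.8 * deviation + random.gauss(0, 3)
    series.append(shape + deviation)

params = fit_hourly_equations(series)
print(len(params) * 2)  # 48 estimated parameters in total
```

Because every hour has its own pair of parameters, adding a second lag would add one more parameter to each of the 24 equations rather than 24 offset terms to a single equation.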
FIGURE 4-4. DAYTON POWER & LIGHT WEEK VIEW: FEBRUARY 2017
FIGURE 4-5. DAYTON POWER & LIGHT WEEK VIEW: JULY 2017
FIGURE 4-6. COMMONWEALTH EDISON WEEK VIEW: FEBRUARY 2017
FIGURE 4-7. COMMONWEALTH EDISON WEEK VIEW: JULY 2017
4.2.3 Moving Average
The most straightforward approach is to project future load values as a moving average of the prior N
days of load data. This approach was the mainstay for demand response program evaluation, where a
moving average of days without load shed activity formed the comparison shape in an ex post
performance evaluation of a demand response event. This method has elements common with some of
the variations of the rotation algorithm. This method can be described mathematically as follows:
$$\text{LoadForecast}_{d+1,i} = \frac{\sum_{n=1}^{N} \text{Load}_{d-n,i}}{N}$$
Where,
LoadForecast d+1,i is the day-ahead (d+1) load forecast for time interval (i)
Loadd−n,i is the measured load on day (d-n) for time interval (i)
N is the number of days back over which the average is computed
The choice of the number of days to include in the moving average is a key decision point for the forecast
analyst. Too many days back and you run the risk that the resulting shape does not reflect the seasonality
of the forecast period. This is especially risky rolling into and out of the spring and fall seasons. Too few
days and you run the risk that the forecast shape perpetuates short-run weather-driven load trends
leading to missing turning points in weather. The art of this approach is in the selection of how many
days, N, are included in the moving average.
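A minimal sketch of the moving average forecast follows, using the same summation limits as the formula above (days d-1 back through d-N). The data layout `loads[d][i]`, the function name, and the steadily rising toy loads are assumptions made for illustration.

```python
def moving_average_forecast(loads, d, i, n_days):
    """Day-ahead (d+1) forecast for interval i as the mean of Load(d-n, i), n = 1..N."""
    if n_days < 1 or d - n_days < 0:
        raise ValueError("n_days must be positive and covered by the history")
    return sum(loads[d - n][i] for n in range(1, n_days + 1)) / n_days


# Toy data (hypothetical): loads rise by one unit per day at every interval.
loads = [[100 + day for _ in range(24)] for day in range(10)]

short_window = moving_average_forecast(loads, d=9, i=10, n_days=2)
long_window = moving_average_forecast(loads, d=9, i=10, n_days=8)
print(short_window)  # 107.5
print(long_window)   # 104.5
```

With loads that are rising, the longer window lags further behind the most recent values, which is the trade-off the choice of N has to balance.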
A more sophisticated moving average would place weights on each observation to create a weighted
moving average such as:
$$\text{LoadForecast}_{d+1,i} = \frac{\sum_{n=1}^{N} \text{Weight}_{d-n}\,\text{Load}_{d-n,i}}{\sum_{n=1}^{N} \text{Weight}_{d-n}}$$
Where,
Weight d−n is the weight placed on the nth observation
Dividing by the sum of the weights corrects for the case that the weights do not sum to 1.0. In general,
we would expect the largest weights to be placed on the most recent days with the most distant days
having the smallest weights.
Introducing weights to the moving average opens this method to a range of weighting schemes. For
example, day-of-the-week weights could be designed such that the forecast for a Monday is based on the
weighted average of just the prior Monday loads, a forecast for a Tuesday is based on the weighted
average of just the prior Tuesday loads, and so on. Day type weights that weight working days differently
than non-working days can be designed to provide forecasts that differ between weekdays and weekend
days.
The downside of a weighted moving average is the forecast analyst now has two major decisions to make:
(1) how many days back (N) to include in the average, and (2) what weighting scheme to use. Without a
clear objective function the forecast analyst is left with trial and error in deciding how to best configure
the moving average.
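The weighted moving average and two of the weighting schemes mentioned above can be sketched as follows. The linear recency weights, the 0/1 same-weekday weights, the function names, and the toy data are illustrative assumptions; the handbook does not prescribe specific weight values.

```python
def weighted_moving_average(prior_loads, weights):
    """prior_loads[n-1] is Load(d-n, i) and weights[n-1] is Weight(d-n), n = 1..N."""
    if len(prior_loads) != len(weights):
        raise ValueError("need one weight per prior day")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    # Dividing by the sum of the weights handles weights that do not sum to 1.0.
    return sum(w * x for w, x in zip(weights, prior_loads)) / total


def recency_weights(n_days):
    """Hypothetical scheme: weights N, N-1, ..., 1, largest on the most recent day."""
    return [n_days - n + 1 for n in range(1, n_days + 1)]


def same_weekday_weights(n_days):
    """Hypothetical scheme: weight 1 on days sharing the weekday of day d+1, else 0.

    Day d-n is n+1 days before the forecast day d+1.
    """
    return [1.0 if (n + 1) % 7 == 0 else 0.0 for n in range(1, n_days + 1)]


# Toy data (hypothetical): a weekly pattern in which the forecast day's weekday
# always has a load of 100.
prior_loads = [100 + 10 * ((n + 1) % 7) for n in range(1, 22)]

print(weighted_moving_average(prior_loads, same_weekday_weights(21)))  # 100.0
print(weighted_moving_average([10, 20, 30], recency_weights(3)))
```

Both schemes still leave N as a free choice, which is the trial-and-error burden noted above.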
4.2.4 Exponential Smoothing

Exponential smoothing builds on the concept of a weighted moving average by providing a functional
form for the weights. Specifically, the weights are determined by a discrete version of the exponential
function that relies on one parameter referred to as the smoothing parameter. The field of
signal processing refers to exponential smoothing as an example of a data filter. The idea behind data
filtering is to remove the noise from a time series, leaving you with the pure signal. It is like the 1960s
and 1970s when you would wiggle the dial on your car radio until you found just the right place to allow
the hit songs to come through without all the background static. If it helps, you can think of the models
we use to forecast loads as data filters that remove the background noise and leave the predictable
portion of the measured load.
Let us start with writing down the exponential smoothing forecasting formula. We will demonstrate how
this formula leads to a weighting scheme that follows an exponential function in a bit.
$$\text{SmoothLoad}_{d+1,i} = \alpha\,\text{Load}_{d,i} + (1-\alpha)\,\text{SmoothLoad}_{d,i}$$
Here,
SmoothLoadd+1,i is the one-day ahead (d+1) forecast of load at interval (i)
Loadd,i is the measured load on day (d) time interval (i)
SmoothLoadd,i is the one-day ahead forecast of load for day (d) and time interval (i)
α is the smoothing parameter
This base model states that the one-day ahead forecast (SmoothLoadd+1,i) is a weighted sum of the last
measured load (Loadd,i) and the one-day ahead forecast that was made for day (d) (SmoothLoadd,i). In
other words, our forecast for tomorrow is the same as the forecast for today plus some adjustment
associated with the most recent measurements. It helps to recognize that exponential smoothing is
designed to predict the average or mean of a time series, but where that mean evolves over time. My
current prediction of the average is SmoothLoadd,i. I now have a new observation and I need to decide
how that new observation changes my estimate of the average of the series. The smoothing
parameter, α, tells me how much weight I put on the latest measurement. A smoothing parameter value
close to 1.0 means my estimate of the mean of the time series should be very close to the most recent
load data. A smoothing parameter value close to 0.0 says my estimate of the mean should change only
slightly given the latest load measurement. The main take away is that what we are forecasting is the
mean of the load data series. My estimate of the mean is informed by measurement data with the
smoothing parameter determining how much weight is placed on the most recent measurement data.
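The recursion can be sketched directly. The function name, the starting value, and the three toy loads are assumptions made for illustration; the sketch shows how the smoothing parameter moves the forecast between the latest measurement and the prior forecast.

```python
def simple_exponential_smoothing(loads, alpha, start):
    """Return the one-day-ahead forecasts produced after each measured load.

    Applies SmoothLoad(d+1) = alpha * Load(d) + (1 - alpha) * SmoothLoad(d),
    with `start` standing in for the first SmoothLoad value.
    """
    forecasts = []
    smooth = start
    for load in loads:
        smooth = alpha * load + (1 - alpha) * smooth
        forecasts.append(smooth)
    return forecasts


loads = [100, 110, 120]  # toy measured loads (hypothetical)

print(simple_exponential_smoothing(loads, alpha=1.0, start=100))  # [100.0, 110.0, 120.0]
print(simple_exponential_smoothing(loads, alpha=0.5, start=100))  # [100.0, 105.0, 112.5]
print(simple_exponential_smoothing(loads, alpha=0.0, start=100))  # [100.0, 100.0, 100.0]
```

With a smoothing parameter of 1.0 the forecast is simply the last measured load, and with 0.0 the starting estimate of the mean is never updated.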
Why do we refer to this method as exponential smoothing? Let us write down the sequence of day-
ahead forecasts for say the prior seven days plus tomorrow.
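$$
\begin{aligned}
\text{SmoothLoad}_{d-7,i} &= \alpha\,\text{Load}_{d-8,i} + (1-\alpha)\,\text{SmoothLoad}_{d-8,i} \\
&\;\;\vdots \\
\text{SmoothLoad}_{d,i} &= \alpha\,\text{Load}_{d-1,i} + (1-\alpha)\,\text{SmoothLoad}_{d-1,i} \\
\text{SmoothLoad}_{d+1,i} &= \alpha\,\text{Load}_{d,i} + (1-\alpha)\,\text{SmoothLoad}_{d,i}
\end{aligned}
$$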
By substituting in the result from the prior day, we can rewrite these equations as:
SmoothLoadd−6,i = αLoadd−7,i + (1 − α)αLoadd−8,i + (1 − α)(1 − α)SmoothLoadd−8,i
SmoothLoadd−5,i
= αLoadd−6,i + (1 − α)αLoadd−7,i + (1 − α)(1 − α)αLoadd−8,i
+ (1 − α)(1 − α)(1 − α)SmoothLoadd−8,i
SmoothLoadd−4,i
+ (1 − α)(1 − α)(1 − α)αLoadd−8,i + (1 − α)(1 − α)(1 − α)(1 − α)SmoothLoadd−8,i
SmoothLoadd−3,i
+ (1 − α)(1 − α)(1 − α)αLoadd−7,i + (1 − α)(1 − α)(1 − α)(1 − α)αLoadd−8,i
+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)SmoothLoadd−8,i
SmoothLoadd−2,i
+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)αLoadd−8,i
+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)SmoothLoadd−8,i
SmoothLoadd−1,i
+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)αLoadd−7,i
+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)αLoadd−8,i
+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)SmoothLoadd−8,i
SmoothLoadd,i = αLoadd−1,i + (1 − α)αLoadd−2,i + (1 − α)(1 − α)αLoadd−3,i

+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)αLoadd−6,i
+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)αLoadd−7,i
+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)αLoadd−8,i
+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)SmoothLoadd−8,i
SmoothLoadd+1,i
= αLoadd,i + (1 − α)αLoadd−1,i + (1 − α)(1 − α)αLoadd−2,i
+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)αLoadd−5,i
+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)αLoadd−6,i
+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)αLoadd−7,i
+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)αLoadd−8,i
+ (1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1 − α)(1
− α)SmoothLoadd−8,i
This last forecast equation is a bit of a mess. To clean things up, we gather all the load terms under one
summation symbol and use the fact that an expression raised to the 0.0 power will return a value of one.
This gives:
$$\text{SmoothLoad}_{d+1,i} = (1-\alpha)^{9}\,\text{SmoothLoad}_{d-8,i} + \sum_{n=0}^{8} \alpha(1-\alpha)^{n}\,\text{Load}_{d-n,i}$$
What this says is the one-day ahead forecast is the sum of the one-day ahead forecast that was made nine
days ago and the weighted sum of measured loads over the past eight days. If we had started with the
sequence of load forecast equations that started eight days back instead of seven days back, the above
equation would look like:
$$\text{SmoothLoad}_{d+1,i} = (1-\alpha)^{10}\,\text{SmoothLoad}_{d-9,i} + \sum_{n=0}^{9} \alpha(1-\alpha)^{n}\,\text{Load}_{d-n,i}$$
In practice, the forecast equation starts with some initial or a priori forecast value, call it
StartingSmoothLoad0. This gives us:
$$\text{SmoothLoad}_{d+1} = (1-\alpha)^{N+1}\,\text{StartingSmoothLoad}_0 + \sum_{n=0}^{N} \alpha(1-\alpha)^{n}\,\text{Load}_{d-n}$$
How does this work? Suppose that the smoothing parameter equals 0.5 and the number of observations
leading up to day (d) is 25 (i.e., N=25). With this information we can compute the forecast for d+1. The
first part of the forecast equation is effectively 0.0, because with measured loads for n = 0 through 25
the starting value is multiplied by (1 − 0.5)^26 ≅ 0.0. In other words, with a smoothing parameter of 0.5,
by the time we are 25 days past the starting point, our a priori estimate of the mean of the time series
has essentially zero weight.
The second piece is the weighted average of the prior 25 days of loads. The table below shows the weights
that are implied by a smoothing parameter of 0.5. Note the dates in the data are sorted from the most
recent read to the oldest read. Applying the weights to the measured load data leads to a weighted
average load of 195. To summarize we have:
SmoothLoadd+1,i = 0 + 195 = 195
TABLE 4-1. EXPONENTIAL SMOOTHING WEIGHTS (α = 0.5)

 n    (1 − α)^n     α(1 − α)^n    Load(d−n,i)   α(1 − α)^n × Load(d−n,i)
 0    1.00000000    0.50000000    200           100.00000
 1    0.50000000    0.25000000    195            48.75000
 2    0.25000000    0.12500000    190            23.75000
 3    0.12500000    0.06250000    185            11.56250
 4    0.06250000    0.03125000    180             5.62500
 5    0.03125000    0.01562500    175             2.73438
 6    0.01562500    0.00781250    170             1.32813
 7    0.00781250    0.00390625    165             0.64453
 8    0.00390625    0.00195313    160             0.31250
 9    0.00195313    0.00097656    155             0.15137
10    0.00097656    0.00048828    150             0.07324
11    0.00048828    0.00024414    145             0.03540
12    0.00024414    0.00012207    140             0.01709
13    0.00012207    0.00006104    135             0.00824
14    0.00006104    0.00003052    130             0.00397
15    0.00003052    0.00001526    125             0.00191
16    0.00001526    0.00000763    120             0.00092
17    0.00000763    0.00000381    115             0.00044
18    0.00000381    0.00000191    110             0.00021
19    0.00000191    0.00000095    105             0.00010
20    0.00000095    0.00000048    100             0.00005
21    0.00000048    0.00000024     95             0.00002
22    0.00000024    0.00000012     90             0.00001
23    0.00000012    0.00000006     85             0.00001
24    0.00000006    0.00000003     80             0.00000
25    0.00000003    0.00000001     75             0.00000
SUM                                             195.00000
FIGURE 4-8. EXPONENTIAL SMOOTHING WEIGHTS (α = 0.5)
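The weights and the weighted sum in Table 4-1 can be reproduced with a short script, which also checks that the weighted-sum form agrees with running the recursion day by day. The variable names and the arbitrary starting value of 123.0 are assumptions made for illustration. Under the indexing used here, with loads for n = 0 through 25, the starting value is multiplied by (1 − α) raised to the power 26.

```python
alpha = 0.5

# Load(d-n, i) for n = 0..25, most recent first, as listed in Table 4-1.
loads = [200 - 5 * n for n in range(26)]

# Weight on the load observed n days back: alpha * (1 - alpha)^n.
weights = [alpha * (1 - alpha) ** n for n in range(26)]
weighted_sum = sum(w * x for w, x in zip(weights, loads))

# Weight left on the a priori starting value after 26 updates.
starting_weight = (1 - alpha) ** 26


def recursive_forecast(loads_oldest_first, alpha, start):
    """Apply SmoothLoad(d+1) = alpha * Load(d) + (1 - alpha) * SmoothLoad(d) in date order."""
    smooth = start
    for load in loads_oldest_first:
        smooth = alpha * load + (1 - alpha) * smooth
    return smooth


start = 123.0  # arbitrary starting forecast (hypothetical)
recursive = recursive_forecast(list(reversed(loads)), alpha, start)
closed_form = starting_weight * start + weighted_sum

print(round(weighted_sum, 5))           # 195.0
print(abs(recursive - closed_form) < 1e-9)  # True
```

Because the starting weight is so small, the choice of starting value has no visible effect on the forecast in this example.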
Why is it called Exponential Smoothing? The figure that follows the table shows how the weights decay
over time. This decaying behavior is referred to as a geometric progression, which is a discrete version of
an exponential function.9 Therefore, part one of why this forecast method is referred to as exponential
smoothing has to do with the fact that the weights decay along an exponential path.
Part two of why this method is referred to as exponential smoothing has to do with the smoothing of the
raw time series. Consider taking a simple moving average of any chaotic time series. The resulting
averaged time series will be smoother than the raw time series. If you focus on the second element of
the exponential smoothing forecast equation, $\sum_{n=0}^{N} \alpha(1-\alpha)^{n}\,\text{Load}_{d-n,i}$, we have a moving average of
the raw load data. The result of this moving average will be smoother than the raw load data. As a result,
we have a forecast framework that smooths through the raw load data using a moving average that has
exponentially decaying weights. Exponential smoothing for short.
9 https://en.wikipedia.org/wiki/Exponential_smoothing
There are several variations of exponential smoothing that allow for trends and seasonal variations in the
loads. Double exponential smoothing is designed to handle load data that is trending.
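$$
\begin{aligned}
\text{SmoothLoad}_{d+1,i} &= \alpha\,\text{Load}_{d,i} + (1-\alpha)\left[\text{SmoothLoad}_{d,i} + \text{Trend}_{d-1,i}\right] \\
\text{Trend}_{d,i} &= \gamma\left(\text{SmoothLoad}_{d,i} - \text{SmoothLoad}_{d-1,i}\right) + (1-\gamma)\,\text{Trend}_{d-1,i}
\end{aligned}
$$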
Here,
Trendd,i is the estimated load trend for day (d) and time interval (i)
γ is a second smoothing parameter
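The two recursions can be sketched as a single update loop. The sketch follows the indexing of the equations as written above; the initial values (the first load as the starting smoothed value, a starting trend of zero), the function name, and the toy data are assumptions, since the text does not specify how the recursions are started.

```python
def double_exponential_smoothing(loads, alpha, gamma, start_trend=0.0):
    """Return (one-day-ahead forecasts, latest trend estimate).

    Each step applies, in the notation used above:
      SmoothLoad(d+1) = alpha * Load(d) + (1 - alpha) * [SmoothLoad(d) + Trend(d-1)]
      Trend(d)        = gamma * (SmoothLoad(d) - SmoothLoad(d-1)) + (1 - gamma) * Trend(d-1)
    """
    level = loads[0]       # assumed starting value for SmoothLoad(d)
    prev_level = level     # assumed starting value for SmoothLoad(d-1)
    prev_trend = start_trend
    forecasts = []
    for load in loads:
        next_level = alpha * load + (1 - alpha) * (level + prev_trend)
        trend = gamma * (level - prev_level) + (1 - gamma) * prev_trend
        forecasts.append(next_level)
        # Shift the state forward by one day.
        prev_level, level, prev_trend = level, next_level, trend
    return forecasts, prev_trend


flat_forecasts, flat_trend = double_exponential_smoothing([150] * 10, alpha=0.5, gamma=0.3)
print(flat_forecasts[-1], flat_trend)  # 150.0 0.0

no_trend_forecasts, _ = double_exponential_smoothing([100, 110, 120], alpha=0.5, gamma=0.0)
print(no_trend_forecasts)  # [100.0, 105.0, 112.5]
```

With the second smoothing parameter set to zero and a zero starting trend, the trend never moves and the update reduces to simple exponential smoothing.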
Triple exponential smoothing is designed to handle load data that show a trend and seasonal swings.
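$$
\begin{aligned}
\text{SmoothLoad}_{d+1,i} &= \alpha\,\frac{\text{Load}_{d,i}}{I_{d-L}} + (1-\alpha)\left[\text{SmoothLoad}_{d,i} + \text{Trend}_{d-1,i}\right] \\
\text{Trend}_{d,i} &= \gamma\left(\text{SmoothLoad}_{d,i} - \text{SmoothLoad}_{d-1,i}\right) + (1-\gamma)\,\text{Trend}_{d-1,i} \\
I_{d} &= \beta\,\frac{\text{Load}_{d,i}}{\text{SmoothLoad}_{d,i}} + (1-\beta)\,I_{d-L}
\end{aligned}
$$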
Here,
Trendd,i is the estimated load trend for day (d) and time interval (i)
γ is a second smoothing parameter
Id is the seasonal load index for day (d)
β is a third smoothing parameter
L is the number of days in a season (e.g., 90 days)
There are several software platforms that offer simple exponential smoothing and associated variations.
Which variation makes sense for your load data requires experimentation and analysis. There is no
theoretical high ground as to the right exponential smoothing model to use. The forecast analyst must rely
on their experience to select the version that works best.
Exponential Smoothing Working Example

In this section, we apply the exponential smoothing framework to the load data for Commonwealth Edison.
Specifically, we
will develop a forecast model of average daily load, which is computed as the average of the 24 hourly
load values.
The average daily load for Commonwealth Edison is plotted in Figure 4-9. As can be seen from this plot,
there is a noticeable seasonal swing in the average daily load, but no obvious growth trend. We first try
fitting a simple exponential smoothing model to these data. We use all the historical data from January
1, 2013 through December 31, 2016 to estimate the model coefficient. We then use the estimated model
to predict 2017 loads. While this may seem like we are doing a 365 day-ahead forecast for December 31,
2017, the software is generating a sequence of one-day ahead forecasts because the observed data for
each prior day is used to predict the next day. In general, it is hard to trick forecasting software to produce
something other than a sequence of one day ahead or one hour ahead forecasts. That is why you need
to be careful when interpreting the performance of an out-of-sample test like the one run here.
To really see how the model would perform in forecast mode, we re-estimate the model using all of the data
through December 31, 2017 and then forecast 2018 where no load data are available. In this case, the estimated
value for the smoothing parameter, α, is 1.228. Shown in Figure 4-10 is an in-sample fit of the estimated
exponential smoothing model. A quick review suggests the model does a good job in fitting the data. It
appears to capture the major seasonal swings while all the time tracking the data.
Figure 4-11 shows the in-sample fit for the last week of December 2017. The point to notice is that the
exponential smoothing model shadows, with a one-day lag, the actual load data. This is the challenge
with any model that solely relies on the load data from prior days or hours to forecast the future. It is like
driving backwards without any rearview mirrors to help guide you. All you can do is hope that if the road
has been bending to the left that it will continue to bend to the left because if it does not, you will crash
because you will still be steering the car to the left. The seductive nature of a model that relies on
autoregressive terms is that the in-sample fit looks so good. It would take hard work to craft a non-
autoregressive dependent model that fits as tightly as a one-hour lag. As a result, most modelers tend to
opt for the easy road and start their models with autoregressive terms. They obtain great in-sample fits,
but find themselves bewildered when their model fails time and time again to forecast turning points in
the load.
The limitation of the simple exponential smoothing model for forecast horizons of more than one day
ahead can be seen from Figure 4-12, which shows the true one-month-ahead forecast. Once the model
moves past the last actual load value, there is no new information. As a result, the forecast remains fixed
at its last one-step ahead forecast value. This limitation plays itself out for all of 2018 where the forecast
is the same for every day, as shown in Figure 4-13.
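The flat forecast can be illustrated with a small sketch. Once measured loads run out, the only stand-in for the next load is the forecast itself, so each further update returns the same value. The function names, the smoothing parameter of 0.9, and the toy loads are assumptions made for illustration and are unrelated to the Commonwealth Edison estimates.

```python
def last_one_step_forecast(loads, alpha, start):
    """Run the smoothing recursion over all measured loads; return the final forecast."""
    forecast = start
    for load in loads:
        forecast = alpha * load + (1 - alpha) * forecast
    return forecast


def forecast_without_new_loads(loads, alpha, start, horizon):
    """Forecast `horizon` days past the last measured load."""
    forecast = last_one_step_forecast(loads, alpha, start)
    path = []
    for _ in range(horizon):
        path.append(forecast)
        # No new measurement is available, so the forecast stands in for the load.
        # The update then returns the same value (up to floating-point rounding).
        forecast = alpha * forecast + (1 - alpha) * forecast
    return path


loads = [100, 120, 90, 110]  # toy measured loads (hypothetical)
path = forecast_without_new_loads(loads, alpha=0.9, start=100, horizon=30)
print(round(path[0], 2), round(path[-1], 2))  # 108.28 108.28
```

The 30-day path never moves from its first value, which is the behavior visible in the one-month-ahead and one-year-ahead figures.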
To improve the 2018 load forecast, we next estimate the double exponential smoothing model. The
estimated coefficients are 1.229 on the smoothing parameter and 0.000 on the linear trend term. In other
words, there is effectively a very small growth trend in the Commonwealth Edison data. The forecast
results from the double exponential smoothing model are shown in Figure 4-14 and Figure 4-15. Despite
what appears to be no estimated trend, both figures show a slight upward trend, although one would be
hard put to say this is a great forecasting model.
Finally, we estimate the triple exponential smoothing model. The estimated coefficients are 1.183 on the
smoothing parameter, 0.001 on the linear trend term, and -0.008 on the seasonal term. The forecast
results from the triple exponential smoothing model are shown in Figure 4-16 and Figure 4-17. The
addition of the seasonal component appears to capture the day-of-the-week swings in the loads. For a
one-month-ahead forecast, the triple exponential smoothing model provides the most realistic forecast.
FIGURE 4-9. COMMONWEALTH EDISON: AVERAGE DAILY LOAD
FIGURE 4-10. COMMONWEALTH EDISON: SIMPLE EXPONENTIAL SMOOTHING MODEL FIT ALL DATA
FIGURE 4-11. COMMONWEALTH EDISON: SIMPLE EXPONENTIAL SMOOTHING MODEL OUT-OF-SAMPLE FIT
FIGURE 4-12. COMMONWEALTH EDISON: SIMPLE EXPONENTIAL SMOOTHING MODEL ONE MONTH AHEAD
FIGURE 4-13. COMMONWEALTH EDISON: SIMPLE EXPONENTIAL SMOOTHING MODEL ONE YEAR AHEAD
FIGURE 4-14. COMMONWEALTH EDISON: DOUBLE EXPONENTIAL SMOOTHING MODEL ONE MONTH AHEAD
FIGURE 4-15. COMMONWEALTH EDISON: DOUBLE EXPONENTIAL SMOOTHING MODEL ONE YEAR AHEAD
FIGURE 4-16. COMMONWEALTH EDISON: TRIPLE EXPONENTIAL SMOOTHING MODEL ONE MONTH AHEAD
FIGURE 4-17. COMMONWEALTH EDISON: TRIPLE EXPONENTIAL SMOOTHING MODEL ONE YEAR AHEAD
Does exponential smoothing make sense for operational load forecasting? For very near-term forecast
horizons of one or two hours ahead, it could prove useful for stable loads. Any load that contains many
turning points during the day, whether those are weather-driven turning points or not, is not a good
candidate for exponential smoothing. In practice, we reserve exponential smoothing for long-term energy
and peak forecasting where it is important to capture long-run growth trends and seasonal cycles in the
data.
4.2.5 ARIMA Models

Like exponential smoothing, the autoregressive integrated moving average (ARIMA) model uses historical
loads to forecast future load values. The nice thing about the simple exponential smoothing model is that
there is not much in the way of decision making on the part of the load forecast modelers. Either the
resulting smoothing model looks good or not. ARIMA models have three major pieces that require
decisions on the part of the load forecast modelers. In 1970, George Box and Gwilym Jenkins described
an iterative process that helps guide the load forecast modeler through these decisions.10 ARIMA
models are sometimes referred to as Box-Jenkins models in the literature and in software. Despite the
potential confusion over names, a load forecast modeler can follow the Box-Jenkins process to develop a
powerful ARIMA model.
10 Box, George and Jenkins, Gwilym (1970), Time Series Analysis: Forecasting and Control. San Francisco: Holden-Day.
Two of the three pieces of an ARIMA model – autoregressive and integrated – are easy to understand.
The third piece – moving average – takes a little bit to wrap your head around. Fortunately, for load
forecasting, the first two pieces are really about all we need. The ARIMA model can be written generally
as:
Loadd = ARIMA(P, I, Q)
Here,
P is the number of autoregressive terms that are included in the model
I is the number of times the raw time series needs to be differenced to get to a stationary time
series
Q is the length of the moving average term
The beauty of ARIMA modeling is that anyone familiar with this approach knows exactly what model is
being used to forecast loads if they know the values of P, I, and Q. Easy, right? It will be after we walk
through the math. Let us start with a simple ARIMA(P,0,0) model. Here, we are saying the model contains
(P) autoregressive terms, no differencing, and no moving average terms. Formally we have:
$$\text{Load}_{d,i} = \sum_{p=1}^{P} \rho_p\,\text{Load}_{d-p,i} + e_{d,i}$$
In this case, we are saying the load on day (d) for time interval (i) is a weighted average of the previous
(P) days of loads at time interval (i) plus a random error (e). The weights are the unknown parameters
(ρp ).
To help fix ideas, let P=3. Then we have:
$$\text{Load}_{d,i} = \rho_1\,\text{Load}_{d-1,i} + \rho_2\,\text{Load}_{d-2,i} + \rho_3\,\text{Load}_{d-3,i} + e_{d,i}$$
Here, the loads today for interval (i) are a function of the loads for the previous three days at time interval
(i). This model will be familiar to those used to using linear regression. To estimate the model parameters
(ρp ), we use the same optimization problem used to find the model parameters in the linear regression
case.
Trends. One of the challenges of ARIMA modeling is that the series we are modeling needs to be stationary
in order to achieve reasonable forecasts. For a series to be stationary, it cannot be trending upwards or
downwards. Consider the following simple example:
$$\text{Load}_{d} = 1.2\,\text{Load}_{d-1}$$
Here, we have an ARIMA(1,0,0) model and the estimated coefficient for (ρ1) is 1.2. The table below shows
the 10 day-ahead load forecasts using this model. Starting with the last observed load value of 1,00, for
each day in the forecast period the load forecast grows by 20%. By the tenth day in the forecast horizon,
the load forecast calls for a load that is over seven times larger than the starting value. This is an example
of a nonstationary time series.
Unlike exponential smoothing where a trend term was added to the model structure to control for
non-stationary time series, there is no place to put a trend in an ARIMA model. To overcome this limitation,
the time series approach utilizes the difference operator to create a new detrended time series. The
detrended time series is then modeled as an ARMA(P,Q) process. In forecast mode, the forecast from the
ARMA(P,Q) model is then adjusted by undoing the original differencing. A first order difference can be
written as:
$$\text{Load1stOrderDifference}_{d,i} = \text{Load}_{d,i} - \text{Load}_{d-1,i}$$
A second order difference can be written as:
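$$
\begin{aligned}
\text{Load2ndOrderDifference}_{d,i} &= \text{Load1stOrderDifference}_{d,i} - \text{Load1stOrderDifference}_{d-1,i} \\
&= \left(\text{Load}_{d,i} - \text{Load}_{d-1,i}\right) - \left(\text{Load}_{d-1,i} - \text{Load}_{d-2,i}\right) \\
&= \text{Load}_{d,i} - 2\,\text{Load}_{d-1,i} + \text{Load}_{d-2,i}
\end{aligned}
$$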
The key thing to notice is that each subsequent differencing is applied to the differenced series from the
prior step. Once the differencing has been completed, you would then fit an ARMA(P,Q) model to the
differenced series.
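Differencing, and the step of undoing it in forecast mode, can be sketched as follows. The toy series and the forecast values for the second difference are hypothetical; in practice the differenced-series forecast would come from the fitted ARMA(P,Q) model.

```python
def difference(series):
    """First-order difference: value(d) - value(d-1)."""
    return [current - previous for previous, current in zip(series, series[1:])]


def undifference(last_level, diff_forecasts):
    """Rebuild levels by accumulating forecast differences onto the last known level."""
    levels = []
    level = last_level
    for delta in diff_forecasts:
        level += delta
        levels.append(level)
    return levels


loads = [100, 103, 108, 115, 124]  # toy series with an accelerating trend

first = difference(loads)    # [3, 5, 7, 9]
second = difference(first)   # [2, 2, 2]

# The second difference also follows the closed form Load(d) - 2*Load(d-1) + Load(d-2).
direct = [loads[d] - 2 * loads[d - 1] + loads[d - 2] for d in range(2, len(loads))]

# Hypothetical two-day forecast of the twice-differenced series.
second_forecast = [2, 2]
first_forecast = undifference(first[-1], second_forecast)   # [11, 13]
load_forecast = undifference(loads[-1], first_forecast)     # [135, 148]
print(second, direct, load_forecast)
```

Each round of differencing is undone in reverse order, starting from the last observed value of the series at that level of differencing.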
How many times should the original time series be differenced before an ARMA(P,Q) can be fitted to it?
This is where the Box-Jenkins modeling approach comes into play.11 In their book, Box-Jenkins present a
three-step iterative approach to building ARMA models.
Step 1. Stationarity and Seasonality. The first step is to determine whether the time series needs to be
differenced to form a stationary series. ARIMA modeling considers two types of non-stationarity:
linear trends and seasonality. Seasonality, which load data are subject to, suggests that today’s
loads are highly correlated with loads at the same time last week or last year. In most cases, it is easy to
glean from a graph whether a load contains a linear trend and seasonality.
Box-Jenkins also suggest looking at an autocorrelation plot to assess whether a time series contains a
trend and seasonality. The idea is that a non-stationary time series will have an autocorrelation plot that
dies off very slowly. Essentially if there is a trend, it is very likely that today’s load is highly correlated to
the load for the prior several days. To capture this idea mathematically, we form the sequence of
autocorrelation coefficients. For example, the autocorrelation coefficient for a one-day lag is computed
as:
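$$r_1 = \frac{\frac{1}{N}\sum_{n=1}^{N-1}\left(\text{Load}_n - \overline{\text{Load}}\right)\left(\text{Load}_{n+1} - \overline{\text{Load}}\right)}{\frac{1}{N}\sum_{n=1}^{N}\left(\text{Load}_n - \overline{\text{Load}}\right)^2}$$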
Here, r1 is the autocorrelation coefficient for a one-day lag, Loadn is the load at time (n), and $\overline{\text{Load}}$ is
the mean load over the sample period (N). The denominator is the variance of the raw time series. The
numerator contains the autocovariance. Dividing the autocovariance by the variance places the
values for the autocorrelation coefficients on a scale of -1 to 1. A positive value for the autocorrelation
coefficient indicates current loads are positively correlated to the prior day load. If yesterday’s loads were
high, then it is expected today’s load will also be high. A negative value suggests that if yesterday’s loads
were high, then it is expected today’s loads will be low.
11 Box-Jenkins, Ibid
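The autocorrelation coefficient for an h-step lag would be written generically as follows:
$$r_h = \frac{\frac{1}{N}\sum_{n=1}^{N-h}\left(\text{Load}_n - \overline{\text{Load}}\right)\left(\text{Load}_{n+h} - \overline{\text{Load}}\right)}{\frac{1}{N}\sum_{n=1}^{N}\left(\text{Load}_n - \overline{\text{Load}}\right)^2}$$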
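The autocorrelation coefficient for any lag can be computed directly from its definition as the autocovariance divided by the variance. The function name and the two toy series (one trending, one alternating) are assumptions made for illustration.

```python
def autocorrelation(loads, h):
    """Autocorrelation coefficient at lag h: autocovariance divided by variance."""
    n = len(loads)
    mean = sum(loads) / n
    variance = sum((x - mean) ** 2 for x in loads) / n
    autocovariance = sum(
        (loads[t] - mean) * (loads[t + h] - mean) for t in range(n - h)
    ) / n
    return autocovariance / variance


trending = list(range(100))      # steadily rising series (hypothetical)
alternating = [1, -1] * 50       # high day followed by low day (hypothetical)

acf_trending = [autocorrelation(trending, h) for h in range(1, 6)]
acf_alternating = [autocorrelation(alternating, h) for h in range(1, 6)]
print([round(r, 2) for r in acf_trending])
print([round(r, 2) for r in acf_alternating])
```

The trending series produces coefficients that stay close to 1 and die off slowly, the pattern that signals a need for differencing, while the alternating series flips sign from one lag to the next.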
A plot of the autocorrelation coefficients is usually presented as a bar chart with each subsequent bar
representing an additional lag. An example of the autocorrelation function (ACF) plot for two, highly
autocorrelated time series, one positively correlated and the second negatively correlated, is presented
in Figure 4-18. In this example, the Box-Jenkins approach would suggest differencing the data until the
ACF plot no longer displays a slowly decaying series of autocorrelations. An example of an ACF plot that
does not display a slowly decaying series of autocorrelations is presented in Figure 4-19.
FIGURE 4-18. AUTOCORRELATION FUNCTION (ACF) FOR TWO HIGHLY CORRELATED TIME SERIES
FIGURE 4-19. AUTOCORRELATION FUNCTION (ACF) FOR A NON-CORRELATED TIME SERIES
Step 2. Identify P and Q. Once a stationary time series has been achieved through differencing, the next
step is to identify potential values for P and Q, the orders of the autoregressive and moving average terms.
Here, Box-Jenkins suggest reviewing both the ACF plot of the differenced series and the Partial
Autocorrelation Function (PACF). The PACF provides information as to the explanatory power of each
subsequent autoregressive lag given all the lagged terms preceding it have had their say. For example,
the PACF for the third order autoregressive lag is the contribution of that third order lag given we have
already accounted for the impact of the first and second order lags. In principle, an AR(P) process (a) will
have an ACF that decays smoothly over time, and (b) the PACF values will drop off toward zero after P
lags. The rough rule of thumb for when a PACF value is close to zero or insignificant is when the value
falls within $\pm 2/\sqrt{N}$, where N is the number of observations used to estimate the PACF values.
There is a neat trick to estimating the values for a PACF plot by computing a sequence of regressions
where each regression in the sequence adds an additional autoregressive term. The estimated coefficient
on the additional autoregressive term is the estimated partial autocorrelation. The following is an
example of the sequence of regressions that are used to estimate the partial autocorrelations for up to a
six-period lag.
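$$
\begin{aligned}
\text{Load}_{d,i} &= \rho'_1 \text{Load}_{d-1,i} + e_{d,i} \\
\text{Load}_{d,i} &= \rho_1 \text{Load}_{d-1,i} + \rho'_2 \text{Load}_{d-2,i} + e_{d,i} \\
\text{Load}_{d,i} &= \rho_1 \text{Load}_{d-1,i} + \rho_2 \text{Load}_{d-2,i} + \rho'_3 \text{Load}_{d-3,i} + e_{d,i} \\
\text{Load}_{d,i} &= \rho_1 \text{Load}_{d-1,i} + \rho_2 \text{Load}_{d-2,i} + \rho_3 \text{Load}_{d-3,i} + \rho'_4 \text{Load}_{d-4,i} + e_{d,i} \\
\text{Load}_{d,i} &= \rho_1 \text{Load}_{d-1,i} + \rho_2 \text{Load}_{d-2,i} + \rho_3 \text{Load}_{d-3,i} + \rho_4 \text{Load}_{d-4,i} + \rho'_5 \text{Load}_{d-5,i} + e_{d,i} \\
\text{Load}_{d,i} &= \rho_1 \text{Load}_{d-1,i} + \rho_2 \text{Load}_{d-2,i} + \rho_3 \text{Load}_{d-3,i} + \rho_4 \text{Load}_{d-4,i} + \rho_5 \text{Load}_{d-5,i} + \rho'_6 \text{Load}_{d-6,i} + e_{d,i}
\end{aligned}
$$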
The estimated partial autocorrelations are given by the following six parameters: ρ1′ , ρ′2 , ρ′3 , ρ′4 , ρ′5 , ρ′6
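The sequence of regressions can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: the series is first expressed as deviations from its mean (the regressions above carry no intercept), the function name is hypothetical, and the data are a simulated AR(1) series with a coefficient of 0.7.

```python
import numpy as np


def pacf_by_regression(series, max_lag):
    """Partial autocorrelations from a sequence of no-intercept AR regressions."""
    y_full = np.asarray(series, dtype=float)
    y_full = y_full - y_full.mean()  # assumption: work with deviations from the mean
    n = len(y_full)
    partials = []
    for p in range(1, max_lag + 1):
        y = y_full[p:]
        # Column j-1 holds the series lagged by j periods, for j = 1..p.
        X = np.column_stack([y_full[p - j : n - j] for j in range(1, p + 1)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        # The coefficient on the newest (p-th) lag is the partial autocorrelation.
        partials.append(coef[-1])
    return np.array(partials)


# Simulated AR(1) series (hypothetical data, coefficient 0.7).
rng = np.random.default_rng(0)
n_obs = 2000
series = np.zeros(n_obs)
for t in range(1, n_obs):
    series[t] = 0.7 * series[t - 1] + rng.normal()

pacf = pacf_by_regression(series, 6)
threshold = 2.0 / np.sqrt(n_obs)  # rough significance band from the text
print(np.round(pacf, 3), round(threshold, 3))
```

For an AR(1) series, the first partial autocorrelation should sit near the true coefficient and the remaining values should be small relative to the first, which is the drop-off pattern used to pick P.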
Armed with an ACF and PACF plot, we are now ready to identify the values for P and Q of the ARIMA(P,I,Q)
time series. This is where the art of time series modeling comes into play. Box-Jenkins suggest the
following guidelines for selecting values for P and Q.
◼ If the ACF decays smoothly, then that suggests an AR(P) process. The length of the AR(P) process
is then selected by reviewing the PACF plot. The point right before the PACF bars drop off suggests
the value for P.
◼ If the ACF drops off suddenly instead of decaying slowly, then that suggests an MA(Q) process. In
this case, the value for Q is set equal to the number of ACF bars before the drop off.
◼ If the ACF alternates between negative and positive values but still decays to zero, this suggests
an AR(P) process. The point right before the PACF bars go to zero suggests the value for P.
◼ If the ACF starts to decay after several periods, it suggests a mixed ARMA process. The MA(Q) would
be set to right before the ACF starts to decay. The AR(P) would be selected by finding the point
at which the PACF bars start to fall off.
◼ If all the ACF and PACF values are close to zero, then you have an ARMA(0,0) model or random
noise.
◼ If the ACF bars follow a seasonal (including daily) cycle, then seasonal differencing should be
applied.
◼ If the ACF effectively does not decay, then the series needs to be differenced.
Step 3. Estimate. At this step, the initial ARMA(P,Q) model is estimated and the model in-sample and
out-of-sample performance is evaluated. Depending on the results, the modeler iterates through the steps
until a satisfactory model is found. In fact, there are several statistical tests and selection criteria that
can help decide when a satisfactory model is found. Some of the most common are as follows.
◼ The Dickey-Fuller Test tests the null hypothesis that a unit root is present in an autoregressive
model.12 The simplest version of this test is whether the coefficient of an AR(1) equals 1.0 (i.e.,
unit root). Ultimately what the Dickey-Fuller Test tells you is whether the time series you are
trying to model needs to be differenced.
◼ The Bayesian Information Criterion (BIC) can be used to help select the best values for P and Q.13
The BIC can be written as:
$$
BIC = k \ln(N) - 2 \ln(\hat{L})
$$
Here, k is the number of parameters to be estimated, which in this case is given by the order of
P and Q, N is the total number of observations, and $\hat{L}$ is the maximized value of the model
likelihood. For a model with normally distributed errors this is equivalent, up to a constant, to
$N \ln(SSE/N) + k \ln(N)$, where SSE is the Sum of Squared Errors of the estimated model. As the
number of parameters grows, the penalty term gets bigger. At the same time, as the order of P and Q
grows, the SSE is reduced. The order of P and Q is chosen at the point where the incremental
reduction in the SSE is outweighed by the penalty, ln(N), of having additional parameters to estimate.
◼ The Akaike Information Criterion (AIC) is an alternative criterion that can be used to help select
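the best values for P and Q.14 The AIC is very similar to the BIC, but places a different weight on
the number of parameters. The AIC is defined as:
$$
AIC = 2k - 2 \ln(\hat{L})
$$
where k is the number of estimated parameters and $\hat{L}$ is the maximized value of the model
likelihood. With normally distributed errors this is equivalent, up to a constant, to
$N \ln(SSE/N) + 2k$.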
The art to ARIMA modeling is knowing when to stop. This is where experience coupled with the BIC and
AIC comes into play.
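The following Python sketch shows one way the two criteria could be compared across candidate AR orders. It assumes normally distributed errors, so that each criterion can be computed from the Sum of Squared Errors as N ln(SSE/N) plus the parameter penalty; that form, the helper names, and the simulated AR(2) data are assumptions made for this illustration.

```python
import math

import numpy as np


def ar_sse(series, p, max_p):
    """Least squares AR(p) fit; returns (SSE, number of observations used)."""
    y_full = np.asarray(series, dtype=float)
    y_full = y_full - y_full.mean()
    n = len(y_full)
    # Use the same rows for every p so the SSE values are directly comparable.
    y = y_full[max_p:]
    X = np.column_stack([y_full[max_p - j : n - j] for j in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid), len(y)


def bic(sse, n, k):
    # Gaussian-error form (assumption): fit term plus ln(N) per parameter.
    return n * math.log(sse / n) + k * math.log(n)


def aic(sse, n, k):
    # Same fit term, but a penalty of 2 per parameter.
    return n * math.log(sse / n) + 2 * k


# Simulated AR(2) series (hypothetical data).
rng = np.random.default_rng(0)
n_obs = 1500
y = np.zeros(n_obs)
for t in range(2, n_obs):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.normal()

max_p = 4
results = {}
for p in range(1, max_p + 1):
    sse, n = ar_sse(y, p, max_p)
    results[p] = {"sse": sse, "bic": bic(sse, n, p), "aic": aic(sse, n, p)}
    print(p, round(sse, 1), round(results[p]["bic"], 1), round(results[p]["aic"], 1))

best_p_bic = min(results, key=lambda p: results[p]["bic"])
best_p_aic = min(results, key=lambda p: results[p]["aic"])
```

The SSE can only fall as lags are added, so the criteria differ solely in how heavily each extra parameter is penalized. With more than a handful of observations, ln(N) exceeds 2 and the BIC tends to favor the smaller model.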
ARIMA Working Example

Presented in Figure 4-20 are the ACF and PACF plots for the Commonwealth Edison average daily load. The
first thing to notice is that the ACF plot shows a seasonal cycle at day seven of the lag structure. This is
consistent with what we have seen earlier with the hourly graphs in that there is a day-of-the-week cycle
in the load data. The second thing to notice is that the ACF does decay slowly otherwise. The PACF
suggests that the order of the AR terms would be one (1). Shown in Figure 4-21 are the ACF and PACF plots
after seasonally differencing the time series one time. This suggests an AR(1) model should be fit to the
seasonally differenced time series. When this model is estimated, the result is an estimated coefficient
on the AR(1) term of 0.760.
12 Dickey, D. A., and Fuller, W. A. (1979). Distribution of the Estimators for Autoregressive Time Series with a
Unit Root. Journal of the American Statistical Association, 74 (366): 427-431.
13 Schwarz, Gideon E. (1978). Estimating the Dimension of a Model. Annals of Statistics, 6 (2): 461-464.
14 Akaike, H. (1973). Information Theory and an Extension of the Maximum Likelihood Principle. In Petrov, B. N.
and Csaki, F. (eds.), 2nd International Symposium on Information Theory, Tsahkadsor, Armenia, USSR, September 2-8,
1971. Budapest: Akademiai Kiado, pp. 267-281.
FIGURE 4-20. ACF AND PACF FOR COMMONWEALTH EDISON AVERAGE DAILY LOAD
FIGURE 4-21. ACF AND PACF AFTER ONE-TIME SEASONAL DIFFERENCING
The forecast model performance is shown in Figure 4-22 through Figure 4-24. The first figure shows that
like the exponential smoothing model, an ARMA model will fit tightly to the in-sample data because of
the heavy reliance on autoregressive terms. Like the triple exponential smoothing model, the resulting
ARMA model captures the day-of-the-week load pattern. The last figure shows the challenge with
modeling frameworks that rely heavily on autoregressive terms. Without new information in the form of
new load measurements, the forecast is fixed throughout the forecast horizon. Since we only seasonally
differenced and did not take first order differences, the forecast does not exhibit the growth trend that
came with the double and triple exponential smoothing models.
FIGURE 4-22. COMMONWEALTH EDISON: SEASONALLY DIFFERENCED, AR(1) MODEL FIT ALL DATA
FIGURE 4-23. COMMONWEALTH EDISON: SEASONALLY DIFFERENCED, AR(1) MODEL ONE MONTH AHEAD
FIGURE 4-24. COMMONWEALTH EDISON: SEASONALLY DIFFERENCED, AR(1) MODEL ONE YEAR AHEAD
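To make the mechanics of the working example concrete, the following Python sketch seasonally differences a daily series at lag seven, fits an AR(1) to the differenced series by least squares, and then forecasts recursively. It uses simulated data rather than the Commonwealth Edison loads, and all names and values are hypothetical.

```python
import numpy as np

# Hypothetical daily loads: a weekly profile plus autocorrelated noise.
rng = np.random.default_rng(1)
week = np.array([100.0, 104.0, 105.0, 104.0, 102.0, 90.0, 88.0])
n_weeks = 60
n_days = 7 * n_weeks
noise = np.zeros(n_days)
for t in range(1, n_days):
    noise[t] = 0.8 * noise[t - 1] + rng.normal(scale=2.0)
load = np.tile(week, n_weeks) + noise

# Seasonal difference: today's load minus the load seven days earlier.
season = 7
z = load[season:] - load[:-season]

# AR(1) on the differenced series, estimated by least squares (no intercept).
rho = float(z[:-1] @ z[1:] / (z[:-1] @ z[:-1]))

# Recursive forecast: project the differenced series, then undo the differencing.
horizon = 28
history = list(load)
z_last = z[-1]
forecast = []
for _ in range(horizon):
    z_last = rho * z_last                   # AR(1) forecast of the seasonal difference
    next_load = history[-season] + z_last   # add back the load from seven days earlier
    history.append(next_load)
    forecast.append(next_load)
forecast = np.array(forecast)

print(round(rho, 3))
print(np.round(forecast.reshape(4, 7), 1))
```

As the forecast horizon lengthens, the AR(1) contribution dies out and each forecast week simply repeats the week before it. No new information enters and, because no first-order difference was taken, no growth trend appears.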
4.3 MULTIVARIATE FRAMEWORKS

Based on the hourly load graphs and scatter plots for Commonwealth Edison and DAY, their loads vary
with the day-of-the-week, time-of-the-year, and weather. This section introduces the use of multivariate
regression and neural network models to leverage forecasts of calendar, weather, economic, and solar
conditions to forecast loads. Together these frameworks are the workhorses of modern-day real-time
load forecasting.
4.3.1 Regression
Most statistically based load forecast frameworks use some form of optimization to estimate the
parameters of the model. The most common optimization problem is characterized as finding values for
the model parameters (i.e., the unknowns) such that some objective function is either at a maximum or
minimum value. The following example will illustrate the basic idea. Consider a vector of two variables:
(1) a dependent variable or outcome, and (2) an explanatory variable. Further, assume that the
relationship between the dependent variable and the explanatory variable is roughly linear, which can
be expressed as follows:
$$
Y_n = m X_n + b + e_n
$$
Here,
$Y_n$ is a vector (a column) of observations on the dependent variable where the elements of the
vector are indexed by (n)
$X_n$ is a vector of observations of the explanatory variable
$e_n$ is a vector of model errors
b is a parameter (a single number) that is the intercept of the line
m is a parameter (a single number) that represents how much the dependent variable will
change with an incremental change in the explanatory variable (i.e., the slope)
This equation should look familiar to the reader since most of us were taught that Y=mX+b represents a
line. The one element that might look strange is the model error. The following figure illustrates the
relationship between Y, the predicted value from a trend line, and e. In this figure, the dependent
variable (Y) is measured on the vertical axis and the explanatory variable (X) is measured on the
horizontal axis. The red dots on the plot represent the (Yn, Xn) observations. The black line is the trend
line that Microsoft Excel® fit to these data. The green vertical lines represent the difference between
the observed Y and what the Excel trend line predicts for Y. In other words, the green lines represent
the model errors. The error values are computed as:
$$
e_n = Y_n - \hat{Y}_n
$$
Where,
$\hat{Y}_n$ represents the fitted or predicted value for Y
FIGURE 4-25. FITTING A LINE CHART VIEW
Recall, the most common optimization problem is characterized as finding values for the model
parameters (e.g., b and m) such that some objective function is either at a maximum or minimum value.
For the data presented in the figure above, this is equivalent to finding the line (i.e., values for b and m)
that “best” fits the data. Since there are an infinite number of lines that fit these data, we need a rule or
definition for selecting the “best” line. Consider defining “best” as the line that leads to the smallest
sum of errors. In this case, the optimization problem can be expressed as:
$$
\text{Find } b, m \text{ such that } \sum_{n=1}^{N} e_n = \sum_{n=1}^{N} (Y_n - mX_n - b) \text{ is as small as possible}
$$
Here the optimization problem is to find values for the parameters (b and m) such that the sum of
model errors is as small as possible. In this case, the objective function is defined as the sum of the
model errors. The problem with this objective function is that there are several lines that will lead to
the same value of the objective function, namely 0.0. That is, there are several ways of setting b and m
so that the sum of the positive errors exactly offsets the sum of the negative errors. Although this
objective function narrows the set of candidate lines, it does not lead to a single unique
line. As a result, we are still left with how to select the “best” line among the set of lines that lead to the
same value of the objective function.
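A small Python illustration of this point follows. It relies on the fact that any line passing through the point of means (the mean of X, the mean of Y) produces errors that sum to zero, whatever its slope. The five data points and the function names are hypothetical.

```python
# Hypothetical observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)


def sum_of_errors(m, b):
    """Objective function: the plain (signed) sum of model errors."""
    return sum(yn - m * xn - b for xn, yn in zip(x, y))


def line_through_means(m):
    """Return (m, b) for the line with slope m that passes through the means."""
    return m, y_bar - m * x_bar


# Three very different slopes, all with a sum of errors of (numerically) zero.
for slope in (0.0, 2.0, -5.0):
    m, b = line_through_means(slope)
    print(f"m = {m:+.1f}, b = {b:+.2f}, sum of errors = {sum_of_errors(m, b):+.2e}")
```

A flat line, a rising line, and a steeply falling line all reach the same objective value, so the plain sum of errors cannot single out a "best" line.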
The problem with the above objective function is positive errors cancel negative errors. To avoid this
problem, we need to transform the errors in such a way that positive and negative errors do not negate
each other. One such transformation is to use the absolute function. This leads to the following
objective function.
$$
\text{Find } b, m \text{ such that } \sum_{n=1}^{N} |e_n| = \sum_{n=1}^{N} |Y_n - mX_n - b| \text{ is as small as possible}
$$
Here, the task is to find values for the parameters (b and m) such that the sum of the absolute model
errors is as small as possible. This is an admirable objective function, but it is computationally challenging
to solve. Most optimization algorithms are based on differential calculus. The core idea is that the point
at which a function is at a local minimum (maximum) is where the first order derivative equals 0.0. To
determine whether that point is a minimum or maximum requires evaluating the sign of the second order
derivative. The problem with minimizing the sum of the absolute errors is that the objective function is
not differentiable everywhere: the absolute value function has a kink wherever an error equals zero. As a
result, optimization algorithms that rely on computing the first and second order
derivatives cannot be employed to find the “best” values for b and m.
The advantage of the absolute function is that positive and negative errors do not negate each other.
What is needed is a transformation of the errors that preserves this idea but leads to an objective function
that is continuous and everywhere differentiable. It turns out that such a transformation does exist in the
form of squaring the errors. This leads to the following objective function.
$$
\text{Find } b, m \text{ such that } \sum_{n=1}^{N} (e_n)^2 = \sum_{n=1}^{N} (Y_n - mX_n - b)^2 \text{ is as small as possible}
$$
Here, the task is to find values for the parameters (b and m) that lead to the smallest sum of squared
model errors. This quadratic objective function not only meets the criteria of being continuous and
everywhere differentiable, but it is also easy to work with. What we mean by “easy to work with” is the
first and second order derivatives are easy to derive by hand. If we can derive the derivatives by hand,
then it is easy to write a computer program that computes the derivatives.
Define the optimization problem as:
$$
\text{Minimize w.r.t. } b, m: \quad L = \sum_{n=1}^{N} (e_n)^2 = \sum_{n=1}^{N} (Y_n - b - mX_n)^2
$$
Here, we want to minimize the value of the objective function (L) with respect to (w.r.t.) the unknown
parameters b and m.
To find the estimated value for the intercept term (b), we take the first order derivative with respect to
(b) which is written as follows:
$$
\frac{\partial L}{\partial b} = \sum_{n=1}^{N} 2\,(Y_n - b - mX_n)(-1) = 0
$$
After some rearranging, we have the equation for the optimal value of b as:
$$
\hat{b} = \frac{\sum_{n=1}^{N} Y_n}{N} - \hat{m}\,\frac{\sum_{n=1}^{N} X_n}{N} = \bar{Y} - \hat{m}\bar{X}
$$
Here, the estimated value for the intercept ($\hat{b}$) is the difference between the mean of Y ($\bar{Y}$) and the mean
of X ($\bar{X}$) times the estimated slope ($\hat{m}$). One way of thinking about ($\hat{b}$) is it represents the portion of the
mean of Y that is not explained by the mean of X times its weight or slope. In the extreme case where X
does not explain any movement in Y (i.e., $\hat{m} = 0$), then ($\hat{b}$) is equal to the mean of Y ($\bar{Y}$).
Next, we find the estimated value for the slope (m) by taking the first order derivative with respect to (m),
which is written as follows:
$$
\frac{\partial L}{\partial m} = \sum_{n=1}^{N} 2\,(Y_n - b - mX_n)(-X_n) = 0
$$
Substituting ($\bar{Y} - m\bar{X}$) in for b in the above equation leads to the solution for the estimated value of m
as:
$$
\hat{m} = \frac{\sum_{n=1}^{N} (Y_n - \bar{Y})(X_n - \bar{X})}{\sum_{n=1}^{N} (X_n - \bar{X})(X_n - \bar{X})}
$$
Here, the estimated slope is equal to the joint variation of X and Y around their respective means divided by the
squared variation of X around its mean. If X and Y are positively correlated, then the value in the
numerator will be positive and the estimated value for m will be positive. If X and Y are negatively
correlated, then the value in the numerator will be negative and the estimated value for m will be
negative. If the joint variation of Y and X is greater than the variation of X with itself, then the estimated
value for m will be greater than 1.0 in absolute value. Essentially, the estimated value for m captures the
extent to which variations in X around its mean explains or is correlated with variations of Y around its
mean.
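The two closed-form expressions translate directly into a few lines of Python. The sketch below uses hypothetical data and a hypothetical function name, and it checks that the resulting errors satisfy both first order conditions.

```python
def fit_line(x, y):
    """Least squares slope and intercept from the closed-form expressions."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: joint variation of X and Y over the squared variation of X.
    numerator = sum((yn - y_bar) * (xn - x_bar) for xn, yn in zip(x, y))
    denominator = sum((xn - x_bar) ** 2 for xn in x)
    m_hat = numerator / denominator
    # Intercept: mean of Y minus the slope times the mean of X.
    b_hat = y_bar - m_hat * x_bar
    return m_hat, b_hat


# Hypothetical observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

m_hat, b_hat = fit_line(x, y)
errors = [yn - m_hat * xn - b_hat for xn, yn in zip(x, y)]

# First order conditions: the errors sum to zero and are uncorrelated with X.
foc_b = sum(errors)
foc_m = sum(e * xn for e, xn in zip(errors, x))
print(m_hat, b_hat, foc_b, foc_m)
```

For these five points the estimated slope is 1.97 and the estimated intercept is 0.09, up to floating-point rounding.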
The beauty of the quadratic objective function is that there is one and only one optimal solution. This
means there are no other values for b and m that will lead to a smaller objective function value. As it
turns out, the process of finding the parameters that minimizes the sum of squared errors has a name
first coined in the scientific literature by Adrien-Marie Legendre as least squares. Later, Carl Friedrich
Gauss laid claim to the invention of least squares and an academic war ensued. It was not until Francis
Galton applied the least squares technique to a study of the height of the descendants of trees that the
term regression was used. In his 1886 paper, Regression Towards Mediocrity in Hereditary Stature, he
wrote:
“It appeared from these experiments that the offspring did not tend to resemble their parent seeds in
size, but to be always more mediocre than they – to be smaller than the parents, if the parents were
large; to be larger than the parents, if the parents were small.”
Put differently, the size of the offspring regressed toward the mean. In general, when a forecast model is
referred to as a regression, it means the unknown parameters are estimated using the method of least
squares described by Legendre and Gauss.
Unfortunately, there are many variations on the theme in the load forecasting literature, which can cause
confusion. Linear regression and linear least squares are two names for the same thing. Simple linear
regression usually means there is one explanatory variable in addition to the intercept term. By the way,
the intercept term is an explanatory variable. A multivariate linear regression allows for multiple
explanatory variables. Simple linear regression and multivariate linear regression are the same thing with
the same least squares objective function.
Where things become confused in the machine learning literature is around the labels “regression” and
“linear”. A large portion of the machine learning literature is written around solving classification
problems where the dependent variable or the thing that is forecasted takes on one of a finite set of
values, categories, or classes. For example, a credit card transaction is fraudulent or not. When machine
learning is applied to continuous data, the problem is labeled as a “regression” problem. In this case, the
outcome that is returned is in the form of an average value, which harkens back to Galton’s description.
The point is, when the term regression is used in machine learning literature, it does not necessarily mean
the method of least squares.
The term “linear” is perhaps the most misunderstood term in both the load forecasting and the machine
learning literature. In the context of a linear regression or a linear multivariate regression, the term
“linear” refers to the parameters and not to the explanatory variables. The confusion arises when the
author mistakenly infers that the term “linear” refers to the explanatory variables. An example of a linear
regression that is linear in the parameters, but nonlinear in the explanatory variables, follows.
$$
\text{Load}_{d,i} = \beta_0 + \beta_1 \text{Temperature}_{d,i} + \beta_2 \text{Temperature}_{d,i}^2 + \beta_3 \text{Temperature}_{d,i}^3 + e_{d,i}
$$
Here, load on day (d) and time interval (i) is modeled as a third order polynomial in temperature. This
equation is highly nonlinear in temperature, but linear in the model parameters (𝛽0 , 𝛽1 , 𝛽2 , 𝛽3 ). The
fact that this equation is linear in the model parameters means the method of least squares will lead to a
unique solution for the four parameters by setting the first order derivatives equal to 0.0.
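A short Python sketch of this idea follows. The third order polynomial is estimated with ordinary linear least squares simply by supplying temperature, temperature squared, and temperature cubed as separate explanatory variables. The temperature range, the "true" coefficients, and the noise level are hypothetical values chosen only to produce a plausible U-shaped load response.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical temperatures and a hypothetical U-shaped load response.
temperature = rng.uniform(10.0, 100.0, size=500)
true_beta = np.array([2000.0, -45.0, 0.5, -0.001])


def design(temp):
    """Explanatory variables: intercept, T, T^2, T^3 (nonlinear in temperature)."""
    return np.column_stack([np.ones_like(temp), temp, temp**2, temp**3])


X = design(temperature)
load = X @ true_beta + rng.normal(scale=2.0, size=temperature.size)

# The model is linear in the parameters, so linear least squares applies directly.
beta_hat, *_ = np.linalg.lstsq(X, load, rcond=None)
fitted = X @ beta_hat

print(np.round(beta_hat, 4))
print(round(float(np.max(np.abs(fitted - X @ true_beta))), 3))
```

The design choice is that all of the nonlinearity lives in the constructed columns of X. The estimation step is unchanged from a straight-line fit.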
In contrast, the following equation is nonlinear in the model parameters.
$$
\text{Load}_{d,i} = \beta_0 \beta_1 + \frac{\beta_2 \text{Temperature}_{d,i}}{\beta_3 \text{WindSpeed}_{d,i}} + \beta_4^{\beta_5} \text{Temperature}_{d,i} + e_{d,i}
$$
Here, we introduce three types of nonlinearities in the parameters. The first is a simple multiplication of
β0 and β1 . The reason this is a problem for least squares is there are infinite combinations of β0 and
β1 that, when multiplied together, give the same result. In other words, there is no one unique solution.
The second is a ratio of β2 and β3 . Again, there is no one unique solution. Finally, we have one
parameter β4 raised to the power of a second parameter β5 . This also leads to multiple solutions.
The problem of “linearity” is not a problem with the explanatory variables, which is commonly the knock
against linear regression posited in the machine learning literature, but rather with the model parameters.
Fortunately, there are an unlimited number of ways of expressing the nonlinear response between loads
and temperatures that do not violate the need for the equation to be linear in the model parameters. The
fact that we can construct forecast models that are nonlinear in the explanatory variables is the key to
powerful load forecasting models.
Why least squares? It turns out the solution to the least squares problem has some very attractive features
that can be summarized by the acronym BLUE. Formally, the Gauss-Markov Theorem states that in a
linear regression model in which the errors have expectation zero, are uncorrelated, and have equal
variances, the Best Linear Unbiased Estimator (BLUE) of the model parameters is given by the Ordinary
Least Squares (OLS) estimator.15 Here,
◼ Best means the model parameter estimates have the lowest variance among all linear unbiased
estimators, which means test statistics that rely on measures of distance between a null hypothesis
and the estimated parameter will be as tight as possible.
◼ Linear is the requirement that the equation estimated is linear in the model parameters.
◼ Unbiased means the estimated model parameters will be centered around the true model
coefficients.
◼ Estimator means the value that is derived is an estimate of the true parameter.
◼ Ordinary in Ordinary Least Squares means each observation is given equal weight.
From a practical point of view, this means the estimated parameters we obtain using least squares are
very good estimates of the true model coefficients. In the age of desktop computing where estimating
the parameters of a linear regression takes next to zero effort, this may not seem like a big deal. But in
the pre-computer age, where estimating the model parameters would take days of effort and a ream of
paper, knowing that least squares was the right thing to do was extremely comforting.
Another benefit of linear regression is that it is relatively easy to interpret the results. Let us return to the
general conclusion of Francis Galton that the offspring tended to regress toward the mean. Consider the
following linear regression.
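$$
\text{Load}_d = \beta_1 \text{Monday}_d + \beta_2 \text{Tuesday}_d + \beta_3 \text{Wednesday}_d + \beta_4 \text{Thursday}_d + \beta_5 \text{Friday}_d + \beta_6 \text{Saturday}_d + \beta_7 \text{Sunday}_d + e_d
$$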
Here, the Load on day (d) is given by the weighted sum of seven day-of-the-week binary variables:
Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday. In this case, the binary variables
take on one of two values—1.0 if the day happens to be the day represented by the binary variable,
otherwise 0.0. For example:
◼ April 23, 2018 is a Monday. For this day, the Monday binary variable has a value of 1.0. The
remaining six binary variables have a value of 0.0.
◼ April 24, 2018 is a Tuesday. For this day, the Tuesday binary variable has a value of 1.0. The
remaining six binary variables have a value of 0.0.
◼ April 25, 2018 is a Wednesday. For this day, the Wednesday binary variable has a value of 1.0.
The remaining six binary variables have a value of 0.0.
◼ April 26, 2018 is a Thursday. For this day, the Thursday binary variable has a value of 1.0. The
remaining six binary variables have a value of 0.0.
◼ April 27, 2018 is a Friday. For this day, the Friday binary variable has a value of 1.0. The remaining
six binary variables have a value of 0.0.
◼ April 28, 2018 is a Saturday. For this day, the Saturday binary variable has a value of 1.0. The
remaining six binary variables have a value of 0.0.
◼ April 29, 2018 is a Sunday. For this day, the Sunday binary variable has a value of 1.0. The
remaining six binary variables have a value of 0.0.
15 The Gauss-Markov Theorem is named after Carl Friedrich Gauss and Andrey Markov.
In effect, each day in the estimation has one and only one day-of-the-week binary variable with a value of
1.0. The other day-of-the-week binary variables will have a value of 0.0.
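To estimate the parameters for this model, we write the least squares optimization problem as follows:
$$
L = \sum_{d=1}^{D} \left(\text{Load}_d - \beta_1 \text{Monday}_d - \beta_2 \text{Tuesday}_d - \beta_3 \text{Wednesday}_d - \beta_4 \text{Thursday}_d - \beta_5 \text{Friday}_d - \beta_6 \text{Saturday}_d - \beta_7 \text{Sunday}_d\right)^2
$$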
Here, the least squares optimization problem is to find the parameters (β1 , β2 , β3 , β4 , β5 , β6 , β7 ) that
minimize the value of the objective function, L. To do this, we set the first order derivatives with respect
to each parameter equal to 0. The first order equations look like:
δL
= ∑D
d=1 2(Loadd − β1 Mondayd − β2 Tuesdayd − β3 Wednesdayd − β4 Thursdayd −
δβ1
β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Mondayd )=0
δL
= ∑D
δβ2
β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Tuesdayd )=0
δL
= ∑D
δβ3
β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Wednesdayd )=0
δL
= ∑D
δβ4
β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Thursdayd )=0
δL
= ∑D
δβ5
β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Fridayd )=0
δL
= ∑D
δβ6
β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Saturdayd )=0
δL
= ∑D
δβ7
β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Sundayd )=0
This seems like a mess, but let us work through the equation for the Monday coefficient (β1 ). First
multiply through by the term (−Mondayd ). This gives us:
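$$
\frac{\partial L}{\partial \beta_1} = \sum_{d=1}^{D} 2\left(-\text{Load}_d \text{Monday}_d + \beta_1 \text{Monday}_d^2 + \beta_2 \text{Tuesday}_d \text{Monday}_d + \beta_3 \text{Wednesday}_d \text{Monday}_d + \beta_4 \text{Thursday}_d \text{Monday}_d + \beta_5 \text{Friday}_d \text{Monday}_d + \beta_6 \text{Saturday}_d \text{Monday}_d + \beta_7 \text{Sunday}_d \text{Monday}_d\right) = 0
$$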
Even more of a mess, but we can clean things up quite a bit by noting that:
• $\text{Tuesday}_d \cdot \text{Monday}_d = 0$ for all days
• $\text{Wednesday}_d \cdot \text{Monday}_d = 0$ for all days
• $\text{Thursday}_d \cdot \text{Monday}_d = 0$ for all days
• $\text{Friday}_d \cdot \text{Monday}_d = 0$ for all days
• $\text{Saturday}_d \cdot \text{Monday}_d = 0$ for all days
• $\text{Sunday}_d \cdot \text{Monday}_d = 0$ for all days
This leaves us with:
$$
\frac{\partial L}{\partial \beta_1} = \sum_{d=1}^{D} 2\left(-\text{Load}_d \text{Monday}_d + \beta_1 \text{Monday}_d^2\right) = 0
$$
Rearranging terms leads to:
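$$
\beta_1 \sum_{d=1}^{D} \text{Monday}_d^2 = \sum_{d=1}^{D} \text{Load}_d \text{Monday}_d
$$
Moving the sum on the left-hand side over to the right gives us:
$$
\hat{\beta}_1 = \frac{\sum_{d=1}^{D} \text{Load}_d \text{Monday}_d}{\sum_{d=1}^{D} \text{Monday}_d^2}
$$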
The numerator contains the sum of all the loads that fell on a Monday; that is, the days when the Monday
binary takes on a value of 1.0. The denominator is the number of Mondays in the estimation period. As
a result, the least squares estimate of the Monday coefficient is the average Monday load. The estimated
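coefficients for the other six days of the week are presented below.
$$
\begin{aligned}
\hat{\beta}_2 &= \frac{\sum_{d=1}^{D} \text{Load}_d \text{Tuesday}_d}{\sum_{d=1}^{D} \text{Tuesday}_d^2} \\
\hat{\beta}_3 &= \frac{\sum_{d=1}^{D} \text{Load}_d \text{Wednesday}_d}{\sum_{d=1}^{D} \text{Wednesday}_d^2} \\
\hat{\beta}_4 &= \frac{\sum_{d=1}^{D} \text{Load}_d \text{Thursday}_d}{\sum_{d=1}^{D} \text{Thursday}_d^2} \\
\hat{\beta}_5 &= \frac{\sum_{d=1}^{D} \text{Load}_d \text{Friday}_d}{\sum_{d=1}^{D} \text{Friday}_d^2} \\
\hat{\beta}_6 &= \frac{\sum_{d=1}^{D} \text{Load}_d \text{Saturday}_d}{\sum_{d=1}^{D} \text{Saturday}_d^2} \\
\hat{\beta}_7 &= \frac{\sum_{d=1}^{D} \text{Load}_d \text{Sunday}_d}{\sum_{d=1}^{D} \text{Sunday}_d^2}
\end{aligned}
$$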
In this case, where the regression model includes the set of seven day-of-the-week binary variables, the
least squares solution is the average load by day-of-the-week. In fact, you would achieve the same
solution if you were to sort the daily load data by day-of-the-week and then compute the average by day-
of-the-week. The estimated coefficients of the regression are the mean values. Understanding that linear
regression is a way of estimating the average value of loads for different subsets of the data is important
for building powerful forecasting models. With this understanding, the modeler’s task is to construct
explanatory variables that create subsets of data that capture the key load variations across days, seasons,
solar and weather patterns, and special event days. The trick to powerful forecast models is the selection
of explanatory variables.
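The equivalence between the regression coefficients and the day-of-the-week averages can be checked numerically. The Python sketch below builds the seven binary variables for a year of hypothetical daily loads, estimates the regression by least squares, and compares the coefficients with the averages computed by sorting the data by day-of-the-week. The load values and the assumption that the first day is a Monday are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical year of daily loads; day_index 0 = Monday, ..., 6 = Sunday.
n_days = 7 * 52
day_index = np.arange(n_days) % 7
typical = np.array([100.0, 104.0, 105.0, 104.0, 102.0, 90.0, 88.0])
load = typical[day_index] + rng.normal(scale=3.0, size=n_days)

# Seven day-of-the-week binary variables: exactly one equals 1.0 on each day.
X = np.zeros((n_days, 7))
X[np.arange(n_days), day_index] = 1.0

# Least squares regression of load on the seven binary variables (no intercept).
beta_hat, *_ = np.linalg.lstsq(X, load, rcond=None)

# Direct calculation: average load by day-of-the-week.
averages = np.array([load[day_index == k].mean() for k in range(7)])

print(np.round(beta_hat, 2))
print(np.round(averages, 2))
```

The two printed rows agree to within floating-point rounding, which is the sense in which the regression estimates the average load for each subset of the data.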
The Intercept Term. In the original definition of an equation, Y = mX + b, we included an intercept term.
To make this obvious, rewrite the equation as Y = mX + b × Intercept. Here, the explanatory variable,
Intercept, takes on a value of 1.0 for all observations.
Let us carry this idea of having one of the explanatory variables be given by the intercept variable into the
day-of-the-week regression model presented above. Start by adding the intercept variable to our
day-of-the-week regression model as follows:
$$
\text{Load}_d = \beta_0 \text{Intercept}_d + \beta_1 \text{Monday}_d + \beta_2 \text{Tuesday}_d + \beta_3 \text{Wednesday}_d + \beta_4 \text{Thursday}_d + \beta_5 \text{Friday}_d + \beta_6 \text{Saturday}_d + \beta_7 \text{Sunday}_d + e_d
$$
Here, the explanatory variable called Intercept has a value of 1.0 for all days (d) in the estimation period
(D). To estimate the parameters for this model, we write the least squares optimization problem as
follows:
$$
L = \sum_{d=1}^{D} \left(\text{Load}_d - \beta_0 \text{Intercept}_d - \beta_1 \text{Monday}_d - \beta_2 \text{Tuesday}_d - \beta_3 \text{Wednesday}_d - \beta_4 \text{Thursday}_d - \beta_5 \text{Friday}_d - \beta_6 \text{Saturday}_d - \beta_7 \text{Sunday}_d\right)^2
$$
The first order equations look like:
D
δL
= ∑ 2(Loadd − β0 Intercept d − β1 Mondayd − β2 Tuesdayd − β3 Wednesdayd
δβ0 d=1
− β4 Thursdayd − β5 Fridayd − β6 Saturdayd
− β7 Sundayd )(−Intecept d ) = 0
δL
= ∑D
d=1 2(Loadd − β0 Intercept d − β1 Mondayd − β2 Tuesdayd − β3 Wednesdayd −
δβ1
β4 Thursdayd − β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Mondayd )=0
δL
= ∑D
δβ2
β4 Thursdayd − β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Tuesdayd )=0
δL
= ∑D
δβ3
β4 Thursdayd − β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Wednesdayd )=0
δL
= ∑D
δβ4
β4 Thursdayd − β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Thursdayd )=0
δL
= ∑D
δβ5
β4 Thursdayd − β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Fridayd )=0
δL
= ∑D
δβ6
β4 Thursdayd − β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Saturdayd )=0
δL
= ∑D
δβ7
β4 Thursdayd − β5 Fridayd − β6 Saturdayd − β7 Sundayd ) (−Sundayd )=0
Solving for the estimated coefficient on the intercept variable, we begin by taking the first order derivative
with respect to (β0 ) and multiplying through by the intercept variable. This gives:
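$$
\sum_{d=1}^{D} 2\left(-\text{Load}_d \text{Intercept}_d + \beta_0 \text{Intercept}_d^2 + \beta_1 \text{Monday}_d \text{Intercept}_d + \beta_2 \text{Tuesday}_d \text{Intercept}_d + \beta_3 \text{Wednesday}_d \text{Intercept}_d + \beta_4 \text{Thursday}_d \text{Intercept}_d + \beta_5 \text{Friday}_d \text{Intercept}_d + \beta_6 \text{Saturday}_d \text{Intercept}_d + \beta_7 \text{Sunday}_d \text{Intercept}_d\right) = 0
$$
Rearranging terms, and recalling that the intercept variable equals 1.0 for all days, gives:
$$
\beta_0 \sum_{d=1}^{D} \text{Intercept}_d^2 = \sum_{d=1}^{D} \left(\text{Load}_d - \beta_1 \text{Monday}_d - \beta_2 \text{Tuesday}_d - \beta_3 \text{Wednesday}_d - \beta_4 \text{Thursday}_d - \beta_5 \text{Friday}_d - \beta_6 \text{Saturday}_d - \beta_7 \text{Sunday}_d\right)
$$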
Which simplifies to:
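$$
\hat{\beta}_0 = \frac{1}{D} \sum_{d=1}^{D} \left(\text{Load}_d - \hat{\beta}_1 \text{Monday}_d - \hat{\beta}_2 \text{Tuesday}_d - \hat{\beta}_3 \text{Wednesday}_d - \hat{\beta}_4 \text{Thursday}_d - \hat{\beta}_5 \text{Friday}_d - \hat{\beta}_6 \text{Saturday}_d - \hat{\beta}_7 \text{Sunday}_d\right)
$$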
Since one of the day-of-the-week binary variables takes on a value of 1.0 for every observation in the
estimation data set, we need to know what the values of the other model parameters are before we can
solve for the coefficient on the intercept term.
Solving the remaining seven first order conditions gives:
$$
\sum_{d=1}^{D} 2\left(-\text{Load}_d \text{Monday}_d + \beta_0 \text{Intercept}_d \text{Monday}_d + \beta_1 \text{Monday}_d^2\right) = 0
$$
Which resolves down to:
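$$
\begin{aligned}
\beta_1 &= \frac{\sum_{d=1}^{D} \left(\text{Load}_d \text{Monday}_d - \beta_0 \text{Intercept}_d \text{Monday}_d\right)}{\sum_{d=1}^{D} \text{Monday}_d^2} \\
\hat{\beta}_1 &= \frac{\sum_{d=1}^{D} \left(\text{Load}_d \text{Monday}_d - \hat{\beta}_0 \text{Monday}_d\right)}{\sum_{d=1}^{D} \text{Monday}_d^2}
\end{aligned}
$$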
In a similar fashion the estimated values for the other Day-of-the-Week coefficients look like:
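$$
\begin{aligned}
\hat{\beta}_2 &= \frac{\sum_{d=1}^{D} \left(\text{Load}_d \text{Tuesday}_d - \hat{\beta}_0 \text{Tuesday}_d\right)}{\sum_{d=1}^{D} \text{Tuesday}_d^2} \\
\hat{\beta}_3 &= \frac{\sum_{d=1}^{D} \left(\text{Load}_d \text{Wednesday}_d - \hat{\beta}_0 \text{Wednesday}_d\right)}{\sum_{d=1}^{D} \text{Wednesday}_d^2} \\
\hat{\beta}_4 &= \frac{\sum_{d=1}^{D} \left(\text{Load}_d \text{Thursday}_d - \hat{\beta}_0 \text{Thursday}_d\right)}{\sum_{d=1}^{D} \text{Thursday}_d^2} \\
\hat{\beta}_5 &= \frac{\sum_{d=1}^{D} \left(\text{Load}_d \text{Friday}_d - \hat{\beta}_0 \text{Friday}_d\right)}{\sum_{d=1}^{D} \text{Friday}_d^2} \\
\hat{\beta}_6 &= \frac{\sum_{d=1}^{D} \left(\text{Load}_d \text{Saturday}_d - \hat{\beta}_0 \text{Saturday}_d\right)}{\sum_{d=1}^{D} \text{Saturday}_d^2} \\
\hat{\beta}_7 &= \frac{\sum_{d=1}^{D} \left(\text{Load}_d \text{Sunday}_d - \hat{\beta}_0 \text{Sunday}_d\right)}{\sum_{d=1}^{D} \text{Sunday}_d^2}
\end{aligned}
$$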
Given these equations, let us back into the value for the coefficient on the intercept variable starting with:
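$$
\beta_0 \sum_{d=1}^{D} \text{Intercept}_d^2 = \sum_{d=1}^{D} \left(\text{Load}_d - \beta_1 \text{Monday}_d - \beta_2 \text{Tuesday}_d - \beta_3 \text{Wednesday}_d - \beta_4 \text{Thursday}_d - \beta_5 \text{Friday}_d - \beta_6 \text{Saturday}_d - \beta_7 \text{Sunday}_d\right)
$$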
Which can be rewritten as:
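$$
\beta_0 D = \sum_{d=1}^{D} \text{Load}_d - \beta_1 \sum_{d=1}^{D} \text{Monday}_d - \beta_2 \sum_{d=1}^{D} \text{Tuesday}_d - \beta_3 \sum_{d=1}^{D} \text{Wednesday}_d - \beta_4 \sum_{d=1}^{D} \text{Thursday}_d - \beta_5 \sum_{d=1}^{D} \text{Friday}_d - \beta_6 \sum_{d=1}^{D} \text{Saturday}_d - \beta_7 \sum_{d=1}^{D} \text{Sunday}_d
$$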
Note, the sum across all days (D) in the estimation period of the Monday binary variable returns the total
number of Mondays in the estimation period. We will call this number Mondays. We can do the same
for the other days-of-the-week. This simplifies the equation to:
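$$
\beta_0 D = \sum_{d=1}^{D} \text{Load}_d - \beta_1 \text{Mondays} - \beta_2 \text{Tuesdays} - \beta_3 \text{Wednesdays} - \beta_4 \text{Thursdays} - \beta_5 \text{Fridays} - \beta_6 \text{Saturdays} - \beta_7 \text{Sundays}
$$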
The next step is to substitute in the values for each of the day-of-the-week parameters that we get from
the first order derivatives. This gives:
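$$
\begin{aligned}
\beta_0 D = \sum_{d=1}^{D} \text{Load}_d
&- \frac{\sum_{d=1}^{D} \left(\text{Load}_d \text{Monday}_d - \hat{\beta}_0 \text{Monday}_d\right)}{\sum_{d=1}^{D} \text{Monday}_d^2}\,\text{Mondays} \\
&- \frac{\sum_{d=1}^{D} \left(\text{Load}_d \text{Tuesday}_d - \hat{\beta}_0 \text{Tuesday}_d\right)}{\sum_{d=1}^{D} \text{Tuesday}_d^2}\,\text{Tuesdays} \\
&- \frac{\sum_{d=1}^{D} \left(\text{Load}_d \text{Wednesday}_d - \hat{\beta}_0 \text{Wednesday}_d\right)}{\sum_{d=1}^{D} \text{Wednesday}_d^2}\,\text{Wednesdays} \\
&- \frac{\sum_{d=1}^{D} \left(\text{Load}_d \text{Thursday}_d - \hat{\beta}_0 \text{Thursday}_d\right)}{\sum_{d=1}^{D} \text{Thursday}_d^2}\,\text{Thursdays} \\
&- \frac{\sum_{d=1}^{D} \left(\text{Load}_d \text{Friday}_d - \hat{\beta}_0 \text{Friday}_d\right)}{\sum_{d=1}^{D} \text{Friday}_d^2}\,\text{Fridays} \\
&- \frac{\sum_{d=1}^{D} \left(\text{Load}_d \text{Saturday}_d - \hat{\beta}_0 \text{Saturday}_d\right)}{\sum_{d=1}^{D} \text{Saturday}_d^2}\,\text{Saturdays} \\
&- \frac{\sum_{d=1}^{D} \left(\text{Load}_d \text{Sunday}_d - \hat{\beta}_0 \text{Sunday}_d\right)}{\sum_{d=1}^{D} \text{Sunday}_d^2}\,\text{Sundays}
\end{aligned}
$$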
We can further simplify this expression by recognizing the summations in the denominators count the
number of days of each day-of-the-week. Cancelling out the number of days by day-of-the-week results
in:
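$$
\begin{aligned}
\beta_0 D = \sum_{d=1}^{D} \text{Load}_d
&- \sum_{d=1}^{D} \text{Load}_d \text{Monday}_d + \beta_0 \sum_{d=1}^{D} \text{Monday}_d
- \sum_{d=1}^{D} \text{Load}_d \text{Tuesday}_d + \beta_0 \sum_{d=1}^{D} \text{Tuesday}_d \\
&- \sum_{d=1}^{D} \text{Load}_d \text{Wednesday}_d + \beta_0 \sum_{d=1}^{D} \text{Wednesday}_d
- \sum_{d=1}^{D} \text{Load}_d \text{Thursday}_d + \beta_0 \sum_{d=1}^{D} \text{Thursday}_d \\
&- \sum_{d=1}^{D} \text{Load}_d \text{Friday}_d + \beta_0 \sum_{d=1}^{D} \text{Friday}_d
- \sum_{d=1}^{D} \text{Load}_d \text{Saturday}_d + \beta_0 \sum_{d=1}^{D} \text{Saturday}_d \\
&- \sum_{d=1}^{D} \text{Load}_d \text{Sunday}_d + \beta_0 \sum_{d=1}^{D} \text{Sunday}_d
\end{aligned}
$$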
Now collecting terms leads to:
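$$
\begin{aligned}
& \beta_0\left(D - \text{Mondays} - \text{Tuesdays} - \text{Wednesdays} - \text{Thursdays} - \text{Fridays} - \text{Saturdays} - \text{Sundays}\right) \\
& \quad = \sum_{d=1}^{D}\text{Load}_d\left(1 - \text{Monday}_d - \text{Tuesday}_d - \text{Wednesday}_d - \text{Thursday}_d - \text{Friday}_d - \text{Saturday}_d - \text{Sunday}_d\right)
\end{aligned}
$$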
One last rearrangement and we have the expression for the least squares estimate for the coefficient on
the intercept explanatory variable. However, we have a significant problem. The value of the total
number of days in the estimation period (D) exactly equals the sum of the total number of Mondays,
Tuesdays, Wednesdays, Thursdays, Fridays, Saturdays, and Sundays. In other words, the last step requires
dividing by zero, which is undefined.
The Problem of Multicollinearity. Where did things go wrong? Notice that to solve for the estimated
value for each of the parameters on the day-of-the-week binary variables, we need to know the estimated
value for the coefficient on the intercept variable. However, to solve for the value for the coefficient on
the intercept variable, we need to know the estimated values for the seven day-of-the-week coefficients.
This is a problem. Effectively, we have eight unknowns (the coefficients that need to be estimated) but
only seven independent equations. Although we have written down eight first-order conditions, one of
these eight equations can always be written as a combination of the other seven. With eight
unknowns and seven equations, there is no unique solution. This problem is referred to in the literature
as multicollinearity. It arose when we introduced the intercept term: for every observation (d) in the
estimation dataset, the intercept term is a linear combination of the day-of-the-week binary variables.
To see this, consider any seven consecutive days of data. If for each day you
sum the values of the seven day-of-the-week variables, you obtain 1.0, which is
exactly equal to the value of the intercept variable. In other words, the intercept variable is a linear
combination of the seven day-of-the-week binary variables.
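The linear dependence can be checked numerically. The following is a minimal sketch, assuming NumPy and a hypothetical 28-day estimation period that starts on a Monday; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical estimation period: 28 consecutive days starting on a Monday.
n_days = 28
day_of_week = np.arange(n_days) % 7           # 0 = Monday, ..., 6 = Sunday

# Seven day-of-the-week binary variables, one column per day type.
dow_binaries = np.eye(7)[day_of_week]         # shape (28, 7)
intercept = np.ones((n_days, 1))

# Design matrix with the intercept and all seven binaries: eight columns.
X_full = np.hstack([intercept, dow_binaries])

# For every day, the seven binaries sum to 1.0, the value of the intercept.
row_sums = dow_binaries.sum(axis=1)

# Only seven of the eight columns are linearly independent, so the
# least squares equations have no unique solution.
rank_full = np.linalg.matrix_rank(X_full)
print("columns:", X_full.shape[1], "rank:", rank_full)
```

A rank below the number of columns is the numerical signature of the "eight unknowns and seven equations" problem.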
In the pre-computer and early computer days, multicollinearity was the bane of forecast modelers
because they would do all this work only to find out there was no unique solution. With advanced
computing power, multicollinearity no longer prevents a solution from being computed because most least
squares solution algorithms are programmed to search for multicollinearity and take steps to remove one
or more explanatory variables until the problem goes away.
It would be better if the modeler understood the problem of multicollinearity and took steps to avoid the
problem in the first place. How would we avoid this issue? Consider dropping the Sunday binary variable
from the day-of-the-week regression equation. This gives us the following model:
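$$
\begin{aligned}
\text{Load}_d = {} & \beta_0\,\text{Intercept}_d + \beta_1\text{Monday}_d + \beta_2\text{Tuesday}_d + \beta_3\text{Wednesday}_d + \beta_4\text{Thursday}_d \\
& + \beta_5\text{Friday}_d + \beta_6\text{Saturday}_d + e_d
\end{aligned}
$$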
By dropping the Sunday binary variable, the intercept variable is no longer a linear combination of the
remaining day-of-the-week binary variables. Now let us see whether dropping the Sunday binary variable
has eliminated the problem of multicollinearity.
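To estimate the parameters for this model, we write the least squares optimization problem as follows:
$$
\begin{aligned}
L = \sum_{d=1}^{D}\big(&\text{Load}_d - \beta_0\,\text{Intercept}_d - \beta_1\text{Monday}_d - \beta_2\text{Tuesday}_d - \beta_3\text{Wednesday}_d \\
&- \beta_4\text{Thursday}_d - \beta_5\text{Friday}_d - \beta_6\text{Saturday}_d\big)^2
\end{aligned}
$$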
The first order equations look like:
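$$
\begin{aligned}
\frac{\partial L}{\partial \beta_0} = \sum_{d=1}^{D} 2\big(&\text{Load}_d - \beta_0\,\text{Intercept}_d - \beta_1\text{Monday}_d - \beta_2\text{Tuesday}_d - \beta_3\text{Wednesday}_d \\
&- \beta_4\text{Thursday}_d - \beta_5\text{Friday}_d - \beta_6\text{Saturday}_d\big)\left(-\text{Intercept}_d\right) = 0
\end{aligned}
$$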
After rearranging terms, we have:
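$$
\begin{aligned}
\beta_0\sum_{d=1}^{D} 1 = \sum_{d=1}^{D}\big(&\text{Load}_d - \beta_1\text{Monday}_d - \beta_2\text{Tuesday}_d - \beta_3\text{Wednesday}_d - \beta_4\text{Thursday}_d \\
&- \beta_5\text{Friday}_d - \beta_6\text{Saturday}_d\big)
\end{aligned}
$$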
Expanding gives us:
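$$
\begin{aligned}
D\beta_0 = {} & \sum_{d=1}^{D}\text{Load}_d - \beta_1\sum_{d=1}^{D}\text{Monday}_d - \beta_2\sum_{d=1}^{D}\text{Tuesday}_d - \beta_3\sum_{d=1}^{D}\text{Wednesday}_d \\
& - \beta_4\sum_{d=1}^{D}\text{Thursday}_d - \beta_5\sum_{d=1}^{D}\text{Friday}_d - \beta_6\sum_{d=1}^{D}\text{Saturday}_d
\end{aligned}
$$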
Plugging in the first-order derivatives for the six day-of-the-week binary variable coefficients leads to:
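$$
\begin{aligned}
D\beta_0 = {} & \sum_{d=1}^{D}\text{Load}_d
 - \text{Mondays}\,\frac{\sum_{d=1}^{D}\left(\text{Load}_d\,\text{Monday}_d - \hat{\beta}_0\,\text{Monday}_d\right)}{\sum_{d=1}^{D}\text{Monday}_d^2} \\
& - \text{Tuesdays}\,\frac{\sum_{d=1}^{D}\left(\text{Load}_d\,\text{Tuesday}_d - \hat{\beta}_0\,\text{Tuesday}_d\right)}{\sum_{d=1}^{D}\text{Tuesday}_d^2} \\
& - \text{Wednesdays}\,\frac{\sum_{d=1}^{D}\left(\text{Load}_d\,\text{Wednesday}_d - \hat{\beta}_0\,\text{Wednesday}_d\right)}{\sum_{d=1}^{D}\text{Wednesday}_d^2} \\
& - \text{Thursdays}\,\frac{\sum_{d=1}^{D}\left(\text{Load}_d\,\text{Thursday}_d - \hat{\beta}_0\,\text{Thursday}_d\right)}{\sum_{d=1}^{D}\text{Thursday}_d^2} \\
& - \text{Fridays}\,\frac{\sum_{d=1}^{D}\left(\text{Load}_d\,\text{Friday}_d - \hat{\beta}_0\,\text{Friday}_d\right)}{\sum_{d=1}^{D}\text{Friday}_d^2} \\
& - \text{Saturdays}\,\frac{\sum_{d=1}^{D}\left(\text{Load}_d\,\text{Saturday}_d - \hat{\beta}_0\,\text{Saturday}_d\right)}{\sum_{d=1}^{D}\text{Saturday}_d^2}
\end{aligned}
$$
Cancelling the day counts against the denominators and collecting the terms in β0 gives:
$$
\begin{aligned}
& \beta_0\left(D - \text{Mondays} - \text{Tuesdays} - \text{Wednesdays} - \text{Thursdays} - \text{Fridays} - \text{Saturdays}\right) \\
& \quad = \sum_{d=1}^{D}\text{Load}_d - \sum_{d=1}^{D}\text{Load}_d\,\text{Monday}_d - \sum_{d=1}^{D}\text{Load}_d\,\text{Tuesday}_d \\
& \qquad - \sum_{d=1}^{D}\text{Load}_d\,\text{Wednesday}_d - \sum_{d=1}^{D}\text{Load}_d\,\text{Thursday}_d - \sum_{d=1}^{D}\text{Load}_d\,\text{Friday}_d \\
& \qquad - \sum_{d=1}^{D}\text{Load}_d\,\text{Saturday}_d
\end{aligned}
$$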
If we let
$$\text{Sundays} = \left(D - \text{Mondays} - \text{Tuesdays} - \text{Wednesdays} - \text{Thursdays} - \text{Fridays} - \text{Saturdays}\right)$$
We can write the final solution for the estimated coefficient on the intercept variable as:
$$\hat{\beta}_0 = \frac{\sum_{d=1}^{D}\text{Load}_d\,\text{Sunday}_d}{\text{Sundays}}$$
In this case, the estimated coefficient on the intercept variable equals the average load for the day-of-
the-week that was left out of the original regression equation, specifically Sunday.
The estimated coefficients for the six day-of-the-week binary variables are found by setting their
respective first order derivatives equal to 0. For the Monday binary variable, the calculations are:
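$$
\begin{aligned}
\frac{\partial L}{\partial \beta_1} = \sum_{d=1}^{D} 2\big(&\text{Load}_d - \beta_0\,\text{Intercept}_d - \beta_1\text{Monday}_d - \beta_2\text{Tuesday}_d - \beta_3\text{Wednesday}_d \\
&- \beta_4\text{Thursday}_d - \beta_5\text{Friday}_d - \beta_6\text{Saturday}_d\big)\left(-\text{Monday}_d\right) = 0
\end{aligned}
$$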
This can be simplified to:
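$$\frac{\partial L}{\partial \beta_1} = \sum_{d=1}^{D} 2\left(-\text{Load}_d\,\text{Monday}_d + \beta_0\,\text{Intercept}_d\,\text{Monday}_d + \beta_1\text{Monday}_d^2\right) = 0$$
$$\beta_1\sum_{d=1}^{D}\text{Monday}_d^2 = \sum_{d=1}^{D}\text{Monday}_d\left(\text{Load}_d - \hat{\beta}_0\right)$$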
The estimated value for the coefficient on the Monday binary variable is then expressed as:
$$\hat{\beta}_1 = \frac{\sum_{d=1}^{D}\text{Monday}_d\left(\text{Load}_d - \hat{\beta}_0\right)}{\text{Mondays}}$$
Recall, the estimated coefficient on the intercept variable was equal to the average load for the day-of-
the-week variable that was left out of the equation, in this case Sunday. The estimated coefficient on the
Monday binary variable is the average difference between the Monday loads and the average Sunday
load.
In the same fashion, the estimated coefficients for the remaining day-of-the-week binary variables are
expressed as:
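$$
\begin{aligned}
\hat{\beta}_2 &= \frac{\sum_{d=1}^{D}\text{Tuesday}_d\left(\text{Load}_d - \hat{\beta}_0\right)}{\text{Tuesdays}} \\
\hat{\beta}_3 &= \frac{\sum_{d=1}^{D}\text{Wednesday}_d\left(\text{Load}_d - \hat{\beta}_0\right)}{\text{Wednesdays}} \\
\hat{\beta}_4 &= \frac{\sum_{d=1}^{D}\text{Thursday}_d\left(\text{Load}_d - \hat{\beta}_0\right)}{\text{Thursdays}} \\
\hat{\beta}_5 &= \frac{\sum_{d=1}^{D}\text{Friday}_d\left(\text{Load}_d - \hat{\beta}_0\right)}{\text{Fridays}} \\
\hat{\beta}_6 &= \frac{\sum_{d=1}^{D}\text{Saturday}_d\left(\text{Load}_d - \hat{\beta}_0\right)}{\text{Saturdays}}
\end{aligned}
$$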
The estimated coefficients on the day-of-the-week binary variables capture the average load on a
Monday, Tuesday, Wednesday, Thursday, Friday, and Saturday that is not already accounted for by the
intercept variable. In this way, the intercept variable acts like an anchor. It describes the base behavior
and the day-of-the-week binary variables describe how the base behavior changes by day-of-the-week.
In general, intercept variables are included in models to exploit this anchoring effect. The intercept
variable captures behavior that is not accounted for by all the other variables included in the model.
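The anchoring result can be confirmed with a small numerical experiment. This is a sketch under stated assumptions: the daily loads are synthetic, the eight-week period starting on a Monday is hypothetical, and NumPy's least squares routine stands in for whatever estimation software is actually used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily loads (MW) for eight weeks starting on a Monday.
n_days = 56
day_of_week = np.arange(n_days) % 7             # 0 = Monday, ..., 6 = Sunday
typical_load = np.array([1000.0, 1020.0, 1030.0, 1025.0, 1010.0, 900.0, 850.0])
load = typical_load[day_of_week] + rng.normal(0.0, 20.0, n_days)

# Intercept plus Monday through Saturday binaries; Sunday is left out.
binaries = np.eye(7)[day_of_week][:, :6]
X = np.hstack([np.ones((n_days, 1)), binaries])

beta_hat, *_ = np.linalg.lstsq(X, load, rcond=None)

# Average load by day of the week, for comparison with the coefficients.
mean_by_day = np.array([load[day_of_week == k].mean() for k in range(7)])

print("intercept:", beta_hat[0], "average Sunday load:", mean_by_day[6])
print("Monday coefficient:", beta_hat[1],
      "Monday minus Sunday average:", mean_by_day[0] - mean_by_day[6])
```

The intercept reproduces the average Sunday load, and each remaining coefficient reproduces the gap between that day's average load and the Sunday average.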
In the above example, we demonstrate that a solution to the multicollinearity problem was to remove
one of the day-of-the-week binary variables. It does not matter which binary variable is removed. Any
one of them will ensure that the intercept variable is not a linear combination of the remaining day-of-
the-week binary variables. Which day-of-the-week binary variable is left out is a matter of preference. In
general, hourly loads tend to be lowest on Sundays. By leaving Sunday out, the estimated coefficients on
the remaining binary variables will tend to be positive. This does not change the forecast performance of
the model, but it does make the estimated coefficients look “pretty”. If optics are important, then drop
out the day-of-the-week with the lowest average load.
Does multicollinearity show up in other cases? The most common cases of multicollinearity arise when
day-of-the-week, monthly, and seasonal binary variables are used. In all three cases, the solution is the
same. Drop one binary from each of the day-of-the-week, month, and season sets and leave the intercept variable
in the model. Also, if you had a model with no intercept variable, but all seven day-of-the-week and all
12 monthly binary variables, you will have an issue with multicollinearity because the sum of the seven
day-of-the-week binary variables will equal the sum of the twelve monthly binary variables. A good habit
to follow is to always remove one of the day-of-the-week, one of the month, and one of the season binary
variables from the model specification.
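The no-intercept case can be checked the same way. The sketch below assumes a hypothetical 365-day, non-leap-year estimation period whose first day is a Monday; the variable names are illustrative.

```python
import numpy as np

# Hypothetical estimation period: one non-leap year, day 0 is a Monday.
n_days = 365
day_of_week = np.arange(n_days) % 7
days_in_month = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
month = np.repeat(np.arange(12), days_in_month)

dow_binaries = np.eye(7)[day_of_week]       # 7 columns
month_binaries = np.eye(12)[month]          # 12 columns

# No intercept, but both complete sets of binaries: 19 columns.
X_all = np.hstack([dow_binaries, month_binaries])
rank_all = np.linalg.matrix_rank(X_all)

# Intercept, six day-of-the-week binaries, eleven monthly binaries: 18 columns.
X_ok = np.hstack([np.ones((n_days, 1)),
                  dow_binaries[:, :6], month_binaries[:, :11]])
rank_ok = np.linalg.matrix_rank(X_ok)

print("all binaries:", X_all.shape[1], "columns, rank", rank_all)
print("one dropped from each set:", X_ok.shape[1], "columns, rank", rank_ok)
```

The first specification is one rank short because the seven day-of-the-week columns and the twelve monthly columns both sum to a column of ones; the second has full column rank.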
Can linear regressions really be non-linear in the explanatory variables? Just to prove the point that
linear regressions can be used to address the non-linear response between loads and weather, we will
solve for the parameters for the following equation.
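$$
\begin{aligned}
\text{Load}_d = {} & \beta_0 + \beta_1\text{Weekend}_d + \beta_2 T_d + \beta_3 T_d^2 + \beta_4 T_d^3 + \beta_5\,\text{Weekend}_d \times T_d + \beta_6\,\text{Weekend}_d \times T_d^2 \\
& + \beta_7\,\text{Weekend}_d \times T_d^3 + e_d
\end{aligned}
$$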
Here,
$\text{Weekend}_d$ is a binary variable that takes on a value of 1.0 if the day (d) is a non-working day,
otherwise 0
$T_d$ is the average temperature for day (d)
$T_d^2$ is the average temperature for day (d) squared
$T_d^3$ is the average temperature for day (d) cubed
$\text{Weekend}_d \times T_d$ is the average temperature for day (d) interacted with (i.e., multiplied by) the
weekend binary variable
$\text{Weekend}_d \times T_d^2$ is the average temperature for day (d) squared interacted with (i.e., multiplied
by) the weekend binary variable
$\text{Weekend}_d \times T_d^3$ is the average temperature for day (d) cubed interacted with (i.e., multiplied by)
the weekend binary variable
In this case, the model is nonlinear in temperature, but is linear in the model parameters. The regression
estimates for Commonwealth Edison and Dayton Power & Light are shown in Figure 4-26 and Figure 4-27.
FIGURE 4-26. ESTIMATED NONLINEAR WEATHER RESPONSE FUNCTION FOR COMMONWEALTH EDISON
FIGURE 4-27. ESTIMATED NONLINEAR WEATHER RESPONSE FUNCTION FOR DAYTON POWER & LIGHT
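To make the specification concrete, the following sketch builds the eight explanatory variables and estimates them by ordinary least squares. Everything about the data is an assumption for illustration: the temperatures and the U-shaped load response are synthetic, and temperature is centered and scaled before taking powers (a numerical convenience that re-expresses the same cubic rather than changing the model).

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic daily data: average temperature and a weekend binary variable.
n_days = 730
temp = rng.uniform(0.0, 95.0, n_days)
weekend = (np.arange(n_days) % 7 >= 5).astype(float)

# Synthetic U-shaped load response in MW (for illustration only).
load = (1500.0 + 0.4 * (temp - 60.0) ** 2 - 150.0 * weekend
        + rng.normal(0.0, 25.0, n_days))

# Center and scale temperature so its powers stay well conditioned.
t = (temp - 60.0) / 30.0

# Intercept, weekend, T, T^2, T^3 and the three weekend interactions.
X = np.column_stack([np.ones(n_days), weekend, t, t**2, t**3,
                     weekend * t, weekend * t**2, weekend * t**3])

# The model is nonlinear in temperature but linear in the eight parameters,
# so ordinary least squares applies directly.
beta_hat, *_ = np.linalg.lstsq(X, load, rcond=None)

fitted = X @ beta_hat
r_squared = 1.0 - np.sum((load - fitted) ** 2) / np.sum((load - load.mean()) ** 2)
print("R-squared:", round(r_squared, 4))
```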
4.3.2 Neural Network Models
At the heart, or perhaps at the brain, of neural network research has been the goal of building a model,
software, and hardware that could mimic the function of a brain. While work in this area ebbed and
flowed from the late 1940s through the early 1980s, it was not until the late 1980s that neural networks
gained traction across many industries. Today neural network models are employed for a wide range of
classification, pattern recognition and forecasting problems. This section presents an overview of neural
network models and their application to short-term load forecasting.
We begin the discussion on neural network models with a decision framework. Consider the forecast
problem: Is today going to be a peak load day? Since there are only two answers to this question, either
Yes or No, it can be characterized as a classification problem. An example of a decision framework that
can be applied to answer this question is presented in Figure 4-28. Reading from left to right, we start
with three inputs to the decision: (a) the forecasted average temperature for the day, (b) a binary variable
that indicates whether the forecast day lands on a Saturday or Sunday, and (c) a second binary variable
that indicates whether the forecast day is a holiday or not. Each of these inputs is assigned a weight and
then the weighted sum (βX) is computed. This weighted sum (βX) is then passed to a decision rule where
the weighted sum is compared to a threshold value (T). The decision rule is based on the following test:
(βX ≥ T). If the test is true, then the rule returns a 1.0 indicating it is a peak day. If the test is false, the
decision rule returns a 0.0 indicating it is not a peak day.
FIGURE 4-28. BASIC ELEMENTS OF A FEED FORWARD NEURAL NETWORK USED FOR CLASSIFICATION
Shown in Figure 4-29 is this decision framework populated with sample data. Here, the average
temperature forecast is 67, the forecast day is a weekend, and no holiday is in effect on this day. The
weight applied to the temperature forecast is 5 MW/degree. The weight applied to the weekend binary
variable is -1 MW. The weight on the holiday binary variable is -2 MW. The weighted sum is computed
as:
βX = (67 × 5) + (1 × −1) + (0 × −2) = 334
With a threshold value of (T=300), the decision rule returns a value of 1.0.
FIGURE 4-29. AN EXAMPLE OF THE FEED FORWARD CALCULATIONS
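The feed forward calculation in this example can be written in a few lines. The sketch uses the inputs, weights, and threshold quoted in the text; the function name `step_activation` is an illustrative choice.

```python
import numpy as np

def step_activation(weighted_sum, threshold):
    """Return 1.0 (peak day) if the weighted sum reaches the threshold, else 0.0."""
    return 1.0 if weighted_sum >= threshold else 0.0

# Inputs from the example: temperature forecast, weekend flag, holiday flag.
inputs = np.array([67.0, 1.0, 0.0])
# Weights from the example: 5 MW/degree, -1 MW, -2 MW.
weights = np.array([5.0, -1.0, -2.0])
threshold = 300.0

weighted_sum = float(inputs @ weights)            # (67 x 5) + (1 x -1) + (0 x -2)
peak_day = step_activation(weighted_sum, threshold)

print(weighted_sum, peak_day)
```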
The decision rule presented above is called a step activation function. The purpose of an activation
function is to convert an input signal (e.g., βX) to an output signal (e.g., 1 or 0). It should be noted that if
the activation function was described as taking the input signal and simply multiplying it by a numeric
value (e.g., 1.0 x βX) to derive the output signal, we would have something very close to a linear regression
model, with the caveat that the output signal would be a MW value and not a categorical value. The fact
that the activation function falls into a class of nonlinear functions is what distinguishes neural network
models from linear regression models. A nonlinear activation function does not eliminate the hard part
of modeling, which is determining the list of inputs (e.g., average temperature, weekend binary variable,
holiday binary variable). The model developer, not the neural network, determines the list of explanatory
variables that are used.
The step activation function used in the prior examples is called out in Figure 4-30. Graphically, the
function is 0 up to the threshold and then 1 thereafter. The challenge with step functions is that they are not very
easy to work with because they are discontinuous and non-differentiable. The big breakthrough with
neural network computing came when the step activation function was replaced with a sigmoid activation
function as shown in Figure 4-31. The sigmoid function provided the same flavor of producing a switch
based on a threshold, but with a continuous and differentiable functional form. This meant optimization
algorithms that rely on taking derivatives of the activation function could be utilized. This set the stage
for handling large scale problems.
FIGURE 4-30. STEP ACTIVATION FUNCTION
FIGURE 4-31. REPLACING THE STEP ACTIVATION FUNCTION WITH A SIGMOID ACTIVATION FUNCTION
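The contrast between the two activation functions can be sketched as follows. The logistic form 1/(1 + exp(-z)) is assumed here as the sigmoid; the figures may use a different scaling (a hyperbolic tangent form, for example, ranges from -1 to 1). The input z is taken to be the weighted sum measured relative to the threshold.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: a smooth switch centered on z = 0 (assumed form)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """Closed-form derivative, available because the sigmoid is differentiable."""
    s = sigmoid(z)
    return s * (1.0 - s)

# Weighted sum relative to the threshold, so the switch point is z = 0.
z = np.linspace(-6.0, 6.0, 13)

step_output = (z >= 0.0).astype(float)    # jumps from 0 to 1 at the threshold
smooth_output = sigmoid(z)                # rises continuously through 0.5

# A finite-difference check of the derivative that optimizers rely on.
h = 1e-6
numeric_derivative = (sigmoid(z + h) - sigmoid(z - h)) / (2.0 * h)

print(np.round(smooth_output, 3))
```

The closed-form derivative is what lets gradient-based optimization algorithms adjust the weights, which the step function does not allow.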
Replacing the step activation function with the sigmoid activation function leads to the feed forward
neural network shown in Figure 4-32. The term feed forward describes how the input data feeds forward
through the weighting to the activation function. The activation function in turn feeds forward an output
signal which in this case is a number ranging between -1 and 1.
FIGURE 4-32. FEED FORWARD NEURAL NETWORK WITH SIGMOID ACTIVATION FUNCTION
The next step in the evolution of our description of a feed forward neural network is to collapse the
weighting of the inputs into the sigmoid activation function, which is shown in Figure 4-33.
FIGURE 4-33. COLLAPSING THE SUM FUNCTION WITH THE SIGMOID ACTIVATION FUNCTION
At this point we can introduce some concepts that are unique to the neural network modeling paradigm.
In Figure 4-34 we introduce the input layer, the hidden layer, and the output layer. The input layer
contains the set of explanatory variables that are driving the load forecast. This is equivalent to the set of
explanatory variables included in a regression model. The hidden layer serves two purposes. First, it
weights the explanatory variables. Second, it transforms the weighted sum into a predicted value on a
scale of -1 to 1. The output layer contains the forecast values. In this case, there is one output in the
output layer.
FIGURE 4-34. THE INPUT, HIDDEN AND OUTPUT LAYERS OF A NEURAL NETWORK MODEL
It turns out that there can be more than one sigmoid activation function, or node, contained in the
hidden layer. The figure shows one node in the hidden layer. Further, each node can be fed by a
different set of explanatory variables contained in the input layer, as well as have a different functional
form for the activation function. Finally, the predicted values from the hidden layer can be transformed
by another set of weights to produce a forecast of loads. This final iteration of a feed forward neural
network with multiple sigmoid activation functions all leading to a load forecast is commonly used for
short-term load forecasting. This is depicted in Figure 4-35.
Variations on this base feed forward neural network would include multiple layers of nodes in the hidden
layer. In this case, the output of one or more nodes feed forward to another set of nodes, which then
feed forward either to another layer of nodes or to the output layer. Neural network models that have
multiple layers are referred to as multi-layer neural networks. Finally, other continuous and differentiable
functional forms can be used instead of the sigmoid function.
FIGURE 4-35. MULTIPLE NODES IN THE HIDDEN LAYER AND THE REGRESSION MODEL EQUIVALENT
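A forward pass through a network of this shape can be sketched as below. All weights are hypothetical values chosen so the response is U-shaped in temperature; they are not estimates from any utility's data, and the logistic form of the sigmoid is an assumption.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid (assumed form of the activation function)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights chosen for illustration, not estimated from data.
HIDDEN_WEIGHTS = np.array([
    [7.0, -0.15, 0.2],      # node 1: output falls as temperature rises (heating)
    [-11.0, 0.15, -0.2],    # node 2: output rises with temperature (cooling)
])
OUTPUT_WEIGHTS = np.array([400.0, 600.0])   # MW contributed by each node
LINEAR_WEIGHTS = np.array([900.0, -80.0])   # base load and weekend shift (MW)

def forecast_load(temp, weekend):
    inputs = np.array([1.0, temp, weekend])          # input layer plus a constant
    hidden = sigmoid(HIDDEN_WEIGHTS @ inputs)        # hidden layer: two node outputs
    linear = LINEAR_WEIGHTS @ np.array([1.0, weekend])
    return float(linear + OUTPUT_WEIGHTS @ hidden)   # output layer: load in MW

cold, mild, hot = (forecast_load(t, 0.0) for t in (20.0, 60.0, 90.0))
print(round(cold), round(mild), round(hot))
```

With these weights the weekday forecast is higher on cold and hot days than on mild days, which is the shape the two nodes are meant to capture.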
Some useful properties of the sigmoid function. In addition to being very easy to work with, the sigmoid
function offers some very useful modeling features.
First, sigmoid functions working together can approximate continuous nonlinear functions, and as few as
two can trace out a U-shaped response. The nonlinear function that is of interest in load forecasting is the nonlinear response between
loads and weather. Like a third order polynomial function in temperature, a simple neural network with
temperature included on two sigmoid activation functions will do a very good job in approximating the
nonlinear load response to weather. The estimated weather response functions for Commonwealth
Edison and Dayton Power & Light are shown in Figure 4-36 and Figure 4-37. Here, we are plotting the
predicted value from a single layer neural network with two sigmoid activation functions that include as
explanatory variables average temperature and a weekend binary variable. A third linear activation
function includes the weekend binary variable, which allows the intercept to be different between
weekdays and weekend days.
FIGURE 4-36. ESTIMATED NONLINEAR WEATHER RESPONSE FUNCTION FOR COMMONWEALTH EDISON
FIGURE 4-37. ESTIMATED NONLINEAR WEATHER RESPONSE FUNCTION FOR DAYTON POWER & LIGHT
Second, the sigmoid function will automatically interact with the explanatory variables included in the
function. To see this, recall the following properties of the exponential function:
$$e^{\beta_1 \text{Temperature} + \beta_2 \text{Weekend}} = e^{\beta_1 \text{Temperature}}\, e^{\beta_2 \text{Weekend}}$$
On the left-hand side we have included (βX), which in this case is β1 Temperature + β2 Weekend. The
right-hand side expression is mathematically equivalent. If the parameters (β1 , β2) are non-zero, then the
temperature and weekend explanatory variables interact or multiply each other. This means as a modeler
you enter in temperature and whatever you want temperature to interact with into the sigmoid activation
function. The comparable linear regression specification would look like:
$$\beta_1 \text{Temperature} + \beta_2 \text{Weekend} + \beta_3\, \text{Temperature} \times \text{Weekend}$$
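A short numerical check illustrates the automatic interaction. The sketch assumes a logistic sigmoid and hypothetical weights, including a constant term `b0` that is not part of the expression above; it shows that the node's sensitivity to temperature differs between weekdays and weekends even though no explicit interaction term was entered.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights for illustration only (b0 is an added constant).
b0, b1, b2 = -7.0, 0.1, -1.0

def temperature_slope(temp, weekend):
    """Derivative of the node output with respect to temperature."""
    s = sigmoid(b0 + b1 * temp + b2 * weekend)
    return b1 * s * (1.0 - s)

slope_weekday = temperature_slope(80.0, 0.0)
slope_weekend = temperature_slope(80.0, 1.0)

# The exponential property behind the interaction.
lhs = np.exp(b1 * 80.0 + b2 * 1.0)
rhs = np.exp(b1 * 80.0) * np.exp(b2 * 1.0)

print(slope_weekday, slope_weekend)
```

In a linear regression without the β3 term, the temperature slope would be β1 on both day types.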
4.3.3 Support Vector Regression
The support vector machine is a classification method designed to handle the case where the dependent
variable is categorical.16 Support vector regression requires a few tweaks of the support vector machine
framework to allow for a dependent variable that is continuous and not categorical. Load forecasting falls
within the class of support vector regression. At the heart of a support vector regression is a forecast
equation that defines the relationship between the target variable (e.g., load) and a set of factors
(explanatory variables). The forecast equation can be written generally as:
$$\text{Load}_d = \beta X_d + e_d$$
Here,
$\text{Load}_d$ is the load for day (d)
$X_d$ is a vector of (K) features or explanatory variables for day (d)
$\beta$ is a vector of (K) weights or parameters for the features or explanatory variables
$e_d$ is the forecast error for day (d)
This equation should look familiar since it is your run-of-the-mill load forecast equation. The main
difference between support vector regression and linear regression is the optimization problem that is
used to estimate the model parameters (β). Linear regression solves the optimization problem of finding
the parameters that minimize the sum of the squared errors. This is written generally as:
$$\text{Minimize w.r.t. } \beta: \quad \sum_{d=1}^{D}\left(\text{Load}_d - \beta X_d\right)^2$$
To find the solution to this quadratic programming problem, all we need to do is compute the derivative
of the above objective function with respect to (w.r.t.) each parameter and set the results equal to 0.
Generally, the solution can be written as:
$$\hat{\beta} = (X'X)^{-1}X'\,\text{Load}$$
where $\hat{\beta}$ is a vector of (K) estimated model parameters.
16 The concepts presented here come from a variety of sources. The machine learning lectures from MIT, Georgia
Tech, and UC Irvine found on www.youtube.com are the most useful.
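The closed-form least squares solution can be illustrated as follows. This is a sketch on synthetic data, assuming NumPy; it solves the normal equations directly rather than forming the inverse, and confirms that the derivatives of the objective are zero at the solution.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data: an intercept and two hypothetical explanatory variables.
n_days = 200
X = np.column_stack([np.ones(n_days), rng.normal(size=(n_days, 2))])
true_beta = np.array([1000.0, 50.0, -20.0])
load = X @ true_beta + rng.normal(0.0, 10.0, n_days)

# Normal equations (X'X) beta = X'Load, solved without an explicit inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ load)

# The same estimate from NumPy's least squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, load, rcond=None)

# Derivative of the sum of squared errors with respect to each parameter.
gradient = -2.0 * X.T @ (load - X @ beta_hat)

print(np.round(beta_hat, 2))
```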
Support vector regression takes a different path to solving for the parameter values. This path introduces
two new concepts to the problem. The first is regularization which is based on the idea that an over
specified regression model could overfit the sample data and render the resulting estimated model
useless for forecasting purposes. In the world of machine learning with large datasets the potential for
overfitting is high. In load forecasting applications, the problem arises in the case where one or more data
outliers cause large residuals. Since the objective of the regression is to find parameters that minimize
the sum of squared errors, the estimated model parameters will be skewed toward reducing the errors
associated with the outliers. This shows up as one or more of the parameters having larger (in absolute
value) numerical values than what would be the case if the outliers did not exist. To avoid this problem,
the least squares objective function is modified to include a penalty on large parameter estimates. The
revised objective function is called ridge regression. Formally, we have:
$$\text{Minimize w.r.t. } \beta: \quad \sum_{d=1}^{D}\left(\text{Load}_d - \beta X_d\right)^2 + C\sum_{k=2}^{K}\beta_k^2$$
Here, C is the cost associated with large parameter values. The cost is summed over all parameters except
for the parameter associated with the intercept term (k=1). The effect of the cost term is to shrink in
absolute size the estimated parameters. With large data sets, this cost function provides insurance
against possible overfitting.
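The shrinkage effect can be seen in a small example. The sketch below uses the closed-form ridge solution with the intercept left unpenalized; the data, the single outlier, the cost value of 25, and the function name `ridge_fit` are all hypothetical.

```python
import numpy as np

def ridge_fit(X, load, cost):
    """Minimize SSE + cost * sum of squared slopes; column 0 is the intercept."""
    penalty = np.eye(X.shape[1])
    penalty[0, 0] = 0.0                     # the intercept (k=1) is not penalized
    return np.linalg.solve(X.T @ X + cost * penalty, X.T @ load)

rng = np.random.default_rng(3)

# Synthetic data with two correlated explanatory variables.
n_days = 60
temp_dev = rng.normal(0.0, 1.0, n_days)
humidity_dev = 0.9 * temp_dev + rng.normal(0.0, 0.3, n_days)
X = np.column_stack([np.ones(n_days), temp_dev, humidity_dev])
load = (1000.0 + 40.0 * temp_dev + 10.0 * humidity_dev
        + rng.normal(0.0, 15.0, n_days))
load[5] += 400.0                            # one hypothetical data outlier

beta_ols = ridge_fit(X, load, cost=0.0)     # cost of zero is ordinary least squares
beta_ridge = ridge_fit(X, load, cost=25.0)

slope_norm_ols = np.linalg.norm(beta_ols[1:])
slope_norm_ridge = np.linalg.norm(beta_ridge[1:])
print(slope_norm_ols, slope_norm_ridge)
```

The penalized slopes are smaller in overall magnitude than the least squares slopes; how much smaller depends on the chosen cost.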
Another way of thinking about shrinking the parameters is to define the following objective function.
$$\text{Minimize w.r.t. } \beta: \quad \frac{1}{2}\sum_{k=1}^{K}\beta_k^2$$
Here, we are seeking the value of the parameters that minimize the sum of the squared parameters.
Alternatively, you can think of the objective function as minimizing the square of the Euclidean norm or
length of the parameter vector. To avoid the obvious solution of setting all the parameters equal to 0, we
need to constrain the problem in some way. We do this by defining the following two constraints.
Constraint 1: $\text{Load}_d - \beta X_d \le \epsilon$
Constraint 2: $\beta X_d - \text{Load}_d \le \epsilon$
Here, $\epsilon$ is an acceptable error margin that forces the values for the parameters to be non-zero. In load
forecasting applications, $\epsilon$ would be set equal to an acceptable forecast error (e.g., 250 MW). Now the
optimization problem is to find parameters such that the resulting forecast equation predicts loads within
an acceptable forecast error tolerance.
This leads to the concept of constrained optimization. Previously, our concept of acceptable forecast
error tolerance was defined by minimizing the sum of the squared forecast errors. We have now transformed
the problem into finding the smallest parameters possible that ensure the resulting forecast errors are
acceptable. The solution to Constraints 1 and 2 forms the support vectors around our notion of acceptable
forecast error. Further, the data points that lead to predicted errors that are less than $\epsilon$ do not matter.
What matters are the data points where the constraints are exactly equal to the margin. This means
the estimated parameters are determined by a handful of the data observations, thus reducing the
potential for overfitting.
To be fair, some additional parameters need to be added to handle the case where there is no feasible
set of parameters that meet Constraint 1 and 2. Here, we introduce the concept of a soft margin as
follows:
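$$\text{Minimize w.r.t. } \beta: \quad \frac{1}{2}\|\beta\|^2 + C\sum_{d=1}^{D}\left(\tau_d + \tau'_d\right)$$
Subject to
Constraint 1: $\text{Load}_d - \beta X_d \le \epsilon + \tau_d$
Constraint 2: $\beta X_d - \text{Load}_d \le \epsilon + \tau'_d$
Constraint 3: $\tau_d \ge 0, \ \forall\, d$
Constraint 4: $\tau'_d \ge 0, \ \forall\, d$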
Here, $\tau_d$ and $\tau'_d$ are slack parameters that allow the margin, $\epsilon$, not to bind for every observation. By imposing
a cost (C) on these parameters, we ensure their values do not get so big as to render Constraints 1
and 2 meaningless. Otherwise, the slack parameters would be set wide enough to capture all observations
including data outliers. This would render the resulting forecast solution almost useless.
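To show how the slack parameters enter the objective, the sketch below evaluates the soft margin objective for a given parameter vector. It does not solve the optimization problem; for a fixed β it uses the smallest slack values that satisfy Constraints 1 through 4. The data, the margin of 25 MW, and the function name `svr_objective` are hypothetical.

```python
import numpy as np

def svr_objective(beta, X, load, epsilon, cost):
    """Soft margin objective at a given beta, with the smallest feasible slacks."""
    residual = load - X @ beta
    tau = np.maximum(0.0, residual - epsilon)          # slack on Constraint 1
    tau_prime = np.maximum(0.0, -residual - epsilon)   # slack on Constraint 2
    objective = 0.5 * float(beta @ beta) + cost * float(np.sum(tau + tau_prime))
    return objective, tau, tau_prime

# Hypothetical data: one explanatory variable, with an outlier on the last day.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
load = np.array([100.0, 210.0, 290.0, 800.0])
beta = np.array([100.0])

objective, tau, tau_prime = svr_objective(beta, X, load, epsilon=25.0, cost=1.0)
print(objective, tau, tau_prime)
```

Only the outlier falls outside the margin, so it is the only observation that contributes a slack cost; the three days with errors inside the margin contribute nothing.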
There is an interesting feature of support vector machines that has to do with addressing the possibility
that a nonlinear relationship exists between loads and one or more of the features or explanatory
variables. It turns out that there is a set of functions, called kernel functions, that can be used to capture
these nonlinearities without having to go through the tedious task of keying up the nonlinearities (e.g.,
temperature, temperature squared, temperature cubed). Further, the use of these kernel functions
reduces the number of calculations that are needed to estimate the parameters. In the world of load
forecasting this may not sound like a big-time saver. However, in the world of pattern recognition where
there could be thousands of interactions, any trick that will save on computer calculations is a very big
deal. Having a function do all the work is extremely useful. The most common kernel functions are:
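Polynomial Kernel: $K(x_i, x_j) = (x_i' x_j + c)^d$ where c and d are constants
Radial Basis Function Kernel: $K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$ where $\sigma^2$ is a parameter
Sigmoid Kernel: $K(x_i, x_j) = \tanh(c\,x_i' x_j + h)$ where c and h are parameters
Here, $x_i$ and $x_j$ are the rows of X (the explanatory variables) for two observations.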
The kernel functions take a matrix of explanatory variables (X) as an input. These raw variables are then
used to create other variables through nonlinear transformations of the raw data. The nonlinearity aspect
of the kernel functions is often cited as the reason support vector regression is well suited for load
forecasting which has an obvious nonlinear response between loads and weather.
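The three kernels can be sketched as small functions. The sketch uses the standard two-argument form, in which a kernel compares the explanatory variables for two observations; the parameter defaults and the feature map `phi` are illustrative assumptions. The final lines check that a degree-two polynomial kernel on a single scaled temperature value equals an ordinary inner product of explicit polynomial features.

```python
import numpy as np

def polynomial_kernel(x_i, x_j, c=1.0, d=3):
    return (x_i @ x_j + c) ** d

def rbf_kernel(x_i, x_j, sigma2=1.0):
    diff = x_i - x_j
    return np.exp(-(diff @ diff) / (2.0 * sigma2))

def sigmoid_kernel(x_i, x_j, c=1.0, h=0.0):
    return np.tanh(c * (x_i @ x_j) + h)

def phi(t):
    """Explicit features matching the polynomial kernel with c=1, d=2."""
    return np.array([1.0, np.sqrt(2.0) * t, t ** 2])

# Two hypothetical observations of scaled temperature.
t_i, t_j = 0.7, 0.9
x_i, x_j = np.array([t_i]), np.array([t_j])

kernel_value = polynomial_kernel(x_i, x_j, c=1.0, d=2)
explicit_value = phi(t_i) @ phi(t_j)

print(kernel_value, explicit_value, rbf_kernel(x_i, x_j), sigmoid_kernel(x_i, x_j))
```

The kernel reproduces the temperature and temperature-squared terms without the modeler having to construct them.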
Consider the polynomial kernel function. We have shown earlier that it is plausible to capture the nonlinear
weather response in a regression by including weather, weather squared, and weather cubed. Both the
polynomial kernel and the polynomial regression are designed to address the problem in a very similar
way. In effect, the support vector regression when the polynomial kernel function is applied looks like a
constrained version of a polynomial regression.
Now consider the sigmoid kernel. This in fact describes one node in a neural network. If weather is
included in the set of explanatory variables, then the support vector regression that utilizes a sigmoid
kernel is, in effect, a constrained version of a neural network model.
What do we gain with support vector regression over linear regression and neural networks? First
consider forecasting a large system load. Further, assume we do not use any of the kernel functions, but
instead we use the same set of explanatory variables (X) in both a linear regression and a support vector
regression. The only difference between the models will be the optimization problem used to estimate
the parameters. It is unclear if the resulting support vector regression forecast will necessarily outperform
the linear regression forecast. Consider the actual equation we are using to forecast loads.
Regression Model: $\text{Load}^{f}_{d+h} = \hat{\beta} X_{d+h}$
Support Vector Regression: $\text{Load}^{f'}_{d+h} = \hat{\beta}' X_{d+h}$
Here, the number of days into the forecast horizon is indexed by (h)
$\text{Load}^{f}_{d+h}$ is the load forecast derived from the estimated linear regression model
$\hat{\beta}$ is the ordinary least squares parameter estimate
$\text{Load}^{f'}_{d+h}$ is the load forecast derived from the estimated support vector regression model
$\hat{\beta}'$ is the support vector regression parameter estimate
The only difference between these two forecast equations is the value of the estimated parameters. We
know from ordinary least squares theory that the regression parameters, (β̂), are unbiased estimates of
the true parameter values. I have not been able to find a theoretical proof that the support vector
regression parameter estimates, ($\hat{\beta}'$), are also unbiased estimates of the true parameter values. The two
parameter estimates will be different. How different will depend on the size of $\epsilon$ and C.
As for the relative forecast performance, it is hard to imagine a constrained version of the parameters
outperforming the unconstrained version. So how does the machine learning literature justify the use of
support vector regression over linear regression? First, it is argued that linear regression cannot handle
the nonlinear response between loads and weather. We have shown that claim is false. Second, it is
argued that the support vector regression parameter estimates are more robust to data outliers than the
regression parameter estimates. This is a valid argument if steps are not taken to remove data outliers
from the estimation data set. Fortunately, there are well established validation methods that identify
data outliers and remove them from the regression estimation process.
So where does that leave us? I expect, for large well-behaved loads where data outliers are not an issue,
that unconstrained regression and neural networks will yield better forecast performance, as measured
by the size of the forecast errors, than their constrained counterparts. In the world of individual customer
load forecasting, the robustness of the support vector regression could prove useful in handling noisy
customer loads. In this world, a robust smooth load forecast takes precedence over forecast accuracy.
However, it is fair to say that support vector regression needs to prove itself in practice (not a static,
academic data set) to determine its operational merits.
5 EXPLANATORY VARIABLES BASED ON CALENDAR
CONDITIONS
The “art” of developing powerful operational load forecast models is identifying and constructing the list
of explanatory variables that are to be included in the model specifications. When building explanatory
variables, there is no better place to think about than your own home. A typical residential home uses
electricity to drive a vast array of electric equipment. The non-exhaustive list of electric equipment that
can be found in a home includes the following:
◼ Table lamps, desk lamps, recessed lighting, task lighting, ambient lighting, outdoor lighting, garage
door opener lighting, entryway lighting, night lights, appliance lights, oven lights, BBQ lights,
decorative neon lights, …
◼ Big screen TVs, small screen TVs, surround sound systems, gaming consoles, radios, stereos, Blu-
ray Disc players, record players, DVRs, CD players, electronic pin ball machines, …
◼ Desktop computers, laptop computers, tablets, phone chargers, smart devices, …
◼ Microwaves, electric ovens and ranges, toasters, coffee makers, electric tea kettles, electric can
openers, …
◼ Primary refrigerator, secondary refrigerator, freezer, wine cooler…
◼ Clothes washers and dryers, …
◼ Dishwashers, garbage disposals, garbage compactors, …
◼ Space heating and air conditioning, ceiling fans, room fans, whole house fans, humidifiers, …
◼ Pool pump, spa heater, …
◼ Alarm systems, internet systems, garage door openers, vacuum cleaners, …
◼ Power tools, power tool chargers, electric lawn and garden equipment, …
◼ Car battery trickle chargers, engine block heating, electric vehicle chargers, golf cart chargers, …
With this dizzying array of equipment, the idea that we can somehow accurately forecast the load of an
individual household is daunting. In fact, forecasting the loads of an individual home is a hard task because
we will never have all the information we need to predict when someone is going to turn on the TV, the
lights in the guest bedroom, run the dishwasher, microwave some popcorn, plug in their phone to charge,
or any of dozens of actions they can take that will use electricity. Fortunately, in the world of short-term
operational forecasting, we usually forecast an aggregation of loads across dozens, if not millions, of
households. At the aggregate, the hourly load nuances of an individual house are averaged away leaving
a relatively smooth and predictable load pattern.
In general, the aggregate residential hourly load pattern can be decomposed into six major components.
◼ Calendar Conditions. This component captures the daily load variation associated with people
going to and from work, going to and from school, celebrating holidays, doing household chores,
and other repeatable behaviors or actions that impact electricity consumption.
◼ Solar Conditions. When the sun rises and sets will have a big influence on lighting loads. In
addition, air conditioning loads can be impacted by prevailing solar conditions via internal
temperature gains driven by sunlight penetrating the house core.
◼ Weather Conditions. Space heating and cooling is driven by prevailing weather conditions.
◼ Economic Conditions. The impact of overall economic conditions influences electricity
consumption by impacting available income levels.
◼ Distributed Energy Resources. The introduction of rooftop solar PV, demand response, other on-
site generation and, in the future, on-site storage, if not metered separately from the whole house
load, impacts what is measured as the demand for power. As a result, historical load patterns can
change significantly with the introduction of distributed energy resources.
◼ Electric Vehicle Charging. Electric vehicle charging is a new type of load pattern that is anticipated
to impact evening and night time loads.
5.1 CALENDAR CONDITIONS
This section presents explanatory variables designed to capture load variation associated with calendar
conditions. The workhorse of explanatory variable functional forms is the binary variable. A binary
variable has one of two values. Traditionally, the values are 1.0 if the item being modeled is active and
0.0 otherwise. For example, we can construct a Monday binary variable that will take on a value of 1.0 if
the observation falls on a Monday and 0.0 otherwise.
Is it possible for a binary variable to take on values other than 1.0 and 0.0? Statistically, any combination
of two values would work, but what is nice about the use of 1.0 and 0.0 is that it eases the interpretation
of the estimation coefficient attached to the binary variable. For example, in a model that includes just
seven day-of-the-week binary variables defined on a scale of 1.0 and 0.0 (i.e., Monday, Tuesday,
Wednesday, Thursday, Friday, Saturday and Sunday), the estimated coefficient on the Monday binary
variable would be equal to the average Monday load.
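The interpretation above can be checked numerically. The following is a minimal sketch, assuming a regression with no intercept and only the seven day-of-the-week binary variables. The load series is hypothetical, and the small `ols` helper is an illustrative stand-in for whatever estimation package is actually used.

```python
from datetime import date, timedelta

def day_of_week_binaries(d):
    """Return the seven 1.0/0.0 day-of-the-week values for date d (Monday first)."""
    return [1.0 if d.weekday() == i else 0.0 for i in range(7)]

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry into the pivot row.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Hypothetical daily loads for four weeks starting Monday, January 1, 2018:
# a weekday uplift plus a slow upward drift (illustrative values only).
start = date(2018, 1, 1)
dates = [start + timedelta(days=n) for n in range(28)]
loads = [1000.0 + 50.0 * (d.weekday() < 5) + 3.0 * n for n, d in enumerate(dates)]

X = [day_of_week_binaries(d) for d in dates]
coefficients = ols(X, loads)          # one coefficient per day, Monday first

monday_loads = [l for d, l in zip(dates, loads) if d.weekday() == 0]
monday_mean = sum(monday_loads) / len(monday_loads)
print(coefficients[0], monday_mean)   # the two values agree
```

If an intercept were added alongside all seven binaries, the columns would be perfectly collinear, which is why one day is set aside whenever a constant or a full set of month binaries is also in the model.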
The following are examples of binary variables commonly used in load forecast models.
Day-of-the-Week Binary Variables. These variables capture the average load variation across the days-
of-the-week.
Sunday: takes on a value of 1.0 if the day is a Sunday, 0.0 otherwise
Monday: takes on a value of 1.0 if the day is a Monday, 0.0 otherwise
Tuesday: takes on a value of 1.0 if the day is a Tuesday, 0.0 otherwise
Wednesday: takes on a value of 1.0 if the day is a Wednesday, 0.0 otherwise
Thursday: takes on a value of 1.0 if the day is a Thursday, 0.0 otherwise
Friday: takes on a value of 1.0 if the day is a Friday, 0.0 otherwise
Saturday: takes on a value of 1.0 if the day is a Saturday, 0.0 otherwise
Figure 5-1 and Figure 5-2 present examples of the in-sample model fit when just day-of-the-week binary
variables are used. Both figures show that the predicted values from the estimated models follow a fixed
and repeatable weekly pattern. While the day-of-the-week binary variables capture the average variation
of loads across the week, they do not capture the seasonal swings in loads.
FIGURE 5-1. DAYTON POWER & LIGHT: DAY-OF-THE-WEEK BINARY VARIABLES
FIGURE 5-2. COMMONWEALTH EDISON: DAY-OF-THE-WEEK BINARY VARIABLES
Day Type Binary Variables. These variables capture the average load variation between normal working
days and non-working days.
Weekday: takes on a value of 1.0 if the day is a normal work day (e.g., Monday, Tuesday,
Wednesday, Thursday, Friday) and not a holiday, 0.0 otherwise
Weekend: takes on a value of 1.0 if the day is a normal non-work day (e.g., Saturday, Sunday) or
a holiday, 0.0 otherwise
TWT: takes on a value of 1.0 if the day is a Tuesday, Wednesday, Thursday and not a holiday, 0.0
otherwise
Figure 5-3 and Figure 5-4 present examples of the in-sample model fit when the day type variables Weekday
and Weekend are used. Both figures show that the predicted values from
the estimated models follow a fixed and repeatable weekly pattern. A comparison of the model fits to
the day-of-the-week models shows how this simple day type model specification is a constrained version
limited to two types of shapes, a weekday shape (which is the average shape across all weekdays) and the
weekend shape (which is the average shape across all Saturdays and Sundays). In contrast, the day-of-
the-week model produces seven shapes, one for each day-of-the-week. Both models fail to capture
seasonal swings in loads.
FIGURE 5-3. DAYTON POWER & LIGHT: DAY TYPE WEEKDAY VERSUS WEEKEND BINARY VARIABLES
FIGURE 5-4. COMMONWEALTH EDISON: DAY TYPE WEEKDAY VERSUS WEEKEND BINARY VARIABLES
Month Binary Variables. These variables capture the average load variation across months of the year.
January: takes on a value of 1.0 if the day is in the month of January, 0.0 otherwise
February: takes on a value of 1.0 if the day is in the month of February, 0.0 otherwise
March: takes on a value of 1.0 if the day is in the month of March, 0.0 otherwise
April: takes on a value of 1.0 if the day is in the month of April, 0.0 otherwise
May: takes on a value of 1.0 if the day is in the month of May, 0.0 otherwise
June: takes on a value of 1.0 if the day is in the month of June, 0.0 otherwise
July: takes on a value of 1.0 if the day is in the month of July, 0.0 otherwise
August: takes on a value of 1.0 if the day is in the month of August, 0.0 otherwise
September: takes on a value of 1.0 if the day is in the month of September, 0.0 otherwise
October: takes on a value of 1.0 if the day is in the month of October, 0.0 otherwise
November: takes on a value of 1.0 if the day is in the month of November, 0.0 otherwise
December: takes on a value of 1.0 if the day is in the month of December, 0.0 otherwise
Figure 5-5 and Figure 5-6 present examples of the in-sample model fit when the monthly binary variables
are used. Both figures show that the predicted values from the estimated models follow a fixed and
repeatable monthly pattern. A comparison of the model fits to the day-of-the-week and day type model
specifications shows that the load shape is the same across all days in a month but varies by month. The
monthly model does a good job at capturing the seasonal swing in loads, but does not capture the day-
of-the-week load variation.
FIGURE 5-5. DAYTON POWER & LIGHT: MONTHLY BINARY VARIABLES
FIGURE 5-6. COMMONWEALTH EDISON: MONTHLY BINARY VARIABLES
Season Binary Variables. These variables capture the average load variation across seasons.
Winter: takes on a value of 1.0 if the day is in a winter month, 0.0 otherwise
Spring: takes on a value of 1.0 if the day is in a spring month, 0.0 otherwise
Summer: takes on a value of 1.0 if the day is in a summer month, 0.0 otherwise
Fall: takes on a value of 1.0 if the day is in a fall month, 0.0 otherwise
Figure 5-7 and Figure 5-8 present examples of the in-sample model fit when the seasonal binary variables
are used. Both figures show that the predicted values from the estimated models follow a fixed and
repeatable seasonal pattern. A comparison of the model fits to the day-of-the-week and day type model
specifications shows that the load shape is the same across all days in a season but varies by season. The
seasonal model does a good job at capturing the seasonal swing in loads but does not capture the day-of-
the-week load variation.
FIGURE 5-7. DAYTON POWER & LIGHT: SEASONAL BINARY VARIABLES
FIGURE 5-8. COMMONWEALTH EDISON: SEASONAL BINARY VARIABLES
Month Interaction Terms with Day-of-the-Week Variables. These variables capture the average load
variation across months by day-of-the-week. There is no separate set of interaction terms for Sundays
because the model also includes the monthly binary variables. The estimated coefficients on the monthly
binaries then represent the average Sunday load for each month, and the estimated coefficients on the
following interaction terms represent the load difference relative to that Sunday base. The choice of
Sunday as the omitted day is purely aesthetic. On average, Sunday loads are lower than the loads on the
other days of the week, so the estimated parameters on the day-of-the-week interaction terms will tend
to be positive, meaning the load is on average higher than the corresponding Sunday load. Model
performance is identical regardless of which day-of-the-week is set aside.
JanuaryMonday: January binary variable x Monday binary variable
JanuaryTuesday: January binary variable x Tuesday binary variable
JanuaryWednesday: January binary variable x Wednesday binary variable
JanuaryThursday: January binary variable x Thursday binary variable
JanuaryFriday: January binary variable x Friday binary variable
JanuarySaturday: January binary variable x Saturday binary variable
FebruaryMonday: February binary variable x Monday binary variable
FebruaryTuesday: February binary variable x Tuesday binary variable
FebruaryWednesday: February binary variable x Wednesday binary variable
FebruaryThursday: February binary variable x Thursday binary variable
FebruaryFriday: February binary variable x Friday binary variable
FebruarySaturday: February binary variable x Saturday binary variable
MarchMonday: March binary variable x Monday binary variable
MarchTuesday: March binary variable x Tuesday binary variable
MarchWednesday: March binary variable x Wednesday binary variable
MarchThursday: March binary variable x Thursday binary variable
MarchFriday: March binary variable x Friday binary variable
MarchSaturday: March binary variable x Saturday binary variable
AprilMonday: April binary variable x Monday binary variable
AprilTuesday: April binary variable x Tuesday binary variable
AprilWednesday: April binary variable x Wednesday binary variable
AprilThursday: April binary variable x Thursday binary variable
AprilFriday: April binary variable x Friday binary variable
AprilSaturday: April binary variable x Saturday binary variable
MayMonday: May binary variable x Monday binary variable
MayTuesday: May binary variable x Tuesday binary variable
MayWednesday: May binary variable x Wednesday binary variable
MayThursday: May binary variable x Thursday binary variable
MayFriday: May binary variable x Friday binary variable
MaySaturday: May binary variable x Saturday binary variable
JuneMonday: June binary variable x Monday binary variable
JuneTuesday: June binary variable x Tuesday binary variable
JuneWednesday: June binary variable x Wednesday binary variable
JuneThursday: June binary variable x Thursday binary variable
JuneFriday: June binary variable x Friday binary variable
JuneSaturday: June binary variable x Saturday binary variable
JulyMonday: July binary variable x Monday binary variable
JulyTuesday: July binary variable x Tuesday binary variable
JulyWednesday: July binary variable x Wednesday binary variable
JulyThursday: July binary variable x Thursday binary variable
JulyFriday: July binary variable x Friday binary variable
JulySaturday: July binary variable x Saturday binary variable
AugustMonday: August binary variable x Monday binary variable
AugustTuesday: August binary variable x Tuesday binary variable
AugustWednesday: August binary variable x Wednesday binary variable
AugustThursday: August binary variable x Thursday binary variable
AugustFriday: August binary variable x Friday binary variable
AugustSaturday: August binary variable x Saturday binary variable
SeptemberMonday: September binary variable x Monday binary variable
SeptemberTuesday: September binary variable x Tuesday binary variable
SeptemberWednesday: September binary variable x Wednesday binary variable
SeptemberThursday: September binary variable x Thursday binary variable
SeptemberFriday: September binary variable x Friday binary variable
SeptemberSaturday: September binary variable x Saturday binary variable
OctoberMonday: October binary variable x Monday binary variable
OctoberTuesday: October binary variable x Tuesday binary variable
OctoberWednesday: October binary variable x Wednesday binary variable
OctoberThursday: October binary variable x Thursday binary variable
OctoberFriday: October binary variable x Friday binary variable
OctoberSaturday: October binary variable x Saturday binary variable
NovemberMonday: November binary variable x Monday binary variable
NovemberTuesday: November binary variable x Tuesday binary variable
NovemberWednesday: November binary variable x Wednesday binary variable
NovemberThursday: November binary variable x Thursday binary variable
NovemberFriday: November binary variable x Friday binary variable
NovemberSaturday: November binary variable x Saturday binary variable
DecemberMonday: December binary variable x Monday binary variable
DecemberTuesday: December binary variable x Tuesday binary variable
DecemberWednesday: December binary variable x Wednesday binary variable
DecemberThursday: December binary variable x Thursday binary variable
DecemberFriday: December binary variable x Friday binary variable
DecemberSaturday: December binary variable x Saturday binary variable
Figure 5-9 and Figure 5-10 present examples of the in-sample model fit when the month, day-of-the-week
interaction binary variables are used. Both figures show that the predicted values from the estimated
models follow a monthly pattern, but within each month there is a weekly load pattern.
FIGURE 5-9. DAYTON POWER & LIGHT: MONTH/DAY-OF-THE-WEEK INTERACTION BINARY VARIABLES
FIGURE 5-10. COMMONWEALTH EDISON: MONTH/DAY-OF-THE-WEEK INTERACTION BINARY VARIABLES
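The month and day-of-the-week interaction terms listed above can be generated programmatically rather than typed out one by one. The following is a minimal sketch that builds the 12 month binaries and the 72 interaction terms for a single date. The function name `calendar_row` and the dictionary layout are illustrative assumptions, not part of any particular forecasting package.

```python
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
# Sunday is the omitted base day, so only six day names are interacted.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]

def calendar_row(d):
    """Return a dict with 12 month binaries and 72 month x day interaction terms."""
    row = {}
    for m_index, month in enumerate(MONTHS, start=1):
        month_binary = 1.0 if d.month == m_index else 0.0
        row[month] = month_binary
        for d_index, day in enumerate(DAYS):   # date.weekday(): Monday == 0
            day_binary = 1.0 if d.weekday() == d_index else 0.0
            # Interaction term = month binary x day-of-the-week binary
            row[month + day] = month_binary * day_binary
    return row

monday = calendar_row(date(2018, 1, 1))   # a Monday in January
sunday = calendar_row(date(2018, 1, 7))   # a Sunday in January

print(monday["January"], monday["JanuaryMonday"])   # month binary and interaction both on
print(sunday["January"], sunday["JanuaryMonday"])   # only the month binary is on
```

For a Sunday, every interaction term is zero and only the month binary is active, which is why the coefficient on the month binary can be read as the Sunday base.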
Season Interaction Terms with Day-of-the-Week Variables. These variables capture the average load
variation across seasons by day-of-the-week. There is no separate set of interaction terms for Sundays
because the model also includes the seasonal binary variables. The estimated coefficients on the
seasonal binaries represent the average Sunday load for each season, and the estimated coefficients on
the following interaction terms represent the load difference relative to the Sunday base. As with the
month interaction terms, it does not matter which day-of-the-week is set aside.
WinterMonday: Winter binary variable x Monday binary variable
WinterTuesday: Winter binary variable x Tuesday binary variable
WinterWednesday: Winter binary variable x Wednesday binary variable
WinterThursday: Winter binary variable x Thursday binary variable
WinterFriday: Winter binary variable x Friday binary variable
WinterSaturday: Winter binary variable x Saturday binary variable
SpringMonday: Spring binary variable x Monday binary variable
SpringTuesday: Spring binary variable x Tuesday binary variable
SpringWednesday: Spring binary variable x Wednesday binary variable
SpringThursday: Spring binary variable x Thursday binary variable
SpringFriday: Spring binary variable x Friday binary variable
SpringSaturday: Spring binary variable x Saturday binary variable
SummerMonday: Summer binary variable x Monday binary variable
SummerTuesday: Summer binary variable x Tuesday binary variable
SummerWednesday: Summer binary variable x Wednesday binary variable
SummerThursday: Summer binary variable x Thursday binary variable
SummerFriday: Summer binary variable x Friday binary variable
SummerSaturday: Summer binary variable x Saturday binary variable
FallMonday: Fall binary variable x Monday binary variable
FallTuesday: Fall binary variable x Tuesday binary variable
FallWednesday: Fall binary variable x Wednesday binary variable
FallThursday: Fall binary variable x Thursday binary variable
FallFriday: Fall binary variable x Friday binary variable
FallSaturday: Fall binary variable x Saturday binary variable
Figure 5-11 and Figure 5-12 present examples of the in-sample model fit when the season, day-of-the-
week interaction binary variables are used. Both figures show that the predicted values from the
estimated models follow a seasonal pattern but within each season there is a weekly load pattern.
FIGURE 5-11. DAYTON POWER & LIGHT: SEASON/DAY-OF-THE-WEEK INTERACTION BINARY VARIABLES
FIGURE 5-12. COMMONWEALTH EDISON: SEASON/DAY-OF-THE-WEEK INTERACTION BINARY VARIABLES
Holiday Variables. These variables capture the average load change relative to a non-holiday day. The
modeling challenge with holidays is that the behavior of residential, commercial, and industrial
customers depends on the holiday. For example, in the United States the behavior of retail stores on
Christmas is different from their behavior on Memorial Day. It is not unusual for most retail stores to
be closed for a portion of Christmas while most retail stores remain open on Memorial Day. Other
modeling challenges arise with holidays that fall on a specific calendar date rather than on the same
day of the week each year. The load impact of a holiday such as Christmas, which occurs on December 25
but can fall on any of the seven days of the week, is harder to predict than that of, say, a bank
holiday that always falls on the first Monday of August. The challenge with a holiday like Christmas is
the potential for a spillover effect on the day(s) leading into and out of the holiday. For example, the
load impact of a Christmas that lands on a Sunday may spill over to the following Monday. On the other
hand, a Christmas that lands on a Tuesday may impact loads on the Monday before and potentially the
following Wednesday.
Holidays that fall on the same day of the week year after year can be modeled with a binary variable
that takes on a value of 1.0 if that day is the holiday, 0.0 otherwise. Examples of these types of holiday
binary variables include the following.
MemorialDay: takes on a value of 1.0 if the day is Memorial Day in the U.S., 0.0 otherwise
MayBankHoliday: takes on a value of 1.0 if the day is the first Monday in May, 0.0 otherwise
AugustBankHoliday: takes on a value of 1.0 if the day is the first Monday in August, 0.0 otherwise
It should be noted that each holiday is treated as a separate binary variable. This is because the underlying
behavior (e.g., retail stores are open versus closed) and prevailing weather conditions vary by holiday
type. This means a holiday that lands in August will capture the load reduction associated with less air
conditioning. Conversely, a holiday that lands in November will capture the load reduction associated
with less space heating. These impacts will be very different. If, instead of allowing each holiday to be a
separate binary variable, we used a single holiday binary variable that took on a value of 1.0 whenever
there is a holiday and 0.0 otherwise, we would constrain the load impact to be the same across all
holidays. That means the estimated coefficient on the holiday variable will be the average impact across
bank holidays in May, bank holidays in August, Christmas, etc. In effect, we are mixing impacts of hot days
and cold days, holidays where retail stores are open, and holidays when retail stores are closed. Consider
the following four holidays.
1. May 30, Load = 1300
2. August 4, Load = 2500
3. November 13, Load = 1800
4. February 14, Load = 900
Now consider using a single holiday binary variable that takes on a value of 1.0 if the day is May 30, August
4, November 13, or February 14. This gives the following simple model:
$$\text{Load}_d = \beta \, \text{Holiday}_d$$
In this case, the estimated coefficient for the holiday binary variable is going to be equal to the average
load across the four days, β̂ = 1625.
The alternative approach is to create four separate holiday binary variables, one for each holiday. In this
case, the load forecast model looks like:
$$\text{Load}_d = \beta_1 \, \text{Holiday\_May30th}_d + \beta_2 \, \text{Holiday\_August4th}_d + \beta_3 \, \text{Holiday\_November13th}_d + \beta_4 \, \text{Holiday\_February14th}_d$$
In this case, the estimated coefficients will be:
β̂1 = 1300
β̂2 = 2500
β̂3 = 1800
β̂4 = 900
As can be seen, the constrained version of the holiday binary variable provides a single estimate of the
holiday impact, which in this case leads to under forecasting of August 4 and November 13, and over
forecasting of May 30 and February 14. This simple example illustrates why we prefer separate holiday
binary variables.
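The arithmetic of this example can be reproduced in a few lines. The following is a minimal sketch using the four holiday loads given above. It relies on the fact that, with no other regressors, the least-squares coefficient on a single variable is sum(x*y) / sum(x*x). The variable names are illustrative.

```python
holiday_loads = {
    "May 30": 1300.0,
    "August 4": 2500.0,
    "November 13": 1800.0,
    "February 14": 900.0,
}
names = list(holiday_loads)
y = [holiday_loads[name] for name in names]

# Single holiday binary: x = 1.0 on all four days, so the least-squares
# coefficient sum(x*y) / sum(x*x) reduces to the average load.
x = [1.0] * len(y)
pooled_beta = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Separate binaries: each column is 1.0 on exactly one day. The columns are
# orthogonal, so the regression decouples into one single-variable fit per holiday.
separate_betas = {}
for k, name in enumerate(names):
    column = [1.0 if i == k else 0.0 for i in range(len(names))]
    separate_betas[name] = (sum(c * yi for c, yi in zip(column, y))
                            / sum(c * c for c in column))

# Forecast error = forecast minus actual (positive means over-forecast).
pooled_errors = {name: pooled_beta - holiday_loads[name] for name in names}
separate_errors = {name: separate_betas[name] - holiday_loads[name] for name in names}

print(pooled_beta)      # 1625.0
print(pooled_errors)    # negative for August 4 and November 13, positive otherwise
print(separate_errors)  # all zero in this four-observation example
```

With only one observation per holiday the separate binaries fit perfectly, which overstates the benefit. In practice each holiday binary is estimated from several years of history.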
School Holiday Variables. In some locations there is a noticeable change in aggregate loads between days
when primary and secondary schools are in session versus when they are out of session. In these
cases, we construct a set of school holiday binary variables that capture the load shift that occurs when
schools are out of session. For example, let the school schedule for a year be defined as follows:
Winter School Holiday: December 15 through January 15
Spring School Holiday: April 1 through April 30
Summer School Holiday: July 1 through August 15
Fall School Holiday: October 1 through October 15
In this case, there will be four school holiday binary variables:
WinterSchoolHoliday: takes on a value of 1.0 if the day is from December 15 through January 15,
0.0 otherwise
SpringSchoolHoliday: takes on a value of 1.0 if the day is from April 1 through April 30, 0.0
otherwise
SummerSchoolHoliday: takes on a value of 1.0 if the day is from July 1 through August 15, 0.0
otherwise
FallSchoolHoliday: takes on a value of 1.0 if the day is from October 1 through October 15, 0.0
otherwise
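One detail to watch when coding these variables is that the winter school holiday wraps around the end of the year. The following is a minimal sketch that builds the four school holiday binary variables for the example schedule above. The function names and the tuple layout are illustrative assumptions.

```python
from datetime import date

# (start month, start day, end month, end day) from the example schedule above.
SCHOOL_HOLIDAYS = {
    "WinterSchoolHoliday": (12, 15, 1, 15),
    "SpringSchoolHoliday": (4, 1, 4, 30),
    "SummerSchoolHoliday": (7, 1, 8, 15),
    "FallSchoolHoliday": (10, 1, 10, 15),
}

def in_window(d, start_month, start_day, end_month, end_day):
    """True if date d falls inside the window, inclusive of both end dates."""
    key = (d.month, d.day)
    start, end = (start_month, start_day), (end_month, end_day)
    if start <= end:
        return start <= key <= end
    # Window wraps past December 31 (e.g., December 15 through January 15).
    return key >= start or key <= end

def school_holiday_binaries(d):
    """Return the four school holiday binary variables for date d."""
    return {name: 1.0 if in_window(d, *window) else 0.0
            for name, window in SCHOOL_HOLIDAYS.items()}

print(school_holiday_binaries(date(2018, 1, 10)))   # winter holiday is active
print(school_holiday_binaries(date(2018, 8, 16)))   # no school holiday is active
```

Because actual school calendars change from year to year, a production version would more likely read the start and end dates for each year from a table than hard-code them.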
5.2 SUNRISE & SUNSET CONDITIONS

This section presents explanatory variables designed to capture load variation associated with the rising
and setting of the sun. To create these explanatory variables, we need two pieces of information: (1) daily
sunrise and sunset times, and (2) the date range over which daylight saving time is observed. Examples of
explanatory variables that can be constructed include:
DayLightSavingsObservance = 1.0 if date falls within daylight saving time, 0.0 otherwise
Raw Sunrise = Hour and minute that the sun rises, not adjusted for daylight saving time
Raw Sunset = Hour and minute that the sun sets, not adjusted for daylight saving time
Time of Sunrise = Raw sunrise + DayLightSavingsObservance binary variable (in decimal hours)
Time of Sunset = Raw sunset + DayLightSavingsObservance binary variable (in decimal hours)
Minutes_of_Light = [(Sunset Hour – Sunrise Hour) * 60] + [Sunset Minute – Sunrise Minute]
Fraction of Hour Ending 06:00 that was Dark = MIN(MAX(Time of Sunrise – 5,0),1)
Fraction of Hour Ending 07:00 that was Dark = MIN(MAX(Time of Sunrise – 6,0),1)
Fraction of Hour Ending 18:00 that was Dark = MIN(MAX(18 - Time of Sunset,0),1)
Fraction of Hour Ending 19:00 that was Dark = MIN(MAX(19 - Time of Sunset,0),1)
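The sunrise and sunset variables can be sketched in code as follows. This is a minimal illustration, assuming clock times are converted to decimal hours (so 06:30 becomes 6.5) before the MIN/MAX expressions are applied, and computing daylight minutes as the elapsed time between sunrise and sunset. The sunrise and sunset times are hypothetical.

```python
def to_decimal_hours(hour, minute):
    """Convert a clock time to decimal hours, e.g. 06:30 -> 6.5."""
    return hour + minute / 60.0

def minutes_of_light(sunrise, sunset):
    """Elapsed minutes between sunrise and sunset, given (hour, minute) tuples."""
    return (sunset[0] - sunrise[0]) * 60 + (sunset[1] - sunrise[1])

def fraction_dark_morning(hour_ending, time_of_sunrise):
    """Share of the hour ending at hour_ending that falls before sunrise."""
    return min(max(time_of_sunrise - (hour_ending - 1), 0.0), 1.0)

def fraction_dark_evening(hour_ending, time_of_sunset):
    """Share of the hour ending at hour_ending that falls after sunset."""
    return min(max(hour_ending - time_of_sunset, 0.0), 1.0)

# Hypothetical day observed under daylight saving time.
raw_sunrise = (5, 30)    # 05:30 standard time
raw_sunset = (18, 45)    # 18:45 standard time
daylight_saving = 1.0    # DayLightSavingsObservance binary variable

time_of_sunrise = to_decimal_hours(*raw_sunrise) + daylight_saving   # 6.5
time_of_sunset = to_decimal_hours(*raw_sunset) + daylight_saving     # 19.75

print(minutes_of_light(raw_sunrise, raw_sunset))    # 795
print(fraction_dark_morning(6, time_of_sunrise))    # 1.0 (05:00-06:00 fully dark)
print(fraction_dark_morning(7, time_of_sunrise))    # 0.5 (dark until 06:30)
print(fraction_dark_evening(20, time_of_sunset))    # 0.25 (dark from 19:45)
```

The same two helper functions cover any hour of the day, so the set of dark-fraction variables can be extended to whichever morning and evening hours the sunrise and sunset times sweep through over the year.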
5.3 FUNCTIONAL FORMS FOR HANDLING THE NON-LINEAR RESPONSE BETWEEN LOADS AND WEATHER
This section presents explanatory variables designed to capture the non-linear response between loads
and weather.
Temperature Bin Variables. The most straightforward way of estimating the non-linear response
between loads and weather is the use of temperature bin variables. As an example, let the observed
average daily temperatures range between -20 and 35° Celsius. We can define five-degree temperature
bins as follows:
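Bin1 = 1.0 if average temperature < -15, 0.0 otherwise
Bin2 = 1.0 if -15 <= average temperature < -10, 0.0 otherwise
Bin3 = 1.0 if -10 <= average temperature < -5, 0.0 otherwise
Bin4 = 1.0 if -5 <= average temperature < 0, 0.0 otherwise
Bin5 = 1.0 if 0 <= average temperature < 5, 0.0 otherwise
Bin6 = 1.0 if 5 <= average temperature < 10, 0.0 otherwise
Bin7 = 1.0 if 10 <= average temperature < 15, 0.0 otherwise
Bin8 = 1.0 if 15 <= average temperature < 20, 0.0 otherwise
Bin9 = 1.0 if 20 <= average temperature < 25, 0.0 otherwise
Bin10 = 1.0 if 25 <= average temperature, 0.0 otherwise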
The temperature bin variables would then be placed in a regression such as:
$$\text{Load}_d = \sum_{j=1}^{10} \beta_j \, \text{Bin}_d^{j} + e_d$$
In this case, the estimated values for the temperature bin variables (βj ) would be the average load when
temperatures fall within that bin range. Together these estimated coefficients form the nonlinear
response between loads and temperatures. The estimated weather response function is then given by
the following equation:
$$\text{EstimatedWeatherResponse}_d = \sum_{j=1}^{10} \hat{\beta}_j \, \text{Bin}_d^{j}$$
To allow the weather response to differ between weekdays and weekends, a set of weekend bin
variables, defined as follows, can be added to the model.
Bin1Weekend = Bin1 * Weekend binary variable (and likewise for Bin2 through Bin10)
The model with weekend offsets would be written as follows:
$$\text{Load}_d = \sum_{j=1}^{10} \beta_j \, \text{Bin}_d^{j} + \sum_{j=11}^{20} \beta_j \, \text{BinWeekend}_d^{\,j-10} + e_d$$
Examples of the bin-based estimated weather response function without and with weekend offsets for
the Dayton Power & Light and Commonwealth Edison loads are shown in Figure 5-13 through Figure 5-16.
FIGURE 5-13. ESTIMATED BIN WEATHER RESPONSE FUNCTION FOR DAYTON POWER & LIGHT
FIGURE 5-14. ESTIMATED BIN WEATHER RESPONSE WITH WEEKEND OFFSET FOR DAYTON POWER & LIGHT
FIGURE 5-15. ESTIMATED BIN WEATHER RESPONSE FUNCTION FOR COMMONWEALTH EDISON
FIGURE 5-16. ESTIMATED BIN WEATHER RESPONSE WITH WEEKEND OFFSET FOR COMMONWEALTH EDISON
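The bin approach can be sketched in a few lines of code. The following is a minimal illustration, assuming the bins continue in five-degree steps between the ones spelled out above (so that Bin9 covers 20 to 25 degrees and Bin10 covers 25 degrees and above). The temperatures and loads are hypothetical, and `fit_bin_means` uses the fact that, with only the ten mutually exclusive bin binaries in the regression, each least-squares coefficient is the average load in its bin.

```python
# Boundaries between Bin1..Bin10; each bin is closed on the left.
LOWER_EDGES = [-15, -10, -5, 0, 5, 10, 15, 20, 25]

def bin_number(temp):
    """Return 1..10: Bin1 is below -15, Bin10 is 25 and above."""
    return 1 + sum(1 for edge in LOWER_EDGES if temp >= edge)

def bin_binaries(temp):
    """Return the ten 1.0/0.0 bin variables for one average temperature."""
    active = bin_number(temp)
    return [1.0 if j == active else 0.0 for j in range(1, 11)]

def fit_bin_means(temps, loads):
    """Estimate one coefficient per bin as the average load observed in that bin."""
    totals, counts = {}, {}
    for temp, load in zip(temps, loads):
        j = bin_number(temp)
        totals[j] = totals.get(j, 0.0) + load
        counts[j] = counts.get(j, 0) + 1
    return {j: totals[j] / counts[j] for j in totals}

def weather_response(temp, betas):
    """Estimated response: the coefficient of the bin the temperature falls in.
    Raises KeyError if that bin had no observations in the estimation data."""
    return betas[bin_number(temp)]

# Hypothetical estimation data (average daily temperature in Celsius, daily load).
temps = [-18, -12, 3, 4, 22, 24, 26, 30]
loads = [1500.0, 1400.0, 1000.0, 1020.0, 1100.0, 1120.0, 1300.0, 1340.0]
betas = fit_bin_means(temps, loads)

print(bin_binaries(3))                  # only the fifth entry is 1.0
print(weather_response(24.9, betas))    # 1110.0
print(weather_response(25.0, betas))    # 1320.0
```

The last two lines show the step in the estimated response at a bin boundary: a change of a tenth of a degree moves the forecast from one bin average to the next.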
The temperature bin approach described above represents a nonparametric estimate of the nonlinear
response. It is nonparametric in that a functional form like a third order polynomial is not imposed. The
advantage of a nonparametric approach is that the estimated response function is data driven. The
disadvantage is in the number of bin variables that need to be constructed. This is further complicated if
it is desired to estimate a separate response function between weekdays and weekend days. In this case,
10 additional temperature bin variables interacted with a weekend binary variable would need to be
created and then added to the regression. That would result in a total of 20 variables to estimate the
weather response. Each additional interaction term would add another 10 explanatory variables. It would
be easy to quickly have more explanatory variables than observations.
Another challenge with the nonparametric approach occurs at the boundary between two bins. For
example, consider the weekday bins between 20° and 25° and 25° plus for Commonwealth Edison. If the
temperature forecast is for 25°, the load forecast would be a little under 11,000 MWh. If the temperature
forecast is for 26°, the load forecast jumps up to a little over 13,000 MWh. This knife-edge difference between 25° and 26°
will lead to load forecasts that can jump significantly from one iteration to the next. An approach that
allows the load to vary slightly with each degree change in temperature would generate a better load
forecast. This leads to the following parametric functions to estimate the nonlinear weather response
function.
Temperature Spline Variables. Temperature spline variables are like the temperature bin variables in
that separate variables are designed to handle different temperature ranges. There are two types of
spline variables: uncapped and capped. An example of uncapped spline temperature variables follows:
HDD1 = MAX(65 – Temperature,0)
HDD2 = MAX(45 – Temperature, 0)
CDD1 = MAX(Temperature – 65, 0)
CDD2 = MAX(Temperature – 85,0)
Here, HDD stands for heating degree day and CDD stands for cooling degree day. In this example, we
define two HDD variables—one that takes on positive values when temperatures fall below 65° and the
second that takes on positive values when temperatures fall below 45°. The temperature values of 65
and 45 are determined by the forecast analyst and are commonly referred to as temperature cut points.
The HDD variables are intended to capture the increase in electric space heating when temperatures drop.
In addition, two CDD variables are defined: one that takes on positive values when temperatures go above
65° and the second that takes on positive values when temperatures go above 85°. The two CDD variables
are designed to capture the load increase associated with air conditioning loads. A simple regression
specification that would use these variables is as follows:
$$\text{Load}_d = \beta_0 + \beta_1 \, \text{HDD2}_d + \beta_2 \, \text{HDD1}_d + \beta_3 \, \text{CDD1}_d + \beta_4 \, \text{CDD2}_d + e_d$$
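In this case, the estimated coefficients on the CDD spline variables represent the increase in load
associated with a one-degree increase in temperatures. In a similar fashion, the estimated coefficients on
the HDD spline variables represent the increase in load associated with a one-degree decrease in
temperatures. Because these splines are uncapped, the slopes accumulate beyond the outer cut points:
above 85° the total response to a one-degree increase is β3 + β4, and below 45° the total response to a
one-degree decrease is β1 + β2.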
The estimated weather response function is then given by the following equation:
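$$\text{Estimated Weather Response}_d = \hat{\beta}_0 + \hat{\beta}_1 \, \text{HDD2}_d + \hat{\beta}_2 \, \text{HDD1}_d + \hat{\beta}_3 \, \text{CDD1}_d + \hat{\beta}_4 \, \text{CDD2}_d$$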
The order in which the explanatory variables are placed in a regression model does not matter. We have
written this equation with the extremely cold and hot temperature splines on opposite ends for
presentation purposes only.
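The uncapped spline variables and the resulting weather response function can be sketched as follows. The cut points of 45, 65, and 85 degrees come from the example above. The coefficient values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def uncapped_splines(temp):
    """Return the four uncapped spline variables for one temperature."""
    return {
        "HDD1": max(65 - temp, 0),
        "HDD2": max(45 - temp, 0),
        "CDD1": max(temp - 65, 0),
        "CDD2": max(temp - 85, 0),
    }

# Hypothetical coefficients, for illustration only (not estimates from the guide).
BETAS = {"Intercept": 1000.0, "HDD2": 4.0, "HDD1": 6.0, "CDD1": 10.0, "CDD2": 5.0}

def weather_response(temp, betas=BETAS):
    """Evaluate the estimated weather response at one temperature."""
    splines = uncapped_splines(temp)
    return betas["Intercept"] + sum(betas[name] * value
                                    for name, value in splines.items())

# Change in load for a one-degree increase in temperature on the cooling side.
slope_warm = weather_response(71) - weather_response(70)       # CDD1 only
slope_very_hot = weather_response(91) - weather_response(90)   # CDD1 + CDD2

# Change in load for a one-degree decrease in temperature on the heating side.
slope_cool = weather_response(59) - weather_response(60)       # HDD1 only
slope_very_cold = weather_response(40) - weather_response(41)  # HDD1 + HDD2

print(slope_warm, slope_very_hot)    # 10.0 15.0
print(slope_cool, slope_very_cold)   # 6.0 10.0
```

Unlike the bin approach, the response changes by a small amount with each degree, and the slope steepens once the outer cut point is crossed.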
Capped Spline Variables. Examples of capped spline temperature variables are:
CappedHDD1 = MIN[MAX(65 – Temperature,0),(65-45)]
HDD2 = MAX(45 – Temperature, 0)
CappedCDD1 = MIN[MAX(Temperature – 65, 0),(85-65)]
CDD2 = MAX(Temperature – 85,0)
In this case, the inner temperature splines (CappedHDD1 and CappedCDD1) are capped by the difference
in the base and extreme cut points. The outer splines remain uncapped to handle the case when
forecasted temperatures fall outside of the range of temperatures used in model estimation. A simple
regression specification that would use the capped spline variables is as follows:
$$\text{Load}_d = \beta_0 + \beta_1 \, \text{HDD2}_d + \beta_2 \, \text{CappedHDD1}_d + \beta_3 \, \text{CappedCDD1}_d + \beta_4 \, \text{CDD2}_d + e_d$$
The estimated weather response function is then given by the following equation:
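$$\text{Estimated Weather Response}_d = \hat{\beta}_0 + \hat{\beta}_1 \, \text{HDD2}_d + \hat{\beta}_2 \, \text{CappedHDD1}_d + \hat{\beta}_3 \, \text{CappedCDD1}_d + \hat{\beta}_4 \, \text{CDD2}_d$$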
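The capped spline variables can be sketched in the same way. The short check at the end illustrates a useful identity: at every temperature, the capped inner spline plus the outer spline equals the uncapped inner spline. This suggests the capped and uncapped specifications describe the same family of piecewise-linear responses and differ mainly in how the coefficients are read. The function names are illustrative.

```python
def capped_splines(temp):
    """Return the capped inner splines and uncapped outer splines."""
    return {
        "CappedHDD1": min(max(65 - temp, 0), 65 - 45),   # stops growing below 45
        "HDD2": max(45 - temp, 0),
        "CappedCDD1": min(max(temp - 65, 0), 85 - 65),   # stops growing above 85
        "CDD2": max(temp - 85, 0),
    }

def uncapped_hdd1(temp):
    return max(65 - temp, 0)

def uncapped_cdd1(temp):
    return max(temp - 65, 0)

# Check the identity over a wide range of whole-degree temperatures.
identity_holds = True
for temp in range(-10, 111):
    s = capped_splines(temp)
    if s["CappedHDD1"] + s["HDD2"] != uncapped_hdd1(temp):
        identity_holds = False
    if s["CappedCDD1"] + s["CDD2"] != uncapped_cdd1(temp):
        identity_holds = False

print(capped_splines(30))   # CappedHDD1 is held at 20 while HDD2 is 15
print(capped_splines(95))   # CappedCDD1 is held at 20 while CDD2 is 10
print(identity_holds)       # True
```

In the capped form, the coefficient on an outer spline is the full slope beyond the outer cut point. In the uncapped form, it is the additional slope on top of the inner spline.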
Weekend interaction terms can be introduced in the same way as was done with the bin approach.
These interaction terms will allow the estimated weather response to differ between weekdays and
weekend days. A simple regression specification that includes weekend interactions is as follows:
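$$\begin{aligned}
\text{Load}_d = {} & \beta_0 + \beta_1 \, \text{HDD2}_d + \beta_2 \, \text{CappedHDD1}_d + \beta_3 \, \text{CappedCDD1}_d + \beta_4 \, \text{CDD2}_d + \beta_5 \, \text{Weekend}_d \\
& + \beta_6 \, \text{HDD2Weekend}_d + \beta_7 \, \text{CappedHDD1Weekend}_d + \beta_8 \, \text{CappedCDD1Weekend}_d \\
& + \beta_9 \, \text{CDD2Weekend}_d + e_d
\end{aligned}$$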
Note the addition of the weekend binary variable as a separate explanatory variable allows the intercept
term to differ between weekdays and weekend days. As a result, the estimated average load for a
weekday is free to be different than the estimated average load for a weekend day.
The estimated weather response function for a weekday is given by:
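$$\text{Estimated Weather Response Week Day}_d = \hat{\beta}_0 + \hat{\beta}_1 \, \text{HDD2}_d + \hat{\beta}_2 \, \text{CappedHDD1}_d + \hat{\beta}_3 \, \text{CappedCDD1}_d + \hat{\beta}_4 \, \text{CDD2}_d$$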
The estimated weather response function for a weekend day is given by:
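$$\begin{aligned}
\text{Estimated Weather Response Weekend Day}_d = {} & \hat{\beta}_0 + \hat{\beta}_1 \, \text{HDD2}_d + \hat{\beta}_2 \, \text{CappedHDD1}_d + \hat{\beta}_3 \, \text{CappedCDD1}_d + \hat{\beta}_4 \, \text{CDD2}_d \\
& + \hat{\beta}_5 \, \text{Weekend}_d + \hat{\beta}_6 \, \text{HDD2Weekend}_d + \hat{\beta}_7 \, \text{CappedHDD1Weekend}_d \\
& + \hat{\beta}_8 \, \text{CappedCDD1Weekend}_d + \hat{\beta}_9 \, \text{CDD2Weekend}_d
\end{aligned}$$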
In this case, the estimated weekend day response is the sum of the weekday response plus the weekend
intercept and weather slope offsets. For example, the average weekend day load is equal to:
$$\text{Average Weekend Load}_d = \hat{\beta}_0 + \hat{\beta}_5 \, \text{Weekend}_d$$
Here, the estimated parameter, β̂5, represents how much weekend day loads differ on average from
weekday loads.
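The weekend interaction specification can be sketched by building one row of explanatory variables per day and applying the coefficients. The variable names follow the equations above. The coefficient values are hypothetical and serve only to show how the intercept and slope offsets combine.

```python
def design_row(temp, is_weekend):
    """Return the explanatory variables for one day, including weekend interactions."""
    weekend = 1.0 if is_weekend else 0.0
    base = {
        "HDD2": max(45 - temp, 0),
        "CappedHDD1": min(max(65 - temp, 0), 65 - 45),
        "CappedCDD1": min(max(temp - 65, 0), 85 - 65),
        "CDD2": max(temp - 85, 0),
    }
    row = {"Intercept": 1.0, **base, "Weekend": weekend}
    # Each interaction term is the spline variable times the weekend binary.
    row.update({name + "Weekend": value * weekend for name, value in base.items()})
    return row

# Hypothetical coefficients, for illustration only (not estimates from the guide).
BETAS = {
    "Intercept": 1000.0, "HDD2": 4.0, "CappedHDD1": 6.0,
    "CappedCDD1": 10.0, "CDD2": 15.0,
    "Weekend": -100.0, "HDD2Weekend": 0.5, "CappedHDD1Weekend": -1.0,
    "CappedCDD1Weekend": -2.0, "CDD2Weekend": -3.0,
}

def weather_response(temp, is_weekend, betas=BETAS):
    """Estimated response for a weekday (is_weekend=False) or weekend day."""
    row = design_row(temp, is_weekend)
    return sum(betas[name] * value for name, value in row.items())

print(weather_response(65, False), weather_response(65, True))   # 1000.0 900.0
print(weather_response(75, False), weather_response(75, True))   # 1100.0 980.0
```

At 65 degrees the two responses differ only by the weekend intercept offset. At 75 degrees the gap widens because the weekend slope offset on CappedCDD1 also applies.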
Relative to the bin approach, the use of temperature spline variables reduces the number of explanatory
variables substantially. Further, by allowing the outer spline variables to be uncapped, the estimated
model will produce a reasonable forecast when forecasted temperatures lie outside of the in-sample data
range.
Examples of the estimated capped and uncapped weather response functions with weekend offsets for
the Dayton Power & Light and Commonwealth Edison loads are shown in Figure 5-17 through Figure 5-20.
As can be seen, there is little to no difference between the capped and uncapped approaches.
FIGURE 5-17. ESTIMATED UNCAPPED SPLINE WEATHER RESPONSE WITH WEEKEND OFFSET FOR DAYTON POWER
& LIGHT
FIGURE 5-18. ESTIMATED CAPPED SPLINE WEATHER RESPONSE WITH WEEKEND OFFSET FOR DAYTON POWER &
LIGHT
FIGURE 5-19. ESTIMATED UNCAPPED SPLINE WEATHER RESPONSE WITH WEEKEND OFFSET FOR
COMMONWEALTH EDISON
FIGURE 5-20. ESTIMATED CAPPED SPLINE WEATHER RESPONSE WITH WEEKEND OFFSET FOR COMMONWEALTH
EDISON
Polynomial Temperature Variables. The challenge with the temperature spline approach is twofold.
First, the forecast analyst needs to decide how many cut points to use and at what values they should be
set. Trial and error can be used to find a good balance between the number of cut points and their
associated values. Alternatively, the inflection points from an estimated weather response function using
a neural network model can be used to determine the number of cut points and their associated values.
Second, linear splines provide a robust but constrained estimate of the weather response function. It is
robust in that linear splines do a good job in cutting through noisy load data to capture the underlying
weather response. It is constrained in that the response is linear from one cut point to the next. In
some cases, like individual customer modeling, this is a desirable outcome that leads to stable smooth
load forecasts. In other cases, like large system load modeling, the added precision of a more flexible
functional form can significantly reduce load forecast errors. We next introduce a functional form that is
more flexible than temperature splines.
A common nonlinear functional form is a polynomial of order p. For estimating the nonlinear response
between loads and temperatures, we find that a third or possibly fourth order polynomial is sufficient. A
simple regression with a third-order polynomial can be written as follows:
$$\text{Load}_d = \beta_0 + \beta_1\,\text{Temperature}_d + \beta_2\,\text{Temperature}_d^2 + \beta_3\,\text{Temperature}_d^3 + e_d$$
The estimated polynomial expression will span both the heating and cooling sides of the weather response
function. The estimated weather response function is given by:
$$\text{Estimated Weather Response}_d = \hat\beta_0 + \hat\beta_1\,\text{Temperature}_d + \hat\beta_2\,\text{Temperature}_d^2 + \hat\beta_3\,\text{Temperature}_d^3$$
Weekend interaction terms can be introduced as follows:
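$$\begin{aligned}
\text{Load}_d = {} & \beta_0 + \beta_1\,\text{Temperature}_d + \beta_2\,\text{Temperature}_d^2 + \beta_3\,\text{Temperature}_d^3 + \beta_4\,\text{Weekend}_d \\
& + \beta_5\,\text{WeekendTemperature}_d + \beta_6\,\text{WeekendTemperature}_d^2 \\
& + \beta_7\,\text{WeekendTemperature}_d^3 + e_d
\end{aligned}$$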
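To make the role of the weekend interaction terms concrete, the following is a minimal Python sketch that builds the eight explanatory variables of the third-order polynomial specification and evaluates the response for a weekday and a weekend day. The function names and the coefficient values are hypothetical and chosen only for illustration; they are not estimated from any load data.

```python
def polynomial_regressors(temperature, weekend):
    """Return [1, T, T^2, T^3, Weekend, Weekend*T, Weekend*T^2, Weekend*T^3]."""
    powers = [temperature ** p for p in (1, 2, 3)]
    return [1.0] + powers + [weekend] + [weekend * p for p in powers]


def weather_response(beta, temperature, weekend):
    """Evaluate the polynomial response for one day given eight coefficients."""
    x = polynomial_regressors(temperature, float(weekend))
    return sum(b * v for b, v in zip(beta, x))


# Hypothetical coefficients (beta_0 .. beta_7), for illustration only.
beta_hat = [2000.0, -40.0, 0.3, 0.001, -150.0, 2.0, -0.02, 0.0]

weekday_load = weather_response(beta_hat, 80.0, weekend=0)
weekend_load = weather_response(beta_hat, 80.0, weekend=1)

# The weekend response is the weekday response plus the intercept and slope offsets.
print(weekday_load, weekend_load - weekday_load)
```

With the weekend binary set to zero, the last four terms drop out and only the weekday polynomial remains.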
Examples of the estimated polynomial weather response functions with weekend offsets for the Dayton
Power & Light and Commonwealth Edison loads are shown in Figure 5-21 and Figure 5-22.
FIGURE 5-21. ESTIMATED POLYNOMIAL WEATHER RESPONSE WITH WEEKEND OFFSET FOR DAYTON POWER &
LIGHT
FIGURE 5-22. ESTIMATED POLYNOMIAL WEATHER RESPONSE WITH WEEKEND OFFSET FOR COMMONWEALTH
EDISON
Neural Network Weather Response. In Chapter 5 we introduced neural network models. The specific
model specification described utilizes sigmoid activation functions. A neural network model specification
that mirrors the polynomial regression can be written as:
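$$\text{Load}_d = \beta_0 + \beta_1 H_d^1 + \beta_2 H_d^2 + e_d$$
Here, the explanatory variables, $H_d^1$ and $H_d^2$, are two sigmoid nodes in the hidden layer. These sigmoid
nodes can be written as:
$$H_d^1 = \frac{1}{1 + e^{-(\alpha_0 + \alpha_1\,\text{Temperature}_d)}}$$
$$H_d^2 = \frac{1}{1 + e^{-(\delta_0 + \delta_1\,\text{Temperature}_d)}}$$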
A neural network model with weekend intercept and weather slope offsets would be written as:
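$$\text{Load}_d = \beta_0 + \beta_1\,\text{Weekend}_d + \beta_2 H_d^1 + \beta_3 H_d^2 + e_d$$
Where the two sigmoid nodes in the hidden layer are written as:
$$H_d^1 = \frac{1}{1 + e^{-(\alpha_0 + \alpha_1\,\text{Weekend}_d + \alpha_2\,\text{Temperature}_d)}}$$
$$H_d^2 = \frac{1}{1 + e^{-(\delta_0 + \delta_1\,\text{Weekend}_d + \delta_2\,\text{Temperature}_d)}}$$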
Recall that in the polynomial regression we added Weekend × Temperature, Weekend × Temperature², and
Weekend × Temperature³ interaction terms to obtain a weekend temperature slope offset. In the case of
the neural network model, we can exploit the following property of the sigmoid function to achieve the
same offset.
$$\frac{1}{1 + e^{-(\delta_0 + \delta_1\,\text{Weekend}_d + \delta_2\,\text{Temperature}_d)}} = \frac{e^{\delta_0}\, e^{\delta_1\,\text{Weekend}_d}\, e^{\delta_2\,\text{Temperature}_d}}{1 + e^{\delta_0}\, e^{\delta_1\,\text{Weekend}_d}\, e^{\delta_2\,\text{Temperature}_d}}$$
Here, the left-hand side is mathematically equivalent to the right-hand side. In this case, if the estimated
parameters are non-zero then temperatures and the weekend binary mathematically interact. As a result,
we do not need to include the interaction terms explicitly in the specification of the sigmoid nodes.
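The sigmoid property can be checked numerically. The sketch below, written in Python with hypothetical parameter values, evaluates both forms of the node and shows that the slope of the node with respect to temperature changes when the weekend binary switches from 0 to 1, even though no explicit interaction term is present.

```python
import math


def sigmoid_node(d0, d1, d2, weekend, temperature):
    """Hidden-layer node: 1 / (1 + exp(-(d0 + d1*Weekend + d2*Temperature)))."""
    return 1.0 / (1.0 + math.exp(-(d0 + d1 * weekend + d2 * temperature)))


def sigmoid_node_product_form(d0, d1, d2, weekend, temperature):
    """The same node written as a ratio of products of exponentials."""
    z = math.exp(d0) * math.exp(d1 * weekend) * math.exp(d2 * temperature)
    return z / (1.0 + z)


def temperature_slope(d0, d1, d2, weekend, temperature):
    """Derivative of the node with respect to temperature: d2 * H * (1 - H)."""
    h = sigmoid_node(d0, d1, d2, weekend, temperature)
    return d2 * h * (1.0 - h)


# Hypothetical parameter values, for illustration only.
d0, d1, d2 = -4.0, 0.5, 0.05
for weekend in (0, 1):
    print(
        weekend,
        sigmoid_node(d0, d1, d2, weekend, 70.0),
        sigmoid_node_product_form(d0, d1, d2, weekend, 70.0),
        temperature_slope(d0, d1, d2, weekend, 70.0),
    )
```

Because the weekend binary shifts the argument of the exponential, it moves the point on the sigmoid curve at which the temperature slope is evaluated, which is the interaction described above.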
Examples of the estimated neural network weather response functions with weekend offsets for the
Dayton Power & Light and Commonwealth Edison loads are shown in Figure 5-23 and Figure 5-24.
FIGURE 5-23. ESTIMATED NEURAL NET WEATHER RESPONSE WITH WEEKEND OFFSET FOR DAYTON POWER &
LIGHT
FIGURE 5-24. ESTIMATED NEURAL NET WEATHER RESPONSE WITH WEEKEND OFFSET FOR COMMONWEALTH
EDISON
5.4 DAILY VERSUS COINCIDENT HOURLY VERSUS DAILY SUMMARY VARIABLES

Most weather service providers serve up hourly weather forecasts by weather concept. Historical or
observed hourly values for most weather concepts are available for most major airports. Given the
availability of historical and forecasted weather, the question becomes, is it better to use coincident
weather (for example, use 8AM temperatures for 8AM load modeling and forecasting) or daily weather
summaries (for example, average temperatures by time-of-use periods)? An argument for using
coincident temperatures is that the load reacts quickly to changing temperatures. An extreme example
of this is an afternoon thunderstorm that cuts temperatures and loads quickly. An argument against
coincident temperatures is that it is difficult for weather service providers to accurately predict the timing
of significant meteorological events such as an afternoon storm. An argument for using daily weather
summaries is that space heating and space cooling are driven by the level and pattern of temperatures
leading up to the current hour. As evidence, most systems peak two or three days into a heat wave due to
the heat building up inside buildings and homes. An argument against using daily weather summaries is
that they do not capture the timing of significant weather-driven load changes. In most cases, a
combination of coincident and daily weather summaries works well for all but significant weather events.
Here are a couple of things to keep in mind about coincident weather data. Except for precipitation,
hourly meteorological measurements represent the value at the time the measurement is taken. For
example, the temperature value for 8AM could represent the temperature measurement made at
7:55AM. That value may or may not be representative of the temperatures that prevailed over the hour
ending at 8AM. It could be the case that temperature started at 55° at 07:00 and rose steadily by 5° every
15 minutes until the measurement of 72° at the measurement point of 7:55AM. The prevailing
temperatures over the hour from 7AM to 8AM averaged 64.4°. If the load data represents the
consumption for the period 07:00 to 08:00, using the coincident temperature measured at 7:55AM of 72°
overstates how warm the temperatures were during that hour. Aligning weather measurements with
load consumption is critical when using coincident weather data. The second thing to consider is how
the weather service provider handles time changes associated with the observance of daylight saving
time. If the weather service provider does not shift the weather data with the observance of daylight
saving time, then it is important to design coincident weather variables that do shift.
Some useful daily summary variables are:
Daily Average, Maximum and Minimum Temperatures
Night Time Average, Maximum and Minimum Temperatures
Morning Average, Maximum and Minimum Temperatures
Afternoon Average, Maximum and Minimum Temperatures
Evening Average, Maximum and Minimum Temperatures
Similar summary variables can be constructed for other weather concepts.
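As an illustration, the summary variables listed above can be computed from 24 hourly values with a few lines of Python. This is a sketch only: the period boundaries (night 0-5, morning 6-11, afternoon 12-17, evening 18-23, using hour-beginning indexing) are an assumption, and the hourly temperatures are made-up numbers.

```python
from statistics import mean

# Assumed time-of-use periods (hour-beginning, 0-23); adjust to your own definitions.
TOU_PERIODS = {
    "Night": range(0, 6),
    "Morning": range(6, 12),
    "Afternoon": range(12, 18),
    "Evening": range(18, 24),
}


def _summarize(values):
    """Average, maximum, and minimum of a list of temperatures."""
    return {"avg": mean(values), "max": max(values), "min": min(values)}


def daily_summaries(hourly_temps):
    """Return summaries for the whole day and for each time-of-use period."""
    if len(hourly_temps) != 24:
        raise ValueError("expected 24 hourly values")
    summaries = {"Daily": _summarize(hourly_temps)}
    for name, hours in TOU_PERIODS.items():
        summaries[name] = _summarize([hourly_temps[h] for h in hours])
    return summaries


hourly = [float(t) for t in range(50, 74)]  # made-up temperatures, 50 through 73
result = daily_summaries(hourly)
print(result["Daily"], result["Afternoon"])
```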
5.5 COMBINED HOURLY TEMPERATURES AND HUMIDITY VARIABLES

One of the concerns of load forecasting modeling is over specification. The most obvious source of over
specification is the use of too many explanatory variables. One way of reducing the number of explanatory
variables is to first build composite variables and then construct explanatory variables using the composite
variables. A common composite variable combines temperature and humidity, or temperature and dew
point into a composite weather variable. The composite weather variable is then used as the basis for
coincident and daily summary weather variables. These in turn are used to construct the temperature
spline and polynomial explanatory variables used in the models. Listed below are common composite
variables that bind temperatures with some measure of humidity. Humidity matters because air
conditioners work harder to cool a space when the air is humid. As a result, a hot dry day (e.g., 85° with
55% relative humidity) will have a lower air conditioning load than a hot, humid day (e.g., 85° with 95%
relative humidity).
NOAA Heat Index.17 The National Oceanic and Atmospheric Administration (NOAA) index combines
temperatures with relative humidity.
$$\begin{aligned}
HI^{F}_{d,h} = {} & -42.379 + 2.04901523\,T^{F}_{d,h} + 10.14333127\,RH^{\%}_{d,h} - 0.22475541\,T^{F}_{d,h}\,RH^{\%}_{d,h} \\
& - 0.00683783\,(T^{F}_{d,h})^2 - 0.05481717\,(RH^{\%}_{d,h})^2 + 0.00122874\,(T^{F}_{d,h})^2\,RH^{\%}_{d,h} \\
& + 0.00085282\,T^{F}_{d,h}\,(RH^{\%}_{d,h})^2 - 0.00000199\,(T^{F}_{d,h})^2\,(RH^{\%}_{d,h})^2
\end{aligned}$$
Where,
$HI^{F}_{d,h}$ is the NOAA heat index in degrees Fahrenheit for day (d) and hour (h)
$T^{F}_{d,h}$ is the dry-bulb temperature in degrees Fahrenheit for day (d) and hour (h)
$RH^{\%}_{d,h}$ is relative humidity in percentage for day (d) and hour (h)
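A minimal Python sketch of the heat index calculation is shown below. The function name is hypothetical, and the coefficients are those of the published NOAA (Rothfusz) regression, with an intercept of -42.379 and a coefficient of 0.00085282 on the temperature times squared-humidity term.

```python
def noaa_heat_index(temp_f, rh_pct):
    """NOAA heat index (degrees F) from dry-bulb temperature (F) and relative humidity (%)."""
    t, rh = temp_f, rh_pct
    return (
        -42.379
        + 2.04901523 * t
        + 10.14333127 * rh
        - 0.22475541 * t * rh
        - 0.00683783 * t ** 2
        - 0.05481717 * rh ** 2
        + 0.00122874 * t ** 2 * rh
        + 0.00085282 * t * rh ** 2
        - 0.00000199 * t ** 2 * rh ** 2
    )


# A hot, humid hour: 90 degrees F with 70% relative humidity.
print(round(noaa_heat_index(90.0, 70.0), 1))
```

The same function can be applied hour by hour, with the resulting index then used in place of dry-bulb temperature when building coincident and daily summary variables.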
Temperature-Humidity Index.18 The temperature-humidity index (THI) combines temperatures and
relative humidity to determine when cows are becoming heat stressed. The formula is:
$$THI^{F}_{d,h} = T^{F}_{d,h} - \left[0.55 - \left(0.55 \times \frac{RH^{\%}_{d,h}}{100}\right)\right] \times \left(T^{F}_{d,h} - 58\right)$$
Where,
$THI^{F}_{d,h}$ is the temperature-humidity index in degrees Fahrenheit for day (d) and hour (h)
$T^{F}_{d,h}$ is the dry-bulb temperature in degrees Fahrenheit for day (d) and hour (h)
$RH^{\%}_{d,h}$ is relative humidity in percentage for day (d) and hour (h)
17 http://www.srh.noaa.gov/images/ffc/pdf/ta_htindx.pdf
18 http://www.progressivedairy.com/dairy-basics/cow-comfort/12307-how-do-i-determine-how-do-i-calculate-temperature-humidity-index-thi
Carl Schoen Heat Index.19 The Carl Schoen heat index combines dry-bulb temperatures with dew point
temperatures. The formula is:
$$HI^{C}_{d,h} = T^{C}_{d,h} - \left(1.0799 \times e^{0.03755\,T^{C}_{d,h}}\right) \times \left[1 - e^{0.0801\,(DP^{C}_{d,h} - 14)}\right]$$
Where,
$HI^{C}_{d,h}$ is the Carl Schoen heat index in degrees Celsius for day (d) and hour (h)
$T^{C}_{d,h}$ is the dry-bulb temperature in degrees Celsius for day (d) and hour (h)
$DP^{C}_{d,h}$ is the dew point temperature in degrees Celsius for day (d) and hour (h)
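Summer Simmer Index.20 The Summer Simmer Index combines dry-bulb temperatures and relative
humidity. The formula is:
$$SSI^{F}_{d,h} = 1.98\left(T^{F}_{d,h} - \left[0.55 - 0.0055\,RH^{\%}_{d,h}\right]\left[T^{F}_{d,h} - 58\right]\right)$$
Where,
$SSI^{F}_{d,h}$ is the Summer Simmer Index in degrees Fahrenheit for day (d) and hour (h)
$T^{F}_{d,h}$ is the dry-bulb temperature in degrees Fahrenheit for day (d) and hour (h)
$RH^{\%}_{d,h}$ is relative humidity in percentage for day (d) and hour (h)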
Australian Apparent Temperature.21 The Australian Bureau of Meteorology’s Australian Apparent
Temperature combines dry-bulb temperatures with water vapor pressure and wind speed. The formula
is:
$$AT^{C}_{d,h} = T^{C}_{d,h} + 0.33\,WVP^{hPa}_{d,h} - 0.7\,WS^{m/s}_{d,h} - 4.0$$
Where,
$AT^{C}_{d,h}$ is the Australian apparent temperature in degrees Celsius for day (d) and hour (h)
$T^{C}_{d,h}$ is the dry-bulb temperature in degrees Celsius for day (d) and hour (h)
$WVP^{hPa}_{d,h}$ is water vapor pressure in hPa for day (d) and hour (h)
$WS^{m/s}_{d,h}$ is wind speed (meters/second) for day (d) and hour (h)
Further,
$$WVP^{hPa}_{d,h} = \frac{RH^{\%}_{d,h}}{100} \times 6.105 \times e^{\left(\frac{17.27\,T^{C}_{d,h}}{237.7 + T^{C}_{d,h}}\right)}$$
19 Schoen, Carl. “A New Empirical Model of the Temperature-Humidity Index”, Journal of Applied Meteorology
and Climatology, 44, 1413-1420, September 2005.
20 http://summersimmer.com/
21 http://www.bom.gov.au/products/IDS65004.shtml
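The two-step calculation (water vapor pressure first, then apparent temperature) can be sketched in Python as follows. The function names and the input values are hypothetical and are used only to illustrate the arithmetic.

```python
import math


def water_vapor_pressure_hpa(temp_c, rh_pct):
    """Water vapor pressure (hPa) from dry-bulb temperature (C) and relative humidity (%)."""
    return rh_pct / 100.0 * 6.105 * math.exp(17.27 * temp_c / (237.7 + temp_c))


def australian_apparent_temperature(temp_c, rh_pct, wind_speed_ms):
    """Apparent temperature (C) from temperature, relative humidity, and wind speed (m/s)."""
    wvp = water_vapor_pressure_hpa(temp_c, rh_pct)
    return temp_c + 0.33 * wvp - 0.7 * wind_speed_ms - 4.0


# Illustrative hour: 30 degrees C, 50% relative humidity, 2 m/s wind.
print(round(australian_apparent_temperature(30.0, 50.0, 2.0), 2))
```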
Canadian Humidex.22 The Canadian humidex combines dry-bulb and dew point temperatures. The
formula is:
$$Humidex^{C}_{d,h} = T^{C}_{d,h} + 0.5555 \times \left(\left[6.11 \times e^{5417.7530 \times \left[\frac{1}{273.16} - \frac{1}{DP^{C}_{d,h} + 273.16}\right]}\right] - 10\right)$$
Where,
$Humidex^{C}_{d,h}$ is the Canadian humidex in degrees Celsius for day (d) and hour (h)
$T^{C}_{d,h}$ is the dry-bulb temperature in degrees Celsius for day (d) and hour (h)
$DP^{C}_{d,h}$ is the dew point temperature in degrees Celsius for day (d) and hour (h)
22 Masterton, J.M. and F.A. Richardson. “Humidex: A Method of Quantifying Human Discomfort due to Excessive
Heat and Humidity”, Environment Canada, Atmospheric Environment Service, Ontario, Canada, 1979
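A short Python sketch of the humidex calculation follows. The function name and the example inputs are hypothetical; the constants are the ones in the formula above.

```python
import math


def canadian_humidex(temp_c, dew_point_c):
    """Humidex (C) from dry-bulb temperature (C) and dew point temperature (C)."""
    exponent = 5417.7530 * (1.0 / 273.16 - 1.0 / (dew_point_c + 273.16))
    vapor_pressure = 6.11 * math.exp(exponent)
    return temp_c + 0.5555 * (vapor_pressure - 10.0)


# Illustrative hour: 30 degrees C with a 20 degrees C dew point.
print(round(canadian_humidex(30.0, 20.0), 1))
```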
5.6 INCORPORATING WIND SPEED
The heat indices presented above work well for capturing the impact of temperatures and humidity on air
conditioning loads. The next series of indices are designed to address the impact of temperatures and
wind speed on space heating loads.
Environment Canada Wind Chill (Celsius). Environment Canada’s wind chill combines temperatures and
wind speed. The formula is:
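$$WCI^{C}_{d,h} = 13.12 + 0.6215\,T^{C}_{d,h} - 11.37\,(WS^{km/h}_{d,h})^{0.16} + 0.3965\,T^{C}_{d,h}\,(WS^{km/h}_{d,h})^{0.16}$$
Where,
$WCI^{C}_{d,h}$ is Environment Canada’s wind chill in degrees Celsius for day (d) and hour (h)
$T^{C}_{d,h}$ is dry-bulb temperature in degrees Celsius for day (d) and hour (h)
$WS^{km/h}_{d,h}$ is wind speed (kilometers/hour) for day (d) and hour (h)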
Environment Canada Wind Chill (Fahrenheit). Environment Canada’s wind chill combines temperatures
and wind speed. The formula is:
$$WCI^{F}_{d,h} = 35.74 + 0.6215\,T^{F}_{d,h} - 35.75\,(WS^{mph}_{d,h})^{0.16} + 0.4275\,T^{F}_{d,h}\,(WS^{mph}_{d,h})^{0.16}$$
Where,
$WCI^{F}_{d,h}$ is Environment Canada’s wind chill in degrees Fahrenheit for day (d) and hour (h)
$T^{F}_{d,h}$ is dry-bulb temperature in degrees Fahrenheit for day (d) and hour (h)
$WS^{mph}_{d,h}$ is wind speed (miles/hour) for day (d) and hour (h)
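Both wind chill variants can be sketched in Python as shown below. The function names are hypothetical. The Celsius version uses 11.37 as the wind speed coefficient, which is the value in the wind chill formula published by Environment Canada.

```python
def wind_chill_celsius(temp_c, wind_speed_kmh):
    """Wind chill (C) from dry-bulb temperature (C) and wind speed (km/h)."""
    v = wind_speed_kmh ** 0.16
    return 13.12 + 0.6215 * temp_c - 11.37 * v + 0.3965 * temp_c * v


def wind_chill_fahrenheit(temp_f, wind_speed_mph):
    """Wind chill (F) from dry-bulb temperature (F) and wind speed (mph)."""
    v = wind_speed_mph ** 0.16
    return 35.74 + 0.6215 * temp_f - 35.75 * v + 0.4275 * temp_f * v


# Illustrative hours: -10 C with a 20 km/h wind, and 0 F with a 15 mph wind.
print(round(wind_chill_celsius(-10.0, 20.0), 1))
print(round(wind_chill_fahrenheit(0.0, 15.0), 1))
```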
5.1 COMPUTING A WEIGHTED AVERAGE WEATHER DATA

In many cases, weather data and forecasts are available for two or more weather stations. If there are
two weather stations, then it is worth testing a model with two sets of weather variables, one set
for each weather station. If this does not work, then the only option is to form weighted average weather
data and form the weather variables using the average weather. A word of caution: if you decide to use
weighted weather, then it is recommended to do the following.
1. First, form all the weather variables you need separately using the weather data for each weather
station. For example, create the HDD and CDD variables for each station.
2. Form the weighted weather variables by weighting the weather variables from step 1 across the
weather stations.
Why is this approach recommended? Consider two weather stations with temperatures of 60° for station 1
and 70° for station 2. Further, assume the cut point for the heating and cooling degree day variables is
65°. Averaging the temperatures results in an average temperature of 65°. The HDD and CDD variables
based on the average temperature are:
HDD = MAX(Cutpoint – 65,0) = 0
CDD = MAX(65 – Cutpoint,0) = 0
In other words, the average temperature across the two stations implies no heating or cooling takes place.
If we follow the above procedure, we first compute HDD and CDD variables for each station and then
average the results. This gives:
HDD Station 1 = MAX(Cutpoint – 60, 0) = 5
CDD Station 1 = MAX(60 – Cutpoint,0) = 0
HDD Station 2 = MAX(Cutpoint – 70, 0) = 0
CDD Station 2 = MAX(70 – Cutpoint,0) = 5
The average HDD and CDD variables that go into the model are then HDD=2.5 and CDD=2.5. In this case,
some space heating and space cooling took place.
The point is that any averaging of weather data across stations will flatten the data. To avoid this, it is
best to compute the weather variables by station and then average the results.
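The two-station example above can be reproduced with a short Python sketch. The function names are hypothetical, and equal station weights of 0.5 are assumed, as in the example.

```python
def hdd(temperature, cut_point=65.0):
    """Heating degree day: MAX(Cutpoint - Temperature, 0)."""
    return max(cut_point - temperature, 0.0)


def cdd(temperature, cut_point=65.0):
    """Cooling degree day: MAX(Temperature - Cutpoint, 0)."""
    return max(temperature - cut_point, 0.0)


def weighted_average(values, weights):
    """Weighted average; the weights are assumed to sum to 1.0."""
    return sum(w * v for w, v in zip(weights, values))


station_temps = [60.0, 70.0]
weights = [0.5, 0.5]

# Averaging temperatures first flattens the data: no heating or cooling shows up.
avg_temp = weighted_average(station_temps, weights)
print(hdd(avg_temp), cdd(avg_temp))

# Recommended: compute the variables by station, then average the results.
avg_hdd = weighted_average([hdd(t) for t in station_temps], weights)
avg_cdd = weighted_average([cdd(t) for t in station_temps], weights)
print(avg_hdd, avg_cdd)
```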
5.2 COMPUTING A WEIGHTED AVERAGE WIND SPEED
Often, when we are developing weather variables, we introduce wind speed either as a standalone
explanatory variable or as a contributing term in a wind chill formula. An example of the latter is the
Environment Canada wind chill. In cases where you have data for a single weather station, the calculations
are straightforward. But what is the right wind speed to include in the wind chill formula when the wind
speed data are derived from a weighted average of two or more weather stations? It turns out that a
simple average of the wind speeds across the weather stations can lead to very misleading results.
Consider two weather stations: the first station reports a 20 mph wind with a wind direction of northeast
(45°), and the second station reports a 20 mph wind with a wind direction of northwest (315°). A simple
average of the two wind speeds is 20 mph. While it is tempting to use the 20 mph as the wind speed for
the weighted average of the two stations, it would not represent what is really happening. Specifically,
the 20-mph wind blowing northeast would be working against the 20-mph wind blowing northwest. With equal
station weights, the east-west components cancel and the north-south components average to
20 × COS(45°), leaving a weighted average wind speed of 14.14 mph.
We can extend this idea to wind direction. Let us say we know that the load impact of storms flowing in
from the northwest has a different signature from storms flowing in from the northeast. We would like
to use this information by interacting the wind speed variable with a wind direction variable. In the simple
example presented above, the weighted average wind direction would be 180° (computed as [45° +
315°]/2). In other words, the weighted average wind direction would be due south, which is clearly wrong.
The weighted average wind direction is due north (0°).
How did we derive the true weighted average wind speed and wind direction? Below are the steps to
take to compute a weighted average wind speed and wind direction. The calculations will use the wind
speed and wind direction data in the following table.
Step 1. Convert Wind Direction in Degrees to Wind Direction in Radians
The formula for converting wind direction from degrees to radians is:
$$\text{Radians} = \text{Degrees} \times \frac{\pi}{180}$$
Where π is the mathematical constant pi.
The results of the conversion from degrees to radians are shown in the following table.
Step 2. For Each Station, Compute the East-West and North-South Vector
The east-west vector is computed as:
$$\text{EastWestVector}_{i,s} = \text{WindSpeed}_{i,s} \times \sin(\text{WindDirectionRadians}_{i,s})$$
The north-south vector is computed as:
$$\text{NorthSouthVector}_{i,s} = \text{WindSpeed}_{i,s} \times \cos(\text{WindDirectionRadians}_{i,s})$$
Here, the weather stations are indexed by (s) and the time interval is indexed by (i). You would compute
the east-west and north-south vectors separately for each hour of wind speed and wind direction data.
The result of this step is presented in the following table.
Step 3. Compute a Weighted Sum of the East-West and North-South Vectors
The weighted sum of the east-west vector is computed as:
$$\text{EastWestVectorWeightedSum}_i = \sum_{s=1}^{S} \text{StationWgts}_s \times \text{EastWestVector}_{i,s}$$
The weighted sum of the north-south vector is computed as:
$$\text{NorthSouthVectorWeightedSum}_i = \sum_{s=1}^{S} \text{StationWgts}_s \times \text{NorthSouthVector}_{i,s}$$
Where,
StationWgts s is the user inputted weather station weight for weather station (s). These
calculations assume that the sum of the weather station weights equals 1.0. If not, then the
weights need to be normalized to 1.0 prior to this step. In the example, each station is assigned
a weight of 25%.
The result of this step is shown below.
Step 4. Compute the Weighted Average Wind Speed
The weighted average wind speed is then computed as:
$$\text{WeightedAverageWindSpeed}_i = \sqrt{\text{EastWestVectorWeightedSum}_i^2 + \text{NorthSouthVectorWeightedSum}_i^2}$$
For this example, the weighted average wind speed is 10.32.
Step 5. Compute the Weighted Average Wind Direction
The computation of the weighted average wind direction is as follows.
a. First compute the arctangent given the east-west and north-south vectors from Step 3 above.
If the NorthSouthVectorWeightedSum is not equal to 0.0 then
$$\text{WeightedWindDirectionRadians}_i = \text{ATAN}\left(\frac{\text{EastWestVectorWeightedSum}_i}{\text{NorthSouthVectorWeightedSum}_i}\right)$$
If the NorthSouthVectorWeightedSum = 0.0 then
$$\text{WeightedWindDirectionRadians}_i = \text{ATAN}\left(\frac{\text{EastWestVectorWeightedSum}_i}{0.0001 + \text{NorthSouthVectorWeightedSum}_i}\right)$$
This alternative formula controls for possible errors associated with dividing by 0.0.
The result of this sub-step is -0.38.
b. Now a correction is made if the north-south vector weighted sum is negative.
If $\text{NorthSouthVectorWeightedSum}_i < 0$ then
$$\text{WeightedWindDirectionRadians}_i = \text{WeightedWindDirectionRadians}_i + \pi$$
The result of this sub-step is 2.77 radians.
c. Next the wind direction in radians is converted to degrees.
$$\text{WeightedWindDirectionDegrees}_i = \text{WeightedWindDirectionRadians}_i \times \frac{180}{\pi}$$
The result is a weighted average wind direction of 158.44°.
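The five steps can be sketched in a few lines of Python. Note one substitution: the sketch uses `math.atan2` in place of the ATAN call plus the correction in Step 5, since `atan2` handles the zero and negative north-south cases directly; the result is then expressed on a 0° to 360° scale. The function name is hypothetical, and the inputs are the two-station example from the start of this section with equal weights, not the values behind the tables.

```python
import math


def weighted_wind(speeds, directions_deg, weights):
    """Return (weighted average speed, weighted average direction in degrees)."""
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize the station weights to 1.0

    # Steps 1-3: convert to radians, build the vectors, and form the weighted sums.
    east_west = sum(
        w * s * math.sin(math.radians(d))
        for w, s, d in zip(weights, speeds, directions_deg)
    )
    north_south = sum(
        w * s * math.cos(math.radians(d))
        for w, s, d in zip(weights, speeds, directions_deg)
    )

    # Step 4: weighted average wind speed.
    speed = math.hypot(east_west, north_south)

    # Step 5: weighted average wind direction. Rounding suppresses floating-point
    # noise so that a direction of due north is reported as 0 rather than 360.
    direction = round(math.degrees(math.atan2(east_west, north_south)), 6) % 360.0
    return speed, direction


speed, direction = weighted_wind([20.0, 20.0], [45.0, 315.0], [0.5, 0.5])
print(round(speed, 2), direction)
```

For the two-station example, the sketch returns the 14.14 mph wind speed and the due north (0°) wind direction discussed above.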
6 ALTERNATIVE MODEL SPECIFICATIONS
This chapter presents a set of alternative model specifications. The specifications range from the most
basic model of computing the overall average of the loads to models that allow the load shapes to vary
by combination of day-of-the-week and month and weather. Logical extensions of these specifications
include:
◼ Addition of holiday variables,

◼ Replacing the simple spline variables with multi piece capped spline variables,
◼ Replacing the spline variables with polynomial temperature variables,
◼ Adding growth trends (e.g., linear trend),
◼ Adding variables to account for increased saturation of distributed energy resources, such as solar
PV generation, demand response, and storage, and
◼ Adding variables to account for increased saturation of electric vehicle charging.
For very near-term forecast horizons of one day or less, the model specifications can be extended to
include autoregressive terms. These terms can be in the form of:
◼ Prior hour loads (e.g., load at 8AM is a function of loads at 7AM, 6AM, and 5AM), or
◼ Prior day loads (e.g., load at 8AM is a function of 8AM loads from the prior seven days)
This list of alternative model specifications is not intended to be an exhaustive list of all possible model
specifications. Consider them as a launching point for your own models.
6.1.1 Model Template: Constant

This model returns the average over the estimation period. It is useful for large stable industrial loads
where you want the forecast to be fixed. Since there is only one parameter to be estimated, you only
need a handful of days to estimate the model. This means the forecast analyst can select the date range
that best fits the loads they want to forecast.
$$\text{Load}_{d,h} = \beta_h^{0}\,\text{Intercept}_d$$
Where,
$\text{Load}_{d,h}$ is the hourly load data for day (d) and hour (h)
$\text{Intercept}_d$ takes on a value of 1.0 for all days (d)
$\beta_h^{0}$ is the coefficient for the $\text{Intercept}_d$ variable for hour (h)
6.1.2 Model Template: Constant with Daily Temperature

This model will produce a load forecast that has a fixed baseline load shape that varies with average daily
temperatures.
$$\begin{aligned}
\text{Load}_{d,h} = {} & \{\beta_h^{0}\,\text{Intercept}_d\} + \{\beta_h^{1}\,\text{HDD}_d + \beta_h^{2}\,\text{CDD}_d + \beta_h^{3}\,\text{HDD}_d\text{Weekend}_d + \beta_h^{4}\,\text{CDD}_d\text{Weekend}_d \\
& + \beta_h^{5}\,\text{LagHDD}_d + \beta_h^{6}\,\text{LagCDD}_d + \beta_h^{7}\,\text{LagHDDWeekend}_d + \beta_h^{8}\,\text{LagCDDWeekend}_d\}
\end{aligned}$$
Where,
$\text{HDD}_d$ is the heating degree day for day (d)
$\text{CDD}_d$ is the cooling degree day for day (d)
$\text{LagHDD}_d$ is the weighted average of the heating degree days for the prior two days, where
the weights are 0.75 on the one-day lag and 0.25 on the two-day lag
$\text{LagCDD}_d$ is the weighted average of the cooling degree days for the prior two days, where
the weights are 0.75 on the one-day lag and 0.25 on the two-day lag
$\text{Weekend}_d$ is a binary variable that takes on a value of 1.0 if the day (d) is a weekend day, otherwise
0.0
$\text{HDD}_d\text{Weekend}_d$ is the heating degree day interacted with a weekend binary variable
$\text{CDD}_d\text{Weekend}_d$ is the cooling degree day interacted with a weekend binary variable
$\text{LagHDDWeekend}_d$ is the lagged heating degree day interacted with a weekend binary variable
$\text{LagCDDWeekend}_d$ is the lagged cooling degree day interacted with a weekend binary variable
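The lagged degree day variables can be built as in the following Python sketch. The function name and the sample values are hypothetical; the 0.75 and 0.25 weights are those given in the definitions above. The first two days are left undefined because they do not have two prior days of history.

```python
def lagged_degree_days(degree_days, w1=0.75, w2=0.25):
    """Weighted average of the prior two days: w1 * DD[d-1] + w2 * DD[d-2]."""
    lagged = []
    for d in range(len(degree_days)):
        if d < 2:
            lagged.append(None)  # not enough history for the first two days
        else:
            lagged.append(w1 * degree_days[d - 1] + w2 * degree_days[d - 2])
    return lagged


# Made-up daily heating degree days and weekend binaries for four days.
hdd_values = [10.0, 20.0, 0.0, 4.0]
weekend = [0, 0, 0, 1]

lag_hdd = lagged_degree_days(hdd_values)
lag_hdd_weekend = [None if v is None else v * w for v, w in zip(lag_hdd, weekend)]
print(lag_hdd, lag_hdd_weekend)
```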
6.1.3 Model Template: Day-of-the-Week

This model produces an average load shape by day-of-the-week.
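$$\begin{aligned}
\text{Load}_{d,h} = {} & \beta_h^{0}\,\text{Intercept}_d + \beta_h^{1}\,\text{Sunday}_d + \beta_h^{2}\,\text{Monday}_d + \beta_h^{3}\,\text{Tuesday}_d + \beta_h^{4}\,\text{Thursday}_d \\
& + \beta_h^{5}\,\text{Friday}_d + \beta_h^{6}\,\text{Saturday}_d
\end{aligned}$$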
Where,
Sundayd is a binary variable that takes on a value of 1.0 if the day (d) is a Sunday, otherwise 0.0
Mondayd is a binary variable that takes on a value of 1.0 if the day (d) is a Monday, otherwise 0.0
Tuesdayd is a binary variable that takes on a value of 1.0 if the day (d) is a Tuesday, otherwise 0.0
Thursdayd is a binary variable that takes on a value of 1.0 if the day (d) is a Thursday, otherwise 0.0
Fridayd is a binary variable that takes on a value of 1.0 if the day (d) is a Friday, otherwise 0.0
Saturdayd is a binary variable that takes on a value of 1.0 if the day (d) is a Saturday, otherwise 0.0
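A minimal Python sketch of the day-of-the-week binaries is shown below. The function name is hypothetical. Wednesday has no binary variable in the specification, so in this sketch it appears to act as the reference day: all six binaries are 0.0 and the load is captured by the intercept alone.

```python
from datetime import date

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
# Columns in the order used by the model template (no Wednesday column).
DAY_COLUMNS = ["Sunday", "Monday", "Tuesday", "Thursday", "Friday", "Saturday"]


def day_of_week_binaries(day):
    """Return a dict of 1.0/0.0 binaries for the six day-of-the-week variables."""
    name = WEEKDAY_NAMES[day.weekday()]  # date.weekday(): Monday is 0, Sunday is 6
    return {column: 1.0 if column == name else 0.0 for column in DAY_COLUMNS}


print(day_of_week_binaries(date(2018, 7, 21)))  # a Saturday
print(day_of_week_binaries(date(2018, 7, 18)))  # a Wednesday: all binaries are 0.0
```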

6.1.4 Model Template: Day-of-the-Week with Daily Temperature
This model produces an average load shape by day-of-the-week that varies with average daily
temperatures. The weather response can vary between weekdays and weekend days.
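$$\begin{aligned}
\text{Load}_{d,h} = {} & \{\beta_h^{0}\,\text{Intercept}_d + \beta_h^{1}\,\text{Sunday}_d + \beta_h^{2}\,\text{Monday}_d + \beta_h^{3}\,\text{Tuesday}_d + \beta_h^{4}\,\text{Thursday}_d \\
& + \beta_h^{5}\,\text{Friday}_d + \beta_h^{6}\,\text{Saturday}_d\} + \{\beta_h^{7}\,\text{HDD}_d + \beta_h^{8}\,\text{CDD}_d + \beta_h^{9}\,\text{HDD}_d\text{Weekend}_d \\
& + \beta_h^{10}\,\text{CDD}_d\text{Weekend}_d + \beta_h^{11}\,\text{LagHDD}_d + \beta_h^{12}\,\text{LagCDD}_d \\
& + \beta_h^{13}\,\text{LagHDDWeekend}_d + \beta_h^{14}\,\text{LagCDDWeekend}_d\}
\end{aligned}$$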
6.1.5 Model Template: Day-of-the-Week with Time-of-Use Temperatures

This model produces an average load shape by day-of-the-week that varies with night, morning,
afternoon, and evening temperatures. The weather response can vary between weekdays and weekend
days.
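$$\begin{aligned}
\text{Load}_{d,h} = {} & \{\beta_h^{0}\,\text{Intercept}_d + \beta_h^{1}\,\text{Sunday}_d + \beta_h^{2}\,\text{Monday}_d + \beta_h^{3}\,\text{Tuesday}_d + \beta_h^{4}\,\text{Thursday}_d \\
& + \beta_h^{5}\,\text{Friday}_d + \beta_h^{6}\,\text{Saturday}_d\} + \{\beta_h^{7}\,\text{NightHDD}_d + \beta_h^{8}\,\text{NightCDD}_d \\
& + \beta_h^{9}\,\text{NightHDD}_d\text{Weekend}_d + \beta_h^{10}\,\text{NightCDD}_d\text{Weekend}_d \\
& + \beta_h^{11}\,\text{MorningHDD}_d + \beta_h^{12}\,\text{MorningCDD}_d + \beta_h^{13}\,\text{MorningHDD}_d\text{Weekend}_d \\
& + \beta_h^{14}\,\text{MorningCDD}_d\text{Weekend}_d + \beta_h^{15}\,\text{AfternoonHDD}_d + \beta_h^{16}\,\text{AfternoonCDD}_d \\
& + \beta_h^{17}\,\text{AfternoonHDD}_d\text{Weekend}_d + \beta_h^{18}\,\text{AfternoonCDD}_d\text{Weekend}_d \\
& + \beta_h^{19}\,\text{EveningHDD}_d + \beta_h^{20}\,\text{EveningCDD}_d + \beta_h^{21}\,\text{EveningHDD}_d\text{Weekend}_d \\
& + \beta_h^{22}\,\text{EveningCDD}_d\text{Weekend}_d + \beta_h^{23}\,\text{LagHDD}_d + \beta_h^{24}\,\text{LagCDD}_d \\
& + \beta_h^{25}\,\text{LagHDDWeekend}_d + \beta_h^{26}\,\text{LagCDDWeekend}_d\}
\end{aligned}$$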
Where,
NightHDDd is the heating degree day over the night TOU period (12AM to 5AM)
MorningHDDd is the heating degree day over the morning TOU period (6AM to 11AM)
AfternoonHDDd is the heating degree day over the afternoon TOU period (12PM to 5PM)
EveningHDDd is the heating degree day over the evening TOU period (6PM to 12AM)
NightCDDd is the cooling degree day over the night TOU period (12AM to 5AM)
MorningCDDd is the cooling degree day over the morning TOU period (6AM to 11AM)

AfternoonCDDd is the cooling degree day over the afternoon TOU period (12PM to 5PM)
EveningCDDd is the cooling degree day over the evening TOU period (6PM to 12AM)
6.1.6 Model Template: Month

This model produces an average load shape by month.
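$$\begin{aligned}
\text{Load}_{d,h} = {} & \beta_h^{0}\,\text{Intercept}_d + \beta_h^{1}\,\text{January}_d + \beta_h^{2}\,\text{February}_d + \beta_h^{3}\,\text{March}_d + \beta_h^{4}\,\text{May}_d + \beta_h^{5}\,\text{June}_d \\
& + \beta_h^{6}\,\text{July}_d + \beta_h^{7}\,\text{August}_d + \beta_h^{8}\,\text{September}_d + \beta_h^{9}\,\text{October}_d + \beta_h^{10}\,\text{November}_d \\
& + \beta_h^{11}\,\text{December}_d
\end{aligned}$$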
Where,
Januaryd is a binary variable that takes on a value 1.0 if the day (d) falls in January, otherwise 0.0
Februaryd is a binary variable that takes on a value 1.0 if the day (d) falls in February, otherwise
0.0
Marchd is a binary variable that takes on a value 1.0 if the day (d) falls in March, otherwise 0.0
Mayd is a binary variable that takes on a value 1.0 if the day (d) falls in May, otherwise 0.0
Juned is a binary variable that takes on a value 1.0 if the day (d) falls in June, otherwise 0.0
Julyd is a binary variable that takes on a value 1.0 if the day (d) falls in July, otherwise 0.0
August d is a binary variable that takes on a value 1.0 if the day (d) falls in August, otherwise 0.0
Septemberd is a binary variable that takes on a value 1.0 if the day (d) falls in September,
otherwise 0.0
Octoberd is a binary variable that takes on a value 1.0 if the day (d) falls in October, otherwise
0.0
Novemberd is a binary variable that takes on a value 1.0 if the day (d) falls in November,
otherwise 0.0

Decemberd is a binary variable that takes on a value 1.0 if the day (d) falls in December, otherwise
0.0
6.1.7 Model Template: Day Type

This model produces an average weekday and weekend day load shape by month.
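$$\begin{aligned}
\text{Load}_{d,h} = {} & \beta_h^{0}\,\text{Intercept}_d + \beta_h^{1}\,\text{January}_d + \beta_h^{2}\,\text{January}_d\text{Weekend}_d + \beta_h^{3}\,\text{February}_d \\
& + \beta_h^{4}\,\text{February}_d\text{Weekend}_d + \beta_h^{5}\,\text{March}_d + \beta_h^{6}\,\text{March}_d\text{Weekend}_d + \beta_h^{7}\,\text{April}_d \\
& + \beta_h^{8}\,\text{April}_d\text{Weekend}_d + \beta_h^{9}\,\text{May}_d + \beta_h^{10}\,\text{May}_d\text{Weekend}_d + \beta_h^{11}\,\text{June}_d \\
& + \beta_h^{12}\,\text{June}_d\text{Weekend}_d + \beta_h^{13}\,\text{July}_d + \beta_h^{14}\,\text{July}_d\text{Weekend}_d + \beta_h^{15}\,\text{August}_d \\
& + \beta_h^{16}\,\text{August}_d\text{Weekend}_d + \beta_h^{17}\,\text{September}_d + \beta_h^{18}\,\text{September}_d\text{Weekend}_d \\
& + \beta_h^{19}\,\text{October}_d + \beta_h^{20}\,\text{October}_d\text{Weekend}_d + \beta_h^{21}\,\text{November}_d \\
& + \beta_h^{22}\,\text{November}_d\text{Weekend}_d + \beta_h^{23}\,\text{December}_d
\end{aligned}$$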
6.1.8 Model Template: Day Type with Daily Temperature

This model produces an average weekday and weekend day load shape by month that varies with average
daily temperatures. The weather response can vary between weekdays and weekend days.
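$$\begin{aligned}
\text{Load}_{d,h} = {} & \{\beta_h^{0}\,\text{Intercept}_d + \beta_h^{1}\,\text{January}_d + \beta_h^{2}\,\text{January}_d\text{Weekend}_d + \beta_h^{3}\,\text{February}_d \\
& + \beta_h^{4}\,\text{February}_d\text{Weekend}_d + \beta_h^{5}\,\text{March}_d + \beta_h^{6}\,\text{March}_d\text{Weekend}_d + \beta_h^{7}\,\text{April}_d \\
& + \beta_h^{8}\,\text{April}_d\text{Weekend}_d + \beta_h^{9}\,\text{May}_d + \beta_h^{10}\,\text{May}_d\text{Weekend}_d + \beta_h^{11}\,\text{June}_d \\
& + \beta_h^{12}\,\text{June}_d\text{Weekend}_d + \beta_h^{13}\,\text{July}_d + \beta_h^{14}\,\text{July}_d\text{Weekend}_d + \beta_h^{15}\,\text{August}_d \\
& + \beta_h^{16}\,\text{August}_d\text{Weekend}_d + \beta_h^{17}\,\text{September}_d + \beta_h^{18}\,\text{September}_d\text{Weekend}_d \\
& + \beta_h^{19}\,\text{October}_d + \beta_h^{20}\,\text{October}_d\text{Weekend}_d + \beta_h^{21}\,\text{November}_d \\
& + \beta_h^{22}\,\text{November}_d\text{Weekend}_d + \beta_h^{23}\,\text{December}_d\} + \{\beta_h^{24}\,\text{HDD}_d + \beta_h^{25}\,\text{CDD}_d \\
& + \beta_h^{26}\,\text{HDD}_d\text{Weekend}_d + \beta_h^{27}\,\text{CDD}_d\text{Weekend}_d + \beta_h^{28}\,\text{LagHDD}_d + \beta_h^{29}\,\text{LagCDD}_d \\
& + \beta_h^{30}\,\text{LagHDDWeekend}_d + \beta_h^{31}\,\text{LagCDDWeekend}_d\}
\end{aligned}$$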

6.1.9 Model Template: Day Type with Time-of-Use Temperatures
This model produces an average weekday and weekend day load shape by month that varies with night,
morning, afternoon, and evening temperatures. The weather response can vary between weekdays and
weekend days.
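$$\begin{aligned}
\text{Load}_{d,h} = {} & \{\beta_h^{0}\,\text{Intercept}_d + \beta_h^{1}\,\text{January}_d + \beta_h^{2}\,\text{January}_d\text{Weekend}_d + \beta_h^{3}\,\text{February}_d \\
& + \beta_h^{4}\,\text{February}_d\text{Weekend}_d + \beta_h^{5}\,\text{March}_d + \beta_h^{6}\,\text{March}_d\text{Weekend}_d + \beta_h^{7}\,\text{April}_d \\
& + \beta_h^{8}\,\text{April}_d\text{Weekend}_d + \beta_h^{9}\,\text{May}_d + \beta_h^{10}\,\text{May}_d\text{Weekend}_d + \beta_h^{11}\,\text{June}_d \\
& + \beta_h^{12}\,\text{June}_d\text{Weekend}_d + \beta_h^{13}\,\text{July}_d + \beta_h^{14}\,\text{July}_d\text{Weekend}_d + \beta_h^{15}\,\text{August}_d \\
& + \beta_h^{16}\,\text{August}_d\text{Weekend}_d + \beta_h^{17}\,\text{September}_d + \beta_h^{18}\,\text{September}_d\text{Weekend}_d \\
& + \beta_h^{19}\,\text{October}_d + \beta_h^{20}\,\text{October}_d\text{Weekend}_d + \beta_h^{21}\,\text{November}_d \\
& + \beta_h^{22}\,\text{November}_d\text{Weekend}_d + \beta_h^{23}\,\text{December}_d\} + \{\beta_h^{24}\,\text{NightHDD}_d \\
& + \beta_h^{25}\,\text{NightCDD}_d + \beta_h^{26}\,\text{NightHDD}_d\text{Weekend}_d + \beta_h^{27}\,\text{NightCDD}_d\text{Weekend}_d \\
& + \beta_h^{28}\,\text{MorningHDD}_d + \beta_h^{29}\,\text{MorningCDD}_d + \beta_h^{30}\,\text{MorningHDD}_d\text{Weekend}_d \\
& + \beta_h^{31}\,\text{MorningCDD}_d\text{Weekend}_d + \beta_h^{32}\,\text{AfternoonHDD}_d + \beta_h^{33}\,\text{AfternoonCDD}_d \\
& + \beta_h^{34}\,\text{AfternoonHDD}_d\text{Weekend}_d + \beta_h^{35}\,\text{AfternoonCDD}_d\text{Weekend}_d \\
& + \beta_h^{36}\,\text{EveningHDD}_d + \beta_h^{37}\,\text{EveningCDD}_d + \beta_h^{38}\,\text{EveningHDD}_d\text{Weekend}_d \\
& + \beta_h^{39}\,\text{EveningCDD}_d\text{Weekend}_d + \beta_h^{40}\,\text{LagHDD}_d + \beta_h^{41}\,\text{LagCDD}_d \\
& + \beta_h^{42}\,\text{LagHDDWeekend}_d + \beta_h^{43}\,\text{LagCDDWeekend}_d\}
\end{aligned}$$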

6.1.10 Model Template: Extended Day Type
This model produces an average Sunday, Monday, TWT, Friday, and Saturday load shape by month.
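$$\begin{aligned}
\text{Load}_{d,h} = {} & \{\beta_h^{0}\,\text{Intercept}_d \\
& + \beta_h^{1}\,\text{January}_d + \beta_h^{2}\,\text{January}_d\text{Sunday}_d + \beta_h^{3}\,\text{January}_d\text{Monday}_d + \beta_h^{4}\,\text{January}_d\text{Friday}_d + \beta_h^{5}\,\text{January}_d\text{Saturday}_d \\
& + \beta_h^{6}\,\text{February}_d + \beta_h^{7}\,\text{February}_d\text{Sunday}_d + \beta_h^{8}\,\text{February}_d\text{Monday}_d + \beta_h^{9}\,\text{February}_d\text{Friday}_d + \beta_h^{10}\,\text{February}_d\text{Saturday}_d \\
& + \beta_h^{11}\,\text{March}_d + \beta_h^{12}\,\text{March}_d\text{Sunday}_d + \beta_h^{13}\,\text{March}_d\text{Monday}_d + \beta_h^{14}\,\text{March}_d\text{Friday}_d + \beta_h^{15}\,\text{March}_d\text{Saturday}_d \\
& + \beta_h^{16}\,\text{April}_d + \beta_h^{17}\,\text{April}_d\text{Sunday}_d + \beta_h^{18}\,\text{April}_d\text{Monday}_d + \beta_h^{19}\,\text{April}_d\text{Friday}_d + \beta_h^{20}\,\text{April}_d\text{Saturday}_d \\
& + \beta_h^{21}\,\text{May}_d + \beta_h^{22}\,\text{May}_d\text{Sunday}_d + \beta_h^{23}\,\text{May}_d\text{Monday}_d + \beta_h^{24}\,\text{May}_d\text{Friday}_d + \beta_h^{25}\,\text{May}_d\text{Saturday}_d \\
& + \beta_h^{26}\,\text{June}_d + \beta_h^{27}\,\text{June}_d\text{Sunday}_d + \beta_h^{28}\,\text{June}_d\text{Monday}_d + \beta_h^{29}\,\text{June}_d\text{Friday}_d + \beta_h^{30}\,\text{June}_d\text{Saturday}_d \\
& + \beta_h^{31}\,\text{July}_d + \beta_h^{32}\,\text{July}_d\text{Sunday}_d + \beta_h^{33}\,\text{July}_d\text{Monday}_d + \beta_h^{34}\,\text{July}_d\text{Friday}_d + \beta_h^{35}\,\text{July}_d\text{Saturday}_d \\
& + \beta_h^{36}\,\text{August}_d + \beta_h^{37}\,\text{August}_d\text{Sunday}_d + \beta_h^{38}\,\text{August}_d\text{Monday}_d + \beta_h^{39}\,\text{August}_d\text{Friday}_d + \beta_h^{40}\,\text{August}_d\text{Saturday}_d \\
& + \beta_h^{41}\,\text{September}_d + \beta_h^{42}\,\text{September}_d\text{Sunday}_d + \beta_h^{43}\,\text{September}_d\text{Monday}_d + \beta_h^{44}\,\text{September}_d\text{Friday}_d + \beta_h^{45}\,\text{September}_d\text{Saturday}_d \\
& + \beta_h^{46}\,\text{October}_d + \beta_h^{47}\,\text{October}_d\text{Sunday}_d + \beta_h^{48}\,\text{October}_d\text{Monday}_d + \beta_h^{49}\,\text{October}_d\text{Friday}_d + \beta_h^{50}\,\text{October}_d\text{Saturday}_d \\
& + \beta_h^{51}\,\text{November}_d + \beta_h^{52}\,\text{November}_d\text{Sunday}_d + \beta_h^{53}\,\text{November}_d\text{Monday}_d + \beta_h^{54}\,\text{November}_d\text{Friday}_d + \beta_h^{55}\,\text{November}_d\text{Saturday}_d \\
& + \beta_h^{56}\,\text{December}_d\text{Sunday}_d + \beta_h^{57}\,\text{December}_d\text{Monday}_d + \beta_h^{58}\,\text{December}_d\text{Friday}_d + \beta_h^{59}\,\text{December}_d\text{Saturday}_d\}
\end{aligned}$$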

6.1.11 Model Template: Extended Day Type with Daily Temperature
This model produces an average Sunday, Monday, TWT, Friday and Saturday load shape by month that
varies with average daily temperatures. The weather response can vary between weekdays and weekend
days.
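$$\begin{aligned}
\text{Load}_{d,h} = {} & \{\beta_h^{0}\,\text{Intercept}_d \\
& + \beta_h^{1}\,\text{January}_d + \beta_h^{2}\,\text{January}_d\text{Sunday}_d + \beta_h^{3}\,\text{January}_d\text{Monday}_d + \beta_h^{4}\,\text{January}_d\text{Friday}_d + \beta_h^{5}\,\text{January}_d\text{Saturday}_d \\
& + \beta_h^{6}\,\text{February}_d + \beta_h^{7}\,\text{February}_d\text{Sunday}_d + \beta_h^{8}\,\text{February}_d\text{Monday}_d + \beta_h^{9}\,\text{February}_d\text{Friday}_d + \beta_h^{10}\,\text{February}_d\text{Saturday}_d \\
& + \beta_h^{11}\,\text{March}_d + \beta_h^{12}\,\text{March}_d\text{Sunday}_d + \beta_h^{13}\,\text{March}_d\text{Monday}_d + \beta_h^{14}\,\text{March}_d\text{Friday}_d + \beta_h^{15}\,\text{March}_d\text{Saturday}_d \\
& + \beta_h^{16}\,\text{April}_d + \beta_h^{17}\,\text{April}_d\text{Sunday}_d + \beta_h^{18}\,\text{April}_d\text{Monday}_d + \beta_h^{19}\,\text{April}_d\text{Friday}_d + \beta_h^{20}\,\text{April}_d\text{Saturday}_d \\
& + \beta_h^{21}\,\text{May}_d + \beta_h^{22}\,\text{May}_d\text{Sunday}_d + \beta_h^{23}\,\text{May}_d\text{Monday}_d + \beta_h^{24}\,\text{May}_d\text{Friday}_d + \beta_h^{25}\,\text{May}_d\text{Saturday}_d \\
& + \beta_h^{26}\,\text{June}_d + \beta_h^{27}\,\text{June}_d\text{Sunday}_d + \beta_h^{28}\,\text{June}_d\text{Monday}_d + \beta_h^{29}\,\text{June}_d\text{Friday}_d + \beta_h^{30}\,\text{June}_d\text{Saturday}_d \\
& + \beta_h^{31}\,\text{July}_d + \beta_h^{32}\,\text{July}_d\text{Sunday}_d + \beta_h^{33}\,\text{July}_d\text{Monday}_d + \beta_h^{34}\,\text{July}_d\text{Friday}_d + \beta_h^{35}\,\text{July}_d\text{Saturday}_d \\
& + \beta_h^{36}\,\text{August}_d + \beta_h^{37}\,\text{August}_d\text{Sunday}_d + \beta_h^{38}\,\text{August}_d\text{Monday}_d + \beta_h^{39}\,\text{August}_d\text{Friday}_d + \beta_h^{40}\,\text{August}_d\text{Saturday}_d \\
& + \beta_h^{41}\,\text{September}_d + \beta_h^{42}\,\text{September}_d\text{Sunday}_d + \beta_h^{43}\,\text{September}_d\text{Monday}_d + \beta_h^{44}\,\text{September}_d\text{Friday}_d + \beta_h^{45}\,\text{September}_d\text{Saturday}_d \\
& + \beta_h^{46}\,\text{October}_d + \beta_h^{47}\,\text{October}_d\text{Sunday}_d + \beta_h^{48}\,\text{October}_d\text{Monday}_d + \beta_h^{49}\,\text{October}_d\text{Friday}_d + \beta_h^{50}\,\text{October}_d\text{Saturday}_d \\
& + \beta_h^{51}\,\text{November}_d + \beta_h^{52}\,\text{November}_d\text{Sunday}_d + \beta_h^{53}\,\text{November}_d\text{Monday}_d + \beta_h^{54}\,\text{November}_d\text{Friday}_d + \beta_h^{55}\,\text{November}_d\text{Saturday}_d \\
& + \beta_h^{56}\,\text{December}_d\text{Sunday}_d + \beta_h^{57}\,\text{December}_d\text{Monday}_d + \beta_h^{58}\,\text{December}_d\text{Friday}_d + \beta_h^{59}\,\text{December}_d\text{Saturday}_d\} \\
& + \{\beta_h^{60}\,\text{HDD}_d + \beta_h^{61}\,\text{CDD}_d + \beta_h^{62}\,\text{HDD}_d\text{Weekend}_d + \beta_h^{63}\,\text{CDD}_d\text{Weekend}_d \\
& + \beta_h^{64}\,\text{LagHDD}_d + \beta_h^{65}\,\text{LagCDD}_d + \beta_h^{66}\,\text{LagHDDWeekend}_d + \beta_h^{67}\,\text{LagCDDWeekend}_d\}
\end{aligned}$$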

6.1.12 Model Template: Extended Day Type with Time-of-Use Temperatures
This model produces an average Sunday, Monday, TWT, Friday, and Saturday load shape by month that
varies with night, morning, afternoon, and evening temperatures. The weather response can vary
between weekdays and weekend days.
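$$
\begin{aligned}
\text{Load}_{d,h} = \{ & \beta_{0,h}\,\text{Intercept}_d \\
& + \beta_{1,h}\,\text{January}_d + \beta_{2,h}\,\text{January}_d\,\text{Sunday}_d + \beta_{3,h}\,\text{January}_d\,\text{Monday}_d + \beta_{4,h}\,\text{January}_d\,\text{Friday}_d + \beta_{5,h}\,\text{January}_d\,\text{Saturday}_d \\
& + \beta_{6,h}\,\text{February}_d + \beta_{7,h}\,\text{February}_d\,\text{Sunday}_d + \beta_{8,h}\,\text{February}_d\,\text{Monday}_d + \beta_{9,h}\,\text{February}_d\,\text{Friday}_d + \beta_{10,h}\,\text{February}_d\,\text{Saturday}_d \\
& + \beta_{11,h}\,\text{March}_d + \beta_{12,h}\,\text{March}_d\,\text{Sunday}_d + \beta_{13,h}\,\text{March}_d\,\text{Monday}_d + \beta_{14,h}\,\text{March}_d\,\text{Friday}_d + \beta_{15,h}\,\text{March}_d\,\text{Saturday}_d \\
& + \beta_{16,h}\,\text{April}_d + \beta_{17,h}\,\text{April}_d\,\text{Sunday}_d + \beta_{18,h}\,\text{April}_d\,\text{Monday}_d + \beta_{19,h}\,\text{April}_d\,\text{Friday}_d + \beta_{20,h}\,\text{April}_d\,\text{Saturday}_d \\
& + \beta_{21,h}\,\text{May}_d + \beta_{22,h}\,\text{May}_d\,\text{Sunday}_d + \beta_{23,h}\,\text{May}_d\,\text{Monday}_d + \beta_{24,h}\,\text{May}_d\,\text{Friday}_d + \beta_{25,h}\,\text{May}_d\,\text{Saturday}_d \\
& + \beta_{26,h}\,\text{June}_d + \beta_{27,h}\,\text{June}_d\,\text{Sunday}_d + \beta_{28,h}\,\text{June}_d\,\text{Monday}_d + \beta_{29,h}\,\text{June}_d\,\text{Friday}_d + \beta_{30,h}\,\text{June}_d\,\text{Saturday}_d \\
& + \beta_{31,h}\,\text{July}_d + \beta_{32,h}\,\text{July}_d\,\text{Sunday}_d + \beta_{33,h}\,\text{July}_d\,\text{Monday}_d + \beta_{34,h}\,\text{July}_d\,\text{Friday}_d + \beta_{35,h}\,\text{July}_d\,\text{Saturday}_d \\
& + \beta_{36,h}\,\text{August}_d + \beta_{37,h}\,\text{August}_d\,\text{Sunday}_d + \beta_{38,h}\,\text{August}_d\,\text{Monday}_d + \beta_{39,h}\,\text{August}_d\,\text{Friday}_d + \beta_{40,h}\,\text{August}_d\,\text{Saturday}_d \\
& + \beta_{41,h}\,\text{September}_d + \beta_{42,h}\,\text{September}_d\,\text{Sunday}_d + \beta_{43,h}\,\text{September}_d\,\text{Monday}_d + \beta_{44,h}\,\text{September}_d\,\text{Friday}_d + \beta_{45,h}\,\text{September}_d\,\text{Saturday}_d \\
& + \beta_{46,h}\,\text{October}_d + \beta_{47,h}\,\text{October}_d\,\text{Sunday}_d + \beta_{48,h}\,\text{October}_d\,\text{Monday}_d + \beta_{49,h}\,\text{October}_d\,\text{Friday}_d + \beta_{50,h}\,\text{October}_d\,\text{Saturday}_d \\
& + \beta_{51,h}\,\text{November}_d + \beta_{52,h}\,\text{November}_d\,\text{Sunday}_d + \beta_{53,h}\,\text{November}_d\,\text{Monday}_d + \beta_{54,h}\,\text{November}_d\,\text{Friday}_d + \beta_{55,h}\,\text{November}_d\,\text{Saturday}_d \\
& + \beta_{56,h}\,\text{December}_d\,\text{Sunday}_d + \beta_{57,h}\,\text{December}_d\,\text{Monday}_d + \beta_{58,h}\,\text{December}_d\,\text{Friday}_d + \beta_{59,h}\,\text{December}_d\,\text{Saturday}_d \} \\
& + \{ \beta_{60,h}\,\text{NightHDD}_d + \beta_{61,h}\,\text{NightCDD}_d + \beta_{62,h}\,\text{NightHDD}_d\,\text{Weekend}_d + \beta_{63,h}\,\text{NightCDD}_d\,\text{Weekend}_d \\
& \quad + \beta_{64,h}\,\text{MorningHDD}_d + \beta_{65,h}\,\text{MorningCDD}_d + \beta_{66,h}\,\text{MorningHDD}_d\,\text{Weekend}_d + \beta_{67,h}\,\text{MorningCDD}_d\,\text{Weekend}_d \\
& \quad + \beta_{68,h}\,\text{AfternoonHDD}_d + \beta_{69,h}\,\text{AfternoonCDD}_d + \beta_{70,h}\,\text{AfternoonHDD}_d\,\text{Weekend}_d + \beta_{71,h}\,\text{AfternoonCDD}_d\,\text{Weekend}_d \\
& \quad + \beta_{72,h}\,(\ldots) + \cdots + \beta_{79,h}\,(\ldots) \}
\end{aligned}
$$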

6.1.13 Model Template: Season
This model produces an average load shape by season.
$$
\text{Load}_{d,h} = \beta_{0,h}\,\text{Intercept}_d + \beta_{1,h}\,\text{Winter}_d + \beta_{2,h}\,\text{Summer}_d + \beta_{3,h}\,\text{Fall}_d
$$
Where,
$\text{Winter}_d$ is a binary variable that takes on a value of 1.0 if the day (d) falls within the December,
January, or February months, otherwise 0.0
$\text{Summer}_d$ is a binary variable that takes on a value of 1.0 if the day (d) falls within the June, July,
or August months, otherwise 0.0
$\text{Fall}_d$ is a binary variable that takes on a value of 1.0 if the day (d) falls within the September,
October, or November months, otherwise 0.0
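As a minimal sketch of how these seasonal binary variables might be constructed, the following Python function maps a calendar date to the three variables defined above. The function name `season_dummies` and the example dates are illustrative assumptions, not part of the model template. Spring is treated as the omitted season because the template includes an intercept and no Spring term.

```python
from datetime import date


def season_dummies(day: date) -> dict:
    """Return the Winter, Summer, and Fall binary variables for one day.

    Spring (March through May) is the omitted season, so its average load
    is carried by the intercept term.
    """
    month = day.month
    return {
        "Winter": 1.0 if month in (12, 1, 2) else 0.0,
        "Summer": 1.0 if month in (6, 7, 8) else 0.0,
        "Fall": 1.0 if month in (9, 10, 11) else 0.0,
    }


# Hypothetical example days
print(season_dummies(date(2018, 7, 21)))  # a summer day
print(season_dummies(date(2018, 4, 15)))  # a spring day: all three are 0.0
```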
6.1.14 Model Template: Season Day Type

This model produces an average weekday and weekend day load shape by season.
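$$
\begin{aligned}
\text{Load}_{d,h} = {} & \beta_{0,h}\,\text{Intercept}_d + \beta_{1,h}\,\text{Winter}_d + \beta_{2,h}\,\text{Summer}_d + \beta_{3,h}\,\text{Fall}_d + \beta_{4,h}\,\text{Weekend}_d \\
& + \beta_{5,h}\,\text{Winter}_d\,\text{Weekend}_d + \beta_{6,h}\,\text{Summer}_d\,\text{Weekend}_d + \beta_{7,h}\,\text{Fall}_d\,\text{Weekend}_d
\end{aligned}
$$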
6.1.15 Model Template: Season Day Type with Daily Weather

This model produces an average weekday and weekend day load shape by season that varies with average
daily temperatures. The weather response can vary between weekdays and weekend days.
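$$
\begin{aligned}
\text{Load}_{d,h} = {} & \beta_{0,h}\,\text{Intercept}_d + \beta_{1,h}\,\text{Winter}_d + \beta_{2,h}\,\text{Summer}_d + \beta_{3,h}\,\text{Fall}_d + \beta_{4,h}\,\text{Weekend}_d \\
& + \beta_{5,h}\,\text{Winter}_d\,\text{Weekend}_d + \beta_{6,h}\,\text{Summer}_d\,\text{Weekend}_d + \beta_{7,h}\,\text{Fall}_d\,\text{Weekend}_d \\
& + \{ \beta_{8,h}\,\text{HDD}_d + \beta_{9,h}\,\text{CDD}_d + \beta_{10,h}\,\text{HDD}_d\,\text{Weekend}_d + \beta_{11,h}\,\text{CDD}_d\,\text{Weekend}_d \\
& \quad + \beta_{12,h}\,\text{LagHDD}_d + \beta_{13,h}\,\text{LagCDD}_d + \beta_{14,h}\,\text{LagHDDWeekend}_d + \beta_{15,h}\,\text{LagCDDWeekend}_d \}
\end{aligned}
$$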

6.1.16 Model Template: Season Day Type with Time-of-Use Temperatures
This model produces an average weekday and weekend day load shape by season that varies with night,
morning, afternoon, and evening temperatures. The weather response can vary between weekdays and
weekend days.
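$$
\begin{aligned}
\text{Load}_{d,h} = {} & \beta_{0,h}\,\text{Intercept}_d + \beta_{1,h}\,\text{Winter}_d + \beta_{2,h}\,\text{Summer}_d + \beta_{3,h}\,\text{Fall}_d + \beta_{4,h}\,\text{Weekend}_d \\
& + \beta_{5,h}\,\text{Winter}_d\,\text{Weekend}_d + \beta_{6,h}\,\text{Summer}_d\,\text{Weekend}_d + \beta_{7,h}\,\text{Fall}_d\,\text{Weekend}_d \\
& + \{ \beta_{8,h}\,\text{NightHDD}_d + \beta_{9,h}\,\text{NightCDD}_d + \beta_{10,h}\,\text{NightHDD}_d\,\text{Weekend}_d + \beta_{11,h}\,\text{NightCDD}_d\,\text{Weekend}_d \\
& \quad + \beta_{12,h}\,\text{MorningHDD}_d + \beta_{13,h}\,\text{MorningCDD}_d + \beta_{14,h}\,\text{MorningHDD}_d\,\text{Weekend}_d + \beta_{15,h}\,\text{MorningCDD}_d\,\text{Weekend}_d \\
& \quad + \beta_{16,h}\,\text{AfternoonHDD}_d + \beta_{17,h}\,\text{AfternoonCDD}_d + \beta_{18,h}\,\text{AfternoonHDD}_d\,\text{Weekend}_d + \beta_{19,h}\,\text{AfternoonCDD}_d\,\text{Weekend}_d \\
& \quad + \beta_{20,h}\,(\ldots) + \cdots + \beta_{27,h}\,(\ldots) \}
\end{aligned}
$$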
6.1.17 Model Template: Extended Season

This model produces an average Sunday, Monday, TWT, Friday, and Saturday load shape by season.
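$$
\begin{aligned}
\text{Load}_{d,h} = {} & \beta_{0,h}\,\text{Intercept}_d \\
& + \beta_{1,h}\,\text{Winter}_d + \beta_{2,h}\,\text{Winter}_d\,\text{Sunday}_d + \beta_{3,h}\,\text{Winter}_d\,\text{Monday}_d + \beta_{4,h}\,\text{Winter}_d\,\text{Friday}_d + \beta_{5,h}\,\text{Winter}_d\,\text{Saturday}_d \\
& + \beta_{6,h}\,\text{Spring}_d + \beta_{7,h}\,\text{Spring}_d\,\text{Sunday}_d + \beta_{8,h}\,\text{Spring}_d\,\text{Monday}_d + \beta_{9,h}\,\text{Spring}_d\,\text{Friday}_d + \beta_{10,h}\,\text{Spring}_d\,\text{Saturday}_d \\
& + \beta_{11,h}\,\text{Summer}_d + \beta_{12,h}\,\text{Summer}_d\,\text{Sunday}_d + \beta_{13,h}\,\text{Summer}_d\,\text{Monday}_d + \beta_{14,h}\,\text{Summer}_d\,\text{Friday}_d + \beta_{15,h}\,\text{Summer}_d\,\text{Saturday}_d \\
& + \beta_{16,h}\,\text{Fall}_d\,\text{Sunday}_d + \beta_{17,h}\,\text{Fall}_d\,\text{Monday}_d + \beta_{18,h}\,\text{Fall}_d\,\text{Friday}_d + \beta_{19,h}\,\text{Fall}_d\,\text{Saturday}_d
\end{aligned}
$$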

6.1.18 Model Template: Extended Season with Daily Weather
This model produces an average Sunday, Monday, TWT, Friday, and Saturday load shape by season that
varies with average daily temperatures. The weather response can vary between weekdays and weekend
days.
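$$
\begin{aligned}
\text{Load}_{d,h} = {} & \beta_{0,h}\,\text{Intercept}_d \\
& + \beta_{1,h}\,\text{Winter}_d + \beta_{2,h}\,\text{Winter}_d\,\text{Sunday}_d + \beta_{3,h}\,\text{Winter}_d\,\text{Monday}_d + \beta_{4,h}\,\text{Winter}_d\,\text{Friday}_d + \beta_{5,h}\,\text{Winter}_d\,\text{Saturday}_d \\
& + \beta_{6,h}\,\text{Spring}_d + \beta_{7,h}\,\text{Spring}_d\,\text{Sunday}_d + \beta_{8,h}\,\text{Spring}_d\,\text{Monday}_d + \beta_{9,h}\,\text{Spring}_d\,\text{Friday}_d + \beta_{10,h}\,\text{Spring}_d\,\text{Saturday}_d \\
& + \beta_{11,h}\,\text{Summer}_d + \beta_{12,h}\,\text{Summer}_d\,\text{Sunday}_d + \beta_{13,h}\,\text{Summer}_d\,\text{Monday}_d + \beta_{14,h}\,\text{Summer}_d\,\text{Friday}_d + \beta_{15,h}\,\text{Summer}_d\,\text{Saturday}_d \\
& + \beta_{16,h}\,\text{Fall}_d\,\text{Sunday}_d + \beta_{17,h}\,\text{Fall}_d\,\text{Monday}_d + \beta_{18,h}\,\text{Fall}_d\,\text{Friday}_d + \beta_{19,h}\,\text{Fall}_d\,\text{Saturday}_d \\
& + \{ \beta_{20,h}\,\text{HDD}_d + \beta_{21,h}\,\text{CDD}_d + \beta_{22,h}\,\text{HDD}_d\,\text{Weekend}_d + \beta_{23,h}\,\text{CDD}_d\,\text{Weekend}_d \\
& \quad + \beta_{24,h}\,\text{LagHDD}_d + \beta_{25,h}\,\text{LagCDD}_d + \beta_{26,h}\,\text{LagHDDWeekend}_d + \beta_{27,h}\,\text{LagCDDWeekend}_d \}
\end{aligned}
$$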
6.1.19 Model Template: Extended Season with Time-of-Use Temperatures

This model produces an average Sunday, Monday, TWT, Friday, and Saturday load shape by season that
varies with night, morning, afternoon, and evening temperatures. The weather response can vary
between weekdays and weekend days.
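$$
\begin{aligned}
\text{Load}_{d,h} = {} & \beta_{0,h}\,\text{Intercept}_d \\
& + \beta_{1,h}\,\text{Winter}_d + \beta_{2,h}\,\text{Winter}_d\,\text{Sunday}_d + \beta_{3,h}\,\text{Winter}_d\,\text{Monday}_d + \beta_{4,h}\,\text{Winter}_d\,\text{Friday}_d + \beta_{5,h}\,\text{Winter}_d\,\text{Saturday}_d \\
& + \beta_{6,h}\,\text{Spring}_d + \beta_{7,h}\,\text{Spring}_d\,\text{Sunday}_d + \beta_{8,h}\,\text{Spring}_d\,\text{Monday}_d + \beta_{9,h}\,\text{Spring}_d\,\text{Friday}_d + \beta_{10,h}\,\text{Spring}_d\,\text{Saturday}_d \\
& + \beta_{11,h}\,\text{Summer}_d + \beta_{12,h}\,\text{Summer}_d\,\text{Sunday}_d + \beta_{13,h}\,\text{Summer}_d\,\text{Monday}_d + \beta_{14,h}\,\text{Summer}_d\,\text{Friday}_d + \beta_{15,h}\,\text{Summer}_d\,\text{Saturday}_d \\
& + \beta_{16,h}\,\text{Fall}_d\,\text{Sunday}_d + \beta_{17,h}\,\text{Fall}_d\,\text{Monday}_d + \beta_{18,h}\,\text{Fall}_d\,\text{Friday}_d + \beta_{19,h}\,\text{Fall}_d\,\text{Saturday}_d \\
& + \{ \beta_{20,h}\,\text{NightHDD}_d + \beta_{21,h}\,\text{NightCDD}_d + \beta_{22,h}\,\text{NightHDD}_d\,\text{Weekend}_d + \beta_{23,h}\,\text{NightCDD}_d\,\text{Weekend}_d \\
& \quad + \beta_{24,h}\,\text{MorningHDD}_d + \beta_{25,h}\,\text{MorningCDD}_d + \beta_{26,h}\,\text{MorningHDD}_d\,\text{Weekend}_d + \beta_{27,h}\,\text{MorningCDD}_d\,\text{Weekend}_d \\
& \quad + \beta_{28,h}\,\text{AfternoonHDD}_d + \beta_{29,h}\,\text{AfternoonCDD}_d + \beta_{30,h}\,\text{AfternoonHDD}_d\,\text{Weekend}_d + \beta_{31,h}\,\text{AfternoonCDD}_d\,\text{Weekend}_d \\
& \quad + \beta_{32,h}\,(\ldots) + \cdots + \beta_{39,h}\,(\ldots) \}
\end{aligned}
$$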

7 GUIDELINES FOR BUILDING LOAD FORECAST MODELS
The previous chapters introduced regression and neural network models and provided some common
model specifications. Now the hard part comes, which is building a model specific to your loads. Here
are some guidelines that might help achieve your goal of building an accurate load forecast model.
Guideline 1. There is no one-size-fits-all model. All loads are different, which means each load requires
a model designed specifically for the characteristics of that load.
Guideline 2. The best model today will more than likely need to evolve in six months to a year to reflect
underlying changes in customer and technology mix. To remain accurate and current, the load forecast
model specification must evolve over time. A good habit is to:
1) Re-estimate the model parameters once or twice a month,
2) Track the model performance with actual weather,
3) Refine the model specification when the model performance with actual weather shows signs of
systematic model error, and
4) At least once a year, re-evaluate the model specification and make refinements as needed.
Guideline 3. Once you have a good load forecast model, future variations are designed around recent
events where the model performed poorly. In most cases, this will lead to minor modifications to the
existing model specification.
Guideline 4. The best in-sample and out-of-sample model fit statistics are the mean absolute deviation
(MAD) and the mean absolute percentage error (MAPE). These statistics are in a language a non-modeler,
like a control room operator, can understand.
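$$
\text{Mean Absolute Deviation}_{d,h} = \text{MAD}_{d,h} = \left| \text{Load}_{d,h} - \text{Forecast}_{d,h} \right|
$$
$$
\text{Mean Absolute Percentage Error}_{d,h} = \text{MAPE}_{d,h} = \left( \frac{\left| \text{Load}_{d,h} - \text{Forecast}_{d,h} \right|}{\text{Load}_{d,h}} \right) \times 100
$$
Averaging these values over the days and hours in the evaluation sample gives the summary MAD and MAPE.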
The MAD presents the average absolute model error in the same units that the load being forecasted is
in. For example, the MAD may be 200 MW. This is a value a control room operator can relate to because
they know what that means in terms of whether they have sufficient generation online to meet that error.
The MAPE is useful when you are comparing load forecast errors across different load zones that could
be significantly different in size. For example, it makes more sense to compare the MAPE between the
model of DAY loads to the MAPE from the model of the Commonwealth Edison load than it would be to
compare the MAD.
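A minimal Python sketch of both statistics follows. The function names and the three load and forecast values (in MW) are hypothetical, chosen only to illustrate the arithmetic. Each statistic averages the per-interval absolute error or absolute percentage error over the sample.

```python
def mad(actual, forecast):
    """Mean absolute deviation, in the same units as the load."""
    errors = [abs(a - f) for a, f in zip(actual, forecast)]
    return sum(errors) / len(errors)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent of actual load."""
    pct_errors = [abs(a - f) / a * 100.0 for a, f in zip(actual, forecast)]
    return sum(pct_errors) / len(pct_errors)


# Hypothetical hourly loads and forecasts in MW
loads = [1000.0, 1200.0, 800.0]
forecasts = [950.0, 1260.0, 800.0]

print(f"MAD  = {mad(loads, forecasts):.1f} MW")
print(f"MAPE = {mape(loads, forecasts):.2f} %")
```

Running the same two functions on an estimation sample and on a hold-out sample gives the in-sample and out-of-sample comparison.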
Guideline 5. Remember, a model returns the average load value for user-defined segments of the
historical load data. These segments are defined by the set of explanatory variables included in the model.
It is up to the modeler to determine which segmentation scheme makes the most sense for the load data
being modeled.
Guideline 6. Building accurate load forecast models is about eliminating any possible time series pattern
in the load model errors. The goal is to define a set of explanatory variables so that when you look at a
graph of the model errors, what you see is a chaotic time series.
Guideline 7. If a graph of the model errors shows a long-run tilt (either up or down), that suggests
a time trend needs to be included as one of the explanatory variables.
Guideline 8. If a graph of the model errors shows a shift (up or down) in the residuals for a specific range
of data (e.g., a month or week or more), then some form of binary variable that takes on a value of 1.0
when the load shift occurred is needed.
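As a rough numerical companion to graphing the errors, the sketch below fits a straight line to the model errors in time order and returns its slope. This is only one possible check and is an assumption of this example, not a method prescribed here. A slope far from zero hints at the trend tilt described in Guideline 7. The function name and the sample residuals are hypothetical.

```python
def residual_trend_slope(residuals):
    """Least-squares slope of model errors against their time index."""
    n = len(residuals)
    if n < 2:
        raise ValueError("need at least two residuals")
    t_mean = (n - 1) / 2.0
    r_mean = sum(residuals) / n
    numerator = sum((t - t_mean) * (r - r_mean) for t, r in enumerate(residuals))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    return numerator / denominator


# Hypothetical daily model errors in MW that drift upward over time
drifting = [-30.0, -20.0, -12.0, 1.0, 9.0, 22.0, 30.0]
flat = [5.0, -5.0, 5.0, -5.0, 5.0, -5.0, 5.0]

print(residual_trend_slope(drifting))  # clearly positive
print(residual_trend_slope(flat))      # zero: no tilt
```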
Guideline 9. Three to four years of historical load data is usually sufficient. The goal is to forecast
tomorrow. The goal is most definitely not explaining what happened 10 years ago.
Guideline 10. R² and t-statistics do not provide as strong a measure of load forecast performance as a
comparison of the in-sample and out-of-sample MAD and MAPE.
Guideline 11. Start with one of the model templates presented in Chapter 6 and fine tune from there.
Guideline 12. Spend time getting the list of weather stations and weather concepts right.
Guideline 13. Spend time cleaning the load and weather data.
Guideline 14. Put in a process that cleans load and weather data when the data arrive.
Guideline 15. Use autoregressive load terms sparingly. It is better to build the best model you can without
autoregressive load terms. Add autoregressive terms after you have built your best model.
Guideline 16. Take time to talk with the control room operators about different days, weeks, or events
that occurred. Translate their knowledge about what happened into explanatory variables.
Guideline 17. Graph the load data, graph the weather data, graph the model actual versus predicted,
graph the model errors, graph the load forecasts, graph the weather forecasts. Bury yourself with graphs
of the data.
Guideline 18. Add explanatory variables to your model in logical groups. For example, start with variables
that explain load variation by day-of-the-week, month, and/or season. Next add holiday and special event
days. Then add weather variables. At each step, record the MAD and MAPE to ensure you are receiving
the expected improvement.
Guideline 19. When you add a variable to a model, anticipate what that variable will do to the model fit.
Often it is helpful to focus on a particular week and watch how the predicted value either improves or not
as you add successive explanatory variables.
Guideline 20. Model building is 99% hard thinking coupled with trial and error and 1% inspiration.
Inspiration only comes after hard thinking and trial and error. There are no shortcuts to building accurate
load forecast models.
8 INCORPORATING BEHIND THE METER SOLAR PV GENERATION
A growing load forecasting problem is the treatment of behind-the-meter solar PV generation. This
section summarizes the basic information needed to develop a solar generation forecast given a cloud
cover forecast.
Solar Panel Basics. The most commonly used solar generation technology for homes and businesses is
the solar photovoltaic (PV) panel. PV cells convert light (photons) to electricity (voltage). The first practical
PV cell was developed in 1954 when scientists at Bell Telephone Laboratories noticed that silicon created
an electric charge when exposed to sunlight. When light strikes a PV cell, a certain portion of the light is
absorbed within the silicon material. The absorbed energy knocks electrons loose, allowing them to flow
freely. The “freed” electrons form a current. The amount of electricity produced by a solar panel depends
on the size of the panel, the efficiency of the panel, and the amount of solar energy that reaches the panel
surface.
◼ Panel size is typically measured in units of Watts/m² of peak output. Peak output is the amount
of electricity the panel would produce when the panel receives the maximum amount of sunlight
possible and at optimal ambient temperatures (25°C or 77°F). Typically, a solar panel is composed
of 40 or so solar cells that, in total, generate approximately 150 Watts/m² of peak electricity
output. From the perspective of generation forecasting, panel size is treated as a known
exogenous forecast driver to the forecast framework. For example, the size of a solar plant is
given by the total peak MW output of the plant. Embedded solar is measured typically in total
MWs of installed rooftop panels for a given geographic area.
◼ Panel efficiency measures the percentage of solar energy hitting the panel that is converted into
electricity. Solar panel efficiencies for a typical residential or commercial application range
between 10% and 20%. For example, suppose the solar energy reaching the panel surface is 1,000
Watts/m2. A panel with a maximum output rating of 150 Watts/m2 would have an efficiency of
15% (computed as 150 Watts Out/m2 over 1,000 Watts In/m2).
Efficiency and panel size ratings are under ideal conditions. Factors that impact a solar panel’s actual
electricity output include (a) panel orientation and tilt, (b) panel temperature and (c) shade.
◼ Panel orientation and tilt are important factors that impact how much direct sunlight reaches the
panel surface for any location and time. For installations in the Northern Hemisphere, the ideal
orientation is south facing, while in the Southern Hemisphere, the ideal orientation is north facing.
Further, under ideal circumstances the panel will tilt throughout the day to track the path of the
sun through the sky. Because the costs of solar tracking systems are high, they tend to be only
found in large commercial installations and solar plants. This implies most residential installations
do not operate under peak conditions.
◼ Solar panels are most efficient at temperatures around 13°C or 55°F. The hotter the temperature,
the less efficient a panel is in converting sun energy into electricity. For temperatures above 25°C
(77°F), the efficiency of a rooftop solar panel will degrade 0.48% per degree Celsius (0.27% per
degree Fahrenheit).
◼ Shade lowers the output of a panel by blocking the amount of solar energy reaching the panel
surface. Trees, clouds, nearby structures, and even dirt and snow buildup can effectively block
the amount of solar energy reaching the panels. Clouds reduce solar panel output by reducing
the amount of solar energy striking the panel. At 100% cloud cover, roughly 80% of the solar
energy is reflected out to space and 20% filters through to Earth’s surface.
Unlike solar plants where the panel orientation, tilt, and shading (other than cloud cover) are known, the
specifics of each rooftop installation are unknown. In this case, an average operating efficiency of 15%
should account for all likely combinations of panel orientation, tilt, and shade that a collection of
residential and commercial installations would have. From the perspective of forecasting embedded solar
generation, the impact of temperature and cloud cover will be treated as separate factors influencing
solar panel output on an hour-by-hour basis.
Solar Insolation. Given the basics of solar panel technology, the big question is, how do we predict the
amount of solar energy that will reach the surface of a solar panel for any location and time? Solar
insolation measures how much solar energy (Watts/m²) reaches Earth’s surface on a cloudless day and
is the key input to solar generation forecasting. For any given point on Earth, the amount of solar energy
will vary not only throughout the day as the sun tracks across the sky, but also throughout the year as the
sun cycles between the Tropic of Capricorn and the Tropic of Cancer.
Fortunately, Johannes Kepler’s three laws of planetary motion, combined with Sir Isaac Newton’s
explanation of the motion of planets, give us everything we need to predict solar insolation for any
location and time. The detailed calculations for estimating solar insolation are presented below.
Three parameters are needed to track the position of the sun in the sky for a specific location, date, and
time: (a) solar declination angle, (b) solar hour angle, and (c) solar altitude angle. In addition, the sunrise
hour angle and sunset hour angle are used to determine the time of sunrise and sunset. The formulas for
each parameter are presented below.
Solar Declination Angle. Solar declination angle is the angle between the sun’s rays and a plane passing
through the equator. From the perspective of the Northern Hemisphere, the solar declination angle has
a maximum value of 23.45° on June 21 and a minimum value of −23.45° on December 21. The annual cycle
for the solar declination angle is depicted in Figure 8-1.
The solar declination angle is computed as follows:
$$
\text{SolarDeclinationAngle}_d = 23.45^\circ \times \sin\left(360^\circ \times \frac{284 + d}{365}\right)
$$
Where d is the number of the day in the year (e.g., d = 1 to 365). The solar declination angle for January
1, 2010 through December 31, 2012 is shown in Figure 8-1.
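The declination formula can be sketched in a few lines of Python. The function name is an illustrative assumption. The only subtlety is that the standard library trigonometric functions expect radians, so the angle in degrees is converted first.

```python
import math


def solar_declination_deg(day_of_year: int) -> float:
    """Solar declination angle in degrees for day d = 1..365."""
    angle_deg = 360.0 * (284 + day_of_year) / 365.0
    return 23.45 * math.sin(math.radians(angle_deg))


# February 15 is day 46 and June 21 is day 172 of a non-leap year
print(round(solar_declination_deg(46), 1))
print(round(solar_declination_deg(172), 2))
```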
Solar Hour Angle. Solar hour angle measures the position of the sun relative to solar noon at a given
location and time. The solar hour angle will be 0.0 at local solar noon—this is the point the sun is highest
in the sky. The solar hour angle will be negative before local solar noon and positive after local solar noon.
The solar hour angle changes 15° each hour or 1° every four minutes. For five-minute modeling, this
means the solar hour angle changes 1.25° every five minutes. At local solar hour 08:00, the solar hour
angle will be equal to −60°. At local solar hour 16:00, the solar hour angle will be equal to 60°.
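Since the hour angle moves 15° per hour and is zero at solar noon, it can be written as a one-line function of local solar time. The sketch below assumes solar time is supplied in decimal hours, and the function name is illustrative.

```python
def solar_hour_angle_deg(solar_hour: float) -> float:
    """Solar hour angle in degrees: negative before solar noon, positive after."""
    return 15.0 * (solar_hour - 12.0)


print(solar_hour_angle_deg(8.0))              # morning
print(solar_hour_angle_deg(16.0))             # afternoon
print(solar_hour_angle_deg(12.0 + 5.0 / 60))  # five minutes after solar noon
```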
FIGURE 8-1. SOLAR DECLINATION ANGLE
Solar Altitude Angle. Solar altitude angle is the angle between the sun’s rays and a horizontal plane as
the sun traverses the sky between sunrise and sunset. At the sun’s zenith (solar noon), this angle will vary
across the year. The solar altitude angle can be calculated for any location and time as follows:
$$
\sin(\alpha) = \sin(L)\sin(\delta) + \cos(L)\cos(\delta)\cos(\omega)
$$
Where,
α is the Solar Altitude Angle
δ is the Solar Declination Angle
L is the latitude
ω is the Solar Hour Angle
We can use this equation to determine the solar altitude angle for solar noon for a specific day and
location. For example, to calculate the solar altitude angle for solar noon on February 15 in Honolulu,
Hawaii (latitude 21.3069N; longitude 157.8583W), the formula is as follows.
First, we need to compute the solar declination angle for February 15.
$$
\text{SolarDeclinationAngle}_{\text{Feb-15}} = 23.45^\circ \times \sin\left(360^\circ \times \frac{284 + 46}{365}\right) = -13.3^\circ
$$
Second, since it is solar noon, the solar hour angle is ω = 0°.
Given this information, we have:
$$
\sin(\alpha) = \sin(21.3069^\circ) \times \sin(-13.3^\circ) + \cos(21.3069^\circ) \times \cos(-13.3^\circ) \times \cos(0^\circ)
$$
$$
\sin(\alpha) = 0.823175
$$
$$
\alpha = \sin^{-1}(0.823175) = 55.4^\circ
$$
In the Northern Hemisphere, the solar altitude angle at solar noon steepens as Earth approaches the
summer solstice. If we redo the calculations for solar noon on June 21, we have a solar altitude angle of 87.9°.
The solar altitude angle for February 15 and June 21 are depicted in Figure 8-2 and Figure 8-3.
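The Honolulu example can be checked with a short Python sketch of the altitude formula. The function name is an assumption of this example. The clamp on the sine value is added only to guard against floating-point round-off when the sun is directly overhead.

```python
import math


def solar_altitude_deg(latitude_deg, declination_deg, hour_angle_deg):
    """Solar altitude angle in degrees from latitude, declination, and hour angle."""
    lat, decl, omega = (math.radians(x) for x in
                        (latitude_deg, declination_deg, hour_angle_deg))
    sin_alpha = (math.sin(lat) * math.sin(decl)
                 + math.cos(lat) * math.cos(decl) * math.cos(omega))
    sin_alpha = max(-1.0, min(1.0, sin_alpha))  # guard against round-off
    return math.degrees(math.asin(sin_alpha))


HONOLULU_LATITUDE = 21.3069

# Solar noon (hour angle 0) on February 15 and June 21
print(round(solar_altitude_deg(HONOLULU_LATITUDE, -13.3, 0.0), 1))
print(round(solar_altitude_deg(HONOLULU_LATITUDE, 23.45, 0.0), 1))
```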
FIGURE 8-2. SOLAR ALTITUDE ANGLE (FEBRUARY 15TH)
FIGURE 8-3. SOLAR ALTITUDE ANGLE (JUNE 21ST)
To determine the time difference between solar noon and sunrise and sunset, it is useful to compute the
sunrise and sunset solar hour angles. To compute the time difference, all you need to do is multiply solar
hour angle at sunrise (or sunset) by four minutes per degree. The sunrise solar hour angle will be the
point at which the solar altitude angle equals 0.0. Using the information above, we can compute the solar
hour angle at the time of sunrise and sunset for February 15 as follows.
We know the solar altitude angle at the time of sunrise and sunset is 0°. This gives:
$$
\sin(0^\circ) = \sin(21.3069^\circ) \times \sin(-13.3^\circ) + \cos(21.3069^\circ) \times \cos(-13.3^\circ) \times \cos(\omega)
$$
Solving for ω gives:
Sunrise Hour Angle = −84.7°
Sunset Hour Angle = 84.7°
This implies the sun rises approximately 5 hours and 39 minutes (computed as 84.7 times 4 minutes per
degree) before solar noon and sets approximately 5 hours and 39 minutes after solar noon. If we repeat
the calculations for June 21, we have the sun rising approximately 6 hours and 39 minutes before solar
noon and setting 6 hours and 39 minutes after solar noon.
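Setting the solar altitude angle to zero and rearranging gives cos(ω) = −tan(L) × tan(δ), which the sketch below solves for the sunset hour angle (the sunrise angle is its negative). The function names are illustrative. The clamp is an assumption added so that the function still returns a value at latitudes where the sun does not rise or set on a given day.

```python
import math


def sunset_hour_angle_deg(latitude_deg, declination_deg):
    """Hour angle at sunset in degrees; sunrise is the negative of this value."""
    cos_omega = (-math.tan(math.radians(latitude_deg))
                 * math.tan(math.radians(declination_deg)))
    cos_omega = max(-1.0, min(1.0, cos_omega))  # polar day or night
    return math.degrees(math.acos(cos_omega))


def minutes_from_solar_noon(hour_angle_deg):
    """Convert an hour angle to minutes at four minutes per degree."""
    return 4.0 * hour_angle_deg


omega_feb = sunset_hour_angle_deg(21.3069, -13.3)
hours, minutes = divmod(round(minutes_from_solar_noon(omega_feb)), 60)
print(round(omega_feb, 1), hours, minutes)
```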
What time is it? It is important to note that local solar time is not the same as local standard time. For
example, solar noon as measured by a sundial will not always occur at the same time every day. The
difference between the time of solar noon and noon standard time can be up to +/-16 minutes. This also
accounts for the asymmetry in the times of sunrise and sunset. The difference between local standard
time and local solar time is referred to as the equation of time.
The equation of time has two causes:
◼ Angle of Obliquity. The plane of Earth’s equator is inclined to the plane of Earth’s orbit
around the sun, and
◼ Elliptical Orbit. The orbit of Earth around the sun is an ellipse and not a circle.
Because of the angle of obliquity, solar time changes throughout the year as the sun moves above and
below the equator. The elliptical orbit means the distance between Earth and the sun is at minimum near
December 31 and is at maximum near July 1.
The equation of time can be written as follows:
$$
\text{EquationOfTime} = 9.87 \times \sin(2B) - 7.53 \times \cos(B) - 1.5 \times \sin(B)
$$
$$
B = \frac{360^\circ}{365} \times (d - 81)
$$
Here, the equation of time is measured in minutes and d is the number of days since the start of the year.
The equation of time accounts for the physical aspects of why solar noon is not the same as noon clock
time. The time correction factor (in minutes) accounts for the variation of local solar time within a given
time zone due to the longitudinal variations within the time zone and incorporates the equation of time.
$$
\text{TimeCorrectionFactor} = 4 \times (\text{LocalStandardMeridian} - \text{Longitude}) + \text{EquationOfTime}
$$
Here, Longitude is set by the location of the solar panel. The factor of four minutes is based on Earth rotating
1° every four minutes. In the case where the local longitude equals the local standard meridian, the time
correction factor is simply equal to the equation of time.
Given these equations, we can then compute local solar time by adjusting the local standard time as
follows:
$$
\text{LocalSolarTime} = \text{LocalClockTime} + \frac{\text{TimeCorrectionFactor}}{60}
$$
Using the example above, we can compute the time of local solar noon for February 15 as follows. Given
February 15 is the 46th day of the year, we have:
$$
B = \frac{360^\circ}{365} \times (46 - 81) = -34.5205^\circ
$$
$$
\text{EquationOfTime} = 9.87 \times \sin(2 \times -34.5205^\circ) - 7.53 \times \cos(-34.5205^\circ) - 1.5 \times \sin(-34.5205^\circ) = -14.57 \text{ minutes}
$$
Given that the local standard meridian for Hawaii (UTC−10) is 150°W, we can compute the time correction factor as:
$$
\text{TimeCorrectionFactor} = 4 \times (150^\circ - 157.858^\circ) - 14.57 = -46.00 \text{ minutes}
$$
Local solar time therefore runs about 46 minutes behind local clock time, which implies solar noon will
take place at roughly 12:46 PM local clock time, not adjusted for daylight
saving. From the perspective of forecasting solar generation, it is important to recognize this distinction
in time. Specifically, if the generation forecast is to be in local clock time, then adjustments for daylight
savings need to be made to the engineering estimates of solar generation to keep the engineering
estimates in line with metered generation.
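The sketch below strings together the equation of time, the time correction factor, and the conversion to clock time for the Honolulu example. It takes the standard meridian of Hawaii Standard Time (UTC−10) as 150°W and measures longitudes in degrees west. Both choices, along with the function names, are assumptions of this sketch.

```python
import math


def equation_of_time_minutes(day_of_year: int) -> float:
    """Equation of time in minutes for day d of the year."""
    b = math.radians(360.0 / 365.0 * (day_of_year - 81))
    return 9.87 * math.sin(2 * b) - 7.53 * math.cos(b) - 1.5 * math.sin(b)


def time_correction_minutes(standard_meridian_w, longitude_w, day_of_year):
    """Minutes to add to clock time to obtain local solar time."""
    return (4.0 * (standard_meridian_w - longitude_w)
            + equation_of_time_minutes(day_of_year))


def clock_time_of_solar_noon(standard_meridian_w, longitude_w, day_of_year):
    """Clock time, in decimal hours, at which local solar time is 12:00."""
    tc = time_correction_minutes(standard_meridian_w, longitude_w, day_of_year)
    return 12.0 - tc / 60.0


# Honolulu (157.858 W) on February 15 (day 46), assumed standard meridian 150 W
tc = time_correction_minutes(150.0, 157.858, 46)
noon = clock_time_of_solar_noon(150.0, 157.858, 46)
hours, minutes = divmod(round(noon * 60), 60)
print(f"time correction = {tc:.2f} min, solar noon at {hours:02d}:{minutes:02d}")
```

Where daylight saving time is observed, one further hour would need to be added to the clock time during the daylight saving period.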
Solar Flux. Now that we know where the sun is in the sky for any location and time, we can determine
how much solar energy will strike a horizontal surface (i.e., solar panel). Scientists know that solar
radiation strikes Earth’s outer atmosphere on average at a rate of 1,367 Watts/m². This is commonly
referred to as the solar constant. To account for seasonal variation due to the annual cycle in the distance
between Earth and the sun, the actual solar radiation (solar flux) hitting Earth’s atmosphere on any day
of the year can be calculated as follows:
$$
\text{SolarFlux}_d = \text{SolarConstant} \times \left[1 + 0.034 \times \cos\left(\frac{360^\circ \times d}{365.25}\right)\right]
$$
Here, d indexes the day of the year. A depiction of the annual cycle of solar flux is shown in Figure 8-4.
FIGURE 8-4. SOLAR FLUX
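A minimal Python sketch of the solar flux calculation follows. The constant and function names are illustrative. The flux peaks in early January, when Earth is closest to the sun, and reaches its minimum in early July.

```python
import math

SOLAR_CONSTANT = 1367.0  # Watts per square meter at the top of the atmosphere


def solar_flux(day_of_year: int) -> float:
    """Solar flux (W/m^2) at the top of the atmosphere on day d of the year."""
    angle = math.radians(360.0 * day_of_year / 365.25)
    return SOLAR_CONSTANT * (1.0 + 0.034 * math.cos(angle))


print(round(solar_flux(46), 1))   # February 15
print(round(solar_flux(172), 1))  # June 21
```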
Computing Solar Insolation. The amount of solar energy hitting a horizontal plane on Earth’s surface for
any location and time of day can then be computed as follows:
$$
\text{SolarInsolation}_d^i = \text{SolarFlux}_d \times \cos(\phi_d^i)
$$
Here, the time interval of the day (d) is indexed by (i) and φ is the solar zenith angle. The solar zenith
angle is computed as 90° minus the solar altitude angle. Given this, we can rewrite the equation for solar
energy that hits a horizontal plane (i.e., a solar panel) on Earth’s surface for any location and time as
follows:
$$
\text{SolarInsolation}_d^i = \text{SolarFlux}_d \times \cos(\alpha_d^i - 90^\circ)
$$
Using the example data from above, the amount of solar insolation at solar noon on February 15 is 1,152
Watts/m² and 1,321 Watts/m² on June 21. The patterns of solar insolation for the weeks of February 14
and June 20 are depicted in Figure 8-5 and Figure 8-6.
FIGURE 8-5. SOLAR INSOLATION WEEK OF FEBRUARY 14TH
FIGURE 8-6. SOLAR INSOLATION WEEK OF JUNE 20TH
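The insolation formula reduces to a single multiplication once the solar flux and solar altitude angle are known. In the sketch below, the function name is illustrative, and returning zero when the sun is below the horizon is an assumption added so that nighttime intervals do not produce negative values.

```python
import math


def clear_sky_insolation(solar_flux_w_m2, altitude_deg):
    """Clear-sky insolation (W/m^2) on a horizontal surface."""
    if altitude_deg <= 0.0:
        return 0.0  # assumption: no insolation while the sun is below the horizon
    return solar_flux_w_m2 * math.cos(math.radians(altitude_deg - 90.0))


# Solar noon in Honolulu on February 15: flux of about 1,399.7 W/m^2, altitude 55.4 degrees
print(round(clear_sky_insolation(1399.7, 55.4)))
```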
Engineering Model of Solar Generation. Given an estimate of how much solar energy is delivered to
Earth’s surface for any location and time, we can construct an engineering estimate of how much
electricity is generated using the following relationship.
$$
\text{SolarGeneration}_d^i = \text{SolarInsolation}_d^i \times \text{SolarPanelCapacity}_d \times \text{SolarPanelEfficiency}_d^i
$$
Here,
$\text{SolarGeneration}_d^i$ is the electricity generated on day (d), time interval (i), in Watts Out
$\text{SolarInsolation}_d^i$ is the solar energy delivered to the panel in Watts In/m²
$\text{SolarPanelCapacity}_d$ is the installed capacity in m²
$\text{SolarPanelEfficiency}_d^i$ is the solar panel efficiency in Watts Out/Watts In
To help fix ideas, assume solar insolation at noon of June 12 is 1,000 Watts/m², installed capacity is 2.5
kW, and the solar panel efficiency is 15%. If we assume 150 Watts/m² for the average panel size, we can
say the installed capacity is approximately 16.667 m² (computed as 2,500 Watts over 150 Watts/m²). With
these numbers, we have:
$$
\text{SolarGeneration} = 1000\ \text{Watts/m}^2 \times 16.667\ \text{m}^2 \times 0.15 = 2500\ \text{Watts}
$$
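The worked example can be reproduced with the short sketch below. The function name is illustrative, and the inputs are the example values from the text.

```python
def solar_generation_watts(insolation_w_m2, panel_area_m2, efficiency):
    """Engineering estimate of panel output in Watts."""
    return insolation_w_m2 * panel_area_m2 * efficiency


installed_capacity_w = 2500.0   # 2.5 kW of installed panels
peak_output_w_m2 = 150.0        # assumed average panel rating
panel_area_m2 = installed_capacity_w / peak_output_w_m2

generation = solar_generation_watts(1000.0, panel_area_m2, 0.15)
print(round(panel_area_m2, 3), round(generation, 1))
```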
Factoring in Temperature Impacts. The hotter a solar panel becomes, the less efficient it is in converting
sun energy into useful electricity. This leads to the following adjustment to the solar panel efficiency.
$$
\text{SolarPanelEfficiency}_d^i = \text{RatedEfficiency} \times \left(1 - \left[\max(\text{Temp}_d^i - \text{ThresholdTemp}, 0) \times \nabla\right]\right)
$$
Here,
$\text{SolarPanelEfficiency}_d^i$ is the solar panel operating efficiency for day (d), time interval (i)
RatedEfficiency is the peak output efficiency
$\text{Temp}_d^i$ is the temperature of the panel
ThresholdTemp is the temperature above which the efficiency of the panel degrades
∇ is the rate of efficiency degradation per degree (0.48% per °C or 0.27% per °F).
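A minimal sketch of the temperature adjustment in Python follows. The default threshold of 25°C and the degradation rate of 0.48% per °C are the values quoted in this section. The function name and the example panel temperatures are illustrative assumptions.

```python
def panel_efficiency(rated_efficiency, panel_temp_c,
                     threshold_temp_c=25.0, degradation_per_c=0.0048):
    """Operating efficiency after derating for panel temperature."""
    excess_degrees = max(panel_temp_c - threshold_temp_c, 0.0)
    return rated_efficiency * (1.0 - excess_degrees * degradation_per_c)


print(panel_efficiency(0.15, 20.0))  # below the threshold: no derating
print(panel_efficiency(0.15, 35.0))  # 10 degrees above: about 4.8% lower
```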
Factoring in Cloud Cover. Cloud cover lowers the output of a solar panel by reducing the amount of solar
energy reaching the panel. While the exact impact of cloud cover on a location is difficult to measure, we
can assume that at 100% cloud cover, only about 20% of the solar flux reaches Earth’s surface. That is, the
cloud albedo is 80% at 100% cloud cover. We can use this information to adjust the engineering estimate
of solar insolation by incorporating the following relationship.
$$
\text{CloudAlbedo}_d^i = \text{CloudCoverPercentage}_d^i \times 80\%
$$
$$
\text{SolarInsolation}_d^i = \text{SolarFlux}_d \times \cos(\alpha_d^i - 90^\circ) \times (1 - \text{CloudAlbedo}_d^i)
$$
The final engineering model of solar generation can then be written as follows:
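$$
\begin{aligned}
\text{SolarGeneration}_d^i = {} & \text{SolarFlux}_d \times \cos(\alpha_d^i - 90^\circ) \times (1 - \text{CloudAlbedo}_d^i) \times \text{SolarPanelCapacity}_d \\
& \times \text{RatedEfficiency} \times \left(1 - \left[\max(\text{Temp}_d^i - \text{ThresholdTemp}, 0) \times \nabla\right]\right)
\end{aligned}
$$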
In practice, lining up cloud cover with the metered generation is difficult. In many cases, the only available
cloud cover values are for the closest weather station, which may be miles away. This will lead to poor
model fits and skewed values for the adjustment parameter. To mitigate this impact, it is best to estimate
the model using only days that were cloudless or very nearly cloudless. With these selected days, the
adjustment parameter is then free to synchronize the model to observed generation output. The impact
of cloud cover will then be given by the cloud albedo assumption of 80%.
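Putting the pieces together, the sketch below evaluates the engineering model for one time interval. It applies the cloud albedo exactly once to the clear-sky insolation. The function name, argument names, and default values are illustrative assumptions. The statistical adjustment parameter discussed above is not included.

```python
import math


def solar_generation_watts(solar_flux_w_m2, altitude_deg, cloud_cover_fraction,
                           panel_area_m2, rated_efficiency, panel_temp_c,
                           threshold_temp_c=25.0, degradation_per_c=0.0048,
                           full_cover_albedo=0.80):
    """Engineering estimate of solar generation (Watts) for one interval."""
    if altitude_deg <= 0.0:
        return 0.0  # assumption: no generation while the sun is below the horizon
    clear_sky = solar_flux_w_m2 * math.cos(math.radians(altitude_deg - 90.0))
    cloud_albedo = cloud_cover_fraction * full_cover_albedo
    efficiency = rated_efficiency * (
        1.0 - max(panel_temp_c - threshold_temp_c, 0.0) * degradation_per_c)
    return clear_sky * (1.0 - cloud_albedo) * panel_area_m2 * efficiency


area = 2500.0 / 150.0  # 2.5 kW of panels rated at 150 W/m^2
clear_day = solar_generation_watts(1000.0, 90.0, 0.0, area, 0.15, 25.0)
overcast = solar_generation_watts(1000.0, 90.0, 1.0, area, 0.15, 25.0)
print(round(clear_day), round(overcast))
```

With these inputs, full cloud cover leaves roughly 20% of the clear-day output, in line with the 80% cloud albedo assumption.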
8.1 INCORPORATING THE IMPACT OF SUNSHINE ON ELECTRICITY DEMAND

A side benefit of developing a model of solar generation is that it is then possible to incorporate the impact
of solar insolation on heating and cooling loads. On a clear sunny day, average indoor temperatures can
rise by about 9C (16F) at around solar noon. This leads to following relationships.
0.009 ℃
Indoor Temperature Increase(℃) ≅
Watt⁄
m2
0.017℉
IndoorTemperatureIncrease(℉) ≅
Watt⁄
m2
Given estimates of solar insolation, we can then adjust the temperature data used in the energy
forecasting models using the above relationship. That is, if the temperature forecast is 25C and it is a
clear sunny day in the middle of summer, the effective indoor temperature is more like 34°C. It is the effective
temperature we want to use in the model. The effective temperatures then form the basis of computing
heating degree day and cooling degree day model variables.
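As a small illustration, the sketch below applies the Celsius relationship to a temperature forecast. The function name is an assumption, and the 1,000 Watts/m² insolation value is a hypothetical clear-sky figure chosen to match the 25°C example.

```python
INDOOR_GAIN_C_PER_W_M2 = 0.009  # degrees Celsius per Watt/m^2 of insolation


def effective_temperature_c(forecast_temp_c, insolation_w_m2):
    """Temperature adjusted for the indoor heating effect of sunshine."""
    return forecast_temp_c + INDOOR_GAIN_C_PER_W_M2 * insolation_w_m2


print(effective_temperature_c(25.0, 1000.0))  # clear summer day near solar noon
print(effective_temperature_c(25.0, 0.0))     # night: no adjustment
```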
8.2 INCORPORATING THE IMPACT OF SOLAR PV GENERATION ON LOADS

According to the European Photovoltaic Industry Association (EPIA), there is over 102 GW of installed solar
generation capacity worldwide.23 The EPIA estimates that approximately 31 GW of solar generation
capacity was installed in 2012, which was about 2% higher than the estimated 30.4 GW installed in 2011.
In the EPIA report, Europe remains the world leader in installed solar capacity, accounting for
approximately 70 GW of the 2012 total installed capacity of 102.2 GW worldwide. Together, China (8.3
GW), USA (7.8 GW), Japan (6.9 GW), Australia (2.4 GW) and India (1.2 GW) account for approximately 26%
of the world solar capacity. The Solar Energy Industries Association estimates that by the end of 2013,
there will be a new solar project installed in the USA every four minutes.24
The worldwide statistics represent both utility solar installations, where the electricity generated feeds
directly to the grid, and non-utility installations (referred to elsewhere as embedded solar generation),
where the generation offsets on-site consumption. From the perspective of load forecasting, the non-
utility installations are of critical interest since these installations directly impact measured load. Since
short-term load forecast models are based on measured load, the following examples illustrate how
embedded solar generation can impact a load forecast. In these examples, assume the demand for
electricity at noon, regardless of how it is sourced, is 1,300 KW.
No Embedded Solar Generation. Under this first example, there is no embedded solar generation. As a
result, metered demand, which is the load that a system operator sees, equals actual demand. That is,
$$
\text{MeteredDemand}_d^{\text{Noon}} = \text{Demand}_d^{\text{Noon}}
$$
23 Global Market Outlook for Photovoltaics 2013-2017, European Photovoltaic Industry Association
24 www.seia.org
Now consider developing a forecasting model of demand for electricity. If we have a year’s worth of
measured demand, we could fit the following regression model.
$$
\text{MeteredDemand}_d^{\text{Noon}} = \beta_1\,\text{Constant}_d^{\text{Noon}} + e_d^{\text{Noon}}
$$
Here, metered demand is regressed on a variable that takes on a value of 1.0 for every observation. In
this case of no embedded solar generation, the estimated coefficient on the constant variable will be
equal to the average metered demand, or 1,300 KW. As a result, the forecast from the estimated model
will provide a forecast of actual demand for electricity.
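As a minimal sketch of the constant-only regression described above: with a single regressor equal to 1.0 in every observation, the ordinary least squares coefficient reduces to the sample average of metered demand. The function name `fit_constant` and the 365-day series are illustrative assumptions, not part of the handbook.

```python
def fit_constant(y):
    """OLS estimate of beta_1 when the only regressor is a column of ones.

    For a single regressor x with no separate intercept, the OLS coefficient
    is sum(x*y) / sum(x*x); with x = 1 this is the sample mean of y.
    """
    x = [1.0] * len(y)
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Hypothetical year of noon observations with no embedded solar generation.
metered_noon = [1300.0] * 365
beta_1 = fit_constant(metered_noon)
print(beta_1)
```

Under these assumptions the coefficient equals the 1,300 KW average, so the fitted model forecasts actual demand.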
With Constant Embedded Solar Generation. Now assume that 100 KW of embedded solar generation is
produced every day at noon. We can rewrite metered demand as follows:
$$\text{MeteredDemand}^{\text{Noon}}_{d} = \text{Demand}^{\text{Noon}}_{d} - \text{SolarGeneration}^{\text{Noon}}_{d}$$
Because metered demand is now 100 KW lower, regressing it on the constant variable yields a different
estimated coefficient. Specifically, the estimated coefficient will be equal to 1,200 KW, which is the new
lower average metered demand. In this case, the resulting forecast model will under-predict actual demand
for electricity by 100 KW.
From the perspective of system operations, the fact that the forecast model under-predicts actual demand
for electricity is not a concern, since operators can rely on the 100 KW of solar generation being available
all the time.
With Volatile Embedded Solar Generation. Unfortunately, solar generation is not this reliable. We can
introduce uncertainty into the amount of solar generation that is available by assuming that half the time
cloud cover is thick enough to drive the solar generation to 0. The other days are perfectly clear and the
solar generation is 100 KW. This means the metered noon load is 1,200 KW on sunny days and 1,300 KW on
cloudy days. If the cloudy and sunny days are equal in number, the estimated coefficient on the constant
variable will be equal to the average metered load, or 1,250 KW.
The variability in solar generation means that the statistical model that was fitted to metered demand will
under-predict loads on cloudy days and over-predict loads on sunny days. From the perspective of system
operations, this means spinning reserves must be available to cover the load variability and the
resulting load forecast error introduced by the volatile embedded solar generation.
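The volatile case can be reproduced with a small simulation. This is a sketch under the example's own assumptions (demand fixed at 1,300 KW, solar generation of either 100 KW or 0 KW, equal numbers of sunny and cloudy days); the alternating pattern and the variable names are hypothetical choices made only to keep the two day types balanced.

```python
from statistics import mean

DEMAND = 1300.0   # assumed noon demand, regardless of how it is sourced
SUNNY_SOLAR = 100.0
CLOUDY_SOLAR = 0.0

# Hypothetical history: sunny and cloudy days alternate, 182 of each.
solar = [SUNNY_SOLAR if d % 2 == 0 else CLOUDY_SOLAR for d in range(364)]
metered = [DEMAND - s for s in solar]

# A constant-only regression returns the average metered load.
beta_1 = mean(metered)

# Forecast error against metered load (forecast minus actual) by day type.
error_sunny = beta_1 - (DEMAND - SUNNY_SOLAR)    # positive: over-prediction
error_cloudy = beta_1 - (DEMAND - CLOUDY_SOLAR)  # negative: under-prediction

print(beta_1, error_sunny, error_cloudy)
```

With balanced day types the fitted constant sits midway between the two metered load levels, so the model misses by the same amount in opposite directions on sunny and cloudy days.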
Accounting for Average Solar Generation. Is it possible to improve the accuracy of the load forecast?
Assume we can obtain a perfect forecast of cloud cover, and hence we can accurately predict how much
solar generation is going to be available tomorrow. It seems reasonable to adjust the baseline load
forecast with the forecast of solar generation. Specifically, our forecast of actual demand can be
constructed as:
$$\text{Demand}^{\text{Noon}}_{d} = \text{PredictedMeteredDemand}^{\text{Noon}}_{d} + \text{PredictedSolarGeneration}^{\text{Noon}}_{d}$$
On a sunny day, the forecast of demand will be equal to the predicted value of 1,250 KW from the model
of metered demand plus 100 KW of solar generation, or 1,350 KW. On a cloudy day, the forecast of
demand will be equal to the predicted value of 1,250 KW from the model of metered demand plus 0 KW
of solar generation, or 1,250 KW. Unfortunately, both forecasts of actual demand are in error. On a sunny
day, this approach over-predicts actual demand by 50 KW, which is equal to the average amount of solar
generation that took place over the period that was used to estimate the coefficient of the model of
metered demand. On a cloudy day, this approach under-predicts actual demand by the same 50 KW.
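The arithmetic of this naive adjustment can be checked with a few lines of code. This is only an illustrative sketch: the constants come from the example above, and the function name `naive_demand_forecast` is a hypothetical label.

```python
PREDICTED_METERED = 1250.0  # constant-only model fitted to the volatile metered load
ACTUAL_DEMAND = 1300.0      # assumed noon demand in the example


def naive_demand_forecast(solar_forecast):
    """Add the day's solar forecast directly to predicted metered demand."""
    return PREDICTED_METERED + solar_forecast


for label, solar_forecast in (("sunny", 100.0), ("cloudy", 0.0)):
    forecast = naive_demand_forecast(solar_forecast)
    # Positive error means over-prediction, negative means under-prediction.
    print(label, forecast, forecast - ACTUAL_DEMAND)
```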
In the current example, the average solar generation over the estimation period was 50 KW. As a result,
the estimated coefficient of the metered demand model embodies this average. Since 50 KW is already
accounted for by the metered demand model, simply adding the predicted solar generation is not enough.
We also need to add the difference between the average solar generation already accounted for by the
model coefficient and the solar generation for the day in question. This results in the following calculation:
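$$\text{Demand}^{\text{Noon}}_{d} = \text{PredictedMeteredDemand}^{\text{Noon}}_{d} + \text{SG}^{\text{Noon}}_{d} + \left(\text{AvgSG}^{\text{Noon}} - \text{SG}^{\text{Noon}}_{d}\right)$$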
In the above equation, SG represents the actual level of solar generation on day (d) at noon. AvgSG is the
average solar generation over the model estimation period at noon. The third term in the above equation
corrects for how much of the current day's solar generation is already embedded in the model coefficients.
Using the current example where AvgSG equals 50 KW, we have the two cases of a sunny day when SG =
100 KW and a cloudy day when SG = 0 KW. The calculations are:
Sunny Day
$$\text{Demand}^{\text{Noon}}_{d} = 1{,}250 + 100 + (50 - 100) = 1{,}300$$
Cloudy Day
$$\text{Demand}^{\text{Noon}}_{d} = 1{,}250 + 0 + (50 - 0) = 1{,}300$$
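The two calculations above can be expressed as a small function. This is a sketch of the worked example only; the function and argument names are hypothetical. Note that the two SG terms cancel algebraically, so in this constant-only example the corrected forecast equals predicted metered demand plus AvgSG on every day.

```python
def corrected_demand_forecast(predicted_metered, sg, avg_sg):
    """Predicted metered demand + SG + (AvgSG - SG), as in the equation above."""
    return predicted_metered + sg + (avg_sg - sg)


sunny = corrected_demand_forecast(1250.0, sg=100.0, avg_sg=50.0)
cloudy = corrected_demand_forecast(1250.0, sg=0.0, avg_sg=50.0)
print(sunny, cloudy)
```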
These examples illustrate the potential for additional forecast error arising from embedded solar
generation. It is important to recognize that the coefficients of the short-term forecast model embody
the average impact of solar generation on loads. This means that in areas where there has been significant
penetration of embedded solar generation, the short-term forecast will tend to under-forecast loads on
cloudy days and over-forecast loads on sunny days. Ignoring the problem is not an option. There are three
practical approaches to dealing with the impact of embedded solar generation.
◼ Error Correction. The error correction approach implements what many system operators do
initially when faced with the problem of solar PV generation. Namely, they make ex post
adjustments of the load forecast to account for forecasted values of solar PV generation. On
sunny days, the load forecast is adjusted downward, and on cloudy days, the load forecast is
adjusted upward. The key advantage of the error correction approach is that the existing load
forecast model can continue to be used without any changes. All that is needed is a means of
forecasting solar PV generation.
◼ Reconstituted Loads. Under the reconstituted loads approach, the historical time series of
measured load is reconstituted by adding back estimates of solar PV generation. The load forecast
model is then re-estimated against the reconstituted loads. The subsequent reconstituted load
forecasts are then adjusted ex post by subtracting away forecasts of solar PV generation to form
a forecast of measured loads. The advantage of this approach is that any inherent bias that might
be imposed on the estimated coefficients of a model of measured loads is controlled for by
estimating the model coefficients against a time series of demand for power regardless of how it
is sourced. The disadvantage is that a historical time series of solar PV generation needs to be
developed and maintained to estimate the load forecast model coefficients. Further, this
approach assumes that the historical solar PV generation time series is accurate. This may not
necessarily be true, in which case this approach places too high a weight on the solar PV
generation values.
◼ Model Direct. Under this approach, the weight placed on the solar PV generation data is
estimated directly by including these data as an explanatory variable in the load forecast models.
The estimated coefficient on the solar PV generation variable is the weight. Also, in principle, by
including solar PV generation as an explanatory variable, the coefficients on the remaining
explanatory variables should not be biased. This approach also provides a direct forecast of
measured loads that accounts for solar PV generation, thus avoiding any ex post processing of the
load forecast. Like the reconstituted load approach, this approach requires developing and
maintaining an historical time series of solar PV generation.
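The three approaches can be compared on the toy example from this section. The sketch below is illustrative only and rests on several assumptions that are not in the handbook: the history is the balanced sunny/cloudy series used earlier, the historical solar series is known exactly, the error correction adjustment is taken to be the deviation of forecast solar generation from its historical average, and all function names are hypothetical. A production model would include calendar and weather variables alongside solar generation.

```python
from statistics import mean


def fit_line(x, y):
    """Simple OLS of y on a constant and one regressor x; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Toy history: demand is 1,300 KW; solar alternates between 100 KW and 0 KW.
solar_hist = [100.0, 0.0] * 182
metered_hist = [1300.0 - s for s in solar_hist]
avg_solar = mean(solar_hist)


def error_correction(solar_forecast):
    """Keep the constant-only model and adjust its forecast ex post."""
    base = mean(metered_hist)  # existing model of metered load
    return base - (solar_forecast - avg_solar)


def reconstituted_loads(solar_forecast):
    """Fit against load plus solar, then subtract the solar forecast."""
    reconstituted = [m + s for m, s in zip(metered_hist, solar_hist)]
    return mean(reconstituted) - solar_forecast


def model_direct(solar_forecast):
    """Estimate the weight on solar generation as a regression coefficient."""
    intercept, slope = fit_line(solar_hist, metered_hist)
    return intercept + slope * solar_forecast


for name, approach in (("error correction", error_correction),
                       ("reconstituted loads", reconstituted_loads),
                       ("model direct", model_direct)):
    print(name, approach(100.0), approach(0.0))
```

On this idealized data all three approaches return the same forecasts of measured load. They would be expected to differ once the solar series contains measurement error, which is the case where the estimated weight in the model direct approach becomes relevant.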
