Case 1 Blog
We have 10 years of historical data (2010-2019) on electricity consumption and prices.
In [5]: data[0].head()
Out[5]:
date cons p
https://tianyi.io/chicago1nb.html Page 1 of 23
case 1 blog 4/1/21, 11:11 PM
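In [86]: import matplotlib.pyplot as plt
# data: list of per-year DataFrames (2010-2019) with columns date, cons, p
plt.xlabel("Day")
plt.ylabel("Cumulative Consumption (kWh)")
plt.title("Consumption/day in 2010")
plt.plot(data[0].cons)
plt.show()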
Price/day in 2010
In [85]: plt.xlabel("Day")
plt.ylabel("Price ($)")
plt.ylim(0, 4.8)
plt.axvline(180, c="red")
plt.axvline(242, c="red")
plt.axvline(302, c="red")
plt.axvline(364, c="red")
plt.plot(data[0].p[1:])
plt.show()
In [10]: gas = []
for year in data:
row = year[(year.p < 3) & (year.p > 2)]
gas.append((row.cons - (3 - row.p) * row.dailycons).values[0])
plt.title("Gas capacity/Year")
plt.xlabel("Year")
plt.ylabel("Gas capacity (KWh)")
plt.plot(range(2010, 2020), gas)
plt.show()
In [11]: coal = []
for i, year in enumerate(data):
row = year[(year.p < 3.01) & (year.p > 3)].iloc[0]
c = row.cons - (row.p - 3) * 1000 - gas[i]
coal.append(c)
plt.title("Coal capacity/Year")
plt.xlabel("Year")
plt.ylabel("Coal capacity (KWh)")
plt.plot(range(2010, 2020), coal)
plt.show()
Natural gas and coal capacity both appear to grow roughly linearly over the years, so we model each with a linear regression.
Gas production
m: 13.419965801206784 c: 1297.330695128393
2020 mean: 1431.5303531404609
Unbiased estimate of variance: 45.99518922123725
Coal production
m: 3.0905406916305362 c: 461.8418539140977
2020 mean: 492.7472608304031
Unbiased estimate of variance: 57.38547189035349
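The fitting cell itself is not shown. Below is a minimal sketch of how such a fit could be computed. It assumes the yearly capacities are in a list such as gas or coal and that x counts years from 2010. That reading is consistent with the printed output: 13.42 × 10 + 1297.33 ≈ 1431.53 for gas, and 3.09 × 10 + 461.84 ≈ 492.75 for coal. The helper name fit_capacity_trend is hypothetical. The n − 2 divisor for the unbiased residual variance is also an assumption, since the post does not say which divisor it used.

```python
import numpy as np


def fit_capacity_trend(capacities):
    """Least-squares fit of capacity = m * x + c, where x = 0, 1, 2, ... counts years.

    Returns (m, c, mean for the following year, unbiased residual variance).
    """
    y = np.asarray(capacities, dtype=float)
    n = len(y)
    x = np.arange(n)  # x = 0 for the first year (2010 in the post)
    m, c = np.polyfit(x, y, deg=1)  # coefficients come highest power first
    next_mean = m * n + c  # x = n is the year after the data ends (2020 for 10 years)
    residuals = y - (m * x + c)
    # Two parameters were fitted, so divide by n - 2 for an unbiased estimate (assumption)
    var = residuals @ residuals / (n - 2)
    return m, c, next_mean, var


# Hypothetical usage with the per-year lists computed above:
# m, c, mean_2020, var_2020 = fit_capacity_trend(gas)
```

Treating the 2020 capacity as normal with this mean and variance gives the N(μ, σ²) model used later in the post.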
Modelling Consumption
Once we have a good idea of gas/coal capacity, we have to investigate consumption patterns. Specifically,
we want to answer the following questions:
In [16]: plt.title("Consumption/Day")
plt.xlabel("Day")
plt.ylabel("Consumption (KWh)")
plt.plot(cons)
plt.show()
0.1252364948706893
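The number above is the rate λ of the exponential model used later in the post. For an exponential distribution, the maximum-likelihood estimate of λ is simply 1 / (sample mean). A sketch follows, assuming the daily consumption values are in an array; the function name is hypothetical.

```python
import numpy as np


def fit_exponential_rate(daily_cons):
    """Maximum-likelihood rate of an exponential distribution: 1 / sample mean."""
    x = np.asarray(daily_cons, dtype=float)
    return 1.0 / x.mean()
```

As a check, 1 / 0.12524 ≈ 7.98, which matches the mean daily consumption quoted later. An exponential distribution also has variance equal to its mean squared: 7.98² ≈ 63.7, reasonably close to the reported sample variance of 66.95.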
We strongly suspect that daily consumption is exponentially distributed, but we still have to confirm that there aren't any periodic patterns in the data.
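One way to look for periodic patterns is to compute the sample autocorrelation of daily consumption at each lag. Lags whose autocorrelation exceeds a threshold in magnitude are then flagged. The sketch uses the approximate 95% white-noise band, 1.96 / √n, as its default threshold. The function name and that default are assumptions. They are not necessarily what produced the numbers below, which appear to be lag/autocorrelation pairs.

```python
import numpy as np


def flag_autocorrelations(x, max_lag=50, threshold=None):
    """Return (lag, r) pairs whose sample autocorrelation r exceeds threshold in magnitude.

    The default threshold, 1.96 / sqrt(n), is the usual approximate 95% band for
    white noise; it is an assumption, not necessarily the cutoff used in the post.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    denom = d @ d  # n times the lag-0 autocovariance
    if threshold is None:
        threshold = 1.96 / np.sqrt(n)
    flagged = []
    for lag in range(1, max_lag + 1):
        r = (d[:-lag] @ d[lag:]) / denom  # sample autocorrelation at this lag
        if abs(r) > threshold:
            flagged.append((lag, float(r)))
    return flagged
```

With 10 years of daily data (n ≈ 3650), the default band is about ±0.032. Out of 50 lags, a few isolated values of 0.03 to 0.05 are roughly what pure noise would produce. A genuine weekly cycle, by contrast, would show up as a large value at lag 7.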
10 0.045732008518092615
14 -0.03654910372937772
19 -0.03024981752334226
24 -0.03749542551664027
43 0.03193238383270123
47 -0.031447572727001556
We'll ignore the June contract for now (it's a little more mathematically complicated to price).
Historically, the other 3 contracts expired only after natural gas capacity had been used up. Their prices were therefore $3.00 + $0.001 × (amount of electricity usage exceeding the combined natural gas and coal capacity), and never below $3.00. Let N be the number of days until the contract expires, G the expected natural gas capacity (kWh), C the expected coal capacity (kWh), and E the amount of electricity already consumed (kWh). Daily consumption is modelled as Exp(λ) with mean 1/λ, so we expect another N/λ kWh to be consumed before expiry. The expected value of the contract is then

$$3.00 + \max\left(0,\ 0.001 \times \left(\frac{N}{\lambda} + E - G - C\right)\right),$$

where \(\lambda = 0.1252364948706893\) was computed earlier. Here's our predictive formula in action for the December contract in 2015.
G = 1364.43, C = 477.29
As the diagram shows, the model overestimates the contract price initially but approaches the true value as the contract nears expiry. This is mainly because consumption decreased significantly over the latter half of the year, bringing down the final electricity price. This was good enough for our purposes.
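Here is a minimal sketch of the expected-value formula for the non-June contracts, using the symbols defined above. The function name is hypothetical, and lam is the rate λ of the exponential daily-consumption model. The function returns 3.00 + max(0, 0.001 × (N/λ + E − G − C)).

```python
def expected_contract_price(N, E, G, C, lam):
    """Plug-in expected settlement price for the non-June contracts.

    N: days until expiry; E: electricity already consumed (kWh);
    G, C: expected natural gas and coal capacity (kWh);
    lam: rate of the exponential daily-consumption model (mean 1 / lam kWh per day).
    """
    expected_total = E + N / lam  # consumption so far plus expected future consumption
    excess = expected_total - G - C  # usage beyond combined gas and coal capacity
    return 3.00 + max(0.0, 0.001 * excess)
```

Take the 2015 estimates G = 1364.43 and C = 477.29, with a hypothetical E = 1900 kWh already consumed and N = 30 days left. The formula then gives roughly $3.30.

Because max(0, ·) is convex, plugging the expected consumption into it gives a value no larger than the true expectation of the payoff (Jensen's inequality). The formula is therefore a plug-in approximation rather than an exact expectation.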
In the previous example we used the mean of the regression prediction for gas and coal capacity. This can lead to impossible situations where the predicted capacity is lower than the capacity we have already observed. If we predicted 1300 kWh of gas capacity but have already observed 1320 kWh of gas being used, we should revise our estimate of total gas capacity upward.
Since we assumed a normal distribution for gas capacity, \(N(\mu, \sigma^2)\), we can replace it with a truncated normal distribution (https://en.wikipedia.org/wiki/Truncated_normal_distribution) by cutting off the impossible lower tail. This is equivalent to combining a normal prior with a likelihood that is zero below the observed capacity and constant above it. The resulting posterior is the truncated normal.
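Below is a sketch of the truncated-normal update using SciPy's scipy.stats.truncnorm. Its bounds a and b are expressed in standard deviations from loc. The function name is hypothetical. The numbers in the usage note reuse the 1300 kWh / 1320 kWh example above, together with the gas variance printed earlier (about 46).

```python
import numpy as np
from scipy.stats import truncnorm


def truncated_capacity(mu, sigma, observed):
    """N(mu, sigma^2) capacity estimate truncated below at the capacity already observed.

    Returns the frozen SciPy distribution; call .mean(), .cdf(), etc. on it.
    """
    a = (observed - mu) / sigma  # truncnorm takes bounds in standard-deviation units
    return truncnorm(a, np.inf, loc=mu, scale=sigma)
```

For example, truncated_capacity(1300.0, 46.0 ** 0.5, 1320.0).mean() comes out at roughly 1322 kWh, replacing the impossible 1300 kWh estimate. When the observed usage is far below μ, the truncation has essentially no effect and the mean stays at μ.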
The June contract is different from the other contracts. Historically, it settled at either $2 or $3, depending on whether natural gas capacity was exhausted by expiry; in 2 of the 10 years it settled at $2. Because the payoff is effectively binary, our previous approach cannot price the June contract accurately at all!
There are 2 ways around this:
- use the Central Limit Theorem (CLT) to approximate the distribution of cumulative consumption as normal, or
- use the Erlang distribution to compute the probability exactly by marginalization and integration.

We went with the latter, but I'll show both methods and how their predictions differ.
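From 10 years of consumption data, daily consumption has a mean of 7.98 kWh and a variance of 66.95. Suppose there are N days until the June contract expires (with N sufficiently large), and X is the amount of electricity consumed during those N days. Then by the CLT, \(X \sim N(7.98N,\ 66.95N)\). Remaining gas capacity, on the other hand, is \(G \sim N(\mu, \sigma^2)\). Assuming G and X are independent, \(G - X \sim N(\mu - 7.98N,\ \sigma^2 + 66.95N)\), so the probability of settling at $3 is

$$P = P(G - X < 0) = \Phi\left(\frac{7.98N - \mu}{\sqrt{\sigma^2 + 66.95N}}\right),$$

or in SciPy, norm.cdf(0, loc=mu - 7.98*N, scale=np.sqrt(sigma**2 + 66.95*N)). The scale argument takes the standard deviation, so the variance must be square-rooted.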
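Here is a minimal sketch of the CLT approximation above, passing the standard deviation of G − X as SciPy's scale. The contract pays either $2 or $3, so its fair value is 2(1 − P) + 3P = 2 + P. The function name is hypothetical. The values 7.98 and 66.95 are the historical daily mean and variance quoted above, and G and X are assumed independent.

```python
import numpy as np
from scipy.stats import norm


def june_price_clt(N, mu, sigma, mean_daily=7.98, var_daily=66.95):
    """CLT approximation for the June contract ($3 if gas runs out by expiry, else $2).

    mu, sigma: mean and standard deviation of remaining gas capacity G (kWh).
    N: days until expiry; consumption over N days is approximated as
    N(mean_daily * N, var_daily * N), independent of G.
    Returns (fair price, probability of settling at $3).
    """
    loc = mu - mean_daily * N                   # mean of G - X
    scale = np.sqrt(sigma**2 + var_daily * N)   # standard deviation (not variance) of G - X
    p3 = norm.cdf(0.0, loc=loc, scale=scale)    # P(G - X < 0): gas exhausted by expiry
    return 2.0 + p3, p3                         # 2 * (1 - p3) + 3 * p3
```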
Suppose there are N days until the contract expires, and the amount of electricity used on each day is independently distributed as \(\mathrm{Exp}(\lambda)\). What's the distribution of the total amount of electricity, E, used in N days?
It turns out that E follows an Erlang distribution with shape k = N and rate λ. If you want to learn more about the derivation, you can read this resource (https://towardsdatascience.com/sum-of-exponential-random-variables-b023b61f0c0f).
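A quick Monte Carlo check of this fact, as a sketch: simulate many N-day totals of independent Exp(λ) draws and compare their empirical CDF with SciPy's scipy.stats.erlang. That distribution is parameterized by the shape a = N and scale = 1/λ. The function name, sample size and seed are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import erlang


def compare_sum_to_erlang(N, lam, points, n_samples=100_000, seed=0):
    """Empirical CDF of simulated N-day totals vs. the Erlang(N, lam) CDF at the given points."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated N-day period of iid Exp(lam) daily consumption
    totals = rng.exponential(scale=1.0 / lam, size=(n_samples, N)).sum(axis=1)
    empirical = np.array([(totals <= x).mean() for x in points])
    exact = erlang.cdf(points, N, scale=1.0 / lam)  # SciPy: shape a = N, scale = 1 / rate
    return empirical, exact
```

With 100,000 samples, the empirical and exact CDFs should agree to within about a percentage point at every point.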
Here's what the distribution looks like for various values of N, compared to the normal distribution. For large N, the normal distribution closely approximates the Erlang distribution, but for small N the two diverge significantly, which motivated us to compute the probabilities exactly.
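One way to measure that discrepancy is the largest absolute gap between the Erlang CDF and a normal CDF with the same mean N/λ and variance N/λ². This sketch (hypothetical function name) uses the Erlang's own moments. The CLT formula earlier instead uses the empirical daily variance of 66.95, which is slightly larger than 1/λ² ≈ 63.8.

```python
import numpy as np
from scipy.stats import erlang, norm


def max_cdf_gap(N, lam, n_points=2000):
    """Largest absolute difference between the Erlang(N, lam) CDF and its normal approximation."""
    mean, std = N / lam, np.sqrt(N) / lam  # Erlang mean N/lam, variance N/lam^2
    x = np.linspace(0.0, mean + 8 * std, n_points)
    exact = erlang.cdf(x, N, scale=1.0 / lam)
    approx = norm.cdf(x, loc=mean, scale=std)
    return np.max(np.abs(exact - approx))
```

For N = 2 the gap is at least 0.078: at zero consumption the normal CDF already assigns Φ(−√2) ≈ 0.079, while the Erlang CDF is 0. For N = 200 the gap falls below 0.02, in line with the roughly 1/√N shrinkage predicted by the Berry-Esseen theorem.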
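Although this integral looks daunting to evaluate analytically, we can simply compute it numerically to sufficient precision. We reduce the infinite range \((-\infty, \infty)\) to \((\mu - 5\sigma, \mu + 5\sigma)\) and take 10,000 evenly spaced points in this range. The integral then becomes a summation, which we can compute with a dot product:

$$P(G - E < 0) \approx \sum_{g = \mu - 5\sigma}^{\mu + 5\sigma} P(g - E < 0)\, P(G = g),$$

where g runs over the grid points and \(P(G = g)\) stands for the normal density of G at g multiplied by the grid spacing.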
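Below is a sketch of the discretized integral. It assumes G ~ N(μ, σ²) as above and uses SciPy's Erlang survival function for P(g − E < 0) = P(E > g). The grid of 10,000 points over μ ± 5σ follows the description above. The function name is hypothetical. Like the CLT formula, it returns the June fair value as 2 + P.

```python
import numpy as np
from scipy.stats import erlang, norm


def june_price_exact(N, lam, mu, sigma, n_points=10_000):
    """P(G - E < 0) with E ~ Erlang(N, lam) and G ~ N(mu, sigma^2), via a Riemann sum.

    Returns (fair price, probability of settling at $3). Requires N >= 1.
    """
    g = np.linspace(mu - 5 * sigma, mu + 5 * sigma, n_points)  # grid over G
    dg = g[1] - g[0]
    p_exhausted = erlang.sf(g, N, scale=1.0 / lam)    # P(g - E < 0) = P(E > g)
    weights = norm.pdf(g, loc=mu, scale=sigma) * dg   # approximates P(G = g) on the grid
    p3 = p_exhausted @ weights                        # the summation as a dot product
    return 2.0 + p3, p3
```

Running this alongside the CLT formula for the same N, μ and σ is a simple way to reproduce the comparison that follows. The differences should be largest when N is small.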
The CLT prediction seems good enough for most purposes. Zooming in on the last few days before expiry, the two prices differ by a few cents. That is noticeable, but in the context of our competition it is unlikely to make a big difference in profitability.
Fairs = computed
Before the competition, we put a lot of effort into analyzing historical data and computing our fairs (fair values), so we thought that much of our success came from these good fairs. Looking back, these fairs may have put us above the poor- and middle-performing teams, but didn't differentiate us from the other top teams. Read on for what really mattered!