Season 11 Taric Jungle Metrics


A League of Legends Research Paper
James Booth
November 23, 2021

Introduction
As a League of Legends player since Season 3 and an off-meta enjoyer, I have spent a large amount of
time watching other off-meta players on YouTube. Off meta refers to strategies and playstyles that
differ from common practice. In League of Legends, this may consist of playing a toplaner, such as
Malphite, in the support role, a jungler like Skarner in top, or other such swaps. These strategies are frequently, but
not always, sub-optimal and may even be considered trolling by many. As such, it is of interest when a
player manages to make one of these picks work in ‘high elo’, frequently qualified as Diamond 2 or
above. This invites some discussion of what qualifies as making a pick work.
For the sake of this paper, a viable off meta strategy is one with a win rate above 45% in over 100
games but particularly viable at or above 50%. Repeated data samples of League of Legends games have
shown that after the first 100 games on a particular champion, variation drastically decreases. As such,
game populations at or above 100 games will be denoted as a Significant Game Size (SGS). Furthermore,
a win rate of 45% at exactly 100 games corresponds to only 10 more losses than wins. Such a deficit
may be due to one or two bad days rather than non-viability of a particular build or strategy, and even
small variations, 5% or less, may bring this win rate up to a more respectable figure.
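The viability criterion above can be expressed as a short check. This is an illustrative sketch rather than code from the paper; the function name and the "particularly viable" label are my own reading of the criterion.

```python
def viability(wins, losses):
    """Classify a record per the criterion above: a Significant Game Size
    (SGS, >= 100 games) and a win rate above 45%, with 'particularly
    viable' at or above 50%.  (Interpretation mine, not the paper's code.)"""
    games = wins + losses
    if games < 100:
        return "below SGS"          # too few games to judge
    win_rate = wins / games * 100
    if win_rate >= 50:
        return "particularly viable"
    if win_rate >= 45:
        return "viable"
    return "not viable"
```

For example, lightrocket2's jungle record of 963-939 classifies as particularly viable, while his 5 ADC games fall below SGS.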
The aim of this paper is to examine the games of one player, lightrocket2. Lightrocket2 is a North
American jungle player who peaked Grandmaster in Season 10 and twice in Season 11 while consistently
holding the rank of Master over 2,000 games. His main champion is Taric, a support whom he has played in
jungle for over two thousand games. The goal of this paper is to look at his total games, through
November 15, 2021, and to examine what works and what doesn’t about his strategy, concluding
with multiple predictive models based on qualitative and quantitative factors. Data was obtained from
u.gg, op.gg, and leagueofgraphs.com.

Abstract
This section will cover the basics of lightrocket2’s strategy while detailing what metrics will be
examined throughout. First, we begin with an overview of Taric and the mechanics of the champion.
Taric is a durable, low damage fighter who excels in protecting and enabling his teammates while
providing considerable crowd control with his E, Dazzle. Taric jungle works by taking advantage of
Taric’s passive, which grants him a large amount of attack speed while reducing ability cast times. This is
in exchange for a large amount of mana. As a support, Taric rarely utilizes this passive due to low range
and mana struggles in lane, instead defending against all-ins and enabling engages with Dazzle, Q, and
R. As a jungler, Taric is granted high mana regeneration allowing him to repeatedly use his passive and
clear camps with high attack speed. The combination of the attack speed steroid, low ability cooldowns,
and good base stats also allows Taric to fight many meta junglers in 1v1 battles and contest early
objectives such as scuttlecrab, dragon, and buffs.
In this paper, we will begin with an overview of lightrocket2’s Taric jungle statistics. Statistics
included in the overview will be total win rate, rune specific win rates, item specific win rates, item and
rune specific win rates, matchup win rates, and win rates by game time. This will lead to a more specific
discussion which will involve win rate based on jungle matchups, win rate based on game time, and win
rate by summoner spell. Finally, the paper will conclude with progressively more complex regression
methods applied to multiple Taric jungle datasets to create a functional predictive model for Taric jungle.
This section will also include multiple variables modeled against each other to illustrate relationships as
part of an effort to construct such a model.
As stated, we begin with an overview of lightrocket2’s Taric Jungle in Season 11.

Performance Review
In this paper I aim to propose multiple models of game prediction which I believe may
appropriately predict the results of lightrocket2’s games based on qualitative and quantitative factors. The
models will be presented based on linear, logistic, tree, and vector-based analysis. At the end of that
section, models will be compared and evaluated against each other based on model accuracy, complexity
vs simplicity, and interpretability. To create such models, we must first review lightrocket2’s games in
Season 11 after which we will create simple linear models and examine relationships between variables.
Let us begin with the statistics.

Overview Part I
Below we provide an overview of lightrocket2’s Taric games in each role by total games played,
with win rate listed in the fifth column. Games analyzed were Ranked Solo-Duo in Season 11.
Data was rounded to two decimal places.
Table A-1

ROLE Wins Losses Total Games Winrate Net LP


Jungle 963 939 1902 50.63% 384
Support 58 59 117 49.57% -16
ADC 2 3 5 40.00% -16
Top 0 1 1 0.00% -16

The coloring scheme corresponds to win rates. Darker shades of blue indicate a win rate increasingly > 50%
whereas deeper shades of orange indicate a win rate further and further below 50%. As seen in Table A-1, a win
rate of 50.63% corresponds to a light blue shading and a win rate of 49.57%, a pale orange.

For net LP change, green indicates an increase in LP and red indicates a decrease. Net LP Change is estimated by
assuming a +16/-16 for every ranked game played (stable MMR).
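The Net LP estimate described above reduces to a one-line calculation. A minimal sketch, assuming the stated +16/-16 per game; the helper name is my own.

```python
def net_lp(wins, losses, lp_per_game=16):
    """Estimated net LP under a stable MMR: +16 per win, -16 per loss."""
    return (wins - losses) * lp_per_game
```

Applied to Table A-1's jungle row (963 wins, 939 losses), this reproduces the listed 384 net LP.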

As observed in Table A-1, lightrocket2 has played the vast majority (93.89%) of his Taric games
in jungle. In this role, he has a 50.63% win rate. In 5.81% of his games, he has played support with a
49.57% win rate and the remaining six games were divided between Taric ADC and Taric top, losing the
majority of those games. Lightrocket2’s highest win rate is with Taric jungle as he does slightly worse
with Taric support and does not significantly play Taric ADC or top. While his highest win rate is in
jungle, it is only slightly above a coin toss. Is it viable? We refer to two factors introduced at the
beginning of this paper: SGS and competitive win rates.
According to u.gg, in Master tier, the lowest win rate jungler is Rumble with a 45.75% win rate.
In comparable elos of Diamond and Grandmaster, the lowest win rate junglers hover around a similar
mark and never drop below 45%. As such, 45% is set as our established benchmark for viability of a
jungling strategy with SGS considered. Lightrocket2’s Taric jungle win rate of 50.63% puts him firmly in
the middle tier of junglers at Master elo, with win rates competitive with Zac, Elise, and Fiddlesticks, all
recognized as meta junglers. As such, lightrocket2’s strategy is competitive regarding win rate in the elo
he is in.
Is it competitive in terms of sample size? Clearly, the answer is no due to a lack of Taric jungle
players. However, considerations must be made regarding the number of games played by a single player.
Lightrocket2 has played almost twice the number of Taric jungle games all season as Master Zac jungle
players have played in one patch. When one considers SGS and the vast decrease in variation with
samples above 100, it is unreasonable to speculate his win rate would significantly (> 5% swing) change

with further games. With a large sample size of Taric jungle games as a single player at a high level of
play, lightrocket2 has established Taric jungle as a viable strategy.
Now that the viability of Taric jungle at a high level of play has been established, we begin with a
discussion of aspects of the strategy. Lightrocket2 plays Taric jungle with a variety of item and rune
combinations, the purpose of which is to enhance the power of the pick in different ways depending on
the game. In the following we provide an overview of lightrocket2’s runes, items, rune and item
combinations, and so on for every game played this season.
We begin with the runes.
Table A-2

RUNE Wins Losses Total Games Played Winrate Rune Ranking


Glacial Augment 343 343 686 50.00% A-
Press the Attack 340 272 612 55.56% A+
Phase Rush 119 121 240 49.58% B+
Conqueror 83 98 181 45.86% B-
Unsealed Spellbook 41 53 94 43.62% N/A
Omnistone 41 48 89 46.07% N/A
Guardian 0 1 1 0% N/A
Hail of Blades 0 1 1 0% N/A

Rune Ranking is a tiered system ranking every rune from D through S+ based on win rate over ≥ 100 ranked games
played. Win rates for each ranking are shown below.

Table A-3

If win rate is… Ranking
60%+ S+
58-59.99% S
56-57.99% S-
54-55.99% A+
52-53.99% A
50-51.99% A-
48-49.99% B+
46-47.99% B
44-45.99% B-
42-43.99% C+
40-41.99% C
Below 40% D
The highest rune ranking is S+ and the lowest is D for greater than 60% or lower than 40% win rates respectively.
Rankings in between that range are divided into 1.99% ranges. Shades of orange indicate a win rate below 50% and
ranks between D and B+. Shades of blue indicate a win rate equal to or above 50% and ranks between A- and S+.

Table A-2 provides information for every rune lightrocket2 has used for Taric jungle in Season 11.
His most used rune is Glacial Augment with 686 games (36.07% of total) played with a 50.00% win rate.
His second most used rune is Press the Attack (PTA) with 612 games (32.17% of total) played with a
55.56% win rate. The remaining six runes all saw much lower play rates and lower win rates. Phase
Rush is on par with Glacial Augment in terms of its win rate but with less than half the games. Conqueror

sees even less use and has a 9.70% lower win rate than PTA, lightrocket2’s best rune. Omnistone and
Unsealed Spellbook are niche runes which lightrocket2 has used on occasion but not frequently enough to
judge their efficacy, as they are both below SGS. This caveat applies especially to Omnistone, whose win
rate is somewhat higher than Unsealed Spellbook’s on a similarly small sample.
As mentioned, lightrocket2’s best rune is PTA, as it has both a large number of games and a high
win rate. This inspires confidence that it is a good pick for Taric jungle compared to other options. It is not
hard to see why. PTA offers a large amount of bonus damage upon hitting an enemy with 3 basic attacks.
Due to Taric’s efficient use of his passive, he can proc PTA reliably. This may be particularly valuable in
early skirmishes which can set the tone for the remainder of the game. For example, first bloods, securing
the first kill in a game, are shown to lead to a win 59.2% of the time according to leagueofgraphs. A first
blood is worth 400 gold, a large lead early in the game, which can turn into a larger lead and a win.
On the other hand, many of the other runes lack PTA’s early combat power. Glacial Augment is a
utility rune that slows enemies for a short period. Phase Rush is a mobility rune that offers a large burst of
movement speed and slow resistance. Omnistone is a utility rune that offers a random choice of rune
depending on the situation, and Guardian is a pure defense rune that excels at keeping allies alive. Indeed,
the only runes that offer comparable combat power to PTA are Conqueror and Hail of Blades, the merits of
which we aim to explore below.
Conqueror is a damage stacking and healing rune which grants increasing attack damage (or ability
power) based on how many times it has been stacked. In the early stages of the game, Conqueror provides
1.2 attack damage per stack, stacking up to 12 times for up to 14.4 bonus attack damage at level 1. Considering
that Taric enjoys extended fights, why is lightrocket2’s Conqueror win rate so low and his PTA win rate so
high? On the merits of the rune alone, it requires 6 auto attacks for Taric to fully stack Conqueror and
gain the rune’s maximum damage and healing, versus 3 for PTA. Taric has an attack range of 150, is
immobile, and has no long-range abilities. As a result, it is not difficult for many champions to kite Taric,
making Conqueror hard to stack, which can greatly undermine the value of the rune. Hail of Blades offers
110% bonus attack speed for the first three autos, which looks strong on paper though it cannot be evaluated
due to a lack of games.
Clearly, PTA is established as the best offensive choice for Taric jungle. However, it is important
to consider what factors may be influencing rune win rates aside from the runes themselves. This will be
explored later in this paper, but for the time being we will suggest several factors. The first and most
important factor is the circumstances in which PTA is being picked. Is it being picked into hard jungle
matchups or easier ones? What mythic item(s) was it picked with? Was it picked mostly in games where
lightrocket2’s team had no autofilled players, versus games where it did?
To clarify for the reader, autofill refers to an instance where a player has been ‘filled’ into a position
that they did not queue for by the League of Legends matchmaking system. For example, someone who
queued for the support role but was placed into mid may play worse than someone who was not
autofilled. This is a factor that can lead to losses. Some of these factors are difficult
to quantify and, as a result, will not be discussed in this paper. However, many other factors, such as mythic
items, will be quantified, and can be far more related to winning or losing than even a higher win rate rune
like PTA. Next, we discuss items, listed in Table A-4 on the following page.
Lightrocket2 has used, at varying points in Season 11, 15 mythic items with Taric jungle. His most
picked mythic, by far, is Shurelyias with 43.64% of his 1902 games using this item. Stridebreaker comes
in second at 21.66% of games with Divine Sunderer and Galeforce a distant third and fourth respectively.
All other items are picked at levels far below SGS, and their merit cannot be evaluated on the numbers in
Table A-4. Lightrocket2’s winningest item with SGS is Divine Sunderer with a 67.44% win rate in 172
games, a surplus of 60 wins. His second-best item of even larger game size is Stridebreaker with a 52.43%

Table A-4

ITEM Wins Losses Total Games Played Winrate Net LP %Total Games
Shurelyias 429 401 830 51.69% 448 43.64%
Stridebreaker 216 196 412 52.43% 320 21.66%
Divine Sunderer 116 56 172 67.44% 960 9.04%
Galeforce 60 35 95 63.16% 400 4.99%
Chemtank 28 30 58 48.28% -32 3.05%
Kraken Slayer 17 20 37 45.95% -48 1.95%
Sunfire 14 18 32 43.75% -64 1.68%
Moonstone Renewer 14 8 22 63.64% 96 1.16%
Goredrinker 9 8 17 52.94% 16 0.89%
Shieldbow 5 6 11 45.45% -16 0.58%
Trinity Force 0 1 1 0% -16 0.05%
Locket of the Iron Solari 0 2 2 0% -32 0.11%
Frostfire Gauntlet 1 1 2 50.00% 0 0.11%
Prowler's Claw 1 0 1 100% 16 0.05%
Everfrost 0 1 1 0% -16 0.05%

For reference, dark green denotes a win rate of 100% and grey a win rate of 0%. The reader may also notice that
the %Total Games does not add up to 100%. This is due to the exclusion of 190 games where a mythic item was not
purchased.

win rate and Shurelyias has 830 games with a slightly positive win rate of 51.69%. Galeforce has a very
high win rate of 63.16%, but due to its relatively low play rate, it is expected that this win rate will
decrease over time and cannot be taken as seriously.
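The intuition that Galeforce's 63.16% over 95 games is less trustworthy than Shurelyias' 51.69% over 830 can be made concrete with a confidence interval. The paper does not compute intervals; the Wilson score interval below is my own illustrative addition, applied to the two rows of Table A-4.

```python
from math import sqrt

def wilson_interval(wins, games, z=1.96):
    """Approximate 95% Wilson score interval for a true win rate."""
    p = wins / games
    denom = 1 + z**2 / games
    centre = (p + z**2 / (2 * games)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / games + z**2 / (4 * games**2))
    return centre - half, centre + half

galeforce = wilson_interval(60, 95)     # wide interval: roughly 53% to 72%
shurelyias = wilson_interval(429, 830)  # narrow interval: roughly 48% to 55%
```

The Galeforce interval is more than twice as wide, quantifying why its headline win rate should be trusted less than rates backed by SGS.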
We now arrive at an important question: why is the win rate for Divine Sunderer so high
compared to win rates for other frequently used mythic items, Shurelyias and Stridebreaker? Like PTA,
this is likely due to a combination of factors. Let us first consider the merits of the item as it pertains to
Taric. Divine Sunderer is a sheen item, meaning it provides bonus damage on an auto attack after an
ability use. This is particularly strong for a champion like Taric who can rotate his abilities on minimal
cooldowns with strong attack speed by abusing his passive. With Divine Sunderer, Taric gains a large
number of Sheen procs, dealing a large amount of damage to any target he hits and healing in
the process. Divine Sunderer also offers 400 health which, combined with healing, capitalizes on Taric’s
strong tank statistics in armor and magic resist.
However, Divine Sunderer was only picked in 172 out of 1902 games, and only in the latter half
of the season indicating that it was used selectively. As mentioned above, Divine Sunderer is best when
Taric can attack his opponent often, particularly against low-mobility melee champions. As such, the use
of Divine Sunderer may be matchup dependent, and its win rate may be inflated relative to what it would
be if it were picked nearly every game. Divine Sunderer is also frequently paired with
PTA, as we will see below, indicating another possible source of inflation in its win rate.
Unlike runes, win rates for items have considerably more spread. For mythic items with SGS, win
rates vary from 51.69% to 67.44%, a 15.75% spread versus a spread of 9.70% for runes. This indicates
that items may have higher impact on winning or losing than runes due to higher deviations from a coin
toss or a 50/50 chance of winning. This is a subject which will be investigated later in this paper including
possible relationships between multiple variables influencing win rates at large.
As we will see, combinations of variables can produce greatly different results; a key example is
the combination of mythic items and runes.

Table A-5 (shown on following page) illustrates wins, losses, total games played, win rate, and
net LP change for every Taric jungle build lightrocket2 has used in Season 11. A build in League of
Legends is generally defined as a primary rune with a mythic item, as these two components are the main
aspects of the build. Lightrocket2 has run 41 unique builds involving 15 different mythics and 8 runes. As
noted in Table A-4, lightrocket2’s most used mythic item was Shurelyias and unsurprisingly, his two
most played builds utilize Shurelyias. The first is Shurelyias + Glacial and the second is Shurelyias +
PTA, both having a similar play rate but significantly different win rates. Shurelyias + Glacial has a
49.35% win rate versus Shurelyias + PTA which has a win rate 4.77% higher at 54.12%. This is likely
due to the strength of PTA compared to Glacial since the builds do not differ in any other meaningful
aspect.
Stridebreaker + Glacial is the third most played build, used heavily when Stridebreaker offered a
dash which could create a powerful slow field when combined with Glacial Augment. The Stridebreaker
strategy focused on trapping enemies in a single area during team fights with a combination of slows and
stuns as opposed to Shurelyias whose purpose is a mobility enhancer for Taric’s team. This build enjoyed
a strong win rate of 57.35% and despite not being used since June 2021, when Stridebreaker’s dash was
removed, it still has the highest win rate of any build of SGS. Stridebreaker’s unique power with Glacial
Augment is demonstrated by the much lower win rate of Stridebreaker + Phase Rush at 45.14%, 12.21%
lower than its counterpart with a similar game size. This is likely due to Stridebreaker lacking any
powerful interaction with Phase Rush as is seen with Glacial Augment.
All other builds have too few games played to be of SGS. Of note is No Mythic + Any which
denotes games where no mythic was built combined with any rune. The win rate of this category is
extremely low at 23.96%. There are several reasons for this: 1) short games, 2) lopsided games, 3) lack of
gold income, and 4) lack of mythic power.
We will discuss each of these points in turn. In shorter games lightrocket2 is often unable
to complete his mythic, due to the expense of mythic items; such games usually imply a stomp, but may
also involve an away-from-keyboard (AFK) player. Further in this report we will examine samples of
lightrocket2’s games which indicate his win rate increases with game time, having up to a 65% chance of
winning a 44-minute game versus a 40% chance of winning a 10-minute game. As a result, it can be reasonably
assumed that shorter games are a factor in No Mythic’s 23.96% win rate. Second, lopsided games, where
one team is very far ahead and the other behind, tend to be forfeited either at 15 or 20 minutes. As a
result, if a game is shorter, it is more likely to be lopsided which may contribute both to lightrocket2’s
lower win rate in shorter games and, in turn, a lower win rate for No Mythic.
Lack of gold income may be due to the previous two factors but may also indicate a poor
performance on lightrocket2’s part irrespective of his team or the game time, possibly being so poor that
it results in his team losing. Mythic items were intended to be the most powerful single-item purchases for
champions in Season 11. Lacking such an item reduces lightrocket2’s ability to impact the game as he
otherwise might. This is particularly impactful in closer games, which tend to go on longer
and may be shifted one way or the other by any comparatively minor factor, such as having or not having
a completed mythic item. Lacking a mythic item may damage the chances of winning close games,
further lowering the win rate of the No Mythic category.
Finally, we will discuss Net LP Change. As mentioned under Table A-1, net LP change is estimated
by assuming +16/-16 for every ranked game. This is predicated on a stable matchmaking rating (MMR),
which lightrocket2 likely has as he has consistently maintained a similar rank (Master) for much of
Season 11. As shown in Table A-5 lightrocket2’s best build in terms of LP gained is Divine Sunderer +
PTA, with an estimated 576 LP gained. It has gained almost 100 more LP than the next most successful
build, Stridebreaker + Glacial despite being played slightly more than a third as often (82 games vs 204).

Table A-5

BUILD Wins Losses Total Games Played Winrate Net LP Change
Shurelyias + Glacial 191 196 387 49.35% -80
Shurelyias + PTA 184 156 340 54.12% 448
Stridebreaker + Glacial 117 87 204 57.35% 480
No Mythic + Any 46 146 192 23.96% -1600
Stridebreaker + Phase Rush 65 79 144 45.14% -224
Divine Sunderer + Conqueror 57 33 90 63.33% 384
Divine Sunderer + PTA 59 23 82 71.95% 576
Shurelyias + Omnistone 38 39 77 49.35% -16
Stridebreaker + Spellbook 32 28 60 53.33% 64
Galeforce + Phase Rush 39 16 55 70.91% 368
Chemtank + Glacial 28 29 57 49.12% -16
Kraken Slayer + PTA 13 13 26 50.00% 0
Galeforce + Glacial 10 11 21 47.62% -16
Moonstone + PTA 13 5 18 72.22% 128
Galeforce + PTA 10 6 16 62.50% 64
Sunfire + Spellbook 5 9 14 35.71% -64
Sunfire + Conqueror 6 7 13 46.15% -16
Shurelyias + Spellbook 4 9 13 30.77% -80
Goredrinker + Conqueror 6 6 12 50.00% 0
Kraken Slayer + Conqueror 4 7 11 36.36% -48
Sunfire + Phase Rush 3 4 7 42.86% -16
Shieldbow + Conqueror 2 4 6 33.33% -32
Sunfire + PTA 3 3 6 50.00% 0
Goredrinker + PTA 3 2 5 60.00% 16
Shieldbow + PTA 3 2 5 60.00% 16
Moonstone + Omnistone 1 2 3 33.33% -16
Galeforce + Conqueror 0 2 2 0% -32
Stridebreaker + Conqueror 2 0 2 100% 32
Stridebreaker + PTA 1 1 2 50.00% 0
Shurelyias + Phase Rush 1 1 2 50% 0
Stridebreaker + Hail of Blades 0 1 1 0% -16
Trinity Force + Phase Rush 0 1 1 0% -16
Prowlers Claw + Conqueror 1 0 1 100% 16
Moonstone + Guardian 0 1 1 0% -16
Everfrost + Glacial 0 1 1 0% -16
Frostfire + Phase Rush 1 0 1 100% 16
Frostfire + PTA 0 1 1 0% -16
Chemtank + Omnistone 0 1 1 0% -16
Galeforce + Omnistone 1 0 1 100% 16
Locket + Omnistone 0 1 1 0% -16
Locket + PTA 0 1 1 0% -16

This is due to Divine Sunderer + PTA’s 71.95% win rate, indicating that although many of the lesser
played builds cannot be properly evaluated from a statistical perspective, they all impact LP gains and, in
turn, lightrocket2’s rank. The Divine Sunderer builds are the best example of this as, combined, they have
increased lightrocket2’s LP by an estimated 960, more than his two best builds of SGS combined.
Table A-5 is not as accurate as other tables due to a 19-game discrepancy, listing only 1883
games of the 1902 total. However, it provides unique, detailed insight into all 41 of lightrocket2’s builds
in Season 11 and is given here with that caveat in mind.

Overview: Part II
In Overview: Part I we began the paper with a very general overview of lightrocket2’s Taric
jungle statistics, namely his overall win rate, his win rate for each role, and his jungle win rates for mythic
items, runes, and builds. In this section we will continue aspects of the discussion in Overview: Part I but
in greater detail as mentioned in the abstract of this paper. The information in the previous section and
this section will be used to gain a better understanding of what aspects of lightrocket2’s strategy work and
which aspects do not. This will be used to make variables for a series of linear models at the end of the
paper to further investigate Taric jungle and, by extension, to create predictive models. We begin by
investigating the first of these variables: game time.
Table B-1

Game Duration Wins Losses Total Games Win rate


10 or less 7 3 10 70.00%
15-19:59 39 82 121 32.23%
20-24:59 121 160 281 43.06%
25-29:59 169 143 312 54.17%
30-34:59 103 93 196 52.55%
35-39:59 33 19 52 63.46%
40-44:59 11 3 14 78.57%

Game time is given in minutes. Game times between 10 and 45 minutes are divided into 5-minute intervals. This is
consistent with similar data from leagueofgraphs.com. There were no games longer than 44:59 in this sample.

Table B-1 lists wins, losses, total games played, and win rate for game times between 0 and 44:59
for a sample of 986 Taric jungle games lightrocket2 played in Season 11. Sample size of 986 was chosen
to be sufficiently large to draw meaningful conclusions from most game time intervals (meeting SGS)
while also capturing any large differences based on time intervals. When collecting this data, it was
expected that Taric jungle would lose shorter games and win longer ones. This is due to Taric being a
hard scaling champion who fully utilizes his abilities later on while being relatively weak in the early
stages of the game. Table B-1 reflects this expectation.
We will inspect time intervals between 15 and 35 minutes, as these intervals have SGS and can be
treated with statistical confidence. Lightrocket2’s worst performance was in games between 15 and 20
minutes with a 32.23% win rate in 121 games. His win rate improved to 43.06% for games between 20
and 25 minutes and greatly improved to 54.17% for games between 25 and 30 minutes in 312 games, his
most frequently played time interval. However, lightrocket2’s win rate declines slightly to 52.55% for
games between 30 and 35 minutes in 196 games, a 1.62% decrease. It is possible this is a statistical
anomaly and would not exist with an even larger sample. Regardless, the data demonstrates that
lightrocket2 consistently wins games 25 minutes or longer with a 316-258 record (55.05% win rate, 574
games) and generally loses games shorter than that with a 167-245 record (40.53% win rate, 412 games).

These win rates are particularly noteworthy. For game times where lightrocket2 consistently loses, he
loses by a bigger margin than the margin by which he wins in his winning intervals: -9.47% versus +5.05%
relative to 50%, which may be due to lopsided games and early forfeits. Forfeits are available in League of
Legends starting at 10 minutes but traditionally occur between 15 and 20 minutes. This can negatively skew
win rate, especially with a champion that does not stomp the early game, like Taric.
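The upward trend in Table B-1 can be summarized by a least-squares slope over the interval midpoints. This is an illustrative calculation on the table's own numbers, not one of the paper's models; the midpoints and helper function are my own.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Interval midpoints (minutes) and win rates (%) for Table B-1's rows from 15:00 onward
midpoints = [17.5, 22.5, 27.5, 32.5, 37.5, 42.5]
win_rates = [32.23, 43.06, 54.17, 52.55, 63.46, 78.57]
slope = ols_slope(midpoints, win_rates)  # about +1.7 win-rate points per minute
```

The positive slope restates the scaling argument numerically: each additional five minutes of game time is associated with roughly eight percentage points of win rate in this sample.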
Another area of interest is the relationship between game time and instances where lightrocket2
did not complete a mythic item. It is possible, as previously discussed, that shorter games reduce the
frequency with which lightrocket2 can complete a mythic item, but failing to purchase a mythic item
may also shorten games. The relationship between game time and mythic items will be further
explored in the next section.
Table B-2

Secondary Summoner Wins Losses Total Games Winrate Net LP Change


Flash 362 352 714 50.70% 112
Ghost 183 185 368 49.86% -32
Exhaust 41 51 92 44.57% -160
Ignite 1 0 1 100% 16

Table B-2 lists wins, losses, total games, win rate, and estimated LP change for every secondary
summoner spell lightrocket2 used in a sample of 1175 games. Secondary summoner refers to the
summoner spell lightrocket2 took aside from Smite, which is one of the two summoners used by every
jungler. In this sample, he ran 4 secondary summoners, with Flash being used most of the time (714
games) and Ghost a distant second (368 games). Exhaust was occasionally used as part of a build
involving Unsealed Spellbook, which lightrocket2 used in 92 games, or 4.84% of his total jungle games.
Unlike game times, there is very little difference between using one summoner or another for
both summoners of SGS, and even Exhaust’s win rate is not particularly low, being only 6.13% lower than
that of the best performing summoner, Flash. One way to quantify the predictive power of a particular variable is
to determine how different a factor, such as win rate, is from a coin toss – often described as ‘random
chance’ – or a 50/50 result. If the resulting win rate is significantly different from 50%, the variable likely
has significant predictive power unless it can be explained otherwise. What counts as significant
predictive power depends on the p-value, which will be discussed in detail further in this paper.
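The paper does not specify its p-value procedure; one standard choice, sketched below as my own assumption, is an exact two-sided binomial test against a 50% coin toss.

```python
from math import comb

def binom_p_two_sided(wins, games, p=0.5):
    """Exact two-sided binomial test: total probability of every outcome
    no more likely than the observed win count, under H0: win rate = p."""
    pmf = [comb(games, k) * p**k * (1 - p)**(games - k) for k in range(games + 1)]
    observed = pmf[wins]
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))

p_flash = binom_p_two_sided(362, 714)  # well above 0.05: no evidence Flash matters
p_pta = binom_p_two_sided(340, 612)    # below 0.05: 55.56% over 612 games is unlikely by chance
```

By this test, Flash's 50.70% is indistinguishable from a coin toss, consistent with Table B-3, while a deviation like PTA's over a similar sample would be flagged as significant.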
Table B-3

Secondary Summoner %Change vs 50% SGS Predictive Power


Flash 0.70% Yes None
Ghost -0.14% Yes None
Exhaust -5.43% No None
Ignite 50.00% No None

Table B-3 illustrates the predictive power of the Flash, Ghost, Exhaust, and Ignite summoners with
respect to Taric jungle win rate. Two are of SGS and none have any predictive power. Flash and Ghost
lack predictive power due to non-significant p-values, Exhaust due to a non-significant p-value and low
sample size, and Ignite due to sample size alone. Unlike game time, secondary summoners were not
impactful on lightrocket2’s Taric jungle win rate. Finally, we discuss lightrocket2’s performance based on
jungle matchup.

Table B-4

Champion Wins Losses Total Games Winrate Net LP


Graves 75 67 142 52.82% 128
Lee Sin 72 56 128 56.25% 256
Kha'Zix 50 49 99 50.51% 16
Hecarim 49 39 88 55.68% 160
Kayn 36 48 84 42.86% -192
Ekko 36 36 72 50.00% 0
Nidalee 35 33 68 51.47% 32
Nunu 32 32 64 50.00% 0
Diana 32 31 63 50.79% 16
Viego 30 33 63 47.62% -48
Kindred 29 29 58 50.00% 0
Xin Zhao 32 23 55 58.18% 144
Fiddlesticks 25 30 55 45.45% -80
Evelynn 25 29 54 46.30% -64
Rek'Sai 28 26 54 51.85% 32
Lillia 21 30 51 41.18% -144
Zac 19 23 42 45.24% -64
Shaco 16 22 38 42.11% -96
Jarvan IV 26 25 51 50.98% 16
Karthus 18 18 36 50.00% 0
Olaf 22 13 35 62.86% 144
Elise 13 22 35 37.14% -144
Ivern 14 21 35 40.00% -112
Volibear 19 10 29 65.52% 144
Udyr 18 10 28 64.29% 128
Rumble 14 14 28 50.00% 0
Master Yi 14 14 28 50.00% 0
Rengar 12 15 27 44.44% -48
Nocturne 9 14 23 39.13% -80
Talon 13 9 22 59.09% 64
Qiyana 10 12 22 45.45% -32
Taliyah 9 12 21 42.86% -48
Gragas 5 15 20 25.00% -160
Rammus 15 4 19 78.95% 176
Trundle 9 6 15 60.00% 48
Vi 9 7 16 56.25% 32
Shyvana 9 6 15 60.00% 48
Poppy 6 8 14 42.86% -32
Gwen 8 5 13 61.54% 48
Morgana 4 8 12 33.33% -64
Zed 6 5 11 54.55% 16

Table B-4 (cont.)

Champion Wins Losses Total Games Win rate Net LP


Dr Mundo 3 4 7 42.86% -16
Warwick 3 2 5 60.00% 16
Mordekaiser 1 3 4 25.00% -32
Amumu 2 2 4 50.00% 0
Pantheon 0 3 3 0.00% -48
Yone 3 0 3 100.00% 48
Sylas 1 1 2 50.00% 0
Wukong 2 0 2 100.00% 32
Jax 1 1 2 50.00% 0
Sion 1 1 2 50.00% 0
Garen 2 0 2 100.00% 32
Twitch 0 1 1 0.00% -16
Camille 1 0 1 100.00% 16
Yone 1 0 1 100.00% 16
Shen 1 0 1 100.00% 16
Darius 1 0 1 100.00% 16
Riven 1 0 1 100.00% 16

Lightrocket2 played against 60 different champions as Taric jungle in Season 11. His most played
matchup was against Graves and his least played matchups were against Twitch, Camille, Yone, Shen,
Darius, and Riven. Only two of his matchups were of SGS: Graves at 142 games and Lee Sin at 128
games. Of the two, lightrocket2 had his best win percentage vs Lee Sin at 56.25%. Due to the generally
smaller samples of matchup data, SGS for this dataset was 10 games or more for any given matchup.
With that criterion in mind, lightrocket2’s best matchup was Rammus with a 78.95% win rate in 19
games and his worst matchup was Gragas with a 25% win rate in 20 games. Lightrocket2’s worst
matchup in terms of LP lost was Kayn with 192 LP lost due to a relatively low win rate (42.86%) in many
games. His best matchup for the same metric was Lee Sin with 256 LP gained.
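The Net LP column in Table B-4 is consistent with a flat gain or loss of 16 LP per game (for Graves, (75 − 67) × 16 = 128). A minimal sketch of that bookkeeping, assuming the constant ±16 LP per game the table implies:

```python
def net_lp(wins: int, losses: int, lp_per_game: int = 16) -> int:
    """Net LP under a flat gain/loss per game, as Table B-4 implies."""
    return (wins - losses) * lp_per_game

# Spot-check against rows of Table B-4.
assert net_lp(75, 67) == 128   # Graves
assert net_lp(72, 56) == 256   # Lee Sin
assert net_lp(36, 48) == -192  # Kayn
```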
The majority of lightrocket2’s matchups with 10 games or more were favorable, with 19 being
winning matchups, 16 being losing matchups, and 6 being neutral matchups. Favorability, or lack thereof,
was divided into extremely, very, moderately, or slightly favorable/unfavorable and neutral based on the
matchup win rate. Matchups with win rates at 60% or above were denoted extremely favorable and those
between 57.5% and 60%, 52.5% and 57.5%, and 50.01% and 52.5% as very, moderately, and slightly
favorable respectively. Win rates below 40% were classified as extremely unfavorable, while those between
47.5% and 49.99%, 42.5% and 47.5%, and 40% and 42.5% were classified as slightly, moderately, and
very unfavorable respectively. Of lightrocket2’s winning matchups, 7 were extremely favorable, 4 were very
favorable, 3 were moderately favorable, and 5 were slightly favorable. Of his losing matchups, none were slightly
unfavorable, 9 were moderately unfavorable, 3 were very unfavorable, and 4 were extremely unfavorable. In other
words, 11 of lightrocket2’s 19 winning matchups had a win rate above 57.5% and only 7 of his losing
matchups had a win rate below 42.5%.
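The favorability bands above amount to a simple lookup rule. A sketch of that classification, with the exact handling of boundary values (e.g., whether 57.5% itself counts as "very" or "moderately" favorable) taken as an assumption:

```python
def classify_matchup(win_rate: float) -> str:
    """Bucket a matchup win rate (in percent) into the paper's favorability bands."""
    if win_rate >= 60.0:
        return "extremely favorable"
    if win_rate >= 57.5:
        return "very favorable"
    if win_rate >= 52.5:
        return "moderately favorable"
    if win_rate > 50.0:
        return "slightly favorable"
    if win_rate == 50.0:
        return "neutral"
    if win_rate >= 47.5:
        return "slightly unfavorable"
    if win_rate >= 42.5:
        return "moderately unfavorable"
    if win_rate >= 40.0:
        return "very unfavorable"
    return "extremely unfavorable"

assert classify_matchup(78.95) == "extremely favorable"    # Rammus
assert classify_matchup(25.00) == "extremely unfavorable"  # Gragas
assert classify_matchup(50.00) == "neutral"                # Ekko, Nunu
```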
Across all matchups lightrocket2 secured Rift Herald 70-80% of the time, yet rarely won duels more
than 40% of the time against any champion. His teamfight win rate was strong against almost every
jungler, frequently above 50% and even 60% or higher against easier matchups such as Rammus, Udyr,
and Lee Sin. Average deaths varied by matchup: as low as 3 per game in the easiest matchups, 5.7
against Kayn, and 6.1 against his hardest matchup, Gragas. Higher deaths in worse matchups make sense, and we
will see further in this report that deaths are correlated with many variables that predict increased losses,
as well as with losses themselves. Lightrocket2 performed about the same vs ranged champions as he did vs
melee. This is likely due to lightrocket2 having very strong teamfights on Taric jungle regardless of 1v1
fights.
We have concluded our overview of lightrocket2’s Taric jungle performance. In the following
section we will dig deeper into the data with mathematical models.

Relationships and Predictive Methods


In this section we will discuss multiple models to determine relationships between different
aspects of Taric jungle. First, we will start with linear regression, proceed to logistic regression, and finish
with several tree-based models. We begin by introducing the linear regression model.
Equation C-1
$y = \beta_0 + \beta_1 x + \epsilon$
Equation C-1 is the general form of the linear regression model. $\beta_0$ and $\beta_1$ are the
regression coefficients: the y-intercept and the slope respectively. $x$ is the independent variable and $y$
is the dependent variable, also termed the predictor and the response. $\epsilon$ is
the error. What the linear regression model does is fit a straight line of intercept $\beta_0$ and slope $\beta_1$ to a
scatterplot of data. In this case that is Taric jungle data, and the purpose of the regression line is to
identify relationships between, for example, assists and game time. Intuitively we would expect a very
strong relationship between assists and game time to exist as the longer the game goes, the more kills will
be occurring and by extension, assists to those kills. A plot of this relationship with a fitted regression line
and its equation is given on Figure C-1 on the following page.
In Figure C-1 we have Assists set as the response and Game Time set as the predictor, y and x,
respectively. As we can see, there is a general upward trend in Assists for increasing values of Game
Time. That is, the longer the game, the more assists lightrocket2 will have. We have fit a regression line
to this data by creating and then plotting a linear model between Assists and Game Time. How well does
the regression line fit the data? We have several ways to determine this, but it first requires a thorough
understanding of how the linear regression model works. The linear regression model in Equation C-1
returns a y output for every input $x_i$. In the linear model created for Figure C-1, the input values are
301 game times between 10 and 45 minutes. Since the model is a straight line, it cannot possibly capture
every point in the scatterplot given that the data, obviously, does not lie on a straight line. What the linear
model does instead is estimate the y-intercept $\beta_0$ and the slope $\beta_1$ such that the sum of squared errors is minimized.
The sum of squared errors, denoted SSE, is given in Equation C-2
Equation C-2

$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

where $y_i$ denotes the actual y value and $\hat{y}_i$ denotes the estimated y value from Equation C-1. For every
input $x_i$ the linear regression model estimates an output value $\hat{y}_i$ which lies on the line. There is a
significant discrepancy between the y-values of the regression line and the actual (or observed) y values
in Figure C-1, and we call these discrepancies residuals or errors. The summation of the squared errors is
given by $\Sigma$, which is the capital Greek letter sigma. In mathematics it means ‘the sum’. $\beta_0$ and $\beta_1$ are

Figure C-1

chosen to minimize the SSE, and the method of choosing is described by a formula for each
coefficient (given below). To obtain these formulas, we perform the following calculus:

$\frac{\partial SSE}{\partial \beta_0} = \frac{\partial}{\partial \beta_0} \sum (y_i - \hat{y}_i)^2 = \frac{\partial}{\partial \beta_0} \sum (y_i - \beta_0 - \beta_1 x_i)^2 = -2 \sum (y_i - \beta_0 - \beta_1 x_i) = 0$

To minimize the SSE, we evaluate the partial derivative with respect to each linear regression coefficient and set these
partial derivatives to 0. The derivative is the instantaneous rate of change on an infinitesimal scale.
Because the SSE is a convex function of the coefficients, the point where its rate of change is zero is its minimum. This
minimizes the sum of squared errors.

$-2 \sum (y_i - \beta_0 - \beta_1 x_i) = 0 \Rightarrow \sum (y_i - \beta_0 - \beta_1 x_i) = 0 \Rightarrow \sum y_i - n\beta_0 - \beta_1 \sum x_i = 0$

In Equation C-2 the errors are squared and summed over every observation; that is, $\Sigma$ sums n errors.
Breaking this up into its components gives the sum of n observations of $y_i$, the sum of n observations of $x_i$, and the
sum of n copies of $\beta_0$, which is $n\beta_0$. The sum of all observed y values divided by the total number of values, n, gives the mean y-value, $\bar{y}$. The same is true for the sum of all observed x values divided by n, which gives $\bar{x}$.

$n\beta_0 = \sum y_i - \beta_1 \sum x_i \Rightarrow \beta_0 = \frac{1}{n}\sum y_i - \beta_1 \frac{1}{n}\sum x_i$

Equation C-3
$\beta_0 = \bar{y} - \beta_1 \bar{x}$
The result at the far right is the formula that minimizes the SSE for the intercept coefficient, $\beta_0$. Note how it
depends on $\beta_1$; this means that we must first calculate the value of $\beta_1$ before calculating $\beta_0$. We do this below.

$\frac{\partial SSE}{\partial \beta_1} = \frac{\partial}{\partial \beta_1} \sum (y_i - \beta_0 - \beta_1 x_i)^2 = -2 \sum x_i (y_i - \beta_0 - \beta_1 x_i) = 0$

$\sum x_i (y_i - \beta_0 - \beta_1 x_i) = \sum x_i y_i - \beta_0 \sum x_i - \beta_1 \sum x_i^2 = 0$

We can substitute $\beta_0 = \frac{1}{n}\sum y_i - \beta_1 \frac{1}{n}\sum x_i$ for $\beta_0$ in $\sum x_i y_i - \beta_0 \sum x_i - \beta_1 \sum x_i^2 = 0$, obtaining the following

$\sum x_i y_i - \left(\frac{1}{n}\sum y_i - \beta_1 \frac{1}{n}\sum x_i\right)\sum x_i - \beta_1 \sum x_i^2 = 0 \;\Rightarrow\; \sum x_i y_i - \frac{1}{n}\sum x_i \sum y_i = \beta_1 \left(\sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2\right)$

Dividing the left-hand side by the expression multiplying $\beta_1$ on the right gives Equation C-4

Equation C-4
$\beta_1 = \frac{\sum x_i y_i - \frac{1}{n}\sum x_i \sum y_i}{\sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$
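The closed-form coefficients in Equations C-3 and C-4 can be sanity-checked numerically against a library least-squares fit. A sketch on synthetic data (not the paper's actual dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(10, 45, size=301)                  # e.g., game times in minutes
y = -8.0 + 0.75 * x + rng.normal(0, 5, size=301)   # noisy linear response

n = len(x)
# Equation C-4: least-squares slope.
b1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x) ** 2)
# Equation C-3: intercept from the slope and the two means.
b0 = y.mean() - b1 * x.mean()

# Compare with numpy's degree-1 least-squares polynomial fit.
slope, intercept = np.polyfit(x, y, 1)
assert abs(b1 - slope) < 1e-6 and abs(b0 - intercept) < 1e-6
```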

Equations C-3 and C-4 are the formulas used to calculate the regression coefficients for a linear model, such
as the one shown in Figure C-1. Because calculating these coefficients by hand is
exhausting, we use software to do the work for us. In that respect it may seem pointless to provide
Equations C-3 and C-4, but it is useful to show that $\beta_0$ and $\beta_1$ do not come from nowhere and that there is
mathematical reasoning behind them.
Below is the equation of the regression line in Figure C-1. This was obtained through RStudio (an
environment for the R statistical programming language).
Equation C-5
$\hat{y} = -7.99624 + 0.74416x$
We know that $\hat{y}$ is the estimated value of y. Suppose we enter the maximum value for Game
Time, 43.67 (or 43 minutes, 40 seconds) and see what value the linear regression model estimates for y,
the number of assists. $\hat{y}$ = -7.99624 + 0.74416(43.67) = 24.5012272, or approximately 25 assists
expected for this game. The actual number of assists, y, was 32, giving us an error of 32 – 24.5012272 =
7.4987728, or approximately 7.5. To calculate the SSE, this process is repeated for every observation:
the estimated value $\hat{y}_i$ is subtracted from the actual value $y_i$, giving us the error for every
observation. Each error is then squared. This is done because some errors may be
negative: for example, an estimated number of assists $\hat{y}$ = 20 versus an actual number of assists y = 14
will clearly lead to a negative result in $y - \hat{y}$. When summing the errors, negative error may cancel out
with positive error, which is undesirable since this can lead to strange situations where the summed error
may be zero despite large visual discrepancies between y and $\hat{y}$ as observed in Figure C-1. Squaring every error
makes every discrepancy positive and avoids this problem.
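The worked example above can be reproduced directly from Equation C-5:

```python
b0, b1 = -7.99624, 0.74416   # coefficients from Equation C-5
game_time = 43.67            # longest game in the sample, in minutes

y_hat = b0 + b1 * game_time  # predicted assists for this game time
error = 32 - y_hat           # the observed number of assists was 32

assert abs(y_hat - 24.5012272) < 1e-7
assert abs(error - 7.4987728) < 1e-7
```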
As mentioned above, summation of the squared errors is given by Σ which is the capital Greek
letter sigma. In mathematics it means ‘the sum’. Computationally this means adding every single
$(y_i - \hat{y}_i)^2$ term together, which, literally, gives the sum of squared errors (SSE). In our example involving
Assists and Game Time there are 301 such terms to add together, as the data was taken from a sample of
301 Taric jungle games. The goal of the linear regression model is to minimize the sum of squared errors,
and that is done by choosing $\beta_0$ and $\beta_1$ such that those errors are minimized. This should make intuitive
sense, as changing $\beta_0$ raises or lowers the line along the y-axis and changing $\beta_1$ changes the slope
of the line. In Figure C-1 we would obviously have a very poor fit if the y-intercept was 50, or if the
intercept remained unchanged from Equation C-5 but the slope was negative. Neither would lead to a
linear model that fit the data well. By minimizing the SSE, Equation C-5 provides a reasonably good fit to
the data in Figure C-1. Now we can begin looking deeper into the data for this linear model.
The SSE provides the total squared deviation of the predicted y value, $\hat{y}$, from the actual y value;
however, this does not tell us much on its own. The Residual Standard Error (RSE) is, roughly, the
typical amount of error between $\hat{y}$ and y. Its value is much easier to interpret than the SSE. In this
case it answers the question: on average, how far off do our predicted numbers of assists tend to be from
the actual numbers of assists for a given game time? It is given by the following equation
Equation C-6

$RSE = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$

The RSE is derived from the SSE. Notice that the expression under the square root has the SSE as its
numerator. The SSE is divided by the degrees of freedom (n − 2 for a model with two coefficients) and then
square rooted, converting a total of squared errors into a typical error per data point. The regression line in Equation C-5
is off by 4.942, or about 5 assists, on average. Lightrocket2’s average number of assists for this sample is
11.33555, so clearly our regression has a significant amount of error. To properly quantify whether it is a
poor predictor of assists given game time we use the coefficient of determination.
The coefficient of determination, known as $R^2$, is a statistical tool used to determine the strength
of the relationship between the predictor and response variables. It is given by
Equation C-7

$R^2 = 1 - \frac{SSE}{SST}$

That is, $R^2$ equals 1 minus the ratio of the sum of squared errors and the sum of squares total. The
sum of squares total is given by
Equation C-8

$SST = \sum (y_i - \bar{y})^2$

which is the sum of the squared differences between the actual y values and the mean y value, $\bar{y}$.
This gives the total variability of y. That is, how much does y differ from its mean value overall? We call that
quantity the SST. The ratio of SSE and SST is the squared error between y and $\hat{y}$
divided by the squared error between y and $\bar{y}$. We call this the proportion of variability not explained by
the linear model. If no variability is explained, then SSE = SST and the linear model does no better than $\bar{y}$. If all
variability is explained, then SSE = 0, leading to an $R^2$ value of 1. In other words, the better the linear
model is compared to the mean model (y = $\bar{y}$), the lower the SSE and the better the value for $R^2$. Higher
values of $R^2$ mean that the linear model fits the data well, and vice versa. The linear model (Equation C-5)
does not fit the data in Figure C-1 particularly well, having an $R^2$ value of 0.439. This indicates a
mediocre fit with significant unexplained variability.
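SSE, SST, RSE, and R² are all computed from the same residuals. A sketch on synthetic data (not the paper's dataset), assuming the RSE uses n − 2 degrees of freedom as R reports it:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(10, 45, size=301)
y = -8.0 + 0.75 * x + rng.normal(0, 5, size=301)

b1, b0 = np.polyfit(x, y, 1)      # slope, then intercept
y_hat = b0 + b1 * x

sse = np.sum((y - y_hat) ** 2)        # Equation C-2: sum of squared errors
sst = np.sum((y - y.mean()) ** 2)     # Equation C-8: sum of squares total
r_squared = 1 - sse / sst             # Equation C-7
rse = np.sqrt(sse / (len(x) - 2))     # Equation C-6, with n - 2 degrees of freedom

assert 0 <= r_squared <= 1
assert rse > 0
```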
On the surface the relatively low $R^2$ value would seem to indicate that there isn’t much of a
relationship between Assists and Game Time, contradicting our assumption that such a relationship exists.
Visually this does not make much sense either. However, the linear model exists to predict the number of
assists lightrocket2 will have for a given game time. Thus the $R^2$ value simply means that the linear
model isn’t a very good fit to the data, not that game time is unrelated to the number of assists
lightrocket2 will obtain in each game. To measure this association directly, we use correlation.
Equation C-9

$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$

The correlation coefficient is given by r. Correlation measures the extent to which x and y move together
for predictor x and response y. If r = 0 there is no linear association between x and y, while r = +1 or -1
indicates a perfect linear association: every change in x is accompanied by a proportional change in y.
A strong correlation is typically +/- 0.7 or
greater, a moderate correlation is between +/- 0.3 and 0.7, and a weak correlation is anything below +/-
0.3. Thus, correlation is a useful tool for determining whether, in this instance, Assists and Game Time are
related to each other. Their correlation is r = 0.66, indicating a moderate-strong correlation. While the
linear model in Equation C-5 did not yield a good fit, Assists and Game Time are related to each other,
relatively strongly, as was expected.
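For a one-predictor regression, r and R² are tied together: R² is exactly r², which is why the paper's r = 0.66 squares to roughly the reported R² of 0.439. A sketch, assuming Equation C-9's definition:

```python
import numpy as np

def pearson_r(x, y):
    """Equation C-9: Pearson correlation coefficient."""
    xc, yc = x - x.mean(), y - y.mean()
    return np.sum(xc * yc) / np.sqrt(np.sum(xc**2) * np.sum(yc**2))

rng = np.random.default_rng(2)
x = rng.uniform(10, 45, size=301)
y = -8.0 + 0.75 * x + rng.normal(0, 5, size=301)

r = pearson_r(x, y)
assert abs(r - np.corrcoef(x, y)[0, 1]) < 1e-12   # matches numpy's definition

# In simple regression, the coefficient of determination equals r squared.
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
assert abs(r**2 - r2) < 1e-9
```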
Now that we have covered an example, we can begin to explore other relationships among the
variables in lightrocket2’s Taric jungle data. A sample of n = 301 (n is the variable typically used to
denote sample size) of lightrocket2’s Taric jungle games was collected for 22 variables. Some of these
variables are quantitative (numeric) and others are qualitative (categorical). The quantitative
variables are as follows: game time, number of completed items, kills, deaths, assists, creep score (CS)
per minute, vision score, tower advantage, dragon difference, and kill difference. The qualitative variables
are: conqueror, glacial, PTA, sunderer, shurelyias, no mythic, ghost, flash, melee, ranged, Kayn,
win or loss, and baron. On the following page is Figure C-2, a plot of all the quantitative variables against
each other.
Correlated variables increase or decrease with each other, and this is easily observed between any
two of the plotted variables in Figure C-2. For example, in Figure C-2 kills are clearly not correlated
with deaths, as decreasing or increasing the number of kills does not lead to any meaningful change in the
number of deaths. This can be confirmed by calculating the correlation between kills and deaths using
Equation C-9. In this case r, the correlation coefficient, is -0.08533498. Since the correlation coefficient
for kills versus deaths is very close to zero, there is essentially no relationship between them, confirming
our visual observations.
In this paper we are not very interested in uncorrelated variables outside of using them as an
example. Useful statistical analysis is best obtained by identifying variables which have correlations of at
least +/- 0.3 or higher, a moderate to strong correlation. This is important to identify potential
relationships among the data. In Figure C-2 there are several relationships that seem to be well correlated.
We identify these relationships, and others, in Table C-1 on the following page. In Table C-1 we notice
there are 19 significant (moderate-strong) correlations, 3 of which are strong correlations while the
remaining 16 are varying strengths of moderate correlations. We will discuss each of these correlations
beginning with the strongest.

The strongest correlation in Table C-1 is that between the tower advantage and kill difference of a
given team at r = 0.871482. This is a strong positive correlation meaning the more kills lightrocket2’s
team has than the opposing team, the more towers lightrocket2’s team will take compared to the opposing
team. Intuitively, this makes a great deal of sense as if the opposing team is consistently dead, then they
are not able to defend objectives, like towers, and will fall behind in both towers and kills. This leads to a
strong correlation between the two variables as seen in the table. The strong relationship seen here will be
further demonstrated later in this section with logistic modeling involving kill difference and tower
advantage separately.
The second strongest correlation, between items completed and game time, is r = 0.8483243. This
is straightforward, as all players gain more gold in longer games, leading to the completion of more items
irrespective of score or in-game advantages. The weakest of the strong correlations, r = 0.730193, is between
game time and vision score. Their relationship is relatively trivial: sight wards last longer the longer
the game continues, and since vision score is calculated from a combination of total wards placed, total
wards cleared, and total time of vision provided, it is natural that vision score should be strongly related to
game time. Next, we discuss moderate correlations beginning with the strongest.
The following variable pairs all have similar-strength correlations (roughly +/- 0.59 to 0.70): tower
advantage and deaths, CS per minute and deaths, kill difference and deaths, game time and assists, vision
score and assists, dragon difference and kill difference, tower advantage and dragon difference, items
completed and assists, and items completed and vision. We discuss each of these correlations in turn.
As mentioned, tower advantage is a variable which tracks the number of towers one team has
over the other for a given game. Deaths represents the number of times lightrocket2 has died in each
game. The correlation between tower advantage and the number of times lightrocket2 has died is
-0.64051 meaning there is a moderate to strong negative association here. This is likely due to
lightrocket2 dying more frequently in games that he is losing, and in games that he is losing, his team
tends to be behind in towers (shown later in this report). However, the reason lightrocket2 may be losing
is because he is dying more frequently, making it the cause not the consequence of suffering a tower
deficit and losing the game.
CS per minute and deaths also have a moderate to strong negative association. The more
lightrocket2 dies the less he will CS, and the less he dies the more he will CS. This association is
reasonably self-evident, since the more time lightrocket2 spends dead the less time he has to farm CS, and vice
versa. It is a slightly stronger negative correlation, r = -0.68755, than that between tower advantage and
deaths, r = -0.64051. Presumably this is because lightrocket2 plays a supportive, low-damage jungler who
does not need a particularly good kill/death/assist ratio (KDA) to win, so that association should be weaker.
Game time and assists have a moderate-strong positive correlation at r = 0.662553. This
correlation is also straightforward as longer games tend to involve more kills, which means lightrocket2
will obtain more assists to those kills based on the nature of Taric jungle. The association between vision
score and assists is more interesting. Vision score has a moderate-strong positive association with assists
with r = 0.591578. One reason for this could be that if lightrocket2 is obtaining more assists his team is
obtaining more kills and winning the game, meaning his vision score will be higher since he can ward
more freely. But if this were true, there would be a strong positive association between vision and
variables like kill difference or tower advantage, yet there isn’t. A more likely explanation is like what was
given above: more assists are associated with longer games and longer games are associated with more
vision; thus, more assists are associated with more vision.
Dragon difference refers to the number of dragons one team has compared to the other for a given
game. If lightrocket2’s team has more dragons the value of this variable is positive, zero if tied, and
negative if his team is behind. This variable has a moderate-strong positive association with kill

Figure C-2

Table C-1

Variable Deaths Assists KillDff CSPerMinute Vision DragonDff Kills TowerAdv GameTime
ItemsCompleted 0.094117 0.695167 0.283133 -0.01799594 0.667665 0.2213412 0.419659 0.2023129 0.8483243
GameTime 0.358623 0.662553 0.064707 -0.2267058 0.730193 0.0707009 0.211837 0.0020359
TowerAdv -0.64051 0.442434 0.871482 0.4633678 0.215488 0.6218571 0.265406
Kills -0.08533 0.301213 0.332517 0.07246628 0.220779 0.2525618
DragonDff -0.48849 0.317648 0.626097 0.4208185 0.251889
Vision 0.077987 0.591578 0.246147 0.02783116
CSPerMinute -0.68755 -0.07965 0.454451
KillDff -0.6479 0.54312
Assists -0.00397
Correlation table between every quantitative predictor for sample n = 301 of lightrocket2’s Taric jungle games.
Strong correlations (> +/- 0.7) are highlighted in dark orange. Weakest moderate correlations (approaching +/-
0.3) are highlighted in pale orange with darker shades used to indicate stronger correlations (approaching +/- 0.7).
Weak correlations are not highlighted. Entries without data are colored in grey.

difference. The reasoning is straightforward: if lightrocket2’s team has more kills they are likely
in control of their lanes and able to control objectives such as dragon, thereby gaining a dragon
advantage. Tower advantage and dragon difference also have a moderate-strong positive correlation r =
0.6218571 and for similar reasons.
Finally, we discuss the two moderate-strong positive correlations that the number of completed
items shares with assists and with vision respectively. These are both trivial associations and likely exist for
the same reason. Assists have been shown to be positively correlated with game time, and items completed
are also positively correlated with game time; thus assists are positively correlated with items completed.
Likewise, vision and items completed each have strong positive correlations with game time and thus have a moderate-
strong correlation with each other. On the following page we plot the 12 strongest correlations (all 9
moderate-strong and the 3 strong correlations) with regression fits and $R^2$ values for each. This is a less
jumbled view of the data than provided in Figure C-2 and lets us view important information more
clearly.
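A pairwise correlation table like Table C-1 can be produced in one call. A sketch with numpy on placeholder columns (the variable names follow the paper; the data here is random):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 301
game_time = rng.uniform(10, 45, n)
assists = 0.7 * game_time + rng.normal(0, 5, n)   # built to correlate with game time
deaths = rng.poisson(5, n).astype(float)          # unrelated noise

names = ["GameTime", "Assists", "Deaths"]
data = np.vstack([game_time, assists, deaths])
corr = np.corrcoef(data)  # pairwise Pearson correlations, as in Table C-1

# Flag moderate-to-strong pairs (|r| >= 0.3), the paper's cutoff for discussion.
flagged = [(names[i], names[j])
           for i in range(len(names)) for j in range(i + 1, len(names))
           if abs(corr[i, j]) >= 0.3]
assert ("GameTime", "Assists") in flagged
```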
In Table C-1 we noted 3 strong correlations; that is, we noted 3 correlations with absolute values above
0.7. If any two variables have an (approximately) linear relationship with each other and are strongly
correlated, we say that they are collinear. Similarly, if three or more variables have a linear relationship
and are strongly correlated, we say they are multicollinear. The 3 strong correlations are between items
completed and game time, game time and vision, and tower advantage and kill difference. The $R^2$ values
in Figure C-3 allow us to determine whether they have linear relationships with each other.
Items completed and game time are strongly correlated and have an $R^2$ = 0.7197, meaning that
approximately 72% of the variability in the data is explained by a linear model. Therefore a linear model
is a good fit to the data, and items completed and game time can be considered collinear. Game time and
vision have a strong positive correlation; however, their $R^2$ = 0.5332, indicating that a linear model is a
mediocre fit to the data. It is not likely that game time and vision are collinear, due to this relatively low $R^2$
value. Tower advantage and kill difference also have a strong positive correlation. The $R^2$ value for their
fitted linear model is high at $R^2$ = 0.7595, indicating likely collinearity between tower advantage and kill
difference.
It is very unlikely that any of the 9 moderate-strongly correlated variable pairs are collinear. This
is because none of them has a high $R^2$ value; all are below 0.5. That is, while there is a
relationship between, for example, dragon difference and kill difference, it is not well explained by a linear
model. All other variable pairs in Table C-1 have moderate, moderate-low, or low correlations, and none
are likely to have high $R^2$ values as those variable pairs are not well correlated, if at all. Table C-2
is a supplement to Figure C-3 and provides detailed information for every quantitative variable, including
all variables plotted in Figure C-3.
We have confidence that a linear relationship exists between items completed and game time and
between tower advantage and kill difference due to their strong $R^2$ values. We can investigate this further
by examining the diagnostic plots for each linear model, as seen in Figure C-4.
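The collinearity screen described above can be automated. For a two-variable fit, R² equals r², so the check reduces to thresholds on r; the 0.7 cutoffs below are assumptions chosen to match the paper's judgments, and the r values are taken from Table C-1:

```python
# Pairwise correlations from Table C-1 for the three strong pairs.
pairs = {
    ("ItemsCompleted", "GameTime"): 0.8483243,
    ("GameTime", "Vision"): 0.730193,
    ("TowerAdv", "KillDff"): 0.871482,
}

def collinear(r: float, r2_cutoff: float = 0.7) -> bool:
    """For a two-variable linear fit, R^2 equals r^2, so a strongly
    correlated pair is collinear when r^2 also clears the fit-quality
    cutoff. The 0.7 cutoff is an assumption, not from the paper."""
    return abs(r) > 0.7 and r * r >= r2_cutoff

flags = {pair: collinear(r) for pair, r in pairs.items()}
assert flags[("ItemsCompleted", "GameTime")]     # R^2 ~ 0.72: collinear
assert flags[("TowerAdv", "KillDff")]            # R^2 ~ 0.76: collinear
assert not flags[("GameTime", "Vision")]         # R^2 ~ 0.53: not collinear
```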
Figure C-4 provides the diagnostic plots for the linear models between items completed and game
time (plot 1) and tower advantage and kill difference (plot 2). In this part of the report, we will clarify
what each graph means and its ramifications for linearity. Specifically, we will discuss four requirements
for a linear model: that the residuals can themselves be fit by a straight line, that the data are normally
distributed, that the variance of the errors is constant, and that there are no high-leverage points
or outliers. If a model meets these requirements, in addition to having a high $R^2$ value, it is a proper
linear model.
The Residuals vs Fitted graph in plot 1 is a graph of the residuals, also called errors, plotted
against the fitted values. The fitted values are the estimated y values, 𝑦 , for every input value x. In other
words, this graph is the error for every number of completed items predicted by the linear model for a

Figure C-3

Plots of all 12 moderate-strong and strongly correlated variable pairs for lightrocket2’s Taric jungle games, sample
size n = 301. Red lines indicate the linear model fit for each plot. $R^2$ denotes the coefficient of determination for
each.

given game time (x-value). The red line is a smooth curve fitted through the residuals. One of the
assumptions for linearity is that the actual y values, y, can be fit with a linear model, also called a straight
line. If y can be fit with a straight line, then the residuals should show no systematic pattern, and that is the case here.
The normal Q-Q plot is the normal quantile-quantile plot. It plots the quantiles of the standardized
residuals against the quantiles of a theoretical normal distribution; if the residuals are normally distributed,
following a bell curve, the points fall along an approximately straight line, as seen
in both plot 1 and plot 2 in Figure C-4. If not, then the plotted variables, though they may be related, will
not satisfy the normality assumption of the linear model.
For the final two graphs, the Scale-Location and Residuals vs Leverage, we will introduce the
following equations:

Equation C-10

$\frac{\sigma}{\sqrt{n}}$

Equation C-10 is the standard error, used to standardize residuals. It is given by the sample
standard deviation divided by the square root of the sample size, n. For clarity, standard deviation of error is
Table C-2

Variable Min 1st Qrt Median Mean 3rd Qrt Max


GameTime 10.18 22.32 25.63 25.98 29.68 43.67
ItemsCompleted 0 3 3 3.326 4 6
Kills 0 1 2 2.329 3 15
Deaths 0 3 5 4.691 7 13
Assists 0 6 11 11.34 16 32
CSPerMinute 3.1 5.1 5.6 5.621 6.1 8
Vision 11 23 29 29.57 35 68
TowerAdv -11 -6 1 0.1927 6 11
DragonDff -4 -1 0 0.402 2 5
KillDff -30 -16 1 -1.548 11 30

Table providing the minimum, first quartile (1st Qrt), median, mean, third quartile (3rd Qrt), and maximum values
for all 10 quantitative variables for lightrocket2’s Taric jungle games. Sample size
n = 301.

a measure of the dispersion of the error. The standardized residual is the number of standard deviations a
given residual lies from the mean residual, 0. The standard error can be thought of as the standard deviation for a sample, in
this case of n = 301, rather than a population.
One of the main assumptions of linear regression is that errors have, roughly, equal variance for
any ‘fitted value’, $\hat{y}_i$. Variance is the square of the standard deviation. For sample data like
our n = 301 sample of lightrocket2’s Taric jungle games, it is estimated from the
standard error. The Scale-Location plot measures the variance of the error. If it is constant, then the red
line illustrated in Figure C-4 will be roughly straight, and otherwise it will not be. The curve in both Scale-
Location plots is, roughly, straight. Therefore, the constant variance assumption of linearity is satisfied for
both linear models.
Equation C-11

$h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}$

Equation C-11 is the equation for leverage. A high-leverage point is an observation whose x value lies far
from the other x values in the data, though it is not necessarily an outlier. Leverage is listed on the x-axis of the Residuals vs
Leverage plot. The purpose of this plot is to identify any residuals, or error terms, that lie far away from
the mean error and have high leverage compared to other error terms.
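Equation C-11 can be computed directly, and a useful sanity check is that for a one-predictor model the leverages always sum to 2 (the number of coefficients). A sketch on synthetic game times:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(10, 45, size=301)   # e.g., game times in minutes

x_bar = x.mean()
# Equation C-11: leverage of each observation in a simple linear regression.
h = 1.0 / len(x) + (x - x_bar) ** 2 / np.sum((x - x_bar) ** 2)

assert np.all(h >= 1.0 / len(x))    # leverage is bounded below by 1/n
assert abs(h.sum() - 2.0) < 1e-9    # leverages sum to p = 2 coefficients
```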

Equation C-12

$D_i = \frac{\sum_{j=1}^{n} (\hat{y}_j - \hat{y}_{j(i)})^2}{p \, s^2}$

This equation is Cook’s Distance, where $\hat{y}_{j(i)}$ is the fitted value for observation j when the model is
refit with observation i removed, p is the number of coefficients, and $s^2$ is the mean squared error of the
model. It measures the influence of individual observations in linear regression analysis. Influence refers
to the overall impact an observation has on the fitted line. Does an observation pull the regression line
further up or down, and if so by how much? That is the sort of question influence answers. If influence
and leverage are both high for a given observation, then we have an outlier. Outliers may also exist in cases
of exceptionally high leverage but low influence, and vice versa. Cook’s distance is referenced in the
Residuals vs Leverage graph, and an observation will be flagged in red if it has a high Cook’s distance and
is an outlier. No observations are flagged in red in the Residuals vs Leverage chart of Plot 1 or Plot 2;
thus there are no outliers in either linear model.
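Cook's distance can be checked by brute force: refit the model with each observation deleted and compare the shifted fitted values against the leverage-based shortcut formula. A sketch on synthetic data, assuming the conventional denominator p·s² with s² = SSE/(n − p):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 60, 2
x = rng.uniform(10, 45, size=n)
y = -8.0 + 0.75 * x + rng.normal(0, 5, size=n)

X = np.column_stack([np.ones(n), x])           # design matrix with intercept
beta = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta
s2 = np.sum((y - y_hat) ** 2) / (n - p)        # mean squared error

# Deletion-based Cook's distance: refit with each observation removed.
D = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    D[i] = np.sum((y_hat - X @ beta_i) ** 2) / (p * s2)

# Leverage-based shortcut gives the same distances without refitting.
h = np.sum(X * (X @ np.linalg.inv(X.T @ X)), axis=1)
e = y - y_hat
D_short = e**2 * h / (p * s2 * (1 - h) ** 2)
assert np.allclose(D, D_short)
```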
We have satisfied all four criteria for linear models with high $R^2$ values. The relationship between
items completed and game time is linear, as is the relationship between tower advantage and kill
difference. We call any two variables that have a linear relationship and strong correlation collinear. Items
completed is collinear with game time, and tower advantage is collinear with kill difference. Further into
this section we will create a model used to predict wins and losses for any game that lightrocket2 plays. It
is very important that no variable used in that model depends on another variable. This is
because we expect any change in each predictor to change the value of the response, $\hat{y}$; but if a change
in one predictor, $x_1$, also causes a change in another predictor, $x_2$, then the relationship between each variable and the
response becomes entangled. It becomes hard to know the degree to which any single variable is
responsible for a change in the response, which worsens the quality of the model compared to one in which every variable is
an independent, meaningful predictor.
Regarding Taric jungle, if we wish to create a model that predicts a win or a loss for any given
solo queue game, we want a very high-quality model that does so as accurately as possible. In such a
model, we would not include collinear variables, since we cannot know what influence tower advantage
independently has due to its positive linear relationship with kill difference. Instead, we aim to develop a
model with no collinear variables, each of which independently increases or decreases the chance of winning. During
the research period of this project, I collected data on quantitative and qualitative variables, of which
winning or losing is the latter. Whether lightrocket2 won or lost a given game was quantified as a
binary, or two-valued, variable, where 1 corresponded to a win and 0 to a loss. Linear regression is not
well suited to modeling a binary response: a least-squares line that minimizes the SSE outputs values
along a continuum, which can fall below 0 or above 1 and are meaningless when the only valid y
outputs are 1 and 0.
Logistic regression is an extension of linear regression used, as it will be here, to predict
qualitative responses from quantitative predictors or a combination of quantitative and qualitative
predictors. We will explain the rationale for its use and why it works below. For our logistic models,
tower advantage and kill difference will not be included, nor will items completed and game time, due
to their respective collinearities.
The logistic regression model is given by Equation C-13.
Equation C-13

y = e^(β₀ + β₁x) / (1 + e^(β₀ + β₁x))

The format of Equation C-13 is an extension of the sigmoid function, given by y = 1 / (1 + e^(−x)),
which returns values between 0 and 1. Likewise, so does Equation C-13, as the only difference is the
expression e is raised to, with the output (y) being no different. For clarity, e is Euler’s number,
approximately 2.718. We provide some justification for Equation C-13 in the following lines.

Figure C-4
The goal of this paper is to create a model that determines, with high accuracy, the likelihood of
lightrocket2 winning a Taric jungle game based on multiple predictors. The way the model will perform
this is by calculating a probability of winning, equal to the value of y, which will be counted as either a
“win” or a “loss” based on a cutoff point. A good cutoff point is y = 0.5 with a probability of winning
greater than that indicating a win and a probability lower than that indicating a loss. One way this is
written is p(y = 1) which denotes probability (p) of a win, written numerically as y = 1 (with y = 0
indicating a loss). Since the probability of winning a game can never be below 0 nor above 1,
negative values and returns above 1 do not make sense.
Therefore, we require a function that will only return values between 0 and 1. An exponential
function, as seen in Equation C-13, is a natural choice, since a positive base such as e can never output
a negative value. Furthermore, its similarity to the sigmoid function means it cannot output a value greater
than 1, or a 100% chance of winning a game. For example, suppose x = 100; then the sigmoid function
above returns 1 / (1 + e^(−100)) ≈ 1. Such is also the case with the equation in C-13 for input x and
coefficients β₀ and β₁.
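The bounded behavior of Equation C-13, together with the 0.5 cutoff described above, can be demonstrated in a short Python sketch. The coefficients β₀ and β₁ here are illustrative placeholders, not fitted values from any model in this paper:

```python
import math

# Logistic model with hypothetical coefficients (not the paper's fitted values).
def win_probability(x, beta0=-4.0, beta1=0.8):
    z = beta0 + beta1 * x
    return math.exp(z) / (1 + math.exp(z))   # always strictly between 0 and 1

# Classify with the y = 0.5 cutoff: 1 = predicted win, 0 = predicted loss.
def predict(x):
    return 1 if win_probability(x) > 0.5 else 0

print(win_probability(3.0))   # low input value -> low win chance
print(win_probability(7.0))   # high input value -> high win chance
print(predict(7.0))
```

No matter what x is supplied, the output stays inside (0, 1), which is exactly the property that makes the logistic form suitable for modeling a probability.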
In Figure C-5 we have plotted logistic models between each of the 22 predictor variables x₁ … x₂₂
and WinLoss, the response variable denoting the probability of winning. The purpose of these graphs is to
determine significant associations, or the lack thereof, between the probability of winning and any single
predictor. We begin our discussion with the CS model. The CS model took all of lightrocket2’s creep
scores for a sample of n = 301 Taric jungle games, plotted in the logistic model in the upper
left of Figure C-5. The probability of winning or losing a game is labeled on the y-axis of the graph. As
we can see, lightrocket2 has a low chance of winning, often below 40%, for creep scores below 5 CS per
minute. This dramatically increases past 5 CS per minute, and he rarely loses with 7 or more CS per
minute, with at least an 80% chance of winning at that point. The 68% accuracy in the heading means the
CS model accurately predicts a win or a loss 68% of the time for the 301-game sample. Table C-3 shows
complete prediction data for wins and losses for all 22 models.
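The accuracy columns reported for each model (and collected in Table C-3) can be computed from predicted probabilities as follows. The probabilities and outcomes below are fabricated for illustration, not taken from the real sample:

```python
# Invented model outputs and results for eight example games.
probs  = [0.81, 0.62, 0.35, 0.55, 0.20, 0.91, 0.48, 0.70]  # predicted win probabilities
actual = [1,    1,    0,    0,    0,    1,    0,    1]      # 1 = win, 0 = loss

preds = [1 if p > 0.5 else 0 for p in probs]                # apply the 0.5 cutoff

wins   = [i for i, a in enumerate(actual) if a == 1]
losses = [i for i, a in enumerate(actual) if a == 0]

# Accuracy among actual wins, among actual losses, and overall.
win_acc   = sum(preds[i] == 1 for i in wins) / len(wins)
loss_acc  = sum(preds[i] == 0 for i in losses) / len(losses)
total_acc = sum(p == a for p, a in zip(preds, actual)) / len(actual)

print(win_acc, loss_acc, total_acc)
```

Note that win accuracy and loss accuracy are computed over different subsets of games, which is why a model can score 100% on one column and 0% on the other, as several models in Table C-3 do.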
The most accurate of these models in predicting Taric jungle wins were Tower and Kill, the
former referring to tower advantage and the latter to the kill advantage of one team over the other.
Both accurately predicted wins 96.93% of the time. The most accurate model for predicting losses was
tower advantage which correctly predicted this result 96.38% of the time. As expected, tower advantage
was also the most accurate model, making the correct win or loss prediction 96.68% of the time. The
worst model for predicting wins was Baron, referring to when lightrocket2’s team secures a Baron, only
correctly predicting this result 66.26% of the time. PTA, Shurelyias, Flash, and Sunderer never predicted
a loss and only predicted wins whether these items or runes were picked or not.
Also of interest are the probabilities of winning at the maximum and minimum values of each
predictor. For most of the variables, the likelihood of winning increased with larger values of, for instance,
CS, dragons, kill difference, kills, wards, and so on. However, for some variables the opposite was true.
For the maximum number of deaths lightrocket2 had, 13, his probability of winning was only 0.90%.
Regarding qualitative variables, lightrocket2 only had a 46.88% chance of winning when taking
Conqueror but a 55.01% chance of winning when he did not. Some variables, such as Conqueror, may not
be particularly good predictors but they demonstrate some relation to winning or losing with noticeably
lower probabilities of winning if they are taken versus when they are not. So why bother with these
variables at all?
Indeed, we may say that we are done, as we have found two suitable models that accurately
predict the result of lightrocket2’s Taric jungle matches approximately 96% of the time. However, the
Figure C-5

Table C-3
Model Wins Acc Loss Acc Total Acc Max Prob Min Prob
CS 74.23% 60.87% 68.11% 95.94% 5.16%
Baron 66.26% 86.23% 75.42% 85.04% 31.61%
Dragon 72.39% 77.53% 74.75% 98.39% 3.04%
Kill 96.93% 94.92% 96.01% 99.99% 0.00%
Time 90.80% 13.77% 55.48% 64.36% 44.75%
PTA 100% 0% 54.15% 56.90% 50.39%
Tower 96.93% 96.38% 96.68% 99.99% 0.02%
Item 89.57% 31.88% 63.12% 82.26% 18.05%
Vision 73.01% 47.83% 61.46% 88.96% 32.30%
NoMythic 93.87% 13.04% 56.81% 35.71% 56.04%
Death 81.60% 64.49% 73.75% 0.90% 95.20%
Kills 73.62% 52.90% 64.12% 98.95% 35.54%
Assists 71.78% 71.01% 71.43% 98.84% 11.64%
Shurelyias 100% 0% 54.15% 52.08% 57.80%
Conqueror 90.80% 12.32% 54.82% 46.88% 55.01%
Flash 100% 0% 54.15% 50.00% 55.71%
Ranged 73.62% 32.62% 54.82% 48.86% 56.34%
Melee 73.62% 32.62% 54.82% 56.33% 48.86%
Glacial 73.62% 32.62% 54.82% 50.00% 56.04%
Kayn 96.32% 5.80% 52.82% 42.86% 54.70%
Ghost 73.62% 31.88% 54.49% 56.07% 49.42%
Sunderer 100% 0% 54.15% 66.19% 50.43%

The columns in Table C-3 list all models between the 22 predictor variables and the response (probability of
winning). Of the n = 301 games the predictors were sampled from, there were 163 wins and 138 losses. Wins Acc is
the accuracy the model demonstrated in predicting the wins, and Loss Acc is the accuracy in predicting the losses.
Total Acc is the accuracy of predictions across all 301 games.

Max Prob and Min Prob denote the win probability at the maximum or minimum value of each variable. For
qualitative variables (e.g., Conqueror), Max Prob is the probability of winning if the variable input was 1, and Min
Prob if it was 0.

variables that make such predictions, kill and tower advantage, are collinear with each other, so it
cannot be determined how well each of these variables individually predicts game results. As a result, we
cannot use them in the logistic model, and they may be disregarded. Time and Item, referring to game time
and the number of completed items, are also collinear and may be removed as well. This leaves us with 18
predictor variables, none of which has overall accuracy above 74.75%. That level of accuracy in predicting
wins and losses is not bad, but we may be able to do better by combining predictive power.
In the final section of this paper, we will construct a model of all 18 variables, use statistical
methods to improve this model, assess its predictive power, and create a regression tree for ease of
understanding.
Equation C-14

y = e^(β₀ + β₁x₁ + β₂x₂ + ⋯ + β₁₇x₁₇) / (1 + e^(β₀ + β₁x₁ + β₂x₂ + ⋯ + β₁₇x₁₇))

Equation C-14 is a logistic model incorporating 17 of the 18 non-collinear Taric jungle predictors,
with Melee excluded due to an error. This equation was generated in RStudio from the following formula:
WinLoss ~ Baron + DragonDff + Kills + Deaths + Assists + Kayn + Glacial + Shurelyias + PTA +
Sunderer + Conqueror + Ranged + Flash + Ghost + Vision + NoMythic + CSPerMinute
Here WinLoss, the probability of lightrocket2 winning a Taric jungle game, is the response y
in Equation C-14. Baron denotes the first predictor x₁ with slope β₁ = 5.04622, DragonDff denotes the
second predictor, Kills the third, and so on until the last predictor x₁₇, CSPerMinute, with
slope β₁₇ = 0.79590. Using several statistical methods in RStudio, this model was found to accurately
predict wins or losses for all n = 301 Taric jungle games 95.02% of the time. It predicted wins correctly
95.71% of the time and losses 94.20% of the time. The data was then divided into 150 training
observations (observations used to estimate the coefficients and create the model) and 151 test
observations. Test observations mimic new games lightrocket2 might play, which were not used to create
the model they are being tested against. This is a way to artificially simulate how the model in
Equation C-14 would perform on new, random data. The model excelled here too, being 96.03% accurate
for test observations.
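The training/test procedure can be sketched in Python as follows. The dataset here is randomly generated, and the threshold "model" is a deliberately crude stand-in for a fitted logistic regression; only the split-and-evaluate mechanics mirror what was done above:

```python
import random

# Fabricated dataset of (cs_per_minute, win) pairs standing in for the
# real 301-game sample; labels are random, so accuracy is not meaningful here.
random.seed(11)
games = [(round(random.uniform(3, 9), 1), random.randint(0, 1)) for _ in range(301)]

random.shuffle(games)
train, test = games[:150], games[150:]      # 150 training games, 151 test games

# Trivial stand-in "model": predict a win when CS/min exceeds a threshold.
# A real logistic fit would estimate its coefficients from `train` only.
def predict(cs, threshold=6.0):
    return 1 if cs > threshold else 0

# Evaluate only on the held-out test games, simulating new, unseen matches.
test_acc = sum(predict(cs) == win for cs, win in test) / len(test)
print(round(test_acc, 4))
```

The key point is that the test games play no role in building the model, so test accuracy approximates how the model would perform on genuinely new games.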
On the surface, these results might lead us to believe that we have created a very strong model
that does a good job predicting the results of Taric jungle games. However, there are many problems with
this model, one of which is immediately obvious from Equation C-14. Many of the coefficients do not
make any sense. For example, Glacial has a slope of -15.37015 indicating that if Glacial Augment is ever
picked, it has an extremely powerful negative effect on lightrocket2’s chances of winning a game. This is
not true, not only because of Glacial Augment’s 50% season-long win rate but also because, in the n =
301 game sample used to create the model, it had only a slight association with winning or losing. It was
a negative association, but only slightly negative, as observed in Table C-3 and Figure C-5. A slope this
extreme is clearly nonsense. Similar issues arise with the slopes corresponding to the variables
PTA, Conqueror, Sunderer, and Vision. PTA and Conqueror both have large negative slopes despite PTA
having a positive association with winning and Conqueror only a modest negative association. Sunderer and
Vision have slight negative slopes (-2.11975 and -0.12633), which are also nonsense: higher vision score
clearly indicates a greater chance of winning, as does picking Sunderer, not the reverse.
What has occurred with Equation C-14 is the creation of a model that, while accurate, is
predicated on a significant number of variables which return nonsensical values and do not reflect their
individual associations with the data. We seek a better model that has coefficients which make sense
while also being relatively accurate in predicting the result of lightrocket2’s Taric jungle games. Such a
model is presented below:
Equation C-15

y = e^(β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + β₅x₅ + β₆x₆) / (1 + e^(β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + β₅x₅ + β₆x₆))

Here the variables Baron, DragonDff, Kills, Deaths, Assists, and Kayn correspond to
x₁, x₂, x₃, x₄, x₅, and x₆ respectively. This model is considerably better than the one presented in Equation
C-14, both for its comparative simplicity and because its coefficients are much more sensible.
Baron, DragonDff, Kills, and Assists were all shown to have positive associations with winning and all
have positive slopes in Equation C-15. Deaths and Kayn, on the other hand, had significant negative
associations with the chance of winning and both have negative slopes here.
This model is also a good predictor of lightrocket2’s wins and losses. It is 93.69% accurate for n
= 301 Taric jungle games, only 1.33% worse than the previous model. It is also very accurate for
predicting both wins and losses: 93.48% for losses and 93.87% for wins. Like before, we split the data
into an n = 150 training set and an n = 151 test set to estimate the model’s accuracy for new Taric jungle
games. As expected, it is 92.72% accurate for the 151 test observations. We expect a slight, in this case a
0.97%, decrease in accuracy between predictions for training and test observations. This is due to the
training observations being used to make the model so naturally, the model will predict them more
accurately than data it has not seen before (the test data). The test accuracy of this model is also slightly
worse than that of the first model, a 3.31% decrease in accuracy. However, the model is much
cleaner than the first and incorporates only important predictors, rather than a significant amount of ‘noise’
from insignificant predictors such as Flash, Ghost, and Glacial. We say the predictors are important for
two reasons. First, all six predictors had noticeable associations with winning or losing, with different
values of each variable predicting a win or a loss. Second, they encompass a
broad scope of factors impacting lightrocket2’s games: his individual performance (Kills, Deaths,
Assists), his team’s performance (Baron, DragonDff, Assists), and his matchups (Kayn). This is
preferable to say, a model focusing only on lightrocket2’s contributions, his team’s contributions, or on
pre-game factors (runes, matchups) alone. We consider this our best model for predicting wins and losses
for Taric jungle.
Below we have constructed a regression tree for the model in Equation C-15. The tree is an
imperfect representation of the model, due to its exclusion of the Kayn predictor. This leads to a lower
prediction rate of 86.75% for test data (5.97% decrease) compared to the logistic model. However, it
provides a simple, interpretable view of model decision making and what combinations of values lead to
wins or losses. It is best used for that purpose.
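The way such a tree turns threshold splits into win/loss predictions can be sketched as nested rules. The split variables and cutoffs below are invented for illustration and are not read off the actual fitted tree:

```python
# Hand-written sketch of tree-style prediction: a chain of threshold splits.
# The variables echo Equation C-15's predictors, but every cutoff here is
# hypothetical, chosen only to show how a tree makes a decision.
def tree_predict(kills_diff, deaths, baron, dragon_diff):
    if kills_diff >= 5:                # large kill lead: predict a win
        return "win"
    if deaths >= 8:                    # many deaths: predict a loss
        return "loss"
    if baron == 1:                     # a secured Baron swings close games
        return "win"
    return "win" if dragon_diff > 0 else "loss"

print(tree_predict(kills_diff=7, deaths=3, baron=0, dragon_diff=1))
print(tree_predict(kills_diff=0, deaths=9, baron=1, dragon_diff=2))
```

Each path from the first test to a returned label corresponds to one leaf of the tree, which is why a tree with a handful of splits yields a small, enumerable set of win/loss combinations.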
Figure C-6

Decision tree for Equation C-15 including 5 of 6 predictors. There are 15 different combinations for win/loss.
Conclusion
In this report we have provided an extensive overview of lightrocket2’s Taric jungle games for
the entirety of Season 11. We have discussed his season statistics for a variety of metrics and attempted to
explain the rationale for both his successes and his failures. This also involved using more specific
statistics which may have contributed to his seasonal numbers. In the second half of this paper we have
applied multiple statistical methods to a sample of 301 Taric jungle games, including linear and logistic
regression and a tree-based model. These models expanded on the analysis presented in Overview Parts I
and II and identified relationships between several variables including collinearity. We also presented two
logistic models to accurately predict the chance of lightrocket2 winning or losing on Taric jungle and
have done so successfully. While our efforts were not comprehensive and leave some information to be
desired, it is an attempt at detailed statistical analysis of the performance of a single League of Legends
player.
This report was the culmination of almost six months of data collection, research, and analysis of
lightrocket2’s games. While no formula can ever capture the entirety of lightrocket2’s skill or strategy, I
am proud of the work accomplished here and look to accomplish more in the future. I hope this report was
informative, if not instructive, in the power of statistical analysis in gaming and provides as many insights
to you as it did to me. Thank you for your support in completing this report, and to everyone in the
Discord whose encouragement helped make it possible. Best of luck in your future games.
Cited Works

“Lightrocket2 - Taric Jungler Performance - League of Legends.” LeagueOfGraphs.com,
https://www.leagueofgraphs.com/summoner/champions/taric/na/lightrocket2/jungle/soloqueue.

U.gg, https://u.gg/lol/profile/na1/lightrocket2/overview.

“Lightrocket2 - Summoner Stats - League of Legends.” OP.GG North America, OP.GG,
https://na.op.gg/summoner/userName=lightrocket2.

“Press the Attack.” League of Legends Wiki, FANDOM,
https://leagueoflegends.fandom.com/wiki/Press_the_Attack.

“Conqueror.” League of Legends Wiki, FANDOM,
https://leagueoflegends.fandom.com/wiki/Conqueror.
