
Forecasting Success in the National Hockey League Using Advanced Performance Metrics

Josh Weissbock 4521930 University of Ottawa CSI-5388 April 18, 2013

Abstract

In this project I collected a number of traditional and advanced statistics from the National Hockey League over a period of 10 weeks. I use this data to train a number of different classifiers to forecast success in an NHL game. In the first half I compare classifiers trained on traditional stats, advanced stats, and both. I then build on these results by expanding the advanced stats with variants of the PDO stat and training new classifiers. The best classifier is a neural network that predicts the winner 84.33% of the time, using 10-fold cross-validation and the PDO of each team over the last 3 games.

Contents

1 Introduction
2 Background
3 Data
3.1 Description
3.2 Preprocessing
4 Classification & Results
5 Conclusion and Future Work


Introduction

Performance metrics, or advanced statistics, are a new form of statistics used to measure the performance of hockey players and teams. Traditional stats have been used by teams for a few decades, but they do a poor job of predicting results or measuring team performance. Stats such as Plus/Minus, the summary of goals scored for and against while a player was on the ice, do not tell how, or even if, a specific player was involved. Despite these issues, traditional media still cite these numbers to back arguments, regardless of their content. Other traditional stats such as hits, blocked shots, giveaways and takeaways have less of a correlation with results than previously thought.

Performance metrics have become more popular over the last decade as they have developed. Although they are used mainly by bloggers and some hockey analysts, there is evidence that professional NHL teams are using performance metrics to help analyze their own gameplay [4]. There are many different types of performance metrics in hockey, such as Fenwick, a measure of possession; Corsi, an expanded version of Fenwick; PDO, a measure of luck; and many variants of these representing the different game states.

In this project I try to determine success in NHL games by forecasting the winner of a game between two teams. This can give a competitive advantage to team managers, scouts, hockey analysts and gamblers. I do this forecasting in two parts. In the first I look at classification of wins and losses using traditional stats, advanced stats and a combination of both. In the second I look at classification after modifying the different states of these advanced statistics, specifically PDO.


Background

To the best of my knowledge, there is no other research that uses machine learning methods to predict who is more likely to win a game between two NHL teams. Related research includes Gramacy et al., who used regularized logistic regression to estimate individual player contributions in the NHL [5]. The authors argue that traditional stats such as Plus/Minus do not tell the full story of the effect of individual players. Often the players who are strongest in possession are the least well known. The authors use regression with a Bayesian approach and a Laplace prior distribution, and demonstrate that some superstars do not have as large an effect on the ice as their high salaries suggest, while many low-paid players are the ones who drive possession.

Warner [8] tries predicting the margin of victory in NFL games by comparing machine learning methods to the Las Vegas line. Using data from over 2,000 games played over a decade, the author tries to predict the winners using logistic regression and achieves an accuracy of 64%, 2% higher than the Las Vegas line.

Charron [1] argues that traditional stats (hits, blocks, etc.) do not tell a very descriptive story of the success of teams. Many of these stats are counted by people employed by the home team in its home arena, and it is well demonstrated that these statistic trackers have a bias towards the home team. Despite this bias, Charron shows that Fenwick Close (0.623 and 0.218 for Home and Away games) has the highest r² correlation with points achieved in the standings. This is much higher than other stats including Wins, Goals For, Goal Differential, Giveaways, Takeaways, Hits and others.

Murphy [7] looks at the r² correlation between traditional stats and the 5-on-5 Goals For/Goals Against ratio (5-5 F/A). Using the data from all seasons from 2005 onwards, he shows that 5-5 F/A is the stat most correlated with wins (0.605) and points (0.655), compared to traditional stats such as Goals Against / Game (0.472, 0.510), Power Play % and Penalty Kill % (0.372, 0.390) and nine others.



Data

Description

I collected a number of traditional and advanced stats each day for every game taking place that evening. After the games were completed, additional stats on the results were collected and compiled. Data was collected every day during the period of this course, for a total of 386 games over approximately 10 weeks. A Python script ran daily to collect the data automatically, mining each stat from a number of different websites.

The traditional data collected before the games were: Goals For (GF), Goals Against (GA), Goal Differential (GlDiff), Power Play Success Rate (PP%), Penalty Kill Success Rate (PK%), Shot Percentage (Sh%), Save Percentage (Sv%), Winning Streak and Conference Standing. This data came from [...] and [...], which is updated daily. The advanced statistics collected were Fenwick Close %, PDO and 5-5 F/A. These stats are not published by the NHL; instead they are maintained by individual online hockey analysts such as [...]. The next day the post-game information was collected, including the winner and loser, each team's goals scored for and against, their shot percentage, their save percentage and their PDO (at all states).

Team     Location  FenClose%  GF   GA   GlDiff  PP%   PK%   Sh%  Sv%  PDO   Streak  Standing  5-5 F/A  Result
Toronto  Away      44.92      108  100  8       18.7  85    892  919  1027  2       6         1.05     Win
Ottawa   Home      49.85      89   72   17      29.8  89.4  929  939  1010  3       5         1.12     Loss

Table 1: Example of data for a game

The traditional stats were selected because they are well known and often cited in the media. The advanced stats were chosen for their high correlation with wins and points, as shown by Charron, Drance and Murphy. Fenwick Close is a stat representing possession: it is the number of shots and missed shots a team has taken, expressed as a percentage of the shots and missed shots taken by both teams [1]. "Close" refers to game states in which both teams are at even strength (5-on-5, 4-on-4, etc.) and either the score is tied in the third period or the teams are separated by no more than one goal in the first two periods [1]. Teams above 50% are said to be positive puck possession teams: the more often you have the puck, the more you shoot, and the more you score.

The advanced stats community in hockey often discusses luck. Luck can be described as results that fall outside the normal boundaries and variance of a player's performance [3]. A player cannot maintain a shot percentage multiple standard deviations above or below the average for long periods of time. A goalie will not stop every shot all season, nor will a goalie allow a goal on every shot. Luck, then, is when the results of a player's performance are better (good luck) or worse (bad luck) than the normal average and variance. Over the long term the luck of all teams will regress, but in the short term you can see which teams have been luckier than others. PDO is a stat of luck, and luck plays a role in hockey at the NHL level because of the near-even skill level between teams. Skill does determine a large percentage of the game, but luck is still involved in 38% of the results and standings [6].






PDO does not stand for anything; it is the sum of Sh% and Sv%. Teams playing with a PDO over 100% are exceeding expectations and are seen as lucky, while teams playing with a PDO below 100% are not meeting their level of skill and are seen as unlucky [2]. Over a regular season the PDO of every team regresses to close to 100%. Within 25 games PDO will be at 100% ± 2% [3], as can be seen in figure 1. This is useful because it shows who has been lucky and who has been unlucky in the short term. Over the long term the stat becomes less relevant as PDOs regress to the norm.

Figure 1: PDO Boundaries of Chance from [3]
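The two advanced metrics described in this section can be sketched in a few lines of Python. This is only an illustration: the function names and the counts are hypothetical, Fenwick % is assumed to be the team's share of all unblocked shot attempts by both teams (which is what makes 50% the break-even point), and the "Close" game-state filter is assumed to have been applied before the counts are passed in.

```python
def fenwick_pct(shots_for, missed_for, shots_against, missed_against):
    """Team's share of unblocked shot attempts (shots + misses), in percent."""
    attempts_for = shots_for + missed_for
    attempts_against = shots_against + missed_against
    return 100.0 * attempts_for / (attempts_for + attempts_against)


def pdo(goals_for, shots_for, goals_against, shots_against):
    """PDO as the sum of shooting percentage and save percentage, in percent."""
    sh_pct = 100.0 * goals_for / shots_for
    sv_pct = 100.0 * (1.0 - goals_against / shots_against)
    return sh_pct + sv_pct


# Hypothetical counts for one team (not taken from the project's data).
possession = fenwick_pct(shots_for=30, missed_for=10,
                         shots_against=25, missed_against=15)
luck = pdo(goals_for=3, shots_for=30, goals_against=2, shots_against=25)
print(f"Fenwick %: {possession:.1f}, PDO: {luck:.1f}")
```

In this made-up example the team takes exactly half of the unblocked attempts but has a PDO above 100, which under the interpretation above would mark it as having been lucky.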



Preprocessing

Preprocessing was carried out throughout the collection period. Each day the new data was checked to be valid and up to date and to fall within the normal variance. Data summarization was completed to see how the data was skewed. Data cleaning followed, to make sure there were no missing values and no outliers that might cause issues. Data integration and transformation were then completed. There was no data reduction, as there was only data for 386 games. Data discretization was the final step, placing the nominal and ordinal values into appropriate ARFF data for Weka. Python was then used to convert the data from a .csv file into the corresponding .arff file. The data was represented as the differential between the statistics of the two teams, with the winning team receiving the label Win and the losing team receiving the label Loss. An example of how the data from table 1 would be represented in the .arff file can be seen in table 2.

Location  FenClose%  GF   GA   GlDiff  PP%    PK%   Sh%  Sv%  PDO  Streak  Standing  5-5 F/A  Result
Away      -4.93      19   28   -9      -11.1  -4.4  -37  -20  17   -1      1         -0.07    Win
Home      4.93       -19  -28  9       11.1   4.4   37   20   -17  1       -1        0.07     Loss

Table 2: Differential for Weka of data in table 1

In the first part of the experiment, one .arff file was created with only the traditional stats, one with only the advanced stats, and one with both. In the second part, different variations of the PDO stat were used. As discussed previously, PDO regresses to the norm of 100% for all teams over the long term. Using a smaller sample of games, we can represent whether a team has been lucky or unlucky in the previous n games. For the second part I again used Python to create .arff files containing all stats, as in the first half, but with the PDO of each team over the last 3, 5, 10, and 25 games.
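The differential representation and the ARFF conversion can be sketched as follows. This is not the script used in the project: the dictionary keys, the function names and the relation name are hypothetical, and only the numbers are taken from the example game in table 1.

```python
# Pre-game stats for the example game; the keys are illustrative labels.
away = {"fenwick_close": 44.92, "gf": 108, "ga": 100, "gl_diff": 8,
        "pp": 18.7, "pk": 85, "sh": 892, "sv": 919, "pdo": 1027,
        "streak": 2, "standing": 6, "f_a_5v5": 1.05}
home = {"fenwick_close": 49.85, "gf": 89, "ga": 72, "gl_diff": 17,
        "pp": 29.8, "pk": 89.4, "sh": 929, "sv": 939, "pdo": 1010,
        "streak": 3, "standing": 5, "f_a_5v5": 1.12}


def differential(team, opponent, result):
    """Subtract the opponent's stats from the team's and attach the class label."""
    row = {name: round(team[name] - opponent[name], 2) for name in team}
    row["result"] = result
    return row


def to_arff(relation, rows):
    """Render rows as ARFF text: numeric attributes plus a nominal Win/Loss class."""
    numeric = [name for name in rows[0] if name != "result"]
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in numeric]
    lines += ["@attribute result {Win,Loss}", "", "@data"]
    for row in rows:
        values = [str(row[name]) for name in numeric] + [row["result"]]
        lines.append(",".join(values))
    return "\n".join(lines)


# The away team won the example game, so it receives the Win label.
rows = [differential(away, home, "Win"), differential(home, away, "Loss")]
arff_text = to_arff("nhl_games", rows)
print(arff_text)
```

Each game contributes two mirrored rows, so the two classes stay balanced by construction under this representation.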

Classification & Results

This problem becomes a binary classification problem: either Win or Loss. I used 10-fold cross-validation with a number of algorithms: neural networks (the Weka MultilayerPerceptron), as they are known to work well with noisy data; NaiveBayes, to look at future results based on the past; Support Vector Machines (SVM, using the Weka SMO algorithm), as they have worked well in previous experience; and J48, as it produces human-readable output. All algorithms were run in Weka with their default values, and all classifiers were compared to a ZeroR baseline.

In the first part of the experiment I used the classification algorithms to see what accuracy is possible with traditional stats, advanced stats and mixed stats. The PDO used in this part was the PDO of the season so far. The results can be seen in table 3.

          Traditional  Advanced  Mixed
Baseline  49.48%       49.48%    49.48%
SMO       60.10%       55.05%    60.10%
NB        59.07%       54.53%    57.25%
J48       58.03%       52.85%    58.29%
NN        58.68%       55.83%    58.03%

Table 3: Results of the first experiment.
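To make the evaluation protocol concrete, the following is a minimal standard-library sketch of k-fold cross-validation with a ZeroR-style baseline. It is not the Weka implementation used for the results above: the function names and the labels are hypothetical, and the folds are simple contiguous slices with no shuffling or stratification.

```python
from collections import Counter


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def zero_r_fit(labels):
    """ZeroR ignores every feature and predicts the most common training label."""
    return Counter(labels).most_common(1)[0][0]


def cross_validated_accuracy(labels, k=10):
    """Fraction of held-out instances that the ZeroR baseline labels correctly."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        majority = zero_r_fit([labels[i] for i in train_idx])
        correct += sum(1 for i in test_idx if labels[i] == majority)
    return correct / len(labels)


# Hypothetical, perfectly balanced labels (not the project's data).
labels = ["Win", "Loss"] * 10
accuracy = cross_validated_accuracy(labels, k=10)
print(f"ZeroR accuracy: {accuracy:.2%}")
```

Because the baseline only ever predicts one class, its accuracy on a balanced Win/Loss problem sits near 50%, which is the sense in which the other classifiers are compared against a coin flip.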

In the second part of the experiment I used both traditional and advanced statistics but modified the PDO to cover different numbers of games, to represent the recent luckiness of the teams. I used windows of 3, 5, 10, and 25 games and compared them to the PDO over the season so far, as used in table 3. This allows us to see which teams have been luckier in the short term. Results can be seen in table 4, with a further breakdown of the best classifiers in table 5.

          PDO3    PDO5    PDO10   PDO25   PDOall
Baseline  49.48%  49.48%  49.48%  49.48%  49.48%
SMO       81.22%  65.29%  60.10%  60.10%  60.10%
NB        68.13%  60.36%  58.55%  58.29%  57.25%
J48       77.46%  66.32%  66.71%  55.06%  58.29%
NN        84.33%  78.24%  63.47%  60.49%  58.03%

Table 4: Results of the second experiment.
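The windowed PDO variants can be illustrated with a short Python sketch. The text does not say exactly how the per-window value was derived, so this assumes that goals and shots are pooled over the last n games before Sh% and Sv% are computed; the function name and the game log are hypothetical.

```python
def rolling_pdo(games, n):
    """PDO over the most recent n games.

    Each game is a tuple (goals_for, shots_for, goals_against, shots_against),
    ordered oldest first. Goals and shots are pooled across the window.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of games")
    recent = games[-n:]
    goals_for = sum(game[0] for game in recent)
    shots_for = sum(game[1] for game in recent)
    goals_against = sum(game[2] for game in recent)
    shots_against = sum(game[3] for game in recent)
    sh_pct = 100.0 * goals_for / shots_for
    sv_pct = 100.0 * (1.0 - goals_against / shots_against)
    return sh_pct + sv_pct


# Hypothetical game log for one team that has been running hot lately.
games = [(1, 30, 4, 30), (2, 30, 3, 30), (5, 30, 1, 30),
         (4, 30, 2, 30), (6, 30, 1, 30)]
last3 = rolling_pdo(games, 3)
all_games = rolling_pdo(games, len(games))
print(f"PDO over last 3 games: {last3:.1f}, over all games: {all_games:.1f}")
```

In this made-up log the three-game value is well above the value over all games, which is the kind of short-term signal that a season-long PDO averages away.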

            Precision  Recall  F-Score  ROC Area
NN & PDO3   0.843      0.843   0.843    0.887
NN & PDO5   0.783      0.782   0.782    0.818
SMO & PDO3  0.812      0.812   0.812    0.812
J48 & PDO3  0.775      0.775   0.775    0.774

Table 5: Breakdown of results for the best classifiers.
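For reference, precision, recall and F-score can be computed from predicted and actual labels as in the sketch below. It treats Win as the positive class; how the two classes were aggregated for table 5 is not stated in the text, so this is an illustration of the definitions rather than a reproduction of those numbers. The function name and the labels are hypothetical.

```python
def precision_recall_f1(actual, predicted, positive="Win"):
    """Precision, recall and F1 for one class, computed from paired label lists."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical labels (not the project's data): one missed Win, one false Win.
actual = ["Win", "Win", "Win", "Win", "Loss", "Loss", "Loss", "Loss"]
predicted = ["Win", "Win", "Win", "Loss", "Win", "Loss", "Loss", "Loss"]
precision, recall, f1 = precision_recall_f1(actual, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```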

Conclusion and Future Work

In the first experiment I looked at predicting success in NHL games by comparing the results of a number of classifiers using traditional, advanced and mixed stats. The greatest accuracy came from the Weka SVM implementation, SMO, at 60.10%. This result was the same for both traditional stats and mixed stats and beat the baseline of 49.48% (essentially a coin flip). Using advanced stats alone, the accuracy decreased. My hypothesis is that the data covers too short a sample of a hockey season (in 2013 teams played only 48 games, compared to 82 in a regular season, due to a lockout). With a shortened season and a small data sample, the best teams have not all risen to the top of the standings and the worst teams have not all dropped.

In the second part of the experiment, replacing the season-long PDO with one computed over a much shorter time frame increased the accuracy by far more than my initial estimates. Using the PDO of the last three games each team has played and a neural network classifier, the accuracy improved to 84.33%. This is much better than the baseline and much better than using PDO for the entire season. In general, accuracy increases as the number of games the PDO is calculated on decreases, although J48 does not follow this trend at every step (table 4). The precision, recall, F-score and area under the ROC curve of the best classifiers in table 5 confirm that this is still the best classifier.

While I feel fairly comfortable with the success of this classifier, there are many items I would like to try in order to expand upon this work. As this was a shortened season, I would like to try again with the 2013-2014 season and use an entire season's worth of data. There are many other advanced stats that I did not collect in this project that I would like to add to see how they affect the results, such as Fenwick Tied, Fenwick +1, Fenwick +2, Fenwick -1, Fenwick -2, Score-Adjusted Fenwick, Corsi % (similar to Fenwick but includes blocked shots), the stats of the team based on the goaltender who is playing that game, injuries, weather at the arena, change in weather for the teams, change in altitude, recent trades, change in teams after the trade deadline, scoring chances and the odds that casinos are giving each team to win. These were not collected in this experiment because I did not know of their possible importance and, at the beginning of the project, I was unsure where to find them. In addition, as the PDO used in the second half of the project was the PDO generated at all states of the game, I would like to try PDOn, calculated only from shots and goals when the team is at even strength.

There are a few applications I would like to try this classifier on. The first is to see how successful it would be for gambling and whether it is likely to generate profits over the long term. Another is to use the classifier to predict playoff winners. In a best-of-7 series you are more likely to see the stats regress to the norm. In a one-off game luck takes up 38% of the game, so I would hypothesize that this classifier would be more accurate in the playoffs.

The biggest lesson learned from this project was the amount of work required to collect the data. The data had to be collected daily because historical values are not available: for many of these stats it is only possible to see their current values. Each day I had to check that the data had been collected, that the values were appropriate and that nothing was missing. The Python script that automated much of this helped a lot, but it still required daily checks, over 10 weeks, to validate that the data was correct.

While this project has been successful, there is still much work to be done to ensure that it was not a one-off experiment caused by luck in the data.

References

[1] Cam Charron. Breaking news: Puck-possession is important (and nobody told the CBC). breaking-news-puck-possession-is-important-and-nobody-told-the-cbc/, 2013. [Online; accessed 12-April-2013].
[2] Cam Charron. PDO explained. 01/21/pdo-explained/, 2013. [Online; accessed 12-April-2013].
[3] Patrick D. Studying luck other factors in PDO. 2013/1/10/studying-luck-other-factors-in-pdo, 2013. [Online; accessed 12-April-2013].
[4] Thomas Drance. Drance numbers: Which Canucks defender suppresses shots most effectively? drance-numbers-which-canucks-defender-suppresses-shots-most-effectively/, 2012. [Online; accessed 12-April-2013].
[5] Robert B. Gramacy, Matthew A. Taddy, and Shane T. Jensen. Estimating player contribution in hockey with regularized logistic regression. arXiv preprint arXiv:1209.5026, 2012.
[6] Hawerchuck. Luck in the NHL standings. http://www.arcticicehockey.com/2010/11/22/1826590/luck-in-the-nhl-standings, 2010. [Online; accessed 12-April-2013].
[7] Blake Murphy. Exploring marginal save percentage and if the Canucks should trade a goalie. exploring-marginal-save-percentage-and-if-the-canucks-should-trade-a, 2013. [Online; accessed 12-April-2013].
[8] Jim Warner. Predicting margin of victory in NFL games: Machine learning vs. the Las Vegas line. 2010.