A few friends have been working on an algorithm for predicting baseball game outcomes. Roughly, the model uses player-level projections to simulate baseball events, a process that requires substantial MLB and web-scraping knowledge.
Although the full operation is fascinating, this post will primarily focus on the evaluation of the predictions. The particular model in question has had a decent start to the summer. So how can we judge the accuracy of these picks? And what does that tell us about the feasibility of betting on sports?
While much of this post will seem straightforward, answering these questions gave me an increased appreciation for the variability in sporting outcomes with respect to gambling. I’ve posted the code here, in case anyone else is interested in using a similar process with their own projections.
First, some background. The data consists of 659 picks made versus the game’s opening money line since the start of the 2017 season. Each pick is based on a model-estimated probability for each team in each game, which is then compared to that team’s market probability. There have been about 950 MLB games thus far, which means that the model has taken a team in about 7 of every 10 contests. In the remaining games, the model’s probabilities are too close to the market’s prices to provide an edge; those games were dropped from the data.
The data also contain the observed differences between the model-estimated probability and the market-implied probability, relative investments (made assuming an equal balance prior to all games), the amount to be won or lost depending on the game’s result, the actual game results (win or lose), closing money line prices, and the difference in implied team probabilities between the opening and closing odds. Note that bets are made in “units” – this could be dollars, pistachio shells, or whatever your mind can imagine. Generally, larger units are placed on bigger edges; the average stake per pick is about 0.60 units. The maximum stake is capped at 1.0 unit, given the non-zero chance that the probabilities are off on account of lineup or pitching changes.
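The post doesn’t show how the market-implied probabilities are computed, but for American money lines the standard conversion looks like the sketch below (the function name is mine, and note that probabilities derived this way still include the vig, so a game’s two implied probabilities sum to slightly more than 1):

```python
def implied_prob(moneyline):
    """Convert an American money line to its implied win probability.

    A negative line (favorite) of -150 means risking 150 to win 100;
    a positive line (underdog) of +130 means risking 100 to win 130.
    """
    if moneyline < 0:
        return -moneyline / (-moneyline + 100)
    return 100 / (moneyline + 100)

# A -150 favorite implies 150/250 = 0.60; a +130 underdog implies 100/230, about 0.435.
# The model's "edge" on a pick is its own estimate minus this market number.
```

A pick would then be triggered whenever the model’s estimate exceeds this implied probability by enough to clear the threshold the author describes.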
Next, some summary statistics. While a nearly identical number of picks have backed the away team as have backed the home team (51% to 49%), nearly twice as many underdogs have been backed compared to favorites (64% to 36%). Altogether, the model is up about 27 units thus far, which roughly reflects a 7% return on investment. Game results have been most kind towards backing the home team (+24.5 units) compared to the visiting team (+2.5 units), with underdogs slightly more profitable than favorites (+19.9 to +7.1 units). While a deeper investigation could look into whether these differences are meaningful, that’s not a primary goal.
One thing I picked up quickly is how variable results can appear over short periods of time. Here’s the cumulative profit from day one of the season (shown in red). In the background are 200 simulated season-to-date profit paths, generated using the market-implied probabilities as the true probabilities for each team.
Within any given week (say, 75 picks), profits could vary by as much as 15 or so units. And at certain time points (say, between picks 100 and 210), all appears lost, with picks going into a deep dive. Even for me, as someone whose job entails having a decent understanding of randomness, it’s tempting to look for patterns in the red line, even though none likely exist.
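Background paths like the ones described above can be produced with a short Monte Carlo sketch (function and variable names are mine; the probability, stake, and payout vectors would come from the pick data):

```python
import random

def simulate_paths(win_probs, stakes, payouts, n_sims=200, seed=42):
    """Simulate cumulative-profit paths, treating the supplied win
    probabilities as the true chance of each pick cashing.

    stakes[i] is the amount lost on a miss; payouts[i] is the amount
    won on a hit. Returns n_sims lists of running profit totals.
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(n_sims):
        total, path = 0.0, []
        for p, stake, payout in zip(win_probs, stakes, payouts):
            total += payout if rng.random() < p else -stake
            path.append(total)
        paths.append(path)
    return paths
```

Plotting the realized profit line on top of these simulated paths gives a visual sense of how unusual (or not) the model’s run has been.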
Relative to random season outcomes simulated using the opening market probabilities, model picks currently stand in the 96th percentile. That is, only about 4% of sequences using random game outcomes would be doing this well if the opening market probabilities reflected the true probabilities. And note the center of the above sequences: roughly -10 units, which accounts for vig taken in by betting markets.
In addition to the chart above, I made a similar one (not shown) with one important difference; instead of market-implied prices as the truth, I used the model-generated probabilities. In expectation, this simulation will yield positive profits. But in what was a total shocker for me, it was still plausible – it happened about 5% of the time – for such a model to turn a negative profit through 650 picks. That is, even with known, better-than-market probabilities for each game outcome, it’s still feasible to lose money across 650 games. First thoughts that went through my mind:
-650 games is about three NFL seasons’ worth of contests. That is, an NFL bettor taking every game could have three straight losing seasons while still having better-than-market odds on each of his or her picks.
-Related: I could not be a professional gambler.
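That roughly 5% losing chance can be checked with the same style of simulation, this time feeding the model’s own probabilities in as the truth. A sketch under those assumptions (not the author’s actual code; the input vectors would again come from the pick data):

```python
import random

def chance_of_losing_run(true_probs, stakes, payouts, n_sims=10_000, seed=7):
    """Estimate how often a bettor whose probabilities are genuinely
    correct still finishes a run of picks with a net loss."""
    rng = random.Random(seed)
    losing = 0
    for _ in range(n_sims):
        profit = sum(pay if rng.random() < p else -stake
                     for p, stake, pay in zip(true_probs, stakes, payouts))
        losing += profit < 0
    return losing / n_sims
```

With 650 picks and edges the size described in the post, runs of this function should land in the neighborhood of the ~5% figure quoted above.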
I thought it would be interesting to take a look at which team the model has picked most often (both for and against). Here’s that plot. On the x-axis is the total investment made, either for (on the left) or against (on the right) each team, and the y-axis is the season-to-date profit.
This particular model continues to back the Padres and Mets at most opportunities, while picking against the Red Sox. Altogether, those picks have mostly broken even.
Meanwhile, the model has had some success taking the Rockies, White Sox, and Rays, while likewise performing well when fading the Indians, Giants, and Blue Jays. Picking the Phillies has not been so fruitful, nor has picking against the Diamondbacks.
Our final check looks at how the model has done relative to line movement. If the model can “predict” the direction in which prices will move in the moments leading up to the game, that would generally be a good sign. From what I’ve been told, closing market prices are generally more efficient than opening numbers.
Here’s a histogram showing line movement (on the probability scale). Positive changes reflect movement in the direction of the model’s chosen team.
Among the picks to date, about 1 in 20 opening lines precisely match closing lines. A tick under 58% of games have moved in the direction of the model’s team, while about 37% have moved against.
Across all contests, the average price has moved about 0.6% in the direction of the model’s chosen team. While this seems like a small number, across several hundred games, that type of advantage would seemingly add up.
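Summaries like the ones in the last two paragraphs reduce to a few lines once the moves are signed relative to the model’s pick. A sketch (function name and tolerance are mine):

```python
def movement_summary(moves, tol=1e-9):
    """Summarize closing-minus-opening probability changes, signed so
    that positive values mean the line moved toward the model's pick."""
    toward = sum(m > tol for m in moves)
    against = sum(m < -tol for m in moves)
    unchanged = len(moves) - toward - against
    avg_move = sum(moves) / len(moves)
    return {"toward": toward, "against": against,
            "unchanged": unchanged, "avg_move": avg_move}
```

Applied to the pick data, this would reproduce the ~58% toward / ~37% against / ~5% unchanged split and the average move of about 0.6% reported above.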
There’s also a decent link between the model’s projected edge for a team and the likelihood of movement in the direction of that team. The average game moved 0.25% among games with smaller-sized edges, 0.5% on games with medium-sized edges, and a full 1.0% on games with the largest edges (putting about 200 games in each of these categories).
Assorted final notes:
-Log-loss is a proper scoring rule for binary outcomes, but it’s less evident how log-loss can fairly evaluate this model, given that some picks are made with more of an edge than others (perhaps a weighted log-loss?). Additionally, there’s no immediate interpretability to log-loss. In any case, the average log-loss is -0.6845 for the market-implied probabilities and -0.6836 for the model-estimated probabilities (closer to 0 is better).
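The weighted variant floated in the note above could look something like this (the function name is mine; it returns the conventional positive loss, whereas the figures quoted in the post carry a negative sign, i.e. average log-likelihoods):

```python
import math

def avg_log_loss(probs, outcomes, weights=None):
    """Average log-loss of predicted win probabilities; optional
    weights (e.g. unit sizes) make bigger picks count for more."""
    if weights is None:
        weights = [1.0] * len(probs)
    total = sum(w * -math.log(p if won else 1.0 - p)
                for p, won, w in zip(probs, outcomes, weights))
    return total / sum(weights)
```

Weighting by unit size would reward the model most for being right on the games where it claimed the biggest edges, which seems closer to how the picks are actually monetized.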
-It is tempting to tie team allocations (as far as supporting or fading) to changes to the game that have been seen this summer. This includes the supposed juiced ball and increases to HR/FB ratio. Something to keep an eye on.
-How do others evaluate picks, either their own or someone else’s? My prior is to trust the market until proven otherwise, and that’s a very strong prior.