The Southern Cross Stunts Trophy

The Folyami Rating System

by Daniel Mlot — 2023-01-15

Source code for generating this document (on GitHub)

This document presents the design choices that shape the Folyami rating system. Though based on the well-known Elo rating system, Folyami incorporates several modifications which aim at making it suitable for racing competitions and, in particular, ZakStunts.

An implementation of Folyami can be found at https://github.com/duplode/elo-zs. While it currently lacks a polished interface, it is otherwise fully functional. Note that the description here corresponds to the default configuration of the elo-zs program. The implementation allows the various extra features of the Folyami system to be tuned, or even turned off to obtain a regular Elo system.

An Elo-like system

What makes Folyami an Elo-like system is how rating updates are done in it. In the Elo system, ratings are updated according to the results of matches, which always involve two players. The difference between the ratings of the players in a match reflects how likely a victory by the highest rated player is regarded to be. After a match, the players exchange rating points, with the loser transferring points to the winner. The formula for the change in points incorporates the difference of ratings before the match, so that the more surprising the result, the larger the change. The key formula which expresses this update procedure is:

\[R’_{X} = R_{X} + K(S_{X} - E_{X})\]

Where:

For instance, consider an Elo system using the formula above with \(K = 18\). Suppose that, before a given match, \(E_{X} = \frac{2}{3}\). That means X is deemed to have a winning probability of about 67%. If X confirms their favouritism and wins the match, they will get (and Y will lose) only 6 points (one third of \(K\)). However, if Y pulls off an upset, they will get (and X will lose) 12 points (two thirds of \(K\)).

Folyami uses the formula above to update ratings. The main differences with respect to Elo lie in how the expected score is calculated, and how the value of \(K\) is set. Before having a close look at those differences, we should mention how other key aspects of Elo translate to Folyami:

Remoteness weights

The biggest difficulty in using an Elo-like system for ratings in a racing competition is that races are not actually collections of one versus one matches. Since the position of each player in the race scoreboard is determined by their time result, the outcomes of the one versus one matches used by Folyami are not independent from each other. On the one hand, that means there is plenty of redundant information, which makes the ratings more volatile than we would like them to be (for instance, an isolated bad result in which a racer ends up last in a race with twenty competitors amounts to losing nineteen matches). On the other hand, there is also valuable information in the one to one comparisons, so we would rather not throw it all away either (specially in a competition like ZakStunts, in which races usually only happen once a month).

As a compromise between those two competing needs, Folyami incorporates all one versus one matches, but gives them different weights to keep the rating volatility from getting too high. The weight thus assigned to each match is called remoteness factor, as it depends on how far from each other the racers were in terms of race positions. The underlying premise is that the comparison between racers near to each other in the scoreboard, who were likely fighting directly for position during the race, tends to be more relevant than that between racers far apart from each other. The formula for the remoteness factor is:

\[q_{remote XY} = \frac{1}{\left(\frac{\pi}{22}\right)^2(P_{X}-P_{Y})^2+1}\]

Here, \(P_{X}\) and \(P_{Y}\) are the positions on the scoreboard: 1 for first place, 2 for second, and so forth. Draws are handled by assigning the average of what the positions would have been if a draw had not happened – for instance, if X and Y are drawn in fifth place, then \(P_{X} = P_{Y} = 5.5\).

The weighing curve is a witch of Agnesi, scaled so that in the limiting case of a scoreboard infinitely long in both directions the weights add up to 21. Below is a plot of the base modulation factor \(K = 18\) modified by the weights:

Provisional factors

One issue which any Elo-like system being ran over a long stretch of time has to deal with is effectively incorporating new players to the ranking. The initial ranking of a player is, at best, a guess, which might turn out to be unrealistic. That being so, it is desirable for new players to have their ratings updated more quickly as a consequence of their results, so that a reasonable rating is attained more quickly. Additionally, ideally we would like for the ratings of experienced players to be less influenced by matches with new players, whose ratings could be highly inaccurate.

Folyami deals with this issue through a second kind of modifier applied to the modulation factor \(K\): the provisional factor. When a racer first joins the ranking, their \(K\) is multiplied by a factor for their first twelve races. The provisional factor decreases over those twelve races, converging to 1 along an exponential. The formula for the factor is:

\[q_{prov} = b^{12 - n_{prev}}\]

Where:

As for minimising the effect of matches against new racers, the Folyami approach is to, when a settled racer (that is, one which has already completed twelve races) faces a newcomer (that is, a racer in their initial twelve races), their \(K\) is divided by the provisional factor of their opponent. A single definition of the factor which accounts for all possibilities in a match between racers X and Y might be written down thusly:

\[q_{prov X} = \begin{cases} b^{12 - n_{prev X}} & \text{if } n_{prev X} < 12 \\ b^{n_{prev Y} - 12} & \text{if } n_{prev X} \ge 12 \text{ and } n_{prev Y} < 12 \\ 1 & \text{otherwise} \end{cases}\]

The graph below shows how the base \(K = 18\) gets modified by the provisional factor in a match involving a newcomer within the initial twelve races window and a settled racer. (Note that no remoteness weights are being applied in this graph.)

A few additional remarks:

Expected score and the gamma performance model

The Folyami system also differs from the Elo system in how expected scores are calculated. While this change has only a minor effect over the ratings themselves, it comes as part of a conceptual sharpening of the system which makes it easier to use for performance predictions and simulations. To motivate the change, we will begin by considering the Elo system. Let’s start by looking at the Elo expected score formula:

\[E_{Elo X} = \frac{1}{e^{-\alpha(R_{X} - R_{Y})} + 1}\]

Where:

The expected score tends towards 1 as the rating difference increases towards infinity (and tends towards 0 as it decreases towards minus infinity), along a logistic curve.

Given that Elo match scores range from 0 to 1, the expected score can, as long as we disregard the possibility of draws, be interpreted as the victory probability for a player. Further still, the expected score formula can be derived from a performance model for the players with the following features:

Some traits of the exponential performance model described above fit racing competitions very well. The performance variable can be interpreted in terms of lap or race times, with zero corresponding, for instance, to an ideal lap, and higher values to deviations from it. Linking a rating system to a performance model opens up interesting possibilities, such as using ratings to parameterise a simulator of race results.

The exponential model, however, has an important weakness when applied to races and simulations of them. Since even in a single race lap offers innumerable occasions for mistakes, a probability density function whose highest value is at zero, like the exponential, is not realistic. A distribution with zero density at zero, thus representing the perfect lap as an unattainable ideal, would be much preferable. Auspiciously, there is a simple way of fixing this problem: replace the exponential distribution with one of its generalisations, the gamma distribution, which has the following useful characteristics:

The graph below shows a handful of gamma probability densities, to give a sense of what the curves look like. The shape 1 example amounts to an exponential, the two shape 3 examples demonstrate changes in rate (with a higher rate corresponding to a higher rating), and the shape 5 illustrates the difference a shape increase makes:

Choosing the gamma shape with simulation realism as the main goal is, for the time being, a matter to be settled empirically. While experiments suggest 7 can be a plausible shape for single laps in NoRH Stunts, the adequacy of a value is heavily dependent on track, car and competition rules. The Folyami system uses a shape of 3, which appears to make for a simple yet minimally reasonable model for the ratings.

The Folyami victory probability/expected score formula, which therefore follows from a shape 3 gamma performance model, is:

\[E_{X} = 6W_{X}^5 - 15W_{X}^4 + 10W_{X}^3\]

Where:

\[W_{X} = \frac{1}{e^{-c\alpha(R_{X} - R_{Y})} + 1}\]

In these formulas:

The graph below shows the Elo and Folyami expected scores, as well as the bare shape 3 gamma victory probability without the adjustment by \(c\).

As the graph shows, the Elo and Folyami victory probabilities are similar to each other, with the absolute difference between them never exceeding 0.01. While using the Elo expected scores would have been defensible in face of this fact, the choice was made to trade this operational simplification for the clarity of not needing approximations to convert between ratings and performance models. In particular, the shape 3 gamma density associated with a rating can be given as:

\[d_{X}(z) = \frac{\beta_{X}^3}{2} z^2 e^{-\beta_{X} z}\]

Where \(z\) is the performance variable (in arbitrary units), and the rate parameter \(\beta_{X}\) is:

\[\beta_{X} = e^{c\alpha(R_{X} - 1500)}\]

(The subtraction of 1500 from the rating is cosmetic, performed so that the initial rating of 1500 corresponds to a rate parameter of 1.)

To wrap up this section, below are plots of a few gamma densities for specific rating values:

Appendix: illustrative charts

Points gained after a win

The table below displays several examples of points gained after a match win given different combinations of rating difference (rows) and remoteness/scoreboard distance (columns). No provisional factors are applied. Values are rounded to one decimal place.

↓ΔR \ ΔP→ 1 3 6 10 15
-500 16.8 14.5 9.9 5.7 3.1
-300 15.0 12.9 8.8 5.0 2.7
-200 13.4 11.5 7.9 4.5 2.4
-100 11.2 9.7 6.6 3.8 2.1
-50 10.0 8.7 5.9 3.4 1.8
-30 9.6 8.2 5.6 3.2 1.7
-10 9.1 7.8 5.3 3.0 1.7
0 8.8 7.6 5.2 3.0 1.6
10 8.6 7.4 5.0 2.9 1.6
30 8.1 7.0 4.8 2.7 1.5
50 7.6 6.5 4.7 2.5 1.4
100 6.4 5.5 3.8 2.2 1.2
200 4.3 3.7 2.5 1.4 0.8
300 2.6 2.3 1.6 0.9 0.5
500 0.8 0.7 0.5 0.3 0.1

Victory probabilities

The following table shows examples of victory probabilities in the absence of draws given different rating advantages. The percentages are rounded to one decimal place.

ΔR Victory %
0 50.0
50 57.0
100 63.7
150 70.0
200 75.7
250 80.7
300 85.0
350 88.5
400 91.4
450 93.7
500 95.4
550 96.7
600 97.7
650 98.4
700 98.9
750 99.2
800 99.5

Acknowledgements

The Folyami system was inspired by the earlier efforts in creating rankings for the Stunts community, especially by Mark L. Rivers’ Stunts World Ranking.

Several of the design decisions in developing Folyami, especially those to do with the modifiers to the modulation factor, were informed by Glickman, Mark E., A Comprehensive Guide to Chess Ratings (1995).