This vignette looks at the function zipf_race, which is the worker function for both zipf_hcp and zipf_init. It employs a method of race standardisation that was first explained by Simon Rowlands [1], of renowned racing experts Timeform, and uses Zipf’s law. A detailed walk-through of the function follows, which should make it easier to see what zipf_hcp and zipf_init are doing.
The example_race dataset contains two races and can be used to demonstrate zipf_race. First, load the dataset and do some necessary cleaning (see the Data Cleaning vignette for details). I will use the dplyr package to help reduce the amount of code.
# load dplyr and example_race dataset
library(dplyr)
data(example_race)
clean_df <- example_race %>%
    mutate(race_id = paste(date, time, sep = "_"),
           newtime = conv_times(times = wintime),
           fintime = conv_margins(btn_l = btn_l, win_time = newtime),
           btn_sec = fintime - newtime)
zipf_race requires two dataframes, one for each of two different races: the first is the race that requires handicapping, the second is the race used to handicap the first. So the cleaned data is split into separate races:
race_one <- filter(clean_df, race_id == "01/01/01_2:20")
race_two <- filter(clean_df, race_id != "01/01/01_2:20")
In this example race_two will be the race used to handicap race_one, so a bit more work is required for this race. To show how zipf_race works, arbitrary ratings are calculated for the performances of horses in this race:
race_two <- race_two %>%
    mutate(scale = lbs_per_sec(dist, surf = "turf"), # calculate lbs per second scale
           btn_lbs = scale * btn_sec, # calculate beaten lbs
           rating = 70 - btn_lbs) # calculate arbitrary ratings, winner gets 70
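The arithmetic in this step can be tried without the package. The sketch below uses made-up beaten times and an assumed scale of 15 lbs per second (in the vignette the scale comes from lbs_per_sec); none of these values are taken from example_race.

```r
# Hypothetical beaten margins in seconds, winner first (not from example_race)
btn_sec <- c(0, 0.2, 0.5, 1.1)

# Assumed scale; in the vignette this value is supplied by lbs_per_sec()
scale <- 15

# Convert time margins to pounds, then subtract them from an arbitrary
# rating of 70 given to the winner
btn_lbs <- scale * btn_sec
rating <- 70 - btn_lbs

print(data.frame(btn_sec, btn_lbs, rating))
```

The winner keeps the arbitrary 70, and every other horse is rated lower by the number of pounds it was beaten.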
The race to be handicapped needs similar steps: beaten lbs is required, so race_one is processed:
race_one <- race_one %>%
    mutate(scale = lbs_per_sec(dist, surf = "turf"),
           btn_lbs = scale * btn_sec)
Typically races will be run with horses carrying different weights, so a ‘difference at the weights’ calculation is required that takes weight carried into account (as well as beaten lbs). See Data Preparation for an example use of the diff_at_wgts function.
The zipf_race function can now be used. It takes three (sometimes four) parameters; the table below explains each one, along with the input used in this vignette:
param | details | example input |
---|---|---|
race | dataframe of a race to be handicapped | race_one |
btn_var | name of variable which contains margins between horses in race | "btn_lbs" |
race_2 | dataframe of race to be used to handicap race | race_two |
rating | name of ratings variable (if applicable) in race_2 | "rating" |
So zipf_race is called, and a rating for the winner of race_one is returned:
output <- zipf_race(race = race_one, btn_var = "btn_lbs", race_2 = race_two, rating = "rating")
The winner of race_one has earned a rating of 65.85.
To understand how 65.85 was arrived at, we can look under the hood of the zipf_race function. A small amount of housekeeping is done early in the function: the relevant variables are extracted from each dataframe (Step One), missing values are removed (Step Two), and the longer of the two vectors is truncated so that both have the same length (Step Three). Recall the params entered into zipf_race in the table and function call above; these can be seen in Step One below:
# Step One
race_one_margins <- race_one[["btn_lbs"]]
race_two_ratings <- race_two[["rating"]]
# Step Two
race_one_margins <- race_one_margins[!is.na(race_one_margins)]
race_two_ratings <- race_two_ratings[!is.na(race_two_ratings)]
# Step Three
if(length(race_one_margins) != length(race_two_ratings)) {
    if(length(race_one_margins) > length(race_two_ratings)) {
        race_one_margins <- race_one_margins[1:length(race_two_ratings)]
    } else {
        race_two_ratings <- race_two_ratings[1:length(race_one_margins)]
    }
}
With the three steps above performed, there are now two vectors of equal length, one with race margins, the other with race_2 ratings:
race_one_margins
## [1] 0.00 0.84 2.52 3.36 5.88
race_two_ratings
## [1] 70.000 65.144 62.716 57.860 45.720
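The three housekeeping steps can be bundled into one small function. The sketch below is not the package's code: align_vectors is a hypothetical helper and the inputs are made up, chosen so that one vector contains a missing value and the two differ in length.

```r
# Hypothetical helper: drop NAs, then trim both vectors to the shorter length
align_vectors <- function(margins, ratings) {
  margins <- margins[!is.na(margins)]
  ratings <- ratings[!is.na(ratings)]
  n <- min(length(margins), length(ratings))
  list(margins = margins[seq_len(n)], ratings = ratings[seq_len(n)])
}

# Made-up inputs: six margins (one missing) against four ratings
aligned <- align_vectors(c(0, 1.5, NA, 3, 4.5, 6), c(70, 66, 61, 55))
print(aligned)
```

Using seq_len(n) rather than 1:n is a deliberate choice in this sketch: if either vector ends up empty, seq_len(0) selects nothing, whereas 1:0 would count backwards.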
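The handicapping, using Zipf’s law, can now begin in earnest. Each margin is added to the rating in the same finishing position, each result is weighted by the reciprocal of that position (1, 1/2, 1/3, ...), and the weighted mean is taken: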
(ratings <- race_two_ratings + race_one_margins)
## [1] 70.000 65.984 65.236 61.220 51.600
(zipf <- 1 / (1:length(ratings)))
## [1] 1.0000000 0.5000000 0.3333333 0.2500000 0.2000000
(ratings <- ratings * zipf)
## [1] 70.00000 32.99200 21.74533 15.30500 10.32000
(total <- sum(ratings, na.rm = TRUE))
## [1] 150.3623
(winning_rating <- round(total / sum(zipf), 2))
## [1] 65.85
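The same calculation can be written as a single function, which makes the weighted mean explicit. This is a minimal sketch using base R only; zipf_weighted_rating is a hypothetical name, not a function from the package, and it assumes the two vectors have already been cleaned and trimmed to equal length.

```r
# Hypothetical helper: Zipf-weighted mean of (rating + margin) by finishing position
zipf_weighted_rating <- function(margins, ratings) {
  stopifnot(length(margins) == length(ratings))
  implied <- ratings + margins           # rating each position implies for the winner
  weights <- 1 / seq_along(implied)      # 1, 1/2, 1/3, ...
  round(sum(implied * weights) / sum(weights), 2)
}

# Vectors printed earlier in this vignette
race_one_margins <- c(0.00, 0.84, 2.52, 3.36, 5.88)
race_two_ratings <- c(70.000, 65.144, 62.716, 57.860, 45.720)

winning_rating <- zipf_weighted_rating(race_one_margins, race_two_ratings)
print(winning_rating)
```

Because the weights fall away as 1/position, the horses at the front of the field dominate the result, while those further back still contribute a little.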
Hopefully this explains the process a little more clearly. zipf_race is used by zipf_init and zipf_hcp.
In zipf_hcp a race is handicapped using a dataset of races, so zipf_race is called for each unique race in that dataset. It returns a list with various elements, one of which is a dataframe of ratings. Each row consists of the id of a race in the dataset of races and the rating returned by zipf_race.
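The idea of one call per unique race can be sketched in base R. This is not the code of zipf_hcp: the helper, the column names and the data below are all hypothetical, and the sketch only illustrates the shape of the result (one row per race id, with a rating).

```r
# Hypothetical helper mirroring the arithmetic shown earlier in this vignette
zipf_weighted_rating <- function(margins, ratings) {
  n <- min(length(margins), length(ratings))
  implied <- ratings[seq_len(n)] + margins[seq_len(n)]
  weights <- 1 / seq_len(n)
  round(sum(implied * weights) / sum(weights), 2)
}

# Made-up race to be handicapped (beaten lbs, winner first)
race_margins <- c(0, 1, 3)

# Made-up dataset of races that already have ratings
past_races <- data.frame(
  race_id = c("a", "a", "a", "b", "b", "b"),
  rating  = c(80, 76, 70, 60, 59, 55)
)

# One rating per unique race in the dataset
by_race <- split(past_races$rating, past_races$race_id)
ratings_df <- data.frame(
  race_id = names(by_race),
  zipf_rating = vapply(by_race,
                       function(r) zipf_weighted_rating(race_margins, r),
                       numeric(1)),
  row.names = NULL
)
print(ratings_df)
```

Each row of ratings_df pairs a race id with the rating that race implies for the winner of the race being handicapped.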
In zipf_init, its use is a little different because it initialises a handicap. Given a single dataset, it groups races according to the user’s input and assesses the performance of each race in a group using all the races in the same group (including itself, which always returns a rating of 0). This means it calls zipf_hcp and returns a list of various elements, one of which is a dataframe of ratings. Each row belongs to one race in the single dataset and includes a rating: the mean of all ratings returned by zipf_hcp for that one race.
[1]: Simon Rowlands, article explaining the use of Zipf’s law