betfaiR

This vignette walks through using betfaiR to collect data over time. The markets we will focus on are the match odds markets for Man City vs Spurs and Arsenal vs Leicester, as well as the Premier League outright market. However, this approach can be used to collect data about other markets, potentially to build models and establish trading strategies.

I will try to explain how to perform similar tasks on both Windows and Linux/OSX machines. This involves creating an R script and using a task scheduler (Task Scheduler on Windows and cron on Linux/OSX); this blog by Tyler Rinker has been very useful to me.

As an example, the plot below shows the minute by minute Betfair price data during the course of a football match between Manchester United and Manchester City:

There are a few things that we will need to do: find the markets, write an R script, and schedule the task.

Find markets

First, some familiarity with events, competitions, and markets will help with creating an R script to automate the collection of data. As we are focussing on the Premier League, we need an eventTypeId and a competitionId, which will help us to find the markets we are interested in.

# load betfaiR and login
library(betfaiR)
bf <- betfair(usr = Sys.getenv("bf_usr"),
              pwd = Sys.getenv("bf_pwd"),
              key = Sys.getenv("bf_key"))
Login successful
# find correct eventTypeIds
eventTypeIds <- bf$eventTypes()
eventTypeIds[grepl("soccer|football", eventTypeIds$eventType_name, ignore.case = TRUE),]
   eventType_id    eventType_name marketCount
1             1            Soccer        8078
20         6423 American Football          23
# soccer/football has an eventTypeId of 1, let's use that along with marketCountries to find the Premier League competition
competitions <- bf$competitions(filter = marketFilter(eventTypeIds = 1,
                                                      marketCountries = "GB"))
competitions[grepl("premier", competitions$competition_name, ignore.case =  TRUE),]
   competition_id                           competition_name marketCount
2         6447264      Northern Premier League Challenge Cup          48
5         4644737 Northern Premier League Division One South          24
7        10403285                         Premier League Cup          48
17             31                     English Premier League         322
22            105                       Scottish Premiership          99
23         820582                           Isthmian Premier          24
29        4644728 Northern Premier League Division One North          72
   competitionRegion
2                GBR
5                GBR
7                GBR
17               GBR
22               GBR
23               GBR
29               GBR
# the Premier League has a competition id of 31, we can find the different market types for this competition
marketTypes <- bf$marketTypes(filter = marketFilter(competitionIds = 31))
   marketType marketCount
3      WINNER           1
22 MATCH_ODDS          20

The three markets of interest, the two matches (Man City vs Spurs and Arsenal vs Leicester) and the outright winner of the Premier League, are shown below with their marketIds. They were collected and saved sometime on Weds 10th Feb 2016.


Market ID:      1.122843814
Event ID:       27674404 
Market Name:    Match Odds
Event Name:     Man City v Tottenham 
Matched:        224247.78

Runners:     3 
 selectionId runnerName handicap sortPriority
       47999   Man City        0            1
       48224  Tottenham        0            2
       58805   The Draw        0            3

 ---------------------------------------------------------------------------
Market ID:      1.122843669
Event ID:       27674402 
Market Name:    Match Odds
Event Name:     Arsenal v Leicester 
Matched:        206538.46

Runners:     3 
 selectionId runnerName handicap sortPriority
        1096    Arsenal        0            1
       48461  Leicester        0            2
       58805   The Draw        0            3

 ---------------------------------------------------------------------------
Market ID:      1.118280148
Event ID:       2022802 
Market Name:    2015/16 Winner
Event Name:     Barclays Premier League 
Matched:        13511634.58

Runners:     20 
 selectionId runnerName handicap sortPriority
        1096    Arsenal        0            1
       48461  Leicester        0            2
       47999   Man City        0            3
       48224  Tottenham        0            4
       48351    Man Utd        0            5
       48756   West Ham        0            6

 ---------------------------------------------------------------------------

R script

A script will need to log in, find the correct markets, retrieve the relevant data, and then save the data. The process of finding the correct markets needn’t be repeated every time, so this data can be retrieved once, saved, and then loaded on each subsequent run. The script still needs to be able to run from scratch, though, so it might look something like the code below:

library(betfaiR)

bf <- betfair(usr = Sys.getenv("BETFAIR_USR"),
              pwd = Sys.getenv("BETFAIR_PWD"),
              key = Sys.getenv("BETFAIR_KEY"))

files <- list.files()

if(!("marketIds.RDS" %in% files)) {
    cittot <- bf$marketCatalogue(filter = marketFilter(competitionIds = 31,
                                                       textQuery = "Tottenham",
                                                       to = "2016-02-15",
                                                       marketTypeCodes = "MATCH_ODDS"),
                                 maxResults = 10)
    arslei <- bf$marketCatalogue(filter = marketFilter(competitionIds = 31,
                                                       textQuery = "Arsenal",
                                                       to = "2016-02-15",
                                                       marketTypeCodes = "MATCH_ODDS"),
                                 maxResults = 10)
    winner <- bf$marketCatalogue(filter = marketFilter(competitionIds = 31,
                                                       marketTypeCodes = "WINNER"))
    marketIds <- list(cittot = cittot, arslei = arslei, winner = winner)
    saveRDS(marketIds, "marketIds.RDS")
} else {
    marketIds <- readRDS("marketIds.RDS")
}
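The fetch-once-then-cache pattern above can be tried without a Betfair account. The sketch below is only an illustration: fetch_markets() and load_or_fetch() are hypothetical names, the first standing in for the bf$marketCatalogue() calls, and it checks for the cache with file.exists() rather than scanning list.files(), which is a slightly more direct test.

```r
# Hypothetical stand-in for the bf$marketCatalogue() calls; it returns dummy
# data so the caching logic can be run without logging in.
fetch_markets <- function() {
    message("fetching markets (this should only happen once)")
    list(cittot = "1.122843814", arslei = "1.122843669", winner = "1.118280148")
}

# Return the cached object if the file exists, otherwise fetch and cache it.
load_or_fetch <- function(path, fetch) {
    if (file.exists(path)) {
        return(readRDS(path))
    }
    result <- fetch()
    saveRDS(result, path)
    result
}

cache_file <- file.path(tempdir(), "marketIds.RDS")  # temp dir keeps the demo tidy
unlink(cache_file)                                   # start from a clean slate

first  <- load_or_fetch(cache_file, fetch_markets)   # fetches and saves
second <- load_or_fetch(cache_file, fetch_markets)   # reads the saved file
identical(first, second)
```

The message should only be printed on the first call; the second call reads the saved file instead.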

The complete R script which will be run each time can be found at the bottom of this page or on GitHub; the script is commented so you should be able to follow it.

schedule tasks

Tasks will be run every minute on Sunday 14th Feb between 10am (GMT) and 7pm (GMT), so the data might catch price movements around the pre-match team announcements, and how these ripple across the other markets.
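The collection window can also be checked from inside the R script, as a safety net in case the scheduler fires outside the intended hours. This is a minimal sketch rather than part of the original script: in_window() is a hypothetical helper, and the dates are simply the window described above.

```r
# Collection window from the text: Sunday 14th Feb 2016, 10:00 to 19:00 GMT.
window_start <- as.POSIXct("2016-02-14 10:00:00", tz = "GMT")
window_end   <- as.POSIXct("2016-02-14 19:00:00", tz = "GMT")

# Hypothetical helper: TRUE when `now` falls inside the window (inclusive).
in_window <- function(now, start = window_start, end = window_end) {
    now >= start & now <= end
}

# The scheduled run times: one per minute, both ends included.
runs <- seq(window_start, window_end, by = "min")
length(runs)

in_window(as.POSIXct("2016-02-14 12:30:00", tz = "GMT"))  # inside the window
in_window(as.POSIXct("2016-02-14 19:01:00", tz = "GMT"))  # just after it

# In a scheduled script the guard might be used as:
# if (!in_window(Sys.time())) quit(save = "no")
```

Because the comparison is made on POSIXct values, it does not depend on the time zone the machine happens to be set to.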

windows

Windows does have a GUI to help with scheduling tasks (read about that here), but some of it is quite restrictive, e.g. on Windows 8 the shortest repeat interval it offers is 5 minutes. So we will be using the command line as an administrator; docs about the arguments which can be used can be found here.

The command entered for this task will be

schtasks /create /sc minute /sd 14/02/2016 /st 10:00 /ed 14/02/2016 /et 19:00 /tn betfairsupersunday /tr C:\Users\TomHeslop\Documents\Github\betfaiR\vignette_two\task.bat

So walking through that:

  1. schtasks /create creates a task; what follows are the arguments or parameters
  2. /sc minute /sd 14/02/2016 /st 10:00 sets the schedule frequency (/sc minute), the start date (/sd 14/02/2016) and the start time (/st 10:00)
  3. /ed 14/02/2016 /et 19:00 sets the end date (/ed 14/02/2016) and the end time (/et 19:00)
  4. /tn betfairsupersunday sets the task name so it can be found/deleted/updated
  5. /tr C:\Users\TomHeslop\Documents\Github\betfaiR\vignette_two\task.bat sets the task to run, in this case the task.bat file

I used a couple of additional parameters, /ru and /rp, which tell the scheduler to run the task under a certain user (/ru) with their password (/rp). This means that the CMD prompt won’t pop up every minute when the task launches, and the task will run quietly.
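If you would rather not type the command by hand, it can be assembled in R from its parts and then pasted into the command line. This is only a sketch: build_schtasks() is a hypothetical helper (not part of betfaiR), it uses only the flags walked through above, and it prints the command rather than running it.

```r
# Hypothetical helper: assemble the schtasks call from its parts.
build_schtasks <- function(task_name, task_path, date, start, end) {
    paste("schtasks /create",
          "/sc minute",                  # run every minute
          "/sd", date, "/st", start,     # start date and time
          "/ed", date, "/et", end,       # end date and time
          "/tn", task_name,              # name used to find/delete the task
          "/tr", task_path)              # what to run
}

cmd <- build_schtasks(
    task_name = "betfairsupersunday",
    task_path = "C:\\Users\\TomHeslop\\Documents\\Github\\betfaiR\\vignette_two\\task.bat",
    date = "14/02/2016", start = "10:00", end = "19:00"
)
cat(cmd, "\n")
```

Note the doubled backslashes, which R needs inside a string; cat() prints them as single backslashes.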

linux and osx

crontab (which may require installation) isn’t as familiar to me as Windows, so any input/corrections would be very welcome. It involves editing a file with your tasks. Each task has 6 fields; some examples are below, with a description of each (taken from cronhowto):

min  hour  day  month  weekday  command                             task
01   04    1    1      1        /usr/bin/somedirectory/somecommand  run on Jan 1st (and every Monday in January) at 04:01am
01   04    *    *      *        /usr/bin/somedirectory/somecommand  run every day of every month at 04:01am

Unlike Windows I don’t believe it’s easy (or more likely I don’t know how) to set a start and end time; nevertheless the task would possibly look something like the line below:

*/1 10,11,12,13,14,15,16,17,18,19 14 2 0 Rscript /path/to/collect_data.R

Walking through the above (hopefully correctly):

  1. */1 sets which minutes the task should be run. I believe this means every minute: * is a wildcard for all values, and /1 is a step of 1, so */10 would be minutes 0, 10, 20, 30, 40, 50.
  2. 10,11,12,13,14,15,16,17,18,19 sets the hours to run the task, 10am through to 7pm (as every minute of hour 19 matches, the last run would be at 19:59 rather than 19:00)
  3. 14 sets the day of the month
  4. 2 sets the month
  5. 0 sets the day of the week, 0 is Sunday
  6. Rscript /path/to/collect_data.R is the task to run

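I had hoped the above would run the task only on the 14th of Feb 2016. However, as the first example in the table shows, when both the day of the month and the day of the week are set, standard cron runs the task when either one matches, so this line would run every minute in those hours on the 14th of Feb and on every Sunday in February. Replacing the 0 with * restricts it to the 14th of Feb. Either way the entry repeats every year (the 14th of Feb next falls on a Sunday in 2021), so delete the task once the data has been collected :-)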
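As a check on which days of February 2016 a crontab line with day of month 14 and weekday 0 would match, the dates can be worked out in R. The sketch assumes the standard cron rule that a task runs when either of the two day fields matches (the behaviour described in the Jan 1st example from cronhowto); your cron implementation may differ, so treat it as an illustration.

```r
# All days in February 2016.
days <- seq(as.Date("2016-02-01"), as.Date("2016-02-29"), by = "day")

day_of_month <- as.integer(format(days, "%d"))
weekday      <- as.POSIXlt(days)$wday   # 0 = Sunday, as in cron

# Assumed rule (standard cron): with both fields restricted, either may match.
either_field <- day_of_month == 14 | weekday == 0
# What the schedule is meant to do: the 14th, which is also a Sunday.
both_fields  <- day_of_month == 14 & weekday == 0

format(days[either_field], "%a %d %b")  # days cron would run the task
format(days[both_fields], "%a %d %b")   # the single day we actually want
```

Under that rule the first result contains every Sunday in the month, not just the 14th, which is why setting the weekday field to * is the safer choice here.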

part two - findings

A short post looking at the data returned by using the code outlined in this post can be found here.

complete R script

The script below is called every time a task runs. So we first load libraries, set our working directory and log in to Betfair via the betfair function. We then retrieve the files in our current directory, to establish whether there is a marketIds.RDS file we can load, or whether we need to retrieve the marketIds, as discussed above.

The next steps save the current time, which will be added to each of the markets’ data to help with comparing data from the three markets. The code then establishes whether each market is still available using the marketCatalogue method; Arsenal vs Leicester kicks off at 12pm so will be finished and closed before Man City vs Tottenham. So the available_markets variable becomes the markets whose data we will retrieve.

We then loop through these available_markets and retrieve data via the marketBook method. We will read in any existing data for this market, append the newly returned data (gradually building a larger and larger list) to the existing data, and then save our updated list.

library(betfaiR)
setwd("C:/Users/TomHeslop/Documents/Github/betfaiR/vignette_two/")
readRenviron("~/.Renviron") # slightly curious why I needed to do this?

# log in
bf <- betfair(usr = Sys.getenv("BETFAIR_USR"),
              pwd = Sys.getenv("BETFAIR_PWD"),
              key = Sys.getenv("BETFAIR_KEY"))

# return list of files to check whether marketIds have been collected before
files <- list.files()
# if marketIds are not available collect and save them, otherwise load them
if(!("marketIds.RDS" %in% files)) {
    cittot <- bf$marketCatalogue(filter = marketFilter(competitionIds = 31,
                                                       textQuery = "Tottenham",
                                                       to = "2016-02-15",
                                                       marketTypeCodes = "MATCH_ODDS"),
                                 maxResults = 10,
                                 marketProjection = c("EVENT", "RUNNER_DESCRIPTION"))
    arslei <- bf$marketCatalogue(filter = marketFilter(competitionIds = 31,
                                                       textQuery = "Arsenal",
                                                       to = "2016-02-15",
                                                       marketTypeCodes = "MATCH_ODDS"),
                                 maxResults = 10,
                                 marketProjection = c("EVENT", "RUNNER_DESCRIPTION"))
    winner <- bf$marketCatalogue(filter = marketFilter(competitionIds = 31,
                                                       marketTypeCodes = "WINNER"),
                                 marketProjection = c("EVENT", "RUNNER_DESCRIPTION"))
    markets <- list(cittot = cittot, arslei = arslei, winner = winner)
    saveRDS(markets, "marketIds.RDS")
} else {
    markets <- readRDS("marketIds.RDS")
}

# get current time, to allow comparison between each of the 3 markets
currentTime <- Sys.time()
# extract marketIds, and their names
marketIds <- sapply(markets, function(i) i[[1]]$market$marketId)
# use marketIds to see if markets are still available
available_markets <- bf$marketCatalogue(filter = marketFilter(marketIds = as.vector(marketIds)),
                                        maxResults = 3)
available_markets <- sapply(available_markets, function(i) i$market$marketId)
# build logical vector
test <- marketIds %in% available_markets
# return names of markets that are still available
available_markets <- names(marketIds[test])

# loop through each market and retrieve data, add currentTime to make comparison easier
plyr::l_ply(available_markets, function(i, files, currentTime, bf, markets) {

    filename <- paste0(i, ".RDS")
    if(filename %in% files) {
        tmp <- readRDS(filename)
    } else {
        tmp <- list()
    }
    cur <- markets[[i]]
    cur <- bf$marketBook(marketIds = cur,
                         priceProjection = c("EX_ALL_OFFERS", "EX_TRADED"))
    cur[[1]]$collectedAt <- currentTime

    tmp <- append(tmp, cur)

    saveRDS(tmp, filename)

}, files = files,
    currentTime = currentTime,
    bf = bf,
    markets = marketIds)
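The read, append, save cycle at the heart of the loop can be tried out without a Betfair login. The sketch below is an illustration under stated assumptions: fake_market_book() is a hypothetical stand-in for bf$marketBook() and its lastPrice field is made up, so only the shape of the cycle (a list that grows by one timestamped element per run) mirrors the script above.

```r
# Hypothetical stand-in for bf$marketBook(); returns a one-element list so the
# shape loosely mirrors how the script treats its result (cur[[1]]).
fake_market_book <- function(market_id) {
    list(list(marketId = market_id, lastPrice = round(runif(1, 1.5, 3), 2)))
}

# Read any existing snapshots, add the new one with a timestamp, save again.
collect_snapshot <- function(market_id, filename, collected_at = Sys.time()) {
    snapshots <- if (file.exists(filename)) readRDS(filename) else list()
    cur <- fake_market_book(market_id)
    cur[[1]]$collectedAt <- collected_at
    snapshots <- append(snapshots, cur)
    saveRDS(snapshots, filename)
    invisible(snapshots)
}

filename <- file.path(tempdir(), "cittot.RDS")
unlink(filename)  # start from an empty file

# Simulate three scheduled runs, one minute apart.
start <- as.POSIXct("2016-02-14 10:00:00", tz = "GMT")
for (k in 0:2) {
    collect_snapshot("1.122843814", filename, collected_at = start + 60 * k)
}

snapshots <- readRDS(filename)
length(snapshots)                                  # one element per run
sapply(snapshots, function(s) format(s$collectedAt, "%H:%M"))
```

Re-reading and re-saving the whole list every minute is simple and fine for a single day of data, though the file grows with each run.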