This vignette walks through using betfaiR to collect data over time. The markets we will focus on are the match odds markets for Man City vs Spurs and Arsenal vs Leicester, as well as the Premier League outright market. However, the same approach can be used to collect data about other markets, potentially to build models and establish trading strategies.
I will try to explain how to perform similar tasks on both Windows and Linux/OSX machines. This involves creating an R script and running it with a task scheduler (Task Scheduler on Windows and cron on Linux/OSX); this blog by Tyler Rinker has been very useful to me.
As an example, the plot below shows the minute by minute Betfair price data during the course of a football match between Manchester United and Manchester City:
There are a few things that we will need to do:
First, some familiarity with events, competitions, and markets will help with creating an R script to automate the collection of data. As we are focussing on the Premier League, we need an eventTypeId and a competitionId, which together will help us find the markets we are interested in.
# load betfaiR and login
library(betfaiR)
bf <- betfair(usr = Sys.getenv("bf_usr"),
              pwd = Sys.getenv("bf_pwd"),
              key = Sys.getenv("bf_key"))
Login successful
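Before logging in it can be worth confirming that the credentials are actually visible to R, since Sys.getenv returns an empty string for a variable that is not set. A minimal sketch in base R: the helper name missing_env is made up for illustration, and the variable names are simply the ones used in the login call above.

```r
# Hypothetical helper (not part of betfaiR): return the names of
# environment variables that are unset or empty.
missing_env <- function(vars) {
  vals <- Sys.getenv(vars, unset = "")
  vars[!nzchar(vals)]
}

# Variable names as used in the login call above.
needed <- c("bf_usr", "bf_pwd", "bf_key")
absent <- missing_env(needed)
if (length(absent) > 0) {
  message("Not set: ", paste(absent, collapse = ", "))
}
```

If anything is reported as not set, adding it to ~/.Renviron and restarting R is one way to make it available.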
# find correct eventTypeIds
eventTypeIds <- bf$eventTypes()
eventTypeIds[grepl("soccer|football", eventTypeIds$eventType_name, ignore.case = TRUE),]
eventType_id eventType_name marketCount
1 1 Soccer 8078
20 6423 American Football 23
# soccer/football has an eventTypeId of 1, let's use that along with marketCountries to find the Premier League competition
competitions <- bf$competitions(filter = marketFilter(eventTypeIds = 1,
                                                      marketCountries = "GB"))
competitions[grepl("premier", competitions$competition_name, ignore.case = TRUE),]
competition_id competition_name marketCount
2 6447264 Northern Premier League Challenge Cup 48
5 4644737 Northern Premier League Division One South 24
7 10403285 Premier League Cup 48
17 31 English Premier League 322
22 105 Scottish Premiership 99
23 820582 Isthmian Premier 24
29 4644728 Northern Premier League Division One North 72
competitionRegion
2 GBR
5 GBR
7 GBR
17 GBR
22 GBR
23 GBR
29 GBR
# the Premier League has a competition id of 31, we can find the different market types for this competition
marketTypes <- bf$marketTypes(filter = marketFilter(competitionIds = 31))
marketType marketCount
3 WINNER 1
22 MATCH_ODDS 20
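The competition id above was read off the printed table by eye. In a script that has to run unattended it may be safer to pick it out with an exact match on the name, because grepl("premier", ...) also returns the Northern Premier League, the Premier League Cup and others. A small sketch on a mock data frame that copies three rows from the output above; it makes no API call, and storing the ids as character is an assumption.

```r
# Mock of a few rows from the competitions output shown earlier
# (not a live API call; the id column type is an assumption).
competitions <- data.frame(
  competition_id   = c("10403285", "31", "105"),
  competition_name = c("Premier League Cup",
                       "English Premier League",
                       "Scottish Premiership"),
  marketCount      = c(48, 322, 99),
  stringsAsFactors = FALSE
)

# An exact match avoids the partial matches that a "premier" pattern returns.
is_epl <- competitions$competition_name == "English Premier League"
epl_id <- competitions$competition_id[is_epl]
epl_id
```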
The three markets of interest (the two matches, Man City vs Spurs and Arsenal vs Leicester, and the Premier League outright winner) are shown below, as collected and saved on Weds 10th Feb 2016.
Market ID: 1.122843814
Event ID: 27674404
Market Name: Match Odds
Event Name: Man City v Tottenham
Matched: 224247.78
Runners: 3
selectionId runnerName handicap sortPriority
47999 Man City 0 1
48224 Tottenham 0 2
58805 The Draw 0 3
---------------------------------------------------------------------------
Market ID: 1.122843669
Event ID: 27674402
Market Name: Match Odds
Event Name: Arsenal v Leicester
Matched: 206538.46
Runners: 3
selectionId runnerName handicap sortPriority
1096 Arsenal 0 1
48461 Leicester 0 2
58805 The Draw 0 3
---------------------------------------------------------------------------
Market ID: 1.118280148
Event ID: 2022802
Market Name: 2015/16 Winner
Event Name: Barclays Premier League
Matched: 13511634.58
Runners: 20
selectionId runnerName handicap sortPriority
1096 Arsenal 0 1
48461 Leicester 0 2
47999 Man City 0 3
48224 Tottenham 0 4
48351 Man Utd 0 5
48756 West Ham 0 6
---------------------------------------------------------------------------
A script will need to log in, find the correct markets, retrieve the relevant data, and then save the data. The process of finding the correct markets needn’t be repeated every time, so this data can be retrieved once, saved, and then loaded on each later run. The script still needs to work on its very first run, when nothing has been saved yet, so it might look something like the code below:
library(betfaiR)
bf <- betfair(usr = Sys.getenv("BETFAIR_USR"),
              pwd = Sys.getenv("BETFAIR_PWD"),
              key = Sys.getenv("BETFAIR_KEY"))
files <- list.files()
if(!("marketIds.RDS" %in% files)) {
    cittot <- bf$marketCatalogue(filter = marketFilter(competitionIds = 31,
                                                       textQuery = "Tottenham",
                                                       to = "2016-02-15",
                                                       marketTypeCodes = "MATCH_ODDS"),
                                 maxResults = 10)
    arslei <- bf$marketCatalogue(filter = marketFilter(competitionIds = 31,
                                                       textQuery = "Arsenal",
                                                       to = "2016-02-15",
                                                       marketTypeCodes = "MATCH_ODDS"),
                                 maxResults = 10)
    winner <- bf$marketCatalogue(filter = marketFilter(competitionIds = 31,
                                                       marketTypeCodes = "WINNER"))
    marketIds <- list(cittot = cittot, arslei = arslei, winner = winner)
    saveRDS(marketIds, "marketIds.RDS")
} else {
    marketIds <- readRDS("marketIds.RDS")
}
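The "fetch once, then load from disk" pattern above can be pulled into a small reusable function. The sketch below uses only base R: the helper name cached and the stand-in fetch function are hypothetical, and fetch returns the three market ids listed earlier rather than calling Betfair.

```r
# Hypothetical helper: load a cached object if the file exists,
# otherwise build it, save it and return it.
cached <- function(path, build) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  value <- build()
  saveRDS(value, path)
  value
}

# Stand-in for the marketCatalogue calls; a real script would query Betfair here.
path <- file.path(tempdir(), "marketIds_demo.RDS")
unlink(path)  # start from a clean slate for the demonstration
n_calls <- 0
fetch <- function() {
  n_calls <<- n_calls + 1
  list(cittot = "1.122843814", arslei = "1.122843669", winner = "1.118280148")
}

first  <- cached(path, fetch)  # builds and saves
second <- cached(path, fetch)  # loads from disk, fetch() is not called again
```

Using file.exists rather than searching list.files() means the check does not depend on the working directory, provided a full path is given.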
The complete R script, which is run each time, can be found at the bottom of this page or on github; the script is commented so you should be able to follow it.
Tasks will be run every minute on Sunday 14th Feb between 10am (GMT) and 7pm (GMT), so the data might catch price movements around the pre-match team announcements, and how these ripple into other markets.
Windows does have a GUI to help with scheduling tasks (read about that here) but some of it is quite restrictive, e.g. on Windows 8 you can only schedule a task to run every 5 minutes. So we will be using the command line as an administrator; docs about the arguments which can be used can be found here.
The command entered for this task will be
schtasks /create /sc minute /sd 14/02/2016 /st 10:00 /ed 14/02/2016 /et 19:00 /tn betfairsupersunday /tr C:\Users\TomHeslop\Documents\Github\betfaiR\vignette_two\task.bat
So walking through that:
- schtasks /create creates a task; what follows are the arguments or parameters
- /sc minute /sd 14/02/2016 /st 10:00 sets the schedule frequency (/sc minute), the start date (/sd 14/02/2016) and the start time (/st 10:00)
- /ed 14/02/2016 /et 19:00 sets the end date (/ed 14/02/2016) and the end time (/et 19:00)
- /tn betfairsupersunday sets the task name so it can be found/deleted/updated
- /tr C:\Users\TomHeslop\Documents\Github\betfaiR\vignette_two\task.bat sets the task to run, in this case the task.bat file
I used a couple of additional parameters, /ru and /rp, which tell the scheduler to run the task under a certain user (/ru) with their password (/rp). This means that the CMD prompt won’t pop up every minute when the task launches, and the task will run quietly.
crontab (which may require installation) isn’t as familiar to me as Windows, so any input/corrections would be very welcome. It involves editing a file with your tasks. For each task there are 6 fields (five time fields followed by the command); some examples are below (taken from cronhowto), with a final column describing when each would run:
min | hour | day | month | weekday | command | task |
---|---|---|---|---|---|---|
01 | 04 | 1 | 1 | 1 | /usr/bin/somedirectory/somecommand | run on Jan 1st (and every monday in January) at 04:01am |
01 | 04 | * | * | * | /usr/bin/somedirectory/somecommand | run every day of every month at 04:01am |
Unlike Windows, cron has no direct way to set a start and end date, so the schedule has to be expressed through the time fields. The task might look something like this:
* 10-18 14 2 * Rscript /path/to/collect_data.R
Walking through the above:
- * in the minute field means every minute; * is the wildcard for all values, and a step such as */10 would mean minutes 0, 10, 20, 30, 40 and 50
- 10-18 sets the hours to run the task, so it runs from 10:00 through to 18:59 (a second entry, 0 19 14 2 * with the same command, would add the final 19:00 run)
- 14 sets the day of the month
- 2 sets the month
- * in the weekday field leaves the day of the week unrestricted. Setting it to 0 (Sunday) as well would not narrow things down: as the first example in the table shows, when both the day of the month and the weekday are restricted, cron runs the task when either one matches, so the task would also run on every Sunday in February
- Rscript /path/to/collect_data.R is the task to run
cron has no year field, so the above will run on the 14th of Feb 2016 and again on the 14th of Feb every following year, which leaves plenty of time to delete the task :-)
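Since cron cannot express an exact start and end moment, another option is to let the script itself decide whether it should do anything. The sketch below is one way to do that in base R; the helper name in_window is hypothetical, and the window is the 10am to 7pm GMT period on Sunday 14th Feb 2016 used in this post.

```r
# Hypothetical helper: TRUE when `now` falls inside [start, end].
in_window <- function(now, start, end) {
  now >= start && now <= end
}

# Collection window from this post: Sunday 14th Feb 2016, 10:00 to 19:00 GMT.
start <- as.POSIXct("2016-02-14 10:00:00", tz = "GMT")
end   <- as.POSIXct("2016-02-14 19:00:00", tz = "GMT")

during <- in_window(as.POSIXct("2016-02-14 12:30:00", tz = "GMT"), start, end)
after  <- in_window(as.POSIXct("2016-02-14 19:30:00", tz = "GMT"), start, end)

# At the top of the collection script one might then write:
# if (!in_window(Sys.time(), start, end)) quit(save = "no")
```

With a guard like this in place, a looser cron entry only costs a few wasted launches rather than unwanted data collection.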
A short post looking at the data returned by using the code outlined in this post can be found here.
The script below is called every time a task runs. We first load libraries, set our working directory and log in to Betfair via the betfair function. We then list the files in our current directory, to establish whether there is a marketIds.RDS file we can load, or whether we need to retrieve the marketIds, as discussed above.
The next step saves the current time, which will be added to each market’s data to help with comparing data across the three markets. The code then establishes whether each market is still available using the marketCatalogue method; Arsenal vs Leicester kicks off at 12pm so will be finished and closed before Man City vs Tottenham. The available_markets variable therefore holds the markets whose data we will retrieve.
We then loop through these available_markets and retrieve data via the marketBook method. We read in any existing data for this market, append the newly returned data to it (gradually building a larger and larger list), and then save our updated list.
library(betfaiR)
setwd("C:/Users/TomHeslop/Documents/Github/betfaiR/vignette_two/")
readRenviron("~/.Renviron") # slightly curious why I needed to do this?
# log in
bf <- betfair(usr = Sys.getenv("BETFAIR_USR"),
              pwd = Sys.getenv("BETFAIR_PWD"),
              key = Sys.getenv("BETFAIR_KEY"))
# return list of files to check whether marketIds have been collected before
files <- list.files()
# if marketIds are not available collect and save them, otherwise load them
if(!("marketIds.RDS" %in% files)) {
    cittot <- bf$marketCatalogue(filter = marketFilter(competitionIds = 31,
                                                       textQuery = "Tottenham",
                                                       to = "2016-02-15",
                                                       marketTypeCodes = "MATCH_ODDS"),
                                 maxResults = 10,
                                 marketProjection = c("EVENT", "RUNNER_DESCRIPTION"))
    arslei <- bf$marketCatalogue(filter = marketFilter(competitionIds = 31,
                                                       textQuery = "Arsenal",
                                                       to = "2016-02-15",
                                                       marketTypeCodes = "MATCH_ODDS"),
                                 maxResults = 10,
                                 marketProjection = c("EVENT", "RUNNER_DESCRIPTION"))
    winner <- bf$marketCatalogue(filter = marketFilter(competitionIds = 31,
                                                       marketTypeCodes = "WINNER"),
                                 marketProjection = c("EVENT", "RUNNER_DESCRIPTION"))
    markets <- list(cittot = cittot, arslei = arslei, winner = winner)
    saveRDS(markets, "marketIds.RDS")
} else {
    markets <- readRDS("marketIds.RDS")
}
# get current time, to allow comparison between each of the 3 markets
currentTime <- Sys.time()
# extract marketIds, and their names
marketIds <- sapply(markets, function(i) i[[1]]$market$marketId)
# use marketIds to see if markets are still available
available_markets <- bf$marketCatalogue(filter = marketFilter(marketIds = as.vector(marketIds)),
                                        maxResults = 3)
available_markets <- sapply(available_markets, function(i) i$market$marketId)
# build logical vector
test <- marketIds %in% available_markets
# return names of markets that are still available
available_markets <- names(marketIds[test])
# loop through each market and retrieve data, add currentTime to make comparison easier
plyr::l_ply(available_markets, function(i, files, currentTime, bf, markets) {
    filename <- paste0(i, ".RDS")
    if(filename %in% files) {
        tmp <- readRDS(filename)
    } else {
        tmp <- list()
    }
    cur <- markets[[i]]
    cur <- bf$marketBook(marketIds = cur,
                         priceProjection = c("EX_ALL_OFFERS", "EX_TRADED"))
    cur[[1]]$collectedAt <- currentTime
    tmp <- append(tmp, cur)
    saveRDS(tmp, filename)
}, files = files,
currentTime = currentTime,
bf = bf,
markets = marketIds)
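The read, append and save step inside the loop can be tried out without a Betfair login. The sketch below is an illustration in base R, not the script itself: append_snapshot is a hypothetical helper, and each mock snapshot has only a collectedAt time and a made-up lastPrice field, whereas the real marketBook output has a much richer structure.

```r
# Hypothetical helper: append one snapshot to a list stored in an RDS file.
append_snapshot <- function(filename, snapshot) {
  existing <- if (file.exists(filename)) readRDS(filename) else list()
  updated <- append(existing, list(snapshot))
  saveRDS(updated, filename)
  invisible(updated)
}

demo_file <- file.path(tempdir(), "demo_market.RDS")
unlink(demo_file)  # start from a clean slate for the demonstration

# Two mock snapshots a minute apart; lastPrice is a made-up field.
append_snapshot(demo_file, list(
  collectedAt = as.POSIXct("2016-02-14 10:00:00", tz = "GMT"),
  lastPrice   = 2.10))
append_snapshot(demo_file, list(
  collectedAt = as.POSIXct("2016-02-14 10:01:00", tz = "GMT"),
  lastPrice   = 2.12))

# Read the accumulated list back and flatten it into a data frame.
snapshots <- readRDS(demo_file)
prices <- data.frame(
  collectedAt = do.call(c, lapply(snapshots, function(s) s$collectedAt)),
  lastPrice   = vapply(snapshots, function(s) s$lastPrice, numeric(1))
)
```

Because every run rewrites the whole file, the cost of each save grows with the amount of data collected; over a single afternoon at one-minute intervals that should stay small.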