fantasysocceR

The pastseasons201415 dataset contains performance data from past seasons for all current players in the fantasy.premierleague.com game. Not all players in the game have past data (Alexis Sanchez, for example). These players are still included in the dataset, but their performance data is recorded as NA.

Once the package has been loaded, to access the dataset:

data(pastseasons201415)

The data frame has 2162 rows and 24 columns.

str(pastseasons201415)
## 'data.frame':    2162 obs. of  24 variables:
##  $ id      : num  1 1 1 1 1 1 2 2 2 2 ...
##  $ name    : chr  "Szczesny" "Szczesny" "Szczesny" "Szczesny" ...
##  $ pos     : chr  "Goalkeeper" "Goalkeeper" "Goalkeeper" "Goalkeeper" ...
##  $ team    : chr  "Arsenal" "Arsenal" "Arsenal" "Arsenal" ...
##  $ pts     : num  47 47 47 47 47 47 120 120 120 120 ...
##  $ value   : num  5.2 5.2 5.2 5.2 5.2 5.2 6.1 6.1 6.1 6.1 ...
##  $ pct     : num  4.7 4.7 4.7 4.7 4.7 4.7 9.9 9.9 9.9 9.9 ...
##  $ season  : chr  "2008/09" "2009/10" "2010/11" "2011/12" ...
##  $ mins    : num  0 0 1350 3420 2250 ...
##  $ goals   : num  0 0 0 0 0 0 2 2 2 2 ...
##  $ assists : num  0 0 0 0 0 0 0 1 1 0 ...
##  $ cs      : num  0 0 6 13 10 16 9 11 7 17 ...
##  $ ga      : num  0 0 19 49 24 41 36 42 19 26 ...
##  $ og      : num  0 0 0 0 0 0 0 2 0 0 ...
##  $ pens_svd: num  0 0 1 1 1 1 0 0 0 0 ...
##  $ pens_msd: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ yel     : num  0 0 1 2 1 2 5 8 1 1 ...
##  $ red     : num  0 0 0 0 0 0 2 0 1 1 ...
##  $ saves   : num  0 0 45 82 71 113 0 0 0 0 ...
##  $ bonus   : num  0 0 0 8 3 4 0 5 11 24 ...
##  $ ea_ppi  : num  0 0 0 469 314 475 0 436 285 463 ...
##  $ bps     : num  0 0 0 0 0 194 0 0 0 228 ...
##  $ fin_val : num  4.5 4.5 4.3 5.9 5.3 5.9 5.9 5.8 5.3 5.6 ...
##  $ ssn_pts : num  0 0 62 139 102 157 85 103 88 155 ...
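
The str() output suggests one row per player per past season, with player-level columns such as id and name repeated on every row. The sketch below shows one way to check that layout and to find players with no past data. It uses a small toy data frame rather than the real dataset: the Szczesny and Koscielny values are copied from the output above, while "Player X" (id 999) is a hypothetical player, and representing a player without history as a single row of NA values is an assumption about how the dataset stores them.

```r
# Toy data: two players from the str() output plus one hypothetical
# player ("Player X") who has no past seasons (assumed to be one NA row)
toy <- data.frame(
  id      = c(rep(1, 6), rep(2, 4), 999),
  name    = c(rep("Szczesny", 6), rep("Koscielny", 4), "Player X"),
  season  = c("2008/09", "2009/10", "2010/11", "2011/12", "2012/13", "2013/14",
              "2010/11", "2011/12", "2012/13", "2013/14", NA),
  ssn_pts = c(0, 0, 62, 139, 102, 157, 85, 103, 88, 155, NA),
  stringsAsFactors = FALSE
)

# Count the non-missing past seasons recorded for each player
seasons_per_player <- sapply(split(!is.na(toy$season), toy$name), sum)

# Players with no recorded past seasons at all
no_history <- names(seasons_per_player)[seasons_per_player == 0]

print(seasons_per_player)
print(no_history)
```

On the real dataset the same two steps could be run on pastseasons201415 directly, provided the NA assumption above holds.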

The pastseasons201415 dataset could potentially be used to build a model to predict performance for the coming season, but unfortunately it doesn’t include pre-season data, such as a player’s club at the time, for each of those past seasons. For example, Shay Given has previously played for Newcastle and Manchester City, but only his current club (Aston Villa) is recorded in the dataset.

A model based only on the points a player scored in the previous season is unfortunately not that great. To show this, I reduce the dataset so it only includes data for 2010/11, 2011/12, 2012/13 and 2013/14, and reduce the number of variables.

tmp <- subset(pastseasons201415, season %in% c("2010/11", "2011/12", "2012/13", "2013/14"),
              select = c("name", "id", "pos", "team", "pct", "value", "season", "fin_val", "ssn_pts"))
dim(tmp)
## [1] 1371    9
# we need the reshape2 package, to create a 'wide' dataframe, with one player per row
library(reshape2)
ssn2ssn <- dcast(tmp, name + id + pos + team + pct + value ~ season, value.var = "ssn_pts")
head(ssn2ssn)
##          name id        pos     team pct value 2010/11 2011/12 2012/13
## 1    Szczesny  1 Goalkeeper  Arsenal 4.7   5.2      62     139     102
## 2   Koscielny  2   Defender  Arsenal 9.9   6.1      85     103      88
## 3   Vermaelen  3   Defender  Arsenal 0.1   5.0      15     132      70
## 4       Gibbs  4   Defender  Arsenal 1.7   5.2       7      61      93
## 5   Jenkinson  5   Defender West Ham 0.8   5.0      NA      12      51
## 6 Mertesacker  6   Defender  Arsenal 5.2   6.1      NA      59     135
##   2013/14
## 1     157
## 2     155
## 3      29
## 4      89
## 5      50
## 6     157
# rename the variables with numbers in them (just for ease of typing later)
names(ssn2ssn)[7:10] <- paste0("s", 1:4)

# now plot season to season points (this needs the ggplot2 package)
library(ggplot2)
ggplot(ssn2ssn, aes(x = s1, y = s2)) +
    geom_point() +
    geom_point(aes(x = s2, y = s3)) +
    geom_point(aes(x = s3, y = s4)) +
    theme_minimal() +
    labs(x = "Points in one season", y = "Points in the following season") +
    facet_wrap(~pos)
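
One way to put a number on the relationship in the plot is to stack the three consecutive-season pairs and compute a correlation. The following is a minimal sketch, not a proper model evaluation. It uses only the six rows printed by head(ssn2ssn) above, copied into a toy data frame (ssn2ssn_toy is a name introduced here for illustration), so the resulting value says nothing reliable about the full dataset.

```r
# Toy copy of the six rows shown by head(ssn2ssn), with columns s1..s4
ssn2ssn_toy <- data.frame(
  name = c("Szczesny", "Koscielny", "Vermaelen", "Gibbs",
           "Jenkinson", "Mertesacker"),
  s1 = c(62, 85, 15, 7, NA, NA),
  s2 = c(139, 103, 132, 61, 12, 59),
  s3 = c(102, 88, 70, 93, 51, 135),
  s4 = c(157, 155, 29, 89, 50, 157)
)

# Stack the three consecutive-season pairs, as overlaid in the plot
season_pairs <- rbind(
  data.frame(previous = ssn2ssn_toy$s1, following = ssn2ssn_toy$s2),
  data.frame(previous = ssn2ssn_toy$s2, following = ssn2ssn_toy$s3),
  data.frame(previous = ssn2ssn_toy$s3, following = ssn2ssn_toy$s4)
)

# Pearson correlation, dropping pairs where either season is NA
r <- cor(season_pairs$previous, season_pairs$following, use = "complete.obs")

# Number of pairs that actually contributed to the correlation
n_used <- sum(complete.cases(season_pairs))

print(r)
print(n_used)
```

Replacing ssn2ssn_toy with the full ssn2ssn data frame would give the figure for all players. Splitting by pos first would mirror the facets in the plot.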