fantasysocceR

The pastseasons dataset contains performance data from past seasons for all current players in the fantasy.premierleague.com game. Not all players have past data: those making their debut in the game are still included in the dataset, but their performance data is recorded as NA.

Once the package has been loaded, access the dataset with:

data(pastseasons)

The data frame has 1920 rows and 24 columns.

str(pastseasons)
## 'data.frame':    1920 obs. of  24 variables:
##  $ id      : num  1 1 1 1 1 1 1 2 3 3 ...
##  $ name    : chr  "Szczesny" "Szczesny" "Szczesny" "Szczesny" ...
##  $ pos     : chr  "Goalkeeper" "Goalkeeper" "Goalkeeper" "Goalkeeper" ...
##  $ team    : chr  "Arsenal" "Arsenal" "Arsenal" "Arsenal" ...
##  $ pts     : num  47 47 47 47 47 47 47 76 38 38 ...
##  $ value   : num  5 5 5 5 5 5 5 5 5.5 5.5 ...
##  $ pct     : num  0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.7 30.2 30.2 ...
##  $ season  : chr  "2008/09" "2009/10" "2010/11" "2011/12" ...
##  $ mins    : num  0 0 1350 3420 2250 ...
##  $ goals   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ assists : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ cs      : num  0 0 6 13 10 16 3 8 0 0 ...
##  $ ga      : num  0 0 19 49 24 41 21 11 9 17 ...
##  $ og      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ pens_svd: num  0 0 1 1 1 1 0 0 0 0 ...
##  $ pens_msd: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ yel     : num  0 0 1 2 1 2 1 0 1 0 ...
##  $ red     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ saves   : num  0 0 45 82 71 113 44 47 58 60 ...
##  $ bonus   : num  0 0 0 8 3 4 0 0 1 2 ...
##  $ ea_ppi  : num  0 0 0 469 314 475 172 241 0 0 ...
##  $ bps     : num  0 0 0 0 0 194 215 301 0 0 ...
##  $ fin_val : num  4.5 4.5 4.3 5.9 5.3 5.9 5.2 5 6.4 6.5 ...
##  $ ssn_pts : num  0 0 62 139 102 157 47 76 98 120 ...
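
Because debut players are kept in the dataset with NA performance data, it can help to separate them from players with a recorded history before doing any modelling. The following is a minimal sketch on a made-up data frame that only mimics a few of the pastseasons column names (id, name, season, ssn_pts). The rows are invented for illustration, and it assumes that a debut player's ssn_pts is NA.

```r
# Made-up rows that mimic a few pastseasons columns; this is not real data.
toy <- data.frame(
  id      = c(1, 1, 2, 3),
  name    = c("Player A", "Player A", "Player B", "Player C"),
  season  = c("2013/14", "2014/15", NA, "2014/15"),
  ssn_pts = c(120, 95, NA, 40)
)

# Assumption: a player is a debutant if none of their rows has season points.
has_history <- tapply(!is.na(toy$ssn_pts), toy$id, any)
debut_ids <- as.numeric(names(has_history)[!has_history])

# Keep only rows with recorded points for players who have a history.
with_history <- toy[!(toy$id %in% debut_ids) & !is.na(toy$ssn_pts), ]
```

On the real dataset the same steps would be applied to pastseasons instead of toy, after checking how NA values actually appear in it.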

The pastseasons dataset could potentially be used to build a model to predict performance for the coming season, but unfortunately it doesn’t include the pre-season data for each of those seasons: variables such as team describe the current season only. For example, Shay Given has previously played for Newcastle and Manchester City, but only his current club (Aston Villa) is recorded in the dataset.

A model based on the points a player scored in the previous season is unfortunately not that great. To show this, I reduce the dataset so that it only includes data for 2010/11, 2011/12, 2012/13, 2013/14 and 2014/15, and keep only a subset of the variables.

# keep the five most recent seasons and only the columns needed below
keep_seasons <- c("2010/11", "2011/12", "2012/13", "2013/14", "2014/15")
tmp <- subset(pastseasons, season %in% keep_seasons,
              select = c("name", "id", "pos", "team", "pct", "value", "season", "fin_val", "ssn_pts"))
dim(tmp)
## [1] 1392    9
# we need the reshape2 package, to create a 'wide' dataframe, with one player per row
library(reshape2)
ssn2ssn <- dcast(tmp, name + id + pos + team + pct + value ~ season, value.var = "ssn_pts")
head(ssn2ssn)
##          name id        pos    team  pct value 2010/11 2011/12 2012/13
## 1    Szczesny  1 Goalkeeper Arsenal  0.1   5.0      62     139     102
## 2      Ospina  2 Goalkeeper Arsenal  0.7   5.0      NA      NA      NA
## 3        Cech  3 Goalkeeper Arsenal 30.2   5.5     158     127     144
## 4   Koscielny  4   Defender Arsenal 17.8   6.0      85     103      88
## 5 Mertesacker  5   Defender Arsenal 15.2   5.5      NA      59     135
## 6     Gabriel  6   Defender Arsenal  0.2   5.0      NA      NA      NA
##   2013/14 2014/15
## 1     157      47
## 2      NA      76
## 3     146      38
## 4     155     120
## 5     157     121
## 6      NA      16
# rename the five season columns s1 to s5 (just for ease of typing later)
names(ssn2ssn)[7:11] <- paste0("s", 1:5)

# now plot season to season points: each layer pairs a season (x) with the one after it (y)
library(ggplot2)
ggplot(ssn2ssn, aes(x = s1, y = s2)) +
    geom_point() +
    geom_point(aes(x = s2, y = s3)) +
    geom_point(aes(x = s3, y = s4)) +
    geom_point(aes(x = s4, y = s5)) +
    theme_minimal() +
    labs(x = "Season 1", y = "Season 2") +
    facet_wrap(~pos)
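
One way to put a number on "not that great" is to stack every pair of consecutive seasons and compute the correlation between a player's points in one season and the next. The sketch below is only an illustration: it uses the six players printed by head(ssn2ssn) above rather than the full dataset, names the season columns s1 to s5 for convenience, and the name season_pairs is made up for this example.

```r
# The six rows printed by head(ssn2ssn), with the season columns
# named s1 (2010/11) to s5 (2014/15) for this sketch.
toy <- data.frame(
  name = c("Szczesny", "Ospina", "Cech", "Koscielny", "Mertesacker", "Gabriel"),
  s1 = c(62, NA, 158, 85, NA, NA),
  s2 = c(139, NA, 127, 103, 59, NA),
  s3 = c(102, NA, 144, 88, 135, NA),
  s4 = c(157, NA, 146, 155, 157, NA),
  s5 = c(47, 76, 38, 120, 121, 16)
)

# Stack each pair of consecutive seasons into one (previous, following) table.
season_cols <- paste0("s", 1:5)
season_pairs <- do.call(rbind, lapply(seq_len(length(season_cols) - 1), function(i) {
  data.frame(
    name     = toy$name,
    prev_pts = toy[[season_cols[i]]],
    next_pts = toy[[season_cols[i + 1]]]
  )
}))

# Keep only pairs where points are recorded in both seasons.
season_pairs <- season_pairs[complete.cases(season_pairs), ]

# Pearson correlation between one season's points and the next season's.
r <- cor(season_pairs$prev_pts, season_pairs$next_pts)
```

With only six players the resulting r says very little. Running the same steps on the full ssn2ssn data frame, perhaps separately for each pos, would give a more meaningful figure.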