servevolleyR

Williams vs Sharapova

Today sees Serena Williams take on Maria Sharapova in the semi-final of Wimbledon 2015. This post will walk through an example use of the package using this match-up.

First make sure the package is installed and loaded:

# devtools::install_github("durtal/servevolleyR")
library(servevolleyR)

What servevolleyR requires is a probability that a player will win the point on their first serve. For this quick example, we’ll use the players performance in their last matches. Williams defeated her sister, Venus, in two sets, with some summary statistics shown in the picture below left, while Sharapova beat Vandeweghe in three sets, with some summary statistics shown in the picture below right.

The stats show that Serena won 80% of points on her first serve, while Sharapova won 73%. These two numbers, while very simple can be used to simulate a game, set, tiebreak, and match between the two. However additional parameters can also be provided, namely, the probability that a player will win a point on their second serve, and the probability that the players first serve will be in. These can also be seen in the pictures above. So for the two players (Maria’s second serve was woeful!):

williams1 <- 0.80     # serena first serve win %
williams2 <- 0.58     # serena second serve win %
williamsIn <- 0.65    # serena first serve in %

sharapova1 <- 0.73    # maria first serve win %
sharapova2 <- 0.36    # maria second serve win %
sharapovaIn <- 0.6    # maria first serve in %

We can then use these parameters in the following functions included in servevolleyR, simGame, simGames, simSet, simSets, simTiebreak, simMatch and simMatches.

simGame and simGames

simGame simulates a single service game given a number of parameters, the only one required is the probability that a player wins a point on their first serve, williams1 and sharapova1 above, but additional parameters can also be provided. To simulate many service games, there are two options, the first, probably quicker is to use simGame inside replicate, we’ll look at Serena for this example, the second is to use simGames.

Williams service games

Using simGame and replicate

williamsServe <- replicate(1e3, simGame(p = williams1,
                                        p2 = williams2,
                                        firstServe = williamsIn))

williamsServe is a vector of 1000 service game results, 1 when the server (Serena) wins, and 0 when the returner wins, the first 6 results are 1, 1, 1, 1, 1, 1. This vector can be summarised as follows.

table(williamsServe) / sum(table(williamsServe))
## williamsServe
##     0     1 
## 0.083 0.917

The simulation estimates that Serena will win 0.917 of her service games.

Sharapova service games

Using simGames

sharapovaServe <- simGames(n = 1e3,                # number of simulations
                           p = sharapova1,
                           p2 = sharapova2,
                           firstServe = sharapovaIn)

The benefit of using simGames isn’t speed, but detail, simGames returns a list with detailed data about the simulated games, data that includes the number of points each player won in a service game. The list has a class of svR_games, which comes with a few methods, such as print, summary and plot, the print method is shown below. While the plot method can be seen to the right, it shows the probability that Maria will win her service game (0.676), as well as the scores in the service games, 0 corresponds to love, 1 to 15, so the score 4-2 is W-30, 5-3 is winning after deuce:

sharapovaServe
## 
## Simulation of 1000 service games:
## Server won 0.676 (676/1000) of games.
## 
## Server probabilities:
##     p   p2 firstServe
##  0.73 0.36        0.6
plot(sharapovaServe)

simSet and simSets

simSet simulates a single set given a number of parameters for two players, the only two required is the probability that each player wins a point on their first serve, williams1 and sharapova1 above, but additional parameters can also be provided. To simulate many service games, there are two options, the first, probably quicker is to use simSet inside replicate, we’ll just use simSets.

Another thing worth bearing in mind is that player A opens the serving.

williamsSets <- simSets(n = 1e3, playTiebreak = TRUE, players = c("Serena", "Maria"),
                        pA = williams1, p2A = williams2, firstServeA = williamsIn,
                        pB = sharapova1, p2B = sharapova2, firstServeB = sharapovaIn)

Like simGames, it is likely slower than using replicate and simSet, however the list returned by simSets has much more detail about the simulated Sets. The list has a class of svR_sets, which comes with a few methods, such as print, summary, and plot. The summary method is shown below. While the plot method is shown to the right, showing the probability that player A will win the set (0.862), as well as the probability for each potential set score:

summary(williamsSets)
## 
## Simulation of 1000 sets:
## 
## Server Probabilities:
##  player    p   p2 firstServe
##  Serena 0.80 0.58       0.65
##   Maria 0.73 0.36       0.60
## 
## Player A (Serena) won 0.862 of sets.
##        playerB
## playerA     0     1     2     3     4     5     6     7
##       2                                     0.013      
##       3                                     0.003      
##       4                                     0.068      
##       5                                           0.012
##       6 0.021 0.140 0.116 0.330 0.079             0.042
##       7                               0.069 0.107
plot(williamsSets)

The list returned by simSets can be converted into a dataframe for further analysis, this dataframe effectively contains data about simulated service games, thereby not requiring the use of simGames. To convert this to a dataframe simply use the simDf function, it can sometimes take some time to convert the data, the larger the number of simulations the longer it takes:

williamsSetsDf <- simDf(williamsSets)
head(williamsSetsDf)
##   simNo     pA    pB setA setB set_res gameNo serving    p   p2 firstServe
## 1     1 Serena Maria    6    4       1      1  Serena 0.80 0.58       0.65
## 2     1 Serena Maria    6    4       1      2   Maria 0.73 0.36       0.60
## 3     1 Serena Maria    6    4       1      3  Serena 0.80 0.58       0.65
## 4     1 Serena Maria    6    4       1      4   Maria 0.73 0.36       0.60
## 5     1 Serena Maria    6    4       1      5  Serena 0.80 0.58       0.65
## 6     1 Serena Maria    6    4       1      6   Maria 0.73 0.36       0.60
##   game_res server returner
## 1        1      4        1
## 2        1      7        5
## 3        1      4        1
## 4        1      5        3
## 5        1      4        1
## 6        1      4        2

simMatch and simMatches

These two functions follow the same convention as those above, and require parameters for two players, and a few details about the type of match they are playing, ie. how many sets, play tiebreaks, etc.

I’ll just use simMatches here, with Serena opening the serving.

williamsMatch <- simMatches(n = 1e3, sets = 3, finalSetTiebreak = TRUE,
                            players = c("Serena", "Maria"), tiebreaks = TRUE,
                            pA = williams1, p2A = williams2, firstServeA = williamsIn,
                            pB = sharapova1, p2B = sharapova2, firstServeB = sharapovaIn)

Like the previous two functions, simGames and simSets, it is likely slower than using replicate and simMatch, but the detail is much better, getting data about the matches, sets, and games in each simulation. The list returned has a class of svR_matches, which comes with print, summary, and plot methods. The summary method is shown below, while the plot to the right shows the probability of Player A winning the match (0.949), as well as the final scores.

summary(williamsMatch)
## 
## Simulation of 1000 matches:
## 
## Player A (Serena) won 0.949 of matches.
## 
## Server Probabilities:
##  player    p   p2 firstServe
##  Serena 0.80 0.58       0.65
##   Maria 0.73 0.36       0.60
## 
## 
##        playerB
## playerA     0     1     2
##       0             0.021
##       1             0.030
##       2 0.758 0.191
plot(williamsMatch)

Sharapova’s second serve % will need to be much improved if she is to get close to Serena. Simulating a match between the two using just their first serves makes the match much closer:

FirstServes <- simMatches(n = 1e3, sets = 3, finalSetTiebreak = TRUE,
                          players = c("Serena", "Maria"), tiebreaks = TRUE,
                          pA = williams1, pB = sharapova1)
plot(FirstServes)

This list can also be converted to a dataframe for further analysis using the simDf function.