Data science plays a major role in tennis; you can learn about recent AI tools developed by IBM from this Yahoo article.

We will analyze the Tennis Major Tournament Match Statistics Data Set from the UCI ML repository. The data set has one row per match from the four major tennis tournaments of 2013 (Australian Open, French Open, US Open, and Wimbledon).

Let’s load the data and familiarize ourselves with it:

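# stringsAsFactors = TRUE keeps the player names as factors on R >= 4.0, matching the str() output below
d = read.csv("data/tennis.csv", stringsAsFactors = TRUE)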
dim(d)
## [1] 943  44
str(d[,1:5])
## 'data.frame':    943 obs. of  5 variables:
##  $ Player1: Factor w/ 478 levels "A Barty","A Cornet",..: 268 262 299 109..
##  $ Player2: Factor w/ 472 levels "A Dulgheru","A Kerber",..: 332 34 110 3..
##  $ Round  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Result : int  0 1 0 1 0 0 0 1 0 1 ...
##  $ FNL1   : int  0 3 0 3 1 1 2 2 0 3 ...

Let’s look at a few columns of five randomly selected rows of the data:

d[sample(1:943,size = 5),c("Player1","Player2","Round","Result","gender","surf")]
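Because sample() draws different rows on every run, setting a seed makes the preview reproducible, and nrow() avoids hard-coding the row count. The following is only a minimal sketch: `toy` is a hypothetical stand-in data frame (not the tennis data), and the seed value 42 is arbitrary.

```r
# Hypothetical stand-in for d, so the sketch runs without the CSV file
toy = data.frame(id = 1:10, Round = rep(1:2, each = 5))

# Same seed, same sample: the two draws below pick identical rows
set.seed(42)
first = toy[sample(nrow(toy), size = 5), c("id", "Round")]

set.seed(42)
second = toy[sample(nrow(toy), size = 5), c("id", "Round")]

print(first)
print(identical(first, second))
```

With the real data, `d` would take the place of `toy`, with the column names used above.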

We have data for 943 matches and for each match we have 44 columns, including names of the players, their gender, surface type and match statistics.

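Let’s look at the number of break points won (BPW) by each player. We will draw a scatter plot of BPW for Player 1 against BPW for Player 2 and color each dot according to the match outcome. Because break-point counts are integers and many matches share the same values, the code adds random noise (rnorm) to each coordinate so that overlapping points become visible.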

n = dim(d)[1]
plot(d$BPW.1+rnorm(n),d$BPW.2+rnorm(n), pch=21, col=d$Result+2, cex=0.6, bg="yellow", lwd=0.8,
     xlab="BPW by Player 1", ylab="BPW by Player 2")
legend("bottomright", c("P1 won", "P2 won"), col=c(3,2), pch=21, pt.bg="yellow", bty='n')
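
To put a number on what the scatter plot suggests, one option is to cross-tabulate which player won more break points against the match result. The sketch below is an illustration under assumptions, not part of the original analysis: `toy` is a small made-up data frame (hypothetical values, not taken from the data set) with the same column names BPW.1, BPW.2, and Result, and Result is assumed to be 1 when Player 1 won.

```r
# Toy stand-in for d: made-up values, same column names as the tennis data
toy = data.frame(
  BPW.1  = c(5, 2, 4, 1, 6, 3),
  BPW.2  = c(3, 4, 4, 5, 2, 6),
  Result = c(1, 0, 1, 0, 1, 1)   # assumed coding: 1 = Player 1 won
)

# Label each match by who won more break points
more_bpw = ifelse(toy$BPW.1 > toy$BPW.2, "P1 more",
           ifelse(toy$BPW.1 < toy$BPW.2, "P2 more", "tie"))

# Rows: break-point advantage; columns: match result
tab = table(more_bpw, Result = toy$Result)
print(tab)

# Share of non-tied matches won by the player with more break points
decided = more_bpw != "tie"
p1_more = toy$BPW.1 > toy$BPW.2
agree = mean(p1_more[decided] == (toy$Result[decided] == 1))
print(agree)
```

With the real data, `d` would take the place of `toy`. If the BPW columns contain missing values, those rows would need to be dropped before computing the share.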