# Chapter 5 Logistic Regression

When the value \(y\) we are trying to predict is *categorical* (or *qualitative*), we have a *classification* problem. For a binary output we predict the probability that the event occurs,
\[
p ( Y=1 | X = x ),
\]
where \(X = (x_1,\ldots,x_p)\) is our usual list of predictors.

Suppose that we have a binary response \(y\) taking the value \(0\) or \(1\):

- Win or lose
- Sick or healthy
- Buy or not buy
- Pay or default

The goal is to predict the probability that \(y\) equals \(1\). You can then use that probability to categorize a new data point. Assessing credit risk with default data is a typical problem.

\(y\): whether or not a customer defaults on their credit card (No or Yes).

\(x\): the average balance the customer has remaining on their credit card after making their monthly payment, plus as many other features as you think might predict \(y\).

A linear model is a powerful tool for finding relations among variables: \[y = x^T\beta + \epsilon.\]

When relations are non-linear we can do several things:

- Add squared terms
- Add interactions
- Transform \(y\) or some of the \(x\) variables

This works assuming that the \(y\) variable is continuous and ranges over \((-\infty,+\infty)\). Another assumption is that the conditional distribution of \(y\) is normal: \(y \mid x \sim N(x^T\beta, \sigma^2)\).
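To see why this matters for a binary response, here is a minimal sketch on simulated data (the data-generating process and seed are made up for illustration): fitting an ordinary linear model to a 0/1 outcome happily produces "probabilities" outside \([0,1]\).

```
# A linear model fit to a 0/1 response can give fitted values outside [0, 1].
set.seed(1)
x = seq(-4, 4, length.out = 200)
y = as.numeric(x + rnorm(200) > 0)   # binary outcome, more likely 1 for large x
linMod = lm(y ~ x)                   # ordinary least squares on 0/1 data
range(fitted(linMod))                # typically extends below 0 and above 1
```

This is the basic failure mode that motivates transforming the linear predictor before interpreting it as a probability.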

What do we do when the assumption of a conditional normal distribution does not hold? For example, \(y\) can be a binary variable with values 0 and 1, such as:

- Outcome of an election
- Result of a spam filter
- Decision variable for loan approval

We model the response \(\{0,1\}\) using a continuous variable \(y \in [0,1]\), which is interpreted as the probability that the response equals 1: \[p( y= 1 \mid x_1, \ldots , x_p ) = F \left ( \beta_1 x_1 + \ldots + \beta_p x_p \right ),\] where \(F\) is increasing and \(0< F(x)<1\).

It seems logical to find a transformation \(F\) so that \(F(x^T\beta + \epsilon) \in [0,1]\). Then we can predict using \(F(x^T\beta)\) and interpret the result as a probability, i.e. if \(F(x^T\beta) = z\) then we interpret it as \(p(y=1) = z\). The inverse \(F^{-1}\) of such a function is called a link function.

Do we know a function that maps any real number to a number in the \([0,1]\) interval? What about a cumulative distribution function \(F(x) = p(Z \le x)\)? If we choose the CDF \(\Phi(x)\) of \(N(0,1)\) then we have \[y = \Phi(x^T\beta + \epsilon)\] \[\Phi^{-1}(y) = x^T\beta + \epsilon\] \[y' = x^T\beta + \epsilon,\] where \(y' = \Phi^{-1}(y)\).
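As a quick sanity check (a minimal sketch using base R's `pnorm` and `qnorm`), the normal CDF squashes any real number into \((0,1)\), and its inverse undoes the transformation:

```
# pnorm is the N(0,1) CDF Phi; qnorm is its inverse Phi^{-1}
z = c(-5, -1, 0, 1, 5)
p = pnorm(z)          # every value lands strictly inside (0, 1)
all(p > 0 & p < 1)    # TRUE
qnorm(pnorm(2))       # recovers 2: qnorm inverts pnorm
```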

You can think of this as a change of units for the variable \(y\). In this specific case, when we use the normal CDF, the resulting model is called probit, which stands for probability unit. The resulting link function is \(\Phi^{-1}\), and now \(\Phi^{-1}(y)\) follows a normal distribution!

This term was coined in the 1930s by biologists studying the dosage-cure rate link.

```
# We can fit a probit model using the glm function in R
probitModel = glm(y ~ x, family = binomial(link = "probit"))
(mc = as.double(coef(probitModel)))
# we want to predict the outcome for x = -1
xnew = -1
(yt = mc[1] + mc[2]*xnew)    # linear predictor x^T beta
(pnorm(yt))                  # transform to a probability via the normal CDF
(pred = predict(probitModel, list(x = c(xnew)), type="response"))
```

Our prediction is the blue area, which is equal to 0.0219.

```
# let's look at how well the probit model fits the data
pred_probit = predict(probitModel, list(x = x), type="response")
plot(x,pred_probit, pch=16, col="red", cex=0.5)
lines(x,y, type='p', pch=16, cex=0.2)
abline(v=yt)
abline(h =pnorm(yt) )
```

A couple of observations: (i) this fits the data much better than the linear estimation, and (ii) it always lies between 0 and 1.

Instead of thinking of \(y\) as a probability and transforming the right-hand side of the linear model, we can think of transforming \(y\) so that the transformed variable lies in \((-\infty,+\infty)\).

We can use the odds, which we talked about before: \[\dfrac{y}{1-y}.\]

The odds lie in the interval \((0,+\infty)\). Almost what we need, but not exactly. Can we do another transform that maps \((0,+\infty)\) to \((-\infty,+\infty)\)? \[\log\left(\dfrac{y}{1-y}\right)\] will do the trick!
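To see explicitly why this works, set \(\log\left(\frac{y}{1-y}\right) = z\) and solve for \(y\): \[\frac{y}{1-y} = e^{z} \quad\Longrightarrow\quad y = \frac{e^{z}}{1+e^{z}} = \frac{1}{1+e^{-z}},\] which maps any \(z \in (-\infty,+\infty)\) back to a probability in \((0,1)\).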

This function is called the logit function, and it is the inverse of the sigmoidal "logistic" function, or logistic transform. The model is linear in the log-odds: \[\log \left ( \frac{ p \left ( y=1 \mid x \right ) }{ 1 - p \left ( y=1 \mid x \right ) } \right ) = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p.\] These models are easy to fit in R:

`glm( y ~ x1 + x2, family="binomial")`

`family="binomial"` indicates that \(y\) is \(0\) or \(1\).

`glm` has a bunch of other options.

Outside of specific fields, e.g. behavioral economics, the logistic function is a much more popular choice than the probit model. Besides the fact that the logit transform is more intuitive to work with, it also has several nice properties when we deal with multiple classes (more than 2). Also, it is computationally easier than working with normal distributions.

The logistic CDF is very similar in shape to the normal CDF used by the probit model.
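We can check this numerically. A classic result from the probit/logit literature says the logistic CDF stays within about 0.01 of a rescaled normal CDF everywhere, with scaling constant 1.702; a small sketch:

```
# Compare the logistic CDF (plogis) with a rescaled normal CDF (pnorm).
# The classic scaling constant 1.702 makes the two curves nearly coincide.
z = seq(-10, 10, by = 0.01)
max_gap = max(abs(plogis(z) - pnorm(z / 1.702)))
max_gap < 0.01   # TRUE: the curves differ by less than 0.01 everywhere
```

This is why logit and probit fits are usually indistinguishable in practice, up to a rescaling of the coefficients.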

```
# compare the logit and probit fits on the same data
logitModel = glm(y ~ x, family = binomial(link = "logit"))
pred_logit = predict(logitModel, list(x = x), type = "response")
plot(x, pred_probit, pch = 20, col = "red", cex = 0.9, ylab = "y")   # probit fit
lines(x, pred_logit, type = 'p', pch = 20, cex = 0.5, col = "blue")  # logit fit
lines(x, y, type = 'p', pch = 21, cex = 0.5, bg = "lightblue")       # observed data
legend("bottomright", pch = 20, legend = c("Logit", "Probit"), col = c("blue", "red"), y.intersp = 2)
```