\documentclass[12pt]{article}

\usepackage{mathtools}

\title{Bayes's theorem and logistic regression}
\author{Allen B. Downey}

\newcommand{\logit}{\mathrm{logit}}
\renewcommand{\P}{\mathrm{P}}
\renewcommand{\O}{\mathrm{O}}
\newcommand{\LR}{\mathrm{LR}}
\newcommand{\LO}{\mathrm{LO}}
\newcommand{\LLR}{\mathrm{LLR}}
\newcommand{\OR}{\mathrm{OR}}
\newcommand{\LOR}{\mathrm{LOR}}
\newcommand{\IF}{\mathrm{if}}
\newcommand{\notH}{\neg H}

\setlength{\headsep}{3ex}
\setlength{\parindent}{0.0in}
\setlength{\parskip}{1.7ex plus 0.5ex minus 0.5ex}

\begin{document}

\maketitle

\begin{abstract}
My two favorite topics in probability and statistics are
Bayes's theorem and logistic regression. Because there are
similarities between them, I have always assumed that there is
a connection. In this note, I demonstrate the
connection mathematically, and (I hope) shed light on the
motivation for logistic regression and the interpretation of
the results.
\end{abstract}


\section{Bayes's theorem}

I'll start by reviewing Bayes's theorem, using an example that came up
when I was in grad school. I signed up for a class on Theory of
Computation. On the first day of class, I was the first to arrive. A
few minutes later, another student arrived. Because I was expecting
most students in an advanced computer science class to be male, I was
mildly surprised that the other student was female. Another female
student arrived a few minutes later, which was sufficiently
surprising that I started to think I was in the wrong room. When
another female student arrived, I was confident I was in the wrong
place (and it turned out I was).

As each student arrived, I used the observed data to update my
belief that I was in the right place. We can use Bayes's theorem to
quantify the calculation I was doing intuitively.

I'll use $H$ to represent the hypothesis that I was in the right
room, and $F$ to represent the observation that the first other
student was female. Bayes's theorem provides an algorithm for
updating the probability of $H$:

\[ \P(H|F) = \P(H)~\frac{\P(F|H)}{\P(F)} \]

where

\begin{itemize}

\item $\P(H)$ is the prior probability of $H$ before the other
student arrived.

\item $\P(H|F)$ is the posterior probability of $H$, updated based
on the observation $F$.

\item $\P(F|H)$ is the likelihood of the data, $F$, assuming that
the hypothesis is true.

\item $\P(F)$ is the total probability of the data, regardless of
whether $H$ is true.

\end{itemize}

Before I saw the other students, I was confident I was in the right
room, so I might assign $\P(H)$ something like 90\%.

When I was in grad school, most advanced computer science classes were
90\% male, so if I was in the right room, the likelihood that the
first student would be female was only 10\%, and the likelihood of
three female students was only 0.1\%.

If we don't assume I was in the right room, then the likelihood that
the first student would be female was more like 50\%, so the likelihood
of all three was 12.5\%.

Plugging those numbers into Bayes's theorem yields $\P(H|F) = 0.64$
after one female student, $\P(H|FF) = 0.26$ after the second,
and $\P(H|FFF) = 0.07$ after the third.
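To make the arithmetic explicit, here is the first update. Expanding
$\P(F)$ with the law of total probability,

\[ \P(F) = \P(F|H)\,\P(H) + \P(F|\notH)\,\P(\notH)
         = (0.1)(0.9) + (0.5)(0.1) = 0.14 \]

so

\[ \P(H|F) = \frac{(0.9)(0.1)}{0.14} \approx 0.64 \]

The second and third updates work the same way, with the posterior
from each step serving as the prior for the next.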
\section{Logistic regression}

Logistic regression is based on the following functional form:

\[ \logit(p) = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n \]

where the dependent variable, $p$, is a probability,
the $x$s are explanatory variables, and the $\beta$s are
coefficients we want to estimate. The $\logit$ function is the
log-odds, or

\[ \logit(p) = \ln \left( \frac{p}{1-p} \right) \]

When you present logistic regression like this, it raises
three questions:

\begin{itemize}

\item Why is $\logit(p)$ the right choice for the dependent
variable?

\item Why should we expect the relationship between $\logit(p)$
and the explanatory variables to be linear?

\item How should we interpret the estimated parameters?

\end{itemize}

The answer to all of these questions turns out to be Bayes's
theorem. To demonstrate that, I'll use a simple example where
there is only one explanatory variable. But the derivation
generalizes to multiple regression.

On notation: I'll use $\P(H)$ for the probability
that some hypothesis, $H$, is true. $\O(H)$ is the odds of the same
hypothesis, defined as

\[ \O(H) = \frac{\P(H)}{1 - \P(H)} \]

I'll use $\LO(H)$ to represent the log-odds of $H$:

\[ \LO(H) = \ln \O(H) \]

I'll also use $\LR$ for a likelihood ratio, and $\OR$ for an odds
ratio. Finally, I'll use $\LLR$ for a log-likelihood ratio, and
$\LOR$ for a log-odds ratio.


\section{Making the connection}

To demonstrate the connection between Bayes's theorem and
logistic regression, I'll start with the odds form
of Bayes's theorem. Continuing the previous example,
I could write

\begin{equation} \label{A}
\O(H|F) = \O(H)~\LR(F|H)
\end{equation}

where

\begin{itemize}

\item $\O(H)$ is the prior odds that I was in the right room,

\item $\O(H|F)$ is the posterior odds after seeing one female student,

\item $\LR(F|H)$ is the likelihood ratio of the data, given
the hypothesis.

\end{itemize}

The likelihood ratio of the data is:

\[ \LR(F|H) = \frac{\P(F|H)}{\P(F|\notH)} \]

where $\notH$ means $H$ is false.

Noticing that logistic regression is expressed in terms of
log-odds, my next move is to write the log-odds form of
Bayes's theorem by taking the log of Eqn~\ref{A}:

\begin{equation} \label{B}
\LO(H|F) = \LO(H) + \LLR(F|H)
\end{equation}
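As a quick check, the classroom example works out the same way in
odds form. The prior odds are $\O(H) = 0.9/0.1 = 9$ and the
likelihood ratio is $\LR(F|H) = 0.1/0.5 = 0.2$, so

\[ \O(H|F) = 9 \times 0.2 = 1.8 \]

which corresponds to $\P(H|F) = 1.8/2.8 \approx 0.64$, as before.
Equivalently, in log-odds form, Eqn~\ref{B} adds
$\LLR(F|H) = \ln 0.2 \approx -1.61$ to the prior log-odds
$\LO(H) = \ln 9 \approx 2.20$, yielding
$\LO(H|F) = \ln 1.8 \approx 0.59$.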
If the first student to arrive had been male, we would write

\begin{equation} \nonumber
\LO(H|M) = \LO(H) + \LLR(M|H)
\end{equation}

Or, more generally, if we use $X$ as a variable to represent
the sex of the observed student, we would write

\begin{equation} \label{D}
\LO(H|X) = \LO(H) + \LLR(X|H)
\end{equation}

I'll assign $X=0$ if the observed student is female and
$X=1$ if male. Then I can write:

\begin{equation} \nonumber
\LLR(X|H) = \left\{
\begin{array}{lr}
\LLR(F|H) & \IF ~X = 0\\
\LLR(M|H) & \IF ~X = 1
\end{array}
\right.
\end{equation}

Or we can collapse these two expressions into one by using
$X$ as a multiplier:

\begin{equation} \label{F}
\LLR(X|H) = \LLR(F|H) + X [\LLR(M|H) - \LLR(F|H)]
\end{equation}


\section{Odds ratios}

The next move is to recognize that
the part of Eqn~\ref{F} in brackets is the log-odds ratio
of $H$. To see that, we need to look more closely at odds ratios.

Odds ratios are often used in medicine to describe the association
between a disease and a risk factor. In the example scenario, we
can use an odds ratio to express the odds of the hypothesis
$H$ if we observe a male student, relative to the odds if we
observe a female student:

\[ \OR_X(H) = \frac{\O(H|M)}{\O(H|F)} \]

I'm using the notation $\OR_X$ to represent the odds ratio
associated with the variable $X$.

Applying the odds form of Bayes's theorem (Eqn~\ref{A}) to
the numerator and denominator of the previous expression yields

\[ \OR_X(H) = \frac{\O(H)~\LR(M|H)}{\O(H)~\LR(F|H)} =
\frac{\LR(M|H)}{\LR(F|H)} \]

Taking the log of both sides yields

\begin{equation} \label{G}
\LOR_X(H) = \LLR(M|H) - \LLR(F|H)
\end{equation}

This result should look familiar, since it appears in
Eqn~\ref{F}.


\section{Conclusion}

Now we have all the pieces we need; we just have to assemble them.
Combining Eqns~\ref{F} and \ref{G} yields

\begin{equation} \label{H}
\LLR(X|H) = \LLR(F|H) + X~\LOR_X(H)
\end{equation}

Combining Eqns~\ref{D} and \ref{H} yields

\begin{equation} \label{I}
\LO(H|X) = \LO(H) + \LLR(F|H) + X~\LOR_X(H)
\end{equation}

Finally, combining Eqns~\ref{B} and \ref{I} yields

\[ \LO(H|X) = \LO(H|F) + X~\LOR_X(H) \]

We can think of this equation as the log-odds form of Bayes's theorem,
with the update term expressed as a log-odds ratio. Let's compare
that to the functional form of logistic regression:

\[ \logit(p) = \beta_0 + X \beta_1 \]

The correspondence between these equations suggests the following
interpretation:

\begin{itemize}

\item The predicted value, $\logit(p)$, is the posterior log-odds
of the hypothesis, given the observed data.

\item The intercept, $\beta_0$, is the posterior log-odds of the
hypothesis when $X=0$.

\item The coefficient of $X$, $\beta_1$, is a log-odds ratio
that represents the odds of $H$ when $X=1$, relative to
when $X=0$.

\end{itemize}

This relationship between logistic regression and Bayes's theorem
tells us how to interpret the estimated coefficients. It also
answers the questions I posed at the beginning of this note:
the functional form of logistic regression makes sense because
it corresponds to the way Bayes's theorem uses data to update
probabilities.
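To make the correspondence concrete, we can fill in the numbers from
the classroom example. The intercept is the posterior log-odds after
observing a female student,

\[ \beta_0 = \LO(H|F) = \ln 1.8 \approx 0.59 \]

and the coefficient is the log-odds ratio

\[ \beta_1 = \LOR_X(H) = \ln \frac{0.9/0.5}{0.1/0.5} = \ln 9 \approx 2.2 \]

So if the first student to arrive had been male, the predicted
log-odds would have been $\beta_0 + \beta_1 = \ln 16.2 \approx 2.8$,
which corresponds to $\O(H|M) = 16.2$ and $\P(H|M) \approx 0.94$;
seeing a male student would have made me more confident I was in
the right room.

\end{document}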