[Note: **I've made a Jupyter Notebook (Python) for this so that you can mess around with a few of these ideas yourself. The figures come from this notebook.**]

⚫ ⚫ ⚫ ⚫

Let's motivate this with a problem. Suppose that a doctor has a way of testing for some disease: a patient's blood sample goes into a machine which analyses the blood and returns a probability that the patient has the disease. The doctor picks a value (called a threshold) and says, "Okay, if the probability the machine returns to me is higher than this threshold, we'll say they have the disease." Why not pick 50%? The machine itself may have some quirks; it is not an entirely accurate prediction machine. It also may be extremely expensive to treat a patient with the disease, so 51% may only warrant more testing instead of full treatment. Either way, there are four possibilities once the doctor classifies the patients with the machine using the threshold:

- The individual *does* have the disease and the doctor predicts *positive*.
- The individual *does* have the disease and the doctor predicts *negative*.
- The individual *does not* have the disease and the doctor predicts *positive*.
- The individual *does not* have the disease and the doctor predicts *negative*.
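These four outcomes can be sketched in a few lines of Python. This is a minimal illustration (not the notebook's code), and the probabilities and labels below are made up for the example:

```python
def outcome(prob, actual, threshold):
    """Classify one prediction as TP/FP/FN/TN given a threshold."""
    predicted_positive = prob > threshold
    if predicted_positive and actual:
        return "TP"   # has disease, predicted positive
    if not predicted_positive and actual:
        return "FN"   # has disease, predicted negative
    if predicted_positive and not actual:
        return "FP"   # no disease, predicted positive
    return "TN"       # no disease, predicted negative

# Hypothetical machine probabilities and true disease status (1 = has disease).
probs = [0.92, 0.35, 0.61, 0.08, 0.77]
labels = [1, 0, 1, 0, 0]
print([outcome(p, y, 0.5) for p, y in zip(probs, labels)])
# ['TP', 'TN', 'TP', 'TN', 'FP']
```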

Let's make a small diagram to show all of these, plus a few additional items we'll need soon.

| | Has Disease | Doesn't Have Disease |
|---|---|---|
| Test Positive | True Positive | False Positive |
| Test Negative | False Negative | True Negative |
| (Column Totals) | $$P$$ | $$N$$ |

This diagram is known as a *Confusion Matrix*. We're calling the column totals $$P$$ and $$N$$ at the bottom. We'll abbreviate the possible outcomes (center of table) by TP, FP, FN, and TN (True Positive, False Positive, False Negative and True Negative) for brevity.

A few values are worth talking about. The True Positive Rate is $$TP_{rate} = \frac{TP}{P}$$ and the False Positive Rate is $$FP_{rate} = \frac{FP}{N}$$. We can define the other rates similarly, but these two are the only ones we'll need.
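These two rates are easy to compute directly from the definitions above. Here's a small sketch (again, not the notebook's code; the example data is invented):

```python
def rates(probs, labels, threshold):
    """Return (TP rate, FP rate) for a given threshold.

    TP rate = TP / P, where P = number of actual positives.
    FP rate = FP / N, where N = number of actual negatives.
    """
    tp = sum(1 for p, y in zip(probs, labels) if p > threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p > threshold and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, fp / neg

# Hypothetical data: two sick patients, two healthy ones.
tpr, fpr = rates([0.9, 0.2, 0.8, 0.4], [1, 0, 1, 0], threshold=0.5)
print(tpr, fpr)  # 1.0 0.0 -- both positives caught, no false alarms
```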

Now, suppose that the doctor has thousands of patients and can afford to try different thresholds. The doctor might pick a threshold of 70% and see what the TP Rate and FP Rate are. Then the doctor might plot them on a chart like so:

This tells us that when the doctor picked 70% for a threshold (note that this is not displayed anywhere on the chart; we'd have to reverse the code a bit to find the threshold this point comes from), the False Positive Rate was less than 5% and the True Positive Rate was approximately 50%. Not fantastic. The ideal threshold would maximize the True Positive Rate and minimize the False Positive Rate.

What now? Well, what if we adjusted the 70% threshold? What if we tried every threshold from 0% to 100% and plotted them all?

This curve is what's referred to as a ROC ("receiver operating characteristic") curve. We notice that there are a few new things on this plot: color, and a dotted red line. The dotted red line marks where the TP and FP Rates are equal; in general, you never want to be below this line (why?). It's often put on ROC curves as a reference point, but it's not actually part of the curve itself. The other part I've included (which is also not usually part of a ROC curve) is colors for the points; I've made it so that dots which are more red correspond to lower threshold values, and dots which are more blue correspond to higher threshold values.
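The threshold sweep that produces this curve can be sketched like so. This is a simplified version of the idea, not the notebook's code, and the helper below recomputes the rates from scratch at each threshold:

```python
import numpy as np

def roc_points(probs, labels, n_thresholds=101):
    """Sweep thresholds from 0 to 1 and collect (FP rate, TP rate) pairs."""
    probs = np.asarray(probs)
    labels = np.asarray(labels)
    pos = labels.sum()
    neg = len(labels) - pos
    points = []
    for t in np.linspace(0, 1, n_thresholds):
        pred = probs > t
        tpr = (pred & (labels == 1)).sum() / pos
        fpr = (pred & (labels == 0)).sum() / neg
        points.append((fpr, tpr))
    return points

# At threshold 0 everything is flagged positive (top-right corner of the
# curve); at threshold 1 nothing is (bottom-left corner).
pts = roc_points([0.9, 0.1, 0.8, 0.4], [1, 0, 1, 0], n_thresholds=3)
print(pts)
```

Plotting the FP rate on the x-axis against the TP rate on the y-axis for all of these points traces out the ROC curve.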

What's the best threshold then? There are a number of sophisticated methods to pull out a good number, but let's try a naive one. Notice that from left-to-right, the curve rises quickly, then stops rising so quickly, then flattens out. Maybe we can see when it "stops rising so quickly" and find a good point around there? Let's plot the (noisy!) discrete derivative.
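The discrete derivative is just the rise in TP rate divided by the run in FP rate between consecutive points on the curve. A sketch, using invented ROC points rather than the ones from the notebook:

```python
import numpy as np

# Hypothetical ROC points, sorted by FP rate (from a threshold sweep).
fpr = np.array([0.0, 0.1, 0.2, 0.4, 0.7, 1.0])
tpr = np.array([0.0, 0.55, 0.75, 0.85, 0.95, 1.0])

# Discrete derivative: change in TP rate per unit change in FP rate.
slope = np.diff(tpr) / np.diff(fpr)
print(slope)  # steep at first, then flattening out
```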

Notice that the derivative begins to flatten out around 0.2 FP Rate (if we imagine a smooth curve running through it). Reasonable. After finding the threshold closest to this rate, we find the threshold to be around 0.38. The actual threshold I used to create this was 0.4, so we were fairly close. This value gives us a True Positive Rate of around 0.9 (when the patient has the disease, we correctly predict positive around 90% of the time) and a False Positive Rate of 0.2 (when the patient does not have the disease, we incorrectly predict positive around 20% of the time).
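"Finding the threshold closest to this rate" amounts to a nearest-value lookup over the sweep results. A sketch with made-up sweep data (the thresholds and FP rates below are illustrative, not the notebook's):

```python
import numpy as np

# Hypothetical threshold sweep results: each threshold's FP rate.
thresholds = np.linspace(0, 1, 11)
fp_rates = np.array([1.0, 0.9, 0.7, 0.5, 0.2, 0.1, 0.05, 0.02, 0.01, 0.0, 0.0])

# Pick the threshold whose FP rate is closest to the target of 0.2.
target_fpr = 0.2
best = thresholds[np.argmin(np.abs(fp_rates - target_fpr))]
print(best)
```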

ROC curves generalize to any classification problem with positive and negative outcomes where we predict using a classifier that returns a probability (e.g., logistic regression). The reader should also feel free to try different values of the threshold in the Jupyter notebook above to get a feel for ROC curves.