Multiclass One-Vs-All with Logistic Regression.

Introduction.

A quick post to give you an idea of one way to treat a one-vs-all problem with logistic regression. Suppose we have a bunch of features, and instead of a binary classification we have classes like "Not Sick", "Stomach Ache", "Fever", "Headache", and so on. We want to classify these, but logistic regression will only classify things as 0 or 1.
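
As a quick reminder of why that is: a plain logistic regression models the probability of the positive class as

$$P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta^{T} x)}},$$

so by construction it only ever separates one class (the "1"s) from everything else (the "0"s).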

The trick here is to run a whole bunch of logistic regressions on the data, each time letting one of the classes be "1" and treating all of the other classes as "0". Once we've run all of the logistic regressions, we look at each item's predicted probability of being a "1" under each regression and take the argmax: that is, we pick the class that played the role of "1" for the logistic regression that predicted the highest probability for that point.

For example, suppose we had the classes "Sick", "Stomach Ache", and "Fever", and for some row of data $x$ we got the following predicted probabilities that $x$ is a "1":

  • logistic regression 0 (corresponding to "Sick" == "1" and everything else is 0) predicts 0.99
  • logistic regression 1 (corresponding to "Stomach Ache" == "1" and everything else is 0) predicts 0.18
  • logistic regression 2 (corresponding to "Fever" == "1" and everything else is 0) predicts 0.43

We would pick the logistic regression with "Sick" == "1" and everything else 0; this would tell us that our row $x$ is most likely "Sick".
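
In code, that decision is just an argmax over the per-class probabilities. Here's a minimal sketch of the idea using only NumPy and the made-up numbers above:

In [ ]:
import numpy as np

classes = ["Sick", "Stomach Ache", "Fever"]
probs = np.array([0.99, 0.18, 0.43])  # P("1") from each one-vs-rest regression

# argmax gives the index of the winning regression, which maps back to a class.
print(classes[np.argmax(probs)])  # "Sick"

We'll do exactly this below, just with real fitted models instead of made-up probabilities.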

Elementary Example.

Let's show how this works in detail, then (as usual) we'll show an easier way to do it.

In [207]:
# The usual imports
import numpy as np
import pandas as pd

from bokeh.plotting import figure, show, output_notebook
from bokeh.layouts import gridplot
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

output_notebook()
BokehJS successfully loaded.
In [164]:
# Creating data.
data, target = make_blobs(n_samples=40, 
                          n_features=2, 
                          centers=3, 
                          cluster_std=0.75,
                          random_state=12345)

# Plotting this data to show the classes.
color_dict = {0: "red", 1: "blue", 2: "green"}
colors = [color_dict[t] for t in target]

p = figure(title="Our Three Classes", width=350, height=350)
p.scatter(x=data[:,0], y=data[:,1], color=colors)
q = show(p)

We'll now make a function that does exactly what we described above: turn the target into an array such that one label is "1" and the rest are "0".

In [41]:
def make_one_vs_rest(y, pos_class):
    """
    desc: 
        Creates a one-vs-rest target value array.
        
    args: 
        y (np array) : the target values.
        pos_class : the name of the class that will be set to 1.  all 
                    other classes will be set to 0.
    
    returns:
        np array of new target values with 1's for pos_class and 0 else.
    """
    
    # Check to see if the value exists in y.
    if sum(y == pos_class) == 0:
        raise ValueError("No such class value: {}".format(pos_class))

    return (y == pos_class).astype(int)
In [53]:
# Make one-vs-rest targets for our example.  We will make this more compact later.
y_0 = make_one_vs_rest(target, 0)
y_1 = make_one_vs_rest(target, 1)
y_2 = make_one_vs_rest(target, 2)

color_dict = {0: "red", 1: "blue", 2: "green"}
colors_0 = [color_dict[t] for t in y_0]
colors_1 = [color_dict[t] for t in y_1]
colors_2 = [color_dict[t] for t in y_2]

p_0 = figure(title="0 vs. Rest", width=350, height=350)
p_1 = figure(title="1 vs. Rest", width=350, height=350)
p_2 = figure(title="2 vs. Rest", width=350, height=350)

p_0.scatter(x=data[:,0], y=data[:,1], color=colors_0)
p_1.scatter(x=data[:,0], y=data[:,1], color=colors_1)
p_2.scatter(x=data[:,0], y=data[:,1], color=colors_2)

q = show(gridplot([[p_0, p_1],[p_2, None]]))
In [206]:
# Perform the logistic regressions.
# No train/test split yet; we'll just fit on the whole set for demonstration.
logreg_0 = LogisticRegression().fit(data, y_0)
logreg_1 = LogisticRegression().fit(data, y_1)
logreg_2 = LogisticRegression().fit(data, y_2)

This next cell is to demonstrate what we're doing. Notice the "true" value (last column) corresponds to the column index containing the highest probability for that row.

In [191]:
np.set_printoptions(precision=3, suppress=True) #make printing nice for this

np.column_stack([logreg_0.predict_proba(data)[:, 1][:15],
                logreg_1.predict_proba(data)[:, 1][:15],
                logreg_2.predict_proba(data)[:, 1][:15],
                target[:15]])
Out[191]:
array([[ 0.998,  0.   ,  0.005,  0.   ],
       [ 0.192,  0.023,  0.942,  2.   ],
       [ 0.999,  0.   ,  0.002,  0.   ],
       [ 0.991,  0.001,  0.013,  0.   ],
       [ 0.996,  0.   ,  0.019,  0.   ],
       [ 0.122,  0.102,  0.889,  2.   ],
       [ 0.991,  0.001,  0.031,  0.   ],
       [ 0.018,  1.   ,  0.001,  1.   ],
       [ 0.055,  1.   ,  0.   ,  1.   ],
       [ 0.12 ,  0.015,  0.983,  2.   ],
       [ 0.019,  1.   ,  0.001,  1.   ],
       [ 0.992,  0.002,  0.009,  0.   ],
       [ 0.095,  0.007,  0.995,  2.   ],
       [ 0.994,  0.001,  0.01 ,  0.   ],
       [ 0.02 ,  1.   ,  0.001,  1.   ]])
In [193]:
# Take the argmax, which is exactly what the previous cell showed we should do.
logreg_probabilities = np.c_[logreg_0.predict_proba(data)[:, 1],
                             logreg_1.predict_proba(data)[:, 1],
                             logreg_2.predict_proba(data)[:, 1]]

predicted_class = np.argmax(logreg_probabilities, axis=1)
In [194]:
# Seeing if the prediction worked!

color_dict = {0: "red", 1: "blue", 2: "green"}
colors = [color_dict[t] for t in predicted_class]

p = figure(title="Our Three Classes (Predicted)", width=375, height=350)
p.scatter(x=data[:,0], y=data[:,1], color=colors)
q = show(p)
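
For completeness, here's a minimal sketch of one easier route: scikit-learn's OneVsRestClassifier will do all of the one-vs-rest bookkeeping for you, fitting one logistic regression per class and taking the argmax of the scores internally. (This is one built-in way to do it; depending on your scikit-learn version, LogisticRegression will also accept multiclass targets directly.)

In [ ]:
from sklearn.multiclass import OneVsRestClassifier

# Fit one logistic regression per class and pick the highest-scoring class.
ovr = OneVsRestClassifier(LogisticRegression()).fit(data, target)
predicted_class_easy = ovr.predict(data)

# Sanity check: this should agree with our hand-rolled predictions.
print(np.mean(predicted_class_easy == predicted_class))

This is the same trick we did by hand, just wrapped up in a single estimator.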