Isolating Vocals from a Music Track

How can we isolate the vocals from a particular song? I ran into this problem a few days ago. It turns out that, in practice, this is easy for many modern (popular) songs and much harder for many others. Because the method is so simple and so nice, we'll focus on a particular kind of modern song where the solution is trivial: songs that have a karaoke version.

Musicians or their labels often release karaoke versions (that is, the same song without the vocal track). You might be surprised at what's available; I was surprised to see Rancid and Radiohead but less surprised to see Bon Jovi. How does having a track with no vocals help us get the vocals out of the original track?

For the sake of simplicity, we'll assume that the recording is perfect (it is not) and that digital representations of songs are mixtures of pure waves (they are not). This is a reasonable approximation and, in practice, seems to work well with the method we'll propose below. The only other requirement is that the karaoke version of the song closely matches the original song's background track (which usually seems to be the case).

To solve this, we note that a song can be represented by a huge sum of $$\sin$$ and $$\cos$$ terms. In fact, under the assumptions above, any sound can be approximated this way. Let's call the background parts of a song $$Bg$$ and the vocals $$Vocals$$. Then the whole song, $$Song$$, is equal to the background parts plus the vocals; that is, $$Song = Bg + Vocals$$. Now, we note that the karaoke version contains only the background parts, $$Bg$$. The solution, then, is to subtract the karaoke version from the original song. That is, \[Song - Bg = (Bg + Vocals) - Bg = Vocals.\]
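
To spell out the cancellation term by term, here is a sketch under the same simplifying assumptions, plus one extra assumption of mine: both parts are finite sums over a shared base frequency $$\omega$$. The coefficients $$a_n, b_n, c_n, d_n$$ and the cutoff $$N$$ are illustrative symbols, not quantities measured from a real recording. \[\begin{aligned} Bg(t) &= \sum_{n=0}^{N} \big(a_n \cos(n \omega t) + b_n \sin(n \omega t)\big), \\ Vocals(t) &= \sum_{n=0}^{N} \big(c_n \cos(n \omega t) + d_n \sin(n \omega t)\big), \\ Song(t) &= \sum_{n=0}^{N} \big((a_n + c_n) \cos(n \omega t) + (b_n + d_n) \sin(n \omega t)\big), \\ Song(t) - Bg(t) &= \sum_{n=0}^{N} \big(c_n \cos(n \omega t) + d_n \sin(n \omega t)\big) = Vocals(t). \end{aligned}\] Every background coefficient cancels exactly, which is why the karaoke version has to match the original background so closely.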

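To show that this is really something that we can do, we can experiment in Python. The code below builds a toy song out of random sine waves, adds the part we want on top, and plots what is left after subtracting the background.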

import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
# Jupyter notebook magic; remove this line when running as a plain script.
%matplotlib inline

def sin(n, x, phi=0):
    """Sine wave with frequency n and phase shift phi, evaluated at x."""
    return np.sin(n*x + phi)

X = np.linspace(1, 6, 200)
# create random background part.
# 100 sine waves, each with a random amplitude and a random phase.
background_part = sum(
    np.random.normal() * sin(N, X, np.random.normal())
    for N in range(100)
)

# create part that we want.
part_we_want = 3*sin(3, X, 2) + sin(5, X, 4) 

# The song is the sum of these waves.
simple_song = background_part + part_we_want


# Plot the whole thing.  Make subplots.
f, (ax1, ax2) = plt.subplots(
    nrows=2,
    ncols=1,
    figsize=(10, 5),
    sharex=True,
    sharey=True,
)

f.subplots_adjust(hspace = .25) # space between plots.

ax1.set_title("Whole Song with Part We Want in Green")
ax1.plot(X, simple_song)
# Set the color explicitly so the plot matches the title on any matplotlib version.
ax1.plot(X, part_we_want, color="green")

ax2.set_title("Whole Song Minus the Background Part")
ax2.plot(X, simple_song - background_part)
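
The same subtraction can be sketched on arrays shaped like decoded audio. This is a minimal sketch, not a full pipeline: the helper `subtract_background`, the 8000 Hz sample rate, the `scale` factor, and the synthetic tones are all assumptions made for illustration, and loading real files is left out. It converts 16-bit samples to floats before subtracting (integer subtraction can wrap around), and it also shows how much a one-sample misalignment hurts.

```python
import numpy as np

def subtract_background(song, karaoke):
    """Hypothetical helper: sample-wise song - karaoke, computed in float64."""
    song = np.asarray(song, dtype=np.float64)
    karaoke = np.asarray(karaoke, dtype=np.float64)
    n = min(len(song), len(karaoke))  # trim to the shorter track
    return song[:n] - karaoke[:n]

# Synthetic stand-ins for audio: one second at an assumed 8000 Hz sample rate.
rate = 8000
t = np.arange(rate) / rate
bg = 0.5 * np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 330 * t)
vocals = 0.4 * np.sin(2 * np.pi * 440 * t)

# Store both tracks as 16-bit integers, the way many audio files do.
scale = 20000  # assumed scale factor; keeps every sample inside the int16 range
song_i16 = np.round((bg + vocals) * scale).astype(np.int16)
karaoke_i16 = np.round(bg * scale).astype(np.int16)

# Aligned tracks: only rounding error remains.
recovered = subtract_background(song_i16, karaoke_i16) / scale
error = np.max(np.abs(recovered - vocals))

# Karaoke track one sample late: the background no longer cancels.
late = subtract_background(song_i16[1:], karaoke_i16[:-1]) / scale
misaligned_error = np.max(np.abs(late - vocals[1:]))

print("aligned error:   ", error)
print("misaligned error:", misaligned_error)
```

Even a one-sample offset leaves a clearly audible amount of background behind, so with real tracks the two files would need the same sample rate and a careful alignment before subtracting.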