Certain analyses require normally distributed data. But what can you do when your data isn't normally distributed?

In [1]:

```
# the usual imports.
import numpy as np
from scipy.stats import skew, probplot
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
```

In [2]:

```
skew_norm_dist = np.random.gamma(4, 2, size=200)
f, ax = plt.subplots(1, 1)
ax.set_title("Skewed Plot")
p = ax.hist(skew_norm_dist)
```
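The `skew` function we imported can put a number on that asymmetry: a normal sample has skewness near 0, while a right-skewed sample like this one is clearly positive. A quick self-contained check (the seed and variable names here are just for illustration):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
sample = rng.gamma(4, 2, size=200)  # same shape/scale as the plot above
print(skew(sample))  # positive, since the gamma sample is right-skewed
```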

One common fix is the *Tukey Transformation*, which raises each value to a power $\lambda$. Let's see how that works.

In [3]:

```
def tukey_transform(data, L):
    """Transform data by raising it to the power L."""
    return pow(data, L)
```
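Strictly, Tukey's ladder of powers also covers $\lambda \le 0$: $\log(x)$ at $\lambda = 0$, and a sign-flipped power for $\lambda < 0$ so the transform stays order-preserving. A fuller sketch, written here as a hypothetical `tukey_ladder` helper (not part of the original notebook):

```python
import numpy as np

def tukey_ladder(data, L):
    """Tukey's ladder of powers: order-preserving for every lambda."""
    data = np.asarray(data, dtype=float)
    if L > 0:
        return data ** L
    if L == 0:
        return np.log(data)   # the lambda -> 0 limit (up to shift and scale)
    return -(data ** L)       # flip sign so larger x still maps to larger output
```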

In [4]:

```
skew_norm_dist = np.random.gamma(4, 2, size=200)
f, axes = plt.subplots(2, 2, figsize=(10,10))
axes[0,0].set_title("Lambda = 0.1")
axes[0,1].set_title("Lambda = 0.3")
axes[1,0].set_title("Lambda = 0.5")
axes[1,1].set_title("Lambda = 0.7")
axes[0,0].hist(tukey_transform(skew_norm_dist, 0.1))
axes[0,1].hist(tukey_transform(skew_norm_dist, 0.3))
axes[1,0].hist(tukey_transform(skew_norm_dist, 0.5))
axes[1,1].hist(tukey_transform(skew_norm_dist, 0.7))
```

Those look a lot more normal! A similar transform (which seems to be more widely known) is the *Box-Cox Transform*, defined as $x\mapsto \frac{x^{\lambda} - 1}{\lambda}$. On the surface, the difference between it and the Tukey Transform is just a shift and a rescaling. One interesting property of the Box-Cox transform is that as $\lambda\rightarrow 0$ it becomes $\log(x)$.

That is, as $\lambda\rightarrow 0$ the Tukey Transform sends every value to 1 (with the $x = 0$ case simply defined to be 1), while the Box-Cox transform sends $x$ to $\log(x)$.
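We can check that limit numerically. A quick self-contained sketch (the transform is defined inline so this cell stands alone):

```python
import numpy as np

def box_cox(x, L):
    return (x ** L - 1) / L

x = np.array([0.5, 1.0, 2.0, 8.0])
for L in [1.0, 0.1, 0.001]:
    # largest gap between the Box-Cox transform and log(x)
    print(L, np.max(np.abs(box_cox(x, L) - np.log(x))))
# the gap shrinks toward 0 as lambda -> 0
```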

In [5]:

```
def box_cox_transform(data, L):
    """Transform data by the Box-Cox transform with parameter L."""
    return (pow(data, L) - 1) / L
```

In [6]:

```
skew_norm_dist = np.random.gamma(4, 2, size=200)
f, axes = plt.subplots(2, 2, figsize=(10,10))
axes[0,0].set_title("Lambda = 0.01")
axes[0,1].set_title("Lambda = 0.1")
axes[1,0].set_title("Lambda = 1")
axes[1,1].set_title("Lambda = 10")
axes[0,0].hist(box_cox_transform(skew_norm_dist, 0.01))
axes[0,1].hist(box_cox_transform(skew_norm_dist, 0.1))
axes[1,0].hist(box_cox_transform(skew_norm_dist, 1))
axes[1,1].hist(box_cox_transform(skew_norm_dist, 10))
```

In [10]:

```
# Re-generating the data so this cell stands alone.
skew_norm_dist = np.random.gamma(4, 2, size=200)
f, axes = plt.subplots(2, 2, figsize=(10,10))
L = [0.01, 0.1, 1, 10]
ax_idx = [[0,0], [0,1], [1,0], [1,1]]
for n, aidx in enumerate(ax_idx):
    a = probplot(box_cox_transform(skew_norm_dist, L[n]))
    axes[aidx[0], aidx[1]].set_title("L = {}, r = {}".format(L[n],
                                                             round(a[1][2], 3)))
    axes[aidx[0], aidx[1]].scatter(*a[0])
```

In [12]:

```
skew_norm_dist = np.random.gamma(4, 2, size=200)
gridvals = np.arange(0.01, 2, 0.01)
r_vals = np.array([probplot(box_cox_transform(skew_norm_dist, gridvals[i]))[1][2]
                   for i in range(len(gridvals))])
best_val = gridvals[np.argmax(r_vals)]
f, axes = plt.subplots(1, 3, figsize=(10,5))
axes[0].set_title("Original Dist")
axes[1].set_title("Optimal Transform")
axes[2].set_title("L = {} Prob Plot".format(best_val))
q0 = axes[0].hist(skew_norm_dist)
q1 = axes[1].hist(box_cox_transform(skew_norm_dist, best_val))
q2 = axes[2].scatter(*probplot(box_cox_transform(skew_norm_dist, best_val))[0])
```
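Worth knowing: `scipy.stats` can do this search for us. Called with its default `lmbda=None`, `scipy.stats.boxcox` returns the transformed data together with a maximum-likelihood estimate of $\lambda$, and `scipy.stats.boxcox_normmax` performs the probplot-correlation optimization we just did by hand:

```python
import numpy as np
from scipy.stats import boxcox, boxcox_normmax

rng = np.random.default_rng(1)
data = rng.gamma(4, 2, size=200)

transformed, mle_lambda = boxcox(data)  # lmbda=None -> MLE fit of lambda
r_lambda = boxcox_normmax(data)         # probplot-correlation optimum
print(mle_lambda, r_lambda)
```

The two estimates won't agree exactly (different objectives), but for data like this they land in the same neighborhood.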