Generating Correlation Heat Maps in Seaborn


It’s Getting Hot In Here.

Let’s revisit a previous post, where we had been working with Abalone data from the UCI Machine Learning Repository.

Previously, we had generated a scatter matrix to look for linear correlations, here’s a refresher of our results:

Scatter Matrix - Abalone Data


This time, let’s use the same dataset to generate a Seaborn Heat Map of correlation coefficients.

We’ll be utilizing the following Python modules

Imports

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

I’ve modularized the previous data call, which will give us a Pandas DataFrame of the online data.

Grab All The Data

def getDF(data_url, columns):
    #retrieve data from url, create dataframe, return it
    data = pd.read_csv(data_url, names=columns)
    return data

Next, let’s build our heat map! It’s important to note that we will be creating our own color gradient map, with high correlations displayed in increasing red shades, and lower correlations trending into the blue hues.

We will also add our correlation values inside the graph itself, displaying floating numbers in each category.

Seaborn Heat Map

def heatMap(df):
    #Create Correlation df
    corr = df.corr()
    #Plot figsize
    fig, ax = plt.subplots(figsize=(10, 10))
    #Generate Color Map
    colormap = sns.diverging_palette(220, 10, as_cmap=True)
    #Generate Heat Map, allow annotations and place floats in map
    sns.heatmap(corr, cmap=colormap, annot=True, fmt=".2f")
    #Apply xticks
    plt.xticks(range(len(corr.columns)), corr.columns);
    #Apply yticks
    plt.yticks(range(len(corr.columns)), corr.columns)
    #show plot
    plt.show()

Visualize It


Awesome! However, we can do better. The diagonal plane mirrors itself on either side, with both redundant results AND self to self correlations, which we should drop.

Toggling Our Plot, Apply Mask Parameter

Let’s alter our previous code a little, by adding in a ‘mirror’ parameter that will allow us to drop our redundant mappings.

If the user enters a second paramenter, ‘mirror’, as ‘False’, we will generate half the visualization as we did previously.

def heatMap(df, mirror):

   # Create Correlation df
   corr = df.corr()
   # Plot figsize
   fig, ax = plt.subplots(figsize=(10, 10))
   # Generate Color Map
   colormap = sns.diverging_palette(220, 10, as_cmap=True)
   
   if mirror == True:
      #Generate Heat Map, allow annotations and place floats in map
      sns.heatmap(corr, cmap=colormap, annot=True, fmt=".2f")
      #Apply xticks
      plt.xticks(range(len(corr.columns)), corr.columns);
      #Apply yticks
      plt.yticks(range(len(corr.columns)), corr.columns)
      #show plot

   else:
      # Drop self-correlations
      dropSelf = np.zeros_like(corr)
      dropSelf[np.triu_indices_from(dropSelf)] = True
      # Generate Color Map
      colormap = sns.diverging_palette(220, 10, as_cmap=True)
      # Generate Heat Map, allow annotations and place floats in map
      sns.heatmap(corr, cmap=colormap, annot=True, fmt=".2f", mask=dropSelf)
      # Apply xticks
      plt.xticks(range(len(corr.columns)), corr.columns);
      # Apply yticks
      plt.yticks(range(len(corr.columns)), corr.columns)
   # show plot
   plt.show()
   


Much better! We’ve dropped a substantial amount of redundant information that only served to act as distractive noise. Now we can focus on the relationships with ease.

Code Summary

Part I

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

def getDF(data_url, columns):
    #retrieve data from url, create dataframe, return it
    data = pd.read_csv(data_url, names=columns)
    return data
    
    def heatMap(df):
    #Create Correlation df
    corr = df.corr()
    #Plot figsize
    fig, ax = plt.subplots(figsize=(10, 10))
    #Generate Color Map
    colormap = sns.diverging_palette(220, 10, as_cmap=True)
    #Generate Heat Map, allow annotations and place floats in map
    sns.heatmap(corr, cmap=colormap, annot=True, fmt=".2f")
    #Apply xticks
    plt.xticks(range(len(corr.columns)), corr.columns);
    #Apply yticks
    plt.yticks(range(len(corr.columns)), corr.columns)
    #show plot
    plt.show()

Part II

    def halfHeatMap(df, mirror):

       # Create Correlation df
       corr = df.corr()
       # Plot figsize
       fig, ax = plt.subplots(figsize=(10, 10))
       # Generate Color Map
       colormap = sns.diverging_palette(220, 10, as_cmap=True)

       if mirror == True:
          #Generate Heat Map, allow annotations and place floats in map
          sns.heatmap(corr, cmap=colormap, annot=True, fmt=".2f")
          #Apply xticks
          plt.xticks(range(len(corr.columns)), corr.columns);
          #Apply yticks
          plt.yticks(range(len(corr.columns)), corr.columns)
          #show plot

       else:
          # Drop self-correlations
          dropSelf = np.zeros_like(corr)
          dropSelf[np.triu_indices_from(dropSelf)] = True
          # Generate Color Map
          colormap = sns.diverging_palette(220, 10, as_cmap=True)
          # Generate Heat Map, allow annotations and place floats in map
          sns.heatmap(corr, cmap=colormap, annot=True, fmt=".2f", mask=dropSelf)
          # Apply xticks
          plt.xticks(range(len(corr.columns)), corr.columns);
          # Apply yticks
          plt.yticks(range(len(corr.columns)), corr.columns)
       # show plot
       plt.show()
Written on June 6, 2018