Creating A HexBin Visualization


Data Voodoo

HexBinGenerator

Today, we’re going to develop a small program that generates a handy HexBin plot from just a few inputs: the source file, our desired x and y attributes, the gridsize, and the extents.

What is a Hex Bin?

A Hex Bin is a plot formed by hexagonal tessellation of bivariate data points: the plane is tiled with hexagons, and each hexagon is shaded by the number of points that fall inside it. For example, we could be looking at a dataset containing observations of drug treatment dosages and enzymatic rates, both recorded as continuous data. We might hope to identify the ideal dose by seeing how dosage relates to its suspected effect on a target enzyme.

What Does It Do?

Unlike a simple scatter plot, which draws each observation as a single point, a Hex Bin plots the density of observations: each bin is colored by the number of points that land inside it.
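To make "density within a bin" concrete, here is a minimal sketch that bins some made-up random data and reads the per-hexagon counts back out of the plot object. The data, the seed, and the gridsize of 20 are assumptions chosen only for illustration; they are not from the abalone dataset used later.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: 5,000 correlated points, purely for illustration
rng = np.random.default_rng(0)
n_points = 5000
x = rng.normal(size=n_points)
y = 0.5 * x + rng.normal(scale=0.5, size=n_points)

fig, ax = plt.subplots()
hb = ax.hexbin(x, y, gridsize=20)

# With no C argument, the collection's array holds one count per hexagon
counts = hb.get_array()
print("hexagons in the grid:", len(counts))
print("points binned:", int(counts.sum()))
print("busiest bin:", int(counts.max()))
```

Thousands of points collapse into a few hundred hexagons, and the color of each hexagon encodes its count.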

Why Is That Helpful?

Many reasons! Like everything in statistics, there are cases where it is useful and cases where it is not. One case where the approach is advantageous is when there are many overlapping datapoints. A swarmplot could be used to separate the points, but we’ll talk about that another time. In the case of the Hexbin, we get a clear picture of density, distributions, and relative ranges, similar to a heat map. However, unlike a heat map, the shape of the hexagon allows us to limit the effects of edge biases found in square bins, while retaining the ability to form a continuous grid.

Getting Started

Documentation

In order to build a plot, we’ll first peek at the documentation, where we find the simplest case: matplotlib.pyplot.hexbin(x, y), where x and y are the two variables of a bivariate dataset, passed as sequences of equal length.

We will build on this for our program adding in the following parameters:

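gridsize - Set the number of hexagons in the x-direction or, when given as a tuple, in both the x and y directions. The default is 100.

extent - Set the limits of the binned region as (xmin, xmax, ymin, ymax). By default, the limits are taken from the data.

cmap - Set the color map used to shade the bins.

Another useful parameter we aren’t using:

edgecolors - Allows us to set the color of the border around each bin; ‘white’ is useful for clearly separating the bins.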
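As a quick illustration of how these parameters fit together in a single call, here is a hedged sketch on made-up random data. The extent of (-4, 4, -4, 4) and the seed are assumptions picked to suit this synthetic data, and edgecolors is included only to show what it does.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove to use an interactive window
import matplotlib.pyplot as plt
import numpy as np

# Made-up data for illustration only
rng = np.random.default_rng(1)
x = rng.normal(size=2000)
y = rng.normal(size=2000)

fig, ax = plt.subplots()
hb = ax.hexbin(
    x, y,
    gridsize=(15, 12),       # 15 hexagons across, 12 down
    extent=(-4, 4, -4, 4),   # (xmin, xmax, ymin, ymax) of the binned region
    cmap='inferno',          # color map used to shade the counts
    edgecolors='white',      # thin white border around every hexagon
)
fig.colorbar(hb, ax=ax, label='count')
# Call plt.show() or fig.savefig(...) here to view the result
```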

Imports

import pandas as pd
import matplotlib.pyplot as plt

Build DataFrame: Chunk, Iterate, Concatenate For Big Files!

def get_data(file):
    """
    :param file: file location/name
    :return df: dataframe of file
    """
    df = pd.read_csv(file, iterator=True, chunksize=100)
    df1 = pd.concat(df, ignore_index=True)
    return df1
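A small sanity check can show that reading in chunks and concatenating gives the same DataFrame as a single read. The sketch below restates get_data with a chunksize argument (an addition for this example) and uses a tiny in-memory CSV with made-up column names and values in place of a real file.

```python
import io
import pandas as pd

def get_data(file, chunksize=100):
    """Read a CSV in chunks and stitch the chunks back into one DataFrame."""
    reader = pd.read_csv(file, iterator=True, chunksize=chunksize)
    return pd.concat(reader, ignore_index=True)

# Tiny in-memory CSV standing in for a real file (made-up values)
csv_text = "length,rings\n" + "\n".join(f"{i / 10},{i % 7}" for i in range(250))

chunked = get_data(io.StringIO(csv_text))   # read as 3 chunks: 100 + 100 + 50 rows
whole = pd.read_csv(io.StringIO(csv_text))  # read in one go

print(chunked.shape)
print(chunked.equals(whole))
```

Note that the concatenated result still has to fit in memory; chunking only keeps the parsing step from handling the whole file at once.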

Build HexBin Plot

Ok, let’s go ahead and revisit the abalone dataset that we previously used to create a scatter matrix, and generate the default hex plot.

def hexBinning(df, x, y):
    """
    :param df: dataframe with target data
    :param x: name of x feature
    :param y: name of y feature
    :return: generates a plot
    """
    #Build Hexbin
    plt.hexbin(df[x], df[y])
    # Title, Axes
    plt.title('Hex Plot')
    plt.xlabel(x)
    plt.ylabel(y)
    # Add color bar
    plt.colorbar()
    # Display
    plt.show()

Output Of Default Parameters


Getting Specific

We’re going to expand our plot method to include a gridsize, a cmap, and extents.

Add Grid Size


def hexBinning(df, x, y):
    """
    :param df: dataframe with target data
    :param x: name of x feature
    :param y: name of y feature
    :return: generates a plot
    """
    #Build Hexbin
    plt.hexbin(df[x], df[y], gridsize=(15, 12))
    # Title, Axes
    plt.title('Hex Plot')
    plt.xlabel(x)
    plt.ylabel(y)
    # Add color bar
    plt.colorbar()
    # Display
    plt.show()


Choosing a new Color Map


def hexBinning(df, x, y):
    """
    :param df: dataframe with target data
    :param x: name of x feature
    :param y: name of y feature
    :return: generates a plot
    """
    #Build Hexbin
    plt.hexbin(df[x], df[y], gridsize=(15, 12), cmap='inferno')
    # Title, Axes
    plt.title('Hex Plot')
    plt.xlabel(x)
    plt.ylabel(y)
    # Add color bar
    plt.colorbar()
    # Display
    plt.show()


Specifying Extents


def hexBinning(df, x, y, ext):
    """
    :param df: dataframe with target data
    :param x: name of x feature
    :param y: name of y feature
    :param ext: [xmin, xmax, ymin, ymax] from a list for hexbinning, similar to setting axes
    :return: generates a plot
    """
    #Build Hexbin
    plt.hexbin(df[x], df[y], gridsize=(15, 12), cmap='inferno', extent=(ext[0], ext[1], ext[2], ext[3]))
    # Title, Axes
    plt.title('Hex Plot')
    plt.xlabel(x)
    plt.ylabel(y)
    # Add color bar
    plt.colorbar()
    # Display
    plt.show()
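If you don't have limits in mind, one option is to derive them from the data and add a little padding. The helper below, data_extent, is hypothetical (it is not part of matplotlib or pandas), and the 10% padding and the column names are assumptions for illustration.

```python
import pandas as pd

def data_extent(df, x, y, pad=0.05):
    """Hypothetical helper: [xmin, xmax, ymin, ymax], widened by `pad` of each range."""
    x_min, x_max = float(df[x].min()), float(df[x].max())
    y_min, y_max = float(df[y].min()), float(df[y].max())
    x_pad = (x_max - x_min) * pad
    y_pad = (y_max - y_min) * pad
    return [x_min - x_pad, x_max + x_pad, y_min - y_pad, y_max + y_pad]

# Made-up frame with made-up column names
df = pd.DataFrame({"length": [0.0, 1.0, 2.0], "rings": [10, 20, 30]})
ext = data_extent(df, "length", "rings", pad=0.1)
print(ext)
```

The resulting list is in the order hexBinning expects for its ext argument.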


Run It

if __name__ == '__main__':
    # Pass file location/name
    file1 = input("File: ")
    df = get_data(file1)
    x_col = input("X Column Name: ")
    y_col = input("Y Column Name: ")
    # input() returns strings, so convert the limits to floats for extent
    x_min = float(input("X Min: "))
    x_max = float(input("X Max: "))
    y_min = float(input("Y Min: "))
    y_max = float(input("Y Max: "))
    axes = [x_min, x_max, y_min, y_max]
    hexBinning(df, x_col, y_col, axes)
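As an alternative to interactive prompts, the same inputs could be taken from the command line. This is only a sketch using the standard library's argparse; the option names, the defaults, and the example file and column names are all assumptions, and the parsed values would still need to be passed on to get_data and hexBinning.

```python
import argparse

def build_parser():
    """Hypothetical command-line interface for the hexbin script."""
    parser = argparse.ArgumentParser(description="Generate a hexbin plot from a CSV file")
    parser.add_argument("file", help="file location/name")
    parser.add_argument("x_col", help="name of x feature")
    parser.add_argument("y_col", help="name of y feature")
    parser.add_argument("--gridsize", type=int, nargs=2, default=[15, 12],
                        metavar=("NX", "NY"), help="hexagons in x and y")
    parser.add_argument("--extent", type=float, nargs=4, default=None,
                        metavar=("XMIN", "XMAX", "YMIN", "YMAX"),
                        help="limits of the binned region")
    parser.add_argument("--cmap", default="inferno", help="color map name")
    return parser

# Parse an example argument list instead of sys.argv (file and columns are made up)
args = build_parser().parse_args(
    ["abalone.csv", "length", "rings", "--extent", "0", "1", "0", "30"]
)
print(args.file, args.x_col, args.y_col)
print(tuple(args.gridsize), args.extent, args.cmap)
```

Because argparse applies type=float itself, the limits arrive as numbers and can be handed straight to the extent argument.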
Written on July 1, 2018