Using Grid Heat Maps for Data Visualization

Heat maps represent values in a matrix as colors. Traditionally, heat maps have been used to indicate the level of activity in different systems. For example, a load test result can represent requests to different parts of the application as a heat map. The heat map appears as a mass of colors chosen from a color scheme with gradients from one color to the other.

Here is a typical example from Wikipedia:

640px-WOA09_sea-surf_SAL_AYool

By Plumbago – Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=23016243

Above is a geographical heat map of ocean salinity using a rainbow colormap.

Another interesting use of heat maps is to understand the degree of relationship between two variables. This results in a grid where the axes are obtained from the range of each variable. The rest of this post describes the usage of grid heat maps in different scenarios.

House Hunting?

trulia_full

This visualization, taken from Trulia’s trends, shows how house-hunting activity varies with the day of the week and the time of day. The full visualization suggests that most house hunting happens on weekdays around 9 PM and on Sunday evenings.

Web Usage

A web application’s logs can be analyzed to understand usage patterns. If you put the day of the week on the y-axis and the time of day on the x-axis, the color of each grid cell can be determined by the number of requests or user sessions measured over a period of time.
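As a rough sketch of that binning step, the snippet below counts requests per (day, hour) cell. It assumes the timestamps have already been pulled out of the logs as ISO 8601 strings; the `usage_grid` helper and the sample timestamps are made up for illustration and are not taken from a real log.

```python
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def usage_grid(timestamps):
    """Return a 7 x 24 grid of request counts: rows are days, columns are hours."""
    grid = [[0] * 24 for _ in DAYS]
    for stamp in timestamps:
        moment = datetime.fromisoformat(stamp)
        # weekday() returns 0 for Monday through 6 for Sunday
        grid[moment.weekday()][moment.hour] += 1
    return grid


# Hypothetical request timestamps
timestamps = [
    "2024-01-01T09:15:00",  # a Monday
    "2024-01-01T09:47:12",
    "2024-01-02T14:03:55",  # a Tuesday
]
grid = usage_grid(timestamps)
print(grid[0][9], grid[1][14])
```

Each cell of `grid` can then be mapped to a color, with darker shades for busier hours.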

grid-web-usage

Grid heat maps are not limited to time units on both axes. The next three examples show their use in other domains.

Weekly Inventory Prediction

In a recent project, I proposed a prediction model that analyzed weather trends and advised on the inventory for perishable items for each day of the week. In order to depict this, I plotted the items (categorized as A, B, C…) on the x-axis and the day of the week (Mon, Tue…) on the y-axis. The grid color was influenced by the amount of inventory to maintain for a particular item and day. The resulting visualization was quite similar to the web usage example.

Correlation Matrix

A correlation matrix lists the correlation coefficients between every pair of variables in a data set. A heat map grid can represent these coefficients visually, which makes it easy to spot the strong dependencies. A coefficient close to +1 indicates a strong positive dependency, a coefficient close to -1 indicates a strong inverse dependency, and a coefficient close to zero indicates a weak linear dependency.
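To make the idea concrete, here is a minimal sketch that computes Pearson correlation coefficients for every pair of columns using only the standard library. The column names and readings are invented for illustration; they are not drawn from any real data set.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a square matrix of coefficients, ordered like the dict keys."""
    names = list(columns)
    return [[pearson(columns[a], columns[b]) for b in names] for a in names]


# Made-up readings for three hypothetical variables
columns = {
    "weight": [1.0, 2.0, 3.0, 4.0],
    "fuel_use": [2.1, 3.9, 6.2, 7.8],
    "mileage": [30.0, 24.0, 19.0, 15.0],
}
matrix = correlation_matrix(columns)
for name, row in zip(columns, matrix):
    print(name, [round(value, 2) for value in row])
```

The diagonal is always 1 because every variable correlates perfectly with itself, and the matrix is symmetric, so a heat map only needs one triangle to tell the whole story.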

grid-correlation

The data source is the mtcars data set that ships with R. It comprises different aspects of automobile design and performance for 32 automobiles. You can refer to the data set to understand the variables used in the correlation matrix. In the matrix, blue circles indicate positive correlation, while red circles indicate negative correlation.

Confusion Matrix

A confusion matrix is a table that summarizes the performance of a classifier on test data for which the true labels are known. A typical confusion matrix looks quite like a correlation matrix, except that each cell holds the number of test instances with a given true label that the classifier assigned to a given predicted label. The diagonal cells count correct predictions; every other cell counts one particular kind of mislabelling. A grid heat map can quickly show the degree of confusion.
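A minimal sketch of how such a table can be built from two lists of labels follows. It uses one common convention, rows for true labels and columns for predicted labels; the class names and labels are hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count how often each true class (row) was predicted as each class (column)."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[guess]] += 1
    return matrix


# Hypothetical labels for six test images
classes = ["forest", "water", "urban"]
true_labels = ["forest", "forest", "water", "urban", "urban", "water"]
predicted_labels = ["forest", "water", "water", "urban", "forest", "water"]

matrix = confusion_matrix(true_labels, predicted_labels, classes)
correct = sum(matrix[i][i] for i in range(len(classes)))
mislabelled = len(true_labels) - correct
print(matrix, mislabelled)
```

In a heat map of this matrix, a bright off-diagonal cell points at a pair of classes that the classifier tends to mix up.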

grid-confusion

This data set represents classification of images taken by satellites. The type of satellite image is a function of the image features. Can you tell which are the most mislabelled images?

Clock In-Out time

Enough of examples. Let us understand how to build a grid heat map with a faux problem (but real data!).

You are the operations head in an organization and you are health conscious. You want to provide fresh fruits to employees because you are concerned that they keep snacking on unhealthy choices. In order to do that, you want to time the shelf stocking (when do the fruits come out and when do they go in). One way to time these activities is to find out when most employees clock in and when they clock out.

We start by collecting the raw data and converting it to a suitable format, which gives us a month of in-out records for every employee. Here’s a sample of three days for an employee:

Mon::10:57:21::18:50:05,Tue::09:54:11::18:37:54,Wed::10:25:21::18:06:50

Each record denotes the day of the week, in-time in 24-hour clock format and out-time in 24-hour clock format.

The next step is to read each record and bin the in and out times in a matrix with hours-of-the-day as the x-axis and day-of-week as the y-axis. For example, the record shown above will increase the count in (Mon, 10), (Tue, 9), (Wed, 10) cells of the in-matrix and (Mon, 18), (Tue, 18), (Wed, 18) of the out-matrix.
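The binning step can be sketched as follows. This is only an illustration of the idea, since the original parsing code is not shown; the `empty_matrix` and `bin_records` helpers are hypothetical names, and the record format is assumed to be exactly the one in the sample above.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def empty_matrix():
    """A 7 x 24 matrix of zeros: one row per day, one column per hour."""
    return [[0] * 24 for _ in DAYS]


def bin_records(line, in_matrix, out_matrix):
    """Add one employee's comma-separated records to the two count matrices."""
    for record in line.split(","):
        # Fields are separated by "::", e.g. "Mon::10:57:21::18:50:05"
        day, in_time, out_time = record.strip().split("::")
        row = DAYS.index(day)
        # Only the hour part of each time decides the column
        in_matrix[row][int(in_time.split(":")[0])] += 1
        out_matrix[row][int(out_time.split(":")[0])] += 1


in_matrix, out_matrix = empty_matrix(), empty_matrix()
sample = "Mon::10:57:21::18:50:05,Tue::09:54:11::18:37:54,Wed::10:25:21::18:06:50"
bin_records(sample, in_matrix, out_matrix)
print(in_matrix[0][10], in_matrix[1][9], out_matrix[2][18])
```

Calling `bin_records` once per employee accumulates the counts for the whole organization in the same two matrices.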

You get two matrices for day-of-week versus in-time hours and day-of-week versus out-time hours. The cell value is the number of times any employee clocks in (or out) on the day of the week and the hour. Each matrix would look like this:

01234567891011121314151617181920212223
Mon000000063113533242820268134336632000
Tue000000011341082873651415183023200010
Wed000000072611228533717160133422530003
Thu00000006279528335814958140127713000
Fri000000073411026532416442184618612000
Sat0000000103156111100111000
Sun000000163100101000000001

Let us show the frequency of in-time values with a blue scheme and the frequency of out-time values with a red scheme. This raises a problem we have not seen in the previous examples: so far, each cell has shown a single variable, but here there are two, the in-time count and the out-time count. For the purpose of this visualization, we will keep only the larger of the two values in each cell, because the chances of people leaving the office when it is time to arrive at work, and vice versa, are quite low. We merge the two matrices cell by cell, giving precedence to the variable with the larger value:

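import csv

# in_matrix and out_matrix are the 7 x 24 count matrices built in the previous step
# newline="" stops the csv module from writing blank rows on Windows
with open("data/in_out_series.csv", mode="w", newline="") as outfile:
    writer = csv.writer(outfile)
    # "series" is 0 for in-time and 1 for out-time
    writer.writerow(["day", "hour", "value", "series"])
    for row_index, row in enumerate(in_matrix):
        for col_index, in_value in enumerate(row):
            out_value = out_matrix[row_index][col_index]
            # day and hour are written 1-based; the D3 code subtracts 1 again
            in_out_row = [row_index + 1, col_index + 1]
            # keep the larger count and record which series it came from
            if in_value >= out_value:
                in_out_row.extend([in_value, 0])
            else:
                in_out_row.extend([out_value, 1])
            writer.writerow(in_out_row)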

This gives us a CSV series that is loaded by D3.js. The supporting JavaScript creates two color sequences, one blue (inColors) and one red (outColors), generated from the excellent ColorBrewer scales. It uses these sequences to create a blue and a red scale:

// buckets is fixed at 9; so we have 9 colors for blue and red
// d3.scale.quantile() is the D3 v3 API; both scales share the same domain
var blueScale = d3.scale.quantile()
    .domain([0, buckets - 1, d3.max(data, function (d) { return d.value; })])
    .range(inColors);

var redScale = d3.scale.quantile()
    .domain([0, buckets - 1, d3.max(data, function (d) { return d.value; })])
    .range(outColors);
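To get a feel for how a quantile scale turns a count into a color, here is a small Python sketch of the idea. It approximates the behavior of D3's quantile scale as I understand it (thresholds at the k/9 quantiles of the sorted domain, then a lookup by bisection) and is not D3's actual code. The maximum value of 135 and the color names `c0` to `c8` are placeholders.

```python
from bisect import bisect_right


def quantile(sorted_values, p):
    """Linearly interpolated quantile of an already sorted list."""
    h = (len(sorted_values) - 1) * p
    low = int(h)
    frac = h - low
    if frac == 0:
        return sorted_values[low]
    return sorted_values[low] + frac * (sorted_values[low + 1] - sorted_values[low])


def quantile_scale(domain, colors):
    """Return a value-to-color function and the thresholds it uses."""
    values = sorted(domain)
    q = len(colors)
    # q colors need q - 1 thresholds
    thresholds = [quantile(values, k / q) for k in range(1, q)]

    def scale(value):
        # The number of thresholds at or below the value picks the color
        return colors[bisect_right(thresholds, value)]

    return scale, thresholds


buckets = 9
colors = ["c%d" % i for i in range(buckets)]  # placeholder color names
max_value = 135  # hypothetical largest cell value
scale, thresholds = quantile_scale([0, buckets - 1, max_value], colors)
print([round(t, 2) for t in thresholds])
print(scale(0), scale(8), scale(30), scale(max_value))
```

Under these assumptions, putting `buckets - 1` in the middle of the domain places roughly half of the thresholds below 8, so small counts still spread across several shades instead of collapsing into the lightest color.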

 

Next, each grid cell is drawn as a ‘card.’ All cards start with the same color and transition to a color either in the blue or red scale, depending on the “series” attribute:

// Key each card by day and hour so that data updates map to the same rect
var cards = svg.selectAll(".hour")
    .data(data, function(d) {return d.day+':'+d.hour;});

cards.enter().append("rect")
    .attr("x", function(d) { return (d.hour - 1) * gridSize; })
    .attr("y", function(d) { return (d.day - 1) * gridSize; })
    .attr("rx", 4)
    .attr("ry", 4)
    .attr("class", "hour bordered")
    .attr("width", gridSize)
    .attr("height", gridSize)
    .style("fill", inColors[0]) // == outColors[0], initial color is same
    .append("title");

// "series" is the CSV column written earlier: 0 for in-time, 1 for out-time.
// CSV fields arrive as strings, so coerce to a number before comparing.
cards.transition().duration(1000)
    .style("fill", function(d) { return +d.series === 0 ? blueScale(d.value) : redScale(d.value); });

The final visualization:

grid-clockin-inout

 

Isn’t it easy to spot that everyone comes in only after 8 AM and most people leave by 9 PM? So, now you know when to put fresh fruits on the table and when to put them away.

Write to us or leave us a comment if you think this can help you with a business case.

Sayantam Dey

Senior Director Engineering

Sayantam Dey is the Senior Director of Engineering at 3Pillar Global, working out of our office in Noida, India. He has been with 3Pillar for ten years, delivering enterprise products and building frameworks for accelerated software development and testing in various technologies. His current areas of interest are data analytics, messaging systems and cloud services. He has authored the ‘Spring Integration AWS’ open source project and contributes to other open source projects such as SocialAuth and SocialAuth Android.
