KNOWLEDGE BASE

Creating a Correlation Value Matrix


Published: 15 Aug 2017
Last Modified Date: 18 Jun 2018

Question

How to create a correlation value matrix in Tableau Desktop.

For example, this view may answer the question: is there a correlation between what sub-categories of products a customer buys, tracked by sales? Or another way to phrase it, if a customer buy products from X sub-category, are they more or less likely to also buy from Y sub-category?

Environment

Answer

Starting in Tableau Desktop 10.2, the function CORR() can be used to calculate the Pearson Correlation Coefficient. The attached example workbook Correlaton Value Matrix.twbx uses the sample data set Superstore to demonstrate the following directions:

Step 1 - Set-Up the Self Join

  1. Navigate to the Data Source tab
  2. Drag a second copy of the Orders table into the canvas area
  3. Create an INNER join on Customer Name = Customer Name (Orders)
Note: This example asks if a customer is more likely to buy products from X sub-category if they already bought from Y sub-category. A variation of this question might be, is the same order more likely to contain products from X sub-category if it already contains product from Y sub-category. In this variation, the join should be on Order ID = Order ID (Orders).

In other words, the join clause should be on the field or fields that define the scope of comparison.

Step 2 - Calculate the Pearson Correlation Coefficient

  1. Select Analysis > Create calculated field
  2. Name the field "Pearson Correlation Coefficient", enter the following formula and click OK
    CORR( { INCLUDE [Customer Name] : SUM( [Sales (Orders)])}, { INCLUDE [Customer Name] : SUM( [Sales])})
Note: [Customer Name] should be replaced with the same field(s) used in the join clause in step 1-3. It is not necessary to use the INCLUDE expression if there is only record per customer in the underlying data.

Step 3 - Create a Calculated Field to filter the Value

  1. Select Analysis > Create calculated field
  2. Name the field "Not same sub category filter", enter the following formula and click OK
    [Sub-Category] != [Sub-Category (Orders)]

Step 4 - Build the View

  1. Drag [Sub-Category] to the Columns shelf
  2. Drag [Sub-Category (Orders)] to the Rows shelf
  3. Drag the [Pearson Correlation Coefficient] to Color on the Marks card
  4. Drag the [Pearson Correlation Coefficient] to Text on the Marks card
  5. In the dropdown on the Marks card, change the Mark type to Square
  6. Drag [Not same sub category filter] to the Filters shelf
  7. In the Filter dialog, check True and click OK

Additional Information

Pearson Correlation Coefficient is a sophisticated statistics tool, and a deeper understanding of how this tool works is recommended before using it. For more information about this subject, see the following articles:  Note: the workbook Correlaton Value Matrix.twbx attached to this article also contains a correlation scatter example.


 
Did this article resolve the issue?