KNOWLEDGE BASE

Finding the Pearson Correlation


Published: 15 Feb 2017
Last Modified Date: 22 Feb 2018

Question

How to find the Pearson correlation of two measures in Tableau Desktop.

Environment

Tableau Desktop
 

Answer

Option 1 - Using Tableau 10.2 and later versions

Step 1: Create a scatterplot

This example uses the Superstore sample data. For more information, open the workbook Pearson Correlation.twbx, which is attached to this article.
  1. Drag Profit to Columns and Sales to Rows.
  2. In the Analysis menu, uncheck Aggregate Measures.
  3. Right-click the view and choose Trend Lines > Show Trend Lines.
  4. Right-click the view again and select Trend Lines > Describe Trend Model.
  5. Locate the R-Squared value in the Describe Trend Model dialog box. In this example, the R-Squared value is 0.229503.

Step 2: Calculate the Pearson correlation

You can use different options to find the Pearson correlation. For example: 

  1. Use a calculator or other program
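    1. Calculate the square root of the R-Squared value from Step 1. The result is the magnitude of the correlation (r): √0.229503 ≈ 0.4791. A square root is never negative, so take the sign of r from the slope of the trend line (positive in this example).
    2. Rounded to two decimal places, the value in this example is 0.48.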
  2. Create a calculated field using the CORR function. 
    • Enter a formula similar to the following and click OK
      CORR([Profit], [Sales])
    • This formula returns the Pearson correlation coefficient of two expressions. The Pearson correlation measures the linear relationship between two variables. Results range from -1 to +1 inclusive, where 1 denotes an exact positive linear relationship (a positive change in one variable implies a positive change of corresponding magnitude in the other), 0 denotes no linear relationship between the variables, and −1 denotes an exact negative linear relationship.
  3. Create a calculated field using the WINDOW_CORR function. 
    • Enter a formula similar to the following and click OK
      WINDOW_CORR(SUM([Profit]), SUM([Sales]))
    • This formula returns the Pearson correlation coefficient of two expressions within the window. The window is defined by optional start and end arguments, given as offsets from the current row. Use FIRST()+n and LAST()-n for offsets from the first or last row in the partition. If start and end are omitted, as in the formula above, the entire partition is used.
The correlation and covariance functions were added in Tableau Desktop 10.2. For more information, see What's new in Tableau Desktop.
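
The calculator approach in Step 2 can also be sketched in a few lines of Python. This is only an illustration: the function name r_from_r_squared and the slope value of 1.0 are assumptions made for the example, and only the R-Squared value comes from this article. The slope is passed in because the square root alone cannot tell a positive correlation from a negative one.

```python
import math


def r_from_r_squared(r_squared, slope):
    """Recover r from R-squared, taking the sign from the trend line's slope.

    Hypothetical helper for illustration; not part of Tableau.
    """
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R-squared must be between 0 and 1")
    # sqrt() gives the magnitude; copysign() applies the sign of the slope.
    return math.copysign(math.sqrt(r_squared), slope)


# R-Squared value from the Describe Trend Model dialog in this article.
# The slope of 1.0 is a placeholder that only says "the trend line rises".
r = r_from_r_squared(0.229503, slope=1.0)
print(round(r, 4))  # 0.4791
print(round(r, 2))  # 0.48
```

This shortcut applies to a linear trend line with a single predictor, as in the scatterplot above.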
 
Option 2 - Using earlier versions of Tableau Desktop
The CORR and WINDOW_CORR calculations from Option 1 can be reproduced with the following formulas.

Instead of CORR
  1. Select Analysis > Create calculated field
  2. Name the calculated field 
  3. Enter the following formula and click OK
    COVAR([Profit], [Sales]) / (STDEV([Profit])*STDEV([Sales]))
Instead of WINDOW_CORR with no offsets specified
  1. Select Analysis > Create calculated field
  2. Name the calculated field 
  3. Enter the following formula and click OK
    WINDOW_COVAR(SUM([Profit]), SUM([Sales]))/
    (WINDOW_STDEV(SUM([Profit]))*WINDOW_STDEV(SUM([Sales])))
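Instead of WINDOW_CORR with offsets (in this example, from five rows before the current row through the current row)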
  1. Select Analysis > Create calculated field
  2. Name the calculated field 
  3. Enter the following formula and click OK
    WINDOW_COVAR(SUM([Profit]), SUM([Sales]),-5,0)/
    (WINDOW_STDEV(SUM([Profit]),-5,0)*WINDOW_STDEV(SUM([Sales]),-5,0))
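
To see why these formulas work, the following Python sketch divides the sample covariance by the product of the sample standard deviations, first over a whole list and then over a sliding window. It is a rough illustration, not Tableau's implementation: the function names and the data values are invented for the example, and cutting the window short at the start of the list is an assumption about how offsets behave at the edge of a partition.

```python
import math


def sample_cov(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def corr(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    if len(xs) < 2:
        return None  # undefined for fewer than two points
    std_x = math.sqrt(sample_cov(xs, xs))
    std_y = math.sqrt(sample_cov(ys, ys))
    if std_x == 0 or std_y == 0:
        return None  # undefined when either series is constant
    return sample_cov(xs, ys) / (std_x * std_y)


def window_corr(xs, ys, start, end):
    """Apply corr() to rows i+start .. i+end for each row i."""
    results = []
    for i in range(len(xs)):
        lo = max(0, i + start)            # assumed: clamp to the first row
        hi = min(len(xs), i + end + 1)    # assumed: clamp to the last row
        results.append(corr(xs[lo:hi], ys[lo:hi]))
    return results


# Made-up values standing in for SUM([Profit]) and SUM([Sales]) per row.
profit = [12.0, 15.0, 9.0, 20.0, 18.0, 25.0, 22.0, 30.0]
sales = [100.0, 130.0, 90.0, 160.0, 150.0, 210.0, 170.0, 260.0]

print(corr(profit, sales))                 # whole list, no offsets
print(window_corr(profit, sales, -5, 0))   # offsets -5 and 0
```

The first result of the windowed version is None because a window of one row has no defined correlation.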
 
For more examples and advanced explanations, see Covariance, Trend Lines, Correlation Coefficient R and R-Squared in the Community.

Additional Information

  • A correlation, r, is a single number that represents the degree of relationship between two measures. The correlation coefficient is a value such that -1 <= r <= 1.
  • A positive correlation indicates a relationship between x and y measures such that as values of x increase, values of y also increase.
  • A negative correlation indicates the opposite—as values of x increase, values of y decrease.
  • The closer the correlation, r, is to -1 or 1, the stronger the relationship between x and y.
  • If r is close to or equal to 0, there is a weak relationship or no relationship between the measures.
  • As a general rule, you can interpret r values this way:
    • +.70 or higher indicates a very strong positive relationship
    • +.40 to +.69 indicates a strong positive relationship
    • +.20 to +.39 indicates a moderate positive relationship
    • -.19 to +.19 indicates no or a weak relationship
    • -.20 to -.39 indicates a moderate negative relationship
    • -.40 to -.69 indicates a strong negative relationship
    • -.70 or lower indicates a very strong negative relationship
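
The general rule above can be written as a small lookup. The sketch below is one possible encoding in Python; the function name describe_correlation is hypothetical, and the cut-offs are treated as continuous (for example, anything from 0.20 up to but not including 0.40 counts as moderate), which is an assumption about values that fall between the two-decimal ranges listed.

```python
def describe_correlation(r):
    """Map a correlation coefficient to the descriptions listed above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")

    strength = abs(r)
    if strength >= 0.70:
        label = "very strong"
    elif strength >= 0.40:
        label = "strong"
    elif strength >= 0.20:
        label = "moderate"
    else:
        # Covers the -.19 to +.19 band, where direction is not reported.
        return "no or weak relationship"

    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction} relationship"


# The correlation found in this article's example.
print(describe_correlation(0.48))  # strong positive relationship
```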

The CORR() aggregate function is not available for every data source. See the list of supported data sources in Tableau Help.