Last Modified Date: 16 Aug 2016
Article Note: This article is no longer actively maintained by Tableau. We continue to make it available because the information is still valuable, but some steps may vary due to product changes.
Cohort Analysis is used to study the behavior or outcomes associated with a group of people over time. In Tableau you can perform cohort analyses and explore different groupings to understand consumer preferences, cause and effect relationships, and what is likely to happen to members of a particular cohort over time.
In business, cohort analysis is most often used to study customer retention. A cohort is a group of customers or subjects that have a common characteristic. Cohorts can be determined by the date at which they joined a website or became a customer, by their age, their demographic, or any other attribute that could be used to group a set of individuals. It is then assumed that something about the cohort drives certain behavior over time.
A popular subject area for cohort analysis is tracking user engagement on social-networking websites over the months following sign-up. A cohort can be based on the month and year in which a user joins the site. A metric could be the percent of users in cohort that visited the site each month after joining. For example, Facebook redesigned the look and feel of their site to make it more attractive and engaging for users. Growth of new users continued and even increased with the announcement. A cohort analysis would show whether the people who joined Facebook after the redesign found it easy and enjoyable to use.
Connect to date- or time-based data
The data you connect to should consist of a date- or time-based list of events per individual. These events should record behavior you want to track—purchase date, sign-in date, message posting date, and so on. It can include more than one event type, and in that case you would need a unique row per individual, event, date, and time, along with any other attributes you need to form your cohorts.
Define the cohort
Determine how you will define a cohort. If the cohort is not represented as a dimension in the data, you can create calculated fields or sets. In the example in this article, the month of a customer’s first purchase is used to define the cohort, and the data source used to show the view includes a field for date of first purchase.
Calculate month of first purchase
In this example, using the DATETRUNC function in a custom calculation will round down the date to the nearest month. You could also use quarter or year instead of month. The [DateFirstPurchase] field is already part of the data source.
Select Analysis > Create Calculated Field.
In the Calculated Field dialog box, complete the following steps.
- For Name, type Month of First Purchase.
- In the Formula box, build the following formula:
- Confirm that the status message indicates that the formula is valid, and then click OK
Calculate months between first and subsequent purchases
This calculated field gets the number of months between the first purchase and each subsequent transaction. The [DateKey] field represents the date of a subsequent sale.
Select Analysis > Create Calculated Field.
In the Calculated Field dialog box:
- For Name, the example in this article uses Months Since First Purchase.
- For Formula, it uses the following formula:
Determine series for each cohort
Drag the Month of First Purchase calculation to Color on the Marks card.
Because the calculation produces a date, when you drag the date to Color, Tableau derives the year by default. Instead you want to show each discrete date, in this case, months.
Right-click the field in the Marks card, and select Exact Date.
Right-click the field again, and select Discrete.
Right-click the field again, select Format, and then select a date format.
See month since first purchase
To see the months since first purchase on the x-axis, you would drag the Months Since First Purchase field you created to the Columns shelf.
Determine distinct customers
The following steps show how you would see the number of distinct customers that bought something in each month.
A CustomerKey field is put on the Rows shelf.
To change the field’s aggregation, right-click it on the Rows shelf, and select Measures (Sum), and then select Count or Count Distinct.
Step 3In the Marks card list of view types, select Line.
See percent of customers in cohort
To see the percent of customers in that Cohort that purchased something in that month.
Right-click the CustomerKey field in the Rows shelf, and select Quick Table Calculation > Percent of Total.
Right-click the field again and select Compute using > Months Since First Purchase.
In month zero, customer participation is at 100 percent. The analysis shows the percent of customers from the cohort with transactions for each subsequent month.
You can use the following techniques to extract additional information from your data:
- Enable legend highlighting to spot trends in each cohort.
- Drag different dimensions to Color on the Marks card to perform rapid analysis by cohort.
- Use Size on the Marks card to incorporate other elements into the analysis, such as the average sale per customer.
From this, you can see that the average monthly, per-customer amount of money spent is high initially, and it declines over time.
This style of cohort analysis measures customer retention and loyalty. Using different metrics, such as count of transactions, or website visits and activity, you can measure how effective your website design or online shopping experience is.
Create an inline view for a new field
If your data source does not include a field that captures a key metric for the cohort analysis—for example, the date a customer first made a purchase&mash;you can create the field by using custom SQL with the Tableau data connection.
Custom SQL to create a data field for the connection
The following SQL query creates an inline view against the Superstore sample, creating a new field for date of first purchase. If you have a connection to the Superstore sample, edit the data connection, select Custom SQL and insert the following query. Bold, highlighted text indicates new code you would insert into the existing query.
SELECT [Orders$].[City] AS [City],
[Orders$].[Customer ID] AS [Customer ID],
[Orders$].[Customer Name] AS [Customer Name],
[Orders$].[Customer Segment] AS [Customer Segment],
[Orders$].[Discount] AS [Discount],
[Orders$].[Order Date] AS [Order Date],
[Orders$].[Order ID] AS [Order ID],
[Orders$].[Order Priority] AS [Order Priority],
[Orders$].[Postal Code] AS [Postal Code],
[Orders$].[Product Base Margin] AS [Product Base Margin],
[Orders$].[Product Category] AS [Product Category],
[Orders$].[Product Container] AS [Product Container],
[Orders$].[Product Name] AS [Product Name],
[Orders$].[Product Sub-Category] AS [Product Sub-Category],
[Orders$].[Profit] AS [Profit],
[Orders$].[Quantity ordered new] AS [Quantity ordered new],
[Orders$].[Region] AS [Region],
[Orders$].[Row ID] AS [Row ID],
[Orders$].[Sales] AS [Sales],
[Orders$].[Ship Date] AS [Ship Date],
[Orders$].[Ship Mode] AS [Ship Mode],
[Orders$].[Shipping Cost] AS [Shipping Cost],
[Orders$].[State or Province] AS [State or Province],
[Orders$].[Unit Price] AS [Unit Price],
[o1].[Date First Purchase]
INNER JOIN (SELECT [Customer Name], MIN([Order Date]) as [First Purchase Date] FROM [Orders$] GROUP BY [Customer Name]) o1
on [Orders$].[Customer Name] = o1.[Customer Name]
Use sets in a calculated field
If you do not want to use custom SQL to analyze a cohort, the following process is an alternative.
This process explains how to calculate a customer’s first date of purchase using sets. This method is simple if the number of cohorts is small. If there are only four cohorts (e.g. Year 2010, 2011, 2012, 2013) then you need to create four sets. If there are 48 cohorts (e.g., Jan 2010, Feb 2010, ...Dec 2013) then you need to create 48 sets, which can be time consuming. In that case, the custom SQL solution is best.
Extract the data if you are using a live connection to an excel file. Excel's Jet engine does not support using sets in calculated fields. For more information on how to extract your data, see Creating an Extract topic in Tableau Help.
Create sets using Customer Name field.
- Right-click Customer Name in Dimensions pane and click Create Set.
- Go to the Condition tab, select By Formula:, and enter the following formula:
max( iif( year([Order Date])=2010, 1, 0) )=1
Sets return boolean values, so this formula returns 'true' for customers whose first year of purchase is in 2010 and 'false' for customers whose first year of purchase is not in 2010. This means this formula specifies that the order date of customers in this set must be 2010.
- Right click the newly created set in the Data window and select Duplicate.
- Edit the formula to be:
max( iif( year([Order Date])=2011, 1, 0) )=1
- Rename this set 2011, and repeat steps 3-5 to create a set for each cohort value.
NOTE: With this method you must hardcode the cohort values. If the cohort values are dynamic (such as on the month level) use custom SQL instead.
Create a calculated field to bring all the sets into one field and name it Cohort.
- Select Analysis, and click Create Calculated Field...
- In the Formula dialog box, type the following formula to bring all of the sets into a single cohort field:
IF  THEN '2010'
ELSEIF  THEN'2011'
ELSEIF  THEN '2012'
ELSEIF  THEN '2013'
This formula defines Cohort field based on the sets you created. Now, when you add the Cohort field to the view, you will see a cohort for 2010, 2011, 2012, and 2013.
Place the Cohort on the view.
In this example, the trend of average sales per cohort based on order date is shown, where color shows details about the cohort.
Alternate Search Terms:cohort, cohort analysis, tracking, websites