KNOWLEDGE BASE

Fill Gaps in Sequential Data per Category


Published: 29 Mar 2022
Last Modified Date: 30 Mar 2022

Question

The "New Rows" step in Tableau Prep fills missing dates or numbers in a field across the entire data set, but what if you wanted to fill gaps in a date or number field per category?

For example, the field [Order Number] in the sample data set Superstore has the date #12/27/2021# and thus it would not be filled in by the "New Rows" step, but this date is missing for the "Furniture" category.

Environment

Tableau Prep

Answer

The following steps are demonstrated in the packaged prep flow "Fill missing dates per category.tflx," which can be downloaded from the right-hand pane of this article.
  1. Connect to the sample superstore data set
  2. From the input step, create an aggregation step and add Order Date to the Grouped Fields
  3. From the "Aggregate 1" step, add a new rows step
  4. In the "New Rows 1" step, select Values from one field and then select Order Date in the dropdown
  5. From the "New Rows 1" step, add a clean step
  6. In the "Clean 1" step, create a calculation called "Join Calc" with the formula "1"
  7. From the input step, create a new aggregation step (this will create a new branch) and add Category to the Grouped Fields
  8. From the "Aggregate 2" step, add a clean step
  9. In the "Clean 2" step, create a calculation called "Join Calc" with the formula "1"
  10. Drag "Clean 2" onto "Clean 1" and drop on Join
  11. In "Join 1" join on Join Calc = Join Calc
  12. Drag "Join 1" onto the input step and drop on Join
  13. In "Join 2" do the following:
    1. Set the join to a RIGHT join, which keeps all of the data from "Join 1"
    2. Add join clauses on Order Date = Order Date and Category = Category
  14. From the "Join 2" step, add a clean step
  15. In "Clean 3" do the following:
    1. Remove Join Calc and Join Calc-1
    2. Merge Order Date and Order Date-1
    3. Merge Category and Category-1

Additional Information

This prep flow first creates a complete list of all dates and joins that on 1=1 to a complete list of categories. The join on 1=1 forces duplication so that every category has every date. Joining back to the original data requires a RIGHT join because the complete list of all dates per all categories has more records than the original data.
Did this article resolve the issue?