In the Tableau Prep workbook "duplicates from join.tfl", the "January Inventory" table lists each product once with its total inventory. The "January Orders" table may have many rows for a product if multiple orders were for the same product. After joining the "Sales" and "Inventory" table, the measure [Inventory] has been duplicated. The first flow demonstrates the issue.
Option 1: Use a Union instead of a Join
None of the values of [Inventory] will be directly related with any of the data from "January Orders". This may cause issues when creating calculations without aggregations, or in views that show a higher level of granularity then the field(s) from the join clause, which would be [Product] in this example
Option 2: Create a Calculation to Evenly Distribute Measure Values
We can use an aggregate step to find the number of orders per product and then divide inventory by the number of orders. Then final result will be that an equal portion of the inventory will be represented for each order. For example, there are 2 orders for hats. [Inventory] will be 100 for each hat order. [New Inventory] will be 50 for each order.Please note:
this may cause inaccurate inventory information in a view if only some orders for a product are filtered out.
- Create an INNER join "January Inventory" and "January Orders" on [Product] = [Product]
- Add Aggregate from Join 2
- In the Aggregate 1 step, do the following:
- Drag either [Product] to Grouped Fields
- Drag [Order ID] to Aggregated Fields
- Change the aggregation for Order ID to Count Distinct
- Rename Order ID to something like Orders per Product
- Instead of [Order ID], use the field from the other table that has a unique value for every row.
- If there is not one unique identifier, then before creating the aggregate step, add a step to create a calculated field that concatenates the dimensions required to have a unique value for every row.
- For example, STR([Order ID])+STR([Date])
- Drag Join 2 over Aggregate 1 and drop Join 2 onto New Join
- In the Join 3 step, create a LEFT join on [Product] = [Product]
- Add a Step from Join 3
- In the Clean 1 step, click the Create Calculated Field… button
- In the Calculated Field dialog box that opens, do the following, and then click OK:
- Name the calculated field. In this example, the calculated field is named "New Inventory"
- In the formula field, create a calculation similar to the following:
[Inventory]/[Orders per Product]
Option 3: Aggregate data so that the tables have a 1:1 relationship
- Add an Aggregate step from January Orders 4
- In the Aggregate 2 step, do the following:
- Drag [Product] to the Grouped Fields pane
- Drag [Sales], [Quantity], [Order ID] and [Customer] to the Aggregated Fields pane
- Click CNT on the [Customer] field select Count Distinct
- Click SUM on the [Order ID] field select Count Distinct
- Drag Aggregate 2 over January Inventory 4 to create an INNER join on Product = Product
NOTE: The [Product] Field was dragged to the Grouped Fields pane because that is the only field used to join on. The final table will only have one row per product, which means details on each order will be lost.
Option 4: Use a FIXED expression in Tableau Desktop
For detailed directions, see Removing Duplicate Data with LOD Calculations
The advantage of using a FIXED expression in Tableau Desktop is that level of detail (LOD) expressions have special aggregation rules. Therefore, a de-duplicated inventory field using FIXED could show the full, unduplicated inventory for a product whether there was only one order from that product in the view, or multiple.