KNOWLEDGE BASE

Histograms Display Decimal Values in Incorrect Bins


Published: 08 May 2019
Last Modified Date: 09 May 2019

Issue

When creating histograms in Tableau Desktop that use decimal values for bin sizes, some decimal values may not be placed into the proper bins.

Environment

  • Tableau Desktop (all versions)
  • Data sets with fields containing decimal values

Resolution

This behavior is by design as a result of how decimal values are handled by software (including and not limited to Tableau.) 

Workaround
  • Create a calculated field from the original floating decimal field that multiplies it by a factor of 100: [Field]*100
  • Use the calculated field to create bins, scaling the bin size by a factor of 100 as well.
  • Use the new bins - the values should be distributed as expected now.
  • If the values continue to be placed into unexpected bins, increase the multiplication factor from 100 to 1000.
Note: This workaround will change the histogram axis to display scaled values for the bin sizes as well. It is possible to adjust the axis showing bin size values in the histogram view when the bin is used as a discrete field. This can be done by right-clicking the bin field and setting aliases for the scaled values:
For example, if the bin size for the original decimal bins was .1, and the scaled bins use 10, the aliases would be set where 10 = .1, 20 = .2, etc.

A sample workbook (DecimalBins.twbx) is attached to the article with three worksheets:
  1. Original issue:
    • The histogram has bins based on decimal values with a bin size of .01.
    • The data set contains a single row for each of the values between 0 and .45 increasing by .01 every time.
    • You will notice that the original decimal measure has a value of .29, and in the histogram the values of .28 and .29 are placed in the same bin (.28) instead of each being distributed to an individual corresponding bin as expected. (.28 goes in .28, .29 goes in .29)
  2. Underlying decimal values vs. original decimal values
    • Certain values have long trailing decimal points that go much deeper than the level used in the data set.
    • The value for .29 is actually .28999999999 (as a result of the floating point arithmetic outlined above), and the bins are using these first two digits to determine that .29 belongs in the .28 bin.
  3. Workaround using a scaling factor of 100 with a continuous scale. (Note that this does not change the distribution)
  4. Workaround using a scaling factor of 1000 with a discrete scale. (This changes the distribution to what is expected.)

Cause

This behavior is caused by floating point issues.

Floating point arithmetic uses FLOOR() functions when interpreting decimal values like so: 

FLOOR(0.95/0.001) = 949 ; FLOOR(0.95/0.0001) = 9500 

This is due to the fact that none of 0.95, 0.001, or 0.0001 can be represented exactly in floating point, so they are represented with close, but not quite the same numbers, which when divided and floored produce inconsistent results. What most often happens is the decimal points are displayed as expected values, and their underlying data is based on FLOOR() functions that can slightly off-set the values.
This has potential to show some unexpected values/value behavior such as histogram binning, when displaying the decimals in different formats, or using them in calculated fields, etc.

 

A useful workaround, as noted above, is to multiply numbers by a constant scale factor. This will tend to make the results more accurate. Unfortunately, this isn't always true and the size of the scale factor is difficult to determine. If the user didn't get the right results multiplying by 100, they should try multiplying by 1000.

Did this article resolve the issue?