KNOWLEDGE BASE

Extracting Character Delimited Substrings From Strings


Published: 13 Dec 2013
Last Modified Date: 11 Mar 2024

Question

How to extract a string of characters, delimited by specific start and end characters, from a larger string of characters.

Environment

  • Tableau Desktop

Answer

The following instructions describe how to extract a sub string from a long string of characters and can be reviewed in the attached workbook.
  1. Select Analysis > Create Calculated Field.
  2. In the Calculated Field dialog box that opens:
    • Name the calculated field. In the example workbook, the calculated field is named Substring.
    • In the formula field, create a calculated field similar to the following:
      IF CONTAINS([<Dimension>], '<Identifying Character 1>') THEN
      MID(
          [<Dimension>],
          FIND([<Dimension>],'<Identifying Character 1>') + 1,
          FIND([<Dimension>],'<Identifying Character 2>') - (FIND([<Dimension>],'<Identifying Character 1>') +1)
      )
      END
      
  3. Click OK.
  4. Drag Substring to the Rows shelf.

Additional Information

  • This calculation does not include the identifying characters in the final substring. To include the identifying characters, remove "+1" in both placs is appears in the above calculation.
  • The IF statement using the CONTAINS([<Dimension>], '<Identifying Character 1>') condition filters out any strings that do no contain the first identifying character. This is necessary because FIND() will return 0 if the character is not found, which is still a valid starting position for MID().
  • "Identifying Character 1" would be at the beginning of the desired sub string.
  • "Identifying Character 2" would be at the end of the desired sub string.
  • When trying to find the substring between 2 instances of the same character (for example "dog" in the string "the *dog* is brown"), use FINDNTH() instead of FIND() for the second instance. This may look similar to:
    • MID(
          [<Dimension>],
          FIND([<Dimension>],'<Identifying Character>') + 1,
          FINDNTH([<Dimension>],'<Identifying Character>', 2) - (FIND([<Dimension>],'<Identifying Character>') +1)
      )
      
Did this article resolve the issue?