KNOWLEDGE BASE

Moving TSM Controller Fails with ZooKeeper Timeout


Published: 24 May 2018
Last Modified Date: 20 Jun 2018

Issue

When moving the TSM controller to a different node after a node failure, the move-tsm-controller script might fail due to a timeout while connecting to ZooKeeper on the failed node:

Failed to move controller to the specified node. Please check the node connection and retry. 
Generic exception raised during installation. 
java.lang.RuntimeException: java.util.concurrent.TimeoutException: Timed out while waiting for ZooKeeper connection.

Environment

  • Tableau Server 10.5.4, 2018.1.1
  • Linux (CentOS)

Resolution

As a Tableau Server Administrator, upgrade to Tableau Server 2018.1.2, 10.5.5 or a later version. For more information, see Upgrading Tableau Server in Tableau Help. 

Cause

Timeout settings in the affected versions cause the script to keep trying to connect to ZooKeeper on all nodes for an extended period of time. Because it cannot connect to the failed node in a reasonable amount of time, the script eventually fails with a timeout.

Additionally, this behavior is related to a known issue (ID: 776691), which is fixed in the Tableau Server versions listed under Resolution.

Additional Information

Because ZooKeeper requires a quorum, the move-tsm-controller script can succeed only if a majority of ZooKeeper nodes are still up and running. For example, if two nodes in a three-node environment are down, the script fails regardless of the Tableau Server version installed.
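
The majority rule can be checked with simple arithmetic before retrying the script. The following is a minimal sketch, not part of Tableau Server: the function name has_quorum and its parameters are hypothetical, and it assumes that every node counted hosts a ZooKeeper ensemble member.

```python
def has_quorum(total_nodes: int, nodes_up: int) -> bool:
    """Return True if a strict majority of ZooKeeper nodes are running.

    Hypothetical helper for illustration only. Assumes each counted
    node hosts one member of the ZooKeeper ensemble.
    """
    if total_nodes <= 0 or not 0 <= nodes_up <= total_nodes:
        raise ValueError("expected total_nodes > 0 and 0 <= nodes_up <= total_nodes")
    # A strict majority is more than half of the ensemble.
    majority = total_nodes // 2 + 1
    return nodes_up >= majority


if __name__ == "__main__":
    # Three-node environment with one failed node: a majority remains.
    print(has_quorum(total_nodes=3, nodes_up=2))
    # Three-node environment with two failed nodes: no majority.
    print(has_quorum(total_nodes=3, nodes_up=1))
```

In the three-node example above, one failed node leaves two of three nodes running, which is still a majority. Two failed nodes leave one of three, which is not, so the move cannot succeed until another node is restored.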