After a Node Fails, Deploying a New Topology Fails When Using the --ignore-node-status Flag


Published: 16 Nov 2020
Last Modified Date: 09 Mar 2021

Issue

After a node failure, following the recovery instructions in the article below may result in errors.
https://help.tableau.com/current/server/en-us/distrib_node_fail.htm

When running the command to remove the failed node from the Coordination Service ensemble, the command fails, indicating that a timeout was exceeded while waiting for Tableau Server to stop, and returns the following error:

Exception occurred while starting the asynchronous job. Unable to determine if the job was started.
Resource Conflict: There is only one Tableau Server Coordination Service ensemble.


 

Environment

  • Tableau Server 2020.2.3

Resolution

Ensure that a recent backup of the Tableau Server cluster exists. To work around this issue, the cluster must be reinstalled.
Open a support case with Tableau Technical Support for additional information.

This issue is fixed in Tableau Server 2020.2.8, 2020.3.3, 2020.4.1, and later releases. Upgrade to a fixed version if possible.
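
To check whether a given installation already includes the fix, the version list above can be encoded in a small script. The following is only a minimal sketch: the names FIXED_RELEASES, parse_version, and has_fix are hypothetical, and it assumes version strings of the form "2020.2.3". For branches older than 2020.2, the article does not state whether they are affected, so the sketch simply reports that no fix is known for them.

```python
from typing import Tuple

# First fixed maintenance release on each branch, as listed in this article.
# Keys are (year, minor) branches; values are the first patch with the fix.
FIXED_RELEASES = {
    (2020, 2): 8,  # 2020.2.8
    (2020, 3): 3,  # 2020.3.3
    (2020, 4): 1,  # 2020.4.1
}


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a version string such as '2020.2.3' into three integers."""
    parts = version.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected a version like '2020.2.3', got {version!r}")
    major, minor, patch = (int(part) for part in parts)
    return major, minor, patch


def has_fix(version: str) -> bool:
    """Return True if the version is at or beyond a listed fixed release."""
    major, minor, patch = parse_version(version)
    branch = (major, minor)
    if branch in FIXED_RELEASES:
        return patch >= FIXED_RELEASES[branch]
    # Assumption: branches newer than 2020.4 count as "later" releases and
    # include the fix; older branches are reported as having no known fix.
    return branch > max(FIXED_RELEASES)


if __name__ == "__main__":
    for candidate in ("2020.2.3", "2020.2.8", "2020.3.2", "2021.1.0"):
        print(candidate, "includes fix:", has_fix(candidate))
```

The version in the Environment section (2020.2.3) falls below the first fixed release on its branch, so has_fix reports it as not including the fix.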

Cause

The --ignore-node-status flag does not work as expected: when a node is down, it does not allow a new Coordination Service ensemble to be created or the old one to be removed.
This is a known issue that is fixed in Tableau Server 2020.2.8, 2020.3.3, 2020.4.1, and later.
 

Additional Information

The tabadmincontroller logs show that the status is waiting for nodeX (the failed node) to have its processes in a good state, which indicates that the --ignore-node-status flag is not behaving as expected. The expected behavior is that Tableau Server does not wait for processes on the failed node.