KNOWLEDGE BASE

After A Node Fails, Deploying A New Topology Fails When Using The --ignore-node-status Flag


Published: 16 Nov 2020
Last Modified Date: 18 Nov 2020

Issue

After a node fails, following the recovery instructions in the article below may produce errors:
https://help.tableau.com/current/server/en-us/distrib_node_fail.htm

When you run the command to remove the failed node from the coordination service ensemble, it fails after exceeding a timeout while waiting for Tableau Server to stop, and returns the following error:

Exception occurred while starting the asynchronous job. Unable to determine if the job was started.
Resource Conflict: There is only one Tableau Server Coordination Service ensemble.


 

Environment

  • Tableau Server 2020.2.3

Resolution

Ensure there is a recent backup of the Tableau Server cluster. At this time, the cluster must be reinstalled to work around this issue.
Open a support case with Tableau Technical Support for additional information.

Cause

This is a known issue that is being addressed by the Tableau development team. Because of this issue, the --ignore-node-status flag does not work as expected: while a node is down, a new coordination service ensemble cannot be created and the old one cannot be removed.
 

Additional Information

The tabadmincontroller logs show the status waiting for nodeX (the failed node) to have its processes in a good state, which indicates that the --ignore-node-status flag is not behaving as expected. The expected behavior is that Tableau Server does not wait for processes on the failed node.
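
To check for this symptom, the logs can be searched for lines that mention the failed node together with a wait message. The following is a minimal sketch in Python, not an official Tableau tool. The exact wording of the log message is not documented here, so the keyword "waiting", the "*.log" file pattern, and the function names find_wait_lines and scan_log_dir are all assumptions made for illustration; adjust them to match what the logs actually contain.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, List


def find_wait_lines(lines: Iterable[str], node_id: str,
                    keyword: str = "waiting") -> Iterator[str]:
    """Yield lines that contain both the node ID and the keyword.

    The keyword is an assumption; the real log wording may differ.
    Matching is case-insensitive, and the node ID must appear as a
    whole word so that "node2" does not match "node20".
    """
    node_pattern = re.compile(rf"\b{re.escape(node_id)}\b", re.IGNORECASE)
    for line in lines:
        if keyword.lower() in line.lower() and node_pattern.search(line):
            yield line.rstrip("\n")


def scan_log_dir(log_dir: str, node_id: str, glob: str = "*.log") -> List[str]:
    """Search every file in log_dir that matches glob (assumed pattern)."""
    hits: List[str] = []
    for path in sorted(Path(log_dir).glob(glob)):
        # errors="replace" keeps the scan going if a file has odd bytes.
        with path.open(encoding="utf-8", errors="replace") as handle:
            hits.extend(f"{path.name}: {hit}"
                        for hit in find_wait_lines(handle, node_id))
    return hits
```

For example, scan_log_dir("<path to tabadmincontroller logs>", "node2") returns the matching lines for a failed node with the ID node2. Both the path and the node ID are placeholders to be replaced with the values from the affected cluster.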