Recently our servers were restarted for Linux security patching. After the restart, one of the standby containers stopped working.
Upon investigation, we observed that the PostgreSQL (pg) service was not running, even though all PG settings were as expected.
We rebuilt the container on the standby node; below are our observations.
Point to note: an additional server was added to the existing three servers to facilitate the data centre recovery scenario. This server performs backups but is not part of the cluster.
- The standby node has been rebuilt.
- On the master, we generated the seed file for the standby node.
- On the standby node, we updated the PG settings (max_connections and max_wal_senders) and initiated evoke configure standby.
- On the standby node, we ran the cluster enrolment command (having first removed the node from the master's cluster information).
- After this, the PG settings rolled back to their defaults, so we re-applied the changes and restarted the pg service.
- In /opt/conjur/etc/conjur.conf we found that the node used for the emergency standby had been added; as a result, we are getting the error below due to inconsistent configuration across the cluster.
We are unsure where this configuration is fetched from; it is impacting the cluster as a whole.
root@ceb55c9c2d46:/opt/conjur/etc# cat conjur.conf
Container logs:
<134>1 2021-05-31T06:15:11.000+00:00 3bb3f52ec1d8 etcd - - [meta sequenceId="2095"] 2021-05-31 06:15:11.544575 E | rafthttp: request sent was ignored (cluster ID mismatch: peer[d4e897d5e29dbf11]=c16fccdd7461ad8d, local=38604ee41b6337a5)
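To make the mismatch easier to triage, the peer and local cluster IDs can be pulled out of the rafthttp line with a small script. This is only a sketch for parsing the exact message format shown in the log above; the IDs are copied from that line:

```python
import re

# The rafthttp error line from the standby's container log (shortened
# to the part etcd emits after its own timestamp).
log_line = (
    "rafthttp: request sent was ignored (cluster ID mismatch: "
    "peer[d4e897d5e29dbf11]=c16fccdd7461ad8d, local=38604ee41b6337a5)"
)

# Capture the peer's member ID, the cluster ID that peer advertises,
# and the cluster ID this node believes it belongs to.
pattern = re.compile(
    r"cluster ID mismatch: peer\[(?P<peer_id>[0-9a-f]+)\]="
    r"(?P<peer_cluster>[0-9a-f]+), local=(?P<local_cluster>[0-9a-f]+)"
)

m = pattern.search(log_line)
if m:
    print("peer member ID :", m.group("peer_id"))
    print("peer cluster ID:", m.group("peer_cluster"))
    print("local cluster  :", m.group("local_cluster"))
    # A differing cluster ID means this peer was initialised into a
    # different etcd cluster than the local node.
    print("mismatch       :", m.group("peer_cluster") != m.group("local_cluster"))
```

The differing cluster IDs are consistent with the extra (backup-only) host having been bootstrapped into a separate etcd cluster from the rest of the nodes.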
- On the second standby node, this was not the case.
- On the master node, the setting is as below:
root@6eb130b51c68:/opt/conjur/etc# cat conjur.conf
/etc/chef/solo.json also has the additional host added to the cluster member list.
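Since the extra host shows up in both conjur.conf and /etc/chef/solo.json, one quick consistency check is to diff the member list each node carries against the intended three-node cluster. A minimal sketch, assuming a solo.json whose cluster member array sits under a "conjur.cluster.members" key; the key path and hostnames here are hypothetical, not taken from the actual files:

```python
import json

# Hypothetical solo.json fragment; the real file's layout will differ.
solo_json = json.loads("""
{
  "conjur": {
    "cluster": {
      "members": ["master-node", "standby-1", "standby-2", "dr-backup-node"]
    }
  }
}
""")

# The three nodes that should actually form the cluster
# (the DR/backup host is intentionally excluded).
expected = {"master-node", "standby-1", "standby-2"}

configured = set(solo_json["conjur"]["cluster"]["members"])

# Any host configured but not expected is a candidate source of the
# inconsistency (here, the backup-only DR node).
unexpected = configured - expected
missing = expected - configured

print("unexpected members:", sorted(unexpected))
print("missing members   :", sorted(missing))
```

Running the same check against each node's configuration should show which nodes picked up the extra host and which (like the second standby) did not.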
Thanks and regards,