Getting standbys and followers to use the LB hostname instead of a master-specific hostname

We have a multi-master DAP cluster with one active master and two standbys. Several followers are running within Kubernetes clusters. Each master has its own hostname:

An alias record, “ca-dap-dev.neovest.com”, resolves to the hostname of the active master. The intent is that standbys and followers, as well as users, can always use “ca-dap-dev.neovest.com” regardless of which of the 3 masters is currently active. At installation time, the alias pointed to “ut-ca-master1.neovest.com”. Now, after upgrading the whole cluster to conjur-appliance 11.4.0, it has been changed to point to the currently active master: “ut-ca-master2.neovest.com”.

While this works great from a user perspective (I can hit “ca-dap-dev.neovest.com” in the browser and access the UI serviced by master2), the followers and standbys appear to be directly referencing “ut-ca-master1.neovest.com”, which has made it impossible to configure master1 as a standby and is causing failures within the followers.

From master2, I have copied the seed to master1 according to the documentation for configuring an HA cluster:

[cyberark@ut-ca-master2 ~]$ docker exec dap evoke seed standby ut-ca-master1.neovest.com | ssh cyberark@ut-ca-master1.neovest.com "docker exec -i dap evoke unpack seed -"

Then, on master1, I attempted to configure it as a standby. This does not work, because master1 tries to replicate from itself instead of using the load balancer hostname to find the active master:

[cyberark@ut-ca-master1 ~]$ docker exec dap evoke configure standby
psql: could not connect to server: No route to host
Is the server running on host "ut-ca-master1.neovest.com" (10.2.0.156) and accepting
TCP/IP connections on port 5432?

The followers that were already running in Kubernetes have suddenly started to fail, as they seem to try to connect to master1 directly:

$ kubectl logs -f dap-follower-7649c48fd8-hrzpq
<131>1 2020-07-18T16:23:17.000+00:00 dap-follower-7649c48fd8-hrzpq postgres 21989 - [meta sequenceId="97"] [3-1] FATAL:  could not connect to the primary server: could not connect to server: Connection refused
<131>1 2020-07-18T16:23:17.000+00:00 dap-follower-7649c48fd8-hrzpq postgres 21989 - [meta sequenceId="98"] [3-2] 		Is the server running on host "ut-ca-master1.neovest.com" (10.2.0.156) and accepting
<131>1 2020-07-18T16:23:17.000+00:00 dap-follower-7649c48fd8-hrzpq postgres 21989 - [meta sequenceId="99"] [3-3] 		TCP/IP connections on port 5432?

This is strange, as the deployment manifest explicitly uses “ca-dap-dev.neovest.com”:

initContainers:
- env:
  - name: CONJUR_SEED_FILE_URL
    value: https://ca-dap-dev.neovest.com/configuration/neovest/seed/follower

Seeking help on how to configure standbys and followers to leverage the load balancer hostname to remedy these issues. Thanks in advance for any guidance. @jgarabedian

This was resolved by issuing a new SSL certificate whose subject is the load balancer DNS name (e.g. subject=loadbalancerdns).

Future upgrades should be able to complete with this new certificate.
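For anyone checking their own setup, one way to confirm which name is in the certificate subject is to read it with openssl. This is a minimal sketch, not an official procedure: it assumes openssl is installed, and it generates a throwaway self-signed certificate as a stand-in for the real master certificate so that the snippet runs anywhere. The variable names are only illustrative.

```shell
# Assumption: the alias from this thread is the name that should appear as the CN.
LB_FQDN=ca-dap-dev.neovest.com

# Throwaway self-signed certificate, standing in for the real master certificate.
tmpdir=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=$LB_FQDN" \
  -keyout "$tmpdir/key.pem" -out "$tmpdir/cert.pem" 2>/dev/null

# Print the subject in RFC 2253 form and keep only the CN value.
cn=$(openssl x509 -in "$tmpdir/cert.pem" -noout -subject -nameopt RFC2253 \
      | sed -n 's/^.*CN=\([^,]*\).*$/\1/p')
rm -rf "$tmpdir"

if [ "$cn" = "$LB_FQDN" ]; then
  echo "CN matches the load balancer name: $cn"
else
  echo "CN is '$cn', expected '$LB_FQDN'"
fi

# Against a live master, the certificate could be fetched instead with
# (port 443 is an assumption here):
#   openssl s_client -connect "$LB_FQDN:443" -servername "$LB_FQDN" </dev/null 2>/dev/null \
#     | openssl x509 -noout -subject -nameopt RFC2253
```

If the CN printed for the live certificate is a master-specific hostname such as ut-ca-master1.neovest.com, that would be consistent with the connection errors shown in the original post.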


The default behavior when creating a seed file is that the master address is taken from the CN value of the certificate subject. This can be overridden by passing the master address as an additional argument when creating the seed file. For example: evoke seed follower my.follower.fqdn my.master-lb.fqdn
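As a rough sketch of how that override might be used with the container from this thread: the snippet below only assembles and prints the command so it can be reviewed first. The argument order is taken from the example above, the container name "dap" comes from the original post, and the follower hostname and output filename are placeholders.

```shell
# Hypothetical values; substitute your own.
CONTAINER=dap                           # container name used in the original post
FOLLOWER_FQDN=my.follower.fqdn          # placeholder follower hostname
MASTER_LB_FQDN=ca-dap-dev.neovest.com   # alias that resolves to the active master

# Build the command first so it can be checked before it is run.
seed_cmd="docker exec $CONTAINER evoke seed follower $FOLLOWER_FQDN $MASTER_LB_FQDN"
echo "$seed_cmd"

# On the active master, the seed could then be written to a file, e.g.:
#   $seed_cmd > follower-seed.tar
```

Passing the load balancer name explicitly means the seed does not depend on whichever hostname happens to be in the certificate CN.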

Note: Standbys participating in an auto-failover cluster MUST communicate with the initial master directly and not through the master load balancer or alias.

Note 2: Followers and standbys not participating in an auto-failover cluster should communicate with the master via the master load balancer address or alias.

Note 3: There is no override for the master address when working with the seed fetcher (auto-deploy of followers in OpenShift/Kubernetes). For this reason, we recommend always using the master LB address as the CN value in the certificate subject.

