Follower logs to troubleshoot for container not ready

I am having issues configuring a follower in an OpenShift cluster, and the pod logs don’t say much. I suspect the problem is related to the SSL certificates from the Conjur Master, which sits outside the OpenShift cluster, but it has been hard to find the right logs to determine the root cause.

Currently I only see the readiness probe failing:

Readiness probe failed: Get https://10.129.2.19:443/health: dial tcp 10.129.2.19:443: connect: connection refused

And the pod logs:

System error
2020-06-18T06:45:01.000+00:00 conjur-follower-5f9dbbf655-48zrm CRON[2548]: PAM audit_log_acct_message() failed: Operation not permitted
2020-06-18T06:45:01.000+00:00 conjur-follower-5f9dbbf655-48zrm CRON[2548]: System error

System error
2020-06-18T06:55:01.000+00:00 conjur-follower-5f9dbbf655-48zrm CRON[2550]: PAM audit_log_acct_message() failed: Operation not permitted
2020-06-18T06:55:01.000+00:00 conjur-follower-5f9dbbf655-48zrm CRON[2550]: System error

Any tips on which logs I should look at to troubleshoot this?

Thanks in advance.

Cheers,

Jose

This is the full log from a follower pod startup.
It gives no clear indication of why the startup is not succeeding.


Starting follower services…
Joined session keyring: 554612560
*** Running /etc/my_init.d/00_regen_ssh_host_keys.sh…
*** Running /etc/my_init.d/01-clear-run.sh…
*** Running /etc/my_init.d/10_local_hosts.rb…
*** Running /etc/my_init.d/10_syslog-ng.init…
2020-06-18T08:30:12.231+00:00 conjur-follower-5f9dbbf655-c4xdh syslog-ng[19]: syslog-ng starting up; version='3.13.2'
*** Running /etc/my_init.d/dhgen.sh…
*** Booting runit daemon…
*** Runit started as PID 27

+ exec conjur-plugin-logger etcd
2020-06-18T08:30:12.000+00:00 conjur-follower-5f9dbbf655-c4xdh cron[43]: (CRON) INFO (pidfile fd = 3)
2020-06-18T08:30:12.000+00:00 conjur-follower-5f9dbbf655-c4xdh cron[43]: (CRON) INFO (Running @reboot jobs)
2020-06-18T08:30:12.000+00:00 conjur-follower-5f9dbbf655-c4xdh evoke-info: [2020-06-18 08:30:18] INFO WEBrick 1.4.2
2020-06-18T08:30:18.000+00:00 conjur-follower-5f9dbbf655-c4xdh evoke-info: [2020-06-18 08:30:18] INFO ruby 2.5.5 (2019-03-15) [x86_64-linux-gnu]
2020-06-18T08:30:18.000+00:00 conjur-follower-5f9dbbf655-c4xdh evoke-info: [2020-06-18 08:30:18] INFO WEBrick::HTTPServer#start: pid=51 port=5611
2020-06-18T08:30:12.000+00:00 conjur-follower-5f9dbbf655-c4xdh evoke-seed: [2020-06-18 08:30:18] INFO WEBrick 1.4.2
2020-06-18T08:30:18.000+00:00 conjur-follower-5f9dbbf655-c4xdh evoke-seed: [2020-06-18 08:30:18] INFO ruby 2.5.5 (2019-03-15) [x86_64-linux-gnu]
2020-06-18T08:30:18.000+00:00 conjur-follower-5f9dbbf655-c4xdh evoke-seed: [2020-06-18 08:30:18] INFO WEBrick::HTTPServer#start: pid=56 port=5612
2020-06-18T08:30:12.000+00:00 conjur-follower-5f9dbbf655-c4xdh evoke-health: [2020-06-18 08:30:18] INFO WEBrick 1.4.2
2020-06-18T08:30:18.000+00:00 conjur-follower-5f9dbbf655-c4xdh evoke-health: [2020-06-18 08:30:18] INFO ruby 2.5.5 (2019-03-15) [x86_64-linux-gnu]
2020-06-18T08:30:18.000+00:00 conjur-follower-5f9dbbf655-c4xdh evoke-health: [2020-06-18 08:30:18] INFO WEBrick::HTTPServer#start: pid=55 port=5610
2020-06-18 08:30:20.499 UTC [156] LOG: database system was shut down at 2020-06-18 08:30:20 UTC
2020-06-18 08:30:20.502 UTC [156] LOG: MultiXact member wraparound protections are now enabled
2020-06-18 08:30:20.504 UTC [63] LOG: database system is ready to accept connections
2020-06-18 08:30:20.506 UTC [160] LOG: autovacuum launcher started

Hi Jose,

You can get a little more insight into what is going on by looking at the logs for the init container (usually named “authenticator”, but you might have a different name in your deployment). If that doesn’t get you unblocked, you might need to turn on debug logging on the master. Before going that far, though, check that you’ve initialized the CA for the authenticator, that the follower pod is authorized to use both the authn-k8s webservice and the seed service, and that you aren’t doing SSL termination on the load balancer sitting in front of the master (assuming you are using one). That last one commonly trips folks up: when the LB does the SSL offloading, it inadvertently strips the CSR used by our authenticator as well.
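
As a rough sketch of the first suggestion, the helpers below list a pod's init containers and print the logs of one of them. They assume the `oc` client is already logged in with the follower's project selected. The function names are hypothetical, and the container name defaults to "authenticator" only because that is the usual name mentioned above.

```shell
# Hypothetical helpers; assumes `oc` is logged in and the follower's project is selected.

# List the init container names of a pod, to confirm what the container is actually called.
init_container_names() {
  oc get pod "$1" -o jsonpath='{.spec.initContainers[*].name}'
}

# Print the logs of one init container (second argument defaults to "authenticator").
init_container_logs() {
  oc logs "$1" -c "${2:-authenticator}"
}

# Example (pod name taken from the startup log earlier in this thread):
#   init_container_names conjur-follower-5f9dbbf655-c4xdh
#   init_container_logs  conjur-follower-5f9dbbf655-c4xdh
```

If the init container has a different name in your deployment, pass it as the second argument to `init_container_logs`.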

I’d also suggest using the latest versions of the appliance and the seed-fetcher init container if you aren’t already, as we’ve made some improvements to the logging facilities that make troubleshooting a little easier. I hope that helps get you going. Please be sure to let us know what you find as the root cause!

Regards,
Nate

Hi @nathan.whipple, thanks for your reply. I’ll go through it and post more details when possible. I am not using an LB in front of the Conjur master just yet, but will be mindful of that in the future.

From the authenticator pod logs I only see: WARN: Seed URL not found - assuming seedfile exists on the follower!

That is expected, because in the script I cloned from https://github.com/cyberark/kubernetes-conjur-deploy I pointed the seed to a local file.

Just checking that the authenticator configuration is correct:

docker exec dap \
  chpst -u conjur conjur-plugin-service possum \
  rake authn_k8s:ca_init["conjur/authn-k8s/openshift"]

/root is not writable.
Bundler will use `/tmp/bundler/home/unknown' as your home directory temporarily.
Rails Error: Unable to access log file. Please ensure that /opt/conjur/possum/log/appliance.log exists and is writable (ie, make it writable for user and group: chmod 0664 /opt/conjur/possum/log/appliance.log). The log level has been raised to WARN and the output directed to STDERR until the problem is fixed.
Populated CA and Key of service conjur/authn-k8s/openshift
To print values:
conjur variable value conjur/authn-k8s/openshift/ca/cert
conjur variable value conjur/authn-k8s/openshift/ca/key

docker exec dap bash -c \
  'echo CONJUR_AUTHENTICATORS="authn,authn-k8s/openshift" >> /opt/conjur/etc/conjur.conf &&
   sv restart conjur'

root@ad1bb0ce48dd:/# cat /opt/conjur/etc/conjur.conf
CONJUR_ACCOUNT=admin
ENABLED=true
LOG_LEVEL=warn
TRUSTED_PROXIES=127.0.0.1/32
DATABASE_URL=postgres:///conjur

CONJUR_AUTHENTICATORS="authn,authn-k8s/openshift"
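
A check like the one above can be scripted. The following is only a sketch: `authenticator_enabled` is a hypothetical helper, and it assumes that the last `CONJUR_AUTHENTICATORS` line in the file is the one that takes effect (relevant because `>>` appends a new line on every run) and that any quotes around the value are plain straight double quotes.

```shell
# Hypothetical helper: succeed if AUTHN appears in the last CONJUR_AUTHENTICATORS
# line of CONF. Straight double quotes around the value are ignored.
# Assumption: when the key is repeated, the last assignment is the effective one.
authenticator_enabled() {
  conf="$1"; authn="$2"
  grep '^CONJUR_AUTHENTICATORS=' "$conf" | tail -n 1 \
    | sed 's/^CONJUR_AUTHENTICATORS=//; s/"//g' \
    | tr ',' '\n' \
    | grep -Fxq "$authn"
}

# Example, run inside the appliance container:
#   authenticator_enabled /opt/conjur/etc/conjur.conf authn-k8s/openshift && echo enabled
```

The comparison is an exact match on each comma-separated entry, so `authn-k8s` alone does not match `authn-k8s/openshift`.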

Hi Jose,

I just noticed the readiness probe above, which targets the pod by IP address (https://10.129.2.19:443/health). I’m pretty confident our CA will not issue certificates with IP addresses in the name. Assuming you’re using self-signed certificates, I wouldn’t expect the SSL handshake on this health probe to work. Can you confirm whether the certificate for the follower has IP addresses in the subject or subject alternative names? Does the health probe work if the name in the follower certificate is used for the probe instead?
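
One way to answer the certificate question, sketched with standard `openssl` commands. The function names are hypothetical, the `-ext` option assumes OpenSSL 1.1.1 or newer, and the first helper only works once something is actually listening on the port.

```shell
# Hypothetical helpers for inspecting the names in a certificate.
# Assumption: OpenSSL 1.1.1+ (needed for `x509 -ext`).

# Print the subject and subjectAltName of the certificate served at HOST[:PORT].
served_cert_names() {
  host="$1"; port="${2:-443}"
  openssl s_client -connect "${host}:${port}" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -ext subjectAltName
}

# Read certificate text on stdin; succeed if the SAN list has an exact
# "IP Address:<ip>" entry for the given IP.
san_lists_ip() {
  tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -Fxq "IP Address:$1"
}

# Example (IP taken from the readiness probe message in this thread):
#   served_cert_names 10.129.2.19 | san_lists_ip 10.129.2.19 && echo "IP is in the SANs"
```

If the IP is not listed, pointing the probe at a name that does appear in the certificate is the comparison to try.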

Nate