Follower logs to troubleshoot for container not ready

joalmaraz · June 18, 2020, 6:55am

I am having issues configuring a follower in an OpenShift cluster, and the pod logs don’t say much. I suspect it is related to the SSL certificates from the Conjur Master, which sits outside the OpenShift cluster, but it has been hard to find the right logs to determine the root cause.

Currently I only see the readiness probe failing:

Readiness probe failed: Get https://10.129.2.19:443/health: dial tcp 10.129.2.19:443: connect: connection refused

And the pod logs:

System error
2020-06-18T06:45:01.000+00:00 conjur-follower-5f9dbbf655-48zrm CRON[2548]: PAM audit_log_acct_message() failed: Operation not permitted
2020-06-18T06:45:01.000+00:00 conjur-follower-5f9dbbf655-48zrm CRON[2548]: System error

System error
2020-06-18T06:55:01.000+00:00 conjur-follower-5f9dbbf655-48zrm CRON[2550]: PAM audit_log_acct_message() failed: Operation not permitted
2020-06-18T06:55:01.000+00:00 conjur-follower-5f9dbbf655-48zrm CRON[2550]: System error

Any tips on which logs I should look at to troubleshoot this?

Thanks in advance.

Cheers,

Jose

joalmaraz · June 18, 2020, 8:31am

This is the full log from a follower pod startup.
It gives no clear indication of why the startup is failing.

Starting follower services…
Joined session keyring: 554612560
*** Running /etc/my_init.d/00_regen_ssh_host_keys.sh…
*** Running /etc/my_init.d/01-clear-run.sh…
*** Running /etc/my_init.d/10_local_hosts.rb…
*** Running /etc/my_init.d/10_syslog-ng.init…
2020-06-18T08:30:12.231+00:00 conjur-follower-5f9dbbf655-c4xdh syslog-ng[19]: syslog-ng starting up; version='3.13.2'
*** Running /etc/my_init.d/dhgen.sh…
*** Booting runit daemon…
*** Runit started as PID 27

exec conjur-plugin-logger etcd
2020-06-18T08:30:12.000+00:00 conjur-follower-5f9dbbf655-c4xdh cron[43]: (CRON) INFO (pidfile fd = 3)
2020-06-18T08:30:12.000+00:00 conjur-follower-5f9dbbf655-c4xdh cron[43]: (CRON) INFO (Running @reboot jobs)
2020-06-18T08:30:12.000+00:00 conjur-follower-5f9dbbf655-c4xdh evoke-info: [2020-06-18 08:30:18] INFO WEBrick 1.4.2
2020-06-18T08:30:18.000+00:00 conjur-follower-5f9dbbf655-c4xdh evoke-info: [2020-06-18 08:30:18] INFO ruby 2.5.5 (2019-03-15) [x86_64-linux-gnu]
2020-06-18T08:30:18.000+00:00 conjur-follower-5f9dbbf655-c4xdh evoke-info: [2020-06-18 08:30:18] INFO WEBrick::HTTPServer#start: pid=51 port=5611
2020-06-18T08:30:12.000+00:00 conjur-follower-5f9dbbf655-c4xdh evoke-seed: [2020-06-18 08:30:18] INFO WEBrick 1.4.2
2020-06-18T08:30:18.000+00:00 conjur-follower-5f9dbbf655-c4xdh evoke-seed: [2020-06-18 08:30:18] INFO ruby 2.5.5 (2019-03-15) [x86_64-linux-gnu]
2020-06-18T08:30:18.000+00:00 conjur-follower-5f9dbbf655-c4xdh evoke-seed: [2020-06-18 08:30:18] INFO WEBrick::HTTPServer#start: pid=56 port=5612
2020-06-18T08:30:12.000+00:00 conjur-follower-5f9dbbf655-c4xdh evoke-health: [2020-06-18 08:30:18] INFO WEBrick 1.4.2
2020-06-18T08:30:18.000+00:00 conjur-follower-5f9dbbf655-c4xdh evoke-health: [2020-06-18 08:30:18] INFO ruby 2.5.5 (2019-03-15) [x86_64-linux-gnu]
2020-06-18T08:30:18.000+00:00 conjur-follower-5f9dbbf655-c4xdh evoke-health: [2020-06-18 08:30:18] INFO WEBrick::HTTPServer#start: pid=55 port=5610
2020-06-18 08:30:20.499 UTC [156] LOG: database system was shut down at 2020-06-18 08:30:20 UTC
2020-06-18 08:30:20.502 UTC [156] LOG: MultiXact member wraparound protections are now enabled
2020-06-18 08:30:20.504 UTC [63] LOG: database system is ready to accept connections
2020-06-18 08:30:20.506 UTC [160] LOG: autovacuum launcher started

nathan.whipple · June 18, 2020, 2:25pm

Hi Jose,

You can get a little more insight into what is going on by looking at the logs for the init container (usually named “authenticator”, but you might have a different name in your deployment). If that doesn’t get you unblocked, then you might need to turn on debug logging on the master. Before going that far though, check that you’ve initialized the CA for the authenticator, that the follower pod is authorized to use both the authn-k8s webservice and the seed service, and that you aren’t doing SSL termination on the load balancer sitting in front of the master (assuming you are using a load balancer). That last one commonly trips folks up: when the LB does the SSL offloading, it inadvertently strips the CSR used by our authenticator as well.
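As a rough sketch of the first suggestion, the helper below pulls the init container logs and the pod events with the `oc` CLI. The default container name `authenticator` is taken from the reply above and may differ in your deployment; the function name `follower_init_logs` is made up for illustration, and the sketch assumes your current project is the one the follower runs in.

```shell
#!/bin/sh
# Sketch only: collect init container logs and events for a follower pod.
# "follower_init_logs" is a hypothetical helper name, not part of any tool.

follower_init_logs() {
  pod="$1"                          # pod name, e.g. taken from `oc get pods`
  container="${2:-authenticator}"   # init container name; adjust if yours differs

  # Logs written by the named (init) container inside the pod.
  oc logs "$pod" -c "$container"

  # Container states and recent events, including readiness probe failures.
  oc describe pod "$pod"
}
```

For example, `follower_init_logs conjur-follower-5f9dbbf655-c4xdh` would print the authenticator logs followed by the pod description; pass a second argument if the init container has another name.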

I’d also suggest using the latest versions of the appliance and seed-fetcher init container if you aren’t already as we’ve made some improvements in the logging facilities that make troubleshooting a little easier. I hope that helps get you going. Please be sure to let us know what you find as the root cause!

Regards,
Nate

joalmaraz · June 24, 2020, 5:21am

Hi @nathan.whipple, thanks for your reply. I’ll go through it and post more details when possible. I am not using an LB in front of the Conjur master just yet, but will be mindful of that in the future.

From the authenticator pod logs I only see: WARN: Seed URL not found - assuming seedfile exists on the follower!

That is expected, because in the script I cloned from https://github.com/cyberark/kubernetes-conjur-deploy I have pointed it to a local file.

joalmaraz · June 24, 2020, 5:42am

Just checking that the authenticator configuration is correct:

docker exec dap \
  chpst -u conjur conjur-plugin-service possum \
  rake authn_k8s:ca_init["conjur/authn-k8s/openshift"]

/root is not writable.
Bundler will use `/tmp/bundler/home/unknown' as your home directory temporarily.
Rails Error: Unable to access log file. Please ensure that /opt/conjur/possum/log/appliance.log exists and is writable (ie, make it writable for user and group: chmod 0664 /opt/conjur/possum/log/appliance.log). The log level has been raised to WARN and the output directed to STDERR until the problem is fixed.
Populated CA and Key of service conjur/authn-k8s/openshift
To print values:
conjur variable value conjur/authn-k8s/openshift/ca/cert
conjur variable value conjur/authn-k8s/openshift/ca/key

docker exec dap bash -c \
  'echo CONJUR_AUTHENTICATORS="authn,authn-k8s/openshift" >> /opt/conjur/etc/conjur.conf && sv restart conjur'

root@ad1bb0ce48dd:/# cat /opt/conjur/etc/conjur.conf
CONJUR_ACCOUNT=admin
ENABLED=true
LOG_LEVEL=warn
TRUSTED_PROXIES=127.0.0.1/32
DATABASE_URL=postgres:///conjur

CONJUR_AUTHENTICATORS=“authn,authn-k8s/openshift”

nathan.whipple · June 24, 2020, 12:51pm

Hi Jose,

I just noticed the readiness probe output above, which targets the pod by IP address (https://10.129.2.19:443/health). I’m pretty confident our CA will not issue certificates with IP addresses in the name. Assuming you’re using self-signed certificates, I wouldn’t expect the SSL handshake on this health probe to work. Can you confirm whether the certificate for the follower has IP addresses in the subject or subject alternative names? Does the health probe work if the name in the follower certificate is used for the probe instead?
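One way to answer the certificate question is sketched below, assuming `openssl` and `awk` are available wherever you run it. The function names `show_cert` and `list_sans` are hypothetical. Note that this only produces output once something is listening on the port; a "connection refused" error, as in the probe output earlier in the thread, happens before any TLS handshake takes place.

```shell
#!/bin/sh
# Sketch only: inspect the names in the certificate served at host:port.
# "show_cert" and "list_sans" are hypothetical helper names.

# Print the certificate presented by host:port in human-readable form.
# Verification errors are ignored here; we only want to read the certificate.
show_cert() {
  openssl s_client -connect "$1" </dev/null 2>/dev/null |
    openssl x509 -noout -text
}

# Read `openssl x509 -text` output on stdin and print the line that follows
# the "Subject Alternative Name" header, with leading whitespace removed.
# Prints nothing if the certificate has no SAN extension.
list_sans() {
  awk '/Subject Alternative Name/ { getline; sub(/^[ \t]+/, ""); print }'
}
```

Usage would look like `show_cert <follower-host>:443 | list_sans`. Entries prefixed with `IP Address:` indicate that the certificate covers an IP; if only `DNS:` entries appear, a probe addressed by IP would not match the certificate.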

Nate
