Database error after backup/restore

Hi,

although it is not well-described how to backup and restore your Conjur data, I figured this :
# Create backup
docker exec conjur_server rm /opt/conjur-server/tmp/pids/server.pid docker stop conjur_server
docker exec -ti -u postgres postgres_database postgres@xxxxxxxx:/ pg_dump postgres > /tmp/db.dump

# Restore backup
postgres@xxxxxxxx:/$ dropdb postgres
postgres@xxxxxxxx:/$ createdb postgres
postgres@xxxxxxxx:/$ psql postgres < /tmp/db.dump
postgres@xxxxxxxx:/$ exit
$ docker start conjur_server

This looks like it’s gonna work, since I’m able to backup my initial Conjur environment, then run my unittest that creates/updates/deletes all sorts of stuff, and then restore back to my initial Conjur environment again.
However, running my unittest once again yields an HTTP 500 on the REST interface.
‘docker logs conjur_server’ shows :

WARNING:  modifying data outside of policy load: BCIE:policy:ont/myfolder
CONTEXT:  SQL statement "DELETE FROM role_memberships rm
          WHERE rm.role_id = $1 AND
            member_id = $2 AND
            ownership = true"
PL/pgSQL function delete_role_membership_of_owner(text,text) line 5 at SQL statement
SQL statement "SELECT delete_role_membership_of_owner(OLD.resource_id, OLD.owner_id)"
PL/pgSQL function delete_role_membership_of_owner_trigger() line 3 at PERFORM
SQL statement "DELETE FROM ONLY "public"."resources" WHERE $1 OPERATOR(pg_catalog.=) "policy_id""
WARNING:  modifying data outside of policy load: BCIE:policy:ont/myfolder
CONTEXT:  SQL statement "DELETE FROM role_memberships rm
          WHERE rm.role_id = $1 AND
            member_id = $2 AND
            ownership = true"
PL/pgSQL function delete_role_membership_of_owner(text,text) line 5 at SQL statement
SQL statement "SELECT delete_role_membership_of_owner(OLD.resource_id, OLD.owner_id)"
PL/pgSQL function delete_role_membership_of_owner_trigger() line 3 at PERFORM
SQL statement "DELETE FROM ONLY "public"."resources" WHERE $1 OPERATOR(pg_catalog.=) "policy_id""
WARNING:  modifying data outside of policy load: BCIE:policy:ont/myfolder
WARNING:  modifying data outside of policy load: BCIE:policy:ont/myfolder
[origin=172.18.0.5] [req_id=8053f498-cc3e-4250-8a49-9a15cd434dd0] 
Sequel::ForeignKeyConstraintViolation (PG::ForeignKeyViolation: ERROR:  insert or update on table "resources_textsearch" violates foreign key constraint "resources_textsearch_resource_id_fkey"
DETAIL:  Key (resource_id)=(BCIE:host:ont/myfolder/mysubfolder/mysubfolder) is not present in table "resources".
CONTEXT:  SQL statement "UPDATE resources_textsearch rts
            SET textsearch = (
              SELECT r.tsvector FROM resources r
              WHERE r.resource_id = rts.resource_id
            ) WHERE resource_id = OLD.resource_id"
PL/pgSQL function annotation_update_textsearch() line 12 at SQL statement
):
  app/models/loader/types.rb:292:in `delete!'
  app/models/loader/orchestrate.rb:318:in `map'
  app/models/loader/orchestrate.rb:318:in `perform_deletion'
  app/models/loader/orchestrate.rb:83:in `load'
  app/controllers/policies_controller.rb:42:in `load_policy'
  app/controllers/policies_controller.rb:21:in `patch'
  app/controllers/application_controller.rb:68:in `block in run_with_transaction'
  app/controllers/application_controller.rb:67:in `run_with_transaction'

…which happens upon a policy delete where the policy has subpolicies.

I can’t explain why the ForeignKeyConstraintViolation occurs only after the database restore. If I do not restore the database, I can run my unittest succesfully over and over again.

Hopefully someone can help me out with this, since I cannot run Conjur in a production environment if I’m not able to backup and restore my secrets…

Thanks in advance for any help.
Pieter.

( I noticed the pre-formatting of the backup steps doesn’t work for some reason… )

More information : the ForeignKeyConstraintViolation reported :
DETAIL: Key (resource_id)=(BCIE:host:ont/myfolder/mysubfolder/mysubfolder) is not present in table "resources".

but querying the database shows it’s there :

postgres=# select * from resources where resource_id like 'BCIE:host:ont/myfolder/mysubfolder/mysubfolder';
                  resource_id                   |               owner_id               |         created_at         |        policy_id         
------------------------------------------------+--------------------------------------+----------------------------+--------------------------
 BCIE:host:ont/myfolder/mysubfolder/mysubfolder | BCIE:policy:ont/myfolder/mysubfolder | 2019-12-03 10:07:10.792919 | BCIE:policy:ont/myfolder

Hi Pieter,

Is this OSS or Conjur Enterprise (DAP)?

  • Nate

Hi Nate,

this is OSS Conjur

thanks !

To eliminate my code, I’ve created a testcase that shows the problem. Start from a fresh Conjur OSS install, where an account has been created and inited.

------------ TESTCASE ---------------
base.yml :
-----------

- !group
  id: bcie_conjur_admin_grp
  owner: !user admin
  annotations:
    Title: bcie_conjur_admin_grp
    UserName: bcie_conjur_admin_grp
    URL:
    Notes:

- !user
  id: bcie_conjur_admin
  owner: !group bcie_conjur_admin_grp
  annotations:
    Title: bcie_conjur_admin
    UserName: bcie_conjur_admin
    URL:
    Notes:

- !grant
  role: !group bcie_conjur_admin_grp
  members:
    - !user bcie_conjur_admin

- !policy
  id: ont
  owner: !group bcie_conjur_admin_grp
  annotations:
    Title: ont
    UserName: ont 
    URL:
    Notes:


one.yml :
-----------

- !policy
  id: myfolder
  annotations:
    Title: myfolder

  body:
    - !permit
      resource: !policy
      privileges: [ read, create, update, execute ]
      role: !group /bcie_conjur_admin_grp


two.yml :
-----------

- !policy
  id: mysubfolder
  annotations:
    Title: mysubfolder

  body:
    - !permit
      resource: !policy
      privileges: [ read, create, update, execute ]

    - !host
      id: mysubfolder
      annotations:
        account: mysubfolder
        comment: IPAS Instance

    - !permit
      resource: !policy
      privileges: [ read, create, update, execute ]
      role: !group /bcie_conjur_admin_grp


three.yml :
-----------

- !variable
  id: myfirstuser
  annotations:
    Title: 'mysubfolder'
    UserName: 'myseconduser'
    URL: ''
    Notes: ''

- !permit
  resource: !variable myfirstuser
  privileges: [ read, execute ]
  role: !host mysubfolder

- !permit
  resource: !variable myfirstuser
  privileges: [ read, update, execute ]
  role: !group /bcie_conjur_admin_grp


delete.yml
-----------

- !delete
  record: !policy myfolder




Steps to recreate ( from a fresh Conjur install with an initial account configured ):
----------------------------------------------------------------------------------

# Configure base structure
docker-compose exec client conjur authn login -u admin 
docker-compose exec -T client conjur policy load --replace root base.yml


# Backup base Conjur
docker exec -ti -u postgres postgres_database bash
postgres@1daf5413b127:$ pg_dump postgres > /tmp/db.bck
postgres@1daf5413b127:$ exit


# Do some mutations
docker-compose exec client conjur authn login -u bcie_conjur_admin
docker-compose exec -T client conjur policy load  ont one.yml
docker-compose exec -T client conjur policy load  ont/myfolder two.yml
docker-compose exec -T client conjur policy load  ont/myfolder/mysubfolder three.yml

# Show current Conjur environment
docker-compose exec client conjur list

# Remove entries
docker-compose exec -T client conjur policy load  --delete ont delete.yml

# Show current Conjur environment
docker-compose exec client conjur list

# Re-run testcase to show it can reiterate
docker-compose exec -T client conjur policy load  ont one.yml
docker-compose exec -T client conjur policy load  ont/myfolder two.yml
docker-compose exec -T client conjur policy load  ont/myfolder/mysubfolder three.yml
docker-compose exec -T client conjur policy load  --delete ont delete.yml

# Restore to base situation
docker exec conjur_server rm /opt/conjur-server/tmp/pids/server.pid
docker stop conjur_server

docker exec -ti -u postgres postgres_database bash
postgres@1daf5413b127:$ dropdb postgres
postgres@1daf5413b127:$ createdb postgres
postgres@1daf5413b127:$ psql postgres < /tmp/db.bck 
postgres@1daf5413b127:$ exit

docker start conjur_server

# Show current Conjur environment
docker-compose exec client conjur list

# Do some more mutations
docker-compose exec -T client conjur policy load  ont one.yml
docker-compose exec -T client conjur policy load  ont/myfolder two.yml
docker-compose exec -T client conjur policy load  ont/myfolder/mysubfolder three.yml
docker-compose exec -T client conjur policy load  --delete ont delete.yml

error: 500 Internal Server Error
---------------- END TESTCASE

Can somebody maybe comment on how I backup and restore the Conjur database ? I mean, is this basically a valid way to backup/restore ?

Thanks in advance on any help…

Hey @Pieter,

This sounds like a really interesting project and I would love to hear more about it.

The enterprise version of Conjur users a tool called Evoke, but Evoke is not part of Conjur open source, so we need to use the Postgres backup procedure instead.

See here what @Ryan wrote for details:
In order to back up that database you’d follow the normal procedures for backing up Postgres 9.3, except that you also need to have a backup of the data key. (the one the quick-start tells you to generate and then secure.) When you restore the database & restart the Conjur service, it’ll need that key in order to decrypt the database.

I hope this helps!

Thanks,
John

Hi John,

thanks for your update. I know about the data key. Maybe I didn’t describe it explicit enough, but in my testcase I mentioned : ‘( from a fresh Conjur install with an initial account configured )’… This means that the datakey is available as an environment variable. ( export CONJUR_DATA_KEY="$(< creds/data_key)" ).

Running my testcase shows the error upon policy deletion. Running all other kinds of dtabase mutations ( create, read, update ) all work fine, so I guess the decryption of the data is actually fine.

Hi Pieter,

Does it only do this when you try to load your delete.yml policy or any policy you try to load after backing up and restoring?

CC: @AndrewCopeland

Hi Jake,

Yes, after the restore, I can still succesfully read, create and update. It’s just the delete
that fails.
Also see my previous append where I stated this, and running my testcase will exactly show the issue.

Thanks

I’m still looking into this… all table , trigger and constraint definitions before and after the database restore are identical… The exception mentions it cannot find the resource_id in the resources table, but I can see it’s there, at least right after the exception… Can this be a timing problem ? Maybe it’s a Postgres issue…

I found a workaround that enables me to backup and restore the database by stopping postgres and then backup the volume that the postgres container exposes. My testcase continues to work after restoring data this way .

3 Likes

@Pieter it has been a while, but I wanted to update that we are actively looking into resolving this issue - we’ve filed this bug and we’re hoping to integrate a fix and release it soon.

Thank you for reporting!! I am sorry it took us so long to address this, but your stellar report has helped us to make good progress quickly.