Actions
pre-refresh-check (optional)
Before the user runs `juju refresh`, they should run this action on the leader unit.
The leader unit will run pre-refresh health checks (e.g. backup created) & preparations (e.g. switch primary).
Optional: In the user documentation, this step will not be marked as optional (since it improves the safety of the refresh—especially on Kubernetes). However, since forgetting to run the action is a common mistake (it has already happened on a production PostgreSQL charm), it is not required.
This action will fail if run before a rollback.
pre-refresh-check:
  description: Check if charm is ready to refresh
  additionalProperties: false
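For illustration, a minimal sketch of how a charm might implement this action handler with the ops framework is shown below. The handler, helper, and result text are assumptions for this sketch, not the actual PostgreSQL charm implementation.

# Illustrative sketch only, assuming the ops framework; handler, helper, and
# message names are hypothetical and not the actual PostgreSQL charm code.
import ops


class PreRefreshCheckError(Exception):
    """Raised when a pre-refresh health check or preparation fails."""


class PostgreSQLCharm(ops.CharmBase):
    def __init__(self, framework: ops.Framework):
        super().__init__(framework)
        framework.observe(self.on.pre_refresh_check_action, self._on_pre_refresh_check)

    def _run_pre_refresh_checks_and_preparations(self) -> None:
        # Placeholder: a real charm would e.g. verify a recent backup exists and
        # switch the primary, raising PreRefreshCheckError on failure.
        ...

    def _on_pre_refresh_check(self, event: ops.ActionEvent) -> None:
        if not self.unit.is_leader():
            event.fail(
                "Must run action on leader unit. "
                "(e.g. `juju run postgresql-k8s/leader pre-refresh-check`)"
            )
            return
        try:
            self._run_pre_refresh_checks_and_preparations()
        except PreRefreshCheckError as exception:
            event.fail(
                f"Charm is not ready for refresh. Pre-refresh check failed: {exception}"
            )
            return
        event.set_results(
            {"result": "Charm is ready for refresh. For refresh instructions, see ..."}
        )


if __name__ == "__main__":
    ops.main(PostgreSQLCharm)

The example failure and success outputs below show the messages this handler is expected to produce.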
Unit tearing down
$ juju run postgresql-k8s/leader pre-refresh-check
Running operation 1 with 1 task
- task 2 on unit-postgresql-k8s-0
Waiting for task 2...
Action id 2 failed: Unit tearing down
Non-leader unit
Action id 2 failed: Must run action on leader unit. (e.g. `juju run postgresql-k8s/leader pre-refresh-check`)
Health checks & preparations failed
Action id 2 failed: Charm is not ready for refresh. Pre-refresh check failed: Backup in progress
Health checks & preparations succeeded
Kubernetes
$ juju run postgresql-k8s/leader pre-refresh-check
Running operation 1 with 1 task
- task 2 on unit-postgresql-k8s-0
Waiting for task 2...
result: |-
  Charm is ready for refresh. For refresh instructions, see https://charmhub.io/postgresql-k8s/docs/refresh/16/1.19.0
  After the refresh has started, use this command to rollback (copy this down in case you need it later):
  `juju refresh postgresql-k8s --revision 602 --resource postgresql-image=registry.jujucharms.com/charm/kotcfrohea62xreenq1q75n1lyspke0qkurhk/postgresql-image@sha256:e53eb99abd799526bb5a5e6c58180ee47e2790c95d433a1352836aa27d0914a4`
force-refresh-start
If the refresh is incompatible, the automatic pre-refresh health checks & preparations fail, or the refresh is to a workload version not supported by Canonical, the user will be prompted to roll back. If they accept potential data loss & downtime and want to proceed anyway (e.g. to force a downgrade), the user can run this action on the first unit to refresh.
After this action is run and the first unit’s workload refreshes (machines) or attempts to start (Kubernetes), the compatibility, pre-refresh, and workload support checks will not run again (unless the user runs `juju refresh` [and if `juju refresh` is a rollback, the pre-refresh and workload support checks will still not run again]).
force-refresh-start:
  description: |
    Potential of data loss and downtime
    Force refresh of first unit
    Must run with at least one of the parameters `=false`
  params:
    check-compatibility:
      type: boolean
      default: true
      description: | (1)
        Potential of data loss and downtime
        If `false`, force refresh if new version of PostgreSQL and/or charm is not compatible with previous version
    run-pre-refresh-checks:
      type: boolean
      default: true
      description: |
        Potential of data loss and downtime
        If `false`, force refresh if app is unhealthy or not ready to refresh (and unit status shows "Pre-refresh check failed")
    check-workload-container:
      type: boolean
      default: true
      description: | (1)
        Potential of data loss and downtime during and after refresh
        If `false`, allow refresh to PostgreSQL container version that has not been validated to work with the charm revision
  additionalProperties: false
(1) PostgreSQL will be replaced with the upstream workload name
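As a rough illustration of the check sequence shown in the examples below (workload container check, then compatibility check, then pre-refresh checks), a hedged control-flow sketch of the action handler follows. The helper functions and exact messages are assumptions, not the actual charm code.

# Illustrative control-flow sketch, assuming the ops framework.
import ops


def workload_container_is_validated(charm: ops.CharmBase) -> bool:
    # Placeholder: e.g. compare the refreshed container digest against the
    # container versions pinned for this charm revision.
    return True


def handle_force_refresh_start(charm: ops.CharmBase, event: ops.ActionEvent) -> None:
    if (
        event.params["check-compatibility"]
        and event.params["run-pre-refresh-checks"]
        and event.params["check-workload-container"]
    ):
        event.fail(
            "Must run with at least one of `check-compatibility`, "
            "`run-pre-refresh-checks`, or `check-workload-container` parameters `=false`"
        )
        return
    if event.params["check-workload-container"]:
        if not workload_container_is_validated(charm):
            event.fail(
                "Refresh is to PostgreSQL container version that has not been "
                "validated to work with the charm revision. Rollback by running `juju refresh ...`"
            )
            return
        event.log("Checked that refresh is to PostgreSQL container version that has been validated to work with the charm revision")
    else:
        event.log("Skipping check that refresh is to PostgreSQL container version that has been validated to work with the charm revision")
    # The compatibility check and pre-refresh checks follow the same pattern,
    # after which the first unit's workload is refreshed (machines) or allowed
    # to start (Kubernetes).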
Unit tearing down
$ juju run postgresql-k8s/2 force-refresh-start
Running operation 1 with 1 task
- task 2 on unit-postgresql-k8s-2
Waiting for task 2...
Action id 2 failed: Unit tearing down
(Machines only) Not possible to determine if a refresh is in progress
Action id 2 failed: Determining if a refresh is in progress. Check `juju status` and consider retrying this action
Without 1+ parameters as false
$ juju run postgresql-k8s/2 force-refresh-start
Running operation 1 with 1 task
- task 2 on unit-postgresql-k8s-2
Waiting for task 2...
Action id 2 failed: Must run with at least one of `check-compatibility`, `run-pre-refresh-checks`, or `check-workload-container` parameters `=false`
First unit to refresh has outdated charm code
With 1+ parameters as false
Part 1: check-workload-container
check-workload-container=false
$ juju run postgresql-k8s/2 force-refresh-start [...] check-workload-container=false
Running operation 1 with 1 task
- task 2 on unit-postgresql-k8s-2
Waiting for task 2...
12:15:34 Skipping check that refresh is to PostgreSQL container version that has been validated to work with the charm revision
check-workload-container=true and check succeeded
$ juju run postgresql-k8s/2 force-refresh-start [...]
Running operation 1 with 1 task
- task 2 on unit-postgresql-k8s-2
Waiting for task 2...
12:15:34 Checked that refresh is to PostgreSQL container version that has been validated to work with the charm revision
check-workload-container=true and check failed
$ juju run postgresql-k8s/2 force-refresh-start [...]
Running operation 1 with 1 task
- task 2 on unit-postgresql-k8s-2
Waiting for task 2...
Action id 2 failed: Refresh is to PostgreSQL container version that has not been validated to work with the charm revision. Rollback by running `juju refresh postgresql-k8s --revision 602 --resource postgresql-image=registry.jujucharms.com/charm/kotcfrohea62xreenq1q75n1lyspke0qkurhk/postgresql-image@sha256:e53eb99abd799526bb5a5e6c58180ee47e2790c95d433a1352836aa27d0914a4`
Part 2: check-compatibility
check-compatibility=false
$ juju run postgresql-k8s/2 force-refresh-start [...] check-compatibility=false
[...] # check-workload-container
12:15:34 Skipping check for compatibility with previous PostgreSQL version and charm revision
check-compatibility=true and check succeeded
$ juju run postgresql-k8s/2 force-refresh-start [...]
[...] # check-workload-container
12:15:34 Checked that refresh from previous PostgreSQL version and charm revision to current versions is compatible
check-compatibility=true and check failed
$ juju run postgresql-k8s/2 force-refresh-start [...]
[...] # check-workload-container
Kubernetes
Action id 2 failed: Refresh incompatible. Rollback by running `juju refresh postgresql-k8s --revision 602 --resource postgresql-image=registry.jujucharms.com/charm/kotcfrohea62xreenq1q75n1lyspke0qkurhk/postgresql-image@sha256:e53eb99abd799526bb5a5e6c58180ee47e2790c95d433a1352836aa27d0914a4`
Machines
Action id 2 failed: Refresh incompatible. Rollback with `juju refresh`
Part 3: run-pre-refresh-checks
run-pre-refresh-checks=false
$ juju run postgresql-k8s/2 force-refresh-start [...] run-pre-refresh-checks=false
[...] # check-workload-container
[...] # check-compatibility
12:15:39 Skipping pre-refresh checks
run-pre-refresh-checks=true and checks succeeded
$ juju run postgresql-k8s/2 force-refresh-start [...]
[...] # check-workload-container
[...] # check-compatibility
12:15:34 Running pre-refresh checks
12:15:39 Pre-refresh checks successful
run-pre-refresh-checks=true and checks failed
$ juju run postgresql-k8s/2 force-refresh-start [...]
[...] # check-workload-container
[...] # check-compatibility
12:15:34 Running pre-refresh checks
Kubernetes
Action id 2 failed: Pre-refresh check failed: Backup in progress. Rollback by running `juju refresh postgresql-k8s --revision 602 --resource postgresql-image=registry.jujucharms.com/charm/kotcfrohea62xreenq1q75n1lyspke0qkurhk/postgresql-image@sha256:e53eb99abd799526bb5a5e6c58180ee47e2790c95d433a1352836aa27d0914a4`
Machines
Action id 2 failed: Pre-refresh check failed: Backup in progress. Rollback with `juju refresh`
Part 4: All checks succeeded or were skipped
$ juju run postgresql-k8s/2 force-refresh-start [...]
[...] # check-workload-container
[...] # check-compatibility
[...] # run-pre-refresh-checks
Kubernetes
result: PostgreSQL refreshed on unit 2. Starting PostgreSQL on unit 2
Machines
12:15:39 Refreshing unit 2
result: Refreshed unit 2
resume-refresh
After the user runs `juju refresh`, if `pause_after_unit_refresh` is set to `all` or `first`, the refresh will pause.
The user is expected to manually check that refreshed units are healthy and that clients connected to the refreshed units are healthy. For example, the user could check that the transactions per second, over a period of several days, are similar on refreshed and non-refreshed units. These manual checks supplement the automatic checks in the charm. (If the automatic checks fail, the charm will pause the refresh regardless of the value of pause_after_unit_refresh.)
When the user is ready to continue the refresh, they should run this action.
resume-refresh:
  description: |
    Refresh next unit(s) (after you have manually verified that refreshed units are healthy)
    If the `pause_after_unit_refresh` config is set to `all`, this action will refresh the next unit.
    If `pause_after_unit_refresh` is set to `first`, this action will refresh all remaining units.
    Exception: if automatic health checks fail after a unit has refreshed, the refresh will pause.
    If `pause_after_unit_refresh` is set to `none`, this action will have no effect unless it is called with `check-health-of-refreshed-units` as `false`.
  params:
    check-health-of-refreshed-units:
      type: boolean
      default: true
      description: |
        Potential of data loss and downtime
        If `false`, force refresh (of next unit) if 1 or more refreshed units are unhealthy
        Warning: if first unit to refresh is unhealthy, consider running `force-refresh-start` action on that unit instead of using this parameter.
        If first unit to refresh is unhealthy because compatibility checks, pre-refresh checks, or workload container checks are failing, this parameter is more destructive than the `force-refresh-start` action.
  additionalProperties: false
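A hedged sketch of how the `pause_after_unit_refresh` config and the `check-health-of-refreshed-units` parameter could interact in the action handler is shown below. The helper names and the exact pause semantics are simplified assumptions, not the actual implementation.

# Illustrative sketch, assuming the ops framework.
import ops


def refreshed_units_are_healthy(charm: ops.CharmBase) -> bool:
    # Placeholder for the charm's automatic post-refresh health checks
    return True


def allow_next_unit_to_refresh(charm: ops.CharmBase) -> None:
    # Kubernetes: lower the StatefulSet partition past the next unit.
    # Machines: refresh the snap on the next unit in the refresh order.
    ...


def allow_remaining_units_to_refresh(charm: ops.CharmBase) -> None:
    # Continue refreshing remaining units, pausing if automatic health checks fail.
    ...


def handle_resume_refresh(charm: ops.CharmBase, event: ops.ActionEvent) -> None:
    pause = charm.config["pause_after_unit_refresh"]  # "all", "first", or "none"
    if not event.params["check-health-of-refreshed-units"]:
        event.log("Ignoring health of refreshed units")
        allow_next_unit_to_refresh(charm)
        return
    if pause == "none":
        event.fail("`pause_after_unit_refresh` config is set to `none`. This action is not applicable.")
        return
    if not refreshed_units_are_healthy(charm):
        event.fail("Unit is unhealthy. Refresh will not resume.")
        return
    if pause == "all":
        allow_next_unit_to_refresh(charm)
    else:  # "first"
        allow_remaining_units_to_refresh(charm)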
Which unit the action is run on
Kubernetes
On Kubernetes, the user should run this action on the leader unit.
If the StatefulSet partition is lowered and then quickly raised, the Juju agent may hang. This is a Juju bug: https://bugs.launchpad.net/juju/+bug/2073473. To avoid a race condition, only the leader unit lowers the partition. (If that bug were resolved, this action could be run on any unit.)
To improve the robustness of rollbacks, this action runs on the leader unit instead of the next unit to refresh. If a unit is refreshed to an incorrect or buggy charm code version, its charm code may raise an uncaught exception and may not be able to process this action to roll back its unit. (The improvement in robustness comes from this action running on a unit that is different from the unit that needs to roll back.) This is different from machines, where the charm code is rolled back separately from the workload and the charm code on a unit needs to run to roll back the workload (i.e. snap) for that unit.
If the charm code on the leader unit raises an uncaught exception, the user can manually patch (e.g. using kubectl) the StatefulSet partition to roll back the leader unit (after `juju refresh` has been run to start the rollback).
From the perspective of the refresh design, if the user is instructed properly, this is safe (since it uses the same mechanism as a normal rollback).
However, any rollback has risk and there may be additional risk if the leader unit did something (e.g. modified a relation databag in a previous Juju event) before it raised an uncaught exception.
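For illustration, the partition change that only the leader performs could look roughly like the following sketch, assuming the lightkube client (the real charm may use a different Kubernetes client). This patches the same field a user would change manually with kubectl in the fallback described above.

# Sketch only, assuming the lightkube library is available to the charm.
from lightkube import Client
from lightkube.resources.apps_v1 import StatefulSet


def set_partition(app_name: str, namespace: str, partition: int) -> None:
    """Patch the StatefulSet rollingUpdate partition (performed only by the leader unit)."""
    Client().patch(
        StatefulSet,
        name=app_name,
        namespace=namespace,
        obj={"spec": {"updateStrategy": {"rollingUpdate": {"partition": partition}}}},
    )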
Machines
On machines, the user should run this action on the next unit to refresh. That unit is shown in the app status.
This improves the robustness of rollbacks by requiring only the charm code on the unit that is rolling back to be healthy (i.e. not raising an uncaught exception). (If the action was run on the leader unit, rolling back a unit would require the charm code on both the leader unit & the unit rolling back to be healthy.)
If `check-health-of-refreshed-units=true` (default), a unit rolling back will also check that units that have already rolled back are healthy.
In case a refreshed unit is unhealthy and the user wants to force the refresh to continue, `check-health-of-refreshed-units=false` allows the user to run this action on any unit that is not up-to-date, so that they can skip over the unhealthy unit.
However, the user should be instructed to follow the refresh order (usually highest to lowest unit number) even though they have the power to refresh any unit that is not up-to-date.
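A minimal sketch of deriving the next unit to refresh from that order (highest to lowest unit number); the function and its inputs are assumptions for illustration only.

# Minimal sketch; assumes unit names like "postgresql/3" and that the charm
# knows which units already run the new version.
def next_unit_to_refresh(all_units: list[str], up_to_date_units: set[str]) -> str | None:
    """Return the highest-numbered unit that is not yet up-to-date, or None if done."""
    remaining = [unit for unit in all_units if unit not in up_to_date_units]
    if not remaining:
        return None
    return max(remaining, key=lambda unit: int(unit.split("/")[1]))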
Unit tearing down
$ juju run postgresql-k8s/leader resume-refresh
Running operation 1 with 1 task
- task 2 on unit-postgresql-k8s-0
Waiting for task 2...
Action id 2 failed: Unit tearing down
(Machines only) Not possible to determine if a refresh is in progress
Action id 2 failed: Determining if a refresh is in progress. Check `juju status` and consider retrying this action
Incorrect unit
check-health-of-refreshed-units=false
Kubernetes
$ juju run postgresql-k8s/leader resume-refresh check-health-of-refreshed-units=false
Running operation 1 with 1 task
- task 2 on unit-postgresql-k8s-0
Waiting for task 2...
12:15:39 Ignoring health of refreshed units
result: Attempting to refresh unit 1
"Attempting to" is included because on Kubernetes we only control the partition, not which units refresh. Kubernetes may not refresh a unit even if the partition allows it (e.g. if the charm container of a higher unit is not ready).
check-health-of-refreshed-units=true
pause_after_unit_refresh is none
Action id 2 failed: `pause_after_unit_refresh` config is set to `none`. This action is not applicable.
Refresh not started
(Refresh is incompatible, the automatic pre-refresh health checks & preparations failed, or the refresh is to a workload version not supported by Canonical—and force-refresh-start has not been successfully run)
Action id 2 failed: Unit 2 is unhealthy. Refresh will not resume.