Actions
pre-refresh-check (optional)
Before the user runs `juju refresh`, they should run this action on the leader unit.
The leader unit will run pre-refresh health checks (e.g. backup created) & preparations (e.g. switch primary).
Optional: In the user documentation, this step will not be marked as optional (since it improves the safety of the refresh—especially on Kubernetes). However, since forgetting to run the action is a common mistake (it has already happened on a production PostgreSQL charm), it is not required.
This action will fail if run before a rollback.
pre-refresh-check:
  description: Check if charm is ready to refresh
  additionalProperties: false
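For illustration, a minimal sketch of how a charm might implement this action handler with the ops framework is shown below. The handler, helper, and result text are assumptions for this sketch, not the actual PostgreSQL charm implementation.

# Illustrative sketch only, assuming the ops framework; handler, helper, and
# message names are hypothetical and not the actual PostgreSQL charm code.
import ops


class PreRefreshCheckError(Exception):
    """Raised when a pre-refresh health check or preparation fails."""


class PostgreSQLCharm(ops.CharmBase):
    def __init__(self, framework: ops.Framework):
        super().__init__(framework)
        framework.observe(self.on.pre_refresh_check_action, self._on_pre_refresh_check)

    def _run_pre_refresh_checks_and_preparations(self) -> None:
        # Placeholder: a real charm would e.g. verify a recent backup exists and
        # switch the primary, raising PreRefreshCheckError on failure.
        ...

    def _on_pre_refresh_check(self, event: ops.ActionEvent) -> None:
        if not self.unit.is_leader():
            event.fail(
                "Must run action on leader unit. "
                "(e.g. `juju run postgresql-k8s/leader pre-refresh-check`)"
            )
            return
        try:
            self._run_pre_refresh_checks_and_preparations()
        except PreRefreshCheckError as exception:
            event.fail(
                f"Charm is not ready for refresh. Pre-refresh check failed: {exception}"
            )
            return
        event.set_results(
            {"result": "Charm is ready for refresh. For refresh instructions, see ..."}
        )


if __name__ == "__main__":
    ops.main(PostgreSQLCharm)

The example failure and success outputs below show the messages this handler is expected to produce.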
Unit tearing down
$ juju run postgresql-k8s/leader pre-refresh-check
Running operation 1 with 1 task
- task 2 on unit-postgresql-k8s-0
Waiting for task 2...
Action id 2 failed: Unit tearing down
Non-leader unit
Action id 2 failed: Must run action on leader unit. (e.g. `juju run postgresql-k8s/leader pre-refresh-check`)
Health checks & preparations failed
Action id 2 failed: Charm is not ready for refresh. Pre-refresh check failed: Backup in progress
Health checks & preparations succeeded
Kubernetes
$ juju run postgresql-k8s/leader pre-refresh-check
Running operation 1 with 1 task
- task 2 on unit-postgresql-k8s-0
Waiting for task 2...
result: |-
  Charm is ready for refresh. For refresh instructions, see https://charmhub.io/postgresql-k8s/docs/refresh/16/1.19.0
  After the refresh has started, use this command to rollback (copy this down in case you need it later):
  `juju refresh postgresql-k8s --revision 602 --resource postgresql-image=registry.jujucharms.com/charm/kotcfrohea62xreenq1q75n1lyspke0qkurhk/postgresql-image@sha256:e53eb99abd799526bb5a5e6c58180ee47e2790c95d433a1352836aa27d0914a4`
force-refresh-start
If the refresh is incompatible, the automatic pre-refresh health checks & preparations fail, or the refresh is to a workload version not supported by Canonical, the user will be prompted to roll back. If they accept potential data loss & downtime and want to proceed anyway (e.g. to force a downgrade), the user can run this action on the first unit to refresh.
After this action is run and the first unit’s workload refreshes (machines) or attempts to start (Kubernetes), the compatibility, pre-refresh, and workload support checks will not run again (unless the user runs `juju refresh` [and if `juju refresh` is a rollback, the pre-refresh and workload support checks will still not run again]).
force-refresh-start:
  description: |
    Potential of data loss and downtime
    Force refresh of first unit
    Must run with at least one of the parameters `=false`
  params:
    check-compatibility:
      type: boolean
      default: true
      description: | (1)
        Potential of data loss and downtime
        If `false`, force refresh if new version of PostgreSQL and/or charm is not compatible with previous version
    run-pre-refresh-checks:
      type: boolean
      default: true
      description: |
        Potential of data loss and downtime
        If `false`, force refresh if app is unhealthy or not ready to refresh (and unit status shows "Pre-refresh check failed")
    check-workload-container:
      type: boolean
      default: true
      description: | (1)
        Potential of data loss and downtime during and after refresh
        If `false`, allow refresh to PostgreSQL container version that has not been validated to work with the charm revision
  additionalProperties: false
(1) PostgreSQL will be replaced with the upstream workload name
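As a rough illustration of the check sequence shown in the examples below (workload container check, then compatibility check, then pre-refresh checks), a hedged control-flow sketch of the action handler follows. The helper functions and exact messages are assumptions, not the actual charm code.

# Illustrative control-flow sketch, assuming the ops framework.
import ops


def workload_container_is_validated(charm: ops.CharmBase) -> bool:
    # Placeholder: e.g. compare the refreshed container digest against the
    # container versions pinned for this charm revision.
    return True


def handle_force_refresh_start(charm: ops.CharmBase, event: ops.ActionEvent) -> None:
    if (
        event.params["check-compatibility"]
        and event.params["run-pre-refresh-checks"]
        and event.params["check-workload-container"]
    ):
        event.fail(
            "Must run with at least one of `check-compatibility`, "
            "`run-pre-refresh-checks`, or `check-workload-container` parameters `=false`"
        )
        return
    if event.params["check-workload-container"]:
        if not workload_container_is_validated(charm):
            event.fail(
                "Refresh is to PostgreSQL container version that has not been "
                "validated to work with the charm revision. Rollback by running `juju refresh ...`"
            )
            return
        event.log("Checked that refresh is to PostgreSQL container version that has been validated to work with the charm revision")
    else:
        event.log("Skipping check that refresh is to PostgreSQL container version that has been validated to work with the charm revision")
    # The compatibility check and pre-refresh checks follow the same pattern,
    # after which the first unit's workload is refreshed (machines) or allowed
    # to start (Kubernetes).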
Unit tearing down
$ juju run postgresql-k8s/2 force-refresh-start
Running operation 1 with 1 task
- task 2 on unit-postgresql-k8s-2
Waiting for task 2...
Action id 2 failed: Unit tearing down
(Machines only) Not possible to determine if a refresh is in progress
Action id 2 failed: Determining if a refresh is in progress. Check `juju status` and consider retrying this action
Without 1+ parameters as false
$ juju run postgresql-k8s/2 force-refresh-start
Running operation 1 with 1 task
- task 2 on unit-postgresql-k8s-2
Waiting for task 2...
Action id 2 failed: Must run with at least one of `check-compatibility`, `run-pre-refresh-checks`, or `check-workload-container` parameters `=false`
First unit to refresh has outdated charm code
With 1+ parameters as false
Part 1: check-workload-container
check-workload-container=false
$ juju run postgresql-k8s/2 force-refresh-start [...] check-workload-container=false
Running operation 1 with 1 task
- task 2 on unit-postgresql-k8s-2
Waiting for task 2...
12:15:34 Skipping check that refresh is to PostgreSQL container version that has been validated to work with the charm revision
check-workload-container=true and check succeeded
$ juju run postgresql-k8s/2 force-refresh-start [...]
Running operation 1 with 1 task
- task 2 on unit-postgresql-k8s-2
Waiting for task 2...
12:15:34 Checked that refresh is to PostgreSQL container version that has been validated to work with the charm revision
check-workload-container=true and check failed
$ juju run postgresql-k8s/2 force-refresh-start [...]
Running operation 1 with 1 task
- task 2 on unit-postgresql-k8s-2
Waiting for task 2...
Action id 2 failed: Refresh is to PostgreSQL container version that has not been validated to work with the charm revision. Rollback by running `juju refresh postgresql-k8s --revision 602 --resource postgresql-image=registry.jujucharms.com/charm/kotcfrohea62xreenq1q75n1lyspke0qkurhk/postgresql-image@sha256:e53eb99abd799526bb5a5e6c58180ee47e2790c95d433a1352836aa27d0914a4`
Part 2: check-compatibility
check-compatibility=false
$ juju run postgresql-k8s/2 force-refresh-start [...] check-compatibility=false
[...] # check-workload-container
12:15:34 Skipping check for compatibility with previous PostgreSQL version and charm revision
check-compatibility=true and check succeeded
$ juju run postgresql-k8s/2 force-refresh-start [...]
[...] # check-workload-container
12:15:34 Checked that refresh from previous PostgreSQL version and charm revision to current versions is compatible
check-compatibility=true and check failed
$ juju run postgresql-k8s/2 force-refresh-start [...]
[...] # check-workload-container
Kubernetes
Action id 2 failed: Refresh incompatible. Rollback by running `juju refresh postgresql-k8s --revision 602 --resource postgresql-image=registry.jujucharms.com/charm/kotcfrohea62xreenq1q75n1lyspke0qkurhk/postgresql-image@sha256:e53eb99abd799526bb5a5e6c58180ee47e2790c95d433a1352836aa27d0914a4`
Machines
Action id 2 failed: Refresh incompatible. Rollback with `juju refresh`
Part 3: run-pre-refresh-checks
run-pre-refresh-checks=false
$ juju run postgresql-k8s/2 force-refresh-start [...] run-pre-refresh-checks=false
[...] # check-workload-container
[...] # check-compatibility
12:15:39 Skipping pre-refresh checks
run-pre-refresh-checks=true and checks succeeded
$ juju run postgresql-k8s/2 force-refresh-start [...]
[...] # check-workload-container
[...] # check-compatibility
12:15:34 Running pre-refresh checks
12:15:39 Pre-refresh checks successful
run-pre-refresh-checks=true and checks failed
$ juju run postgresql-k8s/2 force-refresh-start [...]
[...] # check-workload-container
[...] # check-compatibility
12:15:34 Running pre-refresh checks
Kubernetes
Action id 2 failed: Pre-refresh check failed: Backup in progress. Rollback by running `juju refresh postgresql-k8s --revision 602 --resource postgresql-image=registry.jujucharms.com/charm/kotcfrohea62xreenq1q75n1lyspke0qkurhk/postgresql-image@sha256:e53eb99abd799526bb5a5e6c58180ee47e2790c95d433a1352836aa27d0914a4`
Machines
Action id 2 failed: Pre-refresh check failed: Backup in progress. Rollback with `juju refresh`
Part 4: All checks succeeded or were skipped
$ juju run postgresql-k8s/2 force-refresh-start [...]
[...] # check-workload-container
[...] # check-compatibility
[...] # run-pre-refresh-checks
Kubernetes
result: PostgreSQL refreshed on unit 2. Starting PostgreSQL on unit 2
Machines
12:15:39 Refreshing unit 2
result: Refreshed unit 2
resume-refresh
After the user runs `juju refresh`, if `pause_after_unit_refresh` is set to `all` or `first`, the refresh will pause.
The user is expected to manually check that refreshed units are healthy and that clients connected to the refreshed units are healthy. For example, the user could check that the transactions per second, over a period of several days, are similar on refreshed and non-refreshed units. These manual checks supplement the automatic checks in the charm. (If the automatic checks fail, the charm will pause the refresh regardless of the value of pause_after_unit_refresh.)
When the user is ready to continue the refresh, they should run this action.
resume-refresh:
  description: |
    Refresh next unit(s) (after you have manually verified that refreshed units are healthy)
    If the `pause_after_unit_refresh` config is set to `all`, this action will refresh the next unit.
    If `pause_after_unit_refresh` is set to `first`, this action will refresh all remaining units.
    Exception: if automatic health checks fail after a unit has refreshed, the refresh will pause.
    If `pause_after_unit_refresh` is set to `none`, this action will have no effect unless it is called with `check-health-of-refreshed-units` as `false`.
  params:
    check-health-of-refreshed-units:
      type: boolean
      default: true
      description: |
        Potential of data loss and downtime
        If `false`, force refresh (of next unit) if 1 or more refreshed units are unhealthy
        Warning: if first unit to refresh is unhealthy, consider running `force-refresh-start` action on that unit instead of using this parameter.
        If first unit to refresh is unhealthy because compatibility checks, pre-refresh checks, or workload container checks are failing, this parameter is more destructive than the `force-refresh-start` action.
  additionalProperties: false
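A hedged sketch of how the `pause_after_unit_refresh` config and the `check-health-of-refreshed-units` parameter could interact in the action handler is shown below. The helper names and the exact pause semantics are simplified assumptions, not the actual implementation.

# Illustrative sketch, assuming the ops framework.
import ops


def refreshed_units_are_healthy(charm: ops.CharmBase) -> bool:
    # Placeholder for the charm's automatic post-refresh health checks
    return True


def allow_next_unit_to_refresh(charm: ops.CharmBase) -> None:
    # Kubernetes: lower the StatefulSet partition past the next unit.
    # Machines: refresh the snap on the next unit in the refresh order.
    ...


def allow_remaining_units_to_refresh(charm: ops.CharmBase) -> None:
    # Continue refreshing remaining units, pausing if automatic health checks fail.
    ...


def handle_resume_refresh(charm: ops.CharmBase, event: ops.ActionEvent) -> None:
    pause = charm.config["pause_after_unit_refresh"]  # "all", "first", or "none"
    if not event.params["check-health-of-refreshed-units"]:
        event.log("Ignoring health of refreshed units")
        allow_next_unit_to_refresh(charm)
        return
    if pause == "none":
        event.fail("`pause_after_unit_refresh` config is set to `none`. This action is not applicable.")
        return
    if not refreshed_units_are_healthy(charm):
        event.fail("Unit is unhealthy. Refresh will not resume.")
        return
    if pause == "all":
        allow_next_unit_to_refresh(charm)
    else:  # "first"
        allow_remaining_units_to_refresh(charm)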
Which unit the action is run on
Kubernetes
On Kubernetes, the user should run this action on the leader unit.
If the StatefulSet partition is lowered and then quickly raised, the Juju agent may hang. This is a Juju bug: https://bugs.launchpad.net/juju/+bug/2073473. To avoid a race condition, only the leader unit lowers the partition. (If that bug were resolved, this action could be run on any unit.)
To improve the robustness of rollbacks, this action runs on the leader unit instead of the next unit to refresh. If a unit is refreshed to an incorrect or buggy charm code version, its charm code may raise an uncaught exception and may not be able to process this action to roll back its unit. (The improvement in robustness comes from this action running on a unit that is different from the unit that needs to roll back.) This is different from machines, where the charm code is rolled back separately from the workload and the charm code on a unit needs to run to roll back the workload (i.e. snap) for that unit.
If the charm code on the leader unit raises an uncaught exception, the user can manually patch (e.g. using kubectl) the StatefulSet partition to roll back the leader unit (after `juju refresh` has been run to start the rollback).
From the perspective of the refresh design, if the user is instructed properly, this is safe (since it uses the same mechanism as a normal rollback).
However, any rollback has risk and there may be additional risk if the leader unit did something (e.g. modified a relation databag in a previous Juju event) before it raised an uncaught exception.
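For illustration, the partition change that only the leader performs could look roughly like the following sketch, assuming the lightkube client (the real charm may use a different Kubernetes client). This patches the same field a user would change manually with kubectl in the fallback described above.

# Sketch only, assuming the lightkube library is available to the charm.
from lightkube import Client
from lightkube.resources.apps_v1 import StatefulSet


def set_partition(app_name: str, namespace: str, partition: int) -> None:
    """Patch the StatefulSet rollingUpdate partition (performed only by the leader unit)."""
    Client().patch(
        StatefulSet,
        name=app_name,
        namespace=namespace,
        obj={"spec": {"updateStrategy": {"rollingUpdate": {"partition": partition}}}},
    )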
Machines
On machines, the user should run this action on the next unit to refresh. That unit is shown in the app status.
This improves the robustness of rollbacks by requiring only the charm code on the unit that is rolling back to be healthy (i.e. not raising an uncaught exception). (If the action was run on the leader unit, rolling back a unit would require the charm code on both the leader unit & the unit rolling back to be healthy.)
If `check-health-of-refreshed-units=true` (default), a unit rolling back will also check that units that have already rolled back are healthy.
In case a refreshed unit is unhealthy and the user wants to force the refresh to continue, `check-health-of-refreshed-units=false` allows the user to run this action on any unit that is not up-to-date, so that they can skip over the unhealthy unit.
However, the user should be instructed to follow the refresh order (usually highest to lowest unit number) even though they have the power to refresh any unit that is not up-to-date.
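A minimal sketch of deriving the next unit to refresh from that order (highest to lowest unit number); the function and its inputs are assumptions for illustration only.

# Minimal sketch; assumes unit names like "postgresql/3" and that the charm
# knows which units already run the new version.
def next_unit_to_refresh(all_units: list[str], up_to_date_units: set[str]) -> str | None:
    """Return the highest-numbered unit that is not yet up-to-date, or None if done."""
    remaining = [unit for unit in all_units if unit not in up_to_date_units]
    if not remaining:
        return None
    return max(remaining, key=lambda unit: int(unit.split("/")[1]))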
Unit tearing down
$ juju run postgresql-k8s/leader resume-refresh
Running operation 1 with 1 task
- task 2 on unit-postgresql-k8s-0
Waiting for task 2...
Action id 2 failed: Unit tearing down
(Machines only) Not possible to determine if a refresh is in progress
Action id 2 failed: Determining if a refresh is in progress. Check `juju status` and consider retrying this action
Incorrect unit
check-health-of-refreshed-units=false
Kubernetes
$ juju run postgresql-k8s/leader resume-refresh check-health-of-refreshed-units=false
Running operation 1 with 1 task
- task 2 on unit-postgresql-k8s-0
Waiting for task 2...
12:15:39 Ignoring health of refreshed units
result: Attempting to refresh unit 1
"Attempting to" is included because on Kubernetes we only control the partition, not which units refresh. Kubernetes may not refresh a unit even if the partition allows it (e.g. if the charm container of a higher unit is not ready).
check-health-of-refreshed-units=true
pause_after_unit_refresh is none
Action id 2 failed: `pause_after_unit_refresh` config is set to `none`. This action is not applicable.
Refresh not started
(Refresh is incompatible, the automatic pre-refresh health checks & preparations failed, or the refresh is to a workload version not supported by Canonical—and force-refresh-start has not been successfully run)
Action id 2 failed: Unit 2 is unhealthy. Refresh will not resume.