2020-09-10

How to shut down a Ceph cluster

Shutting down a Ceph cluster is not just a matter of running poweroff on each node. If you do that, the cluster will get confused as the nodes go down and again when you start to bring it back up. The MONs will see only a subset of the OSDs and will start marking the missing ones down, and PGs will go into a degraded state. This is because the cluster cannot by itself distinguish between a node that has been powered down and one that has been vaporized or burnt to a crisp (for example).

It is of course possible to do a controlled shutdown of all the nodes and then bring the cluster back up without any bad effects. But the following procedure might not be it! Don't assume that it will work for your cluster!


Shutdown procedure

I don't advise anyone to use this procedure, because it might cause loss of precious data. Use the procedure AT YOUR OWN RISK!

  1. Stop the clients from using your cluster (this step is only necessary if you want to shut down your whole cluster)
  2. Issue the command ceph -s and verify that the cluster is in HEALTH_OK status
  3. ceph osd set noout
  4. ceph osd set nobackfill
  5. ceph osd set norecover
  6. On each node: systemctl stop ceph.target
  7. On each node: poweroff
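
Steps 2 to 5 can be wrapped in a small shell function so that the flags are only set when the cluster is healthy. This is just a sketch, not a tested tool: the function name set_shutdown_flags and the CEPH variable are made up for this example, and it uses ceph health instead of ceph -s because the one-line output is easier to check in a script. Stopping ceph.target and powering off still have to be done on each node afterwards.

```shell
#!/bin/sh
# Sketch only: check health, then set the OSD flags before a full shutdown.
# CEPH is a hypothetical override so the ceph command can be swapped for a stub.
CEPH="${CEPH:-ceph}"

set_shutdown_flags() {
    # Refuse to continue unless the cluster reports HEALTH_OK.
    health="$($CEPH health)" || return 1
    case "$health" in
        HEALTH_OK*) ;;
        *) echo "cluster is not healthy: $health" >&2; return 1 ;;
    esac

    # Same flags, in the same order, as steps 3 to 5.
    for flag in noout nobackfill norecover; do
        $CEPH osd set "$flag" || return 1
    done
}
```

Run set_shutdown_flags once from a node with admin access to the cluster, and continue with steps 6 and 7 only if it returns successfully.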

Bring the cluster back up

I don't advise anyone to use this procedure, because it might cause loss of precious data. Use the procedure AT YOUR OWN RISK!

Once the cluster has been powered up, the OSD flags that were set to protect the cluster from itself while it is being powered down and back up again must be unset in reverse order:

  1. ceph osd unset norecover
  2. ceph osd unset nobackfill
  3. ceph osd unset noout
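
The three commands above can be sketched as a loop that stops at the first failure. As before, the function name unset_shutdown_flags and the CEPH variable are only assumptions made for this example, and the sketch does not check that all OSDs are back up before it clears the flags.

```shell
#!/bin/sh
# Sketch only: clear the OSD flags in the reverse of the order they were set.
# CEPH is a hypothetical override so the ceph command can be swapped for a stub.
CEPH="${CEPH:-ceph}"

unset_shutdown_flags() {
    for flag in norecover nobackfill noout; do
        $CEPH osd unset "$flag" || return 1
    done
}
```

After running unset_shutdown_flags, ceph -s should show whether the cluster has returned to HEALTH_OK.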
