How to shut down a Ceph cluster

2020-09-10


Shutting down a Ceph cluster is not just a matter of running poweroff on each node. If you do that, the cluster will get confused, both while the nodes are going down and when you start to bring it back up. The MONs will see only a subset of the OSDs, will start marking the missing ones down (and, after a timeout, out), and the PGs will go into a degraded state. This is because the cluster cannot by itself distinguish between a node that has been powered down and one that has been vaporized, or burnt to a crisp (for example).

It is of course possible to do a controlled shutdown of all the nodes and then bring the cluster back up without any ill effects. But the following procedure might not be the right way to do it! Don't assume that it will work for your cluster!

Shutdown procedure

I don't advise anyone to use this procedure, because it might cause loss of precious data. Use the procedure AT YOUR OWN RISK!

1. Stop the clients from using your cluster (this step is only necessary if you want to shut down your whole cluster).
2. Run ceph -s and verify that the cluster is in HEALTH_OK status.
3. ceph osd set noout
4. ceph osd set nobackfill
5. ceph osd set norecover
6. On each node: systemctl stop ceph.target
7. On each node: poweroff
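The shutdown steps could be scripted roughly as follows. This is only a sketch resting on assumptions of my own: the node names node1 to node3 are hypothetical, the script runs on an admin host that is not itself a Ceph node, SSH logs in to every node as root without a password, and the DRY_RUN switch and the "shutdown" argument are inventions of this sketch. It checks health with ceph health, which prints HEALTH_OK on a healthy cluster, instead of eyeballing ceph -s, and it does not stop the clients for you. The same warning as above applies.

```shell
#!/bin/sh
# Hypothetical sketch of the shutdown steps. Assumes: an admin host that is NOT
# one of the Ceph nodes, the ceph CLI with an admin keyring, and passwordless
# SSH as root to every node. Stopping the clients is NOT scripted here.
set -eu

# Hypothetical hostnames: replace with the names of your own Ceph nodes.
NODES="node1 node2 node3"

# Execute a command, or only print it when DRY_RUN=1 (a convention of this sketch).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

ceph_cluster_shutdown() {
    # Refuse to go on unless the cluster reports HEALTH_OK
    # (skipped in dry-run mode so the sketch can be tried without a cluster).
    if [ "${DRY_RUN:-0}" != 1 ]; then
        health=$(ceph health) || health="ceph health failed"
        case "$health" in
            HEALTH_OK*) ;;
            *) echo "cluster is not HEALTH_OK ($health); aborting" >&2
               return 1 ;;
        esac
    fi

    # Keep the MONs from marking stopped OSDs out and from starting data movement.
    for flag in noout nobackfill norecover; do
        run ceph osd set "$flag"
    done

    # Stop all Ceph daemons on every node before any node is powered off.
    for node in $NODES; do
        run ssh "$node" systemctl stop ceph.target
    done

    # The SSH session may be cut as the node goes down, so ignore its exit status.
    for node in $NODES; do
        run ssh "$node" poweroff || true
    done
}

# Only act when explicitly asked to, e.g.: DRY_RUN=1 sh ceph-shutdown.sh shutdown
if [ "${1:-}" = shutdown ]; then
    ceph_cluster_shutdown
fi
```

Stopping ceph.target on every node before powering any of them off means no Ceph daemon is still running when the first machine goes down. With DRY_RUN=1 (and a hypothetical file name such as ceph-shutdown.sh) the script only prints the commands it would run, which is a cheap way to review them first.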

Bring the cluster back up

I don't advise anyone to use this procedure, because it might cause loss of precious data. Use the procedure AT YOUR OWN RISK!

Once all the nodes have been powered back up, the OSD flags that were set to protect the cluster from itself while it was being powered down and back up again must be unset, in reverse order:

ceph osd unset norecover
ceph osd unset nobackfill
ceph osd unset noout
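The power-up side could be sketched like this, again under my own assumptions: it runs on an admin host with the ceph CLI and an admin keyring, and the DRY_RUN switch and the "startup" argument are inventions of this sketch. It does not check that all OSDs are up before unsetting the flags, so look at ceph -s yourself first.

```shell
#!/bin/sh
# Hypothetical sketch of the power-up steps. Assumes an admin host with the
# ceph CLI and an admin keyring, run once every node has been powered back on.
set -eu

# Execute a command, or only print it when DRY_RUN=1 (a convention of this sketch).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

ceph_cluster_unset_flags() {
    # Unset the flags in the reverse of the order in which they were set.
    for flag in norecover nobackfill noout; do
        run ceph osd unset "$flag"
    done
    # Print the cluster status so you can watch the PGs return to active+clean.
    run ceph -s
}

# Only act when explicitly asked to, e.g.: DRY_RUN=1 sh ceph-startup.sh startup
if [ "${1:-}" = startup ]; then
    ceph_cluster_unset_flags
fi
```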

Smithfarm - the Brain

