Friday, November 25, 2016

What is XFS/filesystem corruption and whose fault is it?

Sometimes OSDs fail to start. One reason this can happen is that the OSD's data store is in a partition that is formatted with an XFS filesystem, and this filesystem has been corrupted in some way.

In theory, XFS is "corruption-proof", but this can only be relied upon when several preconditions are met:

  1. the disk controller must be in "write through" mode
  2. alternatively, if the controller is in "write back" mode, it must be equipped with a battery backup unit (BBU)
  3. if relying on "write back" mode with a BBU, the BBU battery must be in good condition (and even a battery in good condition can only preserve the filesystem journal for a limited time)
  4. disk write caches should be disabled
  5. if using an SSD, it must be equipped with a "supercapacitor" to protect against loss of data in the SSD memory buffer (most "enterprise" SSDs have this)
  6. the filesystem must not be mounted with the "nobarrier" mount option ("it's against the RADOS guarantees and expectations for an OSD to go back in time after committing an update and *can* result in failure").
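
Precondition 6 is the easiest one to check mechanically. The following is a minimal sketch, not an official Ceph or XFS tool: the function name find_nobarrier_xfs is made up for illustration, and it assumes the text it receives is in the usual /proc/mounts layout (device, mount point, filesystem type, comma-separated options).

```python
def find_nobarrier_xfs(mounts_text):
    """Return (device, mountpoint) pairs for XFS mounts that use 'nobarrier'.

    mounts_text is assumed to be in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    flagged = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            # Skip blank or malformed lines.
            continue
        device, mountpoint, fstype, options = fields[:4]
        # Compare whole option names so that "barrier" or other
        # options containing the substring are not matched by accident.
        if fstype == "xfs" and "nobarrier" in options.split(","):
            flagged.append((device, mountpoint))
    return flagged
```

On a Linux host, passing it the contents of /proc/mounts, for example find_nobarrier_xfs(open("/proc/mounts").read()), should list any XFS filesystems that violate this precondition. An empty list says nothing about the other five preconditions.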

NOTE: be sure to read Questions 24-28 of the XFS FAQ for details on write cache, write barrier, and disk controller settings!

If you experience a power outage or other crash, and XFS filesystems fail to mount afterwards, please double-check that these conditions were fulfilled before opening a bug against XFS or against any system, such as Ceph, that relies on XFS filesystem consistency.

Also, in any bug you open, provide detailed information about your disk controller and how it is configured, whether a BBU is present and the state of its battery, how long the power was out, and so on.
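
Part of that information can be collected with a short script. The sketch below assumes a reasonably recent Linux kernel that exposes /sys/block/<device>/queue/write_cache (typically containing "write back" or "write through"); the function name read_write_cache_modes is hypothetical. It only reports what the kernel believes about each block device's cache. It does not show a RAID controller's own cache policy or the BBU state, which have to come from the controller vendor's tools.

```python
import os


def read_write_cache_modes(sys_block="/sys/block"):
    """Map each block device name to the contents of its queue/write_cache file.

    Devices without a readable write_cache attribute are skipped.
    """
    modes = {}
    if not os.path.isdir(sys_block):
        return modes
    for dev in sorted(os.listdir(sys_block)):
        path = os.path.join(sys_block, dev, "queue", "write_cache")
        try:
            with open(path) as f:
                modes[dev] = f.read().strip()
        except OSError:
            # Attribute missing or unreadable on this device or kernel.
            continue
    return modes
```

The returned dictionary can be pasted into a bug report alongside the controller details described above.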
