Stop services while creating snapshots during backup?

Avid Amoeba · edit-2 1 year ago

That’s the trivial scenario that we know won’t fail: stopping the service during the snapshot. The scenario I was asking for people’s opinions on is not stopping the service during the snapshot, and what restoring from such a backup would mean.

Let me contrast the two by completing your example:

docker start container
Time passes
Time to backup
docker stop container
Make your snapshot
docker start container
Time passes
Shit happens and restore from backup is needed
docker stop container
Restore from snapshot
docker start container
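
The first scenario can be scripted so that the container is always started again, even if the snapshot step fails. A minimal sketch in Python, assuming a container named `app` and a ZFS dataset `tank/appdata` holding its data (both names are hypothetical; swap in the snapshot command for whatever storage you use). The `runner` parameter is an illustration hook so the command sequence can be checked without Docker or ZFS installed.

```python
import subprocess
from datetime import datetime, timezone

CONTAINER = "app"           # hypothetical container name
DATASET = "tank/appdata"    # hypothetical ZFS dataset with the container's data


def run(cmd):
    # Raises subprocess.CalledProcessError if the command exits non-zero.
    subprocess.run(cmd, check=True)


def snapshot_stopped(container=CONTAINER, dataset=DATASET, runner=run):
    """Stop the container, snapshot its data, then always start it again."""
    name = f"{dataset}@backup-{datetime.now(timezone.utc):%Y%m%dT%H%M%SZ}"
    runner(["docker", "stop", container])
    try:
        runner(["zfs", "snapshot", name])
    finally:
        # Restart even if the snapshot failed, so a backup error
        # does not turn into an outage.
        runner(["docker", "start", container])
    return name
```

Calling `snapshot_stopped()` from a cron job or systemd timer would run the stop, snapshot, start sequence; the try/finally is the design choice that matters, since a failed snapshot otherwise leaves the service down.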

Now here’s the interesting scenario:

docker start container
Time passes
Time to backup
Make your snapshot
Time passes
Shit happens and restore from backup is needed
docker stop container
Restore from snapshot
docker start container
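
To make the partial-write problem of the live scenario concrete, here is a toy simulation, not a real snapshot tool. A hypothetical service appends length-prefixed records in two separate writes (header, then payload), and the "snapshot" is a plain copy of the file’s bytes taken between those two writes. The copy stands in for a point-in-time snapshot; the damage comes from the service being caught between two of its own writes.

```python
import os
import struct
import tempfile


def write_record_in_two_steps(f, payload, snapshot_between=None):
    """Append a 4-byte big-endian length header, then the payload.

    If snapshot_between is given, it runs after the header is on disk
    but before the payload is written.
    """
    f.write(struct.pack(">I", len(payload)))
    f.flush()
    os.fsync(f.fileno())
    if snapshot_between is not None:
        snapshot_between()
    f.write(payload)
    f.flush()
    os.fsync(f.fileno())


def read_records(data):
    """Parse length-prefixed records, reporting a torn tail instead of crashing."""
    records, pos = [], 0
    while pos < len(data):
        if pos + 4 > len(data):
            return records, "truncated header"
        (n,) = struct.unpack(">I", data[pos:pos + 4])
        if pos + 4 + n > len(data):
            return records, "truncated payload"
        records.append(data[pos + 4:pos + 4 + n])
        pos += 4 + n
    return records, "clean"


def demo():
    """Return (parse of the live file, parse of the mid-write copy)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "db.log")
        snapshot = {}

        def take_snapshot():
            with open(path, "rb") as src:
                snapshot["bytes"] = src.read()

        with open(path, "wb") as f:
            write_record_in_two_steps(f, b"first")
            write_record_in_two_steps(f, b"second", snapshot_between=take_snapshot)
        with open(path, "rb") as f:
            live = f.read()
    return read_records(live), read_records(snapshot["bytes"])
```

The live file parses cleanly, while the copy ends in a header with no payload behind it, which is the kind of state a restored service would have to detect and repair on startup.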

Notice that in the second scenario we are not stopping the container. The snapshot is taken while the service is live. This means databases and other files are open and likely being actively written to. Some files are likely only partially written, and various temporary lock files are probably present. All of that is captured in the snapshot, and when we restore from it and start the service, the service will see all of it.

Contrast this with the trivial scenario, where the service is stopped. On a clean stop, all data is synced to disk, in-flight database operations are completed or cancelled, partial writes are completed or discarded, and lock files are cleaned up. When we restore from such a snapshot and start the service, it will “think” it is starting after a clean stop, with nothing extra to do.

In the live-snapshot scenario, the service has to do cleanup. For example, it has to decide what to do with existing lock files. Are they there because another instance of the service is running and writing to the database, or because someone killed its process before it had a chance to go through its shutdown procedure? In the former case, it might have to log an error and quit. In the latter, it would have to remove the lock files. And so on and so forth.
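
The lock-file decision above is often implemented with a PID file: the lock records the owner’s process ID, and on startup the service checks whether that process still exists. A minimal POSIX-only sketch, assuming a hypothetical lock format where the file holds a single decimal PID. Two caveats: after a restore, the recorded PID comes from before the restore and may by coincidence match an unrelated process running now, and the check-then-write here is not atomic, so real services usually add `O_EXCL` or `flock`-style locking on top.

```python
import os


def lock_owner_alive(lock_path):
    """Return True if the PID recorded in lock_path belongs to a running process.

    Assumes a hypothetical lock format: the file holds one decimal PID.
    POSIX only: os.kill(pid, 0) sends no signal, it only checks that the PID exists.
    """
    try:
        with open(lock_path) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return False          # no lock at all
    except ValueError:
        return False          # empty or garbled, e.g. captured mid-write
    if pid <= 0:
        return False          # 0 and negatives address process groups, not one PID
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False          # nobody has that PID: the lock is stale
    except PermissionError:
        return True           # the process exists but belongs to another user
    return True


def acquire_or_recover(lock_path):
    """Refuse to start if a live instance holds the lock; otherwise take it over."""
    if lock_owner_alive(lock_path):
        # The "log an error and quit" case.
        raise RuntimeError(f"another instance appears to hold {lock_path}")
    # Missing or stale lock: the previous owner died without cleaning up,
    # which is what a restored live snapshot looks like. Overwrite it.
    with open(lock_path, "w") as f:
        f.write(str(os.getpid()))
```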

As for the effect of Docker on any of this: whether you run docker stop container, systemctl stop service or pkill service, the effects on the process and its data are all the same. In fact, the docker and systemctl commands, like pkill, end up sending a signal to the service’s process anyway (SIGTERM by default, with docker stop and systemctl stop escalating to SIGKILL if the process hasn’t exited after a timeout).
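
Since all three commands deliver SIGTERM by default, the clean-stop behaviour described earlier only happens if the service handles that signal. A minimal sketch of what that looks like with Python’s `signal` module; the main loop, `do_work` callback, data file and lock path are all hypothetical. The handler only sets a flag, so the actual cleanup runs in normal code rather than inside the signal handler.

```python
import os
import signal
import threading

stop_requested = threading.Event()


def _on_sigterm(signum, frame):
    # Keep the handler tiny: just record that a stop was requested.
    stop_requested.set()


def install_sigterm_handler():
    signal.signal(signal.SIGTERM, _on_sigterm)


def clean_shutdown(data_file, lock_path):
    """What a graceful stop buys you: data on disk, lock file removed."""
    data_file.flush()               # push Python's buffer to the OS
    os.fsync(data_file.fileno())    # ask the OS to write it to the device
    data_file.close()
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass


def serve(data_file, lock_path, do_work):
    """Hypothetical main loop: work until SIGTERM arrives, then clean up."""
    install_sigterm_handler()
    while not stop_requested.is_set():
        do_work(data_file)
    clean_shutdown(data_file, lock_path)
```

If the process is instead killed with SIGKILL (for example after docker stop’s grace period runs out), none of this runs, which leaves the data in the same state as a live snapshot would.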

@null@slrpnk.net · 1 year ago

Oh I see – you’re asking a hypothetical.

The simple answer is that it’s a bad idea to take snapshots of running databases, because at best the snapshot could be missing data and at worst it can be corrupted.

The short answer: Don’t.
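
If a database genuinely can’t be stopped for the backup, a common alternative to snapshotting its files is to ask the database itself for a consistent copy. As one illustration (SQLite only, swapped in here as an example; other databases ship their own dump or backup tools), Python’s `sqlite3` module exposes SQLite’s online backup API through `Connection.backup` (Python 3.7+). The file paths are hypothetical.

```python
import sqlite3


def backup_live_sqlite(src_path, dest_path):
    """Copy a SQLite database that may still be in use by other connections.

    Uses SQLite's online backup API, so the copy contains committed data only.
    """
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        with dest:
            src.backup(dest)
    finally:
        dest.close()
        src.close()
```

A writer with an uncommitted transaction open does not leak half-done rows into the copy; the backup reflects the last committed state.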