
Rsync exit status on error: 12 or 255?

If you use rsync to connect to a server that’s offline or blocking connections, you might notice that you sometimes get a UNIX exit code of 12 and sometimes 255. This becomes a problem if you’re checking $? in a shell script and forget to handle one of these values. Sometimes you get 12 every time, and sometimes you get 255 every time. Either way, nobody wants to deal with a flaky, inconsistent exit code.

The problem is a bug (?) in older versions of rsync, where a race condition between rsync and the underlying remote shell (SSH) could make the exit code indeterminate. The issue no longer occurs in newer versions of rsync, after a patch that refactored the part of rsync that generates the exit code. The relevant part of the older code is here:

int pid = wait_process(cleanup_child_pid, &status, WNOHANG);
if (pid == cleanup_child_pid) {
    status = WEXITSTATUS(status);
    if (status > code)
        code = exit_code = status;
}

Rsync normally exits with RERR_STREAMIO (12) when a fatal error occurs in the rsync protocol, like if the remote shell totally fails. But before exiting, rsync will check if the remote shell has also exited. The WNOHANG flag to waitpid() makes this a non-blocking operation, so a hung remote shell won’t freeze rsync as well.

The problem is that SSH’s exit code can replace our RERR_STREAMIO code, but only does so some of the time, depending on whether SSH has already exited when rsync makes this check. If SSH exits first, then wait_process will reap SSH’s “255” exit code and, because 255 is greater than 12, the status > code test passes and rsync reports 255 to the user. If SSH exits later, then wait_process won’t find any dead children and the user receives “12” instead.

SSH is particularly unhelpful with its exit codes here: it passes through the exit status of the remote command when there is one, but any error of its own, whatever the cause, comes back as 255. Rsync, on the other hand, was programmed with a variety of helpful exit codes.

Newer versions of rsync no longer suffer this problem and will return 12 consistently.