Backing up my data as a linux user

It’s a good habit to routinely back up your important data, and over the past few years, dozens of cloud storage/backup solutions have sprung up, many of which offer a good deal of free space. Before you even start looking for a backup solution, you need to sit down and think about what kind of data you’re looking to back up. Specifically, how much data you have, how often it is changed, and how desperately do you need to keep it safe?

I have a few kinds of data that I actively backup and check for integrity. (Don’t forget to verify your backups, or there isn’t any point in backing them up at all.) Here are all the kinds of data that might be on your computer:

  • Program code – irreplaceable, small size (~100MB), frequently updated
  • Documents, and personal configuration files – irreplaceable, small size (~100MB), regularly updated
  • Personal photos – mostly irreplaceable, large size (more than 10GB), append only
  • Server configuration and data – mostly irreplaceable, medium size (~1GB), regularly updated
  • Collected Media – replaceable if needed, medium size (~1GB), append only
  • System files – easily replaceable, medium size (~1GB), sometimes updated

Several backup solutions try to backup everything. This is not a good idea. First, there are a lot of files on your computer that are easily replaceable (system files) and others that you’d rather not keep in your backup archives (program files). Second, those solutions have no way of giving extra redundancy to the things that matter most, and less redundancy to things that matter less.

In addition to these files, here are some types of data that you might not usually think about backing up:

  • Email
  • RSS and Calendar data
  • Blog content
  • Social networking content

My backup solution is a mix consisting of free online version control sites, Google, Dropbox, and a personal file server. My code, documents (essays, forms, receipts), and configuration (bash, vim, keys, personal CA, wiki, profile pictures, etc..) are the most important part of my backup. I sync these with Insync to my Google Drive, where I’ve rented 100GB of cloud storage. My Google Drive is regularly backed up to my personal file server, with about 2 weeks of retention.

Disks and old computers are cheap. Get a high-availability file server set up in your home, and you can happily offload intensive tasks to it like virtual machines, backup services, and archival storage. Mine is configured with:

  • Two 1TB hard drives configured in RAID-1 mirroring
  • Excessive amounts of ram and processing power, for a headless server
  • Ubuntu Server installed with SMART monitoring, nagios, nginx (for some web services), a torrent client
  • Wake-on-lan capabilities

Backing up your Google Drive might sound funny to you, but it is a good precaution in case anything ever happens to your Google Account. Additionally, most of my program code is in either a public GitHub repository or a private BitBucket repository. Version control and social coding features like issues/pull requests give you additional benefits than simply backing up your code, and you should definitely be using some kind of VCS for any code you write.

For many of the projects that I am actively developing, I only use VCS and my file server. Git object data should not be backed up to cloud storage services like Google Drive because they change too often. My vim configuration is also stored on GitHub, to take advantage of git submodules for my vim plugins.

My personal photos are stored in Google+ Photos, formerly known as Picasa. They give you 15GB of shared storage for free, and if that’s not enough, additional space is cheap as dirt. My photos don’t have another level of redundancy like my code and configuration files do. They are less important to me, and Google can be trusted to sustain itself longer than any backup solution you create yourself.

I host a single VPS with Linode (that’s an affiliate link) that contains a good amount of irreplaceable data from my blogs and other services I host on it. Linode itself offers cheap and easy full-disk backups ($5/mo.) that I signed up for. Those backups aren’t intended for hardware failures so much as human error, because Linode already maintains high-availability redundant disk storage for all of its VPS nodes. Additionally, I backup the important parts of the server to my personal file server (/etc, /home, /srv, /var/log), for an extra level of redundancy.

Any pictures I collect from online news aggregators is dumped in my Google Drive and shares the same extra redundancy as my documents and personal configuration files. Larger media like videos are stored in one of my USB 3.0 flash drives, since they are regularly created and deleted.

I don’t back up system files, since Xubuntu is free and programs are only 1 package-manager command away. I don’t maintain extra redundancy for email for the same reason I don’t for photos.

A final thing to consider is the confidentiality of your backups. Whenever you upload data to a free public cloud storage service, you should treat the data as if it were being anonymously released to the public. In other words, personal data, cryptographic keys, and passwords should never be uploaded unencrypted to a public backup service. Things like PGP can help in this regard.