Backing up my data as a Linux user

It’s a good habit to routinely back up your important data, and over the past few years, dozens of cloud storage and backup solutions have sprung up, many of which offer a good deal of free space. Before you even start looking for a backup solution, you need to sit down and think about what kind of data you’re looking to back up. Specifically: how much data do you have, how often does it change, and how badly do you need to keep it safe?

I have a few kinds of data that I actively back up and check for integrity. (Don’t forget to verify your backups, or there isn’t any point in backing them up at all.) Here are the kinds of data that might be on your computer:

  • Program code – irreplaceable, small size (~100MB), frequently updated
  • Documents, and personal configuration files – irreplaceable, small size (~100MB), regularly updated
  • Personal photos – mostly irreplaceable, large size (more than 10GB), append only
  • Server configuration and data – mostly irreplaceable, medium size (~1GB), regularly updated
  • Collected Media – replaceable if needed, medium size (~1GB), append only
  • System files – easily replaceable, medium size (~1GB), sometimes updated

Several backup solutions try to back up everything. This is not a good idea. First, there are a lot of files on your computer that are easily replaceable (system files) and others that you’d rather not keep in your backup archives (program files). Second, those solutions have no way of giving extra redundancy to the things that matter most and less redundancy to the things that matter less.

In addition to these files, here are some types of data that you might not usually think about backing up:

  • Email
  • RSS and Calendar data
  • Blog content
  • Social networking content

My backup solution is a mix of free online version control sites, Google, Dropbox, and a personal file server. My code, documents (essays, forms, receipts), and configuration (bash, vim, keys, personal CA, wiki, profile pictures, etc.) are the most important part of my backup. I sync these with Insync to my Google Drive, where I’ve rented 100GB of cloud storage. My Google Drive is regularly backed up to my personal file server, with about two weeks of retention.
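The retention side of that is easy to script. Here’s a minimal sketch of the idea, assuming the Drive folder is synced to a directory on the file server; the paths and the 14-day window here are illustrative, not my exact setup:

#!/bin/bash
# Take a dated snapshot of the synced Drive folder, then prune old snapshots.
# Paths and the 14-day retention window are placeholders.
SRC=$HOME/google-drive
DEST=$HOME/backup/google-drive/$(date +%F)

rsync -a "$SRC/" "$DEST"
# Remove snapshot directories older than two weeks.
find "$HOME/backup/google-drive" -maxdepth 1 -mindepth 1 -type d -mtime +14 -exec rm -rf {} +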

Disks and old computers are cheap. Get a high-availability file server set up in your home, and you can happily offload intensive tasks to it like virtual machines, backup services, and archival storage. Mine is configured with:

  • Two 1TB hard drives configured in RAID-1 mirroring (see the sketch after this list)
  • Excessive amounts of RAM and processing power, for a headless server
  • Ubuntu Server installed with SMART monitoring, nagios, nginx (for some web services), and a torrent client
  • Wake-on-LAN capabilities
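Setting up the mirror is only a couple of commands with mdadm. This is just a sketch; the device names below are examples, and mdadm --create will destroy whatever is already on those partitions:

# Build a two-disk RAID-1 array (device names are examples; this wipes them).
sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
sudo mkfs.ext4 /dev/md0
sudo mount /dev/md0 /srv/storage
# Check the array's health at any time.
cat /proc/mdstat
# From another machine, wake the server by its MAC address (wakeonlan package).
wakeonlan aa:bb:cc:dd:ee:ff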

Backing up your Google Drive might sound funny to you, but it is a good precaution in case anything ever happens to your Google account. Additionally, most of my program code is in either a public GitHub repository or a private BitBucket repository. Version control and social coding features like issues and pull requests give you benefits beyond simply backing up your code, and you should definitely be using some kind of VCS for any code you write.

For many of the projects that I am actively developing, I only use VCS and my file server. Git object data should not be backed up to cloud storage services like Google Drive because it changes too often. My vim configuration is also stored on GitHub, to take advantage of git submodules for my vim plugins.
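If you haven’t used submodules for this, the setup looks roughly like the following. The plugin and repository names here are only examples:

cd ~/.vim
git init
# Each plugin lives in its own repository, pinned to a specific commit.
git submodule add https://github.com/tpope/vim-fugitive.git bundle/vim-fugitive
git commit -m "Track vim-fugitive as a submodule"
# On a new machine, one clone pulls in every plugin.
git clone --recursive git@github.com:yourname/dotvim.git ~/.vim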

My personal photos are stored in Google+ Photos, formerly known as Picasa. They give you 15GB of shared storage for free, and if that’s not enough, additional space is cheap as dirt. My photos don’t have another level of redundancy like my code and configuration files do. They are less important to me, and Google can be trusted to sustain itself longer than any backup solution you create yourself.

I host a single VPS with Linode (that’s an affiliate link) that contains a good amount of irreplaceable data from my blogs and other services I host on it. Linode itself offers cheap and easy full-disk backups ($5/mo.) that I signed up for. Those backups aren’t intended for hardware failures so much as human error, because Linode already maintains high-availability redundant disk storage for all of its VPS nodes. Additionally, I back up the important parts of the server (/etc, /home, /srv, /var/log) to my personal file server for an extra level of redundancy.
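That server-to-file-server copy is a short rsync loop, along these lines; the ssh alias and destination directory are placeholders rather than my actual setup:

#!/bin/bash
# Pull the irreplaceable directories from the VPS down to the file server.
# "vps" is an ssh config alias; the destination path is a placeholder.
for dir in /etc /home /srv /var/log; do
  rsync -az --delete "vps:$dir" "$HOME/backup/vps/"
done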

Any pictures I collect from online news aggregators are dumped in my Google Drive and share the same extra redundancy as my documents and personal configuration files. Larger media like videos are stored on one of my USB 3.0 flash drives, since they are regularly created and deleted.

I don’t back up system files, since Xubuntu is free and programs are only one package-manager command away. I don’t maintain extra redundancy for email for the same reason I don’t for photos.

A final thing to consider is the confidentiality of your backups. Whenever you upload data to a free public cloud storage service, you should treat the data as if it were being anonymously released to the public. In other words, personal data, cryptographic keys, and passwords should never be uploaded unencrypted to a public backup service. Things like PGP can help in this regard.
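For example, gpg makes it easy to encrypt an archive before it ever leaves your machine. A minimal sketch, where the key ID and filenames are placeholders:

# Encrypt a tarball to your own public key before uploading it anywhere public.
tar -czf documents.tar.gz ~/Documents
gpg --encrypt --recipient you@example.com documents.tar.gz
rm documents.tar.gz    # keep only documents.tar.gz.gpg for upload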

Backing up Dropbox with rsync

Update

I don’t use this system anymore. Learn about my new backup system instead.

At UC Berkeley, Dropbox has become the de facto standard for cloud sync and live backups, especially for team projects that don’t fit the mold of traditional version control. (It is nice to keep local copies of git repositories on Dropbox anyway.) Despite this, it’s bad practice to leave the safety of your data entirely up to a third party. You might, for instance, accidentally trigger the deletion of all your data and unwittingly propagate the change to everywhere you have your Dropbox cached, which is why I proposed keeping an off-site copy. It’s like backing up your backup, and I’ve been doing it for months now.

Before I get started, here’s the structure of a backup server I have stationed at my home in SoCal:

/dev/sda1
  ..
    backup/
      archives/
      dropbox/
    media/
    public/
    ..
/dev/sdb1
  backup/
    archives/
    dropbox/

A cron script runs every so often to sync the first backup directory to the second. It’s essentially a hacky equivalent of RAID 1, with the added bonus of a bad-sector check every time the script runs.
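The crontab entry for that mirror is a single line, give or take the paths (the mount points below are illustrative, not my actual layout):

# Mirror the primary backup directory onto the second disk every night at 3am.
0 3 * * * rsync -a --delete /mnt/sda1/backup/ /mnt/sdb1/backup/

The script below is the other half of the setup: it’s what I run from my laptop to push my local Dropbox up to the first backup directory.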

#!/bin/bash
# Push the local Dropbox directory to the "home" host defined in ~/.ssh/config.

DIR=$HOME/Dropbox
read -p "The target directory is: $DIR. Correct? [yn] " -n 1

if [[ $REPLY =~ ^[Yy]$ ]]; then
  if [ -d "$DIR" ]; then
    echo -e "\n"
    rsync -vauzh --progress --exclude '.dropbox*' --delete "$DIR" home:~/backup/dropbox
    echo "Complete."
    exit 0
  else
    echo "Could not find directory. Exiting."
    exit 1
  fi
else
  echo "Exiting."
  exit 1
fi

I’ll explain what I did in more detail. read is a nifty command for reading from standard input into a bash variable; when you don’t name one, the input goes into $REPLY. The -p flag specifies a prompt to show the user, and -n 1 tells it to stop after reading a single character.

The [[ ... =~ ... ]] construct tests a variable against a regular expression. Regular expressions are exceedingly common in everyday server administration; they’re a more powerful relative of simple wildcard patterns like *.sh or IMG_0??.jpg. In this case, the [Yy] block specifies the set of characters that are acceptable as input (lowercase and uppercase Y for yes), and the ^...$ anchors make the match pass only if a single Y or y character is the entire variable.

rsync -vauzh --progress --exclude '.dropbox*' --delete "$DIR" home:~/backup/dropbox

My ~/.ssh/config contains a block for a home alias that stores all the information needed to connect to my server at home (there’s a sketch of what it looks like after the flag list). The last two arguments of this rsync command will be familiar if you’ve ever used scp. Here are the flags I’ve set:

  • -v for verbosity, because why not?
  • -a for archive mode, which preserves things like file permissions and ownership
  • -u for update, which skips files that are newer on the destination
  • -z for gzip compression, since most of my files are plain text and highly compressible
  • -h for human-readable output, which prints file sizes and transfer totals in friendly units
  • --progress for a per-file progress display
  • --exclude '.dropbox*' to skip Dropbox’s local cache and configuration files
  • --delete to delete files on the destination that have since disappeared from my local copy
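For reference, the home block in ~/.ssh/config looks something like this; the hostname, user, and key path are placeholders rather than my real configuration:

Host home
    HostName home.example.com
    User roger
    IdentityFile ~/.ssh/id_rsa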

And voila! Run the script periodically, and know that you’ve got a backup plan in case your Dropbox ever becomes inaccessible.
