Code.RogerHub » cloud: The programming blog at RogerHub (https://rogerhub.com/~r/code.rogerhub)

Taking advantage of cloud VM-driven development (Thu, 10 Jul 2014)

Most people write about cloud computing as it relates to their service infrastructure. It's exciting to hear about how Netflix, Dropbox, et al. use AWS to support their operations, but all of those large-scale ideas don't really mean much for the average developer. Most people don't have the budget or the need for enormous computing power or highly available hosted storage like S3, but it turns out that cloud-based VMs can still be quite useful, just in a different way.

Sometimes, you see something like this one-liner installation script for Heroku Toolbelt, and you just get nervous:

wget -qO- https://toolbelt.heroku.com/install-ubuntu.sh | sh

Not only are they asking you to run a shell script downloaded over the Internet, but the script also asks for root privileges to install packages and stuff. Or maybe you're reading a blog post about some HA database cluster software and you want to try it out yourself, but running three virtual machines on your puny laptop is out of the question.

To get around this, I've been using DigitalOcean machines whenever I want to test something out but don't want to go to the trouble of maintaining a development server or local virtual machines. Virtualized cloud servers are great for this because:

  • They’re DIRT CHEAP. DO’s smallest machine costs $0.007 an hour. Even if you use it for 2 hours, it rounds down to 1 cent.
  • The internet connection is usually a lot better than whatever you’re using. Plus, most cloud providers have local mirrors for package manager stuff, which makes installing packages super fast.
  • Burstable CPU means that you can get an unfair amount of processing power for a short time at the beginning, which comes in handy for initially installing and downloading all the stuff you’ll want to have on your machine.

I use the tugboat client (a Ruby CLI app) to interface with the DigitalOcean API. To try out MariaDB Galera clustering, I just opened three terminals and kept three SSH sessions going. For source builds that have a dozen or more miscellaneous dependencies, I usually prepare a simple build script that I can upload and run on a cloud VM whenever I need it. When I'm done with a machine, I shut it down until I need one again a few days later.
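
For example, a typical tugboat session looks roughly like this. The droplet name is made up, and the exact size/image/region flags vary by tugboat version and account, so treat this as a sketch and check tugboat help create on your install:

$ tugboat create galera-1 -s 512mb -i ubuntu-14-04-x64 -r sfo1
$ tugboat ssh galera-1
$ tugboat halt galera-1
$ tugboat destroy galera-1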

Running development virtual machines in the cloud might not drastically change your workflow, but it opens up a lot of opportunities for experimentation and gives you access to massive computing resources when you want them. So load a couple of dollars onto a fresh DigitalOcean account and boot up some VMs!

Self-contained build environments with Vagrant (Tue, 20 Aug 2013)

Vagrant is a nifty piece of Ruby software that lets you set up virtual machines with an unparalleled amount of automation. It interfaces with a VM provider like VirtualBox, and helps you set up and tear down VMs as you need them. I like it better than Juju because there isn't as much hand-holding involved, and I like it better than vanilla Puppet because I don't regularly deploy a thousand VMs at a time. At the Daily Cal, I've used Vagrant to help developers set up their own build environments for our software, where they can write code and test features in isolation. I also use it as a general-purpose VM manager on my home file server, so I can build and test server software in a sandbox.

You can run Vagrant on your laptop, but I think it's the wrong piece of hardware for the job. Long-running VM batch jobs and build-environment VMs belong on a headless server, where you don't have to worry about excessive heat, power consumption, heavy I/O, or keeping your laptop awake so it doesn't suspend. My server at home is set up with:

  • 1000GB of disk space backed by RAID1
  • Loud 3000RPM fans I bought off a Chinese guy four years ago
  • Repurposed consumer-grade CPU and memory (4GB) from an old desktop PC

You don't need great hardware to run a couple of Linux VMs. Since my server is basically hidden in the corner, the noisy fans are not a problem and actually do a great job of keeping everything cool under load. RAID mirroring (I'm hoping) will provide high availability, and since the server's data is easily replaceable, I don't need to worry about backups. Setting up your own server is usually cheaper than persistent storage on public clouds like AWS, but your mileage may vary.

Vagrant configuration is a single Ruby file named Vagrantfile in the working directory of your vagrant process. My basic Vagrantfile just sets up a virtual machine with Vagrant's preconfigured Ubuntu 12.04 LTS image. They offer other preconfigured images, but this is the one I'm most familiar with.

# Vagrantfile

Vagrant.configure("2") do |config|
  # Every Vagrant virtual environment requires a box to build off of.
  config.vm.box = "precise32"

  # The url from where the 'config.vm.box' box will be fetched if it
  # doesn't already exist on the user's system.
  config.vm.box_url = "http://files.vagrantup.com/precise32.box"

  # Forward port 8080 on the host to port 8080 on the guest.
  config.vm.network :forwarded_port, guest: 8080, host: 8080

  # Enable public network access from the VM. This is required so that
  # the machine can access the Internet and download required packages.
  config.vm.network :public_network

end

For long-running batch jobs, I like keeping a CPU execution cap on my VMs so that they don't overwork the system. The cap keeps the temperature down and prevents a VM from interfering with other server processes. You can add an execution cap (for VirtualBox only) by appending the following before the end of your primary configuration block:

# Adding a CPU Execution Cap

Vagrant.configure("2") do |config|
  ...

  config.vm.provider "virtualbox" do |v|
    v.customize ["modifyvm", :id, "--cpuexecutioncap", "40"]
  end

end

After setting up Vagrant’s configuration, create a new directory containing only the Vagrantfile and run vagrant up to set up the VM. Other useful commands include:

  • vagrant ssh — Opens a shell session to the VM
  • vagrant halt — Halts the VM gracefully (Vagrant will connect via SSH)
  • vagrant status — Checks the current status of the VM
  • vagrant destroy — Destroys the VM

Finally, to set up the build environment automatically every time you create a new Vagrant VM, you can write provisioners. Vagrant supports complex provisioning frameworks like Puppet and Chef, but you can also write a provisioner that’s just a shell script. To do so, add the following inside your Vagrantfile:

Vagrant.configure("2") do |config|
  ...

  config.vm.provision :shell, :path => "bootstrap.sh"
end

Then just stick your provisioner next to your Vagrantfile, and it will execute whenever the VM is provisioned (on the first vagrant up, and again whenever you run vagrant provision). You can write commands to fetch package lists and upgrade system software, or to install build dependencies and check out source code. By default, the directory containing your Vagrantfile is also mounted inside the guest at /vagrant, so your provisioner can refer to other files you keep next to it.
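
For example, a minimal bootstrap.sh might look something like the following. The package names and build command are placeholders for whatever your project actually needs:

#!/bin/bash
# Example provisioner: the shell provisioner runs as root inside the guest by default.
set -e

apt-get update
apt-get install -y build-essential git

# The directory containing the Vagrantfile is mounted at /vagrant.
cd /vagrant
./build.sh   # placeholder for your project's real build step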

Vagrant uses a single public/private keypair for all of its default images. The private key can usually be found in your home directory as ~/.vagrant.d/insecure_private_key. You can add it to your ssh-agent and open your own SSH connections to your VM without Vagrant’s help.
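
With the VirtualBox provider's usual defaults, that looks something like this (the forwarded SSH port is typically 2222, but vagrant ssh-config prints the exact host, port, user, and key to use):

$ ssh-add ~/.vagrant.d/insecure_private_key
$ vagrant ssh-config
$ ssh -p 2222 vagrant@127.0.0.1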

Even if you accidentally mess up your Vagrant configuration, you can use VirtualBox’s built-in command-line tools to fix boot configuration issues or ssh daemon issues.

$ VBoxManage list vms
...
$ VBoxManage controlvm <name|uuid> pause|reset|poweroff|etc
...
$ VBoxHeadless -startvm <name|uuid> --vnc
...
(connect via VNC)

The great thing about Vagrant's VM-provider abstraction layer is that you can grab the VM images from VirtualBox and boot them on another server that has VirtualBox installed, without Vagrant at all. Vagrant is an excellent support tool for programmers (and, combined with SSH tunneling, it is great for web developers as well). If you don't already have some sort of VM infrastructure supporting your work, you should look into what Vagrant can do for you.

elementary OS, a distribution like no other (Sun, 18 Aug 2013)

There are a surprising number of people who hate elementary OS. They say that elementary is technically just a distribution of Linux, not an OS. They say that it is too similar to OS X. They say that the developers are in over their heads. All of these things may be true, but I do not care. I am sick of ascetic desktop environments without animations. I am tired of not having a compositor. I don't need a dozen GUI applications to hold my hand, but I hate having to fix things that don't work out of the box. You are not Richard Stallman. Face it: modern people, whether or not they are also computer programmers, do not live in the terminal, and most of us aren't using laptops with hardware built in 2004. And finally, I am sick of blue and black window decorations that look like they were designed by a 12-year-old.

Elementary OS Luna is the first distribution of Linux I've used where I don't feel like changing a thing. The desktop environment defaults are excellent, and all of Ubuntu's great hardware support, community PPAs, and familiar package manager are still available. At the same time, there is a lot of graphical magic, window animation, and attention to detail that is quite similar to OS X. There is a minimal amount of hand-holding, and it's quite easy to get used to the desktop because of the intuitive keyboard shortcuts and great application integration.

A screenshot alone can't show it, but elementary OS is more than just a desktop environment. The distribution comes packaged with a bunch of custom-built applications, like its own file manager and terminal app. Other apps, like an IRC client, a social media client, and a system search tool, are available in community PPAs. I do most of my work through the web browser, ssh, or vim. Important data on my laptop is limited to personal files on my Google Drive, and a directory of projects in active development that's regularly backed up to my personal file server. Programs can be painlessly installed from the package manager, and configuration files are symlinked from my Google Drive. I'm not very attached to the current state of my laptop at any given moment, because all the data on it is replaceable, with the exception of OS tweaks. I don't like having to install a dozen customization packages to get my workflow how I like it, so the out-of-box experience is very important to me.
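
The symlinking itself is nothing fancy; a sketch with hypothetical paths (adjust for wherever Insync puts your Google Drive) looks like this:

ln -s "$HOME/Google Drive/config/vimrc" "$HOME/.vimrc"
ln -s "$HOME/Google Drive/config/bashrc" "$HOME/.bashrc"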

I would say that if you're a regular Linux user, you should at least give elementary OS Luna a try, even if it's on a friend's machine or in a VM. You may be surprised. I was.

Backing up my data as a Linux user (Tue, 16 Jul 2013)

It's a good habit to routinely back up your important data, and over the past few years, dozens of cloud storage and backup solutions have sprung up, many of which offer a good deal of free space. Before you even start looking for a backup solution, you need to sit down and think about what kind of data you're looking to back up. Specifically: how much data do you have, how often does it change, and how desperately do you need to keep it safe?

I have a few kinds of data that I actively back up and check for integrity. (Don't forget to verify your backups, or there isn't any point in backing them up at all.) Here are the kinds of data that might be on your computer:

  • Program code – irreplaceable, small size (~100MB), frequently updated
  • Documents, and personal configuration files – irreplaceable, small size (~100MB), regularly updated
  • Personal photos – mostly irreplaceable, large size (more than 10GB), append only
  • Server configuration and data – mostly irreplaceable, medium size (~1GB), regularly updated
  • Collected Media – replaceable if needed, medium size (~1GB), append only
  • System files – easily replaceable, medium size (~1GB), sometimes updated

Several backup solutions try to back up everything. This is not a good idea. First, there are a lot of files on your computer that are easily replaceable (system files) and others that you'd rather not keep in your backup archives (program files). Second, those solutions have no way of giving extra redundancy to the things that matter most and less redundancy to the things that matter least.

In addition to these files, here are some types of data that you might not usually think about backing up:

  • Email
  • RSS and Calendar data
  • Blog content
  • Social networking content

My backup solution is a mix of free online version control sites, Google, Dropbox, and a personal file server. My code, documents (essays, forms, receipts), and configuration (bash, vim, keys, personal CA, wiki, profile pictures, etc.) are the most important part of my backup. I sync these with Insync to my Google Drive, where I've rented 100GB of cloud storage. My Google Drive is regularly backed up to my personal file server, with about two weeks of retention.
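
The retention mechanism isn't spelled out here, but one common way to get dated snapshots with a two-week window is rsync with --link-dest, so unchanged files are hard-linked instead of copied. A sketch, with entirely hypothetical paths:

#!/bin/bash
# Hypothetical snapshot script: hard-link unchanged files against yesterday's snapshot.
set -e

SRC="$HOME/google-drive/"
DEST="/srv/backup/google-drive"
TODAY=$(date +%F)
YESTERDAY=$(date -d yesterday +%F)

rsync -a --delete --link-dest="$DEST/$YESTERDAY" "$SRC" "$DEST/$TODAY"

# Prune snapshots older than 14 days.
find "$DEST" -mindepth 1 -maxdepth 1 -type d -mtime +14 -exec rm -rf {} +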

Disks and old computers are cheap. Get a high-availability file server set up in your home, and you can happily offload intensive tasks to it like virtual machines, backup services, and archival storage. Mine is configured with:

  • Two 1TB hard drives configured in RAID-1 mirroring
  • Excessive amounts of RAM and processing power, for a headless server
  • Ubuntu Server with SMART monitoring, Nagios, nginx (for some web services), and a torrent client
  • Wake-on-LAN capabilities

Backing up your Google Drive might sound funny to you, but it is a good precaution in case anything ever happens to your Google account. Additionally, most of my program code is in either a public GitHub repository or a private Bitbucket repository. Version control and social coding features like issues and pull requests give you benefits beyond simply backing up your code, and you should definitely be using some kind of VCS for any code you write.

For many of the projects I'm actively developing, I only use VCS and my file server. Git repositories should not be backed up to cloud-sync services like Google Drive, because their internal files change too often. My vim configuration is also stored on GitHub, so I can take advantage of git submodules for my vim plugins.
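
As an illustration (the plugin and repository URLs here are examples, not necessarily what my actual vim configuration contains), adding a plugin as a submodule and restoring it on a new machine looks roughly like this:

$ git submodule add https://github.com/tpope/vim-fugitive.git bundle/vim-fugitive
$ git commit -m "Add fugitive as a submodule"
$ git clone --recursive https://github.com/<you>/vim-config.git ~/.vim
$ git submodule update --init    # only needed if the clone wasn't --recursive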

My personal photos are stored in Google+ Photos, formerly known as Picasa. They give you 15GB of shared storage for free, and if that’s not enough, additional space is cheap as dirt. My photos don’t have another level of redundancy like my code and configuration files do. They are less important to me, and Google can be trusted to sustain itself longer than any backup solution you create yourself.

I host a single VPS with Linode (that's an affiliate link) that contains a good amount of irreplaceable data from my blogs and the other services I host on it. Linode offers cheap and easy full-disk backups ($5/mo.), which I signed up for. Those backups guard against human error more than hardware failure, because Linode already maintains highly available, redundant disk storage for all of its VPS nodes. Additionally, I back up the important parts of the server (/etc, /home, /srv, /var/log) to my personal file server, for an extra level of redundancy.

Any pictures I collect from online news aggregators are dumped into my Google Drive and share the same extra redundancy as my documents and personal configuration files. Larger media, like videos, are stored on one of my USB 3.0 flash drives, since they are regularly created and deleted.

I don’t back up system files, since Xubuntu is free and programs are only 1 package-manager command away. I don’t maintain extra redundancy for email for the same reason I don’t for photos.

A final thing to consider is the confidentiality of your backups. Whenever you upload data to a free public cloud storage service, you should treat the data as if it were being released anonymously to the public. In other words, personal data, cryptographic keys, and passwords should never be uploaded unencrypted to a public backup service. Tools like PGP (GnuPG, for example) can help in this regard.
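
For example, a symmetric-encryption sketch with GnuPG (the file names are placeholders):

$ tar czf - ~/Documents | gpg --symmetric --cipher-algo AES256 -o documents.tar.gz.gpg
$ gpg -d documents.tar.gz.gpg | tar xzf -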

Continuous integration in web development (Mon, 24 Jun 2013)

CI, or Continuous Integration, is a big help when you're working on console applications. If you're not familiar with the term, continuous integration refers to a system through which you can have your code compiled, executed, and tested in the background as you're working on it. This usually happens on remote servers dedicated to the heavy lifting involved in building code. You can be notified of build failures asynchronously and test new features without having to interrupt your workflow to wait for a compile job to finish.

However, things are different in front-end web development. Websites are hard to test, and when they are tested, they're usually tested by hand in some kind of rebuild/alt-tab/refresh loop, without the power of CI. Here are some tips borrowed from CI that may help you improve your web development workflow.

If you’ve ever worked on a large JavaScript-intensive application, you’ve probably written build scripts to concatenate and minify JavaScript. You don’t need a huge IDE or development framework to accomplish any of that. I regularly use the following tools and scripts to help me develop my Final Grade Calculator:

CLI JavaScript Minification

There are plenty of websites that will do JS minification for you online. In fact, most of them run in JavaScript themselves. However, if you find yourself visiting them more than once a day, you should get yourself a command-line JavaScript minifier. Dean Edwards, the author of /packer/, one of the most popular JS minification websites online, has ports of his JS packer available in several programming languages on his website. (I'm using the PHP version, because it appears to be the most faithful port.)

After you've acquired a minifier and added it to your $PATH, you can incorporate it into your build scripts like so:

#!/bin/bash
# Minify the calculator source with the packer CLI, then run the rest of the build.
packer src/final.english.js out.js
./build.py > out.htm
...

inotify Tools

Even if you’ve combined all of your build tasks into a single build script, you still have to run the thing every time you want to see new changes in your web application. A sleep-build loop would take care of this inconvenience, but then you’re stuck between wasting CPU cycles and having to wait a bit for new changes to appear. We can do better.

The term inotify stands for index-node (inode) notification. It's a Linux kernel subsystem, and most distributions package command-line tools (such as inotify-tools, which provides the inotifywait command used below) that bridge inotify events to the shell. You can use these to have your build script run only when you change and save a source file in your text editor.

I have an alias set up like so:

alias watchdir='inotifywait -r -e close_write,moved_to,create'

The switches enable recursive behavior and restrict the events to a certain few. This command will block until one of the three events occurs:

  • close_write – when a file handle in w mode is closed
  • moved_to – when a file is moved into the directory
  • create – when a new file is created in the directory

Combined with an rsync alias with some reasonable defaults, you can put together a loop that syncs source files to a remote server and builds them as they change.

alias rrsync='rsync -vauzh --progress --delete'

For example, I used something like the following while developing this blog’s theme:

while watchdir cobalt; do sass ... ; rrsync cobalt ... ; done

This last one isn’t a concept from CI, but it can be adapted to fit in your CI workflow whenever you need it.

Ad-hoc Web Server

You can test static HTML files just by opening them locally in your web browser, but there are a few reasons that an actual web server, no matter how simple it may be, is a slightly better option (a root favicon, greater network permissions, absolute paths, and protocol-relative URLs, just off the top of my head). Python (and Python 3) comes with a built-in single-threaded web server that supports directory listings, symbolic links, a decent number of MIME types, modification/expiration headers, and large-file streaming. In other words, it's a pretty good tool that will do anything you could want related to serving static assets (you know, unless you want to download two things at once).

Start the python web server at the command line, and it will start serving files through HTTP from the current working directory:

> python -m SimpleHTTPServer [port number]  # Python 2
> python3 -m http.server [port number]      # Python 3
Backing up Dropbox with rsync (Tue, 05 Mar 2013)

Update

I don’t use this system anymore. Learn about my new backup system instead.

At UC Berkeley, Dropbox has become the de facto standard for cloud sync and live backups, especially for team projects that don't quite fit the mold of traditional version control. (It is nice to keep local copies of git repositories on Dropbox anyway.) Despite this, it's bad practice to leave the safety of your data entirely up to a third party. You might, for instance, accidentally trigger the deletion of all your data and unknowingly propagate the change to every machine where your Dropbox is cached, which is why I proposed keeping an off-site copy. It's like backing up your backup, and I've been doing it for months now.

Before I get started, here’s the structure of a backup server I have stationed at my home in SoCal:

/dev/sda1
  ..
    backup/
      archives/
      dropbox/
    media/
    public/
    ..
/dev/sdb1
  backup/
    archives/
    dropbox/

A cron script runs every so often to sync the first backup directory to the second. It's essentially a hacky equivalent of RAID 1, with the added bonus of bad-sector checking every time the script runs.
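
That mirroring job isn't shown here, but the idea can be as simple as a single crontab entry (the mount points are hypothetical):

# m h  dom mon dow   command
30 3 * * *   rsync -a --delete /mnt/sda1/backup/ /mnt/sdb1/backup/

The script below is the other half of the setup: it pushes my local Dropbox folder up to the backup server.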

#!/bin/bash

DIR=$HOME/Dropbox
read -p "The target directory is: $DIR. Correct? [yn] " -n 1

if [[ $REPLY =~ ^[Yy]$ ]];then
  if [ -d "$DIR" ]; then
    echo -e "\n"
    rsync -vauzh --progress --exclude '.dropbox*' --delete "$DIR" home:~/backup/dropbox
    echo "Complete."
    exit 0
  else
    echo "Could not find directory. Exiting."
    exit 1
  fi
else
  echo "Exiting."
  exit 1
fi

I'll explain what I did in more detail. read is a nifty bash built-in that reads a line from standard input into a variable ($REPLY by default). The -p flag specifies a prompt to show the user, and -n 1 tells it to accept just one character.

The [[ ... =~ ... ]] construct tests a variable against a regular expression. Regular expressions are exceedingly common in everyday server administration; they're a more powerful cousin of simple wildcard (glob) patterns like *.sh or IMG_0??.jpg. In this case, the [Yy] character class accepts lower- and uppercase Y for "yes", and the ^...$ anchors make the match succeed only if that single character is the entire reply.

rsync -vauzh --progress --exclude '.dropbox*' --delete "$DIR" home:~/backup/dropbox

My ~/.ssh/config contains a block for a home alias that stores all the information needed to connect to my server at home (a sketch of what that block might look like follows the flag list below). The last two arguments of this rsync command will be familiar if you've ever used scp. Here are the flags I've set:

  • -v for verbosity, because why not?
  • -a for archive, which preserves things like file permissions and ownership (and implies recursion)
  • -u for update, so only files with more recent mtimes are copied
  • -z for gzip compression, since most of my files are plain text and highly compressible
  • -h for human-readable information, which prints out super-readable messages about the sync operation’s progress
  • --progress for progress display
  • --exclude '.dropbox*' to exclude copying Dropbox’s local cache and configuration files
  • --delete to delete files that have since disappeared on my local copy
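
For reference, the home alias might look something like this (hostname, user, and key path are placeholders):

# ~/.ssh/config
Host home
    HostName home.example.com
    User roger
    Port 22
    IdentityFile ~/.ssh/id_rsa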

And voila! Run the script periodically, and know that you’ve got a backup plan in case your Dropbox ever becomes inaccessible.
