Continuous integration in web development

CI, or Continuous Integration, is a big help when you're working on console applications. If you're not familiar with the term, continuous integration refers to a system through which you can have your code compiled, executed, and tested in the background as you're working on it. This usually happens on remote servers dedicated to the heavy lifting involved in building code. You can be notified of build failures asynchronously and test new features without interrupting your workflow to wait for a compile job to finish.

However, things are different with front-end web development. Websites are hard to test, and when they are tested, it's usually by hand, in some kind of rebuild/alt-tab/refresh loop, without the power of CI. Here are some tips, derived from CI, that may help you improve your web development workflow.

If you’ve ever worked on a large JavaScript-intensive application, you’ve probably written build scripts to concatenate and minify JavaScript. You don’t need a huge IDE or development framework to accomplish any of that. I regularly use the following tools and scripts to help me develop my Final Grade Calculator:

CLI JavaScript Minification

There are plenty of websites that will do JS minification for you online. In fact, most of them run in JavaScript themselves. However, if you find yourself visiting them more than once a day, you should get yourself a command-line JavaScript minifier. Dean Edwards, the author of /packer/, one of the most popular JS minification websites online, has ports of his JS packer available in several programming languages on his website. (I'm using the PHP version, because it appears to be the most faithful port.)

After you've acquired a minifier and put it somewhere on your $PATH, you can incorporate it into your build scripts like so:

#!/bin/bash
packer src/final.english.js out.js
./build.py > out.htm
...

inotify Tools

Even if you’ve combined all of your build tasks into a single build script, you still have to run the thing every time you want to see new changes in your web application. A sleep-build loop would take care of this inconvenience, but then you’re stuck between wasting CPU cycles and having to wait a bit for new changes to appear. We can do better.
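(To be concrete, the sleep-build loop being rejected here would be something like the following, with ./build.sh as a hypothetical stand-in for your build script.)

while true; do ./build.sh; sleep 5; done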

The term inotify stands for index-node (inode) notification. Linux comes with tools that bridge the kernel's inotify services with the CLI, and you can use these to have your build script run only when you change and save a source file in your text editor.

I have an alias set up like so:

alias watchdir='inotifywait -r -e close_write,moved_to,create'

The switches enable recursive behavior and restrict the events to a certain few. This command will block until one of the three events occurs:

  • close_write – when a file that was opened for writing is closed
  • moved_to – when a file is moved into the directory
  • create – when a new file is created in the directory

Combined with an rsync alias with some reasonable defaults, you can put together a loop that syncs source files to a remote server and builds them as they change.

alias rrsync='rsync -vauzh --progress --delete'

For example, I used something like the following while developing this blog’s theme:

while watchdir cobalt; do sass ... ; rrsync cobalt ... ; done

This last one isn’t a concept from CI, but it can be adapted to fit in your CI workflow whenever you need it.

Ad-hoc Web Server

You can test static HTML files just by opening them locally in your web browser, but there are a few reasons that an actual web server, no matter how simple it may be, is a slightly better option (a root favicon, greater network permissions, absolute paths, protocol-relative URLs, just off the top of my head). Python (and Python 3) comes with a built-in single-threaded simple web server that supports directory listings, symbolic links, a decent number of MIME types, modification/expiration headers, and large-file streaming. In other words, it's a pretty good tool that will do anything you could want related to serving static assets (you know, unless you want to download two things at once).

Start the Python web server at the command line, and it will serve files over HTTP from the current working directory:

> python -m SimpleHTTPServer [port number]  # Python 2
> python3 -m http.server [port number]      # Python 3
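For example, to serve a (hypothetical) build directory on port 8000 and check it from another terminal:

> cd build && python3 -m http.server 8000
> curl -I http://localhost:8000/

The second command just fetches the response headers for the directory listing.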

Using locate to quickly change directories

You've probably got a bunch of directories and subdirectories in your home folder that are organized logically, rather than a mess of top-level directories scattered all over the place. Although the former is cleaner and better organized, it also takes more time to get to where you'd like to be. Luckily, Linux comes with /usr/bin/locate to help you find files. I put the following in my bashrc:

goto () { cd "$(locate -i "$@" | grep -P "/home/roger/(Dropbox|Documents|Downloads|Development|Desktop)" | awk '{ print length(), $0 | "sort -n" }' | head -n 1 | cut -d " " -f2-)"; }

I’ll go through it part by part:

  • locate -i "$@" – Locate uses the database generated by updatedb to find files. The -i flag tells locate to ignore case.
  • grep -P "..." – I only want to select directories that are in one of the places I store my stuff. This also means that results from web browser caches and temp files will be ignored. You should obviously change this regular expression. The -P flag specifies PCRE (Perl-compatible regular expressions), since I am most comfortable with those.
  • awk '{ print length(), $0 | "sort -n" }' – Awk is a mini string-manipulation language that you can use on the command line. Here, it prefixes each path with its length and pipes the result through sort -n, so the shortest paths come first.
  • head -n 1 – After the sorting, I just want the shortest result, so I grab that one.
  • cut -d " " -f2- – Now, I get rid of the length and keep everything else, which is the path I wanted. The -d flag tells cut to use a space as the delimiter (the default is a tab character).
  • All of this is wrapped in cd "$( ... )". Bash executes the contents of the $( ... ) and feeds the textual result to cd as its argument.

It isn't as fast as Apple's Spotlight search, but the difference is negligible. For greater performance, you can customize the system-wide behavior of updatedb to search fewer directories.
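Usage is just goto plus any fragment of the directory's name. For example (with a hypothetical project directory):

$ goto calc
$ pwd
/home/roger/Development/calc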

Grabbing an apache-generated directory listing

So it turns out that one of my professors, whose lectures I attend less frequently than I should, chooses to play classical music at the beginning of lecture while students are walking in. A couple of people started a thread on Piazza, back when the course began, to identify as many of these tracks as they could. On request, he posted a ripped vinyl on the course web page, which, of course, is presented as the default apache2 directory listing page, MIME-type icons and everything.

Short of a wget -r, it’s embarrassingly difficult to grab directories that are laid out like this from the command line. It isn’t such a big deal with only 14 files, but this number could easily scale up to a hundred, in which case you’d probably decide a programmatic solution would be worth it. For fun, I came up with the following:

for f in $(curl "http://www.cs.berkeley.edu/.../" 2>/dev/null | \
grep 'href="(.*?)\.m4a"' -o -P | cut -d '"' -f 2); \
do wget "http://www.cs.berkeley.edu/.../$f"; done

The command is split across three lines by a single backslash character (\) at the end of the first two. This tells bash to treat them as one single-line command (since I typed it as such in the first place).

The stuff inside the $( ... ) consists of a curl [url] 2>/dev/null piped into grep. The 2>/dev/null redirects the standard error stream to a black hole, so that curl's progress information isn't displayed on the screen. (Curl also supports a --silent command-line switch that does the same thing.)

The grep simply searches for URLs that link to a *.m4a file. The (.*?) syntax has special meaning in Perl-compatible regular expressions, which I am invoking here with the -P switch. Perl supports a non-greedy syntax that tells regular expression wildcards like .* to match as few characters as possible; it's invoked, in this case, with the question mark. (The parentheses here are actually unnecessary.)

The -o command-line switch tells grep to print only the matching parts of the input, rather than their entire lines. The rest of the code just loops through the extracted filenames and prepends the directory's absolute URL to each one on its way to wget.
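As an aside, wget alone can approximate this. The -r switch recurses, -np refuses to ascend to the parent directory, -nd skips recreating the directory tree locally, and -A keeps only files whose names match a pattern:

wget -r -np -nd -A '*.m4a' "http://www.cs.berkeley.edu/.../"

wget still has to fetch and parse the index page itself, but -A discards everything that doesn't match.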

How to tell if your system is big endian or little endian

I was on MDN when I noticed that Chrome's V8 (at least) is little-endian, meaning that a hexadecimal number you'd write as 0xCAFEBABE is actually stored as BE BA FE CA if you read the bytes off memory. It made me wonder how you could most easily determine whether your system is little or big endian. I then came upon this gem:

echo -n I | od -to2 | head -n1 | cut -f2 -d" " | cut -c6

I’ll go through it part by part:

  • echo -n I – You’re already familiar with echo. The -n switch just suppresses the newline \n character that, by default, follows the output. This part just prints the letter I.
  • od -to2 – If you look at man ascii, you'll notice that in the 7-bit octal range (00 to 0177), the letter I is 0111, which has the convenient property of being all 1's. od is a program that lets you inspect, in human-readable format, what may or may not be printable characters. The -to2 switch tells it to print 2-byte units in octal. (The -to1 switch would not work because both little-endian and big-endian machines would indistinguishably print 111.) Little-endian machines put the I in the low-order byte and give you 0000000 000111, while big-endian machines put it in the high-order byte and give you 0000000 044400. (Octal digits don't line up with the byte boundary, so the byte-swapped value isn't simply the digits rearranged.) Great! Let's simplify this result down a bit.
  • head -n1 – Here’s an easy one. We’re just grabbing the first line of stdin and spitting it back out.
  • cut -f2 -d" " – Another good one to know. Cut is string.split(). The -f2 switch tells it that the second field is desired, and the -d" " switch gives it a single space character as a delimiter.
  • cut -c6 – Finally, here’s another cut. This one just grabs the 6th character. The 4th or the 5th would also suffice.

You could take this one-liner a step further and add conditional printing of "big endian" or "little endian", but after the od step, it's pretty much there.
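That extension might look like this (a sketch that reuses the same pipeline):

if [ "$(echo -n I | od -to2 | head -n1 | cut -f2 -d" " | cut -c6)" = "1" ]; then echo "little endian"; else echo "big endian"; fi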

Backing up dropbox with rsync

Update

I don’t use this system anymore. Learn about my new backup system instead.

At UC Berkeley, Dropbox has become the de facto standard for cloud sync and live backups, especially for team projects that don't particularly fit the mold of traditional version control. (It is nice to keep local copies of git repositories on Dropbox anyway.) Despite this, it's bad practice to leave the safety of your data entirely up to a third party. You might, for instance, accidentally trigger the deletion of all your data and unsuspectingly propagate the change to every machine where your Dropbox is cached, which is why I propose keeping an off-site copy. It's like backing up your backup, and I've been doing it for months now.

Before I get started, here’s the structure of a backup server I have stationed at my home in SoCal:

/dev/sda1
  ..
    backup/
      archives/
      dropbox/
    media/
    public/
    ..
/dev/sdb1
  backup/
    archives/
    dropbox/

A cron script runs every so often to sync the first backup directory to the second. It's essentially a hacky equivalent of RAID 1, with the added bonus of checking for bad sectors every time the script runs.
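That mirroring job isn't shown in this post, but it amounts to a crontab entry along these lines (the mount points here are hypothetical):

0 3 * * * rsync -auh --delete /media/sda1/backup/ /media/sdb1/backup/

The script below is the other half of the system: it pushes my local Dropbox up to the backup server.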

#!/bin/bash
# Push the local Dropbox folder to the backup server (the "home" ssh alias).

DIR="$HOME/Dropbox"
read -p "The target directory is: $DIR. Correct? [yn] " -n 1

if [[ $REPLY =~ ^[Yy]$ ]]; then
  if [ -d "$DIR" ]; then
    echo -e "\n"
    rsync -vauzh --progress --exclude '.dropbox*' --delete "$DIR" home:~/backup/dropbox
    echo "Complete."
    exit 0
  else
    echo "Could not find directory. Exiting."
    exit 1
  fi
else
  echo "Exiting."
  exit 1
fi

I'll explain what I did in more detail. Read is a nifty command for reading from standard input into a bash variable; when you don't name one, it uses $REPLY. The -p flag specifies a prompt to show the user, and the -n 1 flag specifies that you want just one character.

The [[ ... =~ ... ]] construct tests a variable against a regular expression; these are exceedingly common in everyday server administration. Regular expressions are a more powerful cousin of simple wildcard patterns like *.sh or IMG_0??.jpg. In this case, the [Yy] block specifies the set of characters that are acceptable as input (lower- and uppercase Y for yes), and the ^...$ anchors instruct the computer to pass the match only if a single Y or y character is the entire variable.

rsync -vauzh --progress --exclude '.dropbox*' --delete "$DIR" home:~/backup/dropbox

My ~/.ssh/config contains a block for a home alias that stores all the information needed to connect to my server at home (there's a sketch of such a block after the list below). The last two arguments of this rsync command will be familiar if you've ever used scp. Here are the flags I've set:

  • -v for verbosity, because why not?
  • -a for archive, which preserves things like file permissions and ownership
  • -u for update, so only files with more recent mtimes are copied
  • -z for gzip compression, since most of my files are plain text and highly compressible
  • -h for human-readable information, which prints out super-readable messages about the sync operation’s progress
  • --progress for progress display
  • --exclude '.dropbox*' to exclude copying Dropbox’s local cache and configuration files
  • --delete to delete files that have since disappeared on my local copy
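As promised, here's a sketch of the home block in ~/.ssh/config (all of the values are made up):

Host home
    HostName example.dyndns.org
    User roger
    Port 22
    IdentityFile ~/.ssh/id_rsa

With that block in place, rsync, ssh, and scp all accept home anywhere they'd normally take a hostname.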

And voila! Run the script periodically, and know that you’ve got a backup plan in case your Dropbox ever becomes inaccessible.

Tidying up SASS with a one-liner

At the Daily Cal, we maintain a ton of CSS for our website's WordPress theme. But instead of using a single enormous stylesheet, we check SASS files into version control and recompile them on deployment (or for development testing). In one directory, we have a bunch of scss-type files like so:

./
../
.sass-cache/
_archive.scss
_blogs.scss
...
style.scss
_wp.scss

The files that begin with an underscore are SASS Partials, meaning that they don’t get built themselves, but are imported by other files. In this case, style.scss imports everything in the directory and spits out a style.css that complies with WordPress’s theme standards.

(Without style.css, WordPress won't recognize a theme, since all of the theme's metadata is contained within that stylesheet. Either way, the file needs to be built; without it, there'd be no styling.)
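To illustrate, a hypothetical style.scss for a layout like this would put the WordPress metadata comment first, followed by one @import per partial (the field values are elided, and the partial list is abbreviated):

/*
Theme Name: ...
Author: ...
*/

@import "archive";
@import "blogs";
@import "wp";

SASS resolves @import "archive" to the partial _archive.scss, and because it's a partial, _archive.scss never gets compiled on its own.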

I was working on the code base yesterday and came up with this one-liner to do a bit of code-cleanup. I’ll explain it further in steps:

$ for i in $(find . -name "_*.scss" -type f); do sass-convert --in-place "$i"; done

The primary part of this line lies inside the $(...). The dollar-sign-parentheses combo tells bash (or any other POSIX-compliant shell) to execute its contents before proceeding. You may also be familiar with the back-tick notation `...` for executing commands.

$ find . -name "_*.scss" -type f

Find is a part of GNU findutils, along with xargs and locate, that searches for files. It takes [options] [path] [expression]. In this case, I wanted to match all the scss partial files in the current directory, which happen to match the wildcard expression _*.scss (note the leading underscore). A single dot . refers to the current working directory (see pwd). You may be familiar with its variant, the double dot .., which refers to the parent directory.

(Fun fact: hidden files, like the configuration files in your home directory, begin with a period because of a quirk in early versions of ls, which skipped any entry whose name started with a period as a shortcut for hiding the special . and .. entries. Files named with a leading period were hidden as an unintended side effect, and configuration files later made deliberate use of the quirk.)

for i in ...
do
  ...
  ...
done

The above loops through a list of space-separated elements (like 1 2 3 4), puts each one in $i, and executes the instructions between the keywords do and done. I chose the letter i arbitrarily, but it's one that's typically used as a loop variable.

SASS comes with a command, sass-convert, that converts between the different CSS variants (sass, scss, css), with the added bonus of syntax checking and code tidying. You can convert CSS to SASS with something like:

$ sass-convert --from css --to sass foo.css bar.sass

If you make extensive use of nested selectors, sass-convert will combine those for you. The utility can also convert a format to itself with the --in-place option. Putting it all together, we get this one-liner that loops through all the _*.scss files in the current directory and converts them in place:

$ for i in $(find . -name "_*.scss" -type f); do sass-convert --in-place "$i"; done

This operation doesn’t change the output whatsoever. Even CSS multiline comments are left in place! (SASS removes single-line // comments by default, since they aren’t valid CSS syntax.)

And that’s it! One 6000-line patch later, and all the SCSS looks gorgeous. The indentation hierarchy is uniform and everything is super-readable.
