Protecting yourself on open wifi with Firefox

So, I’m sitting in the back of Brewed Awakening right now in the midst of café-goers, some of whom I know must be sniffing packets from the several overlapping open wifi networks around this dense part of campus. The spread of free wifi access points is an excellent direction for humanity, but it comes with its risks. Unless you’re browsing through HTTPS, anybody with a capable wifi network adapter can sit innocuously across the café and record everything you’re transmitting and receiving on your laptop, tablet, or smartphone. It may not seem immediately concerning that strangers know what kind of sick forums you frequent, but it becomes a security issue when you start transmitting your passwords in the clear (you know, the same ones you use for banking and email).

It just so happens that Linode, the hosting company that hosts the RogerHub network, recently announced a 10x increase in bandwidth caps for all their clients, and Opera announced their decision to move their browser line over to WebKit, sparking conversation about our complacency with Safari/Chrome. These two together motivated me to return to Firefox and its vast add-on ecosystem to try to address this issue.

ssh -D 1025 rogerhub

You’re probably familiar with ssh’s ability to forward TCP connections and erect ad-hoc SOCKS proxies. If not, you should definitely check out man ssh and read through the -L and -D flags. This command sets up a SOCKS proxy on localhost:1025 (ports below 1024 are privileged and can only be bound by root) through which you can tunnel your web traffic and anything else that speaks SOCKS.
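
If you want to sanity-check the tunnel before pointing a browser at it, curl can talk to a SOCKS proxy directly (the URL below is just a placeholder):

# Fetch a page through the new proxy; --socks5-hostname also resolves DNS
# through the tunnel instead of leaking lookups onto the local network.
curl --socks5-hostname localhost:1025 https://example.com/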

Now, Google Chrome’s proxy support isn’t great for two reasons: first, they’ve had corporate adoption in mind since the beginning, so Chrome has historically read proxy settings from environment variables, group policy, or the system configuration, whatever your platform uses. In their Windows version, there’s also a UI to set proxy settings, but it isn’t their main focus. Second, their extension API doesn’t allow the same kind of deep integration with the UI and the program internals that Firefox’s add-on environment does.
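
That said, if you only need a quick one-off session, Chromium-based browsers do accept a proxy on the command line. The binary name below is an assumption on my part; it varies by platform and packaging.

# Launch a Chrome session that sends its traffic through the SOCKS proxy.
# (The binary may be google-chrome, chromium, or chromium-browser on your system.)
google-chrome --proxy-server="socks5://localhost:1025"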

So, I opened up Firefox and installed FoxyProxy, the popular proxy-switching add-on, and configured it with the SSH proxy. I also pulled in NoScript for the sake of locking down the browser itself.

NoScript comes with a bunch of draconian defaults and isn’t very useful without a bit of configuration (you could, for example, just turn off scripts in Firefox instead of keeping its defaults). Enabling same-origin scripts (“Base 2nd level domains” works well for me) will let most sites and their CDN subdomains function the way they were meant to while ignoring the GA, Facebook, and ad network trackers. Of course, this isn’t very good practice for casual browsing, but open wifi is a battlefield.

Altogether, it makes for enough security to give you peace of mind while browsing on public wifi. Firefox, as a browser, has really improved over the years as well, especially its web developer tools, which once consisted of little more than an Error Console and the imperative to install and learn Firebug. Also, splitting search and location into separate bars just makes sense.

Generating on-the-fly filler text in PHP

Update

I’ve updated the code and text of this post to reflect the latest version of the code.

For one of the projects I’ve been working on recently, I needed huge amounts of filler text (we’re talking about a megabyte) for lorem ipsum placeholder copy. Copy is the journalistic term for plain ol’ text in an article or an advertisement, in contrast with pictures or design elements. When you design for type, it’s often helpful to have text that looks like it could be legitimate writing instead of a single word repeated or purely random characters. From this arose the art of lorem ipsum, which intelligently crafts words with pronounceable syllables and varying lengths.

It’s a rather complicated process to generate high-quality lorem ipsum, but the following will do an acceptable job in far fewer lines of code.

/**
 * Helper function that generates filler text
 *
 * @param $type either 'title' or 'post', the kind of filler to generate
 */
protected function filler_text($type) {
  
  /**
   * Source text for lipsum
   */
  static $lipsum_source = array(
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit",
    "Aliquam <a href='/'>sodales blandit felis</a>, vitae imperdiet nisl",
    ...
    "Quisque ullamcorper aliquet ante, sit amet molestie magna auctor nec",
  );
  if ($type == 'title') {
    // Titles average 3 to 6 words
    $length = rand(3, 6);
    $ret = "";
    for ($i = 0; $i < $length; $i++) {
      if (!$i) {
        $ret = ucwords($this->array_random_value(explode(" ", strip_tags($this->array_random_value($lipsum_source))))) . ' ';
      } else {
        $ret .= strtolower($this->array_random_value(explode(" ", strip_tags($this->array_random_value($lipsum_source))))) . ' ';
      }
    }
    return trim($ret);
  } else if ($type == 'post') {
    $ret = "";
    $order = array('paragraph');
    $order_length = rand(12, 19);
    for ($n = 0; $n < $order_length; $n++) {
      $choice = rand(0, 8);
      switch ($choice) {
      case 0: $order[] = 'list'; break;
      case 1: $order[] = 'image'; break;
      case 2: $order[] = 'blockquote'; break;
      default: $order[] = 'paragraph'; break;
      }
    }
    for ($n = 0; $n < count($order); $n++) {
      switch ($order[$n]) {
      case 'paragraph':
        $length = rand(2,7);
        $ret .= '<p>';
        for ($i = 0; $i < $length; $i++) {
          if ($i) $ret .= ' ';
          $ret .= $this->array_random_value($lipsum_source) . '.';
        }
        $ret .= "</p>\n";
        break;
      case 'image':
        $ret .= "<p><img src='http://placehold.it/900x580' /></p>\n";
        break;
      case 'list':
        $tag = (rand(0, 1)) ? 'ul' : 'ol';
        $ret .= "<$tag>\n";
        $length = rand(2,5);
        for ($i = 0; $i < $length; $i++) {
          $ret .= "<li>" . $this->array_random_value($lipsum_source) . "</li>\n";
        }
        $ret .= "</$tag>\n";
        break;
      case 'blockquote':
        $length = rand(2,7);
        $ret .= '<blockquote><p>';
        for ($i = 0; $i < $length; $i++) {
          if ($i) $ret .= ' ';
          $ret .= $this->array_random_value($lipsum_source) . '.';
        }
        $ret .= "</p></blockquote>\n";
        break;
      }
    }
    
    return $ret;
  }
}

First of all, you’ll need some filler text to use as a seed for this function. Seed is a heavily used term in computer science: it usually refers to an initial value fed into some sort of deterministic pseudo-random number generator (PRNG). Most programming languages have built-in libraries that provide random number generation, but many of these implementations are not actually random; they are deterministic algorithms that are pure functions of some environmental value (usually the timestamp) and the number of calls made to the algorithm before the current one: n − 1 for round n.
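
If you want to see that determinism first-hand, bash’s own $RANDOM makes a convenient toy: assigning to it seeds the generator, so the same seed produces the same sequence on the same machine.

# Run this twice; both runs print the same three numbers.
RANDOM=42
echo "$RANDOM $RANDOM $RANDOM"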

The seed in this case is just a bunch of pre-generated lorem ipsum that you can grab anywhere online. The heart of the code simply breaks this text down into sentences and picks a handful of them to string together into each new paragraph.

Lorem ipsum is rarely useful as one enormous chunk of text. Most frequently, copy is broken into paragraphs of varying lengths, which is the next enhancement: the code alternates between paragraphs, images, lists, and blockquotes to keep things interesting. You can get my post-generating plugin on WordPress.org.

Same origin policy and a buggy WordPress plugin

Update

I don’t use the plugin mentioned in this post anymore.

On this blog, I use the Crayon syntax highlighter for WordPress to render all the code snippets, since this is a programming blog after all. Crayon is one of the more popular highlighting plugins, as clearly demonstrated by the sad condition of its support forum. It comes with a bunch of color schemes, including Ethan Schoonover’s extremely popular “Solarized” and a replica of what GitHub uses (although the plugin’s version contains a tad more purple). A light pastel blue would have fit this blog’s color scheme quite nicely, but definitely not purple. So, I dove deeper.

Crayon stores its highlighting themes as plain CSS in WordPress’s upload directory. That’s the same place that photos and other media go for your posts. I can think of a couple good reasons why this decision makes more sense than loading the CSS directly from the plugin’s folder:

  • WordPress has (by necessity) file permissions to write to the uploads directory, which means that if the plugin ever needs to customize themes, it can do that internally.
  • There are hooks for CDNs and such that mirror WordPress’s upload directory, but they may not do the same for plugins and other things.
  • On network sites, customized color schemes can be site-specific even though the plugin is installed and activated network-wide.

However, there’s also one really bad side effect of this (one that likely applies only to me), which I spent a good deal of the afternoon debugging. I run nginx on the backend, and its configuration divides WordPress into two regions: one contains wp-login.php and wp-admin, and the other, everything else. This way, I can restrict WordPress cookies to only the former, which is run over SSL using a self-signed certificate, and avoid the issue where Google indexes everything twice. (Visitors shouldn’t be reading blogs over https anyway.)

It would be nice if everything could be divided this cleanly, but wp-admin also loads file uploads such as image thumbnails, which is where the mixed-content warnings come from if you’ve ever administered anything through a web panel over https. Browsers complain when you try to load pages with mixed content, but they will let you do so anyway. Things are different, however, when the request is made after the page has loaded, through AJAX.

There’s an inherent security problem with letting a page script load whatever resources it wants to from anywhere on the Internet. This issue is addressed by the same-origin policy framework, wherein the server with the resource (as opposed to the one containing the web page) gets to decide if access is allowed.

HTTP/1.1 200 OK
Access-Control-Allow-Origin: http://example.com

The header takes a single origin (the spec grammar technically allows a space-separated list, but browsers only honor one value), the literal null, or the wildcard *. It’s ultimately up to the browser to enforce this security header, but all modern ones do.
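
You can check what a server is actually sending with curl; the URL and path below are placeholders, and whether anything comes back depends entirely on how that particular server is configured.

# Request a stylesheet while claiming a different origin, then look for the
# CORS response header (placeholder URL; substitute your own).
curl -sI -H "Origin: https://example.org" \
  https://example.com/wp-content/uploads/some-theme.css | grep -i '^access-control'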

Now, here’s the twist: Crayon uses the built-in WordPress function wp_upload_dir() to find and load custom themes. If your server is set up like mine, this returns an http-scheme address where an https address is required. Without a check to see whether the page was loaded over SSL, the plain-text AJAX call fails, because browsers treat the SSL and plain-text versions of a site as different origins.

There are a couple ways to go about this. You can:

  • Disable SSL temporarily while you’re editing the custom color themes (probably the easiest).
  • Instruct the browser to ignore same-origin policy.
  • Patch the plugin code to use https where appropriate.

I ended up going with the third, after unsuccessfully attempting to send an Access-Control-Allow-Origin header with every response on the server side. This was a rather frustrating issue to resolve, but it was interesting to see the kinds of problems that arise in exchange for progress in security.

Backing up Dropbox with rsync

Update

I don’t use this system anymore. Learn about my new backup system instead.

At UC Berkeley, Dropbox has become the de facto standard for cloud sync and live backups, especially for team projects that don’t particularly fit the mold of traditional version control. (It is nice to keep local copies of git repositories on Dropbox anyway.) Despite this, it’s bad practice to just leave the safety of your data up to a third party. You might, for instance, accidentally trigger the deletion of all your data and unwittingly propagate the change to everywhere you have your Dropbox cached, which is why I proposed keeping an off-site copy. It’s like backing up your backup, and I’ve been doing it for months now.

Before I get started, here’s the structure of a backup server I have stationed at my home in SoCal:

/dev/sda1
  ..
    backup/
      archives/
      dropbox/
    media/
    public/
    ..
/dev/sdb1
  backup/
    archives/
    dropbox/

A cron script runs every so often to sync the first backup directory to the second. It’s essentially a hacky equivalent of RAID 1, with the added bonus of bad-sector checking every time the script runs.
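
That mirror job boils down to a single rsync between the two mounts; the paths below are placeholders for wherever sda1 and sdb1 actually live on the backup server.

# Mirror the primary backup directory onto the second disk.
# The trailing slashes sync directory contents; --delete keeps the copies identical.
rsync -a --delete /mnt/sda1/backup/ /mnt/sdb1/backup/

The script below is the other half of the setup: it runs on my laptop and pushes my local Dropbox folder up to the home server.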

#!/bin/bash

DIR=$HOME/Dropbox
read -p "The target directory is: $DIR. Correct? [yn] " -n 1

if [[ $REPLY =~ ^[Yy]$ ]]; then
  if [ -d "$DIR" ]; then
    echo -e "\n"
    rsync -vauzh --progress --exclude '.dropbox*' --delete "$DIR" home:~/backup/dropbox
    echo "Complete."
    exit 0
  else
    echo "Could not find directory. Exiting."
    exit 1
  fi
else
  echo "Exiting."
  exit 1
fi

I’ll explain what I did in more detail. read is a nifty command for reading from standard input into a bash variable ($REPLY by default, when you don’t name one). The -p flag specifies a prompt to show the user, and -n 1 tells it to accept a single character rather than wait for a whole line.

The [[ ... =~ ... ]] construct tests a variable against a regular expression. Regular expressions are exceedingly common in everyday server administration; they’re a more powerful cousin of simple wildcard patterns like *.sh or IMG_0??.jpg. In this case, the [Yy] block specifies the set of characters that are acceptable as input (lowercase and uppercase Y for yes), and the ^...$ anchors make the match pass only if that single character is the entire variable.
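
Here’s the same test in isolation, outside of the script:

REPLY="y"
[[ $REPLY =~ ^[Yy]$ ]] && echo "matched"   # prints "matched"
REPLY="yes"
[[ $REPLY =~ ^[Yy]$ ]] && echo "matched"   # prints nothing; "yes" is more than one character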

rsync -vauzh --progress --exclude '.dropbox*' --delete "$DIR" home:~/backup/dropbox

My ~/.ssh/config contains a block for a home alias that stores all the information needed to connect to my server at home (there’s a sketch of what that block looks like after the flag list below). The last two arguments of this rsync command will be familiar if you’ve ever used scp. Here are the flags I’ve set:

  • -v for verbosity, because why not?
  • -a for archive, which preserves things like file permissions and ownership
  • -u for update, which skips files that are newer on the destination than on my laptop
  • -z for gzip compression, since most of my files are plain text and highly compressible
  • -h for human-readable numbers, so sizes are printed in units like K and M instead of raw byte counts
  • --progress for progress display
  • --exclude '.dropbox*' to exclude copying Dropbox’s local cache and configuration files
  • --delete to delete files that have since disappeared on my local copy
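
As for the home alias mentioned above, a minimal ~/.ssh/config block for it might look something like this; the hostname, user, and port are placeholders rather than my real values:

Host home
    HostName home.example.com
    User roger
    Port 22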

And voila! Run the script periodically, and know that you’ve got a backup plan in case your Dropbox ever becomes inaccessible.

Tidying up SASS with a one-liner

At the Daily Cal, we maintain a ton of CSS code for our website’s WordPress theme. But instead of using a single enormous stylesheet, we check SASS files into version control and recompile them on deployment (or for development testing). In one directory, we have a bunch of scss-type files like so:

./
../
.sass-cache/
_archive.scss
_blogs.scss
...
style.scss
_wp.scss

The files that begin with an underscore are SASS partials, meaning that they don’t get built themselves, but are imported by other files. In this case, style.scss imports everything in the directory and spits out a style.css that complies with WordPress’s theme standards.

(Without style.css, WordPress won’t recognize a theme, since all the metadata for the theme is contained within that stylesheet. Either way, the file needs to be built since without it, there’d be no styling.)
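
The compile step itself isn’t shown in this post, but with the Ruby sass gem (the same package that provides sass-convert below), building the theme stylesheet is roughly a one-liner; the exact invocation depends on your deployment setup:

$ sass style.scss style.css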

I was working on the code base yesterday and came up with this one-liner to do a bit of code-cleanup. I’ll explain it further in steps:

$ for i in $(find . -name "_*.scss" -type f); do sass-convert --in-place $i; done

The primary part of this line lies inside the $(...). The dollar-sign-parentheses combo tells bash (or any other POSIX-compliant shell) to execute its contents before proceeding. You may also be familiar with the back-tick notation `...` for executing commands.
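
As a standalone illustration (not part of the one-liner), both of these print the number of entries in the current directory; the $(...) form simply nests more cleanly than back-ticks:

$ echo "This directory contains $(ls | wc -l) entries"
$ echo "This directory contains `ls | wc -l` entries"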

$ find . -name "_*.scss" -type f

find is part of GNU findutils (along with xargs and locate) and searches for files. It takes [options] [path] [expression]. In this case, I wanted to match all the scss partials in the current directory, which happen to match the wildcard expression _*.scss (note the leading underscore). A single dot . refers to the current working directory (see pwd). You may be familiar with its variant, the double dot .., which refers to the parent directory.

(Fun fact: hidden files, like the configuration files in your home directory, begin with a period because of the single dot and double dot themselves. Early directory-listing tools skipped any entry starting with a period just to hide those two, so any file whose name began with a period disappeared from listings as a side effect, and configuration files later took advantage of the quirk.)

for i in ...
do
  ...
  ...
done

The above loops through a list of space-separated elements (like 1 2 3 4), puts each one in $i, and executes the suite of instructions between the keywords do and done. I chose the variable name $i arbitrarily, but it’s the one typically used as a loop placeholder.
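
A throwaway version with a literal list makes the shape obvious:

for i in 1 2 3 4; do
  echo "$i"   # prints 1 through 4, one per line
done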

SASS comes with a command, sass-convert, that converts between the different stylesheet formats (sass, scss, css) with the added bonus of syntax checking and code tidying. You can convert CSS to SASS with something like:

$ sass-convert --from css --to sass foo.css bar.sass

If you make extensive use of nested selectors, sass-convert will combine those for you. The utility can also convert a format to itself with the --in-place option. Putting it all together, we get this one-liner that loops through all the _*.scss files in the current directory and converts them in place:

$ for i in $(find . -name "_*.scss" -type f); do sass-convert --in-place $i; done

This operation doesn’t change the output whatsoever. Even CSS multiline comments are left in place! (SASS removes single-line // comments by default, since they aren’t valid CSS syntax.)

And that’s it! One 6000-line patch later, and all the SCSS looks gorgeous. The indentation hierarchy is uniform and everything is super-readable.