Protecting yourself on open wifi with Firefox

So, I’m sitting in the back of Brewed Awakening right now in the midst of café-goers, some of whom I know must be sniffing packets from the several overlapping open wifi networks around this dense part of campus. The spread of free wifi access points is an excellent direction for humanity, but it comes with its risks. Unless you’re browsing through HTTPS, anybody with a capable wifi network adapter can sit innocuously across the café and record everything you’re transmitting and receiving on your laptop, tablet, or smartphone. It may not seem immediately concerning that strangers know what kind of sick forums you frequent, but it becomes a security issue when you start transmitting your passwords in the clear (you know, the same ones you use for banking and email).

It just so happens that Linode, the hosting company that hosts the RogerHub network, recently announced a 10x increase in bandwidth caps for all their clients, and Opera announced their decision to move their browser line over to WebKit, sparking conversation about our complacency with Safari/Chrome. These two together motivated me to return to Firefox and its vast add-on ecosystem to try to address this issue.

ssh -D 1025 rogerhub

You’re probably familiar with ssh’s ability to forward TCP connections and erect ad-hoc SOCKS proxies. If not, you should definitely check out man ssh and read through the -L and -D flags. This command sets up a SOCKS proxy on localhost:1025 (ports below 1024 are privileged and can only be bound by root) through which you can tunnel your web traffic and anything else that speaks SOCKS.
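
If you want to sanity-check the tunnel before pointing a browser at it, curl can talk to a SOCKS proxy directly (the URL below is just a placeholder):

# Fetch a page through the new proxy; --socks5-hostname also resolves DNS
# through the tunnel instead of leaking lookups onto the local network.
curl --socks5-hostname localhost:1025 https://example.com/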

Now, Google Chrome’s proxy support isn’t great for two reasons: first, they’ve had corporate adoption in mind since the beginning, so Chrome has historically read proxy settings from environment variables, group policy, or the system configuration, whatever your platform uses. In their Windows version, there’s also a UI to set proxy settings, but it isn’t their main focus. Second, their extension API doesn’t allow the same kind of deep integration with the UI and the program internals that Firefox’s add-on environment does.
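
That said, if you only need a quick one-off session, Chromium-based browsers do accept a proxy on the command line. The binary name below is an assumption on my part; it varies by platform and packaging.

# Launch a Chrome session that sends its traffic through the SOCKS proxy.
# (The binary may be google-chrome, chromium, or chromium-browser on your system.)
google-chrome --proxy-server="socks5://localhost:1025"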

So, I opened up Firefox and installed FoxyProxy, the popular proxy-switching add-on, and configured it with the SSH proxy. I also pulled in NoScript for the sake of locking down the browser itself.

NoScript comes with a bunch of draconian defaults and isn’t very useful without a bit of configuration (you could, for example, just turn off scripts in Firefox instead of keeping its defaults). Enabling same-origin scripts (“Base 2nd level domains” works well for me) will let most sites and their CDN subdomains function the way they were meant to while ignoring the GA, Facebook, and ad network trackers. Of course, this isn’t very good practice for casual browsing, but open wifi is a battlefield.

Altogether, it makes for enough security to give you peace of mind while browsing on public wifi. Firefox, as a browser, has really improved over the years as well, especially its web developer tools, which once consisted of little more than an Error Console and the imperative to install and learn Firebug. Also, splitting search and location into separate bars just makes sense.

Generating on-the-fly filler text in PHP

Update

I’ve updated the code and text of this post to reflect the latest version of the code.

For one of the projects I’ve been working on recently, I needed huge amounts of filler text (we’re talking about a megabyte) for lorem ipsum placeholder copy. Copy is the journalistic term for plain ol’ text in an article or an advertisement, in contrast with pictures or design elements. When you design for type, it’s often helpful to have text that looks like it could be legitimate writing instead of a single word repeated or purely random characters. From this arose the art of lorem ipsum, which intelligently crafts words with pronounceable syllables and varying lengths.

It’s a rather complicated process to generate high-quality lorem ipsum, but the following will do an acceptable job in far fewer lines of code.

/**
 * Helper function that generates filler text
 *
 * @param $type either 'title' or 'post', the kind of filler to generate
 */
protected function filler_text($type) {
  
  /**
   * Source text for lipsum
   */
  static $lipsum_source = array(
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit",
    "Aliquam <a href='/'>sodales blandit felis</a>, vitae imperdiet nisl",
    ...
    "Quisque ullamcorper aliquet ante, sit amet molestie magna auctor nec",
  );
  if ($type == 'title') {
    // Titles average 3 to 6 words
    $length = rand(3, 6);
    $ret = "";
    for ($i = 0; $i < $length; $i++) {
      if (!$i) {
        $ret = ucwords($this->array_random_value(explode(" ", strip_tags($this->array_random_value($lipsum_source))))) . ' ';
      } else {
        $ret .= strtolower($this->array_random_value(explode(" ", strip_tags($this->array_random_value($lipsum_source))))) . ' ';
      }
    }
    return trim($ret);
  } else if ($type == 'post') {
    $ret = "";
    $order = array('paragraph');
    $order_length = rand(12, 19);
    for ($n = 0; $n < $order_length; $n++) {
      $choice = rand(0, 8);
      switch ($choice) {
      case 0: $order[] = 'list'; break;
      case 1: $order[] = 'image'; break;
      case 2: $order[] = 'blockquote'; break;
      default: $order[] = 'paragraph'; break;
      }
    }
    for ($n = 0; $n < count($order); $n++) {
      switch ($order[$n]) {
      case 'paragraph':
        $length = rand(2,7);
        $ret .= '<p>';
        for ($i = 0; $i < $length; $i++) {
          if ($i) $ret .= ' ';
          $ret .= $this->array_random_value($lipsum_source) . '.';
        }
        $ret .= "</p>\n";
        break;
      case 'image':
        $ret .= "<p><img src='http://placehold.it/900x580' /></p>\n";
        break;
      case 'list':
        $tag = (rand(0, 1)) ? 'ul' : 'ol';
        $ret .= "<$tag>\n";
        $length = rand(2,5);
        for ($i = 0; $i < $length; $i++) {
          $ret .= "<li>" . $this->array_random_value($lipsum_source) . "</li>\n";
        }
        $ret .= "</$tag>\n";
        break;
      case 'blockquote':
        $length = rand(2,7);
        $ret .= '<blockquote><p>';
        for ($i = 0; $i < $length; $i++) {
          if ($i) $ret .= ' ';
          $ret .= $this->array_random_value($lipsum_source) . '.';
        }
        $ret .= "</p></blockquote>\n";
        break;
      }
    }
    
    return $ret;
  }
}

First of all, you’ll need some filler text to use as a seed for this function. Seed is a heavily used term in computer science: it usually refers to an initial value fed into some sort of deterministic pseudo-random number generator (PRNG). Most programming languages have built-in libraries that provide random number generation, but many of these implementations are not actually random; they are deterministic algorithms that are pure functions of some environmental value (usually the timestamp) and the number of calls made to the algorithm before the current one: n − 1 for round n.
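
If you want to see that determinism first-hand, bash’s own $RANDOM makes a convenient toy: assigning to it seeds the generator, so the same seed produces the same sequence on the same machine.

# Run this twice; both runs print the same three numbers.
RANDOM=42
echo "$RANDOM $RANDOM $RANDOM"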

The seed in this case is just a bunch of pre-generated lorem ipsum that you can grab anywhere online. The heart of the code simply breaks this text down into sentences and picks a handful of them to string together into each new paragraph.

Lorem ipsum is rarely useful as one enormous chunk of text. Most frequently, copy is broken into paragraphs of varying lengths, which is the next enhancement: the code alternates between paragraphs, images, lists, and blockquotes to keep things interesting. You can get my post-generating plugin on WordPress.org.

Same origin policy and a buggy WordPress plugin

Update

I don’t use the plugin mentioned in this post anymore.

On this blog, I use the Crayon syntax highlighter for WordPress to render all the code snippets, since this is a programming blog after all. Crayon is one of the more popular highlighting plugins, as clearly demonstrated by the sad condition of its support forum. It comes with a bunch of color schemes, including Ethan Schoonover’s extremely popular “Solarized” and a replica of what GitHub uses (although the plugin’s version contains a tad more purple). A light pastel blue would have fit this blog’s color scheme quite nicely, but definitely not purple. So, I dove deeper.

Crayon stores its highlighting themes as plain CSS in WordPress’s upload directory. That’s the same place that photos and other media go for your posts. I can think of a couple good reasons why this decision makes more sense than loading the CSS directly from the plugin’s folder:

  • WordPress has (by necessity) file permissions to write to the uploads directory, which means that if the plugin ever needs to customize themes, it can do that internally.
  • There are hooks for CDNs and such that mirror WordPress’s upload directory, but they may not do the same for plugins and other things.
  • On network sites, customized color schemes can be site-specific even though the plugin is installed and activated network-wide.

However, there’s also one really bad side effect of this (one that likely applies only to me), which I spent a good deal of the afternoon debugging. I run nginx on the backend, and its configuration divides WordPress into two regions: one contains wp-login.php and wp-admin, and the other, everything else. This way, I can restrict WordPress cookies to only the former, which is run over SSL using a self-signed certificate, and avoid the issue where Google indexes everything twice. (Visitors shouldn’t be reading blogs over https anyway.)

It would be nice if everything could be divided this cleanly, but wp-admin also loads file uploads such as image thumbnails, which is where the mixed-content warnings come from if you’ve ever administered anything through a web panel over https. Browsers complain when you try to load pages with mixed content, but they will let you do so anyway. Things are different, however, when the request is made after the page has loaded, through AJAX.

There’s an inherent security problem with letting a page script load whatever resources it wants to from anywhere on the Internet. This issue is addressed by the same-origin policy framework, wherein the server with the resource (as opposed to the one containing the web page) gets to decide if access is allowed.

HTTP/1.1 200 OK
Access-Control-Allow-Origin: http://example.com

The header takes a single origin (the spec grammar technically allows a space-separated list, but browsers only honor one value), the literal null, or the wildcard *. It’s ultimately up to the browser to enforce this security header, but all modern ones do.
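
You can check what a server is actually sending with curl; the URL and path below are placeholders, and whether anything comes back depends entirely on how that particular server is configured.

# Request a stylesheet while claiming a different origin, then look for the
# CORS response header (placeholder URL; substitute your own).
curl -sI -H "Origin: https://example.org" \
  https://example.com/wp-content/uploads/some-theme.css | grep -i '^access-control'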

Now, here’s the twist: Crayon uses the built-in WordPress function wp_upload_dir() to find and load custom themes. If your server is set up like mine, this returns an http-scheme address where an https address is required. Without a check to see whether the page was loaded over SSL, the plain-text AJAX call fails, because browsers treat the SSL and plain-text versions of a site as different origins.

There are a couple ways to go about this. You can:

  • Disable SSL temporarily while you’re editing the custom color themes (probably the easiest).
  • Instruct the browser to ignore same-origin policy.
  • Patch the plugin code to use https where appropriate.

I ended up going with the third, after unsuccessfully attempting to send an Access-Control-Allow-Origin header with every response on the server side. This was a rather frustrating issue to resolve, but it was interesting to see the kinds of problems that arise in exchange for progress in security.

Backing up Dropbox with rsync

Update

I don’t use this system anymore. Learn about my new backup system instead.

At UC Berkeley, Dropbox has become the de facto standard for cloud sync and live backups, especially for team projects that don’t particularly fit the mold of traditional version control. (It is nice to keep local copies of git repositories on Dropbox anyway.) Despite this, it’s bad practice to just leave the safety of your data up to a third party. You might, for instance, accidentally trigger the deletion of all your data and unwittingly propagate the change to everywhere you have your Dropbox cached, which is why I proposed keeping an off-site copy. It’s like backing up your backup, and I’ve been doing it for months now.

Before I get started, here’s the structure of a backup server I have stationed at my home in SoCal:

/dev/sda1
  ..
    backup/
      archives/
      dropbox/
    media/
    public/
    ..
/dev/sdb1
  backup/
    archives/
    dropbox/

A cron script runs every so often to sync the first backup directory to the second. It’s essentially a hacky equivalent of RAID 1, with the added bonus of bad-sector checking every time the script runs.
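
That mirror job boils down to a single rsync between the two mounts; the paths below are placeholders for wherever sda1 and sdb1 actually live on the backup server.

# Mirror the primary backup directory onto the second disk.
# The trailing slashes sync directory contents; --delete keeps the copies identical.
rsync -a --delete /mnt/sda1/backup/ /mnt/sdb1/backup/

The script below is the other half of the setup: it runs on my laptop and pushes my local Dropbox folder up to the home server.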

#!/bin/bash

DIR=$HOME/Dropbox
read -p "The target directory is: $DIR. Correct? [yn] " -n 1

if [[ $REPLY =~ ^[Yy]$ ]]; then
  if [ -d "$DIR" ]; then
    echo -e "\n"
    rsync -vauzh --progress --exclude '.dropbox*' --delete "$DIR" home:~/backup/dropbox
    echo "Complete."
    exit 0
  else
    echo "Could not find directory. Exiting."
    exit 1
  fi
else
  echo "Exiting."
  exit 1
fi

I’ll explain what I did in more detail. read is a nifty command for reading from standard input into a bash variable ($REPLY by default, when you don’t name one). The -p flag specifies a prompt to show the user, and -n 1 tells it to accept a single character rather than wait for a whole line.

The [[ ... =~ ... ]] construct tests a variable against a regular expression. Regular expressions are exceedingly common in everyday server administration; they’re a more powerful cousin of simple wildcard patterns like *.sh or IMG_0??.jpg. In this case, the [Yy] block specifies the set of characters that are acceptable as input (lowercase and uppercase Y for yes), and the ^...$ anchors make the match pass only if that single character is the entire variable.
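
Here’s the same test in isolation, outside of the script:

REPLY="y"
[[ $REPLY =~ ^[Yy]$ ]] && echo "matched"   # prints "matched"
REPLY="yes"
[[ $REPLY =~ ^[Yy]$ ]] && echo "matched"   # prints nothing; "yes" is more than one character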

rsync -vauzh --progress --exclude '.dropbox*' --delete "$DIR" home:~/backup/dropbox

My ~/.ssh/config contains a block for a home alias that stores all the information needed to connect to my server at home (there’s a sketch of what that block looks like after the flag list below). The last two arguments of this rsync command will be familiar if you’ve ever used scp. Here are the flags I’ve set:

  • -v for verbosity, because why not?
  • -a for archive, which preserves things like file permissions and ownership
  • -u for update, which skips files that are newer on the destination than on my laptop
  • -z for gzip compression, since most of my files are plain text and highly compressible
  • -h for human-readable numbers, so sizes are printed in units like K and M instead of raw byte counts
  • --progress for progress display
  • --exclude '.dropbox*' to exclude copying Dropbox’s local cache and configuration files
  • --delete to delete files that have since disappeared on my local copy
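
As for the home alias mentioned above, a minimal ~/.ssh/config block for it might look something like this; the hostname, user, and port are placeholders rather than my real values:

Host home
    HostName home.example.com
    User roger
    Port 22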

And voila! Run the script periodically, and know that you’ve got a backup plan in case your Dropbox ever becomes inaccessible.

Tidying up SASS with a one-liner

At the Daily Cal, we maintain a ton of CSS code for our website’s WordPress theme. But instead of using a single enormous stylesheet, we check SASS files into version control and recompile them on deployment (or for development testing). In one directory, we have a bunch of scss-type files like so:

./
../
.sass-cache/
_archive.scss
_blogs.scss
...
style.scss
_wp.scss

The files that begin with an underscore are SASS partials, meaning that they don’t get built themselves, but are imported by other files. In this case, style.scss imports everything in the directory and spits out a style.css that complies with WordPress’s theme standards.

(Without style.css, WordPress won’t recognize a theme, since all the metadata for the theme is contained within that stylesheet. Either way, the file needs to be built since without it, there’d be no styling.)
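
The compile step itself isn’t shown in this post, but with the Ruby sass gem (the same package that provides sass-convert below), building the theme stylesheet is roughly a one-liner; the exact invocation depends on your deployment setup:

$ sass style.scss style.css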

I was working on the code base yesterday and came up with this one-liner to do a bit of code-cleanup. I’ll explain it further in steps:

$ for i in $(find . -name "_*.scss" -type f); do sass-convert --in-place $i; done

The primary part of this line lies inside the $(...). The dollar-sign-parentheses combo tells bash (or any other POSIX-compliant shell) to execute its contents before proceeding. You may also be familiar with the back-tick notation `...` for executing commands.
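
As a standalone illustration (not part of the one-liner), both of these print the number of entries in the current directory; the $(...) form simply nests more cleanly than back-ticks:

$ echo "This directory contains $(ls | wc -l) entries"
$ echo "This directory contains `ls | wc -l` entries"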

$ find . -name "_*.scss" -type f

find is part of GNU findutils (along with xargs and locate) and searches for files. It takes [options] [path] [expression]. In this case, I wanted to match all the scss partials in the current directory, which happen to match the wildcard expression _*.scss (note the leading underscore). A single dot . refers to the current working directory (see pwd). You may be familiar with its variant, the double dot .., which refers to the parent directory.

(Fun fact: hidden files, like the configuration files in your home directory, begin with a period because of the single dot and double dot themselves. Early directory-listing tools skipped any entry starting with a period just to hide those two, so any file whose name began with a period disappeared from listings as a side effect, and configuration files later took advantage of the quirk.)

for i in ...
do
  ...
  ...
done

The above loops through a list of space-separated elements (like 1 2 3 4), puts each one in $i, and executes the suite of instructions between the keywords do and done. I chose the variable name $i arbitrarily, but it’s the one typically used as a loop placeholder.
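
A throwaway version with a literal list makes the shape obvious:

for i in 1 2 3 4; do
  echo "$i"   # prints 1 through 4, one per line
done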

SASS comes with a command, sass-convert, that converts between the different stylesheet formats (sass, scss, css) with the added bonus of syntax checking and code tidying. You can convert CSS to SASS with something like:

$ sass-convert --from css --to sass foo.css bar.sass

If you make extensive use of nested selectors, sass-convert will combine those for you. The utility can also convert a format to itself with the --in-place option. Putting it all together, we get this one-liner that loops through all the _*.scss files in the current directory and converts them in place:

$ for i in $(find . -name "_*.scss" -type f); do sass-convert --in-place $i; done

This operation doesn’t change the output whatsoever. Even CSS multiline comments are left in place! (SASS removes single-line // comments by default, since they aren’t valid CSS syntax.)

And that’s it! One 6000-line patch later, and all the SCSS looks gorgeous. The indentation hierarchy is uniform and everything is super-readable.