Generating on-the-fly filler text in PHP

Update

I’ve updated the code and text of this post to reflect the latest version of the code.

For one of the projects I’ve been working on recently, I needed huge amounts of filler text (we’re talking about a megabyte) for lorem ipsum placeholder copy. Copy is the journalistic term for plain ol’ text in an article or an advertisement, in contrast with pictures or design elements. When you design for type, it’s often helpful to have text that looks like it could be legitimate writing instead of a single word repeated or purely random characters. From this rose the art of lorem ipsum, which intelligently crafts words with pronounceable syllables and varying lengths.

It’s a rather complicated process to generate high-quality lorem ipsum, but the following will do an acceptable job with much fewer lines of code.

/**
 * Helper function that generates filler text
 *
 * @param $type is target length in characters
 */
protected function filler_text($type) {
  
  /**
   * Source text for lipsum
   */
  static $lipsum_source = array(
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit",
    "Aliquam <a href='/'>sodales blandit felis</a>, vitae imperdiet nisl",
    ...
    "Quisque ullamcorper aliquet ante, sit amet molestie magna auctor nec",
  );
  if ($type == 'title') {
    // Titles average 3 to 6 words
    $length = rand(3, 6);
    $ret = "";
    for ($i = 0; $i < $length; $i++) {
      if (!$i) {
        $ret = ucwords($this->array_random_value(explode(" ", strip_tags($this->array_random_value($lipsum_source))))) . ' ';
      } else {
        $ret .= strtolower($this->array_random_value(explode(" ", strip_tags($this->array_random_value($lipsum_source))))) . ' ';
      }
    }
    return trim($ret);
  } else if ($type == 'post') {
    $ret = "";
    $order = array('paragraph');
    $order_length = rand(12, 19);
    for ($n = 0; $n < $order_length; $n++) {
      $choice = rand(0, 8);
      switch ($choice) {
      case 0: $order[] = 'list'; break;
      case 1: $order[] = 'image'; break;
      case 2: $order[] = 'blockquote'; break;
      default: $order[] = 'paragraph'; break;
      }
    }
    for ($n = 0; $n < count($order); $n++) {
      switch ($order[$n]) {
      case 'paragraph':
        $length = rand(2,7);
        $ret .= '<p>';
        for ($i = 0; $i < $length; $i++) {
          if ($i) $ret .= ' ';
          $ret .= $this->array_random_value($lipsum_source) . '.';
        }
        $ret .= "</p>\n";
        break;
      case 'image':
        $ret .= "<p><img src='http://placehold.it/900x580' /></p>\n";
        break;
      case 'list':
        $tag = (rand(0, 1)) ? 'ul' : 'ol';
        $ret .= "<$tag>\n";
        $length = rand(2,5);
        for ($i = 0; $i < $length; $i++) {
          $ret .= "<li>" . $this->array_random_value($lipsum_source) . "</li>\n";
        }
        $ret .= "</$tag>\n";
        break;
      case 'blockquote':
        $length = rand(2,7);
        $ret .= '<blockquote><p>';
        for ($i = 0; $i < $length; $i++) {
          if ($i) $ret .= ' ';
          $ret .= $this->array_random_value($lipsum_source) . '.';
        }
        $ret .= "</p></blockquote>\n";
        break;
      }
    }
    
    return $ret;
  }
}

First of all, you’ll need some filler text to use as a seed for this function. The term seed is a heavily-used term in computer science. It usually refers to an initial value used in some sort of deterministic pseudo-random number generator (or PRNG). Most programming languages have built-in libraries that provide randomness-generation. Many of these implementations are not actually random, but deterministic algorithms that are pure functions of some environmental variable, usually the timestamp, and the number of calls to the algorithm preceding it: n − 1 for round n.

The seed in this case is just a bunch of pre-generated lorem ipsum that you can grab anywhere online. The heart of the code just breaks down this text into sentences and picks a number of them to fit into a new sentence.

Lorem ipsum is rarely useful as one enormous chunk of text. Most frequently, copy is broken into paragraphs of varying lengths, which is this next enhancement. The code alternates between paragraphs, images, lists, and blockquotes to keep things more interesting. You can get my post-generating plugin on WordPress.org.