Code.RogerHub » http https://rogerhub.com/~r/code.rogerhub The programming blog at RogerHub Fri, 27 Mar 2015 23:04:04 +0000 en-US hourly 1 http://wordpress.org/?v=4.2.2 Grabbing an apache-generated directory listing https://rogerhub.com/~r/code.rogerhub/terminal-fu/120/grabbing-an-apache-generated-directory-listing/ https://rogerhub.com/~r/code.rogerhub/terminal-fu/120/grabbing-an-apache-generated-directory-listing/#comments Thu, 11 Apr 2013 06:34:42 +0000 https://rogerhub.com/~r/code.rogerhub/?p=120 So it turns out that one of my professors, whose lectures I attend less frequently than I should, chooses to play classical music at the beginning of lecture when students are walking in. A couple of people started a thread on Piazza back when the course began to identify as many of these tracks as they could. On request, he posted a ripped vinyl on the course web page, which of course, is described with the default apache2 directory listing page, MIME-type icons and everything.

Short of a wget -r, it’s embarrassingly difficult to grab directories that are laid out like this from the command line. It isn’t such a big deal with only 14 files, but this number could easily scale up to a hundred, in which case you’d probably decide a programmatic solution would be worth it. For fun, I came up with the following:

for f in $(curl "http://www.cs.berkeley.edu/.../" 2>/dev/null | \
grep 'href="(.*?)\.m4a"' -o -P | cut -d '"' -f 2); \
do wget "http://www.cs.berkeley.edu/.../$f"; done

The three lines are split by a single backslash character (\) at the end of each line. This indicates to bash that the lines are meant to be treated as a single-line command (since I typed it as such in the first place).

The stuff inside the $( ... ) consists of a curl [url] 2>/dev/null piped into grep. The 2>/dev/null redirects the Standard Error stream to a black hole so that it isn’t displayed on the screen. This is to prevent curl from showing any information about its progress. (Curl also supports a --silent command-line switch that does the same thing.)

The grep simply searches for URL’s that link to a *.m4a file. The (.*?) syntax has special meaning in PERL-compatible grep, which I am invoking here with the -P switch. PERL supports a non-greedy syntax that translates to telling regular expression wildcards like .* to match as few characters as possible. This syntax is invoked, in this case, with the question mark. The parentheses in this case are unnecessary.

The -o command line switch tells grep only to print out the matching parts of the input, rather than their entire lines. The rest of the code just loops through the URL’s and prepends the absolute address of these files to the path on its way to wget.

]]>
https://rogerhub.com/~r/code.rogerhub/terminal-fu/120/grabbing-an-apache-generated-directory-listing/feed/ 1
Same origin policy and a buggy WordPress plugin https://rogerhub.com/~r/code.rogerhub/bughunt/70/same-origin-policy-and-a-buggy-wordpress-plugin/ https://rogerhub.com/~r/code.rogerhub/bughunt/70/same-origin-policy-and-a-buggy-wordpress-plugin/#comments Wed, 06 Mar 2013 21:46:02 +0000 https://rogerhub.com/~r/code.rogerhub/?p=70

Update

I don’t use the plugin mentioned in this post anymore.

On this blog, I use the Crayon syntax highlighter for WordPress to render all the code snippets, since this is a programming blog after all. Crayon is one of the more popular highlighting plugins, as clearly demonstrated by the sad condition of its support forum. It comes with a bunch of color schemes including Ethan Schoonover’s extremely popular “solarized” color scheme and a replica of what Github uses (although the plugin’s version contains a tad bit more purple). A light pastel blue would have fit with this blog’s color scheme quite nicely, but definitely not purple. So, I dove deeper.

Crayon stores its highlighting themes as plain CSS in WordPress’s upload directory. That’s the same place that photos and other media go for your posts. I can think of a couple good reasons why this decision makes more sense than loading the CSS directly from the plugin’s folder:

  • WordPress has (by necessity) file permissions to write to the uploads directory, which means that if the plugin ever needs to customize themes, it can do that internally.
  • There are hooks for CDN’s and such that mirror WordPress’s upload directory, but may not do the same for plugins and other things.
  • On network sites, customized color schemes can be site-specific although the plugin is installed and activated network-wide.

However, there’s also one really bad effect (that likely applies only to me) of this, which I spent a good deal of the afternoon debugging. I run nginx on the backend, and its configuration divides WordPress into two regions: one contains wp-login.php and wp-admin, and the other, everything else. This way, I can restrict WordPress cookies to only the former, which is run over SSL using a self-signed certificate, and avoid having the issue where Google indexes everything twice. (Visitors shouldn’t be reading blogs over https anyway.)

It would be nice if everything could be so cleanly divided this way, but wp-admin also uses file uploads like image thumbnails, which is why there are all the mixed-content warnings you may have seen if you’ve ever administered anything through a web panel over https. Browsers complain when you try to load pages with mixed content, but they will let you do so anyway. Things are different, however, when the request is made after the page has loaded, through AJAX.

There’s an inherent security problem with letting a page script load whatever resources it wants to from anywhere on the Internet. This issue is addressed by the same-origin policy framework, wherein the server with the resource (as opposed to the one containing the web page) gets to decide if access is allowed.

HTTP/1.1 200 OK
Access-Control-Allow-Origin: http://example.com

The header takes either a list of origins, null, or the wildcard *. It’s ultimately up to the browser to implement this security header, but most modern ones do.

Now, here’s the twist: Crayon uses the built-in WordPress function, wp_upload_dir(), to find and load custom themes. If your server is set up like mine, this will return a http scheme address, whereas a https address is required. Without a check to see if the page is loaded through SSL, the plain-text AJAX call will fail because browsers treat the SSL and the plain-text version of a site as different origins.

There are a couple ways to go about this. You can:

  • Disable SSL temporarily while you’re editing the custom color themes (probably the easiest).
  • Instruct the browser to ignore same-origin policy.
  • Patch the plugin code to use https where appropriate.

I ended up going with the third, after unsuccessfully attempting to send a access-control-allow-origin header with every response on the server-side. This was a rather frustrating issue to resolve, but it was interesting to see the kinds of problems that arise in exchange for progress in security.

]]>
https://rogerhub.com/~r/code.rogerhub/bughunt/70/same-origin-policy-and-a-buggy-wordpress-plugin/feed/ 4