Using locate to quickly change directories (20 Apr 2013)

You’ve probably got a bunch of directories and subdirectories in your home folder that are organized logically, rather than a mess of top-level directories scattered all over the place. Although the former is cleaner and better organized, it also takes more time to get to where you’d like to be. Luckily, Linux comes with /usr/bin/locate to help you find files. I made the following for my .bashrc:

goto () { cd "$(locate -i "$@" | grep -P "/home/roger/(Dropbox|Documents|Downloads|Development|Desktop)" | awk '{ print length(), $0 | "sort -n" }' | head -n 1 | cut -d " " -f2-)"; }

I’ll go through it part by part:

  • locate -i "$@" – Locate uses the database generated by updatedb to find files. The -i flag tells locate to ignore case.
  • grep -P "..." – I only want to select directories that are in one of the places I store my stuff. This also means that results from web browser caches and temp files will be ignored. You should obviously change this regular expression. The -P flag specifies PCRE (Perl-compatible regular expressions), since I am most comfortable with those.
  • awk '{ print length(), $0 | "sort -n" }' – Awk is a mini string-manipulation language that you can use on the command line. Here, it prefixes each path with its length, and the pipe inside the awk program sends those lines through sort -n so that the shortest paths come out first.
  • head -n 1 – After the sorting, I just want the shortest result, so I grab that one.
  • cut -d " " -f2- – Now, I get rid of the length and keep everything else, which is the path I wanted. The -d flag tells cut to use a space as the delimiter (the default is a tab character).
  • All of this is wrapped in a cd "$( ... )";. Bash will execute the contents of the $( ... ) and feed the textual result into cd as the argument.
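
To see what the sorting stage actually does, here’s a contrived illustration with three made-up paths (the directory names are hypothetical); after the length prefix, sort, head, and cut, only the shortest path survives:

printf '%s\n' \
  "/home/roger/Documents/school/notes" \
  "/home/roger/Documents/school" \
  "/home/roger/Dropbox/school-archive" \
  | awk '{ print length(), $0 | "sort -n" }' | head -n 1 | cut -d " " -f2-
# prints: /home/roger/Documents/school

So a command like goto school would drop me straight into ~/Documents/school.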

It isn’t as fast as Apple’s Spotlight search, but the difference is negligible. For greater performance, you can customize the system-wide behavior of updatedb so that it indexes fewer directories.
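
For instance, on systems that use mlocate, updatedb reads its configuration from /etc/updatedb.conf. A sketch of what trimming the index down might look like (these paths are only examples; check your distribution’s updatedb.conf man page):

PRUNE_BIND_MOUNTS="yes"
PRUNEFS="NFS nfs nfs4 proc sysfs tmpfs"
PRUNEPATHS="/tmp /var/spool /media /home/roger/.cache"
PRUNENAMES=".git .svn"

Anything listed there is skipped when the database is rebuilt, so locate (and therefore goto) never sees it.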

Grabbing an apache-generated directory listing (11 Apr 2013)

So it turns out that one of my professors, whose lectures I attend less frequently than I should, chooses to play classical music at the beginning of lecture while students are walking in. A couple of people started a thread on Piazza back when the course began to identify as many of these tracks as they could. On request, he posted a ripped vinyl on the course web page, which, of course, is served up as the default apache2 directory listing page, MIME-type icons and everything.

Short of a wget -r, it’s embarrassingly difficult to grab directories that are laid out like this from the command line. It isn’t such a big deal with only 14 files, but this number could easily scale up to a hundred, in which case you’d probably decide a programmatic solution would be worth it. For fun, I came up with the following:

for f in $(curl "http://www.cs.berkeley.edu/.../" 2>/dev/null | \
grep 'href="(.*?)\.m4a"' -o -P | cut -d '"' -f 2); \
do wget "http://www.cs.berkeley.edu/.../$f"; done

The command is split across three lines by a single backslash character (\) at the end of the first two. This tells bash that the lines are meant to be treated as a single-line command (since I typed it as such in the first place).

The stuff inside the $( ... ) consists of a curl [url] 2>/dev/null piped into grep. The 2>/dev/null redirects the standard error stream to a black hole so that it isn’t displayed on the screen, which keeps curl from printing its progress meter. (Curl also supports a --silent command-line switch that does much the same thing.)

The grep simply searches for URLs that link to a *.m4a file. The (.*?) syntax has special meaning in Perl-compatible grep, which I am invoking here with the -P switch. PCRE supports a non-greedy (lazy) syntax, invoked with the question mark, that tells wildcards like .* to match as few characters as possible. The parentheses are actually unnecessary here.
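
Here’s a quick way to see the greedy/non-greedy difference with a made-up line of HTML (the two href values are hypothetical):

echo 'href="a.m4a" href="b.m4a"' | grep -o -P 'href=".*?\.m4a"'
# href="a.m4a"
# href="b.m4a"
echo 'href="a.m4a" href="b.m4a"' | grep -o -P 'href=".*\.m4a"'
# href="a.m4a" href="b.m4a"

The lazy version stops at the first .m4a" it can, so each link comes out as its own match; the greedy version swallows everything up to the last .m4a" on the line.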

The -o command-line switch tells grep to print only the matching parts of the input, rather than the entire lines. The rest of the code just loops through the extracted filenames and prepends the base URL to each one on its way to wget.
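
As an aside, the wget -r approach mentioned at the top could look something like this (one reasonable combination of flags, not the only one; the URL is elided just like above):

wget -r -np -nd -A "*.m4a" "http://www.cs.berkeley.edu/.../"

Here -r recurses, -np keeps wget from climbing up to the parent directory, -nd stops it from recreating the directory tree locally, and -A restricts the downloads to files matching *.m4a.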
