Data loss and you

My laptop’s hard drive crashed in 2012. I was on campus walking by Evans Hall, when I took my recently-purchased Thinkpad x230 out of my backpack to look up a map (I didn’t have a smartphone), only to realize it wouldn’t boot. This wasn’t a disaster by any means. It set me back $200 to rush-order a new 256GB Crucial M4 SSD. But since I regularly backed up my data to an old desktop running at my parents’ house, I was able to restore almost everything once I received it1.

I never figured out why my almost-new laptop’s hard drive stopped working out of the blue. The drive still spun up, yet the system didn’t detect it. But whether it was the connector or the circuit board, that isn’t the point. Hardware fails all the time for no reason2, and you should be prepared for when it happens.

Data management has changed a lot in the last ten years, primarily driven by the growing popularity of SaaS (“cloud”) storage and greatly improved network capacity. But one thing that hasn’t changed is that most people are still unprepared for hardware failure when it comes to their personal data. Humans start manufacturing data from the moment they’re born. Kids should really be taught data husbandry, just like they’re taught about taxes and college admissions and health stuff. But anyway, here are a few things I’ve learned about managing data that I want to share:

Identify what’s important

Data management doesn’t work if you don’t know what you’re managing. In other words, what data would make you sad if you lost access to it? Every day, your computer handles massive amounts of garbage data: website assets, Netflix videos, application logs, PDFs of academic research, etc. There’s also the data that you produce, but don’t intend to keep long-term: dash cam and surveillance footage (it’s too big), your computer settings (it’s easy to re-create), or your phone’s location history (it’s too much of a hassle to extract).

For most people, important data is the data that’s irreplaceable. It’s your photos, your notes and documents, your email, your tax forms, and (if you’re a programmer) your enormous collection of personal source code.

Consider the threats

It’s impossible to predict every possible bad thing that could happen to your data. But fortunately, you don’t have to! You can safely ignore all the potential data disasters that are significantly less likely to occur than your own untimely death3. That leaves behind a few possibilities, roughly in order of decreasing likelihood:

  • Hardware failure
  • Malicious data loss (somebody deletes your shit)
  • Accidental data loss (you delete your shit)
  • Data breach (somebody leaks your shit)
  • Undetected data degradation

Hardware failures are the easiest to understand. Hard drives (external hard drives included), solid state drives, USB thumb drives, and memory cards all have an approximate “lifespan”, after which they tend to fail catastrophically4. The rule of thumb is 3 years for external hard drives, 5 years for internal hard drives, and perhaps 10 years for enterprise-grade hard drives.

Malicious data loss has become much more common these days, with the rise of a digital extortion scheme known as “ransomware”. Ransomware encrypts user files on an infected machine, usually using public-key cryptography in at least one of the steps. The encryption is designed so that the infected computer can encrypt files easily, but is unable to reverse the encryption without the attacker’s cooperation (which is usually made available in exchange for a fee). Fortunately, ransomware is easily detectable, because the infected computer prompts you for money once the data loss is complete.

On the other hand, accidental data loss can occur without anybody noticing. If you’ve ever accidentally overwritten or deleted a file, you’ve experienced accidental data loss. Because it can take months or years before accidental data loss is noticed, simple backups are sometimes ineffective against it.

Data breaches are a unique kind of data loss, because they don’t necessarily mean you’ve lost access to the data yourself. Some kinds of data (passwords, tax documents, government identification cards) lose their value when they become available to attackers. So, your data management strategy should also identify if some of your data is confidential.

Undetected data degradation (or “bit rot”) occurs when your data becomes corrupted (either by software bugs or by forces of nature) without you noticing. Modern disk controllers and file systems can provide some defense against bit rot (for example, in the case of bad sectors on a hard disk). But the possibility remains, and any good backup strategy needs a way to detect errors in the data (and also to fix them).

Things you can’t back up

Backups and redundancy are generally the solutions to data loss. But you should be aware that there are some things you simply can’t back up. For example:

  • Data you interact with, but can’t export. For example, your comments on social media would be difficult to back up.
  • Data that’s useless (or less useful) outside of the context of a SaaS application. For example, you can export your Google Docs as PDFs or Microsoft Word files, but then they’re no longer Google Docs.

Redundancy vs backup

Redundancy is buying 2 external hard drives, then saving your data to both. If either hard drive experiences a mechanical failure, you’ll still have a 2nd copy. But this isn’t a backup.

If you mistakenly overwrite or delete an important file on one hard drive, you’ll probably do the same on the other hard drive. In a sense, backups require the extra dimension of time. There needs to be either a time delay in when your data propagates to the backup copy, or better yet, your backup needs to maintain multiple versions of your data over time.

RAID and erasure coding both offer redundancy, but do not count as a backup.
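To make the “extra dimension of time” concrete, here is a minimal sketch of a versioned backup in Python. It copies a folder into a new timestamped snapshot directory on each run and prunes the oldest snapshots. The function name `take_snapshot` and the `keep` parameter are made up for illustration, and it assumes `keep` is at least 1 and that you take at most one snapshot per second.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def take_snapshot(source, backup_root, keep=7, now=None):
    """Copy `source` into a new timestamped folder and prune old snapshots."""
    now = now or datetime.now(timezone.utc)
    backup_root = Path(backup_root)
    # Timestamped names sort chronologically, which makes pruning easy.
    dest = backup_root / now.strftime("%Y%m%dT%H%M%S")
    shutil.copytree(source, dest)  # also creates backup_root if it is missing
    snapshots = sorted(p for p in backup_root.iterdir() if p.is_dir())
    for old in snapshots[:-keep]:
        shutil.rmtree(old)  # only the newest `keep` snapshots survive
    return dest
```

Because every run lands in its own folder, overwriting a file today doesn’t touch yesterday’s copy. A real tool would deduplicate unchanged files instead of copying everything each time.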

Backups vs archives

Backups are easier if you have less data. You can create archives of old data (simple ZIP archives will do) and back them up separately from your “live” data. Archives make your daily backups faster and also make it easier to perform data scrubbing.

When you’re archiving data, you should pick an archive format that will still be readable in 30 to 50 years. Proprietary and non-standard archive tools might fall out of popularity and become totally unusable in just 10 or 15 years.

Data scrubbing

One way to protect against bit rot is to check your data periodically against known-good versions. For example, if you store cryptographic checksums with your files (and also digitally sign the checksums), you can verify the checksums at any time and detect bit rot. Make sure you have redundant copies of your data, so that you can restore corrupted files if you detect errors.

I generate SHA1 checksums for my archives and sign the checksums with my GPG key.
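Here is a rough sketch of what a checksum manifest could look like in Python. This isn’t my actual tooling. The helper names (`file_digest`, `build_manifest`, `find_corrupted`) are invented for the example, it defaults to SHA-256 rather than SHA1, and it leaves out the GPG signing step entirely.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large archives don't need to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root, algorithm="sha256"):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        str(p.relative_to(root)): file_digest(p, algorithm)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_corrupted(root, manifest, algorithm="sha256"):
    """Return the files that are missing or no longer match the manifest."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file() or file_digest(path, algorithm) != expected:
            bad.append(rel)
    return sorted(bad)
```

You would build the manifest once when you create the archive, store it next to the data (and on your other copies), and run `find_corrupted` during each scrub. Passing `"sha1"` as the algorithm gives SHA1 checksums instead.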

Failure domain

If your backup solution is 2 copies on the same hard drive, or 2 hard drives in the same computer, or 2 computers in the same house, then you’re consolidating your failure domain. If your computer experiences an electrical fire or your house burns down, then you’ve just lost all copies of your data.

Onsite vs offsite backups

Most people keep all their data within a 20 meter radius of their primary desktop computer. If all of your backups are onsite (e.g. in your home), then a physical disaster could eliminate all of the copies. The solution is to use offsite backups, either by using cloud storage (easy) or by stashing your backups at a friend’s house (pain in the SaaS).

Online vs offline backups

If a malicious attacker gains access to your system, they can delete your data. But they can also delete any cloud backups5 and external hard drive backups that are accessible from your computer. It’s sometimes useful to keep backups of your data that aren’t immediately deletable, either because they’re powered off (like an unplugged external hard drive) or because they’re read-only media (like data backups on Blu-ray Discs).

Encryption

You can reduce your risk of data leaks by applying encryption to your data. Good encryption schemes are automatic (you shouldn’t need to encrypt each file manually) and thoroughly audited by the infosec community. And while you’re at it, you should make use of your operating system’s full disk encryption capabilities (FileVault on macOS, BitLocker on Windows, and LUKS or whatever on Linux).

Encrypting your backups also means that you could lose access to them if you lose your encryption credentials. So, make sure you understand how to recover your encryption credentials, even if your computer is destroyed.

Online account security

If you’re considering cloud backups, you should also take steps to strengthen the security of your account:

  • Use a long password, and don’t re-use a password you’ve used on a different website.
  • Consider using a passphrase (a regular English sentence containing at least 4-5 uncommon words). Don’t share similar passphrases for multiple services (like “my facebook password”), because an attacker with access to the plaintext can easily guess the scheme.
  • Turn on two-factor authentication. The most common 2FA scheme (TOTP) requires you to type in a 6-8 digit code whenever you log in. You should prefer to use a mobile app (I recommend Authy) to generate the code, rather than to receive the code via SMS. Don’t forget to generate backup codes and store them in a physically secure top-secret location (e.g. underneath the kitchen sink).
  • If you’re asked to set security questions, don’t use real answers (they’re too easy to guess). Make up gibberish answers and write them down somewhere (preferably a password manager).
  • If your account password can be recovered via email, make sure your email account is also secure.

Capacity vs throughput

One strong disadvantage of cloud backups is that transfers are limited to the speed of your home internet, especially for large uploads. Backups are less useful when they take days or weeks to restore, so be aware of how your backup throughput affects your data management strategy.

This problem also applies to high-capacity microSD cards and hard drives. It can take several days to fully read or write a 10TB data archival hard drive. Sometimes, smaller but faster solid state drives are well worth the investment.

File system features

Most people think of backups as “copies of their files”. But the precise definition of a “file” has evolved rapidly just as computers have. File systems have become very complex to meet the increasing demands of modern computer applications. But the truth remains that most programs (and most users) don’t care about most of those features.

For most people, your “files” refers to (1) the directory-file tree and (2) the bytes contained in each file. Some people also care about file modification times. If you’re a computer programmer, you probably care about file permission bits (perhaps just the executable bit) and maybe symbolic links.

But consider this (non-exhaustive) list of filesystem features, and whether you think they need to be part of your data backups:

  • Capitalization of file and directory names
  • File owner (uid/gid) and permission bits, including SUID and sticky bits
  • File ACLs, especially in an enterprise environment
  • File access time, modification time, and creation time
  • Extended attributes (web quarantine, Finder.app tags, “hidden”, and “locked”)
  • Resource forks, on macOS computers
  • Non-regular files (sockets, pipes, character/block devices)
  • Hard links (also “aliases” or “junctions”)
  • Executable capabilities (maybe just CAP_NET_BIND_SERVICE?)

If your answer is no, no, no, no, no, what?, no, no, and no, then great! The majority of cloud storage tools will work just fine for you. But the unfortunate truth is that most computer programmers are completely unaware of many of these file system features. So, they write software that completely ignores them.

Programs and settings

Programs and settings are often left out of backup schemes. Most people don’t have a problem reconfiguring their computer once in a while, because catastrophic failures are unlikely. If you’re interested in creating backups of your programs, consider finding a package manager for your preferred operating system. Computer settings can usually be backed up with a combination of group policy magic for Windows and config files or /usr/bin/defaults for macOS.

Application-specific backup

If you’re backing up data for an application that uses a database or a complex file-system hierarchy, then you might be better served by a backup system that’s designed specifically for that application. For example, RogerHub runs on a PostgreSQL database, which comes with its own backup tools. But RogerHub uses an application-specific backup scheme that’s tailored to RogerHub specifically.

Testing

A backup isn’t a backup until you’ve tested the restoration process.

Recommendations

If you’ve just skipped to the end to read my recommendations, fantastic! You’re in great company. Here’s what I suggest for most people:

  • Use cloud services instead of files, to whatever extent you feel comfortable with. It’s most likely not worth your time to back up email or photos, since you could use Google Inbox or Google Photos instead.
  • Create backups of your files regularly, using the 3-2-1 rule: 3 copies of your data, on 2 different types of media, with at least 1 offsite backup. For example, keep your data on your computer. Then, back it up to an online cloud storage or cloud backup service. Finally, back up your data periodically to an external hard drive.
  • Don’t trust physical hardware. It doesn’t matter how much you paid for it. It doesn’t matter if it’s brand new or if you got the most advanced model. Hardware breaks all the time in the most unpredictable ways.
  • Don’t buy an external hard drive or a NAS as your primary backup destination. They’re probably no more reliable than your own computer.
  • Make sure to use full-disk encryption and encrypted backups.
  • Make sure nobody can maliciously (or accidentally) delete all of your backups, simply by compromising your primary computer.
  • Consider making archives of data that you use infrequently and no longer intend to modify.
  • Secure your online accounts (see section titled “Online account security”)
  • Pat yourself on the back and take a break once in a while. Data management is hard stuff!

If you find any mistakes on this page, let me know. I want to keep it somewhat updated.

And, here’s yet another photo:

Branches.

  1. My laptop contained the only copy of my finished yet unsubmitted class project. But technically I had a project partner. We didn’t actually work together on projects. We both finished each project independently, then just picked one version to submit. ↩︎
  2. About four and a half years later, that M4 stopped working and I ordered an MX300 to replace it. ↩︎
  3. That is, unless you’re interested in leaving behind a postmortem legacy. ↩︎
  4. There are modes of failure other than total catastrophic failure. ↩︎
  5. Technically, most reputable cloud storage companies will keep your data for some time even after you delete it. If you really wanted to, you could explain the situation to your cloud provider, and they’ll probably be able to recover your cloud backups. ↩︎

Life lessons from artificial intelligence

If you speak to enough software engineers, you’ll realize that many of them can’t understand some everyday ideas without using computer metaphors. They say “context switching” to explain why it’s hard to work with interruptions and distractions. Empathy is essentially machine virtualization, but applied to other people’s brains. Practicing a skill is basically feedback-directed optimization. Motion sickness is just your video processor overheating, and so on.

A few years ago, I thought I was the only one whose brain used “computer” as its native language. And at the time, I considered this a major problem. I remember one summer afternoon, I was playing Scrabble with some friends at my parents’ house. At that time, I had just finished an internship, where day-to-night I didn’t have much to think about other than computers. And as I stared at my Scrabble tiles, I realized the only word I could think of was EEPROM1.

It was time to fix things. I started reading more. I’ve carried a Kindle in my backpack since I got my first Kindle2 in high school, but I haven’t always used it regularly. It’s loaded with a bunch of novels. I don’t like non-fiction, especially the popular non-fiction about famous politicians and the economy and how to manage your time. It seems like a waste of time to read about reality, when make-believe is so much more interesting.

I also started watching more anime. I especially like the ones where the main character has a professional skill and that skill becomes an inextricable part of their personal identity3. During my last semester in college, I thought really hard about whether I really wanted to just be a computer programmer until I die, or whether I simply had no other choice, because I wasn’t good at anything else. And so, I watched Hibike! Euphonium obsessively, searching for answers.

Devoting your life to a skill can be frustrating. It makes you wonder if you’d be a completely different person if that part of you were suddenly ripped away. And then there’s the creeping realization that your childhood passion is slowly turning into yet another boring adult job. It’s like when you’re a kid and you want to be the strongest ninja in your village, but then you grow up and start working as a mercenary. You can still do ninja stuff all day, but it’s just not fun anymore.

But I like those shows because it’s inspiring and refreshing to watch characters who really care about being really good at something, as long as that something isn’t just “make a ton of money”. I think it’s important to have passion and a competitive spirit for at least one thing. It’s no fun being just mediocre at a bunch of things. Plus, being good at something gives you a unique perspective on the world, and that perspective comes with insights worth sharing.

I thought a lot about Q-learning during the months after my car accident. I think normal people are generally unprepared to respond rationally in crisis situations. And that’s at least partially because most of us haven’t spent enough time evaluating the relative cost of all the different terrible things that might happen to us on a day to day basis. Q-learning is a technique for decision-making that relies on predicting the expected value of taking an action in a particular state. In order for Q-learning to work, you need experience with both the state transitions (what could happen if I take this action?) and a cost for each of the outcomes. If you understand the transitions, but all of your costs are just “really bad, don’t let that happen”, then in a pinch, it becomes difficult to decide which bad outcome is the least terrible.
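For the curious, the core of tabular Q-learning is a one-line update. The sketch below is a toy with one made-up state, two made-up actions, and invented costs (written as negative rewards). The point is only that distinct costs let the learner rank bad outcomes.

```python
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update to the table `q` in place."""
    # Value of the best follow-up action (0 if the episode ends here).
    future = max(q[next_state].values()) if next_state is not None else 0.0
    target = reward + gamma * future
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Hypothetical crisis with two bad options of different badness.
q = {"crisis": {"wait_it_out": 0.0, "call_for_help": 0.0}}
for _ in range(20):
    q_update(q, "crisis", "wait_it_out", reward=-10.0, next_state=None)
    q_update(q, "crisis", "call_for_help", reward=-3.0, next_state=None)

# The least terrible action is the one with the highest learned value.
least_terrible = max(q["crisis"], key=q["crisis"].get)
```

If both rewards were the same “really bad” number, the two values would converge to the same thing and `least_terrible` would be an arbitrary pick.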

There are little nuggets of philosophy embedded all over the fields of artificial intelligence and machine learning. I skipped a lot of class in college, but I never skipped my introductory AI and ML classes. It turns out that machine learning and human learning have a lot in common. Here are some more ideas, inspired by artificial intelligence:

I try to spend as little time as possible shopping around before buying something, and that’s partially because of what’s called the Optimizer’s Curse4. The idea goes like this: Before buying something, you usually look at all your options and pick the best one. Since people aren’t perfect, sometimes you overestimate or underestimate how good your options are. The more options you consider, the higher the probability that the perceived value of your best option will be much greater than its actual value. Then, you end up feeling disappointed, because you bought something that isn’t as good as you thought it’d be.

Now that doesn’t mean you should just buy the first thing you see, since your first option might turn out to be really shitty. But if you’re reasonably satisfied with your options, it’s probably best to stop looking and just make your choice.
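The Optimizer’s Curse is easy to see in a small simulation. This sketch assumes that both the true values and your estimation errors are normally distributed, which is my simplification, and the function name is made up. It measures how much the option you pick falls short of what you expected, on average.

```python
import random
from statistics import mean


def average_disappointment(num_options, trials=2000, noise=1.0, seed=0):
    """Average of (perceived value - actual value) for the chosen option."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        true_values = [rng.gauss(0.0, 1.0) for _ in range(num_options)]
        # Each estimate is the true value plus an honest, unbiased error.
        estimates = [v + rng.gauss(0.0, noise) for v in true_values]
        # Pick whatever looks best.
        best = max(range(num_options), key=estimates.__getitem__)
        gaps.append(estimates[best] - true_values[best])
    return mean(gaps)
```

Comparing `average_disappointment(2)` with `average_disappointment(20)` shows the gap growing as you consider more options, even though no single estimate is biased.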

But artificial intelligence also tells us that it’s not smart to always pick the best option. Stochastic optimization methods are based on the idea that, sometimes, you should take suboptimal actions just to experience the possibilities. Humans call this “stepping out of your comfort zone”. Machines need to strike a balance between “exploration” (trying out different options to see what happens) and “exploitation” (using experience to make good decisions) in order to succeed in the long run. This balance is usually controlled by an exploration rate (which I’ll loosely call the “learning rate”), and a good learning rate decreases over time. In other words, young people are supposed to make poor decisions and try new things, but once you get old, you should settle down5.

The difference in cumulative value resulting from sub-optimal decisions is known as “regret”. In the long run, machines should learn the optimal policy for decision-making. But machines should also try to reach this optimum with as little regret as possible. This is accomplished by adjusting the learning rate.
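Here is a small sketch of the exploration/exploitation trade-off, using an epsilon-greedy strategy on a three-armed bandit. The payout probabilities and the decay schedule are made up for the example, and `epsilon_at` plays the role of the rate discussed above.

```python
import random


def run_bandit(arm_means, steps, epsilon_at, seed=0):
    """Play an epsilon-greedy bandit and return the total regret."""
    rng = random.Random(seed)
    counts = [0] * len(arm_means)
    estimates = [0.0] * len(arm_means)
    best_mean = max(arm_means)
    regret = 0.0
    for t in range(steps):
        if rng.random() < epsilon_at(t):
            arm = rng.randrange(len(arm_means))  # explore: try anything
        else:
            # Exploit: pick the arm that has looked best so far.
            arm = max(range(len(arm_means)), key=estimates.__getitem__)
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
        regret += best_mean - arm_means[arm]  # value lost versus the best arm
    return regret


arms = [0.2, 0.5, 0.8]  # hypothetical payout probabilities
settle_down = run_bandit(arms, 2000, lambda t: 1 / (t + 1) ** 0.5)
always_wander = run_bandit(arms, 2000, lambda t: 1.0)
never_explore = run_bandit(arms, 2000, lambda t: 0.0)
```

In this setup, the player who never explores stays on the first arm it tries, the player who always explores never cashes in on what it learned, and the decaying rate should come out with the least regret of the three.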

So is it wrong for parents to make all of their children’s decisions? A little guidance is probably valuable, but a too conservative learning rate converges to a suboptimal long-term policy6. I suppose kids should act like kids, and if they scrape their knees and do stupid stuff and get in trouble, that’s probably okay.

Anyway, there’s one more artificial intelligence technique that I don’t understand too well, but it comes with interesting implications for humans. It’s a technique for path planning applied to finite LQR problems, which are a type of problem where the system mechanics can be described linearly and the cost function is quadratic in the state. These restrictions yield a formulation that lets us compute a policy whose feedback gains are independent of the state of the system. In other words, the machine plans a path by starting at the goal, then working backward to determine what leads up to that goal.

The same policy can be applied no matter your goal (“terminal condition”), because all the mechanics of the system are encoded in the policy. For example, if your goal is to build rockets at NASA, then it’s useful to consider what needs to happen one day, one month, or even one year before your dream comes true. The policy becomes less and less useful when the distance to your goal increases, but by working backward far enough, you can figure out what to do tomorrow to take the first step.

And if your plans don’t work out, well don’t worry, because the policy is independent of the state of the system. You can reevaluate your trajectory at any point to put yourself back on the right track7.
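As a sketch of the “work backward from the goal” idea, here is the backward recursion for a one-dimensional finite-horizon LQR problem. It assumes dynamics x' = a·x + b·u and a cost of q·x² + r·u² per step plus q_final·x² at the end. The function names and the numbers are mine, not from any particular course or library.

```python
def lqr_gains(a, b, q, r, q_final, horizon):
    """Compute feedback gains by recursing backward from the final step."""
    p = q_final  # cost-to-go coefficient at the goal
    gains = []
    for _ in range(horizon):
        k = (a * b * p) / (r + b * b * p)   # best gain one step earlier
        p = q + a * a * p - a * b * p * k   # cost-to-go one step earlier
        gains.append(k)
    gains.reverse()  # gains[0] is the gain to use at the first step
    return gains


def simulate(x0, gains, a, b):
    """Roll the system forward using the control u = -k * x at each step."""
    xs = [x0]
    for k in gains:
        u = -k * xs[-1]
        xs.append(a * xs[-1] + b * u)
    return xs


gains = lqr_gains(a=1.0, b=1.0, q=1.0, r=1.0, q_final=1.0, horizon=20)
path_from_5 = simulate(5.0, gains, a=1.0, b=1.0)
path_from_minus_3 = simulate(-3.0, gains, a=1.0, b=1.0)
```

Notice that `lqr_gains` never looks at where you currently are. The same list of gains steers both starting points toward zero, which is why you can get knocked off course and keep using the same plan.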

I miss learning signal processing and computer graphics and machine learning and all of these classes with a lot of math in them. I work on infrastructure and networking at work, which is supposedly my specialization. But I also feel like I’m missing out on a lot of great stuff that I used to be interested in. The math-heavy computer science courses always felt a little more legit. I always imagined college to be a lot of handwriting and equations and stuff. Maybe I’ll pick up another side project for this stuff soon.

And here’s a photo of the hard disk from my first laptop:

A hard disk lying on some leaves.

It died less than a month after I got the laptop. After that, I started backing up my data more religiously. Plus, I replaced the spinning rust with a new Crucial M4 and that lasted for about 4.5 years until it broke too. I still kept this hard drive chassis and platter, because it looks cool.

  1. Acronyms aren’t allowed anyway. ↩︎
  2. My first Kindle was a 3rd generation Kindle Keyboard. When I broke that one, I bought another Kindle Keyboard even though a newer model had been released. I didn’t want my parents to notice I had broken my Kindle so soon after I got it, so I hid the old Kindle in a manila envelope and used its adopted brother instead. Three years later, I upgraded to the Paperwhite, and that’s still in my backpack today. ↩︎
  3. See this or this. ↩︎
  4. But also partially because I’m a lazy bastard. ↩︎
  5. And yet, I haven’t left my apartment all weekend. ↩︎
  6. PAaaS: parenting advice as a service. ↩︎
  7. On second thought, this doesn’t have much to do with artificial intelligence. ↩︎

The data model of Hubnext

I got my first computer when I was 8. It was made out of this beige-white plastic and ran a (possibly bootlegged) copy of Windows ME1. Since our house had recently gotten DSL installed, the internet could be on 24 hours a day without tying up the phone line. But I didn’t care about that. I was perfectly content browsing through each of the menus in Control Panel and rearranging the files in My Documents. As long as I was in front of a computer screen, I felt like I was in my element and everything was going to be alright.

Computers have come a long way. Today, you can rent jiggabytes of data storage for literally pennies per month (and yet iPhone users still constantly run out of space to save photos). For most people living in advanced capitalist societies, storage capacity has been permanently eliminated as a reason why you might consider deleting any data at all. For people working in tech, there’s a mindset known as “big data”, where businesses blindly hoard all of their data in the hope that some of it will become useful at some time in the future.

On the other hand, I’m a fan of “small data”. It’s the realization that, for many practical applications, the amount of useful data you have is dwarfed by the overwhelming computing and storage capacity of modern computers. It really doesn’t matter how inefficient or primitive your programs are, and that opens up a world of opportunities for most folks to do ridiculous audacious things with their data2.

When RogerHub ran on WordPress, I set up master-slave database and filesystem replication for my primary and replica web backends. WordPress needs to support all kinds of ancient shared hosting environments, so WordPress core makes very few assumptions about its operating environment. But WordPress plugins, on the other hand, typically make a lot of assumptions about what kinds of things the web server is allowed to do3. So the only way to really run WordPress in a highly-available configuration is to treat it like a black box and try your best to synchronize the database and filesystem underneath it.

RogerHub has no need for all of that complexity. RogerHub is small data. Its 38,000 comments could fit in the system memory of my first cellphone4 and the blobs could easily fit in the included external MicroSD card. But perhaps more important than the size of the data is how simple RogerHub’s dataset is.

Database replication comes with its own complexities, because it assumes you actually need transaction semantics5. Filesystem replication is mostly a crapshoot with no meaningful conflict resolution strategy for applications that use disk like a lock server. But RogerHub really only collects one kind of data: comments. The nice thing about my comments is that they have no relationship to each other. You can’t reply directly to other comments. Adding a new comment is as simple as inserting it in chronological order. So theoretically, all of this conflict resolution mumbo jumbo should be completely unnecessary.

I call the new version of RogerHub “hubnext” internally6. Hubnext stores all kinds of data: comments, pages, templates7, blobs8, and even internal data, like custom redirects and web certificates. Altogether, these different kinds of data are just called “Things”.

One special feature of hubnext is that you can’t modify or delete a Thing, once it has been created (i.e. an append-only data store). This property makes it really easy to synchronize multiple sets of Things on different servers, since each replica of the hubnext software just needs to figure out which of its Things the other replicas don’t have. To make synchronization easier, each Thing is given a unique identifier, so hubnext replicas can talk about their Things by just using their IDs.

Each hubnext replica keeps a list of all known Thing IDs in memory. It also keeps a rolling set hash of the IDs. It needs to be a rolling hash, so that it’s fast to compute H(n_1, n_2, …, n_k, n_{k+1}), given H(n_1, n_2, …, n_k) and n_{k+1}. And it needs to be a set hash, so that the order of the elements doesn’t matter. When a new ID is added to the list of Thing IDs, the hubnext replica computes the updated hash, but it also remembers the old hash, as well as the ID that triggered the change. By remembering the last N old hashes and the corresponding Thing IDs, hubnext builds a “trail of breadcrumbs” of the most recently added IDs. When a hubnext replica wants to sync with a peer, it sends its latest N hashes through a secure channel. The peer searches for the most recent matching hash that’s in both the requester’s hashes and the peer’s own latest N hashes. If a match is found, then the peer can use its breadcrumbs to generate a “delta” of newly added IDs and return them back to the requester. And if a match isn’t found, the default behavior is to assume the delta should include the entire set of all Thing IDs.
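Here is a simplified Python sketch of the breadcrumb idea. It is not hubnext’s actual code. The set hash here is just the sum of per-ID SHA-256 prefixes modulo 2^64, which is one easy way to get a hash that is both incremental and order-independent, and the `Replica` class and its method names are invented for the example.

```python
import hashlib
from collections import deque

MASK = (1 << 64) - 1


def id_hash(thing_id):
    """Hash one ID to a 64-bit integer."""
    digest = hashlib.sha256(str(thing_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Replica:
    def __init__(self, trail_len=8):
        self.ids = set()
        self.set_hash = 0  # hash of the empty set
        self.trail = deque(maxlen=trail_len)  # (old hash, newly added ID)

    def add(self, thing_id):
        if thing_id in self.ids:
            return
        self.trail.append((self.set_hash, thing_id))
        self.ids.add(thing_id)
        # Addition commutes, so insertion order doesn't affect the hash.
        self.set_hash = (self.set_hash + id_hash(thing_id)) & MASK

    def recent_hashes(self):
        return [old for old, _ in self.trail] + [self.set_hash]

    def delta_for(self, requester_hashes):
        """Return the IDs added since the newest hash the requester knows."""
        known = set(requester_hashes)
        if self.set_hash in known:
            return []  # already in sync
        delta = []
        for old_hash, thing_id in reversed(self.trail):
            delta.append(thing_id)
            if old_hash in known:
                return delta[::-1]
        return sorted(self.ids)  # no common breadcrumb: send everything
```

A requester would call `peer.delta_for(me.recent_hashes())` and then fetch whichever of the returned IDs it is missing.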

This algorithm runs periodically on all hubnext replicas. It’s optimized for the most common case, where all replicas have identical sets of Thing IDs, but it also works well for highly unusual cases (for example, when a new hubnext replica joins the cluster). But most of the time, this algorithm is completely unnecessary. Most writes (like new comments, new blog posts, etc) are synchronously pushed to all replicas simultaneously, so they become visible to all users globally without any delay. The synchronization algorithm is mostly for bootstrapping a new replica or catching up after some network/host downtime.

To make sure that every Thing has a unique ID, the cluster also runs a separate algorithm to allocate chunks of IDs to each hubnext replica. The ID allocation algorithm is an optimistic majority consensus one-phase commit with randomized exponential backoff. When a hubnext replica needs a chunk of new IDs, it proposes a desired ID range to each of its peers. If more than half of the peers accept the allocation, then hubnext adds the range to its pool of available IDs. If the peers reject the allocation, then hubnext just waits a while and tries again. Hubnext doesn’t make an attempt to release partially allocated IDs, because collisions are rare and we can afford to be wasteful. To decide whether to accept or reject an allocation, each peer only needs to keep track of one 64-bit ID, representing the largest known allocated ID. And to make the algorithm more efficient, rejections will include the largest known allocated ID as a “hint” for the requester.
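A toy, single-process version of the ID allocation might look like the sketch below. The `Peer` class, the `allocate_chunk` function, and the delay constants are all made up, and the “network” is just method calls. But it follows the same rules: each peer tracks one number, a majority must accept, rejections carry a hint, and retries back off with a random delay.

```python
import random


class Peer:
    """Tracks only the largest ID it has seen allocated."""

    def __init__(self):
        self.largest_allocated = 0

    def propose(self, start, end):
        """Accept the range if it lies entirely above everything seen so far."""
        if start > self.largest_allocated:
            self.largest_allocated = end
            return True, self.largest_allocated
        return False, self.largest_allocated  # the hint rides along


def allocate_chunk(peers, chunk_size, start=1, max_attempts=8,
                   base_delay=0.1, rng=random, sleep=lambda seconds: None):
    """Try to claim `chunk_size` IDs, retrying above the peers' hints."""
    for attempt in range(max_attempts):
        end = start + chunk_size - 1
        replies = [peer.propose(start, end) for peer in peers]
        accepted = sum(1 for ok, _ in replies if ok)
        if 2 * accepted > len(peers):  # more than half said yes
            return range(start, end + 1)
        # Randomized exponential backoff (sleep is a no-op by default here).
        sleep(rng.uniform(0, base_delay * 2 ** attempt))
        # Retry above the largest ID any peer told us about. IDs that were
        # partially accepted on this attempt are simply wasted.
        start = max(hint for _, hint in replies) + 1
    raise RuntimeError("could not allocate an ID range")
```

Passing `time.sleep` as the `sleep` argument would make the backoff real. The default skips the waiting so the example runs instantly.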

There are some obvious problems with using an append-only set to serve website content directly. To address these issues, each Thing type contains (1) a “last modified” timestamp and (2) some unique identifier that links together multiple versions of the same thing. For blobs and pages, the identifier is the canonicalized URL. For templates, it’s the template’s name. For comments, it’s the Thing ID of the first version of the comment. When the website needs to fetch some website content, it only considers the instance of the data with the latest “last modified” timestamp among multiple Things with the same identifier.
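In code, picking the current version out of an append-only pile of Things could look something like this sketch. The `Thing` fields are simplified stand-ins, and breaking timestamp ties by Thing ID is my assumption, not something hubnext necessarily does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thing:
    thing_id: int
    key: str              # e.g. canonical URL or template name
    last_modified: float  # e.g. a Unix timestamp
    body: str


def current_versions(things):
    """Return the newest Thing for each key."""
    latest = {}
    for thing in things:
        seen = latest.get(thing.key)
        # Ties on the timestamp are broken by the larger Thing ID (assumed).
        if seen is None or (thing.last_modified, thing.thing_id) > (
            seen.last_modified,
            seen.thing_id,
        ):
            latest[thing.key] = thing
    return latest
```

Editing a page then just means appending a new Thing with the same key and a later timestamp. The old version stays in the store, but nothing serves it anymore.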

Overall, I’m really satisfied with how this data storage model turned out. It makes a lot of things easier, like website backups, importing/exporting data, and publishing new website content. I intentionally glossed over the database indexing magic that makes all of this somewhat efficient, but that’s nonetheless present. There’s also an in-memory caching layer for the most commonly-requested content (like static versions of popular web pages and assets). Plus, there’s some Google Cloud CDN magic in the mix too.

It’s somewhat unusual to store static assets (like images and javascript) in a relational database. The only reason why I can get away with it is because RogerHub is small data. The only user-produced content is plaintext comments, and I don’t upload nearly enough images to fill up even the smallest GCE instances.

Anyway, have a nice Friday. If I find another interesting topic about Hubnext, I’ll probably write another blog post like this one soon.

A bridge in Kamikochi, Japan.

  1. But not for long, because I found install disks for Windows 2000 and XP in the garage and decided to install those. ↩︎
  2. I once made a project grading system for a class I TA’ed in college. It ran on a SQLite database with a single global database lock, because that was plenty fast for everybody. ↩︎
  3. Things like writing to any location in the web root and assuming that filesystem locks are real global locks. ↩︎
  4. a Nokia 5300 with 32MB of internal flash ↩︎
  5. I’ve never actually seen any WordPress code try to use a transaction. ↩︎
  6. Does “internally” even mean anything if it’s just me? ↩︎
  7. Templates determine how different pages look and feel. ↩︎
  8. Images, stylesheets, etc. ↩︎

What’s “next” for RogerHub

Did I intentionally use 3 different smart quotes in the title? You bet I did! But did it require a few trips to fileformat.info and some Python to figure out what the proper octal escape sequences are? As a matter of fact, yes. Yes it did. And if you’re wondering, they’re \342\200\231, \342\200\234, and \342\200\2351.
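For reference, something like the following Python would produce those sequences. This is only a sketch of one way to do it, and `octal_escapes` is a made-up helper name: it encodes the text as UTF-8 and prints each byte as a three-digit octal escape.

```python
def octal_escapes(text: str) -> str:
    """Return every UTF-8 byte of text as a backslash-prefixed 3-digit octal escape."""
    return "".join(f"\\{byte:03o}" for byte in text.encode("utf-8"))


for quote in ("\u2019", "\u201c", "\u201d"):
    print(octal_escapes(quote))
```

Note that this escapes every byte, including plain ASCII, so it is best applied to the individual smart quote characters rather than to a whole title.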

The last time I rewrote RogerHub.com was in November of 2010, more than 6 years ago. Before that, I was using this PHP/MySQL blogging software that I wrote myself. RogerHub ran on cheap shared hosting that cost $44 USD per year. I moved the site to WordPress because I was tired of writing basic features (RSS feeds, caching, comments, etc.) myself. The whole migration process took about a week. That includes translating my blog theme to WordPress, exporting all my articles2, and setting up WordPress via 2000s-era web control panels and FTP.

Maybe it’s that time again? The time when I’m unhappy with my website and need to do something drastic to change things up.

To be fair, my “personal blog” doesn’t really feel like a blog anymore. Since RogerHub now gets anywhere between 2^17 and 2^21 visitors per month, it demands a lot more of my attention than a personal blog really should. During final exam season, I log onto my website every night to collect my reward: a day’s worth of final exam questions and outdated memes3. Meanwhile, I wrote 3 blog posts last year and just 1 the year before that.

I want to take back my blog. And I want to strategically reduce the amount of time I spend managing the comments section without eliminating them altogether. Lately I’ve been too scared to make changes to my blog, because of how it might break other parts of the site. On top of that, I have to build everything within the framework of WordPress, an enormous piece of software written by strangers in a language that gives me no pleasure to use. I miss when it didn’t matter if I broke everything for a few hours, because I was editing my site directly in production over FTP. And every time WordPress announces a new vulnerability in some JSON API or media attachments (all features that I don’t use), I miss running a website where I owned all of the code.

So on nights and weekends over the last 5 months, I’ve been working on a complete rewrite of RogerHub from the ground up. And you’re looking at it right now.

Why does it look exactly the same as before? Well, I lied. I didn’t rewrite the frontend or any of the website’s pages. But all the stuff under the hood that’s responsible for delivering this website to your eyeballs has been replaced with entirely new code4.

The rewrite replaces WordPress, NGINX, HHVM, Puppet, MySQL, and all the miscellaneous Python and Bash scripts that I used to maintain the website. RogerHub is now just a single Go program, running on 3 GCE instances, each with a PostgreSQL database, fronted by Google Cloud Load Balancer.

Although this website looks the same, I’ve made a ton of improvements behind the scenes that’ll make it easier for me to add features with confidence and reduce the amount of toil I perform to maintain the site. I’ll probably write more about the specifics of what’s new, but one of the most important things is that I can now easily run a local version of RogerHub in my apartment to test out new changes before pushing them live5. I’ve also greatly improved my rollout and rollback processes for new website code and configuration.

Does this mean I’ll start writing blogs again? Sure, probably.

I’m not done with the changes. I’ve only just finished the features that I thought were mandatory before I could migrate the live site over to the new system. I performed the migration last night and I’ve been working on post-migration fixes and cleanup all day today. It’s getting late, so I should just finish this post and go to sleep. But I’ll leave you with this nice photo. I used to end these posts with funny comics and reddit screencaps.

Tree branches and flowers in the fog.

It’s a little wider than usual, because I’m adding new features, and this is the first one.

  1. Two TODOs for me: memorize those escape codes and add support for automatic smart quotes in post titles ↩︎
  2. I used Google Sheets to template a long list of SQL queries, based on a phpMyAdmin dump that I copied and pasted into a spreadsheet. Then, I copied those SQL queries back into phpMyAdmin to import everything into WordPress. ↩︎
  3. By my count, I’ve answered more than 5,000 questions so far. The same $44 annual website fee is enough to run 2017’s RogerHub.com for about 2 weeks. ↩︎
  4. And that’s a big deal, I swear! ↩︎
  5. Gee, it’s 2017. Who would have thought that I still tested new code in production? ↩︎

Child prodigy

I watched a YouTube video this morning about a 13-year-old boy who taught himself to make iPhone apps and got famous for it. He took an internship at Facebook and then started working there full-time. There were TV stations and news websites that interviewed him and wrote about how he’s helping his family financially and how any teenager can start making tons of money if they just learn to code. And the story was nice and inspiring and stuff, except there are tons of kids that do the same thing and nobody writes articles about any of them. He’s probably 18 or 19 now1 and still working at Facebook as a product manager. How’s he feeling now? On the other hand, I’m a college senior, dreading the day when I have to start working like a grown-up and wondering if I’ll miss college and confused why people can’t just stay in college forever. He never went to college. He had probably gotten accepted at lots of different schools (did he even get a chance to apply?), but he decided college wasn’t worth the opportunity to work at Facebook and pull his family out of their crappy financial situation. Cheers to him.

I felt exactly the same way in high school. But I didn’t have a compelling reason to start working or the balls to deviate from the Good Kid Story™. I started making websites when I was 10, and by the time I finished high school, I could churn out CRUD web applications like any other rank-and-file software developer. Part of me honestly thought that I could skip a few semesters of class once I got to Berkeley, because I already knew about for-loops and I could write Hello World in a handful of languages. I thought college was going to be the place where people learn about the less-useful theoretical parts of programming. They’d teach me what a tree was, even though I never had any reason to use anything but PHP’s ubiquitous ordered hash map. I thought it wouldn’t be anything that I wouldn’t have learned anyways, if I just kept writing more and more code. And I was partially right, but also very wrong.

Getting a proper CS education is really important, and I wouldn’t recommend that anybody drop out or skip college, just so they can start working, especially if there isn’t a strong financial reason to do so. However, there are two hard truths that people don’t like admitting about CS education: 1) most of the stuff taught to undergrads is also available on the Internet, and 2) most people who get a CS degree are still cruddy programmers. So, school isn’t irreplaceable and it’s not like attending school will magically transform you into a mature grown-up programmer. But that’s really not why getting a formal CS education is important.

After 7 semesters, it’s still hard to say exactly why people place a lot of value on getting a formal education in computer science. Most people need to be taught programming, because they have no experience and are in no shape to do anything productive with a computer. But for all the programming prodigies of the world, there needs to be another reason. I can say that I’m a much better programmer than I was four years ago. It always seems like the code I wrote the previous year is a pile of garbage2.

School forced me to learn things that I never would have learned on my own (because they were irrelevant to my own projects) nor would I have learned while working full-time (because they’d be irrelevant to the work I’d be doing). In high school, I had no idea people could write programs that did more than loading and saving data to a database. The classes I took actually expanded the range of what programs I thought were possible to write3.

When I taught myself things as a kid, I would enter a tight loop of learn-do-learn-do. Most of the code I wrote were attempts to get the Thing working as easily as possible, which ended up leading to a lot of frustration and wasted time. It’s hard to piece together a system before you understand the fundamental concepts. And that sounds really obvious, but a lot of programming tutorials seem to take that approach. They’ll tell you how to do the Thing, but they don’t bother giving you any intuition about the method itself. On the other hand, college classes have the freedom to explain the Thing in the abstract. Then once you start doing it yourself, you’ll know exactly what to look for4.

It’s really unfair to make a teenager make their own decisions about work and college, because you really shouldn’t be punished for making stupid life choices as a kid. Teaching myself programming as a kid was useful, but frankly I was a terrible teacher. But I’ve gotten better at that as well. This is my 5th semester as a teaching assistant, and I’ve picked up all kinds of awesome skills, from public speaking to technical writing, not to mention actual pedagogy as well. I’ve spent literally a thousand hours working on my tooling, because college convinced me that it really does matter5.

They say that it takes 10 years to really master a skill. Well, this is going to be my 12th year as a computer programmer, and I still don’t feel like I’ve mastered anything. I guess everybody learns in a different way, but it really sucks that society has convinced teenagers that college is optional/outdated. It’s easy to lure teenagers away from education with money and praise, especially because it’s really hard to see the point of a formal education when your entire programming career is creating applications that are essentially pretty interfaces to a database6. It doesn’t help that college-educated programmers are sometimes embarrassed to admit that school doesn’t work for everyone.

I wonder if that iPhone kid is disappointed with the reality of working full-time in software development. The free food and absurd office perks lose their novelty quickly.

  1. I have no idea actually. ↩︎
  2. Some people say that’s a good thing? I’ve realized that code is the enemy. The more code you write, the more bugs you’ve introduced. It’s incredibly hard to write code that you won’t just want to throw out next year. Code is the source of complexity and security problems, so the goal of software engineers is to produce less code, not more. When you have a codebase with a lot of parts, it’s easy to break things if you’re not careful. Bad code is unintuitive. Good code should be resistant to bugs, even when bad programmers need to modify it. ↩︎
  3. Little kids always tell you that programmers need to be good at math, which actually doesn’t make that much sense when I think about it. You need some linear algebra and calculus for computer graphics and machine learning. Maybe you’ll need to know modular arithmetic and number systems. But math really isn’t very important. ↩︎
  4. A huge number of software bugs are caused by the programmer misunderstanding the fundamentals of the thing they’re interacting with. ↩︎
  5. My favorite programming tools in high school were Adobe Dreamweaver and Notepad. I started using Ubuntu full-time in 11th grade, but didn’t make any actual efforts to improve my tools until college. ↩︎
  6. Not to underestimate the usefulness of simple CRUD apps. ↩︎

Email surveillance

There’s a new article in the SF Chronicle that says the University of California, Office of the President (UCOP) has been monitoring emails going in and out of the UC system by using computer hardware. I wanted to give my personal opinion, as a computer programmer and somebody who has experience managing mail exchangers1. The quotes in the SF Cron article are very generous with technical details about the email surveillance system. Most of the time, articles about mass surveillance are dumbed down, but this one gives us at least a little something to chew on.

Email was not originally designed to be a secure protocol. Over the three (four?) decades that email systems have been used, computer people have created several extensions to the original SMTP2 and 822 envelope protocol to provide enough modern security to make email “good enough” for modern use. Most email today is exchanged under the protection of STARTTLS, which is an extension for SMTP that upgrades a cleartext connection to an encrypted connection, if both parties support it. The goal of STARTTLS is to provide resistance against passive monitoring. It doesn’t provide any guarantees about the authenticity of the other party, because usually the certificates aren’t validated, so STARTTLS is still vulnerable to MITM attacks3. There are other email-security extensions. But they’re either designed for ensuring authenticity rather than privacy (like SPF, DKIM, and DMARC) or they’re not widely used (like GPG).

The only protection we have against passive snooping of emails is STARTTLS. According to the SF Cron article, the “intrusive device” installed at UC campuses is intended to capture and analyze traffic, rather than intercepting and modifying it. So, I took a look at some of the emails I’ve received at my personal berkeley.edu address over the last 3.5 years of living in Berkeley. I looked specifically at the advertising emails I get from Amazon.com, because I’ve been receiving them consistently for many years, and they always come from the same place (Amazon SES). All of my most recent emails from Amazon follow this path, according to the email headers:

  • Amazon SES
  • UC Berkeley Mail Server “ees-ppmaster-prod-01”
  • 3 local mail filters, called “pps.reinject”, “pps.reinject”, and “pps.filterd”
  • UC Berkeley Mail Server “ees-sentrion-ucb3”
  • Google Apps Mail Server

Before April 2015, another UC Berkeley Mail Server was part of this path, in between the “sentrion” server and the Google Apps server. Before December 2014, the path looked completely different. There was only a single server between SES and Google, which was labeled “cm06fe.ist.berkeley.edu”.

According to the email headers, each step along the path is encrypted using STARTTLS, except for some of the local mail filters. Those 3 local mail filters are programs that run on the UC Berkeley Mail Server which might do things like scanning for viruses or filtering spam. They don’t exactly need encryption, because they don’t communicate over the network. I also noticed that before May 2015, there was only 1 local mail filter (the “pps.filterd” one) instead of 3.
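For anyone who wants to repeat this kind of check on their own mail, here is a rough Python sketch that walks a message’s Received headers in delivery order. The `mail_path` helper is made up for illustration, and the encryption check is only a heuristic: it assumes a hop used TLS if its header says “with ESMTPS” (or “ESMTPSA”) or mentions TLS at all. Real headers vary a lot between mail servers.

```python
import email
import re


def mail_path(raw_message: str):
    """Return (receiving host, looks_encrypted) pairs, oldest hop first."""
    msg = email.message_from_string(raw_message)
    hops = []
    # Each server prepends its own Received header, so reverse for delivery order.
    for header in reversed(msg.get_all("Received", [])):
        flat = " ".join(header.split())  # undo header line folding
        by = re.search(r"\bby\s+(\S+)", flat)
        # Heuristic only: look for the ESMTPS/ESMTPSA keywords or any TLS mention.
        encrypted = bool(re.search(r"\bwith\s+ESMTPSA?\b|\bTLS", flat))
        hops.append((by.group(1) if by else "(unknown)", encrypted))
    return hops
```

A local filter that hands mail over without the network would typically show up here as a hop with no TLS marker, which matches what the headers above look like.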

The SF Cron article mentions that email surveillance started after attacks on UCLA Medical Center, which occurred in July 2015. Unfortunately, nothing significant seems to have changed in the email headers between June and October of 2015. But the use of STARTTLS, even within UC Berkeley’s own networks, casts doubt on the idea that UCOP surveillance was implemented as passive network monitoring.

If the surveillance was implemented at the network level, it would have to proxy the SMTP connections between all of the “ppmaster” and “sentrion” servers, as well as spoof the source IP or routing tables or reverse DNS lookup tables of the entirety of Berkeley’s local email network. It’d be an unnecessarily sophisticated method, if they just wanted to hide the presence of surveillance hardware.

On the other hand, if surveillance was implemented with the cooperation of campus IT staff, it would be pretty simple to implement for all emails campus-wide. There are already plenty of unlabeled local mail filters in place. These could easily be configured to forward an unencrypted copy of all emails to a 3rd party vendor’s system, for monitoring and analysis. Additionally, “sentrion”, which probably refers to SendMail’s Sentrion product, looks like it was expressly designed for the purpose of recording and analyzing large amounts of email.

There are a couple of problems if email monitoring really were implemented on the mail servers themselves with the cooperation of campus IT staff. If this is really the case, then it would require another system to monitor web traffic, which doesn’t seem to be explained in the article. Or perhaps, the claim that web traffic was being monitored is incorrect4.

I’ve always accepted that work email should be considered the property of your employer. Your personal stuff should stay on your personal cell phone and email accounts. However, students are not employees of the University5. I don’t know much about law, but I feel like FERPA was passed to address these kinds of privacy questions regarding students and academic institutions. Implementing mass email surveillance without consulting faculty and students, regardless of its legality, seems underhanded and embarrassing for what claims to be the number one public university in the world.

  1. I’m currently a student and (technically?) an employee of UC Berkeley. But these opinions are my own. ↩︎
  2. The Simple Mail Transfer Protocol, which is used to deliver all publicly-routed email. ↩︎
  3. Man-in-the-middle attacks ↩︎
  4. Most web traffic (including RogerHub) goes through HTTPS today anyway. Monitoring web traffic without a MITM proxy would be ineffective. ↩︎
  5. Unless you happen to be both. ↩︎

Website updates

Last December was the biggest month for RogerHub ever. We served over 4 million hits, which consumed over 3 terabytes of bandwidth. By request, we released the 6th calculator mode, “lowest test dropped”, to the public. But during the same month, we experienced the biggest outage that has ever happened on RogerHub, which affected over 60,000 visitors, and the number of total spam comments has nearly doubled. I keep using “we”, even though this is a one-man operation, because these seasonal surges of traffic feel a lot bigger than just me. Toward the end of the month, my hosting provider Linode was targeted by several large DDoS attacks across all their US datacenters. RogerHub is run in 2 Linode locations: Dallas, TX and Fremont, CA. However, only one location is active at any time. The purpose of the inactive location is to take over the website when the primary location goes offline. There’s a lot of reasons why a Linode datacenter could fail, including physical issues with Linode machines, power outages, and network connectivity issues. During the recent DDoS attacks, Linode came very close to being offline in both Dallas and Fremont, which would have caused issues for this site. There’s another wave of traffic in January, for people who have finals after Winter Break, and it’s important that RogerHub doesn’t have an outage then.

I’ve been working on new stuff for RogerHub. I’ve decreased the payload size of the most popular pages by paginating comments. It took a while before I found a solution that both provided a pleasant user experience and allowed the comment text to be easily indexed. I’ve made the site a bit wider, and I’ve reduced the amount of space around the leaderboard ad on desktop browsers. I’ve improved the appearance of buttons on the site, and I’ve given the front page a new look. Finally, I’ve migrated RogerHub from Linode to Google Compute Engine and enabled HTTPS for the entire site.

RogerHub is using GCE’s global HTTP load balancer to terminate HTTPS connections at endpoints that are very close geographically to visitors. Google is able to provide this with their BGP anycast content distribution network. With HTTPS also comes support for SPDY and HTTP/2 on RogerHub, which remove some of the performance quirks associated with plain HTTP. I’ve also converted all my ad units to load asynchronously. My use of HTTPS and GCE’s global HTTP load balancer also makes it trickier to block RogerHub on academic WiFi networks, especially on non-school owned equipment, where TLS interception is out of the question.

You might think it’s silly to run third party ads under HTTPS, since advertising destroys any client-side security you might claim to offer and many ad networks still don’t fully support HTTPS. I’ve always had mixed feelings about the advertising on RogerHub. Advertising covers my server costs, and I wouldn’t be able to run this site without advertising revenue. But poorly-designed advertising can ruin the user’s experience, especially on mobile devices. I’m only interested in the most unobtrusive online advertising for my website, and I try very hard to make sure that expanding ads, auto-playing video ads, and noise-making ads never get served from RogerHub. During the last few months, I’ve removed the main leaderboard ad for mobile users and I’ve removed ads from the home page as well.

In other news, a lot of RogerHub’s sites have been shut down, including the Wiki and a bunch of miscellaneous things you’ve probably never looked at. My coding blog has been turned into static HTML, but is still available1. This is my only blog left (also my first blog), so I might use it again some time soon.

  1. There are currently some mixed-content warnings on it, but I’ll fix them soon ↩︎

Grown ups

I haven’t posted anything to my Tumblr blog in 649 days, but in that time I’ve gained maybe 50 new followers, and they’re all strangers. I don’t think any of them are bots either. They found a link on my homepage and maybe they decided I would some day post something again. Sometimes, I click on their profile picture and check out their Tumblr blogs too. I open up web inspector and grab the URL of their avatar thumbnail, and then I change the _128 suffix to _512, because I knew that Tumblr offered avatar thumbnails with sizes in powers of 2, between 32 and 512. And then I remembered that a few years ago I built a tool to uncover Tumblr avatars and put it on RogerHub, and suddenly it feels kind of creepy checking out 512px thumbnails of strangers’ avatars, because most of them probably don’t know avatar thumbnails go up to that size.

It’s summer now, and it has been half a year since I wrote anything here on RogerHub, so I suppose I owe you an update about what’s new with me1. I feel more clumsy with words than I felt in high school, which was when I wrote new posts on this blog every week or so. It’s a side effect of sitting in a chair at work every day with my earbuds in my ears and having very little conversation with other actual people. Even when I talk during the workday, the talking is usually about computer stuff, which doesn’t help with normal talking that much. There was a time last Summer break when I felt like I had gotten really terrible at Scrabble, because all of the words that I thought of were computer words or acronyms that didn’t count as legal words. I might have told you about that already, sorry.

Working an internship has its pros and cons. The company really spoils its interns, and when I get home, I don’t have any homework. This opens up my schedule to cook more2 and also to go hang out with my friends. I don’t have to spend time in the nasty parts of Berkeley. I can read my Kindle a little bit more, and I can focus on my health.

On the other hand, I miss school and TA’ing for my class. I miss when all the projects were easy and understandable and written terribly. I miss using my own laptop and spending time on my personal data backup system and my text editor configuration3. I miss coming home to roommates that I actually talk to, and I miss living in walking distance of a lot of people.

Hm, so far, it seems like I’m just whining about missing a whole bunch of things. I suppose there are other cons to working an internship too.

I have to share my room with another person, but it’s not terrible. My roommate is cool, and I already sort of knew him from Berkeley. The internet speed sucks, and the connection is kind of unreliable. I don’t have space to set up all my tech, and I don’t feel that comfortable ordering stuff online here. The internship comes with its own kind of stress, because I want to do a good job and feel competent, but it’s not easy. I doze off sometimes, because I’m used to working for myself, not somebody else. It was always stuff for RogerHub or building out cool infrastructural stuff I wanted to have. Even when I was working on things for students and grading, it felt like working for myself. I have trouble seeing the big picture of what I’m contributing to.

I guess all these cons aren’t hard to fix. I can make new friends, and I can try to relax more at work. There isn’t much I can do about the internet speed, but I’ll just have to learn to live with that. Maybe I just need an actual vacation.

One thing I can’t stop thinking about is the possibility that right now, I’m just hanging on until the semester starts again. If that’s true, then in 1 year’s time, I’ll be in this same position again (minus some of the intern perks), but without that reassurance that in less than two months, everything will be back to the way it was. The start of the semester means people will come back together in Berkeley again. It’s not like doing fun student things was so much better than doing internship things. Objectively, there is a lot of crap that students have to do that isn’t fun at all. I have to sit through humanities classes that put me to sleep, and I have to do CS projects, even if I don’t feel like they’re interesting or educational. Once you’re an adult, you get to cut a lot of the bullshit that kids have to deal with, because you always get a choice, and nobody can make you sit through something so boring that it puts you to sleep4. Also, it isn’t like I see my friends every day, or even every week, when I’m at school. There’s some people I haven’t seen all semester long, so why is not-seeing-them at school better than not-seeing-them in this corporate-provided San Francisco apartment?

I’m kind of disappointed at the percentage of adults that seem to be excited for the next day, every night, compared to the percentage of my school friends who are. Why is it so hard to keep friends and not be sad when you’re an adult? I wish I had my calendar and to-do list back by my side, and I wish I actually had stuff to put on them. Sorry to end with something sad, but being an adult sounds like it totally sucks.

  1. It’s not like you can find out via Facebook or anything. ↩︎
  2. I have been making breakfast almost every morning for the last five weeks, and on the weekends, I make all three meals for myself. There’s a Safeway across the street. ↩︎
  3. I can do this at work actually. ↩︎
  4. Ok, I guess your boss can threaten to fire you, but you have a choice to get a new job. ↩︎

Notes and reminders

This is Notes.app, which I use to save rich text and organize ideas. I like it because it’s not a website, it’s a native OS X app. And because it opens in a small window that fits on the side of the screen, I feel creative and comfortable writing notes here.

Notes.app on my desktop.

But it doesn’t sync with my Android phone. It only syncs with an iCloud account, and I don’t use iCloud for anything except iTunes purchases and this. It’s also a little buggy with too much rich text.

I use vim for all my text editing1, and I wanted to use vim for notes too. But it didn’t work out. Rich text lets me put in checkmarks like ✔︎, and I can start a bulleted list with wiki-style syntax. There’s a font color palette, and you can paste images and headings into it directly from Safari. I have a ton of places to write stuff, but this one is my favorite.

I also tried Google Keep, but the mobile app is so clunky and there’s no native app. The web interface is awkward too. I don’t like sticky notes. They feel like a lazy way to make reminders.

There’s also a Tasks system that is integrated with Gmail and Google Calendar. I’ve been using that one for a long time, but there is no mobile app. I purchased a third-party mobile app for Tasks2, and I’ve been using that for a long time too. But there’s no native OS X app. The next best thing is a full page web interface for Tasks, which isn’t too bad.

I guess I’ll never be able to see my notes on my phone, or edit my todo list with a native OS X app. It’s the end of the semester anyway, so my todo list is getting shorter and shorter.

  1. I wrote an essay in vim. It has spell check and text wrapping. Pretty good! ↩︎
  2. It’s called Tasks (surprise). ↩︎

Sticky note

Sticky note on my apartment window

1

  1. I put this sticky note on my window on the first week of school. I needed pictures of something where the foreground was close by, but the background was far away. It was for the project I was composing for the new semester, and I took pictures of this note on my window for the input images. ↩︎

Catching up on this year

Welcome back reader! I’ve been gone for more than a year, and a lot of things have changed. I haven’t stopped blogging. I was just scared off of this blog, because of all the visitors spilling over from my calculator page. So, I made four or five other blogs so I could write without wondering how many strangers were going to read the posts I wrote.

Here are some things that have happened to me since September 2013, to get you up to speed: I got an offer last December to be a TA for a class at my university, which meant a 20-hour commitment and lots of cool perks. I was in San Diego visiting a friend when I got the offer (it was past midnight during winter break). I was almost going to reply immediately and turn down the offer, because even though I had really wanted to be a TA before1, it sounded like way too much work on top of everything I already had scheduled for the Spring semester. Besides, I had already agreed to be a reader (less work, less cred) for another class. My friend convinced me to sleep on it instead, and the next morning, I woke up and told the professor I was interested, but already committed to the reader position. Things kind of settled into place after that, and I became a little sophomore TA.

Before my first discussion section, I was really nervous. I made a slideshow and everything to introduce myself and try to give a couple of tips for succeeding in the course. When I took the class, I went to every single lecture and made sure I stayed on top of the material. I never went to discussion though... My discussions were always pretty small. I think that was because I was inexperienced and also that my sections were at really inconvenient times. As for the few people that showed up every week, I got to know them pretty well. I did a lot of work on the 3rd project, which was fun because I got to support that assignment from start to end. I took the pedagogy course that new TA’s all take. They talk about some of the research about teaching and learning and all of the techniques that come out of those. Once I heard about some of those techniques, I started noticing them from my own professors and TA’s. I could also spot when a TA was making mistakes and breaking all the rules, and it really did make a difference.

I got sick a bunch of times, but I don’t remember all of those. I went through two birthdays, so I’m 20 years old now, which is kind of cool. Some time in October, I was going through my four year plan and I noticed I’d taken enough requirements and had enough units to graduate pretty easily in just 3 more semesters. I had 5 more upper division CS classes to take and 2 more humanities courses and 1 more requirement for my major. So, I decided I was going to plan to graduate in 3 years. I thought that I might change my mind as my last semester approached, but I did change my graduation year from 2016 to 2015 on my resume to make it somewhat official. I thought I was going to graduate early for a long time. I talked to my parents about it, and I told my friends about it. But I cancelled this a couple weeks ago, so I’m back to 2016.

I took a few unmemorable classes that semester. I got to know some of my friends better, since they were taking the class I was TA’ing for. I convinced some of them to attend my section sometimes, and we’d just have fun and I’d bring food on occasion.

I spent a bunch of time doing interviews for a summer internship last year. Job hunting took a long time, and I didn’t find an internship until kind of late in the year. I ended up getting offers from three companies, which was frustrating, because up until that first offer, it seemed like nobody was really interested in hiring me, and then suddenly three of them came in within a week. I went to intern at Quizlet for the summer. All three really would have been pretty solid choices, so I sent these long apology emails to the other two places letting them know that I thought they were awesome too, but sorry! During the summer, I read this big long book about a slum on the outskirts of Mumbai. The book was called Behind the Beautiful Forevers2, and I thought it was really ironic being in one of the richest cities in the world with all of these rich tech industry folks. I know that not everyone had a lot of money, but everyone sure acted like it... and ate like it. I also thought a lot about those Japanese teenagers who left home to go to university in Tokyo and after that, worked hard to make their fare in a big place. I found a sublet on the west side of Berkeley, which I thought would be convenient because it was near the subway station. Turned out that the 45-minute commute every day was really tiring. The apartment I lived in was kind of old and smelly and in a poorer part of Berkeley. I didn’t really get along with my roommate, who lived a completely different lifestyle than I did. At work, I learned a lot about real infrastructure stuff. I picked up a bunch of lingo and got better at evaluating software and infrastructure.

This semester I’m a TA again. I think I’ve gotten better at it the second time around. I signed up for a business class about negotiations, to fulfill my first of 2 humanities courses. I didn’t get in, even though I was #2 on the waiting list since the start of telebears. But to be honest, I’m sort of relieved that I only have my 3 technical courses this semester, because they are more than enough workload for me. A bunch of my friends are also TA’s, which is cool, since we can talk about TA things.

I’m taking the operating systems class this semester, which is super cool, because we do our projects in 4-person groups, and working with my group is lots of fun. I’m also taking artificial intelligence, which I thought I would hate. But I actually think we cover a lot of cool material in that class, and I wish I had more time to really understand it. Finally, I’m taking computer networking, which is alright.

I bought a MacBook Air over the summer for myself. I also bought a big giant lens for my DSLR. It’s alright. I bought a 49-key USB piano keyboard, which I play sometimes with GarageBand. I bought a subscription to Adobe Creative Cloud for Photographers, which comes with Photoshop and Lightroom. I used Photoshop a ton for developing project 1 this semester. I also started using Lightroom to post-process all of my pictures. My MacBook is able to handle realtime editing and image exporting just fine. I didn’t skimp on the CPU and memory upgrades.

Things are alright right now. I looked at some of the things I wrote here in high school, and they sound really kind of dumb and don’t sound like how I would say them in real life. I think it’s endearing how blog authors adore their readers and refer to them. I still follow a lot of people’s blogs on my RSS reader. I don’t think they know!

Here’s me from a few weeks ago3, before I got my haircut:

Picture of Roger holding his DSLR.

I might not write again for a long time. But talk to you again soon!

  1. At Berkeley, I feel like becoming a TA is one HUGE thing you probably want to do at least once before you graduate in CS. A large percent of CS students become TA’s at some point in college, and it’s really prestigious, and they waive your tuition. ↩︎
  2. And then a few weeks later, the vlogbrothers decide they’re going to have everybody read it >.> ↩︎
  3. Check the EXIF on the full size! ↩︎

Miscellaneous things

Last Friday, there was an outdoor concert on campus and I went with some of the people I lived with in the dorms last year. I hadn’t heard of either of the bands performing, but that didn’t worry me. I try not to be stingy with my time, and I feel uncomfortable when other people are with theirs. It’s not like I am doing something important with every minute of my waking day anyway. Also, I don’t put a price on anything that concerns my mental well-being. I am not on the edge of going insane; that is not what I mean. I just mean that some things are more important and should not be valued the same way you value unimportant things. It just so happens that at the moment, I can’t think of very many unimportant things to exemplify, or else I would have listed them instead of just saying “unimportant things”. The first unimportant thing I thought of was hardware for a decent laptop, because my laptop is sitting right in front of me. But then I remembered that having a highly-functional laptop is critical to my usual workflow and by extension, my mental well-being. The next thing I thought of was ice cream. There is an ice cream shop right down the street from the apartment complex where I live. I personally think that they are overpriced for what they produce, especially since their competitor sells scoops of ice cream for one dollar on the other side of town. If ice cream is as essential to your mental health as a laptop is to mine, then perhaps you should not skimp on your ice cream budget. As for me, I think I would be happier to know that my ice cream had only cost me one dollar.

Anyway, I decided that this outdoor concert was something that concerned my mental health and was worth the time, however little I thought my time was worth. The concert was free, after all. I did not like the music the bands played, although that is probably my fault. I like music with great lyrical density, sensual vocals, and tangible instrumentation. The electronic music at the concert was not this. What I mean by tangible instrumentation is that I prefer music where all of the instruments are identifiable, and in some cases, reproducible with nothing more than a trained ear. Some kinds of modern music use computers to distort and mix these identifiable sounds into exciting new tones that have never before been heard. Others try to accentuate and clarify sounds in order to make them more identifiable. I have no problem with the latter, even if entire guitar tracks can today be completely synthesized from recorded samples. So long as the deception is convincing, I am not concerned. To me, this kind of music is the best evidence of humanity we can get on demand today. On the other hand, there is car music, which I listen to exclusively in the car where there are usually other people as well. Car music is what the popular young people radio stations play. It is a good thing that radio stations love playing car music, or else we would need to find another place to get it.

Many people at the concert were not paying attention to the music. They were sitting toward the back underneath the trees where they could talk or smoke, safely hidden from the sight of the one police officer who, that night, had the misfortune of being assigned to our concert. There are few things worse than not being able to sit down at an outside concert where everyone else is sitting down and enjoying the ambiance. Berkeley has a lot of night light so the sky is not very dark, but the concert glade had an excellent unrestricted view of the sky above. I lay down on the grass as well. With my friend’s phone, I identified the summer triangle--Deneb, Altair, and Vega--three bright stars that make a triangle. I am not familiar with celestial names and places. I always thought it was more worthwhile to spend time understanding celestial concepts rather than the names that Englishmen gave to those objects of brightest apparent magnitude as viewed from our planet Earth. The latter is a much cooler thing to know though. I only know of those three stars because they played an interesting part in a TV show I saw almost 6 years ago. I felt a moment of unexplainable sadness about seeing the actuality of those summer stars after learning about them so long ago.

Three of my favorite fields of study are computer science, astronomy, and biology. I always thought this was interesting (the fact, not the subjects) because they deal with everything we use, everything we see, and everything we are respectively. These actual fields may not be so broadly defined, but if you stretch your imagination a little, it sounds almost right. Learning about these fields of study helped to shape my philosophy and world view as I grew up. Astronomy was the earliest influencer of these three. You will find that nearly everyone who enjoys casual astronomy has a detached and laid-back approach to things that might seem very important.

I want to share with you two interesting things about astronomy that I think are more important than they’re given credit for. The first is the solar plane. In diagrams and illustrations, our planets are always drawn with their concentric orbits in a disk, like the ridges on a frisbee. On the other hand, we know that space is three-dimensional. The fact that all of the planets orbit in a roughly disk-shaped region must seem quite unusual1! Most people assume this just had to be the case, which is actually true for a lot of situations, but the reasons aren’t as obvious as they would seem. The second thing is that, while we know a lot of things about our solar system and our interstellar neighbors, we don’t know so much about the space in between. Past the orbit of Pluto lies a rough sphere of interesting random things that deserve some more attention. These include the Kuiper belt, another disk-shaped region past the orbit of Pluto, and the Oort Cloud, which encompasses all objects that the Sun influences gravitationally. There are countless orbiting rocks, comets, clouds of hydrogen, and other unknown things all in this region. There are also several imaginary boundaries separating our solar system from interstellar space that define the regions where solar wind and interstellar forces are balanced out and things like that2.

In high school biology, I learned that almost all of the visible parts of a person are just made up of layers of keratin. I learned a few other things, but the keratin thing was the most intriguing. In the back of their minds, everyone is somewhat aware that people are just made of their constituent fluids and tissues and that somehow the orchestration of all those parts makes interesting functional beings. However, there are some people who just seem so different from regular people that you’d refuse to believe all their visible parts were made up of the same proteins all of our visible parts are. It seems like a fantastic mission. Proving yourself to be more than keratin, that is. Talking to them, it almost makes you forget all the things you know about ancient geology and human anatomy, because all of these sciences seem so unbelievable in their light.

I could recount for you all of the cosmic miracles that made Earth into the fertile life-bearing oasis it is today. Astronomy and biology sort of overlap in that regard. It sounds kind of silly, but after so many years under the oppression of contextualizing science, I just want to forget that any of it was ever true. It gets more and more difficult to separate the irrelevant from the immediate when your brain keeps reminding you of the way things came to be. Of stars and humans, which is the irrelevant? That is hard to say. I enjoy deception oh so much; sometimes I just like to look at the pretty stars and forget about them both.

  1. In fact, there is a very good reason why this is usually the case, but it’s also true that our solar plane is tilted by around 63 degrees when compared to the galactic plane. Source. ↩︎
  2. This iconic picture comes to mind. ↩︎