Tumblr’s Phishing Protection Code

Roger Chen — Sat, 20 Apr 2013 22:30:25 +0000

At the top of every Tumblr user’s blog, there’s a piece of JavaScript inserted by Tumblr itself. In general, Tumblr is very generous about the control they give you over your blog’s appearance. They don’t insert any advertisements or enforce any global content other than a Quantcast visitor analytics script, follow/dashboard controls on the upper right, and this script. I’ve posted the first few lines here:

(function(){var a=translated_warning_string;var b=function(d){d=d||window.event;var c=d.target||d.srcElement;if(c.type=="password"){if(confirm(a)){b=function(){}}else{c.value="";return false}}};setInterval(function(){if(typeof document.addEventListener!="undefined"){document.addEventListener("keypress",b,false)}},0)})();

This isn’t particularly clever or difficult to understand, but it does utilize several good ideas in JavaScript programming, so I’d like to go over it. First of all, you’ll notice that this extract is a single line of code that’s surrounded by (function() { ... })();. Javascript treats functions as first-class citizens. They can be declared, reassigned, and called inline. There are several advantages of first-class functions in any dynamically-typed language:

The ability to create anonymous functions is particularly helpful when you don’t want to clog up your global namespace with function names that are only used in one part of your code.
They can be created dynamically using, what’s known as a closure in many languages. Closures take their variables from the environment frame in which they were created, so you can generate them on-the-fly in a loop, in an event handler, or as part of a callback.
Even if you don’t need or want any of the fancy benefits listed so far, it’s useful that local variables declared in first-class functions are destroyed when the function quits. That way, you can use names like a or _ without polluting your global namespace.

I’ve reformatted the statements inside the function here:

var a = translated_warning_string;
var b = function(d){
  d = d||window.event;
  var c = d.target||d.srcElement;
  if (c.type == "password"){
    if(confirm(a)){
      b = function(){}
    } else {
      c.value = "";
      return false
    }
  }
};
setInterval(function(){
  if (typeof document.addEventListener != "undefined"){
    document.addEventListener("keypress",b,false)
  }
}, 0)

The first line references the global translated_warning_string variable that is declared in the blog’s HTML itself. Its contents should hint at the goal of this code. (Although it looks like not all languages are supported yet.) Assigning the variable to a means that it can be reused without making the code too long, but it seems that the variable is only actually referenced once here.

Variable b gets assigned to a function in the same way we manipulated functions earlier. The d = d||window.event; is a metaphor for if d is garbage use window.event instead, or else, leave it alone. It is a handy way to specify a default value in variable assignment when you’re uncertain whether the preferred value will work or not. In this case, d defaults to window.event which is a hack for older versions of Internet Explorer. (more on this later)

I was kidding when I said later. Here’s more on it now. Skip ahead a few lines and peek over at this part of the code:

if (typeof document.addEventListener != "undefined"){
  document.addEventListener("keypress",b,false)
}

The function we declared earlier, b, is used as the event listener for the document’s keypress event. It is triggered whenever someone presses a key while on the webpage (actually a little more complicated than this). Event listeners are functions that are supposed to be called with a single parameter, the event’s event parameter, which contains information about the event that triggered the call. This is not the case in older versions of Internet Explorer, hence the d = d||window.event; statement earlier.

When a keypress happens, the event actually travels down the tree to the target element first, from the root to the leaf (document → html → body → …). It gives each of these elements a chance to “capture” the event and handle it before it reaches its destination. This is known as event capturing, and is specified by the third argument to EventTarget.addEventListener. The other more commonly used and default behavior is to let the event bubble up if the destination doesn’t handle it. The event will bubble up until one of the destination’s ancestors catches it and stops the bubbling.

The code examined here chooses not to capture the event on the way down. (If it did, then all events would have to go through this handler.) The trade-off to not interfering with events that already have keypress handlers is that the behavior can be easily overridden by rogue sites (although this is easily detectable).

Back to where we were, we now see that variable d is an event, so c is its EventTarget, and also a dom-tree node. The variable-assignment-with-default trick is used again here.

if (c.type == "password"){
  if(confirm(a)){
    b = function(){}
  } else {
    c.value = "";
    return false
  }
}

If the event’s source is a password input, raise a confirmation dialog with the text stored in a, the translated confirmation string. If the confirmation passes, then b is set to an empty anonymous function, function(){}. Recall that b was previously defined to be another anonymous function and was also used in our event listener. Clearing this variable after the first successful confirmation prevents the script from prompting the user more than once, which is a good idea.

If the user rejects the confirmation, then the password input is cleared and the keyevent is halted with return false. Note that JavaScript dictates that the last semicolon in a block is optional, although it is usually good style to include it anyway.

Finally, note that the event listener is wrapped inside a setInterval(function() { ... }, 0). setInterval is a built-in Javascript function that runs a piece of code regularly. The second parameter is the delay in between calls, specified in milliseconds. In this case, 0 milliseconds is used, but the web browser imposes a lower limit on this delay.

The function’s contents checks the type of document.addEventListener. This function is part of the DOM that is set up near the beginning of every page load. Once the event infrastructure is available, the listener is attached. A more common way to achieve the same affect is to attach an event listener to the window object’s onload function, which is usually achieved through one of many Javascript libraries, although this solution is appropriate in the context of this script.

Anti-fraud Detection in Best of Berkeley

Roger Chen — Thu, 11 Apr 2013 08:02:53 +0000

Near the end of Spring Break, I helped build the back-end for the Daily Cal’s Best of Berkeley voting website. The awards are given to restaurants and organizations chosen via public voting over the period of a week. Somewhere during the development, we decided it’d be more effective to implement fraud detection rather than prevention. The latter would only encourage evasion and resistance while with the former, we could sit idle and track fraud as it occurred. It made for some interesting statistics too.

This first one’s simple. One of the candidates earned a highly suspicious number of submissions where they were the only choice selected in any of the 39 categories and 195 total candidates. Our fraud-recognition system aggregated sets of submission choices and raised alerts when abnormal numbers of identical choices appeared. The graph shows the frequency of submissions where only this candidate was selected and demonstrates the regularity and nonrandomness of these fraudulent entries.

Combined with data from tracking cookies and user agents, it’s safe to say that these submissions could be cast out.

The system also calculated and analyzed the elapsed time between successive submissions. It alerted both for abnormal volume, when a large number of submissions were received in a small time, and for abnormal regularity, when submissions came in at abnormally regular intervals. From the graph, it looks like it takes about 10.5 to 12 seconds for the whole process: reload, scroll, check vote, scroll, submit.

The calculations for this alert were a bit trickier than I expected. At first, I thought of using a queue where old entries would be discarded:

s := queue()
for each sort_by_time(submission):
  s.add(submission)
  s.remove_entries_before(5 minutes ago)
  if s.length > threshold:
    send_alert
    s->clear

This doesn’t work very well. Each time the queue length exceeded the threshold, it would flush the queue and notify that threshold submissions were detected in abnormal volume. So, I added a minimum time-out before another alert would be raised.

s := queue()
last_alert := 0
for each sort_by_time(submission):
  s.add(submission)
  s.remove_entries_before(5 minutes ago)
  if s.length > threshold and now - last_alert > threshold:
    send_alert
    s->clear
    last_alert := now

The regularity detector performed a similar task, except it would store each of the time differences in a list, sort it, and then run with a smaller threshold (around 0.3 seconds). Ideally, observations about true randomness suggest that these bars should be more or less horizontal, but this is hardly the case. After these fraudulent entries were removed, this particular candidate was left with a paltry 70-some votes, about 5% of its pre-filtered count.

Code.RogerHub » fraud

Tumblr’s Phishing Protection Code

Anti-fraud Detection in Best of Berkeley