
Does your file syncing tool have these features?

It’s a common misconception that building a file syncing tool like Dropbox is easy. File systems have a lot of features that most people don’t use or think about regularly. Yet every time a new cloud storage company launches, everyone looks only at how much free space it offers.

Here are a few features, other than capacity, that your file syncing provider should offer. I’ll also describe how Dropbox handles each of them, based on my experience as a user.

Case-sensitive vs case-insensitive filesystems

OS X is notoriously case-insensitive when resolving file names. However, HFS+ is case-preserving: it remembers the case of file and directory names, so if you create a file named “sCReensHOTs”, it should stay that way. Windows’s NTFS is also case-preserving, and by default Windows treats “foo.TXT” and “foo.txt” as the same file. NTFS itself can store names that differ only in case (for example, in directories with per-directory case sensitivity enabled), but not all applications correctly handle multiple files that differ only in case. Linux is case-sensitive, as POSIX requires.

Dropbox takes the OS X stance when it comes to case-sensitivity. It forbids two names that differ only in case, and it will sync the case of names (in most cases). Google Drive allows any number of duplicate names, so this is a moot point.
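Here’s a minimal Python sketch of the check a syncing tool needs before pulling files from a case-sensitive machine onto a case-insensitive one: it groups the names that would collide. The function name is my own. It also assumes `str.casefold()` is a good enough stand-in for the file system’s own case rules, which isn’t quite true. HFS+, for example, also normalizes Unicode.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def case_collisions(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group names that differ only in case (hypothetical helper).

    casefold() is a more aggressive lower() meant for caseless matching;
    real file systems use their own case tables, so this is approximate.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Keep only the keys that more than one name maps to.
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


print(case_collisions(["foo.txt", "foo.TXT", "bar.txt", "Screenshots"]))
# {'foo.txt': ['foo.TXT', 'foo.txt']}
```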

File permissions (especially the executable bit)

UNIX permissions (read, write, execute) are supported on OS X and Linux, but not on Windows. Neither NTFS nor FAT supports them; NTFS uses access control lists instead. As far as I know, Dropbox will sync the user’s executable bit (whenever possible), but will ignore all of the other permission bits. Most users won’t ever need these permission bits. The ones that do, well, they usually only care about the user executable bit (0100) anyway, except possibly in server contexts.
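You can sync just the user executable bit without clobbering the rest of the mode. This is a sketch with hypothetical helper names. It assumes the sender records `user_exec_bit()` alongside the file and the receiver calls `apply_user_exec_bit()`. On Windows, `os.chmod()` can only toggle the read-only flag, so the bit simply gets lost there.

```python
import os
import stat


def user_exec_bit(path: str) -> bool:
    """True if the owner-execute bit (S_IXUSR, octal 0100) is set. Hypothetical helper."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


def apply_user_exec_bit(path: str, executable: bool) -> None:
    """Set or clear only the owner-execute bit, leaving every other bit alone."""
    mode = stat.S_IMODE(os.stat(path).st_mode)  # permission bits only
    if executable:
        mode |= stat.S_IXUSR
    else:
        mode &= ~stat.S_IXUSR
    os.chmod(path, mode)
```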

File modification date

Every file system I know of (including FAT32) stores file modification times. Many of them also store access times, inode change times, and creation (“birth”) times, but the most important of these is probably just the modification time. Dropbox will sync the modification time. I haven’t needed any of the other timestamps, but it’s possible that Dropbox syncs them as well.
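Syncing the modification time mostly means stamping the downloaded copy with the sender’s timestamp after writing it. Otherwise every synced file looks brand new. This is a sketch with hypothetical helper names. Nanoseconds avoid float rounding, although the receiving file system may still round: FAT32 stores modification times only to 2-second precision.

```python
import os


def local_mtime_ns(path: str) -> int:
    """Modification time to send along with the file's contents (hypothetical helper)."""
    return os.stat(path).st_mtime_ns


def apply_remote_mtime(path: str, mtime_ns: int) -> None:
    """Stamp a freshly downloaded file with the sender's modification time.

    The access time is kept as whatever the local file system recorded.
    """
    atime_ns = os.stat(path).st_atime_ns
    os.utime(path, ns=(atime_ns, mtime_ns))
```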

Extended attributes

HFS+ supports extended attributes, which are used for things like Finder.app’s flags and the OS X File Quarantine for downloaded files. Linux also supports extended attributes, but I’ve never used them directly. On systems with SELinux enabled, Linux also stores an SELinux context with every file (in the security.selinux extended attribute).

HFS+ supports resource forks, although I don’t think they’re widely used anymore. HFS+ also supports the “hidden” and “locked” flags.

Dropbox will sync extended attributes where possible, according to what I’ve read on the internet. In fact, Dropbox itself uses extended attributes (on OS X and Windows) to store metadata. Dropbox may attempt to sync some of this other metadata. But honestly, most of these flags aren’t useful to the typical user, and no portable program should depend on them.
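Copying extended attributes is a loop over names, with a lot of tolerance for failure. This sketch uses Python’s `os.listxattr`/`getxattr`/`setxattr`, which exist only on Linux. The helper names are my own. Skipping the host-specific `security.*` namespace, where SELinux contexts live, is my assumption, not something I know Dropbox does.

```python
import os
from typing import Dict


def read_xattrs(path: str) -> Dict[str, bytes]:
    """Collect the extended attributes we can read (hypothetical helper)."""
    try:
        names = os.listxattr(path)
    except OSError:
        return {}  # file system without xattr support
    attrs: Dict[str, bytes] = {}
    for name in names:
        try:
            attrs[name] = os.getxattr(path, name)
        except OSError:
            continue  # vanished, or needs privileges we don't have
    return attrs


def write_xattrs(path: str, attrs: Dict[str, bytes]) -> Dict[str, bytes]:
    """Apply attributes on the receiving side; return the ones skipped or rejected."""
    skipped: Dict[str, bytes] = {}
    for name, value in attrs.items():
        if name.startswith("security."):
            # e.g. security.selinux: belongs to the local host's policy
            skipped[name] = value
            continue
        try:
            os.setxattr(path, name, value)
        except OSError:
            skipped[name] = value  # unsupported namespace or file system
    return skipped
```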

Symbolic links and other special files

Symbolic links are fairly common on OS X and Linux. Software that improperly handles symbolic links is even more common. Windows has Explorer shortcuts, but those are ordinary .lnk files rather than symbolic links (NTFS has supported real symbolic links since Windows Vista). OS X also has Finder aliases, which are essentially just normal files.

Dropbox will follow symbolic links and sync the contents of their targets. This is very useful if you want to sync an existing folder to Dropbox, but you don’t want to move it into your Dropbox folder. However, this behavior sucks if you want the links themselves synced as links rather than followed (they might even point to non-existent targets). If you create a cycle of symbolic links, Dropbox will follow it to a depth of 3.
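If you do follow links, you need cycle detection. Here’s a sketch of a directory walk that follows symlinks but remembers each directory’s (device, inode) pair, so a loop is visited once rather than to a fixed depth. That’s a different policy from the depth limit described above, and the function name is hypothetical.

```python
import os
from typing import Iterator, Set, Tuple


def walk_following_links(root: str) -> Iterator[str]:
    """Yield regular files under root, following symlinks (hypothetical helper).

    Directories are identified by (st_dev, st_ino) of their target, so a
    symlink cycle, or a second link to an already-visited directory, is
    visited only once instead of recursing forever.
    """
    seen: Set[Tuple[int, int]] = set()

    def visit(directory: str) -> Iterator[str]:
        st = os.stat(directory)  # os.stat follows symlinks
        key = (st.st_dev, st.st_ino)
        if key in seen:
            return
        seen.add(key)
        with os.scandir(directory) as entries:
            for entry in entries:
                # Both checks return False for dangling links, so they're skipped.
                if entry.is_dir(follow_symlinks=True):
                    yield from visit(entry.path)
                elif entry.is_file(follow_symlinks=True):
                    yield entry.path

    yield from visit(root)
```

A dangling link is silently skipped here. A real tool would probably want to report it.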

OS X and Linux filesystems also support hard links, fifos, UNIX sockets, and device files. I haven’t tried syncing any of these to Dropbox. But I’m pretty sure hard links will be treated as normal files (they won’t continue to be linked when synced to a different computer), and the rest of these files will be ignored (marked as “unsyncable” with the red X). That’s probably okay for most users.
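A syncing tool can sort these special files out with a single `lstat()` call. This is a hypothetical classifier, not Dropbox’s logic. For a regular file, a link count above 1 is the only hint that it’s hard-linked.

```python
import os
import stat


def classify(path: str) -> str:
    """Describe how a syncing tool might treat a directory entry (hypothetical)."""
    st = os.lstat(path)  # lstat: inspect the link itself, not its target
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        # st_nlink > 1: other names share this inode. Syncing each name
        # separately means the link is lost on the other machine.
        return "hard-linked file" if st.st_nlink > 1 else "regular file"
    if stat.S_ISFIFO(mode):
        return "unsyncable: fifo"
    if stat.S_ISSOCK(mode):
        return "unsyncable: socket"
    if stat.S_ISCHR(mode) or stat.S_ISBLK(mode):
        return "unsyncable: device"
    return "unsyncable: unknown"
```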

OS X Packages

Package files are like normal directories, but they appear like single files in Finder. They’re primarily used for containing native OS X applications in /Applications, but they’re also used by Pages.app, Numbers.app, and Keynote.app to store documents (this may become an increasingly popular trend with first-party OS X apps). Most file syncing tools that access the file system via libc will be able to sync Packages like regular directories. But Packages are supposed to be updated transactionally, so an inconsistent package view on a recipient’s computer may cause issues.

Dropbox just treats package files as regular directories and won’t update them transactionally, as far as I know. I believe iCloud supports transactional updates to package files, but only if you’re building a native OS X app that uses its iCloud APIs. Either way, I haven’t heard of anyone actually running into problems because of non-transactional updates, so I suppose app developers just design around this issue.
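For comparison, here’s roughly what a more careful package update could look like on the receiving side: build the new version in a sibling staging directory, then swap it in with renames. It’s a sketch with a hypothetical helper name, and it isn’t fully atomic because the package path briefly doesn’t exist between the two renames. Linux’s `renameat2()` with `RENAME_EXCHANGE` and macOS’s `renamex_np()` with `RENAME_SWAP` can swap two directories atomically, but Python’s standard library doesn’t expose them.

```python
import os
import shutil
import tempfile
from typing import Dict


def replace_package(package_dir: str, files: Dict[str, bytes]) -> None:
    """Swap in a new version of a package directory (hypothetical helper).

    files maps paths relative to the package root to their contents.
    """
    package_dir = os.path.abspath(package_dir)
    parent = os.path.dirname(package_dir)
    # Stage next to the package so the renames stay on one file system.
    staging = tempfile.mkdtemp(dir=parent, prefix=".staging-")
    os.chmod(staging, 0o755)  # mkdtemp creates the directory as 0700
    for rel_path, data in files.items():
        target = os.path.join(staging, rel_path)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as fh:
            fh.write(data)
    backup = None
    if os.path.exists(package_dir):
        backup = staging + ".old"
        os.rename(package_dir, backup)
    # Between these two renames the package path briefly doesn't exist.
    os.rename(staging, package_dir)
    if backup is not None:
        shutil.rmtree(backup)
```

Called as `replace_package("Doc.pages", {"index.xml": b"..."})` (a made-up document name), it leaves either the old tree or the new one in place, never a half-updated mix.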

Efficient filesystem event subscription

A core part of any file syncing tool is the part that notifies the application when a file changes and needs to be synced. OS X supports FSEvents, Linux supports inotify, and Windows has ReadDirectoryChangesW. However, file change notifications are only half of the story.

A file syncing tool also needs to know when a file stops changing. A large write, for example, could take a long time. Syncing too early could cause an inconsistent version of the file to be committed to the server. If you download a large file to your Dropbox over a slow connection, the file may undergo many file system events until it is fully complete.

From what I can tell, inotify is less efficient than FSEvents, simply because inotify requires you to set up watches on every subdirectory in your synced folder, whereas FSEvents will watch subdirectories for free. But neither API will really tell you when a file has stopped changing. It’s up to the file syncing tool to decide when to start uploading new versions of a continuously changing file.
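Since neither API says “done”, a common heuristic is to wait for a quiet period: treat a file as finished once its size and modification time stop changing for a few seconds. Below is a polling sketch; the function name and its parameter defaults are hypothetical. A real tool would run this check when FSEvents or inotify reports activity rather than polling on its own, and a paused download can still look finished.

```python
import os
import time


def wait_until_stable(path: str, quiet_seconds: float = 2.0,
                      poll_interval: float = 0.5, timeout: float = 600.0) -> bool:
    """Return True once (size, mtime) hasn't changed for quiet_seconds.

    Returns False if the file disappears or keeps changing until timeout.
    """
    deadline = time.monotonic() + timeout
    last_sig = None
    last_change = time.monotonic()
    while time.monotonic() < deadline:
        try:
            st = os.stat(path)
        except FileNotFoundError:
            return False
        sig = (st.st_size, st.st_mtime_ns)
        now = time.monotonic()
        if sig != last_sig:
            last_sig, last_change = sig, now  # still changing: restart the clock
        elif now - last_change >= quiet_seconds:
            return True
        time.sleep(poll_interval)
    return False
```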

Based on my own experience, Dropbox properly handles file downloads and large file writes.

Data race safety

What happens if your file syncing tool starts uploading a file, and then it changes again? At the very least, you should not end up with an old version of the file in the cloud. You should definitely not end up with an old version overwriting your local copy. It would be nice to cancel the upload of the old version, but that’s not strictly necessary, as long as the new version is uploaded afterward. (If both versions are uploaded, what if the upload of the new file finishes first? An outdated copy should never overwrite a newer one.)
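Two guards cover most of this. On the client, read the file and check that its size and modification time didn’t change during the read. On the server, accept an upload only if it carries a newer version than the stored copy. This sketch assumes a hypothetical per-file version counter that the client bumps on every local change. The class and function names are my own. The mtime check can also miss changes that land within the file system’s timestamp granularity.

```python
import os
import threading
from typing import Dict, Optional, Tuple


def read_consistent(path: str) -> Optional[bytes]:
    """Read a file, or return None if it changed while we were reading it."""
    before = os.stat(path)
    with open(path, "rb") as fh:
        data = fh.read()
    after = os.stat(path)
    if (before.st_size, before.st_mtime_ns) != (after.st_size, after.st_mtime_ns):
        return None  # torn read: wait for the file to settle and try again
    return data


class VersionedStore:
    """Toy server (hypothetical): an upload lands only if its version is newer."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._files: Dict[str, Tuple[int, bytes]] = {}

    def put(self, name: str, version: int, data: bytes) -> bool:
        with self._lock:  # the check and the write must be atomic together
            current = self._files.get(name)
            if current is not None and current[0] >= version:
                return False  # a slow upload of an older version finished late
            self._files[name] = (version, data)
            return True

    def get(self, name: str) -> Optional[Tuple[int, bytes]]:
        with self._lock:
            return self._files.get(name)
```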

Conflict resolution

If two clients both upload different changes to the same file, the file syncing tool should have a good plan to resolve the conflict. Last I checked, Dropbox just saves both versions with different names and lets you figure it out. This works if the conflicting file is a Word document. But what if the conflict occurs inside .git/objects/? Or inside an OS X Package? Users don’t usually explore the inside of those and won’t notice the conflict until something breaks. Conflict resolution should also gracefully handle 3 or more conflicting copies.
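Generating names for conflicting copies is the easy part, but the naming scheme should cope with any number of them. This sketch uses a hypothetical naming format and helper. It picks the first free name, so a third or fourth conflict just gets a counter. It does nothing about the .git problem, of course.

```python
import os
from typing import Set


def conflict_name(path: str, device: str, taken: Set[str]) -> str:
    """Pick an unused name for a conflicting copy of path (hypothetical format).

    If "x (laptop's conflicted copy).txt" is taken, fall back to
    "x (laptop's conflicted copy 2).txt", then 3, and so on.
    """
    stem, ext = os.path.splitext(path)
    candidate = f"{stem} ({device}'s conflicted copy){ext}"
    n = 2
    while candidate in taken:
        candidate = f"{stem} ({device}'s conflicted copy {n}){ext}"
        n += 1
    return candidate
```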

Keeping multiple copies is probably the “right answer”. But because this causes issues with machine-managed files, I still avoid using Git with Dropbox. The only things in my Dropbox are static documents and media (PDFs, images, zip archives, plain text files, etc.).

Differential updates

When only a single byte changes in a large file, the file syncing tool shouldn’t need to upload the entire file again. Dropbox seems to handle small changes efficiently. In the best case, you should be able to sync something like a virtual machine disk image with no problem, after the initial upload is complete.
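One common way to do this is to split the file into fixed-size blocks, hash each one, and upload only the blocks whose hashes changed. This is a sketch of that technique, not Dropbox’s actual protocol, and the 4 MiB block size is an assumption. Fixed blocks work well for in-place edits like a VM disk image. An insertion, though, shifts every later block, which is why rsync uses a rolling checksum instead.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4 * 1024 * 1024  # assumed block size (4 MiB)


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """SHA-256 of each fixed-size block of data."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old: List[str], new: List[str]) -> List[int]:
    """Indices of blocks in the new version that differ from the old one."""
    return [i for i, h in enumerate(new) if i >= len(old) or old[i] != h]
```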

Deduplication

This feature only really matters if you’re paying for the file syncing tool’s infrastructure. A good file syncing tool should be able to deduplicate identical or similar files if they’re uploaded multiple times. For a multi-user installation, file syncing tools must also make sure not to reveal the existence of files in other users’ folders via timing attacks.
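Here’s a sketch of content-addressed deduplication that avoids the existence leak; the class and method names are hypothetical. Identical bytes are stored once, but a client may skip an upload only for data it uploaded itself, and the server always hashes the bytes it receives. The price is that two users with the same file both pay the upload bandwidth.

```python
import hashlib
from typing import Dict, Set


class DedupStore:
    """Content-addressed storage shared by every user (hypothetical)."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}      # sha256 hex -> contents
        self.owners: Dict[str, Set[str]] = {}  # sha256 hex -> users who uploaded it

    def needs_upload(self, user: str, digest: str) -> bool:
        """Skip the upload only if this user already sent these bytes.

        Answering based on other users' files would let anyone probe
        whether a given file exists somewhere on the server.
        """
        return user not in self.owners.get(digest, set())

    def upload(self, user: str, data: bytes) -> str:
        # Hash on the server; never trust a client-supplied hash alone.
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # stored once, however many uploads
        self.owners.setdefault(digest, set()).add(user)
        return digest
```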

Dropbox supports deduplication, but its deduplication could be abused until 2011 (look up “Dropship” online).

Transfer resumption

Computers lose their internet connections for various reasons (power outage, network failure, laptop standby). A good file syncing tool needs to be able to tolerate failures and resume uploads when the connectivity resumes. Dropbox seems to handle this well.
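The usual trick is to upload in chunks and, after a failure, ask the server how many bytes it already has. This is a toy, in-process sketch. The server class, its `offset()`/`append()` methods, and the chunk size are hypothetical, and real code would back off between retries instead of retrying immediately.

```python
class ResumableUploadServer:
    """Toy stand-in for a server that remembers partial uploads (hypothetical)."""

    def __init__(self) -> None:
        self.received = bytearray()

    def offset(self) -> int:
        return len(self.received)

    def append(self, offset: int, chunk: bytes) -> None:
        if offset != len(self.received):
            raise ValueError("offset mismatch; call offset() and resume from there")
        self.received += chunk


def upload(data: bytes, server: ResumableUploadServer,
           chunk_size: int = 4096, max_attempts: int = 5) -> None:
    for _ in range(max_attempts):
        try:
            pos = server.offset()  # resume from whatever already arrived
            while pos < len(data):
                chunk = data[pos:pos + chunk_size]
                server.append(pos, chunk)
                pos += len(chunk)
            return
        except ConnectionError:
            continue  # connection dropped; retry from the server's offset
    raise ConnectionError(f"upload incomplete after {max_attempts} attempts")
```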

Friendly to corporate firewalls

The most draconian corporate firewalls will allow only DNS, HTTP, and HTTPS to the public internet. If your file syncing tool wants to support the most users out of the box (without needing to ask IT for firewall exceptions), then HTTPS is your answer. If you’re BTSync, then you could maybe get away with some home-grown encryption and non-standard ports.

Dropbox performs all data transfers over HTTPS. Their well-analyzed LAN sync feature runs on port 17500 with some L2 broadcast magic thrown in, but that doesn’t really need to connect to the public internet. An HTTPS-only syncing protocol is also useful for public Wi-Fi networks. Some public networks will only allow DNS and web traffic to pass through, in the interest of security.

Video transcoding

If you want to share a video or just watch your own videos, it’s useful for your cloud provider to provide video transcoding, so you can stream it on a desktop web browser or smartphone without needing to download it first. Bonus points for avoiding Flash Player for this. However, this is a big ask, especially for some smaller open-source options.

Dropbox supports video transcoding, but only at a basic level. Their video player doesn’t offer different quality levels like Google Drive does. From my experience, videos usually look pretty terrible when streamed directly from dropbox.com. Maybe they’ll fix this in the future.

Bonus features

There are a lot of extra features that a file syncing tool can provide:

  • Syncing performance, especially with large numbers of files
  • Version history
  • Web-based uploader for public use
  • Direct linking to images
  • MIME types suitable for hosting websites
  • Pleasant user interface
  • Native applications for Windows, OS X, Android, iOS, and Web (and maybe even Linux)
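For the website-hosting case, the server mostly needs to map file extensions to a Content-Type. Here’s a minimal sketch using Python’s `mimetypes` module. The helper name and the `application/octet-stream` fallback are my choices.

```python
import mimetypes


def content_type_for(filename: str) -> str:
    """Content-Type to serve a synced file with (hypothetical helper)."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"  # unknown extension: raw bytes
```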

Conclusion

A file syncing tool that can upload and download files isn’t much better than just using Amazon S3. There are plenty of other important factors to consider when evaluating your options.