Solving the Filetyping Puzzle

How BeOS frees users from limited filetyping schemes

Scot Hacker, 7/23/00

The topic du jour is the humble -- but surprisingly involved -- question of filetyping. Specifically, what happens when you double-click on a file? How does the OS keep track of document filetypes? How does the OS know in which application to launch certain filetypes? How does the user control filetypes and application associations, either on a system-wide or on a single-file basis? How can an untyped file be reliably identified by the OS? Why have virtually all other operating systems to date blown it where filetyping is concerned?

There are a number of different-but-related problems to be solved here. Building a system that is transparently easy to use, that doesn't shield power users from power options, and that imposes no arbitrary limitations involves some fairly complex design and engineering considerations.

I have used Windows, Linux, MacOS, and BeOS extensively, and am convinced that only BeOS does filetyping right. In fact, the BeOS filetyping solution is a perfect example of the success of Be's primordial strategy: To re-examine the failings and crude assumptions of OSes past, to learn from their mistakes, and to do it right this time.

Those who have been reading my stuff for a long time may consider this topic old hat, but Free BeOS has brought a ton of new users to the platform, and recent discussions on beusertalk have reminded me that BeOS technology many of us now take for granted is still somewhat alien to new users.

Sins of the Past

In DOS/Windows (all flavors) and Unix/Linux, filetyping is handled almost entirely by filename extensions, typically three or four letters long. A file ending in .doc will open in Word or WordPad, a .txt file in Notepad or another preferred text editor, and so on. It's a simple and logical system, but it has several disadvantages. First, new users have to learn these extensions. This problem is ameliorated somewhat by Save dialogs that add extensions automatically, and later versions of Explorer hide filename extensions by default (which just masks the problem rather than fixing it).

Second, a filename without an extension can't be identified properly by Windows or Unix. The user is not at liberty to name a text file "meeting notes" and expect to have it opened in their application of choice when double-clicked. This problem crops up all the time when Windows users receive files from MacOS or BeOS users. Windows simply doesn't know what to do with no-extension filenames, and has no mechanism to figure it out. It's an arbitrarily imposed limitation, it's not modern, and it's overly restrictive. It's just not good OS design.
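To make the limitation concrete, here is a minimal Python sketch of purely extension-based lookup. The `ASSOCIATIONS` table and the `app_for` function are hypothetical stand-ins, not the Windows registry or any Unix desktop's actual API; the point is only that a name with no extension leaves nothing to look up.

```python
import os
from typing import Optional

# Hypothetical extension-to-application table, standing in for the
# associations an extension-based system keeps (not a real registry API).
ASSOCIATIONS = {
    ".doc": "WordPad",
    ".txt": "Notepad",
    ".html": "Web Browser",
}


def app_for(filename: str) -> Optional[str]:
    """Pick a handling application from the filename alone."""
    _, ext = os.path.splitext(filename)
    # The file's contents are never consulted: no extension, no answer.
    return ASSOCIATIONS.get(ext.lower())


print(app_for("agenda.txt"))     # Notepad
print(app_for("meeting notes"))  # None
```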

Finally, the only way to change the type of a file in Windows or Unix is to rename it -- an unpleasant task if you've got thousands of files scattered through various subdirectories. And what if you want to change the associated application for one directory full of HTML files, while leaving associations for HTML on the rest of the system untouched? Again, Windows and Unix do not allow this kind of workflow freedom.

In MacOS, filetype information is stored with each file as a pair of four-character codes, a type code and a creator code, kept in the filesystem's metadata rather than in the filename. This is quite a bit better, as it allows users to name files elegantly, without having to learn filename extensions. Unfortunately, MacOS makes a gross assumption: that the app that created a file is the best app to launch it in. But what if you create your JPEGs in Photoshop and want to view them quickly in a fast-loading image viewer, without having to launch the kitchen sink?

This assumption can also result in some fairly ridiculous situations. For example, what if you've got some HTML files you created in SimpleText, others created in BBEdit, others saved from Netscape Navigator, and still more created in MS Word? Now you've got a pile of HTML files that logically should share the same type (and probably the same association), but that actually bear four different icons and open in four different apps when double-clicked. Yes, there are workarounds, but the fact is that the MacOS filetyping system is brain-dead too. It assumes too much and puts too little power in the hands of the user.
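A rough Python model of creator-based launching shows how the four HTML files end up in four applications. The four-character creator codes here are made up for illustration (real applications register their own codes), and `MacFile` and `app_to_launch` are hypothetical names, not a Mac Toolbox API.

```python
from dataclasses import dataclass


@dataclass
class MacFile:
    name: str
    file_type: str  # four-character type code
    creator: str    # four-character code of the app that created the file


# Made-up creator codes for illustration; real applications register
# their own four-character codes.
CREATOR_APPS = {
    "SIMP": "SimpleText",
    "BBED": "BBEdit",
    "NETS": "Netscape Navigator",
    "WORD": "MS Word",
}


def app_to_launch(f: MacFile) -> str:
    # The creator code, not the kind of document, decides the application.
    return CREATOR_APPS[f.creator]


pages = [
    MacFile("index.html", "TEXT", "SIMP"),
    MacFile("about.html", "TEXT", "BBED"),
    MacFile("news.html", "TEXT", "NETS"),
    MacFile("faq.html", "TEXT", "WORD"),
]

for page in pages:
    print(page.name, "->", app_to_launch(page))
# In this model every file has the same type code, yet four apps are used.
```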

The BeOS Panacea

In BeOS, meta-data can be stored in the "attributes" of a given file or filetype. I've described attributes many times before, and their power extends far beyond the mere problem of filetyping. But the one attribute every file on a BeOS system is expected to have is BEOS:TYPE -- the filetype attribute. The contents of BEOS:TYPE are an official or unofficial MIME type -- the same kind of MIME types used to identify content on the Internet, such as audio/x-mpeg or text/html. These MIME types also have "Friendly names" and descriptions, so you don't have to remember the awkward MIME types if you don't want to.
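As a minimal sketch, assuming a file can be modeled as a name plus a dictionary of string attributes, the type lookup below never looks at the filename. This is a simplification: real BeOS attributes are typed values read through the C++ API (for example `BNode::ReadAttr()`), and `BeFile`, `describe`, and the friendly-name table are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Simplified stand-in for the MIME database: MIME type -> friendly name.
# Entries are illustrative, not copied from a real BeOS install.
FRIENDLY_NAMES = {
    "text/html": "HTML File",
    "audio/x-mpeg": "MPEG Audio",
    "text/plain": "Text File",
}


@dataclass
class BeFile:
    name: str
    attrs: Dict[str, str] = field(default_factory=dict)

    def mime_type(self) -> Optional[str]:
        # The type lives in an attribute, so the name can be anything.
        return self.attrs.get("BEOS:TYPE")


def describe(f: BeFile) -> str:
    mime = f.mime_type()
    if mime is None:
        return f"{f.name}: untyped"
    # Show the friendly name if one is registered, else the raw MIME type.
    return f"{f.name}: {FRIENDLY_NAMES.get(mime, mime)}"


notes = BeFile("meeting notes", {"BEOS:TYPE": "text/plain"})
print(describe(notes))  # meeting notes: Text File
```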

The BeOS FileTypes panel lets users establish correspondences between MIME types, "Friendly names," icons, filename extensions, preferred handling applications, and associated attributes.

One advantage of using standard MIME types is that it guarantees a good level of integration with Internet services. If a Web server sends an appropriate MIME type, that file will be correctly typed on the receiving BeOS system automatically. And of course, BeOS applications create documents with appropriate MIME types, without forcing associations. A JPEG image created in ArtPaint may still be launched in ShowImage, or anything else. The creating application assigns a reasonable MIME type, but does not assume that it should "own" that document. Another side effect of using MIME types in the filesystem is that BeOS web servers don't need to maintain separate extension tables for sending MIME headers -- they just serve files with the MIME type stored in that file's BEOS:TYPE attribute.
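The round trip can be sketched in a few lines of Python, assuming the server can read a file's attributes as a dictionary and the client writes the received type back as an attribute. `serve_headers` and `save_download` are hypothetical helpers; header parsing is simplified to dropping parameters such as `; charset=...`.

```python
from typing import Dict

DEFAULT_TYPE = "application/octet-stream"


def serve_headers(attrs: Dict[str, str]) -> Dict[str, str]:
    # Server side: the MIME type comes straight from the file's BEOS:TYPE
    # attribute, so no separate extension table is needed.
    return {"Content-Type": attrs.get("BEOS:TYPE", DEFAULT_TYPE)}


def save_download(headers: Dict[str, str]) -> Dict[str, str]:
    # Client side: copy the announced type into the new file's attributes.
    # Parameters such as "; charset=..." are dropped before storing.
    mime = headers.get("Content-Type", DEFAULT_TYPE).split(";")[0].strip()
    return {"BEOS:TYPE": mime}


sent = serve_headers({"BEOS:TYPE": "text/html"})
print(sent)                 # {'Content-Type': 'text/html'}
print(save_download(sent))  # {'BEOS:TYPE': 'text/html'}
```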

BeOS users can set associations between filetypes and their handling applications in a variety of ways. A central FileTypes panel lets you set associations between MIME types, "Friendly names," icons, and (optionally) filename extensions on a system-wide basis. In addition, the FileType Tracker add-on lets you set these parameters for a single file or group of files. This means that all HTML files (for example) will have a single association by default, probably with your favorite web browser. However, the user can quickly tell BeOS to associate a directory full of HTML files with a preferred text editor, which is a boon to webmasters. And of course, users can easily create custom MIME types with custom collections of attributes, or add additional attributes to existing filetypes for specific purposes.

The MIME type or preferred handling app can be set for any number of selected files, independent of, and overriding, the global defaults.
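Here is a minimal sketch of global defaults versus per-file overrides, assuming the per-file preference is stored in a BEOS:PREF_APP attribute as described in the summary steps later in this article. Application names stand in for the app signatures BeOS actually stores, and `set_preferred_app` and `handler` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# System-wide defaults, as set in the FileTypes panel (illustrative).
GLOBAL_PREFERRED = {"text/html": "NetPositive"}


@dataclass
class BeFile:
    name: str
    attrs: Dict[str, str] = field(default_factory=dict)


def set_preferred_app(files: List[BeFile], app: str) -> None:
    # Per-file override: touches only these files, never the global table.
    for f in files:
        f.attrs["BEOS:PREF_APP"] = app


def handler(f: BeFile) -> str:
    # A per-file preference wins; otherwise use the type's global default.
    if "BEOS:PREF_APP" in f.attrs:
        return f.attrs["BEOS:PREF_APP"]
    return GLOBAL_PREFERRED[f.attrs["BEOS:TYPE"]]


site = [BeFile(n, {"BEOS:TYPE": "text/html"}) for n in ("index.html", "faq.html")]
other = BeFile("manual.html", {"BEOS:TYPE": "text/html"})

set_preferred_app(site, "StyledEdit")
print([handler(f) for f in site])  # ['StyledEdit', 'StyledEdit']
print(handler(other))              # NetPositive
```

The design choice this illustrates is that the override lives on the files themselves, so a webmaster's directory can open in a text editor while every other HTML file on the system still follows the global setting.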

Note that nowhere in this scenario is there any requirement for documents to end with appropriate filename extensions, nor is there any assumption that the creating application is also going to be the default handling application. BeOS users get consistency, freedom from arbitrarily imposed limitations and requirements, the ability to change associations system-wide or file-by-file, and no brain-deadness. And all of this control is available from the shell as well as from the GUI, which means you can set and change filetypes and associations from within scripts if desired.

Handling Ambiguities

That's all well and good for files that already have a known type or that are created in BeOS, but what happens when you grab files over FTP (since FTP servers don't send MIME headers)? What about files retrieved from mounted FAT, NTFS, or ext2fs volumes?

The solution to this has a couple of dimensions. First of all, the central FileTypes panel lets you set up optional filename extensions to go with your filetypes. If you tell FileTypes that all files ending in .htm or .html are of type text/html, then any untyped file with one of those extensions will be typed as text/html. But how and when does this happen? A system daemon called the Registrar runs constantly in the background, seeking out untyped files when your system is idle. When it finds one, it does its best to assign the correct MIME type to the file, and its first recourse is to check whether you've set up an extension correspondence. In this way, extensions are never required, but can serve as a fallback in ambiguous situations.
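A minimal Python sketch of that extension pass follows, assuming it only fills in a missing BEOS:TYPE and never overwrites an existing one. Idle-time scheduling is not modeled, and `type_untyped` and `type_from_extension` are hypothetical names rather than Registrar internals.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Optional extension correspondences, as set up in FileTypes (illustrative).
EXTENSIONS: Dict[str, str] = {"htm": "text/html", "html": "text/html"}


@dataclass
class BeFile:
    name: str
    attrs: Dict[str, str] = field(default_factory=dict)


def type_from_extension(name: str) -> Optional[str]:
    ext = os.path.splitext(name)[1].lstrip(".").lower()
    return EXTENSIONS.get(ext) if ext else None


def type_untyped(files: List[BeFile]) -> None:
    """Hypothetical idle-time pass: fill in BEOS:TYPE where it is missing."""
    for f in files:
        if "BEOS:TYPE" in f.attrs:
            continue  # already typed; the extension is irrelevant
        mime = type_from_extension(f.name)
        if mime is not None:
            f.attrs["BEOS:TYPE"] = mime
        # Files still untyped here are left for content sniffing.


files = [
    BeFile("index.HTML"),
    BeFile("readme.html", {"BEOS:TYPE": "text/plain"}),
    BeFile("meeting notes"),
]
type_untyped(files)
print([f.attrs.get("BEOS:TYPE") for f in files])
# ['text/html', 'text/plain', None]
```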

Alternatively, you can force the Registrar to try to identify a file by right-clicking a file or set of files and selecting "Identify" from the Tracker's context menu. When you double-click an untyped file, the Registrar is called into action immediately, before the file is passed to a handling application. Finally, you can run the mimeset command from the shell, which is the same as selecting "Identify" from the Tracker.

And what if you try to Identify or mimeset a file without a filename extension, or one whose extension isn't set up in FileTypes? BeOS has an answer to that situation as well. The Registrar examines the first few lines or bytes of the file and compares them to a set of "sniffer rules," which map common file headers, or strings characteristic of known file types, to MIME types. If a match is found, the corresponding MIME type is assigned.
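The idea behind sniffer rules can be sketched in Python as a list of (priority, offset range, pattern) entries checked against the start of a file. This is not the actual rule syntax that setmime understands; it is a simplified model, and the 512-byte read limit and the `SniffRule`/`sniff` names are assumptions. The JPEG, PNG, and PDF signatures are the standard leading bytes of those formats.

```python
from typing import List, NamedTuple, Optional


class SniffRule(NamedTuple):
    priority: float    # higher wins when several rules match
    start: int         # first offset at which the pattern may begin
    end: int           # last offset at which the pattern may begin
    pattern: bytes
    ignore_case: bool
    mime: str


RULES: List[SniffRule] = [
    SniffRule(0.9, 0, 0, b"\x89PNG\r\n\x1a\n", False, "image/png"),
    SniffRule(0.9, 0, 0, b"\xff\xd8\xff", False, "image/jpeg"),
    SniffRule(0.9, 0, 0, b"%PDF-", False, "application/pdf"),
    SniffRule(0.5, 0, 64, b"<html", True, "text/html"),
]

HEAD_SIZE = 512  # assumed limit: only the first bytes are examined


def matches(rule: SniffRule, head: bytes) -> bool:
    data, pat = head, rule.pattern
    if rule.ignore_case:
        data, pat = data.lower(), pat.lower()
    # True if the pattern starts at any offset in [start, end].
    return any(data.startswith(pat, i) for i in range(rule.start, rule.end + 1))


def sniff(head: bytes) -> Optional[str]:
    head = head[:HEAD_SIZE]
    hits = [r for r in RULES if matches(r, head)]
    if not hits:
        return None  # no rule matched; the caller must give up
    return max(hits, key=lambda r: r.priority).mime


print(sniff(b"\xff\xd8\xff\xe0\x00\x10JFIF"))  # image/jpeg
print(sniff(b"  \n<HTML><head>"))              # text/html
print(sniff(b"just some words"))               # None
```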

Note: To see all of the sniff rules in use on your system, type "setmime -dump"; to see the rule for a given filetype, use "setmime -dump text/html", replacing text/html with the MIME type in question. Don't confuse mimeset and setmime -- they're different commands for historical reasons. The sniff rules can also be added to or modified with the setmime command. Type setmime --help for a complete list of options.

To summarize, the Registrar goes through these steps when you double-click a file:
  1. Does this file have a BEOS:PREF_APP attribute? If so, use that preferred application, rather than the preferred application specified by this file's MIME type (this handles situations where the user has changed the preferred app for this file without changing its MIME type).

  2. If not, does this file have a BEOS:TYPE attribute? If so, launch the file in the system-wide preferred application specified for this file's MIME type.

  3. If this file does not have a BEOS:TYPE attribute, determine its MIME type with the steps below, then return to step #2.

    1. Does this file have an extension registered in the global FileTypes panel? If so, assign it that MIME type.

    2. If no extension, or if this extension is not registered in FileTypes, read the first few lines or bytes of this file. Was a pattern found matching any of the system sniff rules? If so, assign the corresponding MIME type.

    3. If no matching pattern was found in the sniff rules, tell the user that no filetype could be determined (give up and let the user handle the situation).
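The steps above can be sketched as a single Python function. Everything here is a simplified, hypothetical model: the tables are illustrative, sniffing is reduced to a case-insensitive prefix match, and returning `None` stands in for telling the user that no type (or no handling app) could be found.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, Optional

# Illustrative stand-ins for the FileTypes database.
PREFERRED_APP = {"text/html": "NetPositive", "image/jpeg": "ShowImage"}
EXTENSIONS = {"html": "text/html", "htm": "text/html", "jpg": "image/jpeg"}
SNIFF_RULES = [(b"\xff\xd8\xff", "image/jpeg"), (b"<html", "text/html")]


@dataclass
class BeFile:
    name: str
    head: bytes = b""  # first bytes of the file's data
    attrs: Dict[str, str] = field(default_factory=dict)


def identify(f: BeFile) -> Optional[str]:
    # Step 3.1: a filename extension registered in FileTypes.
    ext = os.path.splitext(f.name)[1].lstrip(".").lower()
    if ext in EXTENSIONS:
        return EXTENSIONS[ext]
    # Step 3.2: sniff the first bytes (simplified prefix match).
    for prefix, mime in SNIFF_RULES:
        if f.head.lower().startswith(prefix):
            return mime
    return None


def app_to_launch(f: BeFile) -> Optional[str]:
    # Step 1: a per-file preferred application overrides everything.
    if "BEOS:PREF_APP" in f.attrs:
        return f.attrs["BEOS:PREF_APP"]
    # Step 3: untyped file, so identify it and store the result.
    if "BEOS:TYPE" not in f.attrs:
        mime = identify(f)
        if mime is None:
            return None  # Step 3.3: give up and let the user decide
        f.attrs["BEOS:TYPE"] = mime
    # Step 2: the system-wide preferred app (None if none is registered).
    return PREFERRED_APP.get(f.attrs["BEOS:TYPE"])


files = [
    BeFile("photo", b"\xff\xd8\xff\xe0"),
    BeFile("index.html"),
    BeFile("site.html", attrs={"BEOS:TYPE": "text/html",
                               "BEOS:PREF_APP": "StyledEdit"}),
    BeFile("mystery", b"\x00\x01\x02"),
]
for f in files:
    print(f.name, "->", app_to_launch(f))
# photo -> ShowImage
# index.html -> NetPositive
# site.html -> StyledEdit
# mystery -> None
```

Note that a successful identification is written back to the file, so the next double-click skips straight to step 2.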

Puzzle Solved

While this may seem like a lot of complexity to address a seemingly simple problem, the vast majority of files on a given BeOS system are already correctly typed, and none of this ever has to transpire. The identification process for untyped files runs in microseconds, and is totally transparent to users.

Because Be was able to identify and effectively overcome the arbitrary rules and assumptions of other OSes in this department, BeOS users enjoy a greater degree of freedom and flexibility than users of other OSes. At the same time, BeOS systems are highly compatible with other operating systems (since BeOS handles incoming files with or without extensions with equal ease), and BeOS users have the ability to control filetyping either globally or locally. The result is a better user experience -- one not bound by architectural decisions of the past.
