Working with Formula Queries

Scripting the Be File System, Pt. II

Scot Hacker, April 2001

Last month I introduced a set of shell tools designed for working with Be File System (BFS) attributes and queries from the command line, or from within scripts. In that column, I sort of glossed over the "query" command, saying that was a topic for another day. Well, it's another day. This month we'll take a deeper look at the BeOS "Query by Formula" syntax, and how you can put it to work for you. You should definitely read last month's column before reading this one. This column is based on the "Formula Queries" section of The BeOS Bible.

Note that some of the examples here have been wrapped onto two or more lines. You'll need to place these on a single line in the Terminal in order for them to work properly.

Starter Formulas

When using the BeOS Find panel (hit Alt+F in any Tracker window), you'll notice there are three items in the Query Type picklist: Name, Attribute, and Formula. The first lets you find files by name, the second lets you create fairly detailed queries from arbitrary parameter sets, and the third presents a text field. But if you set up an Attribute query with custom criteria and then switch to Formula mode, you'll get a long string of characters in the text field -- the formula query equivalent of the attribute query you just created.

So why are formula queries useful? Like all GUIs, the graphical interface onto the BeOS query functions is limited. For example, attribute queries let you search all files of a given type by arbitrary criteria. But what if you want to search multiple filetypes for the same criteria? What if you want to organize the AND and OR sets in the query into logical groupings for which the GUI doesn't have an appropriate interface? What if you want to run queries from within scripts, using real-time user input for the search criteria? What if you need case-sensitive queries? Formula queries solve all of these problems.

The basic structure of a formula query takes the form:

<attr-name> [logical-op] <value>
where [logical-op] is one of the operators in the table below.

Operator Meaning

==

Equal to.

!=

Not equal to.

<

Less than.

>

Greater than.

>=

Greater than or equal to.

<=

Less than or equal to.

!

Negates whatever it precedes; e.g. !>= means "not greater than or equal to."
You can specify relationships between named attributes and their values by using the operators in this chart.

In its simplest form, a formula query looks something like this:

(ThisAttribute=="ThisValue")
In other words, you can search on any attribute known to the system, and you can specify that that attribute is or is not equal to, or is less than or greater than, a value that you specify. The value is always contained within quotation marks, and the equation is always enclosed in a set of parentheses. Remember that many more things than you may expect are attributes. E-mail subject lines are attributes, Person files' nicknames are attributes, the Artist, Album, Genre, Year, and Song Titles attached to your MP3 files are attributes, and all filetypes are attributes. And even though filenames, modification times, and sizes are not officially attributes, they behave like them as far as queries are concerned.

Let's look at a few examples. To find all of the files on your system containing the string "config" in their names, you'd use this formula query:

(name=="*config*")
The problem with this sample query is that BeOS is case-sensitive, but you may want to find files with names like TermConfig, ConFigThis, and CONFIG.TXT. That's why when you switch from attribute mode to formula mode, you'll see all names expanded out with a lowercase and uppercase option for each letter:
(name=="*[cC][oO][nN][fF][iI][gG]*")
So any time you want to search on particular cases in filenames, just edit the formula to fit your needs, stripping out the case options for any letters that matter. For example, this query:
(name=="*Config*")
Will find files named Configurator and SomConfig, but not config.txt. Because the GUI Find panel is not case sensitive, you've already accomplished something you can't do with standard attribute queries.

Boolean Operators

A string like the one above can easily be combined with a second set of criteria by joining the two statements with an "and" (&&) or "or" (||) operator, then surrounding the whole mess in a new set of parentheses. If you want to find all filenames containing either "config" or "tweak," your query string will look like this:
((name=="*config*")||(name=="*tweak*"))
Of course, you can combine any type of criteria with any other type of criteria. In the next example, we're searching for all Person files that include the letters "CA" in their state field by combining a MIME type search with an attribute search.
((BEOS:TYPE=="application/x-person")&&
(META:state=="[cC][aA]*"))
Note that we've used a * wildcard after [cC][aA], but not before. This ensures that we'll only catch state names that start with "CA", and that we won't accidentally catch "North Carolina", for instance.
To learn the names of the attributes associated with any file on your system, open a Terminal window, cd to the directory containing that file, and type listattr filename.

Logical Groupings

Sometimes you'll need to create queries where sets of parameters are grouped together logically into larger sets. Because the GUI Query dialog only lets you create sequential lists of parameters, you'll need to jump into formula mode to change the way groups of parameters relate to each other. Here's a problem that seems like a simple query on the surface, but that actually takes a little juggling: find all e-mail you've ever sent to Be employees. If it were as simple as querying for "be.com" in the To: field, it would be a piece of cake. But what if you're subscribed to the beusertalk and bedevtalk mailing lists? Everything going to those lists is addressed to the be.com domain as well. And what if you've written to people at gobe.com at some point? Those messages need to be filtered out as well. What you need to do is satisfy two major groups of criteria:

1) The file must be an e-mail message and must include "be.com" in the To: field.

AND

2) The To: field must NOT contain any of the strings "beusertalk", bedevtalk", or "gobe.com". So you'll need a query structured like this:

((A and B) AND (not C and not D and not E))
Pay attention to where the parentheses are situated -- they're used to separate logical groups from one another. Since each of A, B, C, D, and E is a clause unto itself, each will need to be enclosed in parentheses as well.
(((A) and (B)) AND ((not C) and (not D) and (not E)))
When trying to make query formulas work, pay careful attention to placement of your parentheses as they relate to logical groupings -- one misplaced parenthetical will break the query.

A broken query does not return any kind of error message, it just yields zero results. There's no easy way to determine whether you're getting zero results because there are no matching files or because your syntax is wrong, so take care with your formula query syntax.

In the end, the working query looks like this (we've wrapped it here -- you'll have to imagine it all on one line):
(((BEOS:TYPE=="text/x-email")&&
(MAIL:to=="*be.com*"))&&
(MAIL:to!="*beusertalk*")&&
(MAIL:to!="*gobe*")&&
(MAIL:to!="*devtalk*"))
Note the != characters used to negate equalities. Note also that we've stripped out the case-insensitivity syntax to make this formula easier to read.

Multiple Filetypes

Once you've got grouping down, you can do another trick not possible from the GUI query interface: searching on multiple filetypes at once. Let's say you want to find all sound files of the type MPEG or AIFF whose names start with "burnside":
(((BEOS:TYPE=="audio/mpeg")||
(BEOS:TYPE=="audio/aiff"))&&
(name=="burnside*"))
Since you need to find either MPEG or AIFF files, they're contained within a set of parentheses all their own, and are separated by the || ("or") operator. You could easily extend this formula to search on four or five audio filetypes at once.
Remember: All queries must include at least one indexed attribute, or you'll get no search results. This means you should always make sure your formula queries include either a reference to filenames or to an attribute you already know is indexed. Type lsindex into the shell to see the list of currently indexed attributes.

Special Query Characters

The example above makes use of the "*" wildcard (to match any number or kind of characters). You can also use the [] bracket set to group ranges of characters. In addition to these, formula queries can also take advantage of the ? symbol for matching any single character, and the ^ symbol for negating the contents of characters specified within brackets.

Symbol Matches
* Any number of characters (including none)
? Any single character
[] The range/class of characters inside the []
[^] The negated range/class of characters inside the []

Working with Dates and Sizes

The GUI version of the Find (query) panel is quite forgiving and lets you enter dates in dozens of formats and file sizes in bytes, kilobytes, or megabytes. Formula queries require more specific date and file-size formats. All dates are expressed in elapsed seconds since New Year's Eve, 1970 (which is standard POSIX date notation). The easiest way to get this number is to begin your query in attribute mode and then switch to formula mode.

File sizes always need to be entered as bytes; when converting from megabytes or kilobytes to bytes, just add the appropriate number of zeros. For example, 2MB equals 2000000 bytes, 2K equals 2000 bytes (yes, technically there are 1024 bytes in a kilobyte, but it's hard to imagine a scenario where you would need to specify that much detail in file sizes -- make it easy on yourself and just add the zeros).

Remember that when runing numerical queries, the indexes for the attributes you're querying for must have a datatype of integer, rather than string. For example, nothing will stop you from creating a string index called "TEXT:Year" for your custom filetype. But if you later want to search for all files of that type created between 1981 and 1989, the query will fail, since "1984" will be indexed as a string of characters, not as a number. But if you create the year index as an integer, you'll be able run a formula query like this:

query "(((TEXT:Year>=1981)&&
(TEXT:Year<=1989))&&
(BEOS:TYPE=="text/x-foobar"))"
In fact, the RipEnc shell script lets users run a live "era" query by defining earliest and latest recording dates, returning all MP3s on the system recorded between those dates, and saving the results as a playlist. In this example, you can see where the years input by the user are stored as variables then inserted into the query string:
query -v "$QueryVolume" 
"(((Audio:Year<=$StartYear)&&
(Audio:Year<=$EndYear))&&
(BEOS:TYPE=="audio/x-mpeg"))"

Getting Work Done

You can combine a formula query with other shell commands to perform further operations on your query results. For example, let's say you keep all of your business contacts in BeOS Person files, and you need a printout of just the names of your American partners -- something neither the People application nor the Tracker will let you do. This Terminal command will redirect its output to a single text file in your home directory, which you can then print out or e-mail to someone else:
query '((META:country=="*[uU][sS][aA]")&&
(BEOS:TYPE=="application/x-person"))' >
 /boot/home/contacts.txt

The query results will be sent to standard output and redirected into a text file living in your home directory. You can then open that file in any text editor and print it out. Here's another good one. Let's say you want to delete all MPEG files starting with "burnside", regardless where they live on your system. This command:

rm `query '((BEOS:TYPE=="audio/mpeg")&&(name=="burnside*"))'`
simply applies the rm command to the results of the content appearing within the backticks. Thus you end up running a query and a batch delete operation simultaneously.

You could just as easily pipe the results of an arbitrarily complex formula query through zip to create backups of certain files, or through one of the command line tools discussed last month to batch-edit attributes on files matching certain criteria, or perform any other operation for which there is a shell tool.

Handle with care, and have fun!

Note: All queries search the current disk volume by default. If you have more than one BFS volume and want to search them all simultaneously, use the -a flag. If you just want to search specific volumes, use the -v flag, followed by the volume name, e.g.:

query -v /tink <expression>

BeView Content Archives