by Chris Herborth
In this section:
Shell Scripting 101
Looping
The for Loop Defined
A Better Version of a for Loop
Substitutions
Simple Substitution
Combining Substitutions with a for Loop
Command Substitution
Using Backticks
Doing Tests
Other Commands for Tests
Testing with One Argument
Using File Tests
Testing with Two Arguments
Checking the Exit Status
Protecting the $? Variable
Testing the Opposite
Something Useful
Getting the MIME Type
Shell Scripting 101
Now that we've spent some time going on about the difference between shell and application scripting, why we'll use bash with hey for application scripting, and how to set up a script, isn't it about time we started looking at something more useful? Well, your wish is my command.
In Chapter 6, The Terminal, you learned how to copy, move, and rename files using the cp and mv commands. You probably thought, "Great, I can use this for lots of things!" and merrily started rearranging your filesystem.
Soon, though, you'll run into a problem: These commands only work on one file or directory at a time (for renaming) or one destination at a time (for copying and moving). What if you've got a big directory full of files that you want to rename? What if some silly program (or user!) has set a whole bunch of files in a bunch of directories to the wrong filetype, and you want to fix them now instead of waiting around for the Registrar to do it?
To do these sorts of tricky things, you're going to have to learn a little shell programming. Don't worry, though--it's easy!
Looping
Say you've got a directory full of files you've downloaded from the Web; they're all text files, but none of them have file extensions. Since you like to share your files with other operating systems (possibly even on the same computer), you want to give them all a .txt extension so operating systems without a studly MIME typing scheme will have a clue what to do with the file.
Setting Up a Loop You could rename each file from the Tracker, or from a Terminal using mv, but that'd take ages if there are lots of files. If you'd like to save some time (and your brain), you could rename all the files in one fell swoop using a for loop in the Terminal:
$ for i in * ; do mv "$i" "$i.txt" ; done
or, if you're writing it as a shell script, you could add some extra white space to make it more readable:
#! /bin/sh
for i in * ; do
mv "$i" "$i.txt"
done
If you do this in the shell, it'll look something like this (it won't let you add tabs to make it more readable):
$ for i in * ; do
> mv "$i" "$i.txt"
> done
Note the "secondary" prompt that you'll get when you continue a command over several lines; that's the > in the example.
Just as the PS1 environment variable controls the default prompt (which starts life as $), the PS2 environment variable controls the secondary prompt (which starts life as >) that you get when you continue a command across lines.
For example, if you do this:
export PS1="Keeps going: "
export PS2="... and going: "
and type that for loop again, you'll see something like this:
Keeps going: for i in * ; do
... and going: mv "$i" "$i.txt"
... and going: done
Don't panic--I'm about to explain every detail of the syntax used in this construct. This for loop takes every file in the current directory (represented by the * wildcard) and runs through the commands between do and done; every pass through the commands assigns one of the filenames to the variable named i (variables are explained in Chapter 6, The Terminal). For example, if you've got files named chris, henry, scot, and simon in a directory, the script will treat each of those files in turn. On the first trip through, the shell assigns chris to the variable i, and the mv command renames $i to $i.txt. In other words, chris gets renamed to chris.txt. The second pass through will rename henry to henry.txt, and so on.
I've put quotes around the variable ("$i" instead of just $i) in case there are any files with spaces in their names. If you don't, the shell will think that there's one argument per part of the filename (a file named "this is a test" would be seen as four arguments, for example). This isn't too important in an echo command, but will cause the command to fail (or do something unexpected!) with cp or mv.
The for Loop Defined The general form of the for loop (you can type help for in the shell if you need a reminder) is
for NAME in WORDS ; do
COMMANDS
done
NAME is the name of the variable (it can be any combination of letters, numbers, and the underscore character, as long as it doesn't start with a number; I usually use i because I'm too lazy to think up a better name or type all of index), which will be assigned one of the WORDS on each pass through the set of COMMANDS. COMMANDS can be several shell commands. For example, if you're paranoid (like I am), or just like to get some feedback so you know your script is doing what you intend, you can expand the renaming loop to look like this:
for i in * ; do
echo Renaming "$i"...
mv "$i" "$i.txt"
done
When running this script from our example directory, the Terminal would report the following:
Renaming chris...
Renaming henry...
Renaming scot...
Renaming simon...
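The WORDS in a for loop don't have to come from a wildcard, either; you can list them yourself. Here's a little throwaway sketch (not part of the renaming example, just an illustration) that loops over a few names typed right into the loop:
#! /bin/sh
for name in chris henry scot simon ; do
    echo "Hello, $name"
done
Each pass assigns the next word to name, so this prints four greetings, one per line.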
A Better Version of a for Loop If you find you're doing this sort of thing a lot, you could create a slightly better version of this script. A full shell script version of this set of commands might look like this:
#! /bin/sh
#
# Rename all the dropped files to have .txt extensions,
# then give them the text/plain filetype.
# Loop through all of the command-line arguments, which we
# collect using the special variable $@, described below.
for i in "$@" ; do
# Rename the file:
echo Renaming "$i"...
mv "$i" "$i.txt"
# Make sure it's got the right filetype:
settype -t text/plain "$i.txt"
done
At the start, we've got the magic cookie for a shell script, plus a couple lines of comments to remind us what this script does. After that, there's one new thing in this shell script: The $@ in the for loop is a variable that holds all of the arguments that we specified when we called the script (arguments are covered in Chapter 6, The Terminal, in the Basic Shell Syntax section). Going through the for loop, the commands will be executed for every argument, which is exactly what we want. Inside the loop, each argument is renamed to end in .txt, and is then given a filetype of text/plain.
Save this file as txt_renamer, use chmod +x txt_renamer to make it executable, and store it in ~/config/bin; now you can use it to rename text files (and give them the right filetype) any time you want by invoking it and passing it filenames like this:
$ txt_renamer chris henry scot simon
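If you're curious about exactly what $@ hands to the loop, a tiny throwaway script like this one will show you (the name show_args is just something I made up for the illustration):
#! /bin/sh
for i in "$@" ; do
    echo "got argument: $i"
done
Running show_args one "two words" three prints three lines, and "two words" stays together as a single argument because of the quotes around $@.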
Substitutions
Imagine that after we've renamed our directory full of files and set the type to plain text, we discover that they're all really HTML documents and that their new names and types are going to confuse everyone.
The best thing to do would be to rename these files to have normal .html extensions and set the filetype to text/html. But watch out--the first thing that comes to mind is
#! /bin/sh
for i in *.txt ; do
mv "$i" "$i.html"
done
But if we go this route we'll end up with a bunch of files named whatever.txt.html. Not quite what we wanted. Wouldn't it be nice if we could strip off the .txt extension before we added .html? If you remember reading about sed in Chapter 6, you might think we could use it somehow; a command like sed -e s/.txt/.html/ will replace .txt with .html for us. This starts to get tricky, though, and all we wanted to do was something simple.
If you're coming to bash from the DOS world, you may be surprised that such a simple thing as renaming a batch of files should be so involved. After all, DOS lets you type ren *.txt *.html and be done with it (and no, simply typing mv *.txt *.html into the shell does not work; this will attempt to move all the .txt files and all of the .html files into the last .html file!). As you've no doubt realized by now, bash lets you do things that DOS could never even dream about. Unfortunately, there are side effects to the shell's flexibility and power, and this little file renaming quirk is one of them. Rest assured, though, that examples like this--where bash actually makes things harder than they are in DOS--are few and far between, and that the almost unlimited power you get in return is well worth any extra effort. Plus, we're going to learn a lot about the shell's possibilities by working on this renamer.
Simple Substitution Luckily, bash gives us a simple way of doing what we want:
#! /bin/sh
for i in *.txt ; do
mv "$i" "${i%.txt}.html"
done
The tricky bit of this script is in the mv command; we've stuck in curly brackets with some extra stuff.
These curly brackets mark a substitution, which the shell uses to transform text according to your commands. In this case, the % command strips some text from the end of the i variable's contents. It's called a "substitution" because the transformed text is substituted for the original text.
The substitution looks like this:
${i%.txt}
and if the contents of the i variable end with .txt, you'll get the contents of i without the .txt on the end. Let's play with this in the shell a bit to see what I mean:
$ TEXT="hello there"
$ echo ${TEXT%what}
hello there
$ echo ${TEXT%there}
hello
$ echo ${TEXT%the}
hello there
You'll get back a "hello there" the first time because "what" doesn't match anything at the end of $TEXT. The second time, you'll get "hello" (actually "hello " with a space after it) because "there" does match the end of the text. The third time, you might expect to get "hello re", but you don't..."the" doesn't match the end of the text, so nothing happens.
The general form of the % text-stripper is:
${variable%text}
and it removes the given text from the end of variable's contents.
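For example, here's a quick Terminal sketch that strips one extension off a variable and tacks on another, which is exactly the trick our renaming loop is about to use:
$ FILE="report.txt"
$ echo "${FILE%.txt}.html"
report.html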
Combining Substitutions with a for Loop This construct is handy because it lets us strip off unwanted bits at the end of things, such as the incorrect .txt extension in our example. If we try this out with our example directory of files:
$ ls
chris.txt henry.txt scot.txt simon.txt
$ for i in *.txt ; do
> mv -v "$i" "${i%.txt}.html"
> done
The shell will return this:
chris.txt -> chris.html
henry.txt -> henry.html
scot.txt -> scot.html
simon.txt -> simon.html
$ ls
chris.html henry.html scot.html simon.html
Of course, we haven't set the filetype properly, but we can do that pretty easily now:
settype -t text/html *.html
Type settype -h in a Terminal window for details on using the settype command.
The shell supports a bunch of different replacement and substitution commands, and they're all just as "easy" to remember as the % substitution (see Table 1). Some of them are pretty esoteric, and you won't end up using them very often, if ever. Still, it's handy to know they exist when you need to manipulate command-line arguments or other strings in your shell scripts. It's always faster to use the shell's built-in substitutions than to use another command like sed.
Table 1 Shell Replacement and Substitution Commands
Command | Description
${parameter:-word} | If parameter isn't set, or is empty, return word; otherwise return parameter. This can be handy if you want to check an environment variable and provide a sensible default if it hasn't been set.
${parameter:offset} | Return a substring of parameter starting at offset. For example, if TEXT is set to "hello world", echo ${TEXT:6} will print "world"; the first six characters (offsets 0 through 5, which includes the space) are skipped. A negative offset counts back from the end of parameter, but you need a space before the minus sign so the shell doesn't read it as the default-value substitution above: echo ${TEXT: -3} prints "rld", the last three characters of $TEXT.
${parameter:offset:length} | Return a substring of parameter starting at offset and going for length characters. echo ${TEXT:6:3} will print "wor", which is three characters starting at offset 6.
${#parameter} | Return the number of characters in parameter.
${parameter#word} | If word matches the beginning of parameter, return parameter with word deleted from the beginning. For example, echo ${TEXT#hello} will print "world". This is the opposite of the % substitution that we used earlier.
${parameter%word} | If word matches the end of parameter, return parameter with word deleted from the end. We've already used this one.
${parameter/pattern/string} | If pattern matches part of parameter, return parameter with the first match of pattern replaced by string. For example, if FOO is set to "eeeeek", echo ${FOO/e/E} will print "Eeeeek". The pattern can include shell wildcards (see Chapter 6, The Terminal).
${parameter//pattern/string} | If pattern matches part of parameter, return parameter with all matches of pattern replaced by string. Using FOO again, echo ${FOO//e/E} will print "EEEEEk".
Command Substitution
You've already learned how to make the output of one command function as the input of another (in the Redirection section of Chapter 6, The Terminal), but what if you want to use the output of one command as an argument to another command? That's a subtle distinction, but consider this: What if you've got a file that lists all the files you want to run through in a for loop? You could look in the list of files and type everything out on the command line, but that's too much work.
Why not just embed the command you'd use to display the list? You can actually embed one command inside another one:
#! /bin/sh
for file in $(cat list) ; do
something
done
The command between $( and ) is run, and its output is used as an argument, or even as a command with arguments of its own. In this case, cat will print out the list of files, and the for loop will run through them. But you could also have a command stashed in a file somewhere and run it with:
$(cat /tmp/some_file)
If /tmp/some_file had ls /boot in it, you'd see a listing of the files and directories in /boot.
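Command substitution isn't limited to for loops; you can drop one into almost any command line. Here are a couple of harmless things to try in the Terminal (I'm assuming the usual date and wc commands are available, as they are on a stock system):
$ echo "Today is $(date)"
$ echo "My home directory holds $(ls /boot/home | wc -l) entries"
The first prints the current date and time as part of a sentence; the second counts the lines of ls output and drops that number into the message.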
Using Backticks Sometimes you might see `command` instead of $(command); these are equivalent, but the first form, which uses backticks (`), is "deprecated." That's a geek way of saying, "Don't use this." Using the $(command) form will also save you from Quoting Hell, and it's much easier to tell there's a subcommand in there.
Doing Tests
Sometimes you'd like to execute part of a script depending on something else, such as whether a file exists or whether an environment variable is set. This is done by using an if statement to test whether the relevant condition is true. The general form for a simple if statement is
if TESTS ; then
COMMANDS
fi
If the TESTS turn out to be true, the COMMANDS between the then and fi are executed. In this sort of construct "true" is defined as a number that isn't 0, a string that isn't empty, or a zero return value (a.k.a. an "exit status") from a program or script.
The number 0 and an exit status of 0 aren't quite the same thing: the number is something you can type right into your script, while exit status values are kept track of by the system behind the scenes. When a program sends back an exit status of zero, it means that everything worked.
A Bit about the Exit Status
Every command that you run in the Terminal sends back an "exit status" when it finishes to let the shell know whether it succeeded or not; this number is kept hidden by the shell (you won't see it printed in the Terminal). This idea of an exit status is a little strange at first, but you can test it yourself. Every BeOS system comes with a couple of commands named true and false; these commands don't do anything but return an exit status. The true command's exit status is 0, and false's is something else. It doesn't actually matter what else, as long as it's not 0.
You can try these out in an if statement:
$ if true ; then
> echo we got true
> fi
we got true
$ if false; then
> echo we got false
> fi
$
You can use the true and false commands anywhere you'd normally use a test.
The most common command to use as one of the TESTS is, oddly enough, test, which returns an exit status of true if its test succeeds, or false if it doesn't. For example, the test command to check to see if a string isn't empty is test -n; you can use this to see if an environment variable is set or not:
#! /bin/sh
if test -n "$FOLDER_PATH" ; then
echo "FOLDER_PATH is set"
fi
If FOLDER_PATH is set to something, you'll see "FOLDER_PATH is set" in your Terminal.
You can also form the test command using square brackets. This lets you write the test for FOLDER_PATH like this:
#! /bin/sh
if [ -n "$FOLDER_PATH" ]; then
echo "FOLDER_PATH is set"
fi
Most people think it's much easier to read the version with square brackets, so I'll be using them throughout the rest of this chapter.
Other Commands for Tests Any command can be used as a test in the if statement. Properly written command-line tools will have an exit status of true if they succeed and false if there's an error.
Remember, these exit status values are kept hidden by the system; you won't actually see the words "true" and "false" appearing in your Terminal after running a command.
For example, this command:
#! /bin/sh
if chmod +w filename ; then
echo "Made filename writeable."
else
echo "Had an error."
fi
will print "Made filename writeable." if the chmod command succeeds, or "Had an error." if it fails (which will happen if filename doesn't exist). This can be handy if you want to do something special when a command fails or display a custom error message.
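Here's another sketch along the same lines, this time using mkdir as the test (the directory name is made up; pick anything you don't mind creating):
#! /bin/sh
if mkdir /boot/home/new_stuff ; then
    echo "Created /boot/home/new_stuff."
else
    echo "Couldn't create it; maybe it already exists?"
fi
The first time you run this you'll probably get the first message; run it again and mkdir fails because the directory is already there, so you get the second one.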
An if statement can be more complex, too:
if TESTS_1; then
COMMANDS_1
elif TESTS_2 ; then
COMMANDS_2
...
else
COMMANDS_N
fi
Each additional set of tests and commands is attached with an elif (short for "else if") statement. If none of the tests succeeds, the commands in the else statement will be executed. The else statement is optional.
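Here's a small sketch of the full form using the true and false commands from earlier, so you can see exactly which branch runs:
$ if false ; then
> echo the first test was true
> elif true ; then
> echo the second test was true
> else
> echo neither test was true
> fi
the second test was true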
Testing with One Argument Using the test command, or its more readable cousin [ ... ], you can test quite a few things, such as whether a file exists, what type of file it is, and whether a string is empty or not (see Table 2).
To test one argument, like whether a file exists, you'd use
test op argument
or
[ op argument ]
where op is the kind of test. For example, test -e is used to see if a file exists, so you can check to see if there's a file named bozo in /boot with
test -e /boot/bozo
or
[ -e /boot/bozo ]
Just running the test command like this isn't very useful, so you'd stick it inside an if statement:
#! /bin/sh
if test -e /boot/bozo ; then
echo "/boot has a bozo"
else
echo "no bozo"
fi
Of course, being so smart and friendly, you'd want this to be more readable, so you'd use this version instead:
#! /bin/sh
if [ -e /boot/bozo ] ; then
echo "/boot has a bozo"
else
echo "no bozo"
fi
Are those spaces before and after the brackets really necessary? Could you write the test like this instead?
if [-e /boot/bozo]; then
If you try this, you'll get back an error message like "[-e: command not found." The spaces aren't just there to make the script more readable, they're actually necessary--the shell can't understand the test without them. The shell can be very picky about syntax sometimes.
Table 2 Tests for One Argument
Test | True If | Comments
-d FILE | FILE exists and is a directory. |
-e FILE | FILE exists. | This will succeed whether FILE is a normal file, a directory, or a symbolic link.
-f FILE | FILE exists and is a normal file (i.e., not a directory or symbolic link). |
-L FILE | FILE exists and is a symbolic link. | See Chapter 5, Files and the Tracker, for information about symbolic links.
-n STRING | STRING isn't empty (that is, has at least one character inside, even if that character is a space or a tab). | A string can be any chunk of text, such as the contents of an environment variable or something you type between quotes. This can be handy for checking whether an environment variable is set or not.
-r FILE | FILE exists and is readable by you. |
-s FILE | FILE exists and is not empty (that is, has more than 0 bytes of data inside; file attributes don't count). |
-w FILE | FILE exists and you can write to it. |
-x FILE | FILE exists and you can execute it. |
-z STRING | STRING is empty (that is, has no characters inside). |
Test Quick Reference If you need a quick reminder to help you find the test you're looking for, try typing help test | less in a Terminal window. It's a good idea to pipe it into less because the help message for test is really long!
Using File Tests Let's try out a few of the tests from Table 2 in the Terminal to see how they really work. Open up a Terminal and cd to /boot. Now type this:
$ if [ -e beos ] ; then
> echo beos exists here
> fi
When you hit Enter after typing fi you should see "beos exists here," which is obviously true if your system managed to boot. In this example the -e flag inside of the square brackets performs an existence test on its argument "beos" and tells us that, indeed, there is a directory here named /boot/beos. Let's see what else we can learn about it:
$ if [ -f beos ] ; then
> echo beos is a normal file
> elif [ -d beos ] ; then
> echo beos is a directory
> else
> echo beos is an unknown kind of file
> fi
You'll see that "beos is a directory" (well duh, we already knew that).
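You can poke at the other tests from Table 2 the same way. For instance, -x tells you whether you can execute a file; I'm using /bin/sh here simply because every system has it:
$ if [ -x /bin/sh ] ; then
> echo /bin/sh is executable
> fi
/bin/sh is executable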
Testing with Two Arguments There are also some tests that work in pairs and take two arguments. These are used for comparing two files, two text strings, or two numbers (see Table 3).
To compare the dates of two files, for example, you'd use:
test argument1 op argument2
or the ever-popular:
[ argument1 op argument2 ]
where op is the kind of test. For example, -nt is used to see if the first argument is newer than the second. To see if /boot/beos is newer than /boot/home (this would tell you if the system has been updated since being installed), you'd do this:
#! /bin/sh
if [ /boot/beos -nt /boot/home ] ; then
echo "This system was probably updated."
fi
Table 3 Tests for Two Arguments
Test | True If | Comments
NUMBER1 -eq NUMBER2 | NUMBER1 equals NUMBER2. | You can use this to check for specific exit status values; you'll see how in Checking the Exit Status, below.
NUMBER1 -ge NUMBER2 | NUMBER1 is greater than or equal to NUMBER2. |
NUMBER1 -gt NUMBER2 | NUMBER1 is greater than NUMBER2. |
NUMBER1 -le NUMBER2 | NUMBER1 is less than or equal to NUMBER2. |
NUMBER1 -lt NUMBER2 | NUMBER1 is less than NUMBER2. |
NUMBER1 -ne NUMBER2 | NUMBER1 isn't equal to NUMBER2. |
FILE1 -nt FILE2 | FILE1 is newer than FILE2 according to the modification date and time. |
FILE1 -ot FILE2 | FILE1 is older than FILE2 according to the modification date and time. |
STRING1 = STRING2 | The strings are the same. | This is handy if you want to check command-line arguments in a shell script or see if an environment variable is set to a specific string.
Say you're writing a script and you want its error messages to behave differently depending on the SCRIPT_ERRORS environment variable. In the documentation, you let the user know that they can set this to "polite" or "stressed" depending on whether they want to see calm or overwrought error messages.
In your script, you'd handle this by doing something like this:
#! /bin/sh
if [ "$SCRIPT_ERRORS" = "polite" ] ; then
echo "Your files have been deleted. Sorry."
elif [ "$SCRIPT_ERRORS" = "stressed" ] ; then
echo "ARGH! My life is over, I killed your files..."
else
echo "Hey, SCRIPT_ERRORS is wrong; please set it to:"
echo "polite or stressed."
fi
Checking the Exit Status The arithmetic tests (-eq, -ge, -gt, -le, -lt, and -ne) are handy for checking the exit status of another command in a script. You'll remember from A Bit about the Exit Status that every command sends an exit status back to the shell when it finishes to tell the shell whether it succeeded or not.
The exit status of the last command can be found in the magic environment variable $?, which always contains the exit status of the last command executed by the shell. By convention, an exit status of 0 means that all is well, and anything else is an error:
#! /bin/sh
if [ $? -ne 0 ] ; then
echo "oh no, an error"
else
echo "everything is good"
fi
In this example we're comparing whatever is currently in the $? variable against 0 using the -ne test. -ne asks if the two numbers are not equal, so if the current value of $? is not 0 the script prints "oh no, an error." This if statement lets scripts respond to errors.
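To make that concrete, here's a sketch that runs a copy that's doomed to fail (the source file is deliberately one that doesn't exist) and then checks $?:
#! /bin/sh
# Try to copy a file that isn't there; hide cp's own complaint.
cp /boot/home/no_such_file /tmp 2> /dev/null
if [ $? -ne 0 ] ; then
    echo "oh no, the copy failed"
else
    echo "everything is good"
fi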
Protecting the $? Variable Because the value of the $? variable can change as the script runs, the best way to preserve a particular exit status is to store it in another variable. This is important because even minor steps occurring later in the same script will overwrite the value of $? with a new exit status. Even adding an elif clause to the if statement will overwrite the value of $? with the exit status of the test at the start of the if! In this next example we store the value of $? in another variable (called return_value) so it doesn't get changed while our script is running.
#!/bin/sh
# Save the exit status of the last command in return_value.
return_value=$?
# Now we can check for specific values in the return_value;
# This assumes that the last command returns an exit status
# of 1 when it can't find a file, and anything else is a
# general error
if [ $return_value -eq 0 ] ; then
echo "everything is good"
elif [ $return_value -eq 1 ] ; then
echo "file not found"
else
echo "some other error"
fi
Testing the Opposite
If you need to test the opposite of something, you can use ! to negate a test. Maybe you'd like to know if a file isn't a directory, but don't care if it's a normal file or a symbolic link:
#! /bin/sh
if [ ! -d /boot/beos ] ; then
echo "/boot/beos isn't a directory"
else
echo "/boot/beos is a directory"
fi
Here we use the -d test to see if the argument in the test is a directory, but add the ! symbol, which makes the test return true when -d would return false. These kinds of negative tests are most often used with the file test operators; the string and arithmetic tests already come in negative versions. There's no reason for you to type
if [ ! ARG1 -eq ARG2 ] ;
when you could use
if [ ARG1 -ne ARG2 ] ;
The second version is easier to read and easier to type.
Something Useful
So far, we've gone over the for loop, how to do substitutions, how to do tests, and the if statement. We really ought to be able to make something useful now, right?
A couple of our earlier examples involved taking a bunch of "unknown" files we got off the Web and giving them an extension so that other, less fortunate operating systems might be able to guess what to do with them. The examples also pointed out a problem: We could screw this up pretty easily if we blindly assumed all the files were the same kind.
Wouldn't it be better if we could add a file extension based on the actual MIME type of the file? If we're downloading piles of stuff from the Internet with a browser like NetPositive, all the files will have correct MIME types. Since these files are usually coming from a system that doesn't have MIME types for files, they'll probably have file extensions already, so we won't have to worry about them.
Unless the files we're interested in are part of NetPositive's cache.
Say you've been surfing the Net for a while, and /boot/home/config/settings/NetPositive/NetCache is full of files with not-very-helpful names like 981234...1, 981234...2, etc. Right now, mine's got a little over 1400 files in it, going up past 981234...2000. Yikes! What if I wanted to keep all the HTML documents in there and give them a .html extension so I could take them over to another system? It's going to be a real pain to go through 1400 files in a Tracker window, selecting only the ones that have an HTML document icon. (Astute readers will note that I could probably use a command-line query to find all the HTML documents in the NetCache directory, but that's not the point of this example, and I'd still have to rename them all by hand.)
Before I do anything, I'll copy all of the files out of the NetCache directory and into another one; if I wanted to clean out the cache at the same time, I'd just move them. Then I'll go about designing a shell script that will take the following steps for every file, saving me a ton of work:
- If the file doesn't exist, or it's not a normal file, skip it.
- Get its filetype.
- If it doesn't have a filetype, use mimeset to try to give it one.
- If the filetype is text/plain, give the file a .txt extension.
- If the filetype is text/html, give the file a .html extension.
- If it's still unclear what the file is, delete it.
This would all be annoying if we were just typing commands into the shell, but it's not too bad in a shell script. In fact we can do all of this using the techniques we learned in the sections above. Here's the complete listing:
#! /bin/sh
#
# Give file extensions to files we care about, and delete files we
# don't, based on their MIME filetype.
#
# First we create a loop that goes though each of the arguments we
# supplied to the script. In this example, we'll be passing in the
# files from the NetPositive cache, but you could run this with any
# files you wanted.
#
# Remember, $@ is a special variable that has all of the command-line
# arguments inside.
for i in "$@" ; do
# Check to see if the file exists; since we're working on
# command-line arguments, the user could've typed in some files
# that don't exist.
#
# The -f test checks to see if a file exists; ! -f checks to
# see if a file doesn't exist.
if [ ! -f "$i" ] ; then
echo "$i is not a file, skipping"
# The continue statement continues our for loop with the
# next argument; we want to go on with the next file
# instead of going on down into the rest of the script.
continue
fi
# This next complicated line uses the catattr command to get
# the file's type. catattr prints out too much information,
# so we pipe its output through awk to strip off everything
# we don't care about.
#
# The "2> /dev/null" redirects any errors to /dev/null; if
# the file has no MIME type, catattr will print an error, but
# we don't want to see it.
#
# Another thing to note is the \ at the end of the line;
# this tells the shell that we're not done with our command
# yet. Both of these lines get combined by the \ to do what
# we want.
file_type=$(catattr BEOS:TYPE "$i" 2> /dev/null | \
awk '{ print $5; }')
# If there's no filetype, try to assign one. You'll
# remember that -z checks to see if a string is empty, so
# if the variable file_type is still empty, the file has
# no type.
if [ -z "$file_type" ] ; then
# The mimeset command asks the BeOS Registrar to assign
# a MIME type to the specified file.
mimeset -f -all "$i"
# Now the file will have a type, so we'll do what we
# did before to read the filetype.
file_type=$(catattr BEOS:TYPE "$i" 2> /dev/null | \
awk '{ print $5; }')
fi
# Now we check to see if the filetype is one we like:
# By adding more elif... statements, you can extend this to
# handle other kinds of files.
if [ "$file_type" = "text/html" ] ; then
# Rename our HTML documents.
mv "$i" "$i.html"
elif [ "$file_type" = "text/plain" ] ; then
# Rename our text files.
mv "$i" "$i.txt"
elif [ "$file_type" = "application/zip" ] ; then
# Rename our zip files.
mv "$i" "$i.zip"
else
# Delete anything we didn't care about.
rm "$i"
fi
done
Save this script as renamer and make it executable with chmod. Now you can type
$ renamer 981234*
to automatically go through the NetCache files giving them reasonable file extensions based on their types. Unfortunately, this doesn't change the fact that the filenames are totally incomprehensible to anyone who isn't NetPositive. You win some, and you lose some....
Horrible Truths about Unix
One of the evil things about Unix shells is that the buffer (or memory area) used to pick up command arguments is a fixed size; if you try to feed too many arguments to a command, it'll either ignore the ones at the end or behave strangely. If you're trying to run hundreds of files through the renamer script discussed here, they're going to overflow this command buffer and only a few files will actually get fed through the script.
There's a way around this, but you'll have to run the renamer script from the shell; there isn't a way to do it from the Tracker. Put renamer into /boot/home/config/bin where your shell can find it, then cd into the directory full of files (or, if you've got TermHire installed, select the directory window and hit Alt+Windows+T or Option+Command+T).
Now you'll use the find and xargs commands:
find . -print | xargs renamer
This doesn't seem to do anything with files--how could it work? Well, "find . -print" will find every file, directory, and symbolic link in the current directory (thanks to the . directory argument that we're giving to find) and print them out (the -print option), producing a big list of everything in the filesystem from here down.
We pipe this list into xargs, which takes the input and parcels it up into chunks of commands small enough to fit into the command buffer. It passes each chunk to the specified command, which is our renamer script.
xargs's sole purpose in life is to help you work around the Unix command buffer's inability to grow.
After using the find/xargs trick on my directory of over 1400 NetCache files, I'm left with "only" a couple hundred HTML and plain text files. There sure are a lot of graphics on Web pages these days, and they've all just been deleted!
"Hey, not so fast!" you scream, "There's something in that script I don't understand!"
I knew I wouldn't be able to sneak it past you. I've introduced one new thing in this script: the continue statement. continue lets you skip over the rest of the loop and continue with the next run through. We do this right away if the file isn't a regular file, since we don't want to mess with any directories that we may encounter. If the file is something other than a regular file, the renamer script will print a warning message, then hit the continue statement and go on with the next file.
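If you'd like to see continue by itself, here's a tiny sketch that skips every directory in the current directory and reports on everything else:
#! /bin/sh
for i in * ; do
    if [ -d "$i" ] ; then
        # Skip directories and go on to the next name.
        continue
    fi
    echo "$i is not a directory"
done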
Getting the MIME Type The only other tricky thing in the script is getting the file's MIME type:
file_type=$(catattr BEOS:TYPE "$i" 2> /dev/null | awk '{ print $5; }')
This just looks tricky; if we split it up a little it'll make more sense. The file_type variable is going to be set to whatever is returned by the embedded commands between $( and ). There are two commands inside connected with a pipe:
catattr BEOS:TYPE "$i" 2> /dev/null
awk '{ print $5; }'
The catattr command will print the current file's MIME type, which is stored in a file attribute named BEOS:TYPE. We've redirected the standard error stream (which I'll describe in a minute) to /dev/null, the universal bit-bucket, because we don't want to see the error message if the file has no MIME type. The pipe sends the filetype into awk, which prints the fifth item.
A Word about Streams
Every command-line tool works with three streams: standard input, standard output, and standard error. Traditionally, input from the user comes in through standard input, output goes to standard output, and errors are printed to standard error. Hmm, this almost makes sense....
#   Name              Geek Name   Redirecting   Piping
0   Standard input    stdin       < file        command |
1   Standard output   stdout      > file        | command
2   Standard error    stderr      2> file       2>&1 | command (merges stderr into stdout first)
If you redirect stderr to a file, stdout is still going to send the program's output to your Terminal. This ability to redirect stdout and stderr to different files is often used by programmers building an application with the make utility. Sending stdout and stderr to different files makes it easier to keep track of (and fix!) the application's bugs.
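As a quick sketch of that last point, you could split a command's normal output and its errors into two different files (the filenames, and the bogus directory, are made up for the example):
$ ls /boot /no_such_place > listing.txt 2> errors.txt
listing.txt ends up with the contents of /boot, while the complaint about /no_such_place lands in errors.txt, and nothing at all appears in the Terminal.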
If you run catattr BEOS:TYPE on a file in a Terminal window, you'll see that it prints a line like this one:
filename : string : text/plain
Counting over, we can see that the fifth item is the filetype we wanted.
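If you'd like to see the awk part do its thing by hand, pipe that same line into it (I'm assuming a file named chris.txt that already has the text/plain type):
$ catattr BEOS:TYPE chris.txt | awk '{ print $5; }'
text/plain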
So the file_type variable will be set to the file's MIME type if it has one, or nothing if it doesn't. Everything else in the script should be pretty easy to figure out if you've gotten this far.