One of the best things about working in the UNIX environment is the powerful command-line interface, which includes the available shells, scripting languages, standard utilities, and facilities for I/O manipulation. The strength of this interface is its modularity; a large-ish collection of powerful, somewhat single-minded tools is more flexible than a monolithic, “kitchen-sink” approach to interface design. However, this very modularity can make the interface hard to learn, since most documentation is devoted to a single tool or utility, and there is relatively little guidance on how the interface as a whole may be used to do work. Therefore, I present some simple UNIX shell commands, with brief explanations, that might prove suggestive.
Motivation
When I returned to UNIX after a long stay in Windows-land, I was delighted to return to a rich native shell, but it took me longer than I would have liked to begin to make that shell do what I wanted. I found the available documentation good, but inappropriate for the problems I was having; the documentation was too low-level, and did not provide a good overview of the interface. I ultimately sought out various command-line examples, and developed an overview by dissecting them. This collection of annotated examples is the sort of thing I would have liked to have started with.
Disclaimers
I make no representation that these examples are stylistically correct, simply that they work for my purposes. I also do not claim that this collection is in any way unique; it’s just my take on something I wish I had when I rejoined the UNIX world.
Validate argument count in a bash shell script
Purpose:
This code snippet is intended to be placed in a bash shell script; it aborts script execution if the number of command-line arguments supplied to the script doesn’t equal some pre-defined number (1, in this example).
Code:
if [ $# -ne 1 ]
then
    echo "Usage: $(basename "$0") destination_directory"
    exit 1
fi
Notes:
- $# is a special parameter equal to the number of command-line arguments passed to the script. It does not include the script name itself.
- The “if” construct is picky about whitespace because “[” is itself a command; it and its closing “]” argument must be separated from their neighbours by whitespace.
- $0 is the name of the script as it was invoked; $1 is the first command-line argument passed to the script, $2 is the second, etc.
- The “$(…)” construct is replaced with the output of the command it contains.
- The “basename” utility returns the last element of a pathname.
- The “exit” command terminates the script; the non-zero argument signals an error condition.
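The notes above can be tried out without writing a separate script. What follows is only a sketch: it wraps the check in a function (the name require_one_arg is my own invention), where $# and $1 refer to the function's arguments while $0 still names the script. It also sends the usage message to standard error, which is a common convention rather than something the snippet above does.

```shell
# Hypothetical helper: succeeds only when exactly one argument is supplied.
require_one_arg() {
    if [ "$#" -ne 1 ]; then
        # $0 is the script name; the usage message goes to standard error
        echo "Usage: $(basename "$0") destination_directory" >&2
        return 1
    fi
    echo "destination: $1"
}

require_one_arg /tmp/out                              # prints "destination: /tmp/out"
require_one_arg || echo "rejected: no arguments"
require_one_arg a b || echo "rejected: too many arguments"
```

Inside a function, “return” plays the role that “exit” plays in the script proper, so the calling shell survives a failed check.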
Remove all .svn directories from a directory tree
Purpose:
Subversion creates a “.svn” subdirectory in every directory it has under version control. These directories are superfluous in some situations, and must be deleted.
Code:
find "$wrk" -depth -name .svn -exec rm -f -R {} \;
Notes:
- This command assumes that the root of the tree to be cleaned has been stored in the shell variable “$wrk”, with a statement of the form “wrk=/foo/bar/baz”.
- The “-depth” option tells the find utility to process the contents of each directory before the directory itself. In the absence of this option, the find utility will, upon encountering a directory named “.svn”, first delete the directory, and then try to descend into the directory, which will generate error messages.
- The “-name” condition simply directs the find utility to locate any objects named “.svn” in the directory tree.
- The “-exec” clause causes the find utility to execute a command for each object it locates. The command to be executed is constructed from the template immediately following the “-exec” keyword. The template is terminated by an (escaped) semicolon. In this example, the template is “rm -f -R {}”. Instances of “{}” in the template are replaced by the pathname of the located object.
- The “-f” and “-R” options to the rm utility result in a quiet recursive delete of a file or directory.
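For comparison, here is a sketch of another way to avoid the descend-after-delete problem: tell find not to descend into a matched directory at all, using “-prune”. This is a variation I am offering as an alternative, not the command above. It builds a throwaway tree with mktemp so that it can be run safely; the directory and file names are made up for the demonstration.

```shell
# Build a throwaway tree so the sketch can be run safely.
wrk=$(mktemp -d)
mkdir -p "$wrk/a/.svn/props" "$wrk/a/b/.svn" "$wrk/.svn"
touch "$wrk/a/keep.txt" "$wrk/a/.svn/entries"

# -type d     : match directories only
# -prune      : do not descend into a matched directory
# -exec ... + : pass as many pathnames as possible to each rm invocation
find "$wrk" -type d -name .svn -prune -exec rm -rf {} +

# Count the .svn directories that remain, and check an ordinary file survived.
remaining=$(find "$wrk" -name .svn | wc -l)
if [ -f "$wrk/a/keep.txt" ]; then kept=yes; else kept=no; fi
echo "remaining .svn directories: $((remaining))"
echo "ordinary file kept: $kept"

rm -rf "$wrk"
```

Note that “-prune” and “-depth” should not be combined; with “-depth” in effect, “-prune” has no effect.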
Loop through a list of files, and do something
Purpose:
This snippet invokes a JavaScript compressor/obfuscator on all the JS files in a directory tree.
Code:
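for pn in $(find "$wrk" -name '*.js')
do
    echo
    echo
    echo "Compressing $pn"
    java -jar ~/jars/yuicompressor-2.3.5.jar --charset ISO-8859-1 -v -o "$pn" "$pn"
done
Notes:
- The wildcard (“*.js”) passed to the find utility is quoted to protect it from shell expansion. A naked wildcard works only as long as no file in the current directory matches the pattern; if one does, the shell expands the wildcard before find ever sees it.
- The output of the “$(…)” construct is split into words at whitespace, so this loop mishandles pathnames that contain spaces.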
- This snippet invokes the Yahoo JavaScript compressor/obfuscator, which is a Java program I happen to store in my ~/jars directory. If you haven’t installed it there, then this snippet won’t work for you.
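If pathnames with spaces are a concern, one possible approach is to let find hand the pathnames to an inline shell as separate arguments, so that they are never split into words. This is a sketch only: the file names are invented, the tree is a temporary one created with mktemp, and an echo stands in for the real compressor call.

```shell
# Throwaway tree containing a pathname with an embedded space.
wrk=$(mktemp -d)
mkdir -p "$wrk/lib"
touch "$wrk/main.js" "$wrk/lib/my widget.js" "$wrk/readme.txt"

# The quoted pattern reaches find intact. find passes the matching pathnames
# to the inline shell as separate arguments, so embedded spaces are preserved.
report=$(find "$wrk" -name '*.js' -exec sh -c '
    for pn do
        echo "Would compress: $pn"   # placeholder for the real compressor call
    done
' sh {} +)

echo "$report"
count=$(printf '%s\n' "$report" | grep -c '^Would compress: ')
echo "files found: $count"

rm -rf "$wrk"
```

The lone “sh” after the inline script fills the $0 slot of the inner shell; the pathnames supplied by “{} +” then become $1, $2, and so on, which is what “for pn do” iterates over.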
Process a list of files: Dump results to a single file & avoid loops
Purpose:
In this particular case, we are handling a set of very large database dumps; these dumps begin with a little SQL code, but consist mostly of tab-separated values. We want to capture all the SQL headers in a single file. We want to avoid using a shell loop construct for reasons of perceived elegance.
Code:
ls scripts/*.txt | xargs -n 1 sed -n -e '/\\q;/ q' -e p | python dbc.py data/DATAFORMS/EQRDATA/eqr.dbc > sql.txt
Notes:
- The snippet assumes that database dumps are all stored as .txt files in the scripts subdirectory of the current working directory.
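- The “-n 1” argument to xargs results in each filename being passed on the command line to a different invocation of sed. This matters because the “q” directive terminates sed outright; a single invocation given several files would stop at the end of the first header and never read the remaining files.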
- The “-n” argument to sed means that, by default, sed won’t emit anything.
- The first “-e” argument to sed forces sed to stop processing when it encounters a line containing “\q;”. (This string marks the end of the SQL header in the database dumps.)
- The second “-e” argument causes sed to echo the line it is processing; the fact that the print directive follows the conditional abort directive means that the terminating line containing “\q;” will not be printed, which is the desired behaviour.
- The output of xargs is filtered through a Python script that manipulates the SQL headers in an application-specific way.
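The sed portion of the pipeline can be tried on its own. The sketch below uses two tiny stand-in dump files of my own devising (a one-line SQL header, the “\q;” marker, then data rows) and leaves out the Python step, which is application-specific. The pattern is single-quoted so that sed receives the escaped backslash intact.

```shell
# Two stand-in dumps: one header line, the end marker, then data rows.
dir=$(mktemp -d)
printf '%s\n' 'CREATE TABLE a (id int);' '\q;' '1' '2' > "$dir/a.txt"
printf '%s\n' 'CREATE TABLE b (id int);' '\q;' '3' '4' > "$dir/b.txt"

# -n suppresses automatic printing; the first expression quits at the marker,
# and the second prints every line reached before it. xargs -n 1 starts a
# fresh sed for each file, so every header is captured.
headers=$(ls "$dir"/*.txt | xargs -n 1 sed -n -e '/\\q;/ q' -e p)
echo "$headers"

rm -rf "$dir"
```

Only the two CREATE TABLE lines should come out; the marker lines and the data rows are dropped.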
Process a list of files: Dump results to separate files & avoid loops
Purpose:
In this particular case, we are converting a set of Visual FoxPro .dbf files to tab-separated-value files. We want to create a separate TSV output file for each .dbf input file. We want to avoid using a shell loop construct for reasons of perceived elegance.
Code:
ls *.dbf | sed -e "s/^\(.*\)\.dbf$/pgdbf -em \1.dbf > \1.txt/" | $SHELL -s
Notes:
- The snippet assumes that source files are all stored as .dbf files in the current working directory.
- The solitary “-e” argument to sed is a substitution command, so sed will take a list of filenames on STDIN and emit a list of command lines on STDOUT.
- $SHELL is an environment variable that holds the pathname of the user’s preferred shell (e.g. /bin/bash); this is usually, though not necessarily, the shell that is currently running.
- The command lines output by sed are piped to the STDIN of a new shell process. The “-s” argument forces the shell to read commands from the standard input.
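Because the command lines are generated as text, they can be inspected before anything is executed. The following sketch shows that two-step habit under some loud assumptions: “cat” stands in for pgdbf (which may not be installed), the .dbf files are fake one-line text files, and everything happens in a temporary directory.

```shell
# Fake inputs in a throwaway directory.
dir=$(mktemp -d)
echo 'alpha' > "$dir/one.dbf"
echo 'beta'  > "$dir/two.dbf"

# Step 1: generate the command lines and look at them (cat is a stand-in
# for the real converter).
preview=$(cd "$dir" && ls *.dbf | sed -e 's/^\(.*\)\.dbf$/cat \1.dbf > \1.txt/')
echo "$preview"

# Step 2: feed the same lines to a shell that reads commands from stdin.
(cd "$dir" && printf '%s\n' "$preview" | sh -s)

result=$(cat "$dir/one.txt" "$dir/two.txt")
echo "$result"

rm -rf "$dir"
```

The preview step costs nothing and is a cheap safeguard, since whatever sed emits will be executed verbatim by the receiving shell.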