Make a Bourne Again Shell Script Log its Output to a File

The Bourne Again Shell script presented in this article demonstrates techniques related to capturing and logging output (standard output and standard error stream) of a script into a log file while also delivering it to the regular output destinations (for example the terminal or whatever the caller has chosen to redirect to). Some questions are addressed:

  • How to „clone“ standard output and standard error stream of a shell script to a single log file?
  • Are the messages from both streams interleaved into a single file in the appropriate order?
  • Can a script perform output cloning for some portions of its procedure while skipping it for others?
  • Are there specific requirements for output that is to be cloned to a file?
  • Are there best practices when a script modifies its own output redirections?

The example provided below has been tested on GNU/Linux using GNU „bash“ version 5 and „tee“ version 8.32 from GNU coreutils. It uses the Bash-only feature of process substitution. An implementation in POSIX Shell (possibly using only POSIX tools) would be more difficult.

1. How can standard output and standard error of a sequence of shell commands be displayed to the user while also being captured to a single log file? „Process substitution“ can be used for this purpose, a feature of Bourne Again Shell (but not of Bourne or POSIX Shell):

exec 1> >(tee -ai "$logfile" >&1)
exec 2> >(tee -ai "$logfile" >&2)

Explanation: For each of the output streams, start tee -ai "$logfile" in a sub-process. The -i option makes „tee“ ignore interrupt signals, it will continue to run until terminated or killed. The -a option causes „tee“ to append to the file instead of overwriting it (which would discard the output of the other „tee“). For each sub-process, open a file descriptor to its standard input and connect the affected output stream of the main process to that file descriptor. As a result, both output streams are redirected to two „tee“ sub-processes that concurrently append to the log file. Note that the second sub-process redirects its standard output (the output of the „tee“ command) to its error stream, which at this point is connected to the original second file descriptor (and thus to the original error output of the script).

2. How does the „interleaving“ of the standard output and the error stream into a single log file behave? Essentially not different from command &> logfile, but the reasoning is more tricky: If the individual commands emit their regular and their error output as complete lines, terminated by a line feed (\n), those lines will appear as individual lines in the log file, separated by line feeds, in the order they were emitted. This is due to a stream’s output buffer being flushed immediately when receiving a line feed; essentially this means that the „tee“ sub-process receives new input on every \n of the output it s attached to. Additionally, „tee“ is sufficiently smart about appending to existing content even when writing concurrently with other processes. Effectively, in the output file the lines are interleaved in the order they were emitted by the script’s commands.

3. The rationale presented in 2. assumes that the script’s output is directly attached to „tee“. Redirecting to a pipeline of commands, with „tee“ being a later component in the pipeline, can (and most likely will) introduce problems with interleaving. The following example would render all error messages before all regular output messages, affecting both the console output and the log file content:

# Problematic: Lines of output in $logfile will not be chronological.
exec 1> >(sed "s/^/[STDOUT] /" | tee -ai "$logfile")
exec 2> >(sed "s/^/[STDERR] /" | tee -ai "$logfile" >&2)

Constructing output handling that post-processes the stream contents before sequentialising them to a single file would be more difficult.

4. It should be noted that the construct shown in 1. allows to redirect standard output and error streams individually from outside the script. For example, the following is possible:

./example.sh 2> /dev/null

This will discard error output on the console; error output will still appear in the log file.

5. It should be noted that interactive built-ins such as read or select, decorative use of carriage return (\r), terminal escape sequences (color codes, clearing the screen, cursor positioning etc.) or more extensive interactions such as performed by „dialog“ or „whiptail“ should all NOT be used while cloning the standard output into a log file. In the example below, such interactive code is happening strictly before the output redirection.

6. A robust shell script should not assume anything about the external attachments of its standard input, output or error streams. Before a script tries to interact with a user on a terminal, it should check if at least its standard input is connected to a terminal at all; in Bourne Again Shell this can easily be done using the test [[ -t 0 ]].

7. Redirecting a script’s output streams essentially means modifying its output file descriptors #1 and #2, and it can be useful to store (and at a later part, restore) their original values. In the example below, this is done by calling exec 3>&1 4>&2 before starting the redirection. In the exit handler on_exit the original file descriptors are restored by exec 1>&3 2>&4 before printing a final console-only report. A more elaborate implementation could perform this exchange repeatedly, allowing the script to switch back and forth between „console-only“ and „cloned“ mode of output.

8. A restriction of the process substitution proposed in 1. is that the „tee“ sub-processes run until the output to be cloned is finished (when „tee“ detects EOF or an error situation on its input); in the example below, „tee“ receives EOF and terminates with exit code 0 once the exit handler on_exit has detached the streams from their sub-processes. Generally, the exit code of a sub-process is (of course) not available until that sub-process has finished. Furthermore, since Bash’s built-in wait does not feature a timeout, there is no simple way of checking if a substituted process exited prematurely (which was the case, for example, if „tee“ could not access the requested log file). In the example below, no attempt is made to check if „tee“ actually works before starting the example workload, and it will not become apparent if the log file redirection did work or not until on_exit writes its final report to the console. Implementing a sub-process that can reliably report back to the calling script if it is working without problems and if it is actually ready to write to the log file would be more difficult.

This is the complete example code:

#!/bin/bash
logfile=/tmp/test.log

# Exit-handler to stop all output redirection. 
on_exit() {
    local rv="$?"

    # Print message to console and logfile.
    echo "INFO: Finished at $(date -R), PID $$, exit code $rv" >&2

    # Restore previous standard output and error from FDs 3 and 4;
    # this will cause the "tee" sub-processes to terminate (EOF); 
    exec 1>&3 2>&4

    # Reap the sub-processes and evaluate their exit codes.

    wait "$stdout_pid"
    stdout_rv=$?
    wait "$stderr_pid"
    stderr_rv=$?

    # Print final report to console only. 

    if [[ $stdout_rv -ne 0 ]] || [[ $stderr_rv -ne 0 ]]
    then
        echo "ERROR: Output redirection to $logfile may be incomplete." >&2
    fi

    echo "Done (exit code: $rv, see $logfile for details)." >&2
}

# Initially, no output is logged.
# Perform some initial greeting / interactions here ...

echo "Note: The following procedure will be logged to $logfile."
[[ -t 0 ]] && read -rp 'Press ENTER to continue'

# Copy original output FD to FD 3 and original error FD to FD 4.
exec 3>&1 4>&2

# Redirect standard output and error to sub-processes.
# Store sub-process PIDs for later reaping.

exec 1> >(tee -ai "$logfile")
stdout_pid=$!

exec 2> >(tee -ai "$logfile" >&2)
stderr_pid=$!

# Register exit-handler.
trap on_exit EXIT

# From here on, all output will be logged to $logfile.

# Print a startup report.
echo "INFO: Starting at $(date -R), PID $$ ..." >&2 

########################
# EXAMPLE WORKLOAD BEGIN

# Find and report temporary files older than 15 days.
# Note: May encounter "permission denied" errors if run as non-root user.

if ! find /tmp -mtime +15 -exec ls -ld "{}" \;
then
    echo "ERROR: \"find\" command failed." >&2
    exit 2
fi

# EXAMPLE WORKLOAD END
######################

exit 0

The framework above should work quite well for a variety of standard script workloads. As a reminder, here is a list of the unattended issues identified above:

  • The solution requires Bourne Again Shell, it is not implemented using POSIX tools alone.
  • Output streams can not be redirected to pipelines while retaining individual script output filehandles (all output could be redirected to a pipeline, exec &> >(foo | bar | ...), but then standard error would appear redirected to standard output on the script’s console output).
  • Output-handling sub-processes can not be reliably checked for successful operation before all output has been processed.