Using sed or awk to Ensure a Specific Last Line in a Text File

Given a file containing bytes of text with lines separated by the newline character (\n), one of these lines can be called "the last line of the file": it is the longest sequence of bytes in the file for which the following holds:

  • The sequence contains no newline character, and
  • the sequence is followed by at most one newline character and no other bytes.

The task at hand is to write a procedure, using shell utilities, that ensures that the last line of a given file consists of a desired sequence of text characters.
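As a point of reference, the following sketch tests whether a file already satisfies this requirement. The function name has_last_line is hypothetical; it relies on tail -n 1, which prints the last line whether or not the file ends in a newline.

```shell
# has_last_line FILE STRING (hypothetical helper)
# Succeeds if the last line of FILE is exactly STRING.
has_last_line() {
    # tail -n 1 prints the last line; the command substitution strips
    # the trailing newline, so files with and without a final newline
    # compare alike.
    [ "$(tail -n 1 -- "$1")" = "$2" ]
}

demo=$(mktemp)
printf 'alpha\n// End of file\n' > "$demo"
has_last_line "$demo" '// End of file' && echo "last line present"
rm -f "$demo"
```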

In the following solution proposal for the stream editor sed, the program checks whether the last line of a text stream matches the required string; if not, the string is appended to the current last line, separated by a newline character. The \n in the replacement is a GNU sed extension; POSIX sed requires a backslash followed by a literal newline instead.

sed -e "
    \${
        /^$lastline\$/! {
            s/\$/\n$lastline/
        }
    }
"
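To see the program in action, the sketch below wraps it unchanged in a hypothetical function, ensure_last_line_sed, that filters standard input to standard output. It assumes GNU sed (for the \n in the replacement) and a string without unescaped slashes or regex metacharacters.

```shell
# ensure_last_line_sed STRING (hypothetical name): the sed program
# above as a stdin-to-stdout filter. Requires GNU sed.
ensure_last_line_sed() {
    lastline=$1   # sets a global variable; plain sh has no "local"
    sed -e "
        \${
            /^$lastline\$/! {
                s/\$/\n$lastline/
            }
        }
    "
}

printf 'one\ntwo\n' | ensure_last_line_sed 'END'   # one, two, END
printf 'one\nEND\n' | ensure_last_line_sed 'END'   # one, END (unchanged)
```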

sed has no mechanism for passing variables into a program; instead, the shell splices the value of lastline into the script text. The program above is therefore only correct if lastline contains slashes (/) only in escaped form (\/). Moreover, the value is interpreted as a regular expression in the address and as replacement text in the s command, where & and \ are special, which further restricts what can be expressed.
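One way around these restrictions is to escape the string before splicing it in. The following sketch uses two hypothetical helper functions, sed_escape_bre and sed_escape_repl, built on sed itself; it assumes GNU sed and a string without embedded newlines.

```shell
# sed_escape_bre STRING (hypothetical): backslash-escapes [ \ / . ^ $ *
# so that STRING matches itself literally inside /^...$/.
sed_escape_bre() {
    printf '%s\n' "$1" | sed 's|[[\/.^$*]|\\&|g'
}
# sed_escape_repl STRING (hypothetical): escapes \ / and &, the
# characters special in the replacement part of s/.../.../.
sed_escape_repl() {
    printf '%s\n' "$1" | sed 's|[\/&]|\\&|g'
}

lastline='// End [v1.0] & more'
re=$(sed_escape_bre "$lastline")
rp=$(sed_escape_repl "$lastline")
# Prints "text", then "// End [v1.0] & more".
printf 'text\n' | sed -e "\${ /^$re\$/! { s/\$/\n$rp/ } }"
```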

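In the following solution proposal for awk, every line is printed as it is processed; additionally, at the end of the program, if $0 does not equal the required string, an additional line containing that string is printed. This relies on $0 still holding the last input record in the END rule, which gawk, mawk and BWK awk provide, although POSIX does not explicitly require it.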

awk -v lastline="$lastline" '
        { print }
    END { if($0!=lastline) print lastline }
'
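The sketch below wraps the awk program in a hypothetical function, ensure_last_line_awk, and exercises two edge cases on which it differs from the sed version: on empty input the END rule still runs and prints the string (sed prints nothing), and since print always terminates its output with a newline, a final line lacking its newline gets one.

```shell
# ensure_last_line_awk STRING (hypothetical name): the awk program
# above as a stdin-to-stdout filter.
ensure_last_line_awk() {
    awk -v lastline="$1" '
            { print }
        END { if($0!=lastline) print lastline }
    '
}

printf 'one\ntwo\n' | ensure_last_line_awk 'END'         # one, two, END
printf '' | ensure_last_line_awk 'END'                   # END (empty input)
printf 'one\ntwo' | ensure_last_line_awk 'two' | wc -c   # 8: newline added
```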

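awk's -v var=value option passes the required string into the program, where it is used for the comparison and, if necessary, for output. Note that awk processes escape sequences such as \t in -v values, so backslashes in the string do not survive unchanged; reading the string from ENVIRON avoids this.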
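The following sketch shows the escape processing of -v and a variant, under the hypothetical name ensure_last_line_env, that reads the string from the environment through the POSIX ENVIRON array, where it is not reinterpreted.

```shell
# -v processes escape sequences: prints "C:", a TAB, then "emp".
awk -v s='C:\temp' 'BEGIN { print s }'

# ensure_last_line_env STRING (hypothetical name): same program, but
# the string arrives via the environment variable LASTLINE.
ensure_last_line_env() {
    LASTLINE=$1 awk '
            { print }
        END { if ($0 != ENVIRON["LASTLINE"]) print ENVIRON["LASTLINE"] }
    '
}

printf 'x\n' | ensure_last_line_env 'C:\temp'   # x, then C:\temp literally
```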

Suppose we want to ensure that a series of files ends with the following line:

// End of file

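The following shell script could accomplish this. It uses GNU awk's in-place editing extension, which is loaded with -i inplace; a bare -i is gawk's --include option and expects the name of a source file. Because END runs only once, after the last file, the check is placed in gawk's ENDFILE rule, which runs at the end of every input file; in recent gawk versions the inplace library keeps each file open until the next one starts, so output from ENDFILE still reaches the file being edited:

lastline='// End of file'

gawk -i inplace -v lastline="$lastline" '
            { print }
    ENDFILE { if($0!=lastline) print lastline }
' /data/file*.txt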

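Using GNU sed, the same transformation could be performed as shown below. The slashes in the string are escaped, and because -i implies -s, the address $ refers to the last line of each file rather than only to the last line of the last file: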

lastline='\/\/ End of file'

sed -i \
    -e "\${ /^$lastline\$/! { s/\$/\n$lastline/ } }" \
    /data/file*.txt

The awk solution compares the string literally, so it allows regex-special characters (such as ., [, ] and *) that would distort or even break the sed program. In terms of readability and robustness in the general case, the awk solution is clearly the preferable of the two.
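A minimal demonstration of this difference, assuming GNU sed: with the unescaped string a.c and an input whose last line is abc, sed wrongly accepts the existing line while awk appends the required one.

```shell
lastline='a.c'

# sed: "." is a regex metacharacter and matches the "b", so the
# existing last line "abc" is accepted; the output is just "abc".
printf 'abc\n' | sed -e "\${ /^$lastline\$/! { s/\$/\n$lastline/ } }"

# awk: != compares strings literally, so "a.c" is appended.
printf 'abc\n' | awk -v lastline="$lastline" '
        { print }
    END { if($0!=lastline) print lastline }
'
```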

According to some rough measurements I performed with the /usr/bin/time utility, the sed solution consumes less memory: GNU sed appears to have a smaller memory and CPU footprint than GNU awk. This disadvantage of the awk solution can be offset by using mawk instead of GNU awk, which, again according to my rough measurements, consumes even less memory than GNU sed.

The last example uses mawk to perform the transformation and writes the result to a temporary file. If the original file already ends with the required last line, the awk program exits with status 1, which prevents the file from being modified on disk. Otherwise, the awk program prints the missing last line, exits successfully, and the original file is overwritten with the new content. Copying the content back with cat, rather than moving the temporary file into place, keeps the original file's inode, permissions and ownership. mktemp and a for loop compensate for mawk's lack of in-place editing.

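lastline='// End of file'

# One temporary file, reused for every input file.
temp=$(mktemp) || exit 1

for f in /data/file*.txt ; do
    # awk exits with status 1 if the last line is already present;
    # only a successful run triggers the copy back into "$f".
    mawk -v lastline="$lastline" '
            { print }
        END { if($0!=lastline) print lastline; else exit 1 }
    ' < "$f" > "$temp" && cat "$temp" > "$f"
done

rm -f "$temp"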
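To try the exit-status technique without mawk or the /data files, the following sketch packages it as a hypothetical function, ensure_last_line_files, using whatever awk is installed. It shows that a file which already ends with the required line, even without a trailing newline, is left byte-for-byte unchanged.

```shell
# ensure_last_line_files STRING FILE... (hypothetical name)
# Variables are global, as plain sh has no "local".
ensure_last_line_files() {
    lastline=$1; shift
    temp=$(mktemp) || return 1
    for f in "$@"; do
        # Status 1 means "already present": the file is not rewritten.
        awk -v lastline="$lastline" '
                { print }
            END { if ($0 != lastline) print lastline; else exit 1 }
        ' < "$f" > "$temp" && cat "$temp" > "$f"
    done
    rm -f "$temp"
}

dir=$(mktemp -d)
printf 'a\nb' > "$dir/ok.txt"        # already correct, no final newline
printf 'a\n'  > "$dir/missing.txt"   # lacks the required line
ensure_last_line_files b "$dir/ok.txt" "$dir/missing.txt"
wc -c < "$dir/ok.txt"       # still 3 bytes: the file was not rewritten
cat "$dir/missing.txt"      # a, then b
rm -r "$dir"
```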