Some configuration and output text formats contain sections like the following:
foo:
    value 1
    value 2
bar:
    value 3
This article presents two scripts that print all consecutive indented lines following a non-indented line that matches a regular expression.
This means, given the single argument foo and the standard input above, the scripts should
- determine the line that matches foo and
- print the following two lines, but no other lines.
Also, the indentation of the printed lines should be removed.
Extracting Values from an „Indented Subsection“ Format
The following script section.sh extracts indented parts that follow a non-indented search pattern:
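#!/bin/sh
# Print the indented lines that follow the first line matching $1.
awk -v s="$1" '{
    if ($0 ~ s) {
        # Read on until the first non-indented line or the end of input.
        # getline returns 0 at end of input and -1 on error, so test for > 0.
        while ((getline) > 0) {
            if (!/^[ \t]/)
                exit
            sub(/^[ \t]+/, "")    # remove the indentation
            print
        }
    }
}'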
Executing this script as follows:
./section.sh foo << EOF
foo:
    value 1
    value 2
bar:
    value 3
EOF
results in the following output:
value 1
value 2
However, the script does not handle nested indentation in a useful manner, because it strips all leading whitespace from every line:
./section.sh foo << EOF
foo:
    bar:
        value 1
EOF
This renders:
bar:
value 1
The nested indentation is lost: value 1 no longer appears as a child of bar.
The following Perl script section.pl reads lines from STDIN until the beginning of a line matches the first script argument $ARGV[0]. It then reads all consecutive indented lines into an array @lines, and, in the process, it determines the shortest common white-space prefix of those lines, $prefix. Finally, it left-strips $prefix from each element of @lines, printing the results line by line.
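#!/usr/bin/perl
use strict;
use warnings;

my @lines;
my $prefix;    # shortest indentation seen so far

READ: while (<STDIN>) {
    # Skip ahead to the first line that starts with the search pattern.
    next READ unless /^$ARGV[0]/;
    while (<STDIN>) {
        # The section ends at the first line that is not indented.
        # [ \t] is used instead of \s so that an empty line ends the section.
        last READ unless /^([ \t]+)/;
        push @lines, $_;
        $prefix = $1 if !defined $prefix || length($1) < length($prefix);
    }
}
exit unless defined $prefix;

# Strip the shortest indentation from every collected line.
my $reg = qr/^\Q$prefix\E/;
for (@lines) {
    s/$reg//;
    print;
}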
This script preserves nested indentation; thus it can be used to form pipelines that extract more deeply nested values:
cat > /tmp/input << EOF
foo:
    bar:
        value 1
EOF
cat /tmp/input | ./section.pl foo | ./section.pl bar
The output is:
value 1
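Where Perl is not available, the same buffering idea can be sketched in awk. The shell function below is a hypothetical counterpart of section.pl, not part of the original scripts; the name section is an assumption. It also assumes that indentation is compared by character count, so input that mixes tabs and spaces may not be stripped as intended.

```shell
#!/bin/sh
# section PATTERN: hypothetical awk counterpart of section.pl (illustration only).
# Reads standard input, finds the first non-indented line matching PATTERN and
# prints the indented lines that follow, with the shortest indentation removed.
section() {
    awk -v s="$1" '
        # Still searching: look for a non-indented line matching the pattern.
        !found {
            if ($0 !~ /^[ \t]/ && $0 ~ s)
                found = 1
            next
        }
        # Inside the section: the first non-indented line ends it.
        !/^[ \t]/ { exit }
        {
            lines[++n] = $0
            match($0, /^[ \t]+/)    # sets RLENGTH to the width of the indentation
            if (n == 1 || RLENGTH < min)
                min = RLENGTH
        }
        # exit also runs END, so the buffered lines are printed in both cases.
        END {
            for (i = 1; i <= n; i++)
                print substr(lines[i], min + 1)
        }
    '
}
```

With this function defined, section foo < /tmp/input | section bar forms the same kind of pipeline as the Perl version. Note that awk processes backslash escapes in the value passed with -v, so patterns containing backslashes need extra care.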
Extracting Values from a „Stanza“ Format
A variation of the configuration syntax presented so far is the „stanza format“. In this format, a line prefix marks the beginning of a section, and the section consists of the remainder of that first line plus all subsequent indented lines:
foo: key1=value1, key2=value2, key3=value3,
    key4=value4, key5=value5
bar: key6=value6, key7=value7, key8=value8,
    key9=value9, key10=value10
Sections can be extracted, for example, using the following Perl script, stanza.pl:
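#!/usr/bin/perl
use strict;
use warnings;

my $reg = qr/^$ARGV[0]: (.*)/;
my $found = 0;    # true once the requested stanza has started

while (<STDIN>) {
    if ($found && /^\s+(\S.*)/) {
        # Continuation line: join it to the stanza with a single space.
        print " $1";
        next;
    }
    last if $found;    # the first non-indented line ends the stanza
    if ($_ =~ $reg) {
        print $1;      # remainder of the first line, without the newline
        $found = 1;
    }
}
print "\n" if $found;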
This script could be used as follows:
cat > /tmp/input << EOF
foo: key1=value1, key2=value2, key3=value3,
    key4=value4, key5=value5
bar: key6=value6, key7=value7, key8=value8,
    key9=value9, key10=value10
EOF
cat /tmp/input | ./stanza.pl foo
The output should be:
key1=value1, key2=value2, key3=value3, key4=value4, key5=value5
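For comparison, the stanza extraction can also be sketched in awk. The function name stanza is hypothetical, and the sketch deliberately differs from stanza.pl in one respect: it treats its argument as a literal prefix rather than a regular expression, which avoids surprises with special characters but is less flexible.

```shell
#!/bin/sh
# stanza PREFIX: hypothetical awk sketch of the stanza extraction (illustration only).
# Prints the text after "PREFIX: " joined with all indented continuation lines.
stanza() {
    awk -v p="$1" '
        # Continuation line of the requested stanza: append it after one space.
        found && /^[ \t]/ {
            sub(/^[ \t]+/, "")
            out = out " " $0
            next
        }
        # Any other line after the stanza has started ends it.
        found { exit }
        # Stanza header: the line begins with the literal text "PREFIX: ".
        index($0, p ": ") == 1 {
            out = substr($0, length(p) + 3)
            found = 1
        }
        # exit also runs END, so the joined stanza is printed in both cases.
        END {
            if (found)
                print out
        }
    '
}
```

Called as stanza foo < /tmp/input, this sketch prints the foo stanza on a single line. Nothing is printed if no line starts with the given prefix.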