Print All Indented Lines Following a Non-Indented Line

Some configuration and output text formats contain sections like the following:

foo:
   value1
   value2
bar:
   value3

This article presents two scripts that print all consecutive indented lines following a non-indented line that matches a search pattern given as a regular expression.

That is, given the single argument foo and the text above on standard input, the scripts should

  • determine the line that matches foo and
  • print the following two lines, but no other lines.

Also, the indentation of the printed lines should be removed.

Extracting Values from an „Indented Subsection“ Format

The following script section.sh extracts the indented lines that follow a non-indented line matching the search pattern:

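#!/bin/sh

awk -v s="$1" '{
  # Only a non-indented line that matches the pattern starts a section.
  if(!/^[ \t]/ && $0 ~ s){
    # Test for > 0, because getline returns -1 on a read error.
    while((getline) > 0){
      # The first non-indented line ends the section.
      if(!/^[ \t]/)
        exit;
      # Remove the leading indentation.
      sub(/^[ \t]+/,"")
      print
    }
  }
}'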

Executing this script as follows:

./section.sh foo << EOF
foo:
  value 1
  value 2
bar:
  value 3
EOF

results in the following output:

value 1
value 2

However, the script handles nested indentation poorly, because it strips all leading white space from every line:

./section.sh foo << EOF
foo:
  bar:
    value 1
EOF

This renders:

bar:
value 1

The nested indentation is lost.

The following Perl script section.pl reads lines from STDIN until the beginning of a line matches the first script argument $ARGV[0]. It then reads all consecutive indented lines into an array @lines, and, in the process, it determines the shortest common white-space prefix of those lines, $prefix. Finally, it left-strips $prefix from each element of @lines, printing the results line by line.

#!/usr/bin/perl

use strict;
use warnings;

my @lines;
my $prefix = undef;

READ: while(<STDIN>) {
    next READ unless $_ =~ qr/^$ARGV[0]/;

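    while(<STDIN>) {
        # A line without leading blanks or tabs ends the section
        # (\s would also match the newline of an empty line).
        last READ unless /^([ \t]+)/;
        push @lines, $_;
        # Keep the shortest indentation seen so far.
        $prefix = $1 if !defined $prefix || length($1) < length($prefix);
    }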
}

exit unless defined $prefix;

$prefix = quotemeta($prefix);
my $reg = qr/^$prefix/;

for(@lines) {
    s/$reg//;
    print
}

This script preserves nested indentation; thus it can be used to form pipelines that extract more deeply nested values:

cat > /tmp/input << EOF
foo:
   bar:
      value 1
EOF
cat /tmp/input | ./section.pl foo | ./section.pl bar

The output is:

value 1
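
Where Perl is not available, the same idea can be sketched in awk. The function name section_keep_indent is hypothetical and not part of the scripts above. Unlike section.pl, this sketch compares indentation by length only, so it assumes that a section does not mix tabs and spaces.

```shell
#!/bin/sh
# Hypothetical helper: an awk sketch of the approach used by section.pl.
# Usage: section_keep_indent PATTERN < input
section_keep_indent() {
    awk -v s="$1" '
    # Before a match: look for a non-indented line that starts with the pattern.
    !found {
        if ($0 !~ /^[ \t]/ && $0 ~ ("^" s)) found = 1
        next
    }
    # After the match: the first non-indented line ends the section.
    !/^[ \t]/ { exit }
    {
        # Remember the line and track the shortest indentation seen so far.
        lines[++n] = $0
        match($0, /^[ \t]+/)
        if (min == 0 || RLENGTH < min) min = RLENGTH
    }
    END {
        # Strip the shortest indentation (by length) from every collected line.
        for (i = 1; i <= n; i++) print substr(lines[i], min + 1)
    }'
}
```

Because nested indentation survives, calls can be chained in the same way as section.pl: section_keep_indent foo < /tmp/input | section_keep_indent bar prints value 1 for the input file above.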

Extracting Values from a „Stanza“ Format

A variation of the configuration syntax presented so far is the „stanza format“. In this format, a line prefix marks the beginning of a section, and the section consists of the remainder of that first line and all subsequent indented lines:

foo: key1=value1, key2=value2, key3=value3,
     key4=value4, key5=value5
bar: key6=value6, key7=value7, key8=value8,
     key9=value9, key10=value10

Sections can be extracted, for example, using the following Perl script, stanza.pl:

#!/usr/bin/perl

use strict;
use warnings;

my $reg = qr/^$ARGV[0]: (.*)/;

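while(<STDIN>) {
    # Skip everything up to the header line of the requested stanza.
    next unless $_ =~ $reg;
    # Print the remainder of the header line (without its newline).
    print $1;

    # Print the indented continuation lines unchanged.
    while(<STDIN>) {
        last unless /^[ \t]/;
        print;
    }
    # Stop after the first matching stanza.
    last;
}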

This script could be used as follows:

cat > /tmp/input << EOF
foo: key1=value1, key2=value2, key3=value3,
     key4=value4, key5=value5
bar: key6=value6, key7=value7, key8=value8,
     key9=value9, key10=value10
EOF
cat /tmp/input | ./stanza.pl foo

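The output is a single line, because the remainder of the first line is printed without its newline and the continuation line keeps its indentation:

key1=value1, key2=value2, key3=value3,     key4=value4, key5=value5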
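
For further processing it is often handier to have one key=value pair per line. The following is a minimal awk sketch under a few assumptions: the function name stanza_pairs is hypothetical, the stanza name is compared as a literal string rather than as a regular expression, and values are assumed not to contain commas.

```shell
#!/bin/sh
# Hypothetical helper (not one of the scripts above): print the
# key=value pairs of one stanza, one pair per line.
# Usage: stanza_pairs NAME < input
stanza_pairs() {
    awk -v s="$1" '
    # Header line: keep the text after "NAME: " and start collecting.
    !found {
        if (index($0, s ": ") == 1) {
            found = 1
            text = substr($0, length(s) + 3)
        }
        next
    }
    # The first non-indented line after the header ends the stanza.
    !/^[ \t]/ { exit }
    # Continuation line: append it to the collected text.
    { text = text " " $0 }
    END {
        if (!found) exit
        # Split at commas and trim the white space around each pair.
        n = split(text, pairs, ",")
        for (i = 1; i <= n; i++) {
            gsub(/^[ \t]+|[ \t]+$/, "", pairs[i])
            if (pairs[i] != "") print pairs[i]
        }
    }'
}
```

With the input file above, stanza_pairs bar < /tmp/input would print key6=value6 through key10=value10, each on its own line.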