Print All Indented Lines Following a Non-Indented Line

Some configuration and output text formats contain sections like the following:

foo:
   value 1
   value 2
bar:
   value 3

This article presents two scripts that print all consecutive indented lines following a non-indented line that matches a given regular expression.

This means, given the single argument foo and the standard input above, the scripts should

  • determine the line that matches foo and
  • print the following two lines, but no other lines.

Also, the indentation of the printed lines should be removed.

Extracting Values from an "Indented Subsection" Format

The following script ./section.sh extracts indented parts that follow a non-indented search pattern:

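#!/bin/sh

# Usage: section.sh PATTERN < input
awk -v s="$1" '
# Act on the first non-indented line that matches the pattern.
/^[^ \t]/ && $0 ~ s {
  # Print the indented lines that follow, without their indentation.
  # getline returns -1 on a read error, so compare against 0 explicitly.
  while ((getline) > 0) {
    if (!/^[ \t]/)
      break
    sub(/^[ \t]+/, "")
    print
  }
  exit
}'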

Executing this script as follows:

./section.sh foo << EOF
foo:
  value 1
  value 2
bar:
  value 3
EOF

results in the following output:

value 1
value 2

However, the script does not handle input whose section itself contains nested indentation in a useful manner:

./section.sh foo << EOF
foo:
  bar:
    value 1
EOF

This renders:

bar:
value 1

The nested indentation is lost.
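
The loss comes from removing all leading white space from every line. The following is a minimal sketch of the two stripping strategies, not part of section.sh: the function names strip_all and strip_fixed are hypothetical, and the fixed width of two columns is an assumption that fits the example input above.

```shell
#!/bin/sh

# Remove every leading white-space character (this flattens nesting).
strip_all() {
  sed 's/^[[:space:]]*//'
}

# Remove only a fixed two-column prefix (this keeps nesting below it).
strip_fixed() {
  sed 's/^  //'
}

printf '  bar:\n    value 1\n' | strip_all
printf '  bar:\n    value 1\n' | strip_fixed
```

The first call prints both lines flush left, while the second keeps "value 1" indented by two columns below "bar:". A fixed width only works when the indentation is known in advance.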

The following Perl script ./section.pl reads lines from STDIN until a line matches the first script argument $ARGV[0]. It then reads all consecutive indented lines into an array @lines and, in the process, determines the shortest leading white-space run of those lines, $prefix. As long as the lines are indented consistently, this is their common white-space prefix. Finally, it strips $prefix from the start of each element of @lines and prints the results line by line.

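#!/usr/bin/perl

use strict;
use warnings;

my @lines;
my $prefix;

READ: while(<STDIN>) {
    # Skip ahead to the first non-indented line that matches the pattern.
    next READ unless /^\S/ && $_ =~ qr/$ARGV[0]/;

    while(<STDIN>) {
        # [ \t] rather than \s, so that an empty line ends the section.
        last READ unless /^([ \t]+)/;
        push @lines, $_;
        # Keep the shortest indentation seen so far.
        $prefix = $1 if !defined $prefix || length($1) < length($prefix);
    }
}

# No matching section, or an empty one: nothing to print.
exit unless @lines;

my $reg = qr/^$prefix/;

for(@lines) {
    s/$reg//;
    print;
}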

This script preserves nested indentation; thus it can be used to form pipelines that extract more deeply nested values:

cat > /tmp/input << EOF
foo:
   bar:
      value 1
EOF
cat /tmp/input | ./section.pl foo | ./section.pl bar

The output is:

value 1
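
For comparison, the same buffering idea can be sketched in awk. This is only an illustration under assumptions, not a port of section.pl: the function name section_nested is hypothetical, the pattern is matched against non-indented lines only, and the width of the shortest indentation is used as the amount to strip.

```shell
#!/bin/sh

# section_nested PATTERN < input   (hypothetical helper)
section_nested() {
  awk -v s="$1" '
    # The first non-indented line that matches the pattern opens the section.
    !found && /^[^ \t]/ && $0 ~ s { found = 1; next }

    found {
      if (!/^[ \t]/) exit          # section ends at the next non-indented line
      match($0, /^[ \t]+/)         # RLENGTH is now the width of the indentation
      if (n == 0 || RLENGTH < min) min = RLENGTH
      lines[++n] = $0              # buffer the line until the width is known
    }

    END {
      # Strip only the shared indentation, so nested indentation survives.
      for (i = 1; i <= n; i++) print substr(lines[i], min + 1)
    }'
}

printf 'foo:\n   bar:\n      value 1\n' | section_nested foo | section_nested bar
```

The lines have to be buffered because the shortest indentation is only known once the whole section has been read. The exit statement still runs the END block, which is where the buffered lines are printed.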

Extracting Values from a "Stanza" Format

A variation of the configuration syntax presented so far is the "stanza format". In this format, a line prefix marks the beginning of a section, and the section consists of the remainder of that first line and all subsequent indented lines:

foo: key1=value1, key2=value2, key3=value3,
     key4=value4, key5=value5
bar: key6=value6, key7=value7, key8=value8,
     key9=value9, key10=value10

Sections can be extracted, for example, using the following Perl script, stanza.pl:

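#!/usr/bin/perl

use strict;
use warnings;

my $reg = qr/^$ARGV[0]: (.*)/;

while(<STDIN>) {
    # Skip lines until the header of the requested stanza is found.
    next unless $_ =~ $reg;

    # $1 carries no line break, so the next printed line is appended to it.
    print $1;

    # Print the indented continuation lines that belong to the stanza.
    while(<STDIN>) {
        last unless /^\s/;
        print;
    }
    last;
}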

This script could be used as follows:

cat > /tmp/input << EOF
foo: key1=value1, key2=value2, key3=value3,
     key4=value4, key5=value5
bar: key6=value6, key7=value7, key8=value8,
     key9=value9, key10=value10
EOF
cat /tmp/input | ./stanza.pl foo

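The first line of the section is printed without its line break, so the continuation line is appended to it together with its indentation:

key1=value1, key2=value2, key3=value3,     key4=value4, key5=value5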
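
Where Perl is not available, a similar extraction can be sketched in awk. The following is a variant under stated assumptions rather than a port of stanza.pl: the function name stanza is hypothetical, the section name is compared literally instead of as a regular expression, and continuation lines are joined to the first line with a single space.

```shell
#!/bin/sh

# stanza NAME < input   (hypothetical helper)
stanza() {
  awk -v name="$1" '
    # Header line "NAME: rest". index() compares literally, not as a regex.
    !found && index($0, name ": ") == 1 {
      found = 1
      out = substr($0, length(name) + 3)   # text after "NAME: "
      next
    }

    found {
      if (!/^[ \t]/) exit                  # next stanza begins here
      sub(/^[ \t]+/, "")                   # drop the continuation indentation
      out = out " " $0
    }

    END { if (found) print out }'
}

stanza foo << EOF
foo: key1=value1, key2=value2, key3=value3,
     key4=value4, key5=value5
bar: key6=value6, key7=value7, key8=value8,
     key9=value9, key10=value10
EOF
```

Joining with a single space gives one normalized line per stanza, which is easier to split on commas afterwards.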