Peeter Joot's (OLD) Blog.

Math, physics, perl, and programming obscurity.

grepping a range of lines using perl.

Posted by peeterjoot on November 27, 2009

I was asked how to use grep to select everything in a file starting with one pattern and ending with a different one. The file is our diagnostic log, and if it originated with one of our system testers it could be massive (a few hundred thousand lines long). GNU grep can be used for this. You could do something like:

grep -A 9999999 'some first expression' < wayToBigFile | grep -B 9999999 'some other expression'

Here 9999999 is a line count guessed to be big enough to contain all the lines of interest (the real count is not known ahead of time), so the command says “give me everything after the first expression, and then give me everything before the other expression in that output.”

Perl gives you a nicer way of doing this:

#!/usr/bin/perl -n

$foundIt = 1 if ( /some first expression/ ) ;

next if ( !$foundIt ) ;

print ;

exit if ( /some other expression/ ) ;

# done.

The -n flag says to run the whole script as if it were inside a ‘while (<>){ … }’ loop. Until the initial pattern is seen, $foundIt is false and nothing is printed; once the second pattern is seen, that line is printed and we bail. Note that this relies on perl not requiring variables to be declared or initialized: $foundIt is undefined, and therefore false, until it is set.

Observe also that this script is its own test case.

myPrompt$ chmod 755 ./theScript
myPrompt$ ./theScript < ./theScript
$foundIt = 1 if ( /some first expression/ ) ;

next if ( !$foundIt ) ;

print ;

exit if ( /some other expression/ ) ;
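
For comparison, here is a sketch of the same logic with the loop that -n supplies written out explicitly, and with $foundIt initialized up front instead of relying on its undefined value. The helper name lines_between and the sample text are invented for illustration, and an in-memory filehandle stands in for the real log.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): return the lines
# from the first match of $first through the next match of $last, inclusive.
sub lines_between {
    my ($fh, $first, $last) = @_;
    my $foundIt = 0;                     # explicit, rather than undef
    my @selected;
    while (my $line = <$fh>) {           # the loop that -n would supply
        $foundIt = 1 if $line =~ $first;
        next unless $foundIt;
        push @selected, $line;
        last if $line =~ $last;          # plays the role of 'exit' above
    }
    return @selected;
}

# Made-up sample input, read through an in-memory filehandle.
my $sample = "noise\nsome first expression\nbody\nsome other expression\ntail\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
print lines_between($fh, qr/some first expression/, qr/some other expression/);
close $fh;
```

Collecting the lines in a list rather than printing them directly is only a choice made here to keep the helper easy to check; for a very large log, printing inside the loop as the original script does avoids holding the range in memory.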

6 Responses to “grepping a range of lines using perl.”

  1. Garfield Lewis said

    Tried the perl script and it works like a charm.

    Thx, Peeter…

  2. “The script is actually also its own testcase” – nice!

  3. You can also use sed for this:

    sed '/some first expression/,/some other expression/!d' file

    Even with line numbers (this shows lines from 12 to 20):

    sed '12,20!d' file

    • peeterjoot said

      With the perl code you can do more complex stuff on what was matched if you want, but if plain old extraction is the goal this sed method is definitely superior.

      Thanks for the tip.
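
For completeness, a Perl counterpart to the line-number form of the sed command might look like the sketch below. It compares the current input line number ($.) against the bounds inside a range expression. The helper name lines_numbered and the sample text are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return input lines numbered $from through $to.
# Caveat: the range operator remembers its state between calls, so this
# sketch assumes line $to is actually reached on each input.
sub lines_numbered {
    my ($fh, $from, $to) = @_;
    my @selected;
    while (my $line = <$fh>) {
        # $. holds the line number of the filehandle last read from.
        push @selected, $line if ($. == $from) .. ($. == $to);
    }
    return @selected;
}

# Made-up sample input: five lines, of which lines 2 to 4 are wanted.
my $sample = "one\ntwo\nthree\nfour\nfive\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
print lines_numbered($fh, 2, 4);
close $fh;
```

On the command line the same idea is usually written with constant bounds, which perl compares against $. implicitly: perl -ne 'print if 12 .. 20' file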

  4. Jotr said

    Sounds like you really want the range operator:

    http://perldoc.perl.org/perlop.html#Range-Operators

    • peeterjoot said

      Hey, thanks Jotr. I was hoping that by posting this, somebody would eventually supply a better and easier way. This works nicely for a one-liner range grep, as in the following:

      # perl -n -e 'next unless ( /E449758E484/ .. /db2spcat/ ) ; print ;' db2diag.log
      2010-09-20-16.14.21.168949-240 E449758E484           LEVEL: Event
      PID     : 3133                 TID  : 47625558550848 KTID : 5843
      PROC    : db2sysc 0
      INSTANCE: peeterj              NODE : 000          DB   : CORAL
      APPHDL  : 0-54                 APPID: *N0.peeterj.100920201417
      AUTHID  : PEETERJ
      EDUID   : 24                   EDUNAME: db2agent (CORAL) 0
      FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::FirstConnect, probe:1000
      START   : DATABASE: CORAL    : ACTIVATED: NO
      
      2010-09-20-16.14.21.748602-240 I450243E460           LEVEL: Warning
      PID     : 3133                 TID  : 47625558550848 KTID : 5843
      PROC    : db2sysc 0
      INSTANCE: peeterj              NODE : 000          DB   : CORAL
      APPHDL  : 0-54                 APPID: *N0.peeterj.100920201417
      AUTHID  : PEETERJ
      EDUID   : 24                   EDUNAME: db2agent (CORAL) 0
      FUNCTION: DB2 UDB, base sys utilities, db2spcat, probe:173
      

      Do you know if it is possible to combine a range boundary with a number, as in something like:

      next unless ( /E449758E484/ .. /db2spcat/+10 ) ;
      

      I was thinking along the lines of gnu-grep -A10: in this case I wanted to grep for everything in a range delimited by a pair of patterns, plus a few extra lines of context. Trying that with /db2spcat/+10 doesn’t appear to work.
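
One possible way to get that trailing context, offered as a sketch rather than as an answer from the thread: in scalar context the range operator returns a sequence number, and on the line that closes the range that number has "E0" appended. A counter armed on that line can then let a few more lines through. The helper name range_plus_context and the sample data are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines in the range $first .. $last,
# plus up to $extra lines following the closing line.
# Caveat: the range operator remembers its state between calls, so this
# sketch assumes every input actually closes the range it opens.
sub range_plus_context {
    my ($fh, $first, $last, $extra) = @_;
    my $remaining = 0;                   # trailing context lines still owed
    my @selected;
    while (my $line = <$fh>) {
        # Empty string when outside the range; 1, 2, ... inside it, with
        # "E0" appended on the line that matches $last.
        my $seq = ($line =~ $first) .. ($line =~ $last);
        if ($seq) {
            push @selected, $line;
            $remaining = $extra if $seq =~ /E0$/;
        }
        elsif ($remaining > 0) {
            push @selected, $line;
            $remaining--;
        }
    }
    return @selected;
}

# Made-up sample input: the range START..STOP plus two lines of context.
my $sample = join '', map { "$_\n" } qw(a START b STOP c d e);
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
print range_plus_context($fh, qr/START/, qr/STOP/, 2);
close $fh;
```

With the sample above, this prints the lines START, b, STOP, c and d. For the db2diag.log case the patterns would be qr/E449758E484/ and qr/db2spcat/ with an $extra of 10.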
