perl regex tricks. Non capturing expressions and alternatives.
Posted by peeterjoot on August 27, 2009
I have two sets of dump output to compare, and both include the occasional dumped pointer, which messes up the diff. I want to mask the pointer output (all starting with 0x) on all the lines like:
List Entry Address: 0x...
List Tail (primary): 0x...
List Tail (secondary): 0x...
Next entry name collision: 0x...
Next entry PLEID collision: 0x...
Next entry (primary): 0x...
Next entry (secondary): 0x...
Previous entry (primary): 0x...
Previous entry (secondary): 0x...
An easy way would be to run ‘grep -v’ and just filter these out completely, but I wanted the original line numbers to stay intact for reference.
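Here’s a one-liner perl script, executed with ‘perl -pi ./myScript *.fmt’ (where the files *.fmt are what I’m mucking with). The -p flag wraps the script in a read-and-print loop over every input line, and -i edits the files in place: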
$ cat myScript
s/((?i:Next|List|previous) (?i:entry|head|tail).*0x).*/$1................/;
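To try the substitution without touching any files, the same regex can be wrapped in a small sub and run over a few sample lines. This is only a sketch: the sub name mask_pointer is hypothetical, and the pointer values in the sample lines are invented for illustration (the real dump values are elided above).

```perl
use strict;
use warnings;

# Hypothetical helper: apply the substitution from myScript to a single line.
sub mask_pointer {
    my ($line) = @_;
    $line =~ s/((?i:Next|List|previous) (?i:entry|head|tail).*0x).*/$1................/;
    return $line;
}

# Sample lines; the pointer values are made up for illustration.
my @lines = (
    "List Entry Address: 0x00007F00DEADBEEF",
    "Previous entry (primary): 0x0000000000000000",
    "Name: FOO",
);

# Every input line produces exactly one output line, so line numbers are preserved.
print mask_pointer($_), "\n" for @lines;
```

Lines that don't match the pattern pass through unchanged, which is what keeps the line numbers of the two dumps aligned for the diff.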
Since I had to look up (man perlre) how to do this once again, it’s a good blog topic for self reference. Let’s break it down. First thing is an outermost capturing pattern
s/(stuff.*0x).*/$1................/;
this says match ‘stuff.*0x’ followed by ‘.*’, namely ‘stuff’ followed by anything (the .* part), then ‘0x’, then anything else up to the end of the line. Everything within the parentheses goes into $1, so the replacement keeps the line up to and including 0x, and swaps whatever follows 0x for 16 dots. Now look at the nested expression before the .*0x part:
(?i:Next|List|previous) (?i:entry|head|tail)
More perl ASCII barf starting things off, but it’s not so bad. If you have an expression like (?:stuff) it means match ‘stuff’ but don’t capture it (i.e. don’t put it in $2 or $3, …). Only slightly more complex is having alternatives in the pattern, so something like (?:Next|List) means match Next or match List, but also don’t put anything into a $N variable.
There’s one more bit in there unexplained, the ‘i’ modifier flag. Written as (?i:stuff), it makes just that group case insensitive. In this case I could have made that a global flag at the end of the replace specification so it would apply to the whole pattern:
s/((?:Next|List|previous) (?:entry|head|tail).*0x).*/$1................/i;
but initially I had the case Insensitive modifier only on one of the patterns, so the final result ended up with some redundancy.
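A small experiment can make the two ideas above more concrete: capturing versus non-capturing groups, and the scoped (?i:...) versus a trailing /i. The sample line and variable names here are invented for illustration, and the upper-case 0X is deliberate, to show where the two flag placements differ.

```perl
use strict;
use warnings;

# Made-up sample line; note the upper-case X in 0X.
my $line = "Next entry (primary): 0X1234";

# Capturing groups: each (...) fills the next $N variable.
if ($line =~ /(Next|List) (entry|tail)/) {
    printf "capturing:     \$1=%s \$2=%s\n", $1, $2;
}

# Non-capturing groups: (?:...) groups the alternatives but fills no $N,
# so the only capture, $1, comes from the outer parentheses.
if ($line =~ /((?:Next|List) (?:entry|tail))/) {
    printf "non-capturing: \$1=%s\n", $1;
}

# Scoped (?i:...) applies only inside its own group, so the literal 0x
# stays case sensitive and does not match the upper-case 0X above.
my $scoped = ($line =~ /(?i:next) (?i:ENTRY).*0x/) ? 1 : 0;

# A trailing /i applies to the whole pattern, the 0x included.
my $global = ($line =~ /(?:next) (?:ENTRY).*0x/i) ? 1 : 0;

print "scoped (?i:...) matches: $scoped\n";
print "global /i matches:       $global\n";
```

For dump output where the pointers are always written with a lower-case 0x, as in the lines above, the two forms should behave the same; the difference only shows up if some of the text outside the groups varies in case.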