Peeter Joot's (OLD) Blog.

Math, physics, perl, and programming obscurity.

dirty perl tricks. using evaluations in a replacement expression

Posted by peeterjoot on August 6, 2009

I’ve gone and done a search and replace of a return type everywhere in a certain file, and the new typedef name has more characters than the original. Now the nicely indented prologues for each function are all messed up like so:

inline SAL_CA_STATUS_TYPE SQLE_CA_CONN_ENTRY_DATA::sqleCaCeWriteSA( CASA_t * const          SAToken,
                                                            SAL_CA_PAGENAME_TYPE * const    pSaPageName,
                                                            const Uint8             newElement,
                                                            const Uint8             cond,
                                                            const Uint8             maxagg,
                                                            const Uint8             increasing,
                                                            const Uint64            input,
                                                            Uint64   * const        aggregate,
                                                            bool * const            pbDispatchError )

I want every line after the one that starts with ‘inline SAL_CA_STATUS_TYPE’ indented by eight more characters, up to the closing parenthesis that marks the end of the argument list. It should look like:

...
inline SAL_CA_STATUS_TYPE SQLE_CA_CONN_ENTRY_DATA::sqleCaCeWriteSA( CASA_t * const          SAToken,
                                                                    SAL_CA_PAGENAME_TYPE * const    pSaPageName,
                                                                    const Uint8             newElement,
                                                                    const Uint8             cond,
                                                                    const Uint8             maxagg,
                                                                    const Uint8             increasing,
                                                                    const Uint64            input,
                                                                    Uint64   * const        aggregate,
                                                                    bool * const            pbDispatchError )
{
...

Kind of a silly exercise in prettying things up, but the new poor formatting makes things harder to read, and is distracting for maintenance. I’ve got 44 such functions in this file and don’t want to do them manually.

Is there an easy way to indent all the lines after the first by the eight characters needed in this case? I’ve wanted to do scripted changes like this before (such as adding an argument to all function calls matching some pattern), so it seemed like it was worth a few minutes to play with it. Here’s what I came up with:

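#!/usr/bin/perl

# Slurp the whole input into $p so the regex can span lines.
while (<>)
{
   $p .= $_ ;
}

# For each function prologue (up to the first closing parenthesis),
# replace the match with whatever foo() returns for it.
$p =~ s/^(inline SAL_CA_STATUS_TYPE.*?\))/foo("$1")/smeg ;
print $p ;

exit ;

# Replace the first leading space of each line with nine spaces
# (a net shift of eight).
sub foo
{
   my $s = "@_" ;
   $s =~ s/^ /         /smg ;

   return "$s" ;
}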

Here’s a breakdown of what this does and how. The first problem is that I want to operate on the whole file and not on a line by line basis. This loop:

while (<>)
{
   $p .= $_ ;
}

sucks up each line of input (stdin, or any files named on the command line) and appends it to a working variable $p.
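A common alternative to the append loop is to undefine the input record separator $/ for a single read, so that one read returns the whole input. The sketch below is not from the original script: the helper name slurp() and the in-memory sample text are assumptions made for illustration.

```perl
use strict ;
use warnings ;

# slurp() is a hypothetical helper name, not part of the original script.
sub slurp
{
   my ($fh) = @_ ;
   local $/ ;                 # no record separator: one read returns everything
   my $text = <$fh> ;
   return defined $text ? $text : '' ;
}

# Demo on an in-memory filehandle; a real script could pass \*STDIN instead.
my $sample = "line one\nline two\n" ;
open( my $fh, '<', \$sample ) or die "open failed: $!" ;
my $p = slurp( $fh ) ;
close( $fh ) ;

print $p ;
```

Because $/ is changed with local, line-at-a-time reading is restored as soon as slurp() returns.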

Now that I’ve got all 19000 lines of the file in a working variable (yes, I should probably split up my file;), I want to match every chunk of text that starts with ‘inline SAL_CA_STATUS_TYPE’ at the beginning of a line and runs to the first closing parenthesis, which ends the argument list. I don’t have any function pointer arguments, so I can match until the first ) after the starting expression. So, a match expression that does the job is:

/^inline SAL_CA_STATUS_TYPE.*?\)/

The caret says match the beginning of the line, and since ) is a special character in regular expressions I have to escape it. I also don’t want to match past the first ) so I use a non-greedy pattern ‘.*?’ … meaning match as little as possible before whatever comes next in the pattern. Next I want to put all of this into a variable I can refer to in the replacement expression (that is $1 ), so I wrap the whole thing in the capture pattern (). That leaves me with:

/^(inline SAL_CA_STATUS_TYPE.*?\))/
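To see why the non-greedy form matters, here is a small sketch that runs both variants against a made-up function. The sample text and the type name T are assumptions for illustration, and the /s and /m modifiers are included so that the dot can cross newlines and the caret can anchor at a line start.

```perl
use strict ;
use warnings ;

# Hypothetical sample: a two-line prologue followed by a body that
# contains another closing parenthesis.
my $text = "inline T f( int a,\n   int b )\n{\n   g( a ) ;\n}\n" ;

my ($lazy)   = $text =~ /^(inline T.*?\))/sm ;   # stops at the first )
my ($greedy) = $text =~ /^(inline T.*\))/sm ;    # runs on to the last )

print "non-greedy:\n$lazy\n\n" ;
print "greedy:\n$greedy\n" ;
```

The non-greedy capture ends at the close of the argument list, while the greedy one swallows part of the function body as well.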

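Since I want to do this for all matches, I need the g modifier at the end. Since the text is multiline, I need /sm too: /s lets . match newlines, so the pattern can span lines, and /m lets ^ match at the start of every line instead of only at the start of the string. If I wanted a pure text change at this point, I could do something like: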

$p =~ s/^(inline SAL_CA_STATUS_TYPE.*?\))/blah$1blah/smg ;

This would wrap all instances of the pattern with blah blah, like a quoting operation. What I want, though, is to take each match of this first pattern and do more to it. The /e modifier allows that: it makes the replacement expression get evaluated as code. I wrote a quick function that in turn did my second search and replace:

sub foo
{
   my $s = "@_" ;
   $s =~ s/^ /         /smg ;

   return "$s" ;
}

In this helper function, I replace the first leading space on each line with nine spaces (a net shift of eight), and voila, I’m done. It takes longer to explain this throwaway script than to write it, and I now have a template for other similar automated changes in the future.
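As a sketch of how this could serve as a template, the version below passes the indent width to the helper instead of hard-coding the spaces. The names reindent() and $extra, the type name STATUS_TYPE_EX, and the sample text are all hypothetical; only the s///e technique is taken from the script above.

```perl
use strict ;
use warnings ;

# reindent() and $extra are hypothetical names for illustration.
sub reindent
{
   my ($block, $extra) = @_ ;
   my $pad = ' ' x $extra ;
   $block =~ s/^ /$pad /mg ;   # only lines that already start with a space
   return $block ;
}

# Made-up sample: the continuation line is indented 17 spaces, eight
# short of the 25 needed to line up after "inline STATUS_TYPE_EX f( ".
my $p = "inline STATUS_TYPE_EX f( int a,\n"
      . ( ' ' x 17 ) . "int b )\n"
      . "{\n"
      . "   return a + b ;\n"
      . "}\n" ;

# /e evaluates the replacement as code once per match.
$p =~ s/^(inline STATUS_TYPE_EX.*?\))/reindent( "$1", 8 )/smeg ;
print $p ;
```

Only the text captured by the outer pattern is handed to the helper, so the function body, which also has indented lines, is left alone.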
