Peeter Joot's (OLD) Blog.

Math, physics, perl, and programming obscurity.

Another good regex trick: matching a word boundary.

Posted by peeterjoot on August 26, 2009

Task: I have a badly named variable, in this case caKeyValue, and want to rename it to caKeySample in a number of files.

I’ve also got variables named m_caKeyValue that I don’t want to change. Once I’m done with this replacement, the only variables left with caKeyValue in their names will be the ones I’m interested in, and I can examine each of those in sequence to make sure that I’m treating them right.

A regular expression with a word boundary pattern is the trick. Here’s a sample command, starting with a small file that has my search and replace patterns:

$ cat myPatterns
# if there are trailing spaces then try not to mess up indenting:
s/\bcaKeyValue\b  /caKeySample /g;

# but mess up indenting if there's no option:
s/\bcaKeyValue\b/caKeySample/g;

and here’s the perl command line invocation to do the replacement:

$ perl -pi ./myPatterns `cat listOfFiles`

Let’s break it down. First the command. We use the -p and -i perl command line flags, as explained in previous posts. Together these wrap the perl script in an implicit while loop over the lines of each file and modify the files in place (in this case without backup, since I’ve just checked them out of the version control system).
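As a rough sketch of what that implicit loop amounts to, the same effect can be spelled out in a standalone script by setting perl's in-place edit variable $^I and looping over @ARGV with the diamond operator. The scratch file and its one-line contents are hypothetical stand-ins for an entry in listOfFiles, and this is only an approximation of what the switches do, not a description of perl's internals.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file standing in for one entry of listOfFiles.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "int caKeyValue = m_caKeyValue;\n";
close $fh;

{
    # Approximately what -p -i sets up: read each line of each file in @ARGV,
    # run the script body on it, and print the line back into the same file.
    local @ARGV = ($path);
    local $^I   = '';    # empty suffix: edit in place, keep no backup copy
    while (<>) {
        s/\bcaKeyValue\b/caKeySample/g;    # the body of myPatterns goes here
        print;
    }
}

# Read the file back to see what the in-place edit left behind.
open my $in, '<', $path or die "cannot reopen $path: $!";
my $result = do { local $/; <$in> };
close $in;
print $result;
```

Setting $^I to something like '.orig' instead of the empty string would keep a backup copy of each file, which is the safer choice when the files are not under version control.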

If the perl script containing the search and replace patterns I want were to contain just:

s/caKeyValue/caKeySample/g;

Then all instances of caKeyValue would be replaced. I only want this if they aren’t embedded in something else (like m_caKeyValue). If I only cared about not mucking with variables named m_caKeyValue then a sufficient replacement expression would be:

s/\bcaKeyValue/caKeySample/g;

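This leaves m_caKeyValue alone because the underscore counts as a word character, so there is no word boundary between the “_” and the “c”. It still says it’s okay to do the replacement if something trails “caKeyValue” without spaces, so caKeyValues, say, would be replaced by caKeySamples. To be careful I’m telling perl to be stricter, requiring a word boundary at both the beginning and the end of the expression. The \b matches no characters itself: it matches the position between a word character (letter, digit or underscore) and a non-word character such as whitespace or punctuation, or the start or end of the string. That is: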

s/\bcaKeyValue\b/caKeySample/g;
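To see the difference between the three candidate patterns side by side, here is a small sketch that applies each one to a few identifiers. The sample identifiers and the rewrite helper are made up for illustration; only the patterns themselves come from the discussion above.

```perl
use strict;
use warnings;

# The three candidate patterns from the text, loosest first.
my @patterns = (
    [ 'no boundary'      => qr/caKeyValue/     ],
    [ 'leading \b only'  => qr/\bcaKeyValue/   ],
    [ '\b on both sides' => qr/\bcaKeyValue\b/ ],
);

# Hypothetical sample identifiers, chosen to exercise each case.
my @samples = ('caKeyValue', 'm_caKeyValue', 'caKeyValues');

# Apply one pattern to a copy of the text and return the rewritten copy.
sub rewrite {
    my ($re, $text) = @_;
    $text =~ s/$re/caKeySample/g;
    return $text;
}

for my $p (@patterns) {
    my ($label, $re) = @$p;
    print "$label:\n";
    printf "  %-14s -> %s\n", $_, rewrite($re, $_) for @samples;
}
```

Only the last pattern leaves both m_caKeyValue and caKeyValues untouched while still renaming the bare caKeyValue.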

Now, a diff of the results with the originals in the version control system (something to _always_ do with automated code changes) showed that I was messing up the indentation in some cases, as in the following diff fragment:

-   SAL_ENCODED_CA_STATE                caKeyValue        = m_caKey.SAL_SampleCaKeyValue() ;
+   SAL_ENCODED_CA_STATE                caKeySample        = m_caKey.SAL_SampleCaKeyValue() ;

So, to compensate, I undid my automated change and added a first replacement pattern to keep things pretty:

s/\bcaKeyValue\b  /caKeySample /g;

This one matches the expression followed by two trailing spaces and replaces it with the desired name plus one trailing space, so whatever follows stays in the same column. Note that the trailing \b is redundant in this case, since the space that follows already guarantees a word boundary, but since I was cutting and pasting the regex from the initial try, I had this extra bit and it doesn’t change the result.
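A quick way to convince yourself that the ordering of the two patterns does what is wanted is to run both against an aligned declaration and a tightly packed use, then compare the column of the “=” sign before and after. The two input lines and the rename_var helper below are hypothetical, not taken from the real source files.

```perl
use strict;
use warnings;

# Apply the two patterns from myPatterns in the same order as the file.
sub rename_var {
    my ($text) = @_;
    $text =~ s/\bcaKeyValue\b  /caKeySample /g;    # two trailing spaces: absorb one
    $text =~ s/\bcaKeyValue\b/caKeySample/g;       # everything else
    return $text;
}

# Hypothetical input lines: one column-aligned declaration, one tight use.
my $aligned = "int   caKeyValue      = m_caKeyValue;";
my $tight   = "use(caKeyValue);";

my $aligned_out = rename_var($aligned);
my $tight_out   = rename_var($tight);

print "$aligned\n$aligned_out\n";
print "$tight\n$tight_out\n";
print "'=' column before: ", index($aligned, '='),
      ", after: ", index($aligned_out, '='), "\n";
```

The order matters: if the general pattern ran first there would be no caKeyValue left for the space-absorbing pattern to find, and the alignment would shift by one column as in the diff fragment above.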
