Peeter Joot's (OLD) Blog.

Math, physics, perl, and programming obscurity.

Posts Tagged ‘perl’

“fun” bug in 64-bit perl 5.10.0 bigint sprintf

Posted by peeterjoot on November 19, 2013

Here's a rather unexpected bug with perl's sprintf:

#! /usr/bin/perl

use strict;
use warnings;
use bigint ;

my $a = hex( "0x0A0000001D05A820" ) ;
printf( "0x%016X\n", $a ) ;
printf( "%d\n", $a ) ;
printf( "$a\n" ) ;

The %X printf produces a value where the least significant 0x20 is lost:

$ ./bigint
0x0A0000001D05A800
720575940866189312
720575940866189344

Observe that the loss occurs in the printf and not the hex() call, since 720575940866189344 == 0x0A0000001D05A820.

This bug appears to be fixed in some perl version no later than 5.16.2. Oh, the joys of using ancient operating system versions so that we can support customers on the many ancient deployments they seem to like to run.


Posted in perl and general scripting hackery | Tagged: , , , | Leave a Comment »

C++11 play. Simple hashing, a comparison with perl

Posted by peeterjoot on December 17, 2012

I'm very impressed with how easy C++11 makes it to implement a simple hash

#include <set>
#include <string>
#include <iostream>

using namespace std ;

int main()
{
   set< string > d ;

   d.insert( "uu:long" ) ;
   d.insert( "uu:long" ) ;
   d.insert( "uu:long long" ) ;

   d.insert( "qq:int" ) ;
   d.insert( "Test:int" ) ;

   for ( auto & kv : d )
   {
      cout << kv << endl ;
   }
}

If I had to do hashing like this before, I'd probably have looked at generating a text file and post-processing it with perl, where the same hashing would look like:

#!/usr/bin/perl

my %d ;

$d{'uu:long'}++ ;
$d{'uu:long'}++ ;
$d{'uu:long long'}++ ;

$d{'qq:int'}++ ;
$d{'Test:int'}++ ;

foreach ( keys %d )
{
   print "$_\n" ;
}

Other than the header includes and the namespace statement, it's not really any harder to do this in C++ now. Once I've waited the five years or so it will take for all our product compilers to catch up with the standard, I can start thinking about using this sort of code in production. Unfortunately, in production I'd also have to deal with exceptions, and with the (hidden) fact that the std::allocator is not generally appropriate for use within DB2 code.

Posted in C/C++ development and debugging. | Tagged: , , , , , | Leave a Comment »

Use of __LINE__ in perl.

Posted by peeterjoot on January 24, 2011

I’d wondered a couple times how to do this, and had littered scripts occasionally with various manually created unique text markers for debugging purposes. Here’s an easier way:

print " Line: ", __LINE__, "\n";

(works for __FILE__ too).

Posted in perl and general scripting hackery | Tagged: , , | Leave a Comment »

perl implicit matching operator.

Posted by peeterjoot on August 26, 2010

Perl can regularly surprise you with many ways of doing the same thing. I’d seen the following fragment of a script and thought “how can that work … there’s no match operator”.

if ( $CC =~ "xlC|xlc" )
{
  # stuff.
}

What I’d expected is a match // operator, or one that used explicit delimiters such as m@@, as in one of:

if ( $CC =~ /xlC|xlc/ )
{
  # stuff.
}

# or

if ( $CC =~ m@xlC|xlc@ )
{
  # stuff.
}

I'd had the urge to "correct" this seemingly incorrect if condition, but a small experiment confirmed that this code actually works as intended:

$CC = "/blah/xlC" ;
#$CC = "/blah/gcc" ;

if ( $CC =~ "xlC|xlc" )
{
   print "m\n" ;
}

No explicitly delimited match expression is required. It appears that the context of the =~ binding operator is enough for a plain string on its right hand side to be treated as a regular expression. I think I still prefer doing it explicitly, perhaps just because that's how I've seen it done most often.

Posted in perl and general scripting hackery | Tagged: , | 2 Comments »

Have to love perl for quicky automated source changes.

Posted by peeterjoot on July 16, 2010

Looking at some code today of the following form:

   char buf[10] ;
   sprintf( buf, "%s ... ", somefunction() ) ;

where somefunction() returns a char *. This is very unsafe code, since you could easily overflow buf and have all sorts of fun stack corruptions to deal with. This pattern was repeated about 400 times in the modules in question, and it's desirable to replace all of these with snprintf calls to ensure there is no bounds error (in DB2 we use a different version of snprintf due to some portability issues, but the idea here is the same).

Here’s a nice little one liner to make the code changes required:

perl -p -i -e 's/\bsprintf *\( *(.*?), */snprintf( $1, sizeof($1), /' LIST_OF_FILENAMES

It's not perfect, but it does the job nicely for the bulk of the call sites, changing the function name and adding the desired sizeof() parameter to the call. Of course a thorough review in context is still required, since you don't want to take sizeof() of a char * argument and get the size of a pointer.

Posted in C/C++ development and debugging. | Tagged: , , , , | Leave a Comment »

comparing some times for perl vs sort command line hacking

Posted by peeterjoot on June 22, 2010

I had a 2M line file that contained, among other things, function identifier strings such as:

SAL_MANAGEMENT_PORT_HANDLE::SAL_ManagementGetServerRole
SAL_MANAGEMENT_PORT_HANDLE::SAL_ManagementHandleClose
SAL_MANAGEMENT_PORT_HANDLE::SAL_ManagementHandleOpen

I wanted to extract just these and sort them by name for something else. I’d first tried this in vim, but it was taking too long. Eventually I control-C’ed it and realized I had to be a bit smarter about it. I figured something like perl would do the trick, and I was able to extract those strings easily with:

cat flw.* | perl -p -e 's/.*?(\S+::\S+).*/$1/;'

(ie: grab just the not-space::not-space text and spit it out). Passing this to 'sort -u' was also taking quite a while. Here's a slightly smarter way to do it, still a one-liner:

cat flw.* | perl -n -e 's/.*?(\S+::\S+).*/$h{$1}=1/e; END{ foreach (sort keys %h) { print "$_\n" ; } } '

All the duplicates are automatically discarded by inserting the matched value into a hash instead of just printing it. A simple loop over the sorted hash keys then gives the result directly. For the data in question, this reduced the time for the whole operation to just 12.5 seconds (I eventually ran the original 'perl … | sort -u' in the background and found it would have taken 1.6 minutes). It took far less time to tweak the command line than the original command would have taken to run, and it provides a nice example of where an evaluated (/e) replacement expression can be handy.

Of course, I then lost my time savings by writing up these notes for posterity;)

Posted in perl and general scripting hackery | Tagged: , , , , | 4 Comments »

An in-place c++filt ?

Posted by peeterjoot on May 26, 2010

A filter script like c++filt can be a bit irritating sometimes. Imagine that you want to run something like the following

$ c++filt < v > v

The effect of this is to completely clobber the input file, and not alter it in place. You may think that something like the following may work, so that the read is done first by the cat program:

$ cat v | c++filt > v

but this also doesn't work (the shell truncates v when it sets up the output redirection, usually before cat has read anything), and one is again left with a zero sized output file instead of the filtered output. I've run stuff like the following a number of times:

$ for i in *some list of files* ; do c++filt < $i > $i.tmp$$ ; mv $i.tmp$$ $i ; done

and have often wondered if there's an easier way. One way would be to put something like this in a script, to avoid re-creating such a command line every time. I tried this in perl, making it a stdin/stdout filter by default, and a file modifying helper when files are listed explicitly (not really a filter anymore, but often how I'd like to be able to invoke c++filt). Here's that beastie:

#!/usr/bin/perl

use warnings ;
use strict ;

# slurp whole file into a single variable
undef( $/ ) ; #slurp mode

if ( scalar(@ARGV) )
{
   foreach (@ARGV)
   {
      my $cmd = "cat $_ | c++filt |" ;

      open( my $fhIn, $cmd ) or die "pipe open '$cmd' failed\n" ;

      my $file_contents = ( <$fhIn> ) ;

      close $fhIn or die "read or pipe close of '$cmd' failed\n" ;


      open( my $fhOut, ">$_") or die "open of '$_' for write failed\n" ;

      print $fhOut $file_contents ;

      close $fhOut or die "close or write to '$_' failed\n" ;
   }
}
else
{
   my $file_contents = ( <> ) ;

   print $file_contents ;
}

This also works, but is clunkier than I expected. If anybody knows a way to use or abuse the in-place filtering capability of perl (ie: perl -p -i) to do something like this, or some other clever way to do it, I'd be curious what it is.

Posted in C/C++ development and debugging. | Tagged: , , | 10 Comments »

stripping color control characters from lssam output.

Posted by peeterjoot on March 17, 2010

There's probably a billion ways to do this, but here's one that appears to work if you have TSA command output that was collected by somebody who did not use the --nocolor option:

perl -p -e 's/\x1b\[\d+m//g;' < lssamBEFORE.out

The \x1b is the ESC character. This says to remove that ESC followed by a '[' character, then one or more digits, then the character 'm', and to do it everywhere on all lines.

Posted in perl and general scripting hackery | Tagged: , , , | Leave a Comment »

Shell tips and tricks. The most basic stuff.

Posted by peeterjoot on February 28, 2010

A while back I assembled the following shell tips and tricks notes for an ad-hoc 'lunch and learn' session at work. For some reason (probably for colour) I had made these notes in Microsoft Word instead of plain text. That made them of limited use for reference, not being cut-and-pastable (since Word mucks up the quote characters). Despite a few things that are work centric (references to clearcase and our source code repository directory structure), there's enough here that is generally applicable that the converted-to-text version makes sense to have available as a blog post.

Variables

 
# a=foo
# b=goo
 
# echo $a $b

foo goo
 
# p=/view/joes_view/vbs/engn/sqo
# diff $p/sqlofmga.C .

You will have many predefined variables when you login. Examples could include

 
$HOME                            home dir.
$EDITOR                          preferred editor.
$VISUAL                          preferred editor.
$REPLYTO                         where mail should be addressed from.
$PS1                             What you want your shell prompt to look like.
$TMPDIR                          Good to set to avoid getting hit as badly when /tmp fills up.
$CDPATH                          good for build tree paths.

CDPATH Example:

 
CDPATH=.:..:/home/hotelz/peeterj:/vbs/engn:/vbs/test/fvt/standalone:/vbs/common:/vbs/common/osse/core

one can run: ‘cd sqo’ and go right to that component dir.

 
 
$1                                 First argument passed to a shell script (or shell function).
$2                                 Second argument, and so on.
$*                                 All arguments that were passed to a shell script.
 

Wildcards

All files starting with an 'a', and ending with a 'b'
 
# ls a*b

All files of the form 'a'{char}'b'

 
# ls a?b

Quotes

Three different kinds. This is one of the most important things to know for any "shell programming".

Single quotes

Variables and wildcards are NOT expanded when contained in a set of single quotes. Example:

 
# a=foo
# b='goo boo'
# echo '$a $b'

$a $b

Double quotes

Variables and wildcards (*, ?, $...) are expanded (called globbing and/or interpolation sometimes depending on the context).

 
# a=foo
# b='goo boo'
# echo "$a $b"
 
foo goo boo

You don't have to double quote something for this sort of wildcard and variable expansion to occur, so you could write:

 
# echo $a $b

and the result will be the same:

 
foo goo boo

There is a difference though: echo will treat the unquoted version as three separate arguments, because the expansion happens before the command is executed. This can be important when you want something with spaces to be treated as a single argument. Example:

 
# gcc x.c | grep ': error'
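You can see the argument count difference directly using the standard 'set --' builtin, which replaces the positional parameters and lets $# report how many arguments resulted:

```shell
b='goo boo'

set -- $b        # unquoted: word splitting applies
echo "unquoted: $#"

set -- "$b"      # quoted: the space is preserved inside a single argument
echo "quoted: $#"
# unquoted: 2
# quoted: 1
```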

Back Quotes

Expression is executed as a command.

 
# cleartool checkin -c 'Please, let this compile and link this time.' `lsco`

Execution of a command in another one can also be done with a variable syntax (sometimes useful for nesting stuff). These would produce the same output:

 
# echo It once was: `date`
# echo It once was: $(date)

It once was: Mon Jun 18 16:20:28 EDT 2007

The alternate syntax can be useful if you wanted to run a command inside of a command.
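For example, the $() form nests without any escaping gymnastics, where backquotes would need backslashes:

```shell
# the innermost $() runs first; no backslash escaping needed at any depth.
echo "outer-$(echo "inner-$(echo deepest)")"
# outer-inner-deepest
```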

Other Special Shell Characters

 
~              your home dir.
;              command separator
\              backslash (escape).  When you want to use a special character as is, you either have to single quote it, or use an escape character to let the shell know what you want.

Redirect input and output

 
|                            pipe input from another command
<                            redirect input
>                            redirect output
2>&1                         redirect stderr to stdout
 
echo hi > hi.out
cat < hi.out
cat /tmp/something | grep ': error' | sort
something_that_may_fail >/tmp/blah 2>&1
something_that_may_fail >/tmp/blah 2>/tmp/blah.err

The for loop.

If you have the quotes and variables mastered, this is probably the next most useful construct for ad-hoc command line stuff. We use computers for repetitive stuff, but it's amazing how little people sometimes take advantage of this.

By example:

 
# for i in `grep : /tmp/something` ; do echo $i ; done

Here, i is the variable you name, and you can reference it in the loop as $i.

 
# for i in `cat my_list_of_files` ; do cleartool checkout -nc $i ; done

If the command you want to run is something that accepts multiple arguments, then you may not even need a for loop. The second example above could be written:

 
# cleartool checkout -nc `cat my_list_of_files`

It's good to know both ways of doing this, since the backquote method can sometimes hit shell "environment length" limits that force you to split up what has to be done or to do parts individually.
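When you do hit those limits, xargs (see below) is the usual escape hatch, since it splits its input across as many command invocations as needed:

```shell
# xargs batches its stdin into argument lists; -n 2 caps each batch at two,
# so the command runs twice here.
printf 'a\nb\nc\n' | xargs -n 2 echo group:
# group: a b
# group: c
```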

Some common useful programs for command line use.

grep search for an expression or expressions

 
# gcc x.c | grep ': error'
# grep -n mystring `lsbranch -file` > o ; vim -q o
# grep -e something -e somethingelse
# grep -v '^ *$'                            # all non blank lines.

tr translate characters (one to another, to uppercase, ...)

 
# echo $PATH | tr : '\n'              # print out path elements on separate lines.
# tr '[a-z]' '[A-Z]'                  # upper case something.

cut Extract parts of a line

 
# cut -f1 -d:            # extract everything in the line(s) up until the first colon char.
# cut -f3-4              # extract fields 3 through 4 (tab separated by default).

sort sort stuff

 
# sort                 # plain old sort.              
# sort -u              # unique lines only
# sort -n              # numeric sort
# sort -t :            # sort using alternate field separator (default is whitespace).
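The -t option is most useful combined with -k to pick which field to sort on. For example, a numeric sort on the second colon separated field:

```shell
printf 'bbb:20\naaa:3\nccc:100\n' | sort -t : -k 2 -n
# aaa:3
# bbb:20
# ccc:100
```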

xargs Run a command on all the items in standard input.

 
# find . -type f -maxdepth 2 | xargs ls -l

sed search and replace program

 
# sed 's/string1/string2/g'
# sed 's/#.*//'                     # remove script "comments"
# sed 's!//.*!!'                    # remove C++ comments.
# sed 's/[ \t]*$//'                 # strip trailing whitespace
# sed 's/\(.*\):\(.*\)/\2:\1/'      # swap stuff separated by colons.
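A quick check of that last swap:

```shell
echo 'key:value' | sed 's/\(.*\):\(.*\)/\2:\1/'
# value:key
```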

perl Any of the above and pretty much anything else you can think of.

Explaining perl is a whole different game, and if you don't know it, it probably won't look much different than ASCII barf (much like some of the sed commands above).

Some examples (things done above with a mix of other commands) :

 
# g++ x.c | perl -ne 'print if /: error/'
# perl -pe 's/string1/string2/g'
# perl -e ' $p = $ENV{PATH}; $p =~ s/:/\n/g ; print "$p\n" '
# perl -pe '$_ = uc($_)'

What's notable here is not the perl itself, but the fact that to run some of these commands required passing a pile of shell special characters. In order to pass these all to perl unaltered, it was required to use single quotes, and not double quotes.

Common to grep, sed, and perl is a concept called a regular expression (or regex). This is an extremely valuable thing to get to know well if you do any programming, since a lot of programming work involves text manipulation. Going into detail on this topic will require its own time.

Shell Aliases

These are one liner "shortcuts". ksh/bash example:

 
alias la='ls -a'

Shell Functions

Multiline shortcuts. ksh/bash example:

 
function foo
{
   echo boo
   echo foo
}

This is similar to putting the commands in their own file and running that as a script. Functions can be used as helpers in other scripts, or as more complex "aliases".

The example above could be written as:

 
alias foo='echo boo ; echo foo'

But functions also allow you to pass arguments. Example:

 
function debugattach {
    $1 ~/sqllib/adm/db2sysc --pid $(db2pd -edus -dbp $2 | perl -ne 'next unless s/db2sysc PID: // ; print')
}
 
alias ddda='debugattach ddd'
alias gdba='debugattach gdb'

Calling this with 'ddda 0' will attach the ddd debugger to the db2sysc process that db2pd reports for node 0.

Except for the perl fragment, which is basically a combined 'grep' and 'sed', this example uses many of the things explained above (variables, an embedded command, and single quotes both to avoid expansion and to group arguments).

Posted in C/C++ development and debugging. | Tagged: , , , , | 1 Comment »

A fun regular expression for the day. change all function calls to another.

Posted by peeterjoot on December 18, 2009

Hit some nasty old school code today that dates back to our one-time 16-bit OS/2 port. I figured out that 730 lines of code for an ancient function called sqlepost() could all be removed, provided I could change all lines like so:

- sqlepost(SQLT_SQLE, SQLT_SQLE_SUBCOORD_TERM, 122, SQLE_EBAD_DB_ERR, sizeof(eRC), &eRC);
+ pdLog( PD_DEV, SQLT_SQLE_SUBCOORD_TERM, eRC, 122, PD_LEVEL_SEV, 0 ) ;

(83 places). A desirable side effect of making this change is that we will stop logging the return code as a byte reversed hex number, and instead log it as a return code. Easier on developers and system testers alike.

perl -p is once again a good friend for this sort of task

s/sqlepost\s*\(
\s*(.*?)\s*, # componentID -- unused.
\s*(.*?)\s*, # functionID
\s*(.*?)\s*, # probe
\s*(.*?)\s*, # index -- unused.
\s*(.*?)\s*, # size -- unused.
\s*&(.*?)\s*\)\s*; # rc
/pdLog( PD_DEV, $2, $6, $3, PD_LEVEL_SEV, 0 ) ;/x ;

I made a quick manual modification of each of the call sites that weren't all on one line, using control-J in vim to join the whole function call onto one line, then just had to run:

perl -p -i ./replacementScript `cat listOfFilesWithTheseCalls`

Voila! Very nice if I have to say so myself;)

EDIT: it was pointed out to me that the regular expressions used above are not entirely obvious.  Here’s a quick synopsis:

\s       a whitespace character
.        any character
*        zero or more of the preceding
(.*)     capture an expression (creates $1, $2, ...)
            ie. zero or more of anything.
(.*?)    capture an expression, but don't be greedy, only capturing the
            minimal amount.
\(       a literal open parenthesis character (ie. not the start of a capture group)
\)       a literal close parenthesis character.

Posted in C/C++ development and debugging., perl and general scripting hackery | Tagged: , , | Leave a Comment »