Peeter Joot's (OLD) Blog.

Math, physics, perl, and programming obscurity.

Archive for the ‘C/C++ development and debugging.’ Category

Found code that fails grade 11 math: log base conversion.

Posted by peeterjoot on March 26, 2014

Unless my headache is impacting my ability to manipulate log identities, this LOG10 code is plain wrong:

      static double findLog(double value, FunctionType logFunction)
      {
         switch (logFunction)
         {
            case LN:
               return log(double(value));
            case LOG10:
               return log(double(value)) / log(2.0);
...
      }

Perhaps it is dead code, since nobody appears to have noticed that the divisor should be log(10.0) (or just M_LN10).

Two other possibilities are:

  • somebody was being way too clever, and when they wrote LOG10, they meant it as Log base 0b10.
  • somebody thought that for computer software a “natural logarithm” would use base 2.
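For reference, here is a corrected sketch of the function. The FunctionType enum is a minimal hypothetical stand-in, since the original's other cases are elided. The change-of-base identity is log_b(v) = ln(v) / ln(b), so the divisor for base 10 is ln(10). std::log10 would work just as well for the LOG10 case.

```cpp
#include <cmath>

// Hypothetical stand-in for the original FunctionType; only the two
// cases shown in the post are modelled.
enum FunctionType { LN, LOG10 };

// Change of base: log_b(v) = ln(v) / ln(b). Dividing by log(2.0), as the
// original did, gives log base 2 instead.
static double findLog(double value, FunctionType logFunction)
{
   switch (logFunction)
   {
      case LN:
         return std::log(value);
      case LOG10:
         return std::log(value) / std::log(10.0);   // or std::log10(value)
   }
   return NAN;   // not reached for valid enum values
}
```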

Posted in C/C++ development and debugging.

Understanding a powerpc subfc, subfe sequence used to compare without branching.

Posted by peeterjoot on September 25, 2013

I was tasked to review some inline assembly, essentially like so:

Uint64 negativeLessThanX(Uint64 v0, Uint64 v1)
{
   Uint64 mask;
   __asm__ volatile ("subfc %0,%2,%1; subfe %0,%0,%0"
           /* outputs */  : "=r"(mask)   /* %0 */
           /* inputs */   : "r"(v0),     /* %1 */
                            "r"(v1)      /* %2 */
           /* clobbers */ : "xer"        /* fixed-point exception register (holds CA) */
    );
   return mask;
}

This should have the effect of doing:

Uint64 negativeLessThanX(Uint64 v0, Uint64 v1)
{
   Uint64 mask;

   subfc mask,v1,v0
   subfe mask, mask, mask

   return mask;
}

From the PowerPC instruction set reference (PowerISA_V2.07_PUBLIC.pdf), the subfc and subfe instructions are, respectively, ‘Subtract From Carrying XO-form’ and ‘Subtract From Extended XO-form’:

subfc RT,RA,RB (OE=0 Rc=0)
RT  <- ¬ (RA) + (RB) + 1 
(RT = RB - RA)

subfe RT,RA,RB (OE=0 Rc=0)
RT  <- ¬ (RA) + (RB) + CA

Since RA = RB in the subfe, and a value plus its complement has all bits set, we effectively have

RT  <- 0xFFFFFFFFFFFFFFFF + CA
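To check the RTL above without PowerPC hardware, here is a small software model of the two instructions, written directly from the ISA pseudo-code quoted above. It only covers the OE=0, Rc=0 forms, and the names (Result, addComplement, negativeLessThanXModel) are my own, not anything from the ISA or a library.

```cpp
#include <cstdint>

// Value written to RT, plus the carry out of the 64-bit addition (XER[CA]).
struct Result
{
   std::uint64_t rt ;
   bool          ca ;
} ;

// RT <- ~(RA) + (RB) + carryIn, tracking the carry out of bit 0.
static Result addComplement( std::uint64_t ra, std::uint64_t rb, unsigned carryIn )
{
   const std::uint64_t notRa = ~ra ;
   const std::uint64_t sum1  = notRa + rb ;      // wraps modulo 2^64
   const bool          c1    = sum1 < notRa ;    // carry out of ~RA + RB
   const std::uint64_t sum2  = sum1 + carryIn ;
   const bool          c2    = sum2 < sum1 ;     // carry out of adding carryIn
   return Result{ sum2, c1 || c2 } ;
}

// subfc RT,RA,RB : RT <- ~(RA) + (RB) + 1, i.e. RB - RA, setting CA.
static Result subfc( std::uint64_t ra, std::uint64_t rb )
{
   return addComplement( ra, rb, 1u ) ;
}

// subfe RT,RA,RB : RT <- ~(RA) + (RB) + CA.
static Result subfe( std::uint64_t ra, std::uint64_t rb, bool ca )
{
   return addComplement( ra, rb, ca ? 1u : 0u ) ;
}

// The inline asm sequence "subfc %0,%2,%1; subfe %0,%0,%0" with %1=v0, %2=v1.
static std::uint64_t negativeLessThanXModel( std::uint64_t v0, std::uint64_t v1 )
{
   const Result r = subfc( v1, v0 ) ;           // r.rt = v0 - v1 (mod 2^64)
   return subfe( r.rt, r.rt, r.ca ).rt ;        // 0xFFFFFFFFFFFFFFFF + CA
}
```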

Let’s walk through this in the debugger to understand it. We have:

(dbx) listi negativeLessThanX
0x100000a24 (negativeLessThanX(unsigned long,unsigned long)+0x24) 7c030010       subfc   r0,r3,r0
0x100000a28 (negativeLessThanX(unsigned long,unsigned long)+0x28) 7c000110       subfe   r0,r0,r0

So we have r0 = r0 - r3, followed by r0 = r0 - r0, this time also bringing in the carry flag (the CA bit in XER). Let’s see this in the debugger, first with r0=2, r3=1:

(dbx) stop in negativeLessThanX
[1] stop in negativeLessThanX(unsigned long,unsigned long)
(dbx) c
[1] stopped in negativeLessThanX(unsigned long,unsigned long) at line 6 in file "w.C" ($t1)
    6      __asm__ volatile ("subfc %0,%2,%1; subfe %0,%0,%0" \
(dbx) stepi
stopped in negativeLessThanX(unsigned long,unsigned long) at 0x100000a20 ($t1)
0x100000a20 (negativeLessThanX(unsigned long,unsigned long)+0x20) e86100b8          ld   r3,0xb8(r1)
(dbx)
stopped in negativeLessThanX(unsigned long,unsigned long) at 0x100000a24 ($t1)
0x100000a24 (negativeLessThanX(unsigned long,unsigned long)+0x24) 7c030010       subfc   r0,r3,r0
(dbx) p $r0
0x0000000000000002
(dbx) p $r3
0x0000000000000001
(dbx) p $xer
0x0000000020000002
(dbx) stepi
stopped in negativeLessThanX(unsigned long,unsigned long) at 0x100000a28 ($t1)
0x100000a28 (negativeLessThanX(unsigned long,unsigned long)+0x28) 7c000110       subfe   r0,r0,r0
(dbx) p $r0
0x0000000000000001
(dbx) p $xer
0x0000000020000002
(dbx) stepi
stopped in negativeLessThanX(unsigned long,unsigned long) at 0x100000a2c ($t1)
0x100000a2c (negativeLessThanX(unsigned long,unsigned long)+0x2c) f8010070         std   r0,0x70(r1)
(dbx) p $r0

We see that the subfc does generate r0=1 as expected. The CA bit of the XER is ‘34 Carry (CA)’ (PowerISA numbers bits from 0 at the most significant end of the 64-bit register), and our XER value is 0b[0….]00100000000000000000000000000010. Bits 32 and 33 are clear, but CA (bit 34) is set. At first this seems curiously inverted: we have $r0 = v0 - v1 > 0, so why is CA set?

How about with v0=1, v1=2:

(dbx)
stopped in negativeLessThanX(unsigned long,unsigned long) at 0x100000a24 ($t1)
0x100000a24 (negativeLessThanX(unsigned long,unsigned long)+0x24) 7c030010       subfc   r0,r3,r0
(dbx) p $r0
0x0000000000000001
(dbx) p $r3
0x0000000000000002
(dbx) stepi
stopped in negativeLessThanX(unsigned long,unsigned long) at 0x100000a28 ($t1)
0x100000a28 (negativeLessThanX(unsigned long,unsigned long)+0x28) 7c000110       subfe   r0,r0,r0
(dbx) p $r0
0xffffffffffffffff
(dbx) p $xer
0x0000000000000013
(dbx) stepi
stopped in negativeLessThanX(unsigned long,unsigned long) at 0x100000a2c ($t1)
0x100000a2c (negativeLessThanX(unsigned long,unsigned long)+0x2c) f8010070         std   r0,0x70(r1)
(dbx) p $r0
0xffffffffffffffff

The intermediate subtraction now produces -1 (0xffffffffffffffff), and this time CA is clear after the subfc: XER = 0b[0….]00000000000000000000000000010011 (bit 34 clear). Again, this seems backwards.

The trick to understanding this is that the subtraction isn’t implemented as a subtraction, but as an addition of the complement plus one. Working in 4 bits for brevity, for 2-1, where we don’t have to borrow, our subfc is actually doing:

~1 + 2 + 1:
 1110
+0010
+0001
=====
10001

No borrow is required, but we do generate a carry when doing this _addition_ operation!

Compare that to a 1-2 operation:

~2 + 1 + 1:
 1101
+0001
+0001
=====
 1111

Also compare to a 1-1 operation:

~1 + 1 + 1:
 1110
+0001
+0001
=====
10000

We generate a carry (which means no borrow!) when v0 = v1 or v0 > v1. We do not generate a carry (CA clear) for v0 < v1. Since the subfe then computes 0xFFFFFFFFFFFFFFFF + CA, the end result is -1 (all bits set) for v0 < v1, and 0 otherwise.
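The same reasoning can be checked in portable code. The sketch below reproduces the 4-bit carries from the additions above, and gives a branch-free C++ equivalent of the whole idiom. Both function names are my own; whether a particular compiler turns lessThanMask into subfc/subfe on PowerPC is something I haven't checked.

```cpp
#include <cstdint>

// 4-bit version of the worked additions: the carry out of ~v1 + v0 + 1
// (i.e. of computing v0 - v1), using only the low 4 bits of each operand.
static unsigned carryOut4( unsigned v0, unsigned v1 )
{
   return ( ( ~v1 & 0xFu ) + ( v0 & 0xFu ) + 1u ) >> 4 ;
}

// Branch-free, portable equivalent of the subfc/subfe idiom:
// all bits set when v0 < v1, zero otherwise.
static std::uint64_t lessThanMask( std::uint64_t v0, std::uint64_t v1 )
{
   // (v0 < v1) is 0 or 1; subtracting it from 0 in unsigned 64-bit
   // arithmetic yields 0 or 0xFFFFFFFFFFFFFFFF.
   return std::uint64_t{ 0 } - static_cast< std::uint64_t >( v0 < v1 ) ;
}
```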

Posted in C/C++ development and debugging.

A nice example of one reason to const your parameters.

Posted by peeterjoot on January 8, 2013

I’d done a test build of our code with the clang compiler, and tried out its static analyzer. It politely pointed out that code like the following was likely wrong:

struct cpuinfo
{
   int   pkg_id ;
   int   core_id ;
   int   smt_id ;
   int   logicalId ;

   void init( int pkg, int core, int smt, int logical )
   {
       pkg_id    = pkg ;
       core_id   = core ;
       smt_id    = smt ;
       logical   = logicalId ;
   }
} ;

$ clang --analyze -c cpuinfo.C
cpuinfo.C:13:8: warning: Value stored to 'logical' is never read
       logical   = logicalId ;
       ^           ~~~~~~~~~
1 warning generated.

In fixing this, I made two changes beyond the fix itself. The first was const’ing the parameters:

   void init( const int pkg, const int core, const int smt, const int logical )

Had that been done originally, this bug would have been caught at compile time instead of lurking until runtime:

$ clang -c cpuinfo.C
cpuinfo.C:13:18: error: read-only variable is not assignable
       logical   = logicalId ;
       ~~~~~~~   ^
1 error generated.

Had the member variables been named consistently, I doubt this error would have been made. It would have been too obvious that something was wrong:

   void init( int pkg, int core, int smt, int logical )
   {
       pkg_id    = pkg ;
       core_id   = core ;
       smt_id    = smt ;
       logical   = logical_id ;
   }

So I followed up the const’ing, done as a preventive maintenance action, with a search and replace to rename the logicalId member variable:

s/logicalId/logical_id/g

and then made the actual fix in question.
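The post doesn’t show the final code, so here is a reconstruction of what the struct presumably looks like after the const’ing, the rename, and the actual fix. Treat it as a sketch rather than the committed change.

```cpp
struct cpuinfo
{
   int   pkg_id ;
   int   core_id ;
   int   smt_id ;
   int   logical_id ;   // renamed from logicalId for consistency

   // const parameters: an accidental "logical = logical_id ;" now fails to compile
   void init( const int pkg, const int core, const int smt, const int logical )
   {
      pkg_id     = pkg ;
      core_id    = core ;
      smt_id     = smt ;
      logical_id = logical ;   // the actual fix: assign member from parameter
   }
} ;
```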

I could have fixed the code by changing only the one assignment line in question. The final fix touched just 5 additional lines, since the code was self-contained.

The original author asked me why I did the const’ing part of this change. That question surprised me, and reminded me of the time I was met with significant objection to using “new-fangled C++” features like ‘const’ in our code.

At the time, most of our code was very C-like, despite being compiled with a C++ compiler. The component owner said that he’d only consider using const if I could prove there was a performance benefit to doing so. I don’t think I ever proved that to him, but this is a nice demo of why it’s a good habit in general to use const whenever there isn’t an explicit intention to allow modification.

Posted in C/C++ development and debugging.

cost of a global array access?

Posted by peeterjoot on December 17, 2012

For an array where the value of

   myGlobalArrayOfInts[ THE_ARRAY_OFFSET_FOR_OPERATION__FOO ] == THE_VALUE_FOR_OPERATION__FOO

some developer used

   int value = THE_VALUE_FOR_OPERATION__FOO ; // for performance, just set it.

instead of

   int value = myGlobalArrayOfInts[ THE_ARRAY_OFFSET_FOR_OPERATION__FOO ] ;

At one point in time the assumption that myGlobalArrayOfInts[ THE_ARRAY_OFFSET_FOR_OPERATION__FOO ] would never be any different from THE_VALUE_FOR_OPERATION__FOO was true. However, this bit of premature optimization inevitably broke when maintenance programming required that this array value be changed.

The developer fixing this asked me, “How expensive is this array access?” I can’t exactly quantify the cost, although in this case at least half of it is probably spent getting the address of the global itself. On AIX I believe this may require loading the TOC register if the last global access was in a different shared library, and there’s probably a similar notion on most other systems. I happened to be using a Linux build at the time, so to help quantify the cost I looked at the asm generated for a function of the form

int foo0()
{
   return myGlobalArrayOfInts[ THE_ARRAY_OFFSET_FOR_OPERATION__FOO ] ;
}

Here’s the code that was generated:

0000000000002f50 :
    2f50:   48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx        # 2f57 
    2f57:   48 63 42 14             movslq 0x14(%rdx),%rax
    2f5b:   c3                      retq

We’ve essentially got two memory loads for this one array access. The first, the RIP-relative mov, is loading the address of the global itself (a load rather than an lea, so most likely fetching that address from the GOT). I’d guess that this mov instruction ends up patched via a relocation at link or load time, putting in the actual location instead of the ‘00 00 00 00’ seen in the instruction dump. Then there’s one more load, movslq 0x14(%rdx), to get at the array element itself; the 0x14 displacement is the constant array offset scaled by sizeof(int).

Would this really have made a measurable difference in a function that was 841 lines of code long (1067 straight-line instructions)? Doubtful.
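For anyone wanting to repeat the experiment, here is a self-contained translation unit. The offset and value are made up (offset 5 gives byte offset 0x14 with 4-byte ints, matching the dump above), and fooShortcut is a hypothetical name for the hardcoded variant, included to show how it goes stale.

```cpp
// Hypothetical values for the post's placeholder names.
enum
{
   THE_ARRAY_OFFSET_FOR_OPERATION__FOO = 5,
   THE_VALUE_FOR_OPERATION__FOO        = 42
} ;

int myGlobalArrayOfInts[ 8 ] = { 0, 0, 0, 0, 0, THE_VALUE_FOR_OPERATION__FOO, 0, 0 } ;

// Reads the table: load the global's address, then load the element.
int foo0()
{
   return myGlobalArrayOfInts[ THE_ARRAY_OFFSET_FOR_OPERATION__FOO ] ;
}

// The "for performance, just set it" shortcut: no loads, but silently
// wrong once maintenance changes the table entry.
int fooShortcut()
{
   return THE_VALUE_FOR_OPERATION__FOO ;
}
```

Compiling this with something like g++ -O2 -fPIC -c and running objdump -d on the object should show a similar pair of loads for foo0, though the exact instructions depend on the compiler version and flags.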

Posted in C/C++ development and debugging.

C++11 play. Simple hashing, a comparison with perl

Posted by peeterjoot on December 17, 2012

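I’m very impressed with how easily C++11 lets me implement a simple hash (strictly speaking, std::set is an ordered container rather than a hash table, but it serves the same purpose here):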

#include <set>
#include <string>
#include <iostream>

using namespace std ;

int main()
{
   set< string > d ;

   d.insert( "uu:long" ) ;
   d.insert( "uu:long" ) ;
   d.insert( "uu:long long" ) ;

   d.insert( "qq:int" ) ;
   d.insert( "Test:int" ) ;

   for ( const auto & key : d )
   {
      cout << key << endl ;
   }
}

If I had to do hashing like this before I’d probably have seen if I could generate a text file, and then post process it with perl, where the same hashing would look like:

#!/usr/bin/perl

my %d ;

$d{'uu:long'}++ ;
$d{'uu:long'}++ ;
$d{'uu:long long'}++ ;

$d{'qq:int'}++ ;
$d{'Test:int'}++ ;

foreach ( sort keys %d )
{
   print "$_\n" ;
}

Other than the header includes and the namespace statement, it’s not really any harder to do this in C++ now. Once all our product compilers catch up with the standard (maybe 5 years from now), I could start thinking about using this sort of code in production. Unfortunately, in production I’d also have to deal with exceptions, and with the (hidden) fact that std::allocator is not generally appropriate for use within DB2 code.
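For a container that actually hashes, and that also counts occurrences the way perl’s $d{...}++ does, C++11’s std::unordered_map fits. A minimal sketch; countKeys and printCounts are my own names. Like perl’s keys %d, the iteration order is unspecified.

```cpp
#include <iostream>
#include <string>
#include <unordered_map>

// Count occurrences of each key, mirroring perl's $d{$key}++ idiom.
std::unordered_map< std::string, int > countKeys()
{
   std::unordered_map< std::string, int > d ;

   ++d[ "uu:long" ] ;      // operator[] value-initializes a missing count to 0
   ++d[ "uu:long" ] ;
   ++d[ "uu:long long" ] ;

   ++d[ "qq:int" ] ;
   ++d[ "Test:int" ] ;

   return d ;
}

void printCounts( const std::unordered_map< std::string, int > & d )
{
   for ( const auto & kv : d )   // kv.first is the key, kv.second the count
   {
      std::cout << kv.first << " " << kv.second << '\n' ;
   }
}
```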

Posted in C/C++ development and debugging.

ease of getting strncat wrong.

Posted by peeterjoot on December 5, 2012

I’d done a test build of our DB2 code with the clang compiler, and fired off some defects and emails to code owners about messages from that compiler like:

foo.C:1220:15: warning: the value of the size argument in ‘strncat’ is too large, might lead to a buffer overflow [-Wstrncat-size]

It’s a great message, clear and unambiguous: “your code is wrong”!

Here’s a sample of the code in question that clang was complaining about:

   strncat( dirPath, ptrToDirectories, ( sizeof( dirPath ) - strlen( dirPath ) ) );

The compiler could just as well complain about simpler strncat calls. Imagine that dirPath initially held an empty string (all zeros); then the above code would be equivalent to:

   strncat( dirPath, ptrToDirectories, sizeof( dirPath ) ) ;

It’s not immediately obvious reading the strncat man page that this is wrong, since that man page says “[strncat] will use at most n characters from src”. However, it also says “As with strcat(), the resulting string in dest is always null terminated.”

Does this mean that it will use no more than n characters from src, with the null terminator falling within those n characters, or does it mean that it copies n characters from src (if strlen(src) > n) and then adds a null terminator after them?

To answer that question, and to rule out a possible false-positive warning from the compiler, I wrote the following simple bit of test code:

#include <string.h>
#include <stdio.h>

struct string
{
   char dest[2] ;
   char overflow ;
} ;

int main()
{
   string o ;
   o.dest[0] = 0  ;
   o.dest[1] = 1  ;
   o.overflow = 1 ;

   printf( "%d\n", (int)o.overflow ) ;

   strncat( o.dest, "abc", sizeof(o.dest) ) ;

   printf( "%d\n", (int)o.overflow ) ;

   return 0 ;
}

And the verdict was:

$ g++ -g t.C ; a.out
1
0

Yup, the code was bad, and the compiler is without fault. We actually have a different function for strncat’ing in our code that takes the actual buffer size, to make this harder to get wrong, but the code in question was not using that.
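Since strncat writes up to n characters plus a terminating null, the size argument has to leave room for that null: sizeof( dirPath ) - strlen( dirPath ) - 1. Our in-house wrapper isn’t shown here, so the following is a hypothetical helper (appendBounded is my name for it) that takes the full buffer size and does that arithmetic in one place, assuming dest already holds a null-terminated string.

```cpp
#include <cstddef>
#include <cstring>

// Append src to the null-terminated string in dest, where destSize is the
// total size of the dest buffer. Truncates rather than overflowing.
void appendBounded( char * dest, std::size_t destSize, const char * src )
{
   const std::size_t used = std::strlen( dest ) ;
   if ( used + 1 >= destSize )
   {
      return ;   // no room for even one more character plus the null
   }
   // strncat copies at most n characters and then always adds a null,
   // so n must exclude the byte reserved for the terminator.
   std::strncat( dest, src, destSize - used - 1 ) ;
}
```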

Posted in C/C++ development and debugging.

macro expansion order, and evil macros producing commas.

Posted by peeterjoot on November 7, 2012

I was looking at a bit of code whose problematic portion, after wading through many layers of nested macro hell, was roughly of the form:


    void traceFunction( int n, ... )
    {
    }

    int main()
    {
       #define TUPLE( size, arg )   (size), (arg)

       #define traceInt( sz1, ptr1 )    traceFunction( 1, sz1, ptr1 )
       #define trace( arg1 )            traceInt( arg1 )

       int x = 3 ;

       trace( TUPLE( sizeof(x), &x ) ) ;

       return 0 ;
    }

Observe that one of the macros produces a comma-separated list of arguments, and that the `trace` macro, which hands that two-element list (the result of the `TUPLE` macro) on to the two-parameter `traceInt`, is declared as if it has a single parameter.

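Somewhat mysteriously, this actually compiles on many different platforms and compilers (about 16 different combinations), but breaks with the Intel 13 compiler beta (although that compiler does have a compatibility mode that I’d prefer not to depend on). In hindsight it isn’t so mysterious: the C and C++ standards say a macro argument is fully macro-expanded before it is substituted into the replacement list, and the result is then rescanned, so `traceInt` does end up seeing two arguments.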

I figured I could fix this with:

    #define trace( sz1, ptr1 )       traceFunction( 1, sz1, ptr1 )

eliminating the middle man, but this gives me, for C++:

t.C(23): error: identifier “trace” is undefined

and for C compilation:

t.c(23): error #54: too few arguments in invocation of macro “trace”

These errors indicate that the arguments of `trace` are identified and counted before the `TUPLE` macro within them is expanded. I think I can fix this provided I rely on C99 variadic macros (__VA_ARGS__) like so:

    #define traceInt( sz1, ptr1 )    traceFunction( 1, sz1, ptr1 )
    #define trace( ... )             traceInt( __VA_ARGS__ )

That’s likely an acceptable solution, given that we’ve now got other dependencies on C99 __VA_ARGS__ in the code.
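Putting the pieces together, here is a self-contained version of the variadic fix. The original traceFunction body was empty; the recording globals (lastCount, lastSize, lastPtr) are hypothetical additions so that the expansion can be checked.

```cpp
#include <cstdarg>
#include <cstddef>

// Hypothetical recorder for what traceFunction received.
static int         lastCount = 0 ;
static std::size_t lastSize  = 0 ;
static int *       lastPtr   = nullptr ;

void traceFunction( int n, ... )
{
   va_list ap ;
   va_start( ap, n ) ;
   lastCount = n ;
   lastSize  = va_arg( ap, std::size_t ) ;   // sizeof(x) is a size_t
   lastPtr   = va_arg( ap, int * ) ;         // &x is an int *
   va_end( ap ) ;
}

#define TUPLE( size, arg )       (size), (arg)
#define traceInt( sz1, ptr1 )    traceFunction( 1, sz1, ptr1 )
// Variadic: whatever TUPLE expands to, commas included, is forwarded to
// traceInt, which is then rescanned and sees two arguments.
#define trace( ... )             traceInt( __VA_ARGS__ )
```

With these definitions, trace( TUPLE( sizeof(x), &x ) ) ends up as traceFunction( 1, (sizeof(x)), (&x) ).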

It appears that, rather luckily, I never needed to know exactly what order nested macro expansion happens in before this.

Posted in C/C++ development and debugging.

very deceptive indenting.

Posted by peeterjoot on October 16, 2012

Check out the following mismatched indenting (count carefully if it doesn’t stand out; it didn’t to me):

#ifdef SQLUNIX
   #ifdef OSS_AIXPPC
      #ifdef OSS_ARCH_P64
         #define SQLO_SHR_OBJECT "(shr_64.o)"
      #else
         #define SQLO_SHR_OBJECT "(shr.o)"
    #endif
#endif

The ending #endif was actually 550 lines away in the file!

Posted in C/C++ development and debugging.

Poking around to see how much stack to corrupt to alter a local variable.

Posted by peeterjoot on September 10, 2012

I’ve got a scenario where the last stack variable declared appears to have been corrupted (the high-order 32 bits of this 64-bit integer look like they’ve been zeroed). That got me wondering how far a called function would have to overrun its own locals to muck up this variable. Here’s what I wrote to test this:

#include <stdio.h>
#include <stdint.h>

typedef uint64_t Uint64 ;   // stand-in for the DB2 Uint64 type

int foo( int r )
{
   Uint64         x ;
   Uint64         y ;
   Uint64         z ;
   Uint64         w = 0 ;

   w = 1 ;

   printf( "&x: 0x%0lx\n", (long)&x ) ;
   printf( "&w: 0x%0lx\n", (long)&w ) ;

   if ( r )
   {
      foo( r - 1 ) ;
   }

   return w ;
}

int main()
{
   foo( 2 ) ;

   return 0 ;
}

and the results on this (linuxamd64) system:

&x: 0x7fffffffd490
&w: 0x7fffffffd488
&x: 0x7fffffffd450
&w: 0x7fffffffd448
&x: 0x7fffffffd410
&w: 0x7fffffffd408

So it looks like I need at least a (0x88 - 0x50 = 0x38 =) 56 byte overrun, from the callee’s x up to the caller’s w, to do the job.

A quirk: notice that the variables in my function got laid out in reverse address order on the stack (&w is below &x). I’d not have expected that, but since nothing guarantees any specific stack layout, perhaps I shouldn’t be surprised.

Posted in C/C++ development and debugging.

How to find exported symbols in Windows dlls

Posted by peeterjoot on September 5, 2012

One liner:

WSDB::E:\snap\> dumpbin /exports db2app64.dll | grep DiagWhat
        510  212 00BD1152 pdDiagWhatIsRc = pdDiagWhatIsRc

This isn’t an nm equivalent; it’s more like the AIX command to dump just the exported symbols from a shared object (dump -TvHX32_64), but it was enough to tell me that I shouldn’t get a link error in this iteration of my build.

I found it curious that ‘dumpbin /symbols’ didn’t produce any output for this dll, as it does for a .obj file, and I don’t really know why.

Posted in C/C++ development and debugging.