Peeter Joot's (OLD) Blog.

Math, physics, perl, and programming obscurity.

Fun with AIX nested signal handlers.

Posted by peeterjoot on May 5, 2010

One of our more industrious testers was digging into a problem with the debugger themselves, and found that the debugger didn’t give much in the way of a stacktrace:

"and no more for a stack. I attached the debugger and see this:"

(dbx)  thread current 216
warning: Thread is in kernel mode, not all registers can be accessed.
(dbx) where
raise.nsleep(??, ??) at 0x90000000002a824
sleep(??) at 0x90000000011dac8
sqloinst.sqloSleepInstance(), line 3487 in "sqloinst.C"
sqlodump.sqlo_trca(signum = 5, sigcode = 0x070000006c7eb5c0, scp = 0x070000006c7eb310), line 532 in "sqlodump.C"
sqloEDUCodeTrapHandler(signum = 5, sigcode = 0x070000006c7eb5c0, scp = 0x070000006c7eb310), line 4721 in "sqloedu.C"

"How come it doesn't show me anything but the trap handler? How do I go about getting a real usable stack?"

What do we see from this? The first thing of interest is that the signal number is SIGTRAP==5. That’s an AIX’ism, and you won’t see this on a sane platform that treats low memory addresses as invalid. AIX lets you read from 0x0 and other similarly “bad” pointers, unless you compile with -qcheck=nullptr. If you do, the compiler inserts runtime checks using one of the conditional trap instructions, so that the code blows up on NULL pointers instead of blissfully dereferencing them. Here’s an example:

#include <stdio.h>

struct foo
{
   int y ;
   foo * b ;
} ;

foo * nullpointer = 0 ;

int aixCanBeEvil()
{
   return nullpointer->y ;
}

int andYouCanMakeItWorse( foo * nullishButNotQuite )
{
   return nullishButNotQuite->y ;
}

int main()
{
   aixCanBeEvil() ;
   andYouCanMakeItWorse( nullpointer->b ) ;

   printf( "Guahhaha... You are running an OS from the dark side of the force.\n" ) ;

   return 0 ;
}

Let’s try it

# xlC -qlist t.C
# a.out
Guahhaha... You are running an OS from the dark side of the force.

Surprised? Any sane developer would be. An operating system wouldn’t allow this, would it? That’s insane! Cough, cough. There are some technical reasons why allowing this is attractive on a powerpc platform, but favoring efficiency over correctness and over catching developer error wasn’t a terribly friendly move by the OS guys, and it’s now a binary compatibility issue that can’t be changed.

Now, if you add the -qcheck=nullptr flag to the compilation, you’ll see that things behave better:

# xlC -qlist -qcheck=nullptr t.C
# a.out
Trace/BPT trap(coredump)

Here’s a fragment of the listing to show how this is managed:

     | 000000                           PDEF     aixCanBeEvil()
     | 000000                           AKA       aixCanBeEvil__Fv
   11|                                  PROC
   13| 000000 lwz      80620004   1     L4A       gr3=.nullpointer(gr2,0)
   13| 000004 lwz      80630000   2     L4A       gr3=nullpointer(gr3,0)
   13| 000008 twi      0C430200   2     TCT4     *gr574=gr3,512,0x8/llt,3
   13| 00000C lwz      80630000   1     L4A       gr3=(foo).y@0(gr3,0,trap=gr574)
   14|                              CL.3:
   14| 000010 bclr     4E800020   0     BA        lr


     | 000000                           PDEF     andYouCanMakeItWorse(foo *)
     | 000000                           AKA       andYouCanMakeItWorse__FP3foo
    0|                                  PROC      nullishButNotQuite,gr3
    0| 000040 stwu     9421FFC0   1     ST4U      gr1,#stack(gr1,-64)=gr1
    0| 000044 stw      90610058   1     ST4A      nullishButNotQuite(gr1,88)=gr3
   18| 000048 lwz      80610058   1     L4A       gr3=nullishButNotQuite(gr1,88)
   18| 00004C twi      0C430200   2     TCT4     *gr577=gr3,512,0x8/llt,3
   18| 000050 lwz      80630000   1     L4A       gr3=(foo).y@0(gr3,0,trap=gr577)
   19|                              CL.4:
   19| 000054 addi     38210040   1     AI        gr1=gr1,64
   19| 000058 bclr     4E800020   0     BA        lr

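See those twi instructions? Those are the null pointer checks (trap word immediate), explicitly added by the compiler because the OS won’t do it for you. Going by the operands in the listing (gr3, 512, llt), each one traps if the pointer, compared as an unsigned value, is less than 512.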
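For the curious, the trap encodings that show up in this post can be picked apart by hand. What follows is a minimal sketch, assuming the standard PowerPC D-form layout (6 bit opcode, 5 bit TO mask, 5 bit RA, 16 bit immediate). The struct and function names are made up for illustration and don't come from any AIX header.

```cpp
#include <cassert>
#include <cstdint>

// Fields of a PowerPC D-form trap-immediate instruction (twi or tdi).
// These names are illustrative only.
struct TrapImmediate
{
   unsigned opcode ;     // 3 == twi (32-bit compare), 2 == tdi (64-bit compare)
   unsigned to ;         // trap condition mask
   unsigned ra ;         // register being tested
   int      immediate ;  // sign-extended 16-bit comparand
} ;

// Split a 32-bit instruction word into its D-form fields.
TrapImmediate decodeTrapImmediate( uint32_t word )
{
   TrapImmediate t ;

   t.opcode    = ( word >> 26 ) & 0x3f ;
   t.to        = ( word >> 21 ) & 0x1f ;
   t.ra        = ( word >> 16 ) & 0x1f ;
   t.immediate = static_cast<int16_t>( word & 0xffff ) ;

   return t ;
}

// A TO mask of 2 selects "logically less than" (an unsigned <),
// which is the llt that appears in the listing.
bool isLogicalLessThanTrap( const TrapImmediate & t )
{
   assert( t.opcode == 2 || t.opcode == 3 ) ;

   return t.to == 2 ;
}
```

Feeding it 0x0C430200 from the -qlist output gives a twi on gr3 with an llt mask and an immediate of 512. The 0x08440200 word from the dbx disassembly further down decodes the same way, except that it is the 64 bit tdi form and tests r4.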

So, where are we actually blowing up? The registers in the debugger don’t tell us:

(dbx) registers
  $r0:0xbadc0ffee0ddf00d  $stkp:0x070000006c7eabe0   $toc:0xbadc0ffee0ddf00d
  $r3:0xbadc0ffee0ddf00d    $r4:0xbadc0ffee0ddf00d    $r5:0xbadc0ffee0ddf00d
  $r6:0xbadc0ffee0ddf00d    $r7:0xbadc0ffee0ddf00d    $r8:0xbadc0ffee0ddf00d
  $r9:0xbadc0ffee0ddf00d   $r10:0xbadc0ffee0ddf00d   $r11:0xbadc0ffee0ddf00d
 $r12:0xbadc0ffee0ddf00d   $r13:0xbadc0ffee0ddf00d   $r14:0xbadc0ffee0ddf00d
 $r15:0xbadc0ffee0ddf00d   $r16:0xbadc0ffee0ddf00d   $r17:0xbadc0ffee0ddf00d
 $r18:0xbadc0ffee0ddf00d   $r19:0xbadc0ffee0ddf00d   $r20:0xbadc0ffee0ddf00d
 $r21:0xbadc0ffee0ddf00d   $r22:0xbadc0ffee0ddf00d   $r23:0xbadc0ffee0ddf00d
 $r24:0xbadc0ffee0ddf00d   $r25:0xbadc0ffee0ddf00d   $r26:0xbadc0ffee0ddf00d
 $r27:0xbadc0ffee0ddf00d   $r28:0xbadc0ffee0ddf00d   $r29:0xbadc0ffee0ddf00d
 $r30:0xbadc0ffee0ddf00d   $r31:0xbadc0ffee0ddf00d
 $iar:0x090000000002a824   $msr:0xbadc0ffee0ddf00d    $cr:0xdeadbeef
$link:0x090000000002a828   $ctr:0xbadc0ffee0ddf00d   $xer:0xdeadbeef

The DeadBeef and BadCoffeeOddFood register values look a bit disturbing, but I seem to recall that these are just due to the syscall (or some types of syscalls).
Observe that both the iar (current instruction address) and the link (address of the next instruction after a function call return) are in the nsleep routine:

in sqloEDUCodeTrapHandler at line 1518 in file "" ($t216)
0x90000000002a824 (nsleep+0xe4) 4e800421 bctrl 

And the debugger doesn’t want to format scp, which holds the signal context, for us:

(dbx) up
sqloEDUCodeTrapHandler(signum = 5, sigcode = 0x070000006c7eb5c0, scp = 0x070000006c7eb310), line 4721 in "sqloedu.C"
(dbx) p scp
(dbx) p *scp

Casting doesn’t help either

(dbx) p *(struct sigcontext *)scp
print *(struct sigcontext *)scp
                           ^ syntax error

The debugger just doesn’t have the symbol information for this type, it seems. We can, however, dump that sigcontext in the debugger by address without any trouble:

(dbx) 0x070000006c7eb310/156
0x070000006c7eb310:  00000000 500f51e8 00000000 00080000
0x070000006c7eb320:  00000000 00000000 00000000 00000000
0x070000006c7eb330:  00000000 00000000 00000002 2ff49fe8
0x070000006c7eb340:  09000000 1655ed20 07000000 6c7eb800
0x070000006c7eb350:  09001000 a3184198 00000000 00000000
0x070000006c7eb360:  00000000 00000000 00000000 0000000a
0x070000006c7eb370:  00000000 00000001 00000000 00000001
0x070000006c7eb380:  00000000 00000000 00000000 00000014
0x070000006c7eb390:  00000000 00000028 00000000 0000000a
0x070000006c7eb3a0:  09000000 1655a634 00000001 13ef7800
0x070000006c7eb3b0:  00000000 00000000 00000000 00000000
0x070000006c7eb3c0:  00000000 00000000 07000000 52f946e7
0x070000006c7eb3d0:  00000001 167f7944 00000001 167f7ddc
0x070000006c7eb3e0:  00000001 167f791c 00000001 167fa4f8
0x070000006c7eb3f0:  00000001 167fbc68 09001000 a2d3d070
0x070000006c7eb400:  00000000 00000000 00000000 00000000
0x070000006c7eb410:  00000000 00000000 00000000 00000000
0x070000006c7eb420:  00000000 00000000 09001000 a0c479b8
0x070000006c7eb430:  09001000 a3171ce8 09000000 16648cc8
0x070000006c7eb440:  a0000000 0002d032 09000000 1655ede4
0x070000006c7eb450:  09000000 1655ed20 09000000 00048e00
0x070000006c7eb460:  22000242 20000004 82024000 00000000
0x070000006c7eb470:  09000000 1655ede4 00000000 00000000
0x070000006c7eb480:  00000000 0000000b 3fe00000 00000000
0x070000006c7eb490:  43300800 00000000 00000000 82064000
0x070000006c7eb4a0:  43300000 00000000 43300800 00000047
0x070000006c7eb4b0:  00000000 00000000 00a3d615 00000100
0x070000006c7eb4c0:  00000001 10d17e08 3ff193ee 05bf4018
0x070000006c7eb4d0:  3fd55555 555450ef 3c7abc9e 3b39803f
0x070000006c7eb4e0:  3fe00000 00000000 00000000 00000000
0x070000006c7eb4f0:  00000000 00000000 00000000 00000000
0x070000006c7eb500:  00000000 00000000 00000000 00000000
0x070000006c7eb510:  00000000 00000000 00000000 00000000
0x070000006c7eb520:  00000000 00000000 00000000 00000000
0x070000006c7eb530:  00000000 00000000 00000000 00000000
0x070000006c7eb540:  00000000 00000000 00000000 00000000
0x070000006c7eb550:  00000000 00000000 00000000 00000000
0x070000006c7eb560:  00000000 00000000 00000000 00000000
0x070000006c7eb570:  00000000 00000000 01000000 00000083

Well, that’s a bit of a pain in the butt, but not too hard to work around. With a small cut and paste and fun with regular expressions we have a debuggable helper program

#include <signal.h>
#include <stdio.h>
#include <string.h>

int main()
{
//   printf("%lu\n", sizeof(sigcontext) ) ;

   sigcontext blah ;

   // Bytes from the dbx dump above, one dump row (16 bytes) per line.
   memcpy( &blah,
"\x00\x00\x00\x00\x50\x0f\x51\xe8\x00\x00\x00\x00\x00\x08\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x2f\xf4\x9f\xe8"
"\x09\x00\x00\x00\x16\x55\xed\x20\x07\x00\x00\x00\x6c\x7e\xb8\x00"
"\x09\x00\x10\x00\xa3\x18\x41\x98\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0a"
"\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x01"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x14"
"\x00\x00\x00\x00\x00\x00\x00\x28\x00\x00\x00\x00\x00\x00\x00\x0a"
"\x09\x00\x00\x00\x16\x55\xa6\x34\x00\x00\x00\x01\x13\xef\x78\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x07\x00\x00\x00\x52\xf9\x46\xe7"
"\x00\x00\x00\x01\x16\x7f\x79\x44\x00\x00\x00\x01\x16\x7f\x7d\xdc"
"\x00\x00\x00\x01\x16\x7f\x79\x1c\x00\x00\x00\x01\x16\x7f\xa4\xf8"
"\x00\x00\x00\x01\x16\x7f\xbc\x68\x09\x00\x10\x00\xa2\xd3\xd0\x70"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x09\x00\x10\x00\xa0\xc4\x79\xb8"
"\x09\x00\x10\x00\xa3\x17\x1c\xe8\x09\x00\x00\x00\x16\x64\x8c\xc8"
"\xa0\x00\x00\x00\x00\x02\xd0\x32\x09\x00\x00\x00\x16\x55\xed\xe4"
"\x09\x00\x00\x00\x16\x55\xed\x20\x09\x00\x00\x00\x00\x04\x8e\x00"
"\x22\x00\x02\x42\x20\x00\x00\x04\x82\x02\x40\x00\x00\x00\x00\x00"
"\x09\x00\x00\x00\x16\x55\xed\xe4\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x0b\x3f\xe0\x00\x00\x00\x00\x00\x00"
"\x43\x30\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x82\x06\x40\x00"
"\x43\x30\x00\x00\x00\x00\x00\x00\x43\x30\x08\x00\x00\x00\x00\x47"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\xa3\xd6\x15\x00\x00\x01\x00"
"\x00\x00\x00\x01\x10\xd1\x7e\x08\x3f\xf1\x93\xee\x05\xbf\x40\x18"
"\x3f\xd5\x55\x55\x55\x54\x50\xef\x3c\x7a\xbc\x9e\x3b\x39\x80\x3f"
"\x3f\xe0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
"\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x83",
           sizeof(blah) ) ;

   return 0 ;
}
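The cut and paste step can also be scripted. What follows is a minimal sketch, not the regular expressions actually used for this post, that turns dbx dump rows into the \x escapes the helper program needs. The names parseDumpRow, toEscapes and readBigEndian64 are hypothetical, and the row format is assumed to be exactly what the dump above shows.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

// Turn one dbx dump row ("0xADDR:  w0 w1 w2 w3") into its bytes.
// Each word is 8 hex digits, already in memory order on big-endian AIX.
std::vector<unsigned char> parseDumpRow( const std::string & row )
{
   std::vector<unsigned char> bytes ;
   std::string::size_type colon = row.find( ':' ) ;

   if ( colon == std::string::npos )
   {
      return bytes ;
   }

   std::istringstream words( row.substr( colon + 1 ) ) ;
   std::string word ;

   while ( words >> word )
   {
      for ( std::string::size_type i = 0 ; i + 1 < word.size() ; i += 2 )
      {
         unsigned long b = std::stoul( word.substr( i, 2 ), nullptr, 16 ) ;
         bytes.push_back( static_cast<unsigned char>( b ) ) ;
      }
   }

   return bytes ;
}

// Format bytes as the body of a C string literal: \xNN\xNN...
std::string toEscapes( const std::vector<unsigned char> & bytes )
{
   std::string out ;
   char buf[ 5 ] ;

   for ( std::size_t i = 0 ; i < bytes.size() ; i++ )
   {
      std::snprintf( buf, sizeof( buf ), "\\x%02x", static_cast<unsigned>( bytes[ i ] ) ) ;
      out += buf ;
   }

   return out ;
}

// Read a 64-bit big-endian value starting at the given byte offset.
uint64_t readBigEndian64( const std::vector<unsigned char> & bytes, std::size_t offset )
{
   uint64_t value = 0 ;

   for ( std::size_t i = 0 ; i < 8 && offset + i < bytes.size() ; i++ )
   {
      value = ( value << 8 ) | bytes[ offset + i ] ;
   }

   return value ;
}
```

As a sanity check, the second doubleword of the row at 0x070000006c7eb440 reads back as 0x090000001655ede4, which is the iar that dbx reports once the bytes are sitting in a typed sigcontext.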

Here is some dbx output for that helper program:

(dbx) p blah
(sc_onstack = 0x0, sc_mask = (ss_set = (0x80000, 0x0, 0x0, 0x0)), 
sc_uerror = 0x2, sc_jmpbuf = (jmp_context = (
gpr = (0x90000001655ed20, 
msr = 0xa00000000002d032, 
iar = 0x90000001655ede4, 
lr = 0x90000001655ed20,
ctr = 0x900000000048e00, 
cr = 0x22000242, 
xer = 0x20000004, ...
excp_type = 0x83)))

In the helper program, sigcontext is a type known to the debugger. So our iar for the trap itself was:

(dbx) listi 0x90000001655ede4-8
0x90000001655eddc (foo+0x12bc) 80630300         lwz   r3,0x300(r3)
0x90000001655ede0 (foo+0x12c0) 7c630734       extsh   r3,r3
0x90000001655ede4 (foo+0x12c4) 08440200      tdllti   r4,0x200
0x90000001655ede8 (foo+0x12c8) b0640000         sth   r3,0x0(r4)
0x90000001655edec (foo+0x12cc) e88102c8          ld   r4,0x2c8(r1)
0x90000001655edf0 (foo+0x12d0) 08440200      tdllti   r4,0x200
0x90000001655edf4 (foo+0x12d4) 38600000          li   r3,0x0
0x90000001655edf8 (foo+0x12d8) b0640000         sth   r3,0x0(r4)
0x90000001655edfc (foo+0x12dc) e88102d0          ld   r4,0x2d0(r1)
0x90000001655ee00 (foo+0x12e0) 08440200      tdllti   r4,0x200

How about the return address?

(dbx) listi 0x90000001655ed20-8
0x90000001655ed18 (foo+0x11f8) 386300b4        addi   r3,0xb4(r3)
0x90000001655ed1c (foo+0x11fc) 4bffb6e5          bl   0x90000001655a400 (blah(short*,char*))
0x90000001655ed20 (foo+0x1200) 906100ac         stw   r3,0xac(r1)

This looks like a possible candidate:

   char  path[MAX_TYPE_LENGTH] = {'\0'};

   ossStrNCopy(path, file, MAX_PATH_LENGTH);

   // Get currently processed member number using dbpath
   // ----------------------------------------------------
   zrc = blah(&pScratchArea->initialMemberNumber, path);
   if (zrc != SQLO_OK)

The path parameter won’t be NULL, but pScratchArea could be, and we blow up on a trap instruction after processing the first parameter (gr3). Why the debugger (and our own homegrown stacktrace code) has so much trouble with this is a different digging game, but this is probably at least the trigger for the issue.
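To see why a NULL pScratchArea would trip the check instead of sailing through, note that taking the address of a member of a NULL struct pointer just yields the member's offset. What follows is a minimal sketch under stated assumptions: ScratchArea is a hypothetical stand-in for the real type (which isn't shown here), and the 0xb4 offset is borrowed from the addi r3,0xb4(r3) just before the bl, on the assumption that this instruction is computing &pScratchArea->initialMemberNumber.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the real scratch area type.
// The padding is sized so that the member lands at offset 0xb4.
struct ScratchArea
{
   char  padding[ 0xb4 ] ;
   short initialMemberNumber ;
} ;

// The address that &p->initialMemberNumber evaluates to for a given base,
// computed arithmetically so nothing is dereferenced.
std::uintptr_t memberAddress( std::uintptr_t base )
{
   return base + offsetof( ScratchArea, initialMemberNumber ) ;
}

// Mirrors the tdllti rX,0x200 check: trap when the address,
// compared as an unsigned value, is below 512.
bool nullCheckWouldTrap( std::uintptr_t address )
{
   return address < 0x200 ;
}
```

With a base of zero, memberAddress gives 0xb4, which is under the 0x200 threshold, so the first store through that pointer hits the trap. A member at an offset of 0x200 or more in a NULL struct would slip past the check, which is the nullishButNotQuite hazard from the earlier example.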

AN UPDATE: This turned out to have nothing to do with nested handlers, since the sigcontext_t::iar held all the info required, so the title should have been something different. What gave dbx and our own stacktrace code so much trouble was actually a stack corruption. Look very carefully at the strncpy-style copy (ossStrNCopy) that follows the char array declaration, and compare the size of the array with the specified length. The size of the buffer was wrong, AND the copy should have been coded in the less error-prone way:

   char  path[MAX_PATH_LENGTH] = {'\0'};

   ossStrNCopy(path, file, sizeof(path));
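The same pattern in a form that compiles outside our source tree is sketched below. Everything in it is an assumption for illustration: boundedCopy is a hypothetical stand-in for ossStrNCopy (whose exact semantics aren't shown in this post), and PATH_BUFFER_SIZE is an arbitrary small value, not the real MAX_PATH_LENGTH.

```cpp
#include <cstddef>
#include <cstring>

// Arbitrary size for illustration, not the real MAX_PATH_LENGTH.
const std::size_t PATH_BUFFER_SIZE = 8 ;

// Hypothetical stand-in for ossStrNCopy: copies at most destSize - 1
// characters and always null-terminates the destination.
void boundedCopy( char * dest, const char * src, std::size_t destSize )
{
   if ( destSize == 0 )
   {
      return ;
   }

   std::strncpy( dest, src, destSize - 1 ) ;
   dest[ destSize - 1 ] = '\0' ;
}

// Copies src into a local buffer and returns the length that actually fit.
std::size_t copyWithSizeof( const char * src )
{
   char path[ PATH_BUFFER_SIZE ] = { '\0' } ;

   // sizeof( path ) follows the declaration, so changing the array size
   // can never leave the bound out of sync with the buffer.
   boundedCopy( path, src, sizeof( path ) ) ;

   return std::strlen( path ) ;
}
```

A 16 character input gets truncated to 7 characters plus the terminator, and nothing is written past the end of path. Passing a separately named constant as the bound, as the original code did, only stays safe for as long as somebody remembers to keep the two names in agreement.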

One Response to “Fun with AIX nested signal handlers.”

  1. Matt. W. said

    Thanks Peeter, this was a very helpful explanation.
