I found some test code that did the following:
#if defined _AIX
#define HANDLE_ARRAY_SIZE OPEN_MAX
#else
#define HANDLE_ARRAY_SIZE 32768
#endif
int i = 0 ;
int fileHandleArray[HANDLE_ARRAY_SIZE] ;
memset( fileHandleArray, 0, sizeof(fileHandleArray) ) ;
do {
fileHandleArray[i] = open( "/dev/null", O_RDWR, 0 ) ;
} while ( fileHandleArray[i++] > 0 ) ;
Fun for all!
I think this test case must have been written assuming that the ulimit for the number of file handles was some nice small number. With a larger limit, nothing stops the loop from storing descriptors past the end of the array. That does a nice number on the stack, and the cause was fairly non-obvious when looking in the debugger session.
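For contrast, here is a minimal sketch of a bounded version of that loop. The helper names open_bounded and close_all are mine, not from the test code, and I am assuming the intent was only to open as many descriptors as will fit in the array. It stops on capacity as well as on failure, and it tests for a negative return, since open() signals failure with -1 and 0 is a valid descriptor.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: open /dev/null repeatedly, storing at most
 * `capacity` descriptors. Returns the number actually stored. */
size_t open_bounded(int *handles, size_t capacity)
{
    assert(handles != NULL || capacity == 0);

    size_t count = 0;
    while (count < capacity) {
        int fd = open("/dev/null", O_RDWR);
        if (fd < 0)     /* failure is -1; 0 is a valid descriptor */
            break;
        handles[count++] = fd;
    }
    return count;
}

/* Hypothetical helper: close the descriptors stored by open_bounded(). */
void close_all(const int *handles, size_t count)
{
    for (size_t i = 0; i < count; i++)
        close(handles[i]);
}
```

Called as open_bounded(fileHandleArray, HANDLE_ARRAY_SIZE), this can still run the process out of descriptors, but it cannot write outside the array.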
The debugger shows me the test function (testgroup2) driving a trap in a called function:
(gdb) where
#0 0x00002aaab10419d6 in gtfPrint (gtfCBInfo=0x803700008036, piFile=0x2aaab1046c9c "sqo_fileapi.C", iLine=664,
piFormat=0x2aaab104835c "File handle value is is %d.\n") at gtfmod.h:226
#1 0x00002aaab1044035 in testgroup2 (gtfCBInfo=0x803700008036, poResult=0x803900008038) at sqo_fileapi.C:664
where my trap is on a dereference of a variable, gtfCBInfo, that was good earlier in the code:
if ( gtfCBInfo->logptr == NULL )
Yet, it was clear that there were no intervening assignments to gtfCBInfo. I was curious exactly how this corruption propagated once found, and that’s clear enough once you look at the assembly for a function call immediately after the corruption. See all the code that sets up the parameters for the call by fetching stack spills (the loads from (%rbp)):
0x00002aaab1043fed <+1231>: mov %eax,-0xb0(%rbp)
0x00002aaab1043ff3 <+1237>: mov -0x48(%rbp),%rax
0x00002aaab1043ff7 <+1241>: lea 0x2c9e(%rip),%rdx # 0x2aaab1046c9c
0x00002aaab1043ffe <+1248>: mov $0x298,%ecx
0x00002aaab1044003 <+1253>: lea 0x4352(%rip),%rbx # 0x2aaab104835c
0x00002aaab104400a <+1260>: mov -0xb0(%rbp),%esi
0x00002aaab1044010 <+1266>: mov %rax,%rdi
0x00002aaab1044013 <+1269>: mov %esi,-0xf8(%rbp)
0x00002aaab1044019 <+1275>: mov %rdx,%rsi
0x00002aaab104401c <+1278>: mov %rcx,%rdx
0x00002aaab104401f <+1281>: mov %rbx,%rcx
0x00002aaab1044022 <+1284>: mov -0xf8(%rbp),%eax
0x00002aaab1044028 <+1290>: mov %eax,%r8d
0x00002aaab104402b <+1293>: mov $0x0,%eax
0x00002aaab1044030 <+1298>: callq 0x2aaab1041120 <_Z8gtfPrintP5gtfCBPKcmS2_z@plt>
Because we’ve walked all over the neighbourhood of 0(%rbp) with the results of the open() calls, once we try to fetch those spilled values we start passing garbage. It’s so common to see stack corruptions manifest in even nastier ways (like a trap on return from a function call because we’ve wiped out the return address) that I was a bit surprised to see such a run-of-the-mill corruption here. Perhaps if we hadn’t called any functions, we would have trapped on unwind instead?
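The garbage values in the backtrace are themselves a clue. Here is a small sketch that splits a corrupted 64-bit pointer into the two 32-bit slots it overlays. The names fd_pair and split_pointer are made up for illustration, and I am assuming a little-endian x86-64 layout, where the low half of the pointer sits at the lower address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: the two adjacent int-sized slots that a
 * 64-bit pointer occupies. */
struct fd_pair {
    int32_t low;    /* slot at the lower address on little-endian */
    int32_t high;   /* slot at the higher address */
};

/* Split a 64-bit value into its low and high 32-bit halves. */
void split_pointer(uint64_t value, struct fd_pair *out)
{
    assert(out != NULL);
    out->low  = (int32_t)(value & 0xffffffffu);
    out->high = (int32_t)(value >> 32);
}
```

Feeding in gtfCBInfo=0x803700008036 gives 32822 and 32823, and poResult=0x803900008038 gives 32824 and 32825. Four consecutive integers, all just above the array size of 32768, look very much like file descriptors stored well past the end of fileHandleArray.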