Peeter Joot's (OLD) Blog.

Math, physics, perl, and programming obscurity.

Archive for the ‘debugging’ Category

turning off the gdb pager to collect gobs of info.

Posted by peeterjoot on September 30, 2009

I’ve got stuff interrupted with the debugger, so I can’t invoke our external tool to collect stacks. Since gdb doesn’t have redirect for most commands here’s how I was able to collect all my stacks, leaving my debugger attached:

(gdb) set height 0
(gdb) set logging on
(gdb) thread apply all where

Now I can go edit gdb.txt when it finishes (in the directory where I initially attached the debugger to my pid), and examine things. A small tip, but it took me 10 minutes to figure out how to do it (yet again), so it’s worth jotting down for future reference.

Posted in debugging | Tagged: | 2 Comments »

Making function calls within gdb.

Posted by peeterjoot on August 31, 2009

Here’s a quick debugging tidbit for the day, somewhat obscure, but it is a cool one and can be useful (I used it today)

Gdb has a particularily nice feature of being able to call arbitrary functions on the gdb command line, and print their output (if any)


(gdb) p getpid()
$8 = 6649
(gdb) p this->SAL_DumpMyStateToAFile()
[New Thread 47323035986240 (LWP 32200)]
$9 = void

You may have to move up and down your stack frames to find the context required to make the call, or to get the parameters you need in scope. You have to think about (or exploit) the side effects of the functions you call.

Somewhat like modification of variables in the debugger, this capability allows you to shoot yourself fairly easily, and but that’s part of the power.

I don’t recall if many other debuggers had this functionality. I have a vague recollection that the sun workshop’s dbx did too, but I could be wrong.

Posted in debugging | Tagged: , | Leave a Comment »

gdb on linux. Finding your damn thread.

Posted by peeterjoot on August 28, 2009

Suppose you are debugging a threaded process, and know that somewhere in there you have one of many threads that’s running the code you want to debug. How do you find it?

Listing the running threads isn’t terribly helpful if you’ve got a lot of them. You may see something unhelpful like:

(gdb) info threads
  30 Thread 47529939401216 (LWP 13827)  0x00002b3a5ff8e5c5 in pthread_join () from /lib64/
  29 Thread 47529945196864 (LWP 13831)  0x00002b3a66fd2baa in semtimedop () from /lib64/
  17 Thread 47530159106368 (LWP 14065)  0x00002b3a66fd2baa in semtimedop () from /lib64/
  16 Thread 47530150717760 (LWP 14067)  0x00002b3a66fd2baa in semtimedop () from /lib64/
  15 Thread 47530154912064 (LWP 14559)  0x00002b3a5ff94231 in __gxx_personality_v0 ()
    at ../../../../gcc-4.2.2/libstdc++-v3/libsupc++/
  14 Thread 47530146523456 (LWP 14561)  0x00002b3a66fc9476 in poll () from /lib64/
  13 Thread 47530142329152 (LWP 14564)  0x00002b3a66fc9476 in poll () from /lib64/
  12 Thread 47530138134848 (LWP 14580)  0x00002b3a66fc9476 in poll () from /lib64/
  11 Thread 47530133940544 (LWP 14581)  0x00002b3a66fc9476 in poll () from /lib64/
  10 Thread 47530129746240 (LWP 14582)  0x00002b3a66fd2baa in semtimedop () from /lib64/
  9 Thread 47530125551936 (LWP 14583)  0x00002b3a66fd2baa in semtimedop () from /lib64/
  8 Thread 47530121357632 (LWP 14584)  0x00002b3a66fd2baa in semtimedop () from /lib64/
  7 Thread 47530117163328 (LWP 14585)  0x00002b3a66fd2baa in semtimedop () from /lib64/
  6 Thread 47530112969024 (LWP 14586)  0x00002b3a66fd2baa in semtimedop () from /lib64/
  5 Thread 47530108774720 (LWP 14587)  0x00002b3a66fd2baa in semtimedop () from /lib64/
  4 Thread 47530104580416 (LWP 14588)  0x00002b3a66fd2baa in semtimedop () from /lib64/
  3 Thread 47530100386112 (LWP 14589)  0x00002b3a66fd2baa in semtimedop () from /lib64/
  2 Thread 47530096191808 (LWP 14590)  0x00002b3a66fd2baa in semtimedop () from /lib64/
  1 Thread 47530091997504 (LWP 14591)  0x00002b3a66fd2baa in semtimedop () from /lib64/

Unless you are running on a 128 way (and god help you if you have to actively debug with that kind of concurrency), most of your threads will be blocked all the time, stuck in a kernel or C runtime function, and only that shows at the top of the stack.

You can list the top frames of all your functions easily enough, doing something like:

(gdb) thread apply all where 4

Thread 30 (Thread 47529939401216 (LWP 13827)):
#0  0x00002b3a5ff8e5c5 in pthread_join () from /lib64/
#1  0x00002b3a6312e635 in sqloSpawnEDU (FuncPtr=0x2b3a6312bd7e ,
    pcArguments=0x7fff4ac3c380 "4'A", ulArgSize=24, pEDUInfo=0x7fff4ac3c340, pEDUid=0x7fff4ac3c3a0) at sqloedu.C:2206
#2  0x00002b3a6312e928 in sqloRunMainAsEDU (pFncPtr=0x412734 , argc=2, argv=0x7fff4ac3c4b8) at sqloedu.C:2445
#3  0x000000000041272c in main (argc=2, argv=0x7fff4ac3c4b8) at sqlesysc.C:1495

Thread 29 (Thread 47529945196864 (LWP 13831)):
#0  0x00002b3a66fd2baa in semtimedop () from /lib64/
#1  0x00002b3a63050880 in sqlo_waitlist::timeoutWait (this=0x2004807e0, timeout=10000)
    at /view/peeterj_kseq/vbs/engn/include/sqlowlst_inlines.h:557
#2  0x00002b3a6304eb1c in sqloWaitEDUWaitPost (pEDUWaitPostArea=0x200e90528, pUserPostCode=0x2b3a6d7d9170, timeOut=10000, flags=0)
    at sqlowaitpost.C:942
#3  0x00002b3a61b95e13 in sqeSyscQueueEdu::syscWaitRequest (this=0x200e904c0, reason=@0x2b3a6d7d9590) at sqlesyscqueue.C:510
(More stack frames follow...)

Thread 28 (Thread 47530205243712 (LWP 13894)):
#0  0x00002b3a5ff94231 in __gxx_personality_v0 () at ../../../../gcc-4.2.2/libstdc++-v3/libsupc++/
#1  0x00002b3a66722625 in ossSleep (milliseconds=1000) at osstime.C:204
#2  0x00002b3a6305f69d in sqloAlarmThreadEntry (pArgs=0x0, argSize=0) at sqloalarm.C:453
#3  0x00002b3a63131703 in sqloEDUEntry (parms=0x2b3a6d7d91d0) at sqloedu.C:3402
(More stack frames follow...)

then page through that output, and find what you are looking for, set breakpoints and start debugging, but that can be tedious.

A different way, which requires some preparation, is by dumping to a log file, the thread id. There’s still a gotcha for that though, and you can see in the ‘info threads’ output that the thread ids (what’s you’d get if you call and log the value of pthread_self()) are big ass hexadecimal values that aren’t particularily easy to find in the ‘info threads’ output. Note that pthread_self() will return the base address of the stack itself (or something close to it) on a number of platforms since this can be used as a unique identifier, and linux currently appears to do this (AIX no longer does since around 4.3).

Also observe that gdb prints out (LWP ….) values in the ‘info threads’ output. These are the Linux kernel Task values, roughly equivalent to a threads’s pid as far as the linux kernel is concerned (linux threads and processes are all types of “tasks” … threads just happen to share more than processes, like virtual memory and signal handlers and file descriptors). At the time of this writing there isn’t a super easy way to dump this task id, but a helper function of the following form will do the trick:

#include <sys/syscall.h>
      int GetMyKernelThreadId(void)
         return syscall(__NR_gettid);

You’ll probably have to put this code in a separate module from other stuff since kernel headers and C runtime headers don’t get along well. Having done that you can call this in your dumping code, like the output below tagged with the prefix KTID (i.e. what a DB2 developer will find in n-builds in the coral project db2diag.log).

2009-08-28- I735590E1447          LEVEL: Severe
PID     : 13827                TID  : 47530154912064 KTID : 14559
PROC    : db2sysc 0

This identifier is much easier to pick out in the ‘info threads’ output (and is in this case thread 15), so get to yourself up and debugging now requires just:

(gdb) thread 15
[Switching to thread 15 (Thread 47530154912064 (LWP 14559))]#0  0x00002b3a5ff94231 in __gxx_personality_v0 ()
    at ../../../../gcc-4.2.2/libstdc++-v3/libsupc++/
351     ../../../../gcc-4.2.2/libstdc++-v3/libsupc++/ No such file or directory.
        in ../../../../gcc-4.2.2/libstdc++-v3/libsupc++/
(gdb) where 5
#0  0x00002b3a5ff94231 in __gxx_personality_v0 () at ../../../../gcc-4.2.2/libstdc++-v3/libsupc++/
#1  0x00002b3a66722625 in ossSleep (milliseconds=100) at osstime.C:204
#2  0x00002b3a6a29e674 in traceCrash () from /home/hotel77/peeterj/sqllib/lib64/
#3  0x00002b3a66732b66 in _gtraceEntryVar (threadID=47530154912064, ecfID=423100446, eduID=25, eduIndex=3, pNargs=3)
    at gtrace.C:2130
#4  0x00002b3a61145ade in pdtEntry3 (ecfID=423100446, t1=423100418, s1=16, p1=0x2b3a8a01c478, t2=36, s2=8, p2=0x2b3a79fde418,
    t3=3, s3=8, p3=0x2b3a79fde410) at pdtraceapi.C:2012
(More stack frames follow...)

Posted in debugging | Tagged: , , | 3 Comments »

A gdb tip for threaded debugging. scheduler locking.

Posted by peeterjoot on August 25, 2009

Threaded debugging can behave oddly on a number of platforms. In particular the next command can behave oddly. Here’s an example:

(gdb) n
29375         m_pConnEntry->sqleCaCeGetCachedRegisteredMemory( &m_mem, &m_pRegistration ) ;
(gdb) n
[New Thread 47757037398336 (LWP 5917)]

Breakpoint 2, SAL_SA_HANDLE::SAL_ReadSAList (this=0x2b6f54b75420, pSaPageName=0x2b6f4ebfcfb0, ecfFunc=423100625,
    ecfProbe=4262, pBuffer=0x2b6f4ebece70 "01", pEntryCount=0x2b6f4ebfcfa4, bReadAllSAs=true,
    pSALReadSAListIterator=0x2b6f4ebfce70) at sqle_ca_cmd.C:29629
29629      pdTraceExit( SQLT_SAL_SA_HANDLE__SAL_ReadSAList, rc ) ;
(gdb) q

I was in the function SAL_ReadSAList, single stepping line-by-line (I happened to also have a breakpoint set at the end of the function, but didn’t expect to hit it) … then BAM, a new thread got started up and my next command somehow gets interpreted by the debugger as “next, but maybe something else random for another thread too”. Pain in the butt, and now I’ve lost the context I wanted to debug, and have to start over.

On platforms that support it, the gdb scheduler locking feature works nicely. Here’s an example:

(gdb) b 4233
Breakpoint 1 at 0x2b3bbe7285eb: file testca.C, line 4233.
(gdb) signal SIGCONT
Continuing with signal SIGCONT.

Breakpoint 1, SA_TESTS (n1=1, flags=1, init=18446744073709551615, next=18446744073709551615) at testca.C:4239
4239                                                                           &i ) ;
(gdb) set scheduler-locking on
(gdb) n
4241          if ( rc )

Get the debugger to where I wanted (which in this case included raising a SIGCONT to get out of a pause() call where I’d forced things to stop). Once I’m there, turn on the scheduler lock so only my thread runs, then can step through my code without worrying about other threads interfering.

Note that this can also be a good way to lock up a threaded program. If your thread blocks on something like a pthread_mutex, and you haven’t let other threads run, you’ll get it stuck there without intervention (i.e. set scheduler-locking off) when you are done.

Posted in debugging | Tagged: , | 2 Comments »

gdb signal SIGCONT and pause()

Posted by peeterjoot on July 15, 2009

Anybody used to debugging on AIX is probably familiar with the convienence of the handy pause() function. Put a call to this in your program where you want to get the debugger attached, and let it run. Once you attach dbx you can ‘next’ past the pause() function to the code you want to debug.

This doesn’t work on Linux, and I found out the magic trick today. Attach gdb, find your thread, and set a breakpoint at a good spot past the pause() call. Then, do:

(gdb) signal SIGCONT

Voila. You’ll then hit your breakpoint, and can find the bug or walk through the code and learn it’s behaviour, …

Note that unlike the kill command, gdb doesn’t take cont or sigcont as allowable variations of SIGCONT.

EDIT: I can no longer get this to work with gdb 7.0. If anybody knows why I would be interested.

Posted in debugging | Tagged: , | Leave a Comment »