Suppose you are debugging a threaded process and know that one of its many threads is running the code you want to debug. How do you find it?
Listing the running threads isn’t terribly helpful if you’ve got a lot of them. You may see something unhelpful like:
(gdb) info threads
30 Thread 47529939401216 (LWP 13827) 0x00002b3a5ff8e5c5 in pthread_join () from /lib64/libpthread.so.0
29 Thread 47529945196864 (LWP 13831) 0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
...
17 Thread 47530159106368 (LWP 14065) 0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
16 Thread 47530150717760 (LWP 14067) 0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
15 Thread 47530154912064 (LWP 14559) 0x00002b3a5ff94231 in __gxx_personality_v0 ()
at ../../../../gcc-4.2.2/libstdc++-v3/libsupc++/eh_personality.cc:351
14 Thread 47530146523456 (LWP 14561) 0x00002b3a66fc9476 in poll () from /lib64/libc.so.6
13 Thread 47530142329152 (LWP 14564) 0x00002b3a66fc9476 in poll () from /lib64/libc.so.6
12 Thread 47530138134848 (LWP 14580) 0x00002b3a66fc9476 in poll () from /lib64/libc.so.6
11 Thread 47530133940544 (LWP 14581) 0x00002b3a66fc9476 in poll () from /lib64/libc.so.6
10 Thread 47530129746240 (LWP 14582) 0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
9 Thread 47530125551936 (LWP 14583) 0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
8 Thread 47530121357632 (LWP 14584) 0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
7 Thread 47530117163328 (LWP 14585) 0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
6 Thread 47530112969024 (LWP 14586) 0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
5 Thread 47530108774720 (LWP 14587) 0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
4 Thread 47530104580416 (LWP 14588) 0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
3 Thread 47530100386112 (LWP 14589) 0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
2 Thread 47530096191808 (LWP 14590) 0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
1 Thread 47530091997504 (LWP 14591) 0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
Unless you are running on a 128-way machine (and god help you if you have to actively debug with that kind of concurrency), most of your threads will be blocked most of the time, stuck in a kernel or C runtime function, and that function is all that shows at the top of the stack.
You can list the top frames of all your threads easily enough, doing something like:
(gdb) thread apply all where 4
Thread 30 (Thread 47529939401216 (LWP 13827)):
#0 0x00002b3a5ff8e5c5 in pthread_join () from /lib64/libpthread.so.0
#1 0x00002b3a6312e635 in sqloSpawnEDU (FuncPtr=0x2b3a6312bd7e ,
pcArguments=0x7fff4ac3c380 "4'A", ulArgSize=24, pEDUInfo=0x7fff4ac3c340, pEDUid=0x7fff4ac3c3a0) at sqloedu.C:2206
#2 0x00002b3a6312e928 in sqloRunMainAsEDU (pFncPtr=0x412734 , argc=2, argv=0x7fff4ac3c4b8) at sqloedu.C:2445
#3 0x000000000041272c in main (argc=2, argv=0x7fff4ac3c4b8) at sqlesysc.C:1495
Thread 29 (Thread 47529945196864 (LWP 13831)):
#0 0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
#1 0x00002b3a63050880 in sqlo_waitlist::timeoutWait (this=0x2004807e0, timeout=10000)
at /view/peeterj_kseq/vbs/engn/include/sqlowlst_inlines.h:557
#2 0x00002b3a6304eb1c in sqloWaitEDUWaitPost (pEDUWaitPostArea=0x200e90528, pUserPostCode=0x2b3a6d7d9170, timeOut=10000, flags=0)
at sqlowaitpost.C:942
#3 0x00002b3a61b95e13 in sqeSyscQueueEdu::syscWaitRequest (this=0x200e904c0, reason=@0x2b3a6d7d9590) at sqlesyscqueue.C:510
(More stack frames follow...)
Thread 28 (Thread 47530205243712 (LWP 13894)):
#0 0x00002b3a5ff94231 in __gxx_personality_v0 () at ../../../../gcc-4.2.2/libstdc++-v3/libsupc++/eh_personality.cc:351
#1 0x00002b3a66722625 in ossSleep (milliseconds=1000) at osstime.C:204
#2 0x00002b3a6305f69d in sqloAlarmThreadEntry (pArgs=0x0, argSize=0) at sqloalarm.C:453
#3 0x00002b3a63131703 in sqloEDUEntry (parms=0x2b3a6d7d91d0) at sqloedu.C:3402
(More stack frames follow...)
...
You can then page through that output, find what you are looking for, set breakpoints, and start debugging, but that can be tedious.
A different way, which requires some preparation, is to dump the thread id to a log file. There’s still a gotcha, though: as you can see in the ‘info threads’ output, the thread ids (what you’d get if you call and log the value of pthread_self()) are big-ass pointer-sized values (gdb prints them in decimal here) that aren’t particularly easy to pick out of the ‘info threads’ output. Note that on a number of platforms pthread_self() returns the base address of the thread’s stack (or something close to it), since this can serve as a unique identifier. Linux currently appears to do this (AIX has not done so since around 4.3).
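If you do log the pthread_self() value, it helps to write it in a form you can search for directly. Here is a minimal sketch, assuming Linux/glibc where pthread_t is an integer type (POSIX does not guarantee this). FormatPthreadId is a hypothetical helper name, not something from the original code. It writes the id in both decimal (the form gdb shows in the session above) and hex (the form some other gdb versions show).

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the calling thread's pthread_self() value
 * into buf as "TID : <decimal> (0x<hex>)".
 * Assumes pthread_t converts to unsigned long (true on Linux/glibc).
 * Returns what snprintf returns.
 */
int FormatPthreadId(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    unsigned long tid = (unsigned long)pthread_self();

    return snprintf(buf, len, "TID : %lu (0x%lx)", tid, tid);
}
```

Call it from whatever code writes your log records, and search the ‘info threads’ output for whichever of the two forms your gdb prints.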
Also observe that gdb prints out (LWP ….) values in the ‘info threads’ output. These are the Linux kernel task ids, roughly equivalent to a thread’s pid as far as the Linux kernel is concerned (Linux threads and processes are all types of “tasks” … threads just happen to share more than processes do, like virtual memory, signal handlers, and file descriptors). At the time of this writing there isn’t a super easy way to dump this task id, but a helper function of the following form will do the trick:
#include <sys/syscall.h>
#include <unistd.h>
/* Returns the kernel task id of the calling thread (gdb's LWP value). */
int GetMyKernelThreadId(void)
{
    return (int)syscall(SYS_gettid);
}
You’ll probably have to put this code in a separate module from other stuff, since kernel headers and C runtime headers don’t get along well. Having done that, you can call it from your logging code and get output like that below, tagged with the prefix KTID (this is what a DB2 developer will find in db2diag.log in n-builds of the coral project).
2009-08-28-13.00.24.791416-240 I735590E1447 LEVEL: Severe
PID : 13827 TID : 47530154912064 KTID : 14559
PROC : db2sysc 0
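To see how these task ids relate to the PID in that log record, here is a minimal sketch, assuming Linux with a glibc that exposes SYS_gettid. RecordKtid and KtidOfNewThread are hypothetical names used only for illustration. It starts one thread and reports the kernel thread id that thread saw.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel task id of the calling thread (the LWP value gdb shows). */
pid_t GetMyKernelThreadId(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Hypothetical thread entry point: stores its own kernel thread id via arg. */
static void *RecordKtid(void *arg)
{
    assert(arg != NULL);
    *(pid_t *)arg = GetMyKernelThreadId();
    return NULL;
}

/*
 * Hypothetical helper: start one thread, wait for it to finish, and
 * return the kernel thread id it recorded, or -1 on failure.
 */
pid_t KtidOfNewThread(void)
{
    pthread_t thread;
    pid_t ktid = -1;

    if (pthread_create(&thread, NULL, RecordKtid, &ktid) != 0)
        return -1;
    if (pthread_join(thread, NULL) != 0)
        return -1;
    return ktid;
}
```

Called from the initial thread, GetMyKernelThreadId() returns the same value as getpid(), while KtidOfNewThread() returns a different one. That matches the log record above, where PID is 13827 (the LWP of thread 30, the one sitting under main) and the KTID of the logging thread is 14559.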
The KTID is much easier to pick out in the ‘info threads’ output (in this case it is thread 15), so getting yourself up and debugging now requires just:
(gdb) thread 15
[Switching to thread 15 (Thread 47530154912064 (LWP 14559))]#0 0x00002b3a5ff94231 in __gxx_personality_v0 ()
at ../../../../gcc-4.2.2/libstdc++-v3/libsupc++/eh_personality.cc:351
351 ../../../../gcc-4.2.2/libstdc++-v3/libsupc++/eh_personality.cc: No such file or directory.
in ../../../../gcc-4.2.2/libstdc++-v3/libsupc++/eh_personality.cc
(gdb) where 5
#0 0x00002b3a5ff94231 in __gxx_personality_v0 () at ../../../../gcc-4.2.2/libstdc++-v3/libsupc++/eh_personality.cc:351
#1 0x00002b3a66722625 in ossSleep (milliseconds=100) at osstime.C:204
#2 0x00002b3a6a29e674 in traceCrash () from /home/hotel77/peeterj/sqllib/lib64/libdb2trcapi.so.1
#3 0x00002b3a66732b66 in _gtraceEntryVar (threadID=47530154912064, ecfID=423100446, eduID=25, eduIndex=3, pNargs=3)
at gtrace.C:2130
#4 0x00002b3a61145ade in pdtEntry3 (ecfID=423100446, t1=423100418, s1=16, p1=0x2b3a8a01c478, t2=36, s2=8, p2=0x2b3a79fde418,
t3=3, s3=8, p3=0x2b3a79fde410) at pdtraceapi.C:2012
(More stack frames follow...)