Peeter Joot's (OLD) Blog.

Math, physics, perl, and programming obscurity.

Posts Tagged ‘linux’

building a private version of gdb on a machine that has an older version.

Posted by peeterjoot on November 23, 2009

We have SLES10 linux machines, and the gdb version available on them is a old (so old that it no longer works with the version of the intel compiler that we use to build our product). Here’s a quick cheatsheet on how to download and install a newer version of gdb for private use, without having to have root privileges or replace the default version on the machine:

mkdir -p ~/tmp/gdb
cd ~/tmp/gdb
wget http://ftp.gnu.org/gnu/gdb/gdb-7.2.tar.bz2
bzip2 -dc gdb-7.2.tar.bz2 | tar -xf -
mkdir g
cd g
../gdb-7.2/configure --prefix=$HOME/gdb
make
make install

Executing these leaves you with a private version of gdb in ~/gdb/bin/gdb that works with newer intel compiled code.

This version of gdb has some additional features (relative to 6.8 that we have on our machines) that also look interesting:

  •  disassemble start,+length looks very handy (grab just the disassembly that is of interest, or when the whole thing is desired, not more hacking around with the pager depth to get it all).
  • save and restore breakpoints.
  • current thread number variable $_thread
  • trace state variables (7.1), and fast tracepoints (will have to try that).
  • detached tracing
  • multiple program debugging (although I’m not sure I’d want that, especially when just one multi-threaded program can be pretty hairy to debug).  I recall many times when dbx would crash AIX with follow fork.  I wonder if other operating systems deal with this better?
  • reverse debugging, so that you can undo changes!  This is said to be target dependent.  I wonder if amd64 is supported?
  • catch syscalls.  I’ve seen some times when the glibc dynamic loader appeared to be able to exit the process, and breaking on exit, _exit, __exit did nothing.  I wonder if the exit syscall would catch such an issue.
  • find.  Search memory for a sequence of bytes.

Posted in Development environment | Tagged: , | Leave a Comment »

grep with a range using a perl one liner.

Posted by peeterjoot on September 23, 2009

Here’s a small shell scripting problem. I have grep output that I’d like further filtered

# ./displayInfo | grep peeterj
Offline RG:blah_peeterj_0-rg
Offline RG:blah_peeterj_0-rg_MLN-rg
Offline RG:idle_peeterj_998_goo-rg
Offline RG:primary_peeterj_900-rg
Online Equivalency:blah_peeterj_0-rg_MLN-rg_group-equ
...
MORE STUFF I DON'T CARE ABOUT.
...
Online Equivalency:instancehost_peeterj-equ
        '- Online APP:instancehost_peeterj_goo:goo
Online Equivalency:primary_peeterj_900-rg_group-equ

peeterj is my userid and the command in question that generates this has output for all the other userids on the system. I’m only interested in the subset of the info that has my name, hence the grep.

However, once the text Equivalency is displayed I’m not interested in any more of it. I could filter out all the patterns that I’m also not interested in doing something like:

# ./displayInfo | grep peeterj | grep -v -e Equivalency -e instancehost -e ...

but this is a bit cumbersome. An alternative is filtering specifically on what I want

# ./displayInfo | grep -e 'peeterj.*blah' -e 'peeterj.*idle' -e 'primary_peeterj' -e ...

but that also means I have to know and enumerate all such expressions for what I’m interested in. Since I’m on Linux my grep is gnu-grep, so I considered using ‘grep -B N’ to show N lines of text precededing a match, but this also outputs the text I’m not interested in so doesn’t really work.

Here’s what I came up with (I’m sure there’s lot of ways, some perhaps easier, but I liked this one). It uses the perl filtering option -n once again, to convert the entire script into a filter (specified here inline with -e instead of in a file) :

# ./displayInfo | perl -n -e 'next unless (/$ENV{USER}/) ; last if ( /Equivalency/ ) ; print ;'

Basically, this one liner is as if I’d written the following perl script to read and process all of STDIN:

#/usr/bin/perl

while (<>)
{
   my $line = $_ ;
   next unless ($line =~ /$ENV{USER}/) ; # "grep" for my userid (peeterj).

   # Won't get here until I've started seeing the peeterj text.
   last if ( $line =~ /Equivalency/ ) ;         # stop when I've seen enough.

   # If I get this far print the "matched" line as is:
   print $line ;
}

Posted in perl and general scripting hackery | Tagged: , , | Leave a Comment »

gdb on linux. Finding your damn thread.

Posted by peeterjoot on August 28, 2009

Suppose you are debugging a threaded process, and know that somewhere in there you have one of many threads that’s running the code you want to debug. How do you find it?

Listing the running threads isn’t terribly helpful if you’ve got a lot of them. You may see something unhelpful like:

(gdb) info threads
  30 Thread 47529939401216 (LWP 13827)  0x00002b3a5ff8e5c5 in pthread_join () from /lib64/libpthread.so.0
  29 Thread 47529945196864 (LWP 13831)  0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
...
  17 Thread 47530159106368 (LWP 14065)  0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
  16 Thread 47530150717760 (LWP 14067)  0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
  15 Thread 47530154912064 (LWP 14559)  0x00002b3a5ff94231 in __gxx_personality_v0 ()
    at ../../../../gcc-4.2.2/libstdc++-v3/libsupc++/eh_personality.cc:351
  14 Thread 47530146523456 (LWP 14561)  0x00002b3a66fc9476 in poll () from /lib64/libc.so.6
  13 Thread 47530142329152 (LWP 14564)  0x00002b3a66fc9476 in poll () from /lib64/libc.so.6
  12 Thread 47530138134848 (LWP 14580)  0x00002b3a66fc9476 in poll () from /lib64/libc.so.6
  11 Thread 47530133940544 (LWP 14581)  0x00002b3a66fc9476 in poll () from /lib64/libc.so.6
  10 Thread 47530129746240 (LWP 14582)  0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
  9 Thread 47530125551936 (LWP 14583)  0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
  8 Thread 47530121357632 (LWP 14584)  0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
  7 Thread 47530117163328 (LWP 14585)  0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
  6 Thread 47530112969024 (LWP 14586)  0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
  5 Thread 47530108774720 (LWP 14587)  0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
  4 Thread 47530104580416 (LWP 14588)  0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
  3 Thread 47530100386112 (LWP 14589)  0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
  2 Thread 47530096191808 (LWP 14590)  0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
  1 Thread 47530091997504 (LWP 14591)  0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6

Unless you are running on a 128 way (and god help you if you have to actively debug with that kind of concurrency), most of your threads will be blocked all the time, stuck in a kernel or C runtime function, and only that shows at the top of the stack.

You can list the top frames of all your functions easily enough, doing something like:

(gdb) thread apply all where 4

Thread 30 (Thread 47529939401216 (LWP 13827)):
#0  0x00002b3a5ff8e5c5 in pthread_join () from /lib64/libpthread.so.0
#1  0x00002b3a6312e635 in sqloSpawnEDU (FuncPtr=0x2b3a6312bd7e ,
    pcArguments=0x7fff4ac3c380 "4'A", ulArgSize=24, pEDUInfo=0x7fff4ac3c340, pEDUid=0x7fff4ac3c3a0) at sqloedu.C:2206
#2  0x00002b3a6312e928 in sqloRunMainAsEDU (pFncPtr=0x412734 , argc=2, argv=0x7fff4ac3c4b8) at sqloedu.C:2445
#3  0x000000000041272c in main (argc=2, argv=0x7fff4ac3c4b8) at sqlesysc.C:1495

Thread 29 (Thread 47529945196864 (LWP 13831)):
#0  0x00002b3a66fd2baa in semtimedop () from /lib64/libc.so.6
#1  0x00002b3a63050880 in sqlo_waitlist::timeoutWait (this=0x2004807e0, timeout=10000)
    at /view/peeterj_kseq/vbs/engn/include/sqlowlst_inlines.h:557
#2  0x00002b3a6304eb1c in sqloWaitEDUWaitPost (pEDUWaitPostArea=0x200e90528, pUserPostCode=0x2b3a6d7d9170, timeOut=10000, flags=0)
    at sqlowaitpost.C:942
#3  0x00002b3a61b95e13 in sqeSyscQueueEdu::syscWaitRequest (this=0x200e904c0, reason=@0x2b3a6d7d9590) at sqlesyscqueue.C:510
(More stack frames follow...)

Thread 28 (Thread 47530205243712 (LWP 13894)):
#0  0x00002b3a5ff94231 in __gxx_personality_v0 () at ../../../../gcc-4.2.2/libstdc++-v3/libsupc++/eh_personality.cc:351
#1  0x00002b3a66722625 in ossSleep (milliseconds=1000) at osstime.C:204
#2  0x00002b3a6305f69d in sqloAlarmThreadEntry (pArgs=0x0, argSize=0) at sqloalarm.C:453
#3  0x00002b3a63131703 in sqloEDUEntry (parms=0x2b3a6d7d91d0) at sqloedu.C:3402
(More stack frames follow...)
...

then page through that output, and find what you are looking for, set breakpoints and start debugging, but that can be tedious.

A different way, which requires some preparation, is by dumping to a log file, the thread id. There’s still a gotcha for that though, and you can see in the ‘info threads’ output that the thread ids (what’s you’d get if you call and log the value of pthread_self()) are big ass hexadecimal values that aren’t particularily easy to find in the ‘info threads’ output. Note that pthread_self() will return the base address of the stack itself (or something close to it) on a number of platforms since this can be used as a unique identifier, and linux currently appears to do this (AIX no longer does since around 4.3).

Also observe that gdb prints out (LWP ….) values in the ‘info threads’ output. These are the Linux kernel Task values, roughly equivalent to a threads’s pid as far as the linux kernel is concerned (linux threads and processes are all types of “tasks” … threads just happen to share more than processes, like virtual memory and signal handlers and file descriptors). At the time of this writing there isn’t a super easy way to dump this task id, but a helper function of the following form will do the trick:

#include <sys/syscall.h>
      int GetMyKernelThreadId(void)
      {
         return syscall(__NR_gettid);
      }

You’ll probably have to put this code in a separate module from other stuff since kernel headers and C runtime headers don’t get along well. Having done that you can call this in your dumping code, like the output below tagged with the prefix KTID (i.e. what a DB2 developer will find in n-builds in the coral project db2diag.log).

2009-08-28-13.00.24.791416-240 I735590E1447          LEVEL: Severe
PID     : 13827                TID  : 47530154912064 KTID : 14559
PROC    : db2sysc 0

This identifier is much easier to pick out in the ‘info threads’ output (and is in this case thread 15), so get to yourself up and debugging now requires just:

(gdb) thread 15
[Switching to thread 15 (Thread 47530154912064 (LWP 14559))]#0  0x00002b3a5ff94231 in __gxx_personality_v0 ()
    at ../../../../gcc-4.2.2/libstdc++-v3/libsupc++/eh_personality.cc:351
351     ../../../../gcc-4.2.2/libstdc++-v3/libsupc++/eh_personality.cc: No such file or directory.
        in ../../../../gcc-4.2.2/libstdc++-v3/libsupc++/eh_personality.cc
(gdb) where 5
#0  0x00002b3a5ff94231 in __gxx_personality_v0 () at ../../../../gcc-4.2.2/libstdc++-v3/libsupc++/eh_personality.cc:351
#1  0x00002b3a66722625 in ossSleep (milliseconds=100) at osstime.C:204
#2  0x00002b3a6a29e674 in traceCrash () from /home/hotel77/peeterj/sqllib/lib64/libdb2trcapi.so.1
#3  0x00002b3a66732b66 in _gtraceEntryVar (threadID=47530154912064, ecfID=423100446, eduID=25, eduIndex=3, pNargs=3)
    at gtrace.C:2130
#4  0x00002b3a61145ade in pdtEntry3 (ecfID=423100446, t1=423100418, s1=16, p1=0x2b3a8a01c478, t2=36, s2=8, p2=0x2b3a79fde418,
    t3=3, s3=8, p3=0x2b3a79fde410) at pdtraceapi.C:2012
(More stack frames follow...)

Posted in debugging | Tagged: , , | 3 Comments »