A fun and curious dig. GCC generation of a ud2a instruction (SIGILL)

Posted by peeterjoot on May 26, 2010

Recently some of our code started misbehaving, but only when compiled with GCC: it was dying with a SIGILL trap. Our post-mortem stack trace and data collection tools didn’t deal with this trap very gracefully, but dealing with that (or even understanding it) is a different story.

What I see in the debugger once I find the guilty thread is:

(gdb) thread 12
[Switching to thread 12 (Thread 46970517317952 (LWP 30316))]#0  0x00002ab824438ec1 in __gxx_personality_v0 ()
    at ../../../../gcc-4.2.2/libstdc++-v3/libsupc++/eh_personality.cc:351
351     ../../../../gcc-4.2.2/libstdc++-v3/libsupc++/eh_personality.cc: No such file or directory.
        in ../../../../gcc-4.2.2/libstdc++-v3/libsupc++/eh_personality.cc
(gdb) where
#0  0x00002ab824438ec1 in __gxx_personality_v0 ()
    at ../../../../gcc-4.2.2/libstdc++-v3/libsupc++/eh_personality.cc:351
#1  0x00002ab824438cc9 in sleep () from /lib64/libc.so.6
#2  0x00002ab8203090ee in sqloEDUSleepHandler (signum=20, sigcode=0x2ab82cffa0c0, scp=0x2ab82cff9f90)
    at sqloinst.C:283
#3  <signal handler called>
#4  0x00002ab81cf03231 in __gxx_personality_v0 ()
    at ../../../../gcc-4.2.2/libstdc++-v3/libsupc++/eh_personality.cc:351
#5  0x00002ab823b9b745 in ossSleep () from /home/hotel74/peeterj/sqllib/lib64/libdb2osse.so.1
#6  0x00002ab821206992 in pdInvokeCalloutScript () at /view/peeterj_m19/vbs/engn/include/sqluDMSort_inlines.h:158
#7  0x00002ab82030fe99 in sqloEDUCodeTrapHandler (signum=4, sigcode=0x2ab82cffcc60, scp=0x2ab82cffcb30)
    at sqloedu.C:4476
#8  <signal handler called>
#9  0x00002ab821393257 in sqluInitLoadEDU (pPrivateACBIn=0x2059e0080, ppPrivateACBOut=0x2ab82cffd320,
    puchAuthID=0x2ab8fcef19b8 "PEETERJ ", pNLSACB=0x2ab8fceea168, pComCB=0x2ab8fceea080, pMemPool=0x2ab8fccca2d0)
    at sqluedus.C:1696
#10 0x00002ab8212d34c2 in sqluldat (pArgs=0x2ab82cffdef0 "", argsSize=96) at sqluldat.C:737
#11 0x00002ab820310ced in sqloEDUEntry (parms=0x2ab82f3e9680) at sqloedu.C:3438
#12 0x00002ab81cefc143 in start_thread () from /lib64/libpthread.so.0
#13 0x00002ab82446674d in clone () from /lib64/libc.so.6
#14 0x0000000000000000 in ?? ()

Observe that there are two signal handler frames (#3 and #8), which gdb marks as ‘<signal handler called>’. The lower one (#8) is from the original SIGILL, and the upper one (#3) is from the signal that our “main” thread ends up sending to all the rest of the threads as part of our process for freezing things so that we can take a peek and see what’s up.

Looking at the siginfo_t for the SIGILL handler we have:

(gdb) frame 7
#7  0x00002ab82030fe99 in sqloEDUCodeTrapHandler (signum=4, sigcode=0x2ab82cffcc60, scp=0x2ab82cffcb30)
    at sqloedu.C:4476
4476    sqloedu.C: No such file or directory.
        in sqloedu.C
(gdb) p *sigcode
$4 = {si_signo = 4, si_errno = 0, si_code = 2, _sifields = {_pad = {557396567, 10936, 0, 0, 1, 16777216,
      -1170923664, 10936, 754961616, 10936, 599153081, 10936, 0, 0, 15711488, 10752, 4, 0, -1170923664, 10936, 1, 0,
      0, 0, 754961680, 10936, 4292335, 0}, _kill = {si_pid = 557396567, si_uid = 10936}, _timer = {
      si_tid = 557396567, si_overrun = 10936, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _rt = {
      si_pid = 557396567, si_uid = 10936, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _sigchld = {
      si_pid = 557396567, si_uid = 10936, si_status = 0, si_utime = 72057594037927937, si_stime = 46972886392688},
    _sigfault = {si_addr = 0x2ab821393257}, _sigpoll = {si_band = 46970319745623, si_fd = 0}}}
(gdb) p /x *sigcode
$5 = {si_signo = 0x4, si_errno = 0x0, si_code = 0x2, _sifields = {_pad = {0x21393257, 0x2ab8, 0x0, 0x0, 0x1,
      0x1000000, 0xba351f70, 0x2ab8, 0x2cffccd0, 0x2ab8, 0x23b659b9, 0x2ab8, 0x0, 0x0, 0xefbd00, 0x2a00, 0x4, 0x0,
      0xba351f70, 0x2ab8, 0x1, 0x0, 0x0, 0x0, 0x2cffcd10, 0x2ab8, 0x417eef, 0x0}, _kill = {si_pid = 0x21393257,
      si_uid = 0x2ab8}, _timer = {si_tid = 0x21393257, si_overrun = 0x2ab8, si_sigval = {sival_int = 0x0,
        sival_ptr = 0x0}}, _rt = {si_pid = 0x21393257, si_uid = 0x2ab8, si_sigval = {sival_int = 0x0,
        sival_ptr = 0x0}}, _sigchld = {si_pid = 0x21393257, si_uid = 0x2ab8, si_status = 0x0,
      si_utime = 0x100000000000001, si_stime = 0x2ab8ba351f70}, _sigfault = {si_addr = 0x2ab821393257}, _sigpoll = {
      si_band = 0x2ab821393257, si_fd = 0x0}}}

The siginfo_t has the si_addr value 0x00002AB821393257, which matches the address in frame 9 of the stack, in sqluInitLoadEDU. What is at that line of code (sqluedus.C:1696) doesn’t appear to be something that ought to generate a SIGILL:

   1693    // Set current activity in private agent CB to
   1694    // point to the activity that the EDU is working
   1695    // on behalf of.
   1696    pPrivateACB->agtRqstCB.pActivityCB = pComCB->my_curr_activity_entry;
   1697 #ifdef DB2_DEBUG
   1698    { //!!  This debug code is only useful in conjunction with a trap described by W749645
   1699       char mesg[500];
   1700       sprintf(mesg,"W749645:uILE pPr->agtR=%p ->pAct=%p",pPrivateACB->agtRqstCB,pPrivateACB->agtRqstCB.pActivityCB);
   1701       sqlt_logerr_str(SQLT_SQLU, SQLT_sqluInitLoadEDU, __LINE__, mesg, NULL, 0, SQLT_FFSL_INF);
   1702    } //!!
   1703 #endif

So what is going on? Let’s look at the assembly for the trapping instruction address. Using ‘(gdb) set logging on’, and ‘(gdb) disassemble’ we find:

0x00002ab82139323c 
0x00002ab82139323e : mov    0xfffffffffffffd68(%rbp),%rax
0x00002ab821393245 : mov    0x6498(%rax),%rdx
0x00002ab82139324c : mov    0xffffffffffffffb0(%rbp),%rax
0x00002ab821393250 : mov    %rdx,0x5bd0(%rax)
0x00002ab821393257 : ud2a
^^^^^^^^^^^^^^^^^^
0x00002ab821393259 : cmpl   $0x0,0xffffffffffffffac(%rbp)
0x00002ab82139325d 
0x00002ab82139325f : mov    0xfffffffffffffd80(%rbp),%rdi
0x00002ab821393266 : callq  0x2ab81dcd4218 
0x00002ab82139326b : mov    0xffffffffffffffd8(%rbp),%rax
0x00002ab82139326f : and    $0x82,%eax
0x00002ab821393274 : test   %rax,%rax

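Hmm. What is a ud2a instruction? Google is our friend, and we find that the Linux kernel uses this as a “guaranteed invalid instruction”. It is used to fault the processor and halt the kernel in case you did something really, really bad. (ud2a is the GNU assembler’s name for the x86 UD2 opcode, which exists precisely to raise an invalid-opcode exception.)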

Other similar references can be found, also explaining its use in the Linux kernel. So what is this doing in userspace code? It seems too specific to have gotten there by accident, and since the instruction stream itself contains it, stack corruption or any other sneaky, nasty mechanism doesn’t seem likely. The instruction doesn’t immediately follow a callq, so a runtime loader malfunction or something else equally odd doesn’t seem likely either.

Perhaps the compiler put this instruction into the code for some reason. A compiler bug, perhaps? A new Google search for “GCC ud2a instruction” turns up:

   ...generates this warning (using gcc 4.4.1 but I think it applies to most
   gcc versions):

   main.cpp:12: warning: cannot pass objects of non-POD type 'class A'
   through '...'; call will abort at runtime

   1. Why is this a "warning" rather than an "error"? When I run the program
   it hits a "ud2a" instruction emitted by gcc and promptly hits SIGILL.

Oh my! It sounds like GCC has cowardly refused to generate an error, but also bravely refuses to generate bad code for whatever this code sequence is. Do I have such a warning in my build log? In fact, I have three, all of which look like the second line here:

sqluedus.C:1464: warning: deprecated conversion from string constant to 'char*'
sqluedus.C:1700: warning: cannot pass objects of non-POD type 'struct sqlrw_request_cb' through '...'; call will abort at runtime

At line 1700 of that file we have:

sprintf(mesg,"W749645:uILE pPr->agtR=%p ->pAct=%p",pPrivateACB->agtRqstCB,pPrivateACB->agtRqstCB.pActivityCB);

It turns out that agtRqstCB is a rather large structure, and it certainly doesn’t match the %p that the developer used in this debug-build-only code. The debug code actually makes things worse, and it won’t help on any platform. It probably won’t crash on other platforms either (only with GCC), since there are no subsequent %s format parameters that would get messed up by inappropriately placing gobs of structure data in the varargs data area.
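One way such a call could be corrected is sketched below, assuming the intent was to log the address of the request control block. The struct definitions are hypothetical stand-ins (the real sqlrw_request_cb is not shown in this post), and `describe` is a name invented for the example.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-ins for the real control blocks; only the shape matters.
struct ActivityCB { int id; };
struct RequestCB {
    RequestCB() : pActivityCB(nullptr) {}  // user-provided constructor: not a POD
    ActivityCB* pActivityCB;
};
struct PrivateACB { RequestCB agtRqstCB; };

std::string describe(const PrivateACB* pPrivateACB) {
    char mesg[500];
    // Broken form: hands the RequestCB object itself to a %p conversion.
    //   sprintf(mesg, "... agtR=%p ->pAct=%p",
    //           pPrivateACB->agtRqstCB, pPrivateACB->agtRqstCB.pActivityCB);
    // Corrected form: pass addresses, cast to void* as %p expects, and use
    // snprintf so the buffer size is respected.
    std::snprintf(mesg, sizeof mesg, "W749645:uILE pPr->agtR=%p ->pAct=%p",
                  static_cast<const void*>(&pPrivateACB->agtRqstCB),
                  static_cast<const void*>(pPrivateACB->agtRqstCB.pActivityCB));
    return std::string(mesg);
}
```

The only substantive change is the `&` in front of the first argument; the casts and the switch to `snprintf` are tidying that the original code could equally live without.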

Correcting that sprintf call should resolve this issue and allow me to go back to avoiding the (much slower!) Intel compiler that is used by our nightly build process.
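For code that can use C++11, one possible guard against a repeat is a thin wrapper that refuses class-type arguments at compile time, whatever the compiler’s own diagnostics do. This is only a sketch: `all_scalar` and `checked_format` are invented names, the 256-byte buffer is an arbitrary choice, and the wrapper does not check that the conversions in the format string agree with the argument types.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <type_traits>

// True only when every type in the pack is scalar (arithmetic, enum, pointer).
template <typename... Ts>
struct all_scalar : std::true_type {};

template <typename T, typename... Ts>
struct all_scalar<T, Ts...>
    : std::integral_constant<bool,
          std::is_scalar<T>::value && all_scalar<Ts...>::value> {};

// Formats into a fixed-size buffer; longer output is silently truncated.
template <typename... Args>
std::string checked_format(const char* fmt, Args... args) {
    static_assert(all_scalar<Args...>::value,
                  "only scalar arguments may be passed through '...'");
    assert(fmt != nullptr);
    char buf[256];
    int n = std::snprintf(buf, sizeof buf, fmt, args...);
    if (n < 0) return std::string();  // encoding error reported by snprintf
    return std::string(buf);          // snprintf always NUL-terminates here
}
```

With this in place, `checked_format("%p", someStruct)` fails to compile with the static_assert message, while `checked_format("%p", static_cast<const void*>(&someStruct))` builds and runs as expected.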

This entry was posted on May 26, 2010 at 11:19 am and is filed under C/C++ development and debugging. Tagged: cannot pass objects of non-POD type, disassembly, gcc, gdb, SIGILL, siginfo_t, ud2a instruction.

15 Responses to “A fun and curious dig. GCC generation of a ud2a instruction (SIGILL)”

greg said

August 7, 2010 at 1:57 pm
I just wanted to thank you for this article. It just saved me a lot of time…

- peeterjoot said
  
  August 7, 2010 at 3:10 pm
  It was fun to debug and seemed worthy of documenting, but I’m glad that it also helped.
  
Daren Scot Wilson said

August 13, 2010 at 2:40 pm
Yes, I 2nd what Greg said.

This is a very weird error, to be single-step debugging and see a “ud2a”, which I had never heard of, and be totally stuck as to why it’s there. I suspected my debugger of doing something buggy like maybe leaving a forgotten breakpoint there. Of course, like many programmers I don’t pay a whole lot of attention to compiler warnings. This article (and Google’s talent for finding it quickly) saved me a bunch of vexation and loss of hair.

Daren Scot Wilson said

August 13, 2010 at 2:42 pm
BTW, the origin of my trouble was I had fumble-fingeredly typed “%a” instead of “%s” in a printf format.

- Daren Scot Wilson said
  
  August 13, 2010 at 2:48 pm
  Er, no… I gave it a “pointer” instead of a pointer elsewhere. Well anyway, I’ll figure it out. Wouldn’t have had a clue without this article, in any case.
  
Michael Podolsky said

September 23, 2010 at 6:04 pm
Hmmm… printing a warning together with producing code which unconditionally crashes the application is quite a weird approach.

Thanks for the article, it saved me a lot of time.

- Coda Highland said
  
  March 24, 2011 at 3:03 pm
  It seems to me that this is the only legal thing gcc can do. If printf is a real C varargs function, it’s impossible for the compiler to even generate warnings for it. Since gcc has special handling for printf and friends it can detect mistakes in the invocation. But since gcc has to PRETEND that printf is a real C varargs function for the sake of standards compatibility, it can’t generate an error at compile-time. But gcc *CAN* generate a printf implementation that produces errors during its run-time parameter checking, and so it does. The main point here is that if you put that mistake at the END of the parameter list, and don’t have an entry in the format string to read it, the abort doesn’t (or at least shouldn’t; I haven’t tested) trigger.
  
  - peeterjoot said
    
    March 24, 2011 at 5:36 pm
    Newer versions of gcc are more reasonable. They just produce a compilation error instead of a runtime error.
Andrew said

May 24, 2011 at 12:22 pm
Thanks very much for posting this. I was up against the wall trying to figure out this SIGILL, when I saw the ud2a in the disassembly and decided to Google it. Your post was the first hit, and completely explained exactly what was going on. You *rock* :).

Kasreyn (@Kasreyn) said

December 29, 2011 at 9:07 am
Found this article the same way. The program would just terminate, “Killed” (by Linux?). Finally got around to viewing the disassembly, not that I have a clue about half of it, but “ud2a” did look suspicious.

Georg A. said

November 12, 2012 at 5:17 pm
Really helpful… The same thing happens with C++ and/or Qt, if you forget that a string(-object) is no longer just a char*. I had this:

QString err=….; // a real object, not just a object pointer…

printf_log("Error: %s\n", err); // instead of qPrintable(err)

So printing a harmless log message killed it all 😦

program killed by signal 4 | HongQuan's Blog said

March 4, 2014 at 4:17 am
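[…] After some blind searching I came across this article, which explains that when the argument types in a format string don’t match the types of the arguments passed in, gcc reports a warning but may at the same time generate code such as ud2a, which makes the program die at run time. […]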

Michael said

March 9, 2015 at 8:05 am
Thanks a lot, this saved me a lot of time, too. 🙂

Moi Riba said

May 18, 2016 at 2:16 pm
Thank you! It saved lots of my time too.


Peeter Joot's (OLD) Blog.

Math, physics, perl, and programming obscurity.
