Peeter Joot's (OLD) Blog.

Math, physics, perl, and programming obscurity.

Understanding a powerpc subfc, subfe sequence used to compare without branching.

Posted by peeterjoot on September 25, 2013

I was tasked to review some inline assembly, essentially like so:

Uint64 negativeLessThanX(Uint64 v0, Uint64 v1)
{
   Uint64 mask;
   __asm__ volatile ("subfc %0,%2,%1; subfe %0,%0,%0" \
           /* outputs */ : "=r"(mask) /* %0 */        \
     /* inputs */  : "r"(v0),   /* %1 */        \
                     "r"(v1)    /* %2 */        \
           /* clobbers */: "xer"      /* condition registers (CF, ...) */ \
    );
   return mask;
}

This should have the effect of doing:

Uint64 negativeLessThanX(Uint64 v0, Uint64 v1)
{
   Uint64 mask;

   subfc mask,v1,v0
   subfe mask, mask, mask

   return mask;
}

From the powerpc instruction set reference (PowerISA_V2.07_PUBLIC.pdf), our subfc, and subfe instructions are respectively ‘Subtract From Carrying XO-form’, ‘Subtract From Extended XO-form’ :

subfc RT,RA,RB (OE=0 Rc=0)
RT  <- ¬ (RA) + (RB) + 1 
(RT = RB - RA)

subfe RT,RA,RB (OE=0 Rc=0)
RT  <- ¬ (RA) + (RB) + CA

Since we have RA = RB in the subfe, and self plus complement is all bits set, we essentially have

RT  <- 0xFFFFFFFFFFFFFFFF + CA

Let’s walk through this in the debugger to understand it. We have:

(dbx) listi negativeLessThanX
0x100000a24 (negativeLessThanX(unsigned long,unsigned long)+0x24) 7c030010       subfc   r0,r3,r0
0x100000a28 (negativeLessThanX(unsigned long,unsigned long)+0x28) 7c000110       subfe   r0,r0,r0

So, we have r0 = r0 – r3. After this we have a r0 = r0 – r0, but also bringing in the carry flag (CA bit in XER). Let’s see this in the debugger, first with r0=2, r3=1 :

(dbx) stop in negativeLessThanX
[1] stop in negativeLessThanX(unsigned long,unsigned long)
(dbx) c
[1] stopped in negativeLessThanX(unsigned long,unsigned long) at line 6 in file "w.C" ($t1)
    6      __asm__ volatile ("subfc %0,%2,%1; subfe %0,%0,%0" \
(dbx) stepi
stopped in negativeLessThanX(unsigned long,unsigned long) at 0x100000a20 ($t1)
0x100000a20 (negativeLessThanX(unsigned long,unsigned long)+0x20) e86100b8          ld   r3,0xb8(r1)
(dbx)
stopped in negativeLessThanX(unsigned long,unsigned long) at 0x100000a24 ($t1)
0x100000a24 (negativeLessThanX(unsigned long,unsigned long)+0x24) 7c030010       subfc   r0,r3,r0
(dbx) p $r0
0x0000000000000002
(dbx) p $r3
0x0000000000000001
(dbx) p $xer
0x0000000020000002
(dbx) stepi
stopped in negativeLessThanX(unsigned long,unsigned long) at 0x100000a28 ($t1)
0x100000a28 (negativeLessThanX(unsigned long,unsigned long)+0x28) 7c000110       subfe   r0,r0,r0
(dbx) p $r0
0x0000000000000001
(dbx) p $xer
0x0000000020000002
(dbx) stepi
stopped in negativeLessThanX(unsigned long,unsigned long) at 0x100000a2c ($t1)
0x100000a2c (negativeLessThanX(unsigned long,unsigned long)+0x2c) f8010070         std   r0,0x70(r1)
(dbx) p $r0

We see that the subfc does generate r0=1 as expected. The CA bit of the XER is ’34 Carry (CA)’, and out XER value is: 0b[0….]00100000000000000000000000000010. Bits 32, 33 are clear, but CA (34) is set. At first this seems curiously inverted. We have $r0 = v0-v1 > 0, so why is CA set?

How about with v0=1, v1=2:

(dbx)
stopped in negativeLessThanX(unsigned long,unsigned long) at 0x100000a24 ($t1)
0x100000a24 (negativeLessThanX(unsigned long,unsigned long)+0x24) 7c030010       subfc   r0,r3,r0
(dbx) p $r0
0x0000000000000001
(dbx) p $r3
0x0000000000000002
(dbx) stepi
stopped in negativeLessThanX(unsigned long,unsigned long) at 0x100000a28 ($t1)
0x100000a28 (negativeLessThanX(unsigned long,unsigned long)+0x28) 7c000110       subfe   r0,r0,r0
(dbx) p $r0
0xffffffffffffffff
(dbx) p $xer
0x0000000000000013
(dbx) stepi
stopped in negativeLessThanX(unsigned long,unsigned long) at 0x100000a2c ($t1)
0x100000a2c (negativeLessThanX(unsigned long,unsigned long)+0x2c) f8010070         std   r0,0x70(r1)
(dbx) p $r0
0xffffffffffffffff

The intermediate subtraction now produces a -1 (0xffffffffffffffff), but now we have CA clear from the subfc, since we see XER=0b[0….]00000000000000000000000000010011 (with bit 34 clear). Again, this seems backwards.

The trick to understanding this is that the subtract isn’t implemented as a subtraction, but an addition. For 2-1, where we don’t have to borrow, our subfc is actually doing:

~1 + 2 + 1:
 1110
+0010
+0001
=====
10001

No borrow is required, but we do generate a carry when doing this _addition_ operation!

Compare that to the a 1-2 operation:

~2 + 1 + 1:
 1101
+0001
+0001
=====
 1111

Also compare to a 1-1 operation:

~1 + 1 + 1:
 1110
+0001
+0001
=====
10000

We generate a carry (now borrow!) when v0=v1 and v0>v1. We do not generate a carry (CA clear) for v0<v1, so that the end result is that for v0<v1 we have -1, and 0 otherwise.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: