# Peeter Joot's (OLD) Blog.

• 333,079

## Understanding a powerpc subfc, subfe sequence used to compare without branching.

Posted by peeterjoot on September 25, 2013

I was tasked to review some inline assembly, essentially like so:

```Uint64 negativeLessThanX(Uint64 v0, Uint64 v1)
{
__asm__ volatile ("subfc %0,%2,%1; subfe %0,%0,%0" \
/* outputs */ : "=r"(mask) /* %0 */        \
/* inputs */  : "r"(v0),   /* %1 */        \
"r"(v1)    /* %2 */        \
/* clobbers */: "xer"      /* condition registers (CF, ...) */ \
);
}
```

This should have the effect of doing:

```Uint64 negativeLessThanX(Uint64 v0, Uint64 v1)
{

}
```

From the powerpc instruction set reference (PowerISA_V2.07_PUBLIC.pdf), our subfc, and subfe instructions are respectively ‘Subtract From Carrying XO-form’, ‘Subtract From Extended XO-form’ :

```subfc RT,RA,RB (OE=0 Rc=0)
RT  <- ¬ (RA) + (RB) + 1
(RT = RB - RA)

subfe RT,RA,RB (OE=0 Rc=0)
RT  <- ¬ (RA) + (RB) + CA
```

Since we have RA = RB in the subfe, and self plus complement is all bits set, we essentially have

```RT  <- 0xFFFFFFFFFFFFFFFF + CA
```

Let’s walk through this in the debugger to understand it. We have:

```(dbx) listi negativeLessThanX
0x100000a24 (negativeLessThanX(unsigned long,unsigned long)+0x24) 7c030010       subfc   r0,r3,r0
0x100000a28 (negativeLessThanX(unsigned long,unsigned long)+0x28) 7c000110       subfe   r0,r0,r0
```

So, we have r0 = r0 – r3. After this we have a r0 = r0 – r0, but also bringing in the carry flag (CA bit in XER). Let’s see this in the debugger, first with r0=2, r3=1 :

```(dbx) stop in negativeLessThanX
[1] stop in negativeLessThanX(unsigned long,unsigned long)
(dbx) c
[1] stopped in negativeLessThanX(unsigned long,unsigned long) at line 6 in file "w.C" (\$t1)
6      __asm__ volatile ("subfc %0,%2,%1; subfe %0,%0,%0" \
(dbx) stepi
stopped in negativeLessThanX(unsigned long,unsigned long) at 0x100000a20 (\$t1)
0x100000a20 (negativeLessThanX(unsigned long,unsigned long)+0x20) e86100b8          ld   r3,0xb8(r1)
(dbx)
stopped in negativeLessThanX(unsigned long,unsigned long) at 0x100000a24 (\$t1)
0x100000a24 (negativeLessThanX(unsigned long,unsigned long)+0x24) 7c030010       subfc   r0,r3,r0
(dbx) p \$r0
0x0000000000000002
(dbx) p \$r3
0x0000000000000001
(dbx) p \$xer
0x0000000020000002
(dbx) stepi
stopped in negativeLessThanX(unsigned long,unsigned long) at 0x100000a28 (\$t1)
0x100000a28 (negativeLessThanX(unsigned long,unsigned long)+0x28) 7c000110       subfe   r0,r0,r0
(dbx) p \$r0
0x0000000000000001
(dbx) p \$xer
0x0000000020000002
(dbx) stepi
stopped in negativeLessThanX(unsigned long,unsigned long) at 0x100000a2c (\$t1)
0x100000a2c (negativeLessThanX(unsigned long,unsigned long)+0x2c) f8010070         std   r0,0x70(r1)
(dbx) p \$r0
```

We see that the subfc does generate r0=1 as expected. The CA bit of the XER is ’34 Carry (CA)’, and out XER value is: 0b[0….]00100000000000000000000000000010. Bits 32, 33 are clear, but CA (34) is set. At first this seems curiously inverted. We have \$r0 = v0-v1 > 0, so why is CA set?

```(dbx)
stopped in negativeLessThanX(unsigned long,unsigned long) at 0x100000a24 (\$t1)
0x100000a24 (negativeLessThanX(unsigned long,unsigned long)+0x24) 7c030010       subfc   r0,r3,r0
(dbx) p \$r0
0x0000000000000001
(dbx) p \$r3
0x0000000000000002
(dbx) stepi
stopped in negativeLessThanX(unsigned long,unsigned long) at 0x100000a28 (\$t1)
0x100000a28 (negativeLessThanX(unsigned long,unsigned long)+0x28) 7c000110       subfe   r0,r0,r0
(dbx) p \$r0
0xffffffffffffffff
(dbx) p \$xer
0x0000000000000013
(dbx) stepi
stopped in negativeLessThanX(unsigned long,unsigned long) at 0x100000a2c (\$t1)
0x100000a2c (negativeLessThanX(unsigned long,unsigned long)+0x2c) f8010070         std   r0,0x70(r1)
(dbx) p \$r0
0xffffffffffffffff
```

The intermediate subtraction now produces a -1 (0xffffffffffffffff), but now we have CA clear from the subfc, since we see XER=0b[0….]00000000000000000000000000010011 (with bit 34 clear). Again, this seems backwards.

The trick to understanding this is that the subtract isn’t implemented as a subtraction, but an addition. For 2-1, where we don’t have to borrow, our subfc is actually doing:

```~1 + 2 + 1:
1110
+0010
+0001
=====
10001
```

No borrow is required, but we do generate a carry when doing this _addition_ operation!

Compare that to the a 1-2 operation:

```~2 + 1 + 1:
1101
+0001
+0001
=====
1111
```

Also compare to a 1-1 operation:

```~1 + 1 + 1:
1110
+0001
+0001
=====
10000
```

We generate a carry (now borrow!) when v0=v1 and v0>v1. We do not generate a carry (CA clear) for v0<v1, so that the end result is that for v0<v1 we have -1, and 0 otherwise.