Peeter Joot's (OLD) Blog.

Math, physics, perl, and programming obscurity.

Archive for July, 2010

A byte reversal cheatsheet.

Posted by peeterjoot on July 29, 2010


#include <stdio.h>
#include "osstype.h"

int main()
{
   union {
      Uint64 u64 ;
      Uint32 u32[sizeof(Uint64)/sizeof(Uint32)] ;
      Uint16 u16[sizeof(Uint64)/sizeof(Uint16)] ;
      Uint8 u8[sizeof(Uint64)/sizeof(Uint8)] ;
   } u ;

   u.u64 = U64(0x0123456789abcdef) ;

   printf( "Uint64: 0x%016llX\n", u.u64 ) ;

   printf( "Uint32: " ) ;
   for ( int i = 0 ; i < sizeof(Uint64)/sizeof(Uint32) ; i++ )
   {
      printf( "0x%08X ", (unsigned)u.u32[i] & 0xFFFFFFFF ) ;
   }
   printf( "\n" ) ;

   printf( "Uint16: " ) ;
   for ( int i = 0 ; i < sizeof(Uint64)/sizeof(Uint16) ; i++ )
   {
      printf( "0x%04X ", (unsigned)u.u16[i] & 0xFFFF ) ;
   }
   printf( "\n" ) ;

   printf( "Uint8: " ) ;
   for ( int i = 0 ; i < sizeof(Uint64)/sizeof(Uint8) ; i++ )
   {
      printf( "0x%02X ", (unsigned)u.u8[i] & 0xFF ) ;
   }
   printf( "\n" ) ;

   return 0 ;
}

output when run on a little endian platform:

Uint64: 0x0123456789ABCDEF
Uint32: 0x89ABCDEF 0x01234567
Uint16: 0xCDEF 0x89AB 0x4567 0x0123
Uint8: 0xEF 0xCD 0xAB 0x89 0x67 0x45 0x23 0x01

Posted in C/C++ development and debugging. | Tagged: , | Leave a Comment »

The point of an isync (or similar memory barrier)

Posted by peeterjoot on July 29, 2010

The following simple code fragment was used today when asked about the effect of an isync

v = atomic.fetchAndAdd( 0 ) ;

__isync() ;

if ( v == w )
{
   foo( pSharedMem->blah ) ;
} 

The isync will prevent loads and stores that occur after the atomic no-op operation from occuring ahead of time. For example, with the isync omitted, it could be as if the following code was executed instead:

blahValue = pSharedMem->blah ;

v = atomic.fetchAndAdd( 0 ) ;

if ( v == w )
{
   foo( blahValue ) ;
} 

If you want the pSharedMem->blah to happen strictly after the atomic read, then you’ll need an isync in between.

The excellent question was posed: “if pSharedMem->blah was declared volatile is the compiler allowed to still do this reordering?”

It is important to note that the code reordering of interest here may not actually have anything do with the compiler.

The compiler can generate the code in program order, but that doesn’t have to be the order that the hardware “executes it”.

That also doesn’t mean that you don’t may not also have to do something special to get the code to be generated in program order. For most platforms the compiler atomic intrinics are also implicit instruction scheduling fences, or have variations that include such fences to prevent reordering. The HP compiler is one where instruction scheduling fences for atomic instructions are not the default, so one has to add these explicitly in the implementation of an atomic library API.

Posted in C/C++ development and debugging. | Tagged: , , , , , | Leave a Comment »

Uhaul Markham: an example of an amature business.

Posted by peeterjoot on July 28, 2010

I’ve had my first, and last dealings with Uhaul Markham, and probably Uhaul period.  I’m not impressed.  It looks like just about anybody can be a Uhaul dealer.  If you walk into the Uhaul on Markham Road, you’ll probably act like me, and walk out.  “Oh, that can’t be the right place … it looks like a tile company.”  You can then look around, see that there’s nothing else left, and go back in after all.

My reason for wanting a Uhaul truck was for my move, as is probably the case for most people.  Once I faced the fact that the guy in the tile store was in fact probably also the Uhaul dealer, I go to talk to this guy.  He’s a surly fellow, and very incommunicative.

I’d like to rent the 17’ truck for Saturday July 31st

Hmm.  I don’t know, that’s a busy day.  …. putz away at the computer, not looking at me (something he hardly did for the whole transaction).

“Nope.  Nothing available for that day.  You can reserve it anyways and somebody will call you if it is available at another location”.

How about a different size truck, for the same day?

… putz on computer, again, not ever really looking at me.  “I can give you a trailer or a small van.”

How about Sunday?

“We are closed Sunday.”  No attempt to be helpful in any way (like suggest a Saturday night pickup, something that I was later told was normal for rental companies closed on Sundays).

How about Monday?

Grimaces, saying nothing.  One is left to interpolate his reaction, and it appears to be “Oh come on, don’t make me work”.  Putzes on his computer.  “Nope.”  Nothing more than that.  He didn’t say, “I’m sorry, that is the civic holiday, and we are closed”.  Just No, end of story.  No clarification.

In the end I made the Saturday reservation, hoping that I’d get a call that something was available at another location.

So, later I get the call that there’s nothing available at any of the Markham locations, and deal with the fact that I have to change my planned moving day.  So be it.  My fault for leaving the reservation too late.  I ask for the 17’ truck from any location on Sunday when talking to the regional Uhaul office.  At least that guy (AJ) you can talk to, and he tells me that there are trucks available that day, but all the Uhaul locations in Markham are closed on Sundays (something that is surprising to me since I’d figure that most people move on weekends.)  I ask about Monday, and find out that yes, there’s a truck available on Monday, and that they can deliver it to my local branch for me to pick up.

Mr grumpy at Uhaul Markham calls me back the next day to confirm my pickup for Monday August 2.  Turns out that the truck I’m getting is also a 26’ truck instead of the 17’ that I asked for.  I’m told that nothing else was available, but “no extra charge.” – oh well, I’ll deal with driving a too big truck.  At least I don’t have to go far.

I’d gotten help offers from friends for moving muscle, and had made the mistake of telling them my initial plan of a Saturday moving day before I’d reserved the truck.  Luckily they were still mostly all available for the Monday.

It’s not ideal for me, but if I take a step back, it actually ends up working out nicely, since it gives me the weekend to do last minute packing, and Tuesday to unpack.

I end up confirming the reservation, and figure this is a done deal.

Next chapter.  Around two weeks later.

Uhaul regional calls me, and says that the local branch has changed the reservation from August 2 to August 3 because they are closed that day.  WTF?  Both the regional office and the local guy had called me previously to confirm the reservation.

I change the reservation day, and luckily that day still works for those who’d said they’d help me.  But I’m not impressed with the amateur nature of the company, particularly the local walk in branch I dealt with at Markham Road and 16th.  My grumpy at the local branch says that he told me they were closed for the long weekend when I’d asked about Monday.  Perhaps he said so at some non-memorable and non-sequiter part of the conversation when I was asking about a Saturday reservation.  I don’t remember any such comment.  I do remember that it was hard to get any communication from him.  It had to be extracted, and didn’t come naturally.  If he had told me, it was probably under his breath, facing the computer, not me.  This is a guy who appears to take no pride in his job, nor care about it, or even pretend to for the sake of the business when dealing with customers.

Even if he had told me, this company is so disorganized that the regional office doesn’t know the hours of the local branches, and the local branches don’t know their days of business when they are booking reservations.  I got two reservation confirmations for a day of business that they were not open on!

My conclusion is that Uhaul is not a business run for customers.  It’s a franchise for confused businesses that also do other stuff and have some parking space that they’d like to squeeze a bit of extra money out of.  I’d not recommend it to anybody.  Perhaps most of the fault lies with the local location I had to deal with.  I hope so.

Posted in Incoherent ramblings | Leave a Comment »

Rotations using matrix exponentials

Posted by peeterjoot on July 27, 2010

[Click here for a PDF of this post with nicer formatting]

Motivation.

In [1] it is noted in problem 1.3 that any Unitary operator can be expressed in exponential form

\begin{aligned}U = e^{iC},\end{aligned} \hspace{\stretch{1}}(1.1)

where C is Hermitian. This is a powerful result hiding away in this problem. I haven’t actually managed to prove this yet to my satisfaction, but working through some examples is highly worthwhile. In particular it is interesting to compute the matrix C for a rotation matrix. One finds that the matrix for such a rotation operator is in fact one of the Pauli spin matrices, and I found it interesting that this falls out so naturally. Additionally, it is rather slick that one is able to so concisely express the rotation in exponential form, something that is natural and powerful in complex variable algebra, and also possible using Geometric Algebra using exponentials of bivectors. Here we can do it after all with nothing more than the plain old matrix algebra that everybody is already comfortable with.

The logarithm of the Unitary matrix.

By inspection we can invert 1.1 for C, by taking the logarithm

\begin{aligned}C = -i \ln U.\end{aligned} \hspace{\stretch{1}}(2.2)

The problem becomes one of evaluating the logarithm, or even giving meaning to it. I’ll assume that the functions of matrices that we are interested in are all polynomial in powers of the matrix, as in

\begin{aligned}f(U) = \sum_k \alpha_k U^k,\end{aligned} \hspace{\stretch{1}}(2.3)

and that such series are convergent. Then using a spectral decomposition, possible since Unitary matrices are normal, we can write for diagonal \Sigma = {\begin{bmatrix} \lambda_i \end{bmatrix}}_i

\begin{aligned}U = V \Sigma V^\dagger,\end{aligned} \hspace{\stretch{1}}(2.4)

and

\begin{aligned}f(U) = V \left( \sum_k \alpha_k \Sigma^k \right) V^\dagger = V {\begin{bmatrix} f(\lambda_i) \end{bmatrix}}_i V^\dagger.\end{aligned} \hspace{\stretch{1}}(2.5)

Provided the logarithm has a convergent power series representation for U, we then have for our Hermitian matrix C

\begin{aligned}C = -i V (\ln \Sigma) V^\dagger\end{aligned} \hspace{\stretch{1}}(2.6)

Evaluate this logarithm for an x,y plane rotation.

Given the rotation matrix

\begin{aligned}U =\begin{bmatrix}\cos\theta & \sin\theta \\ -\sin\theta & \cos\theta\end{bmatrix},\end{aligned} \hspace{\stretch{1}}(2.7)

We find that the eigenvalues are e^{\pm i\theta}, with eigenvectors proportional to (1, \pm i) respectively. Our decomposition for U is then given by
2.4, and

\begin{aligned}V &= \frac{1}{{\sqrt{2}}}\begin{bmatrix}1 & 1 \\ i & -i\end{bmatrix} \\ \Sigma &=\begin{bmatrix}e^{i\theta} & 0 \\ 0 & e^{-i\theta}\end{bmatrix}.\end{aligned} \hspace{\stretch{1}}(2.8)

Taking logs we have

\begin{aligned}C&=\frac{1}{2}\begin{bmatrix}1 & 1 \\ i & -i\end{bmatrix}\begin{bmatrix}\theta & 0 \\ 0 & -\theta\end{bmatrix} \begin{bmatrix}1 & -i \\ 1 & i\end{bmatrix} \\ &=\frac{1}{2}\begin{bmatrix}1 & 1 \\ i & -i\end{bmatrix}\begin{bmatrix}\theta  & -i\theta \\ -\theta & -i\theta\end{bmatrix}  \\ &=\begin{bmatrix}0 & -i\theta \\ i\theta & 0\end{bmatrix}.\end{aligned}

With the Pauli matrix

\begin{aligned}\sigma_2 =\begin{bmatrix}0 & -i \\ i & 0\end{bmatrix},\end{aligned} \hspace{\stretch{1}}(2.10)

we then have for an x,y plane rotation matrix just:

\begin{aligned}C = \theta \sigma_2\end{aligned} \hspace{\stretch{1}}(2.11)

and

\begin{aligned}U = e^{i \theta \sigma_2}.\end{aligned} \hspace{\stretch{1}}(2.12)

Immediately, since \sigma_2^2 = I, this also provides us with a trigonometric expansion

\begin{aligned}U = I \cos\theta + i \sigma_2 \sin\theta.\end{aligned} \hspace{\stretch{1}}(2.13)

By inspection one can see that this takes us full circle back to the original matrix form 2.7 of the rotation. The exponential form of
2.12 has a beauty that is however far superior to the plain old trigonometric matrix that we are comfortable with. All without any geometric algebra or bivector exponentials.

Three dimensional exponential rotation matrices.

By inspection, we can augment our matrix C for a three dimensional rotation in the x,y plane, or a y,z rotation, or a x,z rotation. Those are, respectively

\begin{aligned}U_{x,y}&=\exp\begin{bmatrix}0 & \theta & 0 \\ -\theta & 0 & 0 \\ 0 & 0 & i\end{bmatrix} \\ U_{y,z}&=\exp\begin{bmatrix}i & 0 & 0 \\ 0 & 0 & \theta \\ 0 & -\theta & 0 \\ \end{bmatrix} \\ U_{x,z}&=\exp\begin{bmatrix}0 & 0 & \theta \\ 0 & i & 0 \\ -\theta & 0 & 0 \\ \end{bmatrix}\end{aligned} \hspace{\stretch{1}}(2.14)

Each of these matrices can be related to each other by similarity transformation using the permutation matrices

\begin{aligned}\begin{bmatrix}0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \\ \end{bmatrix},\end{aligned}

and

\begin{aligned}\begin{bmatrix}1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ \end{bmatrix}.\end{aligned}

Exponential matrix form for a Lorentz boost.

The next obvious thing to try with this matrix representation is a Lorentz boost.

\begin{aligned}L =\begin{bmatrix}\cosh\alpha & -\sinh\alpha \\ -\sinh\alpha & \cosh\alpha\end{bmatrix},\end{aligned} \hspace{\stretch{1}}(2.17)

where \cosh\alpha = \gamma, and \tanh\alpha = \beta.

This matrix has a spectral decomposition given by

\begin{aligned}V &= \frac{1}{{\sqrt{2}}}\begin{bmatrix}1 & 1 \\ -1 & 1\end{bmatrix} \\ \Sigma &=\begin{bmatrix}e^\alpha & 0 \\ 0 & e^{-\alpha}\end{bmatrix}.\end{aligned} \hspace{\stretch{1}}(2.18)

Taking logs and computing C we have

\begin{aligned}C&=-\frac{i}{2}\begin{bmatrix}1 & 1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix}\alpha & 0 \\ 0 & -\alpha\end{bmatrix} \begin{bmatrix}1 & -1 \\ 1 & 1\end{bmatrix} \\ &=-\frac{i}{2}\begin{bmatrix}1 & 1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix}\alpha & -\alpha \\ -\alpha & -\alpha\end{bmatrix} \\ &=i \alpha\begin{bmatrix}0 & 1 \\ 1 & 0 \end{bmatrix}.\end{aligned}

Again we have one of the Pauli spin matrices. This time it is

\begin{aligned}\sigma_1 =\begin{bmatrix}0 & 1 \\ 1 & 0 \end{bmatrix}.\end{aligned} \hspace{\stretch{1}}(2.20)

So we can write our Lorentz boost 2.17 as just

\begin{aligned}L = e^{-\alpha \sigma_1} = I \cosh\alpha - \sigma_1 \sinh\alpha.\end{aligned} \hspace{\stretch{1}}(2.21)

By inspection again, we can come full circle by inspection from this last hyperbolic representation back to the original explicit matrix representation. Quite nifty!

It occurred to me after the fact that the Lorentz boost is not Unitary. The fact that the eigenvalues are not a purely complex phase term, like those of the rotation is actually a good hint that looking at how to characterize the eigenvalues of a unitary matrix can be used to show that the matrix C = -i V \ln \Sigma V^\dagger is Hermitian.

References

[1] BR Desai. Quantum mechanics with basic field theory. Cambridge University Press, 2009.

Posted in Math and Physics Learning. | Tagged: , , , , , , , | Leave a Comment »

Dirac Notation Ponderings.

Posted by peeterjoot on July 24, 2010

[Click here for a PDF of this post with nicer formatting]

Motivation.

I’ve got the textbook [1] now for the QMI course I’ll be taking in the fall, and have started some light perusing. Starting things off is the Dirac bra ket notation. Some aspects of that notation, or the explanation in the text, are not quite obvious to me so here I try to make sense of things.

Dirac Adjoint notes.

There are a pair of relations given to define the Dirac Adjoint. These are 1.26 and 1.27 respectively:

\begin{aligned}\left( A {\lvert {\alpha} \rangle} \right)^{*} &= {\langle {\alpha} \rvert} A^\dagger \\ {{\langle {\beta} \rvert} A {\lvert {\alpha} \rangle}}^{*} &= {\langle {\alpha} \rvert} A^\dagger {\lvert {\beta} \rangle}\end{aligned}

Is there some redundancy to these definitions. Namely is 1.27 a consequence of 1.26?

Since the ket was defined as the conjugate of the bra, we can probably rewrite 1.26 as

\begin{aligned}{\langle {\alpha} \rvert} A^{*} &= {\langle {\alpha} \rvert} A^\dagger \end{aligned}

Left “multiplication”, by the ket {\lvert {\beta} \rangle} gives

\begin{aligned}({\langle {\alpha} \rvert} A^{*}) {\lvert {\beta} \rangle} &= ({\langle {\alpha} \rvert} A^\dagger) {\lvert {\beta} \rangle} \\ {{\langle {\beta} \rvert} (A {\lvert {\alpha} \rangle})}^{*} &= ({\langle {\alpha} \rvert} A^\dagger) {\lvert {\beta} \rangle} \\ \end{aligned}

I’ve added and retained parenthesis to retain the operational direction. Is that operational direction not important? For example, given an operator like p = -i \hbar \partial_x, it makes a big difference whether the operator operates to the left or to the right. In the text, this last relation is equation 1.27 once the parens are dropped, so it does appear that 1.27 is a consequence of 1.26. This also then seems to imply that in a bra operator ket sandwich, the operator implicitly operates on the ket (to the right), while an adjoint operator implicitly operates on the bra (to the left).

Let’s compare this to the simpler and more pedestrian notation found in an old fashioned book like Bohm’s [2]. His expectation values explicitly use an integral definition, and his adjoint definition is very explicit about order of operations. Namely

\begin{aligned}\int \phi^{*} (A \psi) &\equiv \int \psi (A^\dagger \phi^{*}) \end{aligned} \hspace{\stretch{1}}(2.1)

Starting with a concrete definition like this seems a bit easier. Suppose we also define the bra ket sandwich based on the integral as follows

\begin{aligned}{\langle {\phi} \rvert} A {\lvert {\psi} \rangle} &\equiv {\langle {\phi} \rvert} (A {\lvert {\psi} \rangle}) \\ &\equiv \int \phi^{*} (A \psi) \\ \end{aligned}

Now, we can rewrite 2.1, as

\begin{aligned}\int \phi^{*} (A \psi)   &\equiv \int \psi (A^\dagger \phi^{*}) \\ &\implies \\ {\langle {\phi} \rvert} (A {\lvert {\psi} \rangle})  &= {\langle {\psi^{*}} \rvert} ( A^\dagger {\lvert {\phi^{*}} \rangle} ) \\ &\implies \\ \left({\langle {\phi} \rvert} (A {\lvert {\psi} \rangle}) \right)^{*}  &= ( {\langle {\phi} \rvert} A^\dagger ) {\lvert {\psi} \rangle}\end{aligned}

When starting off with the integral we see the notational requirement for non-adjoint operators to operate implicitly to the right, and the adjoint operators to operate implicitly to the left. With that notation requirement we can drop the parens and recover 1.27.

A couple clarification goals are now complete. The first is seeing how equation 1.26 in the text implies 1.27. We also have reconciled the Dirac notation with the familiar integral inner product notation, and seen two different ways that clarify the implicit operator directionality in the bra operator ket sandwiches.

References

[1] BR Desai. Quantum mechanics with basic field theory. Cambridge University Press, 2009.

[2] D. Bohm. Quantum Theory. Courier Dover Publications, 1989.

Posted in Math and Physics Learning. | Tagged: , , , , , , , | Leave a Comment »

Use of lwsync instead of isync as a mutex acquire memory barrier

Posted by peeterjoot on July 23, 2010

In a previous post on an AIX dbx quirk with the stepi command, the following mutex acquision fragment was used as an illustration

[10] stopped in main at line 187 in file "ossPowerPcVACPP.h" ($t1)
  187      return __fetch_and_or( (volatile unsigned int *)pAtomic, value ) ;
(dbx) listi
0x100000948 (main+0x108) 7ca01828       lwarx   r5,r0,r3
0x10000094c (main+0x10c) 7ca40378          or   r4,r5,r0
0x100000950 (main+0x110) 7c80192d      stwcx.   r4,r0,r3
0x100000954 (main+0x114) 40a2fff4         bne   0x100000948 (main+0x108)
0x100000958 (main+0x118) 74a01000      andis.   r0,r5,0x1000
0x10000095c (main+0x11c) 40820034         bne   0x100000990 (main+0x150)
0x100000960 (main+0x120) 4c00012c       isync

The isync represents the end of the mutex successfully acquired codepath, which will typically be followed by some number of loads and stores. For example, code like the following fragment that uses the pthreads library:

pthread_mutex_lock( &m ) ;

// pSharedMem->v++ ;
int tmp = pSharedMem->v ;
tmp++ ;
pSharedMem->v = tmp ;

pthread_mutex_unlock( &m ) ;

Here we have a load of pSharedMem->v, and increment of some register, and a store back from that register into the same address. What you don’t want is for that load to occur before the pthread_mutex_lock() is done. What gives you this guarentee may be a memory barrier instruction such as the powerpc isync (as can the heavyweight sync instruction). If you were to instruction step through an uncontended pthread_mutex_lock call on a powerpc platform, you would likely see something like the assembly listing above.

Now, a compiler guru told me today that isync is actually more than we need to supply the required ordering. This came as a bit of a surprise. With a careful read of the lwsync semantics one finds that it will (when placed between the memory accesses) order a pair of loads, or a pair of stores, or a load followed by a store, but NOT a store followed by a load. The isync on the other hand can order the store-load pair. It does this by imposing a total ordering of all instructions following the isync. Rather much like a wall, no instructions can be speculatively executed in the pipeline if an isync is first encountered. Consider the following busted examples of mutex protected reads and writes

volatile unsigned int g_mutex = 0 ;

int g_someGlobalCounter = 0 ;

int incorrect_mutex_read()
{
   while ( 1 == __fetch_and_or( &g_mutex, 1 ) )
   {
   }

   int v = g_someGlobalCounter ;

   __lwsync() ;
   g_mutex = 0 ;

   return v ;
}

void incorrect_mutex_write( int v )
{
   while ( 1 == __fetch_and_or( &g_mutex, 1 ) )
   {
   }

   g_someGlobalCounter = v ;   

   __lwsync() ;
   g_mutex = 0 ;
}

These are busted because the accesses (loads and stores respectively) of g_someGlobalCounter may be executed by the hardware before the atomic fetch and or operation writes the mutex is acquired value into the mutex memory location. There is no data dependence that prohibits such a hardware reordering, and it is not enough if you have verified that the compiler has generated the naive “ordered” instruction sequence.

The end effect of the hardware reordering these instructions would be as if you had the following “mutex protected” code:

pthread_mutex_t g_mutex = ... ; // assume properly initialized

int g_someGlobalCounter = 0 ;

int incorrect_mutex_read()
{
   int v = g_someGlobalCounter ;

   pthread_mutex_lock( &g_mutex ) ;
   pthread_mutex_unlock( &g_mutex ) ;

   return v ;
}

void incorrect_mutex_write( int v )
{
   g_someGlobalCounter = v ;

   pthread_mutex_lock( &g_mutex ) ;
   pthread_mutex_unlock( &g_mutex ) ;
}

Obviously, neither use of the mutex here is of any particular value. The code may even appear to work sometimes if you have minimal contention or nothing that provides a means to consistency check the correctness of the code.

To make these correct it was previously my impression that the isync was required, like so:

volatile unsigned int g_mutex = 0 ;

int g_someGlobalCounter = 0 ;

int correct_mutex_read()
{
   while ( 1 == __fetch_and_or( &g_mutex, 1 ) )
   {
   }

   __isync() ; // THIS IS THE CORRECTION.

   int v = g_someGlobalCounter ;

   __lwsync() ;
   g_mutex = 0 ;

   return v ;
}

void correct_mutex_write( int v )
{
   while ( 1 == __fetch_and_or( &g_mutex, 1 ) )
   {
   }

   __isync() ; // THIS IS THE CORRECTION.

   g_someGlobalCounter = v ;   

   __lwsync() ;
   g_mutex = 0 ;
}

(Never actually write a mutex like this with no breather in the tight spin like this).

Now all of the data accesses that follow the mutex acquision must neccessarily occur after the memory operation that acquired the mutex. Even the read case where we have a STORE 1 to g_mutex, followed by a load from g_someGlobalCounter we have the required ordering. One may not think that lwsync would be able to provide that ordering since it does not provide a STORE,LOAD barrier. This is apparently wrong. How does one reconcile this? Let’s start by observing that an atomic operation like fetch and or is a composite operation on powerpc. Let’s break this down, also using the nicely expressive xlC compiler’s builtins. Here’s an equivalent, but more explicit version of the code

volatile unsigned int g_mutex = 0 ;

int g_someGlobalCounter = 0 ;

int mutex_read_with_isync()
{
   int oldValue ;
   do {
      oldValue = __lwarx( &g_mutex ) ;
      int bStoreSuccessful = __stwcx( &g_mutex, 1 ) ; 
   } while ( !bStoreSuccessful || oldValue ) ;

   __isync() ;

   int v = g_someGlobalCounter ;

   __lwsync() ;
   g_mutex = 0 ;

   return v ;
}

int mutex_read_with_lwsync()
{
   int oldValue ;
   do {
      oldValue = __lwarx( &g_mutex ) ;
      int bStoreSuccessful = __stwcx( &g_mutex, 1 ) ; 
   } while ( !bStoreSuccessful || oldValue ) ;

   __lwsync() ;

   int v = g_someGlobalCounter ;

   __lwsync() ;
   g_mutex = 0 ;

   return v ;
}

This isn’t entirely equivalent to the old code since it uses a direct store of the “locked” value (1), and doesn’t actually implement a fetch and or, but it is close enough. Notice now that in the lwsync variation a successful acquire will have a LOAD(and reserve),STORE,LWSYNC,LOAD sequence of instructions. This appears vulnerable to be reordered by the hardware as LOAD(and reserve),LOAD,STORE,LWSYNC since lwsync doesn’t enforce an ordering of STORE,LOAD.

Now, in PowerPC Book II, Appendix B.2.1.1 Acquire Lock and Import Shared Storage, we have the following note after a very similar example:

If the shared data structure is in storage that is
neither Write Through Required nor Caching Inhibited,
an lwsync instruction can be used instead of the isync
instruction. If lwsync is used, the load from "data1"
may be performed before the stwcx.. But if the
stwcx. fails, the second branch is taken and the lwarx
is reexecuted. If the stwcx. succeeds, the value
returned by the load from "data1" is valid even if the
load is performed before the stwcx., because the
lwsync ensures that the load is performed after the
instance of the lwarx that created the reservation
used by the successful stwcx..

This raises a few questions. First of all, can we assume that we have never have “Write Through Required” and also never have “Caching Inhibited” memory access in our (presumed) userspace code that we are protecting with this mutex? The friendly neighborhood compiler guru says that he believes that is the case unless one is interacting with really low level stuff like kernel programming and device drivers and so forth. The gotcha here is that the lwarx and stwcx loads and stores are coupled, so even if the regular load can precede the store that acquires the mutex, this will never be a load that occurs when the mutex could not also be acquired. So if this is reordered in the successful mutex acquisition case, it will really be as if the instructions occurred in a LOAD(and reserve),LWSYNC,LOAD,STORE sequence at runtime. Very tricky. I’ve not yet tried to implement this change in our code and am a little bit scared to try it. Will our functional test be sufficient to verify that the new weaker barrier is sufficient. Also, I’m curious to attempt to measure the cost difference between isync and lwsync. Since lwsync defaults to full sync on pre-power4 hardware there was probably a time where such a change was not desirable even if it works, but nowadays I don’t expect we ever run on pre-power4 cpus. Taking some timings of isync, lwsync, and sync would probably be worthwhile to see how they all compare. Is contention required to make the timings of these barrier operations more meaningful?

More questions get raised than answered by starting down this path. But if it works it could make a small difference in a very frequently executed codepath … perhaps even enough to measure and be worthwhile. Time will tell.

Posted in C/C++ development and debugging. | Leave a Comment »

What is romance?

Posted by peeterjoot on July 18, 2010

 

In online dating profiles women almost invariably state that they are romantic and looking for somebody that is romantic.  This is so common, that it would make more sense to invert this, and only include a statement about it if it was not true.  We can assume that you are romantic and desire romance unless you state otherwise.  The only problem is that I expect most guys don’t have a clue what this means.  At the very least, I certainly don’t.

I think it is time for women to step up to the bat and help explain to us poor guys what specific steps this romance stuff requires.  I’d like a checklist please.  You say, “But, a checklist is not romantic!”  Probably true, but will leaving us to figure it out for ourselves get you what you want?  Since romance appears to be in short supply so much that almost every female profile includes listing this a strong requirement, it would not be unreasonable to give us a bit more guidance.  If you have a set of expectations for us, and don’t tell us what they are, then it shouldn’t surprise you later if you didn’t get what you wanted.

Under duress, I now have a couple small inklings of what romance means.  Part of this is listening, figuring out the hidden meanings in what was said, and acting on them without being asked.  I should have asked for a refund on my old wife-turpreter, because it never worked well.  For example, a statement like “I’ve never been to a concert”, really meant, “it would be romantic if you would take me to a concert”.  A clue to clueless guys like me, my factual response of “yes, neither have I” does not appear to be romantic.

So, as I now understand it, to a woman, romance includes at least the following: listening carefully, observation of what is desired, doing small things semi-regularly that make the lady feel appreciated and special (not just flowers and chocolates and boring stuff like that, especially if you use these in reaction for being in trouble all the time).  Is there more to it?

Does a guy crave romance?  If a guy was to say he wants romance what would it include?  It would certainly include sex, and without having to feel that it is given grudgingly, or that one has to hound for it.  It isn’t clear to me what other meaning to give the word romance when it is directed at me.

Perhaps the key to romance is just communication?  To understand one’s partner, and act on that understanding to make things better continually.

Posted in Incoherent ramblings | 1 Comment »

Basement electrical now done!

Posted by peeterjoot on July 18, 2010

(a march post from my to-be-deleted entropy blog)

I installed the rest of the ceiling lights today, and wired up the switches, and ran a new line back to the panel and installed the breaker for it.  This almost wraps up the basement work that I’d left partially completed for so long.  I even put the freezer into its nook, and the kids delighted in finding out how accessible the freezies now are.

IMG_2488

IMG_2490

IMG_2494

Rotating to the left from the bottom of the stairs leads to the other half of the now finished corridor, which now looks pretty spiffy (except for the floor).

IMG_2493

Posted in Home improvement | 2 Comments »

Basement electrical progress.

Posted by peeterjoot on July 18, 2010

(a march post from my to-be-deleted entropy blog)

Today I got the outlets in the basement hooked up, so I can finally put away the extension cord that I’ve got the lamp on.  This was an easy job since I had the breaker installed already, and it was just hooking up the outlets themselves

IMG_2479 (Small)

There’s a coax and cat5 cable in the wall there too, and I’ll have to hook that up too, or perhaps just put a cover plate on it.

I’ve also figured out what all my rough in wiring for the lights was once again.  It’s been a long time since I did that rough in work, so it was a bit of a detective job, but wasn’t too hard.

IMG_2477 (Small) 

IMG_2476 (Small)

 

IMG_2478 (Small)

This time I labeled things like I should have originally.  The ceiling lights in the corridor here are all pretty simple.  Four in series, and I’ve figured out what wire goes to them as supply from the switch.  I’ve got two rough in wires to the dark part of the basement on the left that I had originally intended to put on dimmers.  I don’t know yet what I’ll end up doing with those, if anything.

I also got two of the ceiling lights hooked up.  This is the one for the nook, before I stuffed it all back in the hole.  Notice the white to black connection.  This is because of the Ontario electrical code convention (perhaps a convention elsewhere too), where “white is hot at the switch”.  The idea must be, that when the light switch is on, the black that comes back to the light, is still the “hot”.

IMG_2480 (Small)

Once the nook light and outlet and switch were all hooked up, this is what it looked like.  Once I’ve cleaned up, I’ll put the freezer in there like we originally intended.

 

IMG_2481 (Small)

The one other light that I hooked up today is the one on the stairwell switch, that you see as soon as you come down the stairs.  What we used to see coming down the stairs was an ugly yellow electrical box nailed to the ceiling, with a simple light screwed directly into it.  Recently it’s been even worse looking since I had the same box suspended in the air from the wires as a temporary light while I got the ceiling in place.  Now that the ceiling is done (and painted even), this nice little light recessed fixture replaces the old temporary construction-look.

IMG_2486 (Small)

Posted in Home improvement | Leave a Comment »

Roasted Meat Flavour?

Posted by peeterjoot on July 18, 2010

 

This caught my eye when I saw it on our counter before dinner.  Notice that it is not beef nor ham, nor turkey, nor chicken.  Just meat?

 

IMG_2387

Wow, how curious.  Perhaps the ingredients help clear things up?

IMG_2388

Nope.  It’s still just meat.  That leaves a lot of room for interpretation.  Some possibilities that come to mind are: Kangaroo?  Armadillo?  Bear?

Posted in Incoherent ramblings | Leave a Comment »