Peeter Joot's (OLD) Blog.

Math, physics, perl, and programming obscurity.

Archive for June, 2010

More problems from Liboff chapter 4. Hermitian operators.

Posted by peeterjoot on June 27, 2010

[Click here for a PDF of this post with nicer formatting]


Some more problems from [1].

Problem 4.11

Some problems on Hermitian adjoints. The starting point is the definition of the adjoint A^\dagger of A in terms of the inner product

\begin{aligned}\left\langle{{\hat{A}^\dagger \phi}} \vert {{\psi}}\right\rangle = \left\langle{{\phi}} \vert {{\hat{A} \psi}}\right\rangle\end{aligned}

4.11 a

\begin{aligned}\left\langle{{ \phi }} \vert {{ (a \hat{A} + b \hat{B}) \psi }}\right\rangle &=a \left\langle{{ \phi }} \vert {{ \hat{A} \psi }}\right\rangle + b \left\langle{{ \phi }} \vert {{ \hat{B} \psi }}\right\rangle  \\ &=a \left\langle{{ \hat{A}^\dagger \phi }} \vert {{ \psi }}\right\rangle + b \left\langle{{ \hat{B}^\dagger \phi }} \vert {{ \psi }}\right\rangle  \\ &=\left\langle{{ a^{*} \hat{A}^\dagger \phi }} \vert {{ \psi }}\right\rangle + \left\langle{{ b^{*} \hat{B}^\dagger \phi }} \vert {{ \psi }}\right\rangle  \\ &=\left\langle{{ (a^{*} \hat{A}^\dagger + b^{*} \hat{B}^\dagger ) \phi }} \vert {{ \psi }}\right\rangle  \\ &\implies \\ (a \hat{A} + b \hat{B})^\dagger = (a^{*} \hat{A}^\dagger + b^{*} \hat{B}^\dagger)\end{aligned}

4.11 b

\begin{aligned}\left\langle{{ \phi }} \vert {{ \hat{A} \hat{B} \psi }}\right\rangle &=\left\langle{{ \hat{A}^\dagger \phi }} \vert {{ \hat{B} \psi }}\right\rangle  \\ &=\left\langle{{ \hat{B}^\dagger \hat{A}^\dagger \phi }} \vert {{ \psi }}\right\rangle  \\ &\implies \\ (\hat{A} \hat{B} )^\dagger &=\hat{B}^\dagger \hat{A}^\dagger \end{aligned}

4.11 d

Hermitian adjoint of D^2, where D = {\partial {}}/{\partial {x}}. Here we need the integral form of the inner product

\begin{aligned}\left\langle{{\phi}} \vert {{D^2 \psi}}\right\rangle &=\int \phi^{*} \frac{\partial {}}{\partial {x}}\frac{\partial {\psi}}{\partial {x}} \\ &=-\int \frac{\partial {\phi^{*}}}{\partial {x}} \frac{\partial {\psi}}{\partial {x}} \\ &=\int \psi \frac{\partial {}}{\partial {x}}\frac{\partial {\phi^{*}}}{\partial {x}} \\ &\implies \\ (D^2)^\dagger &= D^2\end{aligned}

Since the text shows that the square of a Hermitian operator is Hermitian, one perhaps wonders if D is (but we expect not since \hat{p} = -i \hbar D is Hermitian).

Suppose \hat{A} = aD, we have

\begin{aligned}\hat{A}^\dagger = -a^{*} D,\end{aligned}

so for this to be Hermitian (\hat{A} = \hat{A}^\dagger) we must have - a^{*} = a. If a = r e^{i\theta}, we have

\begin{aligned}-1 = e^{2 i\theta}\end{aligned}

So \theta = \pi (1/2 + n), and a = \pm i r. This fixes the scalar multiples of D that are required to form a Hermitian operator

\begin{aligned}\hat{A} &= \pm i r D\end{aligned}

where r is any real positive constant.

4.11 e

\begin{aligned}(\hat{A} \hat{B} - \hat{B} \hat{A})^\dagger &= - (\hat{A}^\dagger \hat{B}^\dagger - \hat{B}^\dagger \hat{A}^\dagger)\end{aligned}

4.11 f

\begin{aligned}(\hat{A} \hat{B} + \hat{B} \hat{A})^\dagger &= \hat{A}^\dagger \hat{B}^\dagger + \hat{B}^\dagger \hat{A}^\dagger\end{aligned}

4.11 g

\begin{aligned}i (\hat{A} \hat{B} - \hat{B} \hat{A})^\dagger &= i ( \hat{A}^\dagger \hat{B}^\dagger - \hat{B}^\dagger \hat{A}^\dagger)\end{aligned}

4.11 h

This one was to calculate (\hat{A}^\dagger)^\dagger. Intuitively I’d expect that (\hat{A}^\dagger)^\dagger = \hat{A}. How could one show this?

Trying to show this with Dirac notation, I got all mixed up initially.

Using the more straightforward and old fashioned integral notation (as in [2]), this is more straightforward. We have the Hermitian conjugate defined by

\begin{aligned}\int \psi_2^{*} (\hat{A} \psi_1) = \int (\hat{A}^\dagger \psi_2^{*}) \psi_1,\end{aligned}

Or, more symmetrically, using braces to indicate operator direction

\begin{aligned}\int \psi_2^{*} (\hat{A} \psi_1) = \int (\psi_2^{*} \hat{A}^\dagger) \psi_1.\end{aligned}

Introduce a couple of variable substuitions for clarity

\begin{aligned}\phi_1 &= \psi_1^{*} \\ \phi_2 &= \psi_2^{*} \\ \hat{B} &= \hat{A}^\dagger.\end{aligned}

We then have

\begin{aligned}\int \psi_2^{*} (\hat{A} \psi_1)&=\int (\psi_2^{*} \hat{A}^\dagger) \psi_1 \\ &=\int (\phi_2 \hat{B}) \phi_1^{*} \\ &=\int \phi_1^{*} (\hat{B} \phi_2) \\ &=\int (\phi_1^{*} \hat{B}^\dagger) \phi_2 \\ &=\int \phi_2 (\hat{B}^\dagger \phi_1^{*}) \\ &=\int \psi_2^{*} (\hat{A}^{\dagger \dagger} \psi_1) \\ \end{aligned}

Since this is true for all \psi_k, we have \hat{A} = \hat{A}^{\dagger \dagger} as expected.

Having figured out the problem in the simpleton way, it’s now simple to go back and translate this into the Dirac inner product notation without getting muddled. We have

\begin{aligned}\left\langle{{ \psi_2 }} \vert {{ \hat{A} \psi_1 }}\right\rangle &=\left\langle{{ \hat{A}^\dagger \psi_2 }} \vert {{ \psi_1 }}\right\rangle  \\ &=\left\langle{{ \hat{B} \phi_2^{*} }} \vert {{ \phi_1^{*} }}\right\rangle  \\ &={\left\langle{{ \phi_1 }} \vert {{ \hat{B}^{*} \phi_2}}\right\rangle}^{*}  \\ &={\left\langle{{ (\hat{B}^{*})^\dagger \phi_1 }} \vert {{ \phi_2}}\right\rangle}^{*}  \\ &=\left\langle{{\phi_2^{*} }} \vert {{ \hat{B}^\dagger \phi_1^{*} }}\right\rangle \\ &=\left\langle{{\psi_2 }} \vert {{ \hat{A}^{\dagger \dagger} \psi_1 }}\right\rangle \\ \end{aligned}

4.11 i

\begin{aligned}(\hat{A} \hat{A}^\dagger)^\dagger &= (\hat{A}^\dagger)^\dagger \hat{A}^\dagger \end{aligned}

since (\hat{A}^\dagger) ^\dagger = \hat{A}

\begin{aligned}(\hat{A} \hat{A}^\dagger)^\dagger &= \hat{A} \hat{A}^\dagger.\end{aligned}

Problem 4.12 d

If \hat{A} is not Hermitian, is the product \hat{A}^\dagger \hat{A} Hermitian? To start we need to verify that \left\langle{{\psi}} \vert {{\hat{A}^\dagger \phi}}\right\rangle = \left\langle{{\hat{A} \psi}} \vert {{\phi}}\right\rangle.

\begin{aligned}\left\langle{{ \psi }} \vert {{ \hat{A}^\dagger \phi }}\right\rangle &={\left\langle{{ (\hat{A}^\dagger)^{*} \phi^{*} }} \vert {{ \psi^{*} }}\right\rangle}^{*} \\ &={\left\langle{{ \phi^{*} }} \vert {{ \hat{A}^{*} \psi^{*} }}\right\rangle}^{*} \\ &=\left\langle{{ \psi }} \vert {{ \hat{A} \psi }}\right\rangle.\end{aligned}

With that verified we have

\begin{aligned}\left\langle{{ \psi }} \vert {{ \hat{A}^\dagger \hat{A} \phi }}\right\rangle &=\left\langle{{ \hat{A} \psi }} \vert {{ \hat{A} \phi }}\right\rangle  \\ &=\left\langle{{ \hat{A}^\dagger \hat{A} \psi }} \vert {{ \phi }}\right\rangle,\end{aligned}

so, the answer is yes. Provided the adjoint exists, that product will be Hermitian.

Problem 4.14

Show that \left\langle{{\hat{A}}}\right\rangle = \left\langle{{\hat{A}}}\right\rangle^{*} (that it is real), if \hat{A} is Hermitian. This follows by expansion of that conjuagate

\begin{aligned}\left\langle{{\hat{A}}}\right\rangle^{*} &= \left(\int \psi^{*} \hat{A} \psi \right)^{*} \\ &= \int \psi \hat{A}^{*} \psi^{*} \\ &= \int (\hat{A} \psi)^{*} \psi \\ &= \left\langle{{ \hat{A} \psi }} \vert {{ \psi }}\right\rangle \\ &= \left\langle{{ \psi }} \vert {{ \hat{A}^\dagger \psi }}\right\rangle \\ &= \left\langle{{ \psi }} \vert {{ \hat{A} \psi }}\right\rangle \\ &= \left\langle{{\hat{A}}}\right\rangle\end{aligned}


[1] R. Liboff. Introductory quantum mechanics. 2003.

[2] D. Bohm. Quantum Theory. Courier Dover Publications, 1989.

Posted in Math and Physics Learning. | Tagged: , , | Leave a Comment »

Oh cruel compiler.

Posted by peeterjoot on June 22, 2010

Here’s a code fragment from some code I was debugging today.

   sqlzRc rc = 0 ;

   switch ( cfRc )
         rc = COMM_ERROR ;
         break ;


I expected COMM_ERROR out of this for the input (which based on log output appeared that it should have been cfRc==RECV_MRB_FAILURE), and was totally frigging perplexed.

Faced with the incontrovertible proof in the debugger that the input was what I thought it was, and the output was the result of the default case, I had another look.

Damn. Each of those labels is missing the case keyword, and the damn thing still compiles!

There’s so much here that the compiler could have noticed and didn’t:

  • Labels that didn’t have gotos connecting them to anything.
  • a break from a context that didn’t have a case.
  • labels that were typedef values too (I guess that is allowed … bizzarre).

This was with GCC4.2.1. After I fix this I have the inclination to revert the fix and try with some other compilers to see if they are more sane.

Posted in C/C++ development and debugging. | Leave a Comment »

comparing some times for perl vs sort command line hacking

Posted by peeterjoot on June 22, 2010

I had a 2M line file that contained among other things function identifier strings such as:


I wanted to extract just these and sort them by name for something else. I’d first tried this in vim, but it was taking too long. Eventually I control-C’ed it and realized I had to be a bit smarter about it. I figured something like perl would do the trick, and I was able to extract those strings easily with:

cat flw.* | perl -p -e 's/.*?(\S+::\S+).*/$1/;'

(ie: grab just the not-space::not-space text and spit it out). passing this to ‘sort -u’ was also taking quite a while. Here’s a slightly smarter way to do it, still also a one-liner:

cat flw.* | perl -n -e 's/.*?(\S+::\S+).*/$h{$1}=1/e; END{ foreach (sort keys %h) { print "$_\n" ; } } '

All the duplicates are automatically discarded by inserting the matched value into a hash instead of just spitting it out. Then a simple loop over the hash keys gives the result directly. For the data in question, this ended up reducing the time required for the whole operation to just 12.5seconds (eventually I ran the original ‘perl -… | sort -u’ in the background and found it would have taken 1.6 minutes). It took far less time to tweak the command line than the original command would have taken, and provides a nice example where an evaluated expression in the regex match can be handy.

Of course, I then lost my time savings by writing up these notes for posterity;)

Posted in perl and general scripting hackery | Tagged: , , , , | 4 Comments »

A hoop and spring oscillator problem.

Posted by peeterjoot on June 19, 2010

[Click here for a PDF of this post with nicer formatting]


Nolan was attempting to setup and solve the equations for the following system (\ref{fig:hoopSpring})

Coupled hoop and spring

One mass is connected between two springs to a bar. That bar moves up and down as forced by the motion of the other mass along a immovable hoop. While Nolan didn’t include any gravitational force in his potential terms (ie: system lying on a table perhaps) it doesn’t take much more to include that, and I’ll do so. I also include the distance L to the center of the hoop, which I believe required.


The Lagrangian can be written by inspection. Writing x = x_1, and x_2 = R \sin\theta, we have

\begin{aligned}\mathcal{L} = \frac{1}{{2}} m_1 \dot{x}^2 + \frac{1}{{2}} m_2 R^2 \dot{\theta}^2 - \frac{1}{{2}} k_1 x^2 - \frac{1}{{2}} k_2 ( L + R \sin\theta - x )^2- m_1 g x- m_2 g ( L + R \sin\theta).\end{aligned} \hspace{\stretch{1}}(2.1)

Evaluation of the Euler-Lagrange equations gives

\begin{aligned}m_1 \dot{d}{x} &= - k_1 x + k_2 ( L + R \sin\theta - x ) - m_1 g \\ m_2 R^2 \dot{d}{\theta} &= - k_2 ( L + R \sin\theta - x ) R \cos\theta - m_2 g R \cos\theta,\end{aligned} \hspace{\stretch{1}}(2.2)


\begin{aligned}\dot{d}{x} &= -x \frac{k_1 + k_2}{m_1} + \frac{k_2 R \sin\theta}{m_1} - g + \frac{k_2 L }{m_1} \\ \dot{d}{\theta} &= - \frac{1}{{R}}\left( \frac{k_2}{m_2} \left( L + R \sin\theta - x \right) +g \right) \cos\theta.\end{aligned} \hspace{\stretch{1}}(2.4)

Just like any other coupled pendulum system, this one is non-linear. There’s no obvious way to solve this in closed form, but we could determine a solution in the neighborhood of a point (x, \theta) = (x_0, \theta_0). Let’s switch our dynamical variables to ones that express the deviation from the initial point \delta x = x - x_0, and \delta \theta = \theta - \theta_0, with u = (\delta x)', and v = (\delta \theta)'. Our system then takes the form

\begin{aligned}u' &= f(x,\theta) =-x \frac{k_1 + k_2}{m_1} + \frac{k_2 R \sin\theta}{m_1} - g + \frac{k_2 L }{m_1} \\ v' &= g(x,\theta) =- \frac{1}{{R}}\left( \frac{k_2}{m_2} \left( L + R \sin\theta - x \right) +g \right) \cos\theta \\ (\delta x)' &= u \\ (\delta \theta)' &= v.\end{aligned} \hspace{\stretch{1}}(2.6)

We can use a first order Taylor approximation of the form f(x, \theta) = f(x_0, \theta_0) + f_x(x_0,\theta_0) (\delta x) + f_\theta(x_0,\theta_0) (\delta \theta). So, to first order, our system has the approximation

\begin{aligned}u' &= -x_0 \frac{k_1 + k_2}{m_1} + \frac{k_2 R \sin\theta_0}{m_1} - g + \frac{k_2 L }{m_1} -(\delta x) \frac{k_1 + k_2}{m_1} + \frac{k_2 R \cos\theta_0}{m_1} (\delta \theta)\\ v' &= - \frac{1}{{R}}\left( \frac{k_2}{m_2} \left( L + R \sin\theta_0 - x_0 \right) +g \right) \cos\theta_0+ \frac{k_2 \cos\theta_0}{m_2 R} (\delta x)- \frac{1}{{R}}\left( \frac{k_2}{m_2} \left( \left( L - x_0 \right) \sin\theta_0 + R \right) + g \sin\theta_0 \right) (\delta \theta)\\ (\delta x)' &= u \\ (\delta \theta)' &= v.\end{aligned} \hspace{\stretch{1}}(2.10)

This would be tidier in matrix form with \mathbf{x} = (u, v, \delta x, \delta \theta)

\begin{aligned}\mathbf{x}' &=\begin{bmatrix}-x_0 \frac{k_1 + k_2}{m_1} + \frac{k_2 R \sin\theta_0}{m_1} - g + \frac{k_2 L }{m_1} \\ - \frac{1}{{R}}\left( \frac{k_2}{m_2} \left( L + R \sin\theta_0 - x_0 \right) +g \right) \cos\theta_0 \\ 0 \\ 0\end{bmatrix}+\begin{bmatrix}0 & 0 &-\frac{k_1 + k_2}{m_1} & \frac{k_2 R \cos\theta_0}{m_1} \\ 0 & 0 & \frac{k_2 \cos\theta_0}{m_2 R} &- \frac{1}{{R}}\left( \frac{k_2}{m_2} \left( \left( L - x_0 \right) \sin\theta_0 + R \right) + g \sin\theta_0 \right) \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ \end{bmatrix}\mathbf{x}.\end{aligned} \hspace{\stretch{1}}(2.14)

This reduces the problem to the solutions of first order equations of the form

\begin{aligned}\mathbf{x}' = \mathbf{a} + \begin{bmatrix}0 & A \\ I & 0\end{bmatrix}\mathbf{x} = \mathbf{a} + \mathbf{B} \mathbf{x}\end{aligned} \hspace{\stretch{1}}(2.15)

where \mathbf{a}, and A are constant matrices. Such a matrix equation has the solution

\begin{aligned}\mathbf{x} = e^{B t} \mathbf{x}_0 + (e^{Bt} - I) B^{-1} \mathbf{a},\end{aligned} \hspace{\stretch{1}}(2.16)

but the zeros in B should allow the exponential and inverse to be calculated with less work. That inverse is readily verified to be

\begin{aligned}B^{-1} =\begin{bmatrix}0 & I \\ A^{-1} & 0\end{bmatrix}.\end{aligned} \hspace{\stretch{1}}(2.17)

It is also not hard to show that

\begin{aligned}B^{2n} = \begin{bmatrix}A^n & 0 \\ 0 & A^n\end{bmatrix} \\ B^{2n+1} = \begin{bmatrix}0 & A^{n+1} \\ A^n & 0\end{bmatrix}.\end{aligned} \hspace{\stretch{1}}(2.18)

Together this allows for the power series expansion

\begin{aligned}e^{Bt} &=\begin{bmatrix}\cosh(t \sqrt{A}) & \sinh(t \sqrt{A}) \\ \sinh(t \sqrt{A}) \frac{1}{{\sqrt{A}}} & \cosh(t \sqrt{A})\end{bmatrix}.\end{aligned} \hspace{\stretch{1}}(2.20)

All of the remaining sub matrix expansions should be straightforward to calculate provided the eigenvalues and vectors of A are calculated. Specifically, suppose that we have

\begin{aligned}A = U \begin{bmatrix}\lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}U^{-1}.\end{aligned} \hspace{\stretch{1}}(2.21)

Then all the perhaps non-obvious functions of matrixes expand to just

\begin{aligned}A^{-1} &= U \begin{bmatrix}\lambda_1^{-1} & 0 \\ 0 & \lambda_2^{-1} \end{bmatrix}U^{-1} \\ \sqrt{A} &= U \begin{bmatrix}\sqrt{\lambda_1} & 0 \\ 0 & \sqrt{\lambda_2}\end{bmatrix}U^{-1} \\ \cosh(t \sqrt{A}) &= U \begin{bmatrix}\cosh( t \sqrt{\lambda_1} ) & 0 \\ 0 & \cosh( t \sqrt{\lambda_2} )\end{bmatrix}U^{-1} \\ \sinh(t \sqrt{A}) &= U \begin{bmatrix}\sinh( t \sqrt{\lambda_1} ) & 0 \\ 0 & \sinh( t \sqrt{\lambda_2} )\end{bmatrix}U^{-1} \\ \sinh(t \sqrt{A}) \frac{1}{{\sqrt{A}}} &= U \begin{bmatrix}\sinh( t \sqrt{\lambda_1} )/\sqrt{\lambda_1} & 0 \\ 0 & \sinh( t \sqrt{\lambda_2} )/\sqrt{\lambda_2}\end{bmatrix}U^{-1}.\end{aligned} \hspace{\stretch{1}}(2.22)

An interesting question would be how are the eigenvalues and eigenvectors changed with each small change to the initial position \mathbf{x}_0 in phase space. Can these be related to each other?

Posted in Math and Physics Learning. | Leave a Comment »

The meaning of continue in C

Posted by peeterjoot on June 16, 2010

I found myself looking at some code and unsure how it would behave. Being a bit tired today I couldn’t remember if continue pops you to the beginning of the loop, or back to the predicate that allows you to break from it. Here’s an example:

int main()
   volatile int rc = 0 ;
   volatile int x = 0 ;

   do {
      rc = 1 ;

      if ( 1 == x )
         rc = 0 ;

         continue ;
   } while ( rc == 1 ) ;

   return 0 ;

And if you say, “What you’ve been programming for 12 years and don’t know?” Well it looks that way. I chose to walk through this code in the debugger to see how it worked:

(gdb) b main
Breakpoint 1 at 0x40055c: file t.C, line 3.
(gdb) run
Starting program: /vbs/engn/.t/a.out

Breakpoint 1, main () at t.C:3
3          volatile int rc = 0 ;
(gdb) n
4          volatile int x = 0 ;
(gdb) n
7             rc = 1 ;
(gdb) n
9             if ( 1 == x )
(gdb) n
6          do {
(gdb) n
7             rc = 1 ;
(gdb) n
9             if ( 1 == x )
(gdb) p x=1
$1 = 1
(gdb) n
11               rc = 0 ;
(gdb) n
6          do {
(gdb) n
17         return 0 ;
(gdb) n
18      }
(gdb) q

Once the variable x is modified, sure enough we break from the loop (note the sneaky way you have to modify variables in gdb, using the print statement to implicitly assign). There’s no chance to go back to the beginning and reset rc = 1 to keep going.

The conclusion: continue means goto the loop exit predicate statement, not continue to the beginning of the loop to retry. In the code in question a goto will actually be clearer, since what was desired was a retry, not a retry-if.

Posted in C/C++ development and debugging. | Tagged: , , , | Leave a Comment »

A new creative way to mess up a memset call.

Posted by peeterjoot on June 8, 2010

Here’s a way to mess up a memset call that impressed me with it’s creative uniqueness:


memset( &ud, 0x00, sizeof( ud[MAX_ATTACHMENTS] ) ) ;

What was really meant was:

memset( &ud, 0x00, sizeof( ud ) ) ;

(ie: the whole array, which has MAX_ATTACHMENTS elements). Instead the person who coded this, ended up only initializing the first element, since they asked to initialize just the size of the last element in the array.

Posted in C/C++ development and debugging. | Tagged: , | Leave a Comment »

Address of an array base. What was the C language designer smoking?

Posted by peeterjoot on June 8, 2010

I regularly see code that grabs the address of the first element in an array in different ways. Some made up examples:

void bar( char * ) ;

void foo
   char x[3] ;

   bar( x ) ;
   bar ( &x ) ;
   bar( &x[0] ) ;
   bar( (&x[0]) ) ;

Sometimes these get really complex looking when the array element is in some nested structure and the coder in question is so unsure about how to get the first element address that lots of extra braces are tossed in.

It’s worthwhile demonstrating to oneself that these are all identical. Something like the following does the job:

#include <stdio.h>

int main()
   char x[3] ;

   printf("%p %p %p\n", x, &x, &x[0] ) ;

   return 0 ;

What I have to wonder about something like this is why would the language allow both x and &x as equivalent syntax? Is it so counterintuitive to have x == &x, but this is actually true in this case! Even though I’ve seen it time and time again, the &x syntax seems to be much less common than just x or &x[0], and it throws me off when I see it.

I get the rough impression, at least from our code, that the &x[0] form is generally preferred. I’d infer from this that is it not obvious to many that just plain x is the array base address, and thus the address of the first element.

Posted in C/C++ development and debugging. | Tagged: , , | 6 Comments »

A nice simple example of a memory barrier requirement

Posted by peeterjoot on June 7, 2010

[Click here for a printable PDF version of this post]


We apparently have some code where a pointer is used to flag that specific updates to a structure can be consumed by other threads. This presents a nice opportunity to document one of the simpler, yet realistic, fashions that memory barriers are required for production code. This description should complement the previous barrier related posts An attempt to illustrate differences between memory ordering and atomic access, and Intel memory ordering, fence instructions, and atomic operations.


I haven’t seen the code in question, but would imagine it could look something like this:

struct broadcast
   int x ;
   int y ;
   // ...
} ;

struct sharedMem
   volatile broadcast * publicUnprotectedMemoryWithStuff ;
} ;

/* producer */
void updateMemoryAndLetEverybodySeeIt( broadcast * pStuff, sharedMem * pShared )
   pStuff->x = 3 ;
   pStuff->y = 7 ;

   pShared->publicUnprotectedMemoryWithStuff  =  pStuff ;

/* consumer */
void consumePublicMemory( broadcast * pStuff )
   if ( pStuff->publicUnprotectedMemoryWithStuff )
      int v = pStuff->publicUnprotectedMemoryWithStuff-> x ;
      int w = pStuff->publicUnprotectedMemoryWithStuff-> y ;

      printf( "%d %d\n", v, w ) ;

There’s a few assumptions here. One is that the pointer location used to “broadcast” the updates to the fields x, and y, resides in appropriately aligned memory (for us that means 8 byte aligned since we only support 64-bit systems now). Another assumption is that this alignment is sufficient that a store to the pointer (and subsequent access) won’t ever be attempted in a fragmented way. We used to see on old HP PARISC systems generated assembly code where a 32-bit access was done with two 16-bit operations, so two halves of a pointer load or store wouldn’t have a guarantee of being coherent. Given aligned memory on modern systems (ie: not PARISC) it isn’t too unreasonable to assume that loads and stores are not done in this piecewise fashion, either by the compiler or the hardware.

Another assumption made here is that the compiler has generated code that executes in the same order as the higher level C code. That’s probably a bad assumption unless something else is done to explicitly enforce this ordering. On some systems we have intrinsics or low level compiler methods to enforce this ordering. One example is the no-op barrier mechanism available in the GCC compiler suite (or the Intel compiler that also implements GCC style inline assembly). So a more correct, but now platform specific, way of coding this broadcast function would be:

void updateMemoryAndLetEverybodySeeIt( broadcast * pStuff, sharedMem * pShared )
   pStuff->x = 3 ;
   pStuff->y = 7 ;

   __asm__ __volatile__  ( "" ::: "memory" ) ;
   pShared->publicUnprotectedMemoryWithStuff  =  pStuff ;

The next assumption here is one of sequential memory, one that also cannot be correctly presumed. By the time the publicUnprotectedMemoryWithStuff assignment is made, it is assumed that the previous x, and y memory operations are complete. i.e.: when somebody accesses *pStuff->publicUnprotectedMemoryWithStuff, the 3 and 7 values will be seen, and not some previous values. This is not correct on a system where out of order memory accesses are possible. Two example systems like this are AIX and HP-IPF (powerpc and ia64 respectively).

This code would actually work on HP-IPF since the compiler emits an st8.rel instruction for the volatile store. The .rel indicates that the store is to have “release” semantics, and behaves in a similar way to what is done in a mutex release operation: all previous loads and stores have to be complete before the effects of the store is visible to other cpus. That does exactly what the caller expects, by virtue of using volatile for the pointer. This requires that the caller know explicitly that volatile behaves this way on ia64 systems, unless one has explicitly disabled this behavior with suitable compilation options (every ia64 compiler I’ve seen documents this volatile behavior).

However, on AIX (PowerPC), or Linux PPC, this code is not correct. There we need an lwsync instruction between the pStuff->y store and the assignment of the pointer. Without that the store to publicUnprotectedMemoryWithStuff may occur before the stores to x, y are done. At the read point we have a data dependency that protects us, since we can’t dereference without the pointer being non-null, so no value for x or y can be prefetched before the pointer is accessed and observed to be non-null. That wouldn’t be the case if a flag (such a separate piece of atomically manipulated memory was used to flag the availability of the producer’s pair of stores. Again, we have platform specific requirements to make this code right. On Linux PPC, using the GCC compiler, one could do this with:

void updateMemoryAndLetEverybodySeeIt( broadcast * pStuff, sharedMem * pShared )
   pStuff->x = 3 ;
   pStuff->y = 7 ;
   __asm__ __volatile__  ( "lwsync" ::: "memory" ) ;
   pShared->publicUnprotectedMemoryWithStuff  =  pStuff ;

whereas on AIX, using xlC, one could use:

void updateMemoryAndLetEverybodySeeIt( broadcast * pStuff, sharedMem * pShared )
   pStuff->x = 3 ;
   pStuff->y = 7 ;
   __lwsync() ; // from builtins.h
   pShared->publicUnprotectedMemoryWithStuff  =  pStuff ;

The __lwsync() intrinsic also has the side effect of preventing code motion, so it does both the jobs of making the compiler generate the desired “sequential” code, and also makes the hardware do things in the same enforced ordered fashion. This sort of platform dependenence is the cost that you are forced to pay if you choose to attempt to avoid the use of mutex operations that would normally be used to make this code safe and correct.

Posted in C/C++ development and debugging. | Tagged: , , , , | Leave a Comment »

Fixing a loose banister

Posted by peeterjoot on June 6, 2010


Over the years the kids and I have over aggressively used the banister as we charge up and down the stairs.  It’s been loose for a while but it didn’t really occur to me that it was fixable.  Now, we’ve just sold the house, and the buyer’s identified this as a defect that they wanted fixed as part of the closing conditions.

It turns out that it is fixable.  The very knowledgeable guy who works in our neighborhood home depot in doors and windows department told me how to do this.  Basically, take it all apart, and then put it all back together.

This requires:

  • First taking out the underside screws that hold the wood rail onto the metal underside.
  • Now you can unscrew the metal rail from the spindles.
  • Pull the loose spindles
  • Clean the old glue off the floor and the spindles.
  • Glue them back in.
  • Screw the metal rail back onto the top of the spindles.
  • Screw the wood rail back on.

This is a lot easier to see in pictures.  The end result of the wild kids and dads in the house over the years resulted in the spindles being pulled from the floor like so:

IMG_2562 (Large)

Are you wondering why there is tape on the floor?  I put that there like registration marks in printing.  That way the spindles end up aligned the same way they were originally.  I did these tape markers all the way up and down the stairs, and it probably mattered most right down at the base (where the builder was a cheapskate and didn’t put a newel post that probably would have made the banister properly stable).

IMG_2561 (Large)

I also marked each post with a number so that the sequence of the posts was retained.  You can see that in this picture, where I’d taken the wood railing off:

IMG_2560 (Large)

Once the metal was removed, you can really see how free the spindles were in their holes:

IMG_2565 (Large)

With very little effort I could rotate these in place once the screw was removed.  Next step is removing the metal rail completely, and setting it aside with the wood rail

IMG_2566 (Large)

Now that the rails are both off, the posts can be pulled, and it’s time to clean up the old glue residue.  I was actually surprised how little glue the original installer appeared to use, and cleaning it off the floor wasn’t too hard.  I used a chisel to lightly scrap it off without damaging the surrounding stair finish (since my initial attempt using sandpaper looked like it was going to wreck the finish)

IMG_2569 (Large)

I ended up glad that I’d marked the posts numerically, since it would have been easy to mix them up, even with orderly placement.  When you look at a finished staircase banister, you don’t even notice the fact that each one is a different length

IMG_2571 (Large)

The glue also has to be cleaned off of the spindle bases, once they are pulled out.  You can see how it wasn’t wiped off well by the original installer


IMG_2572 (Large) 

A little chiselling and sandpapering handles that nicely

IMG_2573 (Large)

Again I was really surprised how little glue the original installer used.  I hardly had to take any off of all the little knobs on the base of each spindle.

I ended up pulling out all but two of the spindles, since two at the base were in firmly enough that I didn’t think I needed to touch them.

IMG_2575 (Large)

That may have been a mistake, since that part of the banister is still the loosest part of the railing.  I think that what really ought to have been on the staircase is a newel post, and not the wimpy set of curved spindles that were used.  I wasn’t about to try to retrofit one in though, since that probably would have meant completely redoing the staircase.  With everything removed, it was time for some carpenters glue on the spindle begs and bottom, put em back in the holes, align them, and wipe up the excess glue after squishing them down into their seats nicely.  That looked like it did near the beginning:

IMG_2576 (Large)

Next was putting the wood back on.  I actually had a fair amount of trouble with this part, and it didn’t align properly anymore.  I loosened up the big screw at the top of the stairs that held the metal rail to the wall, and was able to get it back on in the end, but I had to kind of lever it on.  The end result is something that doesn’t look any different than the original from a distance

IMG_2577 (Large)

If you look closely now, the gaps that used to be at the base of each post are now gone, and the wall near the top is a bit mucked up (needs a small touch up plastering and paint touch up that I’m not going to bother with).  But it is a whole lot stronger feeling, and the new owner should be satisfied with the result. I’m assuming that they only wanted that one repaired.  The top part of the staircase banister now feels looser in comparison, but I didn’t used to notice that one, and don’t feel like tackling that project unless forced.  I’d like that to be a SOP for the new owner.

Posted in Home improvement | 1 Comment »