Again and again in DB2 code, developers appear to have discovered a “clever” way to manage the cleanup of shared memory that is protected by a mutex (what is called a latch internally in the DB2 codebase). The pattern is roughly of the following form:
struct myStuff
{
   mutex m ;
   // ... other stuff.
} ;

void myStuffCleanup( myStuff * pSharedMem )
{
   if ( pSharedMem->m.acquire() )
   {
      freeMemory( pSharedMem ) ;
   }
}
The developer coding this, based on other state and knowledge of when the myStuffCleanup function is executed, knows that if they get to this point in the code, no new attempts to acquire the mutex will ever be made. However, before the memory can be freed, there needs to be some sort of guarantee that all the old users of the memory (and mutex) are gone.
For cleanup problems like this we appear to have a number of developers who have been clever enough to realize that the last thing any consumer of this memory will ever do is release the mutex. So, they code an acquire guard like the above, believing that if the mutex can be acquired at this point in the code, they can “safely” free the memory. However, it is actually unfortunate that the developer has been clever enough to figure out this easy way to handle the cleanup, because it is not safe.
First off, observe that unless the developer knows the internals of the mutex implementation, this isn’t a terribly safe thing to do. For example, the mutex could internally use something like a Unix semaphore or a Windows EVENT, so cleanup code may be required before the memory containing the mutex is freed.
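The standard C++ std::mutex illustrates this point: destroying a std::mutex (and thus freeing the memory containing it) while any thread still owns it is undefined behavior, so a correct cleanup must guarantee the mutex is both unlocked and no longer in use before teardown. A minimal sketch, with hypothetical names (sharedThing, createSharedThing, destroySharedThing are illustrative, not from the DB2 codebase):

```cpp
#include <mutex>
#include <new>
#include <cstdlib>

// Hypothetical shared structure, analogous to myStuff above.
struct sharedThing
{
    std::mutex m;
    int        payload;
};

sharedThing * createSharedThing()
{
    // Placement-new so we control exactly when construction and
    // destruction of the contained mutex happen.
    void * raw = std::malloc( sizeof( sharedThing ) );
    return new ( raw ) sharedThing{};
}

void destroySharedThing( sharedThing * p )
{
    // The C++ standard requires that no thread owns the mutex here:
    // destroying a locked std::mutex is undefined behavior.  An
    // "acquire then free" cleanup would have to unlock first, and
    // even then another thread may still be inside unlock()'s code.
    p->~sharedThing();
    std::free( p );
}
```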
However, in this case, the mutex implementation in question historically has never required any sort of cleanup (provided it wasn’t in use at the cleanup point). As it happens, we didn’t even historically have a cleanup method for this particular mutex implementation. We have one now, defined for consistency with some of our other mutex implementations which do require explicit cleanup, but it states that it is just for consistency and is a no-op. Reading that documentation (or making bad assumptions) is probably what leads developers to believe that they can free memory containing this mutex type even if it is held.
[An aside. Our mutex implementation on Windows actually does use an EVENT to manage the wait for the myStuffCleanup() acquire caller, and that EVENT HANDLE will still be allocated and assigned to the mutex even after the mutex release. Our clever developer has gotten lucky because we happen to clean up that EVENT HANDLE in the acquire() call (presuming there really was no other use of the mutex).]
Despite having nothing to clean up after this last-man-out acquire, this sort of pattern is neither correct nor safe. What the developer doesn’t know is what our release method does internally. It happens that the operation that logically releases the mutex, allowing it to be acquired, isn’t necessarily the last operation on or access of the mutex memory performed by our release code.
Our release code has roughly the following form:

   markMutexFree() ;                            // the logical release point.
   waitlistOperationsIfAny() ;                  // some platforms only.
   releaseInternalMutexForWaiterManagement() ;  // some platforms only.
   logMutexStateToTraceBufferIfTraceIsOn() ;
   validate() ;
After that markMutexFree() point in the code, there are a number of possible accesses to the mutex memory. If that “final” release() caller gets context switched out just after that point, and the memory is freed before it resumes (or the SMP system is fast enough to allow the free to proceed concurrently while the mutex::release() code executes), then we will be in trouble when the release() code resumes.
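The hazard can be sketched in a few lines. This is a deliberately simplified, hypothetical latch layout (the field names and tryAcquire/release functions are illustrative, not the actual DB2 implementation): the store that logically frees the latch is the point where a blocked cleanup caller can succeed, yet the releasing thread keeps touching the latch memory afterwards.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, simplified latch layout; illustrative names only.
struct latch
{
    std::atomic<int>  state;      // 0 == free, 1 == held
    std::atomic<long> waitlist;   // waiter bookkeeping (some platforms)
    int               traceState; // read by trace code after release
};

bool tryAcquire( latch & l )
{
    int expected = 0;
    return l.state.compare_exchange_strong( expected, 1 );
}

void release( latch & l )
{
    // markMutexFree(): the store below is the LOGICAL release point.
    // A cleanup thread spinning in acquire() can succeed the instant
    // this store becomes visible ...
    l.state.store( 0, std::memory_order_release );

    // ... yet this thread still accesses the latch memory afterwards:
    l.waitlist.store( 0 );  // waitlistOperationsIfAny(): zero store
    (void) l.traceState;    // logMutexStateToTraceBufferIfTraceIsOn(): read
    // If the memory was freed right after the state store above, every
    // access from here on is a use-after-free.
}
```

Run single-threaded this is harmless; the corruption window only opens when the free races ahead of the trailing stores, which is exactly why the pattern can survive testing for years.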
Here’s an enumeration of the memory accesses to the mutex after the internal “release” point that makes it available for a new acquire:
- waitlistOperationsIfAny(). On our Unix platforms for this mutex type we keep what we call a waitlist: one for all the write (exclusive) waiters, and one for all the read (shared) waiters for the mutex. Similarly, in our Windows mutex implementation we have an EVENT HANDLE pointer in the mutex (although we don’t update that in release after the wakeup like we do on Unix). After we’ve released the mutex, we’ll wake up any waiters, and then store the new waitlist values in the mutex. In this scenario we’ll be storing only a zero as the new waitlist value, because either there are no waiters, or the only waiter should be the cleanup caller, and we’ll have just woken up that “last waiter”. We happen to avoid a requirement for actually storing the waitlist separately in our 64-bit Unix implementation, but we do still unfortunately ship a 32-bit server version of DB2 (a developer-only version that runs on Linux ia32). Long story made short: if the memory is recycled after the acquire, and that happens before these waitlist stores, these zero stores could be corrupting memory (or attempting to access memory that could be unmapped).
- releaseInternalMutexForWaiterManagement(). Our Windows and 32-bit Unix implementations currently have an internal mutex (in this case, a plain old spinlock, not a reader/writer mutex) that we use when storing either our waitlist pointer or a pointer to our EVENT handles for the mutex. This internal mutex free will result in a store (i.e., a store of zero), plus some memory barrier instructions if appropriate. Again, if the mutex memory has been free()’d, this would be bad.
- logMutexStateToTraceBufferIfTraceIsOn(). This is problem-determination and performance-related tracing. It may or may not be enabled at runtime, but it should be allowed to look at the mutex state if executed. If the memory has been free()’d and unmapped, then this trace code would look at the mutex memory, and could trap.
- Our final validate() call. This regularly finds code that uses this free-if-I-can-acquire pattern, since our mutex is fairly stateful, which makes it good for catching the pattern. In particular, the memset to 0xDD that occurs in our free() code will set inappropriate bits that validate() complains about.
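The mechanics of that catch can be sketched as follows. The 0xDD poison value is from the free() behavior described above, but the magic word, LATCH_MAGIC, and the validate() logic here are hypothetical stand-ins for the real (more stateful) checks:

```cpp
#include <cstring>
#include <cstddef>

// Hypothetical latch layout with a magic word; illustrative only.
struct latch
{
    unsigned int magic;  // expected bit pattern while the latch is live
    unsigned int state;
};

const unsigned int LATCH_MAGIC = 0xBEEF1234u;

// validate() fails once the memory no longer looks like a live latch.
bool validate( const latch & l )
{
    return l.magic == LATCH_MAGIC;
}

// What a debug-build free() does to the memory before recycling it.
void poison( void * p, std::size_t sz )
{
    std::memset( p, 0xDD, sz );
}
```

Because 0xDDDDDDDD can never equal the live magic value, a validate() call that runs after the racing free has poisoned the memory reliably flags the bug, rather than letting it corrupt memory silently.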
There have been developers who have objected to our final validate() call, saying that we shouldn’t be doing it (or any other operation on the mutex) after the logical release point. They’d like this to be our bug, not theirs. To win this working-as-designed argument, we essentially have to point out that what they have done is free memory that has been passed to a function that is still operating on it.
I’d be curious to know whether this pattern of acquiring a mutex and then destroying the memory containing it is common in other codebases (using other mutex types). If it is, I wonder how many other mutex implementations can tolerate a free of the mutex memory while the release code for an uncontended mutex is still executing? Our pure spinlock implementation happens to be able to tolerate this, and does no further access to the mutex memory after the logical release. However, our reader/writer mutex (the mutex type in question here) cannot tolerate this sort of free-while-in-use on some platforms … but we abort unconditionally if we detect it on any.
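The reason a pure spinlock can tolerate the pattern is visible in a sketch: its release is a single atomic store, and that store is also the last access release() ever makes to the lock memory. This is a minimal, hypothetical test-and-set spinlock, not our actual implementation:

```cpp
#include <atomic>

// Minimal test-and-set spinlock sketch (illustrative, not the DB2 one).
struct spinlock
{
    std::atomic<bool> held{ false };

    bool tryAcquire()
    {
        // exchange() returns the previous value: false means we got it.
        return !held.exchange( true, std::memory_order_acquire );
    }

    void release()
    {
        // A single store, and the LAST access release() makes to this
        // memory.  That is why free-after-acquire happens to survive
        // here: there are no trailing waitlist, trace, or validate
        // accesses to race against the free.
        held.store( false, std::memory_order_release );
    }
};
```

Contrast this with the reader/writer latch above, whose release keeps working on the latch memory after the logical release point; the safety of the pattern is an accident of a particular implementation, never a property of "a mutex".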