This was a somewhat goofy project. I was assigned to continue work on a previous undergrad thesis project, where a matrix library had been written for loop transformations.
This was to be plugged into an optimizing compiler that took Fortran code and produced different Fortran code. The idea was that the compiler could be guided into breaking large, memory-intensive operations into blocks that used the cache better, but the whole approach was very academic, requiring specially contrived code and compiler interaction.
The one thing that I liked about this was that, in the end, it did work nicely. I measured a 12x improvement in the matrix multiplier when the right blocking size was picked. Picking that blocking size required experiment, so practical application of these methods was a long way off.
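
For readers who have not seen loop blocking, here is a minimal sketch of the transformation applied to matrix multiplication. It is an illustration only, not the project's Fortran output: the function names and the `block` parameter are hypothetical, and plain Python will not reproduce the cache effect or the speedup measured above. It only shows how the three loops are split into tiles while computing the same result.

```python
def matmul_naive(a, b):
    """Textbook triple loop: c[i][j] = sum over k of a[i][k] * b[k][j]."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            total = 0.0
            for k in range(m):
                total += a[i][k] * b[k][j]
            c[i][j] = total
    return c


def matmul_blocked(a, b, block=32):
    """Same product, with each loop split into tiles of size `block`.

    The outer three loops walk over tiles; the inner three loops work
    inside one tile. In a compiled language, a tile that fits in cache
    gets reused before it is evicted. `block` is the tuning knob
    (hypothetical name), and its best value depends on the machine.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, m, block):
            for jj in range(0, p, block):
                # min() handles edge tiles when block does not divide the size.
                for i in range(ii, min(ii + block, n)):
                    row_c = c[i]
                    for k in range(kk, min(kk + block, m)):
                        a_ik = a[i][k]
                        row_b = b[k]
                        for j in range(jj, min(jj + block, p)):
                            row_c[j] += a_ik * row_b[j]
    return c
```

The blocked version does exactly the same multiply-adds as the naive one, just in a different order. The only free parameter is the tile size, which is the value that had to be found by experiment.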