Cortex M4 hard fault finding root cause on LPC4078

Unfortunately, that technique doesn’t work reliably, for the same reason that the built-in stack overflow check doesn’t work reliably. Assume the stack is close to its end, and the calling task calls a function that does something like this:

void SomeFn(void)
{
    unsigned char aBuf[20];   /* reserves 20 bytes on the caller task's stack */
    /* ... */
}

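While executing, that function may not touch all of the 20 bytes it set aside. If the bytes it leaves untouched are the ones that land on the checked region at the end of the stack, nothing is written there and the overflow goes unnoticed.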
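To make the failure mode concrete, here is a small host-side simulation. It is only a sketch: the stack is modelled as a byte array, and the sizes, the fill byte and all names (memory, stack_init, guard_intact, simulate_call) are made up for illustration, loosely following the idea of a pattern check at the stack limit. The "function" reserves a frame below the current stack pointer and writes only the lowest bytes of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLACK_BYTES  32u   /* memory below the stack (belongs to someone else) */
#define STACK_SIZE   64u   /* hypothetical task stack size in bytes */
#define GUARD_BYTES  16u   /* bytes at the stack limit that the check inspects */
#define FILL_BYTE    0xA5u /* pattern the stack is pre-filled with */

/* The stack occupies memory[SLACK_BYTES .. SLACK_BYTES + STACK_SIZE - 1]
   and grows downward, toward index SLACK_BYTES (the stack limit). */
static uint8_t memory[SLACK_BYTES + STACK_SIZE];

static void stack_init(void)
{
    memset(memory, 0, SLACK_BYTES);
    memset(memory + SLACK_BYTES, FILL_BYTE, STACK_SIZE);
}

/* Pattern check: true if the guard bytes at the stack limit are untouched. */
static bool guard_intact(void)
{
    for (size_t i = 0; i < GUARD_BYTES; i++) {
        if (memory[SLACK_BYTES + i] != FILL_BYTE) {
            return false;
        }
    }
    return true;
}

/* Simulates a call made with the stack pointer at index sp: reserves
   frame_size bytes below sp and writes only the lowest bytes_written bytes
   of that frame. Returns how many bytes landed below the stack limit. */
static size_t simulate_call(size_t sp, size_t frame_size, size_t bytes_written)
{
    assert(sp >= frame_size && sp <= sizeof memory);
    assert(bytes_written <= frame_size);

    size_t new_sp = sp - frame_size;
    size_t corrupted = 0;

    for (size_t i = 0; i < bytes_written; i++) {
        memory[new_sp + i] = 0x11u;
        if (new_sp + i < SLACK_BYTES) {
            corrupted++;
        }
    }
    return corrupted;
}
```

With the stack pointer 4 bytes above the limit, a 20-byte frame that only has its first 4 bytes written corrupts 4 bytes of foreign memory while the guard pattern stays intact. The check only fires if the function happens to write far enough into its buffer to hit the guard bytes.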

Also, you don’t need a static task; you can program the DWT on the fly with any trigger address that is only determined at runtime.
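For what it's worth, reprogramming a DWT comparator at runtime could look roughly like the sketch below. The register layout (COMP, MASK, FUNCTION), the addresses and the FUNCTION value 0x6 for a write watchpoint are taken from the ARMv7-M debug architecture as I understand it. The function name, the struct type and the parameters are my own, not from any vendor header, so please check everything against the reference manual for your part. The register pointers are passed in so the logic can also be exercised on a host with a fake register block.

```c
#include <assert.h>
#include <stdint.h>

/* One DWT comparator register group (ARMv7-M: COMPn, MASKn, FUNCTIONn). */
typedef struct {
    volatile uint32_t COMP;      /* address to compare against */
    volatile uint32_t MASK;      /* number of low address bits to ignore */
    volatile uint32_t FUNCTION;  /* what kind of access triggers a match */
} dwt_comparator_t;

#define DWT_COMP0_ADDR        0xE0001020u  /* comparator 0 register group */
#define DEMCR_ADDR            0xE000EDFCu  /* Debug Exception and Monitor Control */
#define DEMCR_TRCENA          (1u << 24)   /* enables the DWT unit */
#define DEMCR_MON_EN          (1u << 16)   /* enables the DebugMonitor exception */
#define DWT_FUNC_WRITE_WATCH  0x6u         /* watchpoint on data write */

/* Arms a write watchpoint on the naturally aligned region of
   (1 << mask_bits) bytes starting at addr. */
void dwt_watch_writes(dwt_comparator_t *cmp, volatile uint32_t *demcr,
                      uint32_t addr, uint32_t mask_bits)
{
    assert(mask_bits < 32u);
    assert((addr & ((1u << mask_bits) - 1u)) == 0u);  /* addr must be aligned */

    *demcr |= DEMCR_TRCENA | DEMCR_MON_EN;
    cmp->FUNCTION = 0u;          /* disable the comparator while reconfiguring */
    cmp->COMP = addr;
    cmp->MASK = mask_bits;
    cmp->FUNCTION = DWT_FUNC_WRITE_WATCH;
}
```

On the target you would call it with the real register addresses, e.g. dwt_watch_writes((dwt_comparator_t *)DWT_COMP0_ADDR, (volatile uint32_t *)DEMCR_ADDR, (uint32_t)pxStackLimit, 5), where pxStackLimit stands for whatever address you want to trap on. Note that the largest supported mask size is implementation defined, and with a debugger attached in halting mode the core halts on a match instead of taking the DebugMonitor exception.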