Cortex M4 hard fault finding root cause on LPC4078

RAc · March 19, 2021, 8:14pm

Unless, of course, the stack offset where we check for overflow holds none of the class’s variables but only opaque class elements, such as parts of the vtable, that you have no influence on and that the C++ runtime code may or may not touch… but let’s leave it at that.

Good night!

carlk3 · March 19, 2021, 8:30pm

There is also:

void SomeFn(void)
{
    CustomCppObject l_TempObject = {};
    ...
}

which would cover a lot of cases. The crude way, of course, is to get a bigger hammer:

memset(&l_TempObject, 0, sizeof l_TempObject);

RAc · March 19, 2021, 8:47pm

No, that wouldn’t work; it would leave the object dysfunctional. You can’t overwrite the vtable pointer and other things the runtime system needs.
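To make the distinction concrete, here is a minimal C++17 sketch (all names hypothetical): a `zero_fill` helper that only allows a raw `memset` on trivially copyable types. A class with virtual functions is never trivially copyable, so a memset over its hidden vtable pointer is rejected at compile time. For such classes, value-initialization (`T obj{};`) zeroes the members and still lets the compiler set up the vtable pointer, as long as the class has no user-provided default constructor (if it has one, `{}` just calls it).

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical helper: zero an object with memset only when that is legal.
// Classes with virtual functions are never trivially copyable, so a memset
// over their hidden vtable pointer is rejected at compile time.
template <typename T>
void zero_fill(T& obj) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "memset would clobber runtime data (e.g. a vtable pointer); "
                  "use value-initialization 'T obj{};' instead");
    std::memset(&obj, 0, sizeof obj);
}

// A plain struct: memset is fine here.
struct PlainData {
    int count;
    unsigned char bytes[8];
};

// A polymorphic class with no user-provided constructor: 'Polymorphic p{};'
// zero-initializes the members and then lets the compiler set up the vptr.
struct Polymorphic {
    virtual ~Polymorphic() = default;
    virtual int value() const { return count; }
    int count;
};
```

Calling `zero_fill` on a `Polymorphic` object fails to compile, which is the point: the check moves RAc’s objection from a runtime surprise to a build error.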

richard-damon · March 19, 2021, 9:35pm

It is only stretches of the object that aren’t initialized that can cause problems, so the opaque parts are unlikely to be those. It basically HAS to be arrays of primitive types (or a number of such variables) that are merely default-initialized (which does nothing). If we ACTUALLY initialize all parts to something, then we will detect the overflow.

Now, what would really be better is if more processors had a stack limit register that generated a trap when the stack pointer passed the limit. This would, however, need some careful code to handle the switch from one task’s stack to another: either there needs to be an instruction that loads a new stack pointer and limit at the same time, or a way to disable the check during the change.
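ARMv8-M cores such as the Cortex-M33 provide this kind of check through the MSPLIM/PSPLIM stack limit registers. The host-side model below is not that hardware; it is a hypothetical sketch (`SimCore`, `TaskContext` and both switch functions are made up) of the ordering problem described above: if the new stack pointer is loaded while the old task’s limit is still armed, the switch itself traps, whereas disabling the check for the duration of the change avoids that.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a core with a stack-limit register. Stacks grow
// downward, so a trap fires when SP is written below the limit.
struct SimCore {
    std::uint32_t sp = 0;
    std::uint32_t limit = 0;   // 0 = effectively no limit
    bool trapped = false;

    void write_sp(std::uint32_t new_sp) {
        if (new_sp < limit) trapped = true;  // checked on every SP update
        sp = new_sp;
    }
    void write_limit(std::uint32_t new_limit) { limit = new_limit; }
};

struct TaskContext {
    std::uint32_t saved_sp;
    std::uint32_t stack_base;  // lowest valid address of the task's stack
};

// Naive switch: the new SP is checked against the OLD task's limit.
void switch_naive(SimCore& core, const TaskContext& next) {
    core.write_sp(next.saved_sp);
    core.write_limit(next.stack_base);
}

// Safer switch: drop the check, load SP, then arm the new limit.
void switch_disable_check(SimCore& core, const TaskContext& next) {
    core.write_limit(0);
    core.write_sp(next.saved_sp);
    core.write_limit(next.stack_base);
}
```

With task A’s stack above task B’s, the naive order traps on a perfectly legal switch from A to B, while the disable-then-rearm order only traps once B really runs past its own base.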

carlk3 · March 19, 2021, 9:37pm

I might be out of touch, but I don’t see polymorphism used a lot in embedded code. But even for a closed library, you should be able to tell from the header file whether any virtual methods are used.
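On the header-file point: C++17 can also answer the question at compile time with `std::is_polymorphic`. The two classes below are hypothetical stand-ins for library types. One caveat: a class that only has virtual base classes (and no virtual functions) typically also carries hidden bookkeeping data, but it is not reported as polymorphic.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for classes pulled in from a library header.
struct SensorReading {     // plain data, no virtual functions
    int raw;
    float scaled;
};

struct Driver {            // has a vtable pointer
    virtual ~Driver() = default;
    virtual void poll() {}
};

// Compile-time answer to "does this class carry a vtable pointer?"
template <typename T>
constexpr bool has_vtable = std::is_polymorphic_v<T>;

static_assert(!has_vtable<SensorReading>, "SensorReading has no virtual methods");
static_assert(has_vtable<Driver>, "Driver is polymorphic");
```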

carlk3 · March 19, 2021, 10:09pm

I agree. +1 for Cortex-M33.

RAc · March 20, 2021, 10:55am

Most of my customers use C++ to some degree (I’m talking about industrial embedded only). In some applications it’s a nice-to-have, in others a true lifesaver. Anyway, you don’t lose a whole lot by using it: as long as you don’t use SEH and a few other quirky elements, the footprint overhead of C++ is negligible, you get almost full backward compatibility (with the notable exception of stronger type checking, which is really an advantage) and superior encapsulation mechanisms.

richard-damon · May 8, 2021, 12:03am

RAc:

Also, can you think of a technique to catch the following?
void SomeFn(void)
{
    CustomCppObject l_TempObject;
    ...
}
Yet again, this is leading away from the original issue, and it’s about coding technique preferences where there is no ultimate good or bad.

The key to catching that one is that the constructor needs to initialize every member of the class and not let any fundamental-type elements just default-initialize (which leaves them unchanged). It is letting uninitialized stuff get onto the stack that derails the test.
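A minimal sketch of that advice, assuming C++11 or later: the members of this hypothetical `CustomCppObject` all have default member initializers, so even plain default-initialization (`CustomCppObject l_TempObject;`) writes to every non-padding byte. The helper constructs the object over memory pre-filled with 0xA5, the fill byte FreeRTOS uses for task stacks, and checks that no member kept the pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

// Hypothetical class: every member has a default member initializer, so even
// 'CustomCppObject l_TempObject;' (default-initialization) writes to all of
// its non-padding bytes.
class CustomCppObject {
public:
    static constexpr std::size_t kSize = 32;
    std::size_t used() const { return m_used; }
    std::uint8_t at(std::size_t i) const { return m_buffer[i]; }
private:
    std::size_t m_used = 0;
    std::uint8_t m_buffer[kSize] = {};   // without "= {}" this keeps old bytes
};

// Test helper: construct the object on top of memory that still holds the
// 0xA5 fill pattern, as a fresh region of task stack would.
bool all_members_overwrite_fill() {
    alignas(CustomCppObject) unsigned char storage[sizeof(CustomCppObject)];
    std::memset(storage, 0xA5, sizeof storage);
    auto* obj = new (storage) CustomCppObject;   // default-initialization
    bool ok = obj->used() == 0;
    for (std::size_t i = 0; i < CustomCppObject::kSize; ++i)
        ok = ok && obj->at(i) == 0;
    obj->~CustomCppObject();
    return ok;
}
```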

RAc · May 8, 2021, 9:24am

Richard:

What you describe is good practice but has nothing to do with the “untouched-locations-in-the-stack-that-make-stack-overwrite-not-always-detectable” problem. C++ doesn’t let you make any assumptions about the object layout, so you have no idea which members end up where, or whether some runtime idiosyncrasy puts uninitialized invisible members right into the 4 words that the FreeRTOS stack overflow engine looks at.

You are mixing up issues. Again, I agree with David that stack overflows often go undetected by the runtime test, and the right answer to that is NOT to bloat the code by filling every last byte of stack space just to make the engine more reliable. Even though (again) you are right that it is good practice to initialize C++ member variables explicitly.

Remember that the stack overflow detection mechanism is a debugging aid and does not gain you anything in the field, so any code you add to make that work more reliably will be footprint added without benefits for the shipping software.
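The disagreement hinges on how a pattern check of the last few stack words behaves. The following host-side simulation is a hypothetical sketch (not FreeRTOS code; all names are made up): a 256-byte stack filled with 0xA5 sits above 64 bytes owned by something else, and the check only inspects the lowest 16 bytes (4 words) of the stack. A frame that overflows but writes only its upper part leaves those 16 bytes intact, so the overflow goes unnoticed; the same frame, fully initialized, is caught.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side model of a task stack plus the memory below it.
constexpr std::size_t kBelow = 64;    // bytes owned by some other object
constexpr std::size_t kStack = 256;   // task stack size in bytes
constexpr std::size_t kCheck = 16;    // 4 words, as discussed in the thread
constexpr std::uint8_t kFill = 0xA5;  // FreeRTOS-style fill byte

using Memory = std::array<std::uint8_t, kBelow + kStack>;

// Stack occupies [kBelow, kBelow + kStack) and grows downward from the end.
Memory make_memory() {
    Memory m{};                        // region below the stack: zeros
    for (std::size_t i = kBelow; i < m.size(); ++i) m[i] = kFill;
    return m;
}

// Model one stack frame occupying the 'depth' bytes below the top of memory.
// Only its uppermost 'written' bytes are stored to; the rest of the frame
// (e.g. an uninitialized array) is reserved but never touched.
void run_frame(Memory& m, std::size_t depth, std::size_t written) {
    const std::size_t top = m.size();
    assert(depth <= top && written <= depth);
    for (std::size_t i = top - written; i < top; ++i) m[i] = 0x00;
}

// The pattern check: do the lowest kCheck bytes of the stack still hold kFill?
bool watermark_intact(const Memory& m) {
    for (std::size_t i = kBelow; i < kBelow + kCheck; ++i)
        if (m[i] != kFill) return false;
    return true;
}
```

Both runs overflow the stack by 32 bytes; only the fully written frame trips the check, which is richard-damon’s argument, while the partly written one is the blind spot RAc and David describe.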

richard-damon · May 8, 2021, 6:27pm

The key is simply not to have any untouched locations (other than padding) in the class. If you make ALL your classes value-initialize all their members, you are OK. Yes, this means you need to be careful about using structures from libraries that might not follow this rule.

You can condition this fill code to only be enabled if stack checking is on.
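One way to do that conditioning, sketched under assumptions: `configCHECK_FOR_STACK_OVERFLOW` is the real FreeRTOSConfig.h setting (method 2 is the pattern check), but `STACK_CHECK_INIT` and `RxFrame` are hypothetical, and the fallback `#define` exists only so the snippet compiles on its own. With the pattern check enabled the members are zero-initialized; otherwise they are left default-initialized. The trade-off is that debug and release builds then differ.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: configCHECK_FOR_STACK_OVERFLOW normally comes from
// FreeRTOSConfig.h; it is defaulted here only so the sketch compiles alone.
#ifndef configCHECK_FOR_STACK_OVERFLOW
#define configCHECK_FOR_STACK_OVERFLOW 2
#endif

// Hypothetical macro: expands to a zero initializer only when the
// pattern-based check (method 2 or higher) is enabled, to nothing otherwise.
#if configCHECK_FOR_STACK_OVERFLOW >= 2
#define STACK_CHECK_INIT {}
#else
#define STACK_CHECK_INIT
#endif

struct RxFrame {
    std::uint16_t length STACK_CHECK_INIT;
    std::uint8_t payload[64] STACK_CHECK_INIT;
};

constexpr bool kFillEnabled = configCHECK_FOR_STACK_OVERFLOW >= 2;
```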

I have found that by following this, EVERY stack overflow is found, either by the stack checking code or by a system crash where I can find the guilty task by looking at the current task (which WILL be the guilty party).

RAc · May 9, 2021, 8:20am

True, but then you risk stumbling into the Heisenberg paradox: your debug and release code bases drift so far apart that they exhibit very different runtime behavior, and errors that show up in one don’t show up in the other.

The system crash will eventually occur either way, and the additional initialization code will not help you at all in that case. Basically, this statement agrees with David’s and my observation that the stack overflow mechanism is unreliable. It contradicts what you wrote earlier, that the object initialization would make it more reliable.

richard-damon · May 9, 2021, 10:54am

First point: if you don’t want the split, don’t make it conditional; just spend the cycles. Designer’s choice. As I said, compared to the cost of later filling the buffer, the cost is minimal.

Second point: the key is that you can KNOW where to look for the problem. As I said, doing this will find stack overflows 100% of the time, in the time slot in which they occurred, rather than at ‘some point later’. If you let some overflows slip by and crash later, then you can’t tell as easily which task has the problem.
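To illustrate the attribution argument: FreeRTOS runs its stack check at context-switch time and passes the offending task’s handle and name to `vApplicationStackOverflowHook()`. The model below is a hypothetical host-side sketch of that idea (`SimTask`, `check_on_switch_out` and `run_slice` are invented), showing that a check on the task being switched out names the task that just ran.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

constexpr std::uint8_t kFill = 0xA5;  // fill byte
constexpr std::size_t kCheck = 16;    // bytes inspected at the stack's far end

// Hypothetical task record: a name plus a downward-growing stack, where
// index 0 is the deepest (lowest-address) byte.
struct SimTask {
    std::string name;
    std::vector<std::uint8_t> stack;
    SimTask(std::string n, std::size_t size)
        : name(std::move(n)), stack(size, kFill) {}
};

// Check the task being switched OUT, so a hit names the task that just ran.
bool check_on_switch_out(const SimTask& t, std::string& culprit) {
    for (std::size_t i = 0; i < kCheck && i < t.stack.size(); ++i) {
        if (t.stack[i] != kFill) {
            culprit = t.name;
            return true;
        }
    }
    return false;
}

// Simulate a time slice in which the task writes 'depth' bytes of stack.
void run_slice(SimTask& t, std::size_t depth) {
    const std::size_t n = depth < t.stack.size() ? depth : t.stack.size();
    for (std::size_t i = t.stack.size() - n; i < t.stack.size(); ++i)
        t.stack[i] = 0x00;
}
```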

RAc · May 9, 2021, 11:38am

If you had bothered to read my earlier examples in the thread, you would have seen cases in which the cost can be significant.

Again, it is good practice to avoid uninitialized variables, but you propose doing the right thing for the wrong reasons. I’ve come across many situations where program memory was so scarce that we had to fight for single lines of code; in those scenarios, there is no room for the luxury of adding code just to catch, under debug conditions, a scenario that you can trace equally well with a non-invasive solution.

I still disagree, but there is most likely no point in continuing the debate. I draw on 15+ years of experience writing firmware with and for FreeRTOS, and you doubtless have a lot of experience as well. Just like you (I’m sure), I’ve spent many, many hours debugging and have been able to compare debugging techniques and learn the pros and cons of each. By now I can reliably locate stack overflows (which still account for most runtime problems, as the many cases in which Hartmut’s suggestions succeed testify). The stack signature helps a lot here, as it makes it visually straightforward to scan the top of a stack for the signature and see how close the “no signature” watermark gets to it, regardless of whether there are untouched holes in it or not.
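The watermark scan RAc describes can be sketched as a count of fill bytes from the deepest end of the stack; FreeRTOS’s own `uxTaskGetStackHighWaterMark()` performs a similar count and reports it in words rather than bytes. The function name below is hypothetical. Untouched holes higher up do not inflate the result, because the scan stops at the first byte that was ever written; a never-written region at the very deepest point, however, still counts as free, which is the case debated above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kFill = 0xA5;

// Count how many bytes at the far (low-address) end of a downward-growing
// stack still hold the fill pattern. Index 0 is the deepest byte. The scan
// stops at the first byte that was ever written, so holes above it are
// ignored.
std::size_t high_water_mark_bytes(const std::vector<std::uint8_t>& stack) {
    std::size_t n = 0;
    while (n < stack.size() && stack[n] == kFill) ++n;
    return n;
}
```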

That doesn’t mean my solution is perfect (it works well for me, but that doesn’t mean it must work for everyone). The main benefit I found in using it is that it does not cause unnecessary overhead and minimizes the Heisenberg effect.
