Cortex M4 hard fault finding root cause on LPC4078

Unless, of course, the offset into the stack where we check for stack overflow does not harbor any of the classes’ variables but opaque class elements such as parts of the vtable that you don’t have any influence on and that the CPP runtime code may or may not touch… but let’s leave it at that.

Good night!

There is also:

void SomeFn(void)
{
    CustomCppObject l_TempObject = {};
    ...
}

which would cover a lot of cases. The crude way, of course, is to get a bigger hammer:

memset(&l_TempObject, 0, sizeof l_TempObject);

No, that wouldn’t work; it would leave the object dysfunctional. You can’t overwrite the vtable and the other things the runtime system needs.

It is only stretches of the object that aren’t initialized that can cause problems, so the opaque parts are unlikely to be those. It basically HAS to be arrays of primitive types (or a number of variables) which are just default initialized (which does nothing). If we ACTUALLY initialize all parts to something, then we will detect the overflow.
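To make that concrete, here is a minimal sketch assuming a hypothetical `Telemetry` class (the name and its members are made up for illustration, they are not from the thread). Every member has a default member initializer, so constructing the object writes to each byte it occupies. The helper builds the object inside a buffer pre-filled with a pattern and counts how many bytes still hold that pattern afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

// Hypothetical class: no member is left to default initialization.
struct Telemetry {
    std::uint32_t id = 0;
    std::uint8_t  payload[64] = {};
    std::uint32_t crc = 0;
};

// Constructs a T inside storage pre-filled with `fill` and returns how many
// bytes still hold the fill value afterwards (bytes the construction skipped).
template <typename T>
std::size_t untouchedBytesAfterConstruction(unsigned char fill)
{
    alignas(T) unsigned char storage[sizeof(T)];
    std::memset(storage, fill, sizeof storage);

    T* obj = new (storage) T{};   // value-initialize in place

    std::size_t untouched = 0;
    for (std::size_t i = 0; i < sizeof storage; ++i) {
        if (storage[i] == fill) {
            ++untouched;
        }
    }
    obj->~T();
    return untouched;
}
```

Calling `untouchedBytesAfterConstruction<Telemetry>(0xA5)` reports how much of the object would still look like untouched stack to a pattern-based check. The fill value must differ from the values the initializers write (zero here), otherwise the count is meaningless.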

Now, what would really be better is if more processors had a stack limit register that generated a trap if the stack pointer passed the limit, but this would then need some careful code to handle the switch from one task’s stack to the other, as there needs to be either an instruction to load a new stack pointer and limit at the same time, or a way to disable the check for the change.
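The context-switch concern can be illustrated with a toy model. This is only a sketch of a hypothetical core that re-checks the limit after every register write; it does not claim to model any real architecture, and all names (`ToyCpu`, `TaskContext`, the helper functions) are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// Saved stack pointer and stack limit (lowest legal address) of a task.
struct TaskContext {
    std::uint32_t sp;
    std::uint32_t limit;
};

// Hypothetical core: traps as soon as sp is below limit after any write.
struct ToyCpu {
    std::uint32_t sp;
    std::uint32_t limit;
    bool trapped = false;

    explicit ToyCpu(TaskContext running) : sp(running.sp), limit(running.limit) {}

    void check() { if (sp < limit) trapped = true; }
    void loadSp(std::uint32_t v)    { sp = v;    check(); }
    void loadLimit(std::uint32_t v) { limit = v; check(); }
    // Models a single instruction that replaces both registers at once.
    void loadBoth(TaskContext next) { sp = next.sp; limit = next.limit; check(); }
};

bool trapsWhenSpLoadedFirst(TaskContext from, TaskContext to)
{
    ToyCpu cpu(from);
    cpu.loadSp(to.sp);
    cpu.loadLimit(to.limit);
    return cpu.trapped;
}

bool trapsWhenLimitLoadedFirst(TaskContext from, TaskContext to)
{
    ToyCpu cpu(from);
    cpu.loadLimit(to.limit);
    cpu.loadSp(to.sp);
    return cpu.trapped;
}

bool trapsWhenLoadedTogether(TaskContext from, TaskContext to)
{
    ToyCpu cpu(from);
    cpu.loadBoth(to);
    return cpu.trapped;
}
```

In this model each fixed ordering produces a spurious trap in one direction: loading the stack pointer first fails when switching to a stack at lower addresses, and loading the limit first fails when switching to a stack at higher addresses. Only the combined load (or a temporarily disabled check) is safe both ways.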

I might be out of touch, but I don’t see polymorphism used a lot in embedded. But, even for a closed library, you should be able to tell from the header file whether any virtual methods are used.

I agree. +1 for Cortex-M33.

Most of my customers employ C++ to some degree (I’m talking embedded industrial only). In some applications it’s a nice to have, in some a true life saver. Anyways, you don’t lose a whole lot by using it; as long as you don’t use SEH and a few other quirky elements, the footprint overhead of C++ is negligible, you get almost full backward compatibility (with the notable exception of stronger type checking, which is really an advantage) and superior encapsulation mechanisms.


The key to catching that one is that the constructor needs to initialize every member of the class, and not let any fundamental type elements just default initialize (which just leaves them unchanged). It is letting uninitialized stuff get on the stack that derails the test.

Richard:

What you describe is good practice but doesn’t have anything to do with the “untouched-locations-in-the-stack-that-make-stack-overwrite-not-always-detectable” problem. C++ doesn’t let you make any assumptions about the object layout, so you don’t have any idea what members end up where and whether some runtime idiosyncrasy puts uninitialized invisible members right into the 4 words that the FreeRTOS stack overflow engine looks for.
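For readers following along, the problem being argued about can be sketched with a small simulation. This is not FreeRTOS code: the memory layout, the fill pattern, and all names are assumptions made for illustration, with only the "4 words at the end of the stack" taken from the description above. A frame that overruns the stack is detected only if it actually writes to the guard words.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uint32_t kFill       = 0xA5A5A5A5u; // pattern written when the stack is created
constexpr std::uint32_t kFrameData  = 0xDEADBEEFu; // stands in for values a function stores
constexpr std::size_t   kStackLow   = 8;           // first word that belongs to the task stack
constexpr std::size_t   kGuardWords = 4;           // words inspected by the check
constexpr std::size_t   kMemWords   = 40;

// Words [0, kStackLow) belong to a neighbour; the rest is a descending stack.
using Memory = std::array<std::uint32_t, kMemWords>;

Memory freshMemory()
{
    Memory mem{};  // neighbour words start as zero
    for (std::size_t i = kStackLow; i < kMemWords; ++i) {
        mem[i] = kFill;
    }
    return mem;
}

// Simulates a frame occupying [low, low + len) that never writes the words
// in [holeLow, holeLow + holeLen), e.g. an uninitialized local array.
void pushFrame(Memory& mem, std::size_t low, std::size_t len,
               std::size_t holeLow, std::size_t holeLen)
{
    for (std::size_t i = low; i < low + len; ++i) {
        const bool inHole = (i >= holeLow) && (i < holeLow + holeLen);
        if (!inHole) {
            mem[i] = kFrameData;
        }
    }
}

// The pattern check: has any guard word at the end of the stack changed?
bool overflowDetected(const Memory& mem)
{
    for (std::size_t i = kStackLow; i < kStackLow + kGuardWords; ++i) {
        if (mem[i] != kFill) {
            return true;
        }
    }
    return false;
}

bool neighbourCorrupted(const Memory& mem)
{
    for (std::size_t i = 0; i < kStackLow; ++i) {
        if (mem[i] != 0) {
            return true;
        }
    }
    return false;
}
```

With `pushFrame(mem, 4, 16, 0, 0)` the overrun is caught. With `pushFrame(mem, 4, 16, 8, 4)` the untouched hole happens to cover the guard words, so the neighbour is corrupted while the check still passes.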

You are mixing up issues. Again, I agree with David that stack overflows often go undetected by the runtime tests, and the right answer to that is NOT to bloat the code by filling up stack space to the last bit just to make the engine work more reliably. Even though (again) you are right that it is good practice to initialize C++ member variables explicitly.

Remember that the stack overflow detection mechanism is a debugging aid and does not gain you anything in the field, so any code you add to make that work more reliably will be footprint added without benefits for the shipping software.


The key is to just not have any untouched locations (other than padding) in the class. If you make ALL your classes value initialize all their members you are ok. Yes, this says you need to be careful about using structures from the library which might not follow this rule.

You can condition this fill code to only be enabled if stack checking is on.
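One possible way to do that, sketched under assumptions: `configCHECK_FOR_STACK_OVERFLOW` is the FreeRTOS configuration macro (normally set in `FreeRTOSConfig.h`; it is defined locally here only so the sketch compiles on its own), while `STACK_CHECK_INIT`, `kStackCheckFillEnabled`, and `RxBuffer` are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Normally provided by FreeRTOSConfig.h; defined here only for the sketch.
#ifndef configCHECK_FOR_STACK_OVERFLOW
#define configCHECK_FOR_STACK_OVERFLOW 2
#endif

// Hypothetical helper: expands to an empty-brace initializer when stack
// checking is on, and to nothing (plain default initialization) otherwise.
#if configCHECK_FOR_STACK_OVERFLOW > 0
#define STACK_CHECK_INIT {}
constexpr bool kStackCheckFillEnabled = true;
#else
#define STACK_CHECK_INIT
constexpr bool kStackCheckFillEnabled = false;
#endif

// Hypothetical class with a large primitive array, the kind of member that
// would otherwise leave an untouched stretch on the stack.
struct RxBuffer {
    std::uint16_t length = 0;
    std::uint8_t  data[128] STACK_CHECK_INIT;
};
```

With checking enabled, a local `RxBuffer` has every element of `data` zeroed on construction; with checking disabled, the array is left uninitialized and costs no cycles. Note that this makes the two builds behave differently, which is a deliberate trade-off of this approach.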

I have found that by following this, EVERY stack overflow is found either by the stack checking code, or by a system crash where I can find the guilty task by looking at the current task (which WILL be the guilty party).

True, but then you run the risk of stumbling into the Heisenberg paradox: your debug and release code bases drift so far apart, and show such different runtime behavior, that errors showing up in one code base don’t show in the other.

The system crash will eventually occur either way, and the additional initialization code will not help you at all in that case. Basically this statement agrees with mine and David’s observation that the stack overflow mechanism is unreliable. It contradicts what you wrote earlier that the object initialization would make it more reliable.

First point: if you don’t want the split, don’t make it conditional; just spend the cycles. Designer’s choice. As I said, compared to the cost of later filling the buffer, the cost is minimal.

Second point: the key is that you can KNOW where to look for the problem. As I said, doing this will find stack overflows 100% of the time in the time slot they occurred, rather than ‘some point later’. If you let some overflows slip by to crash later, then you can’t tell as easily which task has the problem.

If you had bothered to read my earlier examples in the thread, you would have seen that there are examples in which the cost can be significant.

Again, it is good practice to avoid uninitialized variables, but you propose doing the right thing for the wrong reasons. I’ve come across many situations where program memory is so scarce that we had to battle for single lines of code; in those scenarios, there is no place for the luxury of adding code simply to catch a scenario under debug conditions with one mechanism when you can trace it just as well with a non-invasive solution.

I still disagree, but most likely there is no point in continuing the debate. I draw from 15+ years of experience writing firmware with and for FreeRTOS, and doubtlessly you have a lot of experience as well. Just like you (I’m sure), I’ve gone through many, many hours of debugging and was able to compare debugging techniques, finding out about the pros and cons of each. By now I can locate stack overflows (which still account for most cases of runtime problems, as the many instances in which Hartmut’s suggestions succeed testify) reliably. The stack signature helps a lot here, as it makes it visually straightforward to scan the top of a stack for it and see how close the “no signature watermark” moves towards it, regardless of whether there are untouched holes in it or not.
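The signature scan described here can also be done in code rather than by eye. A minimal sketch, assuming a byte-wise signature and a descending stack whose lowest address is index 0; `kSignature` and `signatureHeadroom` are hypothetical names, and FreeRTOS itself offers `uxTaskGetStackHighWaterMark()` for a comparable measurement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kSignature = 0xA5;  // assumed fill value for this sketch

// stack[0] is the lowest address, i.e. the end a descending stack grows toward.
// Returns the number of contiguous signature bytes starting at that end, which
// is the headroom the task has never touched.
std::size_t signatureHeadroom(const std::vector<std::uint8_t>& stack)
{
    std::size_t n = 0;
    while (n < stack.size() && stack[n] == kSignature) {
        ++n;
    }
    return n;
}
```

Because the scan stops at the first byte that no longer holds the signature, untouched holes further up in the used region do not change the result. A hole sitting exactly at the deepest point of usage would still make the headroom look slightly larger than it is.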

That doesn’t mean my solution is perfect (it works well for me but that doesn’t mean it must work for everyone). The main benefit I found for using it is that it does not cause unnecessary overhead and minimizes the Heisenberg effect.
