Stack Overflow Detection - Method 2

eadadi · May 23, 2023, 4:10pm

Hi all, I would like to take a small deep dive into the topic in the title.

This is the description in the document:

When a task is first created its stack is filled with a known value. When swapping a task out of the Running state the RTOS kernel can check the last 16 bytes within the valid stack range to ensure that these known values have not been overwritten by the task or interrupt activity. The stack overflow hook function is called should any of these 16 bytes not remain at their initial value.

This method is less efficient than method one, but still fairly fast. It is very likely to catch stack overflows but is still not guaranteed to catch all overflows.

And this is some of the code for the case where the stack grows from high addresses to low addresses:

#if ...
#define ...
const uint32_t * const pulStack = (uint32_t *) pxCurrentTCB->pxStack;
... ulCheckValue = 0xa5a5a5a5;
if ( pulStack[0] != ulCheckValue ||
      pulStack[1] != ... ||
      pulStack[2] != ... ||
      pulStack[3] != ...
) StackOverFlowHook(...)
}

This method validates against overflow by checking the last 4 words within the valid stack range. I would like to discuss the following:

We are assuming (in contrast to the other stack overflow check method, which compares pointers) that 4 bytes of the task’s allocated space are dedicated to the marker (as far as I have seen, dynamic task creation does not allocate an extra 4 bytes, and presumably neither does static task creation).
I have also seen that the stack_high_water_mark, which should indicate the space still available on the stack, starts counting from pxStack, the pointer to the start of the stack, i.e. it points to the marker. So it is reasonable to say that the high water mark is not exactly accurate: the only space left may be the overflow marker itself, yet the high water mark will still report some available space, and the next write to the stack will cause a stack overflow.
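To make the point concrete, here is a minimal host-side sketch of the situation. It is not FreeRTOS code: the array, the depth of 32 words and all sim_* names are hypothetical, and the high water mark is counted in whole 32-bit words for simplicity. It only mimics the idea of filling the stack with 0xa5, counting untouched words from the pxStack end, and checking the lowest 4 words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILL_BYTE    0xa5u        /* fill value written at task creation */
#define CHECK_WORD   0xa5a5a5a5u  /* the same fill value seen as a 32-bit word */
#define STACK_WORDS  32u          /* hypothetical stack depth in 32-bit words */
#define GUARD_WORDS  4u           /* words inspected by the method 2 check */

/* Hypothetical stack; index 0 plays the role of pxStack (lowest address). */
static uint32_t stack_mem[STACK_WORDS];

/* Fill the whole stack with the known value, as at task creation. */
static void sim_create(void)
{
    memset(stack_mem, FILL_BYTE, sizeof stack_mem);
}

/* Pretend the task has used `words` words, growing down from the top. */
static void sim_use(size_t words)
{
    assert(words <= STACK_WORDS);
    for (size_t i = 0; i < words; i++) {
        stack_mem[STACK_WORDS - 1u - i] = 0xdeadbeefu;
    }
}

/* Count untouched words starting at the low end, like a high water mark. */
static size_t sim_high_water_mark(void)
{
    size_t n = 0;
    while (n < STACK_WORDS && stack_mem[n] == CHECK_WORD) {
        n++;
    }
    return n;
}

/* Method 2 style check: returns 1 if any of the lowest 4 words changed. */
static int sim_overflow_detected(void)
{
    for (size_t i = 0; i < GUARD_WORDS; i++) {
        if (stack_mem[i] != CHECK_WORD) {
            return 1;
        }
    }
    return 0;
}
```

With 28 of the 32 words used, sim_high_water_mark() still reports 4 free words and the check passes, but using a single word more already trips sim_overflow_detected(). In this sketch the reported free space includes the words that the check treats as off limits.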

What do you think?

Regards,
E

richard-damon · May 23, 2023, 5:16pm

Small point, FreeRTOS uses 4 32-bit WORDS, not 4 Bytes for the stack overflow checking.

Yes, with overflow checking, if you ever get to less than 16 bytes (4 32-bit words) of the allocated stack unused, you get an overflow error, so the size and high water mark aren’t accurate for “usable” stack, just allocated stack.
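If you do want a "usable" figure, the offset can be applied in application code. The following is only a sketch under assumptions: the helper names are hypothetical, the input is assumed to be a word count such as the one returned by uxTaskGetStackHighWaterMark(), and a stack word is assumed to be 32 bits wide so that the method 2 check covers 4 stack words.

```c
#include <assert.h>
#include <stddef.h>

/* Stack words inspected by the method 2 check (assumes 32-bit stack words). */
#define GUARD_WORDS 4u

/* Words the task could still use before the method 2 check would fire.
 * `high_water_mark_words` is assumed to be a high water mark in words. */
static size_t usable_headroom_words(size_t high_water_mark_words)
{
    return (high_water_mark_words > GUARD_WORDS)
               ? (high_water_mark_words - GUARD_WORDS)
               : 0u;
}

/* Usable part of an allocation of `allocated_words` stack words. */
static size_t usable_stack_words(size_t allocated_words)
{
    assert(allocated_words >= GUARD_WORDS);
    return allocated_words - GUARD_WORDS;
}
```

For example, a reported high water mark of 10 words would correspond to 6 words of real headroom, and a reported value of 4 or less to none.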

In my opinion, if you are trying to cut things that close, you should probably understand that offset anyway. My guess is that changing the code to adjust the values based on that form of overflow checking would be confusing, especially for statically created tasks: should the size listed be the size given by the stack array, or the usable size of the stack (and should that difference be an exposed value in the API)?

aggarg · May 24, 2023, 5:05am

Just to add to what @richard-damon said, stack overflow checking is a debugging aid that helps the application writer catch stack overflows and tune the stack sizes. Once you have tuned your stack sizes, you should turn off stack overflow checking in production code.
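One way to arrange this is a conditional in FreeRTOSConfig.h. This is only a sketch: configCHECK_FOR_STACK_OVERFLOW is the real configuration constant, but using NDEBUG as the switch between debug and production builds is an assumption, and your build system may use a different macro.

```c
/* FreeRTOSConfig.h fragment (sketch).
 * Assumption: NDEBUG is defined for production builds only. */
#ifdef NDEBUG
    /* Production: no stack overflow check at context switch. */
    #define configCHECK_FOR_STACK_OVERFLOW    0
#else
    /* Debug: method 2, checks the known fill values at the end of the stack. */
    #define configCHECK_FOR_STACK_OVERFLOW    2
#endif
```

When the value is non-zero, the application also has to provide the stack overflow hook function.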

eadadi · May 28, 2023, 9:18am

Hi,

Yea, I intended to write words and not bytes there. Thank you for your answers.

@aggarg
Could you clarify what you meant by “turning off stack overflow checking in production”?
How is turning off this feature in production beneficial in your view? Are you thinking of cases where less memory is available in production than before it? Otherwise, for a statically allocated design that already fits with this feature enabled in pre-production, there is no direct gain in available memory.
Moreover, don’t you think this feature might be useful for detecting some of the cases where one task’s stack corrupts another task’s stack?

By the way, apologies for the late reply.

richard-damon · May 28, 2023, 10:46am

I think the main reason it is suggested to remove this sort of test in production is that doing it takes time, so removing it can make the system a bit more performant.

There is no requirement to do so.

aggarg · May 29, 2023, 5:08am

As @richard-damon mentioned, it is only for performance reasons.

You can keep it if you are okay with paying the performance price.