Using the ARM MPU to detect stack overflow

dc42 · November 1, 2021, 2:01pm

One of the issues we face is determining how much stack to allocate to each task in a system in which RAM is precious. We have configCHECK_FOR_STACK_OVERFLOW set to 2. However, not all stack overflows get detected, because our code often allocates buffers of up to 256 bytes on the stack, which for performance reasons are not filled with data when they are created. The stack can therefore overflow without overwriting the 16-byte canary.

As most of the processors we use are ARM cores with an MPU, we are wondering about using the MPU to offer improved stack protection. So I am considering using it to protect the area of memory immediately below the stack. Preferably, we would protect 256 bytes of memory without wasting 256 bytes of RAM per task. So I am thinking of allocating the task stacks consecutively, so that when a task is running, the protected memory belongs to the stack of another task. This assumes that a task never has reason to access the stack of another task, and that ISRs never need to access any task stack (which I believe is true in our application). It would mean that all task stacks would need to be a multiple of 256 bytes long and allocated on 256-byte boundaries.
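
The consecutive layout described above can be sketched as plain offset arithmetic. This is only an illustration: `StackSlot`, `layout_stacks` and `guard_offset` are hypothetical names, and reserving one dedicated 256-byte guard below the lowest stack is an assumption (that stack has no neighbour below it, so something the tasks never write must sit there).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GUARD_BYTES 256u /* size of the protected area, as proposed in the post */

/* Hypothetical description of one task's slice of a shared stack arena. */
typedef struct {
    uint32_t offset; /* byte offset of the stack's lowest address in the arena */
    uint32_t size;   /* stack size in bytes, a multiple of GUARD_BYTES */
} StackSlot;

/* Lay out n stacks back to back, lowest first. The first GUARD_BYTES of the
 * arena are assumed to be a dedicated guard for the lowest stack.
 * Returns the total arena size, or 0 if any size is not a non-zero multiple
 * of GUARD_BYTES. */
static uint32_t layout_stacks(const uint32_t *sizes, size_t n, StackSlot *slots)
{
    uint32_t offset = GUARD_BYTES;
    for (size_t i = 0; i < n; i++) {
        if (sizes[i] == 0u || (sizes[i] % GUARD_BYTES) != 0u) {
            return 0u;
        }
        slots[i].offset = offset;
        slots[i].size = sizes[i];
        offset += sizes[i];
    }
    return offset;
}

/* For a stack that grows downward, the guard is the GUARD_BYTES immediately
 * below its lowest address, i.e. the top of the previous task's stack. */
static uint32_t guard_offset(const StackSlot *slot)
{
    assert(slot->offset >= GUARD_BYTES);
    return slot->offset - GUARD_BYTES;
}
```

If the arena itself starts on a 256-byte boundary, every stack base and every guard computed this way is also 256-byte aligned, which is what the MPU region would require.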

FreeRTOS would need to be modified to update that MPU region descriptor when it switches to a task.
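
As a rough sketch of what that per-switch update might compute on an ARMv7-M MPU (Cortex-M3/M4/M7), the function below builds the RBAR and RASR values for a 256-byte no-access region just below a stack. `GuardRegion` and `make_guard` are hypothetical names, not FreeRTOS APIs. It also assumes the guard uses a region number higher than any region that grants access to the same memory, since on ARMv7-M the highest-numbered matching region takes priority.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-M MPU register field encodings. */
#define MPU_RBAR_VALID     (1u << 4)  /* take the region number from this write */
#define MPU_RASR_ENABLE    (1u << 0)
#define MPU_RASR_SIZE_256B (7u << 1)  /* region size = 2^(SIZE+1) = 256 bytes */
#define MPU_RASR_AP_NONE   (0u << 24) /* AP = 0b000: no access at any privilege */
#define MPU_RASR_XN        (1u << 28) /* execute never */

/* Hypothetical container for the two register values. */
typedef struct {
    uint32_t rbar;
    uint32_t rasr;
} GuardRegion;

/* Build the register values that make the 256 bytes below stack_lowest_addr
 * inaccessible. The stack base must be 256-byte aligned. */
static GuardRegion make_guard(uint32_t stack_lowest_addr, uint32_t region_number)
{
    assert((stack_lowest_addr & 0xFFu) == 0u);
    assert(stack_lowest_addr >= 256u);
    assert(region_number < 16u); /* REGION field in RBAR is 4 bits wide */

    GuardRegion g;
    g.rbar = (stack_lowest_addr - 256u) | MPU_RBAR_VALID | region_number;
    g.rasr = MPU_RASR_XN | MPU_RASR_AP_NONE | MPU_RASR_SIZE_256B | MPU_RASR_ENABLE;
    return g;
}
```

On the target, the context-switch code would then write these two values to the MPU's RBAR and RASR registers (for example `MPU->RBAR` and `MPU->RASR` when using CMSIS) for the task being switched in.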

Have I missed something? Has anyone done something similar before?

RAc · November 1, 2021, 2:14pm

Hi David,

I’ve been thinking about this myself, but abandoned the plan.

Basically, the issue is that because embedded systems are typically highly interdependent collections of tasks, it doesn’t help robustness if you protect individual tasks’ stacks. Whichever code tries to access the memory illegally in the first place has the problem, so even if the victim task could be protected, the culprit task will fault, which leaves the entire system dysfunctional. Trying to recover an individual task in the access violation fault handler would require quite some AI. The only possible benefit would be debugging information sampled at the time of the fault.

I’d think that the coding effort needed to render this would not justify the possible benefits.

But I’ll happily be proven wrong!

richard-damon · November 1, 2021, 2:28pm

FreeRTOS can run tasks in a restricted mode (for ports with an MPU). The issue is that MPUs are fairly coarse-grained in their protection ability, which rather defeats this sort of use in memory-limited environments.

What can be done for your case is first to modify the stack overflow checks so that level 2 also does the level 1 check, and then perhaps to add an explicit check in leaf routines by doing a taskYIELD, since the check runs at the context switch.
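
A minimal sketch of such a combined check, written as a stand-alone function rather than as a patch to the kernel's own macros. `TaskStackInfo` is a hypothetical stand-in for the two TCB fields involved, and the fill pattern and 16-byte canary length are assumed to match the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FILL_WORD    0xA5A5A5A5u /* assumed stack fill pattern */
#define CANARY_WORDS 4u          /* 16 bytes at the lowest end of the stack */

/* Hypothetical stand-in for the TCB fields the checks need. */
typedef struct {
    const uint32_t *stack_base;   /* lowest address of the stack */
    const uint32_t *top_of_stack; /* stack pointer saved at the last switch */
} TaskStackInfo;

/* Combined check for a stack that grows downward. */
static bool stack_overflowed(const TaskStackInfo *t)
{
    assert(t != NULL);

    /* Level-1 style: the saved stack pointer is at or below the stack base.
     * This catches a task that jumped past the canary without writing it. */
    if (t->top_of_stack <= t->stack_base) {
        return true;
    }

    /* Level-2 style: the canary words at the base have been overwritten. */
    for (size_t i = 0; i < CANARY_WORDS; i++) {
        if (t->stack_base[i] != FILL_WORD) {
            return true;
        }
    }
    return false;
}
```

The pointer check only helps if a context switch happens while the stack pointer is still out of range, which is why forcing a switch from deep leaf routines makes the check more likely to fire.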

Another method that might help would be to create the tasks statically, and intentionally place the stacks directly after areas of memory that let you detect ‘softly’ that an overflow has occurred.

dc42 · November 1, 2021, 2:47pm

@RAc, thanks for responding. To be clear, I am not looking to protect tasks from each other. I am merely looking for a way to detect the type of stack overflow that the current scheme does not detect. I don’t need to recover from a stack overflow, I just need to know that it happened, and by how much so that I can log it and shut down. With luck this should mean that a greater proportion of stack overflows are discovered during beta testing.
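
To illustrate the "by how much" part, here is a sketch of how a MemManage fault handler might turn the fault status into an overflow depth, assuming a guard region sits directly below the stack. `overflow_depth` is a hypothetical helper. On a Cortex-M3/M4/M7 the `cfsr` and `mmfar` arguments would come from the Configurable Fault Status Register and the MemManage Fault Address Register (`SCB->CFSR` and `SCB->MMFAR` in CMSIS).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CFSR_DACCVIOL  (1u << 1) /* MemManage: data access violation */
#define CFSR_MMARVALID (1u << 7) /* MMFAR holds the faulting address */

/* If the fault was a data access inside the guard below the stack, store how
 * many bytes below the stack's lowest address it landed and return true. */
static bool overflow_depth(uint32_t cfsr, uint32_t mmfar,
                           uint32_t stack_lowest_addr, uint32_t guard_bytes,
                           uint32_t *depth)
{
    assert(depth != NULL);

    if ((cfsr & CFSR_DACCVIOL) == 0u || (cfsr & CFSR_MMARVALID) == 0u) {
        return false; /* not a data violation, or no valid fault address */
    }
    if (mmfar >= stack_lowest_addr) {
        return false; /* faulting address is not below the stack */
    }
    if (stack_lowest_addr - mmfar > guard_bytes) {
        return false; /* too far below to be attributed to this guard */
    }
    *depth = stack_lowest_addr - mmfar;
    return true;
}
```

The depth reported this way is that of the first access that faulted, so it is a lower bound on how far the task would have gone, not the full extent of the overflow.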

@richard-damon, thanks too. I do create the tasks statically. There is never a “good” area of memory for the stack to overflow into and we can’t afford the RAM needed to increase all the task stacks by 256 bytes or more. That’s why I was looking for a way to detect if a stack tries to overflow into a region of RAM that is being used for something, but that the task has no reason to write to. The ARM MPU supports regions of 32 bytes and higher powers of 2, so protecting just 256 or perhaps 512 bytes is entirely possible.
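
The size and alignment rules mentioned here can be written down as a small check. This sketch assumes the ARMv7-M MPU rules (size a power of two of at least 32 bytes, base aligned to the size). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if (base, size) is a legal ARMv7-M MPU region: the size is a power of
 * two of at least 32 bytes and the base is aligned to the size. */
static bool mpu_region_is_valid(uint32_t base, uint32_t size)
{
    bool power_of_two = (size != 0u) && ((size & (size - 1u)) == 0u);
    return power_of_two && size >= 32u && (base & (size - 1u)) == 0u;
}

/* Value of the RASR SIZE field for a region size, where the hardware
 * interprets the field as: region size = 2^(SIZE+1) bytes. */
static uint32_t mpu_size_field(uint32_t size)
{
    assert(mpu_region_is_valid(0u, size));
    uint32_t field = 0u;
    while ((2u << field) < size) {
        field++;
    }
    return field;
}
```

With these rules, a 256-byte guard is legal only at addresses that are multiples of 256, which is where the requirement for 256-byte stack alignment comes from.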

hs2 · November 1, 2021, 3:13pm

I think without HW support there is no other automatic way than the existing stack checking facilities. With Cortex-M33 and its stack limit registers this has been improved.
I even stepped through the worst-case code paths in terms of stack usage to find a feasible minimum stack size.
On the other hand, it could be possible to set up an adjacent MPU region as a red zone.
But if you’re already short of RAM it might be difficult to reserve enough memory for this purpose.

richard-damon · November 1, 2021, 3:16pm

Look at the restricted tasks API, as that runs the tasks with the MPU enabled. Each task is given a region map of what it is allowed to access, so if you make sure that you don’t allocate something the task needs to write just before its stack, you should be able to set them up. Depending on what processor you are using, you will have different restrictions on the size and alignment of the regions, so doing this might mean you have to control how the linker puts everything together.

rtel · November 1, 2021, 3:20pm

The MPU port for the ARMv7M (Cortex-M3/4/7) uses the MPU to detect stack overflow - but as Richard-D already pointed out, the memory region size and alignment requirements will most likely mean you lose more RAM than you save.

aggarg · November 1, 2021, 6:00pm

The portion below the stack is already protected, as the task is only granted access to its stack. This assumes that the application code does not explicitly grant access to that memory using user-definable MPU regions (see “FreeRTOS-MPU, a FreeRTOS with memory protection support”). Do you see a case where an overflow is not getting detected?

Even though the scheme you mentioned should not be needed, as I explained above, it is still possible to implement it using user-definable MPU regions, and it should not need any change in FreeRTOS (see “FreeRTOS-MPU, a FreeRTOS with memory protection support”).

Thanks.

richard-damon · November 1, 2021, 6:18pm

If the current application is not built under the assumption that these tasks are as restricted as the MPU version defaults, they may want to just add allow regions covering most of memory, let FreeRTOS provide its built-in region for the stack, and leave the block just below it restricted to catch the overflow.

aggarg · November 1, 2021, 6:47pm

Thank you for the clarification - you are right, if the FreeRTOS MPU port is not being used, then the application would need to program the MPU. In that case, there are two options:

1. Use a static MPU configuration, which would result in memory being wasted (as described by @dc42 already).
2. Use a dynamic MPU configuration, which would need support in FreeRTOS to re-program the MPU on every context switch.

If a dynamic configuration is needed, the following can be a faster path:

1. Use the FreeRTOS MPU port for your architecture.
2. Create all the tasks as privileged by OR’ing the task priority with portPRIVILEGE_BIT (see task.h in the FreeRTOS-Kernel repository on GitHub). This is not the best approach in terms of isolation, but it ensures that the existing application works with the MPU port without any modifications.
3. Then place the task stacks consecutively as you mentioned above and use a user-definable MPU region to make the memory below the task stack inaccessible.

Thanks.

rgarnett · December 3, 2021, 8:37am

Hi David,

I like that idea. I use the MPU in M7s to protect the ITCM RAM from writes and to control where I want to cache and where not. It wouldn’t be hard to use the MPU to set boundaries for stacks, although determining the boundary may not always be simple. I guess the MPU is a more sophisticated protection than a stack limit register.

Best regards
Rob

rgarnett · December 3, 2021, 8:44am

I would think that if a task wrote into a memory section “illegally”, it would be better to have a hard fault than to let the task that owns the illegally written memory do something unsafe based on the corrupt data.

In a lot of cases, having a hard fault would give a better outcome than the system soldiering on with bad data.

Stacks are a mechanism to make a small memory look big, but they come with a lot of risks which are not always mitigated.