Posix port and small stack size, the stack used is not the one in the TCB

rowbearto · January 5, 2021, 11:15pm

I’m using the official posix port and I am encountering an issue when the task stack size is small.

In my application the task stack size is 2048 bytes. This is not a problem on ARM target. But it may be a potential problem when running with the POSIX port.

In portable/ThirdParty/GCC/Posix/port.c, pxPortInitialiseStack(), the posix port has this code to set the stack location and size with pthreads:

pthread_attr_setstack( &xThreadAttributes, pxEndOfStack, ulStackSize );

I checked the return value of this call with a small stack size of 2048 bytes and it fails: it returns 22, which is EINVAL.

According to the docs of pthread_attr_setstack it can fail with EINVAL if the stack size is too small:

EINVAL stacksize is less than PTHREAD_STACK_MIN (16384) bytes.
              On some systems, this error may also occur if stackaddr or
              stackaddr + stacksize is not suitably aligned.
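A minimal sketch of how the return value could be checked in isolation, outside FreeRTOS. The helper name try_setstack is hypothetical, and the page-aligned allocation is an assumption made so that only the size check can fail:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>    /* PTHREAD_STACK_MIN */
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: returns what pthread_attr_setstack() returns for a
 * page-aligned buffer of stack_bytes bytes (0 on success, an errno value
 * such as EINVAL on failure). Returns -1 if the setup itself fails. */
int try_setstack( size_t stack_bytes )
{
    pthread_attr_t attr;
    void * buffer = NULL;
    long page = sysconf( _SC_PAGESIZE );
    int rc;

    if( ( page <= 0 ) || ( posix_memalign( &buffer, ( size_t ) page, stack_bytes ) != 0 ) )
    {
        return -1;
    }

    if( pthread_attr_init( &attr ) != 0 )
    {
        free( buffer );
        return -1;
    }

    /* This is the return value that is easy to overlook. */
    rc = pthread_attr_setstack( &attr, buffer, stack_bytes );

    pthread_attr_destroy( &attr );
    free( buffer );
    return rc;
}
```

Calling try_setstack() with a size below PTHREAD_STACK_MIN should give EINVAL, and with PTHREAD_STACK_MIN itself it should give 0.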

After this failure, when I stop inside the created task with the debugger and check the stack pointer (the ‘esp’ register), it is outside the stack area that the TCB thinks it is using. So pthreads used its own stack instead of the one in the TCB. I didn’t notice any other failures; my application seems to run even with this discrepancy between the stack addresses in the TCB and the real stack. But maybe there will be issues?

If I change my stack size to a little more than 16384 then the call to pthread_attr_setstack() is successful and then inside the task the esp register is in the expected range of the stack addresses in the TCB.

I first looked into this because I was getting stack overflows on target that I didn’t see in posix. I would have liked to debug the stack overflows in posix before I went to target. I suppose that since the minimal stack size is much larger in the posix port I won’t be able to detect this kind of stack overflow there? Maybe there is a way for the posix port to address this?

Luckily I can easily set the stack size with #ifdef depending on whether it is a target build or a posix build. So for my posix build I can raise the stack size to PTHREAD_STACK_MIN. However, it would be nice if I didn’t have to do that and the posix port could force a stack size of PTHREAD_STACK_MIN when the requested stack is too small. But I don’t see an easy way to do this?
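For illustration, the #ifdef workaround could look roughly like the sketch below. APP_BUILD_POSIX_SIM, APP_TASK_STACK_BYTES and app_stack_word_t are hypothetical names, not FreeRTOS ones; the stack word type is a stand-in for the port's StackType_t:

```c
#include <limits.h>    /* PTHREAD_STACK_MIN */
#include <pthread.h>
#include <stddef.h>

/* APP_BUILD_POSIX_SIM is a hypothetical macro that the simulator build would
 * define, for example with -DAPP_BUILD_POSIX_SIM. */
#ifdef APP_BUILD_POSIX_SIM
    /* Large enough for pthread_attr_setstack() to accept the buffer. */
    #define APP_TASK_STACK_BYTES    ( ( size_t ) PTHREAD_STACK_MIN )
#else
    /* Size used on the real target. */
    #define APP_TASK_STACK_BYTES    ( ( size_t ) 2048 )
#endif

/* Stand-in for the port's StackType_t (an assumption in this sketch). */
typedef unsigned long app_stack_word_t;

/* xTaskCreate() takes its stack depth in words, not bytes. */
size_t app_task_stack_depth_words( void )
{
    return APP_TASK_STACK_BYTES / sizeof( app_stack_word_t );
}
```

The cost of this approach is that every task stack in the simulator build grows to at least PTHREAD_STACK_MIN.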

Or maybe the posix port should adjust the stack pointers in the TCB after the thread is started to point to the “real stack” in case the earlier call to pthread_attr_setstack() failed?

Perhaps the documentation of the posix port should mention this limitation and advise users to not use small stacks in the posix port and define larger stacks instead?

EDIT: Just realized if I make my stack sizes the minimum of PTHREAD_STACK_MIN for posix then it also requires me to allocate much more space in my FreeRTOS heap (configTOTAL_HEAP_SIZE) due to the stacks getting pvPortMalloc() off the heap. So I’m thinking maybe it would be better to keep the original stack size allocated in the heap and then if the pthread_attr_setstack() failed, adjust the values of the stack pointers stored in the TCB after the thread starts? This way my posix port can still properly check if I allocate too much in the heap. If this is not done then is there any danger if the real stack doesn’t correspond to the one pointed to in the TCB?

cc: @cobusve, @alfred2g

gedeonag · January 6, 2021, 12:10am

Hi Rob,
Good Catch!
I think the best/smoothest way is to check the passed stack size from FreeRTOS and if it is below the pthread minimum stack size we allocate that much, and we just ignore the extra allocated memory?
Memory leaks will still be detected I believe as the perceived size is still small in FreeRTOS
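A rough sketch of that idea, assuming the port allocates the enlarged stack itself rather than taking it from the FreeRTOS heap. Both helper names are hypothetical and this is not the actual patch; posix_memalign is used instead of plain malloc only to sidestep alignment requirements on some systems:

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>    /* PTHREAD_STACK_MIN */
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: the size the port would actually hand to pthreads. */
size_t port_effective_stack_bytes( size_t requested_bytes )
{
    size_t minimum = ( size_t ) PTHREAD_STACK_MIN;

    return ( requested_bytes < minimum ) ? minimum : requested_bytes;
}

/* Hypothetical helper: allocates a stack outside the FreeRTOS heap and
 * attaches it to attr. Returns the buffer, which the caller frees once the
 * thread has exited, or NULL on failure. */
void * port_attach_stack( pthread_attr_t * attr, size_t requested_bytes )
{
    size_t actual_bytes = port_effective_stack_bytes( requested_bytes );
    long page = sysconf( _SC_PAGESIZE );
    void * buffer = NULL;

    if( ( page <= 0 ) || ( posix_memalign( &buffer, ( size_t ) page, actual_bytes ) != 0 ) )
    {
        return NULL;
    }

    if( pthread_attr_setstack( attr, buffer, actual_bytes ) != 0 )
    {
        free( buffer );
        return NULL;
    }

    return buffer;
}
```

FreeRTOS would still account for only the requested size in its own heap; the extra memory lives in the host process.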

The ESP register is different when the call fails because ‘pxEndOfStack’ is not used in that case; pthread supplies a stack of its own instead.

“But maybe there will be issues?” Most probably: at some point, or on some other machine, issues will show up.

“Just realized if I make my stack sizes the minimum of PTHREAD_STACK_MIN for posix then it also requires me to allocate much more space in my FreeRTOS heap (configTOTAL_HEAP_SIZE) due to the stacks getting pvPortMalloc() off the heap”: that would be the case if you request the larger size from FreeRTOS itself, but it should not be a problem if the posix port handles it internally (we use normal free and malloc internally).

I will work on a patch for that, do you have other ideas or expect a problem with that approach?

Thanks,
Alfred

rowbearto · January 6, 2021, 1:35am

Hi Alfred:

I think that is a great idea to malloc() the stack if the size is too small!

I think it would then be required to overwrite the TCB structure values with the new pxEndOfStack, pxStack and pxTopOfStack based on the malloc()? I know that the pxPortInitialiseStack() returns pxTopOfStack that could be the result of the malloc, but not sure how to overwrite the pxEndOfStack and pxStack?

And then what about task destruction? At task destruct time you’d have to restore the original value of pxStack so that it can be used in the call to vPortFree() inside of prvDeleteTCB()? Probably good to restore the pxEndOfStack and pxTopOfStack as well at destruct time?

A big bonus for me would be if I could detect stack overflows that would occur on my target using the taskCHECK_FOR_STACK_OVERFLOW() macro that is checked between task switches. That is what led me here in the first place. taskCHECK_FOR_STACK_OVERFLOW() checks if the value at pxStack is still equal to 0xa5a5a5a5. Ideally I would want this pxStack to be located the original stack size in bytes below the value the esp register has just before the task entry point (the call to pxThread->pxCode() in prvWaitForStart()). I suspect that the pthread implementation will put stuff on the stack that I don’t care about, so esp may be significantly less than pxEndOfStack when my task entry function is called. Making pxStack simply pxEndOfStack less the stack size may therefore leave too little usable stack. The esp register exists only on x86; I suppose it would be a different register on x86_64 or ARM/posix architectures, so this could be hard to make portable?
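To make the fill-pattern check concrete, here is a simplified model of it. This is not the FreeRTOS macro itself; the function names and the number of bytes inspected are assumptions for illustration. It also shows why the check only helps if the thread really runs on the buffer that pxStack points to:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE      0xa5u
#define STACK_CHECK_BYTES    16u    /* bytes inspected at the low end */

/* Fill a freshly allocated stack with the known pattern. */
void stack_fill( uint8_t * stack_low, size_t stack_bytes )
{
    memset( stack_low, STACK_FILL_BYTE, stack_bytes );
}

/* Returns true if the lowest bytes of the stack no longer hold the pattern,
 * which suggests a descending stack grew past its low end. */
bool stack_pattern_overwritten( const uint8_t * stack_low )
{
    for( size_t i = 0; i < STACK_CHECK_BYTES; i++ )
    {
        if( stack_low[ i ] != STACK_FILL_BYTE )
        {
            return true;
        }
    }

    return false;
}
```

If pthreads silently uses a different stack, the pattern in the FreeRTOS-allocated buffer is never touched and the check can never fire.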

EDIT: I see that for gcc and clang you could retrieve the stack pointer just before the task entry point by using __builtin_frame_address . That would work on different architectures but maybe not all compilers? gcc is good enough for me though if a compile time option could be added to use this for setting pxStack at task entry point time.

EDIT2: Maybe instead of checking if the stack size is too small, malloc() should always be used to allocate the stack to a size of PTHREAD_STACK_MIN+desired_stack_size? This is because pthread implementation probably puts its own stuff on the stack and we don’t want that to take away from our stack?

Thanks,
Rob

rtel · January 6, 2021, 2:53am

I’m not familiar with the Linux/POSIX port at all - but if it operates in any way similar to the Windows port, then the “stack” allocated by xTaskCreate() is not used as a stack at all as the Windows thread creates its own stack. Instead the stack allocated by FreeRTOS is just used to hold a descriptor of a fixed size in all cases.

Not sure how you could use the POSIX port to debug a stack overflow that occurs on a real target, as the stack usage will be very different on a real target?

gedeonag · January 6, 2021, 3:37am

@rtel The posix port uses the stack passed by FreeRTOS, which was previously allocated in tasks.c; in that respect it differs from the Windows port.

@rowbearto I will read up on how tasks.c allocates the stack it passes to the posix port, since we pass that stack on to pthread_attr_setstack. If we allocate a new one, I need to find a way to tell FreeRTOS to use it instead of its own, which is smaller (possibly something like the Windows port).
We could use realloc, but it doesn’t guarantee that the same pointer is returned when we ask for a larger block… I’ll do some thinking/reading and get back to you.

gedeonag · January 6, 2021, 3:48am

Even if pthread doesn’t put stuff on the stack, the port.c implementation definitely does, by way of context switches and waiting for signals (function call frames)… but those should be unwound when control is back in the application thread,
so I think having a stack which is too tight might cause some issues.

rowbearto · January 6, 2021, 3:50am

@gedeonag: Thanks!

@rtel: In the windows port using the stack from the Windows thread, does it mean that the stack pointers in the TCB, mainly pxStack and pxEndOfStack are not pointing to the Windows stack?

I also think that the stack usage in posix would be similar to a real target, so a similar stack overflow could be detected. The same functions are getting called and the same local variables are used. There will be some differences, such as which registers are pushed to the stack upon function calls, but I think this will be a minor difference, and the function call nesting is usually not very deep.

gedeonag · January 6, 2021, 3:53am

That’s why we decided to go with that approach. The original submitter did it like the Windows port, and we suggested using the same stack.

rowbearto · January 6, 2021, 4:10am

@gedeonag: From my study of the tasks.c code, it looks like the TCB struct is defined in tasks.c. So the only way to modify it (such as putting a value into pxStack) would be to add a new function to tasks.c itself.

My proposal would be that in prvWaitForStart() in port.c, just before the call to the task entry point, we overwrite the pxStack in the current TCB (by calling the new function in tasks.c). Before overwriting pxStack we save the old value. Then, after the task returns from its entry point, we restore the old pxStack value that we saved (so it can be used for the vPortFree() call later). This approach does have issues if it is possible for FreeRTOS to terminate a task without having the task exit its entry point; I don’t know if that is possible or not? In that case pxStack would not be restored to its old value. In my application my tasks don’t end so it’s not an issue for me, but it should be explored in the official port.
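A sketch of the save/overwrite/restore sequence, using a mock structure. mock_tcb_t is NOT the real FreeRTOS TCB, and run_with_real_stack, probe_t and probe_entry are hypothetical names used only to show the ordering:

```c
#include <stddef.h>

/* Mock of the two TCB fields under discussion; not the real TCB_t. */
typedef struct
{
    unsigned long * pxTopOfStack;
    unsigned long * pxStack;
} mock_tcb_t;

typedef void ( * task_entry_t )( void * arg );

/* Runs entry() with tcb->pxStack temporarily pointing at real_stack_low,
 * then restores the original pointer so it can still be freed later. */
void run_with_real_stack( mock_tcb_t * tcb,
                          unsigned long * real_stack_low,
                          task_entry_t entry,
                          void * arg )
{
    unsigned long * saved = tcb->pxStack;

    tcb->pxStack = real_stack_low;
    entry( arg );
    /* Never reached if the task is torn down without returning. */
    tcb->pxStack = saved;
}

/* Example entry point: records the pxStack value visible while it runs. */
typedef struct
{
    mock_tcb_t * tcb;
    unsigned long * seen;
} probe_t;

void probe_entry( void * arg )
{
    probe_t * probe = arg;

    probe->seen = probe->tcb->pxStack;
}
```

The comment before the restore marks the weak spot: a task that is deleted without returning from its entry point would leave the overwritten pointer in place.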

And I think that pxStack should be written with the current stack pointer (just before the task entry point) less the stack size (so the stack size needs to be made available at that point somehow). With gcc/clang the stack pointer can be obtained using __builtin_frame_address (perhaps create a #define in portmacro.h to wrap it?). That’s fine for me because I use both gcc and clang and not other compilers. Other compilers could use pxEndOfStack instead of the current stack pointer.
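One possible shape for such a wrapper, as a sketch only. APP_STACK_REFERENCE and stack_low_limit are hypothetical names, a descending stack is assumed, and note that __builtin_frame_address( 0 ) returns the current frame address, which only approximates the stack pointer:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper: use the compiler builtin where available, otherwise
 * fall back to a caller-supplied address such as the top of the stack. */
#if defined( __GNUC__ ) || defined( __clang__ )
    #define APP_STACK_REFERENCE( fallback )    __builtin_frame_address( 0 )
#else
    #define APP_STACK_REFERENCE( fallback )    ( ( void * ) ( fallback ) )
#endif

/* Lowest address the task may use, assuming the stack grows downwards from
 * reference and the task is entitled to stack_bytes bytes. */
uintptr_t stack_low_limit( const void * reference, size_t stack_bytes )
{
    return ( uintptr_t ) reference - stack_bytes;
}
```

The value returned by stack_low_limit() would be the candidate for pxStack, evaluated just before the task entry point is called.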

gedeonag · January 6, 2021, 5:53am

The Windows port works almost exactly as if pthread_attr_setstack had failed and pthread were using its own stack, disregarding the FreeRTOS allocated stack.
Stack overflow detection techniques obviously will not work in that case, but no runtime problems should occur, as you said earlier.

I am not sure stack overflows can be accurately detected, as pointers, including function return addresses on the stack, have different sizes across architectures, ranging from 16 to 64 bits. The way each architecture’s ABI passes parameters to functions also differs, so you will always need a bigger stack on Linux.

It could be that your local solution is good enough and we need to document this limitation in the docs

rowbearto · January 6, 2021, 2:17pm

In my case my posix has 32bit pointers just like my target, ARM cortex M7. By design I use an x86/32bit compiler on POSIX to be as similar as possible. I think anyone who is serious about having a posix environment similar to target would make the conscious design decision to have same pointer sizes.

Yes, there are differences in ABI, but I think the effect will be minor, as I don’t expect to go more than 20 functions deep. I believe that on my target, ARM cortex M7, about 32 bytes are put on the stack for each function call; probably similar sizes for x86/posix?

Having a local solution can be a pain for me because every time FreeRTOS and/or the posix port update I would have to re-merge my solution in.

It seems a little inconsistent that the official FreeRTOS posix port can detect stack overflows for stacks > 16KByte in size (since pthread_attr_setstack won’t fail there) but not for smaller stacks.

cc: @gedeonag

gedeonag · January 6, 2021, 7:12pm

Have you considered or tried our Qemu port for a very close simulation? (We have an MPS2 port for it.)

rowbearto · January 6, 2021, 7:35pm

I don’t think Qemu would work for us, or if it possibly could it would require lots of work to redo our infrastructure. In posix mode we emulate our hardware read/writes by calling an x86 library which communicates over sockets to actual hardware, FPGA prototypes, hardware simulators and/or hardware emulators. We’d have to get that x86 library running in Qemu as well, could be either infeasible or lots of work.

Additionally in our real product we have another processor that communicates with this target via PCIexpress. We simulate this in posix by having that processor’s x86 simulation do socket communication to simulate read/write PCIe to memory and also trigger simulated interrupts (these interrupts get processed during tick handler in our posix sim).

We have a big working infrastructure around this. Not sure how well it could be done with Qemu and may require lots of development to get working there. Our many developers like to compile on Linux and simply run it there right now, it is very convenient.

gedeonag · January 6, 2021, 7:50pm

You will still have that option with Qemu; everything can be done on Linux (and other platforms as well).
But yeah as you said, if you already invested lots of time and resource on linux, porting the whole thing to Qemu would not make much sense.

I will go back to thinking about how to make a good solution without modifying tasks.c, bearing in mind that we could have another set of problems with xTaskCreateStatic, where preallocated memory (stack or dynamic) is passed by the caller. We should not mess with that memory, and especially should not call any malloc-family function on it.

rowbearto · January 6, 2021, 8:05pm

@gedeonag: I see a “messy” way to not modify tasks.c, in FreeRTOS.h there is the “struct xSTATIC_TCB” struct definition which exposes the TCB structure, although it clearly recommends not using this. But it also says it is guaranteed to match.

As you said there are more complications though that you need to look into, wanted to put this out there.

I’m really glad to hear that if pthread uses its own stack different than what FreeRTOS thinks the stack is that there shouldn’t be runtime issues and the windows port uses that. My simplest path is just to leave things as is and live without detecting stack overflows in Posix. But stack overflow detection in posix would be really nice.

gedeonag · January 6, 2021, 10:19pm

it matches in size and alignment, but strict aliasing rules might cause issues when using 2 different types for the same memory location.
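As a small illustration of the aliasing concern, the sketch below reads a field through memcpy instead of casting between two struct types. Both struct definitions are hypothetical stand-ins with matching layout; they are not the real TCB_t or StaticTask_t:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical private layout and its public mirror of the same size. */
typedef struct
{
    void * top;
    void * stack_low;
} real_tcb_t;

typedef struct
{
    void * dummy1;
    void * dummy2;
} opaque_tcb_t;

/* Copies the bytes of the field out rather than dereferencing a pointer of
 * an unrelated struct type, which avoids the strict aliasing question. */
void * read_stack_low( const opaque_tcb_t * opaque )
{
    void * value;

    memcpy( &value,
            ( const unsigned char * ) opaque + offsetof( real_tcb_t, stack_low ),
            sizeof( value ) );
    return value;
}
```

This still depends on the two layouts staying in sync, so it trades one risk for another.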

gedeonag · January 7, 2021, 7:03pm

I had a thought on how to detect stack overflows, but you have to run the process under gdb.
You can set a watchpoint at the stack location that corresponds to your original size, as opposed to the new forced size of PTHREAD_STACK_MIN. Whenever that watchpoint is hit, it means you have a stack overflow. The watchpoint list could be saved and restored when a new session starts.

The next real option I can think of is to modify tasks.c, which is a bit annoying, but this file almost never changes, and when it does the changes are usually tiny and should not cause problems with any merging tool.

rowbearto · January 7, 2021, 7:12pm

Most of our developers don’t run under gdb; they only do that when they are debugging an issue, so that wouldn’t really help.

What changes to tasks.c would you make? Would it be so difficult to merge these changes into the main branch? Even if it can’t be merged into main branch I’m curious as to how you would go about this?

EDIT: Could it be possible that the changes to tasks.c could all go at the end so that I could concatenate the changes to the main tasks.c in my automated build? Or maybe I could do the merge in the automated build?

rowbearto · January 7, 2021, 7:21pm

@gedeonag: Could it be possible to modify tasks.c such that the code is wrapped with preprocessor #if and is only added with the posix port?

gedeonag · January 7, 2021, 7:50pm

Nope, unfortunately they could not be merged into main, as they are very specific to the posix port and would break all other ports, and no one would be happy about this, including yourself.

Doing the merge in your automated build would be the best option, as the changes require modifying some functions in the middle of the file.

I could create a patch and post it here.
But the changes in general would be to modify xTaskCreate and prvInitialiseNewTask to “fool” the system about the amount of stack it got.