Deferred Interrupt Processing and posting a Binary Semaphore more than once

I’m working on FreeRTOS + FAT + CLI. I’m trying to use Deferred Interrupt Processing for a UART interrupt. At first, I tried using Direct to Task Notification, but some of the commands that can be entered in the CLI make +FAT calls that call into my Media Driver, which uses Direct to Task Notification for deferring SPI interrupts. This caused a conflict. Since the Media driver is much more performance critical than the CLI, I decided to use some other mechanism for the UART interrupts. I tried to follow the example given in Mastering the FreeRTOS™ Real Time Kernel. I used a Binary Semaphore, and I followed this advice: “… to minimize the chance of an interrupt being missed, the deferred interrupt handling task must be structured so that it processes all the events that are already available between each call to xSemaphoreTake().” However, this means that sometimes xSemaphoreGiveFromISR() is called when the semaphore is already given (and not yet taken). In this situation, I get an assertion:

assertion “pxQueue->uxItemSize == 0” failed: file “…\FreeRTOS\FreeRTOS\Source\queue.c”, line 1164, function: xQueueGiveFromISR

What am I doing wrong? All I want to do is give the UART task something on which to block until an interrupt occurs, after it has exhausted all available input. What is the right FreeRTOS facility to use here?
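For reference, here is a minimal sketch of the pattern I'm following (xUartSemaphore, prvUartRxHasData(), prvUartReadByte(), and prvProcessByte() are placeholder names for this post, not my real driver code):

#include "FreeRTOS.h"
#include "semphr.h"
#include "task.h"

static SemaphoreHandle_t xUartSemaphore; /* Created elsewhere with xSemaphoreCreateBinary(). */

void vUartISR( void )
{
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;

    /* Clear/quiesce the UART interrupt source here. */

    /* Give unconditionally; giving an already-given binary semaphore simply
       fails, which is harmless here, so the return code is ignored. */
    xSemaphoreGiveFromISR( xUartSemaphore, &xHigherPriorityTaskWoken );
    portYIELD_FROM_ISR( xHigherPriorityTaskWoken );
}

static void prvUartDeferredHandlerTask( void *pvParameters )
{
    ( void ) pvParameters;

    for( ;; )
    {
        /* Block until the ISR signals that at least one event occurred... */
        xSemaphoreTake( xUartSemaphore, portMAX_DELAY );

        /* ...then drain everything already available before blocking again,
           per the advice quoted above. */
        while( prvUartRxHasData() != pdFALSE )
        {
            prvProcessByte( prvUartReadByte() );
        }
    }
}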

If uxItemSize is not equal to zero, doesn’t that mean that the binary semaphore is already given?

Right, and that is not an error in this case. I suppose my ISR could do something like uxSemaphoreGetCount() and only do xSemaphoreGiveFromISR() if it is not already given. I was thinking that I could just do xSemaphoreGiveFromISR() unconditionally and ignore the return code.

Oh, never mind: looks like I didn’t have enough task stack space, and was getting some kind of undefined behavior.

#define configCHECK_FOR_STACK_OVERFLOW 1

didn’t save me, this time. I can now call xSemaphoreGiveFromISR() more times than xSemaphoreTake() without getting an assert.

Hello Carl,

First of all, I’m sorry that this reply comes mainly from my own curiosity. But how about the following setting? Doesn’t it cause the stack overflow assertion? Or does even this setting not help you?

#define configCHECK_FOR_STACK_OVERFLOW 2 <-- Not ‘1’ but ‘2’

Best regards,
NoMaY

Thanks for your interest, NoMaY. In this case, I was off by a factor of two. I had problems at 256 words, and the problems went away at 512 words. Now, at 512, task-stats shows only 30 words of headroom. I have tried configCHECK_FOR_STACK_OVERFLOW 2 before, but I didn’t find it helpful in cases like this where the stack size was way off. I don’t understand why configCHECK_FOR_STACK_OVERFLOW 1 doesn’t work better in this situation.

A task’s entire execution context is saved onto its stack each time it gets swapped out. It is likely that this will be the time at which stack usage reaches its peak. When configCHECK_FOR_STACK_OVERFLOW is set to 1, the kernel checks that the stack pointer remains within the valid stack space after the context has been saved. The stack overflow hook is called if the stack pointer is found to be outside its valid range.
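For completeness, the hook that gets called has the standard prototype from the FreeRTOS documentation; a minimal illustrative body might look like this (the halt loop is just one possible response, and the parameters may themselves be corrupt after a bad overflow, so the body is kept simple):

#include "FreeRTOS.h"
#include "task.h"

/* Called by the kernel when configCHECK_FOR_STACK_OVERFLOW is 1 or 2 and an
   overflow is detected. */
void vApplicationStackOverflowHook( TaskHandle_t xTask, char *pcTaskName )
{
    ( void ) xTask;
    ( void ) pcTaskName;

    taskDISABLE_INTERRUPTS();
    for( ;; )
    {
        /* Halt here so the offending task can be inspected in a debugger. */
    }
}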

I would think that in my problem case the stack pointer would have been well outside its valid range. Maybe the UART task wasn’t getting swapped out?

Method 1 checks the actual stack usage at the point when the task is switched out, by looking at the stack pointer register. It has the advantage of being quick, but it can’t tell if the stack previously overflowed in a subroutine that has since returned: if the excess usage is in a subroutine, it becomes a question of whether the task ever gets ‘caught’ while inside that subroutine.

Method 2 pre-fills the stack with a fixed value and checks a number of bytes at the end of the stack to see if they have been disturbed. It can find a ‘historical’ overflow from a subroutine that has since returned, but if the overflowing subroutine has some large buffers that it doesn’t always write to, those can leap right over that guard band and the overflow can be missed.
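To make that concrete, here is the idea behind Method 2 in sketch form (this is not the actual kernel code; the fill value matches the 0xA5 FreeRTOS uses, but the guard band size and function name here are just illustrative):

#include <stdint.h>
#include "FreeRTOS.h"

#define stackFILL_BYTE     0xA5U
#define stackGUARD_BYTES   16U    /* Illustrative guard band size. */

/* pucStackLimit points at the end of the stack (the lowest address, for a
   descending stack).  Returns pdTRUE if the guard band was disturbed. */
static BaseType_t prvGuardBandDisturbed( const uint8_t *pucStackLimit )
{
    UBaseType_t ux;

    for( ux = 0; ux < stackGUARD_BYTES; ux++ )
    {
        if( pucStackLimit[ ux ] != stackFILL_BYTE )
        {
            return pdTRUE;  /* A write reached (or crossed) the guard band. */
        }
    }

    return pdFALSE;
}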

Hello Carl,

Thank you for your reply. I agree with you that it is really strange:

(1) The problem went away after increasing stack size: 256 words --> 512 words.
(2) But task-stats shows only 30 words of headroom.

In such a case, I would check the debugger’s memory window/view for the task’s stack area. If it shows that the stack area from 30 words up to 512 words is really unused (i.e. it still holds the pre-fill value written by FreeRTOS when configCHECK_FOR_STACK_OVERFLOW = 2), then the actual cause of the problem isn’t a stack overflow. On the other hand, if the stack really is used, it means that task-stats is somehow showing an unexpected stack usage figure.

Best regards,
NoMaY

Hi NoMaY,

Oh, it is used all right. This was a dumb mistake on my part. As this project has come together, I use the CLI less and less, since I now have a keypad and LCD interface. Meanwhile, what was once a plethora of RAM has gotten relatively full. Not too long ago, I was working on a part of the project involving a memory-hungry FFT, and I was quickly looking for places where I could free up some memory. Since the CLI was becoming less important, and I had some vague ideas about changing parts of it to use dynamic memory allocation, I just slashed its stack and forgot about it. The CLI doesn’t need all that much stack until you try to do heavy things with the +FAT filesystem, and I hadn’t tried that until a problem came up in integration testing. I still might change it to spin off a separate, temporary task for stack-intensive commands.

Best regards,

    Carl

What about the “high water mark” that uxTaskGetSystemState() uses? Is that information not generally available? Seems like there could be a Method 3, similar to Method 1, but instead of using the current stack pointer, uses a high water mark pointer, or something like that. (Looks like usStackHighWaterMark itself is unsigned, so I guess the high water mark can go to 0 but never negative?)

The high water mark basically uses Method 2, but rather than a go/no-go test of just the end, it scans from the end of the stack upward until it finds the first changed location.
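(The per-task figure is also available directly through uxTaskGetStackHighWaterMark(), with INCLUDE_uxTaskGetStackHighWaterMark set to 1 in FreeRTOSConfig.h.) The scan amounts to something like this sketch, assuming a descending stack and the 0xA5 fill value; the kernel’s own version lives in tasks.c:

#include <stdint.h>
#include "FreeRTOS.h"

/* Count untouched fill bytes upward from the stack limit; the result is the
   headroom ("high water mark") in bytes. */
static uint32_t prvCountFreeStackBytes( const uint8_t *pucStackLimit,
                                        uint32_t ulStackSizeBytes )
{
    uint32_t ulCount = 0;

    while( ( ulCount < ulStackSizeBytes ) &&
           ( pucStackLimit[ ulCount ] == 0xA5U ) )
    {
        ulCount++;
    }

    return ulCount; /* Divide by sizeof( StackType_t ) for words. */
}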

Hmmm. That makes me think that a Method 3 could really be helpful. I see that there’s already a pxTopOfStack and (potentially) a pxEndOfStack:

typedef struct tskTaskControlBlock       /* The old naming convention is used to prevent breaking kernel aware debuggers. */
{
    volatile StackType_t * pxTopOfStack; /*< Points to the location of the last item placed on the tasks stack.  THIS MUST BE THE FIRST MEMBER OF THE TCB STRUCT. */

    //...

    #if ( ( portSTACK_GROWTH > 0 ) || ( configRECORD_STACK_HIGH_ADDRESS == 1 ) )
        StackType_t * pxEndOfStack; /*< Points to the highest valid address for the stack. */
    #endif

Seems to me that comparing pxTopOfStack to pxEndOfStack would be quite efficient (a couple of machine instructions). Maybe efficient enough that it could be done every time something is placed on the stack? If not maybe something like another pointer in the tskTaskControlBlock (e.g., pxStackHighWaterMark) could record the highest (or lowest, depending on portSTACK_GROWTH) pxTopOfStack for later comparison?

Top of stack is set when the task is switched out. You could add code to your program to check the current stack pointer against the stack limit (which might be pxStack or pxEndOfStack, depending on portSTACK_GROWTH).

I suppose FreeRTOS could be programmed to do this sort of test on calls to its API, or there could be an API to get the amount of stack still available that does that computation. FreeRTOS could not be made to check ‘every time something is placed on the stack’, as the compiler won’t emit this sort of call automatically.
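As a sketch of that kind of application-side check, assuming a descending stack (portSTACK_GROWTH == -1, as on Cortex-M) and that the application keeps track of each task’s lowest valid stack address itself (the function name is hypothetical):

#include "FreeRTOS.h"

/* The address of a local variable approximates the current stack pointer.
   Returns the approximate number of unused words between "here" and the
   stack limit. */
size_t xApproxStackWordsRemaining( const StackType_t *pxStackLimit )
{
    StackType_t xMarker = 0;  /* Lives at (roughly) the current stack pointer. */

    return ( size_t ) ( &xMarker - pxStackLimit );
}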

I guess what I am grasping for is something like the new stack limit checking features in the Armv8-M architecture: For processors based on the Armv8-M Mainline architecture, each of the stack pointers has a corresponding stack limit register which allows software to define watermark levels for stack overflow detection, and when stack overflow occurs, a Usage fault or HardFault exception is triggered.
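With CMSIS that looks roughly like the sketch below; __set_PSPLIM() is the CMSIS intrinsic from core_cm33.h, and the function and parameter names are mine:

#include "ARMCM33.h"  /* Placeholder - use your Armv8-M device header. */

/* Once PSPLIM is set, pushing the process stack below this address raises a
   UsageFault (with STKOF set) or a HardFault. */
void vSetProcessStackLimit( const uint32_t *pxLowestValidAddress )
{
    __set_PSPLIM( ( uint32_t ) pxLowestValidAddress );
}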

The big problem is that most processors don’t support that, so the FreeRTOS core isn’t going to have support for it. The port layer could definitely set the limit registers so that the trap would occur when the stack overflowed, as that limit is stored in the TCB for the current software stack checking (and to free the stack when the task ends).

I came across an interesting idea in How to Prevent and Detect Stack Overflow:

You can set up the Cortex processor’s Data Watchpoint and Trace (DWT) unit to place a data watchpoint at the end of the stack. You then enable the debug monitor exception, which will be triggered when the programmed address is accessed. The DWT can be reprogrammed at runtime, so you can change it during a context switch. It also works without a debugger connected, so you can use it in a production build.

I found some example code in Cortex-M – Debugging runtime memory corruption.

I’m thinking Method 2 could be extended to set a data watchpoint on the last 16 bytes within the valid stack range (on Cortex-M3 and M4 cores).

EDIT: Of course, that still doesn’t solve the problem of leaping over the guard bytes, like I tend to do. It might help locate small stack overflows more quickly, though.
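Something like this sketch is what I have in mind for a Cortex-M3/M4, using the CMSIS register definitions (the function name and the 16-byte guard region are my own choices; the FUNCTION encoding 6 means “trigger on write accesses” on these cores):

#include "core_cm4.h"  /* Normally pulled in via the device header. */

/* Watch a 16-byte, 16-byte-aligned guard region at the end of the valid
   stack range.  With MON_EN set and no debugger attached, a write to the
   region raises the DebugMonitor exception. */
void vSetStackGuardWatchpoint( const uint32_t *pulGuardRegion )
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk    /* Enable the DWT... */
                      | CoreDebug_DEMCR_MON_EN_Msk;   /* ...and the DebugMonitor exception. */

    DWT->COMP0 = ( uint32_t ) pulGuardRegion;  /* Must be 16-byte aligned. */
    DWT->MASK0 = 4;                            /* Ignore low 4 address bits: a 16-byte window. */
    DWT->FUNCTION0 = 6;                        /* 0b0110: trigger on write accesses. */
}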

FreeRTOS Cortex-M33 ports do program the stack limit register for each task - as a result, any stack overflow triggers a Usage fault or HardFault exception.

Thanks.


One issue with using the debug registers is that you need to make sure the debugger knows you are doing this, or it might get confused, or it may take over the vector.

Also, as I remember, the actual ‘stack checking’ code is in tasks.c, so it is device independent (mostly; it does get a couple of parameters from the port layer, like the direction the stack grows). This sort of device-dependent code would need to be in the port layer, so it generally wouldn’t be controlled by the stack checking macro. Also, the main deterrent to using the level 2 check is the CPU cost on each context switch, but with the debug watchpoint there is no significant CPU cost, just the loading of the control registers. The watchpoint can also only check a very small guard band, so the skip-over problem is much larger.

I agree. The best solution is the stack limit registers. I hope more MCUs implement them in the future.

The debug watchpoint trick is nice to have in the tool bag, though. I could see having a couple of watchpoints on guard words in production code: for example, one at newlib’s heap limit, and maybe one at the MSP stack limit.