uxListRemove() referencing NULL pointer during QueueSend()

Hello. I am seeing an issue with an xQueueSend to a blocked task wherein the kernel is performing some list management and in doing so references a NULL pointer and generates a hardfault. V10.4.1 running on Maxim 32665.

Here is the line causing the fault:

It looks like the listGET_OWNER_OF_HEAD_ENTRY() is returning a pxUnblockedTCB that includes a NULL pointer for pvContainer before calling uxListRemove(). [tasks.c line 3167-3169].

These transmissions are only behaving this way in one specific context, which is initiated with a CLI command. The calls to this same xQueueSend() work in other conditions.

Any thoughts on what I might have wrong?

Corruption of FreeRTOS internal data structures is very often caused by a stack overflow, assuming there are no other fatal application bugs and no wrong interrupt priority configuration.
Did you define configASSERT and enable stack overflow checking (for development/debugging)?
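
For readers following along, here is a minimal sketch of what enabling both checks can look like. The configASSERT body and the hook body are only one common pattern (halt so a debugger can inspect the state), not something taken from the original poster's project; adapt them to your port.

```c
/* FreeRTOSConfig.h (development build) */
#define configCHECK_FOR_STACK_OVERFLOW    2   /* method 2: pattern check at context switch */
#define configASSERT( x )    if( ( x ) == 0 ) { taskDISABLE_INTERRUPTS(); for( ;; ) { } }

/* Application code */
#include "FreeRTOS.h"
#include "task.h"

/* Called by the kernel when the stack check fails for a task. */
void vApplicationStackOverflowHook( TaskHandle_t xTask, char * pcTaskName )
{
    ( void ) xTask;
    ( void ) pcTaskName;      /* inspect this in the debugger to find the offender */

    taskDISABLE_INTERRUPTS();
    for( ;; )
    {
        /* Halt here so the failure is not missed. */
    }
}
```

Note that the check only runs when the task is switched out, so an overflow that corrupts memory and then causes a fault before the next context switch may never reach the hook.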

Another very common cause for this is faulty usage of critical sections or a faulty arrangement of the SysTick and Service Handler priorities (I had almost exactly the same error condition a month ago when I joined, which turned out to be a rather subtle IRQ priority assignment problem).

Thank you for the response. Stack allocation for the CLI task is one of the first things I checked. I did, however, enable automated stack checking and I am not seeing a call to the handler. configASSERT is enabled and I am not seeing it hit.

I did find that the demo we started with for this (not supplied by FreeRTOS) had a priority assignment for the CLI that was higher than configMAX_SYSCALL_INTERRUPT_PRIORITY but correcting that had no impact on the NULL pointer issue.

RAc, could you elaborate on the issue you had?

Don’t confuse (hardware) interrupt priorities with logical FreeRTOS task priorities.
Did you check the (NVIC) interrupt priorities against configMAX_SYSCALL_INTERRUPT_PRIORITY according to this documentation?
There are also some good posts here in the forum related to this topic.
Does the problem already show on the first send, or does it happen randomly?
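
As a rough illustration of that check, the sketch below assumes a Cortex-M port, where a numerically lower priority value means a more urgent interrupt and only the top priority bits of the 8-bit field are implemented. The helper names, the 3 priority bits, and the threshold of 5 are hypothetical values chosen for the example; take the real ones from your FreeRTOSConfig.h and device header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical example values, not taken from the original project. */
#define EXAMPLE_PRIO_BITS          3u
#define EXAMPLE_MAX_SYSCALL_PRIO   ( 5u << ( 8u - EXAMPLE_PRIO_BITS ) ) /* already shifted */

/* Convert a logical priority (0 = most urgent) to the shifted 8-bit form. */
static uint8_t raw_priority( uint8_t logical_prio, uint8_t prio_bits )
{
    assert( prio_bits >= 1u && prio_bits <= 8u );
    return ( uint8_t ) ( logical_prio << ( 8u - prio_bits ) );
}

/* An ISR may call the ...FromISR() API only if its priority is numerically
 * equal to or greater than (that is, no more urgent than) the syscall limit. */
static bool isr_may_call_freertos_api( uint8_t logical_prio,
                                       uint8_t prio_bits,
                                       uint8_t max_syscall_raw )
{
    return raw_priority( logical_prio, prio_bits ) >= max_syscall_raw;
}
```

With these example numbers, an interrupt at logical priority 1 is more urgent than the limit and must not touch the kernel API, while priorities 5 to 7 are allowed.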

Yes, I did check the NVIC priorities and, in fact, found one that was set too high: the CLI UART priority was set to 1. This only applies to ISRs that use the interrupt-safe FreeRTOS APIs, correct?

The problem does not occur on the first send, but always on the second.

Hi sjenyart,

you’ll find the analysis here. In a nutshell, FreeRTOS lists are organized such that the last list element does not have a payload, so trying to use it as a list element WITH a payload will result in looking at invalid memory. Thus, the condition “list length > 0” along with “the first list element is a terminator” is illegal.

Now how could it happen that a linked list has a length > 0 but consists of only the terminating element (that is, is empty)? Normally via missing atomicity. Operations that manipulate FreeRTOS lists must not be interruptible by other pieces of code that also manipulate the list; otherwise one thread of execution may change the list count but not be able to adjust the list itself accordingly.
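
To make the failure mode concrete, here is a simplified model of a list with an end marker. It is only a sketch: the type and function names are hypothetical and the layout is much smaller than the real FreeRTOS List_t. Removal is split into two steps so you can see the state left behind when something interrupts between them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Node
{
    struct Node * next;
    struct Node * prev;
    void * owner;          /* NULL for the end marker: it has no payload */
} Node;

typedef struct
{
    size_t count;          /* number of real items */
    Node end;              /* end marker, always part of the ring */
} MiniList;

static void list_init( MiniList * l )
{
    l->count = 0;
    l->end.next = &l->end;
    l->end.prev = &l->end;
    l->end.owner = NULL;
}

static void list_append( MiniList * l, Node * n, void * owner )
{
    n->owner = owner;
    n->prev = l->end.prev;
    n->next = &l->end;
    l->end.prev->next = n;
    l->end.prev = n;
    l->count++;
}

/* Step 1 of a removal: unlink the node. Step 2 would be l->count--. */
static void list_unlink( Node * n )
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* True if the stored count matches the number of nodes actually linked. */
static bool list_is_consistent( const MiniList * l )
{
    assert( l != NULL );
    size_t walked = 0;
    for( const Node * n = l->end.next; n != &l->end; n = n->next )
    {
        walked++;
    }
    return walked == l->count;
}

/* What a "get owner of head entry" style lookup would return. */
static void * list_head_owner( const MiniList * l )
{
    return l->end.next->owner;
}
```

If list_unlink() runs but the count is never decremented, the list reports one item while the head entry is the end marker, so the owner lookup yields NULL. That is the same shape as the fault described at the top of this thread.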

That is a fairly reliable hint that there was no critical section in effect where there should have been one, because the critical section is the one FreeRTOS mechanism that ensures atomicity on the OS level.

In my case, a context switch - which manipulates the task ready list array - was not atomic due to incorrect ordering of the service and sys tick ISRs - service MUST POSITIVELY have the lowest priority since it must only interrupt a task context, never an interrupt context. Thus the sys tick interrupt was being interrupted by the service ISR which broke atomicity on the task ready list.

The next thing to look for, then, is an incorrect usage of the critical section, for example an interrupt with priority > MAX_SYSCALL that illegally performs operations on FreeRTOS lists.

Sorry for the rather elaborate explanation, I hope it gives you a useful pointer or two…

I appreciate the detail. I will look into it and reply back as to whether my issue was similar. Thanks!

Right, the FreeRTOS interrupt priority restriction specified/evaluated by configMAX_SYSCALL_INTERRUPT_PRIORITY applies to ISRs using FreeRTOS API.
ISRs of interrupts with exceeding interrupt priority and using FreeRTOS API break this contract and bad things might happen as RAc explained in detail. Any other FreeRTOS managed data structure might be affected i.e. corrupted as well.
I bet after fixing the CLI UART interrupt priority your application will run seamlessly :+1:

Thought I would give you guys an update. The search for ISRs that were of too high a priority has not been fruitful, but I have narrowed the problem down to a fairly innocuous-looking line of code. Line 97 below is the cause of the problem. Line 101 works fine. uChargePercent is of type uint8_t.

[image: screenshot of the code around lines 97 and 101]

In the end, this seems to have been a stack overflow problem, although I do not appear to be receiving an indicator from FreeRTOS. I have not looked at how exactly FreeRTOS is monitoring stack usage but, in other RTOSs, I have observed a similar issue:

If the OS is monitoring watermarks, a string utility such as snprintf may actually modify values in an adjacent stack without changing the watermarks in its own stack. This may or may not be the case here. Anyway, an increase in stack allocation for the task corrected this issue.
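
The effect can be simulated on a host machine. The sketch below is an assumption-laden model, not FreeRTOS code: one byte array holds a "neighbour" region followed by a task stack filled with a pattern, and the overflow check looks only at a small guard region at the stack limit. All names and sizes are hypothetical. A large frame whose buffer is only partly written (as with snprintf into a big local array) lands its bytes beyond the guard region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILL_BYTE          0xA5u
#define GUARD_BYTES        16u   /* bytes inspected by the simulated check */
#define NEIGHBOUR_BYTES    32u   /* memory just below the stack limit */
#define STACK_BYTES        64u

typedef struct
{
    /* mem[0 .. NEIGHBOUR_BYTES-1] is someone else's memory;
     * the stack occupies the rest and grows toward lower indices. */
    uint8_t mem[ NEIGHBOUR_BYTES + STACK_BYTES ];
} Arena;

static void arena_init( Arena * a )
{
    memset( a->mem, 0, NEIGHBOUR_BYTES );
    memset( a->mem + NEIGHBOUR_BYTES, FILL_BYTE, STACK_BYTES );
}

/* Simulated overflow check: only the guard bytes at the stack limit. */
static bool guard_intact( const Arena * a )
{
    for( size_t i = 0; i < GUARD_BYTES; i++ )
    {
        if( a->mem[ NEIGHBOUR_BYTES + i ] != FILL_BYTE )
        {
            return false;
        }
    }
    return true;
}

static bool neighbour_intact( const Arena * a )
{
    for( size_t i = 0; i < NEIGHBOUR_BYTES; i++ )
    {
        if( a->mem[ i ] != 0u )
        {
            return false;
        }
    }
    return true;
}

/* Reserve `frame` bytes below index `sp`, but write only the first `used`
 * bytes at the low end of the frame, like a short string in a big buffer. */
static void sparse_write( Arena * a, size_t sp, size_t frame, size_t used )
{
    assert( frame <= sp && used <= frame );
    memset( a->mem + ( sp - frame ), 0x42, used );
}
```

For example, with 40 bytes of stack left, a 60-byte frame with 8 bytes written via sparse_write( &a, NEIGHBOUR_BYTES + 40u, 60u, 8u ) corrupts the neighbour region while guard_intact() still reports true, which matches the behaviour described above.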

Thanks for the help.

(sn)printf can indeed overflow the stack without touching the watermarks while still overwriting memory outside the allocated stack. I also stumbled into this problem some time ago.
Thanks for reporting back !