Still hitting null pxCurrentTCB when handling interrupts

Hello Forum Folks!

I am using FreeRTOS on an STM32H573 with no TrustZone (non-secure only).

I am still seeing frequent NULL or invalid pxCurrentTCB pointers in FreeRTOS when calling ISR-approved API functions. Most of the literature and forum posts point to the interrupt priority setup as the culprit, but I believe I am configured correctly.

[ 3] Heartbeat::heartbeatTask: __NVIC_PRIO_BITS: 4
[ 3] , configPRIO_BITS: 4
[ 3] , configLIBRARY_LOWEST_INTERRUPT_PRIORITY: 15
[ 3] , configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY: 5
[ 3] , configKERNEL_INTERRUPT_PRIORITY: 240
[ 3] , configMAX_SYSCALL_INTERRUPT_PRIORITY: 80
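
For reference, the two raw values in this log follow from the library-level values by a left shift of (8 - configPRIO_BITS), which is the convention used in the FreeRTOSConfig.h files shipped with the Cortex-M demos. A minimal sketch of that arithmetic, where `raw_priority` is a hypothetical helper name and not part of FreeRTOS or the ST HAL:

```c
#include <stdint.h>

/* Values taken from the log output above. */
#define PRIO_BITS                 4u   /* __NVIC_PRIO_BITS / configPRIO_BITS */
#define LIBRARY_LOWEST_PRIORITY   15u  /* configLIBRARY_LOWEST_INTERRUPT_PRIORITY */
#define LIBRARY_MAX_SYSCALL_PRIO  5u   /* configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY */

/* Hypothetical helper: place a library-level priority in the implemented
 * (most significant) bits of the 8-bit NVIC priority field. */
static inline uint32_t raw_priority(uint32_t library_priority, uint32_t prio_bits)
{
    return (library_priority << (8u - prio_bits)) & 0xFFu;
}
```

With these inputs, `raw_priority(LIBRARY_LOWEST_PRIORITY, PRIO_BITS)` gives 240 and `raw_priority(LIBRARY_MAX_SYSCALL_PRIO, PRIO_BITS)` gives 80, matching configKERNEL_INTERRUPT_PRIORITY and configMAX_SYSCALL_INTERRUPT_PRIORITY in the log.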

In the all-too-frequent stack trace below, I hit an assert I added to FreeRTOS tasks.c, where I validate that pxCurrentTCB is in valid RAM on every dereference.

I am using the following ST HAL call to set the interrupt priority:
/* FDCAN1 IT0 interrupt Init */
HAL_NVIC_SetPriority(FDCAN1_IT0_IRQn, 5, 0);
I am a bit confused here. I have also tried
configMAX_SYSCALL_INTERRUPT_PRIORITY+5 as the parameter, but I get the same result.

I’ve used FreeRTOS for years on several processors without ever being haunted by a system failure this persistent, and I am sure it’s in my setup somewhere.

Have you turned on (defined) configASSERT()? That might be able to capture issues regarding interrupt priority configuration and what might cause the memory corruption.

Can you check the implementation of HAL_NVIC_SetPriority to ensure that it shifts the value correctly to account for __NVIC_PRIO_BITS? Also, as @xuelix suggested, please define configASSERT.

I do have configASSERT mapped to my assert implementation. The FreeRTOS distribution does not check every dereference of pxCurrentTCB, so what fires is an assert I added that verifies the pointer holds a valid RAM address.

It could be a case of memory overrun, then. Can you declare a variable right next to pxCurrentTCB and check its value when your assert fires:

portDONT_DISCARD PRIVILEGED_DATA TCB_t * volatile pxUnused = NULL;

If you see that pxUnused has been modified, you can use a data breakpoint to catch the moment the corruption happens.

Hello Aggarg,

__NVIC_PRIO_BITS is set to 4. With all of the defines from my first post, the bottom of the ST HAL call uses this code:

NVIC->IPR[((uint32_t)IRQn)] = (uint8_t)((priority << (8U - __NVIC_PRIO_BITS)) & (uint32_t)0xFFUL);

After the line executes I wind up with

NVIC_IPR_13 holds 0x5000000

which sets the priority of interrupt 55 to 0x50. Since __NVIC_PRIO_BITS is 4, the actual value of the byte register is 80, which I think is an OK priority for executing FreeRTOS ISR-ready methods.
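
The reasoning in this post can be written out as a small host-side sketch. It models the shift-and-mask from the HAL line quoted above, shows which NVIC_IPRn word and byte lane belong to a given IRQ number, and applies the FreeRTOS rule that an interrupt may call the ISR-safe API only if its raw priority is numerically greater than or equal to configMAX_SYSCALL_INTERRUPT_PRIORITY. All function names here are hypothetical, and the constants are the ones reported in the first post.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRIO_BITS        4u    /* __NVIC_PRIO_BITS from the first post */
#define MAX_SYSCALL_RAW  80u   /* configMAX_SYSCALL_INTERRUPT_PRIORITY from the first post */

/* Same shift-and-mask as the HAL line quoted above. */
static inline uint8_t priority_byte(uint32_t priority)
{
    return (uint8_t)((priority << (8u - PRIO_BITS)) & 0xFFu);
}

/* Index of the 32-bit NVIC_IPRn word that holds the priority byte of an IRQ. */
static inline uint32_t ipr_word_index(uint32_t irqn)
{
    return irqn / 4u;
}

/* Byte lane inside that word (0 is the least significant byte). */
static inline uint32_t ipr_byte_lane(uint32_t irqn)
{
    return irqn % 4u;
}

/* On Cortex-M a numerically larger value is a logically lower priority.
 * ISR-safe API calls are allowed at or below the max syscall priority. */
static inline bool may_use_isr_safe_api(uint8_t raw_priority)
{
    return raw_priority >= MAX_SYSCALL_RAW;
}
```

Under these assumptions `priority_byte(5)` is 0x50 (80), interrupt 55 maps to word 13 and byte lane 3, and `may_use_isr_safe_api(priority_byte(5))` is true, while a HAL priority of 4 or less would be rejected.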

With checks on many of the pxCurrentTCB dereferences in place, I now assert instead of hard-faulting in many instances. Below is an instance of an invalid TCB pointer.

note: Valid Ram is between 0x20000000 and 0x2009FFFF

Thread 2 hit Breakpoint 2, _assert_failed (assertion=0x8127004 "(((uint32_t)(pxCurrentTCB) >= 0x20000000) && ((uint32_t)(pxCurrentTCB) <= 0x2009FFFF))",
    file=0x8126f14 "/home2/miller/src/vip-rcip/vip/libs/FreeRTOSV101/Source/tasks.c", line=3033) at /home2/miller/src/vip-rcip/vip/libs/assert/assert.c:123
123         if (! isAssertAlreadyInProgress)
(gdb) bt
#0  _assert_failed (assertion=0x8127004 "(((uint32_t)(pxCurrentTCB) >= 0x20000000) && ((uint32_t)(pxCurrentTCB) <= 0x2009FFFF))",
    file=0x8126f14 "/home2/miller/src/vip-rcip/vip/libs/FreeRTOSV101/Source/tasks.c", line=3033) at /home2/miller/src/vip-rcip/vip/libs/assert/assert.c:123
#1  0x080a0f8c in vTaskSwitchContext () at /home2/miller/src/vip-rcip/vip/libs/FreeRTOSV101/Source/tasks.c:3033
#2  0x080a3f48 in PendSV_Handler () at /home2/miller/src/vip-rcip/vip/libs/FreeRTOSV101/Source/portable/GCC/ARM_CM33_NTZ/non_secure/portasm.c:236
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) up
#1  0x080a0f8c in vTaskSwitchContext () at /home2/miller/src/vip-rcip/vip/libs/FreeRTOSV101/Source/tasks.c:3033
3033            configASSERT( isValidRAM(pxCurrentTCB));
(gdb) p pxCurrentTCB
$4 = (TCB_t * volatile) 0x64227b20

This seems like memory corruption, as I mentioned before. Did you try my previous suggestion? Also, can you try to disable parts of your application to narrow down the problem?

Well, I just hit another one, and it’s odd. I had originally bracketed the pxCurrentTCB pointer with a fixed-value variable on each side.

portDONT_DISCARD PRIVILEGED_DATA TCB_t * volatile pxCurrentTCB_pre = (TCB_t *) 0xdeadbeef;
portDONT_DISCARD PRIVILEGED_DATA TCB_t * volatile pxCurrentTCB = NULL;
portDONT_DISCARD PRIVILEGED_DATA TCB_t * volatile pxCurrentTCB_post = (TCB_t *) 0xac987654;

After running under moderate CAN traffic for about 5 minutes I hit my bus fault.

The really odd part was that pxCurrentTCB pointed to an arbitrary address above my statically allocated heartbeat task stack, which I report at startup as

main: TaskCreated: Heartbeat, stack address: 0x0x20006c00, TCB address: 0x0x200020d8 

After the fault the bookends that I put around the pxCurrentTCB pointer are unchanged.

The pxCurrentTCB pointer is set to somewhere beyond even the end of uxHeartbeatTaskStack, as indicated below.

Thread 2 hit Breakpoint 3, BusFault_Handler () at /home/miller/src/vip-rcip/vip/src/system/stm32h5xx_it.c:127
127  while (1)
(gdb) bt
#0  BusFault_Handler () at /home/miller/src/vip-rcip/vip/src/system/stm32h5xx_it.c:127
#1  0xffffffac in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) p pxCurrentTCB
$1 = (TCB_t * volatile) 0x20007ba3 <uxHeartbeatTaskStack+4003>
(gdb) p pxCurrentTCB_pre
$2 = (TCB_t * volatile) 0xdeadbeef
(gdb) p pxCurrentTCB_post
$3 = (TCB_t * volatile) 0xac987654
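
One observation on the value in this trace: 0x20007ba3 lies inside the valid RAM window, so a pure range check would let it through, but it is an odd address. A hedged sketch of a stricter check follows, assuming that every TCB is at least 4-byte aligned (the structure contains pointers). The function name `is_plausible_tcb_address` and the macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define RAM_START       0x20000000u  /* valid RAM window quoted earlier in the thread */
#define RAM_END         0x2009FFFFu
#define TCB_ALIGN_MASK  0x3u         /* assumption: TCBs are at least 4-byte aligned */

/* Returns true only if the address is inside RAM and word aligned. */
static inline bool is_plausible_tcb_address(uint32_t addr)
{
    bool in_ram  = (addr >= RAM_START) && (addr <= RAM_END);
    bool aligned = (addr & TCB_ALIGN_MASK) == 0u;
    return in_ram && aligned;
}
```

This rejects both 0x64227b20 (outside RAM) and 0x20007ba3 (misaligned) while accepting a real TCB address such as 0x200020d8. It still cannot tell a TCB from any other aligned RAM address.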

Were you able to set a data access breakpoint on pxCurrentTCB to see what is trying to modify it?
Would be interesting to see what is in your pxReadyTasksList[your task priority] while running.

I can set the watch on pxCurrentTCB, but it hits all the time (which I guess I should expect). Where my arm-none-eabi-gdb fails is when I try to set a condition so that it only breaks on an invalid TCB. I only have 8 tasks, all statically allocated.

[       0] main: TaskCreated: CAN, TCB address: 0x0x20001800
[       0] main: TaskCreated: Status, TCB address: 0x0x20001c6c
[       0] main: TaskCreated: Heartbeat, TCB address: 0x0x200020d8
[       0] main: TaskCreated: Logging, TCB address: 0x0x20002544
[       0] main: TaskCreated: SFI SPI Task, TCB address: 0x0x200029b0
[       0] main: TaskCreated: SFI to VIP, TCB address: 0x0x20002e1c
[       0] main: FreeRTOS Timer Task TCB   0x0x20003288
[       0] main: FreeRTOS Idle Task TCB    0x0x200032f4

I tried to set up gdb with:

(gdb) watch pxCurrentTCB
Watchpoint 5: pxCurrentTCB
(gdb) condition 5 pxCurrentTCB != 0x0x20001800 && pxCurrentTCB != 0x0x20001c6c && pxCurrentTCB != 0x0x200020d8 && pxCurrentTCB != 0x0x20002544 && pxCurrentTCB != 0x0x200029b0 && pxCurrentTCB != 0x0x200029b0 && pxCurrentTCB != 0x0x20003288 && pxCurrentTCB != 0x0x200032f4

But it’s breaking on every update of pxCurrentTCB, which is pretty useless.

In my latest cut I’m writing an isValidTCB() macro so that I can instrument FreeRTOS to assert at each of the half dozen places where tasks.c and list.h actually update pxCurrentTCB.

used in tasks.c like…

                    configASSERT(isValidTCB( pxNewTCB ));
                    pxCurrentTCB = pxNewTCB;

isValidTCB() will look something like…

    #define isValidTCB(x)     (     ((uint32_t)(x) == 0x20001800)   \
                                 || ((uint32_t)(x) == 0x20001c6c)   \
                                 || ((uint32_t)(x) == 0x200020d8)   \
                                 || ((uint32_t)(x) == 0x20002544)   \
                                 || ((uint32_t)(x) == 0x200029b0)   \
                                 || ((uint32_t)(x) == 0x20002e1c)   \
                                 || ((uint32_t)(x) == 0x20003288)   \
                                 || ((uint32_t)(x) == 0x200032f4)   \
                                 )

…Stay tuned, still working on it

If this works I may build something a little more elegant that does not have literal TCB pointers but for my current purposes it ‘should’ work.
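
One possible shape for the "more elegant" version is a small registry that is filled in as tasks are created, so no literal addresses appear in the check. This is only a sketch under stated assumptions: `register_tcb`, `is_known_tcb`, and `MAX_KNOWN_TCBS` are hypothetical names, the TCB is treated as an opaque pointer, and every TCB (including the idle and timer task TCBs) is assumed to be registered before the scheduler starts.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_KNOWN_TCBS 8u   /* eight tasks in this application */

/* Table of TCB addresses recorded at task creation time. */
static const void *known_tcbs[MAX_KNOWN_TCBS];
static size_t known_tcb_count = 0;

/* Record one TCB address. Returns false for NULL or a full table. */
static bool register_tcb(const void *tcb)
{
    if ((tcb == NULL) || (known_tcb_count >= MAX_KNOWN_TCBS)) {
        return false;
    }
    known_tcbs[known_tcb_count++] = tcb;
    return true;
}

/* True only if the pointer matches a previously registered TCB. */
static bool is_known_tcb(const void *tcb)
{
    for (size_t i = 0; i < known_tcb_count; i++) {
        if (known_tcbs[i] == tcb) {
            return true;
        }
    }
    return false;
}
```

The call site would then read `configASSERT(is_known_tcb(pxNewTCB));` in place of the literal-address macro. The table is not protected against concurrent writers, so this sketch relies on all registration happening before the first context switch.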

That is not necessarily wrong, as pxCurrentTCB is not supposed to be in the task stack range, because the TCB is not kept on the stack.

As I understand it, pxCurrentTCB should only point to valid TCB structures once the scheduler is running. Since all my TCBs are statically allocated, their addresses are pretty consistent.

The fact that the pxCurrentTCB pointer was assigned a value in one of the tasks’ stack space (also all statically allocated) was just an observation. I did not mean to conclude anything from it.

Understood, thanks. Did you try to read the memory at that address to see whether it looks like a TCB, for example whether it contains a task name?