Highest-priority task stays in the Running state, blocking the other tasks

Hi, we are developing an application on a PIC32MZ with about 14 tasks running under the RTOS, but one of those tasks (priority 5) is crashing the code. This task executes without issues for x (random) minutes, then stays in the Running state forever and blocks all the other tasks. Below is a capture we got from Tracealyzer; we are working with MPLAB.

[Tracealyzer capture of the failure]

This is the red task:

    void TCAN4x5x_vTaskWatchDog(void *pvParameters) {
        (void)pvParameters; // FreeRTOS task signature; the parameter is unused

        TickType_t xTicksToDelay = pdMS_TO_TICKS(300); // 300 ms
        TickType_t xLastWakeTime = xTaskGetTickCount();

        for (;;) {
            vTaskDelayUntil(&xLastWakeTime, xTicksToDelay);

            if (TEST_SWITCH_1_Get()) { //Switch acts as a board reset
                TCAN4x5x_WDT_Reset();
            }
            
            TCAN4x5x_SPI_Status status;
            TCAN_Device_ReadSPIStatus(&status);
            
            if(status.Internal_error_interrupt){
                CAN2sig_ICU_Error_Code.setValue(CAN2sig_ICU_Error_CodeVT_TCAN_Internal_Error);
            }else if(status.SPI_error_interrupt){
                CAN2sig_ICU_Error_Code.setValue(CAN2sig_ICU_Error_CodeVT_TCAN_SPI_Error);
            }

        }
    }

    void
    TCAN_Device_ReadSPIStatus(TCAN4x5x_SPI_Status *reg){
        reg->word = AHB_READ_32(REG_SPI_STATUS);
    }

Do you have any idea what could cause this issue?

Also, the red task is not created inside the SYS_Tasks() function; it is created inside another task (one with a lower priority). Is this correct, or must all tasks be created inside SYS_Tasks()?

Thanks

The only scenario I can think of here that involves FreeRTOS is that your xLastWakeTime gets overwritten somewhere in the function body, so that the wait always returns immediately. To test that theory, replace vTaskDelayUntil() with a plain vTaskDelay() (a fixed delay relative to the call) and see if the behavior changes.
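To make that hypothesis concrete, here is a host-side C model of the decision vTaskDelayUntil() makes about whether to block, written from memory of the FreeRTOS kernel's xTaskDelayUntil() logic and assuming a 32-bit tick type. It is a sketch, not kernel code: the helper names are hypothetical, and the tick count is held fixed while the loop runs, which is a simplification. It shows that if xLastWakeTime is overwritten with a value larger than the current tick count, the kernel reads that as "the tick counter wrapped" and keeps returning without blocking, adding another period each call, for up to roughly 14 million iterations at a 300-tick period.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 32-bit ticks; stands in for FreeRTOS TickType_t. */
typedef uint32_t ModelTick;

/* Model of the block/no-block decision in vTaskDelayUntil(). Like the
 * kernel, it advances *prev_wake by one period whether or not it blocks. */
static bool delay_until_would_block(ModelTick *prev_wake, ModelTick period,
                                    ModelTick now)
{
    assert(period > 0);
    const ModelTick wake = *prev_wake + period;  /* may wrap, as in the kernel */
    bool block;

    if (now < *prev_wake) {
        /* Interpreted as "the tick count wrapped since the last call". */
        block = (wake < *prev_wake) && (wake > now);
    } else {
        block = (wake < *prev_wake) || (wake > now);
    }
    *prev_wake = wake;
    return block;
}

/* Counts consecutive calls that return without blocking, holding the tick
 * count fixed at `now` (simplification) and stopping at `limit`. */
static unsigned long count_immediate_returns(ModelTick prev_wake,
                                             ModelTick period,
                                             ModelTick now,
                                             unsigned long limit)
{
    unsigned long n = 0;
    while (n < limit && !delay_until_would_block(&prev_wake, period, now)) {
        n++;
    }
    return n;
}
```

With a healthy state (prev_wake = now = 1000, period = 300) the first call blocks. If the task merely overran (now = 1700), it returns twice to catch up and then blocks. With a corrupted prev_wake = 5000 and now = 1000, it still has not blocked after a million calls. Whether this is what happens on the board is exactly what the vTaskDelay() experiment checks.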

The usual debugging technique here is to add several counters in the infinite loop and monitor them via a real-time watch, if your IDE supports that.
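A minimal sketch of such counters in C, assuming the loop body is split into the steps visible in the task above. The stage names and the stage_begin(), stage_end() and stuck_stage() helpers are hypothetical, not part of FreeRTOS or MPLAB Harmony. When the system appears hung, a stage whose enter count is one ahead of its exit count is where the task is sitting; if instead every counter keeps climbing quickly, the loop is spinning and the wait is not blocking.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stage IDs for the watchdog task's loop body. */
typedef enum {
    STAGE_DELAY_UNTIL = 0,
    STAGE_SWITCH_CHECK,
    STAGE_SPI_STATUS_READ,
    STAGE_ERROR_REPORT,
    STAGE_COUNT
} WatchdogStage;

/* volatile so every update reaches memory and a live watch can see it. */
static volatile uint32_t g_stage_enter[STAGE_COUNT];
static volatile uint32_t g_stage_exit[STAGE_COUNT];

static inline void stage_begin(WatchdogStage s)
{
    assert(s < STAGE_COUNT);
    g_stage_enter[s]++;
}

static inline void stage_end(WatchdogStage s)
{
    assert(s < STAGE_COUNT);
    g_stage_exit[s]++;
}

/* Returns the first stage that was entered but not left, or -1 if none. */
static int stuck_stage(void)
{
    for (int s = 0; s < STAGE_COUNT; s++) {
        if (g_stage_enter[s] != g_stage_exit[s]) {
            return s;
        }
    }
    return -1;
}

/* Usage inside the task loop (sketch):
 *
 *     stage_begin(STAGE_SPI_STATUS_READ);
 *     TCAN_Device_ReadSPIStatus(&status);
 *     stage_end(STAGE_SPI_STATUS_READ);
 */
```

Because the task can be preempted mid-stage, a single non-matching stage only means something when the system is actually frozen; compare two readings a few seconds apart to be sure.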

Another thing to check is to break into it with the debugger and see what the task is doing when it appears to be running forever.

Thanks.

Is it possible that this line of code, TCAN_Device_ReadSPIStatus(&status);, gets stuck in a loop somewhere?
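If the read really can hang, a common defensive pattern is to bound every busy-wait on the SPI hardware. What AHB_READ_32() does internally isn't shown in the thread, so the busy-flag hook below is hypothetical; on the target it would query whatever status the TCAN4x5x SPI driver polls. A timeout turns a silent hang at priority 5, which starves every lower-priority task, into an error the task can count or report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook: returns true while the SPI transfer is still in progress. */
typedef bool (*SpiBusyFn)(void *ctx);

typedef enum { SPI_WAIT_OK = 0, SPI_WAIT_TIMEOUT = 1 } SpiWaitResult;

/* Polls is_busy() at most max_polls times instead of spinning forever. */
static SpiWaitResult spi_wait_not_busy(SpiBusyFn is_busy, void *ctx,
                                       uint32_t max_polls)
{
    assert(is_busy != NULL);
    for (uint32_t i = 0; i < max_polls; i++) {
        if (!is_busy(ctx)) {
            return SPI_WAIT_OK;
        }
    }
    return SPI_WAIT_TIMEOUT;  /* caller records this instead of hanging */
}

/* Host-side stub for testing: reports "busy" for *ctx more polls, then idle. */
static bool stub_busy_countdown(void *ctx)
{
    uint32_t *remaining = ctx;
    if (*remaining == 0) {
        return false;
    }
    (*remaining)--;
    return true;
}
```

On a timeout the task can flag an error and carry on to vTaskDelayUntil(), so the scheduler gets control back and the lower-priority tasks keep running.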

As per Gaurav’s suggestion: if you know the task is running, break in the debugger and see what it is executing. Pilot’s suggestion that it might be stuck in TCAN_Device_ReadSPIStatus() seems likely, unless you simply have data corruption somewhere.

We have a task that runs every 1 ms. Is it advisable to have a task that executes at the same frequency as the RTOS tick?

Is that the task that’s thrashing? Then it might be an issue with vTaskDelayUntil(). Again, test with a plain vTaskDelay() instead.

Hi, but in that case the task would fail immediately, and that is not what happens: as the image shows, the red task runs normally and then suddenly stays in the Running state.

No, the task with issues is the one in the code I shared; it runs every 300 ms with priority 5. The task running every 1 ms is a priority-2 task (blue, QUECTEL_GNSS).

The task running at the same frequency as the RTOS tick shouldn’t be a problem, though you will likely find that it doesn’t ACTUALLY run every millisecond. That’s a whole separate issue though, and has nothing to do with this query.
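One way to see why, assuming that task is a loop of "do work; vTaskDelay(1);" (the thread doesn't say which delay call it uses): vTaskDelay(1) waits for the next tick interrupt after the call, so as soon as the work plus any time spent preempted by higher-priority tasks or ISRs reaches one tick period, the loop skips ticks. The helper name below is hypothetical and the model ignores the scheduling latency after each tick.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a loop "do work; vTaskDelay(1);" that wakes exactly on a tick.
 * busy_us: time from wake-up to the vTaskDelay(1) call, including any time
 * spent preempted by higher-priority tasks or ISRs.
 * Returns the resulting loop period in microseconds. */
static uint32_t delay1_loop_period_us(uint32_t busy_us, uint32_t tick_us)
{
    assert(tick_us > 0);
    uint32_t ticks_elapsed = busy_us / tick_us;  /* ticks passed before the call */
    return (ticks_elapsed + 1u) * tick_us;       /* wake on the following tick */
}
```

At a 1 kHz tick, 300 µs of work gives a 1 ms period, but 1000 µs gives 2 ms and 2500 µs gives 3 ms.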

As for this issue, you really need to break into the debugger when it hangs in that task and find out what it’s doing. Until you know that, there’s not much more anyone can tell you. In all likelihood, the RTOS isn’t forcing that task to stay active; the task itself is forcing the RTOS to let it remain active because it’s busy doing something.

No, it can theoretically fail at any time.

Again, you should first determine where in the infinite loop you are “stuck”. Either one of the functions called within the for loop is stuck in a “local” infinite loop, or the wait function always returns immediately without waiting, which would indicate some kind of problem with the xLastWakeTime variable once your scenario is reached. To me it looks like all of the code in your loop body except the wait may be non-yielding, so if the wait does not block, you would keep running through the loop in a CPU-bound computation.

It is fairly straightforward debugging to determine which of the two is the case. Several techniques have been proposed here. Once you have figured out which of the two scenarios hold, the rest is routine.
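Telling the two scenarios apart comes down to comparing two live-watch readings taken a few seconds apart. A sketch in C, assuming a hypothetical loop_count incremented once per pass and a last_stage marker written before each call in the loop body (neither exists in the original code):

```c
#include <assert.h>
#include <stdint.h>

/* Two live-watch readings of hypothetical instrumentation variables. */
typedef struct {
    uint32_t loop_count;  /* incremented once per pass through the for (;;) body */
    uint32_t last_stage;  /* hypothetical ID of the last step entered in the body */
} LoopSnapshot;

typedef enum {
    DIAG_LOOPING_NORMALLY,
    DIAG_STUCK_INSIDE_A_CALL,  /* after.last_stage says which call */
    DIAG_WAIT_NOT_BLOCKING     /* loop spinning, CPU-bound */
} LoopDiagnosis;

static LoopDiagnosis diagnose_loop(LoopSnapshot before, LoopSnapshot after,
                                   uint32_t window_ms, uint32_t period_ms)
{
    /* The window must span several periods for the counts to mean anything. */
    assert(period_ms > 0 && window_ms >= 2 * period_ms);
    uint32_t passes = after.loop_count - before.loop_count;  /* unsigned: wrap-safe */
    uint32_t expected = window_ms / period_ms;

    if (passes == 0) {
        return DIAG_STUCK_INSIDE_A_CALL;
    }
    if (passes > 2 * expected) {
        return DIAG_WAIT_NOT_BLOCKING;
    }
    return DIAG_LOOPING_NORMALLY;
}
```

For the 300 ms task, a 3 s window should show about 10 passes. Zero passes while Tracealyzer shows the task Running points at a call that never returns; thousands of passes point at a wait that does not block.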