Assert in taskSELECT_HIGHEST_PRIORITY_TASK

Hi
I’m getting a consistent assert in taskSELECT_HIGHEST_PRIORITY_TASK. It is triggered by communication traffic, but not directly related to it.
Communications are polling based (Ethernet emulation over shared memory, no interrupts).
The current task is IdleTask; the assert happens when the idle task is running.
I tried a CPU trace: nothing that the eye can catch, just a long trace of housekeeping work.

Please look at pxReadyTasksLists (memory). Does it look normal?

I need some fresh ideas.
Thanks
Rasty

At first glance, that sort of problem indicates that somehow no tasks are ready, for example if an IdleHook does something that blocks.

It could also happen if some ISR corrupts the ready list.
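
To make the “no tasks are ready” condition concrete, here is a minimal sketch in plain C of how the scheduler picks the next task. It is a simplified model, not the kernel source: ready_count, select_highest_priority and NUM_PRIORITIES are made-up names standing in for pxReadyTasksLists, taskSELECT_HIGHEST_PRIORITY_TASK and configMAX_PRIORITIES, and it returns -1 where the real code would assert.

```c
#include <assert.h>

#define NUM_PRIORITIES 4u  /* assumption: stand-in for configMAX_PRIORITIES */

/* Simplified stand-in for pxReadyTasksLists: only the per-priority item
 * counts are modelled, not the lists themselves. */
static unsigned ready_count[NUM_PRIORITIES];

/* Walk down from the given top priority and return the first priority that
 * has a ready task. Returns -1 when even priority 0 is empty, which is the
 * situation in which the real kernel hits its assert. */
static int select_highest_priority(unsigned top_ready_priority)
{
    assert(top_ready_priority < NUM_PRIORITIES);

    unsigned prio = top_ready_priority;
    while (ready_count[prio] == 0u) {
        if (prio == 0u) {
            return -1;  /* nothing is ready, not even the idle task */
        }
        --prio;
    }
    return (int)prio;
}
```

As long as the idle task stays in the priority 0 list, the loop always finds something to run. Once that count drops to 0 with nothing else ready, the search runs off the bottom, which is where the assert fires.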

Another bit of information.
Normally pxReadyTasksLists[0] is 1. The system can run for hours.
I put a h/w watchpoint on pxReadyTasksLists[0], then triggered the problem.
IdleTask writes 0 to pxReadyTasksLists[0] (was 1) by itself(!) and then hits the assert.

You are in a call to xQueueSemaphoreTake, which appears to have been called with a non-zero wait time. That isn’t allowed to be done in the Idle task.

As I said, you appear to have something in your IdleHook that sometimes blocks, which isn’t allowed.

You may need to create your own Idle Priority task instead of using the IdleHook if you are needing to do something that has any possibility of blocking.

I will add that the trace back to the call to xQueueSemaphoreTake looks funny, so perhaps there is something else going wrong there.

configUSE_IDLE_HOOK is defined as “0”, and a breakpoint in vApplicationLoadHook is not hit.

Everything is compiled with -O3 so things/stack may look a bit unusual due to optimization.

Can we see your code?

We have a lot of code. The executable is approx. 8 MB.
If you’re interested in a specific part and it does not contain any confidential info, I can share it.
As I mentioned, it is a pretty strange problem.
No DMA, no interrupts.
But there are 2 cores that share memory (I double checked for overlapping memories). Unfortunately I cannot stop the second core.
crash1_short.zip (15.7 KB)
This is the trace (ARM embedded trace is awesome), read from top to bottom.
It is stopped 1300+ lines away from any application code (for clarity I filtered out instructions that belong to the same line) before it falls into the assert.

Why would the Idle task consistently decide to write 0 into pxReadyTasksLists[0]?

Can you obtain more information about this debug assert? It looks suspicious:

123720,0x802D7AA2,0x0000BF00,123719,_DebugP_assert,DebugP_log.c:111
123717,0x802D7A2E,0x000068FB,123716,_DebugP_assert,DebugP_log.c:93
123710,0x802D7A20,0x0000B5B0,123709,_DebugP_assert,DebugP_log.c:92
123709,0x70183D6C,0xE51FF004,123708,___DebugP_assert_from_arm,:?
123695,0x700F1C86,0xF64241E8,123694,vTaskExitCritical,tasks.c:7050
123691,0x802D7AA2,0x0000BF00,123690,_DebugP_assert,DebugP_log.c:111
123688,0x802D7A2E,0x000068FB,123687,_DebugP_assert,DebugP_log.c:93
123681,0x802D7A20,0x0000B5B0,123680,_DebugP_assert,DebugP_log.c:92
123679,0x70183D6C,0xE51FF004,123678,___DebugP_assert_from_arm,:?

O3 doesn’t affect the call stack enough to make the trace look like that. Some functions in the chain might not show up, but it would not look like what your stack trace shows.

Your likely problem is memory corruption, which means it is very hard to know what the cause is.

I will note that the “Idle Task” itself won’t write the 0; rather, SOMETHING has gotten into the system to make something call prvAddCurrentTaskToDelayedList in the context of the Idle task. The fact that the trace can’t go past that call is the suspicious thing, as is the copysignf at 0xFFFFFFFE in the module dd3s_v0_core.

I will also add that NO interrupts seems strange, as under an RTOS you do most things with interrupts, and at least you need a time base.

I compared it to the “happy” case. The stack trace looks the same, which is weird.

One thing you could do is to change the application tasks’ priority from 0 to 1. This will ensure that the only priority 0 task is the idle task. Then a watchpoint on pxReadyTasksLists[0].uxNumberOfItems would be more helpful. Also, consider removing O3 for debugging.

full_trace_short.zip (37.2 KB)
Here is the unfiltered trace and the location of the breakpoint.
What I normally see is:
123890,0x802D7A2E,0x000068FB,123889,_DebugP_assert,DebugP_log.c:93
123889,0x802D7A2C,0x0000603B,123888,_DebugP_assert,DebugP_log.c:92
It reaches the breakpoint at line #95 when something goes wrong.

I want to put in a reminder that the problem isn’t where the ASSERT got triggered, but when the Idle task got blocked. The ASSERT is just the point at which the problem gets noticed.

The stack trace at the point where the watchpoint on pxReadyTasksLists[0] fired is what needs to be explained.

This shows that xQueueSemaphoreTake was called and decided to block.

The Idle task never makes such a call.

The call chain leading to that call looks corrupted.

THIS is what needs to be determined. Has some task corrupted the Idle task stack to make something like this happen, or has something corrupted the system so that it just looks like this has happened?

This sort of error tends to be hard to find, as it could be almost anywhere in the code doing a “wild write” to memory that shouldn’t be written.
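
One common way to narrow down a wild write is to bracket a suspect buffer or structure with known guard words and check them periodically, for instance from a low priority task. Below is a minimal sketch in plain C, assuming a hypothetical guarded_block_t wrapper; none of these names come from FreeRTOS or from the code in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUARD_PATTERN 0xA5A5A5A5u  /* arbitrary, easy to spot in a memory dump */

/* Hypothetical wrapper: a payload bracketed by two known words. */
typedef struct {
    uint32_t head_guard;
    uint8_t  payload[32];
    uint32_t tail_guard;
} guarded_block_t;

/* Fill in both guard words and clear the payload. */
static void guarded_init(guarded_block_t *blk)
{
    assert(blk != NULL);
    blk->head_guard = GUARD_PATTERN;
    memset(blk->payload, 0, sizeof blk->payload);
    blk->tail_guard = GUARD_PATTERN;
}

/* Returns true while both guard words still hold the expected pattern.
 * A false result means something wrote outside the payload. */
static bool guarded_intact(const guarded_block_t *blk)
{
    assert(blk != NULL);
    return blk->head_guard == GUARD_PATTERN &&
           blk->tail_guard == GUARD_PATTERN;
}
```

This only tells you that a neighbouring region was overwritten, not who did it, but a failing check gives you a much smaller window to put a hardware watchpoint on.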

I found something suspicious.
I see that the same DMA and IRQ are assigned to 2 cores via TI SysConfig and I cannot change it.
I’m curious how it is going to work.
I will go to TI support.
Thank you meanwhile.

Update:
I was able to narrow down the problem to the shutdown of communication server tasks under heavy stress load.
We ported our application from TI-RTOS (SysBIOS) to FreeRTOS. The problem was introduced in the porting layers due to minor differences in API, such as attachment and deletion of thread storage and the task name (FreeRTOS copies it into the TCB, while TI-RTOS stores only a pointer that has to be handled (freed) by the application).
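
For anyone hitting something similar, here is a rough sketch in plain C of the ownership rule a porting layer can follow: copy the name into the task record at creation, free only what the record owns, and make a repeated delete harmless. This is not our actual porting code; port_task_t, port_task_create, port_task_delete and PORT_NAME_LEN are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PORT_NAME_LEN 16  /* assumption: stand-in for configMAX_TASK_NAME_LEN */

/* Hypothetical task record in a porting layer, copy-on-create style. */
typedef struct {
    char  name[PORT_NAME_LEN];  /* owned copy, like the name kept in a TCB */
    void *local_storage;        /* per-task storage owned by this record */
} port_task_t;

static port_task_t *port_task_create(const char *name, size_t storage_size)
{
    assert(name != NULL);

    port_task_t *t = calloc(1, sizeof *t);
    if (t == NULL) {
        return NULL;
    }
    /* Copy and always terminate; the caller keeps ownership of `name`. */
    strncpy(t->name, name, PORT_NAME_LEN - 1);
    t->name[PORT_NAME_LEN - 1] = '\0';

    if (storage_size > 0) {
        t->local_storage = calloc(1, storage_size);
        if (t->local_storage == NULL) {
            free(t);
            return NULL;
        }
    }
    return t;
}

/* Frees only what the record owns and clears the caller's handle, so a
 * second delete during shutdown becomes a no-op instead of a double free. */
static void port_task_delete(port_task_t **handle)
{
    if (handle == NULL || *handle == NULL) {
        return;
    }
    free((*handle)->local_storage);
    free(*handle);
    *handle = NULL;
}
```

With this convention the caller may free or reuse its own name string right after creation, and the delete path never touches memory it did not allocate.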

Thank you for reporting back!