FreeRTOS vEventGroupDelete assert failing - called from aws_ota_agent.c

nickm2018 wrote on October 08, 2018:

This question is about the FreeRTOS implementation rather than the AWS libraries.

I am running a custom platform using the heap_3 configuration.

In my application, I am trying to reset the OTA connection (disconnect and reconnect), but when I call OTA_AgentShutdown I hit an assert in the FreeRTOS queue code. The problem appears to be related to the vEventGroupDelete call in the OTA agent thread of aws_ota_agent.c, specifically the following lines:

prvAgentShutdownCleanup( &xMsgMetaData );
vEventGroupDelete( xOTA_Agent.xOTA_EventFlags );

I stepped into the code with my debugger, and the event group handle is valid and not NULL.

Call tree:

vEventGroupDelete
--> vPortFree
--> free
--> some platform library call
--> __malloc_lock (heap_3.c)
--> xSemaphoreTakeRecursive
--> xQueueGenericReceive
configASSERT( !( ( xTaskGetSchedulerState() == taskSCHEDULER_SUSPENDED ) && ( xTicksToWait != 0 ) ) );

Going back through the call stack, configUSE_NEWLIB_MALLOC_LOCK is defined to 1, which should suspend the scheduler through vTaskSuspendAll. If I step into xTaskGetSchedulerState(), I see:

xSchedulerRunning = 1, so the scheduler has been started
uxSchedulerSuspended = 1, so the scheduler is suspended

So xTaskGetSchedulerState returns taskSCHEDULER_SUSPENDED and xTicksToWait is set to 0xFFFFFFFF, which means the assert fails.
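To make the failing condition concrete, here is a minimal host-side model of the check, not the kernel source. The model_* names are hypothetical; the state values mirror the taskSCHEDULER_* constants in task.h, and the two inputs correspond to xSchedulerRunning and uxSchedulerSuspended as seen in the debugger.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values mirror taskSCHEDULER_SUSPENDED / _NOT_STARTED / _RUNNING in task.h. */
enum
{
    MODEL_SCHEDULER_SUSPENDED   = 0,
    MODEL_SCHEDULER_NOT_STARTED = 1,
    MODEL_SCHEDULER_RUNNING     = 2
};

/* Hypothetical stand-in for xTaskGetSchedulerState(), driven by the two
 * kernel variables inspected in the debugger. */
static int model_scheduler_state( int scheduler_running, unsigned suspend_depth )
{
    if( !scheduler_running )
    {
        return MODEL_SCHEDULER_NOT_STARTED;
    }

    return ( suspend_depth == 0u ) ? MODEL_SCHEDULER_RUNNING
                                   : MODEL_SCHEDULER_SUSPENDED;
}

/* Returns true when the configASSERT() expression from the call tree holds,
 * i.e. the call is not asking to block while the scheduler is suspended. */
static bool model_blocking_call_allowed( int state, uint32_t ticks_to_wait )
{
    return !( ( state == MODEL_SCHEDULER_SUSPENDED ) && ( ticks_to_wait != 0u ) );
}
```

With xSchedulerRunning = 1, uxSchedulerSuspended = 1 and a block time of 0xFFFFFFFF, model_blocking_call_allowed() returns false, which matches the assert firing. The same state with a block time of 0 would pass.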

It seems there are two calls to vTaskSuspendAll in the call tree, one in vPortFree and the other in vEventGroupDelete. Is this a FreeRTOS configuration issue? It looks like the scheduler must not be suspended when xQueueGenericReceive is called with a non-zero block time.

RIchardBarry-AWS wrote on October 08, 2018:

Sorry to hear you are having trouble. First, I can confirm that, as a general rule of thumb, you should not call FreeRTOS kernel API functions that can block while the scheduler is suspended or while you are in a critical section. That is to prevent logic errors: if a function needs to block because, for example, you are reading from a queue with a block time but the queue is empty, then a context switch must be allowed to happen, otherwise the queue read function will return without either timing out or receiving data.

Normally the C library malloc() and free() would only be used if you are using third-party libraries that call these functions; otherwise heap_4.c is the preferred memory allocator (https://www.freertos.org/a00111.html ). If you need to use the standard library heap then you can use heap_3.c, which wraps the standard library malloc() and free() to make them thread safe (somewhat crudely), and then you can direct malloc() to pvPortMalloc() and free() to vPortFree() (see the link already posted).

I’m not sure what configUSE_NEWLIB_MALLOC_LOCK does. I searched the kernel source files and can’t find a reference to it.

nickm2018 wrote on October 08, 2018:

configUSE_NEWLIB_MALLOC_LOCK is added by my platform SDK.

It looks like the simplest workaround is to change to a statically allocated event group.
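A rough sketch of what that could look like, assuming configSUPPORT_STATIC_ALLOCATION is set to 1 in FreeRTOSConfig.h (which also obliges the application to supply vApplicationGetIdleTaskMemory()). The xApp* and vApp* names are hypothetical and are not taken from aws_ota_agent.c; applying this to the OTA agent would mean changing how it creates xOTA_Agent.xOTA_EventFlags.

```c
#include "FreeRTOS.h"
#include "event_groups.h"

/* Hypothetical names: caller-owned storage and handle for the event group. */
static StaticEventGroup_t xAppEventGroupBuffer;
static EventGroupHandle_t xAppEventFlags = NULL;

void vAppEventFlagsCreate( void )
{
    /* The kernel keeps the event group in the supplied buffer, so no
     * heap allocation takes place here. */
    xAppEventFlags = xEventGroupCreateStatic( &xAppEventGroupBuffer );
    configASSERT( xAppEventFlags != NULL );
}

void vAppEventFlagsDelete( void )
{
    if( xAppEventFlags != NULL )
    {
        /* Waiting tasks are still unblocked, but a statically allocated
         * event group is not handed to vPortFree(). */
        vEventGroupDelete( xAppEventFlags );
        xAppEventFlags = NULL;
    }
}
```

Because the delete path no longer reaches vPortFree(), the free() and __malloc_lock chain from the call tree should not be entered while the scheduler is suspended. I have only reasoned about this against the kernel source, so treat it as a starting point rather than a verified fix.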