I’m working on OTA over cellular on an STM32. My first goal is to get the MQTT Agent running, and I’m nearly there. When I run this loop, it runs for a few cycles before it hangs on a configASSERT:
The configASSERT in question is configASSERT( uxSchedulerSuspended == 0 );
vTaskDelay() calls vTaskSuspendAll() and normally calls xTaskResumeAll() afterwards. This works for a few cycles of my loop, but then for some reason xTaskResumeAll() is not called, which triggers the assert the next time vTaskDelay() is called.
I’m quite lost as to why this occurs. Does anyone have a hunch?
For reference, here’s where this happens in tasks.c:
void vTaskDelay( const TickType_t xTicksToDelay )
{
    BaseType_t xAlreadyYielded = pdFALSE;

    /* A delay time of zero just forces a reschedule. */
    if( xTicksToDelay > ( TickType_t ) 0U )
    {
        configASSERT( uxSchedulerSuspended == 0 );
        vTaskSuspendAll();
        {
            traceTASK_DELAY();

            /* A task that is removed from the event list while the
            scheduler is suspended will not get placed in the ready
            list or removed from the blocked list until the scheduler
            is resumed.

            This task cannot be in an event list as it is the currently
            executing task. */
            prvAddCurrentTaskToDelayedList( xTicksToDelay, pdFALSE );
        }
        xAlreadyYielded = xTaskResumeAll();
    }
    else
    {
        mtCOVERAGE_TEST_MARKER();
    }

    /* Force a reschedule if xTaskResumeAll has not already done so, we may
    have put ourselves to sleep. */
    if( xAlreadyYielded == pdFALSE )
    {
        portYIELD_WITHIN_API();
    }
    else
    {
        mtCOVERAGE_TEST_MARKER();
    }
}
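To illustrate why one unbalanced suspend is enough to trip the assert on the next call: scheduler suspension is a nesting counter, so every vTaskSuspendAll() has to be matched by an xTaskResumeAll(). The following is a host-side toy model only, not FreeRTOS code; all the model_* names are made up for this illustration.

```c
#include <stdbool.h>

/* Toy stand-in for the kernel's uxSchedulerSuspended nesting counter. */
static unsigned model_suspended = 0;

void model_suspend_all(void)
{
    ++model_suspended;
}

void model_resume_all(void)
{
    if (model_suspended > 0)
    {
        --model_suspended;
    }
}

/* Returns false at the point where the real vTaskDelay() would hit
   configASSERT( uxSchedulerSuspended == 0 ). */
bool model_task_delay(void)
{
    if (model_suspended != 0)
    {
        return false;
    }
    model_suspend_all();
    /* ...the current task would be moved to the delayed list here... */
    model_resume_all();
    return true;
}
```

In this model, model_task_delay() succeeds as long as calls are balanced. A single extra model_suspend_all() anywhere else (or a corrupted counter) makes the next model_task_delay() fail, which matches the symptom described above.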
First thing to check would be the ‘usual suspects’ - things like the interrupt priorities and stack overflows. This page provides pointers to the relevant information on things like configASSERT() and configCHECK_FOR_STACK_OVERFLOW https://freertos.org/FAQHelp.html
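As a rough sketch of the interrupt priority check on a Cortex-M part: an ISR may only call FreeRTOS ...FromISR() functions if its numeric priority is greater than or equal to the max-syscall priority (numerically lower means logically higher on Cortex-M). The values 4 and 5 below are assumptions that are typical for CubeMX-generated projects; the real ones are in your FreeRTOSConfig.h, and the example_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; read the real ones from FreeRTOSConfig.h. */
#define EXAMPLE_PRIO_BITS             4u
#define EXAMPLE_LIB_MAX_SYSCALL_PRIO  5u

/* Priority as it is stored in the 8-bit NVIC priority register, where only
   the upper EXAMPLE_PRIO_BITS bits are implemented. */
uint8_t example_shifted_priority(uint8_t lib_priority)
{
    return (uint8_t)(lib_priority << (8u - EXAMPLE_PRIO_BITS));
}

/* True if an ISR at this (unshifted) priority is allowed to call
   FreeRTOS ...FromISR() API functions. */
bool example_isr_may_call_freertos_api(uint8_t isr_lib_priority)
{
    return isr_lib_priority >= EXAMPLE_LIB_MAX_SYSCALL_PRIO;
}
```

With these assumed values, a UART interrupt configured at priority 4 or lower (numerically) must not use the FreeRTOS API, while 5 to 15 is fine.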
Which version of FreeRTOS are you using? The newer the version, the more configASSERT() calls there are to catch these things.
I have implemented the vApplicationMallocFailedHook() as well as the vApplicationStackOverflowHook() and these do not get triggered.
There is only one interrupt in my own code that can occur during this, which is when the cellular modem sends a command. My code needs a UART receive-idle interrupt for this, which fires after a serial command line has been received from the modem.
Could this interrupt be messing with vTaskDelay()?
I’m very new to FreeRTOS so I’m probably overlooking some beginner’s mistake.
I’m running FreeRTOS 10.3.1, generated by STM32 CubeMX
In which case we will need a little more information. If you are sending publish messages I assume you have already connected to the MQTT broker. Did you use one of the examples provided in either of the FreeRTOS or AWS git repos as a starting point? Or maybe STM32Cube as the starting point? Is that where the implementation of prvMQTTPublish() comes from?
What I did was I took the OTA demo, intended for running in a windows simulator, and ported that to my STM32 by changing the transport interface and PAL interface. I wrote a custom driver for the cellular modem, which is the glue between the modem AT commands and the socket needed for the transport layer.
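Since the transport interface was replaced in this port, the receive contract is worth double-checking: coreMQTT expects the receive function to return the number of bytes read, 0 when no data is currently available, and a negative value on error. Below is a minimal host-side sketch of that contract over a ring buffer. The struct layout and the example_* names are hypothetical, and a real driver filling the buffer from a UART ISR would also need to protect the shared indices.

```c
#include <stddef.h>
#include <stdint.h>

#define RX_RING_SIZE 256u

/* coreMQTT leaves struct NetworkContext for the application to define.
   This layout is purely illustrative. */
struct NetworkContext
{
    uint8_t ring[RX_RING_SIZE];
    size_t head; /* next write position (producer: modem driver) */
    size_t tail; /* next read position (consumer: MQTT library) */
};
typedef struct NetworkContext NetworkContext_t;

/* Hypothetical helper the modem driver would call with received bytes. */
void example_ring_push(NetworkContext_t *ctx, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        size_t next = (ctx->head + 1u) % RX_RING_SIZE;
        if (next == ctx->tail)
        {
            break; /* buffer full: the remaining bytes are dropped */
        }
        ctx->ring[ctx->head] = data[i];
        ctx->head = next;
    }
}

/* Same parameter and return types as coreMQTT's TransportRecv_t.
   Never blocks: returns 0 if nothing is buffered, -1 on bad arguments. */
int32_t example_transport_recv(NetworkContext_t *ctx, void *buf, size_t bytesToRecv)
{
    uint8_t *out = buf;
    size_t n = 0;

    if (ctx == NULL || buf == NULL)
    {
        return -1;
    }
    while (n < bytesToRecv && ctx->tail != ctx->head)
    {
        out[n++] = ctx->ring[ctx->tail];
        ctx->tail = (ctx->tail + 1u) % RX_RING_SIZE;
    }
    return (int32_t)n;
}
```

The point of the sketch is the return values: a partial read is reported as the actual byte count, and an empty buffer is reported as 0 rather than as an error or by blocking.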
The prvMQTTPublish function works. I can see the messages coming in inside my AWS console. But it only works about 5 times, then the scheduler stops because vTaskDelay() leaves it suspended for some reason.
Try publishing a constant string each time, rather than creating a new string each time. That would remove the possibility of the sprintf() function causing an issue (implementations can do unexpected things), and potentially the buffer being accessed from more than one thread simultaneously (just looking for something that could cause a data corruption). So something like:
static const char * const StringToSend = "Send me to the clouds";
So it’s not on the stack either.
If that doesn’t move you forward please attach your FreeRTOSConfig.h file here (or send it to me at r dot barry at freertos dot org if you don’t want to attach).
As a further observation, the unexpected suspension of the scheduler seems to be happening just after data is received from the cellular modem. I’m running a state machine for handling the modem in a separate task. The state machine is driven by UART interrupts from the modem which can change the state. The interrupt routine for this is given below. As data reception seems to trigger the problem, I am possibly doing something wrong here, I’m just not seeing it.
Amending this post as I am working on it. Another observation: when I do not call MQTTPublish and just wait in a loop, the MQTTAgent sends an MQTT ping once a minute to keep the connection alive. This goes well for a couple of minutes, after which the scheduler gets suspended and the application hangs. So whatever is causing this, it is not in my own code that does the cyclic publish.
Spent yet another day chasing this bug without success. An important thing to note is that when I set the publish QoS to 0, I’ve been able to publish at 1 Hz for 1 hour without errors. When QoS is 1, the error is still there, it works for a few times, then the scheduler is suspended. So there must be something wrong in receiving.
I have not tried subscribing and receiving yet, but that will very likely also not work…
These are the kind of things that consume so much time, I have to suck it up and keep grinding… Hope you guys here can help out, I’m stumbling in the dark at the moment.
That seems correct. We need to narrow down the problem. Can you try removing this debug port from your ISR so that we do not need to initiate a UART Tx? Also, can you share the call stack (possibly a debugger snapshot) when the assert fires?
Ok, I removed the UART Tx, problem persists. Attached is a screenshot of the general registers when the assert is triggered. I’m assuming this is what you meant?
Start debugger and put a breakpoint in your application task (StartDefaultTask).
Put a data breakpoint at the location of your new variable uxSchedulerSuspended_Test.
Remove the breakpoint on StartDefaultTask (created in step 2) and let the debugger continue.
The expectation is that the linker places uxSchedulerSuspended_Test and uxSchedulerSuspended close enough together that any memory corruption which hits uxSchedulerSuspended will also hit uxSchedulerSuspended_Test. Since we have a data breakpoint on uxSchedulerSuspended_Test, the debugger will break as soon as it is changed (which should not happen in a legitimate case).
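If a hardware data breakpoint is not available, a software variant of the same idea is to surround a value with guard words holding a known pattern and check them periodically. This is a different technique from the data breakpoint described above, shown here only as a host-side sketch; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define GUARD_PATTERN 0xA5A5A5A5u

/* Hypothetical layout: one guard word on each side of the watched value. */
struct guarded_counter
{
    uint32_t guard_before;
    uint32_t value;
    uint32_t guard_after;
};

void guarded_init(struct guarded_counter *g)
{
    g->guard_before = GUARD_PATTERN;
    g->value = 0;
    g->guard_after = GUARD_PATTERN;
}

/* True while both guard words still hold the pattern. A stray write that
   runs across the watched value will normally damage at least one guard. */
bool guarded_intact(const struct guarded_counter *g)
{
    return g->guard_before == GUARD_PATTERN && g->guard_after == GUARD_PATTERN;
}
```

Unlike a data breakpoint, this only tells you that corruption happened some time before the check, not which instruction caused it.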
The debugger does not hit the uxSchedulerSuspended_Test breakpoint.
However, it initially hits a new assert, configASSERT( puxStackBuffer != NULL ) in xTaskCreateStatic. When I click resume, the code keeps running, and works for a couple of publish iterations before it hangs on the scheduler suspended assert again.
If I comment out
PRIVILEGED_DATA static volatile UBaseType_t uxSchedulerSuspended_Test = ( UBaseType_t ) pdFALSE;
Then the configASSERT(puxStackBuffer != NULL) does not fire
Here is my main.c which runs the AWS vStartOtaDemo after the socket is running. I have modified vOtaDemoTask to not start the OTA yet, just the MQTTAgent for testing.
I’ve been placing printf’s in a lot of places and looking at the log. After the Agent posts a publish, it does not call the transport interface anymore to receive the QoS=1 Ack. So it seems the problem occurs inside the MQTTAgent itself and not in my implementation of the transport interface.
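For reference when reading such logs: in MQTT 3.1.1 the QoS 1 acknowledgement (PUBACK) is a fixed 4-byte packet, 0x40, 0x02, then the packet identifier as two bytes, most significant first. The sketch below checks a raw buffer against that layout, which can help confirm whether the bytes handed over by the transport receive function really contain the ack. The function name is hypothetical and this is not how the MQTT library itself is structured.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true and stores the packet identifier if buf starts with a
   well-formed MQTT 3.1.1 PUBACK packet. */
bool example_parse_puback(const uint8_t *buf, size_t len, uint16_t *packet_id)
{
    if (buf == NULL || packet_id == NULL || len < 4u)
    {
        return false;
    }
    if (buf[0] != 0x40u) /* packet type 4 (PUBACK), flags 0 */
    {
        return false;
    }
    if (buf[1] != 0x02u) /* remaining length is always 2 */
    {
        return false;
    }
    *packet_id = (uint16_t)(((uint16_t)buf[2] << 8) | buf[3]);
    return true;
}
```

The packet identifier returned here should match the one used in the corresponding PUBLISH.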