pbleyer wrote on Monday, August 01, 2016:
Hello.
I am experiencing lost task notifications in a communication module under heavy load. The affected code is essentially a message transmission task that sends data to a transceiver. The task is notified either by a function that starts a transmission or by an ISR that sends a new notification when the physical transmission buffers become available.
The transmission task procedure sits in an infinite loop waiting for events:
void
optoProc(void *arg)
{
    while (true)
    {
        // Block until notified; pdTRUE clears the notification value on exit
        uint32_t e = ulTaskNotifyTake(pdTRUE, portMAX_DELAY);
        if (e & opto_Tx)
        {
            // Get message from txQueue and encode bytes for transmission
            optoTxProc(...);
        }
    }
}
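Notifications sent with eSetBits are OR-ed into the task's single 32-bit notification value; they are not counted. Several optoTxPut() calls made before optoProc() runs therefore produce one wake-up with opto_Tx set once. The plain-C sketch below is a simplified single-threaded model, not FreeRTOS code: notify_set_bits(), notify_take(), handle_wakeup(), OPTO_TX_BIT and the counter standing in for txQueue are all hypothetical. It shows that, if optoTxProc() handles only one message per wake-up, the remaining messages stay queued with no notification bit pending, whereas a handler that drains the queue empties it.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical bit value; the real opto_Tx definition is not shown in the post. */
#define OPTO_TX_BIT 0x01u

/* Simplified model of one task's notification value (not the FreeRTOS TCB). */
typedef struct {
    uint32_t value;   /* accumulated notification bits */
} NotifyModel;

/* Mimics xTaskNotify(task, bits, eSetBits): bits are OR-ed in, never counted. */
static void notify_set_bits(NotifyModel *n, uint32_t bits)
{
    n->value |= bits;
}

/* Mimics ulTaskNotifyTake(pdTRUE, ...) for an already-notified task:
 * returns the current value and clears it to zero. */
static uint32_t notify_take(NotifyModel *n)
{
    uint32_t v = n->value;
    n->value = 0;
    return v;
}

/* Handles one wake-up. With drain == false only one message is sent;
 * with drain == true the modelled queue is emptied. */
static unsigned handle_wakeup(NotifyModel *n, unsigned *queued, bool drain)
{
    unsigned sent = 0;
    if (notify_take(n) & OPTO_TX_BIT) {
        do {
            if (*queued == 0)
                break;
            (*queued)--;
            sent++;
        } while (drain);
    }
    return sent;
}
```

In the real task, the draining variant would correspond to calling xQueueReceive(d->txQueue, &msg, 0) in a loop until it stops returning pdTRUE (or the hardware buffers fill up), assuming optoTxProc() does not already do this.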
The ISR can wake up the task using the following code snippet:
if (buffer_available)
{
    BaseType_t pw = pdFALSE;  // xTaskNotifyFromISR() only ever sets this to pdTRUE
    xTaskNotifyFromISR(optoTask, opto_Tx, eSetBits, &pw);
    portYIELD_FROM_ISR(pw);
}
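One detail in this snippet: xTaskNotifyFromISR() only ever writes pdTRUE to *pxHigherPriorityTaskWoken (when the task it unblocks has a higher priority than the interrupted task); it never writes pdFALSE, which is why the FreeRTOS examples initialise the variable to pdFALSE. If pw is left uninitialised, portYIELD_FROM_ISR() may request a context switch based on stack garbage. This is probably not the cause of the lost notification, but it is worth ruling out. The plain-C model below mimics only that write-once contract; model_notify_from_isr() and its priority arguments are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Model of the pxHigherPriorityTaskWoken contract: the flag is only ever
 * set to true, never cleared. Priorities are hypothetical numbers. */
static void model_notify_from_isr(unsigned woken_prio, unsigned running_prio,
                                  bool *higher_prio_woken)
{
    if (woken_prio > running_prio && higher_prio_woken != NULL)
        *higher_prio_woken = true;
    /* otherwise the flag is left untouched, whatever it contained */
}
```

In the ISR this means declaring `BaseType_t pw = pdFALSE;` before calling xTaskNotifyFromISR().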
And the message send function does basically the same after putting the message in a queue:
int
optoTxPut(OptoDevice *d, const OptoMessage *m)
{
    if (xQueueSend(d->txQueue, m, 0) != pdTRUE)
        return opto_EOverflow;
    xTaskNotify(optoTask, opto_Tx, eSetBits);
    return opto_Ok;
}
So everything is pretty straightforward. The interrupt priorities are configured properly and system/memory checking hooks are in place to trap any issues such as memory leaks or corruption.
When running the system under heavy packet traffic, I started noticing that optoTxPut was returning queue overflow errors even when the hardware transceiver buffers were available, and transmission stopped. I found that in this condition the optoProc procedure stays blocked waiting for a notification, even though xTaskNotifyFromISR is being called and attempting to send notifications.
I traced into the underlying xTaskGenericNotifyFromISR function and see that the decision to wake the task is made there: the task is moved to the ready list only when its ucNotifyState changes from taskWAITING_NOTIFICATION to taskNOTIFICATION_RECEIVED. However, when the issue occurs, the task is already in the taskNOTIFICATION_RECEIVED state, so it is never added to the ready list. I can recover the system manually by using the debugger to modify ucNotifyState and re-trigger the notification actions.
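The wake-up rule described above can be illustrated with a simplified, single-threaded model of the three ucNotifyState values. This is a hypothetical sketch that mirrors the decision logic only (TaskModel, model_take_begin(), model_notify() and model_take_end() are illustrative names, not kernel functions). It shows that once a blocked task's state already reads RECEIVED, further notifications only OR bits into the value and never touch the lists, which matches the debugger observation. It does not explain how that state arises: in the real kernel these read-modify-write sequences are protected by critical sections or interrupt masking, and the model omits concurrency entirely.

```c
#include <stdint.h>
#include <stdbool.h>

/* The three states named in the post; numeric values are illustrative. */
typedef enum {
    NOT_WAITING,   /* taskNOT_WAITING_NOTIFICATION */
    WAITING,       /* taskWAITING_NOTIFICATION     */
    RECEIVED       /* taskNOTIFICATION_RECEIVED    */
} NotifyState;

typedef struct {
    NotifyState state;
    uint32_t value;
    bool blocked;  /* true while the task sits in a blocked list */
} TaskModel;

/* Entry of ulTaskNotifyTake(): block only if nothing is pending. */
static void model_take_begin(TaskModel *t)
{
    if (t->value == 0) {
        t->state = WAITING;
        t->blocked = true;
    }
}

/* Notify with eSetBits: the task is readied only on a WAITING -> RECEIVED edge. */
static bool model_notify(TaskModel *t, uint32_t bits)
{
    NotifyState original = t->state;
    t->state = RECEIVED;
    t->value |= bits;
    if (original == WAITING) {
        t->blocked = false;  /* moved to the ready list */
        return true;
    }
    return false;            /* no list change at all */
}

/* Exit of ulTaskNotifyTake(pdTRUE, ...): read, clear, reset the state. */
static uint32_t model_take_end(TaskModel *t)
{
    uint32_t v = t->value;
    t->value = 0;
    t->state = NOT_WAITING;
    return v;
}
```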
I see that ucNotifyState is only set to taskNOTIFICATION_RECEIVED in the task notify functions (xTaskGenericNotify, xTaskGenericNotifyFromISR, vTaskNotifyGiveFromISR). Is there any other behavior I am missing that could affect the notification value, or remove the task from the ready list, so that the notification is lost or the task never runs? That would also explain why ucNotifyState does not revert. I debugged the interrupt and didn't see any improper interaction between it and the send function. In the meantime, my obvious workaround is to use a finite timeout instead of waiting forever, so that the task procedure is periodically forced to run and check the queues, but I would certainly like to find the root cause of the problem.
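In FreeRTOS terms the workaround amounts to passing a finite block time, such as pdMS_TO_TICKS(10), to ulTaskNotifyTake() instead of portMAX_DELAY, and checking txQueue on every return whether or not opto_Tx is set. The tick-driven model below is a hypothetical sketch of why that masks the symptom: TxModel, model_tick() and the 10-tick timeout are illustrative, not FreeRTOS API, and a timeout of 0 stands in for waiting forever. It does not fix a notification that is genuinely lost; it only bounds how long queued messages can sit.

```c
#include <stdbool.h>

/* Hypothetical model: the task wakes on a notification or after timeout_ticks
 * without one, and on every wake-up drains the queue instead of trusting
 * the notification bits. */
typedef struct {
    unsigned queued;         /* messages waiting in the modelled txQueue */
    unsigned idle_ticks;     /* ticks since the last wake-up */
    unsigned timeout_ticks;  /* 0 models portMAX_DELAY (wait forever) */
} TxModel;

/* Advances one tick; returns the number of messages sent during this tick. */
static unsigned model_tick(TxModel *m, bool notified)
{
    bool timed_out = m->timeout_ticks != 0 && ++m->idle_ticks >= m->timeout_ticks;
    if (!notified && !timed_out)
        return 0;
    m->idle_ticks = 0;
    unsigned sent = m->queued;  /* drain everything that is queued */
    m->queued = 0;
    return sent;
}
```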
Thanks for any ideas.