UART RX task w/ max priority still starving

I’m running on the RT1052 w/ FreeRTOS Kernel V10.2.1.
During (possibly heavy) UART traffic, my uart_rx_task very occasionally seems to starve.
The starvation happens when my uart_tx_task (priority 3) sends something and then waits for an ACK bit in an event group, which uart_rx_task normally sets quickly.

I can confirm the ISR is receiving all data, but on these starvation occasions, the uart_rx_task fails to run for over a second, so the uart_rx_task never has the chance to process the data to signal the TX task.

I can confirm the RX task is starved and the TX task isn’t: when the TX task times out without getting an ACK, I can dump the UART RX “cb” (see below) over the UART line and see that everything was received by the ISR. The RX task just didn’t start back up.
AFTER I DUMP the buffer over UART TX, usually UART RX will start behaving normally again and work fine.

Was wondering if there was anything I’m doing wrong w/ this interrupt structure, or what else might cause the UART rx to starve when it’s at max priority.
One important note - it seems that all NXP projects have time slicing turned off. Turning it on seems to break the project right now…

static TaskHandle_t uart_rx_task_handle, uart_tx_task_handle;

void uart_startup(){
  BOARD_InitUARTPins();
  NVIC_SetPriority(LPUART1_IRQn, 13);
  
  if (xTaskCreate(uart_rx_task, "uart_rx_task", 2000, NULL, 5, &uart_rx_task_handle) != pdPASS) {
    assert(0);
  }
}

void LPUART1_IRQHandler(void) {
  uint8_t data;
  portBASE_TYPE xHigherPriorityTaskWoken = pdFALSE;
  
  if ((kLPUART_RxDataRegFullFlag)&LPUART_GetStatusFlags(LPUART1)) {
    data = LPUART_ReadByte(LPUART1);
    cb_push_back(&cb, &data);
    xTaskNotifyFromISR( uart_rx_task_handle, UART_RX_DATA, eSetBits, &xHigherPriorityTaskWoken );
    portYIELD_FROM_ISR( xHigherPriorityTaskWoken );
  }
}

static void uart_rx_task(void *pvParameters) {
  uint32_t rx_count = 0;
  EventBits_t event_bits;
  
  cb_init(&cb, (void *) cb_uart_backing_buffer, 1024);
  
  lpuart_config_t config;
  LPUART_GetDefaultConfig(&config);
  config.enableTx = true;
  config.enableRx = true;
  LPUART_Init(LPUART1, &config, BOARD_DebugConsoleSrcFreq());
  
  /* Enable RX interrupt. */
  LPUART_EnableInterrupts(LPUART1, kLPUART_RxOverrunInterruptEnable | kLPUART_RxDataRegFullInterruptEnable);
  EnableIRQ(LPUART1_IRQn);
  
  while (1) {
    xTaskNotifyWait( pdFALSE,      /* Don't clear bits on entry. */
                     ULONG_MAX,    /* Clear all bits on exit. */
                     &event_bits,  /* Stores the notified value. */
                     portMAX_DELAY );
    
    if ((event_bits & UART_RX_DATA) == UART_RX_DATA){
      process_read_data(&cb);
    }
    
  }
  
  vTaskSuspend(NULL);
}

Just so you have the TX side to see, here is where the TX task is waiting for ACKs:

static status_t uart_tx_await_ack(uint8_t *data, uint16_t size){
  EventBits_t event_bits;
  uint8_t retries = 7;
  uint16_t retry_backoff; // reaches 400 ms on the last retry, which does not fit in uint8_t
  TickType_t wait_time;
  
  //avoid triggering from stale acks/nacks…
  xEventGroupClearBits(uart_ack_event_group, ON_ACK | ON_NACK);
  
  do {
    uart_tx_enqueue(data, size);
    retry_backoff = (8 - retries) * 50;// ms extra
    wait_time = (200+retry_backoff) / portTICK_PERIOD_MS ;
    event_bits = xEventGroupWaitBits(uart_ack_event_group, ON_ACK | ON_NACK, pdTRUE, pdFALSE, wait_time);
    
    if (((event_bits & ON_ACK) != ON_ACK) && (retries <2)){
      // starvation case - dump the CB over uart here
    }
    
  } while (retries-- && ((event_bits & ON_ACK) != ON_ACK));
  
  if ((event_bits & ON_ACK) == ON_ACK){
    return kStatus_Success;
  } else if ((event_bits & ON_NACK) == ON_NACK){
    return kStatus_Fail;
  }
  
  return kStatus_Fail;
}
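
A side note on the retry loop: retries counts down from 7 to 0, so (8 - retries) * 50 runs from 50 up to 400 ms, and anything above 255 cannot be held in an 8-bit variable. The sketch below just compares the two widths; the helper names backoff_ms_u8 and backoff_ms_u16 are made up for illustration and are not part of the posted code.

```c
#include <stdint.h>

/* Hypothetical helper: the backoff formula with the result truncated to
   8 bits, as happens when it is stored in a uint8_t. */
static uint8_t backoff_ms_u8(uint8_t retries) {
    return (uint8_t)((8 - retries) * 50);
}

/* Hypothetical helper: the same formula with a 16-bit result, which can
   hold the full 50..400 ms range. */
static uint16_t backoff_ms_u16(uint8_t retries) {
    return (uint16_t)((8 - retries) * 50);
}
```

With retries = 2 the 16-bit version gives 300 ms, while the 8-bit version wraps to 44 ms, so the later retries would wait less than the earlier ones instead of more.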

The UART RX driver is fine. If I got it right the RX task blocks on getting an ACK via an EventGroup (from the TX task). In this case the (high) prio of the RX task doesn’t help since it’s waiting for the TX task.
I think you should verify the inter task signalling/synchronization including the CB (ring buffer) implementation (if there could be a race condition when accessed from RX/TX) to find the problem.
Did you think about using a lightweight task notification (bit) to signal the ACK from the TX to the RX task instead of an EventGroup? Or alternatively just a semaphore also supporting priority inheritance?

Hi Harmut,

Thanks for the confirmation on the RX driver.

My explanation wasn’t good but: the RX task is only blocking on the RX interrupt signal, nothing else. So this is where I’m confused, the Rx task is the highest priority, and it just needs to respond to the signal from ISR (then it will parse CB and send an event to the TX task).

The TX task accessing that ring buffer is just for debugging this specific problem. To prove the whole system isn’t frozen, just the RX task is frozen. (And the TX task that is waiting on the RX task is not frozen, is able to access the data the RX task is supposed to handle, etc)

In the normal flow, the RX task is supposed to parse the CB data, and then signal the TX task w/ an event - and Tx would never touch CB.

For me it seems that if RX task has the highest prio and preemption is enabled the only piece of code which could cause such a problem (TX task waiting for RX task event for an unexpected long time) is the CB parsing function.
Also the point you mentioned that dumping the CB (ring buffer) magically unblocks the RX task could be a hint to double check the CB (parsing) code.

Maybe a silly question, but does your function process_read_data() process all the data in the RX buffer? Not just a single byte. And is it written to tolerate no data in the RX buffer?

Hi Jeff,
It searches the data for “packets”, and while there are available full packets, it will continue to process, and yes it’s written to tolerate no data.

From what I can see when the RX task appears stuck - the data in the CB is perfectly valid / fine - no reason for there to be an issue on that side.

The data from the ISR goes into a circular buffer ( Adam Rosenfield’s implementation: https://stackoverflow.com/a/827749/2167199 )

The RX task processes data with:

void process_read_data(circular_buffer *uart_cb){
 uint16_t packet_size = 0;

 //check for full packet(s):
 while (cb_has_uart_packet(uart_cb, &packet_size) == kStatus_Success){
   for (uint16_t i=0; i < packet_size; i++){
     //get packet bytes from circular buffer:
     cb_pop_front(uart_cb, &(packet_rx_buf[i]));
   }
   //send events here
 }
}

and that calls this -

status_t cb_has_uart_packet(circular_buffer *cb, uint16_t *packet_size ){
  uint8_t data;
  uint8_t *ptr = cb->tail;
  uint8_t payload_size = 0;
  uint16_t pkt_size = 0;

  //ensure we're starting w/ a start packet
  while (*ptr != 0x02/*PKT_START*/){
    if (cb_pop_front(cb, &data) != kStatus_Success){
      return kStatus_Fail;
    }
    ptr = cb->tail;
  }
  if(cb->count < 6){
    // not enough data for a full packet (start, size, cmd, checksum1/2, end)
    return kStatus_Fail;
  }
  ptr = cb->tail;
  //get size:
  if (ptr == cb->buffer_end){
    ptr = cb->buffer;
  } else {
    ptr++;
  }
  payload_size = *ptr;
  pkt_size = payload_size + 5;
  if (cb->count >= pkt_size){
    *packet_size = pkt_size;
    return kStatus_Success;
  }
  return kStatus_Fail;
}


I have a dump at freeze of 6 perfectly valid packets (each of 7 bytes) that the parser usually handles without problem.

  • If the cb_has_uart_packet fails to find a packet, it will empty the CB and then return fail.
  • If there’s not enough data - returns fail.
  • If there’s a packet, it pulls it off the CB (and thus it will be removed from the CB), and then events are sent.

Since these packets are still in the CB - it means the parser never read them…
From what I’m seeing - parsing the CB isn’t where RX would get stuck… I’ll dig a little deeper

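I think there is a bug here, in the “get size” step of cb_has_uart_packet():

  if (ptr == cb->buffer_end){
    ptr = cb->buffer;
  } else {
    ptr++;
  }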

I think you mean:

  ptr++;
  if (ptr == cb->buffer_end){
    ptr = cb->buffer;
  }

I think this bug would cause your symptoms. Occasionally, you would read the size byte from the first byte of memory after your RX buffer. That size byte could fool your parsing algorithm into waiting for additional bytes when it should not wait.
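
To make the wraparound case concrete, here is a minimal sketch of "advance, then wrap". It assumes buffer_end points one past the last byte of storage (as in the linked circular buffer), so tail itself never equals buffer_end. The ring_view struct and ring_next() are hypothetical names for illustration only, not part of the posted code.

```c
#include <stdint.h>

/* Hypothetical cut-down view of the circular buffer. */
typedef struct {
    uint8_t *buffer;      /* first byte of storage */
    uint8_t *buffer_end;  /* one past the last byte of storage */
    uint8_t *tail;        /* oldest unread byte */
} ring_view;

/* Return the slot that follows ptr, wrapping to the start of storage
   when the increment lands on buffer_end. */
static const uint8_t *ring_next(const ring_view *rv, const uint8_t *ptr) {
    ptr++;
    if (ptr == rv->buffer_end) {
        ptr = rv->buffer;
    }
    return ptr;
}
```

When tail sits on the last slot of storage (the start byte), ring_next() returns the first slot, which is where the size byte really lives. Checking for buffer_end before incrementing never matches, so the size byte is read from whatever memory follows the buffer.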

thanks Jeff!!!
With that fix, so long as I keep the UART RX thread at max pri - RX doesn’t seem to be freezing up or dropping packets.

As soon as I drop the UART RX task PRI down to 4 or 3 - losing data again, but I suppose that’s to be expected to some extent perhaps. Will keep it at 5 for the time being and hope it doesn’t fight with other tasks too much!
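
One way to put a number on "to be expected": a rough estimate of how long the RX task can be kept off the CPU before the ISR fills the circular buffer. This is only a sketch under assumptions: that the 1024 passed to cb_init() is the capacity in bytes, that the link runs at 115200 baud (the baud rate is not overridden in the posted config), and that each byte costs 10 bits on the wire (8N1). The function name is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: milliseconds of back-to-back RX traffic needed to
   fill a buffer of buffer_bytes at baud_bps, with bits_per_byte bits on the
   wire per data byte (10 for 8N1: start + 8 data + stop). */
static uint32_t rx_buffer_fill_time_ms(uint32_t buffer_bytes,
                                       uint32_t baud_bps,
                                       uint32_t bits_per_byte) {
    uint64_t total_bits = (uint64_t)buffer_bytes * bits_per_byte;
    return (uint32_t)((total_bits * 1000u) / baud_bps);
}
```

Under those assumptions, rx_buffer_fill_time_ms(1024, 115200, 10) comes out at 88 ms, so if tasks at priority 4 or 5 can hold the CPU for longer than that during continuous traffic, a lower-priority RX task would plausibly lose data.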

Thanks Harmut - you were ultimately right - as Jeff figured out for me ^^^!