Periodic task suspended and never runs again

Hello,
I am running FreeRTOS on an Atmel SAME54. I have 11 tasks running (including IDLE and TmrSvc).

One task is giving me trouble.

void MyTask()
{
    const TickType_t xFrequency = 250 / portTICK_RATE_MS;
    xLastWakeTime = xTaskGetTickCount();
    while (1)
    {
        vTaskDelayUntil(&xLastWakeTime, xFrequency);
        /* set SPI chip select low */
        /* write to SPI bus (this includes waiting portMAX_DELAY on a semaphore) */
        /* set SPI chip select high */
    }
}

This task takes 8 or 9 msec to complete the loop.
I checked each task’s stack pointer, and none are overflowed. Assert is enabled, as is stack overflow checking.

After a while (hours), MyTask stops running. After poking around with the debugger, I find:

xTickCount = 10652969

MyTask is on the SuspendedTaskList,
ItemValue = 9075774
pxTopOfStack has changed from 0x20003224 (when running normally) to 0x200031fc

The semaphore that the SPI driver uses points to a Queue which includes a member, xTasksWaitingToReceive = 0x20001bd8
fields include:
uxNumberOfItems = 0xFFFFFFFF
pIndex = 0x20001bd8
xListEnd = 0x20001bd8

The chip select for SPI is high (as reported by the debugger).

It seems to me that MyTask is not waiting on a semaphore, and its delay time has expired. The SPI chip select is high, so the SPI transfer must have completed.
How did it get here?
Why did the pxTopOfStack change?
How can I get out of this?
Any thoughts on what else to look at?

Thanks
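
For anyone reading the debugger values above, here is a quick worked calculation. It assumes that ItemValue is the value of the task's state list item (the tick at which its last vTaskDelayUntil() was due to expire) and that the tick rate is 1 kHz; neither is stated in the post, and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Ticks elapsed from `then` to `now`. Unsigned subtraction stays correct
 * across a single wrap of the 32-bit tick counter. */
static uint32_t ticks_since(uint32_t now, uint32_t then)
{
    return (uint32_t)(now - then);
}

/* Convert ticks to whole seconds for a given tick rate in Hz.
 * The tick rate is an assumption supplied by the caller. */
static uint32_t ticks_to_seconds(uint32_t ticks, uint32_t tick_rate_hz)
{
    return ticks / tick_rate_hz;
}
```

With the values from the post, ticks_since(10652969, 9075774) is 1577195 ticks, which would be roughly 26 minutes at 1 kHz. Also worth noting: when INCLUDE_vTaskSuspend is set to 1, FreeRTOS keeps a task that blocks with portMAX_DELAY on the suspended list, so finding MyTask there does not by itself mean that vTaskSuspend() was called.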

I guess xLastWakeTime is also a local variable in the real code, right?
When you stop the target, what is the call stack? Where exactly is the task blocked?
Is the SPI bus also used by other tasks?

Can you show us the code that posts the semaphore?

Yes, xLastWakeTime is a local variable.
Unfortunately, my IDE (Atmel Studio 7) doesn’t show the call stack for each task, only for the one currently running (usually IDLE).
The SPI bus is only used by this task.

The semaphore used by the SPI is in the driver code provided by Atmel.

/**
 * \brief Semaphore up
 */
int32_t sem_up(sem_t *sem)
{
    return is_in_isr() ? (xSemaphoreGiveFromISR(*sem, pdFALSE) ? 0 : ERR_ABORTED)
                       : (xSemaphoreGive(*sem) ? ERR_NONE : ERR_ABORTED);
}

/**
 * \brief Semaphore down, may suspend the caller thread
 */
int32_t sem_down(sem_t *sem, uint32_t timeout)
{
    return xSemaphoreTake(*sem, timeout) ? ERR_NONE : ERR_TIMEOUT;
}

static void spi_m_os_tx(struct _spi_m_async_dev *dev)
{
    struct spi_m_os_descriptor *spi = CONTAINER_OF(dev, struct spi_m_os_descriptor, dev);

    if (!(dev->char_size > 1)) {
        _spi_m_async_write_one(dev, spi->xfer.txbuf[spi->xfercnt++]);
    } else {
        _spi_m_async_write_one(dev, ((uint16_t *)spi->xfer.txbuf)[spi->xfercnt++]);
    }

    if (spi->xfercnt >= spi->xfer.size) {
        _spi_m_async_enable_tx(dev, false);
        sem_up(&spi->xfer_sem);
    }
}

/**
 * \brief Callback for RX
 * \param[in] dev Pointer to the SPI device instance.
 */
static void spi_m_os_rx(struct _spi_m_async_dev *dev)
{
    struct spi_m_os_descriptor *spi = CONTAINER_OF(dev, struct spi_m_os_descriptor, dev);

    if (!(dev->char_size > 1)) {
        /* 8-bit or less */
        spi->xfer.rxbuf[spi->xfercnt++] = (uint8_t)_spi_m_async_read_one(dev);
    } else {
        /* 9-bit or more */
        ((uint16_t *)spi->xfer.rxbuf)[spi->xfercnt++] = (uint16_t)_spi_m_async_read_one(dev);
    }

    if (spi->xfercnt < spi->xfer.size) {
        if (spi->xfer.txbuf) {
            // MDF 3/13/20 Correction. dev->char_size is set to 1 or 2 bytes
            //if (dev->char_size == SPI_CHAR_SIZE_8) {
            if (dev->char_size <= 1) {
                _spi_m_async_write_one(dev, spi->xfer.txbuf[spi->xfercnt]);
            } else {
                _spi_m_async_write_one(dev, ((uint16_t *)spi->xfer.txbuf)[spi->xfercnt]);
            }
        } else {
            _spi_m_async_write_one(dev, dev->dummy_byte);
        }
    } else {
        _spi_m_async_enable_rx(dev, false);
        sem_up(&spi->xfer_sem);
    }
}

/**
 * \brief Callback for error
 * \param[in] dev Pointer to the SPI device instance.
 * \param[in] status Error status.
 */
static void spi_m_os_error(struct _spi_m_async_dev *dev, int32_t status)
{
    struct spi_m_os_descriptor *spi = CONTAINER_OF(dev, struct spi_m_os_descriptor, dev);

    if (status == 0) {
        return;
    }
    _spi_m_async_enable_tx(dev, false);
    _spi_m_async_enable_rx(dev, false);

    spi->error = status;
    sem_up(&spi->xfer_sem);
}

int32_t spi_m_os_transfer(struct spi_m_os_descriptor *const spi, uint8_t const *txbuf, uint8_t *const rxbuf,
                          const uint16_t length)
{
    ASSERT(spi && txbuf && rxbuf);

    /* Fill transfer descriptor */
    spi->xfer.rxbuf = (uint8_t *)rxbuf;
    spi->xfer.txbuf = (uint8_t *)txbuf;
    spi->xfer.size = length;
    spi->xfercnt = 0;
    _spi_m_async_enable_rx(&spi->dev, true);
    _spi_m_async_write_one(&spi->dev, txbuf[spi->xfercnt]);

    if (0 != sem_down(&spi->xfer_sem, ~0)) {
        return ERR_TIMEOUT;
    }

    return ERR_NONE;
}

Here’s the initialization for the SPI semaphore:

#define SEMAPHORE_MAX_COUNT 1

/**
 * \brief Semaphore initialization
 */
int32_t sem_init(sem_t *sem, uint32_t count)
{
    ASSERT(count <= SEMAPHORE_MAX_COUNT);

    *sem = xSemaphoreCreateCounting((uint32_t)SEMAPHORE_MAX_COUNT, count);

    return *sem ? ERR_NONE : ERR_NOT_INITIALIZED;
}

Is it possible to verify that the SPI Tx Complete ISR that is supposed to give the semaphore did fire? You can put a breakpoint in the ISR or set a GPIO and monitor that?

Should this code not be under control of a critical section? What value is the Tx semaphore initialized with (where is the call to sem_init on the xfer sem)? It better be 1. Generally, using a counting semaphore for this kind of driver implementation is, IMHO, not good practice, although I can see that the library developers probably only wanted one centralized point for syncing.

aggarg,
I am making an assumption that it fired, based on the fact that the chip select is high. The call to spi_m_os_transfer() must have returned for that to happen. There is no timeout on the xSemaphoreTake() call.

A breakpoint is not useful to find this error, because it runs correctly for hours. I have set a breakpoint there, just to see what happens in the OS, and it does fire. After MyTask hangs, is there a way to look at the semaphore to tell what state it is in?

RAc,
I don’t think it’s necessary in this case. The calls to spi_m_os_transfer() come from only one task. The buffers (rxbuf, txbuf) are static char buffers declared in the calling module and are initialized just before calling spi_m_os_transfer().

Does the ISR continue firing even after your task appears to hang?

The only ISR involved here is the one that fires when the SPI completes a transfer, and no, it does not fire after the task hangs.

That makes sense because, from your description of the control flow, it can only become active when triggered by the task.

Put counters in your code. How many transfers were initiated and how many were completed at hang time?
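
A minimal sketch of what such counters could look like, in case it helps someone with a similar hang. All names here are hypothetical and not part of the Atmel driver; the idea is to bump one counter before spi_m_os_transfer() is called, one wherever sem_up() is called from the callbacks, and one after the transfer call returns, then compare them in the debugger once the task stops.

```c
#include <stdint.h>

/* Hypothetical instrumentation counters; each one has a single writer. */
typedef struct {
    volatile uint32_t started;   /* task: just before starting a transfer */
    volatile uint32_t completed; /* callback/ISR: where the semaphore is given */
    volatile uint32_t returned;  /* task: just after the transfer call returns */
} xfer_counters_t;

typedef enum {
    XFER_BALANCED,          /* every started transfer completed and returned */
    XFER_WAITING_FOR_ISR,   /* started, but the completion callback never ran */
    XFER_WAITING_FOR_TASK   /* semaphore given, but the task never got past the take */
} xfer_state_t;

static void on_transfer_start(xfer_counters_t *c)    { c->started++; }
static void on_transfer_complete(xfer_counters_t *c) { c->completed++; }
static void on_transfer_return(xfer_counters_t *c)   { c->returned++; }

/* Classify where the transfer path stopped, based on counter mismatches. */
static xfer_state_t xfer_diagnose(const xfer_counters_t *c)
{
    if (c->started != c->completed) {
        return XFER_WAITING_FOR_ISR;
    }
    if (c->completed != c->returned) {
        return XFER_WAITING_FOR_TASK;
    }
    return XFER_BALANCED;
}
```

If the counters turn out to be balanced at hang time, the SPI path is probably not where the task is stuck, and the same trick can be applied to the next blocking call in the loop.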

RAc,
Thank you for your suggestion. I put counters all over the place and eventually isolated the problem to a message queue that had an infinite timeout. I changed the timeout to 0, and now the task never hangs.

Thanks to everyone for the help.


As a general rule, I try to avoid using “infinite” timeout, and ALWAYS check the return value of the operation, and handle a timeout error appropriately (which might be to just ignore it).
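
A rough sketch of that rule, written so it can be tried on a host PC. The receive_fn hook, rx_stats_t, and fake_receive are made-up names for illustration; on the target, the hook would presumably wrap a FreeRTOS call such as xQueueReceive() with a finite xTicksToWait and return whether it reported pdTRUE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical receive hook: returns true if an item arrived within
 * timeout_ticks, false if the wait timed out. */
typedef bool (*receive_fn)(void *ctx, void *item, uint32_t timeout_ticks);

typedef struct {
    uint32_t received; /* successful receives */
    uint32_t timeouts; /* bounded waits that expired */
} rx_stats_t;

/* Wait at most timeout_ticks for one item and always act on the result. */
static bool receive_bounded(receive_fn receive, void *ctx, void *item,
                            uint32_t timeout_ticks, rx_stats_t *stats)
{
    if (receive(ctx, item, timeout_ticks)) {
        stats->received++;
        return true;
    }
    stats->timeouts++; /* record the timeout so a stall is visible later */
    return false;      /* caller decides: retry, skip this cycle, or report */
}

/* Host-side stand-in used only to exercise the sketch: succeeds while
 * *remaining > 0, then behaves like an empty queue that times out. */
static bool fake_receive(void *ctx, void *item, uint32_t timeout_ticks)
{
    int *remaining = ctx;
    (void)timeout_ticks;
    if (*remaining <= 0) {
        return false;
    }
    (*remaining)--;
    *(int *)item = 42;
    return true;
}
```

The point of the design is that the periodic task keeps running even when nothing arrives, and the timeout counter leaves a trace that can be inspected in the debugger.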
