Task switching thrashing

I have a fairly complex system with several tasks. I recently added a Wilcoxon 786A general purpose accelerometer plugged into a CN-0540 24-Bit Data Acquisition System for IEPE Sensors. The AD7768-1 Precision 24-Bit ADC on the CN-0540 has a continuous read mode that (in one common configuration) produces a Data Ready (DRDY) pulse every 64 us, i.e. at 15.625 kHz.

For each DRDY, my application needs to read a few bytes of sample data over SPI. I’m running this on a NUCLEO-L496ZG with an STM32 at 80 MHz. I need to run this task at a very high priority or it misses long stretches of samples. What I didn’t anticipate is that this task practically takes over the system. Nothing else gets a chance to run. If I cut the sample rate in half, the (fairly optimized, at this point) SPI reader task takes “only” 62% of the processor, according to vTaskGetRunTimeStats. I speculate that FreeRTOS is getting bogged down in context switching.
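A rough budget check on the numbers above, as a minimal sketch. It assumes that halving the sample rate means a 128 us DRDY period, and that the 62% figure from vTaskGetRunTimeStats is an average over that period. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* CPU cycles available between two consecutive DRDY pulses. */
uint32_t cycles_per_sample(uint32_t cpu_hz, uint32_t period_us)
{
    assert(period_us > 0u);
    /* 64-bit intermediate so cpu_hz * period_us cannot overflow. */
    return (uint32_t)(((uint64_t)cpu_hz * period_us) / 1000000u);
}

/* Average CPU time per sample, in whole microseconds, derived from a
 * measured load percentage at a given sample period (assumed inputs). */
uint32_t busy_us_per_sample(uint32_t load_percent, uint32_t period_us)
{
    return (load_percent * period_us) / 100u;
}
```

With these inputs, cycles_per_sample(80000000, 64) gives 5120 cycles per sample at the full rate, and busy_us_per_sample(62, 128) gives about 79 us of CPU time per sample at the half rate. If that estimate is anywhere near right, each sample costs more than the 64 us available at the full rate, which would be consistent with the reader task starving everything else.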

There are two interrupts in the path for each DRDY. The first is a GPIO interrupt on the DRDY line itself:

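/* GPIO interrupt handler for the ADC's DRDY line. */
void drdy_interrupt(void) {
	/* Only notify once the reader task has registered its handle. */
	if (drdy_notify_handle) {
		BaseType_t xHigherPriorityTaskWoken = pdFALSE;
		vTaskNotifyGiveIndexedFromISR(drdy_notify_handle,
				NOTIFICATION_IX_ADC_DRDY, &xHigherPriorityTaskWoken);
		/* Request a context switch on exit if a higher-priority task was unblocked. */
		portYIELD_FROM_ISR(xHigherPriorityTaskWoken);
	}
}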

The SPI reader task blocks on a ulTaskNotifyTakeIndexed, waiting for this notification. The SPI read (potentially) blocks on a similar vTaskNotifyGiveIndexedFromISR/ulTaskNotifyTakeIndexed notification for the SPI Receive Buffer Not Empty (RXNE) interrupt.

There is very little processing involved; I think FreeRTOS just doesn’t like the frequency of interruption.

I’m looking for suggestions on how I can make this work better. A dual core processor would be nice!

Coincidentally, I’m looking at a similar-sounding problem here: I have a timer interrupt triggering 4000 times per second (placing requests on a different task’s message queue).

The problem is that when that task wakes up and tries to grab that message from its queue, the taskENTER_CRITICAL() call at the start of xQueueReceive() takes ~80 usec to complete the first time round. There’s a similar ~80 usec pause at the start of the Tmr Svc task, which is also being woken up. Multiply 80 usec x 2 tasks x 4000 per second and you get 640 msec out of every second, about 64% of the CPU, so you quickly run out of processor bandwidth. (I got these figures from Segger SystemView.)

I don’t know what’s causing it yet, but I can’t help but wonder if this is also what you’re seeing.

At those sorts of rates, I would have the DRDY ISR trigger a DMA based SPI transaction and the SPI data complete interrupt gather the data into a buffer to be processed at some rate, and THAT is when a task gets involved.

Otherwise, YES, running out of processor bandwidth is a real issue.

And likely the vendor supplied libraries will be of limited use in implementing this.

Great idea! Thanks.

I have partially implemented it. In this case, the SPI peripheral on this family (STM32L47xxx, STM32L48xxx, STM32L49xxx and STM32L4Axxx) has separate 32-bit embedded Rx and Tx FIFOs, and the data that I need to transmit and receive for each sample fits completely in those. The DRDY ISR fills the Tx buffer. The SPI RXNE interrupt sends a notification to the read task, which pulls the data from the Rx buffer. Now there is only one vTaskNotifyGive/ulTaskNotifyTake in the path instead of two.

I will have to look into using DMA to automatically accumulate the read data into a larger buffer.

Since the data fits into the device FIFOs, you might be able to skip the DMA and just have the SPI completion interrupt trigger the ISR to read the couple of bytes and store the part with actual data into the larger buffer.

DMA would give slightly lower processor load, but likely it could be done in the ISR.

Ah, yes. No need to overcomplicate things. A couple of static variables – a counter and a pointer – the SPI completion ISR stores the data in the destination buffer, and when the buffer is full notifies the waiting task. That looks very efficient, and easy to implement.
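The counter-and-pointer scheme described above could look roughly like the following host-runnable sketch. Several things here are assumptions, not taken from the post: the block size, the use of two alternating blocks so the task can read one while the ISR fills the other, the 24-bit two's-complement sample format, and the names spi_complete_isr and notify_reader_task. On the target, the ISR would read the sample from the SPI Rx FIFO rather than take it as a parameter, and notify_reader_task would wrap something like vTaskNotifyGiveIndexedFromISR followed by portYIELD_FROM_ISR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLES_PER_BLOCK 4u  /* hypothetical; a real FFT block would be larger */

/* Two blocks, filled alternately (an assumption, see the text above). */
static int32_t blocks[2][SAMPLES_PER_BLOCK];
static int32_t *write_ptr = blocks[0];  /* next slot to fill */
static size_t write_count = 0;          /* samples stored in the current block */
static unsigned active_block = 0;       /* index of the block being filled */

/* Host-side stand-ins for the RTOS notification (hypothetical). */
static const int32_t *ready_block = NULL;
static unsigned notify_count = 0;

static void notify_reader_task(const int32_t *full_block)
{
    ready_block = full_block;  /* block the task may now process */
    notify_count++;
}

/* Sign-extend a 24-bit two's-complement value to 32 bits. */
int32_t sign_extend_24(uint32_t raw)
{
    raw &= 0xFFFFFFu;
    return (int32_t)(raw ^ 0x800000u) - 0x800000;
}

/* Called once per completed SPI transfer with the raw 24-bit sample. */
void spi_complete_isr(uint32_t raw_sample)
{
    *write_ptr++ = sign_extend_24(raw_sample);
    if (++write_count == SAMPLES_PER_BLOCK) {
        /* Block is full: hand it to the task and switch to the other one. */
        notify_reader_task(blocks[active_block]);
        active_block ^= 1u;
        write_ptr = blocks[active_block];
        write_count = 0;
    }
}
```

The task is then woken once per block instead of once per sample. With two blocks, it has one full block period to finish processing before the ISR starts overwriting the data it was handed.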

That works beautifully. It runs fine at the highest ADC output data rate. Now, the processor clocks 88% of its time in the IDLE state, even though in addition to collecting the ADC data, I’m calculating an FFT on it and graphing that on an LCD every half second.

Thanks Richard!
