CAN message gets corrupt inside a FreeRTOS task

aggarg · June 3, 2021, 9:49pm

Did you try raising its priority?

Also, do the ISRs for MCAN1_INT0_IRQn and MCAN1_INT1_IRQn use any FreeRTOS API? If not, you can raise their priority as well.

EduardoGoncalves1966 · June 7, 2021, 7:54am

Hi
thanks for the suggestions.

The CAN ISR does not use any FreeRTOS API; it only sets a flag.

void CAN_1_tx_callback_(struct can_async_descriptor *const descr)
{
	Txfinished = true;
}

I have changed the CAN ISR priorities from:

NVIC_SetPriority ( MCAN1_INT0_IRQn, configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY );
NVIC_SetPriority ( MCAN1_INT1_IRQn, configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY );

to:

NVIC_SetPriority ( MCAN1_INT0_IRQn, configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY +1 );
NVIC_SetPriority ( MCAN1_INT1_IRQn, configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY +1 );

But no changes in the outcome.

hs2 · June 7, 2021, 8:33am

On the Cortex-M NVIC a numerically lower value means a higher priority, so use configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY -1 to raise the priority above the range covered by FreeRTOS.
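
The masking rule behind that advice can be sketched on the host as follows. This is only an illustration: the value 4 for the max syscall priority, the 8 priority levels, and the helper name are assumptions, and the real values come from FreeRTOSConfig.h and the device header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: example value only; the real one is in FreeRTOSConfig.h. */
#define EXAMPLE_MAX_SYSCALL_PRIORITY 4u
/* Assumption: 3 implemented priority bits, i.e. priorities 0..7. */
#define EXAMPLE_PRIO_LEVELS 8u

/* Hypothetical helper: true if an interrupt with this (unshifted) priority
 * is masked while FreeRTOS is inside a critical section. FreeRTOS masks via
 * BASEPRI, which blocks every priority that is numerically greater than or
 * equal to the max syscall priority. */
bool masked_in_critical_section(uint32_t irq_priority)
{
    assert(irq_priority < EXAMPLE_PRIO_LEVELS);
    return irq_priority >= EXAMPLE_MAX_SYSCALL_PRIORITY;
}
```

With these example numbers, "+1" gives 5 (still masked and therefore delayable by the kernel), while "-1" gives 3 (never masked, so the ISR must not call any FreeRTOS API).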

EduardoGoncalves1966 · June 7, 2021, 8:42am

Changed to:

NVIC_SetPriority ( MCAN1_INT0_IRQn, configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY -1 );
NVIC_SetPriority ( MCAN1_INT1_IRQn, configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY -1 );

Still the same result, all CAN data comes out corrupted.

hs2 · June 7, 2021, 9:03am

Is it possible that there is a caching issue in the driver? It might be worth testing the driver with the data cache disabled or with a different MPU memory range configuration.

EduardoGoncalves1966 · June 7, 2021, 9:18am

An interesting point.
The CAN driver is part of Atmel Start (ASF), so I didn’t write it myself.
I’ll try to look at the code and maybe something will show up…
It must be said that two memory areas are declared in the CAN driver:

COMPILER_ALIGNED(4)
uint8_t can0_rx_fifo[CONF_CAN0_F0DS * CONF_CAN0_RXF0C_F0S];
COMPILER_ALIGNED(4)
uint8_t can0_tx_fifo[CONF_CAN0_TBDS * CONF_CAN0_TXBC_TFQS];

Each is 96 bytes: 16 bytes per CAN message, 6 messages in total.
Using the debugger I can see these areas properly filled with several messages (in both cases: when the CAN test routine is running under FreeRTOS and when it runs standalone).

struct _can_context _can0_context = {.rx_fifo       = can0_rx_fifo,
                                     .tx_fifo       = can0_tx_fifo,
                                     .tx_event      = can0_tx_event_fifo,
                                     .rx_std_filter = can0_rx_std_filter,
                                     .rx_ext_filter = can0_rx_ext_filter};

Instead of the cache, could DMA create the same strange result?

hs2 · June 7, 2021, 10:33am

Caching issues go together with DMA-driven drivers. When everything is done by the CPU there is no problem. Is DMA used in the driver?
However, instead of disabling the data cache you can also use the cache management functions to flush/invalidate data buffers.
I think that when you define e.g. uint8_t can0_tx_fifo and want to put it into a specific MPU region, you have to use a dedicated section ( GCC: __attribute__((section("<name>"))) ) that corresponds to this MPU region.
Your current can0_rx/tx_fifo buffers are linked into the compiler’s default bss section.
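
A minimal sketch of what such a placement could look like with GCC. The section name ".can_nocache", the EXAMPLE_ sizes, and the helper are made up for illustration; the attribute alone changes nothing unless the linker script maps that section to RAM that the MPU marks as non-cacheable, and neither of those is shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: ".can_nocache" is a hypothetical section name. */
#define CAN_NOCACHE __attribute__((section(".can_nocache"), aligned(32)))

#define EXAMPLE_TX_ELEMENT_SIZE 16u /* assumed: 16 bytes per CAN message */
#define EXAMPLE_TX_ELEMENTS      6u /* assumed: 6 messages in the FIFO */

/* Buffer placed in the dedicated section and aligned to a 32-byte
 * Cortex-M7 cache line instead of the original 4 bytes. */
CAN_NOCACHE uint8_t example_can_tx_fifo[EXAMPLE_TX_ELEMENT_SIZE * EXAMPLE_TX_ELEMENTS];

/* Hypothetical helper: address of the element at put_index, computed the
 * same way as the fifo + put_index * element-size arithmetic in the driver. */
uint8_t *example_tx_element(uint32_t put_index)
{
    assert(put_index < EXAMPLE_TX_ELEMENTS);
    return &example_can_tx_fifo[put_index * EXAMPLE_TX_ELEMENT_SIZE];
}
```

Aligning to 32 bytes is a design choice rather than a requirement of the attribute: it keeps the buffer from sharing a cache line with unrelated variables if the region is ever made cacheable again.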

EduardoGoncalves1966 · June 8, 2021, 10:43am

Hi

I got an answer from Microchip/Atmel regarding this corruption of CAN data when running FreeRTOS:

We have seen similar issues while using MCAN with Freertos. The issue had been narrowed down to cache coherence issue in the Cortex M7 based sam MCU’s. One workaround would be to disable cache to see whether it solves the issue.
For more details on cache coherence issue, please refer to the below techbrief:

Managing-Cache-Coherency-on-Cortex-M7-Based-MCUs-DS90003195A.pdf

Still, when I change the code (see below) disabling the cache, the same problem persists.
Just wondering if you have more suggestions or if the disabling of the DCache should be done somewhere else. thanks.

int main(void)
{
	/* Initializes MCU, drivers and middleware */
	atmel_start_init();

	SCB_DisableDCache();		// attempt to avoid CAN1 corruption in FreeRTOS
	
	while (1)
	{
		extern int main_CSPv71(void);
		main_CSPv71();
    }
}

I have moved the call to SCB_DisableDCache() to just before the call to vTaskStartScheduler() and it ends up in the Dummy_Handler().
By calling also SCB_InvalidateDCache() before, it stops crashing but still doesn’t work.
I also moved the calls to inside the FreeRTOS task that is sending the CAN message and it still doesn’t work.

	SCB_InvalidateDCache();
	SCB_DisableDCache();

EduardoGoncalves1966 · June 8, 2021, 2:12pm

Hi
I’m no longer sure that the DCache is enabled after reset.
I remember using an STM32F767 some weeks ago where both DCache and ICache were disabled after reset.

In fact, if I add a call to SCB_EnableDCache() in main(), the program runs in a different way:

int main(void)
{
atmel_start_init();
SCB_EnableDCache();
…

A for loop that normally gives a 1 second delay now runs much faster, taking around 65 milliseconds, and the CAN starts sending more interesting data.

can0 000 [5] A5 A5 A5 A5 A5 <<== FreeRTOS unused stack???!!!
can0 169 [5] remote request
can0 000 [0]
can0 000 [0]
can0 000 [0]

Again, if I run the CAN test routine outside FreeRTOS with DCache enabled, I get this:
can0 000 [0]
can0 000 [0]
can0 000 [0]
and once again the for loop runs very fast!

So I would conclude that in my code the DCache is not enabled!!!
Could I be wrong?
If I’m right, what else could it be?

hs2 · June 8, 2021, 2:49pm

So does the CAN driver use DMA?
If the D-cache is enabled, you have to call SCB_CleanDCache_by_Addr on the buffer that is going to be transferred by DMA. Otherwise DMA might transfer arbitrary data (previously) stored in RAM.
At least for testing I’d keep the D-cache disabled (the default after reset) and maybe deal with that later on.
I’m afraid you’ll have to step through the driver code to see what’s going on.

EduardoGoncalves1966 · June 8, 2021, 3:15pm

I have tried to call SCB_CleanDCache_by_Addr but it doesn’t alter the situation.
This driver code is supplied by Atmel from Atmel Start.


/**
 * \brief Write a CAN message
 */
int32_t _can_async_write(struct _can_async_device *const dev, struct can_message *msg)
{
	struct _can_tx_fifo_entry *f = NULL;
	hri_mcan_txfqs_reg_t       put_index;

	if (hri_mcan_get_TXFQS_TFQF_bit(dev->hw)) {
		return ERR_NO_RESOURCE;
	}

	put_index = hri_mcan_read_TXFQS_TFQPI_bf(dev->hw);
	
PUT_INDEX = put_index;

#ifdef CONF_CAN0_ENABLED
	if (dev->hw == MCAN0) {
		f = (struct _can_tx_fifo_entry *)(can0_tx_fifo + put_index * CONF_CAN0_TBDS);
	}
#endif
#ifdef CONF_CAN1_ENABLED
	if (dev->hw == MCAN1) {
		f = (struct _can_tx_fifo_entry *)(can1_tx_fifo + put_index * CONF_CAN1_TBDS);
	}
#endif
	if (f == NULL) {
		return ERR_NO_RESOURCE;
	}

F = f;

	f->T0.bit.RTR = (msg->type == CAN_TYPE_REMOTE) ? 1 : 0;

	if (msg->fmt == CAN_FMT_EXTID) {
		f->T0.val     = msg->id;
		f->T0.bit.XTD = 1;
	} else {
		/* A standard identifier is stored into ID[28:18] */
		f->T0.val = msg->id << 18;
	}

	if (msg->len <= 8) {
		f->T1.bit.DLC = msg->len;
	} else if (msg->len <= 12) {
		f->T1.bit.DLC = 0x9;
	} else if (msg->len <= 16) {
		f->T1.bit.DLC = 0xA;
	} else if (msg->len <= 20) {
		f->T1.bit.DLC = 0xB;
	} else if (msg->len <= 24) {
		f->T1.bit.DLC = 0xC;
	} else if (msg->len <= 32) {
		f->T1.bit.DLC = 0xD;
	} else if (msg->len <= 48) {
		f->T1.bit.DLC = 0xE;
	} else if (msg->len <= 64) {
		f->T1.bit.DLC = 0xF;
	}

	f->T1.bit.FDF = hri_mcan_get_CCCR_FDOE_bit(dev->hw);
	f->T1.bit.BRS = hri_mcan_get_CCCR_BRSE_bit(dev->hw);

	memcpy(f->data, msg->data, msg->len);

	SCB_CleanDCache_by_Addr((uint32_t *)f, sizeof(struct _can_tx_fifo_entry)); // <<== Flushing the cache area that contains struct _can_tx_fifo_entry

	hri_mcan_write_TXBAR_reg(dev->hw, 1 << hri_mcan_read_TXFQS_TFQPI_bf(dev->hw));
	return ERR_NONE;
}

I have stepped a lot through the driver code… and I can see struct _can_tx_fifo_entry being filled properly.

Could I be using the function SCB_CleanDCache_by_Addr in the wrong way?
f is the address of the struct _can_tx_fifo_entry.
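
One thing that may be worth checking here: cache maintenance on the Cortex-M7 works on whole 32-byte lines, while the FIFO is only 4-byte aligned and each element is 16 bytes. The host-side sketch below computes which line-aligned range a given element covers. The helper and the addresses used with it are hypothetical, and whether the CMSIS version in use needs an aligned address at all is something to confirm in its core_cm7.h.

```c
#include <assert.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u /* Cortex-M7 data cache line size in bytes */

/* Hypothetical helper type: a cache-line-aligned address range. */
struct cache_range {
    uintptr_t start; /* rounded down to a line boundary */
    uint32_t  size;  /* multiple of the line size covering the object */
};

/* Expands [addr, addr + size) to whole cache lines. */
struct cache_range cache_range_for(uintptr_t addr, uint32_t size)
{
    const uintptr_t mask = ~(uintptr_t)(DCACHE_LINE_SIZE - 1u);
    struct cache_range r;
    uintptr_t end;

    assert(size > 0u);
    r.start = addr & mask;                               /* round down */
    end     = (addr + size + DCACHE_LINE_SIZE - 1u) & mask; /* round up */
    r.size  = (uint32_t)(end - r.start);
    return r;
}
```

For example, a 16-byte element at the made-up address 0x20400038 straddles two lines, so the range to clean is 64 bytes starting at 0x20400020; on the target, r.start and r.size would be what gets passed to SCB_CleanDCache_by_Addr.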

If I disable the cache by calling SCB_DisableDCache() it gives the same result.

I think FreeRTOS does not enable the cache. Is that true?
If that is the case and it is a cache issue, why does it work outside FreeRTOS and not inside?

RAc · June 8, 2021, 4:06pm

Hi Eduardo,

you can always examine the registers that control the cache. Very easy test, indeed: Break into the code any time after the OS has started and look at the SCB registers.
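
As a rough sketch of that check: on the Cortex-M7 the cache enable bits sit in SCB->CCR, with DC at bit 16 and IC at bit 17. The helper names below are made up, and the values in the example function are arbitrary test inputs, not readings from the board.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in the Cortex-M7 SCB->CCR register. */
#define CCR_DC_BIT (1u << 16) /* data cache enable */
#define CCR_IC_BIT (1u << 17) /* instruction cache enable */

/* Hypothetical helpers: decode a CCR value read in the debugger. */
bool ccr_dcache_enabled(uint32_t ccr) { return (ccr & CCR_DC_BIT) != 0u; }
bool ccr_icache_enabled(uint32_t ccr) { return (ccr & CCR_IC_BIT) != 0u; }

/* Example decode of two made-up register values. */
void ccr_decode_example(void)
{
    assert(!ccr_dcache_enabled(0x00000000u));
    assert(ccr_dcache_enabled(CCR_DC_BIT | CCR_IC_BIT));
    assert(ccr_icache_enabled(CCR_DC_BIT | CCR_IC_BIT));
}
```

On the target the same test is simply ccr_dcache_enabled(SCB->CCR), evaluated once before the scheduler starts and once inside the task.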

That’s something I would advise anyway: run your test suite without FreeRTOS, then take a snapshot of all MCU registers you can get hold of, then do the same with FreeRTOS enabled and compare the output.

If that doesn’t yield anything useful, I’d configure FreeRTOS as tickless and non-preemptive (meaning you won’t get either sys tick or svc interrupts), see if you still experience the problems and, if yes, dump your registers again. If not, enable either the sys tick or the svc interrupt, then check again. And so on…
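
The snapshot comparison can be as simple as the sketch below. The struct, the function name, and the way the values get captured (debugger export, a dump routine, and so on) are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical snapshot entry: a register name and its captured value. */
struct reg_sample {
    const char *name;
    uint32_t    value;
};

/* Prints every register whose value differs between the two snapshots
 * (same order and length assumed) and returns how many differ. */
size_t diff_snapshots(const struct reg_sample *bare,
                      const struct reg_sample *rtos, size_t count)
{
    size_t differing = 0;

    assert(bare != NULL && rtos != NULL);
    for (size_t i = 0; i < count; i++) {
        if (bare[i].value != rtos[i].value) {
            printf("%-12s bare=0x%08lx rtos=0x%08lx\n", bare[i].name,
                   (unsigned long)bare[i].value,
                   (unsigned long)rtos[i].value);
            differing++;
        }
    }
    return differing;
}
```

Filling one array from the standalone run and one from the FreeRTOS run narrows the search to the handful of registers that actually changed.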

hs2 · June 8, 2021, 4:31pm

Again, I’d propose to keep D-cache globally disabled until it works.
One difference between running the test in main and in the task context is that the stack memory (and possibly its region) is different. In main it’s the main stack, and in the task context it’s the stack set up by FreeRTOS in xTaskCreate (using pvPortMalloc).
Depending on the MPU region setup and/or enabled caches the behavior might be different.