CAN message gets corrupt inside a FreeRTOS task

EduardoGoncalves1966 · June 3, 2021, 7:37am

Hi

I was able to create a very strange bug!

I’m using an ATSAMV71-XULT board with Atmel Studio 7 and FreeRTOS 10.0

I have a function (that I list below) that keeps sending a CAN message.
If I call this function outside a task of FreeRTOS it works fine.
However if I call it from a FreeRTOS task the data that comes out is corrupt, although when I follow the code with the debugger all seems well.

Obviously this is an impossible question, but any ideas what could be wrong or how I can gather more information?

One of the obvious culprits would be the stack of the FreeRTOS stack but that is set to 10000 bytes, so I don’t think that may be the one. Also I don’t see that many usage of the stack.

thanks

static uint8_t Txfinished = false;

void CAN_1_tx_callback_(struct can_async_descriptor *const descr)
{
	Txfinished = true;
}

void CAN1test (void)
{
	struct can_message msgTx;

	can_async_register_callback(&CAN_1, CAN_ASYNC_TX_CB, (FUNC_PTR)CAN_1_tx_callback_);

	can_async_enable(&CAN_1);

	struct can_filter filter;
	filter.id   = 0;	filter.mask = 0;		// accept every CAN id
	can_async_set_filter(&CAN_1, 0, CAN_FMT_EXTID, &filter);

	msgTx.data = "ola";
	msgTx.id = 0x4455;
	msgTx.len = strlen(msgTx.data);
	msgTx.type = CAN_TYPE_DATA;
	msgTx.fmt  = CAN_FMT_EXTID;

	while(1)
	{
		Txfinished = false;
		can_async_write(&CAN_1, &msgTx);

		while (!Txfinished);

		for(int i=0; i<10000000; i++);

		can_async_enable(&CAN_1);
	}
}


int main_CSPv71(void)
{
	NVIC_SetPriority ( UART1_IRQn, configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY );
	NVIC_SetPriority ( MCAN1_INT0_IRQn, configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY );
	NVIC_SetPriority ( MCAN1_INT1_IRQn, configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY );


void CAN1test (void);
CAN1test();  **<<==  Here it works fine!**


	if ( xTaskCreate ( main_CSP_task, "main_CSP_task", TASK_STACK_SIZE, NULL, TASK_PRIORITY, main_CSP_task_TH ) != pdPASS )
		while(1);


static void main_CSP_task(void *p)
{
    void CAN1test (void);
    CAN1test();  **<<== Here it does not work, corrupt data comes out, but the debugger shows valid data**

hs2 · June 3, 2021, 8:55am

Almost certainly TASK_STACK_SIZE is too small because I bet it’s a stack overflow problem.
I strongly recommend to define configASSERT and also enable stack overflow checking for development/debugging in case it’s not already done.
Note that the stack overflow checks can’t catch all overflows in all situations, but it helps a lot.

EduardoGoncalves1966 · June 3, 2021, 9:15am

Amazing! Even with 10000 bytes of stack?

In FreeRTOSConfig.h I have:

#define TASK_STACK_SIZE (10000 / sizeof(portSTACK_TYPE))
#define TASK_PRIORITY (tskIDLE_PRIORITY +1)

#ifndef configMINIMAL_STACK_SIZE
#define configMINIMAL_STACK_SIZE ((uint16_t)64)
#endif

#ifndef configTOTAL_HEAP_SIZE
#define configTOTAL_HEAP_SIZE ((size_t)(128*1024))
#endif

I’m creating the tasks as:

	if ( xTaskCreate ( main_CSP_task, "main_CSP_task", TASK_STACK_SIZE, NULL, TASK_PRIORITY, main_CSP_task_TH ) != pdPASS )
		while(1);

So I should get a task stack size of 10000 bytes (or 2500 32bits words)?

Could it happen that for whatever reason the configMINIMAL_STACK_SIZE of 64 is being used?
Is what circumstances is configMINIMAL_STACK_SIZE used?

thanks

hs2 · June 3, 2021, 9:32am

Ok - 10kB stack should be far enough.
configMINIMAL_STACK_SIZE is minimal the stack size a very simple task requires or needed by FreeRTOS itself to run it.
You get the stack size you specify for xTaskCreate or it returns an error if the stack can’t be allocated, of course.

This delay loop is broken. It will get optimized out by the compiler at least when using optimization > 0.
Maybe that’s the cause for the corruption.
When using FreeRTOS you can and should use xTaskDelay and friends.

EduardoGoncalves1966 · June 3, 2021, 9:37am

I have tried

	if ( xTaskCreate ( main_CSP_task, "main_CSP_task", **30000**/*TASK_STACK_SIZE*/, NULL, TASK_PRIORITY, main_CSP_task_TH ) != pdPASS )
		while(1);

And still it fails! And 30000 means a stack of 30000*4 bytes.

configASSERT is defined in FreeRTOSConfig.h as:

#define configASSERT(x)                                                                                                \
	if ((x) == 0) {                                                                                                    \
		taskDISABLE_INTERRUPTS();                                                                                      \
		for (;;)                                                                                                       \
			;                                                                                                          \
	}

Also in FreeRTOSConfig.h configCHECK_FOR_STACK_OVERFLOW is defined as:

#ifndef configCHECK_FOR_STACK_OVERFLOW
#define configCHECK_FOR_STACK_OVERFLOW 2
#endif

I have no optimization, -O0, that for loop causes a delay of about 500ms. I just use a for loop because this CANtest function is also used outside FreeRTOS, before calling vTaskStartScheduler().
Nevertheless CANtest() will not make to the final code. I just created it because I’m having the same corruption when I run the real code, and I was able to realize that this problem only happens when FreeRTOs is running. It doesn’t happen before starting the scheduler.

hs2 · June 3, 2021, 9:43am

Are you sure that the delay really works ?
At least you should do the volatile i trick to ensure the loop is executed as expected.
Or you write your own reliable (microsecond) delay() function.
For FreeRTOS testing better just try/use xTaskDekay.
From the code you posted I guess Txfinished is set by the ISR (caling the callback), right ? This flag should also be declared volatile. Otherwise the code may or may not work.

hs2 · June 3, 2021, 9:59am

Finally the problem might be in the ISRs and their interrupt priorities.
If the interrupt priority is set that it’s covered by FreeRTOS configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY to make use of FreeRTOS API calls in those ISRs their interrupts get disabled for a short period of time when FreeRTOS enters a critical section either internally or by application code.
That’s a difference between running the test before and after the scheduler was started.
If you currently don’t use any FreeRTOS API in the ISRs try setting the priorities higher (numerically lower) than configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY.
Do you have very tight ISR/interrupt timing requirements ?

EduardoGoncalves1966 · June 3, 2021, 10:01am

yes I can see that it generates a delay.
I have a scope connected to the CANbus.
It works both in FreeRTOS and outside FreeRTOS.
Due to the callback nature of the CAN API even when the delay is 0 it works fine.

static volatile uint8_t Txfinished = false;

void CAN_1_tx_callback_(struct can_async_descriptor *const descr)
{
	Txfinished = true;
}

while (1)
{
		Txfinished = false;
		can_async_write(&CAN_1, &msgTx);

		while (!Txfinished);
}

The Txfinished variable does the trick!
It’s just the data that becomes corrupt.

EduardoGoncalves1966 · June 3, 2021, 10:02am

I thought about the priorities and I’m using NVIC_SetPriority

NVIC_SetPriority ( MCAN1_INT0_IRQn, configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY );
NVIC_SetPriority ( MCAN1_INT1_IRQn, configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY );

Is this correct? thanks

hs2 · June 3, 2021, 10:08am

Yes. With this setting the interrupts have the highest possible priority covered by FreeRTOS.
Cortex-M interrupt priorities are not easy to get right.
IMO this post Understanding priority levels of ISR and FreeRTOS APIs - #16 by aggarg contains a good and easy to understand explanation. This is also a good posting.

EduardoGoncalves1966 · June 3, 2021, 10:54am

Hi

I see that Atmel library is doing some ‘critical’ stuff when the CAN interface is being initialised!


void atomic_enter_critical(hal_atomic_t volatile *atomic)
{
	*atomic = __get_PRIMASK();
	__disable_irq();
	__DMB();
}

/**
 * \brief Exit atomic section
 */
void atomic_leave_critical(hal_atomic_t volatile *atomic)
{
	__DMB();
	__set_PRIMASK(*atomic);
}


#define MCAN_CRITICAL_SECTION_ENTER() CRITICAL_SECTION_ENTER()
#define MCAN_CRITICAL_SECTION_LEAVE() CRITICAL_SECTION_LEAVE()

#define CRITICAL_SECTION_ENTER()  \
	{   volatile hal_atomic_t __atomic;		atomic_enter_critical(&__atomic); 

#define CRITICAL_SECTION_LEAVE()  \
	atomic_leave_critical(&__atomic); }

static inline void hri_mcan_write_TXBC_reg(const void *const hw, hri_mcan_txbc_reg_t data)
{
	MCAN_CRITICAL_SECTION_ENTER();
	((Mcan *)hw)->MCAN_TXBC = data;
	MCAN_CRITICAL_SECTION_LEAVE();
}

static inline void hri_mcan_clear_TXBC_reg(const void *const hw, hri_mcan_txbc_reg_t mask)
{
	MCAN_CRITICAL_SECTION_ENTER();
	((Mcan *)hw)->MCAN_TXBC &= ~mask;
	MCAN_CRITICAL_SECTION_LEAVE();
}

static inline void hri_mcan_toggle_TXBC_reg(const void *const hw, hri_mcan_txbc_reg_t mask)
{
	MCAN_CRITICAL_SECTION_ENTER();
	((Mcan *)hw)->MCAN_TXBC ^= mask;
	MCAN_CRITICAL_SECTION_LEAVE();
}

Can this somehow interfere with FreeRTOS?
Although this initialisation takes place before the FreeRTOS scheduler is started!

thanks

hs2 · June 3, 2021, 12:09pm

I think using multiple, different implements for critical sections is problematic.
I wonder if the Atmel lib is intended to be used together with FreeRTOS.
FreeRTOS provides taskENTER/EXIT_CRITICAL_FROM_ISR() for critical sections in ISRs and without the FROM_ISR suffix for task code.
Maybe the atmel specific critical sections are also used later on when using their driver from the task. This might be a problem since FreeRTOS and the driver code could both manipulate the (low level) PRIMASK.

EduardoGoncalves1966 · June 3, 2021, 1:03pm

I have created a request for support with Atmel regarding this. Maybe they have some explanation for what is happening.

thanks for the support.

rtel · June 3, 2021, 3:23pm

FreeRTOS manipulates the BASEPRI register - which just masks interrupts below a certain logical priority. The code from Atmel is manipulating PRIMASK, which is effectively disabling all interrupts. That in itself should not be a problem for FreeRTOS.

rtel · June 3, 2021, 3:30pm

A couple of other thoughts - where is msgTx declared? Make sure it is not on the stack of main() as that stack will no longer be in context once the scheduler starts and is actually re-claimed for use as the interrupt stack.

What is the revision number of the Cortex-M7 core? There is a specific FreeRTOS port for r0p1 cores that works around some silicon bugs in regards to how interrupt disabling works. If you have an r0p1 core then make sure you use that specific version, otherwise you should be able to use the Cortex-M4F kernel port.

hs2 · June 3, 2021, 3:33pm

@rtel

called in main_CSP_task. That should be ok.

EduardoGoncalves1966 · June 3, 2021, 3:37pm

The functions that call the critical enter and leave are I believe only used during initialisation of the Atmel CPU, just after entering main.

int main(void)
{
	/* Initializes MCU, drivers and middleware */
	atmel_start_init();       <<== Calls to critical enter and leave are made here

	/* Replace with your application code */
	while (1)
	{
		extern int main_CSPv71(void);

		main_CSPv71();

EduardoGoncalves1966 · June 3, 2021, 3:43pm

I’m using a ATSAMV71-XULT board which has a ATSAMV71Q21 AAB processor.
The FreeRTOS was taken from Atmel Start and it’s version 10.0.0

I can send you the file port.c that I have. How can I attach a file in the forum?

aggarg · June 3, 2021, 5:28pm

Another point - How many other tasks do you have? The reason I ask is to determine the possibility of the task calling CAN1test getting interrupted at a point where it should not have been. An easy way to determine can be to raise main_CSP_task priority.

Thanks.

EduardoGoncalves1966 · June 3, 2021, 6:38pm

At this moment main_CSP_task is the only task running.
The other tasks are created later on, but since CAN1test() is in a forever loop it doesn’t end.