FreeRTOS + TCP with hard FPU on Xilinx ZynqMP UltraScale CortexR5

ChristosZosi · November 7, 2022, 12:50pm

Hello,

we are using the FreeRTOS-Plus-TCP library in our firmware on the Xilinx ZynqMP UltraScale Cortex-R5 processor. The FreeRTOS+TCP version is V2.3.2 LTS Patch 1 and the FreeRTOS Kernel version is V10.4.3.

The Cortex-R5 has a floating-point unit (FPU), which we also make use of. We compile our firmware with the -mfloat-abi=hard -mfpu=vfpv3-d16 flags, which allow the compiler to use the FPU registers for optimizations. For the FreeRTOS kernel we use the portable/GCC/ARM_CR5 port.

We ran into the following issue with the FreeRTOS+TCP library. At first everything worked as expected. Later in the firmware development, however, our release build, which we compile with -O2, started failing: the assertion configASSERT( !( ( pvItemToQueue == NULL ) && ( pxQueue->uxItemSize != ( UBaseType_t ) 0U ) ) ); in the function xQueueGenericSend(...) in queue.c would trigger whenever an Ethernet link could not be established.

After some digging, we found that the assertion fired because pvItemToQueue was NULL while pxQueue->uxItemSize was 8. This led us to the trigger: the assertion fired when FreeRTOS_NetworkDown( void ) in FreeRTOS_IP.c called xSendEventStructToIPTask(...).

After some more digging, we found that the compiler had applied optimizations that use the FPU registers in the IP task, but nothing declared that the IP task uses them. As a result, the FPU registers were not saved and restored for the IP task across context switches, which was ultimately the cause of the issue above.

After adding the line vPortTaskUsesFPU(); at the top of the prvIPTask() function, the issue was resolved.

static void prvIPTask( void * pvParameters )
{
    IPStackEvent_t xReceivedEvent;
    TickType_t xNextIPSleep;
    FreeRTOS_Socket_t * pxSocket;
    struct freertos_sockaddr xAddress;

    vPortTaskUsesFPU();

    /* Just to prevent compiler warnings about unused parameters. */
    ( void ) pvParameters;

    /* A possibility to set some additional task properties. */
    iptraceIP_TASK_STARTING();
   
   .....

Now my question: assuming you agree that this is the underlying issue, how can we fix it without changing the library sources? We want to keep them “clean” to avoid merge conflicts when we update the library.

One suggestion from my side would be to add a flag to FreeRTOSIPConfig.h, something along the lines of ipconfigIP_TASK_USES_FPU, which would guard the call to vPortTaskUsesFPU() in the IP task.
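A minimal sketch of what such a guard could look like. The option ipconfigIP_TASK_USES_FPU is hypothetical and does not exist in FreeRTOS+TCP; vPortTaskUsesFPU() is replaced here by a host-side stand-in that only counts calls, so the fragment compiles without the kernel; prvIPTaskPrologue() is an illustrative name for the first lines of prvIPTask().

```c
/* Hypothetical option that would live in FreeRTOSIPConfig.h.
 * It is NOT an existing FreeRTOS+TCP setting. */
#ifndef ipconfigIP_TASK_USES_FPU
    #define ipconfigIP_TASK_USES_FPU    1
#endif

/* Host-side stand-in for the port function so the sketch builds without
 * the kernel. In a real build, the ARM_CR5 port provides vPortTaskUsesFPU(). */
static int xFPUMarkCalls = 0;

static void vPortTaskUsesFPU( void )
{
    xFPUMarkCalls++;
}

/* Illustrative stand-in for the first lines of prvIPTask(). */
static void prvIPTaskPrologue( void )
{
    #if ( ipconfigIP_TASK_USES_FPU != 0 )
    {
        /* Give the IP task an FPU context before any compiler-generated
         * FPU instruction can run in it. */
        vPortTaskUsesFPU();
    }
    #endif
}
```

With the option set to 0 the call compiles out entirely, so the library behaves as it does today. Note that this only covers the IP task; the idle and timer service tasks would need the same treatment.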

Thank you in advance.

Kind Regards,

Christos

pete-pjb · November 7, 2022, 1:07pm

Hi @ChristosZosi ,

In my own experience with the Zynq 7000 port, setting the configUSE_TASK_FPU_SUPPORT option in FreeRTOSConfig.h to 2, as below, is a good solution:

/* If configUSE_TASK_FPU_SUPPORT is set to 1 (or undefined) then each task will
be created without an FPU context, and a task must call vPortTaskUsesFPU() before
making use of any FPU registers.  If configUSE_TASK_FPU_SUPPORT is set to 2 then
tasks are created with an FPU context by default, and calling vPortTaskUsesFPU()
has no effect. */
#define configUSE_TASK_FPU_SUPPORT				2

This appears to work well on the Zynq 7000 and will presumably also work for the Zynq MP port, assuming the port is similar. It adds an FPU context to all tasks by default, which I think is probably a good idea anyway; that way you won’t get caught out by these kinds of optimisations in any tasks you create in future.

I hope that is helpful.

Kind Regards,

Pete

ChristosZosi · November 7, 2022, 1:36pm

Hi @pete-pjb,

thank you for your answer.

And yes, you are absolutely correct about the Zynq 7000 series, as I have also worked with it. However, the configUSE_TASK_FPU_SUPPORT flag seems to be specific to the Cortex-A9 port, as I have not seen it in other ports.

For example, this flag does not exist in the Cortex-R5 port. The only way to declare that a task uses the FPU registers is to call vPortTaskUsesFPU(); in the task function.

pete-pjb · November 7, 2022, 1:54pm

Hi @ChristosZosi ,

Sorry that my answer was not so helpful; I should have checked the MP port first, my bad. If I have time I might take a look at the Cortex-R5 port and see if there is a way to align it with the Zynq (Cortex-A9) port, although I don’t have any hardware to test it on.
If you have a proposed solution in mind, it may be worth submitting a pull request to the FreeRTOS Kernel repository on GitHub. IMHO a solution that changes the Cortex-R5 port might be preferable to changing the +TCP code, as the scope of the problem probably goes beyond the +TCP implementation.

If I am speaking out of turn here I am sure @aggarg , @rtel or @richard-damon will probably comment soon.

Kind Regards,

Pete

ChristosZosi · November 7, 2022, 2:13pm

Hi @pete-pjb ,

thank you again for your quick response.

I also think that your suggestion to make the change in the Cortex-R5 port is the better fit, as it would also cover cases where the compiler applies FPU optimizations to the idle and timer service tasks.

I would be more than pleased to create a pull request that adds the configUSE_TASK_FPU_SUPPORT flag to the Cortex-R5 port, as is already done in the Cortex-A9 port.

Let me know if you are open to it.

Kind Regards

Christos

pete-pjb · November 7, 2022, 2:14pm

Hi @ChristosZosi ,

Having looked at the source code the variable ulPortTaskHasFPUContext appears to be a single GLOBAL variable. On that basis can you not just call the function vPortTaskUsesFPU() at some point in your initialisation code but before you start the +TCP task?

Kind Regards,

Pete

ChristosZosi · November 7, 2022, 2:59pm

Hi @pete-pjb,

No, unfortunately this will not work. vPortTaskUsesFPU() must be called in each task that uses the FPU registers for them to be saved and restored.

The reason is that every task is initialized without a floating-point context, and the value of ulPortTaskHasFPUContext is actually saved on each task’s stack.

See also here, in the function pxPortInitialiseStack() in the file port.c:

.....
    /* The task will start with a critical nesting count of 0 as interrupts are
     * enabled. */
    *pxTopOfStack = portNO_CRITICAL_NESTING;
    pxTopOfStack--;

    /* The task will start without a floating point context.  A task that uses
     * the floating point hardware must call vPortTaskUsesFPU() before executing
     * any floating point instructions. */
    *pxTopOfStack = portNO_FLOATING_POINT_CONTEXT;

    return pxTopOfStack;
}
/*-----------------------------------------------------------*/

and also in .macro portSAVE_CONTEXT in the file portASM.S

....
	/* Push the critical nesting count. */
	LDR		R2, ulCriticalNestingConst
	LDR		R1, [R2]
	PUSH	{R1}

	/* Does the task have a floating point context that needs saving?  If
	ulPortTaskHasFPUContext is 0 then no. */
	LDR		R2, ulPortTaskHasFPUContextConst
	LDR		R3, [R2]
	CMP		R3, #0

	/* Save the floating point context, if any. */
	FMRXNE  R1,  FPSCR
	VPUSHNE {D0-D15}
	/*VPUSHNE	{D16-D31}*/
	PUSHNE	{R1}

	/* Save ulPortTaskHasFPUContext itself. */
	PUSH	{R3}
....

To verify this, I also tried calling vPortTaskUsesFPU() before the TCP/IP task is initialized and, as expected, got the same issue as described in my first post.
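The behaviour can be illustrated with a small host-side toy model. This is only a sketch of the flag handling described above, not the port's code: ToyTask_t and the vToy... functions are hypothetical names, and only the flag is modelled, not the registers.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the single global that describes the currently running task. */
static uint32_t ulPortTaskHasFPUContext = 0;

/* Toy task: holds only the copy of the flag that the real port keeps on
 * the task's stack. */
typedef struct
{
    uint32_t ulSavedHasFPUContext;
} ToyTask_t;

static ToyTask_t * pxCurrentTask = NULL;

/* Like pxPortInitialiseStack(): every task starts without an FPU context. */
static void vToyInitialiseTask( ToyTask_t * pxTask )
{
    pxTask->ulSavedHasFPUContext = 0;
}

/* Like portSAVE_CONTEXT followed by portRESTORE_CONTEXT, flag only. */
static void vToySwitchTo( ToyTask_t * pxNext )
{
    if( pxCurrentTask != NULL )
    {
        pxCurrentTask->ulSavedHasFPUContext = ulPortTaskHasFPUContext;
    }

    pxCurrentTask = pxNext;
    ulPortTaskHasFPUContext = pxNext->ulSavedHasFPUContext;
}

/* Like vPortTaskUsesFPU(): marks whatever is running at the moment. */
static void vToyTaskUsesFPU( void )
{
    ulPortTaskHasFPUContext = 1;
}
```

In this model, calling vToyTaskUsesFPU() before the first vToySwitchTo() has no lasting effect, because the switch overwrites the global with the 0 stored for the incoming task. Calling it after switching to a task sets the flag for that task only, and the value then follows the task across later switches.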

For reference, here is how the Cortex-A9 port does it, so that the FPU registers are always saved and restored for every task:

	/* The task will start with a critical nesting count of 0 as interrupts are
	enabled. */
	*pxTopOfStack = portNO_CRITICAL_NESTING;

	#if( configUSE_TASK_FPU_SUPPORT == 1 )
	{
		/* The task will start without a floating point context.  A task that
		uses the floating point hardware must call vPortTaskUsesFPU() before
		executing any floating point instructions. */
		pxTopOfStack--;
		*pxTopOfStack = portNO_FLOATING_POINT_CONTEXT;
	}
	#elif( configUSE_TASK_FPU_SUPPORT == 2 )
	{
		/* The task will start with a floating point context.  Leave enough
		space for the registers - and ensure they are initialised to 0. */
		pxTopOfStack -= portFPU_REGISTER_WORDS;
		memset( pxTopOfStack, 0x00, portFPU_REGISTER_WORDS * sizeof( StackType_t ) );

		pxTopOfStack--;
		*pxTopOfStack = pdTRUE;
		ulPortTaskHasFPUContext = pdTRUE;
	}
	#else
	{
		#error Invalid configUSE_TASK_FPU_SUPPORT setting - configUSE_TASK_FPU_SUPPORT must be set to 1, 2, or left undefined.
	}
	#endif

	return pxTopOfStack;
}
/*-----------------------------------------------------------*/

Kind Regards,

Christos

pete-pjb · November 7, 2022, 3:09pm

Hi @ChristosZosi ,

Ah, I see, an interesting approach! I had only taken a very quick look. If I were you, I would implement it the same way as the ARM_CA9 port, modifying the port.c and portASM.S files of the ARM_CR5 port as required, and submit it as a pull request. I am not a FreeRTOS maintainer, but in my opinion your pull request will be welcomed, as it is a valid and helpful addition to the kernel.

Kind Regards,

Pete

richard-damon · November 7, 2022, 3:24pm

This is actually a fairly complicated issue and unfortunately sometimes needs support from the programmer. There are a number of cases:

1. Simplest: no FPU, and thus no problem.

2. Next simplest: the processor has an FPU and support for automatically detecting its use, so the port can just handle the decision.

3. More complicated: the processor has an FPU but no support for detecting its use, and the compiler doesn’t use it for anything but actual floating point. Here the user can add an explicit marking that a given task uses floating point.

4. Worst case: the processor has an FPU with no support for detecting its use, and the compiler does sometimes decide to use it for non-floating-point operations (the case we have here). The options:

   - Just assume that ALL tasks might use the FPU and take the hit of always saving the FPU registers.
   - Add flags to the compile (if available) to stop the compiler from using the FPU where not expected; then we can use the method of marking the tasks that use the FPU. This requires good documentation and a bit of thought by the programmer.
   - If you can’t add the flag and don’t want to always save the FPU registers, you need to “guess” which tasks need FPU support, and that guess could become wrong at any time if an update to a task causes the compiler to start using the FPU registers there.

Ports of this last type (where the port can’t tell from the hardware whether the FPU is being used, but the compiler may use it without warning) likely need to be configurable to either always save the FPU registers or require explicit marking, and that need should be well documented. Ports that currently don’t have the problem of the compiler using the registers might still want that option, because you never know when a future compiler will start doing it.

I would NOT suggest adding configuration flags to mark individual built-in tasks for that last case, as I don’t think that is a viable option. If users want to go that way, they can edit the code of the task to add the call, which reminds them that they have chosen a dangerous path.

ChristosZosi · November 7, 2022, 3:56pm

Hi @richard-damon,

thank you for your detailed explanation. Excuse me if I did not understand you correctly.

So, according to this section of your post, you are suggesting that the GCC port for the Cortex-R5 is such a port and needs to be configurable to either always save the FPU context or require explicit marking of tasks, which is currently not the case.

If we want to have the first option, as mentioned here.

We should then be able to configure the port accordingly. However, this is currently not possible with the Cortex-R5 port. So, as I understand it, you are also suggesting adding the configUSE_TASK_FPU_SUPPORT option to the R5 port, as in the Cortex-A9 port, where it could be set to 0 for no FPU use, 1 to require explicit marking, or 2 to always save/restore the FPU context?

Thank you in advance.

Kind regards,

Christos

richard-damon · November 7, 2022, 4:25pm

I haven’t looked at the R5 in detail, so I don’t know if it includes the FPU usage detection that is in the Cortex-M series. If it doesn’t, then yes, in my opinion it would be best to have an option to either always save the FPU context or require marking each task that uses the FPU. If the compiler defaults to using the FPU registers at times for non-FPU operations, the default should be to save the context for all tasks whenever the FPU has been “enabled” for the compiler. The documentation should warn of the danger of the compiler using the registers even in tasks that apparently do not use the FPU.
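One way an application could approximate that default in its own FreeRTOSConfig.h is sketched below. It assumes a port that honours configUSE_TASK_FPU_SUPPORT with the Cortex-A9 semantics (1 = explicit marking, 2 = always save) and relies on __ARM_FP, the ACLE macro that GCC and Clang define when compiling for hardware floating point. Whether a port should pick its default this way is a design decision for the maintainers; this is only an illustration.

```c
/* Sketch for an application's FreeRTOSConfig.h: choose the FPU policy from
 * what the compiler was told about the target. Assumes the port supports
 * configUSE_TASK_FPU_SUPPORT with the ARM_CA9 meanings. */
#ifndef configUSE_TASK_FPU_SUPPORT
    #if defined( __ARM_FP )
        /* The compiler may place FPU instructions anywhere, so give every
         * task an FPU context. */
        #define configUSE_TASK_FPU_SUPPORT    2
    #else
        /* No hardware floating point enabled: tasks start without an FPU
         * context and opt in explicitly. */
        #define configUSE_TASK_FPU_SUPPORT    1
    #endif
#endif
```

The trade-off is the one described above: always saving costs stack space and context-switch time for every task, in exchange for not having to guess where the compiler uses the FPU registers.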

ChristosZosi · November 8, 2022, 9:39am

Hi @richard-damon ,

I took a look at the R5’s FPU. It uses the VFPv3-D16 architecture, which is the same as that of the Cortex-A9, with the only difference that it has just sixteen 64-bit registers instead of thirty-two. As for FPU usage detection, I couldn’t find anything on the R5. So I assume we have the same case here as on the A9, as both use the same FPU architecture.

I have prepared a pull request that adds support for the configUSE_TASK_FPU_SUPPORT constant to the GCC/ARM_CR5 port. I have not yet tested it on real hardware; I will do so later today and, if everything works, I will send the PR.
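The register count matters for how much stack an always-saved FPU context needs. A small sketch of the arithmetic, assuming 32-bit stack words and one extra word for FPSCR, which matches what the portSAVE_CONTEXT excerpt earlier in the thread pushes (D0-D15 plus FPSCR); the names are illustrative, not taken from the port:

```c
#include <stdint.h>

/* Assumption: stack words are 32 bits wide, so each 64-bit D register
 * occupies two words. */
#define toyWORDS_PER_D_REGISTER    ( 64U / 32U )

/* One word for the FPSCR status/control register. */
#define toyFPSCR_WORDS             1U

/* Number of stack words needed to hold a full FPU context. */
static uint32_t ulToyFPUContextWords( uint32_t ulNumDRegisters )
{
    return ( ulNumDRegisters * toyWORDS_PER_D_REGISTER ) + toyFPSCR_WORDS;
}
```

For sixteen D registers this gives 33 words (132 bytes) per task; for thirty-two it gives 65 words (260 bytes).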

Kind Regards,

Christos

ChristosZosi · November 9, 2022, 6:47am

Hi @richard-damon,

Here is the PR.

github.com/FreeRTOS/FreeRTOS-Kernel

Add support for the configUSE_TASK_FPU_SUPPORT constant in the GCC/ARM_CR5 port

FreeRTOS:main ← ChristosZosi:gcc/cortexr5

opened 04:09PM - 08 Nov 22 UTC

ChristosZosi

+118 -16

Description
-----------
In this PR, I have added support for the `configUSE_TASK_FPU_SUPPORT` configuration constant to the GCC/ARM_CR5 port. The reason for this is, that when one does compile a FreeRTOS application with FPU on for the Cortex-R5 processor with the GCC compiler, then the compiler by default may make code optimizations using the FP-registers. E.g. code that uses standard functions such as `memcpy` , `memset` and similar ones, the chances are very high, that the compiler will make optimizations using the FPU registers. This will lead certainly to corrupted FP-registers, if these won't be handled accordingly.

In the current state of the GCC/ARM_CR5 port, one could use the function `vPortTaskUsesFPU( void )` to mark a task that uses the FPU, to also save/restore the FP-registers during context switches. However, this only option, to ensure that the FP-registers get saved/restored, on its own is not ideal. I had an issue with our application using the FreeRTOS-Plus-TCP library, in which the IP-Task would be optimized by the compiler to use the FPU registers, and for this task we were not able to set the `vPortTaskUsesFPU( void )` mark, without making changes to the library.

So, now with the `configUSE_TASK_FPU_SUPPORT`, one can set it to `2` and ensure that every task created in the FreeRTOS application saves/restores the FP-registers. All this, of course, is not something new. The GCC/ARM_CA9 port does it already. I have also created a topic in the FreeRTOS-Forum, where this issue is being discussed more thoroughly.

[FreeRTOS + TCP with hard FPU on Xilinx ZynqMP UltraScale CortexR5](https://forums.freertos.org/t/freertos-tcp-with-hard-fpu-on-xilinx-zynqmp-ultrascale-cortexr5/16133)

Test Steps
-----------
I did test these changes with our application in real hardware and also the FreeRTOS Demo [CORTEX_R5_UltraScale_MPSoC](https://github.com/FreeRTOS/FreeRTOS/tree/main/FreeRTOS/Demo/CORTEX_R5_UltraScale_MPSoC), to verify that everything still works as expected and this was the case on both occasions.

Related Issue
-----------

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Kind Regards,

Christos