ARM CA53 float incorrect computation w/ multi-thread

Austin · February 11, 2022, 4:02am

Hello

We use the FreeRTOS ARM Cortex-A53 64-bit port on a single core. We start two threads that each run the floating point calculation below, and occasionally we get an incorrect result in both threads. See the snapshot below, with the incorrect result in the red box.

Following the thread "Zynq Ultrascale MPSoC task floating point corruption", we tried the following (patch attached):
fpu-irq-handler-float-save-restore.c (2.9 KB)

Added a call to vPortTaskUsesFPU() in xTaskCreate(), but it did not help.
Saved and restored registers Q0~Q31 in FreeRTOS_IRQ_Handler, which did not help either.
The issue does not occur if only one thread runs.

Could you give us some comments on this? Thank you.

Floating Point Test Code

static uint32_t float_calc()
{
    int32_t width = 128;
    int32_t height = 128;
    int32_t center_x = 64;
    int32_t center_y = 64;
    uint32_t sumR = 0U;
    for (int32_t row = 0 ; row < height ; row++)
    {
        for (int32_t col = 0 ; col < width ; col++)
        {
            int32_t const radius_y        = row - center_y;
            int32_t const radius_x        = col - center_x;
            int32_t const radius_y_sqr   = radius_y * radius_y;
            int32_t const radius_x_sqr   = radius_x * radius_x;

            float const radius        = sqrtf((float)(radius_y_sqr + radius_x_sqr));
            // The radius must be cast to an integer type because LUT indexes are integers; the casting error averages out
            uint32_t const R              = (uint32_t)(radius);
            sumR += R;
        }
    }
    return sumR;
}

static void hello_world_task1(void *p)
{
    int i=0;

    (void)p;

    while(true) {
        uint32_t l  = float_calc();
        printk("--%s:%d: sumR=0x%x\r\n", __func__, i, l);
        //vTaskDelay(DELAY_MS);
        i++;
    }
}

static void hello_world_task2(void *p)
{
    int i=0;

    (void)p;

    while(true) {
        uint32_t l  = float_calc();
        printk("--%s:%d: sumR=0x%x\r\n", __func__, i, l);
        //vTaskDelay(DELAY_MS/4);
        i++;
    }
}
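
For anyone reproducing this, a corrupted result is easier to spot automatically than by reading the log. The sketch below is not part of the original test code: int_calc_reference() and isqrt_u32() are hypothetical helpers that repeat the loops of float_calc() with an integer-only square root. For the small inputs here (at most 8192), truncating a correctly rounded sqrtf() result should give the same value as the integer floor square root, so the two sums should agree on a healthy system.

```c
#include <stdint.h>

/* Floor of the square root, computed with integer arithmetic only. */
static uint32_t isqrt_u32(uint32_t n)
{
    uint32_t r = 0U;
    while ((r + 1U) * (r + 1U) <= n)
    {
        r++;
    }
    return r;
}

/* Hypothetical helper (not in the original post): same loops and constants
   as float_calc(), with isqrt_u32() in place of (uint32_t)sqrtf(...). */
static uint32_t int_calc_reference(void)
{
    int32_t const width = 128;
    int32_t const height = 128;
    int32_t const center_x = 64;
    int32_t const center_y = 64;
    uint32_t sumR = 0U;

    for (int32_t row = 0; row < height; row++)
    {
        for (int32_t col = 0; col < width; col++)
        {
            int32_t const radius_y = row - center_y;
            int32_t const radius_x = col - center_x;
            sumR += isqrt_u32((uint32_t)(radius_y * radius_y + radius_x * radius_x));
        }
    }
    return sumR;
}
```

One possible use is to compute the reference once before the scheduler starts, then have each task compare the value returned by float_calc() against it and print only on a mismatch.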

Computation result:

rtel · February 11, 2022, 5:12am

This looks similar to the recent post "Zynq Ultrascale MPSoC task floating point corruption - #17 by wat". Has the compiler version changed recently? Are you using floating point instructions in interrupts, or something like memcpy() in an interrupt that may be optimised to use floating point registers?

Austin · February 11, 2022, 5:36am

I ran the experiment with the patch fpu-irq-handler-float-save-restore.c.

This is what I did:

I added floating point register save/restore to FreeRTOS_IRQ_Handler, but it did not help.
If I create only one task, the issue does not occur (which may indicate that the IRQ handler does not affect the floating point computation in a task).

This is a newly observed issue; the compiler has not changed.

rtel · February 11, 2022, 6:53pm

It is curious that corruption would occur even with every floating point register being saved on interrupt entry and exit. Can you please zip up your FreeRTOS/Source directory and send it to me (with these changes made) so I can look into it a bit further. You should be able to attach the zip to the post, if not you can send it to r dot barry at freertos dot org. Thanks.

Austin · February 12, 2022, 9:09am

Hi,

Thank you for kindly offering to help. We found the cause.

vPortTaskUsesFPU() must be called from within each task function that performs floating point computation. Originally I put vPortTaskUsesFPU() in xTaskCreate(), attempting to enable floating point save/restore globally for all tasks. That does not work.

More detail for others who may have questions about this (applies to the ARM CA53 port):

vPortTaskUsesFPU() sets the global variable ullPortTaskHasFPUContext, which is defined in port.c. To enable FPU use for all tasks by default, one might think of manually setting ullPortTaskHasFPUContext to pdTRUE, or of forcing a call to vPortTaskUsesFPU() in every xTaskCreate(), but neither approach works, for the reason below (port.c):

/* Saved as part of the task context.  If ullPortTaskHasFPUContext is non-zero
then floating point context must be saved and restored for the task. */
uint64_t ullPortTaskHasFPUContext = pdFALSE;

The first context restore after a task is created always resets ullPortTaskHasFPUContext to zero:

    /* The task will start without a floating point context.  A task that uses
    the floating point hardware must call vPortTaskUsesFPU() before executing
    any floating point instructions. */
    *pxTopOfStack = portNO_FLOATING_POINT_CONTEXT;

If the floating point task is scheduled to run at this point and does not call vPortTaskUsesFPU(), ullPortTaskHasFPUContext stays zero, so when the task is later switched out, the floating point registers are not saved to its stack (portASM.S).
If the floating point task is scheduled to run at this point and calls vPortTaskUsesFPU(), ullPortTaskHasFPUContext becomes 1, so when the task is later switched out, the floating point registers are saved to its stack (portASM.S):

        /* Save the critical section nesting depth. */
        LDR             X0, ullCriticalNestingConst
        LDR             X3, [X0]

        /* Save the FPU context indicator. */
        LDR             X0, ullPortTaskHasFPUContextConst
        LDR             X2, [X0]

        /* Save the FPU context, if any (32 128-bit registers). */
        CMP             X2, #0
        B.EQ    1f
        STP             Q0, Q1, [SP,#-0x20]!
        STP             Q2, Q3, [SP,#-0x20]!
        STP             Q4, Q5, [SP,#-0x20]!
        STP             Q6, Q7, [SP,#-0x20]!
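
The behaviour described above can be illustrated with a small host-side model. This is only a sketch based on the description in this post, not FreeRTOS code: every model_* name is hypothetical, a single integer stands in for Q0~Q31, and the save/restore functions mimic what portASM.S is described as doing with the per-task flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Host-side model only; none of these names exist in FreeRTOS. */
typedef struct
{
    uint64_t saved_fpu_flag; /* models the flag slot in the task's saved context */
    uint32_t saved_fpu_reg;  /* models Q0~Q31 saved on the task's stack */
} model_task_t;

static uint64_t model_has_fpu_context = 0U; /* models ullPortTaskHasFPUContext */
static uint32_t model_fpu_reg = 0U;         /* models the live FPU registers */

static void model_task_create(model_task_t *t)
{
    t->saved_fpu_flag = 0U; /* models *pxTopOfStack = portNO_FLOATING_POINT_CONTEXT */
    t->saved_fpu_reg = 0U;
}

static void model_task_uses_fpu(void)
{
    model_has_fpu_context = 1U; /* models the effect of vPortTaskUsesFPU() */
}

static void model_save_context(model_task_t *t)
{
    t->saved_fpu_flag = model_has_fpu_context;
    if (model_has_fpu_context != 0U)
    {
        t->saved_fpu_reg = model_fpu_reg; /* registers saved only if flag is set */
    }
}

static void model_restore_context(const model_task_t *t)
{
    model_has_fpu_context = t->saved_fpu_flag; /* overwrites the global flag */
    if (model_has_fpu_context != 0U)
    {
        model_fpu_reg = t->saved_fpu_reg;
    }
}

/* Task A is preempted mid-calculation by task B, then resumed.
   Returns true if A's register value survived the round trip. */
static bool model_value_survives(bool call_uses_fpu_in_task)
{
    model_task_t a, b;

    model_has_fpu_context = 1U; /* flag set "globally" by the creator */
    model_task_create(&a);
    model_task_create(&b);

    model_restore_context(&a); /* first switch-in resets the flag to 0 */
    if (call_uses_fpu_in_task) { model_task_uses_fpu(); }
    model_fpu_reg = 0xAAAAU;   /* A is in the middle of its calculation */
    model_save_context(&a);

    model_restore_context(&b);
    if (call_uses_fpu_in_task) { model_task_uses_fpu(); }
    model_fpu_reg = 0xBBBBU;   /* B overwrites the shared register */
    model_save_context(&b);

    model_restore_context(&a);
    return model_fpu_reg == 0xAAAAU;
}
```

In this model, model_value_survives(false) returns false even though the flag was set before the tasks were created, while model_value_survives(true) returns true, which matches the observation that the call has to be made from inside each task.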

TAnderson · February 18, 2022, 12:20pm

I am running on a MiniZed (Zynq 7000, Cortex-A9). I inserted float_calc() into two tasks and saw similar results, that is, errors in the values returned by float_calc(). I have since inserted vPortTaskUsesFPU() into each of the tasks (at the top of the task), and I have not seen an error since. (I have been running for about 20 minutes now.)

BTW, I had also set up use_task_fpu_support in the board support settings, but I have now turned it off while experimenting with vPortTaskUsesFPU().

rtel · February 18, 2022, 7:14pm

Here is a relevant link for folks reading this thread in the future: https://www.freertos.org/Using-FreeRTOS-on-Cortex-A-Embedded-Processors.html#floating-point

TAnderson · February 21, 2022, 12:18pm

Update on this: the reason float_calc() was failing is that a float was used in an interrupt service routine. After changing that float to uint16, I no longer get failures with float_calc().
If a float is necessary in an ISR, which registers need to be saved/restored? (Or does the set depend on the processor?)
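
Where the float is not strictly needed inside the ISR, one way to sidestep the question is the pattern implied by the uint16 change: keep the ISR integer-only and do the float conversion later in a task. The sketch below is an assumption-laden illustration, not code from this thread: sample_isr(), sample_to_volts(), the 16-bit sample and the 3.3 V full-scale value are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical shared variable: written by the ISR, read by a task. */
static volatile uint16_t latest_raw_sample = 0U;

/* Hypothetical ISR body: stores the raw reading using integer code only,
   so no float appears in the ISR source. */
void sample_isr(uint16_t raw)
{
    latest_raw_sample = raw;
}

/* Called from task context: the float conversion happens here instead.
   The 3.3 V full-scale value is an assumption made for this example. */
float sample_to_volts(void)
{
    uint16_t const raw = latest_raw_sample;
    return (float)raw * (3.3f / 65535.0f);
}
```

The task that calls sample_to_volts() would still need to call vPortTaskUsesFPU() first, as discussed earlier in the thread. Note also the earlier point that a compiler may use floating point registers for code such as memcpy(), so keeping floats out of the ISR source may not be sufficient on its own.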