FreeRTOS + TCP/IP on Arm Cortex-A53

Hello everybody.

I have had a lot of trouble getting the TCP/IP stack running on the Cortex-A53, so I wanted to ask whether anyone else uses it and has managed to get the stack running.

Used HW + SW:

  • Zynq UltraScale+
  • Cortex-A53 64-bit compiler

I hope someone can help me.

What have you tried? What problems have you encountered? Is the Ethernet MAC on the UltraScale different to that on the Zynq as there is a driver for the Zynq already?

Hello, Richard,

Thank you very much for your quick response.

What problems have you encountered?

  • The first problem was that an error occurred when creating the uncached memory area. I fixed it with the following changes to uncached_memory.c.

The defined size of

“UNCACHED_MEMORY_SIZE”

must be changed from 1 MB to 2 MB, because the minimum block size in the MMU is 2 MB.

from: #define UNCACHED_MEMORY_SIZE 0x100000ul
to: #define UNCACHED_MEMORY_SIZE 0x200000ul

In addition, the attribute must be changed.

from: Xil_SetTlbAttributes( ( uint32_t )pucStartOfMemory, 0x1c02);
to: Xil_SetTlbAttributes( ( uint32_t )pucStartOfMemory, 0x409UL );
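
The 2 MB granularity described here can be sketched as a small alignment check. This is only an illustration and not code from uncached_memory.c: the names BLOCK_SIZE_2MB, align_up_2mb and is_2mb_aligned are hypothetical, and the sketch assumes that the MMU maps this region in 2 MB blocks, so both the start address and the size of the uncached area should be multiples of 2 MB.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed MMU block size, taken from the description in this post. */
#define BLOCK_SIZE_2MB    0x200000ULL

/* Hypothetical helper: round an address up to the next 2 MB boundary. */
static uint64_t align_up_2mb( uint64_t addr )
{
    return ( addr + BLOCK_SIZE_2MB - 1ULL ) & ~( BLOCK_SIZE_2MB - 1ULL );
}

/* Hypothetical helper: true when the address lies on a 2 MB boundary. */
static bool is_2mb_aligned( uint64_t addr )
{
    return ( addr & ( BLOCK_SIZE_2MB - 1ULL ) ) == 0ULL;
}
```

A region that starts at 0x100000 would therefore have to be moved up to 0x200000 before its attributes can be changed without affecting neighbouring memory.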

Now I have the following problems:

When the following function calls are reached in "tasks.c":

  • portYIELD_WITHIN_API();
  • taskYIELD_IF_USING_PREEMPTION();

execution ends up in FreeRTOS_abort and stays there in a while loop.

I have now compared the EMAC driver for the UltraScale+ and the Zynq 7000.
There are the following differences:

in xemacps.c :
UltraScale+:

XEmacPs_WriteReg(InstancePtr->Config.BaseAddress, XEMACPS_TXSR_OFFSET, XEMACPS_SR_ALL_MASK);

ZYNQ 7000:

XEmacPs_WriteReg(InstancePtr->Config.BaseAddress, XEMACPS_TXSR_OFFSET, 0x0U);

in xemacps_hw.h:
UltraScale+:

#define XEMACPS_SR_ALL_MASK 0xFFFFFFFFU /**< Mask for full register */

ZYNQ 7000:

no definition of XEMACPS_SR_ALL_MASK 0xFFFFFFFFU

So I’d say there’s not much difference.

Hello Philipp,

the defined size of

“UNCACHED_MEMORY_SIZE”

must be changed from 1MB to 2MB, because the minimum block size in MMU is 2MB.
from: #define UNCACHED_MEMORY_SIZE 0x100000ul
to: #define UNCACHED_MEMORY_SIZE 0x200000ul

Thanks for this, we will change that with an #ifdef

and as a further point the attribute must be changed.

from: Xil_SetTlbAttributes( ( uint32_t )pucStartOfMemory, 0x1c02);
to: Xil_SetTlbAttributes( ( uint32_t )pucStartOfMemory, 0x409UL );

Do you know the actual meaning of 0x409 ?

When the following function calls are reached in "tasks.c":
portYIELD_WITHIN_API();
taskYIELD_IF_USING_PREEMPTION();

I’ll leave these questions to Richard.

I have now compared the EMAC driver for the UltraScale+ and the Zynq 7000.
There are the following differences:

in xemacps.c :
UltraScale+:

XEmacPs_WriteReg(InstancePtr->Config.BaseAddress, XEMACPS_TXSR_OFFSET, XEMACPS_SR_ALL_MASK);

ZYNQ 7000:

XEmacPs_WriteReg(InstancePtr->Config.BaseAddress, XEMACPS_TXSR_OFFSET, 0x0U);

This module, xemacps.c, is not part of the +TCP network interface.
Writing zeros to that status register would have no effect: bits in this register are cleared by writing a 1.
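
The write-1-to-clear behaviour can be illustrated with a tiny software model. This is a sketch only: w1c_reg_t and w1c_write are hypothetical names and do not exist in the Xilinx driver; they merely mimic what the hardware does when XEmacPs_WriteReg() targets such a status register.

```c
#include <stdint.h>

/* Software model of a status register with write-1-to-clear semantics. */
typedef struct
{
    uint32_t value;
} w1c_reg_t;

/* Model of a register write: every bit written as 1 is cleared,
 * every bit written as 0 keeps its current state. */
static void w1c_write( w1c_reg_t * reg, uint32_t written )
{
    reg->value &= ~written;
}
```

In this model, writing 0x0U leaves all pending status bits set, while writing 0xFFFFFFFFU clears them all, which matches the difference between the two driver versions.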

in xemacps_hw.h:
UltraScale+:

#define XEMACPS_SR_ALL_MASK 0xFFFFFFFFU /**< Mask for full register */

ZYNQ 7000:

no definition of XEMACPS_SR_ALL_MASK 0xFFFFFFFFU

I think the same macro applies to the Zynq 7000: writing ones clears the bits.

/* Memory type */
#define NORM_NONCACHE 0x401UL /* Normal Non-cacheable */
#define STRONG_ORDERED 0x409UL /* Strongly ordered (Device-nGnRnE) */
#define DEVICE_MEMORY 0x40DUL /* Device memory (Device-nGnRE) */
#define RESERVED 0x0UL /* reserved memory */
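
As for the meaning of 0x409: the values above look like the lower attribute bits of an AArch64 block descriptor, and the following sketch decodes them on that assumption. The struct and function names are hypothetical. The bit positions follow the Armv8-A long-descriptor format; which memory type each AttrIndx value selects depends on how the BSP programs MAIR_ELx, so the mapping to "Strongly ordered" is taken from the comments in the definitions above, not derived here.

```c
#include <stdint.h>

/* Decoded lower attributes of an AArch64 stage 1 block descriptor. */
typedef struct
{
    unsigned type;       /* bits [1:0]: 1 = valid block descriptor */
    unsigned attr_index; /* bits [4:2]: AttrIndx, index into MAIR_ELx */
    unsigned ap;         /* bits [7:6]: access permissions */
    unsigned sh;         /* bits [9:8]: shareability */
    unsigned af;         /* bit 10: access flag */
} block_attr_t;

/* Hypothetical helper: split an attribute value into its fields. */
static block_attr_t decode_block_attr( uint64_t attr )
{
    block_attr_t out;

    out.type = ( unsigned ) ( attr & 0x3U );
    out.attr_index = ( unsigned ) ( ( attr >> 2 ) & 0x7U );
    out.ap = ( unsigned ) ( ( attr >> 6 ) & 0x3U );
    out.sh = ( unsigned ) ( ( attr >> 8 ) & 0x3U );
    out.af = ( unsigned ) ( ( attr >> 10 ) & 0x1U );

    return out;
}
```

Under this reading, 0x401, 0x409 and 0x40D all describe a valid block with the access flag set and differ only in AttrIndx (0, 2 and 3 respectively).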

Hello @Mulle
I faced the same problem and managed to make it work. I wanted to share the code, but I deleted it because it was not well written and the task was rejected.
What I remember is:

  1. I used the lwIP port provided with the Xilinx SDK as a reference.
  2. The Ethernet MACs on the US+ and the 7000 are different. One of the differences is in the buffer descriptor structure. The US+ structure takes possible 64-bit addressing into account, so some fields are shifted from where they were on the 7000.
  3. In the US+, the SLCR registers have changed and moved - https://www.xilinx.com/html_docs/registers/ug1087/ug1087-zynq-ultrascale-registers.html
    So some of these macros in x_emacpsif_physpeed.c have to be changed:
    /* Frequency setting */
    #define SLCR_LOCK_ADDR			(XPS_SYS_CTRL_BASEADDR + 0x4)
    #define SLCR_UNLOCK_ADDR		(XPS_SYS_CTRL_BASEADDR + 0x8)
    #define SLCR_GEM0_CLK_CTRL_ADDR	(XPS_SYS_CTRL_BASEADDR + 0x140)
    #define SLCR_GEM1_CLK_CTRL_ADDR	(XPS_SYS_CTRL_BASEADDR + 0x144)
    #ifdef PEEP
    #define SLCR_GEM_10M_CLK_CTRL_VALUE		0x00103031
    #define SLCR_GEM_100M_CLK_CTRL_VALUE	0x00103001
    #define SLCR_GEM_1G_CLK_CTRL_VALUE		0x00103011
    #endif
    #define SLCR_LOCK_KEY_VALUE 			0x767B
    #define SLCR_UNLOCK_KEY_VALUE			0xDF0D
    #define SLCR_ADDR_GEM_RST_CTRL			(XPS_SYS_CTRL_BASEADDR + 0x214)
    #define EMACPS_SLCR_DIV_MASK			0xFC0FC0FF
    
    #define EMAC0_BASE_ADDRESS				0xE000B000
    #define EMAC1_BASE_ADDRESS				0xE000C000
    
    I looked up the required changes in the lwIP port.
  4. The PHY chip configuration differs as well. It depends on the board you are using.
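
To show how a mask such as EMACPS_SLCR_DIV_MASK from point 3 is typically used, here is a sketch of the read-modify-write step on a plain value. The function name set_gem_clk_dividers is hypothetical. The field positions (bits 13:8 and 25:20) are inferred from the mask value itself and correspond to the Zynq 7000 layout; on the US+ the clock control registers live elsewhere and their layout has to be taken from the register reference linked above.

```c
#include <stdint.h>

/* Mask from the list above: keeps everything except bits [13:8] and [25:20]. */
#define EMACPS_SLCR_DIV_MASK    0xFC0FC0FFU

/* Hypothetical helper: replace both 6-bit divider fields in a clock
 * control value and leave all other bits untouched. */
static uint32_t set_gem_clk_dividers( uint32_t reg,
                                      uint32_t div0,
                                      uint32_t div1 )
{
    reg &= EMACPS_SLCR_DIV_MASK;       /* clear both divider fields */
    reg |= ( div0 & 0x3FU ) << 8;      /* first divider, bits [13:8] */
    reg |= ( div1 & 0x3FU ) << 20;     /* second divider, bits [25:20] */

    return reg;
}
```

On real hardware the value would be read from and written back to the clock control register, with the SLCR unlocked beforehand where the device requires it.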

Hello Maxim Vechkanov, thank you very much for responding to this thread.
Unfortunately, I don’t have such a board myself; I hope that @mulle finds time to check it.
Thanks, Hein

Update to FreeRTOS +TCP/IP on UltraScale+.

Hein and I are working to get this done. At this point, all software interrupts have been eliminated and all registers are configured.

Right now I get this error:

emacps_check_rx: unable to allocate a Network Buffer.

Does anybody have an idea why that happens?

Thanks, Philipp

Assuming there is no other bug, it’s a matter of tuning or limiting the memory consumption of the stack.
FreeRTOSIPConfig.h contains a number of parameters related to that, and the forum also contains some posts about this configuration for different scenarios.
Tuning strongly depends on your use case. A rather slow control connection with very few simultaneously open sockets requires fewer reserved network buffers, smaller sliding window sizes, etc.
I once had a surprising effect where a bunch of ARP requests from a Windows peer PC during startup consumed more buffers than expected, and I had to bump some parameters…
Wireshark helped me find out what was going on.
So try to estimate the requirements of your application, tweak FreeRTOSIPConfig.h accordingly, and maybe have a look at the wire.
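
As a starting point, these are some of the FreeRTOSIPConfig.h settings that influence buffer usage. The macro names are existing FreeRTOS+TCP options, but all values below are illustrative placeholders and not a recommendation for this board; they have to be sized for the actual application.

```c
/* FreeRTOSIPConfig.h (excerpt). All values are illustrative placeholders. */

/* Total number of network buffers available to the TCP/IP stack. */
#define ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS    64

/* Enable TCP sliding windows; smaller windows keep fewer buffers in flight. */
#define ipconfigUSE_TCP_WIN                       1

/* Default per-socket stream buffer sizes, in bytes. */
#define ipconfigTCP_RX_BUFFER_LENGTH              ( 4 * 1460 )
#define ipconfigTCP_TX_BUFFER_LENGTH              ( 4 * 1460 )
```

If "unable to allocate a Network Buffer" still appears after raising ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS, the cause is more likely a buffer leak or a driver problem than plain undersizing.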

if (((xemacpsif->rxSegments[head].address & XEMACPS_RXBUF_NEW_MASK) == 0) || (pxDMA_rx_buffers[head] == NULL)) {
    break;
}

(xemacpsif->rxSegments[head].address & XEMACPS_RXBUF_NEW_MASK) == 0

This condition is true, so execution jumps to the break.
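
For context, the check above tests the ownership bit of an RX buffer descriptor. The following stand-alone sketch models a small RX ring in ordinary memory to show what the condition means. It is not the driver code: struct bd, RX_RING_SIZE and count_ready_frames are hypothetical, and the mask value 0x1 (bit 0 of the address word, set by the MAC once it has stored a frame) is an assumption based on the Xilinx header.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed value of XEMACPS_RXBUF_NEW_MASK: bit 0 of the address word. */
#define RXBUF_NEW_MASK    0x00000001U
#define RX_RING_SIZE      4U

/* Simplified 32-bit buffer descriptor, for this model only. */
struct bd
{
    uint32_t address;
    uint32_t flags;
};

/* Count consecutive descriptors, starting at head, that hold a frame. */
static size_t count_ready_frames( const struct bd * ring, size_t head )
{
    size_t n = 0;

    while( n < RX_RING_SIZE )
    {
        if( ( ring[ head ].address & RXBUF_NEW_MASK ) == 0U )
        {
            /* The MAC has not written this descriptor yet: stop here. */
            break;
        }

        n++;
        head = ( head + 1U ) % RX_RING_SIZE;
    }

    return n;
}
```

In this model, hitting the break on the very first descriptor simply means that no frame has arrived yet. If frames are visible on the wire but the bit never gets set, the MAC may not be seeing the descriptors where the software placed them.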

Looks like the MAC driver runs out of ethernet frame rx-DMA buffer descriptors. There must be a configuration for the (array)size of the rx/tx DMA descriptor rings. Or check the source of the driver where these arrays are defined.
Unfortunately I’m not familiar with the Zynq port…

Hello, Mulle!

Have you corrected the code related to the buffer descriptors (BDs)?
As you said, the APU core code is compiled for AArch64. According to UG1085 and the Xilinx BSP code, in that case a BD is 128 bits wide and the BD alignment restriction is 64 bytes.
It means that

struct xBD_TYPE {
    uint32_t address;
    uint32_t flags;
};
...
struct xBD_TYPE *rxSegments;

will not work.
I guess you can try to add __attribute__((aligned (64))) to xBD_TYPE as an option for gcc.

Maxim, thanks for your input. I have no experience with UltraScale yet.
Would the following declaration solve it?

struct xBD_TYPE {
    uint32_t address;
    uint32_t flags;
    /* Fill it up so the struct gets a size of 16 bytes. */
    uint32_t filler[ XEMACPS_BD_NUM_WORDS - 2 ];
};

PS: the first xBD struct already has an alignment of 4 KByte in uncached memory (see uncached_memory.c).

Hein, I hope so.
I thought each BD required the alignment, but you are right: only the start of the BD ring needs to be aligned.
I think it is better to put it under the __aarch64__ symbol definition, like:

struct xBD_TYPE {
    uint32_t address;
    uint32_t flags;
#ifdef __aarch64__
    /* Fill it up so the struct gets a size of 16 bytes. */
    uint32_t address_high;
    uint32_t reserved;
#endif
};

I am not sure which is better: renaming “address” to “address_low” or keeping it as is.
Someone might need 64-bit addressing support. And for the RPU cores of the US+, 32-bit addressing is still required, according to Xilinx’s driver.
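
A compile-time check can guard the 16-byte layout, and a small helper can split a 64-bit buffer address over the two address words. This is a sketch under assumptions: struct bd64 and bd_set_address are hypothetical names, the four-word layout follows the declaration discussed above, and the low bits that the RX descriptor uses as flags are not handled here.

```c
#include <stdint.h>

/* Hypothetical 128-bit descriptor, mirroring the __aarch64__ variant above. */
struct bd64
{
    uint32_t address;      /* low 32 bits of the buffer address */
    uint32_t flags;
    uint32_t address_high; /* high 32 bits of the buffer address */
    uint32_t reserved;
};

/* Fail the build if the compiler lays the struct out differently. */
_Static_assert( sizeof( struct bd64 ) == 16, "BD must be 16 bytes wide" );

/* Hypothetical helper: store a 64-bit address in both address words. */
static void bd_set_address( struct bd64 * bd, uint64_t addr )
{
    bd->address = ( uint32_t ) ( addr & 0xFFFFFFFFULL );
    bd->address_high = ( uint32_t ) ( addr >> 32 );
}
```

With a check like this, a build for the RPU cores that keeps the two-word layout would need its own struct or a conditional assertion.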

Regarding the uncached memory. US+ baremetal uses 2 MB MMU sections, so I believe 2 MB uncached memory aligned to 2 MB boundary should be used.

Thanks @Maxim.Vechkanov and @htibosch for your replies.

I have not changed this code yet. I will do that on Monday and report whether it works.

Regarding the uncached memory. US+ baremetal uses 2 MB
MMU sections, so I believe 2 MB uncached memory aligned
to 2 MB boundary should be used.

Yes, @Mulle had already noticed that difference, thanks!