FreeRTOS + TCP/IP on Arm Cortex-A53

Hello everybody.

I am having a lot of trouble getting the TCP/IP stack running on the Cortex-A53, so I wanted to ask whether anyone else uses it and has managed to get the stack running.

Hardware and software used:

  • Zynq UltraScale+
  • Cortex-A53 64-bit compiler

I hope someone can help me.

What have you tried? What problems have you encountered? Is the Ethernet MAC on the UltraScale different from the one on the Zynq? There is already a driver for the Zynq.

Hello, Richard,

Thank you very much for your quick response.

What problems have you encountered?

  • The first problem was an error when creating the uncached memory area. I fixed it with the following changes.

In uncached_memory.c, the defined size of

“UNCACHED_MEMORY_SIZE”

must be changed from 1 MB to 2 MB, because the minimum block size in the MMU is 2 MB.

from: #define UNCACHED_MEMORY_SIZE 0x100000ul
to: #define UNCACHED_MEMORY_SIZE 0x200000ul

In addition, the attribute must be changed.

from: Xil_SetTlbAttributes( ( uint32_t )pucStartOfMemory, 0x1c02);
to: Xil_SetTlbAttributes( ( uint32_t )pucStartOfMemory, 0x409UL );

Now I have the following problems:

When the following function calls are reached in "tasks.c":

  • portYIELD_WITHIN_API();
  • taskYIELD_IF_USING_PREEMPTION();

execution ends up in FreeRTOS_abort and stays there in a while loop.

I have now compared the EMAC drivers for the UltraScale+ and the Zynq 7000.
The differences are as follows:

in xemacps.c :
UltraScale+:

XEmacPs_WriteReg(InstancePtr->Config.BaseAddress, XEMACPS_TXSR_OFFSET, XEMACPS_SR_ALL_MASK);

ZYNQ 7000:

XEmacPs_WriteReg(InstancePtr->Config.BaseAddress, XEMACPS_TXSR_OFFSET, 0x0U);

in xemacps_hw.h:
UltraScale+:

#define XEMACPS_SR_ALL_MASK 0xFFFFFFFFU /**< Mask for full register */

ZYNQ 7000:

no definition of XEMACPS_SR_ALL_MASK 0xFFFFFFFFU

So I’d say there’s not much difference.

Hello Philipp,

the defined size of

“UNCACHED_MEMORY_SIZE”

must be changed from 1MB to 2MB, because the minimum block size in MMU is 2MB.
from: #define UNCACHED_MEMORY_SIZE 0x100000ul
to: #define UNCACHED_MEMORY_SIZE 0x200000ul

Thanks for this, we will change that with an #ifdef
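
A minimal sketch of what such an #ifdef could look like, assuming the compiler's predefined __aarch64__ macro is used to tell the 64-bit Cortex-A53 build apart from the others. MMU_BLOCK_SIZE and the compile-time check are hypothetical additions for illustration and are not part of uncached_memory.c:

```c
#include <assert.h>

/* Assumption: GCC and Clang predefine __aarch64__ for 64-bit Arm targets. */
#ifdef __aarch64__
    /* The MMU maps memory in 2 MB blocks on this build. */
    #define MMU_BLOCK_SIZE          0x200000ul   /* hypothetical name */
    #define UNCACHED_MEMORY_SIZE    0x200000ul
#else
    /* Other builds keep the original 1 MB value. */
    #define MMU_BLOCK_SIZE          0x100000ul   /* hypothetical name */
    #define UNCACHED_MEMORY_SIZE    0x100000ul
#endif

/* Refuse to build if the region is not a whole number of MMU blocks. */
static_assert( ( UNCACHED_MEMORY_SIZE % MMU_BLOCK_SIZE ) == 0ul,
               "uncached region must be a multiple of the MMU block size" );

/* Returns how many MMU blocks the uncached region occupies. */
static unsigned long ulUncachedBlockCount( void )
{
    return UNCACHED_MEMORY_SIZE / MMU_BLOCK_SIZE;
}
```

With either branch selected, the region occupies exactly one MMU block.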

and as a further point the attribute must be changed.

from: Xil_SetTlbAttributes( ( uint32_t )pucStartOfMemory, 0x1c02);
to: Xil_SetTlbAttributes( ( uint32_t )pucStartOfMemory, 0x409UL );

Do you know the actual meaning of 0x409 ?

When the following function calls come in "tasks.c"
portYIELD_WITHIN_API();
taskYIELD_IF_USING_PREEMPTION();

I’ll leave these questions to Richard.

I have now compared the EMAC drivers for the UltraScale+ and the Zynq 7000.
The differences are as follows:

in xemacps.c :
UltraScale+:

XEmacPs_WriteReg(InstancePtr->Config.BaseAddress, XEMACPS_TXSR_OFFSET, XEMACPS_SR_ALL_MASK);

ZYNQ 7000:

XEmacPs_WriteReg(InstancePtr->Config.BaseAddress, XEMACPS_TXSR_OFFSET, 0x0U);

This module, xemacps.c, is not part of the +TCP network interface.
It would be pointless to write zeros to that status register: bits in this register are cleared by writing a 1.
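
To illustrate the write-one-to-clear behaviour, here is a small simulation. It does not touch real hardware: ulSimulatedStatus and vSimulatedStatusWrite are hypothetical stand-ins for the status register and for a register write.

```c
#include <stdint.h>

/* Simulated write-one-to-clear status register (no real hardware access). */
static uint32_t ulSimulatedStatus;

/* Models a write to the register: every bit set in ulValue clears the
 * same bit in the status register, zero bits leave it untouched. */
static void vSimulatedStatusWrite( uint32_t ulValue )
{
    ulSimulatedStatus &= ~ulValue;
}
```

In this model, writing 0x0 leaves all pending status bits set, while writing an all-ones mask clears every bit.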

in xemacps_hw.h:
UltraScale+:

#define XEMACPS_SR_ALL_MASK 0xFFFFFFFFU /**< Mask for full register */

ZYNQ 7000:

no definition of XEMACPS_SR_ALL_MASK 0xFFFFFFFFU

I think the same macro applies to the Zynq 7000: writing ones clears the bits.

/* Memory type */
#define NORM_NONCACHE 0x401UL /* Normal Non-cacheable */
#define STRONG_ORDERED 0x409UL /* Strongly ordered (Device-nGnRnE) */
#define DEVICE_MEMORY 0x40DUL /* Device memory (Device-nGnRE) */
#define RESERVED 0x0UL /* reserved memory */
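
One way to read these values, assuming they are the low attribute bits of an Armv8-A block descriptor: bits [1:0] give the entry type, bits [4:2] are an index into the MAIR register, and bit 10 is the access flag. The helper names below are hypothetical. Which memory type each index selects depends on how the BSP boot code programs MAIR; the comments in the header above suggest that index 2 is Device-nGnRnE.

```c
#include <stdint.h>

/* Field positions in the lower attributes of an Armv8-A block descriptor. */
#define DESC_TYPE_MASK          0x3u            /* bits [1:0], 0b01 = block entry */
#define DESC_ATTRINDX_SHIFT     2u              /* bits [4:2], index into MAIR */
#define DESC_ATTRINDX_MASK      0x7u
#define DESC_AF_BIT             ( 1u << 10 )    /* bit 10, access flag */

/* Entry type field of the attribute value. */
static unsigned uDescType( uint64_t ullAttr )
{
    return ( unsigned ) ( ullAttr & DESC_TYPE_MASK );
}

/* MAIR index selected by the attribute value. */
static unsigned uDescAttrIndx( uint64_t ullAttr )
{
    return ( unsigned ) ( ( ullAttr >> DESC_ATTRINDX_SHIFT ) & DESC_ATTRINDX_MASK );
}

/* Non-zero when the access flag is set. */
static int iDescAccessFlag( uint64_t ullAttr )
{
    return ( ullAttr & DESC_AF_BIT ) != 0u;
}
```

Read this way, 0x401, 0x409 and 0x40D differ only in the MAIR index (0, 2 and 3), and all three are block entries with the access flag set.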

Hello @Mulle
I faced the same problem and managed to make it work. I wanted to share the code, but I deleted it because I had not written it well and the task was rejected.
What I remember is:

  1. I used lwIP port provided with Xilinx SDK as a reference.
  2. The Ethernet MACs on US+ and 7000 are different. One of the differences is in the buffer descriptor structure. The structure for US+ takes into account possible 64-bit addressing, so some fields are shifted from where they were in the 7000.
  3. In US+, the SLCR registers have been changed and moved - https://www.xilinx.com/html_docs/registers/ug1087/ug1087-zynq-ultrascale-registers.html
    So some of these macros in x_emacpsif_physpeed.c have to be changed:
    /* Frequency setting */
    #define SLCR_LOCK_ADDR			(XPS_SYS_CTRL_BASEADDR + 0x4)
    #define SLCR_UNLOCK_ADDR		(XPS_SYS_CTRL_BASEADDR + 0x8)
    #define SLCR_GEM0_CLK_CTRL_ADDR	(XPS_SYS_CTRL_BASEADDR + 0x140)
    #define SLCR_GEM1_CLK_CTRL_ADDR	(XPS_SYS_CTRL_BASEADDR + 0x144)
    #ifdef PEEP
    #define SLCR_GEM_10M_CLK_CTRL_VALUE		0x00103031
    #define SLCR_GEM_100M_CLK_CTRL_VALUE	0x00103001
    #define SLCR_GEM_1G_CLK_CTRL_VALUE		0x00103011
    #endif
    #define SLCR_LOCK_KEY_VALUE 			0x767B
    #define SLCR_UNLOCK_KEY_VALUE			0xDF0D
    #define SLCR_ADDR_GEM_RST_CTRL			(XPS_SYS_CTRL_BASEADDR + 0x214)
    #define EMACPS_SLCR_DIV_MASK			0xFC0FC0FF
    
    #define EMAC0_BASE_ADDRESS				0xE000B000
    #define EMAC1_BASE_ADDRESS				0xE000C000
    
    I looked up the required changes in the lwIP port.
  4. The PHY chip configuration is also different. It depends on the board you are using.

Hello Maxim Vechkanov, thank you very much for responding to this thread.
Unfortunately, I don’t have such a board myself; I hope that @mulle finds time to check it.
Thanks, Hein

Update to FreeRTOS +TCP/IP on UltraScale+.

Hein and I are working to get this done. At the moment, all software interrupts have been eliminated and all registers are configured.

Right now I get this error:

emacps_check_rx: unable to allocate a Network Buffer.

Does anybody have an idea why that happens?

Thanks, Philipp

Assuming there is no other bug, it’s a matter of tuning and limiting the memory consumption of the stack.
FreeRTOSIPConfig.h contains a number of parameters related to that, and the forum also contains some posts on this configuration for different scenarios.
Tuning strongly depends on your use case. A rather slow control connection with very few simultaneously open sockets requires fewer reserved network buffers, smaller sliding window sizes, etc.
I had a surprising effect with a bunch of ARP requests from a Windows peer PC during startup consuming more buffers than expected and had to bump some parameters…
Wireshark helped me to find out what’s going on.
So try to estimate the requirements of your application and tweak FreeRTOSIPConfig.h accordingly and maybe have a look at the wire.

if (((xemacpsif->rxSegments[head].address & XEMACPS_RXBUF_NEW_MASK) == 0) || (pxDMA_rx_buffers[head] == NULL)) {
    break;
}
(xemacpsif->rxSegments[head].address & XEMACPS_RXBUF_NEW_MASK) == 0

This condition is true, so the code takes the break.

It looks like the MAC driver runs out of Ethernet frame RX DMA buffer descriptors. There must be a configuration for the (array) size of the RX/TX DMA descriptor rings. Or check the driver source where these arrays are defined.
Unfortunately I’m not familiar with the Zynq port…

Hello, Mulle!

Have you corrected the buffer descriptors (BD) related code?
As you said, the APU core code is compiled for aarch64. According to UG1085 and the Xilinx BSP code, in that case a BD is 128 bits wide and the BD alignment restriction is 64 bytes.
It means that

struct xBD_TYPE {
    uint32_t address;
    uint32_t flags;
};
...
struct xBD_TYPE *rxSegments;

will not work.
I guess you can try to add __attribute__((aligned (64))) to xBD_TYPE as an option for gcc.

Maxim, thanks for your input. I have no experience with UltraScale yet.
Would the following declaration solve it?

struct xBD_TYPE {
    uint32_t address;
    uint32_t flags;
    /* Fill it up so the struct gets a size of 16 bytes. */
    uint32_t filler[ XEMACPS_BD_NUM_WORDS - 2 ];
};

PS: the first xBD struct already has an alignment of 4 KByte in un-cached memory ( see uncached_memory.c ).

Hein, I hope so.
I thought each BD required the alignment, but you are right: only the start of the BD ring needs to be aligned.
I think it is better to put it under the __aarch64__ symbol definition, like:

struct xBD_TYPE {
    uint32_t address;
    uint32_t flags;
#ifdef __aarch64__
    /* Fill it up so the struct gets a size of 16 bytes. */
    uint32_t address_high;
    uint32_t reserved;
#endif
};

I am not sure which is better: renaming “address” to “address_low” or keeping it as is.
Someone might need 64-bit addressing support. And for the RPU cores of US+, 32-bit addressing is still required, according to Xilinx’s driver.
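
A minimal sketch of how the 16-byte layout could be checked at compile time. It assumes that a descriptor on the 64-bit build is four 32-bit words, as described above; BD_NUM_WORDS_64BIT and the struct name xBD_TYPE_64 are hypothetical and used only for this illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: on a 64-bit build a descriptor is four 32-bit words (128 bits). */
#define BD_NUM_WORDS_64BIT    4u

struct xBD_TYPE_64
{
    uint32_t address;       /* low 32 address bits plus the control bits */
    uint32_t flags;
    uint32_t address_high;  /* upper 32 address bits */
    uint32_t reserved;
};

/* Refuse to build if padding or a changed field list alters the layout. */
static_assert( sizeof( struct xBD_TYPE_64 ) == BD_NUM_WORDS_64BIT * sizeof( uint32_t ),
               "descriptor must be 16 bytes" );
static_assert( offsetof( struct xBD_TYPE_64, address_high ) == 8u,
               "address_high must be the third word" );
```

A check like this makes a wrong descriptor size fail the build instead of showing up as a ring that the hardware walks with the wrong stride.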

Regarding the uncached memory. US+ baremetal uses 2 MB MMU sections, so I believe 2 MB uncached memory aligned to 2 MB boundary should be used.
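
The 2 MB alignment can be sketched with two small helpers. The names are hypothetical and the code only does the address arithmetic; it does not program the MMU:

```c
#include <assert.h>
#include <stdint.h>

#define MMU_BLOCK_SIZE    0x200000ull   /* 2 MB, as stated above */

/* Rounds an address up to the next 2 MB boundary. */
static uint64_t ullAlignUpToBlock( uint64_t ullAddress )
{
    /* Guard against wrap-around at the top of the address space. */
    assert( ullAddress <= UINT64_MAX - ( MMU_BLOCK_SIZE - 1u ) );
    return ( ullAddress + MMU_BLOCK_SIZE - 1u ) & ~( MMU_BLOCK_SIZE - 1u );
}

/* Non-zero when the address lies on a 2 MB boundary. */
static int iIsBlockAligned( uint64_t ullAddress )
{
    return ( ullAddress & ( MMU_BLOCK_SIZE - 1u ) ) == 0u;
}
```

A region that starts at a 1 MB boundary would share its MMU block with cached data, so the start address is rounded up before the attributes are changed.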

Thanks @Maxim.Vechkanov and @htibosch for your replies.

I have not changed this code yet. I will do that on Monday and report whether it works.

Regarding the uncached memory. US+ baremetal uses 2 MB
MMU sections, so I believe 2 MB uncached memory aligned
to 2 MB boundary should be used.

Yes, @Mulle had already noticed that difference, thanks!

I have now changed the structure of xBD_TYPE, as you said, Maxim.

Do I now need to use the variable address_high in my case, or still use the variable address?

Or should I write it like the following:

xemacpsif->rxSegments[iIndex].address =
((uint32_t) pxBuffer->pucEthernetBuffer)
& XEMACPS_RXBUF_ADD_MASK;

and then

xemacpsif->rxSegments[iIndex].address_high = 0 ;
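
For comparison, a sketch of how a buffer address could be split over the two fields if 64-bit addressing were ever needed. RXBUF_ADDRESS_MASK is an assumption that mirrors the driver's address mask (the two low bits of the first word are control bits); the struct and function names are hypothetical:

```c
#include <stdint.h>

/* Assumption: the two low bits of the first descriptor word are control
 * bits, so they are masked out of the address. */
#define RXBUF_ADDRESS_MASK    0xFFFFFFFCu

struct xAddressWords
{
    uint32_t low;   /* value for the "address" field */
    uint32_t high;  /* value for the "address_high" field */
};

/* Splits a buffer address into the two 32-bit descriptor words. */
static struct xAddressWords xSplitBufferAddress( uint64_t ullAddress )
{
    struct xAddressWords xWords;

    xWords.low = ( uint32_t ) ( ullAddress & RXBUF_ADDRESS_MASK );
    xWords.high = ( uint32_t ) ( ullAddress >> 32 );

    return xWords;
}
```

For a buffer below 4 GB the high word comes out as zero, which matches the two assignments shown above.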

Hi Philipp,

I think you can use the code you provided, if uncached memory is addressable by 32-bit word.

I would suggest adding some verbose debug prints to emacps_check_rx to see more details about the received descriptor.

I believe this case is normal:

( xemacpsif->rxSegments[ head ].address & XEMACPS_RXBUF_NEW_MASK ) == 0

Because, as far as I understand, emacps_check_rx tries to read out all the messages that have been received since the previous call. So it stops once it reaches a descriptor that has not yet been filled by the HW.
The bad thing is if you see "emacps_check_rx: unable to allocate a Network Buffer\n". I believe it means lack of memory in TCP/IP stack. Have you tried to increase memory for TCP/IP stack buffers? You can start with very large values just to be sure that they are enough.
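
The normal stop condition described above can be sketched with a simulated descriptor ring. RING_SIZE, RXBUF_NEW_BIT and the struct are assumptions made for this illustration and are not taken from the driver:

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE        4u
#define RXBUF_NEW_BIT    0x1u   /* assumption: set by the MAC once a frame is stored */

struct xSimBD
{
    uint32_t address;
    uint32_t flags;
};

/* Counts the frames that are ready starting at 'uxHead' and stops at the
 * first descriptor that the hardware has not filled yet. */
static unsigned uCountReadyFrames( const struct xSimBD * pxRing, unsigned uxHead )
{
    unsigned uxCount = 0u;

    assert( uxHead < RING_SIZE );

    while( uxCount < RING_SIZE )
    {
        if( ( pxRing[ uxHead ].address & RXBUF_NEW_BIT ) == 0u )
        {
            break; /* Nothing more received: the normal way out. */
        }

        uxCount++;
        uxHead = ( uxHead + 1u ) % RING_SIZE;
    }

    return uxCount;
}
```

A return value of zero directly after an RX interrupt would be the case worth investigating; a break after one or more frames is expected.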

Another thing I would suggest checking is this piece of code:

if( ethMsg != NULL )
{
    passEthMessages( );
}

If ethMsg is NULL after the loop, it means that an rx IRQ is received, but on RX frames can be passed to the stack, which is a bad situation in my opinion.

I think you can use the code you provided, if uncached memory is addressable by 32-bit word.

How can I make sure that this is true?
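
One possible check, as a sketch: test that the whole uncached region lies below the 4 GB boundary, so that the upper address word can stay zero. The function name is hypothetical; on the target it could be called with ( uint64_t ) ( uintptr_t ) pucStartOfMemory and UNCACHED_MEMORY_SIZE.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when every byte of [ullStart, ullStart + ullSize) has an
 * address that fits in 32 bits, otherwise 0. */
static int iRegionFitsIn32Bits( uint64_t ullStart, uint64_t ullSize )
{
    assert( ullSize > 0u );

    if( ullStart > UINT32_MAX )
    {
        return 0;
    }

    /* Compare the last byte offset without overflowing. */
    return ( ullSize - 1u ) <= ( UINT32_MAX - ullStart );
}
```

If this returns 0, a cast such as ( uint32_t ) pxBuffer->pucEthernetBuffer would silently drop the upper address bits.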

( xemacpsif->rxSegments[ head ].address & XEMACPS_RXBUF_NEW_MASK ) == 0

I mean that this causes ethMsg to be NULL.

emacps_check_rx: unable to allocate a Network Buffer\n

This does not occur because the ( xemacpsif->rxSegments[ head ].address & XEMACPS_RXBUF_NEW_MASK ) == 0 condition is true and therefore a break occurs.

Have you tried to increase memory for TCP/IP stack buffers?

#define ipconfigNETWORK_MTU                            1500
#define ipconfigTCP_MSS		( ipconfigNETWORK_MTU - ( ipSIZE_OF_IPv4_HEADER + ipSIZE_OF_TCP_HEADER  ) )
#define ipconfigTCP_RX_BUFFER_LENGTH                   ( 10 * ipconfigTCP_MSS )
#define ipconfigTCP_TX_BUFFER_LENGTH                   ( 10 * ipconfigTCP_MSS )

#define dmaRX_TX_BUFFER_SIZE 0x1000uL

“emacps_check_rx: unable to allocate a Network Buffer”
I believe it means lack of memory in TCP/IP stack.

That is too general. The error message occurs when the pool of network buffers is empty.

The total number of network buffers is determined by ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS.

Mind you that at start-up, ipconfigNIC_N_RX_DESC network buffers are needed for the DMA buffers.

As soon as a packet comes in, a new network buffer will be assigned to that slot.
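
The buffer budget can be sketched as a compile-time check. The two values below are purely illustrative; the real ones are set in FreeRTOSIPConfig.h. BUFFERS_LEFT_AFTER_RX_RING and the helper function are hypothetical names:

```c
#include <assert.h>

/* Illustrative values only; the real ones live in FreeRTOSIPConfig.h. */
#define ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS    64
#define ipconfigNIC_N_RX_DESC                     32

/* Buffers that remain for incoming packets, the IP task and transmission
 * once every RX DMA slot holds a network buffer. */
#define BUFFERS_LEFT_AFTER_RX_RING \
    ( ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS - ipconfigNIC_N_RX_DESC )

/* Refuse to build if the RX ring alone would drain the pool. */
static_assert( BUFFERS_LEFT_AFTER_RX_RING > 0,
               "the RX ring would consume every network buffer" );

/* Returns the number of buffers left after the RX ring has been filled. */
static int iBuffersLeftAfterRxRing( void )
{
    return BUFFERS_LEFT_AFTER_RX_RING;
}
```

If the remainder is zero or very small, the first received packets cannot get a replacement buffer and the "unable to allocate a Network Buffer" message is the expected result.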

If ethMsg is NULL after the loop, it means that an rx IRQ is received,
but on RX frames can be passed to the stack…

What do you mean by the second part of this phrase?

And please note that the code shown ( ethMsg ) is from a much earlier version of the driver.

The current driver can be found at github/freertos, or at github/aws.

The Zynq 7000 driver has been around for a long time and has been thoroughly tested. Maybe we should concentrate on the differences between the Zynq and UltraScale.

Thanks