Socket Stops Accepting Connections after relocating ucHeap to SDRAM

pseudotronics · June 17, 2023, 2:51am

I am at a bit of a loss trying to figure out what I am missing. I am pretty new to FreeRTOS and the TCP/IP stack.

I have an application running on a STM32H750-DK.

With the ucHeap placed in DTCMRAM everything works as expected. I create 3 blinky LED tasks and a simple task that waits for a socket connection. If a client connects the child socket spawns and a simple HTTP exchange occurs before the child socket closes. No problems.

When I add the definition for:

#define configAPPLICATION_ALLOCATED_HEAP        1

And add this line in my main.c

__attribute__((section(".sdram_data"))) uint8_t ucHeap[configTOTAL_HEAP_SIZE];

The blinky LED tasks still run just fine and the main socket task is created. The problem seems to be that the child tasks are never created because FreeRTOS_accept never returns.

Things I have checked so far:

Full check of SDRAM address space writes/reads [OK]
Ping Test [No response]
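The thread does not say how the SDRAM check was performed. As a hedged sketch only: a test that uses a single access width can miss problems that show up with narrower accesses, so a check might mix 32-bit and 8-bit passes as below. The name memory_width_test is hypothetical and is not part of FreeRTOS or the ST HAL; on the target, the pointer would be the start of the SDRAM region under test.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of FreeRTOS or the ST HAL.
 * Returns 0 on success, or the number of the pass that failed. */
static int memory_width_test( volatile uint32_t * words, size_t word_count )
{
    volatile uint8_t * bytes = ( volatile uint8_t * ) words;
    size_t byte_count = word_count * sizeof( uint32_t );
    size_t i;

    /* Pass 1: 32-bit writes followed by 32-bit reads. */
    for( i = 0; i < word_count; i++ )
    {
        words[ i ] = 0xA5A50000u ^ ( uint32_t ) i;
    }

    for( i = 0; i < word_count; i++ )
    {
        if( words[ i ] != ( 0xA5A50000u ^ ( uint32_t ) i ) )
        {
            return 1;
        }
    }

    /* Pass 2: 8-bit writes followed by 8-bit reads, so every byte
     * of the region is written and read on its own. */
    for( i = 0; i < byte_count; i++ )
    {
        bytes[ i ] = ( uint8_t ) ( i * 7u + 3u );
    }

    for( i = 0; i < byte_count; i++ )
    {
        if( bytes[ i ] != ( uint8_t ) ( i * 7u + 3u ) )
        {
            return 2;
        }
    }

    return 0;
}
```

A passing result only shows that these two access patterns work; it says nothing about unaligned accesses, which this sketch deliberately does not attempt.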

The lack of ping makes me think it is something in the IP-Task.

Edit: I should add that the board successfully negotiates its link and gets a DHCP address from my router.

I am not sure where to start poking at. Any guidance/wisdom would be very much appreciated.

rtel · June 17, 2023, 7:43pm

Which buffer management scheme are you using? Is the statically allocated heap in cached memory?

pseudotronics · June 17, 2023, 8:11pm

The buffer management is BufferAllocation_2.

“Is the statically allocated heap in cached memory?”
I think the answer to this is no, but I am not sure that I fully understand what you are asking.

rtel · June 17, 2023, 8:29pm

I was asking if the processor has a data cache that is operating on the heap. If so, can you try turning the cache off as an experiment. Caches may be initialised by the C startup code (before main() is called) or as part of the initialisation performed by main().

pseudotronics · June 17, 2023, 9:07pm

I made sure that the ICache and DCache are disabled. The MPU is also disabled. It doesn’t seem to make a difference.

I think I am going to try explicitly configuring the MPU for the default SDRAM location. I currently have it remapped to an address space that has the same attributes as the AXI_SRAM that runs without issue. I don’t see how this would have an impact, considering I have the MPU disabled as it is now.

pseudotronics · June 17, 2023, 9:40pm

I am not sure this helps diagnose the issue but if I place the ethernet data in SDRAM and leave the heap in AXI_SRAM it works. So it is something specific about the heap being in that memory section.

This works:

	.ethernet_data :
	{
		PROVIDE_HIDDEN (__ethernet_data_start = .);
		KEEP (*(SORT(.ethernet_data.*)))
		KEEP (*(.ethernet_data*))
		PROVIDE_HIDDEN (__ethernet_data_end = .);
	} > SDRAM

	.sdram_data :
	{
		KEEP (*(.sdram_data))
	} > RAM_D1

This doesn’t:

	.ethernet_data :
	{
		PROVIDE_HIDDEN (__ethernet_data_start = .);
		KEEP (*(SORT(.ethernet_data.*)))
		KEEP (*(.ethernet_data*))
		PROVIDE_HIDDEN (__ethernet_data_end = .);
	} > RAM_D1

	.sdram_data :
	{
		KEEP (*(.sdram_data))
	} > SDRAM

aggarg · June 18, 2023, 11:34am

The bus interconnect matrix from the datasheet explains why the ethernet data needs to be in one of the SRAMs.

pseudotronics · June 18, 2023, 2:47pm

I can put the ethernet data section in external SDRAM and the application works fine. So if the ethernet data needs to be in SRAM then this shouldn’t work right?

The application has issues when I put the ucHeap in external SDRAM.

htibosch · June 18, 2023, 9:02pm

Hi @pseudotronics , not sure if it helps you, but have a look at my demo project for STM32H747 here.

It puts .ethernet_data in AXI RAM. For the heap, it uses 3 regions:

static uint8_t ucRAM_1 [384 * 1024] __attribute__( ( section( ".ethernet_data" ) ) );
static uint8_t ucRAM_2 [128 * 1024] __attribute__( ( section( ".ram2_data" ) ) );
static uint8_t ucRAM_3 [ 32 * 1024] __attribute__( ( section( ".ram3_data" ) ) );

In this example, the heap uses part of the AXI RAM, and also uses RAM2 and RAM3 (using heap_5.c of course). The linker will make sure that it all fits nicely.

It took me also a long time before I had a satisfying configuration of memories. The memory layout of STM32Hx is a bit complicated.

pseudotronics · June 18, 2023, 9:21pm

It sure is complicated. I don’t have any issues with the internal memories right now though.

pseudotronics · June 19, 2023, 2:50pm

Another update:

I have been stepping through the IP stack and can see that packet reception is working fine. I also think I found something that is relevant.

xARPCache doesn’t seem to be holding any values when I put the heap in SDRAM.

I was going to share pictures of the Wireshark captures but apparently I’m not allowed to do that.

I am going to keep poking and if I find out more I will post again.

pseudotronics · June 19, 2023, 3:22pm

Ok so I found out that the IP address being passed into vARPRefreshCacheEntry is incorrect, the MAC address is however correct.

I enabled ipconfigARP_STORES_REMOTE_ADDRESSES to get it to make the ARP entry regardless.

If I set the IP address manually (through memory manipulation) to the correct address of my laptop it responds to pings and my application runs like it should!

The address that is wrong: 10.11.10.11
What it should be: 192.168.10.11

It probably isn’t a coincidence that my ip is ending in 10.11 and it is duplicated twice in the “incorrect address”.

pseudotronics · June 19, 2023, 4:45pm

So I think I found the root of the issue:

This section of code (FreeRTOS_ARP.c, Line 152):

    /* The field ucSenderProtocolAddress is badly aligned, copy byte-by-byte. */

    /*
     * Use helper variables for memcpy() to remain
     * compliant with MISRA Rule 21.15.  These should be
     * optimized away.
     */
    pvCopySource = pxARPHeader->ucSenderProtocolAddress;
    pvCopyDest = &ulSenderProtocolAddress;
    ( void ) memcpy( pvCopyDest, pvCopySource, sizeof( ulSenderProtocolAddress ) );
    /* The field ulTargetProtocolAddress is well-aligned, a 32-bits copy. */
    ulTargetProtocolAddress = pxARPHeader->ulTargetProtocolAddress;

For example:

pxARPHeader->ucSenderProtocolAddress = [ 192, 168, 10, 11 ] (0x0b0aa8c0)

after memcpy

ulSenderProtocolAddress = 0x0b0a0b0a
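To connect these hex values to the dotted addresses quoted earlier, the small sketch below prints a 32-bit value lowest byte first, which is how an address copied from the packet appears on a little-endian core. The helper name format_ipv4_le is made up for this illustration and is not a FreeRTOS+TCP function.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats a 32-bit IPv4 address, lowest byte first,
 * matching how the packet bytes land in a little-endian uint32_t. */
static void format_ipv4_le( uint32_t ulAddress, char * pcBuffer, size_t uxSize )
{
    snprintf( pcBuffer, uxSize, "%u.%u.%u.%u",
              ( unsigned ) ( ulAddress & 0xFFu ),
              ( unsigned ) ( ( ulAddress >> 8 ) & 0xFFu ),
              ( unsigned ) ( ( ulAddress >> 16 ) & 0xFFu ),
              ( unsigned ) ( ( ulAddress >> 24 ) & 0xFFu ) );
}
```

With this helper, 0x0b0aa8c0 formats as 192.168.10.11 and 0x0b0a0b0a formats as 10.11.10.11. In the bad value the upper 16 bits are correct and the lower 16 bits hold a copy of them.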

Now I just need to figure out why and how to fix this.

Edit: I am guessing this has something to do with the alignment of SDRAM

pseudotronics · June 19, 2023, 5:35pm

Compiling with:

-mno-unaligned-access

fixes the issue, but I am not sure this is the best solution.

richard-damon · June 19, 2023, 9:11pm

My memory is that a number of fields in packets aren’t properly aligned, and access needs to handle that, needing a copy or byte by byte pickup to access.

htibosch · June 20, 2023, 11:31am

@richard-damon wrote:

My memory is that a number of fields in packets aren’t properly aligned

True. All other 32-bit fields are 32-bit aligned, ARP is an exception. That is why the field is declared as an array of 4 bytes:

uint8_t ucSenderProtocolAddress[4]
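The misalignment can be checked from the ARP layout for Ethernet and IPv4 (RFC 826). The struct below is a hypothetical mirror written for this illustration, not the FreeRTOS+TCP definition, and the alignment remarks assume the ARP header itself starts on a 4-byte boundary.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the ARP header for Ethernet/IPv4.
 * Every member is a byte array, so the compiler adds no padding. */
struct example_arp_header
{
    uint8_t hardware_type[ 2 ];
    uint8_t protocol_type[ 2 ];
    uint8_t hardware_length;
    uint8_t protocol_length;
    uint8_t operation[ 2 ];
    uint8_t sender_hardware_address[ 6 ];
    uint8_t sender_protocol_address[ 4 ]; /* offset 14: not a multiple of 4 */
    uint8_t target_hardware_address[ 6 ];
    uint8_t target_protocol_address[ 4 ]; /* offset 24: a multiple of 4 */
};

/* Returns 1 when a field offset is 32-bit aligned relative to the
 * start of the header, 0 otherwise. */
static int offset_is_word_aligned( size_t offset )
{
    return ( offset % sizeof( uint32_t ) ) == 0u;
}
```

The 6-byte hardware address in front of the sender IP address is what pushes it off a 4-byte boundary, while the target IP address ends up aligned again.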

after memcpy
ulSenderProtocolAddress = 0x0b0a0b0a

That is interesting.

-mno-unaligned-access fixes the issue

Many times I have seen erroneous code created by a compiler when it comes to alignment and optimisations. Sometimes the compiler has the wrong assumptions about the alignment of a pointer.

The following code may crash when optimisation is enabled:

uint32_t getAddress ( uint8_t *ucPtr )
{
    uint32_t ulAddress = ( ( uint32_t )ucPtr[ 1 ] << 8 ) |
                         ( ( uint32_t )ucPtr[ 2 ] );
    return ulAddress;
}

My gcc compiler wanted to retrieve 16 bits from the location ucPtr+1:

    ldrh.w    r0, [r2, #1]

which was not allowed.

Without optimisation, my program ran perfectly.

When optimising, the compiler may replace calls to memcpy() with faster code, for instance:

-    memcpy( ucTarget, ucSource, 4 );
+    * ( ( uint32_t * )ucTarget ) = * ( ( uint32_t * )ucSource );

The new code is short and efficient. But it has not been tested if the 32-bit access is allowed or not. I think this is related to -mno-unaligned-access.

The library functions memcpy() and memset() always check the alignment of the pointers.

So what I often do is avoid the usage of built-in memcpy() and memset() using:

-fno-builtin-memcpy -fno-builtin-memset
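As a possible alternative for a single copy, and only as a sketch: copying through volatile byte pointers asks the compiler to perform each byte access as written, so it should not merge them into a wider load or store. The name vCopyBytes is hypothetical and is not part of FreeRTOS+TCP; whether this is preferable to the compiler flags above is a judgment call that this thread does not settle.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies uxLength bytes one at a time.
 * The volatile qualifiers keep the accesses byte-wide. */
static void vCopyBytes( void * pvDest, const void * pvSource, size_t uxLength )
{
    volatile uint8_t * pucDest = ( volatile uint8_t * ) pvDest;
    const volatile uint8_t * pucSource = ( const volatile uint8_t * ) pvSource;
    size_t i;

    for( i = 0; i < uxLength; i++ )
    {
        pucDest[ i ] = pucSource[ i ];
    }
}
```

A call such as vCopyBytes( &ulSenderProtocolAddress, pxARPHeader->ucSenderProtocolAddress, 4 ) would stand in for the memcpy() in the ARP code; this substitution has not been tested against the SDRAM setup described here.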

Beside all that, thank you for reporting this problem, and thank you for your efforts to find the cause of the problem. There will always be other developers who find your text and that may save them time.

pseudotronics · June 20, 2023, 12:09pm

htibosch:

When optimising, the compiler may replace calls to memcpy() with faster code, for instance:
-    memcpy( ucTarget, ucSource, 4 );
+    * ( ( uint32_t * )ucTarget ) = * ( ( uint32_t * )ucSource );
The new code is short and efficient. But it has not been tested if the 32-bit access is allowed or not. I think this is related to -mno-unaligned-access.

This to me seems like the most likely cause of the problem.

EDIT: I just tried it with -fno-builtin-memcpy -fno-builtin-memset and that also works. Which gives me even more confidence this is the issue.

htibosch · June 20, 2023, 12:58pm

Right! Thanks for testing the hypothesis.
Hein