FreeRTOS Plus TCP example Project going into HardFault

nop · February 22, 2024, 3:55am

Hi everyone,

I was able to compile and run this (unofficial) example project for the STM32F4: freertos_plus_projects/plus/stm32F40 at master · htibosch/freertos_plus_projects · GitHub
Some adjustments were necessary; thanks, htibosch, for the help!

At first sight everything seemed fine: ping works, the telnet console works, unplugging and replugging works, …
But if I put it under load, either by running an iperf test or by doing a port scan, the CPU gets stuck in HardFault. What could be the problem? Could it be related to stack sizes, or to some buffer size being configured wrong?
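When a Cortex-M4 ends up in HardFault, the Configurable Fault Status Register (CFSR, at 0xE000ED28) and the HardFault Status Register (HFSR, at 0xE000ED2C) usually tell you which kind of fault was escalated, which narrows the search before tuning stack or buffer sizes. The sketch below only classifies raw values read from those registers; FaultCause_t and eDecodeFault are hypothetical names, and only a subset of the defined bits is covered.

```c
#include <stdint.h>

/* Selected CFSR bits (ARMv7-M): MMFSR is bits 0-7, BFSR bits 8-15, UFSR bits 16-31. */
#define CFSR_IACCVIOL    (1u << 0)
#define CFSR_DACCVIOL    (1u << 1)
#define CFSR_PRECISERR   (1u << 9)
#define CFSR_IMPRECISERR (1u << 10)
#define CFSR_STKERR      (1u << 12)
#define CFSR_UNDEFINSTR  (1u << 16)
#define CFSR_INVSTATE    (1u << 17)
#define CFSR_NOCP        (1u << 19)
#define CFSR_UNALIGNED   (1u << 24)
#define CFSR_DIVBYZERO   (1u << 25)
/* HFSR.FORCED: a configurable fault was escalated to HardFault. */
#define HFSR_FORCED      (1u << 30)

/* Hypothetical classification of the first cause found. */
typedef enum {
    eFaultNone, eFaultUndefInstr, eFaultInvState, eFaultNoCoprocessor,
    eFaultUnaligned, eFaultDivByZero, eFaultStacking, eFaultPreciseBus,
    eFaultImpreciseBus, eFaultMemManage, eFaultForcedOther
} FaultCause_t;

/* ulCFSR and ulHFSR are raw register values, e.g. read in gdb with
   "x/2wx 0xE000ED28" (CFSR at 0xE000ED28, HFSR right after it). */
FaultCause_t eDecodeFault(uint32_t ulCFSR, uint32_t ulHFSR)
{
    if (ulCFSR & CFSR_UNDEFINSTR)  return eFaultUndefInstr;
    if (ulCFSR & CFSR_INVSTATE)    return eFaultInvState;
    if (ulCFSR & CFSR_NOCP)        return eFaultNoCoprocessor;
    if (ulCFSR & CFSR_UNALIGNED)   return eFaultUnaligned;
    if (ulCFSR & CFSR_DIVBYZERO)   return eFaultDivByZero;
    if (ulCFSR & CFSR_STKERR)      return eFaultStacking;
    if (ulCFSR & CFSR_PRECISERR)   return eFaultPreciseBus;
    if (ulCFSR & CFSR_IMPRECISERR) return eFaultImpreciseBus;
    if (ulCFSR & (CFSR_IACCVIOL | CFSR_DACCVIOL)) return eFaultMemManage;
    if (ulHFSR & HFSR_FORCED)      return eFaultForcedOther;
    return eFaultNone;
}
```

Because UsageFault is disabled by default, an undefined instruction typically shows up as UNDEFINSTR in the CFSR together with FORCED in the HFSR.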

The PC value saved before entering the HardFault loop points to this line (one line after a memcpy):
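For reference, a common way to capture that stacked PC on a Cortex-M4 is a small naked HardFault handler that selects the active stack pointer (MSP or PSP, from bit 2 of EXC_RETURN in lr) and hands the exception frame to a C function. This is a sketch assuming GCC and the CMSIS handler name HardFault_Handler; FaultInfo_t, xLastFault, vDecodeFaultFrame and prvHardFaultCapture are hypothetical names, and the ARM-only part is compiled out on other targets.

```c
#include <stdint.h>

/* Hypothetical copy of the basic exception frame the CPU pushes on entry. */
typedef struct {
    uint32_t ulR0, ulR1, ulR2, ulR3, ulR12, ulLR, ulPC, ulPSR;
} FaultInfo_t;

/* Hypothetical global; inspect with "p/x xLastFault" after the fault. */
volatile FaultInfo_t xLastFault;

/* The frame layout is fixed by the hardware: r0, r1, r2, r3, r12, lr, pc,
   xPSR (an FPU frame adds s0-s15 and FPSCR after these 8 words). */
void vDecodeFaultFrame(const uint32_t *pulFrame)
{
    xLastFault.ulR0  = pulFrame[0];
    xLastFault.ulR1  = pulFrame[1];
    xLastFault.ulR2  = pulFrame[2];
    xLastFault.ulR3  = pulFrame[3];
    xLastFault.ulR12 = pulFrame[4];
    xLastFault.ulLR  = pulFrame[5];
    xLastFault.ulPC  = pulFrame[6];  /* faulting instruction for precise faults */
    xLastFault.ulPSR = pulFrame[7];
}

#if defined(__arm__) && defined(__ARM_ARCH_7EM__)
void prvHardFaultCapture(uint32_t *pulFrame) __attribute__((used));
void prvHardFaultCapture(uint32_t *pulFrame)
{
    vDecodeFaultFrame(pulFrame);
    for (;;) {
        /* Halt here with the debugger attached and read xLastFault. */
    }
}

/* Bit 2 of EXC_RETURN (in lr) selects the stack holding the frame:
   0 = MSP (interrupt, or before the scheduler started), 1 = PSP (a task). */
__attribute__((naked)) void HardFault_Handler(void)
{
    __asm volatile(
        " tst   lr, #4              \n"
        " ite   eq                  \n"
        " mrseq r0, msp             \n"
        " mrsne r0, psp             \n"
        " b     prvHardFaultCapture \n");
}
#endif
```

The stacked lr is often the return address into the caller, which helps when the PC alone lands in a shared helper such as memcpy.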

(gdb) l *pc
0x800e620 is in vIPerfTask (PWD/freertos_plus_projects/plus/Common/Utilities/iperf_task_v3_0g.c:503).
503                                pxDataClient->ullAmount = pxControlClient->ullAmount;

And if I do a port scan with nmap, it gets stuck at this line:

(gdb) l *pc
0x801876c is in prvTCPReturnPacket_IPV4 (PWD/freertos_plus_projects/plus/Framework/FreeRTOS-Plus-TCP-main/source/FreeRTOS_TCP_Transmission_IPv4.c:147).
147             pxIPHeader = ( ( IPHeader_t * ) &( pxNetworkBuffer->pucEthernetBuffer[ ipSIZE_OF_ETH_HEADER ] ) );

I was hoping to use this as a reference for my own project, so it would be great to understand this issue and ultimately get rid of it.

kstribrn · February 23, 2024, 10:27pm

This seems like a solid assumption. How large are the Ethernet buffers in your network descriptors? How many network buffers do you have configured? And what is the size of your task stack?
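The settings behind those questions live in FreeRTOSIPConfig.h and FreeRTOSConfig.h. The fragment below is only an illustration with example values, not this project's actual configuration; turning on the kernel's stack-overflow check is a cheap way to test the stack-size theory, since method 2 can catch many (not all) overflows at context-switch time.

```c
/* FreeRTOSIPConfig.h: example values only, not tuned for this board. */
#define ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS    60
#define ipconfigNETWORK_MTU                       1500
#define ipconfigIP_TASK_STACK_SIZE_WORDS          ( configMINIMAL_STACK_SIZE * 5 )

/* FreeRTOSConfig.h: check task stacks on every context switch, and allow
   uxTaskGetStackHighWaterMark( NULL ) to report the remaining headroom. */
#define configCHECK_FOR_STACK_OVERFLOW            2
#define INCLUDE_uxTaskGetStackHighWaterMark       1

/* Application code: called by the kernel when an overflow is detected. */
#include "FreeRTOS.h"
#include "task.h"

void vApplicationStackOverflowHook( TaskHandle_t xTask, char * pcTaskName )
{
    ( void ) xTask;
    ( void ) pcTaskName;  /* inspect in the debugger to see which task */
    taskDISABLE_INTERRUPTS();
    for( ;; )
    {
    }
}
```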

aggarg · February 26, 2024, 7:03am

nop:

(gdb) l *pc
0x800e620 is in vIPerfTask (PWD/freertos_plus_projects/plus/Common/Utilities/iperf_task_v3_0g.c:503).
503                                pxDataClient->ullAmount = pxControlClient->ullAmount;

Can you examine pxDataClient and pxControlClient and see if those seem corrupted? Also, see dst of memcpy and see if there is a possibility of buffer overrun or stack overflow.
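One way to follow that suggestion while debugging is to route the suspicious copy through a temporary checking wrapper. A minimal sketch; xCopyChecked and ulOverrunCount are hypothetical names, and the caller must pass the real capacity of the destination.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counter; put a breakpoint on the increment to stop at the
   first offending caller. */
volatile unsigned ulOverrunCount;

/* Copies xLen bytes only if they fit into a destination of xDstSize bytes. */
bool xCopyChecked(void *pvDst, size_t xDstSize, const void *pvSrc, size_t xLen)
{
    if (pvDst == NULL || pvSrc == NULL || xLen > xDstSize) {
        ulOverrunCount++;
        return false;
    }
    memcpy(pvDst, pvSrc, xLen);
    return true;
}
```

The check is only as good as the size the caller supplies: sizeof of an array gives its capacity, whereas sizeof of a pointer only gives the size of the pointer itself.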

nop · March 16, 2024, 5:32am

Hi again,

Thanks for the replies. After looking into this problem a few times (and getting distracted in between), I finally looked at the disassembly, and the solution turned out to be quite simple:

147             pxIPHeader = ( ( IPHeader_t * ) &( pxNetworkBuffer->pucEthernetBuffer[ ipSIZE_OF_ETH_HEADER ] ) );
   0x0801862a <+58>:    ldr     r3, [r4, #36]   @ 0x24
=> 0x0801862c <+60>:    udf     #255    @ 0xff

The problem was GCC generating a udf (permanently undefined) instruction, which faults when executed; this can be disabled with -fno-delete-null-pointer-checks (see this ST forum post).
I don't really understand the underlying mechanics of this problem, or why hping --fast does not reach this code while hping --faster does, but this flag solved it.
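As for the mechanics: GCC's -fisolate-erroneous-paths-dereference (on by default at -O2 and above, and dependent on -fdelete-null-pointer-checks) looks for control-flow paths on which a pointer is provably NULL when it is dereferenced, splits those paths off, and replaces the dereference with a trap, which on ARM is a udf. The sketch below shows the kind of pattern that can trigger this; Buffer_t, pxLookup and ucFirstIPByte are hypothetical and are not FreeRTOS+TCP code.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HEADER_SIZE 14u   /* untagged Ethernet II header */

/* Simplified, hypothetical stand-in for a network buffer descriptor. */
typedef struct {
    uint8_t *pucEthernetBuffer;
    size_t   xDataLength;
} Buffer_t;

/* Hypothetical lookup that returns NULL on one path. */
static Buffer_t *pxLookup(Buffer_t *pxPool, size_t xIndex, size_t xCount)
{
    if (xIndex >= xCount) {
        return NULL;
    }
    return &pxPool[xIndex];
}

/* BUG (deliberate): no NULL check before the dereference. Once pxLookup is
   inlined, GCC can see that pxBuffer is NULL on the out-of-range path, so at
   -O2 it may emit a trap (udf on ARM) for that path instead of a load.
   -fno-delete-null-pointer-checks turns the trap off, but the bad path stays. */
uint8_t ucFirstIPByte(Buffer_t *pxPool, size_t xIndex, size_t xCount)
{
    Buffer_t *pxBuffer = pxLookup(pxPool, xIndex, xCount);
    return pxBuffer->pucEthernetBuffer[ETH_HEADER_SIZE];
}
```

This would also be consistent with the hping observation: the udf only executes when the NULL path is actually taken, which may only happen under heavier load (for example, when a lookup or allocation fails).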

The other crash was a similar problem:

503                                pxDataClient->ullAmount = pxControlClient->ullAmount;
   0x0800e4d8 <+988>:   ldr.w   r3, [r10]
   0x0800e4dc <+992>:   ldr.w   r0, [r9]
=> 0x0800e4e0 <+996>:   vldr    d16, [r3, #16]

With vldr d16, the compiler tried to use an FPU register that does not exist (the Cortex-M4 FPU only has d0-d15). And indeed, the makefile specified the wrong FPU type:

-       -mfpu=vfpv4 \
+       -mfpu=fpv4-sp-d16 \
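To catch this kind of mismatch at build time, the ACLE macro __ARM_FP, which GCC defines when hardware floating point is enabled, can be checked: bit 3 (0x8) means double precision, which the Cortex-M4 FPU does not have. A sketch under assumptions: PROJECT_CORTEX_M4 is a hypothetical macro the makefile would define for Cortex-M4 builds only (a Cortex-M7 with a double-precision FPU legitimately sets that bit), and xFpuFlagsFitCortexM4 is a hypothetical helper mirroring the same rule for unit tests.

```c
#include <stdbool.h>

/* ACLE __ARM_FP bits: 0x2 half, 0x4 single, 0x8 double precision. */
#define ARM_FP_DOUBLE 0x8u

/* Build-time guard; PROJECT_CORTEX_M4 is hypothetical (-DPROJECT_CORTEX_M4). */
#if defined(PROJECT_CORTEX_M4) && defined(__ARM_FP) && (__ARM_FP & 0x8)
#error "FPU flags enable double precision; the Cortex-M4 needs -mfpu=fpv4-sp-d16"
#endif

/* Same rule as a function, e.g. for a host-side check of build settings. */
bool xFpuFlagsFitCortexM4(unsigned int uxArmFp)
{
    return (uxArmFp & ARM_FP_DOUBLE) == 0u;
}
```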

I hope to upload and share a cleaned-up version of the example project at some point when I have time…

RAc · March 16, 2024, 6:36am

To me this reads as if the compiler inserts code to trap a NULL pointer dereference, meaning that

pxNetworkBuffer->pucEthernetBuffer

is 0 at the point of execution. Is that a valid address on your platform? If not, you may just be treating the symptoms rather than the cause.

nop · March 17, 2024, 3:30pm

I suspect some fields get written by the DMA, so the compiler is not aware of them. If it really is still a null pointer at execution time, shouldn't the CPU also go into a fault handler? The code seems to run stably now, but I see what you mean about treating symptoms…

RAc · March 17, 2024, 4:19pm

Not necessarily. On many MCUs, address 0 is mapped to the beginning of internal flash, which is always readable (and may also be writable in flash programming mode), so at the hardware level a null pointer dereference by itself is neither illegal nor trapped in such scenarios. Yet it is a very common cause of runtime problems, so I suspect this compiler feature is an optional aid for developers to trap this case and distinguish it from more generic access errors, even where no memory is mapped at 0.

Again, with the trap disabled, your code may still be attempting to access network buffers at address 0, and it is only a coincidence that the memory behind it (probably the interrupt vector table, IVT) makes the network driver bail out in a benign manner. According to Murphy, this WILL at some point (e.g. after a firmware update that changes the IVT) make the software fail, most probably at the most critical and inaccessible site of your most important customer. BTDT. I would leave the check in (i.e. drop -fno-delete-null-pointer-checks), wait until it gets hit, and then inspect your data structures. Alternatively, you could set a hardware breakpoint to catch null pointer dereferences.
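Following that advice, the NULL case can be turned into an explicit, debuggable event instead of depending on whatever happens to be mapped at address 0. A minimal sketch; Buffer_t, pucIPHeader and ulBadBufferCount are hypothetical simplifications, not FreeRTOS+TCP types or functions, and on the target the counter could be replaced by configASSERT().

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HEADER_SIZE      14u  /* untagged Ethernet II header */
#define IPV4_MIN_HEADER_SIZE 20u  /* IPv4 header without options */

/* Simplified, hypothetical stand-in for a network buffer descriptor. */
typedef struct {
    uint8_t *pucEthernetBuffer;
    size_t   xDataLength;
} Buffer_t;

/* Counts rejected descriptors; break on the increment to catch the first one. */
volatile unsigned ulBadBufferCount;

/* Returns a pointer to the IPv4 header, or NULL if the descriptor is unusable,
   so the caller has to handle the bad case instead of reading address 0. */
uint8_t *pucIPHeader(const Buffer_t *pxBuffer)
{
    if (pxBuffer == NULL || pxBuffer->pucEthernetBuffer == NULL ||
        pxBuffer->xDataLength < ETH_HEADER_SIZE + IPV4_MIN_HEADER_SIZE) {
        ulBadBufferCount++;
        return NULL;
    }
    return &pxBuffer->pucEthernetBuffer[ETH_HEADER_SIZE];
}
```

The size check assumes an untagged Ethernet II frame carrying IPv4; a VLAN tag or IPv6 would need different constants.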