FreeRTOS 10.1.1 HTTP server on SAM4E Xplained stops after sending one file

dhogendorf wrote on Monday, February 18, 2019:

Hello everyone,

I use FreeRTOS 10.1.1 to build a web server. In the ATSAM4E networkdriver usGenerateProtocolChecksum() complaines “too few arguments to function ‘usGenerateProtocolChecksum’”, size was missing.
I added the size by taking it from gmac.c where it calls vGMACGenerateChecksum( ( uint8_t * ) p_tx_td->addr, ul_size ).
So the call in network. c now is usGenerateProtocolChecksum( ( uint8_t * ) apBuffer, ul_size, pdTRUE ), the size beeing the original size received in gmac.c.
I’m mot sure if this is correct, but it won’t hit breakpoints set on errors in FreeRTOS_IP.c usGenerateProtocolChecksum.

As there is no ATSAM4E demo included in version 10.1.1 I used as much of the demo 160919_FreeRTOS_Labs as needed.
I got it all up and running and can with FTP I add the HTML_for_default_pages to the disk.

Now, when I request the freertos.html webpage, I get the web page, but without the logo and the ftp picture.
Whireshark shows the full response for the freertos.html with an ACK after the HTTP OK. The Get /logo.jpg is still send, but the web server does not respond to it. I included the file wireshark10.txt with the wireshark info.

Next I replaced the complete networkinterface folder with the folder from the 160919_FreeRTOS_Labs version.
This time the server responds OK, I get the complete web page including the logo and FTP picture.
the WireShark info is in the file wireshark9.txt

Can anyone help me getting it running for the V10.1.1 version?

Thanks in advance,

Dik Hogendorf

heinbali01 wrote on Monday, February 18, 2019:

Hi Dik, would you mind attaching the original PCAP files? Those are much easier to read than a
TXT

dhogendorf wrote on Tuesday, February 19, 2019:

Hi Hein,

Thank you for the fast response.

These are the pcap files from a new run.
Digging in further I checked the tx_release_count in NetworkInterface.c after a run
tx_release_count[0] = 0x08 (network buffers released?)
tx_release_count[1] = 0x00 (no buffer selected?)
tx_release_count[2] = 0x08 (Tx callback?)
tx_release_count[3] = 0x17 (timeout waiting for free Tx descriptor?)

so it looks to me as if it can’t get a Tx descriptor
if I break on tx_release_count[ 3 ]++ tx_release_count 0 and 2 are both 0x08

kind regards
Dik Hogendorf

Van: Hein Tibosch [mailto:heinbali01@users.sourceforge.net]
Verzonden: maandag 18 februari 2019 16:29
Aan: [freertos:discussion] 382005@discussion.freertos.p.re.sourceforge.net
Onderwerp: [freertos:discussion] FreeRTOS 10.1.1 HTTP server on SAM4E Xplained stops after sending one file

Hi Dik, would you mind attaching the original PCAP files? Those are much easier to read than a
TXT


FreeRTOS 10.1.1 HTTP server on SAM4E Xplained stops after sending one filehttps://sourceforge.net/p/freertos/discussion/382005/thread/758a0c06c3/?limit=250#7ee0


Sent from sourceforge.net because you indicated interest in https://sourceforge.net/p/freertos/discussion/382005/

To unsubscribe from further messages, please visit https://sourceforge.net/auth/subscriptions/

heinbali01 wrote on Wednesday, February 20, 2019:

About tx_release_count :

  • [0] must be equal to [2]
  • Both [1] and [3] must remain zero.

tx_release_count[2] :

After a packet was delivered, there is an interrupt from the EMAC.

The TX DMA descriptors are returned via a queue ‘xTxBufferQueue’

tx_release_count 0 and 1:

The task prvEMACHandlerTask() will free the Network Buffer.

If the buffer was OK, it will call vReleaseNetworkBufferAndDescriptor() and increase tx_release_count[0].
Otherwise, it will increase tx_release_count[1]

The above is only true for zero-copy drivers (ipconfigZERO_COPY_TX_DRIVER).
Other drivers will only increase tx_release_count[0].

xTXDescriptorSemaphore is a “counting semaphore”, it keeps track of the number of available DMA descriptors for transmission. It is incremented after a DMA descriptor is returned to the driver.

Have you seen this warning ?

    FreeRTOS_printf( ( "TX DMA buffers: lowest %lu\n", uxLowestSemCount ) );

When tx_release_count[3] is increased, all DMA descriptors were occupied. That is unexpected.

GMAC_TX_BUFFERS defines the number of DMA/TX descriptors. How is it defined in your project?

But even if you only have 1 TX descriptor, there should not be a time-out: xNetworkInterfaceOutput waits for 50 ms for a TX descriptor. That is more than enough to deliver the largest packet.

I think that your GMAC_TX_BUFFERS equals 8 and that xTxCallback() is never called by the driver gmac_SAM.c.

Here you find the latest SAM driver, I attached it to this post

I left in two lines that must be taken away:

-NVIC_DisableIRQ( GMAC_IRQn );
-NVIC_EnableIRQ( GMAC_IRQn );

I temporarily disabled the IRQ in order debug the code. That might cause the problem that you are describing.

Hi @htibosch, I am runing the latest FreeRTOS v202112.00 and am having the same issue, where tx_release_[3] is not 0. tx_release_count[0] and [1] are both always matching. I am running the NetworkInterface.c from the DriverSAM, along with the gmac_SAM.h/c files.

I have been trying to track down the issue, but I cannot figure out exactly when a TxBufferDescriptor semaphore is not being released. When I launch my software, it all works. The issue only starts happening after I send a command to the microcontroller to send me a lot of data. So it really just sounds like the Ethernet stack is getting overwhelemed, but I’d expect for all the packets to be sent/receieved after a while and everything to settle down, however once tx_release_count[3] starts incrementing, my board just stops responding.

I am using a SAME70 processor on a custom board. Here are some of the IPConfig macros, I’m including all the ones that I think might be relevant:

#define ipconfigZERO_COPY_RX_DRIVER (1)
#define ipconfigZERO_COPY_TX_DRIVER (1)
#define ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS 14
#define ipconfigETHERNET_MINIMUM_PACKET_BYTES 60
#define GMAC_RX_BUFFERS (8)
#define GMAC_TX_BUFFERS (4)

If I double the number of GMAC_TX_BUFFERS it appears to work most of the times. It scares me though that if I ever get close to the maximum number of buffers, after a few days or maybe weeks, then I’m done.

Any thoughts or ideas on figuring out where it’s going wrong? Once tx_release_count[3] starts going up, should it not fix itself?

My thoughts are that the there is a race condition with xNetworkInterfaceOutput and GMAC_Handler. Almost like multiple packets are being sent too fast and xTxCallback is not getting hit for each packet sent out and so there’s a BD leak.

Thank you for your time.

Hi @Maty

Are the tx_release_count values the same? Also do they hold after board stops responding?

tx_release_count[0] and tx_release_count[2] should typically be roughly equal. The former is analogous to a “take_count(xTxBufferQueue)” and the latter to a “give_count(xTxBufferQueue)”.

Given tx_release_count[2] is being incremented prvTxCallback must be getting invoked.

According to tx_release_count[0], it would seem the issue happens after 8 TX callbacks.

@htibosch @kanherea Any further insights?

Hi @Maty, thank you for sharing this.
I am preparing some changes to DriverSAM/NetworkInterface. Once tested on my board I will show it here.

Hello @Maty, I don’t know how difficult it is to reproduce the problem, but could you test the following adapted driver:
DriverSAM_test_version.zip (24.5 KB)

It should prevent some racing conditions.

Earlier, I made similar changes to other drivers, see e.g. PR #459.

I just ran an iperf3 test, where the DUT is sending a huge stream of data to my laptop.

After that it showed the value of tx_release_count:

[0] = 95121
[1] = 0
[2] = 95121
[3] = 0

[3] indeed means that a packet can not be sent. That should never occur. It is enough to allocate 2 outgoing DMA buffers:

#define GMAC_TX_BUFFERS    2

so you can increase GMAC_RX_BUFFERS to 10.

[1] means that the function pxPacketBuffer_to_NetworkBuffer() has failed. That should also never occur.

Now if this solution doesn’t solve your problem, we’ll have to debug it more.

Are you using a custom-made board, or the standard SAME70 Xplained board?

Hello @htibosch, thank you for taking a look at this. I will try those files that you sent over and get back to you with any feedback.

The SAME70 driver can run in either three ways:

  1. Disable D-caching so you can use any SRAM for DMA
  2. Enable D-caching and use part of the DTCM for DMA
  3. Enable D-caching and use functions like SCB_InvalidateDCache()

Ad 1: D-caching is very useful, it will make the application run a lot faster.

Ad 2: DTCM memory is uncached, so it also works while D-caching is enabled.

Ad 3: That option is not yet correctly implemented :frowning:

I just worked out option 2 that uses TCM (Tightly Coupled Memory). It shows a real good performance in combination with D-caching enabled.

I will add some text about this in the readme file.

@Maty, could you find time next week to check the new version?

Hello @htibosch. Reproducing the behavior has been challenging but I think I got it by just setting GMAX_TX_BUFFERS to 2.

It seems like the problem still persists, tx_release_count[3] starts counting up. I also noticed this new behavior (it also happened with the old code), where sometimes tx_release_count[0] and [2] will be sitting at 0 and [3] starts incrementing immediately. So, when I set the buffer count to 2 I’m able to send anywhere between 0 packets and maybe 7,000 before I loose all of the semaphores.

The caching system has been quite problematic, but nothing to do with FreeRTOS. However, I did have a couple of questions, some relating to caching.

  1. In the invalidate cache code there is this comment /* SAME70 does not have clean/invalidate per area. */, I’m not able to find any documentation supporting this. How was this concluded?

  2. Inside of prvGMACInit(), I set the MDC clock divider based on the peripheral clock, not cpu with this line of code gmac_set_mdc_clock( GMAC, sysclk_get_peripheral_hz() );. Reading the datasheet on page 276 and 696, the GMAC peripheral get its clock from the peripheral clock (MCK). The current code calls sysclk_get_cpu_hz() which returns the cpu clock (HCLK). I very well could be missing something.

  3. I also added an if statement to xNetworkInterfaceOutput() where gmac_dev_write() gets called. I saw this as a possiblity of where the buffer descriptor could’ve gotten lost.

if ( gmac_dev_write( &gs_gmac_dev, ( void * ) pxDescriptor->pucEthernetBuffer, pxDescriptor->xDataLength ) != GMAC_OK )
{
   configASSERT(0); //!!! 
}

As far as the caching goes though, I set NETWORK_BUFFERS_CACHED to 0, because I put the ".first_data" section in the NOCACHE_SRAM_REGION. If you enable MPU_HAS_NOCACHE_REGION in the ASF library (mpu.h), then the last N-bytes of SRAM is non-cacheable. This seemed to do the trick. I’m able to send/receieve packets, for the most part.

I am also curious as to the prurpose of the following line of code in gmac_SAM.c

/*#warning Trying out */
gmac_start_transmission( p_hw );

Just an update, I ran the new code overnight and tx_release_count[0] and tx_release_count[1] both got stuck at 52,558,643 before tx_release_count[3] started counting up. I am now unable to send any more packets.

Just for reference I would say that [0] and [1] are incrementing at a rate of 1,500 per second. I’m able to run the predecessor board, which uses a Cortex-M4 (from NXP) at 2,500 with the same application code for months non-stop.

I did also find a small irrelevant bug in this NetworkInterface.c. The 2nd line inside prvEMACHandlerTask sets UBaseType_t uxLowestSemCount = 0. I believe this should be static UBaseType_t uxLowestSemCount = GMAC_TX_BUFFERS + 1;. The “+ 1” is sort of optional I guess. I just wanted to print the message the very first time.

@Maty, thank you for all the effort you take to sort this out.

You wrote:

I am also curious as to the purpose of the following line of code in gmac_SAM.c
/*#warning Trying out */
gmac_start_transmission( p_hw );

This function will trigger the CPU to check for transmission:

Bit 9 – TSTART - Start Transmission
Writing a '1' to this bit starts transmission.
Writing a '0' to this bit has no effect.

As you see I was not sure about when to call the TX trigger: just before or just after clearing the GMAC_TXD_USED flag.
I think that the first call to gmac_start_transmission() doesn’t add anything.

if ( gmac_dev_write( ) != GMAC_OK )
{
   configASSERT(0); //!!! 
}

That is a very good one, because when the function fails to send a packet, transmission will get blocked. Have you seen an assert happening?

In a meanwhile I run a TCP/IP test ( with iperf3 ), in which de SAME70 will transmit one TB of data.
Right now it is at: 6541504 0 6541504 0.
I have GMAC_TX_BUFFERS defined as 2, and I have included the above configASSERT().

The 2nd line inside prvEMACHandlerTask sets
UBaseType_t uxLowestSemCount = 0
I believe this should be
static UBaseType_t uxLowestSemCount = GMAC_TX_BUFFERS + 1;

You are right about that, except that the static is not necessary: this is a task function that never ends.

If you enable MPU_HAS_NOCACHE_REGION in the ASF library (mpu.h), then the last N-bytes of SRAM is non-cacheable

That is a good idea, I didn’t know this!
As I wrote earlier, I’ve put .first_data in DTCM memory, which gives a high performance without caching.

Inside of prvGMACInit(), I set the MDC clock divider based on the peripheral clock

I don’t know. It sounds logic what you are saying.

With cpu_hz() I get 150 MHz = GMAC_NCFGR_CLK_MCK_64
With peripheral_hz() I get 75 MHz = GMAC_NCFGR_CLK_MCK_32
Meaning a slower Management Data Clock. When using peripheral_hz(), it works as well.

The reference says:

The Management Data Clock (MDC) should not toggle faster
than 2.5 MHz (minimum period of 400 ns), as defined by the
IEEE 802.3 standard. MDC is generated by dividing down MCK.
Three bits in the Network Configuration register determine
by how much MCK should be divided to produce MDC.

After sending 29,575,605 packets, transmission stopped. So I can reproduce the problem you’re facing. Tomorrow more news.

Here is a new version that I am testing with:

DriverSAM_test_version_v1.zip (25.2 KB)

I finally cleaned up the variable that contains TX statistics:

/*
    The following struct replaces the earlier:
        int tx_release_count[ 4 ];
    The purpose of this struct is to describe the TX events.
*/
typedef struct STransmitStats {
    unsigned tx_enqueue_ok;     /* xNetworkInterfaceOutput() success. */
    unsigned tx_enqueue_fail;   /* xNetworkInterfaceOutput() failed, no slot available. */
    unsigned tx_callback;       /* Transmission ready, buffer returned to driver. */
    unsigned tx_release_ok;     /* Buffer released. */
    unsigned tx_release_bad;    /* Buffer corruption. */
} TransmitStats_t;

extern TransmitStats_t xTransmitStats;

Changes:

  • After writing p_tx_td->status.val, this variable will be read back to to make sure that it was written to SRAM.
  • The function gmac_start_transmission() will only be called once
  • The function xTxCallback() must check its parameter puc_buffer. When NULL, do not take further actions ( a Retry Limit Exceeded has occurred ).

My IPERF session has been running now for 6 hours without a problem, sending about 47 M packets.

EDIT : the application has been sending packets for 24 hours with stalling.
EDIT 2 my test was not good: the DUT was only acking incoming data, in stead of sending full-size packets (using iperf -R).

Hi @htibosch, thank you for your time on the issue.

if ( gmac_dev_write( ) != GMAC_OK )
{
   configASSERT(0); //!!! 
}

That is a very good one, because when the function fails to send a packet, transmission will get blocked. Have you seen an assert happening?

I have never seen that assert happening.

I don’t know. It sounds logic what you are saying.

With cpu_hz() I get 150 MHz = GMAC_NCFGR_CLK_MCK_64
With peripheral_hz() I get 75 MHz = GMAC_NCFGR_CLK_MCK_32
Meaning a slower Management Data Clock. When using peripheral_hz(), it works as well.

The reference says:

The Management Data Clock (MDC) should not toggle faster
than 2.5 MHz (minimum period of 400 ns), as defined by the
IEEE 802.3 standard. MDC is generated by dividing down MCK.
Three bits in the Network Configuration register determine
by how much MCK should be divided to produce MDC.

My CPU clock runs at 300 MHz. If you try to use this clock frequency to set the GMAC->NCFGR_CLK_MCK clock divider register then it is too high, you cannot fit a clock divider in the register to get it under the 2.5 MHz MDC clock limit (and I believe incorrect too, this isn’t the clock that the GMAC peripheral is using). My peripheral_hz() returns 150 MHz, which can be divided down to be under 2.5 MHz to make the MDC clock happy.

After many hours of trying to debug all of this, I implemented a silver bullet, and it seemed to fix the system. It’s been running for quite a few days now, non-stop, and no issues. I set GMAC_TX_BUFFERS to 8, but xTXDescriptorSemaphore to 7, meaning that I have a 1 buffer descriptor buffer. I also had to fix all of the checks where it compares the number of semaphores available/used to GMAC_TX_BUFFERS - 1.

I still had the feeling that there was a race condition between FreeRTOS and the buffer descriptors being used/released from the GMAC and so I lied to FreeRTOS, telling it that there was actually 1 less buffer, by decrementing the number of xTXDescriptorSemaphore. I still don’t quite know if, or how, this was the actual problem, but it does run very consistently now.

In the invalidate cache code there is this comment /* SAME70 does not have clean/invalidate per area. */, I’m not able to find any documentation supporting this. How was this concluded?

Do you know anything about this? Enabling cache on this chip has brought me more pain than anything else and this comment scared me… I put .first_data in a non-cached area of RAM and set NETWORK_BUFFERS_CACHED to 0, so the cache_invalidate_...() functions don’t get called, but I’m still very curious.

While googling for the SAME70 transmission issue, I found this PR from Zephyr. They decided to give up on using cached memory. Also, they found the same problem when sending two packets too quickly: the GMAC TX interrupt will never come.

In order to understand your “silver bullet”, would you mind to share your current version of the driver?

My solution would be to create an extra mutex that is taken when activating a TX descriptor in gmac_dev_write(), and it is given when the GMAC TX interrupt is triggered. That also seems to helps 100%, and without a measurable los of speed.