FreeRTOS+TCP freertos_send and "no space"

Good Day,
I have a system running on a Xilinx Zynq. It has been in use for several years, and has worked well.
The core system is based around FreeRTOS+TCP Labs build 160919

We’ve recently made some small changes to the system, and now we’ve run into a problem. We are allowing clients to connect via TCP (using TCP Echo Server Demo as a framework). The clients send a command at their leisure, and receive a reply of data.

This works perfectly. Until it doesn’t. After a random number of requests, the FreeRTOS device will stop replying to the TCP client. The socket is still open, commands can still be sent to the FreeRTOS device - there’s just no response.

I’ve done some debugging, and it seems the freertos_send() is where the failure is occuring. We do have UDP Logging enabled, so I sprinkled to debug printf’s around.

Here’s the code that calls freertos_send()

WrittenSize = 0;
					FreeRTOS_debug_printf( ( "PreFreeRTOS_send WriteSize: %d\n", write_size ) );
					while (WrittenSize < write_size) {
						if (write_size-WrittenSize > CMD_TXBUF_SIZE) { ActualWriteSize = CMD_TXBUF_SIZE; } else { ActualWriteSize = write_size-WrittenSize; }
						WrittenSize += FreeRTOS_send( xConnectedSocket, &write_buf[WrittenSize], ActualWriteSize, 0);
						FreeRTOS_debug_printf( ( "PostFreeRTOS_send WrittenSize: %d\n", WrittenSize ) );
					}

For a while, it works perfectly.

  63.126.875 [Command_Server] PreFreeRTOS_send WriteSize: 1969
  63.127.075 [Command_Server] PostFreeRTOS_send WrittenSize: 1384
  63.128.655 [Command_Server] PostFreeRTOS_send WrittenSize: 1969
  64.016.244 [Tmr Svc   ] Status LED State: OFF
  64.016.272 [Tmr Svc   ] Tick Count: 620000
  64.016.300 [Tmr Svc   ] Eth Status: 4
  64.360.036 [Command_Server] PreFreeRTOS_send WriteSize: 1969
  64.360.237 [Command_Server] PostFreeRTOS_send WrittenSize: 1384
  64.362.354 [Command_Server] PostFreeRTOS_send WrittenSize: 1969
  64.710.854 [Command_Server] PreFreeRTOS_send WriteSize: 1969
  64.711.054 [Command_Server] PostFreeRTOS_send WrittenSize: 1384
  64.711.699 [Command_Server] PostFreeRTOS_send WrittenSize: 1969
  65.016.274 [Tmr Svc   ] Status LED State: ON
  65.016.301 [Tmr Svc   ] Tick Count: 630000
  65.016.328 [Tmr Svc   ] Eth Status: 4
  65.582.766 [Command_Server] PreFreeRTOS_send WriteSize: 1969
  65.583.007 [Command_Server] PostFreeRTOS_send WrittenSize: 1384
  65.584.072 [Command_Server] PostFreeRTOS_send WrittenSize: 1969
  66.016.344 [Tmr Svc   ] Status LED State: OFF
  66.016.371 [Tmr Svc   ] Tick Count: 640000
  66.016.398 [Tmr Svc   ] Eth Status: 4
  66.082.665 [Command_Server] PreFreeRTOS_send WriteSize: 1969
  66.083.161 [Command_Server] PostFreeRTOS_send WrittenSize: 1384
  66.083.523 [Command_Server] PostFreeRTOS_send WrittenSize: 1969
  66.155.013 [Command_Server] PreFreeRTOS_send WriteSize: 1969
  66.155.252 [Command_Server] PostFreeRTOS_send WrittenSize: 1384
  66.156.350 [Command_Server] PostFreeRTOS_send WrittenSize: 1969
  66.182.248 [Command_Server] PreFreeRTOS_send WriteSize: 1969
  66.182.482 [Command_Server] PostFreeRTOS_send WrittenSize: 1384
  66.183.353 [Command_Server] PostFreeRTOS_send WrittenSize: 1969
  66.275.432 [Command_Server] PreFreeRTOS_send WriteSize: 1969
  66.275.632 [Command_Server] PostFreeRTOS_send WrittenSize: 1384
  66.276.274 [Command_Server] PostFreeRTOS_send WrittenSize: 1969
  66.287.343 [Command_Server] PreFreeRTOS_send WriteSize: 1969
  66.287.557 [Command_Server] PostFreeRTOS_send WrittenSize: 1384
  66.288.615 [Command_Server] PostFreeRTOS_send WrittenSize: 1969
  66.369.637 [Command_Server] PreFreeRTOS_send WriteSize: 1969
  66.369.852 [Command_Server] PostFreeRTOS_send WrittenSize: 1384
  66.371.396 [Command_Server] PostFreeRTOS_send WrittenSize: 1969
  66.383.218 [Command_Server] PreFreeRTOS_send WriteSize: 1969
  66.383.353 [Command_Server] PostFreeRTOS_send WrittenSize: 1384
  66.383.955 [Command_Server] PostFreeRTOS_send WrittenSize: 1969
  66.461.642 [Command_Server] PreFreeRTOS_send WriteSize: 1969
  66.461.767 [Command_Server] PostFreeRTOS_send WrittenSize: 1384
  66.461.866 [Command_Server] PostFreeRTOS_send WrittenSize: 1969
  66.488.743 [Command_Server] PreFreeRTOS_send WriteSize: 1969
  66.488.843 [Command_Server] PostFreeRTOS_send WrittenSize: 1384
  66.489.196 [Command_Server] PostFreeRTOS_send WrittenSize: 1969

But then, suddenly,

  67.831.410 [Command_Server] PreFreeRTOS_send WriteSize: 1969
  67.831.913 [Command_Server] FreeRTOS_send: 7734 -> c0a8651bip:62195: no space
  67.831.949 [Command_Server] PostFreeRTOS_send WrittenSize: -28

Looks like -28 corresponds to pdFREERTOS_ERRNO_ENOSPC

So freertos_send bombs because there’s no space in a buffer is what I can gather. Which is fine. But what would cause this error to prevent freertos_send from ever working again?

It’s probably worth mentioning that I have a huge heap, tons of stack size, and 512 network buffer descriptors. This device has 4GB of ram, so I’m not pressed for memory space.

Here are some additional details after the error occurs

ip-debug-stats

Packets received by the network interface 2442
Count of transmitted packets 5937
Count of packets dropped to generate ARP 1
Lowest ever available network buffers 476
Lowest ever free space in network event queue 534
Count of failed attempts to obtain a network buffer 0
Count of expired ARP entries 0
Count of failures to create a socket 0
Count of times recvfrom() has discarding bytes 0
Count of lost Etheret Rx events (event queue full?) 0
Count of lost IP stack events (event queue full?) 0
Count of failed calls to bind() 0
Count of receive timeouts 6445959
Count of failed sends due to oversized payload 0
Count of failed sends due to unbound socket 0
Count of failed transmits due to timeout 0
Number of times task had to wait to obtain a DMA Tx descriptor 0
Failed to notify select group 0
Total network buffers obtained 8176
Total network buffers released 8145
net-stats
1685.722.295 [IP-task   ] Prot Port IP-Remote       : Port  R/T Status       Alive  tmout Child
1685.722.335 [IP-task   ] TCP  7734 0               ip:    0 0/0 eTCP_LISTEN   999999      0 1/20
1685.722.380 [IP-task   ] TCP  7734 c0a8651b        ip:62195 1/1 eESTABLISHED   19625  14985
1685.722.430 [IP-task   ] UDP Port  7734
1685.722.454 [IP-task   ] UDP Port  8999
1685.722.478 [IP-task   ] UDP Port 65001
1685.722.503 [IP-task   ] UDP Port 65002
1685.722.527 [IP-task   ] UDP Port  5001
1685.722.549 [IP-task   ] UDP Port  1499
1685.722.571 [IP-task   ] FreeRTOS_netstat: 8 sockets 476 < 480 < 512 buffers free
task-stats

Task          State	Priority	Stack	#
************************************************
UDP CLI        	R	0	588	7
IDLE           	R	0	164	3
DetectAndSet   	B	3	65336	8
IP-task        	B	5	720	1
EMAC           	B	6	206	6
Tmr Svc        	B	3	188	4
LogTask        	B	0	426	11
Command_Server 	B	0	65242	12
ReadAndXmit    	B	4	65212	10
CommandListener	S	2	65316	9
query-heap

Current free heap 3049152 bytes, minimum ever free heap 3048784 bytes

Any ideas what could be causing this? Is it possible this has been corrected in a newer version of FreeRTOS+TCP? Are there substantial difficulties in transferring from the old FreeRTOS+TCP Labs release to a newer 10.x release?

Much Thanks,
Ben

More searching of the forums turned up some talk about PR1798

Do you think this issue could be solved by upgrading the Kernel, +TCP and using the code from 1798?

Thanks again,
Ben

Yes, it is worth upgrading to the latest release of all. You can find that here

Hein,

Alright, I’ll work on upgrading the kernel and the +tcp stuff in the next day or two. Hopefully the xilinx driver changes will solve the issue.

What is the fundamental difference between


and

I’m a bit confused by the multiple different places for FreeRTOS code since the amazon merge.

I can imagine the confusion about this.

The kernel sources are release on FreeRTOS-Kernel, and the FreeRTOS libraries, such as FreeRTOS+TCP is released on FreeRTOS-Plus.

The PR #1798 for the Xilinx driver still fist appeared on AWS. New FreeRTOS PR’s should be submitted to github/freertos.

Eventually, the FreeRTOS software will appear in the AWS repo as symbolic links, just like the kernel that you see here.

Note that the FreeRTOS+TCP library on FreeRTOS/FreeRTOS-Plus-TCP is organised slightly different, it misses a source directory.

The FreeRTOS+FAT library can be found here

Hein,

Thank you for that.

So I upgraded both the Kernel to 10.3.1 and +TCP per the link you provided. Added the xApplicationGetRandomNumber and ulApplicationGetNextSequenceNumber functions.

Everything compiles happily, and the application works. Exactly as before.

Same exact random error.

Based upon pdFREERTOS_ERRNO_ENOSPC, do you have any idea what could be happening?
I totally understand FreeRTOS_Send() bombing out if no space in a buffer is available somewhere, but FreeRTOS_Send never working again after that error seems broken.

Thoughts much appreciated!

P.S. It occurs randomly, but I can absolutely force it by sending a lot of requests at the device. It has happened with a single request, but it definitely seems more likely to occur if I hammer on it quickly.

In case it’s helpful, here is the routine that sends and receives data on this TCP Socket.

char read_buf[1023];
char write_buf[1023];
int write_size;
int read_size;

static void command_inf_thrd( void *pvParameters )
{

int delay = t_300us;
Socket_t xConnectedSocket;
static const TickType_t xReceiveTimeOut = pdMS_TO_TICKS( 5 );
static const TickType_t xSendTimeOut = pdMS_TO_TICKS( 5 );
int i;
TickType_t xTimeOnShutdown;

xConnectedSocket = ( Socket_t ) pvParameters;

have_client++; //Keep Track of number of connected clients
FreeRTOS_printf( ( "Command Control Client Connected\r\n" ) );

//Display the Prompt
FreeRTOS_send (xConnectedSocket, prompt, prompt_size, 0);
no_cmd_counter = 0;

FreeRTOS_setsockopt( xConnectedSocket, 0, FREERTOS_SO_RCVTIMEO, &xReceiveTimeOut, sizeof( xReceiveTimeOut ) );
FreeRTOS_setsockopt( xConnectedSocket, 0, FREERTOS_SO_SNDTIMEO, &xSendTimeOut, sizeof( xSendTimeOut ) );

		for( ;; )
		{

			//Send any Data From the process_command routine
			if (SendData) {
				SendData = 0;
				FreeRTOS_send( xConnectedSocket, write_buf, write_size, 0);
				clear_buffers(); //clear write_buf and read_buf
			}

			/* Receive data on the socket. */
			read_size = FreeRTOS_recv( xConnectedSocket, read_buf, CMD_RXBUF_SIZE, 0 );

			/* If data was received, Process it appropriately */
			if( read_size > 0 )
			{
				if (read_size >= 1) {
					no_cmd_counter = 0;
					uint8_t have_eoc = 0;
					for (i=0; i<read_size; i++) {
						if(read_buf[i] == ';') {
							have_eoc = 1;
						}
					}
					if (have_eoc) {	//if string ends with a semicolon, indicating a command
						read_buf[read_size] = '\0';
						process_command(); //This fills write_buf, write_size and SendData
					} else {
					write_size = sprintf(write_buf, "partial read or no end of command sent (;)\r\n");
					FreeRTOS_send( xConnectedSocket, write_buf, write_size, 0);
					FreeRTOS_send( xConnectedSocket, read_buf, read_size, 0);
					FreeRTOS_send( xConnectedSocket, prompt, prompt_size, 0);
					clear_buffers();
					}

				}
			} else if (read_size == 0) {
				if(no_cmd_counter > no_cmd_timeout) {
					no_cmd_counter = 0;
					//DisconnectClient = 1;
					//Don't disconnect due to inactivity
				}

			} else {
				/* Socket closed? */
				break;
			}
			if (DisconnectClient == 1) {
				break;
			}
		vTaskDelay(delay);
		}
	//}

	/* Initiate a shutdown in case it has not already been initiated. */
	FreeRTOS_shutdown( xConnectedSocket, FREERTOS_SHUT_RDWR );

	/* Wait for the shutdown to take effect, indicated by FreeRTOS_recv()
	returning an error. */
	xTimeOnShutdown = xTaskGetTickCount();
	do {
		if( FreeRTOS_recv( xConnectedSocket, read_buf, CMD_RXBUF_SIZE, 0 ) < 0 )
		{
			break;
		}
	} while( ( xTaskGetTickCount() - xTimeOnShutdown ) < tcpCommandSHUTDOWN_DELAY );

	/* Finished with the socket, buffer, the task. */
	//vPortFree( read_buf );
	FreeRTOS_closesocket( xConnectedSocket );
	have_client--;
	write_size = 0;
	DisconnectClient = 0;
	clear_buffers();
	FreeRTOS_printf( ( "Command Control Client Disconnected\r\n" ) );

	vTaskDelete( NULL );

}

You wrote:

But then, suddenly,

  67.831.410 [Command_Server] PreFreeRTOS_send WriteSize: 1969
  67.831.913 [Command_Server] FreeRTOS_send: 7734 -> 192.168.101.27:62195: no space
  67.831.949 [Command_Server] PostFreeRTOS_send WrittenSize: -28

If 67.831.913 stands for SEC,MS,US, it is strange to see that FreeRTOS_send() returns within half a msec.
Could you check that? Is the return value of pdMS_TO_TICKS( 5000 ) as expected? It looks more like a 500 uS wait.

Or check: does uTaskDelay( pdMS_TO_TICKS( 5000 ) ) pause for 5 seconds?

Also, it would be interesting to see a PCAP of the conversation at the moment it goes wrong ( with WireShark, use “Export Specified Packets” ). If you zip a PCAP, you can attach it to your post. Unless it contains classified information :slight_smile:

One remark about your code, which has nothing to do with the send problem: FreeRTOS_shutdown() only needs to be called if you want to end a TCP conversion. After FreeRTOS_recv() has returned -pdFREERTOS_ERRNO_ENOTCONN, you can call FreeRTOS_closesocket() immediately.

Looks like -28 corresponds to pdFREERTOS_ERRNO_ENOSPC

Yes sure. All errno’s are defined in FreeRTOS_errno_TCP.h.

FreeRTOS_netstat: 8 sockets 476 < 480 < 512 buffers free

I think that your application can do with less network buffers, see ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS. Maybe 64 would be enough? or 128, because you have loads of heap.

It would also be interesting to see your FreeRTOSConfig.h and FreeRTOSIPConfig.h files.
You might want to play with the FREERTOS_SO_WIN_PROPERTIES socket option to change the buffer sizes.

Hello Hein,

Yes, the SEC,MS,US was correct - I had reduced the timeouts down to 500uS to try and get things to break faster during debug. It’s set back to 5mS now and makes no difference.

I started looking at things with wireshark, and foudn that there were a lot of TCP Window updates occuring right around when FreeRTOS_send went off in the weeds. At some points, the scaling factor was -1, which seemed strange.

In any case, I started adjusting window and buffer parameters, and I’ve seen some success. Things are much improved - perhaps even fixed (more testing needed) with these values

xWinProps.lTxBufSize = ( ipconfigTCP_MSS * 12u );
xWinProps.lTxWinSize = 8;

xWinProps.lRxBufSize = ( ipconfigTCP_MSS * 12u );
xWinProps.lRxWinSize = 12;
#define ipconfigNETWORK_MTU		552

They were previously WinSize of 2, lxxBufSize of 23360 and MTU 1500

Just for reference:

#define ipconfigTCP_MSS		( ipconfigNETWORK_MTU - ipSIZE_OF_IPv4_HEADER - ipSIZE_OF_TCP_HEADER )

I’m now occasionally seeing window full if I hammer it with a ton of requests, but the device recovers properly and doesn’t completely lock up sending.

For a Zynq with 4GB of ram, what would you think is a reasonable maximum for these values?

Thanks for your help thus far, it’s appreciated.

Edit:
Regarding FreeRTOS_shutdown() - this was present in the FreeRTOS Labs 160919 Demo, so I left it in there. I’ll remove it if it isn’t necessary, but as you said, It shouldn’t have anything to do with this.

Alright, that did not fix the problem. Just pushed it off a little. I can still cause problems with freertos_send using these new settings.

Changed MTU back to 1526 since it didn’t seem to significantly change anything - the problem shows up with both MTU sizes.

I’ve sprinkled some additional debug code around, and here’s what I’ve come up with. When the problem first begins, FreeRTOS_send(…) returns a value as if the data was sent, but nothing shows up on the wire. At least nothing I can see with Wireshark anyway.

So that actually looks like where the problem begins - not with the “No Space” error, but with FreeRTOS_send() correctly accepting data but then not transmitting it.

Attached is the packet capture. “gdb 17” is the command sent to the FreeRTOS device that requests data - a blob of binary data is then returned.
At packet number 81 in this capture, you can see “gdb 17” is sent, just like all of the prior packets in the capture. But after packet 81, the FreeRTOS device never responds again.

Packet 81 has timestamp of 05:00:15:025

Here is a UDPPrintF output showing timestamp and the return of FreeRTOS_Send()

(05:00:13.359)   69.205.630 [Command_Control] ProcessSend: 4001/4001
(05:00:13.380)   69.205.941 [Command_Control] PromptSend: 9/9
(05:00:13.885)   69.731.400 [Command_Control] ProcessSend: 4001/4001
(05:00:13.903)   69.732.007 [Command_Control] PromptSend: 9/9
(05:00:14.252)   70.098.169 [Command_Control] ProcessSend: 3305/3305
(05:00:14.272)   70.098.481 [Command_Control] PromptSend: 9/9
(05:00:14.445)   70.291.365 [EMAC      ] Network buffers: 223 lowest 222
(05:00:14.466)   70.293.639 [Command_Control] ProcessSend: 1781/1781
(05:00:14.485)   70.293.992 [Command_Control] PromptSend: 9/9
(05:00:14.711)   70.557.288 [Command_Control] ProcessSend: 2390/2390
(05:00:14.732)   70.557.630 [Command_Control] PromptSend: 9/9
(05:00:15.032)   70.878.421 [Command_Control] ProcessSend: 2927/2927
(05:00:15.058)   70.878.767 [Command_Control] PromptSend: 9/9
(05:00:15.319)   71.165.228 [Command_Control] ProcessSend: 2624/2624
(05:00:15.342)   71.166.039 [Command_Control] PromptSend: 9/9
(05:00:15.665)   71.510.312 [Command_Control] ProcessSend: 3155/3155
(05:00:15.694)   71.510.638 [Command_Control] PromptSend: 9/9
(05:00:16.258)   72.022.042 [Command_Control] ProcessSend: 4001/4001
(05:00:16.289)   72.022.362 [Command_Control] PromptSend: 9/9

and here is the freertos_send command that generates that output

SendResponse = FreeRTOS_send( xConnectedSocket, write_buf, write_size, 0);
					FreeRTOS_printf( ( "ProcessSend: %d/%d\r\n", SendResponse,write_size ) );
					if (SendResponse == -pdFREERTOS_ERRNO_ENOTCONN || SendResponse == -pdFREERTOS_ERRNO_ENOSPC) { break; }

There’s a similar block (PromptSend:) that creates the command prompt. I currently have it breaking on those two errors so that I can capture the event, but I’m not sure that is very interesting any longer.

So the question becomes, what would cause freertos_send to accept data for transmit, but then never put it on the wire, and not really issue many complaints?

Thanks again!

Two additional captures just to provide more info.

Also, something interesting. Very re-producable.

If I continually send rapid requests (gdb 17) without stopping, the FreeRTOS machine keeps up perfectly and never misses a beat. It seems as if it would happily do that forever.

If I pause, and then resume rapid fire requests, it responds just fine again and provides thousands of responses without any sign of failure.

If, however, I pause, and then slowly resume requests at a lighter pace, that’s when everything fails. Capture3.zip shows this pretty well.

Any thoughts?
Thanks!

Additional Info:
Wondering if this may be related: FreeRTOS TCP/IP - TCP send priority

Hein,

More interesting details. Before each FreeRTOS_send(), I added a
FreeRTOS_maywrite() and stored the returned value.

As receiving and sending goes well, the FreeRTOS_maywrite() response stays high - mostly a little bigger than the xWinProps.lTxBufSize (not sure why it’s BIGGER than that, but that’s another story).

Then, when things go sour, this value starts dropping as each FreeRTOS_send() is completed, and it drops, and drops, and never recovers.

So it looks like data is being added to the transmit buffer, but really is never being physically sent out.

Any suggestions as to where I should be looking or what might be causing +TCP to not send this data out? Even if I completely stop sending requests for a long period of time (many minutes), the data never gets sent and maywrite() doesn’t recover.

Many Thanks
Ben

Hein,

I appreciate all the time you took to help out with this. It turned out that I was looking in all the wrong places.
Though tuning the TCP parameters of the FreeRTOS/Zynq board helped, the problem was really with the receiving end. The user application that was fetching this data was not properly clearing the TCP RX buffer, and so the receiving machine reduced the TCP RX Window as the buffer filled. The FreeRTOS/Zynq board was behaving exactly as it should - not transmitting once the TCP Window was too low to allow it.

After a little work on the receiving application, I’ve now got data flowing at full rate with no pauses or failures.

Thank you again for your help.

P.S. The program for TCP/UDP Debugging that you suggested offline is quite nice. Thank you for sharing. If anyone else needs a simple tool to send and receive TCP or UDP data, here’s the link. Hercules

Hello @Pilot, thanks for reporting back. if you have more doubts or questions, please ask.
Hein

1 Like