FreeRTOS + lwIP: TCP cannot receive large packets at high rates

I cannot block, as I have other tasks which need to run depending on other interrupts (data from DMA). So I would rather not block on the recv but instead have a semaphore that puts the task in the Blocked state and runs it when it gets the semaphore from a callback, if possible. I also have a second task checking whether I need to send data over TCP on other sockets (depending on the DMA interrupt)

It’s on you to partition the work appropriately.
It seems to me that you mixed somewhat unrelated things into 1 task.
That might cause the kind of problems you’re struggling with.
If you have to deal with simultaneous but different events (DMA data vs. TCP data) you might be better off handling them in separate tasks.
For simultaneous but similar events (e.g. HW interrupts) it’s easier to handle them in 1 task, e.g. by using task notifications, which support a number of flags to block on (see the sketch below).
However, if the actions to be taken in this task are related, i.e. sequential (recv TCP data, send it to HW, wait for DMA completion, repeat…), you can handle one step after another (without the need to poll/check several events in parallel).
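A minimal sketch of the notification pattern, assuming 2 similar interrupt sources (task and bit names are made up):

	#define EVT_IRQ_A  (1UL << 0)
	#define EVT_IRQ_B  (1UL << 1)

	static void hw_event_task(void *arg)
	{
		uint32_t ulEvents;
		for (;;)
		{
			/* Sleep until an ISR sets at least one event bit; the handled bits are cleared on exit. */
			xTaskNotifyWait(0, EVT_IRQ_A | EVT_IRQ_B, &ulEvents, portMAX_DELAY);
			if (ulEvents & EVT_IRQ_A) { /* handle source A */ }
			if (ulEvents & EVT_IRQ_B) { /* handle source B */ }
		}
	}

	/* In each ISR: xTaskNotifyFromISR(xHwEventTask, EVT_IRQ_A, eSetBits, &xWoken); */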

I have 4 tasks and 2 interrupts to do things individually:

  • Task 1: calls select with a 5 ms timeout in a loop with vTaskDelay(20 ms). When select returns that there is activity on a socket, I read the incoming data with lwip_recvfrom and move it to a specific buffer. When the data has been copied, I give a semaphore to Task 2.
  • Task 2: takes the semaphore from Task 1 (with portMAX_DELAY). When it gets it, it sends the data from the buffer filled by Task 1 over DMA. It also blocks on a semaphore waiting for the DMA Tx interrupt, to be sure that the data has been sent.
  • Task 3: waits with xSemaphoreTake on the DMA Rx interrupt, simply copies the incoming data to a buffer and gives a semaphore to Task 4.
  • Task 4: waits on the semaphore from Task 3 to send the data received over DMA out over TCP

With this partitioning I still don’t understand the need for polling in task 1.
Its only job is to recv TCP data and trigger the TxDMA task to transmit the received frame.
It’s not that important, but since the jobs of task 1 and task 2 seem to be sequential, why not combine them into 1 task? The same would apply to tasks 3 and 4.
From what you describe, both task pairs are tightly coupled and work in a sequential way…

Actually that makes a lot of sense. I will do that. Combine task 1 with task 2 and task 3 with task 4.

However, the polling in task 1 is because if I blocked it with select (or recv) it would not allow task “3+4” to run… right? As select blocks there…

No, no … blocking means sleeping :wink:
The opposite is true: the task falls asleep and gives up the CPU until it gets woken up by the OS, e.g. when a semaphore the task blocks/sleeps on is signaled by an ISR or another task. The same applies to a blocking TCP recv (which internally uses a semaphore or something similar to wait for network data to collect and return to the caller). While a task sleeps, the other tasks get the full CPU and can run if there is something to do, or they are also waiting/sleeping for an event to process.
That’s exactly what you want :+1:
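For illustration, a minimal sketch of that pattern (the semaphore, task and ISR names are made up):

	static SemaphoreHandle_t xDmaRxSem;	/* created with xSemaphoreCreateBinary() */

	static void dma_rx_task(void *arg)
	{
		for (;;)
		{
			/* The task sleeps here, consuming no CPU, until the ISR gives the semaphore. */
			if (xSemaphoreTake(xDmaRxSem, portMAX_DELAY) == pdTRUE)
			{
				/* process the received DMA data */
			}
		}
	}

	void dma_rx_isr(void)
	{
		BaseType_t xWoken = pdFALSE;
		xSemaphoreGiveFromISR(xDmaRxSem, &xWoken);
		portYIELD_FROM_ISR(xWoken);	/* switch immediately if the woken task has higher priority */
	}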

Ok, great. One thing that I didn’t mention is that I use select and then recv because I have multiple incoming sockets. But blocking on select should then be the same

Right :+1: It’s the same with select

I went to change select to block rather than poll, but that didn’t work, so I am missing something. Now the task looks like this:

	while(1)
	{
		timeout.tv_sec = portMAX_DELAY;
		timeout.tv_usec = 0;
		FD_ZERO(&ready_sockets);
		ready_sockets = current_sockets; 	// This is needed because select is destructive and will return active fds in ready_sockets

		ret = select(maxfd+1, &ready_sockets, NULL, NULL, &timeout);
		if(ret==0)
		{
			//timeout
			//xil_printf("Select timmed out: %d\n", sret);
		}
		else
		{
            // Copy received data to the corresponding buffer
		}
	} 

@hs2 Is that the correct way to change select to block rather than poll? I am following this answer.
Do I need to change something on how I create the socket? sock = lwip_socket(AF_INET, SOCK_STREAM, 0) ?
If I do it like that, I get a full buffer much faster than before

To make select block forever, specify a NULL pointer as the timeout.
I’m not sure how lwIP interprets portMAX_DELAY, which is intended for FreeRTOS functions only. lwIP has no idea about it.
Otherwise the code fragment seems ok and there isn’t much to do wrong.
I’d add error handling though, e.g. for the case that the peer closes the connection.
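E.g. a minimal sketch of what I mean, reusing your current_sockets set (fd and buf stand for the signaled socket and your buffer):

	int n = lwip_recv(fd, buf, sizeof(buf), 0);
	if (n == 0)
	{
		/* Peer closed the connection gracefully: drop the socket. */
		lwip_close(fd);
		FD_CLR(fd, &current_sockets);
	}
	else if (n < 0)
	{
		/* Error, e.g. connection reset: treat it the same way. */
		lwip_close(fd);
		FD_CLR(fd, &current_sockets);
	}
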
What exactly is not working?

I removed all tasks to just test the TCP incoming, thinking the problem could be somewhere else. So, this is what I do with select and how I read the incoming buffer:

void socket_recv(fd_set *fd_set, FPGAROS_subscriber *subscriber)
{
	int read_bytes=0, total_bytes=0;
	unsigned char bufsize[4]={0};

	union Usize {
		unsigned char array[4];
		uint32_t size;
	};
	union Usize convert;

	if(FD_ISSET(subscriber->publisher_socket_to_receive_fd, fd_set))
	{
		read_bytes = lwip_recvfrom(subscriber->publisher_socket_to_receive_fd, bufsize, 4, 0, NULL, NULL);	//Read first 4 bytes to get the total length of the message
		if (read_bytes > 0) {
			//xil_printf("TCP received %d bytes\n", read_bytes);
			total_bytes = total_bytes + read_bytes;
			convert.array[0] = bufsize[0];
			convert.array[1] = bufsize[1];
			convert.array[2] = bufsize[2];
			convert.array[3] = bufsize[3];

			subscriber->data_subscribe_size = convert.size+1+4;	// 1 extra byte for the ID and 4 for the total length field
			subscriber->data_to_subscribe = malloc((subscriber->data_subscribe_size)*sizeof(unsigned char));

			subscriber->data_to_subscribe[0] = subscriber->ID;	// ID depends on the HW blocks, so it is directly copied (hardcoded)
			subscriber->data_to_subscribe[1] = bufsize[0];
			subscriber->data_to_subscribe[2] = bufsize[1];
			subscriber->data_to_subscribe[3] = bufsize[2];
			subscriber->data_to_subscribe[4] = bufsize[3];
			while(total_bytes < convert.size+4)
			{
				read_bytes = lwip_recvfrom(subscriber->publisher_socket_to_receive_fd, subscriber->data_to_subscribe+total_bytes+1, convert.size-total_bytes+4, 0, NULL, NULL);	// Read the rest of the bytes
				total_bytes = total_bytes + read_bytes;
				//vTaskDelay(xDelay);	//TODO: Is this delay needed?
			}
			xil_printf("TCP received %d bytes\n", total_bytes);
		}
	}
}

static void TCP_recv(void *p)
{
	uint32_t ret, maxfd=0;
	fd_set current_sockets, ready_sockets;

	// Pointers to pass to functions

	FPGAROS_subscriber *ptr_subscriber_FPGA_ROS_image=&subscriber_FPGA_ROS_image;

	/***** 1: Populate information for nodes and topic names *****/
	subscriber_FPGA_ROS_image_populate_info_nodes_and_topics_names();

	subscriber_FPGA_ROS_image.SubscriberPublisherConnected=NOTCONNECTED;

	/***** 2: Register Subscribers *****/
	// Subscriber image
	subscriber_FPGA_ROS_sub_to_master(ptr_subscriber_FPGA_ROS_image);
	if(subscriber_FPGA_ROS_image.publisher_socket_to_receive_fd>maxfd)
		maxfd = subscriber_FPGA_ROS_image.publisher_socket_to_receive_fd;

	SubscribersRegistered = TRUE;

	/***** 3: Add file descriptors to fd_set *****/
	FD_ZERO(&current_sockets);
	FD_SET(subscriber_FPGA_ROS_image.publisher_socket_to_receive_fd, &current_sockets);

	while(1)
	{
		/***** 4: Subscribers wait for Publishers *****/
		FD_ZERO(&ready_sockets);
		ready_sockets = current_sockets; 	// This is needed because select is destructive and will return active fds in ready_sockets

		ret = select(maxfd+1, &ready_sockets, NULL, NULL, NULL);
		if(ret==0)
		{
			//timeout
			//xil_printf("Select timmed out: %d\n", sret);
		}
		else
		{
			socket_recv(&ready_sockets, ptr_subscriber_FPGA_ROS_image);
		}
	}
}

In step 2 I create the socket, so in the while loop it is ready to receive data. I can read all the bytes that I am supposed to (640*480*3 + 56 = 921656) only 9 times. Then the TCP window is full. I also attach the wireshark output.

That’s the not-so-nice part of the select/recv mechanism.
You have to handle all the received data available in a socket signaled by select.
Otherwise you can run out of sync and the socket buffer fills up without select signaling it anymore.
select only tells you that there is something new to read.
It would be far simpler to just do a blocking recv of exactly the expected image data from the TCP socket, beam it to the FPGA, wait for DMA completion and repeat…
That lets TCP do the flow control for you.
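Roughly like this minimal sketch (IMAGE_SIZE, sock, image_buf, start_fpga_txdma and xDmaDoneSem are placeholders for your actual size, socket, buffer, DMA kick-off and completion semaphore):

	for (;;)
	{
		size_t got = 0;
		/* Block until one complete image arrived; while we are busy with the
		 * DMA below, TCP flow control simply throttles the sender. */
		while (got < IMAGE_SIZE)
		{
			int n = recv(sock, image_buf + got, IMAGE_SIZE - got, 0);
			if (n <= 0)
				break;				/* peer closed or error */
			got += n;
		}
		if (got < IMAGE_SIZE)
			break;					/* bail out on a broken connection */
		start_fpga_txdma(image_buf, got);		/* beam it to the FPGA */
		xSemaphoreTake(xDmaDoneSem, portMAX_DELAY);	/* wait for DMA completion */
	}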

BTW: I wouldn’t mix the BSD and the lwip_-prefixed API. That might be confusing.
I think there is no need to use lwip_recvfrom (why the *from* variant?) instead of BSD recv.

The issue is that I have multiple sockets to read from, so I cannot just use recv. The other way would be to have one task per socket and then use recv to block on each one, right?

#define recv(a,b,c,d) lwip_recv(a,b,c,d)

int
lwip_recv(int s, void *mem, size_t len, int flags)
{
  return lwip_recvfrom(s, mem, len, flags, NULL, NULL);
}

So in the end it seems to be the same

That would be a good possibility. I guess each socket is dedicated to a certain (FPGA) service and you don’t feed images from multiple clients at the same time…
So you’d have a dedicated socket/task for each service. I think doing so would simplify the design a lot, and you’d have full control over the incoming data and the associated FPGA HW (DMA channel?).

Multiple sockets because data can come from several places, but then there is only one DMA channel, so that is why I wanted to do everything in one single task with select/recv

Yes, I know. In the end it’s the same code, but it’s less confusing (for others) to use a consistent API.
It’s a very good rule to write software for others.
One day much later you or another poor guy will have to make some changes to this application.
His/her 1st question will be: why on earth is lwip_recvfrom used here but the standard BSD socket, listen, etc. there??? :wink:
That’s at least what I’ve learned in the past…

That is a good point.

I removed select and am left with this:

	while(1){
		int read_bytes=0, total_bytes=0;
		unsigned char bufsize[4]={0};

		union Usize {
			unsigned char array[4];
			uint32_t size;
		};
		union Usize convert;
		read_bytes = lwip_recvfrom(subscriber_FPGA_ROS_image.publisher_socket_to_receive_fd, bufsize, 4, 0, NULL, NULL);	//Read first 4 bytes to get the total length of the message
		if (read_bytes > 0) {
			//xil_printf("TCP received %d bytes\n", read_bytes);
			total_bytes = total_bytes + read_bytes;
			convert.array[0] = bufsize[0];
			convert.array[1] = bufsize[1];
			convert.array[2] = bufsize[2];
			convert.array[3] = bufsize[3];

			subscriber_FPGA_ROS_image.data_subscribe_size = convert.size+1+4;	// 1 extra byte for the ID and 4 for the total length field
			subscriber_FPGA_ROS_image.data_to_subscribe = malloc((subscriber_FPGA_ROS_image.data_subscribe_size)*sizeof(unsigned char));

			subscriber_FPGA_ROS_image.data_to_subscribe[0] = subscriber_FPGA_ROS_image.ID;	// ID depends on the HW blocks, so it is directly copied (hardcoded)
			subscriber_FPGA_ROS_image.data_to_subscribe[1] = bufsize[0];
			subscriber_FPGA_ROS_image.data_to_subscribe[2] = bufsize[1];
			subscriber_FPGA_ROS_image.data_to_subscribe[3] = bufsize[2];
			subscriber_FPGA_ROS_image.data_to_subscribe[4] = bufsize[3];
			while(total_bytes < convert.size+4)
			{
				read_bytes = lwip_recvfrom(subscriber_FPGA_ROS_image.publisher_socket_to_receive_fd, subscriber_FPGA_ROS_image.data_to_subscribe+total_bytes+1, convert.size-total_bytes+4, 0, NULL, NULL);	// Read the rest of the bytes
				total_bytes = total_bytes + read_bytes;
				//vTaskDelay(xDelay);	//TODO: Is this delay needed?
			}
			xil_printf("TCP received %d bytes\n", total_bytes);
		}
	}

I still get a full buffer… Could there be something that takes too long in my logic? I know that the first 4 bytes give me the total number of bytes in the message, so I don’t need to hardcode anything… could that take too long?

Well, then you’re either back to using select and handling it properly, which means flushing the available socket data (by non-blocking recv from all signaled sockets until EWOULDBLOCK), dealing with partial image blocks, etc.
Or have multiple socket handler tasks using a simple blocking recv, each dealing with 1 image (buffer) at a time: take an FPGA TxDMA mutex, start the TxDMA, wait for completion, release the TxDMA mutex and continue. The mutex is needed to protect the single/shared FPGA TxDMA resource and serialize the HW access.
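A minimal sketch of that design (recv_one_image, start_fpga_txdma and IMAGE_SIZE are placeholders for your own code):

	static SemaphoreHandle_t xTxDmaMutex;	/* xSemaphoreCreateMutex(): protects the shared TxDMA channel */
	static SemaphoreHandle_t xDmaDoneSem;	/* given by the DMA Tx completion ISR */

	static void socket_handler_task(void *arg)
	{
		int sock = (int)(intptr_t)arg;
		unsigned char *image_buf = pvPortMalloc(IMAGE_SIZE);	/* one buffer per task */

		while (image_buf != NULL)
		{
			if (recv_one_image(sock, image_buf) <= 0)	/* blocking header + data recv */
				break;					/* peer closed or error */
			xSemaphoreTake(xTxDmaMutex, portMAX_DELAY);	/* claim the single DMA channel */
			start_fpga_txdma(image_buf);
			xSemaphoreTake(xDmaDoneSem, portMAX_DELAY);	/* wait for the completion ISR */
			xSemaphoreGive(xTxDmaMutex);
		}
		vPortFree(image_buf);
		lwip_close(sock);
		vTaskDelete(NULL);
	}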

I will try that… but still I am not able to read the total number of bytes. Just calling recv worked… but I only receive 1500 bytes at a time (one segment) and I need to read the number of bytes that the first 4 ones tell me… but that logic (posted 2 replies ago) does not work.

You need an inner loop receiving the number of bytes retrieved from the image header.
After complete reception of the image you go back up to the while(1) loop, which resets the byte counters etc. to start the reception of the next image (header and data).
And you really should add some error handling … later on.
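Something along these lines, as a minimal sketch (recv_exact is a made-up helper; sock and image_buf stand for your socket and preallocated buffer, and the header byte order is assumed to match your sender):

	/* Block until exactly len bytes arrived; return -1 on error or close. */
	static int recv_exact(int sock, unsigned char *buf, size_t len)
	{
		size_t got = 0;
		while (got < len)
		{
			int n = recv(sock, buf + got, len - got, 0);
			if (n <= 0)
				return -1;
			got += n;
		}
		return 0;
	}

	/* Outer loop: one complete image (header + data) per iteration. */
	for (;;)
	{
		unsigned char hdr[4];
		uint32_t size;
		if (recv_exact(sock, hdr, sizeof(hdr)) < 0)
			break;				/* connection closed or error */
		memcpy(&size, hdr, sizeof(size));	/* needs <string.h> */
		if (recv_exact(sock, image_buf, size) < 0)
			break;
		xil_printf("TCP received %d bytes\n", (int)(size + 4));
	}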