Bug in Scheduler

eisenkolb wrote on Friday, August 22, 2008:

Hello all!

My Configuration:
Luminary Micro LM3S6965 with lwIP Stack
4 running Tasks.

I have a problem when sending and receiving data over Ethernet via lwIP.
First I receive data from a server and pass it to the next task.
In this task, I copy the data and send it back to the Ethernet task.
From the Ethernet task I then send it back to the server.

Here I have a problem with the timing of the tasks.
On the server side I measure the time between sending the data and receiving it back.
This time is not constant. It rises steadily from 1.2 ms to 3.5 ms.
When it reaches the 3.5 ms mark, it starts again at 1.2 ms.

Please help me solve the problem! Thank you!


davedoors wrote on Friday, August 22, 2008:

...and the bug in the scheduler is?

eisenkolb wrote on Friday, August 22, 2008:

The bug is that the time between receiving the Ethernet packet and sending it back is constantly rising, but in my opinion it should stay at the same value if the RTOS has nothing else to do.


davedoors wrote on Friday, August 22, 2008:

How about critical sections within the lwIP stack, other interrupts coming in, other network traffic being processed and discarded, and the prioritisation of your tasks? All of these will introduce some timing variation.  How have you ruled these out as sources of the timing variation?  Have you done any profiling to come to the conclusion that there is a bug?  lwIP is some 5 times larger than FreeRTOS and many times more complex; how can you be sure the bug is not there, if one exists?

I’m just trying to get to the bottom of whether you have found a bug or not. You don’t provide much information about how you came to this conclusion, or where you think the bug is, so it is difficult for anybody to provide any meaningful suggestions.

eisenkolb wrote on Friday, August 22, 2008:

There is no other interrupt. Only Timer and EMAC interrupt.
The lwIP stack is working correctly.
Before I introduced FreeRTOS, I had a single-threaded application with the same code as I have now; the only difference is that the code is now executed in a FreeRTOS task.

I have now also tried putting the whole application in one task, and the same thing happened.

I’m not sure it is a bug in FreeRTOS. Maybe I have an error in how the Ethernet task is executed.
But as I said, I had the same application single-threaded without the RTOS, and there was a constant time between receiving and sending the packets.


eisenkolb wrote on Friday, August 22, 2008:

Maybe I should send you a piece of code from the task so you can check it?

chaalar wrote on Friday, August 22, 2008:

Do you use UDP or TCP? IMHO, Ethernet is one of the worst choices for testing
a system’s performance if you are not saturating your resources. On the other hand,
UDP may be a better choice than TCP for timing because of its lower overhead and
higher determinism.

Maybe you should check your timing implementations across your single-threaded
and RTOS implementations of lwIP. However, I don’t think this is the problem.

I think your task priorities and interrupts are not optimized for this case. I mean it seems
as if the scheduler is cycling through your tasks without any external intervention. For instance,
your Ethernet ISR might not be causing a task switch. Am I sure of this? No, it is just a rough guess.


eisenkolb wrote on Friday, August 22, 2008:

I am using UDP.
I don’t want to do a performance test, but I think the time should be nearly constant.

And I don’t understand why the time is running up like a counter.
The Ethernet interrupts I have also do not interrupt a task. They only set a bit to inform a task that there was an interrupt, and the task continues without being interrupted.

The Ethernet task has the highest priority, so it shouldn’t be interrupted while processing a UDP packet.
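
A delay that ramps up and then wraps is the pattern one would expect if the task does not wake directly from the interrupt but only notices the flag at a fixed polling interval, while the server sends at a slightly different period. That is only one possible explanation and nothing in this thread confirms it. The sketch below is a plain Python model, not FreeRTOS code, and the send period, polling period and fixed cost are hypothetical values chosen to make the effect visible.

```python
def round_trip_times_us(send_period_us, poll_period_us, fixed_cost_us, count):
    """Model the reply delay when a task only notices a received packet
    at its next periodic poll (all values hypothetical, in microseconds)."""
    delays = []
    for k in range(count):
        arrival = k * send_period_us
        # Time remaining until the next poll instant at or after the arrival.
        wait_for_poll = (-arrival) % poll_period_us
        delays.append(fixed_cost_us + wait_for_poll)
    return delays


# Assumed numbers: one packet every 9.9 ms, flag polled every 1 ms,
# 1.2 ms of fixed network and processing time.
delays = round_trip_times_us(9900, 1000, 1200, 12)
print(delays)
```

With these assumed values the delay climbs in 0.1 ms steps and then drops back to its minimum. In such a model the height of the ramp equals the polling interval; in the measurement described above the ramp is 3.5 ms - 1.2 ms = 2.3 ms.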

davedoors wrote on Friday, August 22, 2008:

Can you try statically simulating the situation, so you have the same task arrangements and the same sequence of task execution, but without the TCP/IP executing and without the Ethernet cable plugged in (just use dummy data). Also ensure that the task you are measuring the lat time in is the highest priority.

TCP/IP stacks will often execute time related functionality so the TCP/IP stack can use CPU time even if the Ethernet is not generating interrupts. Best make sure this is the lowest priority when doing the static timing.

What happens with the timing in this case?

Do you have the MAC in promiscuous mode?

eisenkolb wrote on Friday, August 22, 2008:

Now I have changed my system. I now start only the Ethernet task and nothing else.
If there is incoming UDP data, I send it back to the server immediately.
The result is the same.

It is very strange. It looks like a timer overflow. Is that possible?

eisenkolb wrote on Friday, August 22, 2008:

Okay, that is possible to try, but I think I will do it on Monday.
Yes, it is working in promiscuous mode.

When I have done it, I will post the result. Thank you for now.

rtel wrote on Friday, August 22, 2008:

From your original post:

> I measure the time at server side between sending data until i receive it on server.

Am I correct in thinking then that you are taking the timing on a remote computer, not on the Luminary Micro device itself?  If this is the case then the time variation could be coming from any number of places [Ethernet is not deterministic in its nature in any case] and the first thing we need to do is narrow down the possibilities.

Do you have Wireshark set up?  If so you should be able to see the timing between a packet being on the network going into the LMI device, and the reply going back out.  What is the variation there?

If the time going into the NIC and coming out of the NIC varies, then can we take some timings on the LMI itself?  A good place to start would be in the Rx and Tx interrupts.  If you have a scope then you can get an accurate measurement by setting and clearing a pin as the packets go in and out.  If there is a variation there too then we need to go further inside.
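
Once capture timestamps are available, the variation can be summarised in a few lines. The sketch below assumes the request and reply timestamps have already been exported from the capture as two lists in microseconds, paired in order. The function name and the sample values are hypothetical and only illustrate the calculation; they are not measurements from this thread.

```python
from statistics import mean


def turnaround_summary(request_us, reply_us):
    """Return min, max, mean and spread of (reply - request) in microseconds."""
    if len(request_us) != len(reply_us) or not request_us:
        raise ValueError("need equally long, non-empty timestamp lists")
    deltas = [reply - request for request, reply in zip(request_us, reply_us)]
    return {
        "min": min(deltas),
        "max": max(deltas),
        "mean": mean(deltas),
        "spread": max(deltas) - min(deltas),
    }


# Made-up timestamps for illustration only.
requests = [0, 10_000, 20_000, 30_000]
replies = [1_200, 11_900, 22_700, 33_500]
summary = turnaround_summary(requests, replies)
print(summary)
```

If the spread seen on the wire next to the device is small, the variation is probably being introduced elsewhere, for example on the measuring computer; if it is large, the on-target measurements described above are the next step.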


david_farrell wrote on Friday, August 22, 2008:

Don’t forget ARP is running too.  I have a LM3S8962 running the LCD, reading the SD card with the shared SPI
port, using a quadrature encoder in 6 tasks and multiple queues/semaphores and running raw ethernet (no IP).
By using the capture (1588) feature I can see that the ethernet is deterministic (same capture count better than
1/1000 times) on a point to point link once I turned off ARP on the far end. Due to the SD interface (source of
the data) I am maxing out at 200 - 1024 byte packets per second.  Since you are using IP you probably have
ARP on the near end too.

rtel wrote on Friday, August 22, 2008:

Dave - thanks for your useful input.  Which SD card file system are you using?  I know Luminary provide FatFS - if it is this I would be grateful for a copy of a configuration that used this.

Regards.  [ r __dot barry (at) freertos.org ]

david_farrell wrote on Friday, August 22, 2008:

I started using the ChaN FatFS but I ran into performance problems. Now I am using
4 raw partitions: one FAT to prevent Windows from accidentally reformatting everything, and
three others: a small one with FPGA code, a small one with ARM update code and the
last big one with data.  I would like to add code to FatFS to pre-allocate a list of
clusters, then I don’t see how performance would be any worse than raw.  I also
want to clean up the timeout timer.  The semaphore sharing the SPI between LCD
and SD has worked out well.  I can’t promise anything over the next several weeks
but I will contribute. (probably ARM9XE stuff first).