FreeRTOS stats

rtel wrote on Thursday, February 26, 2009:

Regarding the post just made titled "FreeRTOS performance": please note that this post is in breach of the license terms and has been removed until I am able to review it (I have not even read it yet). I may then re-post the original.

Regards.

aturowski wrote on Thursday, February 26, 2009:

OK. Could you please tell me how I have breached the licence?

Best regards,
Adam

rtel wrote on Thursday, February 26, 2009:

Check out the very last paragraph here:

I'm not against this outright; I just need to ensure accuracy and context. In the past, people have deliberately misrepresented results for their own commercial reasons (even listing features incorrectly, doh). After that I learned that RTOS vendors use these clauses to protect themselves against exactly that. FreeRTOS is portable across many architectures and is therefore written in C. It could be tailored to each individual architecture and highly tuned, but that would be neither practical nor desirable for FreeRTOS. Also, the primary design goal is to keep the code size to a minimum, which again means the code is not always optimised for speed.

Regards.

aturowski wrote on Thursday, February 26, 2009:

I see. Thank you for the explanation. I understand your concern, and I am sorry for giving you the unnecessary extra work of cleaning up the forum.

Can you tell me where I can find accurate data on FreeRTOS performance, especially running on ARM7?

And of course feel free to use my measurement data if you decide that they are correct.

Best regards,
Adam

rtel wrote on Thursday, February 26, 2009:

Looking at a couple of figures:

Your figures (binary semaphore):
xSemaphoreTake          25.12us (301 instructions)
xSemaphoreGive          4.8us   (58 instructions)

What is curious is the huge difference between the two results, when the functions are basically mirror images of each other.

I have just tried xSemaphoreTake() on an LPC2000 running at 60MHz BUT WITH ZERO OPTIMISATION and got a result of 4.33us. This is for the case where the semaphore is successfully obtained (which runs more code than when the semaphore is not available) and where no block time is specified. This is the only case that can really be measured, because if a block time is specified and the task blocks, it is impossible to say what is actually being measured.

Regards.

aturowski wrote on Thursday, February 26, 2009:

Yes, it is curious. Please find the relevant piece of code below.

// Test binary semaphore: the TEST1 pin is high while each call executes
AT91C_BASE_PIOA->PIO_SODR = TEST1;     // set TEST1 high
xSemaphoreTake(xTestBinSem, 1);        // block time of 1 tick
AT91C_BASE_PIOA->PIO_CODR = TEST1;     // set TEST1 low

for (i = 0; i < 10; i++)               // short gap between the two pulses
{
    a = a + i;
    asm volatile("NOP \n\t");
}

AT91C_BASE_PIOA->PIO_SODR = TEST1;     // set TEST1 high
xSemaphoreGive(xTestBinSem);
AT91C_BASE_PIOA->PIO_CODR = TEST1;     // set TEST1 low

for (i = 0; i < 100; i++)              // some delay to ease oscilloscope measurement
{
    a = a + i;
    asm volatile("NOP \n\t");
}

There is no other task that can take the semaphore, so it looks OK. This task is an infinite loop that runs the tests sequentially. At the end of the loop there is a vTaskDelay(2); call to visually separate the results on the scope.
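For anyone who wants to reproduce this kind of before/after comparison without a scope, here is a host-side analogue of the GPIO harness, assuming a POSIX system with clock_gettime(CLOCK_MONOTONIC). The two clock reads play the role of the PIO_SODR/PIO_CODR writes, and averaging over many calls smooths out timer resolution. function_under_test(), elapsed_ns() and average_call_ns() are hypothetical names; on the AT91 target the call under test would be xSemaphoreTake()/xSemaphoreGive(), and host numbers are not comparable to the oscilloscope readings.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical stand-in for the call being timed (e.g. xSemaphoreTake). */
static volatile unsigned long sink;

static void function_under_test(void)
{
    for (int i = 0; i < 10; i++) {
        sink += (unsigned long)i;
    }
}

/* Elapsed time between two timestamps, in nanoseconds. */
static double elapsed_ns(const struct timespec *start, const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec) * 1e9
         + (double)(end->tv_nsec - start->tv_nsec);
}

/* Mean cost of one call to fn over `iterations` calls, in nanoseconds. */
static double average_call_ns(void (*fn)(void), int iterations)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);   /* plays the role of PIO_SODR */
    for (int i = 0; i < iterations; i++) {
        fn();
    }
    clock_gettime(CLOCK_MONOTONIC, &end);     /* plays the role of PIO_CODR */

    return elapsed_ns(&start, &end) / (double)iterations;
}
```

For example, average_call_ns(function_under_test, 100000) returns the mean cost of one call in nanoseconds. As with the GPIO method, the loop overhead is included in every sample.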

Could you check the xSemaphoreGive() function on your micro and send me the result?
Best regards,
Adam

aturowski wrote on Friday, February 27, 2009:

When I changed the binary semaphore call to
xSemaphoreTake(xTestBinSem,0);

I got a result similar to yours, which is good and expected. Now the xSemaphoreTake and xSemaphoreGive execution times are almost the same, which is again good and expected.

Still, I don't understand why, when I change the wait time to 1, the xSemaphoreTake execution time increases sharply while the xSemaphoreGive execution time is unaltered. I've double-checked, and there is no other task that can take, or is blocked on, this test semaphore. Any clue?

Regards,
Adam

aturowski wrote on Friday, February 27, 2009:

I've also double-checked that in both cases the semaphore take is successful.

Can you please check the xSemaphoreTake execution time on your platform when the wait time is more than 0 but no other task can acquire the semaphore? That way we can be sure that the semaphore will always be available.

aturowski wrote on Friday, February 27, 2009:

I've just found out another thing: the main loop of xQueueGenericReceive has two parts. The first part is not executed when the wait time is 0. When the wait time is not 0, this part is executed and it takes a lot of time.

By the first part I mean:
if( xTicksToWait > ( portTickType ) 0 )
{
    vTaskSuspendAll();
    prvLockQueue( pxQueue );

    /* ... code omitted in the original post ... */

    else
    {
        prvUnlockQueue( pxQueue );
        ( void ) xTaskResumeAll();
    }
}
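To make the two-part structure concrete, here is a toy host-side model. It is not FreeRTOS code: toy_queue, prepare_to_block() and toy_receive() are hypothetical names. Phase 1 stands in for the scheduler-suspend/queue-lock sequence and is entered whenever a block time is given, even when an item is already waiting; phase 2 is the copy-out that runs in both cases. For simplicity the toy never actually blocks; with an empty queue it just returns false.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_LEN 8

/* Toy ring-buffer queue; not a FreeRTOS type. */
typedef struct {
    int items[TOY_QUEUE_LEN];
    size_t head;
    size_t count;
} toy_queue;

/* Counts how often the "prepare to block" phase runs. */
static unsigned prep_phase_runs;

/* Stands in for vTaskSuspendAll(), prvLockQueue(), the empty check,
   prvUnlockQueue() and xTaskResumeAll(); here it only records the cost. */
static void prepare_to_block(toy_queue *q, unsigned ticks_to_wait)
{
    (void)q;
    (void)ticks_to_wait;
    prep_phase_runs++;
}

static bool toy_receive(toy_queue *q, int *out, unsigned ticks_to_wait)
{
    /* Phase 1: entered whenever a block time is given, even if an item
       is already waiting in the queue. */
    if (ticks_to_wait > 0) {
        prepare_to_block(q, ticks_to_wait);
    }

    /* Phase 2: copy the item out (a critical section in the real code). */
    if (q->count == 0) {
        return false;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1) % TOY_QUEUE_LEN;
    q->count--;
    return true;
}
```

With an item already queued, toy_receive(&q, &v, 1) still increments prep_phase_runs, while toy_receive(&q, &v, 0) skips phase 1 entirely, mirroring the difference between the two measurements.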

My results are:
wait time = 0: 368ns for the first part, 5.4us for the rest of xQueueGenericReceive
wait time > 0: 19.2us for the first part, 5.4us for the rest of xQueueGenericReceive
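Adding the two phases gives the total per call, which can be cross-checked against the earlier oscilloscope figure. This is a back-of-the-envelope check using only the numbers quoted in this thread:

$$
\begin{aligned}
t_{\text{wait}=0} &\approx 0.368\,\mu\text{s} + 5.4\,\mu\text{s} \approx 5.8\,\mu\text{s} \\
t_{\text{wait}>0} &\approx 19.2\,\mu\text{s} + 5.4\,\mu\text{s} = 24.6\,\mu\text{s} \\
t_{\text{wait}>0} - t_{\text{wait}=0} &\approx 19.2\,\mu\text{s} - 0.368\,\mu\text{s} \approx 18.8\,\mu\text{s}
\end{aligned}
$$

The wait > 0 total of about 24.6us is close to the 25.12us originally measured for xSemaphoreTake(xTestBinSem,1), so the extra time appears to come almost entirely from the first part.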

This explains why changing the wait time from 0 to a value greater than zero sharply increases the semaphore take or queue receive execution time, even if the semaphore/queue is always available and the calling task never blocks.

Richard, can you comment on that?

rtel wrote on Friday, February 27, 2009:

These functions are actually the most complex part of the whole system, as they have to account for every eventuality and possible sequence of events. This is where some RTOSes will silently fail to meet their requirements: when multiple tasks of different priorities are waiting to send or receive, and interrupts can also access the queues, the scenarios get quite complex (for example, what happens if an interrupt removes an item from a queue after a task has already been unblocked to read that same item?). The FreeRTOS code attempts to work correctly in all cases (the SafeRTOS testing helps there) while minimising how often interrupts are disabled (again, some RTOSes simply disable interrupts for the whole function, in which case the complexity diminishes). The function could be made faster by making it longer (more code) and/or by making greater use of critical sections. For example, you could ignore the possibility of blocking until after recognising that blocking was required, but that would mean performing the whole operation in a critical section.
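A minimal host-side sketch of the reordering described above, assuming a single queue and hypothetical enter_critical()/exit_critical() hooks. On a real port these would disable and re-enable interrupts; here they only count nesting so the code runs on a host. check_first_receive(), toy_queue and recv_result are illustrative names, not FreeRTOS APIs, and the sketch leaves out the multi-task and interrupt interleavings that make the real function complex.

```c
#include <stddef.h>

#define TOY_QUEUE_LEN 8

typedef struct {
    int items[TOY_QUEUE_LEN];
    size_t head;
    size_t count;
} toy_queue;

typedef enum { RECV_OK, RECV_WOULD_BLOCK, RECV_EMPTY } recv_result;

/* Hypothetical critical-section hooks: only track nesting on the host. */
static int critical_nesting;
static unsigned blocking_path_runs;

static void enter_critical(void) { critical_nesting++; }
static void exit_critical(void)  { critical_nesting--; }

static recv_result check_first_receive(toy_queue *q, int *out, unsigned ticks_to_wait)
{
    recv_result result;

    enter_critical();
    if (q->count > 0) {
        /* Fast path: an item is available, so copy it out without
           touching any of the blocking machinery. */
        *out = q->items[q->head];
        q->head = (q->head + 1) % TOY_QUEUE_LEN;
        q->count--;
        result = RECV_OK;
    } else if (ticks_to_wait > 0) {
        /* Only now prepare to block. A real kernel would place the task
           on a wait list here, still inside the critical section. */
        blocking_path_runs++;
        result = RECV_WOULD_BLOCK;
    } else {
        result = RECV_EMPTY;
    }
    exit_critical();

    return result;
}
```

The fast path no longer pays for any blocking preparation, but the decision to block (and, in a real kernel, the insertion into the wait list) now happens entirely inside the critical section, which lengthens the time interrupts are disabled. That is the trade-off described above.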

Regards.

aturowski wrote on Friday, February 27, 2009:

Thank you for your explanation.

I thought that using the alternative API would degrade interrupt response time (larger critical sections) but reduce the execution time of the system functions.

From my measurement results, it seems that this is not the case. Any clue?