CM3 speed improvement proposal

artlip wrote on Sunday, March 07, 2010:

Hello,

I want to propose improvements for a task switching and for a critical section code.

The base concept is to change portYIELD, portENTER_CRITICAL and portEXIT_CRITICAL into macros which allows compiler to put it as inlined code. To achieve a full potential of the (critical section) inlined code there is also required a small change in the interrupt masking code. A compiler should be allowed to “freely” choose register used for storing constant (mask value). BTW> this improvement part is the same like the improvement request (2709048) from Samuel Tardieu.
The downside of this solution is the increase of the OS code (in case of tests described below it is 376 bytes). The exact size of increase depends on how many FreeRTOS functions (with critical sections) is used and does not depend on the number of API calls used in the application.

I made two tests to check if this code fix will give a real speedup gain. Tests was performed using original FreeRTOS 6.0.3, GCC 4.4.3, compiler optimization -Os, STM32F103 @ 72MHz.

1. The first test was about speed of queue operations. There was configured 2 tasks with the same priority, queue with 1024 entries of 4 bytes long (system tick period at 1ms).
The first task puts into queue as many entries as possible. When queue is full task executes delay for one tick.
The second task receives from queue as many entries as possible. When queue is empty task executes delay for one tick.
There was measured number of successfull xQueueSend and xQueueReceive operations during test period (20s).

2. The second test was about speed of a context switching. There was configured 2 tasks with the same priority, two binary semaphores - one of them intially acquired (system tick period at 1ms).
The first task acquires semaphore number 1 then releases semaphore number 2. The second task acquires semaphore number 2 then releases semaphore number 1.
There was measured number of task loops during test period (20s).

Results:
Test 1 - 5.8% improvement
Test 2 - 15.9% improvement

The first (and obvious) cause of this speedup is elimination of function calls and the second cause comes from the fact that Flash prefetch buffer is not disturbed.

The patches for portmacro.h and port.c:

– portmacro.h.orig    2010-02-19 14:23:46.000000000 +0100
+++ portmacro.h 2010-03-07 20:33:22.057137332 +0100
@@ -93,53 +93,63 @@
#define portBYTE_ALIGNMENT         8
/*---------------------------------------*/

+#define portNVIC_INT_CTRL          ( ( volatile unsigned long *) 0xe000ed04 )
+#define portNVIC_PENDSVSET         0x10000000

/* Scheduler utilities. */
-extern void vPortYieldFromISR( void );
+#define vPortYieldFromISR()                                    \
+   do { /* Set a PendSV to request a context switch. */    \
+       *(portNVIC_INT_CTRL) = portNVIC_PENDSVSET;          \
+   } while (0)

#define portYIELD()                    vPortYieldFromISR()

-#define portEND_SWITCHING_ISR( xSwitchRequired ) if( xSwitchRequired ) vPortYieldFromISR()
/*---------------------------------------*/

/* Critical section management. */

+/*
+ * Set basepri to requested value. Lets compiler to choose clobbered register.
+ */
+static inline void portSetBasePri(uint32_t value)
+{
+   __asm volatile (“msr basepri, %0” : : “r” (value) );
+}
+
/*
- * Set basepri to portMAX_SYSCALL_INTERRUPT_PRIORITY without effecting other
- * registers.  r0 is clobbered.
+ * Set basepri to portMAX_SYSCALL_INTERRUPT_PRIORITY - disable RTOS interrupts.
  */
-#define portSET_INTERRUPT_MASK()                       \
-   __asm volatile                                      \
-   (                                                   \
-       "   mov r0, %0                              \n" \
-       "   msr basepri, r0                         \n" \
-       ::“i”(configMAX_SYSCALL_INTERRUPT_PRIORITY):“r0”    \
-   )

+#define portSET_INTERRUPT_MASK()       portSetBasePri(configMAX_SYSCALL_INTERRUPT_PRIORITY)
+
/*
- * Set basepri back to 0 without effective other registers.
- * r0 is clobbered.
+ * Set basepri back to 0 - enable RTOS interrupts.
  */
-#define portCLEAR_INTERRUPT_MASK()         \
-   __asm volatile                          \
-   (                                       \
-       "   mov r0, #0                  \n" \
-       "   msr basepri, r0             \n" \
-       :::“r0”                             \
-   )
+#define portCLEAR_INTERRUPT_MASK()     portSetBasePri(0)

#define portSET_INTERRUPT_MASK_FROM_ISR()      0;portSET_INTERRUPT_MASK()
-#define portCLEAR_INTERRUPT_MASK_FROM_ISR(x)   portCLEAR_INTERRUPT_MASK();(void)x
+#define portCLEAR_INTERRUPT_MASK_FROM_ISR(x)   portCLEAR_INTERRUPT_MASK()
+
+extern volatile unsigned portBASE_TYPE uxCriticalNesting;

+#define portDISABLE_INTERRUPTS()       portSET_INTERRUPT_MASK()
+#define portENABLE_INTERRUPTS()            portCLEAR_INTERRUPT_MASK()

-extern void vPortEnterCritical( void );
-extern void vPortExitCritical( void );
+#define portENTER_CRITICAL()           \
+   do {                                \
+       portDISABLE_INTERRUPTS();       \
+       uxCriticalNesting++;            \
+   } while (0)
+
+#define portEXIT_CRITICAL()                \
+   do {                                \
+       uxCriticalNesting-;            \
+       if( uxCriticalNesting == 0 )    \
+       {                               \
+           portENABLE_INTERRUPTS();    \
+       }                               \
+   } while (0)

-#define portDISABLE_INTERRUPTS()   portSET_INTERRUPT_MASK()
-#define portENABLE_INTERRUPTS()        portCLEAR_INTERRUPT_MASK()
-#define portENTER_CRITICAL()       vPortEnterCritical()
-#define portEXIT_CRITICAL()            vPortExitCritical()
/*---------------------------------------*/

/* Task function macros as described on the FreeRTOS.org WEB site. */

– port.c.orig 2010-02-19 14:23:46.000000000 +0100
+++ port.c  2010-03-07 20:23:23.197147006 +0100
@@ -69,12 +69,10 @@
/* Constants required to manipulate the NVIC. */
#define portNVIC_SYSTICK_CTRL      ( ( volatile unsigned long *) 0xe000e010 )
#define portNVIC_SYSTICK_LOAD      ( ( volatile unsigned long *) 0xe000e014 )
-#define portNVIC_INT_CTRL          ( ( volatile unsigned long *) 0xe000ed04 )
#define portNVIC_SYSPRI2           ( ( volatile unsigned long *) 0xe000ed20 )
#define portNVIC_SYSTICK_CLK       0x00000004
#define portNVIC_SYSTICK_INT       0x00000002
#define portNVIC_SYSTICK_ENABLE        0x00000001
-#define portNVIC_PENDSVSET         0x10000000
#define portNVIC_PENDSV_PRI            ( ( ( unsigned long ) configKERNEL_INTERRUPT_PRIORITY ) << 16 )
#define portNVIC_SYSTICK_PRI       ( ( ( unsigned long ) configKERNEL_INTERRUPT_PRIORITY ) << 24 )

@@ -87,7 +85,7 @@

/* Each task maintains its own interrupt status in the critical nesting
variable. */
-static unsigned portBASE_TYPE uxCriticalNesting = 0xaaaaaaaa;
+volatile unsigned portBASE_TYPE uxCriticalNesting = 0xaaaaaaaa;

/*
  * Setup the timer to generate the tick interrupts.
@@ -191,30 +189,6 @@
}
/*---------------------------------------*/

-void vPortYieldFromISR( void )
-{
-   /* Set a PendSV to request a context switch. */
-   *(portNVIC_INT_CTRL) = portNVIC_PENDSVSET;
-}
-/*---------------------------------------*/

-void vPortEnterCritical( void )
-{
-   portDISABLE_INTERRUPTS();
-   uxCriticalNesting++;
-}
-/*---------------------------------------*/

-void vPortExitCritical( void )
-{
-   uxCriticalNesting-;
-   if( uxCriticalNesting == 0 )
-   {
-       portENABLE_INTERRUPTS();
-   }
-}
-/*---------------------------------------*/

void xPortPendSVHandler( void )
{
    /* This is a naked function. */

Regards,

Artur Lipowski

edwards3 wrote on Sunday, March 07, 2010:

It would be good to include an option to either inline or not inline, maybe that could be part of FreeRTOSConfig.h? But I don’t think inline is actually part of the official C language until C99.

artlip wrote on Monday, March 08, 2010:

Are you asking for making configurable the “static inline void portSetBasePri”?
I don’t know the official policy for C standard used in the FreeRTOS. I’d rather assume that CM3 is so new CPU that any compiler for it should support inline without problems. Of course my assumption can be wrong and in such case there can be used a plain __asm instead (with fix from Samuel Tardieu) and no configuration is necessary in such case.

Of course offering more choices is a good idea, but I MHO making the whole fix configurable is a little better idea. This will allow to make choice between code size and speed.

Regards,

rtel wrote on Monday, March 08, 2010:

These code size Vs execution speed trade offs are always a matter of opinion and circumstance.  I agree making it configurable would be the best thing, although compilers will often inline short functions in any case once the optimiser is switched on.

With regards to the use of the inline keyword - I don’t have a problem with it as long as it is only in the portable layer, so will only get compiled with a single compiler that can be tested.  Older versions of FreeRTOS had an inline used in tasks.h that caused no end of problems on the low end compilers.

Regards.

artlip wrote on Tuesday, March 09, 2010:

There is created the new feature request https://sourceforge.net/tracker/?func=detail&aid=2966871&group_id=111543&atid=659636 2966871 with patches discussed in this thread.
Patches are modified to make inlining optional through configCM3_PORT_USE_INLINE macro.

Regards,