About macro: listGET_OWNER_OF_NEXT_ENTRY

ny88 wrote on Wednesday, July 18, 2007:

Position:

In function void vTaskSwitchContext( void ), FreeRTOS 4.3.1, at line 1864:
listGET_OWNER_OF_NEXT_ENTRY( pxCurrentTCB, &( pxReadyTasksLists[ uxTopReadyPriority ] ) ).

Problem:

Because listGET_OWNER_OF_NEXT_ENTRY is defined as a macro, the compiler can rarely optimize its parameters: it simply substitutes the argument text at every place the parameter is used. I have tested this as shown below.
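The effect can be reproduced on a desktop compiler with a small sketch. Everything here is a simplified, hypothetical stand-in (Node, List, GET_OWNER_OF_NEXT_ENTRY, select_list and the counters are not the kernel's real definitions); it only counts how often the argument expression of a function-like macro is evaluated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's list types. */
typedef struct Node {
    struct Node *next;
    void *owner;
} Node;

typedef struct List {
    Node *index;  /* current position in the ring */
    Node end;     /* end marker, never handed out as an owner */
} List;

/* Simplified macro: 'list' is pasted textually at every use. */
#define GET_OWNER_OF_NEXT_ENTRY(owner_out, list)        \
    do {                                                \
        (list)->index = (list)->index->next;            \
        if ((list)->index == &((list)->end)) {          \
            (list)->index = (list)->index->next;        \
        }                                               \
        (owner_out) = (list)->index->owner;             \
    } while (0)

List ready_list;
Node task_a, task_b;
int evaluations;  /* how many times the argument expression ran */

/* Stands in for &( pxReadyTasksLists[ uxTopReadyPriority ] ). */
List *select_list(void)
{
    evaluations++;
    return &ready_list;
}

/* Build the ring end -> task_a -> task_b -> end and reset the counter. */
void ring_init(void)
{
    ready_list.end.next = &task_a;
    ready_list.end.owner = NULL;
    task_a.next = &task_b;
    task_a.owner = &task_a;
    task_b.next = &ready_list.end;
    task_b.owner = &task_b;
    ready_list.index = &ready_list.end;
    evaluations = 0;
}

/* Argument expression passed straight into the macro. */
int count_direct(void)
{
    void *owner;
    ring_init();
    GET_OWNER_OF_NEXT_ENTRY(owner, select_list());
    assert(owner == &task_a);
    return evaluations;
}

/* Argument expression evaluated once into a local pointer first. */
int count_cached(void)
{
    void *owner;
    List *cached;
    ring_init();
    cached = select_list();
    GET_OWNER_OF_NEXT_ENTRY(owner, cached);
    assert(owner == &task_a);
    return evaluations;
}
```

In this sketch count_direct() evaluates the expression five times on the path where the end marker is not hit, and count_cached() evaluates it once. A compiler may merge such repeats when the expression has no side effects, but if the index variable is declared volatile (an assumption about the kernel source, not something shown in this thread) it has to re-read it at every use.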

A partial solution:

Change " listGET_OWNER_OF_NEXT_ENTRY( pxCurrentTCB, &( pxReadyTasksLists[ uxTopReadyPriority ] ) );"
to "
   xList* ppp=&( pxReadyTasksLists[ uxTopReadyPriority ] );
   listGET_OWNER_OF_NEXT_ENTRY( pxCurrentTCB, ppp );
"

Test:

Conditions: ATmega323, WinAVR 20070525, AVR Studio 4.13 in simulation mode, frequency = 4 MHz, makefile: OPT = s, configUSE_TRACE_FACILITY == 0.
Result:
   (1) listGET_OWNER_OF_NEXT_ENTRY( pxCurrentTCB, &( pxReadyTasksLists[ uxTopReadyPriority ] ) ):
   asm:
1420:         listGET_OWNER_OF_NEXT_ENTRY( pxCurrentTCB, &( pxReadyTasksLists[ uxTopReadyPriority ] ) );
+00000100:   91200079    LDS     R18,0x0079       Load direct from data space
+00000102:   91800079    LDS     R24,0x0079       Load direct from data space
+00000104:   2799        CLR     R25              Clear Register
+00000105:   01FC        MOVW    R30,R24          Copy register pair
+00000106:   E0A3        LDI     R26,0x03         Load immediate
+00000107:   0FEE        LSL     R30              Logical Shift Left
+00000108:   1FFF        ROL     R31              Rotate Left Through Carry
+00000109:   95AA        DEC     R26              Decrement
+0000010A:   F7E1        BRNE    PC-0x03          Branch if not equal
+0000010B:   0FE8        ADD     R30,R24          Add without carry
+0000010C:   1FF9        ADC     R31,R25          Add with carry
+0000010D:   58E1        SUBI    R30,0x81         Subtract immediate
+0000010E:   4FFF        SBCI    R31,0xFF         Subtract immediate with carry
+0000010F:   8001        LDD     R0,Z+1           Load indirect with displacement
+00000110:   81F2        LDD     R31,Z+2          Load indirect with displacement
+00000111:   2DE0        MOV     R30,R0           Copy register
+00000112:   8182        LDD     R24,Z+2          Load indirect with displacement
+00000113:   8193        LDD     R25,Z+3          Load indirect with displacement
+00000114:   2733        CLR     R19              Clear Register
+00000115:   01F9        MOVW    R30,R18          Copy register pair
+00000116:   E073        LDI     R23,0x03         Load immediate
+00000117:   0FEE        LSL     R30              Logical Shift Left
+00000118:   1FFF        ROL     R31              Rotate Left Through Carry
+00000119:   957A        DEC     R23              Decrement
+0000011A:   F7E1        BRNE    PC-0x03          Branch if not equal
+0000011B:   0FE2        ADD     R30,R18          Add without carry
+0000011C:   1FF3        ADC     R31,R19          Add with carry
+0000011D:   58E1        SUBI    R30,0x81         Subtract immediate
+0000011E:   4FFF        SBCI    R31,0xFF         Subtract immediate with carry
+0000011F:   8392        STD     Z+2,R25          Store indirect with displacement
+00000120:   8381        STD     Z+1,R24          Store indirect with displacement
+00000121:   91800079    LDS     R24,0x0079       Load direct from data space
+00000123:   91200079    LDS     R18,0x0079       Load direct from data space
+00000125:   9F24        MUL     R18,R20          Multiply unsigned
+00000126:   0190        MOVW    R18,R0           Copy register pair
+00000127:   2411        CLR     R1               Clear Register
+00000128:   572E        SUBI    R18,0x7E         Subtract immediate
+00000129:   4F3F        SBCI    R19,0xFF         Subtract immediate with carry
+0000012A:   2799        CLR     R25              Clear Register
+0000012B:   01FC        MOVW    R30,R24          Copy register pair
+0000012C:   E063        LDI     R22,0x03         Load immediate
+0000012D:   0FEE        LSL     R30              Logical Shift Left
+0000012E:   1FFF        ROL     R31              Rotate Left Through Carry
+0000012F:   956A        DEC     R22              Decrement
+00000130:   F7E1        BRNE    PC-0x03          Branch if not equal
+00000131:   0FE8        ADD     R30,R24          Add without carry
+00000132:   1FF9        ADC     R31,R25          Add with carry
+00000133:   58E1        SUBI    R30,0x81         Subtract immediate
+00000134:   4FFF        SBCI    R31,0xFF         Subtract immediate with carry
+00000135:   8181        LDD     R24,Z+1          Load indirect with displacement
+00000136:   8192        LDD     R25,Z+2          Load indirect with displacement
+00000137:   1782        CP      R24,R18          Compare
+00000138:   0793        CPC     R25,R19          Compare with carry
+00000139:   F509        BRNE    PC+0x22          Branch if not equal
+0000013A:   91200079    LDS     R18,0x0079       Load direct from data space
+0000013C:   91800079    LDS     R24,0x0079       Load direct from data space
+0000013E:   2799        CLR     R25              Clear Register
+0000013F:   01FC        MOVW    R30,R24          Copy register pair
+00000140:   E053        LDI     R21,0x03         Load immediate
+00000141:   0FEE        LSL     R30              Logical Shift Left
+00000142:   1FFF        ROL     R31              Rotate Left Through Carry
+00000143:   955A        DEC     R21              Decrement
+00000144:   F7E1        BRNE    PC-0x03          Branch if not equal
+00000145:   0FE8        ADD     R30,R24          Add without carry
+00000146:   1FF9        ADC     R31,R25          Add with carry
+00000147:   58E1        SUBI    R30,0x81         Subtract immediate
+00000148:   4FFF        SBCI    R31,0xFF         Subtract immediate with carry
+00000149:   8001        LDD     R0,Z+1           Load indirect with displacement
+0000014A:   81F2        LDD     R31,Z+2          Load indirect with displacement
+0000014B:   2DE0        MOV     R30,R0           Copy register
+0000014C:   8182        LDD     R24,Z+2          Load indirect with displacement
+0000014D:   8193        LDD     R25,Z+3          Load indirect with displacement
+0000014E:   2733        CLR     R19              Clear Register
+0000014F:   01F9        MOVW    R30,R18          Copy register pair
+00000150:   E043        LDI     R20,0x03         Load immediate
+00000151:   0FEE        LSL     R30              Logical Shift Left
+00000152:   1FFF        ROL     R31              Rotate Left Through Carry
+00000153:   954A        DEC     R20              Decrement
+00000154:   F7E1        BRNE    PC-0x03          Branch if not equal
+00000155:   0FE2        ADD     R30,R18          Add without carry
+00000156:   1FF3        ADC     R31,R19          Add with carry
+00000157:   58E1        SUBI    R30,0x81         Subtract immediate
+00000158:   4FFF        SBCI    R31,0xFF         Subtract immediate with carry
+00000159:   8392        STD     Z+2,R25          Store indirect with displacement
+0000015A:   8381        STD     Z+1,R24          Store indirect with displacement
+0000015B:   91800079    LDS     R24,0x0079       Load direct from data space
+0000015D:   2799        CLR     R25              Clear Register
+0000015E:   01FC        MOVW    R30,R24          Copy register pair
+0000015F:   E023        LDI     R18,0x03         Load immediate
+00000160:   0FEE        LSL     R30              Logical Shift Left
+00000161:   1FFF        ROL     R31              Rotate Left Through Carry
+00000162:   952A        DEC     R18              Decrement
+00000163:   F7E1        BRNE    PC-0x03          Branch if not equal
+00000164:   0FE8        ADD     R30,R24          Add without carry
+00000165:   1FF9        ADC     R31,R25          Add with carry
+00000166:   58E1        SUBI    R30,0x81         Subtract immediate
+00000167:   4FFF        SBCI    R31,0xFF         Subtract immediate with carry
+00000168:   8001        LDD     R0,Z+1           Load indirect with displacement
+00000169:   81F2        LDD     R31,Z+2          Load indirect with displacement
+0000016A:   2DE0        MOV     R30,R0           Copy register
+0000016B:   8186        LDD     R24,Z+6          Load indirect with displacement
+0000016C:   8197        LDD     R25,Z+7          Load indirect with displacement
+0000016D:   93900073    STS     0x0073,R25       Store direct to data space
+0000016F:   93800072    STS     0x0072,R24       Store direct to data space
+00000171:   9508        RET                      Subroutine return
   time: ~= 49us

    (2) xList* ppp=&( pxReadyTasksLists[ uxTopReadyPriority ] );
        listGET_OWNER_OF_NEXT_ENTRY( pxCurrentTCB, ppp );

1422:         xList* ppp=&( pxReadyTasksLists[ uxTopReadyPriority ] );
+00000102:   91800079    LDS     R24,0x0079       Load direct from data space
+00000104:   9F89        MUL     R24,R25          Multiply unsigned
+00000105:   01D0        MOVW    R26,R0           Copy register pair
+00000106:   2411        CLR     R1               Clear Register
+00000107:   58A1        SUBI    R26,0x81         Subtract immediate
+00000108:   4FBF        SBCI    R27,0xFF         Subtract immediate with carry
1423:         listGET_OWNER_OF_NEXT_ENTRY( pxCurrentTCB, ppp );
+00000109:   01ED        MOVW    R28,R26          Copy register pair
+0000010A:   81E9        LDD     R30,Y+1          Load indirect with displacement
+0000010B:   81FA        LDD     R31,Y+2          Load indirect with displacement
+0000010C:   8002        LDD     R0,Z+2           Load indirect with displacement
+0000010D:   81F3        LDD     R31,Z+3          Load indirect with displacement
+0000010E:   2DE0        MOV     R30,R0           Copy register
+0000010F:   83FA        STD     Y+2,R31          Store indirect with displacement
+00000110:   83E9        STD     Y+1,R30          Store indirect with displacement
+00000111:   01CD        MOVW    R24,R26          Copy register pair
+00000112:   9603        ADIW    R24,0x03         Add immediate to word
+00000113:   17E8        CP      R30,R24          Compare
+00000114:   07F9        CPC     R31,R25          Compare with carry
+00000115:   F421        BRNE    PC+0x05          Branch if not equal
+00000116:   8182        LDD     R24,Z+2          Load indirect with displacement
+00000117:   8193        LDD     R25,Z+3          Load indirect with displacement
+00000118:   839A        STD     Y+2,R25          Store indirect with displacement
+00000119:   8389        STD     Y+1,R24          Store indirect with displacement
+0000011A:   01ED        MOVW    R28,R26          Copy register pair
+0000011B:   81E9        LDD     R30,Y+1          Load indirect with displacement
+0000011C:   81FA        LDD     R31,Y+2          Load indirect with displacement
+0000011D:   8186        LDD     R24,Z+6          Load indirect with displacement
+0000011E:   8197        LDD     R25,Z+7          Load indirect with displacement
+0000011F:   93900073    STS     0x0073,R25       Store direct to data space
+00000121:   93800072    STS     0x0072,R24       Store direct to data space
+00000123:   91DF        POP     R29              Pop register from stack
+00000124:   91CF        POP     R28              Pop register from stack
+00000125:   9508        RET                      Subroutine return
   time: ~= 14us
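
For scale, the two timings can be converted into clock cycles, assuming the simulated core runs at the 4 MHz stated in the test conditions and executes one clock per cycle counted. The helper name below is made up for illustration.

```c
#include <assert.h>

/* Hypothetical helper: convert a duration in microseconds into clock
   cycles for a clock given in MHz (cycles = microseconds * MHz). */
unsigned long us_to_cycles(unsigned long microseconds, unsigned long clock_mhz)
{
    assert(clock_mhz > 0);  /* a zero clock would make the result meaningless */
    return microseconds * clock_mhz;
}
```

Under that assumption, 49 us corresponds to about 196 cycles and 14 us to about 56 cycles, a ratio of roughly 3.5.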

End:

Be careful when using a function-like macro defined with #define. Implementing it as a true function may be better in this case. There are other places with the same flaw; please correct them in the next version.

B.R.

davedoors wrote on Wednesday, July 18, 2007:

Very good point, I agree.  But in part it seems you have a very weak optimizer.  Here is the ARM IAR equivalent with and without optimization.

  Old method no optimization.

  000018E0  E59F0638  LDR          R0, [PC, #+1592]  
  000018E4  E5900000  LDR          R0, [R0, #+0]
  000018E8  E3A0101C  MOV          R1, #0x1C
  000018EC  E59F264C  LDR          R2, [PC, #+1612]  
  000018F0  E0202091  MLA          R0, R1, R0, R2
  000018F4  E59F1624  LDR          R1, [PC, #+1572]  
  000018F8  E5911000  LDR          R1, [R1, #+0]
  000018FC  E3A0201C  MOV          R2, #0x1C
  00001900  E59F3638  LDR          R3, [PC, #+1592]  
  00001904  E0213192  MLA          R1, R2, R1, R3
  00001908  E5911004  LDR          R1, [R1, #+4]
  0000190C  E5911004  LDR          R1, [R1, #+4]
  00001910  E5801004  STR          R1, [R0, #+4]
  00001914  E59F0604  LDR          R0, [PC, #+1540]  
  00001918  E5900000  LDR          R0, [R0, #+0]
  0000191C  E3A0101C  MOV          R1, #0x1C
  00001920  E59F2618  LDR          R2, [PC, #+1560]  
  00001924  E0202091  MLA          R0, R1, R0, R2
  00001928  E5900004  LDR          R0, [R0, #+4]
  0000192C  E59F15EC  LDR          R1, [PC, #+1516]  
  00001930  E5911000  LDR          R1, [R1, #+0]
  00001934  E3A0201C  MOV          R2, #0x1C
  00001938  E59F3600  LDR          R3, [PC, #+1536]  
  0000193C  E0213192  MLA          R1, R2, R1, R3
  00001940  E2911008  ADDS         R1, R1, #0x8
  00001944  E1500001  CMP          R0, R1
  00001948  1A00000C  BNE          0x001980
  0000194C  E59F05CC  LDR          R0, [PC, #+1484]  
  00001950  E5900000  LDR          R0, [R0, #+0]
  00001954  E3A0101C  MOV          R1, #0x1C
  00001958  E59F25E0  LDR          R2, [PC, #+1504]  
  0000195C  E0202091  MLA          R0, R1, R0, R2
  00001960  E59F15B8  LDR          R1, [PC, #+1464]  
  00001964  E5911000  LDR          R1, [R1, #+0]
  00001968  E3A0201C  MOV          R2, #0x1C
  0000196C  E59F35CC  LDR          R3, [PC, #+1484]  
  00001970  E0213192  MLA          R1, R2, R1, R3
  00001974  E5911004  LDR          R1, [R1, #+4]
  00001978  E5911004  LDR          R1, [R1, #+4]
  0000197C  E5801004  STR          R1, [R0, #+4]
  00001980  E59F0170  LDR          R0, [PC, #+368]   
  00001984  E59F1594  LDR          R1, [PC, #+1428]  
  00001988  E5911000  LDR          R1, [R1, #+0]
  0000198C  E3A0201C  MOV          R2, #0x1C
  00001990  E59F35A8  LDR          R3, [PC, #+1448]  
  00001994  E0213192  MLA          R1, R2, R1, R3
  00001998  E5911004  LDR          R1, [R1, #+4]
  0000199C  E591100C  LDR          R1, [R1, #+12]
  000019A0  E5801000  STR          R1, [R0, #+0]
  000019A4  E59F014C  LDR          R0, [PC, #+332]   
  000019A8  E5900000  LDR          R0, [R0, #+0]
  000019AC  E5900004  LDR          R0, [R0, #+4]
  000019B0  E59F1140  LDR          R1, [PC, #+320]   

  Old method with optimization.

  00001558  E5902000  LDR          R2, [R0, #+0]
  0000155C  E0221293  MLA          R2, R3, R2, R1
  00001560  E5903000  LDR          R3, [R0, #+0]
  00001564  E3A0C01C  MOV          R12, #0x1C
  00001568  E023139C  MLA          R3, R12, R3, R1
  0000156C  E5933004  LDR          R3, [R3, #+4]
  00001570  E5933004  LDR          R3, [R3, #+4]
  00001574  E5823004  STR          R3, [R2, #+4]
  00001578  E5902000  LDR          R2, [R0, #+0]
  0000157C  E1A0300C  MOV          R3, R12
  00001580  E0221293  MLA          R2, R3, R2, R1
  00001584  E5922004  LDR          R2, [R2, #+4]
  00001588  E5903000  LDR          R3, [R0, #+0]
  0000158C  E023139C  MLA          R3, R12, R3, R1
  00001590  E2833008  ADD          R3, R3, #0x8
  00001594  E1520003  CMP          R2, R3
  00001598  1A000007  BNE          0x0015BC
  0000159C  E5902000  LDR          R2, [R0, #+0]
  000015A0  E1A0300C  MOV          R3, R12
  000015A4  E0221293  MLA          R2, R3, R2, R1
  000015A8  E5903000  LDR          R3, [R0, #+0]
  000015AC  E023139C  MLA          R3, R12, R3, R1
  000015B0  E5933004  LDR          R3, [R3, #+4]
  000015B4  E5933004  LDR          R3, [R3, #+4]
  000015B8  E5823004  STR          R3, [R2, #+4]
  000015BC  E59F34B0  LDR          R3, [PC, #+1200]        
  000015C0  E5900000  LDR          R0, [R0, #+0]
  000015C4  E1A0200C  MOV          R2, R12
  000015C8  E0201092  MLA          R0, R2, R0, R1
  000015CC  E5900004  LDR          R0, [R0, #+4]
  000015D0  E590000C  LDR          R0, [R0, #+12]
  000015D4  E5830000  STR          R0, [R3, #+0]

  New method no optimisation:

  000018E0  E59F05C4  LDR          R0, [PC, #+1476]
  000018E4  E5900000  LDR          R0, [R0, #+0]
  000018E8  E3A0101C  MOV          R1, #0x1C
  000018EC  E59F25D8  LDR          R2, [PC, #+1496]
  000018F0  E0202091  MLA          R0, R1, R0, R2
  000018F4  E1B04000  MOVS         R4, R0
  000018F8  E5940004  LDR          R0, [R4, #+4]
  000018FC  E5900004  LDR          R0, [R0, #+4]
  00001900  E5840004  STR          R0, [R4, #+4]
  00001904  E5940004  LDR          R0, [R4, #+4]
  00001908  E2941008  ADDS         R1, R4, #0x8
  0000190C  E1500001  CMP          R0, R1
  00001910  1A000002  BNE          0x001920
  00001914  E5940004  LDR          R0, [R4, #+4]
  00001918  E5900004  LDR          R0, [R0, #+4]
  0000191C  E5840004  STR          R0, [R4, #+4]
  00001920  E59F02F4  LDR          R0, [PC, #+756]
  00001924  E5941004  LDR          R1, [R4, #+4]
  00001928  E591100C  LDR          R1, [R1, #+12]
  0000192C  E5801000  STR          R1, [R0, #+0]
  00001930  E59F02E4  LDR          R0, [PC, #+740]
  00001934  E5900000  LDR          R0, [R0, #+0]
  00001938  E5900004  LDR          R0, [R0, #+4]
  0000193C  E59F12D8  LDR          R1, [PC, #+728]
  00001940  E5911000  LDR          R1, [R1, #+0]

  New method with optimisation:

  00001550  E5900000  LDR          R0, [R0, #+0]
  00001554  E1A02003  MOV          R2, R3
  00001558  E0201092  MLA          R0, R2, R0, R1
  0000155C  E5901004  LDR          R1, [R0, #+4]
  00001560  E5911004  LDR          R1, [R1, #+4]
  00001564  E5801004  STR          R1, [R0, #+4]
  00001568  E2802008  ADD          R2, R0, #0x8
  0000156C  E1510002  CMP          R1, R2
  00001570  1A000001  BNE          0x00157C
  00001574  E5911004  LDR          R1, [R1, #+4]
  00001578  E5801004  STR          R1, [R0, #+4]
  0000157C  E59F34A4  LDR          R3, [PC, #+1188]
  00001580  E5900004  LDR          R0, [R0, #+4]
  00001584  E590000C  LDR          R0, [R0, #+12]
  00001588  E5830000  STR          R0, [R3, #+0]

With the optimizer the difference is nowhere near as marked as in the AVR code you supplied.

rtel wrote on Friday, July 27, 2007:

I have updated the macro in SVN, and will include this in the next release (within the next few days).  This seems to be the only place where this can be achieved without actually increasing the compiled code size.
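
One way a macro can take a pointer copy internally, so that callers do not have to introduce the temporary themselves, is sketched below. This is only an illustration under assumed names (Node, List, GET_OWNER_OF_NEXT_ENTRY_CACHED, select_list and the counter are hypothetical); it is not the text of the macro committed to SVN.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's list types. */
typedef struct Node {
    struct Node *next;
    int owner_id;
} Node;

typedef struct List {
    Node *index;  /* current position in the ring */
    Node end;     /* end marker, never handed out as an owner */
} List;

/* The argument is evaluated exactly once, into a local const pointer. */
#define GET_OWNER_OF_NEXT_ENTRY_CACHED(owner_out, list)     \
    do {                                                    \
        List *const cached_list_ = (list);                  \
        cached_list_->index = cached_list_->index->next;    \
        if (cached_list_->index == &cached_list_->end) {    \
            cached_list_->index = cached_list_->index->next;\
        }                                                   \
        (owner_out) = cached_list_->index->owner_id;        \
    } while (0)

List ready_list;
Node task_a, task_b;
int evaluations;  /* how many times the argument expression ran */

/* Stands in for an indexed lookup such as &( lists[ priority ] ). */
List *select_list(void)
{
    evaluations++;
    return &ready_list;
}

/* Build the ring end -> task_a -> task_b -> end and reset the counter. */
void ring_init(void)
{
    ready_list.end.next = &task_a;
    ready_list.end.owner_id = -1;
    task_a.next = &task_b;
    task_a.owner_id = 1;
    task_b.next = &ready_list.end;
    task_b.owner_id = 2;
    ready_list.index = &ready_list.end;
    evaluations = 0;
}

/* Advance the ring by one entry and return that entry's owner id. */
int next_owner(void)
{
    int id;
    assert(ready_list.index != NULL);  /* ring_init() must have run */
    GET_OWNER_OF_NEXT_ENTRY_CACHED(id, select_list());
    return id;
}
```

After ring_init(), three calls to next_owner() return 1, 2 and 1 again (the end marker is skipped), and the argument expression is evaluated once per call. The price is one extra local pointer per use of the macro, which matters on compilers that place locals in statically allocated memory.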

In the change history within the file I have credited the change to B.R.  Let me know your full name if you want it included in the project history file.

Thanks for your contribution.

Regards.

ny88 wrote on Monday, July 30, 2007:

I feel highly honoured that my effort can be acknowledged.
My name : Niu Yong
          Tianjin University
          a Chinese man

adarkar9 wrote on Friday, August 03, 2007:

Ouch!  I just updated to V4.4.0 and got this "fix".  Not only does it increase my compiled code size, but it also breaks the use of this macro in vTaskSwitchContext() for my compiler.

I am using Keil’s C51 compiler on a derivative of the Cygnal 8051 port.  C51 uses a compiled stack to compensate for the lack of stack space in the 8051.  Prior to this change, vTaskSwitchContext() did not use any compiled stack space and was, therefore, intrinsically reentrant.  This change causes vTaskSwitchContext() to use three bytes of the compiled stack for the new pointer copy, causing the function to no longer be reentrant.

I am forced to either add the compiler’s inefficient “reentrant” attribute to the function (which would move the three bytes to a software run-time stack) or revert back to the previous version of the macro.  I dislike both options.

Clearly, you can’t please everyone (or every compiler) all of the time.

rtel wrote on Friday, August 03, 2007:

That's a bit of a bugger.  I tried it on several ports and in each case the code size was smaller.  I’m surprised this is not the case for Keil, where they make lots of claims on their code efficiency.  Is the same true when you turn optimization on?

Regards.

p.s. It's not a ‘fix’ but an ‘improvement’, not in your case though :(

rtel wrote on Friday, August 03, 2007:

Does vTaskSwitchContext() need to be reentrant?  It is only used with interrupts disabled.

Regards.

adarkar9 wrote on Monday, August 06, 2007:

Good point.  As long as vTaskSwitchContext() is only called with interrupts disabled, it does not need to be tagged as reentrant.  Currently vTaskSwitchContext() is called from the timer tick ISR and vPortYield().  I don’t imagine that will change, so I could just remove the function from the overlay analysis.  I like that this would only affect the makefile and not the source code.

P.S.  Optimization was at maximum.

adarkar9 wrote on Monday, August 06, 2007:

Update: I removed vTaskSwitchContext() from the overlay analysis and the resultant image uses one more byte of code space and three more bytes of RAM.  I can live with that.

Thanks for the suggestion Richard!