hannes23 wrote on Wednesday, June 20, 2018:
Hello,
I'd like to share my experience with improving TCP throughput.
In my case I gained more than 20 % in TCP speed at 1000 Mbps.
Two things are necessary:
1st: Re-map the OCM (on-chip memory) from the bottom to the top of the address space.
2nd: Force the linker to place the ucNetworkPackets buffers into the OCM space.
The remapping is done by a macro that runs a short piece of assembler code
right at the start of main().
My code is:
int main( void )
{
    xil_printf( "Hello from FreeRTOS main\r\n" );
    configASSERT( configUSE_TASK_FPU_SUPPORT == 2 );
    xil_printf( "configUSE_TASK_FPU_SUPPORT (FreeRTOS.h) is set to %d\r\n", configUSE_TASK_FPU_SUPPORT );
    // Remap all 4 64KB blocks of OCM to top of memory and enable DDR address filtering
    MY_REMAP();
    ...
    ...
The configUSE_TASK_FPU_SUPPORT part can of course be omitted if it is not used.
The code for the MY_REMAP() define (found somewhere in the Xilinx forum) is:
#define MY_REMAP() asm volatile( \
    "mov r5, #0x03 \n" \
    "mov r6, #0 \n" \
    "LDR r7, =0xF8000000 /* SLCR base address */ \n" \
    "LDR r8, =0xF8F00000 /* MPCORE base address */ \n" \
    "LDR r9, =0x0000767B /* SLCR lock key */ \n" \
    "mov r10,#0x1F \n" \
    "LDR r11,=0x0000DF0D /* SLCR unlock key */ \n" \
    "dsb \n" \
    "isb /* make sure it completes */ \n" \
    "pli do_remap /* preload the instruction cache */ \n" \
    "pli do_remap+32 \n" \
    "pli do_remap+64 \n" \
    "pli do_remap+96 \n" \
    "pli do_remap+128 \n" \
    "pli do_remap+160 \n" \
    "pli do_remap+192 \n" \
    "isb /* make sure it completes */ \n" \
    "b do_remap \n" \
    ".align 5, 0xFF /* forces the next block to a cache line alignment */ \n" \
    "do_remap: \n" \
    "str r11, [r7, #0x8] /* Unlock SLCR */ \n" \
    "str r10, [r7, #0x910] /* Configuring OCM remap value */ \n" \
    "str r9, [r7, #0x4] /* Lock SLCR */ \n" \
    "str r6, [r8, #0x0] /* Disable SCU & address filtering */ \n" \
    "str r6, [r8, #0x40] /* Set filter start addr to 0x00000000 */ \n" \
    "str r5, [r8, #0x0] /* Enable SCU & address filtering */ \n" \
    "dmb \n" \
);
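To make the magic numbers in the macro easier to follow, here is a small table of the six register writes it performs, in order. This is only a reading aid, not a replacement for the assembler: the register names (SLCR_LOCK, SLCR_UNLOCK, OCM_CFG, SCU control, SCU filter start) are my own labels for the offsets used above, and I would keep the actual writes in the cache-line-aligned assembler block.

```c
#include <stdint.h>

/* Base addresses and keys, as named in the macro's own comments. */
#define SLCR_BASE               0xF8000000u
#define MPCORE_BASE             0xF8F00000u
#define SLCR_LOCK_KEY           0x0000767Bu
#define SLCR_UNLOCK_KEY         0x0000DF0Du

/* Offsets used by the str instructions. The names are labels I chose (assumptions). */
#define SLCR_LOCK_OFFSET        0x004u
#define SLCR_UNLOCK_OFFSET      0x008u
#define SLCR_OCM_CFG_OFFSET     0x910u
#define SCU_CONTROL_OFFSET      0x000u
#define SCU_FILTER_START_OFFSET 0x040u

/* One register write: target address and the value stored there. */
typedef struct
{
    uint32_t ulAddress;
    uint32_t ulValue;
} RemapWrite_t;

/* The writes after the do_remap label, in the order the macro issues them. */
static const RemapWrite_t xRemapSequence[] =
{
    { SLCR_BASE + SLCR_UNLOCK_OFFSET,        SLCR_UNLOCK_KEY }, /* str r11, [r7, #0x8]   */
    { SLCR_BASE + SLCR_OCM_CFG_OFFSET,       0x1Fu           }, /* str r10, [r7, #0x910] */
    { SLCR_BASE + SLCR_LOCK_OFFSET,          SLCR_LOCK_KEY   }, /* str r9,  [r7, #0x4]   */
    { MPCORE_BASE + SCU_CONTROL_OFFSET,      0x00u           }, /* str r6,  [r8, #0x0]   */
    { MPCORE_BASE + SCU_FILTER_START_OFFSET, 0x00u           }, /* str r6,  [r8, #0x40]  */
    { MPCORE_BASE + SCU_CONTROL_OFFSET,      0x03u           }, /* str r5,  [r8, #0x0]   */
};

#define REMAP_WRITE_COUNT ( sizeof( xRemapSequence ) / sizeof( xRemapSequence[ 0 ] ) )
```

Read this way, the sequence is: unlock the SLCR, set the OCM remap value, lock the SLCR again, then switch the SCU off, set the filter start address to 0 and switch the SCU back on with address filtering.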
The next step is to create a memory section ".ocm" by changing the linker description file.
The following changes are needed.
In the MEMORY block add:
    ps7_ocm : ORIGIN = 0xfffc0000, LENGTH = 0x3fe00
In the SECTIONS block add:
    .ocm (NOLOAD) : {
        __ocm_start = .;
        *(.ocm)
        __ocm_end = .;
    } > ps7_ocm
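The two symbols __ocm_start and __ocm_end can also be used from C to check at run time how much of the OCM region is occupied. A minimal sketch, assuming the linker script change above; the helper names and the HAVE_OCM_LINKER_SYMBOLS guard are hypothetical and only there so that the helpers also compile in a build without the linker symbols:

```c
#include <stddef.h>
#include <stdint.h>

/* Length of the ps7_ocm region as given in the linker script above. */
#define OCM_REGION_LENGTH 0x3FE00u

/* Number of bytes between two section boundary addresses. */
static size_t xSectionBytesUsed( const uint8_t * pucStart, const uint8_t * pucEnd )
{
    return ( size_t ) ( ( uintptr_t ) pucEnd - ( uintptr_t ) pucStart );
}

/* Bytes still free in the OCM region; 0 if the section already fills it. */
static size_t xOcmBytesFree( const uint8_t * pucStart, const uint8_t * pucEnd )
{
    size_t xUsed = xSectionBytesUsed( pucStart, pucEnd );

    return ( xUsed >= OCM_REGION_LENGTH ) ? 0u : ( OCM_REGION_LENGTH - xUsed );
}

#ifdef HAVE_OCM_LINKER_SYMBOLS /* hypothetical guard, define it only on the target */
    /* Symbols created by the .ocm output section in the linker script. */
    extern uint8_t __ocm_start[];
    extern uint8_t __ocm_end[];

    static size_t xOcmBytesUsedOnTarget( void )
    {
        return xSectionBytesUsed( __ocm_start, __ocm_end );
    }
#endif
```

Declaring the linker symbols as arrays means their name already evaluates to the address, so no & operator is needed.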
The final step is to tell the buffer definition that the buffers should be placed into the OCM.
In the file NetworkInterface.c add the .ocm section attribute. The definition should then read:
static uint8_t ucNetworkPackets[ ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS * niBUFFER_1_PACKET_SIZE ] __attribute__ ( ( aligned( 32 ) ) ) __attribute__ ( ( section( ".ocm" ) ) );
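If more than one buffer should end up in the OCM, the two attributes can be wrapped in a single macro. The following is only a sketch: OCM_BUFFER, ucExampleBuffer, EXAMPLE_BUFFER_BYTES and xIsCacheLineAligned are hypothetical names, and the __ELF__ check is there just so the same file still compiles with a toolchain that does not accept plain ELF section names.

```c
#include <stdint.h>

/* Hypothetical helper macro: 32-byte alignment plus placement in the .ocm section. */
#if defined( __ELF__ )
    #define OCM_BUFFER __attribute__ ( ( aligned( 32 ) ) ) __attribute__ ( ( section( ".ocm" ) ) )
#else
    #define OCM_BUFFER __attribute__ ( ( aligned( 32 ) ) )
#endif

/* Hypothetical example buffer; the size is an arbitrary illustration value. */
#define EXAMPLE_BUFFER_BYTES 1536u

static uint8_t ucExampleBuffer[ EXAMPLE_BUFFER_BYTES ] OCM_BUFFER;

/* Returns 1 when the address is a multiple of 32 bytes, otherwise 0. */
static int xIsCacheLineAligned( const void * pvAddress )
{
    return ( ( ( uintptr_t ) pvAddress % 32u ) == 0u ) ? 1 : 0;
}
```

One thing to keep in mind: as far as I understand it, a NOLOAD section is not initialised by the usual startup code, so buffers placed there should not be expected to start out zeroed.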
After compiling and linking, one can inspect the map file to check that the .ocm section was generated and populated successfully.
It looks like this:
.ocm            0x00000000fffc0000    0x30000
                0x00000000fffc0000                __ocm_start = .
 *(.ocm)
 .ocm           0x00000000fffc0000    0x30000 ./src/Ethernet/FreeRTOS-Plus-TCP/portable/NetworkInterface/Zynq/NetworkInterface.o
                0x00000000ffff0000                __ocm_end = .
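The numbers in the map file can be cross-checked with a little arithmetic. The sketch below assumes 128 buffer descriptors of 1536 bytes each; the real values of ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS and niBUFFER_1_PACKET_SIZE in this build are not stated in the post, this is just one combination that gives exactly 0x30000 bytes. All EXAMPLE_ names are hypothetical.

```c
#include <stdint.h>

/* Region as defined in the linker script. */
#define OCM_ORIGIN              0xFFFC0000u
#define OCM_LENGTH              0x0003FE00u

/* Assumed values, chosen only because they reproduce the 0x30000 in the map file. */
#define EXAMPLE_NUM_DESCRIPTORS 128u
#define EXAMPLE_PACKET_SIZE     1536u
#define EXAMPLE_POOL_BYTES      ( EXAMPLE_NUM_DESCRIPTORS * EXAMPLE_PACKET_SIZE )

/* 128 * 1536 = 196608 = 0x30000, the section size shown in the map file. */
_Static_assert( EXAMPLE_POOL_BYTES == 0x30000u, "pool size does not match the map file" );

/* The build should fail early if the pool outgrows the OCM region. */
_Static_assert( EXAMPLE_POOL_BYTES <= OCM_LENGTH, "packet pool does not fit into the OCM" );

/* Address directly after the pool, comparable with __ocm_end in the map file. */
static uint32_t ulOcmEndAddress( uint32_t ulPoolBytes )
{
    return OCM_ORIGIN + ulPoolBytes;
}
```

With these assumed values the end address is 0xffff0000, which matches __ocm_end above, and 0xfe00 bytes of the region remain free. A similar compile-time check on the real ucNetworkPackets size would catch a too-large descriptor count before the linker does.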
All hints and changes are of course given without any responsibility or warranty on my part.
If anybody has other or additional changes or hints to improve TCP speed, please let me know.
In particular, I would welcome comments on whether it makes sense to move other buffers or variables into the OCM.
Greetings to all.