+TCP BufferAllocation_1 xFreeBuffersList corruption

joehinkle wrote on Thursday, August 11, 2016:

I have a web server that is transferring jquery.js (256000+ bytes) to the web browser.

My MTU is 700 bytes.

After a random number of packets have been sent, I get a CPU “HardFault”.

Within BufferAllocation_1.c there is a function pxGetNetworkBufferWithDescriptor();

pxReturn is assigned a value from listGET_OWNER_OF_HEAD_ENTRY

pxReturn gets a value of 0xa5a5a5a5; bIsValidNetworkDescriptor() then uses that pointer, which causes the fault.

Looking at the xFreeBuffersList node in question, both pvOwner and pvContainer are set to 0xa5a5a5a5.

I enabled configUSE_LIST_DATA_INTEGRITY_CHECK_BYTES to check the list and try to detect when the corruption occurs.

The stack calls TCPSendRepeated() which delivers the network buffer to my xNetworkInterfaceOutput().

I can’t see how my xNetworkInterfaceOutput is corrupting it.

BaseType_t xNetworkInterfaceOutput( NetworkBufferDescriptor_t * const pxNetworkBuffer, BaseType_t xReleaseAfterSend )
{
    byte *B;
	BaseType_t r;
	
	
    if(LinkStatus == 0)	// link down;
	{
			goto bye;
	}
		
	xSemaphoreTake(Xmit_GateSemaphore, portMAX_DELAY);		// only let one in
			
    XmitRingDescr[0].Status |= ETX_R;
    B = pxNetworkBuffer->pucEthernetBuffer;
    B -= 2;
	B = (byte*)FlipEndianDW((dword)(B));
    XmitRingDescr[0].BuffPtr = B;
	XmitRingDescr[0].EnhancedStatus1 = E1TX_PINS | E1TX_IINS | E1TX_INT;		// set PINS and IINS --- insert Protocol and IP checksum

    XmitRingDescr[0].DataLength = FlipEndianW(pxNetworkBuffer->xDataLength + 2);    // need to include the 2 extra byte for alignment

    ENET_TDAR = ENET_TDAR_TDAR_MASK;	// tell mac one ring full - xmit it
	

	r = xSemaphoreTake(Xmit_ReleaseSemaphore, 5);		// we can't let go until the buffer can be released, or we get TCP port issues
	
	xSemaphoreGive(Xmit_GateSemaphore);	
	
bye:	
	vReleaseNetworkBufferAndDescriptor( pxNetworkBuffer );
	
    return pdTRUE;
}


The Xmit_ReleaseSemaphore is used in the XMIT_ISR

Since I’m using BufferAllocation_1, I looked at all the network buffers to see whether any overrun had occurred, but found none.

I find it interesting that the corrupt values are all repeated 0xa5 bytes.

I’m not sure how to debug this. If you have any suggestions, please let me know.

Thanks.

Joe

rtel wrote on Friday, August 12, 2016:

0xa5a5a5a5 is what the stack is filled with when the task is created, so seeing 0xa5a5a5a5 in variables is an indication that the stack pointer has been corrupted, or that something else has corrupted the stack.

Do you have stack overflow checking set to two? http://www.freertos.org/Stacks-and-stack-overflow-checking.html
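
To illustrate why 0xa5a5a5a5 points at stack memory, here is a minimal, self-contained C sketch of a fill pattern at work. It is not FreeRTOS code: the names demo_fill_stack, demo_unused_bytes, demo_read_untouched_word and the size DEMO_STACK_BYTES are hypothetical, and it assumes a stack that grows downward.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE  0xa5u  /* the fill value mentioned above */
#define DEMO_STACK_BYTES 64u    /* hypothetical size, for illustration only */

/* Fill a simulated task stack with the known pattern, as happens at task creation. */
static void demo_fill_stack(uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    memset(stack, STACK_FILL_BYTE, size);
}

/* Count the bytes at the low end of the stack that still hold the pattern.
   Assumes the stack grows downward, so unused bytes sit at the lowest addresses. */
static size_t demo_unused_bytes(const uint8_t *stack, size_t size)
{
    size_t n = 0;

    assert(stack != NULL);
    while (n < size && stack[n] == STACK_FILL_BYTE) {
        n++;
    }
    return n;
}

/* Read a 32-bit slot that was never written: every byte is still 0xa5. */
static uint32_t demo_read_untouched_word(const uint8_t *stack)
{
    uint32_t word;

    assert(stack != NULL);
    memcpy(&word, stack, sizeof word);
    return word;
}
```

After demo_fill_stack() on a fresh array, demo_read_untouched_word() returns 0xa5a5a5a5, the value seen in pvOwner and pvContainer. A pointer or list field holding that value was therefore probably read from stack memory that nobody initialised.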

heinbali01 wrote on Friday, August 12, 2016:

What you need to add to your ‘xNetworkInterfaceOutput()’ function is the following:

  bye:
+    if( xReleaseAfterSend != pdFALSE )
+    {
           vReleaseNetworkBufferAndDescriptor( pxNetworkBuffer );
+    }

A network buffer may only be released when xReleaseAfterSend == true.

Explanation:

In some cases xNetworkInterfaceOutput() will be called with a fake ‘pxNetworkBuffer’. It is fake in the sense that it does not belong to the pool of real Network Buffers. It is declared on the stack and its buffer points to a space in the socket data.

Now if you try to release it, the administration of buffers will get corrupted. It is logical to see a hardfault later on.
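
The effect can be reproduced in a small, self-contained C model. This is only a sketch under stated assumptions: DemoDescriptor, demo_pool, demo_get, demo_release, demo_is_from_pool and demo_output are hypothetical stand-ins, not the real FreeRTOS+TCP types or functions, and the free list is a simple singly linked list rather than a FreeRTOS List_t.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_POOL_SIZE 4u  /* hypothetical pool size */

/* Simplified stand-in for a network buffer descriptor. */
typedef struct DemoDescriptor {
    struct DemoDescriptor *next;  /* free-list link */
    unsigned char *buffer;
    size_t length;
} DemoDescriptor;

static DemoDescriptor demo_pool[DEMO_POOL_SIZE];
static DemoDescriptor *demo_free_list;
static size_t demo_free_count;

/* Put every pool descriptor on the free list. */
static void demo_pool_init(void)
{
    size_t i;

    demo_free_list = NULL;
    demo_free_count = 0;
    for (i = 0; i < DEMO_POOL_SIZE; i++) {
        demo_pool[i].next = demo_free_list;
        demo_free_list = &demo_pool[i];
        demo_free_count++;
    }
}

/* Take one descriptor from the free list, or NULL if the pool is empty. */
static DemoDescriptor *demo_get(void)
{
    DemoDescriptor *d = demo_free_list;

    if (d != NULL) {
        demo_free_list = d->next;
        d->next = NULL;
        demo_free_count--;
    }
    return d;
}

/* Return 1 if d is one of the pool's own descriptors, 0 otherwise. */
static int demo_is_from_pool(const DemoDescriptor *d)
{
    size_t i;

    for (i = 0; i < DEMO_POOL_SIZE; i++) {
        if (d == &demo_pool[i]) {
            return 1;
        }
    }
    return 0;
}

/* Give a descriptor back; only descriptors owned by the pool are legal here. */
static void demo_release(DemoDescriptor *d)
{
    assert(demo_is_from_pool(d));
    d->next = demo_free_list;
    demo_free_list = d;
    demo_free_count++;
}

/* Driver output routine: release only when the caller hands over ownership. */
static int demo_output(DemoDescriptor *d, int release_after_send)
{
    /* ... transmit d->buffer here ... */
    if (release_after_send != 0) {
        demo_release(d);
    }
    return 1;
}
```

Calling demo_output(&fake, 0) with a stack-declared descriptor leaves the free list untouched. Without the release_after_send test, that descriptor would be linked into the free list and handed out later, long after its stack frame has gone.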

Earlier you wrote in an email:

All my issues had to do with the stack’s call to my xmit driver –
xNetworkInterfaceOutput

I found you can NOT return from this function without having the
network buffer released!!!

I don’t think that is true. As long as xReleaseAfterSend is true, you may keep the Network Buffer and use it for longer.

If xReleaseAfterSend is false, the buffer will be used and changed by the IP stack. That is what you noticed earlier: the port numbers got changed by the IP stack and you would send the wrong data.
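
The two ownership cases can be sketched in plain C as follows. This is an illustrative model only, not driver code: DemoBuffer, demo_send, demo_tx_complete, demo_copy and DEMO_FRAME_MAX are hypothetical names. Taking a private copy in the "false" case is just one option; blocking until transmission has finished, as the original driver does, is another.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_FRAME_MAX 64u  /* hypothetical maximum frame size */

typedef struct {
    unsigned char *data;
    size_t length;
} DemoBuffer;

static DemoBuffer *demo_pending;                /* buffer the driver owns until TX is done */
static unsigned char demo_copy[DEMO_FRAME_MAX]; /* private copy for the "not ours" case */
static size_t demo_copy_len;
static int demo_released;                       /* counts releases, for inspection */

/* Stand-in for returning a buffer to its pool. */
static void demo_release(DemoBuffer *b)
{
    (void)b;
    demo_released++;
}

static void demo_send(DemoBuffer *b, int release_after_send)
{
    if (release_after_send != 0) {
        /* Ownership passes to the driver: it may keep the buffer after returning. */
        demo_pending = b;
    } else {
        /* The caller keeps ownership and may change the data after we return,
           so anything still needed later must be copied now. */
        assert(b->length <= DEMO_FRAME_MAX);
        memcpy(demo_copy, b->data, b->length);
        demo_copy_len = b->length;
    }
}

/* Called when transmission has finished, e.g. from a deferred TX handler. */
static void demo_tx_complete(void)
{
    if (demo_pending != NULL) {
        demo_release(demo_pending);
        demo_pending = NULL;
    }
}
```

In this model a buffer is released exactly once, and only if the driver was given ownership of it. A buffer passed with release_after_send equal to zero is never released by the driver.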

Also from your email:

My current code uses two binary semaphores to control access to and
from xNetworkInterfaceOutput.

I probably don’t need the first one as only the stack is calling it …
but I don’t know how many stack threads there are that MAY call it.

That is true: the IP-stack is a single task. It would give us nightmares if there were multiple IP-tasks.
So yes, ‘xNetworkInterfaceOutput()’ may contain non-reentrant code, it will never be called before it has returned.

And to be sure: the return value is ignored by the stack.

One last remark about the IP-task: we chose to have a single task doing all the work. We made a clear distinction between data structures owned by the stack and other data that is accessible to the API code, which runs in the application's tasks. This approach needs fewer locking mechanisms.