Cellular Interface - Cellular_SocketSend function seldom does not send the payload

Hello All,

I am having an issue with the library where Socket Send function works as intended for a while but then once in a while it does not send the payload it is supposed to send and times out.
Subsequently the modem interpret the following messages as payload.

We are using Telit Cinterion PLS63 modem.

The way the write command works is the following:

We send the command and length and receive the length we are able to send as a response or we get CME Error to indicate sending is not available right now (socket closed etc.).

Here are the implementations for parsing and send functions:

static CellularPktStatus_t _Cellular_SocketSendDataPrefix( void * pCallbackContext,
                                                 char * pLine,
                                                 uint32_t * pBytesRead )
{
    if( ( pLine == NULL ) || ( pBytesRead == NULL ) ) {
        return CELLULAR_PKT_STATUS_BAD_PARAM;
    }

    // Clip anything that comes after ^SISW 
    const size_t clippingPoint = sizeof("^SISW")-1;
    pLine[ clippingPoint  ] = '\n';
    *pBytesRead = clippingPoint+1;  
    return CELLULAR_PKT_STATUS_OK;
}

CellularError_t Cellular_SocketSend( CellularHandle_t cellularHandle,
                                     CellularSocketHandle_t socketHandle,
                                     const uint8_t * pData,
                                     uint32_t dataLength,
                                     // coverity[misra_c_2012_rule_8_13_violation] 
                                     uint32_t * pSentDataLength )
{
    CellularContext_t * pContext = ( CellularContext_t * ) cellularHandle;

    CellularError_t cellularStatus = _Cellular_CheckLibraryStatus( pContext );
    if( cellularStatus != CELLULAR_SUCCESS ) {
        return cellularStatus;
    } 
    
    if( socketHandle == NULL ) {
        return CELLULAR_INVALID_HANDLE;
    }

    switch( socketHandle->socketState ) {
        case SOCKETSTATE_CONNECTED: {
            break;
        }
        case SOCKETSTATE_ALLOCATED: // Intended fallthrough
        case SOCKETSTATE_CONNECTING: {
            return CELLULAR_SOCKET_NOT_CONNECTED;
        }
        case SOCKETSTATE_DISCONNECTED: {
            return CELLULAR_SOCKET_CLOSED;
        }
        default: {
            return CELLULAR_INTERNAL_FAILURE;
        }
    }

    if( ( pData == NULL ) || ( pSentDataLength == NULL ) || ( dataLength == 0U ) ) {
        return CELLULAR_BAD_PARAMETER;
    }

    if( dataLength > ( uint32_t ) CELLULAR_MAX_SEND_DATA_LEN ) {
        dataLength = ( uint32_t ) CELLULAR_MAX_SEND_DATA_LEN;
    }

    char cmdBuf[ CELLULAR_AT_CMD_MAX_SIZE ] = { '\0' };
    CellularAtReq_t atReqSocketSend = {
        .pAtCmd = cmdBuf,
        .atCmdType = CELLULAR_AT_NO_RESULT
    };
    if ( 0 > snprintf( cmdBuf, 
                       CELLULAR_AT_CMD_MAX_SIZE, 
                       "AT^SISW=%d,%d",
                       socketHandle->socketId, 
                       dataLength ) ) {
        return CELLULAR_INTERNAL_FAILURE;
    }
    
    CellularAtDataReq_t atDataReqSocketSend = {
        .pData = pData,
        .dataLen = dataLength,
        .pSentDataLength = pSentDataLength,
    };

    const uint32_t atCommandTimeoutMs = CELLULAR_COMMON_AT_COMMAND_TIMEOUT_MS;

    const uint32_t dataSendTimeoutMs = 
        socketHandle->sendTimeoutMs != 0 ? 
        socketHandle->sendTimeoutMs : 
        CELLULAR_COMMON_AT_COMMAND_TIMEOUT_MS;

    const uint32_t delayBetweenAtAndDataCommandsMs = 0;

    CellularPktStatus_t  pktStatus = //_Cellular_AtcmdRequestWithCallback( pContext, atReqSocketSend );
        _Cellular_AtcmdDataSend( pContext, 
                                 atReqSocketSend, 
                                 atDataReqSocketSend,
                                 _Cellular_SocketSendDataPrefix, 
                                 socketHandle,
                                 atCommandTimeoutMs,
                                 dataSendTimeoutMs, 
                                 delayBetweenAtAndDataCommandsMs );
    
    return _Cellular_TranslatePktStatus( pktStatus );
}

I have added the salae capture as a zip since sal files are not permitted:
CaptureShort.zip (109.3 KB)

First timestamp shows that response to write command is received but the payload is not sent by the library and timeout occurs after 20 seconds.

What could be the cause of library not sending the data?

Thank you,
Tuna

Hi Tuna,

I am looking into the saleae capture file and the implementaion. I will reply in this thread later if I have more observation. I would like to align the timestamp information with you first.

Around 83.8 seconds, the following command is sent.

AT^SISW=0,31\r

Several ms later, the modem responses

\r\n^SISW:0,31,0\r\n

The cellular interface is supposed to start sending the byte stream to the modem but failed.

Is it possible to reproduce this problem with cellular debug log enabled? It can also help us to look into this problem.

Hi Ching-Hsin,

I didn’t have logs on my port so I added some to be able to highlight the problem. I also enabled the logs for the common library code.

It seems like what is happening is the URC that is received just before the write command is treated as the first part of the response and for some reason this messes up the data write.

Here is the part of the logs that highlight this:

[DEBUG] [Cellular] [_Cellular_AtcmdRequestTimeoutWithCallbackRaw:224] >>>>>Start sending [AT^SISW=0,31]<<<<<
[DEBUG] [Cellular] [_Cellular_ReadLine:583] AT Read 31 bytes, data[0x20412fce]
[DEBUG] [Cellular] [_Cellular_SocketSendDataPrefix:1248] socketSendDataPrefix: pBytesRead 29 ^SISR: 0,1

^SISW: 0,31,0

[DEBUG] [Cellular] [_handleMsgType:681] Allocate at response 0x204165f0
[DEBUG] [Cellular] [_handleMsgType:684] AT solicited Resp[^SISR]
[DEBUG] [Cellular] [_saveATData:177] Save [^SISR] 5 AT data to pResp
[DEBUG] [Cellular] [_saveData:136] _saveData : Save data 0x20412fd0 with length 6
[ERROR] [Cellular] [_Cellular_AtcmdRequestTimeoutWithCallbackRaw:254] pkt_recv status=1, AT cmd AT^SISW=0,31 timed out
[DEBUG] [Cellular] [_Cellular_AtcmdRequestTimeoutWithCallbackRaw:262] <<<<<Exit sending [AT^SISW=0,31] status[1]<<<<<
[ERROR] [Cellular] [Cellular_SocketSend:1352] Cellular_SocketSend: Data send fail, PktRet: 1
[INFO] [MbedTLS] [2145] ssl->f_send() returned -1 (-0x0001)

Attached is the full capture as well:
CaptureWithCellularLogs.zip (49.3 KB)

I also noticed that between any operation performed on Cellular Interface there is a 1 second delay. I wasn’t able to figure out where this delay comes from. Is there any delays in the library side or could it be other threads/comm interface that is causing the delays. Could you also share your thoughts on that part as well?

Please let me know if you need any other logs/data from our side.

Thanks,
Tuna

Hi Tuna,

Thank you for your information. It helps to locate the problem.

Let’s consider the following AT command sequence.

AT^SISW=0,31
^SISW: 0,31,0
"sending 31 bytes data stream"
OK

From the implementation, it looks like you truncate the response for a success token when the repone is received.

AT^SISW=0,31
^SISW\n           // This part will be regarded as success token to indicate that
                  // AT^SISW=0,31 is received by modem. We can start to send data stream.
"sending 31 bytes data stream"
0,31,0            // This part is dropped by setting pBytesRead
OK                // OK to indicate sending binary data success

However, the implementation doesn’t check if the line received is “^SISW: 0,31,0”.

Let’s consider the following AT command sequence.

AT^SISW=0,31
^SISR: 0,1               // This is URC
^SISW: 0,31,0            // This is the response to AT^SISW=0,31 from modem

From the implementation, it will split the sequence into the following strings

AT^SISW=0,31
^SISR\n         // This part is regarded as AT^SISW=0,31 response. ^SISR is not a success token. It will be regarded as AT command response.
 0,1            // This part is dropped due to the pBytesRead is set to a shorter value
^SISW: 0,31,0   // The following line is also dropped due to the implementation set pByteRead to a shorter value.

Suggest we check the line received in the data send prefix function. In this example, it should check if the size is correct and the string starts with “^SISW:” when it received “^SISR: 0,1”. If not just returns CELLULAR_PKT_STATUS_OK without setting pBytesRead. Cellular interface will continue to process this line and call the callback function again with the next line received, which is “^SISW: 0,31,0”.

As to your second question, it could be the comm interface implementation. The default value for CELLULAR_COMM_IF_RECV_TIMEOUT_MS is 1000ms. The comm interface receive function doesn’t have to wait the receive buffer is full or timeout. It should return as soon as bytes are received from modem. Please give it a try to update the implementation or adjust the value and feedback your observation.

Hi Ching-Hsin,

That makes sense I was basing my implementation on SARA R4 and there they were doing something similar but I see now that they were checking for @ symbol and setting the bytes read on a match.
Where should I put the new line and which value should I set the bytes read for ^SISW: response?
Also is there a way to indicate how much to write in case I ask for 50 and get 40 from the modem for example?
While porting some of the Doxygen for these function are not the most clear about Invariants of the status to return so I have been trying to mimic other implementations and that can create issues like this sometimes.

I’ll check the Receive If timeout and respond back.

EDIT: Below is the read logic. We are using RTD drivers from NXP but all that needs to be known is reads are done with Async functions that call the _CellularReceiveCallback callback function when data is in or there is an error. Receive implementation is based on this example: https://github.com/aws/amazon-freertos/blob/feature/cellular/vendors/st/boards/stm32l475_discovery/ports/comm_if/comm_if_uart.c#L528

void _CellularReadComplete()
{
    CellularCommInterfaceContext * pIotCommIntfCtx = & _iotCommIntfCtx;
    
    const BaseType_t fifoNotOverrun = 
        IotFifo_Put( &pIotCommIntfCtx->rxFifo, &pIotCommIntfCtx->uartRxChar[ 0 ] ) ?
        pdTRUE :
        pdFALSE;

    configASSERT( fifoNotOverrun  );

    Lpuart_Uart_Ip_SetRxBuffer( pIotCommIntfCtx->uartNumber, pIotCommIntfCtx->uartRxChar, 1 );
    
    if( pIotCommIntfCtx->rxFifoBusy ) {
        BaseType_t xHigherPriorityTaskWoken = pdFALSE;
        BaseType_t xResult = xEventGroupSetBitsFromISR( pIotCommIntfCtx->pEventGroup,
                                                COMM_EVT_MASK_RX_DONE,
                                                &xHigherPriorityTaskWoken );
        
        portYIELD_FROM_ISR( xResult && xHigherPriorityTaskWoken ); 
        
    } else if( pIotCommIntfCtx->pRecvCB != NULL ) { 
            
        CellularCommInterfaceError_t cbStatus = pIotCommIntfCtx->pRecvCB( pIotCommIntfCtx->pUserData,
                                                ( CellularCommInterfaceHandle_t ) pIotCommIntfCtx );

        portYIELD_FROM_ISR( ( cbStatus == IOT_COMM_INTERFACE_SUCCESS ) ? pdPASS : pdFAIL ); 
    }


}

void _CellularReceiveCallback(const uint8 HwInstance, const Lpuart_Uart_Ip_EventType Event, void *UserData)
{
    CellularCommInterfaceContext * pIotCommIntfCtx = & _iotCommIntfCtx;

    switch( Event ) {
        
        case LPUART_UART_IP_EVENT_RX_FULL: {
            _CellularReadComplete();
            break;
        }
        case LPUART_UART_IP_EVENT_END_TRANSFER: // Intended fallthrough
        // // TODO: Figure out how we want to handle receive errors.
        // // Right now error retries to start receive and stop if it fails once.
        // // Next time we are transmitting the receive is started again if it failed. 
        case LPUART_UART_IP_EVENT_ERROR: // Intended fallthrough
        default: {
            Lpuart_Uart_Ip_StatusType status = 
                Lpuart_Uart_Ip_AsyncReceive( pIotCommIntfCtx->uartNumber, pIotCommIntfCtx->uartRxChar, 1 );

            pIotCommIntfCtx->uartBusyFlag = ( status != LPUART_UART_IP_STATUS_SUCCESS );
            break;
        }
    } 
}


static CellularCommInterfaceError_t _CellularReceive( CellularCommInterfaceHandle_t commInterfaceHandle,
                                              uint8_t * pBuffer,
                                              uint32_t bufferLength,
                                              uint32_t timeoutMilliseconds,
                                              uint32_t * pDataReceivedLength )
{
    CellularCommInterfaceContext * pIotCommIntfCtx = ( CellularCommInterfaceContext * ) commInterfaceHandle;
    
    if( ( pIotCommIntfCtx == NULL ) || ( pBuffer == NULL ) || ( bufferLength == 0 ) ) {
        return IOT_COMM_INTERFACE_BAD_PARAMETER;
    }

    if( pIotCommIntfCtx->ifOpen == false ) {
        return IOT_COMM_INTERFACE_FAILURE;
    }
    
    
    pIotCommIntfCtx->rxFifoBusy = 1U;
    ( void ) xEventGroupClearBits( pIotCommIntfCtx->pEventGroup, COMM_EVT_MASK_RX_DONE );
    uint64_t startTimeMs = TICKS_TO_MS( xTaskGetTickCount() );
    uint32_t rxCount = 0;
    while( rxCount < bufferLength )
    {   
        uint8_t rxChar; 
        if( IotFifo_Get( &pIotCommIntfCtx->rxFifo, &rxChar ) == true ) {
            *pBuffer = rxChar;
            pBuffer++;
            rxCount++;
        } else {
            uint64_t elapsedTimeMs = TICKS_TO_MS( xTaskGetTickCount() ) - startTimeMs;
            if( elapsedTimeMs >= timeoutMilliseconds) {
                break;
            }
            // If a byte has been received wait a ms if not wait for the rest of the elapsed time.
            uint8_t waitTimeMs = rxCount > 0U ? 1 : (timeoutMilliseconds - elapsedTimeMs);
               
            EventBits_t uxBits = xEventGroupWaitBits( pIotCommIntfCtx->pEventGroup,
                                              COMM_EVT_MASK_RX_DONE,                                              
                                              pdTRUE,
                                              pdFALSE,
                                              pdMS_TO_TICKS( waitTimeMs ) );

        }
    }
    
    /* Clear this flag to inform interrupt handler to call callback function. */
    pIotCommIntfCtx->rxFifoBusy = 0U;

    *pDataReceivedLength = rxCount;
    return ((rxCount > 0U) ? IOT_COMM_INTERFACE_SUCCESS : IOT_COMM_INTERFACE_DRIVER_ERROR );
}

EDIT2: After looking at how the recv function on comm if is called I think I see the issue.

( void ) pContext->pCommIntf->recv( pContext->hPktioCommIntf, ( uint8_t * ) pRead,
                                        bufferEmptyLength,
                                        CELLULAR_COMM_IF_RECV_TIMEOUT_MS, &bytesRead );

This call on pktio file tries to do a read to fill the full buffer. Since what we are receiving is smaller than the buffer my implementation does tiny reads and sleeps 1 ms increments. Since we never fill the buffer the receive callback never gets called.

I see two fixes. If we received any data and I already slept for a ms or two I call the read to be done. If we haven’t received a full line receive thread will call comm if again to continue reading. If I get no data then I do the full wait with the timeout seconds. alternatively I could shorten the timeout as well but don’t want to make it too short I think.

Hi Tuna,

Where should I put the new line and which value should I set the bytes read for ^SISW: response?

With your original implementation, you can do the following:

  1. Check if pLine contains the complete string with “^SISW: <srvProfileID>,<cnfWriteLength>,<unackData>”
  2. Put the new line after the “^SISW”.
  3. Clear " 0,31,0" in the pLine buffer.
^SISW: 0,31,0\r\n
=>
^SISW\n\0\0...\0\r\n

‘\r’,‘\n’, or ‘\0’ will be ignored when parsing the response from modem.

Also is there a way to indicate how much to write in case I ask for 50 and get 40 from the modem for example?

Currently, the design required the byte stream size assigned before sending the command. This feature can be added in cellular interface if required.

I also look into your implementation. In the following loop, the returned uxBits should be checked. If this is timeout event instead of RX_DONE, we can break the loop directly and return to pktio RX thread.

The intention for the delay here is to accumulate received data as much as possible for pktio to process in batch. It is port specific implementation. If the platform has a receive buffer and doesn’t require the port to read the data frequently, we don’t have to wait for the RX event again.

    while( rxCount < bufferLength )
    {   
        uint8_t rxChar; 
        if( IotFifo_Get( &pIotCommIntfCtx->rxFifo, &rxChar ) == true ) {
            *pBuffer = rxChar;
            pBuffer++;
            rxCount++;
        } else {
            uint64_t elapsedTimeMs = TICKS_TO_MS( xTaskGetTickCount() ) - startTimeMs;
            if( elapsedTimeMs >= timeoutMilliseconds) {
                break;
            }
            // If a byte has been received wait a ms if not wait for the rest of the elapsed time.
            uint8_t waitTimeMs = rxCount > 0U ? 1 : (timeoutMilliseconds - elapsedTimeMs);
               
            EventBits_t uxBits = xEventGroupWaitBits( pIotCommIntfCtx->pEventGroup,
                                              COMM_EVT_MASK_RX_DONE,                                              
                                              pdTRUE,
                                              pdFALSE,
                                              pdMS_TO_TICKS( waitTimeMs ) );

            /* Check the returned uxBits here. */
            if( ( uxBits & COMM_EVT_MASK_RX_DONE ) == 0U )
            {
                /* If it is a timeout, break the loop directly instead of waiting
                 * for RX_DONE event. */
                break;
            }
        }
    }
2 Likes

Hi Ching-Hsin,

Thank you so much for the clarifications and the bugfix.
Both are working great.
I really appreciate the help.

Thanks,
Tuna

2 Likes

Hi Ching-Hsin,

After further testing we ran into some issues and I wanted to post here again.

When I applied the receive timeout fix everything got faster but we started running into another issue.
When the socket opens modem takes a while to get ready for writes. When it is ready it sends the response ^SISW,0,1 to indicate 1 or more bytes can be written. Then we are supposed to write, wait for this again and then continue like this. On the cellular api there is no way to indicate the socket is “busy” so I was wondering how should I be able to implement this behavior.

Basically I need a way to block on write but also check timeout and urc or any other suggestions to solve this issue. I can’t return not ready etc when the write is called because mqtt and tls libraries continuously call write and treat anything other than success as errors so no way to indicate a socket is busy etc.

We had to revert back the fix and set timeouts to 200 ms which is long enough for SISW urc to come after a write.

Any ideas?

Thank you,
Tuna

Hi Tuna,

Thank you for your further questions. I’ll study the behavior of the modem and discuss it with you later.

Hi Ching-Hsin,

I also have been talking to the modem manufacturer and they were recommending using Transparent Mode to be able to instantly send and receive data without the URC.
I need to brush up on the subject but I remember this modem switches RX channel to RING line in order to send URCs to the host and then uses TX RX for socket data.
Do you have a good example on implementing Transparent Mode for the Socket API and does the way this modem does things have any similarities to other modems?
This is the first modem I’ve ever worked with so I am trying to figure out what is custom and what is standard.

Thank you,
Tuna

Hi Tuna,

Sorry for lat reply. From your previous description, it looks like “^SISW: 0,1” has to be received before sending data through socket.

AT^SISO...     // Open the internet service

^SISW: 0, 1   // URC to indicate the service is ready to accept data

AT^SISW=0,31         // Start to send data
^SISW: 0,31,0
< 31 bytes data > 
OK

^SISW: 0, 1   // Wait for this URC again before sending data again

AT^SISW=0,31         // Send more data
^SISW: 0,31,0
< 31 bytes data > 
OK

If that is the case, we can make use of FreeRTOS inter-task communication mechanism to wait for an event received before sending data. Here use event group as an example.

Create an event group for socket when module init.

CellularError_t Cellular_ModuleInit( ... )
{
    ...
    pModuleContext->xSocketEventGroup = xEventGroupCreate();
    if( pModuleContext->xSocketEventGroup == NULL )
    {
        /* The event group was not created because there was insufficient
        FreeRTOS heap available. */
    }
}

Register a URC handler to handle “^SISW: 0, 1”.

CellularAtParseTokenMap_t CellularUrcHandlerTable[] =
{
    ...
    { "SISW",              prvProcessSISWUrc },
    ...
};

static void prvProcessSISWUrc( CellularContext_t * pContext,
                                                     char * pInputLine )
{
    /* Set the socket event bit for corresponding socket when "^SISW: 0, 1" is received. */
    xEventGroupSetBits( pModuleContext->xSocketEventGroup, ( 1 << uxSocketIndex ) );

}

In the socket send function, wait for the bit before start to send data.

CellularError_t Cellular_SocketSend( ... )
{
    ...
    /* Wait the corresponding event groupt bit before start to send data. */
    uxBits = xEventGroupWaitBits(
             pModuleContext->xSocketEventGroup,
              ( 1 << uxSocketIndex ),
             pdTRUE,        /* Bit should be cleared before returning. */
             pdFALSE,       /* Don't wait for all bits. */
             xTicksToWait );/* Wait a maximum timeout for bit to be set. */

    if( ( uxBits & ( 1 << uxSocketIndex ) ) != 0 )
    {
        /* URC "^SISW" is received. Continue to send data through socket. */
    }
    else
    {
        /* This is the timeout case. Check the spec for error handling. */
    }
}

The three reference modem ports make use of buffer access mode for socket send/receive. We don’t have reference port for transparent mode. If you have problem implementing this feature, please feedback your question. We will discuss with you here.

1 Like

Implementing the event group to wait on the URC fixed the issue.

Thank you again for the discussion and suggestions.
I really appreciate it.

Thanks,
Tuna

1 Like