TCP Secure Sockets Serious Delays

leandropg wrote on August 15, 2019:

Hello,

I am developing with AWS FreeRTOS 201906.00_Major over Microchip Curiosity PIC32MZ EF board. Currently I have using the Secure Socket Library for connecting from a FreeRTOS Task to a AWS Lambda Function.

https://docs.aws.amazon.com/freertos/latest/lib-ref/html2/secure_sockets/index.html

I have implemented TX and RX using AWS Secure Sockets and it works. But I have a problem: the time consumed. When I am sending and recieve data from the AWS Lambda Function, this process takes between 3 to 10 seconds in send and receive data when the Lambda function takes only 500 ms from a computer.

I have reviewed that the function SOCKETS_Recv and it takes much time. I understood in the documentation that this function wait an specific quantity of bytes, and when it does not receive any wait the timeout time before realease the read data, although the data is in the buffer. I am worry with the response time, because we are building a system that must send much data by second to the cloud and it is a serious problem.

Aditionally, with the Ethernet Module setting the timeout (Send: 1000 ms / Receive: 500 ms) the send and receive works fine. But when I use the WIFI Module setting timeout (Send: 5000 ms / Receive: 5000 ms) sometimes works and another I obtained SOCKETS_TLS_HANDSHAKE_ERROR before the start the transmission. Another times I obtain DNS Lookup Failed!

I want understand what is the real problem with use this library, because with FreeRTOS sockets the situation is the same. I extracted some the code for that you understand my implementation. I use the Amazon Root CA 1 certificate and works https://www.amazontrust.com/repository/AmazonRootCA1.pem

Thank you.


BaseType_t connectErrorCode;
Socket_t xClientSocket;
static const TickType_t xTimeOutSend = pdMS_TO_TICKS(5000);
static const TickType_t xTimeOutReceive = pdMS_TO_TICKS(5000);

xClientSocket = SOCKETS_Socket(SOCKETS_AF_INET,
                               SOCKETS_SOCK_STREAM,
                               SOCKETS_IPPROTO_TCP);

if(xClientSocket != SOCKETS_INVALID_SOCKET) {
        
   SOCKETS_SetSockOpt(xClientSocket, 0, SOCKETS_SO_RCVTIMEO, &xTimeOutSend, sizeof(xTimeOutSend));
   SOCKETS_SetSockOpt(xClientSocket, 0, SOCKETS_SO_SNDTIMEO, &xTimeOutReceive, sizeof(xTimeOutReceive));
   SOCKETS_SetSockOpt(xClientSocket, 0, SOCKETS_SO_REQUIRE_TLS, NULL, ( size_t ) 0 );
   SOCKETS_SetSockOpt(xClientSocket, 0, SOCKETS_SO_TRUSTED_SERVER_CERTIFICATE, cTlsECHO_SERVER_CERTIFICATE_PEM, ulTlsECHO_SERVER_CERTIFICATE_LENGTH );        
        
   ulRemoteIPAddress = SOCKETS_GetHostByName("MY_URL");

   if (ulRemoteIPAddress != 0) {

      xRemoteAddress.usPort = SOCKETS_htons(443);
      xRemoteAddress.ulAddress = ulRemoteIPAddress;

      connectErrorCode = SOCKETS_Connect(xClientSocket, &xRemoteAddress, sizeof(xRemoteAddress));

      if(connectErrorCode == 0) {

         httpPostResponsePtr = &httpPostResponse[0];
         httpPostResponseLength = 0;
    
	 qtyDataToSend = sprintf(httpPostQuery, "MY_POST_REQUEST");

         qtyDataSent = SOCKETS_Send(xClientSocket, &httpPostQuery[0], qtyDataToSend, 0);

         if(qtyDataSent > 0) {
     
            do {

               qtyDataReceived = SOCKETS_Recv(xClientSocket, &bufferTcpRx[0], 1024, 0);}

               if (qtyDataReceived > 0) {

                  bufferTcpRxPtr = &bufferTcpRx[0];

                  for (i = 0; i < qtyDataReceived; i++) {

                     *httpPostResponsePtr++ = *bufferTcpRxPtr++;
                     httpPostResponseLength++;
                  }
            }

        } while(qtyDataReceived > 0);
    
        SOCKETS_Shutdown(xClientSocket, SOCKETS_SHUT_RDWR);
      }    
   }
   SOCKETS_Close(xClientSocket);
}

DanG-AWS wrote on August 15, 2019:

Hello, here are a few questions to help debug:

1/ How much data are you trying to receive from the Lambda when you run the code that you posted?

2/ To confirm, the amount of data that the device is sending to the Lambda is the string literal plus some headers?

3/ We have found that some IoT boards are more sensitive to Wi-Fi signal strength than you might expect. An access point optimized for smartphones, laptops, etc., may see problems with these lower-power, smaller antenna devices. Is it possible to eliminate that variable by using, for example, a consumer Wi-Fi router that you can position next to the board?

Thank you.

leandropg wrote on August 16, 2019:

Hello,

Thanks. I send the answers:

1/ How much data are you trying to receive from the Lambda when you run the code that you posted? –> Near to 1000 bytes that included Headers and JSON Data

2/ To confirm, the amount of data that the device is sending to the Lambda is the string literal plus some headers? –> Near to 1000 bytes too

3/ We have found that some IoT boards are more sensitive to Wi-Fi signal strength than you might expect. An access point optimized for smartphones, laptops, etc., may see problems with these lower-power, smaller antenna devices. Is it possible to eliminate that variable by using, for example, a consumer Wi-Fi router that you can position next to the board? –> It is not easy because the IoT nodes are deployed into different locations and many cases does not exist a clean path between the node and the Wifi router

Edited by: leandropg on Aug 16, 2019 12:06 PM

Edited by: leandropg on Aug 16, 2019 12:06 PM

DanG-AWS wrote on August 19, 2019:

Hello, one more question I should have asked before:

4/ Regarding the 3 - 10 second delay that you’re observing, does that include the call to SOCKETS_Connect? The latter is relevant because it involves the TLS protocol negotiation, which is multi-round trip and computationally intensive. In that case, more granular metrics would be of interest:

a. How much time is spent in SOCKETS_Connect?
b. How much time is spent in SOCKETS_Send for 1000 bytes?
c. How much time is spent in SOCKETS_Recv for 1000 bytes?

Thank you.

leandropg wrote on August 20, 2019:

Hello,

  1. Sometimes the call to the DNS Server fails after 20 seconds (DNS Lookup Failed), the sending process fail and I retry again

  2. Sometimes the socket generate a -1002 (SOCKETS_TLS_HANDSHAKE_ERROR) error after 3 seconds in the connection process (SOCKETS_Connect)… I retry again

When the sending works, I obtain this values:

a. How much time is spent in SOCKETS_Connect? --> 3 Seconds
b. How much time is spent in SOCKETS_Send for 1000 bytes? --> Less than 1 Second
c. How much time is spent in SOCKETS_Recv for 1000 bytes? --> 3 Seconds or more

hbouvier-AWS wrote on August 20, 2019:

Hello,

The API SOCKETS_Recv follows the Berkeley socket implementation.
https://github.com/aws/amazon-freertos/blob/c7556dfa81e4479061521510d861c5696578081b/libraries/abstractions/secure_sockets/include/iot_secure_sockets.h#L312
" have reviewed that the function SOCKETS_Recv and it takes much time. I understood in the documentation that this function wait an specific quantity of bytes, and when it does not receive any wait the timeout time before realease the read data, although the data is in the buffer. I am worry with the response time, because we are building a system that must send much data by second to the cloud and it is a serious problem."
That call is blocking until some data comes in not until all data comes in.

Looking at the break down of your time:
+a. How much time is spent in SOCKETS_Connect? --> 3 Seconds + => I am not surprised by this time. Connect does the crypto handshake, it is going to be much slower than with a computer.
+c. How much time is spent in SOCKETS_Recv for 1000 bytes? --> 3 Seconds or more+
If you are getting less than 5s, it means you are not hitting the timeout.
Can you check if you are spinning in that while loop? Receive data byte per byte? Or are you received everything in one go?

leandropg wrote on August 21, 2019:

Hello,

I am sorry to hear that the connection time is correct. I understand that is because TLS protocol but is a long time, adding the problems with DNS issues timeout.

In the data reception, I receive data in packets of 1024 bytes. When the packet is greater I received the first 1024 quickly, but the last data takes until timeout of 5 seconds… In a test I received data in small pieces for example 100 bytes, but the last bytes less than 100 bytes takes until timeout of 5 seconds… even a test with 1 byte… the last byte takes until timeout of 5 seconds…

I tried implement a graceful shutdown how suggest the FreeRTOS documentation (https://www.freertos.org/FreeRTOS-Plus/FreeRTOS_Plus_TCP/TCP_Networking_Tutorial_Receiving_TCP_Data.html) but sometimes works… sometimes does not work… depends of the connection speed… I do not how fix it… I understand that is by the Berkeley implementation… but it is very slow wait for the timeout… A send/receive could take until 10 seconds in many cases…

hbouvier-AWS wrote on August 26, 2019:

Hello,

Inconsistent behavior is weird. I would expect that to be consistent.
If you use an un-encrypted connection, like with a local mosquitto server, is it still very slow?
Could you give us a Wireshark log that shows the issue?

Thank you!

DanielH-aws wrote on September 09, 2019:

Hello,
Were you able to resolve your issue? Please feel free to reopen this request with more information or with a network log. You could try tcpdump to capture packets:

tcpdump -i <your interface> port 8883