AWS MQTT connection dies after a certain point

Some more info right away: I can publish thousands of app messages without issue, but a PINGREQ kills the connection almost immediately, usually on the second or third one, as mentioned.

On the PINGREQ that doesn't work: the keepalive worker prepares a PINGREQ, sends it, and immediately gets a legitimate connection-closed alert back from AWS. There are no errors in CloudWatch, though, just the disconnect message, which is confusing.

I'm now gathering and comparing the unencrypted and encrypted data sent over the wire in working versus non-working PINGREQs.

** EDIT: looked at pings -

After some more testing I am starting to see what is going on, I think.
I enabled TCP keepalive with a very small window; the problem is not related to that or to the underlying network stack.
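For reference, this is roughly how TCP keepalive gets enabled with a small window; a hypothetical sketch using Python's socket module, not the actual project code, and the probe values shown are illustrative:

```python
import socket

def enable_tcp_keepalive(sock: socket.socket) -> None:
    """Turn on TCP keepalive with an aggressive (small-window) schedule."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux-specific knobs: start probing after 5 s of idle time,
    # probe every 5 s, and drop the connection after 3 missed probes.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 5)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 5)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)
```

With a schedule like that, a dead network path would be detected in about 20 seconds, well inside the 30-second MQTT keepalive, which is why the TCP layer can be ruled out here.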

Here's the key thing: I commented out the section that sends PINGREQ over the network and made the PINGRESP check always pass, and I still got a connection-closed message from the server at the same time, so it's unrelated to the ping message itself.

This is with the MQTT keepalive minimum of 30 seconds in the connection info. If I use 1200 (the max) and send no network traffic, the connection stays alive for a little over 1200 seconds, as you would expect. So it seems AWS takes the client's keepalive setting and enforces it server side, closing the connection if it doesn't receive a PINGREQ (or other traffic) within some window around that value.
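That matches what the MQTT 3.1.1 spec (section 3.1.2.10) allows: the server may drop a client that stays silent for more than 1.5x the negotiated keepalive. A tiny sketch of that deadline, using the 30 s / 1200 s bounds mentioned above:

```python
def server_disconnect_deadline(keepalive_s: float) -> float:
    """Seconds of client silence after which the server may disconnect,
    per the MQTT 3.1.1 spec's 1.5x keepalive grace period."""
    return keepalive_s * 1.5

# With the keepalive bounds from the post (30 s minimum, 1200 s maximum):
server_disconnect_deadline(30)    # 45.0 s
server_disconnect_deadline(1200)  # 1800.0 s
```

A "little over 1200" seconds of idle survival with a 1200 s keepalive is consistent with a grace window like this being enforced server side.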

Now the key thing is that I had HUGE timeouts set on MQTT operations, 15 seconds or so, left over from bench testing, stepping through the cell stack, etc. But the MQTT client worker code that handles the PINGREQ/PINGRESP timer waits the full response timeout to see if it got a PINGRESP, and only then restarts the keepalive timer. So with a 30-second keepalive and 15-second MQTT operation timeouts, the ping cadence drifts outside the window the server enforces…
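The drift described above can be sketched like this; an illustrative model, not the actual client code, assuming the 30 s keepalive and 15 s response timeout from the post:

```python
KEEPALIVE_S = 30          # MQTT keepalive sent in CONNECT
RESPONSE_TIMEOUT_S = 15   # MQTT operation timeout left over from bench testing

def drifting_ping_times(n: int) -> list:
    """Ping send times when the keepalive timer only restarts after the
    full response wait: each cycle is keepalive + response_timeout long."""
    times, t = [], 0
    for _ in range(n):
        t += KEEPALIVE_S          # wait the keepalive interval
        times.append(t)           # send PINGREQ
        t += RESPONSE_TIMEOUT_S   # block for the full response timeout
    return times

def fixed_ping_times(n: int) -> list:
    """Ping send times when pings are scheduled on a fixed cadence,
    independent of how long the response wait takes."""
    return [KEEPALIVE_S * (i + 1) for i in range(n)]

drifting_ping_times(3)  # [30, 75, 120] -> 45 s between pings
fixed_ping_times(3)     # [30, 60, 90]  -> 30 s between pings
```

The drifting version puts 45 seconds between consecutive PINGREQs, which is exactly the 1.5x keepalive limit the server appears to enforce, so any extra latency pushes it over the edge. Scheduling pings on a fixed cadence (or restarting the timer as soon as the PINGREQ is sent) keeps the gap at the keepalive interval.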

It would be nice if AWS could confirm behavior like that while I try more reasonable timeouts on a solid connection!

It appears I ran into this: