Commercially available devices set [PSH] only on the last segment when sending 1460 bytes or more, but my device sets [PSH] on every segment.
And some non-commercial IP stacks do so as well.
As FreeRTOS+TCP was written for platforms with limited resources (RAM, flash), some features have been simplified, such as setting the PSH flag: the flag is set in every packet that contains outgoing data.
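To make the difference between the two behaviours concrete, here is a minimal sketch of a sender's flag choice. The helper `tcp_flags_for_segment()` and the `psh_policy_t` enum are hypothetical and are not part of the FreeRTOS+TCP API; only the flag bit values come from the TCP specification.

```c
#include <stddef.h>
#include <stdint.h>

/* TCP header flag bits as defined in RFC 793. */
#define TCP_FLAG_PSH 0x08u
#define TCP_FLAG_ACK 0x10u

/* Hypothetical policy selector, for illustration only. */
typedef enum {
    PSH_EVERY_SEGMENT,     /* simplified behaviour: PSH on every data segment   */
    PSH_LAST_SEGMENT_ONLY  /* PSH only on the segment that empties the buffer  */
} psh_policy_t;

/* Returns the TCP flags for an outgoing segment on an established connection.
 *   payload_len: bytes carried by this segment
 *   bytes_left:  bytes still queued for sending after this segment */
uint8_t tcp_flags_for_segment(psh_policy_t policy, size_t payload_len, size_t bytes_left)
{
    uint8_t flags = TCP_FLAG_ACK; /* ACK is always set once the connection is established */

    if (payload_len == 0u) {
        return flags; /* a pure ACK carries no data, so there is nothing to push */
    }
    if (policy == PSH_EVERY_SEGMENT || bytes_left == 0u) {
        flags = (uint8_t)(flags | TCP_FLAG_PSH);
    }
    return flags;
}
```

For example, queuing 4000 bytes with a segment size of 1460 produces segments of 1460, 1460 and 1080 bytes. With `PSH_EVERY_SEGMENT` all three carry PSH; with `PSH_LAST_SEGMENT_ONLY` only the 1080-byte segment does.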
On the receiving side, this means that every packet can be passed to the application immediately: a call to recv() will unblock and return data.
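A receiver that treats PSH as a "deliver now" hint could look roughly like the sketch below. Some stacks wake the reader on any new data regardless of PSH, so this only models the PSH-driven case. `rx_buffer_t`, `rx_on_segment()` and `RX_BUF_SIZE` are hypothetical names, and the `reader_woken` flag merely stands in for unblocking a task waiting in recv().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define RX_BUF_SIZE 4096u

/* Hypothetical in-order receive buffer; not a FreeRTOS+TCP data structure. */
typedef struct {
    unsigned char data[RX_BUF_SIZE];
    size_t        len;          /* bytes waiting to be read by the application */
    bool          reader_woken; /* stands in for unblocking a task waiting in recv() */
} rx_buffer_t;

/* Stores an in-order segment and decides whether to wake the reader.
 * Returns the number of bytes accepted; data that does not fit is dropped
 * here (a real stack would not have advertised a window that large). */
size_t rx_on_segment(rx_buffer_t *rx, const unsigned char *payload, size_t n, bool psh)
{
    size_t room = RX_BUF_SIZE - rx->len;
    size_t take = (n < room) ? n : room;

    if (take > 0u) {
        memcpy(&rx->data[rx->len], payload, take);
        rx->len += take;
    }

    /* Deliver now if the sender pushed, or if the buffer cannot take more data. */
    if (psh || rx->len == RX_BUF_SIZE) {
        rx->reader_woken = true;
    }
    return take;
}
```

With PSH set on every segment, the reader is woken after each segment; with PSH only on the last one, it is woken once the whole burst has arrived (or the buffer fills).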
You’re saying that the peer replies with an ACK “too early”. That is up to the peer, and I don’t think it has much to do with the PSH flag. When the sender is fast enough and the peer’s receive window (WIN) is big enough, the peer sees no reason to send ACKs in between.
When I send a file from a fast CPU (running +TCP) to my laptop, +TCP may have 12 outstanding packets before it receives an ACK.
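Assuming full-sized segments of 1460 bytes (the size mentioned at the top), the 12 outstanding packets work out as follows:

```latex
12 \times 1460\ \text{bytes} = 17\,520\ \text{bytes} \quad\Rightarrow\quad \mathrm{WIN} \ge 17\,520\ \text{bytes}
```

A sender never has more unacknowledged data in flight than the peer’s advertised window, so this only happens when the laptop advertises a window at least that large.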
RAc wrote:
This may be related to the peer’s advertised small window size (513).
Yes indeed: when the peer has a lot of RX buffer space, it will also send fewer ACKs in between.
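The effect of a small window such as the 513 bytes in the quote can be sketched with two hypothetical helpers (not FreeRTOS+TCP functions). Both are simplified and ignore the congestion window and window scaling.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes the sender may still put on the wire, given the peer's advertised
 * receive window and the bytes already sent but not yet acknowledged.
 * Simplified: ignores the congestion window and window scaling. */
size_t usable_window(size_t advertised_window, size_t bytes_in_flight)
{
    if (bytes_in_flight >= advertised_window) {
        return 0u; /* window is full: wait for an ACK before sending more */
    }
    return advertised_window - bytes_in_flight;
}

/* Lower bound on the number of "fill the window, then wait for an ACK" rounds
 * needed to move `total` bytes when the window never grows beyond
 * `advertised_window`. Returns SIZE_MAX for a zero window (no progress). */
size_t window_rounds(size_t total, size_t advertised_window)
{
    if (advertised_window == 0u) {
        return SIZE_MAX;
    }
    return (total + advertised_window - 1u) / advertised_window; /* ceiling division */
}
```

With a 513-byte window, even one 1460-byte segment’s worth of data needs three such rounds, so the peer has to send an ACK (window update) roughly every 513 bytes. A peer with plenty of RX buffer space lets the sender keep many full segments in flight and needs far fewer ACKs.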
If you like, you can run iperf3 to see the maximum possible speed. Please download:
iperf_config.h
iperf_task_v3_0f.c
‘iperf_config.h’ shows an example of configuring iperf3.
The logging will tell you how to start iperf3 on the Windows side, e.g.:
iperf3 -c 192.168.101.100 --port 5001 --bytes 100M -R
Or, if that is too complex, you can post a PCAP file of any TCP conversation in which a lot of data is being sent, and I will analyse it.