I understand your point, but even when they expose source code it’s a maintenance mess to carry modified copies of their code that you have to re-modify every time you update to a newer library version. It would be better to get Espressif to fix their FCC functions directly, so I will file a bug for that and also ask about the default USB driver being non-blocking.
As for FCC certification, they used to have (and still optionally offer) a full code replacement method with a test tool, but that is a binary (no source) hard-coded to work only over a specific UART (not USB) on a pair of GPIO pins that are often used for other purposes on the ESP32-C3. Again, that doesn’t work well for sealed systems. More recently they created API functions that can be included directly in user code, so they work with USB and also let us initiate test commands via BLE, which works with real sealed product units. But they still have the stupid loop with a task yield instead of a suspend/block. The new functions still do what you described: continuous wave (CW) unmodulated tests as well as single-channel packet tests for both WiFi and BLE. Functionally the API is identical to their earlier standalone code, but it is more flexible.
By the way, I looked at why I used polling for the USB case, and it’s because of the following statement about their USB Serial/JTAG Controller Console (I’m a new user who can’t post links, but this is from the Espressif online documentation):
The non-blocking (default) driver and the VFS implementation will flush automatically after a newline. The blocking (interrupt-based) driver will automatically flush when its transmit buffer becomes empty.
While they refer to a blocking interrupt-based driver, I couldn’t find how to initiate that in place of their default for USB. They do describe UART drivers for serial but those don’t support USB.
Nevertheless, I may be able to use fcntl to clear O_NONBLOCK, though that might not be enough given their code. I’ll try it (I may have tried it unsuccessfully before; I don’t remember, as it was more than 2 years ago).
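For reference, clearing O_NONBLOCK with fcntl looks roughly like the sketch below. It uses only standard POSIX calls; the helper name make_fd_blocking is made up for illustration. Whether the USB Serial/JTAG VFS layer actually honors the flag is exactly the open question, so treat this as an experiment rather than a known fix.

```c
#include <fcntl.h>

/* Hypothetical helper: clear O_NONBLOCK on fd so that read() blocks
 * until data arrives. Returns 0 on success, -1 if fcntl fails. */
int make_fd_blocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);      /* read current status flags */
    if (flags == -1) {
        return -1;
    }
    /* Write the flags back with only the O_NONBLOCK bit cleared. */
    if (fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) == -1) {
        return -1;
    }
    return 0;
}
```

On the target this would be called once at startup on the console descriptor (for example the descriptor behind stdin), before the serial read loop starts.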
As for the main loop, it isn’t handling “real-time” requirements in the sense of what low-level hardware code has to deal with, but it does need to be responsive. We want serial and BLE inputs to get a reasonably quick response, while the other messages are really nothing more than delayed responses to hardware requests, like reading a photodiode value. The application call that starts such a request isn’t real-time. The low-level code is the real-time part: it does the hardware I2C or register initiation and has an interrupt for getting the response. Only the response result (a value the application wants) then gets put into a message for the application code, usually in the interrupt routine that receives the response. For most calls it’s really nothing more than a delayed return value. During that delay other activities can occur, though in a restricted way: a BLE request, for example, may get a response telling it to wait, because generally two things can’t be done at the same time (with some exceptions, such as interrupting a current activity for debugging).
So the call to the message handler takes care of those details: waiting for the message with an appropriate timeout, logging warnings/errors, recovery, and even sleep. The paradigm is “low_level_function(…)” followed by “msg_handler(expected_message_type)”. The message handler returns with the expected message content and, along the way, handles any other messages that come in asynchronously, such as from BLE or from the once-a-second clock timer. While it is waiting, it is usually suspended in xTaskNotifyWaitIndexed (except for the serial polling we talked about, and even that suspends one second at a time when there hasn’t been recent serial input). So this works with the RTOS (except for the fast-response serial polling mode).
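To make the paradigm concrete, here is a minimal host-side sketch of the dispatch logic only. It is not my production code: the message types, the small ring queue, and the counters are all invented for illustration, and where the real handler blocks in xTaskNotifyWaitIndexed with a timeout, this sketch simply returns false once the queue is empty.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message types and payload, for illustration only. */
typedef enum { MSG_NONE, MSG_PHOTODIODE, MSG_RTC_TICK, MSG_BLE_CMD } msg_type_t;
typedef struct { msg_type_t type; int32_t value; } msg_t;

#define QUEUE_LEN 8
static msg_t queue[QUEUE_LEN];
static size_t q_head, q_count;
static unsigned rtc_ticks_seen, ble_cmds_seen;

/* Producer side (an ISR or another task in a real system). */
bool msg_post(msg_t m)
{
    if (q_count == QUEUE_LEN) {
        return false;                       /* queue full, message dropped */
    }
    queue[(q_head + q_count) % QUEUE_LEN] = m;
    q_count++;
    return true;
}

static bool msg_take(msg_t *out)
{
    if (q_count == 0) {
        return false;
    }
    *out = queue[q_head];
    q_head = (q_head + 1) % QUEUE_LEN;
    q_count--;
    return true;
}

/* Service messages that arrive while we wait for something else. */
static void dispatch_async(const msg_t *m)
{
    switch (m->type) {
    case MSG_RTC_TICK: rtc_ticks_seen++; break;
    case MSG_BLE_CMD:  ble_cmds_seen++;  break;
    default: break;
    }
}

/* Return the first message of the expected type, dispatching all others.
 * An empty queue stands in for a timeout in this sketch. */
bool msg_handler(msg_type_t expected, msg_t *out)
{
    msg_t m;
    while (msg_take(&m)) {
        if (m.type == expected) {
            *out = m;
            return true;
        }
        dispatch_async(&m);
    }
    return false;
}
```

The application side then reads as a request followed by msg_handler(MSG_PHOTODIODE, &reply), with the RTC tick and BLE messages serviced inside the handler while the reply is pending.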
Espressif has some slow low-level hardware operations, like erasing Flash block by block in a loop, but they were smart enough to call vTaskDelay(1) during the erasure (the “1” is configurable but is the default) so the idle task can run. It would have been better for them to use a suspend call during the Flash write that resumes when the write completes. Still, large Flash erasure is slow enough that 1 tick isn’t terribly inefficient.
As for non-real-time tasks, we have our own timer table (we started on AVR with 8 KB RAM and 128 KB Flash before porting to ESP32). The message handler, on seeing the once-a-second RTC messages, updates the table efficiently (i.e. it updates one variable until there is a trigger or change, and only then updates the full table) and dispatches to the appropriate registered function. Almost all of these functions call low-level code (either ours or Espressif library hardware code). But the task watchdog is not a problem for this at all. In fact, there is absolutely nothing wrong with the task watchdog. The problem is ONLY dealing with 3rd party code that, as you aptly put it, isn’t RTOS-compliant. The ask is to be able to explicitly do what the idle task does as a workaround, whether that’s forcing the idle task to run or calling its functionality directly (freeing self-deleted task memory and feeding the task watchdog timer).
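One way to read “updates one variable until there is a trigger” is sketched below. This is a guess at the general shape and not our actual table: the names, the slot count, and the two sample callbacks are invented. Each tick only increments a seconds counter and compares it with the earliest due time; the full table is scanned only when something is due. Counter wraparound is ignored here.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*timer_fn_t)(void);

/* Hypothetical table entry: period and next due time, in seconds. */
typedef struct { uint32_t period_s; uint32_t due_s; timer_fn_t fn; } timer_entry_t;

#define TIMER_SLOTS 4
static timer_entry_t timers[TIMER_SLOTS];
static uint32_t now_s;                      /* the one variable touched every tick */
static uint32_t next_due_s = UINT32_MAX;    /* earliest due time in the table */

/* Sample callbacks, for illustration only. */
unsigned battery_checks, log_flushes;
void sample_battery_check(void) { battery_checks++; }
void sample_log_flush(void)     { log_flushes++; }

/* Register a periodic function. Returns 0 on success, -1 if invalid or full. */
int timer_register(uint32_t period_s, timer_fn_t fn)
{
    if (period_s == 0 || fn == NULL) {
        return -1;
    }
    for (size_t i = 0; i < TIMER_SLOTS; i++) {
        if (timers[i].fn == NULL) {
            timers[i] = (timer_entry_t){ period_s, now_s + period_s, fn };
            if (timers[i].due_s < next_due_s) {
                next_due_s = timers[i].due_s;
            }
            return 0;
        }
    }
    return -1;
}

/* Called once per RTC message (once a second). */
void timer_on_rtc_tick(void)
{
    now_s++;
    if (now_s < next_due_s) {
        return;                             /* fast path: nothing is due */
    }
    next_due_s = UINT32_MAX;
    for (size_t i = 0; i < TIMER_SLOTS; i++) {
        if (timers[i].fn == NULL) {
            continue;
        }
        if (timers[i].due_s <= now_s) {
            timers[i].fn();                 /* dispatch the registered function */
            timers[i].due_s = now_s + timers[i].period_s;
        }
        if (timers[i].due_s < next_due_s) {
            next_due_s = timers[i].due_s;   /* recompute the earliest due time */
        }
    }
}
```

With a 3-second and a 5-second entry registered, most ticks cost one increment and one compare, and the table scan happens only on the seconds where an entry fires.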
Again, if it weren’t for 3rd party code we wouldn’t be having this discussion, as I’d have everything doing an appropriate suspend, including for serial and for the FCC code, and it would all work well with FreeRTOS. Well, I suppose a developer could have a tight loop that doesn’t suspend during their development/testing, but that’s their problem, not a FreeRTOS one. In my test-like CLI commands for application code, all my loops have some command that calls msg_handler. If I don’t explicitly call a low-level command, I still call msg_handler for an RTC message, or I start a short timer (down to 1 ms) and call msg_handler to wait for it. This is somewhat analogous to doing a yield but is better, since it will generally do a suspend that allows the idle task to run (or will call my hack if the serial workaround or FCC workaround for their library flaws is active). Alternatively, if no msg_handler call is made, I can just call vTaskDelay(1), but that will prevent asynchronous messages from getting serviced.