Posix port macOS blocking

Until now, we have been using a poor man’s compatibility layer for our simulator, which has some limitations. While moving on to integrating FreeRTOS into the simulator as well, I’ve run into a strange issue with the Posix port.

On Linux, the simulator works absolutely fine, whereas on macOS it blocks soon after boot.

The simulator application comes in two flavours. One is a Qt application used by end-users that renders the device’s LCD screen into a widget. The other is a FOX toolkit application with very limited capabilities (mostly for developers). The main issue is with the Qt application. The FOX toolkit app runs mostly fine, although I have seen it block completely once or twice as well; that is very hard to reproduce.

Now to the issue itself. When the application blocks, it seems that the task that should be running (fetched in lldb via “call prvGetThreadFromTask( xTaskGetCurrentTaskHandle() )”) has all signals masked. If I unmask them via “call (int)vPortEnableInterrupts()”, the application runs fine for a couple more seconds (sometimes tens of seconds) until it blocks again.

As advised in the Posix port documentation, pthread_sigmask() is used to mask all signals right at the beginning of main(), and I checked in lldb that all threads created by Qt and macOS do indeed have the signal mask properly set up.

I’m now running a bit out of options as to how this could be debugged, or how I could find the root cause. One possibility would be to check the sigmask in the idle hook, since the idle task is most of the time the one marked as “running” when it blocks. Placing a breakpoint there might give me some hints as to what caused the task to be resumed with an improper signal mask.

Any pointers as to what could be wrong would help. Thanks a lot in advance!

We had some issues in the past with signals on macOS as well.
After looking briefly at the implementation of signals, I think they are susceptible to race conditions when saving and blocking signals (it was some time ago, so I might be wrong).

We opened a thread on Stack Overflow with a working example of one of the issues here, with no concrete results, so we had to modify the implementation to use condition variables instead of signals for some of the functionality.

Another question comes to my mind: are you using signals in your application or libraries that could interfere with the POSIX port signals?

Thanks a lot for the hint. Regarding your question: no, we don’t use any signals in any of the other threads, at least not that I could spot and certainly not in our code.

By the way: I also checked that SIGALRM is never handled by any thread other than the FreeRTOS tasks that should handle it (those in the running state).

I tried a dirty workaround which seems to work OK-ish, but is definitely not a solution:

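void vApplicationIdleHook( void )
{
  sigset_t set;

  // Query the calling thread's signal mask (a NULL new set leaves it unchanged)
  pthread_sigmask(SIG_BLOCK, NULL, &set);
  if (sigismember(&set, SIGALRM)) {
    // SIGALRM is unexpectedly blocked here: unmask the signals again
    //__asm__ __volatile__("int3");
    vPortEnableInterrupts();
  }

  // Sleep 15 ms
  usleep( 15000 );
}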

Another question that I should have asked in my previous reply:
are you creating raw threads with pthread_create, either directly or through one of the libraries you are using? These need to be masked as well. However, this method leaves a small window for a race condition between the thread starting and its call to sigmask, unless you use a non-standard method where the thread is created with signals already blocked. The best solution is to create all threads before starting the scheduler; a thread pool looks like a good fit for this kind of problem.

I believe some threads are created by macOS through Qt, but I checked them all, and it seems they properly inherit the sigmask set directly in main. In all of them SIGUSR1 and SIGALRM are properly masked as designed. Also, as written earlier, I checked that SIGALRM is never delivered to a thread that does not belong to FreeRTOS (lldb issues a notification that tells which thread the signal has been delivered to).

Of course, I cannot guarantee that the OS has no race condition in the sigmask inheritance, but I believe correct inheritance is part of POSIX compliance.

POSIX doesn’t mention anything about race conditions, just the API; the rest is implementation dependent.
We haven’t observed any anomalies with Linux, nor with WSL; only macOS was acting weird.
So my guess is that if your library is creating threads on the fly, it will at some point hit that race condition. Another point: maybe the library is modifying thread masks?

How are you creating the general signal mask from main?

I don’t think so. I crawled through the Qt source code and could not find anything. Then I placed a breakpoint on pthread_sigmask and sigprocmask, and it stopped only on my calls and on FreeRTOS. I did not go too far with this though, as the Posix port is calling it all the time.

int main(int argc, char *argv[])
{
  // Mask all signals to avoid issues with FreeRTOS
  sigset_t set;
  sigfillset(&set);
  pthread_sigmask(SIG_SETMASK, &set, NULL);
  [.../...]
}

I could see a potential problem here, assuming you are calling vTaskStartScheduler(); in the

[.../...]

which would make the scheduler immune to the needed signals.
Masking all signals is not encouraged, as it would make the process unresponsive to Ctrl+C.

Ok, so you would suggest unmasking the signals again, for example in the thread that is actually starting the FreeRTOS application? I will add this.

If you unmask, all the threads will be unmasked.
What I suggest is to mask before/after calling pthread_create, but that would cause a problem with third-party libraries.
Another way is to call vTaskStartScheduler(); from a new unmasked thread, but you will have to do something with the main thread so it doesn’t exit; maybe use it for some work you were already doing in a thread created with pthread_create.