Releasing Resources on Task Exit/Delete

herlien wrote on Thursday, August 11, 2016:

FreeRTOS v9.0.0 on PIC32MX using MPLabX v3.35 with XC32 v1.42. I.e., latest of everything, as far as I can tell.

I’m porting a data acquisition / instrument controller system from a home-grown OS on a 68332 to FreeRTOS on PIC32MX. Part of what I’m porting is a mechanism to release system resources when a task exits or is deleted. To do so, I have a porting layer that wraps functions for task create/delete, semaphore creation/give/take, malloc/free, etc. It stores information about the resources used in ThreadLocalStorage, and uses that information to:

  1. release resources when the task exits or is deleted. E.g., frees memory, gives back MUTEX sems, etc.
  2. allow the user and/or developer (me) to observe system state, for instrumentation and display purposes.
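
The bookkeeping described above can be sketched on a host machine in plain C. This is only a minimal model of the memory half of such a porting layer, not the poster's code: `task_resources_t`, `tracked_malloc()`, `tracked_free()`, `release_all()` and the fixed limit are all hypothetical names chosen for illustration. In a FreeRTOS port, a pointer to the per-task record could be kept in one of the task's thread local storage slots via vTaskSetThreadLocalStoragePointer() and fetched with pvTaskGetThreadLocalStoragePointer().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_TRACKED_BLOCKS 8   /* hypothetical per-task limit */

/* Hypothetical per-task record of heap blocks owned by one task. */
typedef struct {
    void  *blocks[MAX_TRACKED_BLOCKS];
    size_t count;
} task_resources_t;

/* Allocate memory and remember the block in the owning task's record. */
void *tracked_malloc(task_resources_t *res, size_t size)
{
    assert(res != NULL);
    if (res->count >= MAX_TRACKED_BLOCKS) {
        return NULL;               /* record full: refuse rather than leak */
    }
    void *p = malloc(size);
    if (p != NULL) {
        res->blocks[res->count++] = p;
    }
    return p;
}

/* Free one tracked block. Returns 0 on success, -1 if the block is unknown. */
int tracked_free(task_resources_t *res, void *p)
{
    assert(res != NULL);
    for (size_t i = 0; i < res->count; i++) {
        if (res->blocks[i] == p) {
            free(p);
            /* Fill the hole with the last entry; order does not matter. */
            res->blocks[i] = res->blocks[--res->count];
            return 0;
        }
    }
    return -1;
}

/* Called when the task exits or is deleted: free everything still tracked.
 * Returns the number of blocks that had to be released. */
size_t release_all(task_resources_t *res)
{
    assert(res != NULL);
    size_t released = res->count;
    while (res->count > 0) {
        free(res->blocks[--res->count]);
    }
    return released;
}
```

Because the record only holds memory, `release_all()` can safely run in the context of whichever task performs the delete. As the rest of this thread shows, that is not true for mutexes.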

The malloc/free works fine. Giving back semaphores works when the task voluntarily exits, since I catch that fact and release the semaphores from within that task’s context. The problem is when the task is externally deleted by another task. I fail an assertion at line 3783 in file tasks.c. From inspection, it appears that the assertion is checking that the caller owns the MUTEX. Questions:

  1. Why an assertion? I could perhaps see checking this condition in the code and returning a failure for the xSemaphoreGive() call. But an assertion seems rather extreme.
  2. Can someone please elucidate this situation, and come up with some suggestion for how I may accomplish my goal? That goal is to make sure that the MUTEX is released when the task dies.

I’ve now changed my code to NOT try to release semaphores in this case. From some cursory testing, it appears that the semaphore is released anyway. Can someone please confirm this?

However, in this case (semaphore had been taken by a task that subsequently exits), my system status display routine gets and displays a garbage string when using
pcTaskGetName(xSemaphoreGetMutexHolder(mySemaphore))

Again, could someone verify and advise?

Thank you very much for your help.
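
One plausible explanation for the garbage string, not confirmed in this thread, is that the mutex still records the handle of the deleted task, so pcTaskGetName() ends up reading memory that has been freed or reused. A porting layer that already wraps task create/delete could keep its own table of live tasks and consult it before trusting the holder handle. The host-side sketch below models that idea in plain C; `task_registry_t`, `registry_add()`, `registry_remove()` and `holder_name()` are hypothetical names, and the opaque `const void *` stands in for a TaskHandle_t.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TASKS 8            /* hypothetical table size */

typedef struct {
    const void *handle;        /* stands in for TaskHandle_t */
    char        name[16];      /* private copy of the task name */
} live_task_t;

typedef struct {
    live_task_t entries[MAX_TASKS];
    size_t      count;
} task_registry_t;

/* Record a task when the create wrapper succeeds. Returns 0 or -1. */
int registry_add(task_registry_t *reg, const void *handle, const char *name)
{
    assert(reg != NULL && name != NULL);
    if (handle == NULL || reg->count >= MAX_TASKS) {
        return -1;
    }
    live_task_t *e = &reg->entries[reg->count++];
    e->handle = handle;
    strncpy(e->name, name, sizeof e->name - 1);
    e->name[sizeof e->name - 1] = '\0';
    return 0;
}

/* Forget a task when the delete wrapper runs. */
void registry_remove(task_registry_t *reg, const void *handle)
{
    assert(reg != NULL);
    for (size_t i = 0; i < reg->count; i++) {
        if (reg->entries[i].handle == handle) {
            reg->entries[i] = reg->entries[--reg->count];
            return;
        }
    }
}

/* Name to show for a mutex holder, without dereferencing the handle. */
const char *holder_name(const task_registry_t *reg, const void *holder)
{
    assert(reg != NULL);
    if (holder == NULL) {
        return "(free)";
    }
    for (size_t i = 0; i < reg->count; i++) {
        if (reg->entries[i].handle == holder) {
            return reg->entries[i].name;
        }
    }
    return "(deleted task)";
}
```

In a status display, the value returned by xSemaphoreGetMutexHolder(mySemaphore) would be passed to `holder_name()` instead of directly to pcTaskGetName(). One limitation of this sketch: if a new task happens to be created at the same address as the deleted one, the lookup would report the new task's name.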

rtel wrote on Friday, August 12, 2016:

If a semaphore is held by a task, and that task gets deleted, then there is nothing in the code that will automatically release the semaphore.

Looking at the source code I don’t think there is an easy way around this. The mutex can be reset by passing its handle into xQueueReset(), but then to make it a mutex rather than a queue you would need to call prvInitialiseMutex() too - and that function is not publicly accessible.

Perhaps, if you are 100% sure there are no other tasks blocked on the mutex, it could be deleted then re-created?

herlien wrote on Tuesday, August 16, 2016:

Thank you. In general, there is no way to assure that no other tasks are blocked on the mutex. Indeed, the most common (and useful) scenario is:

  1. User notices that the system appears to be hung
  2. Using inspection routines, user ascertains that many threads are pending on a particular mutex, and determines which thread owns that mutex
  3. User kills the thread that owns the mutex.

In the existing system, killing the mutex owner releases the semaphore, allowing the waiting threads to acquire it and run, in order. Unless I can resolve this dilemma, it appears there will be no recourse other than to reboot the system.

I should add that this typically takes place over a comms link to a system deployed at sea. In some cases, it’s over a cabled link to a system some hundreds or thousands of meters under the sea.

rtel wrote on Wednesday, August 17, 2016:

Yikes. It sounds like you have a deadlock built into your system, which is deployed in an inaccessible place. Could you re-architect the system so the deadlock is avoided in the first place?

herlien wrote on Wednesday, August 17, 2016:

It’s not that, so much. It’s that periodically someone will integrate a new instrument handler that’s not well behaved. Hasn’t happened in a long time, as our instrument suite is relatively stable for most deployment scenarios. But it gives me a warm fuzzy feeling to know that, if a scientist decides to add a new strange instrument, or worse yet, a homegrown instrument where the instrument itself is not stable, then we have mechanisms in place in case things go south. Of course, the optimal solution is to always thoroughly debug and test any new instrument handlers. But the scientists don’t always understand software engineering principles.

richard_damon wrote on Thursday, August 18, 2016:

The issue is that we have tasks, which are in many ways more like threads than processes. On a ‘big’ system, where you have a number of independent processes, each protected from the others and sharing information only via OS-provided connections, it is standard for the OS to keep track of resources allocated to a process and automatically free them when the process terminates.

With tasks, as with threads, there isn’t a strong wall between them, so things are shared in a much more ad hoc manner, and defining ‘ownership’ really can be difficult. It is quite possible for one task to create something and give it to another, and if the thing went away just because the first task died it would cause a lot of trouble. Mutexes are a bit special here, as when a task takes one, it owns the mutex, but FreeRTOS doesn’t keep a central repository listing all the mutexes currently in the system.

I find that, in an environment like FreeRTOS, the ‘random’ aborting of a task is generally a bad idea. If something has gone wrong, you really need to reboot to fix things, as you have no idea what else might be in a ‘bad’ state.

herlien wrote on Thursday, August 18, 2016:

Thank you Richard. I don’t disagree with anything you said. I do have a bit of a problem with one particular implementation decision in FreeRTOS. I believe (and I wrote my original post partially to get confirmation, as I can’t be sure) that somewhere in the software stack executed with xSemaphoreGive(), it uses an assertion to ensure that the caller actually owns the mutex. Given your statements about fluid ownership, I would prefer the OS to simply allow the mutex give to take place. But as I said, I may be misinterpreting what’s happening under the hood.

I also agree that killing a task is generally a bad idea. But when something of this nature occurs, it may be desirable to salvage whatever you can of the deployment (i.e. try to accomplish the scientific goals), and then recover the equipment for post mortem and further testing. Especially since ship time typically costs on the order of $30K/day, with a day each required for deployment and recovery.

tlafleur wrote on Thursday, August 18, 2016:

Hi Bob… It’s been a long time since the Kildall-DRI-CP/M days…
I’ve been using FreeRTOS for over 10 years now in my projects…

tom [at] lafleur (dot) us

richard_damon wrote on Friday, August 19, 2016:

Mutexes check that the giver is the same task that took the mutex. The issue with trying to automatically give any mutex that was taken is that FreeRTOS has no list of mutexes to check to see whether the task holds any of them. My understanding is that there are technical reasons relating to possible priority inheritance that make this check necessary (the task holding the mutex might have had its priority raised if a higher-priority task is waiting on the mutex).
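
To make the interaction between ownership and priority inheritance concrete, here is a deliberately simplified host-side model in plain C. It is not the kernel's implementation: `model_task_t`, `model_mutex_t`, `model_take()` and `model_give()` are hypothetical names, there is no blocking or scheduling, and a give by a non-owner returns failure where, according to this thread, FreeRTOS asserts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a task control block. */
typedef struct {
    int base_priority;   /* priority the task was created with */
    int priority;        /* current priority, possibly inherited */
    int mutexes_held;    /* how many mutexes this task currently owns */
} model_task_t;

/* Hypothetical stand-in for a mutex: it only remembers its holder. */
typedef struct {
    model_task_t *holder;   /* NULL when the mutex is free */
} model_mutex_t;

/* Returns 1 if the caller obtained the mutex, 0 if it is already held. */
int model_take(model_mutex_t *m, model_task_t *caller)
{
    assert(m != NULL && caller != NULL);
    if (m->holder == NULL) {
        m->holder = caller;
        caller->mutexes_held++;
        return 1;
    }
    /* Contended: the holder inherits the waiter's priority if it is higher.
     * A real kernel would also block the caller here. */
    if (caller->priority > m->holder->priority) {
        m->holder->priority = caller->priority;
    }
    return 0;
}

/* Returns 1 on success, 0 if the caller does not own the mutex. */
int model_give(model_mutex_t *m, model_task_t *caller)
{
    assert(m != NULL && caller != NULL);
    if (m->holder != caller) {
        return 0;   /* only the owner has a priority to restore */
    }
    caller->mutexes_held--;
    if (caller->mutexes_held == 0) {
        caller->priority = caller->base_priority;   /* undo inheritance */
    }
    m->holder = NULL;
    return 1;
}
```

In this model, the give has to adjust the giver's own bookkeeping (the held count and the inherited priority). If some other task were allowed to give the mutex, those adjustments would be applied to the wrong task, which is one way to read the reasoning above.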