Bug in queue.c?

axos88 wrote on Saturday, March 10, 2012:

Hi!

What happens if a queue is deleted while a task is waiting on it (to write to it, or to receive from it)?

All I see in queue.c is:

void vQueueDelete( xQueueHandle pxQueue )
{
    configASSERT( pxQueue );

    traceQUEUE_DELETE( pxQueue );
    vQueueUnregisterQueue( pxQueue );
    vPortFree( pxQueue->pcHead );
    vPortFree( pxQueue );
}

This does not appear to wake up any tasks waiting on the queue (and have them return an error) before freeing it.

Regards,
  Ákos Vandra

davedoors wrote on Saturday, March 10, 2012:

Don’t delete a queue that is in use. If you do, it is a bug in the application code, not the kernel.

richard_damon wrote on Saturday, March 10, 2012:

FreeRTOS was designed to handle the typical cases very efficiently, and to achieve that goal it has a few basic constraints, including not deleting queues and the like while they are in use, and not deleting tasks that own mutexes. To be honest, I can’t think of a time when I have wanted to delete a queue in my applications (I can think of some places where you might want to, but I don’t do that sort of thing). My first thought is that the issue is that getting all the waiting tasks to return with an error would require delaying the actual deletion of the queue, since those tasks are in the middle of queue functions referencing that queue, and delaying the frees would add overhead to the normal use cases.
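
To illustrate the kind of application-level discipline that implies, here is a rough, untested sketch: make sure no task can still be blocked on the queue before vQueueDelete() is called. The shutdown flag, task, and fixed delay are purely my own illustration, not anything from the kernel:

#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"

static xQueueHandle xDataQueue;
static volatile portBASE_TYPE xShuttingDown = pdFALSE;

/* The consumer blocks with a finite timeout so it can notice the
shutdown flag rather than blocking indefinitely on the queue. */
void vConsumerTask( void *pvParameters )
{
    long lItem;

    ( void ) pvParameters;

    for( ;; )
    {
        if( xQueueReceive( xDataQueue, &lItem, 100 / portTICK_RATE_MS ) == pdPASS )
        {
            /* Process lItem here. */
        }

        if( xShuttingDown != pdFALSE )
        {
            /* Stop referencing the queue before it is freed. */
            vTaskDelete( NULL );
        }
    }
}

void vShutdownExample( void )
{
    xShuttingDown = pdTRUE;

    /* Crude handshake: give the consumer time to see the flag and exit.
    A real design would use a positive acknowledgement, not a delay. */
    vTaskDelay( 200 / portTICK_RATE_MS );

    vQueueDelete( xDataQueue );
    xDataQueue = NULL;
}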

axos88 wrote on Friday, March 16, 2012:

Hi Richard,

Okay, I understand; however, the API documentation lacks a note about this. Actually, I find the API documentation rather thin in general, but that is another issue.

Do you think this could be worked around somehow?

For the scenario:
I am writing a connection-based network stack.
Suppose one task opens a connection to a remote machine and tries to read data from the socket, but it blocks because there is no data pending. I am using semaphores here for the waiting.
Now suppose something goes wrong: the remote end closes the connection, the connection times out, or whatever.
The socket should now go into the closing state, sending the close-connection messages and eventually cleaning up all semaphores and resources used, including the one the task is still waiting on for the read.

And there might be multiple tasks waiting on the socket, so “just setting” the semaphore and indicating via some variable that the connection is closing is not really an option either, because I have no idea how many times it would need to be set, and setting the semaphore does not by itself guarantee that the waiting task gets scheduled and gets out of the kernel function.

Do you have any idea what I could do in such a situation?

Regards,
  Ákos Vandra

richard_damon wrote on Friday, March 16, 2012:

One thing to think about is that even if a queue/semaphore could handle tasks already waiting on it being deleted, how could it handle a task that was just starting a call on the object? In general, it is a BAD idea to have resources “disappear” asynchronously from a task that is using them. If you look at sockets in most systems, they don’t get deleted at random times; instead they enter a terminal state in which all operations on them immediately return an error.

What I would do to handle this sort of case (disclaimer: untested pseudocode, details not dealt with) is have some additional structure for a socket, which probably includes a handle to a synchronization primitive (in your description, a semaphore). The socket keeps track of how many times it is currently being used, so a task, once it has “opened” the socket, can safely access it knowing it will stay around.

When an external event “closes” the socket, the socket does not disappear but goes into a terminal state (probably marked with a flag). When this terminal state is entered, the synchronization primitive is signaled so that a waiting task will wake up. As each task wakes from the wait, it checks the flag to see if the socket has closed; if so, it repeats the signal in case another task is waiting (or heading for a wait), and returns the error condition to the calling task.

Depending on how you want the API set up, either the calling task then calls the release function, telling the socket that it knows it can no longer use it, or the operation that returned the error releases the socket automatically as part of returning the error code. The latter mode has the slight disadvantage that it is harder to look at the code and pair up acquisition and release of the resource. When the last task that had acquired the socket releases it, the clean up and deletion of the primitive can be done safely, because the socket is KNOWN to no longer be in use.
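
To make that a bit more concrete, here is a rough, untested sketch of what I mean using FreeRTOS primitives. All the names (xSocket_t, xSocketAcquire, and so on) are made up for illustration, and corner cases (such as a socket closing while nobody holds it) are not fully dealt with:

#include "FreeRTOS.h"
#include "semphr.h"

/* Illustrative socket wrapper; the names and layout are hypothetical. */
typedef struct
{
    xSemaphoreHandle xDataAvailable;          /* Signalled on data arrival or on close. */
    volatile unsigned portBASE_TYPE uxUsers;  /* Tasks currently holding the socket. */
    volatile portBASE_TYPE xClosed;           /* Terminal state flag. */
} xSocket_t;

/* A task calls this before using the socket; the socket cannot be
freed while the use count is non-zero. */
portBASE_TYPE xSocketAcquire( xSocket_t *pxSock )
{
    portBASE_TYPE xOk = pdFALSE;

    taskENTER_CRITICAL();
    if( pxSock->xClosed == pdFALSE )
    {
        pxSock->uxUsers++;
        xOk = pdTRUE;
    }
    taskEXIT_CRITICAL();

    return xOk;
}

/* Blocking wait for data; returns pdFALSE on timeout or if the socket
has entered its terminal state. */
portBASE_TYPE xSocketWaitForData( xSocket_t *pxSock, portTickType xTimeout )
{
    if( xSemaphoreTake( pxSock->xDataAvailable, xTimeout ) != pdTRUE )
    {
        return pdFALSE; /* Timed out. */
    }

    if( pxSock->xClosed != pdFALSE )
    {
        /* Socket closed: repeat the signal so any other waiter (or a
        task heading for a wait) also wakes, then report the error. */
        xSemaphoreGive( pxSock->xDataAvailable );
        return pdFALSE;
    }

    return pdTRUE; /* Data really is available. */
}

/* Called from the stack when the connection dies: mark the terminal
state and wake one waiter, which re-signals in turn. */
void vSocketMarkClosed( xSocket_t *pxSock )
{
    pxSock->xClosed = pdTRUE;
    xSemaphoreGive( pxSock->xDataAvailable );
}

/* The last task to release a closed socket does the clean up, so
nothing is freed while still in use. */
void vSocketRelease( xSocket_t *pxSock )
{
    portBASE_TYPE xLastUser;

    taskENTER_CRITICAL();
    pxSock->uxUsers--;
    xLastUser = ( ( pxSock->uxUsers == 0 ) && ( pxSock->xClosed != pdFALSE ) ) ? pdTRUE : pdFALSE;
    taskEXIT_CRITICAL();

    if( xLastUser != pdFALSE )
    {
        vSemaphoreDelete( pxSock->xDataAvailable );
        vPortFree( pxSock );
    }
}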

rtel wrote on Friday, March 16, 2012:

The first thing to consider is: if a task is blocked on a semaphore that is deleted, and somehow the deletion unblocks the task, how would the task know the semaphore had been deleted, and what should it do about it?  Currently the task only knows whether or not it was successful in obtaining the semaphore.

When a task unblocks due to a queue/semaphore event, it looks to see if the event that unblocked it is still valid.  If the event is not still valid (maybe another task or interrupt got to the queue first), the task will return to block for whatever is left of its original block time.  I could update the code to first check, as a special case, whether the queue itself had been deleted before the task checks anything in the queue that may no longer exist, and return an error if the queue was no more, but that would add more code and has the potential to be dangerous (without the use of heavy critical sections) because another queue might have been created in its place in the meantime.

Which TCP/IP stack are you using?  I think it would generally not be normal, in small embedded systems anyway, to have a semaphore per task/socket pair like this.  Instead the select() call, which may run in a different context, would have a single mechanism for blocking tasks that call into it (by whatever means), and for unblocking them when select() had an event the task was interested in.  In this scenario select() would, from its own context, unblock the task (maybe just unsuspend it) when an error occurred, let the task know there was an error, then clean up the socket, then yield to the task that was just unblocked.
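
As a very rough, untested illustration of that idea (none of these names come from any real stack), events, including errors, could be delivered through a queue whose lifetime is tied to the waiting task rather than to the socket, so nothing the task blocks on ever gets deleted underneath it:

#include "FreeRTOS.h"
#include "queue.h"

/* Hypothetical event record posted by the stack; not a real API. */
typedef struct
{
    void *pvSocket;        /* The socket the event concerns. */
    portBASE_TYPE xError;  /* pdTRUE if the socket failed or closed. */
} xSocketEvent_t;

/* The event queue belongs to the waiting task, not to any socket, so
socket clean up never deletes anything a task is blocked on. */
portBASE_TYPE xSocketSelect( xQueueHandle xEvents, void **ppvSocket, portTickType xTimeout )
{
    xSocketEvent_t xEvent;

    if( xQueueReceive( xEvents, &xEvent, xTimeout ) != pdPASS )
    {
        return pdFALSE; /* Nothing happened within the timeout. */
    }

    *ppvSocket = xEvent.pvSocket;

    /* On error the stack posts the event first and cleans the socket up
    afterwards, so the caller must treat the handle as dead from here. */
    return ( xEvent.xError == pdFALSE ) ? pdTRUE : pdFALSE;
}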

Regards.