Problem in 4.3.1 task handling

dansear wrote on Friday, July 27, 2007:

I recently encountered a bug in the handling of tasks.
A consequence of the bug causes ‘pxItemToRemove->pvContainer’
to be NULL in vListRemove(). I can detect the problem by
putting a test for ‘pxItemToRemove->pvContainer == NULL’
in that function.

The source of the bug is subtle, but I will try to describe
the situation. ARM platform. Several tasks exist, including
an idle task. The tasks that cause the problem loop around
doing some work and then call vTaskSuspend(NULL). An interrupt
is used to get the task going again, via xTaskResumeFromISR().
In my case, there are 2 tasks doing this, each has its own
interrupt for resuming.

The ‘work’ that the tasks perform include using a semaphore
to lock/unlock access to a resource.

So the task may be suspended two ways 1) at the vTaskSuspend(NULL),
and 2) waiting for the semaphore. The first does not have an ‘Event’
associated with it, but the second does (involving xPendingReadyList,
and xEventListItem).

The problem appears to be generated if the task is suspended on
the semaphore, with event, but the interrupt triggers a resume.
In my case, the interrupt occurs frequently, so it may happen
during the suspend for the semaphore lock. There is nothing in
xTaskResumeFromISR() that checks if the task is suspended waiting
on an event.

Since the task is suspended, waiting for an event, I believe
the ISR resume should be ignored… the event has not occured.

The following is my solution to accomplish that. In xTaskResumeFromISR()
change:

if( listIS_CONTAINED_WITHIN( &xPendingReadyList, &( pxTCB->xEventListItem ) ) != pdTRUE )

to:

if(( listIS_CONTAINED_WITHIN( &xPendingReadyList, &( pxTCB->xEventListItem ) ) != pdTRUE )
               && (pxTCB->xEventListItem.pvContainer == NULL) )

This has solved my current problem with the NULL pointer getting into
vListRemove(), but I have to wonder if this is correct?

Dan Searles

rtel wrote on Friday, July 27, 2007:

I think this is related to your previous post regarding the indefinite suspension mechanism (a new thing) - where the task is actually held in the suspended list rather than the event list should it be waiting indefinitely for an event.

There is a new release to be made in the next couple of days, so I will fix.

Thanks, and regards.

dspaude wrote on Friday, July 27, 2007:

I think I also have a similar problem as Dan. I have a task that is sending data to the UART and then pausing (waiting on a semaphore) until a character is received from the UART to resume printing. However, the when the semaphore is signaled (from the UART Rx task) the Tx task will not resume. I THINK I explained that correctly, but I haven’t dug into that issue yet since I’ve been resolving my printf()-related aborts.

Darrik

dansear wrote on Friday, July 27, 2007:

I think it may be different as I had that fix in while tracking down
the NULL pointer problem. So my tasks should have been staying in
the suspended list.

A task suspended indefinitely and a task suspended on an event both
have their GenericItem in the suspended list. Only the task waiting
for the event has an EventItem in the PendingReady list. I’m not even
sure what would happen if the GenericItem were in one of the Delayed lists.

rtel wrote on Friday, July 27, 2007:

I think Dans problems relate to the ‘wait indefinitely’ functionality, which was put in quite recently (not quite correctly it seems) following a request from a user.  The problem is probably not related unless you are wanting to wait indefinitely while also using the xTaskResumeFromISR() feature (which incidentally is also a new function following a user request).

Regards.

dansear wrote on Friday, July 27, 2007:

I think I see what you mean. The new part (delay indefinitely) is putting the GenericItem
on the suspended list, where as before, it would have gone on a Delayed list. If it were on
a delayed list, xTaskResumeFromISR() would not have seen it and wouldn’t resume it. So
a proper fix may be a little tricker than what I did. A few more questions are starting
to pop up, like should you be able to xTaskResumeFromISR() a task that has a definite
wait time?

Dan

rtel wrote on Friday, July 27, 2007:

The intention is that ResumeFromISR() is only used to unsuspend a task that was suspended using vTaskSuspend(), and never a task that was blocked with a timeout on a queue/semphore.  Tasks that are blocked on a queue or semaphore should be woken (within an interrupt) using QueueSendFromISR() or SemaphoreGiveFromISR().

The problem has arisen by placing a task that is blocked on a queue/semaphore into the suspended list (where until recently they would never have been), without updating the ResumeFromISR() function to check that the task is indeed in the Suspended state (as a result of a call to TaskSuspend()), and not in the Blocked state (as a result of a call to a queue or semaphore function with infinite block time).

This is definitely an error, but in most cases people have got away from it because their tasks enter either the Suspended or Blocked states, not both.  The scenario you described in the first message of this post is not one I have personally encountered.

I think the solution you suggested works, but I need more time to consider it, as per:

portBASE_TYPE xTaskResumeFromISR( xTaskHandle pxTaskToResume )
{
portBASE_TYPE xYieldRequired = pdFALSE;
tskTCB *pxTCB;

__pxTCB = ( tskTCB * ) pxTaskToResume;

__/* Is the task we are attempting to resume actually suspended? */
__if( listIS_CONTAINED_WITHIN( &xSuspendedTaskList, &( pxTCB->xGenericListItem ) ) != pdFALSE )
__{
____/* Has the task already been resumed from within an ISR? */
____if( listIS_CONTAINED_WITHIN( &xPendingReadyList, &( pxTCB->xEventListItem ) ) != pdTRUE )
____{
______/* Is the task really in the Suspended state, or blocking
______indefinitely on an event? */
______if( pxTCB->xEventListItem.pvContainer == NULL )
______{
________if( uxSchedulerSuspended == ( unsigned portBASE_TYPE ) pdFALSE )
________{
__________xYieldRequired = ( pxTCB->uxPriority >= pxCurrentTCB->uxPriority );
__________vListRemove(  &( pxTCB->xGenericListItem ) );
__________prvAddTaskToReadyQueue( pxTCB );
________}
________else
________{
__________/* We cannot access the delayed or ready lists, so will hold this
__________task pending until the scheduler is resumed, at which point a
__________yield will be preformed if necessary. */
__________vListInsertEnd( ( xList * ) &( xPendingReadyList ), &( pxTCB->xEventListItem ) );
________}
_______}
____}
__}

__return xYieldRequired;
}

The same change is probably required in the TaskResume() function too.

With respect to the other issue you raised I was thinking along the lines of:

portBASE_TYPE xTaskCheckForTimeOut( xTimeOutType *pxTimeOut, portTickType *pxTicksToWait )
{
portBASE_TYPE xReturn;

__#if INCLUDE_vTaskSuspend == 1
____/* The task is waiting indefinitely so had not timed out. */
____if( *pxTicksToWait == portMAX_DELAY )
____{
______xReturn = pdFALSE;
____}
____else
__#endif

__if( ( xNumOfOverflows != pxTimeOut->xOverflowCount ) && ( xTickCount >= pxTimeOut->xTimeOnEntering ) )
__{
____/* The tick count is greater than the time at which vTaskSetTimeout()
____was called, but has also overflowed since vTaskSetTimeOut() was called.
____It must have wrapped all the way around and gone past us again. This
____passed since vTaskSetTimeout() was called. */
____xReturn = pdTRUE;
__}
__else if( ( xTickCount - pxTimeOut->xTimeOnEntering ) < *pxTicksToWait )
__{
____/* Not a genuine timeout. Adjust parameters for time remaining. */
____*pxTicksToWait -= ( xTickCount - pxTimeOut->xTimeOnEntering );
____vTaskSetTimeOutState( pxTimeOut );
____xReturn = pdFALSE;
__}
__else
__{
____xReturn = pdTRUE;
__}

__return xReturn;
}

Although I don’t like using the preprocessor in that way.

Thoughts?

Regards.

dansear wrote on Friday, July 27, 2007:

I see that my solution was wrong… It misses the ERRONEOUS_UNBLOCK case.

I currently don’t see another way that doesn’t add a couple more
preprocessor tests, making it less clear.

Dan

dansear wrote on Friday, July 27, 2007:

Consider moving the "if( xTicksToWait > 0 ) "

test inside xTaskCheckForTimeOut().

Currently it is outside both calls.

Dan

dspaude wrote on Friday, July 27, 2007:

I use portMAX_DELAY when I call xSemaphoreTake() from a task to wait for a character from the UART.

I looked at my sequence and here is what happens:
1) TASK1 becomes active and sends data to UART
2) TASK1 pauses by waiting on a semaphore (xSemaphoreTake())
3) Rx IRQ on UART received
4) UART ISR actually sends the data to TASK1 via xQueueSendFromISR()
5) TASK1 never wakes up (I assume it won’t because it is waiting for the semaphore before it resumes)

So, my problem is that I am using the same task. I’ll have to see what to do about that.

Darrik

rtel wrote on Saturday, July 28, 2007:

Files under SVN have been updated, but not yet fully tested (which will be done prior to release).

Regards.

jwestmoreland wrote on Sunday, July 29, 2007:

Richard,

I’ve added them in the work that I’m doing - of course I need to get the port working - but I’ll let you know if I see anything
else otherwise.

Regards,
John W.