Crash to hard fault handler after sending data to a queue that a task is blocked on

Hello everybody,
I am currently experiencing a weird bug.
I have a queue (one of a bunch of queues) of pointers that I use to transfer data between tasks. When data is first sent to the queue, the receiving task is not yet running, so no task is blocked reading from that queue. The first transmission works fine. After that, the receiving task runs, reads from the queue, processes the data and attempts to read from the queue again. As the queue is now empty, it blocks. When the sending task runs again, it attempts to send data to the queue once more. This time it ends up in the HardFault handler. I was able to pinpoint the error with the debugger: it occurs when executing the line listREMOVE_ITEM( &( pxUnblockedTCB->xEventListItem ) ); in xTaskRemoveFromEventList in tasks.c.

BaseType_t xTaskRemoveFromEventList( const List_t * const pxEventList )
{
    TCB_t * pxUnblockedTCB;
    BaseType_t xReturn;

    /* THIS FUNCTION MUST BE CALLED FROM A CRITICAL SECTION.  It can also be
     * called from a critical section within an ISR. */

    /* The event list is sorted in priority order, so the first in the list can
     * be removed as it is known to be the highest priority.  Remove the TCB from
     * the delayed list, and add it to the ready list.
     *
     * If an event is for a queue that is locked then this function will never
     * get called - the lock count on the queue will get modified instead.  This
     * means exclusive access to the event list is guaranteed here.
     *
     * This function assumes that a check has already been made to ensure that
     * pxEventList is not empty. */
    pxUnblockedTCB = listGET_OWNER_OF_HEAD_ENTRY( pxEventList ); /*lint !e9079 void * is used as this macro is used with timers and co-routines too.  Alignment is known to be fine as the type of the pointer stored and retrieved is the same. */
    configASSERT( pxUnblockedTCB );
    listREMOVE_ITEM( &( pxUnblockedTCB->xEventListItem ) );

    if( uxSchedulerSuspended == ( UBaseType_t ) pdFALSE )
    {
        listREMOVE_ITEM( &( pxUnblockedTCB->xStateListItem ) );
        prvAddTaskToReadyList( pxUnblockedTCB );

        #if ( configUSE_TICKLESS_IDLE != 0 )
            {
                /* If a task is blocked on a kernel object then xNextTaskUnblockTime
                 * might be set to the blocked task's time out time.  If the task is
                 * unblocked for a reason other than a timeout xNextTaskUnblockTime is
                 * normally left unchanged, because it is automatically reset to a new
                 * value when the tick count equals xNextTaskUnblockTime.  However if
                 * tickless idling is used it might be more important to enter sleep mode
                 * at the earliest possible time - so reset xNextTaskUnblockTime here to
                 * ensure it is updated at the earliest possible time. */
                prvResetNextTaskUnblockTime();
            }
        #endif
    }
    else
    {
        /* The delayed and ready lists cannot be accessed, so hold this task
         * pending until the scheduler is resumed. */
        listINSERT_END( &( xPendingReadyList ), &( pxUnblockedTCB->xEventListItem ) );
    }

    if( pxUnblockedTCB->uxPriority > pxCurrentTCB->uxPriority )
    {
        /* Return true if the task removed from the event list has a higher
         * priority than the calling task.  This allows the calling task to know if
         * it should force a context switch now. */
        xReturn = pdTRUE;

        /* Mark that a yield is pending in case the user is not using the
         * "xHigherPriorityTaskWoken" parameter to an ISR safe FreeRTOS function. */
        xYieldPending = pdTRUE;
    }
    else
    {
        xReturn = pdFALSE;
    }

    return xReturn;
}
/*-----------------------------------------------------------*/

I am using an STM32F4 (ARM Cortex-M4) MCU.
Researching the cause of the HardFault exception leads to the PRECISERR flag in the CFSR register. The location of the executed code before the error is listed in the BFAR register, but my debugger is not happy reading the asm at that point. I may give it another try tomorrow.
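
For reference, the bus-fault bits of the CFSR can be decoded with a small helper like the sketch below. The bit positions follow the ARMv7-M layout (the BusFault status byte occupies bits 8 to 15 of the CFSR at 0xE000ED28); the helper names DescribeBusFault and BfarHoldsFaultAddress are hypothetical and not part of CMSIS or FreeRTOS. Note that when PRECISERR and BFARVALID are both set, BFAR holds the data address whose access faulted, while the faulting instruction is found via the stacked PC.

```cpp
#include <cstdint>
#include <string>

// Bus-fault bits of the Cortex-M3/M4 Configurable Fault Status Register.
constexpr std::uint32_t kIbusErr     = 1u << 8;   // instruction bus error
constexpr std::uint32_t kPrecisErr   = 1u << 9;   // precise data bus error
constexpr std::uint32_t kImprecisErr = 1u << 10;  // imprecise data bus error
constexpr std::uint32_t kUnstkErr    = 1u << 11;  // fault on exception return unstacking
constexpr std::uint32_t kStkErr      = 1u << 12;  // fault on exception entry stacking
constexpr std::uint32_t kBfarValid   = 1u << 15;  // BFAR holds a valid address

// Hypothetical helper: lists the names of the bus-fault flags set in a
// CFSR value that was read out with the debugger or in the fault handler.
std::string DescribeBusFault(std::uint32_t cfsr) {
    std::string out;
    auto add = [&](std::uint32_t mask, const char* name) {
        if (cfsr & mask) {
            if (!out.empty()) out += ' ';
            out += name;
        }
    };
    add(kIbusErr, "IBUSERR");
    add(kPrecisErr, "PRECISERR");
    add(kImprecisErr, "IMPRECISERR");
    add(kUnstkErr, "UNSTKERR");
    add(kStkErr, "STKERR");
    add(kBfarValid, "BFARVALID");
    return out;
}

// BFAR is only meaningful for a precise fault with BFARVALID set.
bool BfarHoldsFaultAddress(std::uint32_t cfsr) {
    return (cfsr & kPrecisErr) != 0 && (cfsr & kBfarValid) != 0;
}
```

With a CFSR value of 0x00008200, for example, both PRECISERR and BFARVALID are set, so BFAR can be trusted as the address of the bad access.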

I checked the usual interrupt priority configuration of the ARM Cortex port; everything is fine there. configASSERT is also set up correctly.

What could be the cause of the problem?

Thank you everybody!

It is a bit hard to guess just from your description. Can you share the relevant snippet of your application code?

Thanks.

Of course!
Everything is wrapped in C++ classes, but I will try to explain.

This is the method that sends the data to the queue. Even though pointers to stack variables are sent, this should be alright because it is ensured that the stack scope is not left while the data is read, and the reading task does not write to the memory location.
display_controller_queue_.Send() sends data to the queue (code is below).
WaitForTouchRelease() blocks this task until a touch event is recognised, letting other tasks execute at this point.

TouchController::Error TouchController::Calibrate() {
	TouchCalibrationContainer send_start_calibration_container;
	TouchCalibrationContainer send_1print_calibration_point_container;
	TouchCalibrationContainer send_2print_calibration_point_container;
	TouchPosition pressed_position;

	initialization_sequence_running_ = true;

	send_start_calibration_container.command_ = TouchCalibrationContainer::Command::kStartCalibration;
	display_controller_queue_.Send(&send_start_calibration_container, 10);
	send_1print_calibration_point_container.marker_x = first_calibration_marker_x_;
	send_1print_calibration_point_container.marker_y = first_calibration_marker_y_;
	send_1print_calibration_point_container.command_ = TouchCalibrationContainer::Command::kPrintCalibrationPoint;
	display_controller_queue_.Send(&send_1print_calibration_point_container, 10);


	pressed_position = WaitForTouchRelease();
	touch_driver_.SetCalibrationXOffset(first_calibration_marker_x_-pressed_position.x);
	touch_driver_.SetCalibrationYOffset(first_calibration_marker_y_-pressed_position.y);

	send_2print_calibration_point_container.marker_x = second_calibration_marker_x_;
	send_2print_calibration_point_container.marker_y = second_calibration_marker_y_;
	send_2print_calibration_point_container.command_ = TouchCalibrationContainer::Command::kPrintCalibrationPoint;
	display_controller_queue_.Send(&send_2print_calibration_point_container, 10);

	pressed_position = WaitForTouchRelease();

	return Error::kOk;
}

This is the display_controller_queue_.Send() method that writes to the queue:

IQueue::Error FreeRTOSQueue::Send(IContainer *container,
		std::uint32_t timeout) {

	if (!is_created_)
		return Error::kNotCreated;
	else {
		BaseType_t error_code;
		error_code = xQueueSendToBack(queue_handle_, &container,
				timeout/portTICK_PERIOD_MS);
		if (error_code == pdTRUE)
			return (Error::kOk);
		else if (error_code == errQUEUE_FULL)
			return (Error::kTimeout);
		else
			return (Error::kSendError);
	}
}

The task reading from the queue calls display_controller_queue_.Receive(fetched_container, 0x0fffffffUL), where fetched_container is the read buffer. The method looks like this:

IQueue::Error FreeRTOSQueue::Receive(IContainer *&container,
		std::uint32_t timeout) {

	if (!is_created_)
		return Error::kNotCreated;
	else {
		BaseType_t error_code;
		error_code = xQueueReceive(queue_handle_, &container,
				timeout / portTICK_PERIOD_MS);
		if (error_code == pdTRUE)
			return (Error::kOk);
		else
			return (Error::kReceiveError);
	}
}

All this has definitely worked before…

Even though the code seemed to be working, it’s absolutely possible that there was or is a bug ;-)
So the receiving task confirms the data by sending back a message or something?
You need some life cycle management for the stack-allocated data.
Besides having configASSERT in place, did you also enable stack overflow checking?
BTW, you don’t check for send errors. Maybe you should?

A typical pitfall is passing pointers to stack-resident objects through a queue. By the time the receiver dereferences the pointer, the sender’s stack may already have been reused, so “garbage” gets decoded. Could that have happened here?
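
One way to sidestep the lifetime question is to queue the message itself rather than a pointer to it, since a copy-by-value queue keeps its own copy of every item. Below is a minimal host-side sketch of that idea. CopyQueue is a hypothetical stand-in written for illustration and is not the FreeRTOS API, and CalibrationMessage with its fields is only loosely modeled on the container type from this thread.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical message type; the field names are illustrative only.
struct CalibrationMessage {
    std::uint8_t command;
    std::uint16_t marker_x;
    std::uint16_t marker_y;
};

// Host-side stand-in for a fixed-capacity queue that copies items by value.
template <typename T, std::size_t N>
class CopyQueue {
public:
    // Copies the item into the queue; returns false when the queue is full.
    bool Send(const T& item) {
        if (count_ == N) return false;
        slots_[(head_ + count_) % N] = item;
        ++count_;
        return true;
    }

    // Copies the oldest item into 'out'; returns false when the queue is empty.
    bool Receive(T& out) {
        if (count_ == 0) return false;
        out = slots_[head_];
        head_ = (head_ + 1) % N;
        --count_;
        return true;
    }

private:
    std::array<T, N> slots_{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};

// The local 'msg' dies when this function returns, but the queue already
// holds a copy, so the receiver never touches the sender's stack.
// The result is passed on so the caller can react to a full queue.
template <std::size_t N>
bool SendCalibrationPoint(CopyQueue<CalibrationMessage, N>& queue,
                          std::uint16_t x, std::uint16_t y) {
    CalibrationMessage msg{1, x, y};
    return queue.Send(msg);
}
```

The trade-off is copy cost: for large payloads, passing pointers to objects from a pool with explicit ownership hand-over may be the better fit.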

Thanks for both of your answers! @hs2, the error really was a stack overflow! (I had even implemented the stack overflow handler, but got the checking wrong…)

I didn’t connect the error while sending to a queue with a stack overflow of the calling task, but everything works fine now. Thank you very much!
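
For anyone landing here with the same symptom: FreeRTOS can help catch this through configCHECK_FOR_STACK_OVERFLOW with vApplicationStackOverflowHook(), and uxTaskGetStackHighWaterMark() reports how close a task has come to its limit. Both rely on the task stack being pre-filled with a known byte pattern. The sketch below illustrates that fill-pattern idea on a plain host-side buffer. It is not the kernel's implementation; the function names are hypothetical and a descending stack is assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Same value as FreeRTOS's tskSTACK_FILL_BYTE.
constexpr std::uint8_t kFillByte = 0xA5;

// Paint the whole stack region with the pattern before the task starts.
void FillStack(std::uint8_t* stack, std::size_t size) {
    std::memset(stack, kFillByte, size);
}

// A descending stack grows from the highest address towards index 0, so the
// bytes at the low end are overwritten last. Counting intact pattern bytes
// from index 0 gives the smallest amount of free stack seen so far.
std::size_t UnusedBytes(const std::uint8_t* stack, std::size_t size) {
    std::size_t n = 0;
    while (n < size && stack[n] == kFillByte) ++n;
    return n;
}

// Treat the stack as suspect once fewer than 'guard' pattern bytes survive.
bool LooksOverflowed(const std::uint8_t* stack, std::size_t size,
                     std::size_t guard) {
    return UnusedBytes(stack, size) < guard;
}
```

A check like this only detects an overflow after the fact, so memory below the stack (for example another task's TCB and its list items) may already be corrupted by the time it fires, which matches a crash showing up somewhere as unrelated as a queue send.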