HardFault in STM32F103 because of array class member in FreeRTOS C++ wrapper

ancord wrote on Thursday, November 28, 2019:

I have been struggling with a mysterious problem for three weeks.
I will begin with a general description and then dive into the details. The MCU is an STM32F103RFT6; HAL and a C++ wrapper for FreeRTOS 10.2 are in use.
So, I have only one task running (not counting the idle task). A continuous, endless stream of bytes arrives at the UART: 72 bytes every 10 ms. Inside the task I periodically arm DMA to receive these 72 bytes and block the task twice using ulTaskNotifyTake(). There are RxHalfComplete and RxFullComplete callback functions, which the DMA IRQ handler calls when reception is half or fully finished. These callbacks contain nothing but vTaskNotifyGiveFromISR and portYIELD_FROM_ISR calls.
When arming DMA, a buffer must be provided. The buffer can be declared as a stack variable inside the task’s body or as a private member variable of a class. What perplexes me is that when the buffer is the stack variable, everything is OK. If it is a member of the class, a HardFault occurs within several seconds.

Now, the necessary pieces of code.
As I said, the native FreeRTOS calls are wrapped in C++ classes. Here is what the wrapper for task creation looks like:

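#include "FreeRTOS.h"
#include "task.h"

class Thread
{
public:
    Thread(const char* const threadName, uint16_t stackDepth, UBaseType_t threadPriority)
    {
        // 'this' is passed as the task parameter so TaskAdapter can dispatch to run()
        BaseType_t result = xTaskCreate(
            TaskAdapter,
            threadName,
            stackDepth,
            this,
            threadPriority,
            &handle);
        (void)result; // the creation result is not checked here
    }

    TaskHandle_t getTaskHandle(void)
    {
        return handle;
    }

protected:
    virtual void run(void) = 0;

private:
    TaskHandle_t handle = nullptr;

    static void TaskAdapter(void *pvParameters)
    {
        Thread *task = static_cast<Thread *>(pvParameters);
        task->run();
        // a task function must never return, so delete the task if run() exits
        #if (INCLUDE_vTaskDelete == 1)
            vTaskDelete(task->handle);
        #endif
    }
};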

It is inherited by another class which implements desired behaviour:

class DataGrabberReceiver : public Thread
{
private:
    uint8_t classMemberByteBuffer[72];

public:
    DataGrabberReceiver(const char* const threadName, uint16_t stackDepth, UBaseType_t threadPriority)
        : Thread(threadName, stackDepth, threadPriority)
    {
        // no-op here
    }

    void run(void) override
    {
        uint8_t stackAllocatedByteBuffer[72];
        QueueHandle_t queue = xQueueCreate(72, sizeof(uint8_t));

        while (true)
        {
            // with this if-statement everything works
            if (HAL_OK == HAL_UART_Receive_DMA(&huart_DataGrabber, stackAllocatedByteBuffer, 72))
            // and with this one a HardFault comes within seconds of start
            // if (HAL_OK == HAL_UART_Receive_DMA(&huart_DataGrabber, classMemberByteBuffer, 72))
            // only one of the two if-statements above can be uncommented
            {
                // wait for the first half to be received into the buffer
                ulTaskNotifyTake(pdTRUE, portMAX_DELAY);
                // and copy it into the queue
                for (int i = 0; i < 36; ++i)
                {
                    xQueueSendToBack(queue, &stackAllocatedByteBuffer[i], 1);
                    // xQueueSendToBack(queue, &classMemberByteBuffer[i], 1);
                }
                // wait for the second half to be received into the buffer
                ulTaskNotifyTake(pdTRUE, portMAX_DELAY);
                // and copy the rest
                for (int i = 36; i < 72; ++i)
                {
                    xQueueSendToBack(queue, &stackAllocatedByteBuffer[i], 1);
                    // xQueueSendToBack(queue, &classMemberByteBuffer[i], 1);
                }
            }
        }
    }
};

The overridden Rx callbacks:

void HAL_UART_RxHalfCpltCallback(UART_HandleTypeDef *huart)
{
	// compare peripheral instances (Instance is a pointer, huart_DataGrabber is a handle object)
	if(huart->Instance == huart_DataGrabber.Instance)
	{
		BaseType_t px = pdFALSE;
		vTaskNotifyGiveFromISR(dataGrabber_handle, &px);
		portYIELD_FROM_ISR(px);
	}
}

void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart)
{
	if(huart->Instance == huart_DataGrabber.Instance)
	{
		BaseType_t px = pdFALSE;
		vTaskNotifyGiveFromISR(dataGrabber_handle, &px);
		portYIELD_FROM_ISR(px);
	}
}

The code in main.cpp:

#include "stm32f1xx_hal.h"
#include "DataGrabberReceiver.h"

UART_HandleTypeDef huart_DataGrabber;
TaskHandle_t dataGrabber_handle;

int main(void)
{
	HAL_Init();
	SystemClock_Config();
	MX_GPIO_Init();
	MX_DMA_Init();
	MX_USART2_UART_Init();
    
	auto receiverThread = DataGrabberReceiver("DataGrabberReceiver", 128, 3);
	dataGrabber_handle = receiverThread.getTaskHandle();

	vTaskStartScheduler();

	/* We should never get here */
	while(1) {  }
}

Everything is quite obvious there, and most of it is generated by ST CubeMX. So the task is created, the scheduler is started and… everything works fine if I work with stackAllocatedByteBuffer[] inside the while loop of DataGrabberReceiver::run. And I get a HardFault if I work with classMemberByteBuffer[].

Let me answer some questions:
0. The task’s stack was set to different sizes: 128 words, 1024 words… all the same.
1. Yes, the NVIC interrupt priorities of DMA and UART are logically lower than configMAX_SYSCALL_INTERRUPT_PRIORITY.
2. If one makes classMemberByteBuffer static or global, everything works fine.
3. The heap size in the FreeRTOS config file is set to 30 KB.
4. I am using heap4.c and have tried to move it to another location, manually defining the ucHeap array and asking the linker to put it at a certain RAM position. It did not help.
5. If one drops the wrapper, everything seems to work. But I need runtime polymorphism for interfaces, and encapsulation. C function pointers are not an option.
6. When in the HardFault, the stack trace (I am using a Segger J-Link via SWD) looks like this:
Thread #1 57005 (Suspended : Signal : SIGTRAP:Trace/breakpoint trap)	
	HardFault_Handler() at stm32f1xx_it.c:78 0x8003a20	
	<signal handler called>() at 0xfffffff1	
	uxListRemove() at list.c:218 0x8004748	
	xTaskIncrementTick() at tasks.c:2,571 0x800565e	
	xPortSysTickHandler() at port.c:445 0x8004534	
	osSystickHandler() at cmsis_os.c:1,415 0x8004624	
	SysTick_Handler() at stm32f1xx_it.c:166 0x8003a4a	
	<signal handler called>() at 0xfffffffd	
	prvPortStartFirstTask() at port.c:270 0x8004360	
	xPortStartScheduler() at port.c:350 0x8004402	

The faulting location varies from run to run, which makes one think that something is wrong with the kernel, which I believe is impossible.
7. Class member variables are stored somewhere in the RAM heap, while the task’s stack variables are stored in FreeRTOS’s heap. They are separate in RAM. I don’t think that some other (non-FreeRTOS) stack overlaps the RAM heap and corrupts memory.

Can anyone explain this, or has anyone faced such “magic”? I am almost sure that memory is being corrupted somehow, but there is no reason for race conditions because the byte buffer is guarded. But HOW?? Please help me!

richard_damon wrote on Thursday, November 28, 2019:

I don’t see the code for where you actually create the object of type DataGrabberReceiver, and that may be where the issue is. Your fault looks like memory corruption, and that can be hard to find.

ancord wrote on Thursday, November 28, 2019:

I have added the main code, where DataGrabberReceiver is created. I have come across the opinion somewhere that a task could be created inside an object and start running before the object’s construction finishes. But in my case that is impossible, because the task is created before the scheduler is started.

richard_damon wrote on Thursday, November 28, 2019:

You are creating the object as an object on the main function stack, and thus the buffer is also on that stack. The main function stack is reused by FreeRTOS for the ISR stack, so you are getting data overwrites as DMA is writing ‘randomly’ into the ISR stack.
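
A minimal sketch of one way around this, assuming the object only needs to stop living in main()’s stack frame: give it static storage duration, so the buffer is placed in .bss/.data instead of on the main stack. ReceiverSketch and receiverInstance() are hypothetical names used only for illustration; the real class would derive from Thread and create the task in its constructor.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for DataGrabberReceiver: only the DMA buffer matters here.
// The real class would derive from Thread and call xTaskCreate() in its constructor.
class ReceiverSketch {
public:
    static constexpr std::size_t kBufferSize = 72;
    uint8_t* dmaBuffer() { return classMemberByteBuffer; }

private:
    uint8_t classMemberByteBuffer[kBufferSize] = {};
};

// Returns a receiver with static storage duration. It is constructed on the
// first call and is not part of any stack frame, so rewinding the main stack
// when the scheduler starts does not affect it.
ReceiverSketch& receiverInstance() {
    static ReceiverSketch instance;
    return instance;
}
```

Every call to receiverInstance() returns the same object, so main() could fetch the task handle from it before calling vTaskStartScheduler(). Declaring the object at namespace scope, which the original poster reports as working, keeps it off the main stack in the same way.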

ancord wrote on Thursday, November 28, 2019:

Oh… that’s simply incredible!! IT WORKS!!! Richard, God bless you! I had almost given up. Three weeks of being stuck… And now I can move on. On the other hand, it is so obvious with this memory overlap… I felt that something like this was happening, but forgot that main also has a stack…

richard_damon wrote on Thursday, November 28, 2019:

That issue has caught a number of people. It generally makes sense from a memory conservation point of view: after main has started the scheduler, it will never get control back, so those variables can only be accessed if pointers to them have been passed elsewhere. It may be worthy of a more prominent warning in the documentation, but I am not sure where it would go; maybe in the documentation for vTaskStartScheduler().

The other option (I would need to see how hard it would be to implement) would be to set the ISR stack pointer to the CURRENT main stack pointer rather than to the beginning of its stack. That would lose a bit of memory for what has been used in the main stack, but would get around this problem.

ancord wrote on Thursday, November 28, 2019:

Can you please tell me more about the ISR stack? I have read plenty of documentation but have not come across this yet.

richard_damon wrote on Thursday, November 28, 2019:

The Arm Cortex M processors (which the STM32 series use) have two separate stacks, one for normal program usage, and a separate one for ISR usage, so that when an interrupt occurs the handler runs on the ISR stack rather than on the program stack (the hardware saves only a small register frame on the interrupted stack), and when the ISR ends, it releases what it used on the ISR stack and returns to what it interrupted. The big advantage of this is that if you need X words of stack for your ISRs, you can allocate that once, rather than needing to add it to every task that you might have. Basically it is a hardware optimization to save memory for multitask programs.
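
The saving can be put into numbers with a small sketch. The function names and the figures in the note below are made up for illustration and are not taken from any real project.

```cpp
#include <cstddef>

// Total stack RAM (in words) if every task had to reserve room for the ISRs.
constexpr std::size_t stackWordsPerTaskIsr(std::size_t tasks,
                                           std::size_t taskWords,
                                           std::size_t isrWords) {
    return tasks * (taskWords + isrWords);
}

// Total stack RAM (in words) with a single ISR stack shared by all tasks.
constexpr std::size_t stackWordsSharedIsr(std::size_t tasks,
                                          std::size_t taskWords,
                                          std::size_t isrWords) {
    return tasks * taskWords + isrWords;
}
```

With, say, 5 tasks of 128 words each and 64 words needed by ISRs, the first layout costs 960 words and the second 704, a saving of (5 - 1) * 64 = 256 words.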

xz8987f wrote on Thursday, November 28, 2019:

after main has started the scheduler, it will never get control back,
That’s actually not true for the case when the scheduler is ended (which I do in some cases, see https://mcuoneclipse.com/2019/01/20/freertos-how-to-end-and-restart-the-scheduler/).
That’s why I have added an option/configuration macro to the FreeRTOS port to have this stack reset disabled.
I think it is not a good idea in general to reset the MSP stack: if the scheduler gets started from main(), then usually there should not be much on the stack anyway. Or it should be turned on on demand (and not by default). My 1 cent.

rtel wrote on Thursday, November 28, 2019:

The first Cortex-M parts FreeRTOS was ported to had 8K bytes of RAM - in
which case recovering as much RAM as possible was almost essential. The
RISC-V port, which is much newer, gives you the option - you can
provide a #define to statically allocate an interrupt stack, or, if the
#define is not provided, you must provide a linker variable that marks
the top of the stack used by main() so that it can be re-used.

We could add a similar scheme into the Cortex-M ports, but the default
behaviour will never change, there are just too many end user projects
that could break as the RAM consumption would go up, and too many users
who are now familiar with the way it works today.

richard_damon wrote on Thursday, November 28, 2019:

Well, in your case you have essentially created a new alternate port for the processor to add the ability to stop (and later resume) the scheduler, so port-specific details might not apply.

I suspect that there may be some applications that do some significant work pre-scheduler and might use measurable stack (maybe reading some configuration data that controls the startup), so the reset might make sense. That case could also be handled by moving that code into a function called by main, which returns and then the scheduler is started.

ancord wrote on Friday, November 29, 2019:

Richard, are you talking about this?
https://stackoverflow.com/questions/35179883/does-isr-interrupt-service-routine-have-a-separate-stack
You said:

The main function stack is reused by FreeRTOS for the ISR stack, so you are getting data overwrites as DMA is writing ‘randomly’ into the ISR stack.

And

The Arm Cortex M processors (which the STM32 series use) have two separate stacks, one for normal program usage, and a separate one for ISR usage

Were you talking about the Main Stack Pointer (which is for ISR usage) and the Process Stack Pointer (which is “for normal program usage”)? I am trying to understand how the stack of the main function (an ordinary function stack, where some of DataGrabberReceiver’s member variables were located) can overlap with the stack for ISR usage. You mentioned a hardware optimization. Is that the reason?

ancord wrote on Friday, November 29, 2019:

I’ve carried out a small experiment: I created a local int variable var1 inside the main function and another, var2, inside HAL_UART_RxHalfCpltCallback. The debugger showed their addresses: var1 was at something like 0x20017FAC and var2 at 0x20017F9C. Pretty close to each other, and to the end of RAM. The STM32F103RFT6 has 96 kB, so the RAM end is 0x20018000. So var1 and var2 both belong to the stack.
If the task is created as a local object inside the main function, its private byteBuffer member also ends up on the stack; its address is 0x20017EF4.
And if the task is created as a global object outside the main function, the private byteBuffer member’s address is 0x2000AE00. It is right next to FreeRTOS’s ucHeap. So that is where the MCU’s heap is.
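
The same check can be sketched in code. The RAM end is the value worked out above; the 0x400-byte main stack size is only an assumption (the real figure should come from the linker script), and inMainStackRegion() is a hypothetical helper name.

```cpp
#include <cstdint>

// RAM end as computed in the post: 96 kB starting at 0x20000000.
constexpr uint32_t kRamEnd = 0x20018000u;
// Assumed main stack size; take the real value from the linker script.
constexpr uint32_t kAssumedMainStackBytes = 0x400u;

// True if the address lies in the region assumed to hold the main (MSP) stack,
// which grows downward from the end of RAM.
constexpr bool inMainStackRegion(uint32_t address) {
    return address >= kRamEnd - kAssumedMainStackBytes && address < kRamEnd;
}
```

Under these assumptions 0x20017FAC, 0x20017F9C and 0x20017EF4 all fall inside the main stack region, while 0x2000AE00 does not. On the target, a check like this could be asserted on the buffer address before it is handed to HAL_UART_Receive_DMA.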

richard_damon wrote on Friday, November 29, 2019:

Yes, that is talking about the same phenomenon. The Arm Cortex M series has two stack pointers available: the MSP (Main Stack Pointer), which is used at the start of operation, and the PSP (Process Stack Pointer), which is optional but is used by FreeRTOS for tasks so that the ISRs can use a separate SP from the tasks. The Arm hardware will automatically switch between the stacks as needed.

In the FreeRTOS scheduler startup code, it rewinds the MSP to the beginning of its stack to give it maximum space, as it is presumed that you will never return to main and anything there is just wasted. (Erich noted that he has a modified version that doesn’t do this, and in fact allows stopping and restarting the scheduler) This rewinding of the MSP is what causes problems if you try to have stuff on the main stack that is used during operations.
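
The effect of the rewind can be shown with a toy model on a host PC. This is only a sketch: ToyMainStack and handlerClobbersMainLocal() are made-up names, the 64-word stack is arbitrary, and none of this is the actual port code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy model of a full-descending main stack. Purely illustrative, not port code.
struct ToyMainStack {
    static constexpr std::size_t kWords = 64;
    std::array<uint32_t, kWords> ram{};
    std::size_t sp = kWords;  // index just past the highest word; grows downward

    // Like main() reserving locals: move SP down and return the start index.
    std::size_t allocate(std::size_t words) { sp -= words; return sp; }

    // Models the scheduler start rewinding the MSP to its initial value.
    void rewind() { sp = kWords; }

    // Models handler code pushing `words` words onto the MSP stack.
    void push(std::size_t words, uint32_t value) {
        for (std::size_t i = 0; i < words; ++i) ram[--sp] = value;
    }
};

// Returns true if words pushed by a handler land inside a buffer that main()
// placed on its stack before the scheduler was started.
bool handlerClobbersMainLocal(bool rewindMsp) {
    constexpr std::size_t kBufferWords = 18;  // 72 bytes
    constexpr uint32_t kPattern = 0xA5A5A5A5u;
    ToyMainStack stack;
    const std::size_t buffer = stack.allocate(kBufferWords);
    for (std::size_t i = 0; i < kBufferWords; ++i) stack.ram[buffer + i] = kPattern;

    if (rewindMsp) stack.rewind();
    stack.push(8, 0xDEADBEEFu);  // eight words pushed by handler code

    for (std::size_t i = 0; i < kBufferWords; ++i)
        if (stack.ram[buffer + i] != kPattern) return true;
    return false;
}
```

With the rewind, the pushed words overlap the buffer; without it, they land below the buffer and leave it intact. In the thread’s case the damage also runs the other way: DMA writing into the buffer overwrites whatever the handlers have saved in the same memory, which is consistent with the HardFault showing up in unrelated kernel code.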

You are using heap4?
Are you aware of Using newlib and FreeRTOS?
Hope that helps,
Best Regards, Dave
