FreeRTOS + LwIP: UNALIGNED access HardFault on STM32H7

Hello. I am writing this post to find out what I should do next to investigate this error. As the title mentions, I have enabled the LwIP functionality in order to create an MQTT client. I followed the steps in this video: https://www.youtube.com/watch?v=sQ3rgQNGKV4&t=783s, because I have the same screen and the same external PHY board. All I wanted to do was check whether Ethernet works, so I used a terminal to ping the IP address I assigned to the screen. The ping worked correctly for 2 hours, and after that a HardFault occurred, reporting an unaligned access. The code stops in the file “list.c”, in the function uxListRemove, at this line:

(pxList->uxNumberOfItems)--;

While looking into this, I noticed a strange, very large value in the xItemValue variable, as you can see in the picture below. uxListRemove is the last function that runs before the HardFault occurs. I have also posted a picture that shows the registers while the code is inside that function, together with the memory browser at the PC address.

The second picture shows the register values and the memory browser at the address held in PC.

I have enabled all the parameters in FreeRTOSConfig.h that check for memory overflow, and I have increased the heap size to 231072. My default task stack size is 60004 and TCPIP_STACK_SIZE is 42048; in short, I have increased every stack size I noticed in the lwipopts.h file. I did that because I thought the problem was some kind of stack overflow, but it did not solve the problem.

This is my MPU_Config function:

And this is the .lwip_sec part of my linker script:

.lwip_sec (NOLOAD) : {
    . = ABSOLUTE(0x30000000);
    *(.RxDecripSection) 
    
    . = ABSOLUTE(0x30000200);
    *(.TxDecripSection)
    
    . = ABSOLUTE(0x30000400);
    *(.Rx_PoolSection) 
  } >RAM_D2 
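One cheap sanity check for a fixed-address section like this is to confirm that the descriptor arrays and the Rx pool cannot run into each other or past the end of RAM_D2. The following is a minimal host-side C sketch, not target code. The descriptor counts, descriptor size and pool size are HYPOTHETICAL placeholders, so replace them with the values from your own generated files before trusting the result; the helper names are also made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* HYPOTHETICAL sizes: replace with the values from your own generated files. */
#define RX_DESC_CNT   4u               /* assumed number of Rx DMA descriptors */
#define TX_DESC_CNT   4u               /* assumed number of Tx DMA descriptors */
#define DESC_SIZE     24u              /* assumed bytes per DMA descriptor */
#define RX_POOL_SIZE  (12u * 1536u)    /* assumed Rx buffers x bytes, overhead ignored */

/* RAM_D2 as declared in the linker script's MEMORY block. */
#define RAM_D2_ORIGIN 0x30000000u
#define RAM_D2_LENGTH (288u * 1024u)

typedef struct {
    const char *name;
    uint32_t start;   /* address of the first byte */
    uint32_t size;    /* length in bytes */
} region_t;

/* True if the half-open ranges [start, start + size) share at least one byte. */
static bool regions_overlap(region_t a, region_t b)
{
    return (uint64_t)a.start < (uint64_t)b.start + b.size &&
           (uint64_t)b.start < (uint64_t)a.start + a.size;
}

/* True if the region lies entirely inside RAM_D2. */
static bool inside_ram_d2(region_t r)
{
    return r.start >= RAM_D2_ORIGIN &&
           (uint64_t)r.start + r.size <= (uint64_t)RAM_D2_ORIGIN + RAM_D2_LENGTH;
}

/* Prints every problem found and returns how many there were. */
static int check_layout(const region_t *r, size_t n)
{
    int problems = 0;
    for (size_t i = 0; i < n; i++) {
        if (!inside_ram_d2(r[i])) {
            printf("%s does not fit in RAM_D2\n", r[i].name);
            problems++;
        }
        for (size_t j = i + 1; j < n; j++) {
            if (regions_overlap(r[i], r[j])) {
                printf("%s overlaps %s\n", r[i].name, r[j].name);
                problems++;
            }
        }
    }
    return problems;
}
```

With these placeholder values the Rx descriptors occupy 4 × 24 = 96 bytes, well below the 0x200 offset of the Tx descriptors, and the pool ends far below the top of RAM_D2. The real sizes must come from your project.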


The following picture shows the registers when execution stops at my breakpoint at the beginning of the HardFault handler.
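When the debugger stops at the start of the HardFault handler, the fault status registers tell you what kind of fault it was. With CMSIS you can read them as SCB->CFSR, SCB->HFSR and SCB->BFAR, or look at address 0xE000ED28 (CFSR) in the memory browser. The sketch below decodes a CFSR value on the host. The bit positions are the ARMv7-M ones used by the Cortex-M7; the function name and output format are only illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit positions in the Configurable Fault Status Register (CFSR, 0xE000ED28)
 * on ARMv7-M cores such as the Cortex-M7:
 * bits 0-7 MemManage, 8-15 BusFault, 16-31 UsageFault. */
typedef struct {
    unsigned bit;
    const char *name;
} fault_bit_t;

static const fault_bit_t cfsr_bits[] = {
    {0,  "IACCVIOL"},  {1,  "DACCVIOL"},   {3,  "MUNSTKERR"},
    {4,  "MSTKERR"},   {5,  "MLSPERR"},    {7,  "MMARVALID"},
    {8,  "IBUSERR"},   {9,  "PRECISERR"},  {10, "IMPRECISERR"},
    {11, "UNSTKERR"},  {12, "STKERR"},     {13, "LSPERR"},
    {15, "BFARVALID"}, {16, "UNDEFINSTR"}, {17, "INVSTATE"},
    {18, "INVPC"},     {19, "NOCP"},       {24, "UNALIGNED"},
    {25, "DIVBYZERO"},
};

/* Illustrative helper: writes the names of all set CFSR flags, separated by
 * spaces, into out. out_len must be at least 1. */
static void decode_cfsr(uint32_t cfsr, char *out, size_t out_len)
{
    out[0] = '\0';
    for (size_t i = 0; i < sizeof cfsr_bits / sizeof cfsr_bits[0]; i++) {
        if (cfsr & (1u << cfsr_bits[i].bit)) {
            if (out[0] != '\0')
                strncat(out, " ", out_len - strlen(out) - 1);
            strncat(out, cfsr_bits[i].name, out_len - strlen(out) - 1);
        }
    }
}
```

UNALIGNED (bit 24) is the usage fault mentioned in the title. PRECISERR together with BFARVALID means BFAR holds the exact data address whose access faulted.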

I don’t know what else to do or how to avoid this annoying HardFault. Is there any useful information you could give me to investigate this error further, for example other steps I could take? The program runs correctly for 2 hours. I don’t have much debugging experience, since I have only recently started my journey as an embedded programmer, so I apologize if something is not clear.

The first image shows the value of pxList as 0xffffffff, which is invalid, so you are dereferencing an invalid pointer. The 0xffffffff value came from pxItemToRemove->pxContainer, so I’m going to guess pxItemToRemove is not valid. As this code executes while the scheduler is suspended, which doesn’t disable interrupts, it might be that you have an invalid interrupt priority somewhere: an interrupt that should be masked but isn’t, causing a race. Please post your FreeRTOSConfig.h file. You may need to zip it up to attach it.
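To make that reasoning concrete: the list code finds the list through the item’s own container pointer, so if the item is corrupted, the list pointer is garbage too, and an access through it (such as the uxNumberOfItems decrement) can fault. The sketch below is a stripped-down host-side model of that intrusive-list idea with made-up names; it is NOT FreeRTOS’s list.c and omits the kernel’s index and sorting logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of a FreeRTOS-style intrusive list (hypothetical names). */
struct list;

typedef struct list_item {
    uint32_t value;              /* plays the role of xItemValue */
    struct list_item *next;      /* plays the role of pxNext */
    struct list_item *prev;      /* plays the role of pxPrevious */
    struct list *container;      /* plays the role of pxContainer */
} list_item_t;

typedef struct list {
    uint32_t count;              /* plays the role of uxNumberOfItems */
    list_item_t end;             /* sentinel, like xListEnd */
} list_t;

static void list_init(list_t *l)
{
    l->count = 0;
    l->end.value = UINT32_MAX;   /* the sentinel holds the maximum value */
    l->end.next = &l->end;
    l->end.prev = &l->end;
    l->end.container = NULL;
}

static void list_insert_end(list_t *l, list_item_t *item)
{
    item->next = &l->end;
    item->prev = l->end.prev;
    l->end.prev->next = item;
    l->end.prev = item;
    item->container = l;
    l->count++;
}

/* Same shape as uxListRemove(): the list is taken from the item itself.
 * If item->container is corrupted (e.g. 0xFFFFFFFF), the count-- below
 * writes through a wild pointer, which faults on real hardware. */
static uint32_t list_remove(list_item_t *item)
{
    list_t *l = item->container;
    item->next->prev = item->prev;
    item->prev->next = item->next;
    item->container = NULL;
    l->count--;
    return l->count;
}

/* Debug-only plausibility check a caller could run before removing. */
static int item_belongs_to(const list_item_t *item, const list_t *l)
{
    return item->container == l;
}
```

The model only shows why the symptom appears where it does; the real question remains what overwrote the list item, which is why interrupt priorities and stack overflows are the usual suspects.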

@rtel I really appreciate that you took the time to read my post and help me. Here is my FreeRTOSConfig.h file:

/* USER CODE BEGIN Header */
/*
 * FreeRTOS Kernel V10.3.1
 * Portion Copyright (C) 2017 Amazon.com, Inc. or its affiliates.  All Rights Reserved.
 * Portion Copyright (C) 2019 StMicroelectronics, Inc.  All Rights Reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy of
 * this software and associated documentation files (the "Software"), to deal in
 * the Software without restriction, including without limitation the rights to
 * use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
 * the Software, and to permit persons to whom the Software is furnished to do so,
 * subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in all
 * copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
 * FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
 * COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
 * IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
 * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 *
 * http://www.FreeRTOS.org
 * http://aws.amazon.com/freertos
 *
 * 1 tab == 4 spaces!
 */
/* USER CODE END Header */

#ifndef FREERTOS_CONFIG_H
#define FREERTOS_CONFIG_H

/*-----------------------------------------------------------
 * Application specific definitions.
 *
 * These definitions should be adjusted for your particular hardware and
 * application requirements.
 *
 * These parameters and more are described within the 'configuration' section of the
 * FreeRTOS API documentation available on the FreeRTOS.org web site.
 *
 * See http://www.freertos.org/a00110.html
 *----------------------------------------------------------*/

/* USER CODE BEGIN Includes */
/* Section where include file can be added */
/* USER CODE END Includes */

/* Ensure definitions are only used by the compiler, and not by the assembler. */
#if defined(__ICCARM__) || defined(__CC_ARM) || defined(__GNUC__)
  #include <stdint.h>
  extern uint32_t SystemCoreClock;
#endif
#ifndef CMSIS_device_header
#define CMSIS_device_header "stm32h7xx.h"
#endif /* CMSIS_device_header */

#define configENABLE_FPU                         1
#define configENABLE_MPU                         0

#define configUSE_PREEMPTION                     1
#define configSUPPORT_STATIC_ALLOCATION          1
#define configSUPPORT_DYNAMIC_ALLOCATION         1
#define configUSE_IDLE_HOOK                      1
#define configUSE_TICK_HOOK                      0
#define configCPU_CLOCK_HZ                       ( SystemCoreClock )
#define configTICK_RATE_HZ                       ((TickType_t)1000)
#define configMAX_PRIORITIES                     ( 56 )
#define configMINIMAL_STACK_SIZE                 ((uint16_t)512)
#define configTOTAL_HEAP_SIZE                    ((size_t)131072)
#define configMAX_TASK_NAME_LEN                  ( 16 )
#define configUSE_TRACE_FACILITY                 1
#define configUSE_16_BIT_TICKS                   0
#define configUSE_MUTEXES                        1
#define configQUEUE_REGISTRY_SIZE                8
#define configCHECK_FOR_STACK_OVERFLOW           2//1
#define configUSE_RECURSIVE_MUTEXES              1
#define configUSE_APPLICATION_TASK_TAG           1
#define configUSE_COUNTING_SEMAPHORES            1
#define configUSE_PORT_OPTIMISED_TASK_SELECTION  0
/* USER CODE BEGIN MESSAGE_BUFFER_LENGTH_TYPE */
/* Defaults to size_t for backward compatibility, but can be changed
   if lengths will always be less than the number of bytes in a size_t. */
#define configMESSAGE_BUFFER_LENGTH_TYPE         size_t
/* USER CODE END MESSAGE_BUFFER_LENGTH_TYPE */

/* Co-routine definitions. */
#define configUSE_CO_ROUTINES                    0
#define configMAX_CO_ROUTINE_PRIORITIES          ( 2 )

/* Software timer definitions. */
#define configUSE_TIMERS                         1
#define configTIMER_TASK_PRIORITY                ( 2 )
#define configTIMER_QUEUE_LENGTH                 10
#define configTIMER_TASK_STACK_DEPTH             1024

/* The following flag must be enabled only when using newlib */
#define configUSE_NEWLIB_REENTRANT          1

/* CMSIS-RTOS V2 flags */
#define configUSE_OS2_THREAD_SUSPEND_RESUME  1
#define configUSE_OS2_THREAD_ENUMERATE       1
#define configUSE_OS2_EVENTFLAGS_FROM_ISR    1
#define configUSE_OS2_THREAD_FLAGS           1
#define configUSE_OS2_TIMER                  1
#define configUSE_OS2_MUTEX                  1

/* Set the following definitions to 1 to include the API function, or zero
to exclude the API function. */
#define INCLUDE_vTaskPrioritySet             1
#define INCLUDE_uxTaskPriorityGet            1
#define INCLUDE_vTaskDelete                  1
#define INCLUDE_vTaskCleanUpResources        1
#define INCLUDE_vTaskSuspend                 1
#define INCLUDE_vTaskDelayUntil              1
#define INCLUDE_vTaskDelay                   1
#define INCLUDE_xTaskGetSchedulerState       1
#define INCLUDE_xTimerPendFunctionCall       1
#define INCLUDE_xQueueGetMutexHolder         1
#define INCLUDE_xSemaphoreGetMutexHolder     1
#define INCLUDE_uxTaskGetStackHighWaterMark  1
#define INCLUDE_xTaskGetCurrentTaskHandle    1
#define INCLUDE_eTaskGetState                1
#define INCLUDE_xTaskGetHandle               1

/*
 * The CMSIS-RTOS V2 FreeRTOS wrapper is dependent on the heap implementation used
 * by the application thus the correct define need to be enabled below
 */
#define USE_FreeRTOS_HEAP_4

/* Cortex-M specific definitions. */
#ifdef __NVIC_PRIO_BITS
 /* __NVIC_PRIO_BITS will be specified when CMSIS is being used. */
 #define configPRIO_BITS         __NVIC_PRIO_BITS
#else
 #define configPRIO_BITS         4
#endif

/* The lowest interrupt priority that can be used in a call to a "set priority"
function. */
#define configLIBRARY_LOWEST_INTERRUPT_PRIORITY   15

/* The highest interrupt priority that can be used by any interrupt service
routine that makes calls to interrupt safe FreeRTOS API functions.  DO NOT CALL
INTERRUPT SAFE FREERTOS API FUNCTIONS FROM ANY INTERRUPT THAT HAS A HIGHER
PRIORITY THAN THIS! (higher priorities are lower numeric values.) */
#define configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY 5

/* Interrupt priorities used by the kernel port layer itself.  These are generic
to all Cortex-M ports, and do not rely on any particular library functions. */
#define configKERNEL_INTERRUPT_PRIORITY 		( configLIBRARY_LOWEST_INTERRUPT_PRIORITY << (8 - configPRIO_BITS) )
/* !!!! configMAX_SYSCALL_INTERRUPT_PRIORITY must not be set to zero !!!!
See http://www.FreeRTOS.org/RTOS-Cortex-M3-M4.html. */
#define configMAX_SYSCALL_INTERRUPT_PRIORITY 	( configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY << (8 - configPRIO_BITS) )

/* Normal assert() semantics without relying on the provision of an assert.h
header file. */
/* USER CODE BEGIN 1 */
#define configASSERT( x ) if ((x) == 0) {taskDISABLE_INTERRUPTS(); for( ;; );}
/* USER CODE END 1 */

/* Definitions that map the FreeRTOS port interrupt handlers to their CMSIS
standard names. */
#define vPortSVCHandler    SVC_Handler
#define xPortPendSVHandler PendSV_Handler

/* IMPORTANT: After 10.3.1 update, Systick_Handler comes from NVIC (if SYS timebase = systick), otherwise from cmsis_os2.c */
 #define USE_CUSTOM_SYSTICK_HANDLER_IMPLEMENTATION 0

/* USER CODE BEGIN Defines */
/* Section where parameter definitions can be added (for instance, to override default ones in FreeRTOS.h) */
/* USER CODE END Defines */

#endif /* FREERTOS_CONFIG_H */

@rtel TouchGFX initialized all the peripherals, but I only use Ethernet. Based on what you mentioned above, do you think the Ethernet interrupt priority might be involved in this error? It was initialized with priority 5. Or could it be the TCPIP thread priority? Could we assume that the LwIP code generated by CubeMX is full of bugs and cannot be trusted?

LwIP in conjunction with FreeRTOS has proven stable in literally millions of installations in the field, so this is almost certainly a configuration issue. Do you have any ISRs that call OS services and run at a higher priority than 5 (on Cortex-M, a numerically lower value, 0 to 4), or any ISR that uses OS services whose names do NOT end in FromISR?

@RAc I see that the gpio.c file generated by TouchGFX Designer uses interrupt priority 7, the jpeg.c file also uses priority 7, and the MDMA uses 8.
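The rule behind this question is easy to get backwards, because on Cortex-M a lower number means a more urgent interrupt. The sketch below takes the values from the posted FreeRTOSConfig.h and checks the priorities mentioned in this thread (5 for Ethernet, 7 for GPIO and JPEG, 8 for MDMA). It assumes all four priority bits are used as preemption priority, which the FreeRTOS documentation recommends on STM32; the helper names are hypothetical. Because configASSERT() is defined in this config, the Cortex-M ports also validate the calling interrupt’s priority inside the FromISR() API functions, so a misconfigured ISR that calls them should stop in configASSERT() rather than silently corrupt memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the FreeRTOSConfig.h posted in this thread. */
#define configPRIO_BITS                              4   /* __NVIC_PRIO_BITS on STM32H7 */
#define configLIBRARY_LOWEST_INTERRUPT_PRIORITY      15
#define configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY 5

/* Hypothetical helper: the value that ends up in the 8-bit priority
 * register / BASEPRI, since only the top configPRIO_BITS bits are used. */
static uint8_t to_register_value(uint8_t library_prio)
{
    return (uint8_t)(library_prio << (8 - configPRIO_BITS));
}

/* Hypothetical helper: an ISR may call ...FromISR() APIs only if the kernel
 * can mask it, i.e. its number is >= the max syscall priority
 * (numerically lower = more urgent = NOT allowed). */
static bool may_call_from_isr_api(uint8_t library_prio)
{
    return library_prio >= configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY &&
           library_prio <= configLIBRARY_LOWEST_INTERRUPT_PRIORITY;
}
```

All four priorities mentioned in the thread pass the check, so on these numbers alone the configuration looks legal; an interrupt at 0 to 4 that touches the kernel would not.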

Are the GPIO or jpeg interrupt handlers making any FreeRTOS API calls?

@rtel No, none of the interrupt handlers do that, as far as I can see. However, I have disabled some peripherals that were initialized by TouchGFX, and I slightly increased the offset of the Rx pool in the linker script to 0x30000600, but I don’t know whether that will fix the problem. So far it is running without the precise data-access HardFault. What do you think?

What may be happening is that your startup (and thus interrupt) stack is too small, and the changes to your linker command file have shifted your memory layout just enough to make the problem go away as a side effect. I suggest filling your startup stack with a known signature at system startup and examining the stack when the fault occurs, similar to the application stack overflow mechanism.
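The idea is the same one FreeRTOS uses for task stacks: fill the stack with a known byte (FreeRTOS uses 0xA5), then later see how much of the pattern is still intact. The sketch below models this on the host with a plain array. On the target you would paint from the bottom of the startup stack (in ST’s linker scripts presumably _estack - _Min_Stack_Size, an assumption about your script) up to a safe margin below the current stack pointer, very early after reset, and then inspect the region in the memory browser after the HardFault. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE 0xA5u  /* same fill byte FreeRTOS uses for task stacks */

/* Hypothetical helper: paint the stack area with the known pattern. On real
 * hardware, never paint the part of the stack that is currently in use. */
static void stack_paint(uint8_t *bottom, size_t size)
{
    memset(bottom, STACK_FILL_BYTE, size);
}

/* Hypothetical helper: Cortex-M stacks grow downward, so untouched bytes sit
 * at the LOW end. Returns how many are still intact (the high water mark);
 * 0 means the stack reached, or went past, its bottom. */
static size_t stack_unused_bytes(const uint8_t *bottom, size_t size)
{
    size_t n = 0;
    while (n < size && bottom[n] == STACK_FILL_BYTE)
        n++;
    return n;
}
```

If the pattern at the bottom of the startup stack is gone after the fault, the interrupt stack is too small (or something else is writing there), which would fit the "layout change hides the bug" theory.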

@RAc Sorry, could you please explain a bit more? I’m lost. What is the startup stack? Do you mean the defaultTask stack size, or are you referring to the total heap size? In short, where is that stack? Is it in my linker script?

No, I mean the stack the CPU is initialized with at startup. FreeRTOS reuses that stack as the stack the (nested) ISRs run on. Sorry, I do not have a pointer to the documentation right now; it might be documented in the porting guide. In a nutshell, most systems have a static area of memory reserved for the stack main() runs on, whose location and size are typically defined in the linker command file. Once you start the scheduler, main() no longer executes, so that stack would otherwise be wasted; most ports (including the Cortex-M4/M7 ports) therefore reuse it for interrupt handlers. Browse your linker command file for identifiers containing the token STACK (in STM32 scripts, for example, _estack and _Min_Stack_Size).


@RAc You had a point, because in my linker script _estack was assigned to RAM_D1, which is already 73% used. I changed it to RAM_D2, where I placed the Ethernet descriptors and which is only 6% used. At first the problem seemed solved, because it ran past the 2-hour mark, but after about 2:30 hours the corruption happened again. I will try using something other than RAM_D2 and come back with feedback. Could the RX_POOL offset play a role in this corruption? It was 0x30000400 and I changed it to 0x30000600.


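Where is your _estack now, and where was it before? You can look that up in your map file. If you do not have one yet, rebuild your project with map file generation enabled in the linker options (with the GNU toolchain, for example, -Wl,-Map=output.map on the gcc link line).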

My linker script in its original form looks like this (I am showing a snippet with the _estack declaration and the size of each RAM defined in the script):


/* Entry Point */
ENTRY(Reset_Handler)

/* Highest address of the user mode stack */
_estack = ORIGIN(RAM_D1) + LENGTH(RAM_D1); /* end of "RAM_D1" Ram type memory */

_Min_Heap_Size = 0xc000 ; /* required amount of heap  */
_Min_Stack_Size = 0x8000; /* required amount of stack */

/* Memories definition */
MEMORY
{
  RAM_D1 (xrw)   : ORIGIN = 0x24000000, LENGTH =  512K
  FLASH   (rx)   : ORIGIN = 0x08000000, LENGTH = 1024K    /* Memory is divided. Actual start is 0x08000000 and actual length is 2048K */
  DTCMRAM (xrw)  : ORIGIN = 0x20000000, LENGTH = 128K
  RAM_D2 (xrw)   : ORIGIN = 0x30000000, LENGTH = 288K
  RAM_D3 (xrw)   : ORIGIN = 0x38000000, LENGTH = 64K
  ITCMRAM (xrw)  : ORIGIN = 0x00000000, LENGTH = 64K
  QUADSPI (r)	 : ORIGIN = 0x90000000,	LENGTH = 64M
  SDRAM   (xrw)  : ORIGIN = 0xD0000000,  LENGTH = 3600K
  SDRAM2  (xrw)  : ORIGIN = 0xD0384000,  LENGTH = 4592K
}
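Doing the arithmetic for this MEMORY block shows where the startup/ISR stack actually lives in each variant. This is a small host-side check using the ORIGIN and LENGTH values and _Min_Stack_Size from the script above; the helper names are made up for illustration. Keep in mind that _Min_Stack_Size only reserves space so the linker can complain if RAM runs out; nothing stops the stack from growing deeper at run time.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the MEMORY block and symbols in the linker script. */
#define RAM_D1_ORIGIN   0x24000000u
#define RAM_D1_LENGTH   (512u * 1024u)
#define RAM_D2_ORIGIN   0x30000000u
#define RAM_D2_LENGTH   (288u * 1024u)
#define MIN_STACK_SIZE  0x8000u        /* _Min_Stack_Size */

/* Hypothetical helper: _estack = ORIGIN + LENGTH, i.e. one past the last
 * byte of the block; the Cortex-M stack grows downward from there. */
static uint32_t estack(uint32_t origin, uint32_t length)
{
    return origin + length;
}

/* Hypothetical helper: lowest address the stack reaches if it stays within
 * _Min_Stack_Size. A deeper stack goes below this and may run into
 * whatever else the linker placed in the same RAM block. */
static uint32_t stack_floor(uint32_t origin, uint32_t length)
{
    return estack(origin, length) - MIN_STACK_SIZE;
}
```

With the stack in RAM_D1 it starts at 0x24080000 and, within its 32 KiB budget, reaches down to 0x24078000. In RAM_D2 it starts at 0x30048000 and reaches down to 0x30040000, far above .lwip_sec at 0x30000000.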

If you want me to upload the entire script, just let me know. I changed RAM_D1 to RAM_D2, and the system ran another 30 minutes before the corruption. In short, it gained 30 minutes: it worked for 2:30 hours, compared to 2 hours with _estack in RAM_D1.

What is missing here is the IVT (interrupt vector table), in particular entries 0 and 1 (the initial stack pointer and the reset vector). I suspect that your startup stack pointer is initialized to _estack, which means that the stack, growing downward from the end of your first RAM block as originally mapped, may compete with whatever else is tucked into RAM_D1. 512K is not a whole lot of RAM.
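You can check entries 0 and 1 yourself in the debugger’s memory browser: they are the first two 32-bit words of the vector table, at the start of flash (0x08000000 in the MEMORY block) or at the address in SCB->VTOR. The host-side sketch below shows what to look for; the type and function names are illustrative, not part of any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The first two words of a Cortex-M vector table. */
typedef struct {
    uint32_t initial_sp;   /* entry 0: loaded into MSP at reset (normally _estack) */
    uint32_t reset_vector; /* entry 1: address of Reset_Handler with the Thumb bit set */
} vector_head_t;

/* Illustrative check: the initial SP should point into (or one past the end
 * of) the expected RAM block and be 8-byte aligned as the AAPCS expects;
 * the reset vector must have bit 0 (Thumb state) set. */
static bool vector_head_plausible(vector_head_t v, uint32_t ram_origin,
                                  uint32_t ram_length)
{
    bool sp_in_ram = v.initial_sp > ram_origin &&
                     (uint64_t)v.initial_sp <= (uint64_t)ram_origin + ram_length;
    bool sp_aligned = (v.initial_sp & 7u) == 0;
    bool thumb = (v.reset_vector & 1u) != 0;
    return sp_in_ram && sp_aligned && thumb;
}
```

For the original script, entry 0 should read 0x24080000 (ORIGIN(RAM_D1) + LENGTH(RAM_D1)); after moving _estack to RAM_D2 it should read 0x30048000.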

Where is the map file (not the linker command file)?

I don’t know where to find the map file. This linker script was auto-generated by TouchGFX; I didn’t write it. I made a basic UI and then enabled ETH and LwIP in CubeMX.

In that case, your best bet is to contact your tool vendor. Without knowing your factual memory layout, all attempts to find your root cause are shots in the dark.

Best of luck!


@RAc Hello again, I have found the map file. What should I look for there?

Show us both map files, with the old (stack in RAM_D1) and the new (RAM_D2) setup. If you are not at liberty to make them public, you can send me a PM.

It is extremely educational and insightful for you to try to understand the memory map, so my main piece of advice is to try to understand its contents yourself. I am sure there are good tutorials on the net. You are well advised to compare your linker command file to the map and see how your intended memory layout maps (pun intended) to the runtime memory layout. It is a fairly steep learning curve, but fundamental to embedded software design.

@RAc First and foremost, I would like to say that I am really grateful that you were willing to help me and answer all the questions I have asked. I really appreciate it, and I admire your knowledge of embedded systems. I will upload those files as soon as possible, and I will try to read them in order to understand them properly.