Hi, I have developed a multitasking application with FreeRTOS on a SAM9x60 MCU. The application uses the Microchip Harmony graphics library for its GUI, and it hangs randomly after running for some time. There are 9 tasks in the application: mainApptask, a display driver task, a graphics task, a touch controller driver task, a system input handling task, 2 tasks monitoring sensors on I2C channels, and 2 other tasks monitoring external GPIOs. When I comment out a few tasks, the hangs become less frequent but still occur. Can anyone guide me on how to find which task(s) are causing this issue?
Thanks
Very often the reason for strange application behavior is corrupted (FreeRTOS) internal data. Assuming the application code itself is not buggy and randomly overwriting memory, this corruption is very often caused by stack overflows.
Did you define configASSERT and enable stack overflow checking for development?
Please see "FreeRTOS - stacks and stack overflow checking" for how to make use of stack checking,
and maybe also FreeRTOS FAQ - links to all RTOS FAQ pages
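As a minimal sketch of what enabling stack checking involves: it is switched on with configCHECK_FOR_STACK_OVERFLOW in FreeRTOSConfig.h, and the kernel then calls the application-provided vApplicationStackOverflowHook() when it detects an overflow. The hook below stores the offending task's name in a volatile global so it stays visible in the debugger, then waits. The TaskHandle_t typedef is only a host stand-in (it comes from the FreeRTOS headers in a real project), and the names pcOverflowedTask and ulContinueAfterOverflow are hypothetical.

```c
#include <stdint.h>

/* FreeRTOSConfig.h (excerpt): a value of 2 selects method 2, which checks
   that a known fill pattern at the end of each task's stack has not been
   overwritten when the task is switched out. */
#define configCHECK_FOR_STACK_OVERFLOW 2

/* Host-only stand-in so this sketch compiles without FreeRTOS; in a real
   project TaskHandle_t comes from the FreeRTOS headers. */
typedef void *TaskHandle_t;

/* Hypothetical globals: volatile so the debugger can always read/write them. */
const char * volatile pcOverflowedTask = 0;
volatile uint32_t ulContinueAfterOverflow = 0;

/* Called by the kernel when configCHECK_FOR_STACK_OVERFLOW is non-zero and
   an overflow is detected. */
void vApplicationStackOverflowHook( TaskHandle_t xTask, char *pcTaskName )
{
    ( void ) xTask;
    pcOverflowedTask = pcTaskName;

    /* On target, interrupts would normally be disabled here first (for
       example with taskDISABLE_INTERRUPTS()). Spin until the flag is set
       non-zero, e.g. from the debugger, so the call stack can be inspected. */
    while( ulContinueAfterOverflow == 0 )
    {
    }
}
```

If the hook is ever entered, pcOverflowedTask in the watch window names the task whose stack needs to grow.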
Thank you for your reply. I have enabled stack checking and I don't see any of my tasks overflowing its stack. I have now also defined configASSERT, but I don't see vAssertCalled() being called when the application hangs. Instead, it goes to data_abort_irq_handler() in the fault_handlers.c file. Is this related to external hardware (like the touch panel) or to internal software interrupts?
I was wrong when I said vAssertCalled() is not called when the application hangs. In some of the hangs this function is called, so it seems my application has multiple causes of hangs. Please advise.
My first suggestion is to look at the comments next to the assert that fails to see if they give the answer; if not, let us know which configASSERT() is failing.
It seems the compiler is optimizing away the variables used by configASSERT(). When I watch the variables pcfilename and lineno, I see the following values:
→ (DW_OP_GNU_entry_value 1 DW_OP_reg1)
I am using the MPLAB X IDE and the XC32 compiler. How can I get the real values of these variables?
Can you just disable optimization (for debugging)?
When you halt the target with the debugger, or hit a breakpoint (e.g. in your vAssertCalled() function, if implemented), you can usually see the call stack/backtrace of the code that asserted.
Declare the variables outside the function using the volatile qualifier to prevent them from being optimized away.
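A minimal sketch of that suggestion, assuming an application-defined vAssertCalled() that takes the file name and line number (the exact prototype in a Harmony-generated FreeRTOSConfig.h may differ). The parameters are copied into file-scope volatile globals, so their values survive optimization and appear in the watch window instead of DW_OP_GNU_entry_value; the global names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical globals: declared volatile at file scope so the optimizer
   cannot discard them and the debugger can always show their values. */
const char * volatile pcAssertFile = 0;
volatile unsigned long ulAssertLine = 0;
volatile uint32_t ulContinueAfterAssert = 0;

void vAssertCalled( const char *pcFile, unsigned long ulLine )
{
    pcAssertFile = pcFile;
    ulAssertLine = ulLine;

    /* On target, interrupts would normally be disabled here first (for
       example with taskDISABLE_INTERRUPTS()). Set ulContinueAfterAssert
       non-zero in the debugger to step back out to the failing assert. */
    while( ulContinueAfterAssert == 0 )
    {
    }
}

/* FreeRTOSConfig.h (excerpt): route failed assertions to vAssertCalled(). */
#define configASSERT( x ) if( ( x ) == 0 ) vAssertCalled( __FILE__, __LINE__ )
```

Stepping out of the loop (after setting the flag in the debugger) lands on the failing configASSERT(), whose surrounding comments in the FreeRTOS source usually explain the likely cause.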
I set the compiler optimization level to 0 (no optimization), and all global variables shared between tasks are declared volatile. Now when the application hangs, it does not go to vAssertCalled(). It goes to either data_abort_irq_handler() or undefined_instruction_irq_handler(). These functions contain only a while(1) loop with an empty body.
I combined the functions for display graphics, the display driver, the touch driver, the graphics event handler, and the application main loop into one task, and commented out the other tasks. In this configuration everything works without hanging, although the system is slightly sluggish in responding to user touch. Then I added another task whose single function is to read a sensor connected to the I2C bus every 100 ms. With that task added, the hangs start again, and each time it goes to either data_abort_irq_handler() or undefined_instruction_irq_handler().
Can you please give some guidelines on what I should look for in this other task that might be causing the hang?
The program ending up in Data Abort or Undefined Instruction is often a sign of memory corruption. Sometimes tracing back to the code that caused the Data Abort can give an idea of what is getting corrupted, so you can start to work backwards.
Also, checking what task was running can sometimes give you clues.
These are normally caused by bad pointer usage or by overrunning the stack. These can be tough problems to find.
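One way to start tracing back, sketched under the assumption of an ARMv5 core such as the SAM9x60's ARM926EJ-S running in ARM state: on exception entry the banked link register is offset from the offending instruction (by 8 for a data abort, by 4 for an undefined instruction), so subtracting that offset gives an address you can look up in the disassembly listing or map file. For a data abort, the CP15 fault address register (c6) additionally holds the data address that was being accessed. How to capture the exception-mode LR depends on the Harmony vector code and is not shown; the function and variable names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical fault-capture globals, volatile so they survive optimization
   and can be read in the debugger once the handler's while(1) loop is hit. */
volatile uint32_t ulFaultKind = 0;
volatile uint32_t ulFaultInstruction = 0;

enum { FAULT_DATA_ABORT = 1, FAULT_UNDEFINED_INSTRUCTION = 2 };

/* ARM state on an ARMv5 core: the banked link register on exception entry
   is offset from the offending instruction by
     data abort:            LR_abt = instruction address + 8
     undefined instruction: LR_und = instruction address + 4 */
uint32_t ulOffendingInstruction( uint32_t ulExceptionLR, uint32_t ulKind )
{
    return ( ulKind == FAULT_DATA_ABORT ) ? ( ulExceptionLR - 8u )
                                          : ( ulExceptionLR - 4u );
}

/* Intended to be called from the abort/undefined handlers with the
   exception-mode LR captured on entry. */
void vRecordFault( uint32_t ulExceptionLR, uint32_t ulKind )
{
    ulFaultKind = ulKind;
    ulFaultInstruction = ulOffendingInstruction( ulExceptionLR, ulKind );
}
```

If the recorded address falls inside FreeRTOS code, the call stack at that point usually tells you which kernel structure was being walked when the corrupted data was hit.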
Thanks Richard. How can I find out which task was running when the program jumped to Data Abort or Undefined Instruction? Do I have to enable any FreeRTOS configuration option to get this information?
I have already checked that stack overrun is not the issue by calling uxTaskGetStackHighWaterMark(NULL) in each task. I don't see any task overrunning its stack.
Bad pointer usage could be the issue. Is there any way to find the line of code causing it? I understand this is probably mostly a feature of the IDE I am using for development and debugging. Please advise.
You just need to inspect the variable pxCurrentTCB and look at its name field, pcTaskName (if you give your tasks unique names), to identify the current task.
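pxCurrentTCB shows only the task running at the moment of the fault. A complementary sketch, assuming you can add a trace macro to FreeRTOSConfig.h: record the names of the last few tasks switched in, so a short history is available in the watch window after a fault. traceTASK_SWITCHED_IN() is a standard FreeRTOS trace hook; the buffer, its length and the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical history of the most recently switched-in task names. */
#define TASK_HISTORY_LEN 8u

const char * volatile apcTaskHistory[ TASK_HISTORY_LEN ];
volatile uint32_t ulTaskHistoryNext = 0;

/* Records one task name; intended to be called from traceTASK_SWITCHED_IN(). */
void vRecordTaskSwitchIn( const char *pcName )
{
    apcTaskHistory[ ulTaskHistoryNext % TASK_HISTORY_LEN ] = pcName;
    ulTaskHistoryNext = ulTaskHistoryNext + 1u;
}

/* Returns the name recorded ulAgo switches ago (0 = most recent), or 0 if
   that entry does not exist. */
const char *pcTaskHistory( uint32_t ulAgo )
{
    uint32_t ulCount = ulTaskHistoryNext;

    if( ( ulAgo >= ulCount ) || ( ulAgo >= TASK_HISTORY_LEN ) )
    {
        return 0;
    }
    return apcTaskHistory[ ( ulCount - 1u - ulAgo ) % TASK_HISTORY_LEN ];
}

/* FreeRTOSConfig.h (excerpt): the macro expands inside tasks.c, where
   pxCurrentTCB is in scope and already points at the incoming task.
   #define traceTASK_SWITCHED_IN() vRecordTaskSwitchIn( pxCurrentTCB->pcTaskName )
   vRecordTaskSwitchIn() must also be declared in FreeRTOSConfig.h. */
```

If FreeRTOSConfig.h is also included from assembly files in your port, the prototype needs a guard such as #ifndef __ASSEMBLER__.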
Finding where the bad usage occurs is not easy. The best option is to write your program carefully so that it cannot happen in the first place.
Also, uxTaskGetStackHighWaterMark() only tells you the high water mark up to THAT point; you need to wait until the task has done everything it will do before calling it to get the real usage.
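One more detail when reading the result: uxTaskGetStackHighWaterMark() reports the minimum free stack seen so far in words of StackType_t, not bytes, and the stack depth passed to xTaskCreate() is also given in words. A small sketch of converting and checking the value, assuming StackType_t is 32 bits on this port; the typedefs are host stand-ins and the margin policy is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Host stand-ins: on a 32-bit ARM port StackType_t is assumed to be a
   32-bit type; in a real project these come from portmacro.h. */
typedef uint32_t StackType_t;
typedef unsigned long UBaseType_t;

/* Converts a high water mark (minimum free stack, in words) to bytes. */
size_t xHighWaterMarkBytes( UBaseType_t uxHighWaterMarkWords )
{
    return ( size_t ) uxHighWaterMarkWords * sizeof( StackType_t );
}

/* Hypothetical policy: flag a task whose worst-case free stack has dropped
   below a chosen safety margin. */
int xStackMarginTooSmall( UBaseType_t uxHighWaterMarkWords, size_t xMarginBytes )
{
    return xHighWaterMarkBytes( uxHighWaterMarkWords ) < xMarginBytes;
}
```

On target you would pass the value returned by uxTaskGetStackHighWaterMark(NULL), sampled only after the task has exercised all of its code paths.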
I will inspect the pxCurrentTCB variable and look for the task name; I did give unique names to all tasks. In parallel, I have removed all the tasks for special functions such as sensor readings. Now I have only the basic tasks for the touch panel, the display, and the main application engine. This reduced how often the hang occurs, but it still happens.
I am using software timers in my tasks, implemented with the timer library Microchip provides with the MPLAB X IDE. When a timer expires, a callback function sets or resets flags. Should I use the FreeRTOS timer APIs instead of the MPLAB X timer library?
Finally I found the line of code at which the hang occurs. It is in the following function in list.c in the FreeRTOS folder:
```c
UBaseType_t uxListRemove( ListItem_t * const pxItemToRemove )
{
    /* The list item knows which list it is in. Obtain the list from the list
    item. */
    List_t * const pxList = pxItemToRemove->pxContainer;

    pxItemToRemove->pxNext->pxPrevious = pxItemToRemove->pxPrevious;
    pxItemToRemove->pxPrevious->pxNext = pxItemToRemove->pxNext;

    /* Only used during decision coverage testing. */
    mtCOVERAGE_TEST_DELAY();

    /* Make sure the index is left pointing to a valid item. */
    if( pxList->pxIndex == pxItemToRemove )
    {
        pxList->pxIndex = pxItemToRemove->pxPrevious;
    }
    else
    {
        mtCOVERAGE_TEST_MARKER();
    }

    pxItemToRemove->pxContainer = NULL;
    ( pxList->uxNumberOfItems )--;

    return pxList->uxNumberOfItems;
}
```
The hang occurs at the line: if( pxList->pxIndex == pxItemToRemove )
Can you please help me find the reason for this hang?
@ppatel: Typically this kind of problem is caused by tasks exceeding their allocated stack or by improper pointer usage.
Can you specify which compiler and compiler flags you are using?
You might try the following:
- Increase the stack size of tasks which may be overrunning.
- Enable configUSE_LIST_DATA_INTEGRITY_CHECK_BYTES if it is not already enabled.
- Set configUSE_MINI_LIST_ITEM to 0 (the mini list item has been reported to cause problems with some compilers when optimization is enabled).
I tried the following:
- Increased the task stack size from 1K to 2K bytes.
- Enabled configUSE_LIST_DATA_INTEGRITY_CHECK_BYTES (it was not enabled before).
I can't find configUSE_MINI_LIST_ITEM used anywhere in the FreeRTOS code, which was added to my project by the Microchip Harmony 3 configurator. Is it OK not to have configUSE_MINI_LIST_ITEM defined?
I still get the hang at the same point.
As @richard-damon suggested, did you determine which task is running when you see the hang?
When you are stuck at the line if( pxList->pxIndex == pxItemToRemove ), as you said, can you examine the values of pxList, pxList->pxIndex and pxItemToRemove in the debugger and see if they give any idea? What is the call stack at the time of the hang?
This looks like memory corruption, and debugging it is usually a bit tricky. One approach is to declare a variable next to the one getting corrupted and place a data breakpoint on it. In most cases, you will catch the corruption as it happens.
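A minimal sketch of that idea, assuming the suspect is an application buffer (the I2C receive buffer, the guard pattern and all names below are hypothetical). Guard words are placed on either side of the buffer inside one struct, which keeps them adjacent in memory (separate globals carry no such guarantee). A hardware write breakpoint on a guard word stops the target at the offending write; without one, calling the check periodically from a task at least narrows down when the overwrite happened.

```c
#include <stdint.h>

#define GUARD_PATTERN 0xDEADBEEFu

/* Hypothetical layout: guard words either side of a buffer suspected of
   being overrun. With a 4-byte guard, a 16-byte array and another 4-byte
   guard there is no padding, so an overrun of aucRxBuffer hits ulGuardAfter. */
typedef struct
{
    volatile uint32_t ulGuardBefore;
    uint8_t aucRxBuffer[ 16 ];
    volatile uint32_t ulGuardAfter;
} GuardedBuffer_t;

GuardedBuffer_t xI2CRx = { GUARD_PATTERN, { 0 }, GUARD_PATTERN };

/* Returns 1 while both guards are intact, 0 once either has been
   overwritten. Put the data breakpoint on &xI2CRx.ulGuardAfter (or
   ulGuardBefore) to catch the write as it happens. */
int xGuardsIntact( const GuardedBuffer_t *pxBuf )
{
    return ( pxBuf->ulGuardBefore == GUARD_PATTERN ) &&
           ( pxBuf->ulGuardAfter == GUARD_PATTERN );
}
```

configUSE_LIST_DATA_INTEGRITY_CHECK_BYTES applies the same principle to the kernel's own lists: it adds known values around list structures and checks them with configASSERT(), so it only reports anything when configASSERT() is defined.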
I inspected pxCurrentTCB, pxList, pxList->pxIndex and pxItemToRemove when the application hung. Here is an image of the watch list. It seems the task currently running is the IDLE task. As far as I know, we don't have to create the IDLE task ourselves and it is created by the scheduler (not sure).
pxList, pxList->pxIndex and pxItemToRemove contain many invalid addresses.
Here is the call stack:
Please advise.
Yes, it is created by the scheduler.
That is the reason for the abort. We need to find the source of this corruption. Can you set configUSE_LIST_DATA_INTEGRITY_CHECK_BYTES to 1 and see if that helps?