I’m getting some very weird behaviour during debugging where an enum value in memory is being overwritten, seemingly by the kernel. I’m using a PIC32MZ and have narrowed the issue down to when xTaskCreate is called for the task that holds the enum in question. Interestingly/annoyingly, if I put a breakpoint at the xTaskCreate line the address doesn’t get written to, but if the initialisation runs until the start of the task it does get overwritten. I’ve checked the stack high watermark and usage is nowhere near the limit (~0x1C2), so I don’t think stack size is the issue.
What has seemed to work is changing configSTACK_DEPTH_TYPE from uint16_t (default) to uint32_t. I couldn’t find a very detailed explanation of what effect this has, and chose uint32_t because I’m using a 32-bit micro. It seems to have worked so far, which is great, but I’d really like to understand why, and whether it’s just a workaround for another mistake I have made.
I think I was mistaken in assuming it had to do with the stack depth type. I tested it a few more times and it seems not to have changed anything. What I realised I also changed is the stack size for that particular task. I changed it from 4096 to 2048 and that seems to have fixed the problem. My understanding of configuring memory in FreeRTOS is average at best, so maybe someone could explain why this could have caused an issue. It would make sense that not having enough memory would cause issues, but having too much doesn’t.
The important word here is “seems.” Frequently the chain of events that leads to something working or not working against all odds can be very long. Possibly your reducing the stack size has led to a modified memory layout in which a block of memory that was being trampled before is now coincidentally left untouched. In other words, your problem is still there but it currently doesn’t have a chance to manifest itself.
Hate to tell you, but until you know for sure what really caused your problem in the first place, it can (and according to Murphy will, and in the worst of scenarios) come back at any time. What it is exactly - no one can tell. We have all spent long, long hours tracing these things in utter despair. The grunt work is something you will need to do yourself.
Thanks for the ideas. I’m more than happy to keep digging at it because I do need to figure out the cause. I’m just hoping someone might have some tips for diagnosing bad memory settings with FreeRTOS like choosing a suitable heap vs stack size. If it is indeed that reducing stack size prevented trampling of other data surely that is something the compiler should handle or I could configure in config somehow?
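One thing worth checking when weighing stack sizes against the heap: the stack depth passed to xTaskCreate is counted in words (StackType_t), not bytes, and with dynamic allocation each task stack is taken from the FreeRTOS heap (configTOTAL_HEAP_SIZE). The following is only a rough host-side sketch of that arithmetic. The names stack_word_t, stack_bytes and stacks_fit_in_heap are made up for illustration and are not FreeRTOS API, and the 4-byte word is an assumption about a 32-bit port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for the port's StackType_t, 4 bytes on a 32-bit port. */
typedef uint32_t stack_word_t;

/* Bytes that a task stack of depth_words words takes from the heap. */
size_t stack_bytes(size_t depth_words)
{
    return depth_words * sizeof(stack_word_t);
}

/* Returns 1 if all stacks plus reserve_bytes (TCBs, queues, other
 * allocations) fit in heap_bytes, 0 otherwise.
 * Allocator overhead per block is ignored in this sketch. */
int stacks_fit_in_heap(const size_t *depths_words, size_t count,
                       size_t heap_bytes, size_t reserve_bytes)
{
    size_t total = reserve_bytes;

    assert(depths_words != NULL);
    for (size_t i = 0; i < count; i++) {
        total += stack_bytes(depths_words[i]);
    }
    return total <= heap_bytes;
}
```

With a 4-byte word, stack_bytes(4096) is 16384 and stack_bytes(2048) is 8192, so halving the depth frees 8 KB of heap and moves everything allocated after that stack. Both depths fit comfortably in a uint16_t, which would be consistent with the depth type change making no difference. It is also worth checking that xTaskCreate returned pdPASS.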
A possibility is that the stack of your task overflowed, which corrupted the data. You can use stack overflow checking to confirm that; see the FreeRTOS documentation page “Stacks and stack overflow checking”.
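In the kernel this is enabled by setting configCHECK_FOR_STACK_OVERFLOW to 1 or 2 in FreeRTOSConfig.h and providing vApplicationStackOverflowHook. To illustrate the idea behind the pattern-based check and the high watermark, here is a small host-side sketch. It is not the kernel's code: the function names are hypothetical, and it assumes a stack that grows downwards, so index 0 of the buffer is the far end of the stack. The 0xA5 fill value matches the byte FreeRTOS fills new stacks with.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILL_BYTE 0xA5u /* known pattern written to a fresh stack */

/* Fill the whole stack buffer with the known pattern at "task creation". */
void stack_fill(uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    memset(stack, FILL_BYTE, size);
}

/* Count bytes at the far end (index 0 upwards) that still hold the
 * pattern. This is the idea behind a stack high watermark. */
size_t stack_unused_bytes(const uint8_t *stack, size_t size)
{
    size_t n = 0;

    assert(stack != NULL);
    while (n < size && stack[n] == FILL_BYTE) {
        n++;
    }
    return n;
}

/* Returns 1 if the last guard_bytes of the stack are still untouched,
 * 0 if something has written into them. */
int stack_guard_intact(const uint8_t *stack, size_t size, size_t guard_bytes)
{
    if (guard_bytes > size) {
        guard_bytes = size;
    }
    return stack_unused_bytes(stack, guard_bytes) == guard_bytes;
}
```

A check like this only notices writes that land in the guard region. A large local buffer or a stray pointer can write beyond the stack without touching the pattern, so a clean result does not rule out corruption.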
Another way to catch the corruption when it happens is to put a data breakpoint on the value that is getting corrupted.
The compiler won’t be able to handle that; you would need run-time support. Several approaches, such as using a built-in MPU (if available), have been discussed here extensively (but they don’t catch all cases and have adverse side effects). Aside from the techniques that Gaurav sketched, it’s still a lot of manual digging on your side.