apuech wrote on Tuesday, July 26, 2011:
I’ve been developing FreeRTOS apps for a couple of years now, and I’m encountering a very strange bug:
- Context: I’m developing a wireless sensor network app based on the Texas Instruments MSP430 F5435A MCU. The hardware design is fairly common (close to the Crossbow MICA boards, for instance): radio transceiver, sensors, battery-operated, etc. Compilation is done with TI’s Code Composer Studio.
- Bug: at infrequent intervals (anywhere from 1 day up to… a couple of weeks!), my embedded app “freezes”, which means the MCU is in some unknown state. Peripherals are frozen in their last known state (e.g. I have observed that the radio might be on or off when the bug occurs). The serial connection is down. I’d like to give more details, but my debugging capabilities are fairly limited given how rarely the bug occurs.
Obviously my watchdog is turned on at init and is *never, ever* turned off. I have checked that the interrupt vector table redirects the PC to the right address (in flash) on PUC resets. My app is designed to recover easily from resets, which are pretty common in WSN apps where environmental conditions change a lot. I deliberately trigger a WDT password violation on some (infrequent) occasions to make the app more robust, e.g. to clear the radio buffers in the case of “flooding attacks” by neighbour nodes.
I have also checked that this “freeze” is really a freeze (it’s *not* the app rebooting in a loop, which would produce the same symptoms).
Finally: triggering a manual reset on the RESET pin is not sufficient to make the MCU restart correctly. I need to power the device off (i.e. clear the RAM) before the MCU can restart normally.
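For readers unfamiliar with the deliberate WDT password violation mentioned above, here is a minimal sketch of the idea, not the actual code from my app. On the MSP430, a write to `WDTCTL` whose upper byte is not the password `0x5A` causes a PUC. The helper names `wdt_write_has_password` and `force_reset_by_wdt_violation` are hypothetical, and the `#else` branch is only a host stand-in so the snippet compiles off-target.

```c
#include <stdint.h>

#ifdef __MSP430__
#include <msp430.h>   /* real WDTCTL and WDTPW definitions */
#else
/* Host stand-ins (assumption, not TI code) so the sketch builds off-target. */
#define WDTPW 0x5A00u
static volatile uint16_t WDTCTL = (uint16_t)(WDTPW | 0x0080u); /* arbitrary start value */
#endif

/* Hypothetical helper: does a 16-bit WDTCTL write carry the 0x5A password
 * in its upper byte? */
static int wdt_write_has_password(uint16_t value)
{
    return (value & 0xFF00u) == WDTPW;
}

/* Hypothetical helper: force a PUC by writing WDTCTL without the password.
 * On real hardware the reset happens on this write. */
static void force_reset_by_wdt_violation(void)
{
    WDTCTL = 0x0000u;   /* upper byte is not 0x5A: password violation */
}
```

On the target, `force_reset_by_wdt_violation()` should not return, since the PUC is raised by the write itself; on a host build it merely stores the value.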
- Why am I asking you: ==> Since most of my app runs in a FreeRTOS context, do you know of **bad things related to the FreeRTOS context** that could explain why the watchdog does not reset the mote properly? Any interaction? There are many things *not to do* (that I’ve done at least once ;-), like calling xQueueSend before the task scheduler is started, etc. Might some of these “bad lines of code” prevent the watchdog from doing its job?
==> What other test procedures would you suggest to narrow down the list of possible causes?
Thanks a lot,
AP