Watchdog troubles with FreeRTOS

apuech wrote on Tuesday, July 26, 2011:

I’ve been developing FreeRTOS apps for a couple of years & I’m now encountering a very strange bug:

- Context : I’m developing a wireless sensor network app based on the Texas Instruments MSP430F5435A MCU. The hardware design is fairly common (close to the Crossbow MICA boards, for instance): radio transceiver, sensors, battery-operated, etc. Compilation is done with TI’s Code Composer Studio.

- Bug : at infrequent intervals (anywhere from 1 day up to… a couple of weeks!), my embedded app “freezes”, which means the MCU is in some unknown state. Peripherals are frozen in their last known state (e.g. I have observed that the radio might be On or Off when the bug occurs). The serial connection is down. I’d like to give more details, but my debugging capabilities are fairly limited given how rarely the bug occurs.

Obviously my watchdog is turned on at init and is *never, ever* turned off. I have checked that the Interrupt Vector Table redirects the PC to the right address (in flash) on PUC resets. My app is designed to recover easily from resets (which are pretty common in WSN apps, where environment variables change a lot - I deliberately trigger a WDT password violation on some (infrequent) occasions to make the app more robust - e.g. to clear the radio buffers in the case of “flooding attacks” by neighbour nodes).
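
For readers unfamiliar with the trick: the deliberate reset relies on the WDT_A password rule, where a 16-bit write to WDTCTL whose upper byte is not 0x5A causes a PUC. The following is only a minimal sketch, assuming the standard `msp430.h` register names and that the compiler predefines `__MSP430__`; `wdtctl_write_causes_puc` and `force_puc_reset` are hypothetical helper names used for illustration.

```c
#include <stdint.h>

/* Upper byte that every WDTCTL write must carry (WDTPW in msp430.h). */
#define WDT_PASSWORD 0x5A00u

/* Host-side model of the rule: returns 1 if writing `value` to WDTCTL
 * would be a password violation (and therefore a PUC), 0 otherwise. */
int wdtctl_write_causes_puc(uint16_t value)
{
    return (value & 0xFF00u) != WDT_PASSWORD;
}

#ifdef __MSP430__            /* assumption: predefined by the target compiler */
#include <msp430.h>

/* Hypothetical helper: force a PUC by writing WDTCTL without the password. */
static inline void force_puc_reset(void)
{
    WDTCTL = 0;              /* upper byte 0x00 != 0x5A, so the device resets */
}
#endif
```

Note that WDTCTL reads back with 0x69 in the upper byte, so an unintended read-modify-write such as `WDTCTL |= WDTCNTCL;` is also a violation under this rule.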

I have also checked that this “freeze” is really a freeze (it’s *not* the app rebooting in a loop, which would produce the same symptoms).
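
One way to tell a reboot loop from a true freeze is to record the reset cause at every startup. On 5xx-family parts the SYS module exposes it through the SYSRSTIV register. The decoder below is only a sketch: `reset_cause_name` is a hypothetical helper, and the numeric codes are quoted from memory of the 5xx family headers, so they should be checked against the `SYSRSTIV_*` definitions in the device header before being relied on.

```c
#include <stdint.h>

/* Map a value read from SYSRSTIV to a short description.
 * Assumption: codes match the SYSRSTIV_* constants of the 5xx family
 * (e.g. SYSRSTIV_WDTTO, SYSRSTIV_WDTKEY); verify against the device header. */
const char *reset_cause_name(uint16_t sysrstiv)
{
    switch (sysrstiv) {
    case 0x0000: return "no reset pending";
    case 0x0002: return "brownout (BOR)";
    case 0x0004: return "RST/NMI pin";
    case 0x0016: return "WDT timeout";
    case 0x0018: return "WDT password violation";
    default:     return "other (see family user's guide)";
    }
}
```

On the target, the value would be read once at the very start of `main()` (each read returns and clears the highest-priority pending cause) and stored somewhere that survives the reset, for example a counter per cause in an INFO flash segment. A node that logs a long run of "WDT timeout" entries is rebooting in a loop; a node that logs nothing more after the bug is genuinely stuck.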

Finally : triggering a manual reset on the RESET pin is not sufficient to have the MCU restart correctly. I need to turn the device off (i.e. clear the RAM) before the MCU can restart normally.

- Why am I asking you : ==> Since most of my app runs in a FreeRTOS context, do you know of **bad things related to the FreeRTOS context** that could explain why the watchdog does not reset the mote properly? Any interaction? There are many things *not to do* (that I’ve done at least once ;-), like calling xQueueSend before the task scheduler is started, etc. Might some of these “bad code lines” prevent the watchdog from doing its job?
==> what other test procedures would you suggest to narrow down the list of possible causes?

Thanks a lot,

edwards3 wrote on Tuesday, July 26, 2011:

I don’t think there is anything FreeRTOS, or other code, could do to give that symptom. Sometimes you can accidentally lock chips up by doing things like setting a clock frequency that is too fast, write protecting or locking the flash, or reconfiguring the JTAG pins. Stuff like that shows up right away though.

Those chips have gone through some revisions. Have you looked at the errata?

apuech wrote on Wednesday, July 27, 2011:

Sorry for the delay, but looking into the errata took some time! Besides you’re right, errata is @ revision “K” which means the device may still be ‘dodgy’.

I have found no direct cause in the errata. The single (direct) mention of the WDT deals with a problem with the UCS (clocks) that can disable the WDT in some operating modes. But I’m not in those UCS modes, hence I don’t think the problem comes from there.

The errata mentions 2 interesting leads:
- CPU29: Using a certain instruction sequence to enter low-power mode(s) affects the instruction width of the first instruction in an NMI ISR
- CPU34: CPU may be **halted** if a conditional jump is followed by a rotate PC instruction

However, these errata deal with the CPU & I don’t see why the WDT would be affected.

As for what you mention:
- _set a clock frequency that is too fast:_ most probably not the case - I’m running the MCU @ 16 MHz with a 5 V DC regulated power supply, which is very conservative (these devices can run up to 25 MHz with lower voltages).
- _write protect or lock the flash:_ what do you mean? The MSP series cannot lock the flash; there are only a few attempts to write to the flash INFO segments… but still, the WDT should work its magic if the process was stuck in a loop.
- _reconfigure JTAG pins:_ what do you mean exactly? It’s true that I have unconnected JTAG pins, but the MSP has a pull-down resistor on the TEST JTAG pin which prevents other pins from triggering unwanted actions. And I don’t see any way, from the software side, to mess with the JTAG config (perhaps I’m wrong?).

Any clue? From your experience, can you list more causes for a WDT not operating properly?
Thanks in advance!