ESP32 deep sleep wakeup causes WDT

We are having a problem with the ESP32 waking from deep sleep. We have multiple systems that have been reliably going into deep sleep, performing several I2C sensor readings, and then waking the main CPU to report the readings over WiFi. These systems have had reliable up-times of weeks to months. This is not the same problem as "TG0WDT_SYS_RESET upon waking up from deepsleep if FreeRTOS unicore is enabled" (IDFGH-4116, issue #5983 in espressif/esp-idf on GitHub), because the reset reason here is WDT rather than TG0WDT.

We have recently been testing the systems in warmer conditions, and have discovered occasional gaps in our sensor readings. These gaps are always associated with a WDT reset happening at the time of the regularly scheduled wakeup. This shows that the ULP is running its code loop and tries to wake the Xtensa at the correct time, but as the Xtensa wakes, it is reset by a WDT instead.
Here are two log excerpts to show the event.

  • A normal sleep then wakeup:

[code]I (36475) ff_boot: power killed
I (36485) ff_ulp: Setting DOWNLOADBOOT (gpio0) as EXT1 wakeup source
I (36485) ff_ulp: rtc i2c2 gpio to inputs
W (36495) ff_ulp: GPIO13 to ULP control
I (36495) ff_ulp: prepping deep sleep. Mode 11, sleeptime 5m, min ULP cycles: 12
I (36505) ff_boot: Entering deep sleep. ULP sensor period 300s, nominally 12 sweeps

I (19) boot: ESP-IDF v3.3.2-241-g1d2d93acd 2nd stage bootloader
I (19) boot: compile time 00:32:48
D (19) boot: Enabling RTCWDT(90000 ms)
I (20) boot: Enabling RNG early entropy source…
D (24) boot: magic e9
D (26) boot: segments 04[/code]

  • A watchdog event:

[code]I (36675) ff_boot: power killed
I (36675) ff_ulp: Setting DOWNLOADBOOT (gpio0) as EXT1 wakeup source
I (36675) ff_ulp: rtc i2c2 gpio to inputs
W (36685) ff_ulp: GPIO13 to ULP control
I (36685) ff_ulp: prepping deep sleep. Mode 11, sleeptime 5m, min ULP cycles: 12
I (36695) ff_boot: Entering deep sleep. ULP sensor period 300s, nominally 12 sweeps

W (14) boot: PRO CPU has been reset by WDT.
W (14) boot: WDT reset info: PRO CPU PC=0x2721c99f
D (14) boot: WDT reset info: PRO CPU STATUS 0x00000000
D (16) boot: WDT reset info: PRO CPU PID 0x00000002
D (22) boot: WDT reset info: PRO CPU PDEBUGINST 0x0f303000
D (27) boot: WDT reset info: PRO CPU PDEBUGSTATUS 0x00000008
D (33) boot: WDT reset info: PRO CPU PDEBUGDATA 0x7f6940aa
D (38) boot: WDT reset info: PRO CPU PDEBUGPC 0x2721c99f
D (44) boot: WDT reset info: PRO CPU PDEBUGLS0STAT 0x00001024
D (49) boot: WDT reset info: PRO CPU PDEBUGLS0ADDR 0x235d859c
D (55) boot: WDT reset info: PRO CPU PDEBUGLS0DATA 0x2467c04d
W (60) boot: WDT reset info: APP CPU PC=0x7677c9cf
D (65) boot: WDT reset info: APP CPU STATUS 0x00000000
D (70) boot: WDT reset info: APP CPU PID 0x00000003
D (75) boot: WDT reset info: APP CPU PDEBUGINST 0x0e101002
D (81) boot: WDT reset info: APP CPU PDEBUGSTATUS 0x00000026
D (86) boot: WDT reset info: APP CPU PDEBUGDATA 0xaacb95f0
D (92) boot: WDT reset info: APP CPU PDEBUGPC 0x7677c9cf
D (97) boot: WDT reset info: APP CPU PDEBUGLS0STAT 0x00b0000a
D (103) boot: WDT reset info: APP CPU PDEBUGLS0ADDR 0xc1d8b65b
D (108) boot: WDT reset info: APP CPU PDEBUGLS0DATA 0xbe30baca
I (114) boot: ESP-IDF v3.3.2-241-g1d2d93acd 2nd stage bootloader[/code]

The WDT resets also correlate with temperature. They occur when our off-module (but on-PCB) sensor would be reading over 40C, although we never see those readings directly because they are lost. When we graph our temperature readings, the trend shows that the readings stop above 40C and restart below 40C.

40C is well within the operating temperature of both our sensor chip and the ESP32 module. To check this, I ran a system without sleeping until it was reporting temperatures of nearly 50C, then let it sleep. It was fully operational, and there were no error messages at the time deep sleep was triggered. When I forced the Xtensa to wake with an external trigger, the same WDT reset happened.

The WDT reset information above doesn’t tell me very much; perhaps someone else will understand it better. All I note is that the PRO CPU PC address is not even in an address range that the ESP32 normally runs code from. The ESP32 datasheet says all accesses below 0x4000:0000 are treated as data accesses, although I’ve also seen information in the technical manual suggesting some code can run in the 0x3F00:0000 range.
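To illustrate that observation, here is a minimal sketch that classifies a reported PC against the ESP32 address map. The range boundaries are assumptions based on my reading of the address-mapping table in the technical reference manual and should be verified there; the names `addr_kind_t` and `classify_address` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: classify an address against the ESP32 address map.
 * The boundaries below are assumptions taken from the TRM address-mapping
 * table (reserved holes inside each range are ignored for simplicity). */
typedef enum {
    ADDR_DATA_BUS,        /* 0x3F400000..0x3FFFFFFF: data bus accesses     */
    ADDR_INSTRUCTION_BUS, /* 0x40000000..0x40BFFFFF: ROM, IRAM, flash code */
    ADDR_RTC_SLOW,        /* 0x50000000..0x50001FFF: RTC slow memory       */
    ADDR_UNMAPPED         /* anything outside the ranges above             */
} addr_kind_t;

static addr_kind_t classify_address(uint32_t addr)
{
    if (addr >= 0x3F400000u && addr <= 0x3FFFFFFFu) {
        return ADDR_DATA_BUS;
    }
    if (addr >= 0x40000000u && addr <= 0x40BFFFFFu) {
        return ADDR_INSTRUCTION_BUS;
    }
    if (addr >= 0x50000000u && addr <= 0x50001FFFu) {
        return ADDR_RTC_SLOW;
    }
    return ADDR_UNMAPPED;
}
```

Under these assumed ranges, both PCs in the watchdog log (0x2721c99f and 0x7677c9cf) classify as unmapped, which is consistent with the saved debug registers holding garbage rather than a real execution address.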

This behaviour occurs even with these two settings in sdkconfig:
CONFIG_ESP32_DEEP_SLEEP_WAKEUP_DELAY=5000
CONFIG_BOOTLOADER_WDT_ENABLE=n

Is there a hardware or software bug that could be causing this issue? One suspicion is that the external flash is getting hotter and cannot be accessed even after the 5000us delay.

Workaround?

Is this a FreeRTOS question? Or something best to ask on the Espressif forum?

As an end user, I don’t have detailed knowledge of where the boundary between FreeRTOS and the vendor-specific tools lies. Since many hardware-level questions are asked and answered on this forum, it seemed worthwhile to ask.

I also asked the same question here: ULP wake failing with WDT reset - ESP32 Forum

Also, a WDT trigger, even though I can’t find the cause, does seem to put this in the FreeRTOS realm.

Ian

There is a Discord server, “Espressif MCUs,” that may also be useful.

FWIW, I see similar behavior on an ESP32-S2 using the deep sleep timer. The MCU sleeps for five minutes, wakes up and checks some sensors then goes back to sleep. About every three or four days, the reset reason is WDT. I am still gathering data, so I haven’t posted anything yet.

I have since found my own workaround for this unpopular problem.

After reset, I check the return of esp_reset_reason(), and if it is ESP_RST_WDT, I do a second check to see if the RTC memory used by the ULP is coherent. If so, I assume it was a normal wake, and not a watchdog reset.
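A minimal sketch of what such a coherence check might look like follows. The struct layout, the magic value, and the bounds are all hypothetical, since the real ULP program's variables are not shown in this thread. The one detail taken from the ULP programming guide is that the ULP only uses the lower 16 bits of each 32-bit RTC memory word, so the upper half is masked off before comparing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of the RTC slow memory words shared with the ULP. */
typedef struct {
    uint32_t magic;        /* marker the ULP program stores on every sweep */
    uint32_t sweep_count;  /* sweeps completed since deep sleep began      */
    uint32_t sample_count; /* readings currently stored in the buffer      */
} ulp_shared_t;

#define ULP_MAGIC       0xA5C3u /* assumed marker value                   */
#define ULP_MAX_SWEEPS  64u     /* assumed upper bound for one sleep      */
#define ULP_MAX_SAMPLES 256u    /* assumed capacity of the reading buffer */

/* The ULP writes 16-bit values; the upper 16 bits of each word are not data. */
static inline uint16_t ulp_word(uint32_t raw)
{
    return (uint16_t)(raw & 0xFFFFu);
}

/* Returns true if the shared words look like the output of a running ULP. */
static bool ulp_memory_is_coherent(const ulp_shared_t *s)
{
    if (ulp_word(s->magic) != ULP_MAGIC) {
        return false; /* marker missing: memory was cleared or never written */
    }
    uint16_t sweeps = ulp_word(s->sweep_count);
    uint16_t samples = ulp_word(s->sample_count);
    if (sweeps == 0u || sweeps > ULP_MAX_SWEEPS) {
        return false; /* no sweep completed, or counter ran away */
    }
    if (samples > ULP_MAX_SAMPLES) {
        return false; /* more samples than the buffer can hold */
    }
    return true;
}
```

On the target, the struct would overlay the ULP program's exported variables, and the result would only be consulted when `esp_reset_reason()` has already returned `ESP_RST_WDT`.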

Interestingly, the ULP is still running when this happens - although the reset reason is a watchdog reset, it has not been halted as one might expect.

With the ULP as the only wake source, this comes with a risk of bricking the system if the ULP has in fact been halted by the reset. Further work remains to ensure that the ULP is really running. I’d be very interested in hearing from anybody who knows how to check if the ULP is currently alive, even if it is in the halted state.

I think I have found a way to verify that the ULP is actually going to wake up and run its program. The ULP Coprocessor programming API Guide has this to say near the end of the ULP program flow section:

The program runs until it encounters a halt instruction or an illegal instruction. Once the program halts, ULP coprocessor powers down, and the timer is started again.

To disable the timer (effectively preventing the ULP program from running again), clear the RTC_CNTL_ULP_CP_SLP_TIMER_EN bit in the RTC_CNTL_STATE0_REG register.

Presumably the opposite applies too: if that bit is set then the timer is running and the ULP will restart when it times out.

A full workaround runtime test for detecting this spurious watchdog reset is:

  1. esp_reset_reason() returns ESP_RST_WDT
  2. RTC memory contains values that make sense for a previously running ULP
  3. RTC_CNTL_ULP_CP_SLP_TIMER_EN bit is set

When all three conditions hold, I conclude that the watchdog reset was caused by some problem during the wake triggered by the ULP, and I can safely continue processing the data the ULP gathered.
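The three conditions could be combined roughly as follows. This is a sketch, not tested on hardware: the decision logic is kept as a plain function so it can be exercised off-target, and the ESP-IDF glue is only compiled when `ESP_PLATFORM` is defined. The names `wake_evidence_t`, `is_spurious_wdt_wake`, and `collect_wake_evidence` are hypothetical, and the reset reason compared against is `ESP_RST_WDT`, the value named in the earlier post.

```c
#include <stdbool.h>
#include <stdint.h>

#ifdef ESP_PLATFORM
#include "esp_system.h"       /* esp_reset_reason(), ESP_RST_WDT */
#include "soc/soc.h"          /* REG_GET_BIT */
#include "soc/rtc_cntl_reg.h" /* RTC_CNTL_STATE0_REG, RTC_CNTL_ULP_CP_SLP_TIMER_EN */
#endif

/* Hypothetical container for the three observations made after reset. */
typedef struct {
    bool reset_was_wdt;       /* 1. reset reason reported as a watchdog reset */
    bool rtc_memory_coherent; /* 2. ULP variables in RTC memory look sane     */
    bool ulp_timer_enabled;   /* 3. ULP sleep timer enable bit is still set   */
} wake_evidence_t;

/* Treat the reset as a spurious WDT during a ULP wake only if all three hold. */
static bool is_spurious_wdt_wake(const wake_evidence_t *e)
{
    return e->reset_was_wdt && e->rtc_memory_coherent && e->ulp_timer_enabled;
}

#ifdef ESP_PLATFORM
/* Target-only glue: the caller supplies the result of its own RTC memory check. */
static wake_evidence_t collect_wake_evidence(bool rtc_memory_coherent)
{
    wake_evidence_t e = {
        .reset_was_wdt = (esp_reset_reason() == ESP_RST_WDT),
        .rtc_memory_coherent = rtc_memory_coherent,
        .ulp_timer_enabled =
            REG_GET_BIT(RTC_CNTL_STATE0_REG, RTC_CNTL_ULP_CP_SLP_TIMER_EN) != 0,
    };
    return e;
}
#endif
```

If any one condition fails, the safer choice is to treat the event as a genuine reset and reload and restart the ULP program, since the ULP is the only wake source.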