MCU (ESP32) Lockup Detection Using Internal WatchDog
If your program gets stuck for some reason, the fastest way to recover is usually to restart the computer. A familiar example is the Windows operating system crashing (experiencing a fatal error) and displaying the Blue Screen of Death (BSoD), after which the machine reboots to avoid further damage. Unlike desktop computers, embedded devices must be self-reliant. What if you cannot physically restart the device? If the program hangs, it is not always possible to wait for someone to reset it. Some embedded designs, such as space probes, are simply inaccessible to humans, and a manual reset is not feasible; these could become permanently disabled if they were unable to recover from faults autonomously. In such situations, a human operator’s capacity to reset the device is inadequate to satisfy the product’s uptime requirements. As firmware engineers, it is our responsibility to ensure that our devices can always recover from a stuck task or infinite loop. A watchdog is the mechanism usually employed in cases like these. Watchdogs are typically used in embedded systems and other computer-controlled equipment where humans cannot easily access the equipment or would be unable to respond to faults promptly.
II. What is Watchdog Timer?
A watchdog timer (WDT) is a hardware timer that detects software anomalies and generates a system reset if the main program neglects to service it periodically. It is often used to automatically reset an embedded device that hangs because of a software or hardware fault. Generally speaking, a watchdog timer is based on a counter that counts down from some initial value to zero. The embedded software selects the counter’s initial value and periodically restarts it. If the counter ever reaches zero before the software restarts it, the software is presumed to be malfunctioning and the processor’s reset signal is asserted. The process of restarting the watchdog timer’s counter is sometimes called “kicking the dog.” The appropriate visual metaphor is that of a man being attacked by a vicious dog. If he keeps kicking the dog, it can’t ever bite him. But he must keep kicking the dog at regular intervals to avoid a bite. Similarly, the software must restart the watchdog timer at a regular rate, or risk being restarted. Another term for “kicking” is “feeding” the watchdog timer: the counter is repeatedly reloaded with the initial value so that it never reaches zero.
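The countdown behaviour described above can be modelled in a few lines of portable C. This is only a host-side sketch for reasoning about the mechanism, not a driver for any real watchdog peripheral; the type `wdt_model_t` and the functions `wdt_start`, `wdt_kick` and `wdt_tick` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of a countdown watchdog (all names are hypothetical). */
typedef struct {
    uint32_t reload;         /* initial value selected by the software   */
    uint32_t counter;        /* current countdown value                  */
    bool     reset_asserted; /* latched once the counter reaches zero    */
} wdt_model_t;

/* Load the initial value and start counting down. */
void wdt_start(wdt_model_t *w, uint32_t reload)
{
    assert(w != NULL && reload > 0);
    w->reload = reload;
    w->counter = reload;
    w->reset_asserted = false;
}

/* "Kick" (or "feed") the dog: restart the counter from the initial value.
 * Once the reset has been asserted, kicking no longer helps. */
void wdt_kick(wdt_model_t *w)
{
    assert(w != NULL);
    if (!w->reset_asserted) {
        w->counter = w->reload;
    }
}

/* Advance the timer by one tick.
 * Returns true once the counter has reached zero (reset asserted). */
bool wdt_tick(wdt_model_t *w)
{
    assert(w != NULL);
    if (!w->reset_asserted && --w->counter == 0) {
        w->reset_asserted = true;
    }
    return w->reset_asserted;
}
```

With a reload value of 3, the software has to call `wdt_kick` at least once every two ticks; the third consecutive tick without a kick asserts the reset.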
|Fig. 1 A typical Watchdog setup|
Fig. 1 shows a typical arrangement of the watchdog timer as an external block to the processor. However, it could also be included within the same chip as the CPU. This is done in many microcontrollers. In either case, the output from the watchdog timer is tied directly to the processor’s reset signal.
Some engineers argue that if an MCU can fail, the same can happen to its internal WDT, and therefore prefer an external WDT. Using an internal WDT requires one less component and is, therefore, cheaper. On the other hand, there’s still a valid argument to be made for using an external WDT, because an internal WDT can be turned off accidentally by runaway code. Ultimately, the choice of internal versus external WDT depends on the nature of the application and the production budget for the hardware.
III. What are the potential causes of the hang event?
A Microcontroller Unit (MCU) failure can bring operations to a complete standstill. This can inconvenience users or pose functional safety risks in critical applications, so a safety mechanism is extremely important. A reliable embedded system requires a combined effort from the hardware designer and the firmware programmer. Several hardware- and software-related factors can cause an MCU to fail:
- Electrical noise can cause a power glitch, which may corrupt the program counter.
- Unstable power source.
- Memory stack overflow.
- A hardware component not responding to requests.
- Electrical interference (back-EMF, EMI, electrostatic discharge, cosmic rays, etc.).
- Deadlock condition.
- Excessive interrupts (e.g. interrupt service routines, timer interrupts, etc.).
As one example, demonstrated in Listing 2, deadlock occurs when two tasks are both blocked, each waiting for a resource held by the other. When using a real-time operating system such as FreeRTOS, deadlock can only be avoided by carefully programming the real-time application. FreeRTOS has a priority assignment protocol, but it has no built-in solution to deadlock. The problem can only be solved at design time: we must design tasks in such a way that a deadlock cannot occur.
IV. Structure of Watchdog Timer
Kicking the dog at a regular interval proves that the software is running. It is often a good idea to kick the dog only if the system passes some sanity check, as shown in Fig. 2. The main program typically has a loop that it constantly goes through, performing various functions. The watchdog timer is loaded with an initial value greater than the worst-case time delay through the main program loop. Each time it goes through the main loop, the code resets the watchdog timer (“kicking” or “feeding” the dog). If a fault occurs and the main program does not get back to reset the timer before it counts down to zero, an interrupt is generated to reset the processor. Used in this way, the watchdog timer can detect a fault in an unattended Arduino program and attempt corrective action with a reset. An example of this structure is shown in Listing 1.
|Fig. 2 A simple sanity checking|
In some cases, the state of allocated buffers or the status of some component may be checked before deciding to kick the dog. Good design of such checks will increase the family of errors that the watchdog will detect. One approach is to clear some flags before each loop is started, as shown in Fig. 3.
|Fig. 3 A sanity checking with multiple flags in a single loop|
Each flag is set at a certain point in the loop. At the bottom of the loop, the dog is kicked, but first the flags are checked to see that all of the important points in the loop have been visited. The structure shown in Fig. 3 is demonstrated in Listing 3.
A structure that requires multitasking, particularly a system running on a Real-Time Operating System (RTOS), is shown in Fig. 4. This scheme uses a task dedicated to the watchdog, as demonstrated in Listing 4. This task wakes up at a regular interval and checks the sanity of all other tasks in the system. If all tasks pass the test, the watchdog is kicked. The watchdog monitor task runs at a higher priority than the tasks it is monitoring.
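The flag scheme for a single loop can be sketched as a bitmask of checkpoints. This is a minimal host-side illustration, not the code of Listing 3; the checkpoint names, `loop_guard_t` and the three helper functions are hypothetical, and the `kicks` counter merely stands in for the platform call that would restart the real watchdog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical checkpoints that one pass through the loop must visit. */
enum {
    CP_SENSORS_READ = 1u << 0,
    CP_OUTPUTS_SET  = 1u << 1,
    CP_COMMS_DONE   = 1u << 2,
    CP_ALL          = CP_SENSORS_READ | CP_OUTPUTS_SET | CP_COMMS_DONE
};

typedef struct {
    uint32_t flags; /* checkpoints visited during the current pass        */
    uint32_t kicks; /* stand-in for the real "restart the watchdog" call  */
} loop_guard_t;

/* Top of the loop: clear every flag. */
void loop_begin(loop_guard_t *g)
{
    assert(g != NULL);
    g->flags = 0;
}

/* Called at each important point in the loop. */
void loop_checkpoint(loop_guard_t *g, uint32_t cp)
{
    assert(g != NULL);
    g->flags |= cp;
}

/* Bottom of the loop: kick the dog only if every checkpoint was visited.
 * Returns true if the watchdog was kicked. */
bool loop_end(loop_guard_t *g)
{
    assert(g != NULL);
    if ((g->flags & (uint32_t)CP_ALL) != (uint32_t)CP_ALL) {
        return false; /* a checkpoint was skipped: let the watchdog expire */
    }
    g->kicks++;
    return true;
}
```

If any part of the loop is skipped, `loop_end` declines to kick, so the hardware timer keeps counting down and eventually resets the system.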
|Fig. 4 Sanity checking with multiple flags across multiple tasks|
The watchdog timeout can be chosen to be the maximum time during which all regular tasks have had a chance to run from their start point through one full loop back to their start point again. Each task has a flag which can have two values, TRUE and FALSE. The flag is later read and written by the monitor. The monitor’s job is to wake up before the watchdog timeout expires and check the status of each flag. If all flags contain the value TRUE, every task got its turn to execute and the watchdog may be kicked. Some tasks may have executed several loops and set their flag to TRUE several times, which is acceptable. After kicking the watchdog, the monitor sets all of the flags to FALSE. By the time the monitor task executes again, all of the FALSE flags should have been overwritten with TRUE.
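The monitor logic described above can be sketched as follows. This is a single-threaded model intended only to show the flag handling, not the code of Listing 4; `NUM_TASKS`, `wdt_monitor_t`, `task_report_alive` and `monitor_run` are hypothetical names. In a real RTOS the flags are shared between tasks, so access to them would need to be atomic or protected by a critical section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_TASKS 3 /* hypothetical number of monitored tasks */

typedef struct {
    bool     alive[NUM_TASKS]; /* one TRUE/FALSE flag per monitored task   */
    unsigned kicks;            /* stand-in for the real watchdog kick call */
} wdt_monitor_t;

/* Called by each monitored task once per pass through its loop.
 * Setting the flag several times between monitor runs is harmless. */
void task_report_alive(wdt_monitor_t *m, size_t task_id)
{
    assert(m != NULL && task_id < NUM_TASKS);
    m->alive[task_id] = true;
}

/* Body of the monitor task; it must run before the watchdog timeout expires.
 * Returns true if the watchdog was kicked. */
bool monitor_run(wdt_monitor_t *m)
{
    assert(m != NULL);
    for (size_t i = 0; i < NUM_TASKS; i++) {
        if (!m->alive[i]) {
            return false; /* a task missed its turn: do not kick */
        }
    }
    m->kicks++; /* every task checked in, so the dog may be kicked */
    for (size_t i = 0; i < NUM_TASKS; i++) {
        m->alive[i] = false; /* each task must set its flag again */
    }
    return true;
}
```

A single stuck task is enough to stop the kicks, even if every other task, including the monitor itself, is still running normally.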
Since the WDT is the very last line of defense, its design must anticipate any failure mode. One may ask, “What are the characteristics of a great watchdog?”
- The WDT must be independent of the CPU
- The WDT must always, under any condition barring perhaps a hardware failure, bring the system back to life.
- Some WDTs issue a non-maskable interrupt (NMI) instead of a reset.
How to enable hardware WDT on ESP32 using Arduino IDE
A watchdog timer (WDT) is an important feature for hardware devices like ESP32 or Arduino that need to recover automatically from all kinds of unexpected failures. In a previous post I wrote about how a WDT works and how to set up a WDT on a Raspberry Pi, to make sure it stays up and running 24/7.
If we look at the ESP-IDF documentation about the ESP32 WDT, we can see that ESP32 comes with an interrupt watchdog and a task watchdog timer API. What many people are looking for is the task watchdog timer, and I will show you in this article how to implement it using Arduino IDE.
Implementing ESP32 hardware watchdog timer using Arduino IDE
On ESP32, many people manually implement some sort of watchdog timer using flags and loops. This is NOT how it should be done, especially since ESP32 comes with a hardware watchdog timer.
This code shows you how to properly implement WDT on ESP32 using Arduino IDE:
Now let’s look at the code. First of all you need to include the esp_task_wdt.h header. This should be available if you have properly installed arduino-esp32.
In the setup block we need to call two functions: esp_task_wdt_init(uint32_t timeoutSeconds, bool panic) and esp_task_wdt_add(TaskHandle_t handle).
- esp_task_wdt_init is used to initialise WDT with a timeout of timeoutSeconds and with a panic mode set. If panic is set to true, when WDT times out, it will throw a hardware panic and reboot.
- esp_task_wdt_add is used to add a task to WDT. If handle is NULL, the current task is used.
Now, the watchdog timer needs to be reset BEFORE it times out! This is done with esp_task_wdt_reset() executed in the current task.
You can see that in this example I am initialising the WDT with a timeout of 3 seconds and then, in the main loop, I am resetting it every 2 seconds. After five resets, I stop the reset call and let it time out. As expected, it reboots my ESP32 after 3 seconds.
I hope this example helps you keep your ESP32 connected to WiFi and internet all the time and prevent it from being stuck.
The ESP-IDF has support for multiple types of watchdogs, the two main ones being the Interrupt Watchdog Timer and the Task Watchdog Timer (TWDT). Both can be enabled using the Project Configuration Menu; the TWDT can also be enabled at runtime. The Interrupt Watchdog is responsible for detecting instances where FreeRTOS task switching is blocked for a prolonged period of time. The TWDT is responsible for detecting instances of tasks running without yielding for a prolonged period.
ESP-IDF has support for the following types of watchdog timers:
- Interrupt Watchdog Timer (IWDT)
- Task Watchdog Timer (TWDT)
- Crystal 32K Watchdog Timer (XTWDT)
The various watchdog timers can be enabled using the Project Configuration Menu. However, the TWDT can also be enabled during runtime.
Interrupt Watchdog Timer (IWDT)
The purpose of the IWDT is to ensure that interrupt service routines (ISRs) are not blocked from running for a prolonged period of time (i.e., the IWDT timeout period). Blocking ISRs from running in a timely manner is undesirable as it can increase ISR latency, and also prevents task switching (as task switching is executed from an ISR). The things that can block ISRs from running include:
- Critical sections (which also disable interrupts)
- Other same/higher priority ISRs (which block same/lower priority ISRs from running until they complete execution)
The IWDT utilizes the watchdog timer in
When the IWDT times out, the default action is to invoke the panic handler and display the panic reason as Interrupt wdt timeout on CPU0 or Interrupt wdt timeout on CPU1 (as applicable). Depending on the panic handler’s configured behavior (see CONFIG_ESP_SYSTEM_PANIC), users can then debug the source of the IWDT timeout (via the backtrace, OpenOCD, gdbstub, etc.) or simply reset the chip (which may be preferred in a production environment).
If for whatever reason the panic handler is unable to run after an IWDT timeout, the IWDT has a secondary timeout that will hard-reset the chip (i.e., a system reset).
The IWDT is enabled by default via the CONFIG_ESP_INT_WDT option.
The IWDT’s timeout is configured by setting the CONFIG_ESP_INT_WDT_TIMEOUT_MS option.
Note that the default timeout is higher if PSRAM support is enabled, as a critical section or interrupt routine that accesses a large amount of PSRAM will take longer to complete in some circumstances.
The timeout should always be at least twice as long as the period between FreeRTOS ticks (see CONFIG_FREERTOS_HZ).
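As a worked example of the rule above: with a tick rate of 100 Hz the tick period is 10 ms, so the IWDT timeout should be at least 20 ms. The helper below is a hypothetical sanity check (it is not part of ESP-IDF) that takes the two configuration values as plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: checks that the IWDT timeout is at least twice the
 * FreeRTOS tick period. The tick period is 1000 / tick_hz milliseconds, so
 * the condition is timeout_ms >= 2 * 1000 / tick_hz. Both sides are
 * multiplied by tick_hz to avoid integer truncation in the division. */
bool iwdt_timeout_ok(uint32_t timeout_ms, uint32_t tick_hz)
{
    assert(tick_hz > 0);
    return (uint64_t)timeout_ms * tick_hz >= 2000u;
}
```

For instance, `iwdt_timeout_ok(20, 100)` holds, while `iwdt_timeout_ok(19, 100)` does not.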
If you find the IWDT timeout is triggered because an interrupt or critical section is running longer than the timeout period, consider rewriting the code:
- Critical sections should be made as short as possible. Any non-critical code/computation should be placed outside the critical section.
- Interrupt handlers should also perform the minimum possible amount of computation. Users can consider deferring any computation to a task by having the ISR push data to a task using queues.
- Neither critical sections nor interrupt handlers should ever block waiting for another event to occur.
If changing the code to reduce the processing time is not possible or desirable, it’s possible to increase the CONFIG_ESP_INT_WDT_TIMEOUT_MS setting instead.
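The idea of having an ISR hand data to a task, instead of processing it in the interrupt, can be illustrated with a small single-producer, single-consumer ring buffer. This is a portable sketch with hypothetical names (`isr_ring_t`, `ring_push_from_isr`, `ring_pop`), not an ESP-IDF API. On a real FreeRTOS target one would normally use a FreeRTOS queue instead (for example via `xQueueSendFromISR`), and a multi-core system would additionally need proper memory barriers or atomics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RB_SIZE 8u /* capacity; must be a power of two */

typedef struct {
    volatile uint32_t head; /* written only by the ISR (producer)   */
    volatile uint32_t tail; /* written only by the task (consumer)  */
    uint16_t data[RB_SIZE];
} isr_ring_t;

/* ISR side: store the sample and return immediately. Never blocks;
 * if the buffer is full the sample is dropped and false is returned. */
bool ring_push_from_isr(isr_ring_t *rb, uint16_t sample)
{
    assert(rb != NULL);
    if (rb->head - rb->tail == RB_SIZE) {
        return false; /* full */
    }
    rb->data[rb->head & (RB_SIZE - 1u)] = sample;
    rb->head++;
    return true;
}

/* Task side: fetch the oldest sample, if any. The expensive processing
 * of the sample then happens in task context, outside the ISR. */
bool ring_pop(isr_ring_t *rb, uint16_t *out)
{
    assert(rb != NULL && out != NULL);
    if (rb->head == rb->tail) {
        return false; /* empty */
    }
    *out = rb->data[rb->tail & (RB_SIZE - 1u)];
    rb->tail++;
    return true;
}
```

The ISR does a bounded, constant amount of work per interrupt, which keeps it well inside the IWDT timeout regardless of how long the task later spends on each sample.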
Task Watchdog Timer (TWDT)
The Task Watchdog Timer (TWDT) is used to monitor particular tasks, ensuring that they are able to execute within a given timeout period. The TWDT primarily watches the Idle task; however, any task can subscribe to be watched by the TWDT. By watching the Idle task, the TWDT can detect instances of tasks running for a prolonged period of time without yielding. This can be an indicator of poorly written code that spinloops on a peripheral, or a task that is stuck in an infinite loop.
The TWDT is built around the Hardware Watchdog Timer in Timer Group 0. When a timeout occurs, an interrupt is triggered. Users can define the function esp_task_wdt_isr_user_handler in the user code, in order to receive the timeout event and extend the default behavior.
The following functions can be used to watch tasks using the TWDT:
- esp_task_wdt_init() to initialize the TWDT and subscribe the idle tasks.
- esp_task_wdt_add() subscribes other tasks to the TWDT.
- Once subscribed, esp_task_wdt_reset() should be called from the task to feed the TWDT.
- esp_task_wdt_delete() unsubscribes a previously subscribed task.
- esp_task_wdt_deinit() unsubscribes the idle tasks and deinitializes the TWDT.
In the case where applications need to watch at a more granular level (i.e., ensure that a particular function/stub/code-path is called), the TWDT allows subscription of “users”.
esp_task_wdt_add_user() to subscribe an arbitrary user of the TWDT. This function will return a user handle to the added user.
esp_task_wdt_reset_user() must be called using the user handle in order to prevent a TWDT timeout.
esp_task_wdt_delete_user() unsubscribes an arbitrary user of the TWDT.
The default timeout period for the TWDT is set using config item CONFIG_ESP_TASK_WDT_TIMEOUT_S. This should be set to at least as long as you expect any single task will need to monopolize the CPU (for example, if you expect the app will do a long intensive calculation and should not yield to other tasks). It is also possible to change this timeout at runtime by calling esp_task_wdt_init().
Erasing large flash areas can be time consuming and can cause a task to run continuously, thus triggering a TWDT timeout. The following two methods can be used to avoid this:
Increase CONFIG_ESP_TASK_WDT_TIMEOUT_S in menuconfig for a larger watchdog timeout period.
You can also call esp_task_wdt_init() to increase the watchdog timeout period before erasing a large flash area.
For more information, you can refer to SPI Flash .
The following config options control TWDT configuration. They are all enabled by default:
CONFIG_ESP_TASK_WDT_EN — enables TWDT feature. If this option is disabled, TWDT cannot be used, even if initialized at runtime.
CONFIG_ESP_TASK_WDT_INIT — the TWDT is initialized automatically during startup. If this option is disabled, it is still possible to initialize the Task WDT at runtime by calling esp_task_wdt_init() .
CONFIG_ESP_TASK_WDT_CHECK_IDLE_TASK_CPU0 — Idle task is subscribed to the TWDT during startup. If this option is disabled, it is still possible to subscribe the idle task by calling esp_task_wdt_init() again.
XTAL32K Watchdog Timer (XTWDT)
One of the optional clock inputs to the ESP32-C3 is an external 32 kHz crystal or oscillator (XTAL32K) that is used as a clock source (XTAL32K_CLK) to various subsystems (such as the RTC).
The XTWDT is a dedicated watchdog timer used to ensure that the XTAL32K is functioning correctly. When XTAL32K_CLK works as the clock source of RTC_SLOW_CLK and stops oscillating, the XTWDT will detect this and generate an interrupt. It also provides functionality for automatically switching over to the internal, but less accurate oscillator as the RTC_SLOW_CLK source.
Since the switch to the backup clock is done in hardware, it can also happen during deep sleep. This means that even if XTAL32K_CLK stops functioning while the chip is in deep sleep, waiting for a timer to expire, it will still be able to wake up as planned.
If the XTAL32K_CLK starts functioning normally again, you can call esp_xt_wdt_restore_clk to switch back to this clock source and re-enable the watchdog timer.
When the external 32 kHz crystal or oscillator is selected (CONFIG_RTC_CLK_SRC), the XTWDT can be enabled via the CONFIG_ESP_XT_WDT configuration option.
The timeout is configured by setting the CONFIG_ESP_XT_WDT_TIMEOUT option.
The automatic backup clock functionality is enabled via the CONFIG_ESP_XT_WDT_BACKUP_CLK_ENABLE configuration option.
JTAG & Watchdogs
While debugging using OpenOCD, the CPUs will be halted every time a breakpoint is reached. However, if the watchdog timers continue to run when a breakpoint is encountered, they will eventually trigger a reset, making it very difficult to debug code. Therefore, OpenOCD will disable the hardware timers of both the interrupt and task watchdogs at every breakpoint. Moreover, OpenOCD will not re-enable them upon leaving the breakpoint. This means that interrupt watchdog and task watchdog functionality will essentially be disabled. No warnings or panics from either watchdog will be generated when the ESP32-C3 is connected to OpenOCD via JTAG.
A full example using the Task Watchdog is available in esp-idf: system/task_watchdog
Initialize the Task Watchdog Timer (TWDT)
This function configures and initializes the TWDT. This function will subscribe the idle tasks if configured to do so. For other tasks, users can subscribe them using esp_task_wdt_add() or esp_task_wdt_add_user(). This function won’t start the timer if no tasks have been registered yet.
esp_task_wdt_init() must only be called after the scheduler is started. Moreover, it must not be called by multiple tasks simultaneously.
config – [in] Configuration structure
ESP_OK: Initialization was successful
ESP_ERR_INVALID_STATE: Already initialized
Other: Failed to initialize TWDT
Reconfigure the Task Watchdog Timer (TWDT)
The function reconfigures the running TWDT. It must already be initialized when this function is called.
esp_task_wdt_reconfigure() must not be called by multiple tasks simultaneously.
config – [in] Configuration structure
ESP_OK: Reconfiguring was successful
ESP_ERR_INVALID_STATE: TWDT not initialized yet
Other: Failed to initialize TWDT
Deinitialize the Task Watchdog Timer (TWDT)
This function will deinitialize the TWDT, and unsubscribe any idle tasks. Calling this function whilst other tasks are still subscribed to the TWDT, or when the TWDT is already deinitialized, will result in an error code being returned.
esp_task_wdt_deinit() must not be called by multiple tasks simultaneously.
ESP_OK: TWDT successfully deinitialized
Other: Failed to deinitialize TWDT
Subscribe a task to the Task Watchdog Timer (TWDT)
This function subscribes a task to the TWDT. Each subscribed task must periodically call esp_task_wdt_reset() to prevent the TWDT from elapsing its timeout period. Failure to do so will result in a TWDT timeout.
task_handle – Handle of the task. Input NULL to subscribe the current running task to the TWDT
ESP_OK: Successfully subscribed the task to the TWDT
Other: Failed to subscribe task
esp_err_t esp_task_wdt_add_user(const char *user_name, esp_task_wdt_user_handle_t *user_handle_ret)
Subscribe a user to the Task Watchdog Timer (TWDT)
This function subscribes a user to the TWDT. A user of the TWDT is usually a function that needs to run periodically. Each subscribed user must periodically call esp_task_wdt_reset_user() to prevent the TWDT from elapsing its timeout period. Failure to do so will result in a TWDT timeout.
user_name – [in] String to identify the user
user_handle_ret – [out] Handle of the user
ESP_OK: Successfully subscribed the user to the TWDT
Other: Failed to subscribe user
Reset the Task Watchdog Timer (TWDT) on behalf of the currently running task.
This function will reset the TWDT on behalf of the currently running task. Each subscribed task must periodically call this function to prevent the TWDT from timing out. If one or more subscribed tasks fail to reset the TWDT on their own behalf, a TWDT timeout will occur.
ESP_OK: Successfully reset the TWDT on behalf of the currently running task
Other: Failed to reset
Reset the Task Watchdog Timer (TWDT) on behalf of a user.
This function will reset the TWDT on behalf of a user. Each subscribed user must periodically call this function to prevent the TWDT from timing out. If one or more subscribed users fail to reset the TWDT on their own behalf, a TWDT timeout will occur.
user_handle – [in] User handle
ESP_OK: Successfully reset the TWDT on behalf of the user
Other: Failed to reset
Unsubscribes a task from the Task Watchdog Timer (TWDT)
This function will unsubscribe a task from the TWDT. After being unsubscribed, the task should no longer call esp_task_wdt_reset().
task_handle – [in] Handle of the task. Input NULL to unsubscribe the current running task.
ESP_OK: Successfully unsubscribed the task from the TWDT
Other: Failed to unsubscribe task
Unsubscribes a user from the Task Watchdog Timer (TWDT)
This function will unsubscribe a user from the TWDT. After being unsubscribed, the user should no longer call esp_task_wdt_reset_user().
user_handle – [in] User handle
ESP_OK: Successfully unsubscribed the user from the TWDT
Other: Failed to unsubscribe user
Query whether a task is subscribed to the Task Watchdog Timer (TWDT)
This function will query whether a task is currently subscribed to the TWDT, or whether the TWDT is initialized.
task_handle – [in] Handle of the task. Input NULL to query the current running task.
ESP_OK: The task is currently subscribed to the TWDT
ESP_ERR_NOT_FOUND: The task is not subscribed
ESP_ERR_INVALID_STATE: TWDT was never initialized
void esp_task_wdt_isr_user_handler(void)
User ISR callback placeholder.
This function is called by task_wdt_isr function (ISR for when TWDT times out). It can be defined in user code to handle TWDT events.
It has the same limitations as the interrupt function. Do not use ESP_LOGx functions inside.
Task Watchdog Timer (TWDT) configuration structure.
TWDT timeout duration in milliseconds
Mask of the cores whose idle task should be subscribed on initialization
Trigger panic when timeout occurs
Task Watchdog Timer (TWDT) user handle.
© Copyright 2016 — 2022, Espressif Systems (Shanghai) Co., Ltd.