Keeping your IoT boards alive even when they freeze (#dev #devlog #iot #micropython #raspberrypipico #pipico #rpipico)

Overview
You have a cool IoT project, let's say a system to automatically water your plants, or a custom weather station. In your MicroPython code, you added all the error handling and safeguards. Everything is perfect, so you place the board in an out-of-the-way place, and you're sure it will work flawlessly... until it doesn't. In this post I show a way of fixing that, and it is probably simpler than you would expect!
I decided to write this post because none of the articles and tutorials I read mentioned this, and it can help any long-running project.
The problem: You wrap your code in try/except blocks and add mechanisms to restart the board if something goes wrong. But when a sensor freezes, or an HTTP request takes too long and the board hangs, none of that code gets a chance to run, so what we usually do is turn the board off and on again manually.
However, there's a simple solution for that: Enter the WatchDog Timer!
The WDT (or WatchDog Timer) is a native MicroPython tool located in the machine module, and it is available for pyboard, WiPy, esp8266, esp32, rp2040, and mimxrt.
When you initialize the WDT, you set a timeout in milliseconds (on the rp2040, up to 8388 ms). After that, you need to call the method feed() at least once every N ms (N equals the timeout you defined), or the board will restart by itself.
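import machine
import time

# Start the watchdog: the board resets if feed() is not called within 8000 ms
wdt = machine.WDT(timeout=8000)

while True:
    read_sensor_1()  # placeholder for your own sensor code

    wdt.feed()

    send_data_from_sensor_1()  # placeholder for your own upload code

    wdt.feed()

    time.sleep(1)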
In the code above, we initialize the WDT with a timeout of 8000ms, and then we call the feed() method between each
operation to prevent the board from restarting. If either read_sensor_1() or send_data_from_sensor_1() takes too
long (> 8000 ms), the board will restart automatically, regardless of the cause.
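After one of these automatic restarts, it can be handy to know that the watchdog was the cause. Here is a minimal sketch of that check, assuming your port exposes machine.reset_cause() and the machine.WDT_RESET constant (availability varies by port, so check the docs for your board). The helper name was_watchdog_reset is my own, not part of MicroPython, and the guarded import is only there so the snippet also loads on a desktop Python.

```python
def was_watchdog_reset(machine_module):
    """Return True if the last reset was triggered by the watchdog timer.

    machine_module is expected to look like MicroPython's machine module,
    i.e. to provide reset_cause() and WDT_RESET (assumed to exist on your port).
    """
    return machine_module.reset_cause() == machine_module.WDT_RESET


try:
    import machine  # only available on a MicroPython board
except ImportError:
    machine = None  # running on a desktop interpreter, nothing to check

if machine is not None and was_watchdog_reset(machine):
    print("Last reset was triggered by the watchdog")
```

On the board, you could run this at the top of main.py and record the result before starting the main loop.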
As you can see, it is pretty straightforward and easy to add to existing IoT code, and it will restart the board, giving the system a chance to start again.
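One thing to watch for when adding this to existing code: any pause longer than the timeout will itself trigger a restart, so a plain sleep of, say, 60 seconds between readings no longer works. A rough sketch of one way around it is to sleep in small chunks and feed the watchdog after each one. The helper sleep_with_feed and its parameters are hypothetical names of mine, and chunk_ms is assumed to be well below your WDT timeout.

```python
import time


def sleep_with_feed(wdt, total_ms, chunk_ms=1000, sleep_ms=None):
    """Pause for total_ms milliseconds, feeding the watchdog after every chunk.

    wdt      -- any object with a feed() method (e.g. a machine.WDT instance)
    chunk_ms -- longest single pause; assumed to be shorter than the WDT timeout
    sleep_ms -- optional sleep function taking milliseconds (useful for testing)
    """
    if sleep_ms is None:
        # time.sleep_ms exists on MicroPython; fall back to time.sleep elsewhere
        sleep_ms = getattr(time, "sleep_ms", lambda ms: time.sleep(ms / 1000))
    remaining = total_ms
    while remaining > 0:
        step = min(chunk_ms, remaining)
        sleep_ms(step)
        wdt.feed()
        remaining -= step
```

With this in place, time.sleep(60) in a main loop could become sleep_with_feed(wdt, 60000), and the board still restarts if anything else in the loop hangs.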
To add complementary observability, in my projects I also:
- Save errors to a file when the board is not connected to the network.
- Flush the error log files upon boot, reporting everything to a dedicated API.
- Integrate the boards with my OpenTelemetry system, so I can see what the boards are doing on a Grafana dashboard.
- Add, as a last resort, an inexpensive mini LCD that shows the current status of the board.
I won't go into the details here, because that is another subject entirely. :)
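Still, as a rough starting point for the first two items, here is a sketch of an error log that is written to a file and flushed later. It is not the exact code from my boards: the file name errors.log, the function names, and the send callback (standing in for whatever reports to your API) are all assumptions for illustration.

```python
import os

ERROR_LOG = "errors.log"  # hypothetical file name on the board's filesystem


def log_error(message, path=ERROR_LOG):
    """Append one error message per line to a local file."""
    with open(path, "a") as f:
        f.write(message.replace("\n", " ") + "\n")


def flush_errors(send, path=ERROR_LOG):
    """Pass every stored error to send(), then delete the log file.

    send is assumed to raise an exception on failure, in which case the
    file is left in place so the errors can be retried on the next boot.
    Returns the number of errors that were sent.
    """
    try:
        with open(path) as f:
            lines = [line.strip() for line in f if line.strip()]
    except OSError:
        return 0  # no log file yet, nothing to report
    for line in lines:
        send(line)
    os.remove(path)
    return len(lines)
```

On boot, once the network is up, calling flush_errors with your own reporting function would drain whatever was logged while the board was offline.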
Hope it helps! :)
References: