After being home for weeks, I went away on business. On the first night away there was a brief power cut, and the firewall (on a UPS) seemed to get stuck.
So, that meant no DNS, DHCP, or connectivity between wifi and LAN… All due to an (admittedly aging) hardware issue.
Since then my entire home system has had issues whilst it all settles down.
It made me think about getting some redundancy into the system to handle a single failure.
So, can you give me any insights into High Availability, like CARP (for pfSense), VM failover (on Incus?), mesh wifi, Home Assistant, etc.?
Of course there are going to be single points of failure, like the ISP line, but it seems like something worth testing out.
Both dhcpd and bind support failover.
If you want failover storage, you might want to look into BeeGFS, as storage targets can be mirrored across hosts.
Source: Using all of the above at work. I’ve had motherboards die on me without causing downtime.
Not heard of BeeGFS, had a quick look on the Arch wiki… looks quite involved…
But, ok, at least I know that the DHCP part can be dealt with - thanks.
Get a UPS.
Read the first sentence at least, c’mon.
Thanks for this. I keep worrying about security, hardening the system, etc., but forget about the essentials: power and networking. My #1 priority is to get a UPS this year, once I find a job, that is.
There are a lot of layers here, so let me work backwards from the edge, inward:
- You lost power, so you probably lost internet if your endpoint hardware was not also on a UPS. Nothing is going to stop that unless you get a multi-WAN router and an LTE backup on standby, which is probably not worth the cost.
- You shouldn't have lost DNS or DHCP for your local network just because of a reboot. Something is wrong with your setup, and we'd need more info about it to say more, but these services generally keep their state and shouldn't lose it on reboot IF they are configured properly for your local domains, e.g. a DNS forwarder and static DHCP reservations for local devices.
- You don't need HA for all your services. You need to fix whatever stops your services from recovering properly after an interruption. The specific services you mentioned don't behave poorly if they die and come back in a properly configured environment.
- If you have a UPS at home, every device it powers should receive status information from it and shut down cleanly when thresholds are met. Install NUT somewhere, and run upsmon on all your hosts so they get proper shutdown signals when you lose power and the UPS starts discharging. The thresholds you set for this are up to you.
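For anyone curious what the client side is deciding, here is a minimal Python sketch of that threshold check, assuming the one-`variable: value`-per-line output of NUT's `upsc` command. The function names and the `min_charge` parameter are hypothetical: upsmon itself acts when the UPS reports it is on battery (`OB`) and low (`LB`), so in a real setup you would let upsmon handle this rather than run something like it.

```python
def parse_upsc(output: str) -> dict:
    """Turn `upsc` output ("variable: value" per line) into a dict."""
    values = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            values[key.strip()] = value.strip()
    return values


def should_shutdown(output: str, min_charge: int = 50) -> bool:
    """Hypothetical threshold check: shut down if the UPS flags low battery
    (LB), or if it is on battery (OB) and charge is at or below min_charge."""
    values = parse_upsc(output)
    status = values.get("ups.status", "").split()
    charge = float(values.get("battery.charge", "100"))
    return "LB" in status or ("OB" in status and charge <= min_charge)


print(should_shutdown("battery.charge: 42\nups.status: OB DISCHRG"))  # True
print(should_shutdown("battery.charge: 42\nups.status: OL"))          # False
```

The same 42% charge only triggers a shutdown when the UPS is actually on battery.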
In general, you don't need to overthink HA; focus instead on your services recovering gracefully in these situations. Spending insane amounts of time and money making your media and home automation highly available will just leave you with spent resources and the realization that there's no way to reach 100% uptime without a flaw somewhere.
Good points there.
For 1: the ISP router is a Fritz set to bridge mode, running off a PoE adapter powered by the same UPS as the firewall. Looking back at the logs, it stayed up the whole time.
- Not sure what happened here, but the firewall is the DNS resolver, and when everything else powered back up, nothing got an IP address. Now, whether the service failed or the WAPs took longer to start than the devices were willing to wait, I'm not sure, but as Bones would say: it's dead, Jim.
- Good point. I don't need it ALL to be redundant.
- Also good. The UPS is directly connected to the firewall (which runs NUT), but it doesn't inform anything else… I'll look into that too.
Nice mental reset for me about overthinking it… thanks.
- Okay, so no issues there
- DHCP handles the address assignments in your network, not DNS. DNS resolves host names to addresses. If no devices got IP addresses, that's one problem. If you couldn't resolve public hosts like www.news.com, that's a DNS problem. If you couldn't resolve INTERNAL named hosts you refer to around your network, that's also DNS, but a different problem.
My hunch here is that you MIGHT be using a named host as your DNS resolver instead of an IP address in your network, OR, for some reason, your DNS resolver doesn't have a static address. Never use host names to point to network services, and all network services need a static IP, so go and check all of that.
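One quick way to audit this is to list every DNS server address your clients are handed (DHCP options, router UI, static configs) and flag any that are host names or that fall inside the dynamic DHCP pool. This is a rough Python sketch; the addresses and the pool range are made-up examples.

```python
import ipaddress


def check_dns_servers(servers, pool_start, pool_end):
    """Flag DNS server entries that are host names, or addresses inside the
    dynamic DHCP pool (a hint the host may not have a fixed address)."""
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)
    problems = []
    for entry in servers:
        try:
            addr = ipaddress.ip_address(entry)
        except ValueError:
            problems.append(f"{entry}: host name, not an IP address")
            continue
        # Only compare addresses of the same family (IPv4 vs IPv6).
        if addr.version == start.version and start <= addr <= end:
            problems.append(f"{entry}: inside the dynamic DHCP pool")
    return problems


# Example values only: a resolver list and a .100-.199 dynamic pool.
for problem in check_dns_servers(
    ["192.168.1.1", "192.168.1.150", "fw.home.lan"],
    "192.168.1.100", "192.168.1.199",
):
    print(problem)
```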
Yep, I know the difference between DHCP and DNS… my grammar was just terrible.
Nothing was getting an IP from DHCP when the wifi came back, and DNS wasn't working for the few devices that still had an IP.
Sorry about the confusion there.
I have a multi-WAN SMB router: 945 Mbit/s throughput, $60 new.
TP-Link Omada or Ubiquiti-tier gear is all you really need for a small business. The redundant ISP connections cost way more than the router, but something like a hotspot that can get the job done in a pinch is still a tiny monthly cost.
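Conceptually, multi-WAN failover is a health check per uplink plus a priority order. Here is a generic Python sketch of that idea, not any vendor's actual algorithm. The probe window, the 80% success threshold, and the uplink names are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Uplink:
    name: str
    priority: int  # lower number = preferred
    probes: List[bool] = field(default_factory=list)  # recent results, True = reply received


def is_healthy(link: Uplink, min_success: float = 0.8) -> bool:
    """Healthy if enough of the recent probes (e.g. pings) succeeded."""
    if not link.probes:
        return False
    return sum(link.probes) / len(link.probes) >= min_success


def active_uplink(links: List[Uplink]) -> Optional[str]:
    """Pick the highest-priority healthy uplink, or None if all are down."""
    for link in sorted(links, key=lambda candidate: candidate.priority):
        if is_healthy(link):
            return link.name
    return None


fibre = Uplink("fibre", 1, [True, False, False, False, False])
lte = Uplink("lte", 2, [True] * 5)
print(active_uplink([fibre, lte]))  # lte
```

Judging health over a window of probes, rather than a single ping, keeps one dropped packet from flapping traffic onto the backup link.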
IMO, battery backups are only useful if you have a generator to take over the utility load. That's not common in small businesses unless you're leasing somewhere with generators provided for the whole building.
Redundant servers are not that hard to have. You just need Proxmox. It's not as intuitive as old VMware, but it's more than enough for an SMB. Some kind of storage shelf and three little servers get you a ton of redundancy. If the budget is tiny and a little downtime is fine, you really only need a couple of hosts that are each beefy enough to run everything you need.
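Three hosts is the usual starting point because of quorum. Proxmox clusters use Corosync voting, and only a partition holding a strict majority of votes is quorate and allowed to recover HA-managed guests. Here is a small Python sketch of the arithmetic, assuming one vote per node (the default):

```python
def quorum(nodes: int) -> int:
    """Votes needed for a strict majority, assuming one vote per node."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Nodes that can go offline while the rest still hold quorum."""
    return nodes - quorum(nodes)


for n in (2, 3, 4, 5):
    print(f"{n} nodes: quorum {quorum(n)}, survives {tolerated_failures(n)} failure(s)")
# 2 nodes: quorum 2, survives 0 failure(s)
# 3 nodes: quorum 2, survives 1 failure(s)
# 4 nodes: quorum 3, survives 1 failure(s)
# 5 nodes: quorum 3, survives 2 failure(s)
```

With two hosts, losing either one loses quorum. That is why two-node Proxmox setups typically add an external QDevice as a tie-breaking vote.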
Well…no, and this is what I’m saying.
Every downstream issue you try to solve with redundancy doubles the cost of whatever sits upstream of it: internet links, load balancers for web services, and, in this specific situation, UPSes.
Throwing more servers at a homelab with no power is just wasting money unless you add more UPS power to the mix. If you have 4 servers and want HA for everything on your network, expect to have two of everything, including UPS units.
This is the multiplied cost of redundancy at its core. In your example, you're assuming this person even has a generator or whatever, but even if they did, they'd need an even BIGGER generator to run all this stuff.
That's why my points focus on making what they already have work better, instead of immediately jumping to adding more and more to the stack. It's just not necessary when all they want is a graceful recovery from power loss.
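Standard availability arithmetic makes the "doubled cost upstream" point concrete. Components in series multiply their availabilities, and redundant copies in parallel fail only if every copy fails. This Python sketch assumes independent failures and an illustrative 99% availability for every box. These are not measured figures.

```python
from math import prod


def series(*parts: float) -> float:
    """Every component in the chain must be up, so availabilities multiply."""
    return prod(parts)


def parallel(*copies: float) -> float:
    """Redundant copies: the group is down only if every copy is down."""
    return 1 - prod(1 - a for a in copies)


A = 0.99  # illustrative availability for each UPS and each server

one_chain = series(A, A)                           # 1 UPS -> 1 server
two_servers = series(A, parallel(A, A))            # 1 UPS -> 2 servers
two_chains = parallel(series(A, A), series(A, A))  # 2 UPS -> 2 servers

print(round(one_chain, 6), round(two_servers, 6), round(two_chains, 6))
# 0.9801 0.989901 0.999604
```

Doubling the servers behind one UPS barely helps, because the single UPS still caps the chain below 99%. Only duplicating the whole chain, UPS included, gets past that.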
Low-tech options: plug your edge devices into a smart plug that power-cycles them if it can't ping, e.g., Google, or use a timer that reboots the firewall at 02:00 daily. I haven't implemented either of these, despite having a network other people rely on about 400 km from my house. I should remediate that…
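The smart-plug watchdog boils down to a small state machine. It counts consecutive failed pings, requests a power cycle after a limit, then waits long enough for the firewall and modem to boot before counting again. This is a hypothetical Python sketch. The class name, the thresholds, and the `ping` helper (Linux iputils flags) are assumptions, and a real smart plug would run its own firmware logic.

```python
import subprocess


def ping(host: str) -> bool:
    """One ICMP echo with a 2 s timeout (Linux iputils `ping` flags)."""
    result = subprocess.run(["ping", "-c", "1", "-W", "2", host], capture_output=True)
    return result.returncode == 0


class PowerCycleWatchdog:
    """Hypothetical watchdog: ask for a power cycle after `max_failures`
    consecutive failed checks, then ignore `cooldown` checks so the edge
    devices can boot before failures count again (avoids a reboot loop)."""

    def __init__(self, max_failures: int = 5, cooldown: int = 10):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.failures = 0
        self.cooldown_left = 0

    def observe(self, ping_ok: bool) -> bool:
        """Record one check; return True if the plug should power-cycle now."""
        if self.cooldown_left > 0:
            self.cooldown_left -= 1
            return False
        if ping_ok:
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.failures = 0
            self.cooldown_left = self.cooldown
            return True
        return False


# Simulated run: three failures trigger one power cycle, then the cooldown.
dog = PowerCycleWatchdog(max_failures=3, cooldown=2)
print([dog.observe(ok) for ok in [False, False, False, False, False, False]])
# [False, False, True, False, False, False]
```

In a real loop you would call `dog.observe(ping(target))` at a fixed interval and toggle the plug whenever it returns True.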
I have decided dual firewalls are silly without dual internet and dual power, as both those things go down more often than my FW.
I have two instances of Pi-hole on two hosts, because I block outbound DNS to the best of my ability.
Two firewalls in HA.
You can also get a little Cradlepoint or something with a SIM card as a backup Internet connection if you need uplink redundancy.
That (2 FWs) was what I was considering initially.
But after reading some other posts, I'm starting to rethink my design: I only have one WAN connection, so maybe I only need one FW. A SIM backup would rarely be used, so I'm not sure the overall cost would be worth it.
So separating FW from DHCP & DNS might be a better solution.
I had a similar failure while I was out of the country for a month. My Raspberry Pi didn’t come back after a power blink. Home Assistant, Wireguard tunnels, security cameras, Jellyfin, Syncthing backup and DNS all failed until I returned. After looking at possible solutions I ruled out buying redundant hardware because of the cost, and more importantly the time and complexity of implementing and maintaining everything.
Instead I bought a small, relatively inexpensive laptop and a router with plenty of processing power and memory. I moved my Wireguard endpoints, DHCP and DNS server to the router and everything else to the laptop and disconnected my UPS completely.
If the router is up, WG connectivity, DNS, DHCP and wifi are up. The router does reset on power failure, but my ISP has no local power backup so Internet is out until power is restored anyway.
This laptop loafs along at 10 watts and costs about $2 per month to run despite our high electricity rates. My old UPS drew 75 watts most of the time, even with nothing plugged in, and cost more than $16/month to run. The laptop's battery is firmware-limited to a 70% charge, so it should last years without degrading, which also makes other battery issues unlikely. It provides 7 hours of operation if power fails, compared with an optimistic 20 minutes for the UPS. Power blinks (and there have been plenty) have no effect on the laptop at all.
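The running-cost figures are easy to check. Monthly energy is watts times hours divided by 1000, and cost is that times the price per kWh. The post doesn't give the rate, but back-solving from 75 W costing about $16/month implies roughly $0.30/kWh. This Python sketch uses that rate as an assumption.

```python
HOURS_PER_MONTH = 730  # average month: 365 * 24 / 12


def monthly_cost(watts: float, price_per_kwh: float) -> float:
    """Running cost of a constant load over an average month."""
    kwh = watts * HOURS_PER_MONTH / 1000
    return kwh * price_per_kwh


RATE = 0.30  # assumed $/kWh, back-solved from the post's figures
print(f"UPS idle: ${monthly_cost(75, RATE):.1f}/month")
print(f"Laptop:   ${monthly_cost(10, RATE):.2f}/month")
```

At that rate, the UPS's 75 W idle draw comes to about $16.4/month and the laptop's 10 W to about $2.19, in line with the numbers above.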
I’ve been happy with this configuration. It has worked flawlessly for almost 2 years.
Ah… I was reading this thinking “ah, I’ll have to reply about the battery…”… glad you’re limiting the charging…
But an interesting point… I have a spare OLD Dell laptop kicking around which has various issues, but might be able to do what you’re doing. Thanks
There are so many ways to skin this cat; you may want to start by identifying the single point of failure that concerns you most.
Is it the router? Best you can really do is have good hardware and make sure it and your modem are on a UPS.
If it’s an ISP-provided modem, some enable remote management via a phone app, which can be done from anywhere by signing in to your ISP account.
Well, in my case the most crucial single point is the firewall.
The rest isn’t too bad
For me, I have three Proxmox nodes configured to restart VMs and LXC containers if a host goes offline. There's a Palo Alto PA-440 as my FW/router and a Brocade switch (work gave me those to practice for a network exam).
The nodes, Palo, Brocade, and AT&T modem are all on two 1500 VA UPS units, along with my wifi AP. Runtime on power loss is around an hour.
I'm this close to getting a comprehensive shutdown script working on a Raspberry Pi, triggered on power loss (most UPS systems have some way to trigger scripts on a host connected to the UPS's console port).
If I can get that script working, the battery backup will run the Pi for several days.
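A shutdown script like that usually just walks an ordered list of hosts over SSH. Here is a minimal Python sketch, assuming key-based SSH access as root and hypothetical host names. On a NUT setup, upssched can run a script like this after the UPS has been on battery for a set time. With `dry_run`, it returns the commands without executing them.

```python
import subprocess

# Hypothetical inventory, in the order the hosts should go down.
HOSTS = ["pve1", "pve2", "pve3"]


def shutdown_command(host: str, user: str = "root") -> list:
    """SSH command asking `host` to power off cleanly.
    BatchMode=yes makes ssh fail instead of prompting for a password."""
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", "shutdown -h now"]


def shutdown_all(hosts: list, dry_run: bool = True) -> list:
    """Run the shutdown commands in order; with dry_run, only return them."""
    commands = [shutdown_command(h) for h in hosts]
    if not dry_run:
        for cmd in commands:
            try:
                # Keep going on errors: the remaining hosts still need to go down.
                subprocess.run(cmd, timeout=60, check=False)
            except subprocess.TimeoutExpired:
                pass  # host unreachable or slow to respond; move on
    return commands


for cmd in shutdown_all(HOSTS):  # dry run: print, don't execute
    print(" ".join(cmd))
```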
Back on the redundancy side, I host two PowerDNS servers in the Proxmox cluster, along with a three-node Vault cluster running in LXC containers.
I’ve not looked at Proxmox clusters - can they restart VMs on a different host if they’re all using the same shared storage?
Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I’ve seen in this thread:
Fewer Letters / More Letters:
- DHCP: Dynamic Host Configuration Protocol, automates assignment of IPs when connecting to a network
- DNS: Domain Name Service/System
- HA: Home Assistant automation software ~ High Availability
- IP: Internet Protocol
- LXC: Linux Containers
- PoE: Power over Ethernet
- SMB: Server Message Block protocol for file and printer sharing; Windows-native ~ small/medium business (as used in this thread)
[Thread #49 for this comm, first seen 31st Jan 2026, 18:40]
Not a redundancy option, but I also had my routing setup go into a bad state while traveling, which was a hassle.
I solved this by setting up nightly reboots while away. Both my routing PC and modem are rebooted by smart switches (Z-Wave/Zigbee) controlled by Home Assistant, so the switches keep working even without a functioning network. The routing PC is set to shut down one minute before the smart switch turns off, and to boot automatically when power is restored (when the smart switch turns back on). This avoids any issues with the PC hanging during a reboot.
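The ordering is the important bit. The PC powers itself off before the switch cuts power, and its firmware is set to power on when AC returns (the setting is often labelled something like "restore on AC power loss"). Here is a small Python sketch of the nightly timeline. The 03:00 slot and the two-minute off period are hypothetical, and only the one-minute lead comes from the post.

```python
from datetime import datetime, timedelta


def reboot_schedule(switch_off: datetime, off_for: timedelta = timedelta(minutes=2)) -> dict:
    """Nightly power-cycle timeline: the PC shuts itself down one minute
    before the smart switch cuts power, then boots when power returns."""
    return {
        "pc_shutdown": switch_off - timedelta(minutes=1),
        "switch_off": switch_off,
        "switch_on": switch_off + off_for,
    }


plan = reboot_schedule(datetime(2026, 2, 1, 3, 0))
for step, when in plan.items():
    print(f"{when:%H:%M} {step}")
# 02:59 pc_shutdown
# 03:00 switch_off
# 03:02 switch_on
```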



