My homelab is not for reliability

Domenico Giordano · 2026-06-04 · 13 min read

For years I told myself I wanted a NAS, and never bought one. Every time I looked, the price for what I actually needed felt out of proportion: a box I would stand on a shelf and barely touch, good for the occasional backup and the odd evening of streaming. At one point I decided I would build one myself instead, then admitted I did not have the time the project deserved, and let it go.

Then one evening, poking around online for something unrelated, I had a smaller idea: buy a cheap mini PC and tinker. I came away with a used Dell OptiPlex 7060 Micro, bought for almost nothing, as a toy.

That toy now has worse uptime than the €17-a-month server that runs my actual product. I know this because I have monitoring on both, and the homelab loses. It sits behind a residential internet connection and a domestic power socket, both of which fail more often than a datacenter's, and it has exactly one operator: me, asleep for a third of every day.

People who run homelabs tend to describe them as the opposite of that. Self-hosting gets sold as a reliability play, a sovereignty play, a way to stop renting your infrastructure from companies that might raise the price or read your files. Some of that is true at the margins. Most of it, at the scale of a single box in an apartment, is not.

This essay is about what a homelab is actually for, once you stop pretending it is a small datacenter. The honest answer, in my case, is that it is a gym.

What the genre sells

Open any homelab forum and the same three promises recur: uptime, sovereignty, and cost. Own your hardware and nothing goes down when someone else's cloud has a bad day. Keep your data in your house rather than on a server you do not control. Stop paying a monthly bill for something you could buy once.

Each has a grain of truth wrapped in a larger exaggeration, and uptime is the thinnest of them. A small VPS in a real datacenter has redundant power, a redundant network, and a team watching it; my apartment has none of those. Over three weeks last winter the power on my street failed three times. A datacenter would have ridden each outage out on batteries. My homelab went dark every time and came back up needing a hand: containers that had not restarted on their own, a storage pool waiting to be reimported, a cluster with pods stuck in an unknown state.

Sovereignty is real but narrower than it sounds. The data is in my house, and so is the entire responsibility for not losing it, which turns out to be the harder half. Cost I will come back to, because the number is real but it is not the reason.

So if the box is less reliable than a cheap VPS, no more sovereign than I am diligent about backups, and only cheaper if I value my own time at zero, what is it for?

What it is actually for

It is the place where I practice the things that are expensive to get wrong somewhere that matters.

The hardware is unremarkable. A small Proxmox host boots from NVMe and carries seven LXC containers and a single-node k3s cluster. Storage is a ZFS mirror, two 10 TB disks in a USB enclosure, roughly 9 TiB usable, with an NVMe device in front as a write cache. Prometheus and Grafana scrape every container and the host itself. None of this is impressive on its own. What it gives me is a system real enough to have real failure modes, and cheap enough that I can break it on a Tuesday night without anyone filing a ticket.

That combination is the whole value. At work, the cost of learning how a ZFS resilver behaves, or what a node running out of memory does to a Kubernetes scheduler, or how long a database restore actually takes under load, is paid in incidents and apologies. On the homelab it is paid in an evening. So I wrote a disaster-recovery runbook for a system nobody pays me to keep up: six failure scenarios, each with the steps to recover, an estimate of how long recovery takes, and an honest note on how much data I would lose. Writing recovery procedures for a media server and a folder of personal files is, on paper, absurd. Doing it taught me the shape of a good runbook in a way that reading other people's never had.
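As a rough illustration of that shape, one runbook entry can be modelled as a small record: the scenario, the steps, the time estimate, and the expected data loss. This is a minimal sketch only. The field names, the two scenarios, and every number in it are hypothetical placeholders, not entries from the actual runbook.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """One failure scenario in a recovery runbook (illustrative field names)."""
    name: str
    steps: list[str]
    recovery_minutes: int    # estimated time to recover
    data_loss_hours: float   # worst-case window of lost data


def worst_case(scenarios: list[Scenario]) -> Scenario:
    """Return the scenario with the longest estimated recovery time."""
    return max(scenarios, key=lambda s: s.recovery_minutes)


# Hypothetical entries; the numbers are placeholders, not measurements.
runbook = [
    Scenario(
        name="single disk failure in the mirror",
        steps=["identify the failed disk", "replace it", "wait for the resilver"],
        recovery_minutes=600,
        data_loss_hours=0.0,
    ),
    Scenario(
        name="boot drive failure",
        steps=["reinstall the host", "restore container backups", "reimport the pool"],
        recovery_minutes=180,
        data_loss_hours=24.0,
    ),
]

print(worst_case(runbook).name)
```

Putting the time and data-loss estimates in explicit fields is the useful part: an entry without them is visibly incomplete.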

Built in three weeks of evenings

I should be honest about how little time this took, because it changes the argument rather than decorating it. The whole stack, the containers and the cluster and the storage pool and the tunnels and the monitoring, went up in about three weeks of evenings in late 2025. I was not fighting it for months. I had Claude Code open the entire time, and it wrote most of the mechanical parts: the systemd unit, the udev rule, the exact hdparm flags, the YAML the cluster wanted. I decided what to build and why; the assistant made sure I never lost an evening to the syntax of a unit file.

That speed is the point, not a footnote to it. If the artifact were the valuable thing, the fact that an assistant can generate most of it in a few weeks would be deflating. It is not, because the artifact was never the asset. What the speed did was push the entire cost of the project onto the part that does not compress: deciding that the disks should sleep at all, that the hot data belongs on a different tier from the cold, that a recovery procedure is worth writing before there is anything to recover. The tool removed the typing. It did not remove the judgment, and the judgment was the thing I was there to build.

Making the disks sleep

One of those evenings went entirely into making the disks turn themselves off.

The two 10TB drives spin at 7200 RPM, and a cupboard in a flat is a different acoustic environment from a rack in a basement. Left alone, they are audible across the room, and they burn power staying ready for data I touch maybe twice a day. The capability to fix this is built into every modern drive: advanced power management, plus an idle spin-down timer. The work is in setting them so the disk actually rests without thrashing its own heads parking and unparking.

I settled on APM level 127. The scale runs to 254, and the detail that matters is the boundary at 128: from 128 up the drive never spins down, and below it the drive is allowed to stop. 127 is the highest value that still permits spin-down, which keeps the resting behavior while leaving out the aggressive head-parking that wears a drive out. Alongside it, a spin-down timer of ten minutes of idle. Both are applied at boot by a small systemd service and reasserted by a udev rule, so they survive a reboot and survive me unplugging the enclosure and plugging it back in, which is the case I would otherwise forget about until the noise came back.
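The two numbers can be sketched in a few lines. This assumes the encoding documented in the hdparm man page: `-B` takes the APM level, where 1 to 127 permit spin-down and 128 to 254 do not, and `-S` takes the idle timeout, in units of 5 seconds for values up to 240. The function names and the `/dev/sdX` path are placeholders, and this is not the actual systemd unit or udev rule, only the arithmetic behind the flags.

```python
def standby_code(minutes: float) -> int:
    """Encode an idle timeout as an hdparm -S value (5-second units, max 20 min)."""
    seconds = round(minutes * 60)
    if seconds % 5 or not 5 <= seconds <= 1200:
        raise ValueError("this sketch only handles multiples of 5 s, up to 20 min")
    return seconds // 5


def permits_spindown(apm: int) -> bool:
    """APM levels 1-127 allow the drive to spin down; 128-254 do not."""
    if not 1 <= apm <= 254:
        raise ValueError("APM level must be between 1 and 254")
    return apm <= 127


def hdparm_args(device: str, apm: int, idle_minutes: float) -> list[str]:
    """Build the argument list; running it is left to the caller (needs root)."""
    return ["hdparm", "-B", str(apm), "-S", str(standby_code(idle_minutes)), device]


# /dev/sdX is a placeholder device path.
print(" ".join(hdparm_args("/dev/sdX", apm=127, idle_minutes=10)))
```

Ten minutes is 600 seconds, so the timer value comes out as 120, which is easy to misread as "120 minutes" in a unit file.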

The setting only works because of a decision one layer up. A disk sleeps only if nothing touches it, and on a server plenty of things want to. So everything that gets touched constantly stays off the spinning pool entirely. The operating system, the container root filesystems, the Prometheus time-series database, the k3s volumes all live on the NVMe. The spinning mirror holds only cold data: the media library, the backups, the files I open occasionally. Put the metrics database on the slow pool and it would write a handful of points every few seconds, forever, and the disks would never get a chance to stop. Keeping the two tiers honestly separated is most of the work.

The thing most likely to undo all of it is your own monitoring. The first time I set the spin-down, the disks still refused to sleep, because the SMART daemon watching their health was waking them every few minutes to ask how they were. The fix was to tell it to skip any disk already in standby and to poll far less often. I had built a careful power saving and then quietly defeated it with a health check, which is the kind of mistake you make exactly once.
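Why a small, regular access is enough to keep a disk awake forever can be shown with a deliberately simple model. It assumes strictly periodic access, an idle timer that resets on every access, and no spin-up time. The 15-second and 5-minute intervals are illustrative stand-ins for "every few seconds" and "every few minutes", not measured values.

```python
def asleep_fraction(access_interval_s: float, idle_timeout_s: float) -> float:
    """Fraction of time a disk spends in standby under strictly periodic access.

    Simplified model: every access resets the idle timer; spin-up time is ignored.
    """
    if access_interval_s <= 0 or idle_timeout_s < 0:
        raise ValueError("interval must be positive and timeout non-negative")
    if access_interval_s <= idle_timeout_s:
        return 0.0  # the next access always arrives before the timer expires
    return (access_interval_s - idle_timeout_s) / access_interval_s


TIMEOUT = 10 * 60  # ten-minute spin-down timer, in seconds

print(asleep_fraction(15, TIMEOUT))             # metrics write every 15 s
print(asleep_fraction(5 * 60, TIMEOUT))         # health poll every 5 min
print(asleep_fraction(12 * 60 * 60, TIMEOUT))   # touched twice a day
```

In this model any access pattern with an interval shorter than the timer gives exactly zero rest, however little data it moves. That is why a metrics database and an eager health check do the same damage.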

I will be honest about the edge of this story: I have never put a power meter on the box. The saving is obvious to the ear and plausible in principle, but if I quoted you a wattage I would be inventing it, and the homelab genre is full of energy numbers nobody measured. What I can say exactly is that the disks go silent ten minutes after I stop using them, and that the instinct underneath, keep the hot data on fast storage and let the cold tier fall asleep, is the same one you reach for later tuning a cloud bill instead of a cupboard. There the currency is euros. Here it was the noise, and the noise was enough to make me learn it.

The disciplines that transferred

The reason I trust this framing is that I can watch the transfer happen.

The product I actually run sits on a Hetzner server, and its operational setup did not come from a tutorial. It came from the homelab, a year earlier and at no stakes. Staging a change before it touches the thing that has to stay up is a habit I built by letting the homelab cluster be the place where a database migration or an API change runs first. Having monitoring in place before an incident, rather than bolting it on during one, came from the same place. So did the backup cadence, the habit of actually restoring and not merely taking backups, and the reflex to write the recovery steps down while the system is calm rather than on fire.

A smaller version of the same lesson came from the monitoring side. Earlier this year, auditing the public DNS for one of my domains, I found what looked like a private address from my home network answering on the public internet. For a few minutes I was sure I had leaked the layout of my own house. Then I checked it the right way, from resolvers outside my network instead of from inside it, and the public answer turned out to have been correct all along; the private address was a local override my own DNS was adding, visible only from the LAN. The scare was the useful part. It made me trace the thing end to end instead of trusting the first alarming read, which is a reflex worth more than the finding that triggered it.
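The check that settled it can be sketched as a comparison of answers grouped by where they were asked from. This sketch does no DNS queries itself; it assumes the answers were already collected (for example with `dig @resolver name`) and only flags which resolvers returned RFC 1918 private addresses. The resolver labels and addresses are made up for illustration.

```python
import ipaddress

# The three RFC 1918 private IPv4 ranges.
RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(address: str) -> bool:
    """True if the address falls in one of the RFC 1918 private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in RFC1918)


def private_answers(answers_by_resolver: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each resolver to the private addresses it returned, if any."""
    flagged = {}
    for resolver, answers in answers_by_resolver.items():
        private = [a for a in answers if is_rfc1918(a)]
        if private:
            flagged[resolver] = private
    return flagged


# Hypothetical answers for one hostname, as seen from two vantage points.
observed = {
    "lan-resolver": ["192.168.1.20"],        # local override, LAN only
    "external-resolver": ["203.0.113.10"],   # documentation-range stand-in
}
print(private_answers(observed))
```

A private address is only a leak if it shows up under a resolver outside the network. Seeing it only from the LAN resolver is the benign case described above.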

None of this was planned as training. It became training because the homelab is real enough to punish the mistakes that production punishes, and forgiving enough that the punishment is an evening rather than a customer.

The discipline I have not finished

There is one place where the gym analogy turns on me, and I would rather say it than have it noticed.

The first rule of self-hosting, the one every guide puts ahead of the others, is that a backup living only in the same building as the original is not a backup. A fire, a theft, a flood, a dead enclosure, and the snapshots go with the disks they were meant to protect. I know this. I have known it the whole time. And the offsite copy is still a line on my roadmap rather than a process that runs.

What I have is good local discipline: ZFS snapshots every night with a week of retention, container backups on their own schedule, host configuration captured weekly. What I do not have is any of it leaving the apartment on its own. The honest description of my setup is that it would survive almost everything except the one category of disaster the whole practice exists to survive. I am writing this down partly to make the gap harder to keep ignoring.
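The local half of that, nightly snapshots with a week of retention, comes down to a small pruning rule. This is a minimal sketch under assumptions: it works on plain dates instead of real snapshot names, and the function name and the seven-day default are illustrative, not the actual snapshot tooling.

```python
from datetime import date, timedelta


def snapshots_to_prune(snapshot_dates: list[date], today: date,
                       keep_days: int = 7) -> list[date]:
    """Return the snapshot dates that fall outside the retention window, oldest first.

    With keep_days=7, the snapshots from today and the six previous nights are kept.
    """
    cutoff = today - timedelta(days=keep_days)
    return sorted(d for d in snapshot_dates if d <= cutoff)


# Ten nightly snapshots, 1 June through 10 June 2026 (made-up dates).
today = date(2026, 6, 10)
nightly = [date(2026, 6, day) for day in range(1, 11)]

expired = snapshots_to_prune(nightly, today)
print([d.isoformat() for d in expired])
```

Nothing in a rule like this moves data anywhere. It manages how many copies sit on the same pool, which is exactly the limit described above.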

Cost was never the reason

I said I would come back to cost, because the arithmetic is real and still not the point.

Run the comparison honestly and the homelab wins on paper. The hardware was already there, so the incremental cost of putting these workloads on it is roughly zero. The cloud equivalent, a few small servers and a managed database, would run somewhere between €30 and €50 a month. Against a revenue of exactly nothing, which is what these side projects make, that is hard to justify paying a provider.

But the arithmetic is the excuse, not the motive. If cost were really driving this I would have taken the simplest cheap option, not built a ZFS mirror with a write cache, a Kubernetes cluster, a monitoring stack, and a recovery document for a media server. The parts that were expensive in time exist because building them taught me something, not because they saved me €40 a month. There is an irony I owe the opening here: the NAS I would not pay for, I built anyway, one evening at a time. A second-hand mini PC and a disk enclosure that ran about €150, with ZFS on top, are the NAS I had balked at, and then a good deal more. I avoided the purchase and walked straight into the project.

I keep a short list of conditions that would flip the decision: a side project earning enough to deserve real infrastructure, a customer who would genuinely be hurt by a power cut, a stretch of home internet bad enough to make the single point of failure intolerable. None of them has fired yet. When one does, I will move the workload to a real datacenter without nostalgia, because by then the homelab will have already done its work.

What it is for, said plainly

That is the case for a machine that is, by every metric a datacenter would care about, worse than renting.

You do not lift weights because the barbell needs moving. The weight is pointless on its own. The point is what carrying it does to you, so that when there is something real to lift you are ready for it. The homelab is that arrangement. Its uptime does not matter. Its data, beyond a few personal files, does not matter. What matters is that it is a real system with real consequences, scaled down to where the consequences cost an evening, and that everything I have learned keeping it alive is waiting for the next time the system is one that does matter.

My homelab is not for reliability. It is for the version of me that runs the things that have to be.