---
title: "Infrastructure overview"
date: 2020-07-20T18:30:00+02:00
---
The idea behind this infrastructure is to run on commodity hardware. There is no need to buy racks of expensive servers
like the ones we see in data centers: simple homemade computers will do the job. At work, I have access to cheap hard
drives that were used in servers and are either out of warranty or unsuitable for enterprise workloads. They generally
cost about half their market price. I use a mix of brand-new and reused drives to reduce the risk of two disks failing at
the same time in the same host.

There are three components in the infrastructure:

* **storage** servers that hold the data
* a **monitoring** server that collects metrics and sends alerts
* a **vps**[^1] server that runs the VPN[^2] and watches over the monitoring server's availability

{{< rawhtml >}}
<p style="text-align: center;"><img src="/infrastructure-overview.svg" alt="Infrastructure overview" style="width: 65%;"></p>
{{< /rawhtml >}}
# Storage
Every storage server is designed to be hosted in a different location. Each one can be unplugged from one location,
plugged in somewhere else, and work just as before. The only requirement is Internet access, so that it can reach the VPS
and join the VPN.

The technology that holds the data is **[ZFS](https://en.wikipedia.org/wiki/ZFS)**. I have the chance to use it at work
for production workloads and it makes life much easier. I am used to managing GNU/Linux servers
([Debian](https://www.debian.org/)) and I know that [FreeBSD](https://www.freebsd.org/) has built-in ZFS support, so I
wanted to give it a try. I didn't choose [FreeNAS](https://www.freenas.org/) because I wanted to do everything myself, to
learn and to use only the features I needed.

The right balance I found to maximize available disk space while keeping data safe is to use **three disks** in a
[RAID-Z](https://en.wikipedia.org/wiki/ZFS#RAID_(%22RaidZ%22)). Storage servers can lose one disk at a time without
breaking the service, while most of the cumulative space remains usable: with single parity, one disk's worth of space
goes to parity, leaving about two thirds. Datasets are configured to use **lz4** compression because it saves disk space
without putting too much pressure on the CPU.

| Host     | Disk capacity |
| -------- | ------------: |
| storage1 |         5.44T |
| storage2 |         2.72T |
| storage3 |         10.9T |
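With single-parity RAID-Z (RAID-Z1), the usable space can be estimated with simple arithmetic. Every disk counts only as much as the smallest one, and one disk's worth is reserved for parity. The function below is a hypothetical helper for illustration. It ignores metadata, padding and the TB/TiB difference, so it is a rough estimate, not what ZFS itself reports.

```python
def raidz_usable_capacity(disk_sizes_tb, parity=1):
    """Rough usable capacity of a single RAID-Z vdev (hypothetical helper).

    Each disk contributes only as much as the smallest disk, and `parity`
    disks' worth of space is reserved for parity (RAID-Z1 = 1, RAID-Z2 = 2,
    RAID-Z3 = 3). The vdev survives the loss of up to `parity` disks.
    Metadata, padding and the TB/TiB difference are ignored.
    """
    n = len(disk_sizes_tb)
    if n <= parity:
        raise ValueError("a RAID-Z vdev needs more disks than parity disks")
    return (n - parity) * min(disk_sizes_tb)


# Hypothetical example: three 2 TB disks in RAID-Z1.
print(raidz_usable_capacity([2, 2, 2]))  # prints 4
# Mixing sizes wastes the extra space on the larger disk.
print(raidz_usable_capacity([2, 2, 4]))  # prints 4
```

With three identical disks, this gives the two-thirds ratio mentioned above. Adding a second parity disk would tolerate two failures at the cost of more space.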
# Monitoring
Like any system administrator, I want to be alerted when something goes wrong in the infrastructure. I also want to
browse the history in graphs to spot trends. A [Raspberry Pi](https://www.raspberrypi.org/) was sitting unused in a
drawer. It is now connected to the Wi-Fi network somewhere in the house, perfectly hidden, doing this job in the
background.
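The component list says the VPS watches the monitoring server's availability, so the Raspberry Pi itself does not go unwatched. One common way to do this is a heartbeat, also called a dead man's switch: the monitored machine reports in periodically, and the watcher raises an alert when the reports stop. The sketch below shows only the staleness check. The class name, the 5-minute timeout and the injectable clock are hypothetical choices, not the actual setup described here. How heartbeats reach the watcher is left out.

```python
import time


class HeartbeatWatcher:
    """Minimal dead man's switch: report failure when heartbeats stop.

    Hypothetical sketch; the transport carrying heartbeats (HTTP ping,
    packet over the VPN, ...) is out of scope.
    """

    def __init__(self, timeout_s=300.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen = clock()  # treat start-up as a fresh heartbeat

    def beat(self):
        """Record a heartbeat from the monitored server."""
        self.last_seen = self.clock()

    def is_alive(self):
        """True while the last heartbeat is younger than the timeout."""
        return self.clock() - self.last_seen < self.timeout_s


# Usage with a fake clock so the example is deterministic.
now = [0.0]
watcher = HeartbeatWatcher(timeout_s=300.0, clock=lambda: now[0])
now[0] = 120.0
print(watcher.is_alive())  # prints True: 120 s since start-up
now[0] = 400.0
print(watcher.is_alive())  # prints False: no heartbeat for 400 s
watcher.beat()
print(watcher.is_alive())  # prints True: heartbeat just received
```

Injecting the clock keeps the logic testable. A real watcher would call `is_alive()` on a timer and send an alert when it turns false.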
# VPS
I am not a network engineer. It is not my job, and I don't want it to be. There are numerous experts in the field who do
this very well and I am thankful to them. But a computer without network connectivity is not very useful.

When self-hosting, you have to deal with your ISP's modem settings, and as far as I know there is no standard. Mine does
not provide a fixed public IPv4 address. I tried writing scripts that automatically updated a subdomain with the current
public IP address, then tried to reach it from the outside. The name resolved, but the connection always failed.

To solve this problem, I [rent a VPS](https://www.ovhcloud.com/fr/vps/) hosted close to the storage locations and
configured an [OpenVPN](https://openvpn.net/) server on it. It is a single point of failure and a potential
*bottleneck*, because all the traffic between servers goes through it. In practice, home Internet bandwidth is the real
bottleneck, so the VPS should not be a problem. It also acts as the entry point from the outside world for the metrics
and monitoring websites.
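The bottleneck argument can be made concrete. In a hub-and-spoke VPN, traffic from one storage server to another goes up the sender's home uplink. It then goes into and out of the VPS and down the receiver's home downlink. Throughput is therefore bounded by the slowest link on that path. The helper and the figures below are hypothetical, not measurements from this setup. They only illustrate why a modest home upload usually limits a transfer before the VPS does. VPN encryption overhead and latency are ignored.

```python
def path_throughput_mbps(*link_capacities_mbps):
    """Upper bound on throughput along a path: the slowest link wins."""
    if not link_capacities_mbps:
        raise ValueError("a path needs at least one link")
    return min(link_capacities_mbps)


# Hypothetical figures in Mbit/s, not measurements.
home_a_upload = 50       # sender's home uplink
vps_download = 500       # traffic entering the VPS
vps_upload = 500         # same traffic leaving the VPS towards the receiver
home_b_download = 300    # receiver's home downlink

print(path_throughput_mbps(home_a_upload, vps_download, vps_upload, home_b_download))  # prints 50
```

As long as the VPS links are faster than the home uplinks, relaying through it does not lower the ceiling. It does, however, remain a single point of failure.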
[^1]: [Virtual Private Server](https://en.wikipedia.org/wiki/Virtual_private_server)
[^2]: [Virtual Private Network](https://en.wikipedia.org/wiki/Virtual_private_network)