---
title: "Do your sensors yourself"
date: 2020-08-17T18:00:00+02:00
---
A big question I've asked myself during this project is: what is the best place to put my storage servers? There are
multiple environmental variables to watch out for: **temperature**, **humidity** and **noise**. If components run too
hot, they can be damaged in the long run. Of course, water and electricity are not friends. You can add a fan to move
air out of the case and reduce both temperature and humidity, but the computer will become noisy. We need to measure
those variables. Unfortunately, every system has a different set of built-in sensors and not all of them are exposed to
the operating system. So I decided to build my own.
# Sensors hardware
I'm a newbie in electronics. I have never soldered anything. In the DIY[^1] world, there is an open-source
micro-controller, the [Arduino Uno](https://store.arduino.cc/arduino-uno-rev3), that costs only a few bucks (20€).
There are cheaper alternatives available like the Elegoo Uno (11€). You will also need sensor modules: the
[DHT22](https://www.waveshare.com/wiki/DHT22_Temperature-Humidity_Sensor) for temperature and humidity and the
[KY-037](https://electropeak.com/learn/how-to-use-ky-037-sound-detection-sensor-with-arduino/) for capturing sound. To
connect everything together, you'll need a [breadboard](https://en.wikipedia.org/wiki/Breadboard),
[resistors](https://en.wikipedia.org/wiki/Resistor) and cables.
Components:
- [Elegoo Uno R3](https://www.amazon.fr/dp/B01N91PVIS/ref=cm_sw_r_tw_dp_x_8NtkFbHZ6X6K9)
- [DHT22 sensor](https://www.amazon.fr/dp/B07TTJNY1C/ref=cm_sw_r_tw_dp_x_QOtkFbBM2ZAAD)
- [KY-037 sensor](https://www.amazon.fr/dp/B07ZHGX5T6/ref=cm_sw_r_tw_dp_x_kPtkFbXRRK7ZP)
- [10k Ω resistor](https://www.amazon.fr/dp/B06XKQLPFV/ref=cm_sw_r_tw_dp_x_EPtkFbB24855X)
- [breadboard](https://www.amazon.fr/dp/B06XKZWCJB/ref=cm_sw_r_tw_dp_x_.PtkFb01X4WNW)
- [cables](https://www.amazon.fr/dp/B01JD5WCG2/ref=cm_sw_r_tw_dp_x_QQtkFbRA6PSG0)
In electronics, you need to build closed circuits going from the power supply ("+") to the ground ("-"). The Arduino
board can be plugged into a USB port, which powers it; the "5V" pin then provides power to your circuit. The end of the
circuit should return to the "GND" pin, which means "ground". The breadboard helps you extend the circuit and plug in
more than one element (resistors and sensors at the same time). The top and bottom parts are connected horizontally.
The central part connects elements vertically. Horizontal and vertical parts are isolated from each other. A resistor's
role is to limit the current. It acts like a tap distributing water: if too much water flows at once, the glass fills
too quickly and water spills everywhere. We'll put a resistor in front of the DHT22 to get valid values and to prevent
damage.
The circuit looks like this:
{{< rawhtml >}}
<p style="text-align: center;"><img src="/sensors.svg" alt="Sensors circuit" style="width: 65%;"></p>
{{< /rawhtml >}}
The DHT22 sensor has three pins: **power**, **digital** and **ground** (not four as in the schema). The KY-037 sensor
has four pins: **analog**, **ground**, **power** and **digital** (not three as in the schema). We'll use the analog pin
to gather data from the sound sensor.
# Sensors software
The circuit is plugged into a computer via USB and is ready to be used. To read values, we need to compile low-level
code and execute it on the board. For this purpose, you can install the [Arduino
IDE](https://www.arduino.cc/en/Main/Software) which is available on multiple platforms. My personal computer runs on
Ubuntu (no joke please) and I tried to use the packages from the repositories. However, they are too old to work. You
should [install the IDE yourself](https://www.arduino.cc/en/Guide/Linux). I've added my own user to the "dialout" group
to be able to use the serial interface to send compiled code to the board. The code itself is called a "sketch". You
can find mine [here](https://github.com/jouir/arduino-sensors-toolkit/blob/master/sensors2serial.ino). Click on
"Upload", job done.
# Multiplexing
Values are sent to the serial port, but only one program can read this interface at a time. Bad luck: we would like to
send those metrics to both the alerting and trending systems. Each has its own schedule, so they would try to access
the interface at the same time. Moreover, a program reading the serial port has to wait at least four seconds between
values. In the IoT[^2] world, we often see [MQTT](https://en.wikipedia.org/wiki/MQTT), a queuing protocol, used for
this. To solve the problem, I've developed
[serial2mqtt](https://github.com/jouir/arduino-sensors-toolkit/#serial2mqtt), a simple daemon that reads values from
the serial interface and publishes them to an MQTT broker. I've installed [Mosquitto](https://mosquitto.org/) on the
storage servers so the multiplexing happens locally.
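For the curious, here is a minimal sketch of the idea behind such a daemon, assuming the Arduino sketch prints one
"name value" pair per line and that the pyserial and paho-mqtt (1.x) libraries are installed. The device path and topic
prefix are placeholders, not my actual configuration:
```
#!/usr/bin/python3
# Minimal sketch, not the real serial2mqtt: forward "name value" lines
# read from the serial port to MQTT topics (placeholders everywhere).
import serial
import paho.mqtt.client as mqtt

client = mqtt.Client()  # paho-mqtt 1.x API
client.connect('localhost', 1883)
client.loop_start()

with serial.Serial('/dev/ttyUSB0', 9600, timeout=10) as port:
    while True:
        line = port.readline().decode().strip()  # e.g. "temperature 23.5"
        if ' ' not in line:
            continue  # read timed out or malformed line, try again
        name, value = line.split(' ', 1)
        client.publish('sensors/{}'.format(name), value)
```
The real implementation linked above adds configuration and error handling on top of this.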
# Thresholds
What is the **critical temperature**? I [found](https://www.apc.com/us/en/faqs/FA157464/) that UPS batteries should not
run in an environment where temperatures exceed 25°C (warning) and must not go over 40°C (critical). This summer, I had
multiple buzzer alerts on storage3 and the temperature was over 29°C every time.
What is the **critical humidity**? Humidity is the concentration of water in a volume of air. In tropical regions of
the world, we often see 100% humidity levels, with computers still working. The amount of water the air can hold is
proportional to the temperature: the hotter it is, the more water the air can contain. Generally, the temperature in a
computer case is warmer than the ambient temperature. What is dangerous is not the quantity of water in the air, it's
when water condenses. A good rule of thumb is to avoid going over 80%, but even 100% should not be a problem.
# Alerting
On Nagios, I use the [check-mqtt](https://github.com/jpmens/check-mqtt) script on the monitored storage host under an
NRPE command:
```
# Sensors
command[check_ambient_temperature]=/usr/local/bin/python3.7 /usr/local/libexec/nagios/check-mqtt.py -m 10 --readonly -t sensors/temperature -H localhost -P 1883 -u nagios -p ***** -w "float(payload) > 25.0" -c "float(payload) > 40.0"
command[check_ambient_humidity]=/usr/local/bin/python3.7 /usr/local/libexec/nagios/check-mqtt.py -m 10 --readonly -t sensors/humidity -H localhost -P 1883 -u nagios -p ***** -w "float(payload) > 80.0" -c "float(payload) > 95.0"
```
[![storage2](/sensors-storage2-alert.png)](/sensors-storage2-alert.png)
# Observability
Telegraf has a [mqtt_consumer](https://github.com/influxdata/telegraf/tree/master/plugins/inputs/mqtt_consumer) input
plugin:
```
[[inputs.mqtt_consumer]]
  servers = ["tcp://localhost:1883"]
  topics = [
    "sensors/humidity",
    "sensors/temperature",
    "sensors/sound"
  ]
  persistent_session = true
  client_id = "telegraf"
  data_format = "value"
  data_type = "float"
  username = "telegraf"
  password = "*****"
```
Grafana is able to display environmental variables now:
[![storage1](/sensors-storage1.png)](/sensors-storage1.png)
[![storage2](/sensors-storage2.png)](/sensors-storage2.png)
[![storage3](/sensors-storage3.png)](/sensors-storage3.png)
# In the end
I tried to measure noise but I failed. The KY-037 sensor is designed to detect sound variations, like a loud noise over
a short period of time. Measuring the ambient noise level would require a lot of conversions to get values in
[decibels](https://en.wikipedia.org/wiki/Decibel). So I decided to ignore values coming from this sensor and to listen
for noise myself.
I can put my storage servers in the attic, in a room or in the cellar. The attic is right under the roof, which is too
hot in the summer (over 40°C). Rooms are occupied during the night, so noise is a problem. I am lucky to have a free
room right now but it's too hot during the summer (over 25°C). That leaves the cellar, where all the conditions are
optimal, even humidity. The remote locations all have a cellar too, which is perfect!
[^1]: Do It Yourself
[^2]: Internet of Things

---
title: "Geographic distribution with Sanoid and Syncoid"
date: 2020-08-03T18:00:00+02:00
---
Failures happen at multiple levels: a single disk can fail, as well as multiple disks, a single server, multiple
servers, a geographic region, a country, the world, the universe. The probability decreases with the number of
simultaneous events. Costs and complexity increase with the number of failure events you want to handle. It's up to you
to find the right balance between all those variables.
For my own infrastructure at home, I was able to put storage servers in three different locations: two in Belgium
(10km from one another) and one in France. They all share the same data. Up to two storage servers can burn or be
flooded entirely without data loss. There are also redundancy solutions at the host level but I will not cover them in
this article.
{{< rawhtml >}}
<script src="https://unpkg.com/leaflet@latest/dist/leaflet.js"></script>
<link href="https://unpkg.com/leaflet@latest/dist/leaflet.css" rel="stylesheet"/>
<div id="osm-map"></div>
<script type="text/javascript">
  var element = document.getElementById('osm-map');
  element.style = 'height:500px;';
  var map = L.map(element);
  L.tileLayer('https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png', {
    attribution: '&copy; <a href="http://osm.org/copyright">OpenStreetMap</a> contributors'
  }).addTo(map);
  var center = L.latLng('49.708', '2.516');
  map.setView(center, 7);
  L.marker(L.latLng('48.8566969', '2.3514616')).addTo(map); // storage france
  L.marker(L.latLng('50.4549568', '3.9519580')).addTo(map); // storage belgium (x2)
</script>
<p><!-- space --></p>
{{< /rawhtml >}}
# Backup management
The storage layer relies on ZFS pools. There is a wonderful piece of free software called
[Sanoid](https://github.com/jimsalterjrs/sanoid) that takes snapshots of your datasets and manages their retention.
Here is an example of configuration on a storage host:
```
[zroot]
hourly = 0
daily = 0
monthly = 0
yearly = 0
autosnap = no
autoprune = no

[storage/xxx]
use_template = storage

[storage/yyy]
use_template = storage

[storage/zzz]
use_template = storage

[template_storage]
hourly = 0
daily = 31
monthly = 12
yearly = 10
autosnap = yes
autoprune = yes
```
Where *storage/xxx*, *storage/yyy* and *storage/zzz* are datasets exposed to my family computers. With this
configuration, I am able to keep 10 years of snapshots. This may change over time depending on disk space, performance
or retention requirements. The *zroot* dataset has no snapshot nor prune policy but is declared in the configuration
for monitoring purposes.
Sanoid is compatible with FreeBSD but it requires [system
changes](https://github.com/jimsalterjrs/sanoid/blob/master/FREEBSD.readme). You'll need an "sh"-compatible shell for
mbuffer to work. I've chosen to install and use "bash" because I'm familiar with it on GNU/Linux servers.
To automatically create and prune snapshots, I've created a cron job that runs every minute:
```
* * * * * /usr/local/sbin/sanoid --cron --verbose >> /var/log/sanoid.log
```
# Remote sync
Sanoid comes with [Syncoid](https://github.com/jimsalterjrs/sanoid#syncoid), a tool to sync local snapshots with a
remote host. It is similar to "rsync" but for ZFS snapshots. If the synchronization fails in the middle, Syncoid can
**resume** the replication where it left off, without restarting from zero. It also supports **compression** on the
wire, which is handy for low-bandwidth networks like mine. To be able to send datasets to remote destinations, I've set
up direct SSH communication (via the VPN) with ed25519 keys.
Then cron jobs for automation:
```
0 2,6 * * * /usr/local/sbin/syncoid storage/xxxxx root@storage2:storage/xxxxx --no-sync-snap >> /var/log/syncoid/xxxxx.log 2>&1
0 3,7 * * * /usr/local/sbin/syncoid storage/xxxxx root@storage3:storage/xxxxx --no-sync-snap >> /var/log/syncoid/xxxxx.log 2>&1
```
Beware, I use the "root" user for this connection. This can be a **security flaw**. You should create a user with low
privileges and possibly use "sudo" restricted to the required commands. You should also disable root login over SSH.
The countermeasure I've implemented is to disable password authentication for the root user ("*PermitRootLogin
without-password*" in the sshd_config file of the OpenSSH server). I've also restricted SSH connections to the VPN and
local networks only. No public network allowed.
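As an illustration, such a sudo restriction could look like the following sketch. The dedicated user name and the path
are hypothetical; adapt them to your system:
```
# Hypothetical sudoers entry: let a dedicated low-privilege user run
# only the zfs command that Syncoid needs on the receiving side
syncoid ALL=(root) NOPASSWD: /sbin/zfs
```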
# Local usage
Now ZFS snapshots are automatically created and replicated. How can we start using the service? *I want to send my
data!* Every location has its own storage server. The idea is to use the local network to send data to the local
server, and to let the Sanoid/Syncoid couple handle the rest over the VPN for data safety.
At the beginning, all my family members were using [Microsoft Windows](https://en.wikipedia.org/wiki/Microsoft_Windows)
(10). To provide the most user-friendly experience, I thought it was a good idea to create a
[CIFS](https://en.wikipedia.org/wiki/Server_Message_Block) share with
[Samba](https://en.wikipedia.org/wiki/Samba_(software)). The authentication system was a pain to configure but the
network drive was recognized and it worked... for a while. Every single Samba update on the storage server broke the
share. I've lost countless hours debugging this s\*\*t.
I started to show them alternatives to Windows. One day, my wife agreed to switch. She opted for
[Kubuntu](https://kubuntu.org/). Then my parents-in-law switched too. I was able to remove the Samba share and use
[NFS](https://en.wikipedia.org/wiki/Network_File_System) instead. This changed my life. The network folder has never
stopped working since the switch. For my personal use, I use [rsync](https://en.wikipedia.org/wiki/Rsync) and cron to
**automatically** send my local folders, as sketched below.
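A hedged example of such a cron job (the paths are placeholders, not my actual layout):
```
# Hypothetical example: push local documents to the local storage server every night
0 1 * * * rsync -a --delete /home/xxxxx/documents/ storage1:/storage/xxxxx/documents/ >> /var/log/rsync.log 2>&1
```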
The storage infrastructure looks like this (storage1 example):
{{< rawhtml >}}
<p style="text-align: center;"><img src="/geographic-distribution-diagram.svg" alt="Geographic distribution diagram" style="width: 50%;"></p>
{{< /rawhtml >}}
Syncoid is configured to replicate to other nodes:
{{< rawhtml >}}
<p style="text-align: center;"><img src="/geographic-distribution-diagram-2.svg" alt="Geographic distribution part 2" style="width: 50%;"></p>
{{< /rawhtml >}}
The most important rule is to **strictly forbid writes** on the **same dataset** in two **different locations** at the
**same time**. This setup is not "[multi-master](https://en.wikipedia.org/wiki/Multi-master_replication)" compliant at
all.
In the end, data management is fully automated. Data losses belong to the past.

---
title: "Hardware adventures and operating systems installation"
date: 2020-07-24T18:00:00+02:00
---
At the beginning of the project, the goal was to create a single storage server at my apartment. So I bought a [fancy
case](https://www.ldlc.com/fr-be/fiche/PB00181814.html) with front racks to hot-swap disks, and I retrieved an
[Intel NUC motherboard](https://www.intel.com/content/www/us/en/products/boards-kits/nuc/boards.html) from work. It had
only two SATA ports available to connect disks, which is not enough to plug in at least four disks: one for the system
and three for the storage. I bought a [PCI RAID card](https://www.amazon.fr/gp/product/B0001Y7PU8) to add four slots. I
connected two small SSDs for the system and four data disks, then installed FreeBSD without any issue. I started to
copy data to the storage space when a noisy alarm[^1] began to wake everybody up in the building. This was unbearable.
I decided to buy a *micro ATX* motherboard with processor and memory to replace the Intel NUC board. Wrong: I had
confused the [micro ATX](https://en.wikipedia.org/wiki/MicroATX) and [mini ITX](https://en.wikipedia.org/wiki/Mini-ITX)
formats. The first one was too big to fit in the case. So I bought a classic ATX case with a cheap power supply and
3x2TB disks from work. **Storage1** was born.
At that point, I had a working storage server and some pieces to build a second one. At the same time, my wife and I
had a baby. My office at home became the newborn's bedroom. I paused this project for a year to focus on my family.
Then, we bought a house with plenty of space to handle life serenely.
During the move, I unpacked my very first computer, which I had assembled in 2008. The only missing thing was a
physical slot to rack the fourth disk. I bought a [low-cost ATX case](https://www.amazon.fr/gp/product/B00LA7PC6Y/) and
moved every piece into it. I started before work on a Friday but didn't finish on time. My home office was covered with
computer pieces all day long. When I finished work, I went back to the project, until a friendly neighbor called on me
for help because his computer had crashed. Right before going to bed, I tried to connect the power button to the
motherboard without instructions, and it didn't work. I finally found the instructions on the web and made it work, at
midnight. **Storage2** was born.
It runs on quite old hardware (10+ years). I thought it would be easy to install FreeBSD on it, given that FreeBSD
itself dates back to the 90s[^2]. I tried to boot from USB but the stick was not recognized. I burnt a CD-ROM with
version 12, the latest release at that time. The installer failed to load because of a [Lua
error](https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=234031) in the bootloader. In the comments and on forums, some
people had managed to make version 11 work. I burnt a CD-ROM with version 11: same result. After losing an afternoon of
my time and two CD-ROMs, I went back to my comfort zone and installed Debian 10 with success.
Recently, my family offered me the missing hardware pieces to finalize the third storage host: the big one with 4TB
disks in the mini case, the one I had bought at the beginning of the project. In the end, it is not so practical. Disks
are not fixed to the rack and can move back and forth a few centimeters. Some disks were not recognized by the system
because they were not fully connected; I pushed all of them with a screwdriver to ensure they were plugged into the
SATA connector. For the price, I expected it to work out of the box. I was also surprised to find four SATA ports on
the motherboard where I expected five or six, so I removed one system disk. Goodbye, dirty hack with adhesive tape to
stick the second SSD! Go join your friends in the spare parts stock. **Storage3** was born.
Here is the detailed list of components:
| Host | Component | Reference |
| -------- | ------------ | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| storage1 | Case | [Antec One](https://media.ldlc.com/ld/products/00/01/00/62/LD0001006251_2.jpg) |
| | Power supply | [Antec Basiq Series VP350P](https://media.ldlc.com/ld/products/00/00/89/95/LD0000899597_2.jpg) |
| | Motherboard | [Gigabyte GA-B150M-DS3H](https://media.ldlc.com/ld/products/00/03/45/35/LD0003453579_2.jpg) |
| | CPU | [Intel Celeron G3900 (2.8 GHz)](https://media.ldlc.com/ld/products/00/01/47/39/LD0001473956_2_0001473966_0001571304_0001571323_0003614881.jpg) |
| | RAM | [G.Skill Aegis 4 Go (1 x 4 Go) DDR4 2133 MHz CL15](https://www.ldlc.com/fr-be/fiche/PB00202287.html) |
| | System disks | [LDLC SSD F2 32 GB](https://media.ldlc.com/ld/products/00/03/42/11/LD0003421194_2_0003421246.jpg) (x2) |
| | Data disks | 2TB HDD 3.5" (x3) |
| storage2 | Case | [Advance Grafit](https://www.amazon.fr/gp/product/B00LA7PC6Y/) |
| | Power supply | No reference found |
| | Motherboard | Asus M2A-VM HDMI |
| | CPU | AMD Athlon 64 X2 5000+ Socket AM2 |
| | RAM | G.Skill Kit Extreme2 2 x 1 Go PC6400 PK (x2) |
| | System disk | Recycled 160GB HDD 3.5" |
| | Data disks | 1TB HDD 3.5" (x3) |
| storage3 | Case | [In Win IW-MS04](https://www.ldlc.com/fr-be/fiche/PB00181814.html) |
| | Motherboard | [ASRock H310CM-ITX/AC](https://www.ldlc.com/fr-be/fiche/PB00275155.html) |
| | CPU | [Intel Celeron G4920 (3.2 GHz)](https://www.ldlc.com/fr-be/fiche/PB00247186.html) |
| | RAM | [G.Skill Aegis 4 Go (1 x 4 Go) DDR4 2133 MHz CL15](https://www.ldlc.com/fr-be/fiche/PB00202287.html) |
| | System disk | [LDLC SSD F2 32 GB](https://media.ldlc.com/ld/products/00/03/42/11/LD0003421194_2_0003421246.jpg) |
| | Data disks | 4TB HDD 3.5" (x3) |
Despite the heterogeneous components, the storage servers have been running successfully for a while now.
[^1]: Later, I found out that the noise was coming from the disk backplane and not the motherboard. There is a buzzer
that emits a sound sequence depending on the detected anomaly. At the apartment, and at my current house in the summer,
the temperature in the room was too high (more than 29°C). I moved the host to a cooler place. Problem solved.
[^2]: FreeBSD [initial release](https://en.wikipedia.org/wiki/FreeBSD) was on November 1, 1993.

---
title: "Increased observability with the TIG stack"
date: 2020-08-10T18:00:00+02:00
---
[Observability](https://en.wikipedia.org/wiki/Observability) has become a buzzword lately. I must admit, this is one of
the many reasons why I use it in the title. In reality, this article is about fetching measurements and creating
beautiful graphs to feel like [detective Derrick](https://en.wikipedia.org/wiki/Derrick_(TV_series)), an *old* and wise
detective who solves cases by encouraging criminals to confess on their own.
With the recent [rise in popularity](https://opensource.com/article/17/11/why-go-grows) of the
[Go](https://golang.org/) programming language, we have seen a lot of new software coming into the database world:
[CockroachDB](https://www.cockroachlabs.com/), [TiDB](https://pingcap.com/products/tidb),
[Vitess](https://vitess.io/), etc. Among them, the **TIG stack**
([**T**elegraf](https://github.com/influxdata/telegraf), [**I**nfluxDB](https://github.com/influxdata/influxdb) and
[**G**rafana](https://github.com/grafana/grafana)) has become a reference for gathering and displaying metrics.
The goal is to see the evolution of resource usage (memory, processor, storage space), power consumption and
environmental variables (temperature, humidity) on every single host of the infrastructure.
# Telegraf
The first component of the stack is Telegraf, an agent that can fetch metrics from multiple sources
([inputs](https://github.com/influxdata/telegraf/tree/master/plugins/inputs)) and write them to multiple destinations
([outputs](https://github.com/influxdata/telegraf/tree/master/plugins/outputs)). There are tens of built-in plugins
available! You can even gather a custom source of data with
[exec](https://github.com/influxdata/telegraf/tree/master/plugins/inputs/exec), as long as it emits an expected
[format](https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md).
I configured Telegraf to fetch and send metrics every minute (*interval* and *flush_interval* in the *agent* section
are both *"60s"*), which is enough for my personal usage. Most of the plugins I use are built-in: cpu, disk, diskio,
kernel, mem, processes, system, zfs, net, smart, ping, etc.
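For reference, the corresponding *agent* section is short:
```
[agent]
  interval = "60s"
  flush_interval = "60s"
```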
The [zfs](https://github.com/influxdata/telegraf/tree/master/plugins/inputs/zfs) plugin fetches ZFS pool statistics
like size, allocation, free space, etc, on FreeBSD but not [on
Linux](https://github.com/influxdata/telegraf/issues/2616). The issue is known but a fix has not been merged upstream
yet. So I have developed a simple Python snippet to fill the gap on my only storage server running on Linux:
```
#!/usr/bin/python
import subprocess


def parse_int(s):
    # the InfluxDB line protocol suffixes integers with "i"
    return str(int(s)) + 'i'


def parse_float_with_x(s):
    # the dedup ratio is reported as "1.00x"
    return float(s.replace('x', ''))


def parse_pct_int(s):
    # capacity and fragmentation are reported as percentages
    return parse_int(s.replace('%', ''))


if __name__ == '__main__':
    measurement = 'zfs_pool'
    # -H removes headers, -p prints exact (parseable) values
    pools = subprocess.check_output(['/usr/sbin/zpool', 'list', '-Hp'],
                                    universal_newlines=True).splitlines()
    for pool in pools:
        col = pool.split("\t")
        tags = {'pool': col[0], 'health': col[9]}
        fields = {}
        if tags['health'] == 'UNAVAIL':
            fields['size'] = 0
        else:
            fields['size'] = parse_int(col[1])
            fields['allocated'] = parse_int(col[2])
            fields['free'] = parse_int(col[3])
            fields['fragmentation'] = '0i' if col[6] == '-' else parse_pct_int(col[6])
            fields['capacity'] = parse_int(col[7])
            fields['dedupratio'] = parse_float_with_x(col[8])
        tags = ','.join(['{}={}'.format(k, v) for k, v in tags.items()])
        fields = ','.join(['{}={}'.format(k, v) for k, v in fields.items()])
        # one line per pool in the InfluxDB line protocol
        print('{},{} {}'.format(measurement, tags, fields))
```
Called by the following input:
```
[[inputs.exec]]
  commands = ['/opt/telegraf-plugins/zfs.py']
  data_format = "influx"
```
This exec plugin does exactly the same job as the zfs input running on FreeBSD.
All those metrics are sent to a single output, InfluxDB, hosted on the monitoring server.
# InfluxDB
Measurements can be stored in a time series database, which is designed to organize data around time. InfluxDB is a
perfect fit for what we need. Of course, there are other time series databases. I've chosen this one because it is well
documented, it fits my needs and I wanted to learn new things.
[Installation](https://docs.influxdata.com/influxdb/v1.8/introduction/install/) is straightforward. I've enabled
[HTTPS](https://docs.influxdata.com/influxdb/v1.8/administration/https_setup/) and
[authentication](https://docs.influxdata.com/influxdb/v1.8/administration/authentication_and_authorization/#set-up-authentication).
I use a simple setup with only one node in the *cluster*. No sharding. Only one database. Even though Telegraf does not
send that many metrics, I've created a default [retention
policy](https://docs.influxdata.com/influxdb/v1.8/query_language/manage-database/#retention-policy-management) to store
two years of data, which is more than enough. A new default retention policy becomes the default route for all your new
points. Don't be afraid to see all the existing measurements vanish: nothing has been deleted. They are just stored
under the previous policy and need to be
[moved](https://community.influxdata.com/t/applying-retention-policies-to-existing-measurments/802). You should define
a [backup](https://docs.influxdata.com/influxdb/v1.8/administration/backup_and_restore/) policy too.
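For reference, creating such a default retention policy boils down to a single InfluxQL statement (the policy and
database names here are examples):
```
CREATE RETENTION POLICY "two_years" ON "telegraf" DURATION 730d REPLICATION 1 DEFAULT
```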
# Grafana
Now that we are able to gather and store metrics, we need to visualize them. This is the role of
[Grafana](https://grafana.com/). During my career, I have played with
[Graylog](https://docs.graylog.org/en/3.2/pages/dashboards.html), [Kibana](https://www.elastic.co/kibana) and Grafana.
The last one is my favorite. It is generally blazing fast, even on a Raspberry Pi. The look and feel is amazing. The
theme is dark by default but I like the light one.
I have created four dashboards:
- **system**: load, processor, memory, system disk usage, disk i/o, network quality and bandwidth
- **storage**: ZFS pool allocation, capacity, fragmentation and uptime for each disk
- **power consumption**: kWh used per day, week, month, year, current UPS load, price per year (more details in a later
post)
- **sensors**: ambient temperature, humidity and noise (more details in a later post)
Every dashboard has a *$host* [variable](https://grafana.com/docs/grafana/latest/variables/templates-and-variables/)
so that metrics can be filtered per host. At the top of the screen, a dropdown menu is automatically created to select
the host, based on an InfluxDB query.
And because a picture is worth a thousand words, here are some screenshots of my own graphs:
[![System](/grafana-system.png)](/grafana-system.png)
[![Storage](/grafana-storage.png)](/grafana-storage.png)
[![Power consumption](/grafana-power-consumption.png)](/grafana-power-consumption.png)
[![Sensors](/grafana-sensors.png)](/grafana-sensors.png)
# Infrastructure
To sum this up, the infrastructure looks like this:
![TIG stack](/monitoring-tig.svg)
Whenever I want, I can sit back on a comfortable sofa, open a web browser and let the infrastructure speak for itself.
Easy, right?

---
title: "Infrastructure overview"
date: 2020-07-20T18:30:00+02:00
---
The idea behind this infrastructure is to run on commodity servers. No need to buy the big racks of expensive servers
we see in data centers: simple homemade computers will do the job. At work, I have access to cheap hard drives that
were used in servers and either are out of warranty or are not suitable for enterprise workloads. They generally cost
half their market price. I use a mix of brand new and re-used drives to reduce the risk of having two disks failing at
the same time in the same host.
There are three components in the infrastructure:
* **storage** servers that hold the data
* **monitoring** server that grabs metrics and sends alerts
* **vps**[^1] server used to create a VPN[^2] and watch for monitoring server availability
{{< rawhtml >}}
<p style="text-align: center;"><img src="/infrastructure-overview.svg" alt="Infrastructure overview" style="width: 65%;"></p>
{{< /rawhtml >}}
# Storage
Every storage server is designed to be hosted in a different location. Each one could be unplugged from a location,
plugged in somewhere else and work the same way as before. They only require Internet access to be able to contact the
VPS and join the VPN.
The technology that holds data is **[ZFS](https://en.wikipedia.org/wiki/ZFS)**. I am lucky enough to use it at work for
production workloads and it makes life way easier. I am used to managing GNU/Linux servers
([Debian](https://www.debian.org/)) and I know that [FreeBSD](https://www.freebsd.org/) has built-in ZFS support, so I
wanted to give it a try. I didn't choose [FreeNAS](https://www.freenas.org/) because I wanted to do everything by
myself, to learn and use only the features I needed.
The right balance I found to maximize available disk space while keeping data safe is to use **three disks** in a
[RAID-Z](https://en.wikipedia.org/wiki/ZFS#RAID_(%22RaidZ%22)). Storage servers can lose one disk at a time without
breaking the service, while two thirds of the raw space remains usable. Datasets are configured to use **lz4**
compression because it saves disk space without putting too much pressure on the CPU.
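Setting this up takes only a couple of commands; a sketch with hypothetical device names (mine differ on each host):
```
# Three-disk RAID-Z pool, then lz4 compression on the whole pool
zpool create storage raidz ada1 ada2 ada3
zfs set compression=lz4 storage
```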
| Host | Disk capacity |
| -------- | ------------: |
| storage1 | 5.44T |
| storage2 | 2.72T |
| storage3 | 10.9T |
# Monitoring
Like any system administrator, I want to be alerted when something goes wrong in the infrastructure. I also want to
browse the history with graphs to see trends. There was a [Raspberry Pi](https://www.raspberrypi.org/) waiting in a
drawer to be used. It is now connected to the Wi-Fi network somewhere in the house, perfectly hidden, doing this job in
the background.
# VPS
I am not a network engineer. Actually, this is not my job and I don't want it to be. There are numerous experts in the
field who do this very well and I am thankful to them. But a computer without network connectivity is not very useful.
When self-hosting, you have to deal with your ISP modem settings. There is no standard as far as I know. Mine has no
fixed public IPv4 address. I developed scripts to automatically update a subdomain name with the current public IP
address and tried to contact it from the outside. The name resolved, but the communication always failed.
To solve this problem, I [rent a VPS](https://www.ovhcloud.com/fr/vps/) hosted close to the storage locations and have
configured an [OpenVPN](https://openvpn.net/) server on it. This is a single point of failure and a *bottleneck*
because all the traffic goes through this server. In practice, the Internet bandwidth at home is the real bottleneck,
so the VPS should not be a problem. It also acts as the entry point from the outside world for metrics and monitoring
websites.
[^1]: [Virtual Private Server](https://en.wikipedia.org/wiki/Virtual_private_server)
[^2]: [Virtual Private Network](https://en.wikipedia.org/wiki/Virtual_private_network)

---
title: "Network configuration with OpenVPN"
date: 2020-07-27T18:00:00+02:00
---
Networking is hard. Dealing with ISP modem settings is even harder. Mine doesn't have a static public IP address by
default. If the modem reboots, it is likely to be assigned a new one. For regular people, this is not a problem when
browsing the Internet. But for hackers like us, it means we cannot use the IP address itself to reach the private
network from the outside world. It becomes a problem when we try to join hosts across different networks.
For your information, this is the price my ISP would like me to pay for this "option":
![Fixed IP option](/fixed-ip-option.png)
This is insane!
The first idea was to deploy a script on each host that discovers the public IP address and registers an A record on a
given subdomain name. This job could be run by a cron daemon. It would transform a dynamic IP address into a
predictable name, like the [no-ip](https://www.noip.com/) service does. It worked: I was able to know the home public
IP address.
Then, I started to use [port
mapping](https://www.proximus.be/support/en/id_sfaqr_ports_mapping/personal/support/internet/internet-at-home/advanced-settings/internet-port-mapping-on-your-modem.html#/bbox3)
to redirect a given port on my router to a host on the private network. By default, some protocols like SSH, HTTP and
HTTPS are [not
open](https://www.proximus.be/support/en/id_sfaqr_ports_unblock_secu/personal/support/internet/security-and-protection/internet-ports-and-security/open-internet-ports.html),
even if you configure port mapping correctly. You have to go to the ISP website and lower your *security level* from
high to low. At my apartment, I successfully managed to reach some ports from the outside, but never at my current
house. The major problems with this procedure are its **complexity** and the fact that it **highly depends on your ISP
devices/settings**. I had to find a simpler solution.
Here comes [OpenVPN](https://openvpn.net/). It's open-source software which creates private networks on top of public
networks. It uses encryption to secure connections between hosts and keep your transport safe. The initial setup is
quite long and complex, but you just have to follow this [great
tutorial](https://www.digitalocean.com/community/tutorials/how-to-set-up-an-openvpn-server-on-debian-10) and it will
work like a charm. The drawback is that you need a single point to act as a server. I chose to [rent a
VPS](https://www.ovhcloud.com/fr/vps/) for a few euros per month. It has a fixed IP address and a decent bandwidth for
our usage. It runs on Debian but there are plenty of operating systems available.
The OpenVPN certificate management can be a bit confusing at first. I use my monitoring host as the CA[^1] to keep
trust at home, and every host has its own client certificate. I've set up static IP addressing to always assign the
same address to clients. I've enabled direct communication between clients because storage servers will send snapshots
to each other. I didn't configure clients to forward all their packets to the VPN server because the goal here is not
to hide behind it for privacy.
I have changed the following settings on the VPN server:
```
topology subnet ; declare a subnet like home
server 10.xx.xx.xx 255.xx.xx.xx ; with the range you like
client-to-client ; allow clients to talk to each other
client-config-dir /etc/openvpn/ccd ; static IP configuration per client
ifconfig-pool-persist /var/log/openvpn/ipp.txt ; IP lease settings
```
Example of *ipp.txt* file:
```
storage1,10.xx.xx.xx
storage2,10.yy.yy.yy
storage3,10.zz.zz.zz
```
Example of */etc/openvpn/ccd/storage1.user* file:
```
ifconfig-push 10.xx.xx.xx 255.xx.xx.xx
```
The network configuration declared in *client-config-dir* must match the one in *ipp.txt*.
The configuration generated by the *make_config.sh* script (see the tutorial mentioned above) can be written to:
* */etc/openvpn/client.conf* (Debian)
* */usr/local/etc/openvpn/openvpn.conf* (FreeBSD)
When the OpenVPN service is started, you should be able to see the tun interface up and running. On FreeBSD:
```
tun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1500
	options=80000<LINKSTATE>
	inet6 fe80::xxxx:xxxx:xxxx:xxxx%tun0 prefixlen 64 scopeid 0x3
	inet 10.xx.xx.xx --> 10.xx.xx.xx netmask 0xffffff00
	groups: tun
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	Opened by PID 962
```
And on Debian:
```
3: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100
    link/none
    inet 10.xx.xx.xx/xx brd 10.xx.xx.xx scope global tun0
       valid_lft forever preferred_lft forever
```
Et voilà! Every server is now part of a private network:
```
monitoring ~ # nmap -sn 10.xx.xx.xx/xx
Starting Nmap 7.70 ( https://nmap.org ) at 2020-07-13 17:28 CEST
Nmap scan report for vps (10.xx.xx.xx)
Host is up (0.018s latency).
Nmap scan report for 10.xx.xx.xx
Host is up (0.032s latency).
Nmap scan report for 10.xx.xx.xx
Host is up (0.24s latency).
Nmap scan report for 10.xx.xx.xx
Host is up (0.22s latency).
Nmap scan report for 10.xx.xx.xx
Host is up.
Nmap done: xx IP addresses (5 hosts up) scanned in 13.11 seconds
```
[^1]: [Certificate Authority](https://en.wikipedia.org/wiki/Certificate_authority)

---
title: "Power consumption and failures prevention"
date: 2020-08-14T18:00:00+02:00
---
Providing a full storage service means having computers up 24x7. On one hand, if we power off the local storage server
when we aren't using it, we have to find a solution to respect the backup policy and synchronize with remote servers
that could be down at that moment. On the other hand, if we leave the storage server up all the time, it consumes
unnecessary resources and throws money down the drain. My conviction is that a personal computer, which is idle most of
the time, doesn't consume that much power. But how to verify it?
With [observability]({{< ref "posts/increased-observability-with-the-tig-stack" >}}), I thought it would be easy to
gather power consumption via built-in sensors. I tried something I know,
[lm_sensors](https://hwmon.wiki.kernel.org/lm_sensors), which relies on the hardware monitoring drivers of the Linux
kernel and exposes CPU temperatures, fan speeds, voltages, etc.
```
storage2 ~ # sensors
k8temp-pci-00c3
Adapter: PCI adapter
Core0 Temp:   +30.0°C
Core0 Temp:   +22.0°C
Core1 Temp:   +30.0°C
Core1 Temp:   +16.0°C

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +40.0°C  (crit = +75.0°C)

atk0110-acpi-0
Adapter: ACPI interface
Vcore Voltage:      +1.10 V  (min =  +1.45 V, max =  +1.75 V)
+3.3 Voltage:       +3.39 V  (min =  +3.00 V, max =  +3.60 V)
+5.0 Voltage:       +4.97 V  (min =  +4.50 V, max =  +5.50 V)
+12.0 Voltage:     +12.22 V  (min = +11.20 V, max = +13.20 V)
CPU FAN Speed:     3391 RPM  (min = 0 RPM, max = 1800 RPM)
CHASSIS FAN Speed:    0 RPM  (min = 0 RPM, max = 1800 RPM)
POWER FAN Speed:   1662 RPM  (min = 0 RPM, max = 1800 RPM)
CPU Temperature:    +26.0°C  (high = +90.0°C, crit = +125.0°C)
MB Temperature:     +37.0°C  (high = +70.0°C, crit = +125.0°C)
```
The ACPI interface returns some voltage measurements, but I doubt they can be used to find the instantaneous
consumption in watts (W) and extrapolate the consumption over time in kilowatt-hours (kWh). On laptops, such
information can be computed from battery statistics. Unfortunately, all the computers of the infrastructure are
desktops without batteries.
I needed to buy a product. A [lot](https://modernsurvivalblog.com/alternative-energy/kill-a-watt-meter/)
[of](https://www.howtogeek.com/107854/the-how-to-geek-guide-to-measuring-your-energy-use/)
[websites](https://www.pcmag.com/news/how-to-measure-home-power-usage)
[talk](https://michaelbluejay.com/electricity/measure.html) about how to measure the power consumption of computers and
even of the whole house. The common recommendation that comes out is to use a
[wattmeter](https://en.wikipedia.org/wiki/Wattmeter).
{{< rawhtml >}}
<p style="text-align: center;"><img src="/zaeel-wattmetre.jpg" alt="Wattmeter" style="width: 25%;"></p>
{{< /rawhtml >}}
It's an instrument plugged between the power outlet and your device to measure how much energy is consumed
instantaneously (W) and over time (kWh), and even the total price if you have configured the kWh price. The results are
shown on an LCD. A wattmeter is cheap. I've bought [this
model](https://www.amazon.fr/dp/B07GN5NPDJ/ref=cm_sw_r_tw_dp_x_FMMiFb2911HN7) which does a good job. Sadly, the data
shown on the LCD cannot easily be fed into the metrics infrastructure. It also lacks precision for the price: we can
enter only two digits after the decimal point while the energy provider gives us a price with five digits.
Speaking of the price, my [energy provider](https://www.engie.be/fr/) publishes a beautiful but incomprehensible
[grid of prices](https://www.engie.be/fr/energie/electricite-gaz/prix-conditions). Prices depend on the pack of
products, the region and the distributor. They change over time. You can have a meter with separate day and night
rates. Moreover, the price is displayed in cents and not euros! I had to call them to get a price estimation. Come on,
we live in a digitized world: they should at least display the current price of the contract somewhere in the customer
panel.
During my research, I found that an uninterruptible power supply (UPS) could be used to gather power consumption
metrics. As a bonus, it protects from power variations and interruptions that could harm computers. However, UPS
devices are quite expensive: prices range from 50€ to hundreds of euros. As I'm a total newbie in this domain, I read
this detailed [guide](https://www.materiel.net/guide-achat/g13-les-onduleurs-et-prises-parafoudre/1/) (in French) to
gain some knowledge. I decided to buy an [APC Back-UPS Pro
550](https://www.apc.com/shop/be/en/products/APC-Power-Saving-Back-UPS-Pro-550/P-BR550GI).
{{< rawhtml >}}
<p style="text-align: center;"><img src="/apc-back-ups-pro-550.jpg" alt="UPS" style="width: 25%;"></p>
{{< /rawhtml >}}
It has a USB interface so it can be controlled with [apcupsd](https://en.wikipedia.org/wiki/Apcupsd), and power
information can be displayed with the "apcaccess" binary. It's compatible with both Debian and FreeBSD and it even has
a [telegraf plugin](https://github.com/influxdata/telegraf/tree/master/plugins/inputs/apcupsd)!
```
storage1 ~ # /usr/local/sbin/apcaccess
APC : 001,036,0867
DATE : 2020-07-30 15:56:46 +0200
HOSTNAME : storage1
VERSION : 3.14.14 (31 May 2016) freebsd
UPSNAME : storage1
CABLE : USB Cable
DRIVER : USB UPS Driver
UPSMODE : Stand Alone
STARTTIME: 2020-07-26 18:28:21 +0200
MODEL : Back-UPS RS 550G
STATUS : ONLINE
LINEV : 234.0 Volts
LOADPCT : 10.0 Percent
BCHARGE : 100.0 Percent
TIMELEFT : 37.5 Minutes
MBATTCHG : 5 Percent
MINTIMEL : 3 Minutes
MAXTIME : 0 Seconds
SENSE : Medium
LOTRANS : 176.0 Volts
HITRANS : 282.0 Volts
ALARMDEL : No alarm
BATTV : 13.7 Volts
LASTXFER : No transfers since turnon
NUMXFERS : 0
TONBATT : 0 Seconds
CUMONBATT: 0 Seconds
XOFFBATT : N/A
SELFTEST : NO
STATFLAG : 0x05000008
SERIALNO : 4B1939P01928
BATTDATE : 2019-09-23
NOMINV : 230 Volts
NOMBATTV : 12.0 Volts
NOMPOWER : 330 Watts
FIRMWARE : 857.L7 .I USB FW:L7
END APC : 2020-07-30 15:57:32 +0200
```
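The corresponding Telegraf input is tiny; a sketch, assuming apcupsd listens on its default port (3551):
```
[[inputs.apcupsd]]
  servers = ["tcp://127.0.0.1:3551"]
```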
The 550 is the entry-level model of the Back-UPS Pro range, so it has [IEC C13 power
plugs](https://en.wikipedia.org/wiki/IEC_60320#C13/C14_coupler) only, suitable for computers, but no [Euro/French
plugs](https://en.wikipedia.org/wiki/AC_power_plugs_and_sockets#CEE_7.2F5_socket_and_CEE_7.2F6_plug_.28French.3B_Type_E.29)
compatible with any electrical device. As I connect only a single computer to the UPS, this is the most economical
solution.
Once the data had been fed to the observability platform, I was able to import this [beautiful
dashboard](https://grafana.com/grafana/dashboards/10835) from the Grafana community. I've customized it to my own needs
and here is the result:
[![storage1](/power-consumption-storage1.png)](/power-consumption-storage1.png)
[![storage2](/power-consumption-storage2.png)](/power-consumption-storage2.png)
[![storage3](/power-consumption-storage3.png)](/power-consumption-storage3.png)
You can download my dashboard [here](/grafana-power-consumption.json).
Our winner is *storage3*, which costs less than a kebab per year! The worst case is *storage2*, the old hardware,
which consumes the equivalent of an incandescent light bulb. See, the power consumption is not so bad after all.

---
title: "Problem detection and alerting"
date: 2020-08-07T18:00:00+02:00
---
Everything is distributed, automated and runs in perfect harmony with a common goal: protect your data. But bad things
happen, and rarely when you expect them. This is why you need to watch service states and send a notification when
something goes wrong. Monitoring systems are well known in the enterprise world. For our use case, we don't need to
deploy a complex infrastructure to check a couple of hosts. For this reason, I chose the good old [Nagios
Core](https://www.nagios.org/projects/nagios-core/). It even provides a web interface for humans like us.
# How it works
There are two types of checks:
- **host**: check if a host is alive or not
- **service**: check if a service on a host is healthy or not
To check if a host is available, the simplest implementation is to use ping:
{{< rawhtml >}}
<p style="text-align: center;"><img src="/monitoring-host-check.svg" alt="Monitoring host check " style="width: 50%;"></p>
{{< /rawhtml >}}
For services, there is a tool to execute remote plugins called
[NRPE](https://support.nagios.com/kb/article/nrpe-agent-and-plugin-explained-612.html)[^1]. It works with a client on
the monitoring host and an agent on the remote host that executes commands on demand. The return code defines the check
result.
{{< rawhtml >}}
<p style="text-align: center;"><img src="/monitoring-service-check.svg" alt="Monitoring service check " style="width: 65%;"></p>
{{< /rawhtml >}}
Services states can be:
- **OK**: it works as expected
- **WARNING**: it works but we should take a look
- **CRITICAL**: it's broken
- **UNKNOWN**: something is wrong with the plugin configuration or communication
Plugins can define a warning and/or critical threshold to qualify the expected state. For example, I would like to know
when the disk space usage of a storage host goes over, say, 80% (warning) and 100% (critical). That way I have time to
take action, free some space or order new hard drives, before the situation becomes critical. And if I do nothing, a
higher-severity alert will be sent when the disk becomes full.
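With the standard monitoring plugins, such thresholds are simply passed on the command line. A hedged example (note
that *check_disk* expresses its thresholds in *free* space, and the mount point is hypothetical):
```
# Warn when less than 20% is free, critical when less than 5% is free
/usr/lib/nagios/plugins/check_disk -w 20% -c 5% -p /storage
```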
# Installation
My monitoring host runs on Raspbian 10:
```
apt update
apt install nagios4 monitoring-plugins
```
Installed.
By default, the web interface was broken. I had to disable the following block in the */etc/nagios4/apache2.conf* file:
```
# <Files "cmd.cgi">
# ...
# </Files>
```
For security reasons, I enabled basic authentication (a.k.a. *htaccess*) in the *DirectoryMatch* block of the same file
and created an *admin* user:
```
AuthUserFile "/etc/nagios4/htdigest.users"
AuthType Basic
AuthName "Restricted Files"
AuthBasicProvider file
Require user admin
```
In the CGI configuration file */etc/nagios4/cgi.cfg*, the default user can be set to *admin*, since the interface is
now protected by basic authentication:
```
default_user_name=admin
```
Now the web interface should be up and running at http://monitoring-ip/nagios4. For my own usage, I've set up a reverse
proxy (nginx) on the VPS host to expose this interface to a public endpoint so I can access it from anywhere with my
credentials.
# Configuration
A fresh installation applies sane defaults to make Nagios work out of the box. It even enables localhost monitoring.
Unfortunately, I want to check this host like any other server in the infrastructure. So the first thing I did was to
disable the following include in the */etc/nagios4/nagios.cfg* file:
```
#cfg_file=/etc/nagios4/objects/localhost.cfg
```
I don't want to be spammed by my monitoring system. Servers may be slow and take time to respond. The Wi-Fi connection
of the monitoring system may hang for a while... until someone reboots the host physically. During such an extended
period of time (multiple hours), my family and I may be asleep. I don't want to wake up to hundreds of notifications
saying "Hey, the monitoring system is DOWN!". One or two notifications are enough.
The following new templates can be defined in */etc/nagios4/conf.d/templates.cfg*:
```
define host {
    name                   home-host
    use                    generic-host
    check_command          check-host-alive
    contact_groups         admins
    notification_options   d,u,r
    check_interval         5
    retry_interval         5    ; retry every 5 minutes
    max_check_attempts     12   ; alert at 1 hour (12x5 minutes)
    notification_interval  720  ; resend notifications every 12 hours
    register               0    ; template
}

define service {
    name                   home-service
    use                    generic-service
    check_interval         5
    retry_interval         5    ; retry every 5 minutes
    max_check_attempts     12   ; alert at 1 hour (12x5 minutes)
    notification_interval  720  ; 12 hours
    register               0    ; template
}
```
There are multiple components to define:
- **hosts** (*/etc/nagios4/conf.d/hosts.cfg*): every single host
- **hostgroups** (*/etc/nagios4/conf.d/hostgroups.cfg*): groups of hosts
- **services** (*/etc/nagios4/conf.d/services.cfg*): services that will be attached to hostgroups
For example, I need to know ZFS usage of all storage servers:
- **hosts**: *storage1*, *storage2*, *storage3* with their IP addresses
- **hostgroups**: *storage-servers* that will regroup *storage1*, *storage2* and *storage3*
- **services**: *zfs_capacity* that will be attached to *storage-servers*
Host definition:
```
define host {
    use        home-host
    host_name  storage1
    alias      storage1
    address    XX.XX.XX.XX
}
```
Hostgroup definition:
```
define hostgroup {
    hostgroup_name  storage-servers
    alias           Storage servers
    members         storage1,storage2,storage3
}
```
Service definition:
```
define service {
    use                  home-service
    hostgroup_name       storage-servers
    service_description  zfs_capacity
    check_command        check_nrpe!check_zfs_capacity
}
```
On all storage servers, we also need to define an NRPE command:
```
command[check_zfs_capacity]=/usr/local/bin/sudo /usr/local/sbin/sanoid --monitor-capacity
```
ZFS usage is now monitored!
I have repeated this process for all services I wanted to check to end up with:
[![Monitoring services](/monitoring-services.png)](/monitoring-services.png)
A single host can be in multiple hostgroups. For my tests, I always added features to *storage1* first: I created a
hostgroup for each new capability and added only *storage1* to it. That means *storage1* had the same services as
*storage2* and *storage3*, plus the newly tested ones.
# Notifications
At work, we use [Opsgenie](https://www.atlassian.com/software/opsgenie) to define on-call schedules within a team. Of
course, I don't want to receive push notifications on my phone for my home servers. This is why I chose to be notified
by e-mail. In the past, I hosted some e-mail boxes at home, but I didn't want to deal with spam and SPF records to
prove to the world that my service is legit. I have a couple of [domain names](https://www.gandi.net/en/domain) with
(limited) e-mail services included. For monitoring purposes, this is more than enough to do the job.
On Nagios, you can set the e-mail address in the contacts configuration file
*/etc/nagios4/objects/contacts.cfg*.
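A contact definition is short; a sketch with a placeholder address:
```
define contact {
    contact_name  admin
    use           generic-contact
    alias         Nagios Admin
    email         monitoring@example.org
}
```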
I followed this [great tutorial](https://www.linode.com/docs/email/postfix/postfix-smtp-debian7/) to configure
[postfix](http://www.postfix.org/) to send e-mails through the provider's SMTP server. Secure, and no more spam. I have
configured this new e-mail box on my phone so I can be alerted asynchronously and smoothly when something wrong
happens.
[^1]: Nagios Remote Plugin Executor

---
title: "State of Internet bandwidth in Belgium"
date: 2020-07-31T18:00:00+02:00
---
I was born and raised in a little city next to Paris, in **France**. In the early 2000s, unlimited "high-speed"
Internet access revolutionized communications. No need to monopolize the phone line with a 56Kbps modem anymore. Since
then, the bandwidth has kept increasing. We have seen the ADSL, ADSL2 and fiber technologies. We had "Triple play"
offers where unlimited phone calls, TV and Internet were packed together. There were three major companies on the
market: [France Telecom/Orange](https://www.orange.fr), [Bouygues](https://www.bouyguestelecom.fr/) and
[Neuf/Cegetel/SFR](https://www.sfr.fr/) (depending on the year). Then [Free](https://www.free.fr) jumped into that
alliance and broke prices with revolutionary offers. Since then, all French ISPs have had "low prices", between 30 and
50€/month, for "high-speed" connections of hundreds of Mbps both down and up thanks to the fiber deployment.
Then I moved to **Belgium** for personal reasons. My parents-in-law had chosen
[Belgacom/Proximus](https://www.proximus.be/en/personal/?) and were happy with it, so I followed their choice. This ISP
has deployed the VDSL technology, which can be "fast". My first apartment was very close to the DSLAM[^1] so my
bandwidth was good enough, 50Mbps/15Mbps. The price was noticeably higher for Internet and TV only: 50€/month. If we
had wanted a phone line, we would have added 20€ to the monthly bill and paid for each phone call! You can get
unlimited phone calls for only [1.19€/month](https://www.ovhtelecom.fr/telephonie/voip/decouverte.xml) using VoIP,
which is the same technology our ISP uses. There is also a limit to the monthly Internet volume we can consume: it was
something like 600GB/month when I subscribed, and is 3TB now.
When I moved to my current house, I knew the bandwidth would drop. Proximus failed to organize my move on time. You can
do it yourself on the website, but if you then go to a shop to reschedule the appointment, they can't do anything
because it has been scheduled online. I canceled the first rendez-vous online and they created a new one with an
additional two-week delay, one month after the move. I subscribed to [Voo](https://www.voo.be/en), the *fastest
Internet of Belgium* as they say in their [commercials](https://www.youtube.com/watch?v=LKv6LtaXIf4). Same price,
better speed, 120Mbps/10Mbps... for a week. Then I had three months of packet loss, 20% on average. It was unusable.
The following two months were stable but with a bandwidth drop, 70Mbps/10Mbps. Then packet loss again, 80% on average
this time! Horrible. I re-subscribed to Proximus, with 20Mbps/6Mbps bandwidth, but it has been stable since the change.
All of that for 60€/month.
I called Proximus to be notified when fiber comes to my street, so I can finally catch up with our neighbors' speeds,
kind of. They have no plan to install it. No date. Nothing. In the meantime, my father and my grandparents have
**gigabit** fiber installed at home for a lower price than mine. And even when Proximus deploys it, [upload bandwidth
is limited to 100Mbps](https://www.proximus.be/en/id_cr_fiber/personal/orphans/fiber-to-your-home.html) where it can be
[200Mbps](https://www.sfr.fr/offre-internet/fibre-optique) or even [600](https://www.free.fr/freebox/freebox-delta)
[Mbps](https://boutique.orange.fr/internet/offres-fibre/livebox-up) in France. As of today, the maximum bandwidth I
could get at home is the 400Mbps/20Mbps promised by Voo, with the stability we know.
Belgian ISPs, Proximus and Voo, when will you stop stealing from our pockets and start rolling out very high-speed
Internet across this small country of ours? We are in the 2020s, not the 2000s.
[^1]: [Digital subscriber line access
multiplexer](https://en.wikipedia.org/wiki/Digital_subscriber_line_access_multiplexer), the closer you are, the faster
your bandwidth is.

---
title: "Storage servers at home"
date: 2020-07-17T19:00:00+02:00
---
I was born in the 90s. I grew up with computers. Other generations call us "digital natives". I am lucky and proud to
work with computers every day, with a database specialization. People tend to generate lots of data. It might be
administrative papers (bills, contracts, paychecks), sentimental photo albums or whatever else, as long as it is
**their** data. At work, we pay attention to backing up every piece of data as though it were the most important thing
in the world. At home, it should be the same but, in fact, nobody really cares about it until the data is gone for
good.
My family members used to buy a single USB hard drive, copy their data to it from time to time and think it was safe.
This highly depends on the frequency of the backups. Actually, they didn't copy very often. When the drive fails, they
call me to the rescue, but I'm not a magician.
Another solution involves sending their data to "the cloud" because they have seen on TV that this will solve all of
their problems. Cloud providers can, intentionally or unintentionally, leak their data. To put it in physical terms,
I'm not sure my family would want to ship their storage cupboard to the United States for the sake of data safety. We
live in Belgium and France. There is no point in sending our data to the other side of the planet, into someone else's
hands.
So I decided to **self-host a set of storage servers at home** and offer this service to my own family. It has to be
simple, as my parents will be the main users. I am a full-time employee and a proud dad, so I have only a little time
for service maintenance. It is an opportunity for me to learn and to share it with the world. Welcome to my
self-hosting project. I hope you will learn something too.