Grafana with podman kube

9 minutes
January 19, 2023

Background

For a long time I have been using a containerized version of Munin to monitor some of my devices. But there are many things I cannot do with it, and for many new tasks I would first have to write code myself, while there are very often more than enough existing solutions for Prometheus and Grafana.

Until now I only used them for demos of openSUSE Kubic and openSUSE MicroOS at conferences and fairs (like SUSECon19).

So time to modernize my monitoring in my Home Lab and use a containerized Grafana solution on Linux :)

The Linux OS: openSUSE MicroOS

All my servers are running openSUSE MicroOS, a quick, small environment designed to host container workloads with automated administration & patching. As a rolling release distribution, the software is always up to date. For automated updates, transactional-update is used, which makes use of btrfs subvolumes. This allows a read-only root filesystem, which gets updated in the background so that the running processes don't notice it. If something goes wrong, the new snapshot is deleted and it looks as if nothing had happened. If the update applies cleanly, the system boots into the new snapshot the next time. If that fails, an automatic rollback to the last working snapshot is done. So there is no need anymore to spend hours fixing the system after a broken update :D, which has already saved me quite some hours. And something that is unique to transactional-update: all changes to /etc are reverted, too! Normally this is ignored, as this directory is not part of the read-only OS image.

Another advantage of btrfs is that only the actual changes of an update are stored on disk, which is really space efficient. So you can keep many old snapshots for rollbacks, or create diffs between snapshots to see what has really changed. There is no massive waste of disk space like with an A/B partition schema, and no hard limit on old snapshots because you only have 2 or 3 partitions for them. The limiting factors are only the size of the disk and the size of the updates. I normally have around 20 snapshots on my system. And if they get too big, the system automatically cleans up and removes enough old ones.
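
For illustration, this is roughly how an update and a rollback look on the command line (run as root; snapshot numbers and output will of course differ on your system):

transactional-update dup         # create a new snapshot and run a full distribution update in it
snapper list                     # show the existing snapshots
transactional-update rollback    # make the currently booted (known good) snapshot the default again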

Podman kube

For various features I need Podman as container runtime. Podman comes with a very nice feature: podman pod and podman kube, which use Kubernetes YAML files, at least as long as they don't use too advanced features.
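
By the way, Podman can also generate such a YAML file from an already running pod, which is a good starting point for writing your own; a sketch, assuming a pod named monitoring already exists:

podman generate kube monitoring > monitoring.yaml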

There are many docker-compose.yaml files out there which start Prometheus and Grafana, but I don’t want Python on my OS (in my opinion Python is fine for applications running inside a container, but a really bad choice for system tasks on the OS itself), so I couldn’t use them. On the other hand, documentation about how to create YAML files for podman kube play barely exists and is partly very complicated. But in the end I managed to create a working YAML file:

# Save the output of this file and use kubectl create -f
# to import it into Kubernetes.
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: monitoring
  name: monitoring
spec:
  containers:
  - name: prometheus
    image: docker.io/prom/prometheus:latest
    ports:
    - containerPort: 9090
      hostIP: 127.0.0.1
      hostPort: 9090
    resources: {}
    securityContext:
      capabilities:
        drop:
        - CAP_MKNOD
        - CAP_NET_RAW
        - CAP_AUDIT_WRITE
    volumeMounts:
    - mountPath: /etc/prometheus
      name: srv-prometheus-etc-host-0
    - mountPath: /prometheus
      name: srv-prometheus-data-host-0
  - name: grafana
    image: docker.io/grafana/grafana:latest
    ports:
    - containerPort: 3000
      hostIP: <your external host IP>
      hostPort: 3000
    resources: {}
    securityContext:
      capabilities:
        drop:
        - CAP_MKNOD
        - CAP_NET_RAW
        - CAP_AUDIT_WRITE
      privileged: false
    volumeMounts:
    - mountPath: /var/lib/grafana
      name: srv-grafana-data-host-0
  restartPolicy: unless-stopped
  volumes:
  - hostPath:
      path: /srv/prometheus/etc
      type: Directory
    name: srv-prometheus-etc-host-0
  - hostPath:
      path: /srv/prometheus/data
      type: Directory
    name: srv-prometheus-data-host-0
  - hostPath:
      path: /srv/grafana/data
      type: Directory
    name: srv-grafana-data-host-0
status: {}

The <your external host IP> needs to be replaced with the host IP on which the Grafana dashboard should later be accessible, or with localhost if it should not be reachable via the network.

This yaml file starts three containers:

  • *-infra - this is a podman helper container
  • monitoring-prometheus - this is the prometheus container
  • monitoring-grafana - this is the grafana container

Important to know: inside the pod, localhost is identical for all containers. So Grafana in one container can connect to the Prometheus container via http://localhost:9090, and every process on the host can access e.g. Prometheus via http://localhost:9090, too. But this is not valid for other pods or, in general, for containers outside the pod: since they have their own localhost, you can only access them via the external network interface of the host, or with a more complicated network setup. The latter is still on my TODO list, but since Podman announced that it will discontinue the currently used CNI network stack and switch to its own, I’m waiting for that.
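
Once the pod is running (see below), this is easy to verify from the host with curl, assuming the ports from the yaml file above:

curl http://localhost:9090/-/healthy                  # Prometheus, published on 127.0.0.1
curl http://<your external host IP>:3000/api/health   # Grafana, published on the external host IP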

Directory layout, Configuration files and Permissions

But before we can start the containers, we need to create the necessary directories to store the configuration files and persistent data. The directory layout looks like:

/srv/
  ├── prometheus/etc/prometheus.yml -> the prometheus configuration file
  ├── prometheus/data/ -> for the persistent database
  └── grafana/data/ -> for the persistent grafana data
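
A minimal sketch to create this layout (as root):

mkdir -p /srv/prometheus/etc /srv/prometheus/data /srv/grafana/data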

Besides the existence of the directories, the ownership is just as important. Prometheus writes as user nobody:nobody into /srv/prometheus/data, so this user needs write permissions. This is already a big security design flaw; the user nobody should never have write permissions anywhere. So make sure that really nobody is allowed to look into the /srv/prometheus directory structure. Best is to do

chmod 700 /srv/prometheus
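
The data directory itself must be writable for that user; depending on how you created it, something along these lines may be needed (inside the container, nobody maps to uid/gid 65534):

chown 65534:65534 /srv/prometheus/data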

Grafana is better here: it runs with user ID 472 by default. So make sure that nothing else is using this ID on your system and create a grafana system user for the data ownership:

useradd -r -u 472 -d /srv/grafana/data grafana

This command creates a system account grafana with user ID 472 and a matching grafana group. Afterwards, we make sure that the Grafana data directory is owned by this user, so that the Grafana process can write into it:

chown grafana:grafana /srv/grafana/data

Finally, we need a configuration file for Prometheus. For the start, we monitor Prometheus itself:

global:

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']

This config (stored as /srv/prometheus/etc/prometheus.yml) is very simple: we don’t change any global defaults and tell Prometheus to scrape localhost:9090 (so itself) every 60 seconds (which is the default) and store the metrics under the job name prometheus.
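
If you want a different interval, you could set it explicitly in the global section; a sketch, assuming 30 seconds is desired:

global:
  scrape_interval: 30s   # how often to scrape targets; the default is 1m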

Done!

Run Containers

Now we just need to start the containers:

podman kube play monitoring.yaml

The command podman pod ps should show you one pod:

POD ID        NAME        STATUS      CREATED         INFRA ID      # OF CONTAINERS
63b242775da6  monitoring  Running     39 minutes ago  0e6192eb180c  3

The command podman ps will show you the three containers:

CONTAINER ID  IMAGE                                    COMMAND               CREATED         STATUS             PORTS                                             NAMES
0e6192eb180c  localhost/podman-pause:4.3.1-1669075200                        39 minutes ago  Up 39 minutes ago  127.0.0.1:9090->9090/tcp, 0.0.0.0:3000->3000/tcp  63b242775da6-infra
d9bbdaf48eb8  docker.io/prom/prometheus:latest         --config.file=/et...  39 minutes ago  Up 33 minutes ago  127.0.0.1:9090->9090/tcp, 0.0.0.0:3000->3000/tcp  monitoring-prometheus
96a3e9486239  docker.io/grafana/grafana:latest                               39 minutes ago  Up 35 minutes ago  127.0.0.1:9090->9090/tcp, 0.0.0.0:3000->3000/tcp  monitoring-grafana

Start container with every boot

While the containers are now running, we need to make sure that they will also be started with the next reboot. For this, Podman comes with a very nice and handy systemd service: podman-kube@.service. This service will not only start the pod, but also makes sure that the containers are current and updates them if necessary.

The configuration file, with its complete path, is passed as the argument. The path needs to be escaped, but for this there is systemd-escape.

So the final command to enable the systemd service would be:

systemctl enable "podman-kube@$(systemd-escape /<path>/monitoring.yaml).service"
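
If the pod was not already started manually with podman kube play, --now could be added to start it right away; with a concrete (hypothetical) path this could look like:

systemctl enable --now "podman-kube@$(systemd-escape /srv/monitoring/monitoring.yaml).service"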

Setting up Grafana

Now we can connect to the Grafana dashboard using the URL http://<hostname>:3000.

The grafana container comes with a default login:

  • User: admin
  • Password: admin

For this reason, Grafana enforces a password change at the first login.

Before we can see the first data, there are two important steps to do:

  1. Configure prometheus as Datasource
  2. Create a dashboard for prometheus

Adding prometheus as datasource is simple:

  • Select DATA SOURCES and afterwards Prometheus.
  • Enter http://localhost:9090 in the HTTP URL field. Remember, these are the hostIP and hostPort values of the prometheus container in the yaml file.
  • Go to the end of the page and select Save & test.

This should give you a “Data source is working” message.

Select the Grafana logo in the upper left corner to come back to the main screen; now we need to add a dashboard. Since we want to use an existing one and not create a new one, we do not select DASHBOARDS, but the icon with the four squares on the left side. In the menu we go down and select Import. We add the ID 3662 in the Import via grafana.com box and click on Load. This will import the Prometheus 2.0 Overview dashboard. But before we can finally import it, we need to select the datasource: on the next page, in the prometheus dropdown box, select Prometheus (default) and click on Import.

Now you should see your first dashboard.

Deploying Node Exporter

Monitoring Prometheus itself is not the most exciting task; much more important is monitoring your servers. The most common tool used here is Node Exporter, which is a Prometheus sub-project and is also available as a container.

Podman kube yaml file

First, we need a new yaml file. We could add the container to the existing monitoring.yaml file, but since we want to install it on several servers, it makes sense to create a standalone version of it:

# Save the output of this file and use kubectl create -f
# to import it into Kubernetes.
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: node-exporter
  name: node-exporter
spec:
  containers:
  - args:
    - --path.rootfs=/host
    image: docker.io/prom/node-exporter:latest
    name: node_exporter
    securityContext:
      capabilities:
        drop:
        - CAP_MKNOD
        - CAP_NET_RAW
        - CAP_AUDIT_WRITE
    volumeMounts:
    - mountPath: /host
      name: root-host-0
      readOnly: true
  enableServiceLinks: false
  hostNetwork: true
  volumes:
  - hostPath:
      path: /
      type: Directory
    name: root-host-0

Start Node Exporter

This time we don’t need to create any directories or configuration files, so the command to enable and start the node-exporter is just:

systemctl enable --now "podman-kube@$(systemd-escape /<path>/node_exporter.yaml).service"
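
Node Exporter listens on port 9100 on the host network, so a quick test whether it responds could look like:

curl -s http://localhost:9100/metrics | head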

Add Node Exporter to Prometheus

To teach Prometheus about Node Exporter, we need to add the following lines at the end of the /srv/prometheus/etc/prometheus.yml file, that is, in the scrape_configs section:

  - job_name: 'node'
    static_configs:
    - targets: ['server1.example.com:9100']
    - targets: ['server2.example.com:9100']

We will not create a separate entry for every host; instead, there is one node job, and every host to scrape the metrics from is added as a target.
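
Equivalently, all hosts can be listed in a single targets entry; both forms are valid Prometheus configuration:

  - job_name: 'node'
    static_configs:
    - targets: ['server1.example.com:9100', 'server2.example.com:9100']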

We only need to restart prometheus to make it aware of the config file change:

podman restart monitoring-prometheus
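
Alternatively, Prometheus can reload its configuration without a full restart by sending it a SIGHUP, which keeps the process and its data untouched:

podman kill --signal HUP monitoring-prometheus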

Add Node Exporter Dashboard to Grafana

Now we import the Grafana dashboard with the ID 13978, just like the first dashboard, and you should see a similar overview for your hosts.

Finished

That’s all for the beginning.

In the next blog posts of this series I will explain how to monitor the Fritz!Box including the Smarthome devices (which I use to monitor my balcony power plant) and several IoT devices (power measurement and temperature) with Prometheus, InfluxDB and an MQTT broker.