We recently learned about Prometheus, a tool for monitoring processes and services. It can be configured to collect data and store it in its internal time series database, and it even comes with some basic dashboard-building functions. Many services expose a metrics page with some information about themselves, and Prometheus can be configured to scrape those pages.
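To give an idea of what such a metrics page contains and what Prometheus reads from it, here is a minimal Python sketch that parses the basic Prometheus text exposition format into a dictionary. It is a simplification for illustration only. The metric names are made up, timestamps and escaped label values are not handled, and this is not how Prometheus itself parses pages.

```python
from __future__ import annotations

# A made-up metrics page in the Prometheus text exposition format.
EXAMPLE_PAGE = """\
# HELP demo_jobs Number of jobs per state (hypothetical metric).
# TYPE demo_jobs gauge
demo_jobs{state="running"} 3
demo_jobs{state="failed"} 1
"""


def parse_exposition(text: str) -> dict[str, float]:
    """Parse simple 'name{labels} value' lines into {series: value}.

    Comment lines (# HELP, # TYPE) and blank lines are skipped. Optional
    timestamps and other edge cases of the full format are NOT handled.
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token; everything before it
        # is the metric name plus its optional {label="value"} set.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


print(parse_exposition(EXAMPLE_PAGE))
# {'demo_jobs{state="running"}': 3.0, 'demo_jobs{state="failed"}': 1.0}
```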
We would love to gather information from Luigi, e.g. how many jobs are currently running or how many failed jobs exist at any given time. Unfortunately, Luigi does not have such a metrics page, so we came up with a solution.
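As an illustration of the kind of page we are missing, the sketch below renders per-status task counts as a Prometheus gauge. The metric name `luigi_tasks`, the `status` label and the counts are hypothetical choices for this example. Neither Luigi nor Luigimetrics defines them.

```python
from __future__ import annotations


def render_task_metrics(counts: dict[str, int]) -> str:
    """Render per-status task counts as a Prometheus text-format gauge.

    `luigi_tasks` and its `status` label are hypothetical names.
    """
    lines = [
        "# HELP luigi_tasks Number of Luigi tasks per status.",
        "# TYPE luigi_tasks gauge",
    ]
    # Sort by status so the output is stable between scrapes.
    for status, count in sorted(counts.items()):
        lines.append(f'luigi_tasks{{status="{status}"}} {count}')
    # The text format requires the last line to end with a line feed too.
    return "\n".join(lines) + "\n"


print(render_task_metrics({"running": 3, "failed": 1}), end="")
```

Running it prints:

```
# HELP luigi_tasks Number of Luigi tasks per status.
# TYPE luigi_tasks gauge
luigi_tasks{status="failed"} 1
luigi_tasks{status="running"} 3
```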
We created a package, Luigimetrics, that starts a service. Whenever this service is called, it loads the current state of the Luigi server in a headless web browser and pulls all the relevant information from it. This information is then returned in a format Prometheus can read. Here is a sketch of how Luigimetrics is used, including its dependencies:
To start Luigimetrics, you need its dependencies installed, namely PhantomJS, Selenium and Flask. PhantomJS must either be on your PATH, or you need to tell Selenium where to find it. Flask can be configured to run on any port; the default is 5000. You also have to configure where Luigimetrics can find your Luigi server, and of course you need to tell Prometheus where to find Luigimetrics.

We usually use conda to set up environments and install dependencies. To make this a bit easier, we created a Dockerfile that installs all the dependencies and sets everything up. By default it looks for Luigi on localhost, port 8082, which you can of course change to whatever you need. Here is how to build and run it:
$ docker build -t luigimetrics .
$ docker run --network="host" --rm luigimetrics
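One way to make the two settings mentioned above (the Luigi server address and the Flask port) configurable is to read them from environment variables, falling back to the defaults from this post. This is a minimal sketch: the variable names `LUIGI_URL` and `LUIGIMETRICS_PORT` are hypothetical and not necessarily what Luigimetrics or its Dockerfile actually uses.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    luigi_url: str
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from (hypothetical) environment variables.

    Defaults follow the post: Luigi on localhost:8082, Flask on port 5000.
    """
    luigi_url = env.get("LUIGI_URL", "http://localhost:8082")
    port = int(env.get("LUIGIMETRICS_PORT", "5000"))
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    # Drop a trailing slash so paths can be appended safely later.
    return Settings(luigi_url=luigi_url.rstrip("/"), port=port)


print(load_settings({}))
```

The resulting port could then be passed to Flask via `app.run(port=settings.port)`. With Docker, such variables would be set with `docker run -e ...`, assuming the image were set up to read them.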
The package lives in a separate repository on our GitHub, which also has more information about how to use it. The repository also contains a YAML file for configuring Prometheus. When we find the time, we may upload a ready-to-use Docker image to Docker Hub.

We also thought of a few ways this problem could be solved better. First of all, if Luigi had its own metrics page, we wouldn't need a separate package! As far as we know there is no such page yet; the only thing we found was this GitHub issue. Alternatively, instead of exposing a separate metrics page, the metrics data could be pushed to Prometheus. PhantomJS has not been maintained since earlier this year, so we could replace it with e.g. headless Firefox. We also read about Luigi's task history feature, which might be usable here. Finally, we stopped once the metrics were loaded into Prometheus and used Prometheus' expression browser as a dashboard. For better visualization, we could add a predefined Grafana configuration.
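As a rough idea of the push variant: Prometheus itself normally scrapes, so pushing usually goes through a Prometheus Pushgateway, a separate component that this post does not mention. The Pushgateway then gets scraped in turn. This sketch only builds the HTTP request and does not send it. The gateway address (9091 is the Pushgateway's default port), the job name `luigi` and the metric line are placeholders.

```python
import urllib.parse
import urllib.request


def build_push_request(gateway: str, job: str, body: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request to a Prometheus Pushgateway.

    PUT /metrics/job/<job> replaces all metrics previously pushed for that
    job group. `body` must be in the Prometheus text exposition format.
    """
    # URL-encode the job name so unusual characters cannot break the path.
    url = f"{gateway.rstrip('/')}/metrics/job/{urllib.parse.quote(job, safe='')}"
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


req = build_push_request("http://localhost:9091", "luigi", 'luigi_tasks{status="failed"} 1\n')
print(req.get_method(), req.full_url)
# Actually sending it needs a running Pushgateway:
# urllib.request.urlopen(req, timeout=5)
```

Using PUT rather than POST means metrics that disappear from one push to the next are removed from the gateway instead of lingering there.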

