How to collect and visualize logs with Argus (Promtail, Loki and Grafana)

Mathias Darscht, 18 October 2022

Freedom from disruption has always been important in IT, and observability has been discussed a lot over the last few years. This year a new managed service for tracking software and infrastructure in real time went live: STACKIT Argus. This article covers tracking logs with Argus, which is one part of the observability topic.

Observability

What is observability? In the software industry, observability means making a (distributed) software system transparent in order to identify and solve problems and to optimize performance and the customer experience. For observing software systems, a CNCF standard was defined: OpenTelemetry. The standard defines the data exchange between applications and monitoring platforms in the following main categories:

Metrics

A metric is a representation that combines structured data with a value. Metrics sometimes hold additional data such as a timestamp, a name or a KPI. Metrics are measured over a period of time. The most common metrics are the golden signals:

  1. Latency: The time it takes to service a request.
  2. Traffic: A measure of how much demand is being placed on your system.
  3. Errors: The rate of requests that fail.
  4. Saturation: How ‘full’ your service is.
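
As a rough illustration, three of the four signals can be derived from a list of request records. The following Python sketch uses a hypothetical Request record and a hypothetical golden_signals function; neither is part of Argus or of any library. Saturation is left out because it depends on the resource limits of the concrete service.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Request:
    duration_ms: float  # time it took to serve the request
    failed: bool        # True if the request ended in an error


def golden_signals(requests: list, window_s: float) -> dict:
    """Derive latency, traffic and error rate for one observation window."""
    if not requests:
        return {"latency_ms": 0.0, "traffic_rps": 0.0, "error_rate": 0.0}
    return {
        "latency_ms": mean(r.duration_ms for r in requests),
        "traffic_rps": len(requests) / window_s,
        "error_rate": sum(r.failed for r in requests) / len(requests),
    }


# Four made-up requests observed within a two-second window.
sample = [
    Request(120.0, False),
    Request(80.0, False),
    Request(400.0, True),
    Request(200.0, False),
]
signals = golden_signals(sample, window_s=2.0)
```

In practice a mean hides outliers, so latency is often reported as a percentile instead; the mean is used here only to keep the sketch short.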

Traces

Traces collect information about a single request while it runs through the distributed system. They consist of one or many spans which are linked by a trace ID. This trace ID is passed along with the request through the system and allows end-to-end (E2E) tracking.
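
The relation between spans and a trace can be sketched in a few lines of Python. The Span record and the helper functions below are hypothetical and only model the idea; real tracing systems such as OpenTelemetry SDKs carry more fields per span.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str  # shared by all spans that belong to one request
    service: str   # service that produced the span
    start_ms: int
    end_ms: int


def group_by_trace(spans):
    """Reassemble traces: collect all spans with the same trace id, ordered by start time."""
    traces = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    for trace in traces.values():
        trace.sort(key=lambda s: s.start_ms)
    return dict(traces)


def trace_duration_ms(trace):
    """End-to-end duration of one request across all services."""
    return max(s.end_ms for s in trace) - min(s.start_ms for s in trace)


# Spans arrive unordered and mixed from different services and requests.
spans = [
    Span("trace-a", "database", 15, 40),
    Span("trace-b", "gateway", 5, 20),
    Span("trace-a", "gateway", 0, 50),
    Span("trace-a", "orders", 10, 45),
]
traces = group_by_trace(spans)
```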

Logs

Logs here means log lines. A log line is usually unstructured and unformatted text. It contains a record of an event that the developer explicitly programmed into the application. Logs mostly contain a timestamp and a message, and sometimes information about the machine's operation. Metadata, for example an error code, can give the logs more structure. Logs can have multiple log levels, such as:

  • INFO
  • WARN
  • ERROR
  • DEBUG

Those log levels are part of the metadata, which is defined by the developer.
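
A minimal sketch of such log lines, using Python's standard logging module: each line carries a timestamp, a log level, a message and, as additional metadata, an error code. The logger name, the messages and the error_code field are made up for illustration.

```python
import io
import logging

# Write into an in-memory buffer so the produced lines can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s code=%(error_code)s %(message)s")
)

logger = logging.getLogger("sample-app")
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False  # keep the example output out of the root logger

# The developer decides on level, message and metadata for every event.
logger.info("service started", extra={"error_code": "-"})
logger.error("payment failed", extra={"error_code": "E1042"})

lines = stream.getvalue().splitlines()
```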

What is STACKIT Argus?

STACKIT is the public European cloud platform of Schwarz IT. Schwarz IT is responsible for digitalisation within the Schwarz Group. With Schwarz IT and STACKIT, external and internal customers like LIDL, Kaufland or PreZero can have a state-of-the-art cloud infrastructure. Argus is a managed service of STACKIT that provides a powerful observability toolset. The goal of this service is to collect and visualize data and to observe the application.

Argus consists of open source components that are not developed by Argus. The components that are important for logging are:

  • Loki - for saving logs
  • Grafana - for visualizing the logs

To get the logs into Loki, Promtail is required.

Loki

Loki is a multi-tenant log aggregation system and the log storage component of Argus. It is horizontally scalable and built to be highly available. Loki's query language LogQL is inspired by Prometheus's PromQL but was created for logs, whereas Prometheus and PromQL were created for metrics. To store the logs, Loki uses a multidimensional, label-based approach to indexing: it indexes the labels attached to the logs rather than the log content itself.
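
The idea of label-based indexing can be illustrated with a toy model in Python. This is not how Loki is implemented internally; the ToyLabelIndex class is hypothetical and only shows that log lines are grouped by their label set and selected by matching labels, without looking into the log text.

```python
from collections import defaultdict


class ToyLabelIndex:
    """Toy model: log lines are stored per unique label set and found via labels only."""

    def __init__(self):
        # key: frozenset of (label, value) pairs, value: log lines in arrival order
        self._streams = defaultdict(list)

    def push(self, labels, line):
        self._streams[frozenset(labels.items())].append(line)

    def select(self, **matchers):
        """Return the lines of all label sets that contain every given label=value pair."""
        wanted = set(matchers.items())
        result = []
        for labels, lines in self._streams.items():
            if wanted <= labels:
                result.extend(lines)
        return result


index = ToyLabelIndex()
index.push({"job": "api", "env": "prod"}, "request served")
index.push({"job": "api", "env": "dev"}, "debugging enabled")
index.push({"job": "worker", "env": "prod"}, "job finished")

prod_lines = index.select(env="prod")
api_prod_lines = index.select(job="api", env="prod")
```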

Grafana

Grafana is used to visualize the collected data. Sources for the visualization can be other components of Argus like Prometheus, Loki or Thanos, or external sources like InfluxDB, PostgreSQL or Graphite. In Grafana the logs can be queried with query languages such as LogQL, restructured via transformations, and arranged and visualized in dashboards made of panels.

Promtail

To deliver the data to Argus's Loki instance, a Promtail agent is required. Promtail is not a component of Argus but is installed on the system on which the monitored application is running. It watches the configured directories for the specified log files. When new log entries appear, Promtail sends them to the Loki instance of Argus. Promtail is configured in the config.yml file, which is located at /etc/promtail/config.yml.
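
How an agent can pick up only the new entries of a log file may be sketched as follows. This is a simplified Python model of the idea, not Promtail's implementation; the read_new_lines function and the file content are hypothetical. The agent remembers a byte offset per file and continues reading from there.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return the complete lines appended after byte `offset` and the new offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only up to the last newline, so a half-written line is left for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "test.json")

    # The application writes a first entry.
    with open(log_path, "wb") as f:
        f.write(b'{"event":"first"}\n')
    first_batch, position = read_new_lines(log_path, 0)

    # Later it appends one complete and one still incomplete entry.
    with open(log_path, "ab") as f:
        f.write(b'{"event":"second"}\n{"event":"par')
    second_batch, position = read_new_lines(log_path, position)
```

Persisting the offset between runs is what allows an agent to resume after a restart without sending entries twice.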

The composition in combination with a sample application could look like this:

The observed application writes logs to certain directories. Promtail watches those directories and collects all specified logs. It then sends those logs to the Loki component of Argus, and Loki stores them. To visualize or analyze the logs, Grafana connects to Loki and retrieves them.
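
The step from Promtail to Loki can be made more concrete by looking at the data that is pushed. The Python sketch below builds a request body in the JSON format accepted by Loki's HTTP push endpoint (/loki/api/v1/push): a list of streams, each with a label set and a list of [timestamp in nanoseconds, log line] pairs. Promtail does this on its own; the build_push_payload function, the label values and the timestamp are assumptions for illustration, and nothing is sent over the network.

```python
import json


def build_push_payload(labels, lines, timestamp_ns):
    """Build a JSON-serializable body in the format of Loki's HTTP push API."""
    return {
        "streams": [
            {
                "stream": labels,  # label set that identifies the log stream
                "values": [[str(timestamp_ns), line] for line in lines],
            }
        ]
    }


payload = build_push_payload(
    labels={"job": "log-file-uploader", "filename": "/home/logs/test.json"},
    lines=['{"yourFirstLabel":"yourFirstValue"}'],
    timestamp_ns=1666080000000000000,  # example timestamp in nanoseconds since the Unix epoch
)
body = json.dumps(payload)
```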

Example - Get Logs to Argus

As in the diagram above, an application to be observed is needed. For this example, let's pretend there is an application that writes its logs to a file called test.json in the directory /home/logs. In addition, Promtail has to be installed on this system. Next, the config.yml file is set up. It can look like this:

server: # Promtail's own HTTP/gRPC server on the system where the application is running
  http_listen_port: 9090 # Promtail listens on port 9090 of that system
  grpc_listen_port: 0 # 0 means a random free port is used
  log_level: "debug"

positions:
  filename: /tmp/positions.yml # helps Promtail to continue reading from where it left off when the Promtail instance restarts

clients: # where Promtail pushes the logs to: Argus's instance of Loki
  - url: https://<username>:<password>@logs.stackit.argus.eu01.stackit.cloud/instances/<instanceId>/loki/api/v1/push # this URL is returned by the Argus API under /v1/instances/<instanceId> as "logsPushUrl"

scrape_configs: # configuration of what to scrape and where
- job_name: logcollector
  static_configs:
  - targets:
      - localhost # Promtail only reads files on the machine it is running on
    labels:
      job: log-file-uploader
      __path__: /home/logs/*json # glob for the files to observe: all files in /home/logs whose name ends in 'json'

More configuration options are described in the Promtail documentation. With this configuration in place and Promtail running, the log lines that the application writes to the test.json file are sent to the Argus instance. The logs of the sample application are pretty simple and look like this:

{"yourFirstLabel":"yourFirstValue","yourSecondLabel":"yourSecondValue"},
{"yourFirstLabel":"yourFirstValue","yourSecondLabel":"yourSecondValue"},
{"yourFirstLabel":"yourFirstValue","yourSecondLabel":"yourSecondValue"},
{"yourFirstLabel":"yourFirstValue","yourSecondLabel":"yourSecondValue"}

To check this, the Grafana instance of Argus has to be accessed. A new dashboard with a panel has to be created.

To access the logs, Loki has to be selected as the data source. To find the entries, the following query has to be entered in the query field:

{filename="/home/logs/test.json"}

In addition, the visualization should be set to Logs. The result looks like the following:

The fields from the test.json file are inside the 'Detected fields' section and not as accessible as they could be. LogQL has a 'parser expression' for this case and the query changes to the following:

{filename="/home/logs/test.json"} | json

With the | json parser expression, each log line is passed to a JSON parser, which makes the fields accessible. The visualization changes to:

The fields are now available in the search and for transformations that restructure the data further. Besides | json there is also the | regexp parser expression, in which a regular expression can be defined to structure the logs. Most of the remaining structuring can be done with the help of transformations.
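
What the JSON and the regular-expression based extraction do with a log line can be approximated in Python. The sketch below is a simplified model, not Loki's code: parse_json_line turns the keys of a JSON log line into fields (nested keys are joined with an underscore and arrays are skipped, following the behaviour described in the Loki documentation), and extract_fields uses named capture groups to pull fields out of an unstructured line. Both function names, the pattern and the second sample line are hypothetical.

```python
import json
import re


def flatten(obj, prefix=""):
    """Flatten nested JSON objects into field names joined by '_'."""
    fields = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            fields.update(flatten(value, name))
        elif isinstance(value, list):
            continue  # arrays are skipped in this model
        else:
            fields[name] = str(value)
    return fields


def parse_json_line(line):
    """Model of JSON extraction: return the extracted fields, or none for invalid JSON."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError:
        return {}
    return flatten(obj) if isinstance(obj, dict) else {}


# Named groups become field names, comparable to a regular expression with named captures.
LINE_PATTERN = re.compile(r"(?P<level>[A-Z]+) (?P<code>E\d+) (?P<message>.*)")


def extract_fields(line):
    """Model of regular-expression extraction for unstructured lines."""
    match = LINE_PATTERN.search(line)
    return match.groupdict() if match else {}


json_fields = parse_json_line('{"yourFirstLabel":"yourFirstValue","request":{"method":"GET"}}')
text_fields = extract_fields("2022-10-18T09:00:00Z ERROR E1042 payment failed")
```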

Here are some transformation functions:

  1. Add field from calculation
  2. Concatenate fields
  3. Config from query results

Conclusion

Argus is a powerful observability stack based on open source components and is available on the STACKIT cloud. Keeping track of an application's logs with Argus takes only a few steps, as shown in the example above: sending, structuring and visualizing the data. However, querying logs with Argus, especially unstructured logs, is much slower than querying metrics. Argus is better suited to querying large amounts of metrics. Whenever it is possible to transform logs into metrics before sending them to Argus, doing so is an advantage.
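
Transforming logs into metrics before sending them can be as simple as counting. The following Python sketch, under the assumption that the application writes one JSON object per line with a level field, reduces a batch of log lines to one counter per log level. The count_by_field function and the sample lines are hypothetical; how the resulting numbers are exposed as metrics depends on the setup.

```python
import json
from collections import Counter


def count_by_field(lines, field):
    """Reduce JSON log lines to a counter per value of the given field."""
    counts = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # lines that are not valid JSON are ignored in this sketch
        if isinstance(record, dict) and field in record:
            counts[str(record[field])] += 1
    return counts


log_lines = [
    '{"level":"INFO","message":"service started"}',
    '{"level":"ERROR","message":"payment failed"}',
    '{"level":"INFO","message":"request served"}',
    "unstructured line without a level",
]
level_counts = count_by_field(log_lines, "level")
```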