Getting rid of operational tasks using StackStorm

Tobias Ramm, 17 November 2023

We operate multiple platforms to automate more and more operational processes. People automate things by running scripts locally, executing playbooks in our Ansible Automation Platform, or running workflows in Rundeck to clean up full hard disks or restart services. These automations are used by colleagues working in 24/7 operations to allow fast recovery in the event of service outages.

The following text describes how the StackStorm platform is used to get rid of operational tasks and respond to events automatically.

The StackStorm platform is known in the Schwarz Group as GROOT. GROOT stands for Get Rid Of Operational Tasks.

As part of the Power Up Infrastructure initiative of Schwarz IT, we use GROOT to give the community a platform to automate workflows across the entire infrastructure and beyond.

What is StackStorm?

StackStorm is an open-source automation platform designed to streamline and automate repeatable operational tasks in IT environments. Its flexibility and extensibility make it suitable for a wide range of use cases in IT and beyond. It improves efficiency, reduces errors, and helps teams respond to operational challenges more rapidly. It covers many use cases for automating workflows and orchestrating processes, and it integrates with various systems and tools. 1

StackStorm helps automate common operational patterns. Some examples are:

  • Alerting and Notification Handling - Integrates with monitoring and alerting systems, such as Nagios, Azure Alerts or Prometheus, to respond to alerts automatically. This includes sending notifications, escalating issues, and executing predefined scripts to address problems.
  • Incident Response and Remediation - Automates the detection of and response to incidents, such as security breaches or system outages. When an incident is detected, predefined workflows can be triggered to contain the issue and initiate remediation procedures.
  • Scheduled Maintenance and Routine Tasks - Supports routine maintenance tasks like system backups, patching, log rotation and resource scaling. Tasks can be scheduled and executed, including a validation that each task completed successfully.
  • DevOps Automation - Facilitates the release process and covers aspects of monitored DevOps pipelines, including code deployment, testing, and integration. Actions can be triggered based on code commits.

The sky is the limit

StackStorm helps to compose these and other operational patterns as rules and workflows or actions.
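The idea behind a rule can be sketched in a few lines of Python: an incoming event is compared against a trigger name and an extra condition, and a matching rule names the action to start. This is only a conceptual model, not StackStorm's API. The trigger names, payload fields and the `restart_service` action are hypothetical and chosen to match the examples from the introduction.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    trigger: str                      # event type the rule listens for
    criteria: Callable[[Dict], bool]  # extra condition on the event payload
    action: str                       # action or workflow to start on a match


def match_rules(trigger: str, payload: Dict, rules: List[Rule]) -> List[str]:
    """Return the actions of all rules whose trigger and criteria match."""
    return [
        rule.action
        for rule in rules
        if rule.trigger == trigger and rule.criteria(payload)
    ]


# Hypothetical rules: names and thresholds are placeholders.
rules = [
    Rule("disk_alert", lambda p: p.get("usage_percent", 0) >= 90, "clean_up_disk"),
    Rule("service_alert", lambda p: p.get("state") == "down", "restart_service"),
]

if __name__ == "__main__":
    print(match_rules("disk_alert", {"usage_percent": 95}, rules))
```

In StackStorm itself the trigger, the criteria and the action are declared in a rule file rather than in code, so the sketch only mirrors the decision that the rule engine takes for each event.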

Packs are formed out of rules, workflows, test cases, sensors and triggers defined in YAML format and stored as code in git repositories. This supports the same collaboration approach that is used for code development. Packs can be shared with and reused from the StackStorm community, or kept in your private git repositories.

Example Use Case

StackStorm is used for the Linux package update workflow of the virtual machines in our STACKIT cloud project. To execute the workflow on a schedule, a cron trigger is created in the StackStorm rule engine.
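As a rough illustration, such a rule could look like the following sketch. It is written as a Python dictionary that mirrors the layout of a StackStorm rule file. `core.st2.CronTimer` is StackStorm's built-in cron trigger; the rule name, the pack name `patching`, the schedule and the workflow reference `patching.linux_update_workflow` are hypothetical placeholders and not the names used in GROOT.

```python
import json

# Hypothetical rule: every name except the trigger type is a placeholder.
rule = {
    "name": "weekly_linux_updates",
    "pack": "patching",
    "description": "Start the Linux package update workflow on a schedule.",
    "enabled": True,
    "trigger": {
        "type": "core.st2.CronTimer",  # built-in cron-style timer trigger
        "parameters": {
            "timezone": "UTC",
            "day_of_week": "sun",  # assumed schedule: Sundays at 02:00 UTC
            "hour": 2,
            "minute": 0,
            "second": 0,
        },
    },
    "action": {
        "ref": "patching.linux_update_workflow",  # assumed workflow reference
        "parameters": {},
    },
}


def render_rule(rule_definition: dict) -> str:
    """Serialize the rule so that it can be reviewed and stored as code."""
    return json.dumps(rule_definition, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render_rule(rule))
```

In a pack, the same structure would normally be written as a YAML file in the rules directory and versioned in git together with the workflow it starts.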

The action server-list of the stackit pack retrieves information about the VMs in the STACKIT project. While iterating over the stored server list, various actions are performed. Commands on remote systems are executed using actions of the core pack provided by StackStorm.

Before installing the patches, the action clean_up_disk removes cached packages and temporary files. The Linux updates are then installed using the action execute_linux_updates. After that, a check determines whether a reboot is necessary. If a reboot is required, a downtime is set in Nagios and the reboot takes place. Finally, a notification including the workflow results is sent to a chat.
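The order of these steps can be sketched in plain Python. This is not the actual workflow definition, only a model of its control flow. The step names `clean_up_disk` and `execute_linux_updates` come from the description above, while `reboot_required`, `set_downtime`, `reboot` and the `notify` callback are hypothetical stand-ins for the corresponding checks and actions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

# An "action" here is any callable that takes a host name.
Action = Callable[[str], object]


@dataclass
class PatchReport:
    host: str
    steps: List[str] = field(default_factory=list)
    rebooted: bool = False


def patch_host(host: str, actions: Dict[str, Action]) -> PatchReport:
    """Run the update steps for one VM in the order described in the text."""
    report = PatchReport(host=host)
    for step in ("clean_up_disk", "execute_linux_updates"):
        actions[step](host)
        report.steps.append(step)
    # Reboot only if the check says so, and set the downtime first.
    if actions["reboot_required"](host):
        actions["set_downtime"](host)
        actions["reboot"](host)
        report.steps += ["set_downtime", "reboot"]
        report.rebooted = True
    return report


def run_update_workflow(
    hosts: Iterable[str],
    actions: Dict[str, Action],
    notify: Callable[[List[PatchReport]], object],
) -> List[PatchReport]:
    """Patch every host, then send one notification with all results."""
    reports = [patch_host(host, actions) for host in hosts]
    notify(reports)
    return reports


if __name__ == "__main__":
    # Stub actions that only print what they would do.
    names = ("clean_up_disk", "execute_linux_updates", "set_downtime", "reboot")
    stubs: Dict[str, Action] = {
        name: (lambda host, n=name: print(f"{n} on {host}")) for name in names
    }
    stubs["reboot_required"] = lambda host: host.endswith("-b")
    run_update_workflow(["vm-a", "vm-b"], stubs, lambda reports: print(reports))
```

Passing the actions in as callables keeps the sequence testable without touching real machines, which is similar in spirit to composing a workflow from reusable pack actions.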

In case of errors, a generic alarm handler workflow sends alerts and notifications or creates issues in ticket systems.
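A generic handler of this kind can be pictured as a wrapper that catches a failure and passes the same message to every configured channel. The following is a minimal sketch under that assumption; the function name and the list-based "channels" are hypothetical and only stand in for real alerting, chat or ticket integrations.

```python
from typing import Callable, Iterable


def run_with_alarm_handler(
    task_name: str,
    task: Callable[[], object],
    handlers: Iterable[Callable[[str], object]],
) -> bool:
    """Run task(); on failure pass a short alarm message to every handler.

    Returns True if the task succeeded and False if an alarm was raised.
    """
    try:
        task()
    except Exception as exc:
        message = f"{task_name} failed: {exc}"
        for handle in handlers:
            handle(message)
        return False
    return True


if __name__ == "__main__":
    alerts, tickets = [], []

    def failing_task() -> None:
        raise RuntimeError("disk full")

    ok = run_with_alarm_handler(
        "clean_up_disk", failing_task, [alerts.append, tickets.append]
    )
    print(ok, alerts, tickets)
```

Because the handler does not need to know which step failed, one such wrapper can serve many workflows.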

Conclusion

StackStorm is a versatile automation and orchestration platform that is well suited for IT and operations automation, especially in complex, event-driven scenarios. Thanks to its open-source nature, its extensibility and its support for custom integrations, the platform is a strong choice for organizations with diverse automation needs.

For almost all of our central services, integrations were either developed in-house or taken from StackStorm Exchange. Almost all products can be linked with each other in workflows to execute comprehensive tasks automatically.

This helps to save a lot of time.

Further Links

Sources