How we extend Ansible to enable complex configuration management

September 14, 2021

Tom Martensen

Tom Martensen is a site reliability engineer in the IT operations team at Data4Life. He has a background in software engineering and drives initiatives to increase automation and reduce toil within the ITO team.

In the cloud era with on-demand virtual machines it’s more important than ever to reduce configuration drift between servers. Configuration drift can lead to unexpected and unintended side effects, or worse, unexplainable outages.

One way to tackle this is to manage infrastructure as code (IaC), where all configuration is stored in a human- and machine-readable version control system.

In this blog post, the first in our series on configuration management, we take a look at the agentless automation engine Ansible and how it can be used to both deliver configuration to, and provision, applications.

Motivation

Ansible is an agentless automation engine that executes tasks within playbooks on a group of remote servers. In our data centers, we use Ansible to configure virtual machines after they're created in OpenStack.

We write playbooks, variables and tasks in YAML. Our playbooks provision database servers, let Kubernetes nodes join existing clusters, and place logging configuration.

Ansible playbooks are meant to describe configuration in an idempotent and declarative way, which makes complex, imperative processes difficult to implement.

Of course, imperative processes should be avoided at all costs in configuration management, but they do appear in real-world situations.

A typical Ansible task

To explain some of the important terminology, we’ll use the example code snippet below.

During the setup of a Postgres server, we must ensure that the configuration files for the client authentication exist with the expected content at a certain location. We do this with the builtin template module as a task:

- name: Update pg_hba.conf
  template:
    src: pg_hba.conf.j2
    dest: "/etc/postgresql/pg_hba.conf"
    owner: "postgresql"
    group: "postgres"

This step is repeatable (it doesn't have unexpected side effects) and abstracts away the internal steps that are executed on the host and the remote server.

Tasks like this use Ansible collections, which are versioned bundles of reusable modules and entire roles for simplified distribution and test-driven development.

Even for an Ansible expert who's new to this particular playbook, it's very clear what will happen.

Shortcomings

However, this format shows significant shortcomings where complex steps are required to achieve a configuration.

Our example here is requesting a certificate from our public key infrastructure (PKI). In pseudocode these steps must be executed:

1. Compile the certificate request with a comma-separated list of domains for which the certificate should be valid, matching the format that our HashiCorp Vault PKI expects.
2. Send an authenticated request to the Vault API with the certificate request.
3. Parse the certificate, private key, and issuing certificate authority (CA) from the response, or fail with an understandable error message.
4. Verify the certificate chain.
5. Return the certificate, private key, and issuing CA as variables so that they can be placed on each of the servers.

We observe the following issues that would be difficult to implement with the declarative logic of Ansible:

Multiple points of failure and error handling, for example, failed requests to Vault, verification of the certificate
Optional variables or variable preprocessing, for example, the time to live (TTL) or the comma-separated list of domain names
Return values that must be passed between the steps through Ansible variables
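
As a rough illustration of the preprocessing and error handling listed above, here is a minimal Python sketch. It doesn't contact Vault and it skips the chain verification. The function names `build_cert_request` and `parse_cert_response` are hypothetical, and the field names (`common_name`, `alt_names`, `ttl`, `certificate`, `private_key`, `issuing_ca`) are assumptions about the Vault PKI request and response format that may need adjusting for a given setup.

```python
def build_cert_request(domains, ttl=None):
    """Compile the request body from a list of domain names (hypothetical helper)."""
    if not domains:
        raise ValueError("At least one domain is required.")
    # Assumption: the first domain becomes the common name and the
    # remaining ones are sent as one comma-separated string.
    payload = {"common_name": domains[0]}
    if len(domains) > 1:
        payload["alt_names"] = ",".join(domains[1:])
    # The TTL is optional and only sent when the caller provides it.
    if ttl is not None:
        payload["ttl"] = ttl
    return payload


def parse_cert_response(response):
    """Extract certificate, key, and issuing CA, or fail with a clear message."""
    wanted = ("certificate", "private_key", "issuing_ca")
    data = response.get("data") or {}
    missing = [key for key in wanted if not data.get(key)]
    if missing:
        raise ValueError("Vault response is missing: " + ", ".join(missing))
    return {key: data[key] for key in wanted}
```

Each of these small functions has its own failure path and return value, which is exactly the kind of logic that's awkward to express as a chain of declarative tasks.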

Solution

As site reliability engineers, we're software engineers at heart and strive for simple solutions, using the right tool for the job. The problems presented (error handling, conditional branching, functions) call for an imperative implementation.

Fortunately, Ansible has added support for custom modules, which can be used to extend the builtin functionality with Python code. Here’s how the Ansible developer documentation describes modules:

A module is a reusable, standalone script that Ansible runs on your behalf, either locally or remotely. Modules interact with your local machine, an API, or a remote system to perform specific tasks like changing a database password or spinning up a cloud instance. Each module can be used by the Ansible API, or by the ansible or ansible-playbook programs. A module provides a defined interface, accepts arguments, and returns information to Ansible by printing a JSON string to stdout before exiting. - Ansible developer documentation
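
To make the last sentence of that description more concrete, here's a minimal sketch of the output contract without any Ansible helper code. The function name `format_result` is hypothetical; in a real module, the `exit_json()` and `fail_json()` methods of `AnsibleModule` produce this output for you.

```python
import json


def format_result(changed=False, failed=False, msg=""):
    """Build the JSON string that a module prints to report its result."""
    result = {"changed": changed}
    if failed:
        result["failed"] = True
    if msg:
        result["msg"] = msg
    return json.dumps(result)


# A module reports back by printing one JSON document to stdout.
print(format_result(changed=False, msg="nothing to do"))
```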

While there are already thousands of modules in the Ansible Core, the community has implemented many more in collections. Of course, we can do the same!

Minimal Ansible module structure

Below is a minimal Ansible module. In this example, we check that an expected mount point exists, meaning a device is mounted on our server.

#!/usr/bin/python3

from ansible.module_utils.basic import AnsibleModule


def run_module():
    module_args = dict(
        expected_mount=dict(type='str', required=True),
        current_mounts=dict(type='list', elements='dict', required=True)
    )
    result = dict(changed=False)
    module = AnsibleModule(
        argument_spec=module_args,
        supports_check_mode=True
    )
    expected_mount = module.params['expected_mount']
    current_mounts = module.params['current_mounts']

    # Check if expected mount exists
    mount_points = [x["mount"] for x in current_mounts]
    if expected_mount not in mount_points:
        failure_message = f"Your mount point {expected_mount} does not exist."
        module.fail_json(msg=failure_message, **result)

    module.exit_json(**result)


def main():
    run_module()


if __name__ == '__main__':
    main()

This is implemented within the run_module() function. First, we define the input parameters (lines 7-10) as a string for the expected_mount and a list of dictionaries for the current_mounts, detailing the mount point, format, and mounted device.

After some more initialization in lines 11 to 17, the actual implementation begins.

We check in line 21 if our expected_mount is within the list of the current mount points. If that isn’t the case, we construct a failure message and let our module fail (lines 22 and 23).

Using the built-in fail_json() method, we tell the Ansible controller that this task has failed. Depending on how the module was called, this can now interrupt the entire playbook or be ignored.

In the success path, we call the exit_json() method, signalling to Ansible that the task was successfully executed.
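
One way to keep a module like this easy to test is to move the check into a plain function and decide between `fail_json()` and `exit_json()` based on its return value. The sketch below is a suggestion rather than part of the module above; `mount_exists` and the sample data are hypothetical, and the sample entries only mimic the `mount` key that the module reads.

```python
def mount_exists(expected_mount, current_mounts):
    """Return True if expected_mount is one of the current mount points."""
    mount_points = {entry["mount"] for entry in current_mounts}
    return expected_mount in mount_points


# Hypothetical sample data shaped like a list of mount dictionaries.
sample_mounts = [
    {"mount": "/", "device": "/dev/vda1", "fstype": "ext4"},
    {"mount": "/data", "device": "/dev/vdb1", "fstype": "ext4"},
]

print(mount_exists("/data", sample_mounts))
print(mount_exists("/backup", sample_mounts))
```

Because the function doesn't depend on `AnsibleModule`, it can be covered by ordinary unit tests.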

There's a whole lot of recommended documentation and best practices for Ansible modules. To find out more about modules, see the Ansible developer documentation. And if you're going to implement your own modules, we highly recommend using the ansible-test sanity command to validate your syntax.
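
Part of that recommended documentation lives in the module file itself, as YAML strings named `DOCUMENTATION`, `EXAMPLES`, and `RETURN`. A shortened sketch for the mount check module might look like the following; the descriptions and the author placeholder are illustrative wording and aren't taken from our actual module.

```python
DOCUMENTATION = r'''
---
module: mount_check
short_description: Check that an expected mount point exists
description:
  - Fails if the expected mount point is not among the given mounts.
options:
  expected_mount:
    description: Path of the mount point that must exist.
    type: str
    required: true
  current_mounts:
    description: List of mounts, for example the ansible_mounts fact.
    type: list
    elements: dict
    required: true
author:
  - Your Name (@yourhandle)
'''

EXAMPLES = r'''
- name: Ensure that mount point exists
  mount_check:
    expected_mount: "/data"
    current_mounts: "{{ ansible_mounts }}"
'''
```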

Now that we've finished implementing our module, we want to use it in our Ansible playbook. To do that, we store the module code on the host as ~/.ansible/plugins/modules/mount_check.py.

We can then call it from our Ansible playbook or role as a task:

- name: Ensure that mount point exists
  mount_check:
    expected_mount: "/data"
    current_mounts: "{{ ansible_mounts }}"

Here, we check for the existence of the /data mount point within the ansible_mounts variable, which contains the mounted volumes on the remote server. Ansible fills this variable automatically when it gathers facts.

And that’s it! It looks quite similar to how we used the template module above, doesn’t it?

What’s next?

Hopefully, you’re now aware of the necessary steps to extend the configuration management tool Ansible with a custom module to enable custom and complex logic.

Look out for the next blog post in this series where we’ll show how to create and distribute an Ansible collection for this module.
