Tuesday, December 29, 2020

Why Automate? Ansible Playbooks and Desired State for Network Operating Systems

Don't Reinvent the Wheel: Ansible Playbooks

Writing your own code isn't always the answer

Often, communities such as Python's will contribute code of substantially higher quality than what you or I can create individually.

This is OK. In nearly every case, dyed-in-the-wool traditionalist programmers will consume "libraries" in their language of choice - it's only an outsider perspective that developers create everything they use.

In modern engineering, a true engineer or architect will often apply practices they studied in college to real-world situations instead of trying to create their own solutions. This doesn't discount creativity, nor does it discount those who are more pragmatically oriented. Without creativity, we have no way to improve engineering practice, and without pragmatism, we have seen some pretty serious loss of life: https://interestingengineering.com/23-engineering-disasters-of-all-time

...but you still have a lot of work to do

Adapting engineering practices, code from the internet, or Googled Cisco example topologies as a matter of practice does take work. Do you trust all code from Stack Overflow? Cisco-answers.net (not a real website)?

You shouldn't, and modern engineering practice doesn't either. In nearly every case, the ability to apply engineering practice to a problem comes from years of training and from millennia of past examples (failures and successes), ideally drawn from similar applications. A good example is the study of brittle fracture, where maximizing material hardness is no longer an automatic victory and can instead be a serious safety risk.

We live in a simpler world of abstraction and pure mathematics, and behaviors are a lot more reliable - but they're not perfectly so. We as designers and implementers of computer solutions (Network, Systems, don't care) can learn from our more disciplined cousins. I'll write more on this later, but for now, let's simply at least agree to review every action critically.

Playbook Automation

Let's use the lens of an engineer evaluating a technical control. Ansible is going to be my example, as it's probably the most straightforward.

Supporting Files

While it is possible to run a standalone, self-supporting playbook, it's not generally recommended at scale. The first step towards leveraging this automation is defining an inventory. As with most of Ansible, this is typically YAML, so most of the effort goes into structuring your data rather than writing code.

Some recommendations:

  • Don't let names collide between production, lab, etc. We don't want to have a Wargames scenario in anybody's production network.
  • Make sure it makes sense. It's pretty easy to over/under-organize; think about the smallest elemental unit you may work on.
  • Leverage Source Control! Save a copy, keep your revision history. Even better, get peer reviews.
  • Remember, this can be edited later! This should continually improve.
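
The first recommendation is easy to check mechanically. Here's a minimal sketch, assuming both inventories have already been parsed from YAML into nested dicts (group -> hosts/children); the group and host names are made up for illustration and are not from my lab:

```python
def host_names(inventory: dict) -> set:
    """Collect every host name from a nested YAML-style inventory dict."""
    names = set()
    for group in inventory.values():
        group = group or {}
        names.update((group.get("hosts") or {}).keys())
        # Groups can nest other groups under "children".
        names.update(host_names(group.get("children") or {}))
    return names


# Hypothetical inventories, already parsed from YAML into dicts.
production = {"prod_leafs": {"hosts": {"leaf0.example.net": None, "leaf1.example.net": None}}}
lab = {"lab_leafs": {"hosts": {"leaf0.example.net": None, "lab-leaf1.example.net": None}}}

collisions = host_names(production) & host_names(lab)
print(sorted(collisions))  # ['leaf0.example.net']
```

Anything that shows up in the intersection is a name that could point a lab playbook at production, so it's worth failing a peer review over.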

Example (loosely based on https://docs.ansible.com/ansible/latest/user_guide/intro_inventory.html)

I'm using the project (virtualized Clos Topologies) as a prefix, and then organizing device types from there. Spines don't need VLANs, and will be route reflectors - which is enough to justify separation in this case.


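vclos_leafs:  # placeholder group name
  hosts:
    vclos_l0.engyak.net:
      ansible_host: ""
    vclos_l1.engyak.net:
      ansible_host: ""
  vars:
    ansible_network_os: vyos.vyos.vyos
    ansible_user: vyos
    ansible_connection: ansible.netcommon.network_cli
vclos_spines:  # placeholder group name
  hosts:
    vclos_s0.engyak.net:
      ansible_host: ""
    vclos_s1.engyak.net:
      ansible_host: ""
  vars:
    ansible_network_os: vyos.vyos.vyos
    ansible_user: vyos
    ansible_connection: ansible.netcommon.network_cli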

Let's walk through what I've done here, since there are a few deviations from the typical:

  • YAML Inventory: This is just me, I prefer it over the INI format as a Linux guy. It also helps a lot with structured hierarchies, which I like as a network guy.
  • Variable declarations:
    • Per Ansible's documentation on networking, a few things are unique to network automation - namely the lack of on-board Python. This means that the Ansible control node (the one EXECUTING the playbook) needs to know that it's doing all of the planning/thinking. For this to work, we need to make a few unique (but re-usable) declarations:
      • ansible_network_os: More or less does exactly what it says. There's a built-in Ansible interpreter for VyOS - but this is really only true for a handful of network distros. You can get more from Ansible Galaxy, but extensive testing should be applied.
      • ansible_connection: This is basically the "driver" for the CLI. You can use Paramiko or SSH as well; the choice is primarily governed by your Network OS.
      • ansible_user just instructs the control node on what username to attempt against the target host.

Outside of this, I have also set up SSH key authentication to all VyOS nodes. It's pretty easy: (https://wiki.vyos.net/wiki/Remote_access)

set system login user vyos authentication public-keys key1 key blahblahblah
set system login user vyos authentication public-keys key1 type ssh-rsa

The Playbooks


Before designing a playbook, we do need to cover some of Ansible's key design values:

  • Idempotency: Run it once or run it ten times, the end result is the same. If a change has already been made, don't repeat it (especially if it is invasive) unless the current state no longer matches.
  • Thin Veil of Abstraction: You should be aware of what is being implemented from a technical perspective, but not have to control every last aspect of it.
  • Be Declarative: Try to design from the abstract concept you want to implement, and fill in the technical details as needed, not the other way around.
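
To make the idempotency and declarative points concrete, here's a minimal sketch in plain Python. It is not how Ansible is implemented; `apply_desired_state` and the `device` dict are hypothetical stand-ins for "a module" and "a router's running config". The idea is simply: describe the end state, compare it to what's there, and only change what differs.

```python
from __future__ import annotations


def apply_desired_state(current: dict, desired: dict) -> tuple[dict, bool]:
    """Return (new_state, changed), touching only keys whose value differs."""
    new_state = dict(current)
    changed = False
    for key, value in desired.items():
        if new_state.get(key) != value:
            new_state[key] = value  # the only place a change is made
            changed = True
    return new_state, changed


# Desired state: the "what", with no instructions on how to get there.
desired = {"hostname": "vyos-router.engyak.net", "timezone": "US/Alaska"}
device = {"hostname": "vyos", "timezone": "UTC"}  # hypothetical starting point

device, first_run_changed = apply_desired_state(device, desired)
device, second_run_changed = apply_desired_state(device, desired)
print(first_run_changed, second_run_changed)  # True False
```

The second run reports no change because the state already matches, which is the behavior to look for in any module you plan to re-run.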

Day 0, get the system online

In this example, we want to have four devices have some level of usable configuration, and we don't want to do lots of manual, error-prone editing to get there. We're going to adapt my base configuration for this purpose by re-tooling it to support Jinja deployments. At a high level, Jinja playbooks:

  • Load Variables: This will be a separate file, effectively designing the what of your deployment
  • Load Template, then translate variables: This will be executed by the template module

We'll keep this example pretty short - it's available in the linked repository - but we also want to leverage idempotency for future changes. It doesn't use the inventory, because it's creating base configurations to be applied by some other method.

Fun fact - this is the first stage of any Infrastructure-as-Code implementation. The generated results (*-compiled.conf) can be applied directly, or by using a "Day 2 Method".


Variables (vyos-base.yml):

  hostname: 'vyos-router.engyak.net'
  domain: 'engyak.net'
  timezone: 'US/Alaska'

Execution (Playbook):

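- hosts: localhost
  tasks:
    # Load the "what" of the deployment from the variables file
    - name: Import Vars...
      include_vars:
        file: vyos-base.yml
    # Render the Jinja template with those variables into a flat config
    - name: Combine vyos...
      template:
        src: templates/vyos-base.j2
        dest: vyos-compiled.conf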
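
If the two steps feel like magic, here's roughly what they boil down to, sketched outside of Ansible. This is a toy: a small regex stands in for Jinja's variable substitution (real Jinja does far more), and the template text is a hypothetical stand-in for templates/vyos-base.j2, which isn't reproduced in this post.

```python
import re

# Hypothetical stand-in for templates/vyos-base.j2.
TEMPLATE = """\
set system host-name {{ hostname }}
set system domain-name {{ domain }}
set system time-zone {{ timezone }}
"""

# Same values as the variables file above.
VARIABLES = {
    "hostname": "vyos-router.engyak.net",
    "domain": "engyak.net",
    "timezone": "US/Alaska",
}

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict) -> str:
    """Replace each {{ name }} with its value; an undefined name raises KeyError."""
    return PLACEHOLDER.sub(lambda match: str(variables[match.group(1)]), template)


compiled = render(TEMPLATE, VARIABLES)
print(compiled)
```

Rendering is deterministic, so the same variables always produce the same compiled file. That's what makes it safe to regenerate on every run and let source control show you the difference.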

Day 2, apply routine changes

In this example, the deployment is already up and running. We have some form of routine change to make, and we want it applied consistently and idempotently. In an ideal world, that means the playbook applying the change contains nothing about the specific change itself.

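# vyos_config is assumed here: it is the VyOS module that takes src and save
- hosts: vclos_l0.engyak.net
  tasks:
    - name: Apply on L0!
      vyos.vyos.vyos_config:
        src: 'vyos-l0-compiled.conf'
        save: yes
- hosts: vclos_l1.engyak.net
  tasks:
    - name: Apply on L1!
      vyos.vyos.vyos_config:
        src: 'vyos-l1-compiled.conf'
        save: yes
- hosts: vclos_s0.engyak.net
  tasks:
    - name: Apply on S0!
      vyos.vyos.vyos_config:
        src: 'vyos-s0-compiled.conf'
        save: yes
- hosts: vclos_s1.engyak.net
  tasks:
    - name: Apply on S1!
      vyos.vyos.vyos_config:
        src: 'vyos-s1-compiled.conf'
        save: yes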

If re-executed, this will re-apply whatever is staged via the base configuration and Jinja merge, every time.

Note: This particular network driver is not idempotent. In production networks something like NAPALM/Nornir may be more appropriate. You can verify whether a method is idempotent by running the playbook repeatedly - on a repeat run with nothing new to apply, the expected result is changed=0.

18:55:40 PLAY [vclos_l0.engyak.net] *****************************************************
18:55:40 TASK [Gathering Facts] *********************************************************
18:55:41 [WARNING]: Ignoring timeout(20) for vyos.vyos.vyos_facts
18:55:44 ok: [vclos_l0.engyak.net]
18:55:44 TASK [Apply on L0!] ************************************************************
18:55:49 changed: [vclos_l0.engyak.net]
18:55:49 PLAY [vclos_l1.engyak.net] *****************************************************
18:55:49 TASK [Gathering Facts] *********************************************************
18:55:49 [WARNING]: Ignoring timeout(20) for vyos.vyos.vyos_facts
18:55:53 ok: [vclos_l1.engyak.net]
18:55:53 TASK [Apply on L1!] ************************************************************
18:55:57 changed: [vclos_l1.engyak.net]
18:55:57 PLAY [vclos_s0.engyak.net] *****************************************************
18:55:57 TASK [Gathering Facts] *********************************************************
18:55:58 [WARNING]: Ignoring timeout(20) for vyos.vyos.vyos_facts
18:56:02 ok: [vclos_s0.engyak.net]
18:56:02 TASK [Apply on S0!] ************************************************************
18:56:06 changed: [vclos_s0.engyak.net]
18:56:06 PLAY [vclos_s1.engyak.net] *****************************************************
18:56:06 TASK [Gathering Facts] *********************************************************
18:56:06 [WARNING]: Ignoring timeout(20) for vyos.vyos.vyos_facts
18:56:10 ok: [vclos_s1.engyak.net]
18:56:10 TASK [Apply on S1!] ************************************************************
18:56:14 changed: [vclos_s1.engyak.net]
18:56:14 PLAY RECAP *********************************************************************
18:56:14 localhost                  : ok=12   changed=4    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
18:56:14 vclos_l0.engyak.net        : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
18:56:14 vclos_l1.engyak.net        : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
18:56:14 vclos_s0.engyak.net        : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
18:56:14 vclos_s1.engyak.net        : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
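
The changed=0 check from the note is easy to automate if you capture the output of a repeat run. A minimal sketch, assuming recap lines shaped like the output above (with or without the HH:MM:SS prefix); `changed_counts` and `is_idempotent` are names I made up, and the sample is an abbreviated copy of the recap:

```python
import re

RECAP_LINE = re.compile(r"^(?:\d{2}:\d{2}:\d{2}\s+)?(\S+)\s*:\s+ok=(\d+)\s+changed=(\d+)")


def changed_counts(log_text: str) -> dict:
    """Map each host in the PLAY RECAP to its changed= count."""
    counts = {}
    in_recap = False
    for line in log_text.splitlines():
        if "PLAY RECAP" in line:
            in_recap = True
            continue
        if in_recap:
            match = RECAP_LINE.match(line.strip())
            if match:
                counts[match.group(1)] = int(match.group(3))
    return counts


def is_idempotent(repeat_run_log: str) -> bool:
    """True when a repeat run reports changed=0 for every host."""
    counts = changed_counts(repeat_run_log)
    return bool(counts) and all(count == 0 for count in counts.values())


sample = """\
18:56:14 PLAY RECAP *********************************************************************
18:56:14 vclos_l0.engyak.net        : ok=4    changed=1    unreachable=0    failed=0
18:56:14 vclos_s0.engyak.net        : ok=4    changed=1    unreachable=0    failed=0
"""
print(changed_counts(sample))
print(is_idempotent(sample))
```

Fed the recap from a second run with no new changes staged, a result of False here is a quick signal that the method is re-applying configuration rather than converging.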

The next step is important - automatically updating a network based on configuration changes! As always, my source code for executing this is here. Note that this is a moving project and will get updates with future posts.
