Telemetry, Automation and Source of Truth
A couple of weeks ago I listened to two episodes of the Heavy Networking podcast, HN713 and HN717. Here are some reflections on what I took away from them and how I applied it to my work this week at RG Nets.
Recent updates
- Passed CompTIA Security+
- Began working on a new exciting PMS integration opportunity
- Worked on more Telemetry stuff, including developing my presentation for WLPC
Source of Truth (SOT)
One of the primary topics of automation is Source of Truth (SOT).
The source of truth concept is interesting, but like any "pithy phrase", it's hard to truly grasp the implications from those three small words, and the reality is always more complicated. As one of the speakers pointed out, there are at least three levels of SOT: what was intended to happen, what was configured to happen, and what actually happened. Loosely speaking, we could also call these the intention level, the configuration level and the customer level.
What actually happened is the "lowest" level: this is what the customer actually experienced and may be calling into your hotline to complain about.
Then there is the intermediate level of what was configured to happen. This is the level of device configuration, e.g. were the AP, switch, CPE and all other MSP equipment configured correctly?
Then there is the highest level of what was intended to happen. One of the speakers mentioned a folder holding one configuration file per device, which they know exactly how to grep through to find the intended configuration. VLANs in spreadsheets and Visio diagrams were also discussed. Everyone probably handles this level a little differently, but I think many would agree that the most powerful solution is a full-on relational database, which is what the rXg uses.
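To make the contrast with grepping through files concrete, here is a minimal sketch of intent stored relationally, using Python's built-in sqlite3 module. The table names, device names and VLAN tags are all made up for illustration; this is not the rXg's actual schema.

```python
import sqlite3

# Hypothetical schema for illustration only; NOT the rXg's real data model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE devices (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE vlans (id INTEGER PRIMARY KEY, tag INTEGER UNIQUE NOT NULL, purpose TEXT);
    CREATE TABLE device_vlans (
        device_id INTEGER NOT NULL REFERENCES devices(id),
        vlan_id INTEGER NOT NULL REFERENCES vlans(id),
        PRIMARY KEY (device_id, vlan_id)
    );
""")
conn.executemany("INSERT INTO devices (id, name) VALUES (?, ?)",
                 [(1, "ap-lobby"), (2, "sw-core")])
conn.executemany("INSERT INTO vlans (id, tag, purpose) VALUES (?, ?, ?)",
                 [(1, 10, "guest"), (2, 20, "staff")])
conn.executemany("INSERT INTO device_vlans (device_id, vlan_id) VALUES (?, ?)",
                 [(1, 1), (2, 1), (2, 2)])


def devices_carrying_vlan(tag):
    """Return the names of devices that are intended to carry the given VLAN tag."""
    rows = conn.execute(
        """SELECT d.name
           FROM devices d
           JOIN device_vlans dv ON dv.device_id = d.id
           JOIN vlans v ON v.id = dv.vlan_id
           WHERE v.tag = ?
           ORDER BY d.name""",
        (tag,),
    )
    return [name for (name,) in rows]


print(devices_carrying_vlan(10))
```

A question like "which devices should carry VLAN 10?" becomes a single join, where the folder-of-files approach would need a grep across every file plus knowledge of each vendor's config syntax.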
SOT and Telemetry
Jumping back to Telemetry, the SOT concept is very relevant to receiving and processing telemetry data. What's particularly important is the "direction of flow" of the data: telemetry should stay at the "data-from-device" level and not be elevated to the intention, or "data-to-device", level.
For example: Let's say that an rXg operator has configured a particular WLAN to use MAC Auth. The intention of the operator is to use MAC Auth. If that particular WLAN is actually configured to be Open on the device, then that information SHOULD NOT update the intention configuration of that WLAN on the rXg to "Open". Instead it SHOULD notify the operator of a configuration mismatch.
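The MAC Auth example can be sketched in a few lines of Python. The field names, the WLAN name and the shape of the telemetry report are assumptions made for illustration, not the rXg's real telemetry format. The point is only that the comparison reads the intended config and never writes to it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mismatch:
    wlan: str
    field: str
    intended: str
    reported: str


def compare_wlan(wlan, intended, reported):
    """Compare intended settings against what telemetry reports.

    Returns a list of mismatches. The intended dict is only read, never
    modified: telemetry must not flow "up" into the source of truth.
    """
    return [
        Mismatch(wlan, key, value, reported.get(key, "<missing>"))
        for key, value in intended.items()
        if reported.get(key) != value
    ]


# Hypothetical data: the operator intends MAC Auth, the device reports Open.
intended = {"auth": "mac-auth"}
reported = {"auth": "open"}

mismatches = compare_wlan("guest-wifi", intended, reported)
for m in mismatches:
    # A real system would notify the operator here instead of printing.
    print(f"ALERT {m.wlan}: {m.field} intended={m.intended} reported={m.reported}")
```

After the comparison, `intended` still says `mac-auth`. The operator gets an alert and decides what to do about the device.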
One of the speakers mentioned a "holy trinity" of an integration updating a device: there's the current state, the intended state, and the resultant state. I think that telemetry has a similar division: there's the intended config, the device's actual config, and there's the data coming in from the Telemetry. Using the data coming in from the telemetry to identify discrepancies between the intended config and the actual config is a really exciting goal.
I think the most important lesson about SOTs is that they need to flow from the top down, and when that is not happening, we need to find out why.
Automation = SOT + Integration
When we talk about trying to automate things, the thought that keeps coming back to me is that we are trying to make the computer do exactly what the smartest network engineer on the planet would do if they knew the whole situation and had all the time in the world to run every double-check they could think of. Since human time and effort are among the most expensive things a business can purchase, doing this automatically is a huge opportunity.
What "doing this automatically" kind of comes down to is having a SOT that is digitally encoded and an integration that knows how and when to translate that SOT into configuration changes pushed to some other device. That automation should have some way of detecting if it was successful or not and alerting or rolling back if not. Definitely something I will be pondering.
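As a rough sketch of that push, verify, and alert-or-roll-back loop, here is a small Python example. `FakeDevice` is a hypothetical in-memory stand-in for a real device API, and its `unsupported` parameter exists only to simulate a device that silently drops part of a pushed config.

```python
class FakeDevice:
    """Hypothetical in-memory stand-in for a real device API."""

    def __init__(self, config, unsupported=()):
        self.config = dict(config)
        # Keys this fake device silently drops, to simulate a partial apply.
        self.unsupported = set(unsupported)

    def read_config(self):
        return dict(self.config)

    def write_config(self, config):
        self.config = {k: v for k, v in config.items() if k not in self.unsupported}


def apply_intent(device, intended, alert):
    """Push the intended config, verify by reading it back, roll back on failure."""
    previous = device.read_config()
    device.write_config(intended)
    if device.read_config() == intended:
        return True
    alert(f"push did not take effect; rolling back to {previous}")
    device.write_config(previous)
    return False


alerts = []
flaky = FakeDevice({"vlan": 10}, unsupported={"auth"})
succeeded = apply_intent(flaky, {"vlan": 20, "auth": "mac-auth"}, alerts.append)
print(succeeded, flaky.read_config(), alerts)
```

The read-back after the write is what lets the automation detect failure. Here the fake device drops the `auth` key, so the push is reported as failed and the previous config is restored.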