RESOLVE ’22: Observability and AIOps sitting in a tree
In our first session from RESOLVE ’22, we were honored to have Darren Boyd and Satbir Sran from the Incubator podcast and ink8r think tank talk observability and AIOps with BigPanda’s Aaron Johnson. Both panelists are part of communities adopting open standards, and they regularly consult with organizations about how they can improve IT Operations and overall performance.
The discussion began with defining observability and quickly led to their thoughts on trends in the space, how AIOps and observability are connected and the possibilities that emerge when organizations become vendor agnostic.
What is observability?
For Darren, observability is “inferring an internal state of your system from external outputs.” He said organizations are good at seeing events and then diving in to triage and find root cause. What they’re getting better at is ad hoc capability: being able to ask the system questions that are not pre-canned, to “build inquisitiveness” in the system. For Darren, that’s where observability kicks in.
Satbir agreed and took the concept a step further, noting that an important component of observability is being able to have visibility into micro-parts of infrastructure so you can diagnose problems on a more granular level. For example, if it appears that an application has gone down, observability allows you to see that it is not the entire application that is down—just a component of it. This ability is vital as organizations break down traditional, monolithic applications into microservices and build out more complex architectures.
Common challenges and trends in observability
Trends in this space exist across multiple dimensions, according to Darren. One common trend he cited from a recent report on observability is that businesses are realizing observability can even help with time to market and product introduction. It also brings resilience engineering to software, which helps with regression and performance management of code.
Satbir added that observability is solving a common challenge for enterprises today that are coming off legacy environments in which they had monoliths and added dozens of tools as their environments evolved. Observability “actually complements that single-pane view of an aggregation layer better,” he noted.
And this, all panelists agreed, is where the birth of AIOps occurs.
Observability with AIOps: the connection
One of the great benefits of an AIOps platform is that it can decouple data sources (observability tools) from data sinks (ITSM and automation solutions). Darren did a deep dive into how organizations can decouple source from sink by putting in an observability pipeline: an engine that can collect, process, enrich, and eliminate data across all types of observability. Decoupling source from sink allows them to be vendor neutral in their journey to automation. AIOps then aggregates the feeds from multiple tools, which drives automation. With AIOps and fewer dependencies on specific vendors, organizations can “make rationalizations without impacting the user.” Darren says if organizations don’t take this approach, they are looking at a “36-month unwinding of all your tools if rationalization is a part of the challenge.”
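To make the decoupling idea concrete, here is a minimal sketch of a pipeline that sits between data sources and their downstream destinations (sinks). Everything in it is an assumption made for illustration: the `Event` fields, the two adapter functions, and the payload keys are hypothetical and do not come from the session or from any vendor's API. The point is only that sinks depend on one neutral event shape, so swapping a source tool means swapping one adapter.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    """Vendor-neutral event shape (hypothetical fields)."""
    source: str
    service: str
    severity: str
    message: str


# Hypothetical adapters: each maps one tool's payload onto the neutral Event.
def from_metrics_tool(payload: dict) -> Event:
    return Event("metrics_tool", payload["svc"], payload["level"], payload["text"])


def from_log_tool(payload: dict) -> Event:
    return Event("log_tool", payload["service_name"], payload["sev"], payload["msg"])


@dataclass
class Pipeline:
    """Routes normalized events to every registered sink."""
    adapters: Dict[str, Callable[[dict], Event]] = field(default_factory=dict)
    sinks: List[Callable[[Event], None]] = field(default_factory=list)

    def ingest(self, tool: str, payload: dict) -> Event:
        # Normalize first, so sinks never see a vendor-specific payload.
        event = self.adapters[tool](payload)
        for sink in self.sinks:
            sink(event)
        return event


# A list stands in for a ticketing or automation sink.
tickets: List[Event] = []
pipeline = Pipeline(
    adapters={"metrics_tool": from_metrics_tool, "log_tool": from_log_tool},
    sinks=[tickets.append],
)
pipeline.ingest("metrics_tool", {"svc": "checkout", "level": "critical", "text": "p99 latency high"})
pipeline.ingest("log_tool", {"service_name": "checkout", "sev": "warning", "msg": "retry storm"})
```

In this sketch, retiring the hypothetical log tool would mean deleting `from_log_tool` and registering a new adapter, while the sink side stays untouched.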
Satbir supported Darren’s suggestion for organizations, but he added that they might want to think about a second approach to creating an observability pipeline. Instead of starting with building a pipeline that serves as a collection or routing function for all telemetry, organizations that have a more chaotic environment to handle should consider implementing an AIOps function and correlation on the front end to get a single pane that gives you a better view into your environments. From there, you can “hopefully reduce your MTTR,” he says, and work “top-down” to build the various components of an observability pipeline in a more behind-the-scenes way.
The importance of vendor agnosticism
The concept of decoupling your source from your sink is related to introducing more flexibility into your environment, said Darren. Organizations typically rely on vendor infrastructure, but once you start to separate the source, and “how you’re collecting the source,” you can access a broader set of collection techniques and transform your observability pipeline.
Within the pipeline, you can enrich the data, remove superfluous data, shape it and put layers of guardrails in place, explained Darren. Those guardrails, as he later detailed, can help you codify actions in your observability pipeline to better enable you to protect your organization and give full flexibility to product teams.
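The enrich, remove, shape and guardrail steps can be sketched as a chain of small functions. This is an illustration under stated assumptions, not anything the panelists described in detail: the record fields, the owner lookup table, and the choice of secret-masking as the guardrail are all hypothetical.

```python
from typing import Callable, Iterable, List, Optional

Record = dict
# A stage returns a (possibly changed) record, or None to drop it.
Stage = Callable[[Record], Optional[Record]]


def enrich(record: Record) -> Optional[Record]:
    # Hypothetical lookup table mapping a service to its owning team.
    owners = {"checkout": "payments-team"}
    return {**record, "owner": owners.get(record.get("service"), "unknown")}


def drop_debug(record: Record) -> Optional[Record]:
    # Remove superfluous data: debug-level records never leave the pipeline.
    return None if record.get("level") == "debug" else record


def guardrail_redact(record: Record) -> Optional[Record]:
    # Guardrail (assumed policy): mask messages that appear to carry a secret.
    if "password=" in record.get("message", ""):
        return {**record, "message": "[REDACTED]"}
    return record


def shape(record: Record) -> Optional[Record]:
    # Keep only the fields downstream consumers are assumed to need.
    keep = ("service", "level", "message", "owner")
    return {key: record[key] for key in keep if key in record}


def run(records: Iterable[Record], stages: List[Stage]) -> List[Record]:
    output = []
    for record in records:
        current: Optional[Record] = record
        for stage in stages:
            current = stage(current)
            if current is None:
                break  # a stage dropped the record
        if current is not None:
            output.append(current)
    return output


raw = [
    {"service": "checkout", "level": "error", "message": "login failed password=hunter2", "host": "h1"},
    {"service": "search", "level": "debug", "message": "cache miss", "host": "h2"},
]
cleaned = run(raw, [enrich, drop_debug, guardrail_redact, shape])
```

Because the guardrail is just another stage, a platform team could keep it fixed while product teams add or reorder their own stages around it.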
Satbir underscored these points by mentioning that the pipeline is important from more than just an operational standpoint. Through your pipeline, you can have the ability to take in data that’s not just for IT Ops, but for any product team. “They can subscribe to that and have the ability to get that feed, so now they’ve become observable. An organization is building that observability across it,” he said.
The outcomes of a mature observability strategy
Business outcomes can include minimizing defects, increasing the chance of delivering sound code and having fewer errors, said Satbir. The outcome also ideally includes reducing the IT Ops burden. To get there in the past, organizations accumulated dozens of tools. With aggregation and AIOps, you can correlate your telemetry in order to gain a single-pane view of your environment. You can then see where your telemetry is coming from and reduce dependency on certain tools and technologies so you can start to rationalize some tool sets. From there, you can figure out what tools and technologies you truly need because you’re “focused on actual signals versus which platform is producing them,” Satbir said.
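As a rough picture of what correlating telemetry into a single view can mean, the sketch below groups alerts for the same service that arrive close together in time and records which tools contributed to each group. It is a deliberately simple time-window rule chosen for illustration; it is not how BigPanda or any other AIOps platform correlates, and the alert fields, tool names and 300-second window are assumptions.

```python
from typing import Dict, List


def correlate(alerts: List[dict], window_seconds: int = 300) -> List[dict]:
    """Group alerts per service when each follows the previous within the window."""
    incidents: List[dict] = []
    open_by_service: Dict[str, dict] = {}
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        incident = open_by_service.get(alert["service"])
        if incident is not None and alert["ts"] - incident["last_ts"] <= window_seconds:
            # Same service, close in time: fold into the open incident.
            incident["alerts"].append(alert)
            incident["tools"].add(alert["tool"])
            incident["last_ts"] = alert["ts"]
        else:
            # First alert for this service, or the window has lapsed.
            incident = {
                "service": alert["service"],
                "alerts": [alert],
                "tools": {alert["tool"]},
                "last_ts": alert["ts"],
            }
            incidents.append(incident)
            open_by_service[alert["service"]] = incident
    return incidents


# Timestamps are seconds from an arbitrary start; tool names are placeholders.
alerts = [
    {"tool": "tool_a", "service": "checkout", "ts": 0},
    {"tool": "tool_b", "service": "checkout", "ts": 120},
    {"tool": "tool_a", "service": "search", "ts": 130},
    {"tool": "tool_c", "service": "checkout", "ts": 1000},
]
incidents = correlate(alerts)
```

The `tools` set on each incident is the part that speaks to rationalization: it shows which tools actually contribute signal to a given problem, independent of which platform produced it.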
Darren added that outcomes should also include being able to influence architecture moving forward to positively impact central tendencies such as any MTTx. Organizations eventually are able to “evolve central tendency metrics beyond the quantitative and into the qualitative,” he noted. An example would be measuring human performance.
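For readers less familiar with the term, a central tendency metric such as MTTR is just a summary of many individual resolution times. The small example below uses made-up durations (an assumption, not data from the session) to show why the choice of summary matters: one long incident pulls the mean far from the typical case, while the median barely moves.

```python
from statistics import mean, median

# Hypothetical time-to-resolve values, in minutes, for five incidents.
resolve_minutes = [12, 15, 18, 20, 240]

mttr_mean = mean(resolve_minutes)      # sum is 305, so the mean is 61
mttr_median = median(resolve_minutes)  # middle value of the sorted list is 18

print(f"mean: {mttr_mean} min, median: {mttr_median} min")
```

Neither number says anything about how the response felt to the people involved, which is the qualitative side Darren was pointing to.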
Satbir and Darren rounded out their conversation with a bit of discussion on continual improvement and open-source concepts. Check out their conversation from RESOLVE ’22 on-demand here.