5 AIOps strategies for augmenting your IT operations
Date: August 25, 2021
Category: Enabling the Business
Author: BigPanda
We recently sat down with INOC to discuss several AIOps strategies for augmenting IT operations. The highlights of that discussion follow; they were also recently published in an InfoQ article. You can also watch an on-demand webinar on the topic.
If you’re part of a modernizing enterprise, you are probably looking to AIOps to help you cut costs, improve performance and increase the agility of your business. But with so many AIOps options on the market, how do you make sure you’re going down the right path? And once you have chosen an AIOps approach, how do you make the most of it?
Here are 5 strategies to help you build the right AIOps plan for your business.
1 Be practical, not aspirational
In most things, it’s great to think big. When it comes to implementing an AIOps solution, though, biting off more than you can chew by taking too general an approach can delay your project, often by months or even years. Instead, identify specific near-term goals that you can pursue step by step as you carefully build up your AIOps capability.
Take your alarm-to-ticket flow as an example. Rather than replacing it outright, you can keep your existing alarm-to-ticket infrastructure in place and, in parallel, adopt one new AIOps capability at a time. You could start by feeding some of your monitoring alerts into an AIOps event correlation platform and then feeding its output back into your ticketing system.
This lets you compare results before going into production. Once you are satisfied, you can incrementally add more of your tools to the AIOps platform until you have fully integrated your monitoring and observability layer. Only then should you start adding further AIOps capabilities, such as identifying root cause changes, remediation automation and more.
In addition to making sure that your AIOps platform has proven itself before you begin to fully rely on it, this step-by-step approach gives your team the chance to accumulate the skills they need over time, rather than having to learn everything at once.
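To make the parallel-run idea concrete, here is a minimal Python sketch of comparing the two flows on the same alerts. It assumes a hypothetical `Alert` record and stands in for the AIOps correlation engine with a deliberately simple rule (alerts from the same host within a 300-second window belong to one incident); a real platform correlates far more richly. What matters is the comparison: one ticket per alert in the existing flow versus one ticket per correlated incident.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    # Hypothetical alert shape; real monitoring payloads carry much more.
    host: str
    check: str
    timestamp: int  # seconds since epoch


def legacy_tickets(alerts):
    """Existing flow: every alert becomes its own ticket."""
    return [[a] for a in alerts]


def correlated_incidents(alerts, window=300):
    """Simplified stand-in for an AIOps correlation engine (an assumption):
    alerts from the same host arriving within `window` seconds of the
    previous alert in the group are merged into one incident."""
    incidents = []
    open_by_host = {}  # host -> most recent incident (list of alerts)
    for a in sorted(alerts, key=lambda a: a.timestamp):
        group = open_by_host.get(a.host)
        if group is not None and a.timestamp - group[-1].timestamp <= window:
            group.append(a)
        else:
            group = [a]
            incidents.append(group)
            open_by_host[a.host] = group
    return incidents


def compare(alerts):
    """Run both flows side by side and report the ticket reduction."""
    legacy = legacy_tickets(alerts)
    aiops = correlated_incidents(alerts)
    return {
        "legacy_tickets": len(legacy),
        "aiops_incidents": len(aiops),
        "compression": 1 - len(aiops) / len(legacy) if legacy else 0.0,
    }


if __name__ == "__main__":
    sample = [
        Alert("db-01", "cpu", 0),
        Alert("db-01", "latency", 60),
        Alert("db-01", "disk", 120),
        Alert("web-02", "http_5xx", 90),
        Alert("db-01", "cpu", 4000),
    ]
    print(compare(sample))
```

Because the legacy flow is left untouched, a comparison like this can run on real exported alerts for as long as it takes to build confidence before the AIOps output drives production tickets.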
2 Domain-centric or domain-agnostic? Choose wisely…
In its recent Market Guide for AIOps, Gartner identifies two categories of AIOps solutions: domain-centric and domain-agnostic. Domain-centric AIOps capabilities are added on top of data that is specific to a domain or practice, such as network, application, infrastructure, or cloud monitoring. In contrast, the best domain-agnostic AIOps solutions work across domains to pull in data from multiple sources and IT technologies from multiple vendors, along with data describing changes happening in your environment, and then combine and correlate it all to derive insights.
Use the domain-centric AIOps features built into a monitoring tool for a specific, one-off use case, and deploy a stand-alone, domain-agnostic solution when you need one platform that can straddle multiple use cases over time.
For example, if you are monitoring signal quality in optical infrastructure, a domain-centric AIOps tool may help you understand a connection loss. But if you are responsible for maintaining high-quality video calls running on top of that infrastructure, a domain-agnostic AIOps tool should be your choice. A drop in service level can have many causes, spanning the different domains and technologies that make up the service, and you need to tie them all together to understand the root cause.
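A domain-agnostic platform has to accept events from tools that each describe the world differently. The Python sketch below shows the basic normalization step under stated assumptions: the two payload shapes (handled by the hypothetical `from_optical_monitor` and `from_apm` functions), their field names and the 1-to-4 severity scale are all invented for illustration. Once both domains share one schema, the optical signal drop and the video-call quality alert can be read on a single timeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    domain: str      # e.g. "network", "application"
    resource: str    # device, host or service that raised the event
    severity: int    # normalized scale: 1 (info) .. 4 (critical)
    timestamp: int   # seconds since epoch
    summary: str


# Hypothetical payload shapes from two different vendors' tools.
def from_optical_monitor(p: dict) -> Event:
    sev = {"minor": 2, "major": 3, "critical": 4}.get(p["severity"], 1)
    return Event("network", p["device"], sev, p["ts"],
                 f"{p['metric']}={p['value']}")


def from_apm(p: dict) -> Event:
    sev = {"warning": 2, "error": 3, "fatal": 4}.get(p["level"], 1)
    return Event("application", p["service"], sev, p["time"], p["message"])


# One normalizer per source; adding a new tool means adding one entry.
NORMALIZERS: Dict[str, Callable[[dict], Event]] = {
    "optical": from_optical_monitor,
    "apm": from_apm,
}


def unified_timeline(raw: List[Tuple[str, dict]]) -> List[Event]:
    """Normalize (source, payload) pairs and merge them into one ordered stream."""
    events = [NORMALIZERS[source](payload) for source, payload in raw]
    return sorted(events, key=lambda e: e.timestamp)


if __name__ == "__main__":
    raw = [
        ("apm", {"service": "video-calls", "level": "error",
                 "time": 130, "message": "jitter above SLO"}),
        ("optical", {"device": "optic-7", "severity": "major",
                     "ts": 100, "metric": "rx_power_dbm", "value": -28.5}),
    ]
    for e in unified_timeline(raw):
        print(e.domain, e.resource, e.severity, e.summary)
```

Keeping a small normalizer per source, rather than one function that knows every format, is what lets the platform grow across vendors without rewriting the correlation logic downstream.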
More generally, Gartner states that “As organizations mature in AIOps adoption, they require a single domain-agnostic platform across I&O, DevOps, SRE and, in some cases, security practices”.
3 Use enrichment, drive intelligence
Enrichment is the unsung hero of the entire event correlation process. Raw alarm data is a start, but on its own it is not enough to pinpoint the root cause and enable an effective fix. When alerts come in from a variety of domains, they can be difficult to correlate. You can use timestamps or point of origin, but that provides limited insight, and you’ll miss connections between related alerts coming from other sources or from other time windows.
Enriching alerts provides the extra layer of understanding needed to determine which alerts are related, and how, so that you can focus on high-level correlated incidents instead of chasing every low-level alert that comes into the AIOps platform.
Done right, this process of enrichment helps you bring in topology information from your CMDB, APM and orchestration tools, change information from your change management and CI/CD pipelines, and business context from your team’s knowledge and procedures.
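As a rough illustration of enrichment, this Python sketch attaches topology and change context to each alert before correlating. The `CMDB` and `CHANGES` dictionaries are hypothetical stand-ins for data you would pull from your CMDB and change management system, and grouping by service is a simplified correlation rule, not how any particular AIOps product works.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Alert:
    host: str
    check: str
    timestamp: int                      # seconds since epoch
    service: str = "unknown"            # filled in by enrichment
    recent_changes: List[str] = field(default_factory=list)


# Hypothetical extracts from a CMDB (topology) and a change-management system.
CMDB = {"db-01": "checkout", "web-02": "checkout", "cache-09": "search"}
CHANGES = [
    {"service": "checkout", "timestamp": 9_000, "id": "CHG-1001 deploy v2.3"},
    {"service": "search", "timestamp": 1_000, "id": "CHG-0990 config update"},
]


def enrich(alert: Alert, lookback: int = 3600) -> Alert:
    """Add service topology and any changes to that service made within
    `lookback` seconds before the alert. Mutates and returns the alert."""
    alert.service = CMDB.get(alert.host, "unknown")
    alert.recent_changes = [
        c["id"] for c in CHANGES
        if c["service"] == alert.service
        and 0 <= alert.timestamp - c["timestamp"] <= lookback
    ]
    return alert


def correlate_by_service(alerts: List[Alert]) -> Dict[str, List[Alert]]:
    """Group enriched alerts by the business service they affect."""
    groups: Dict[str, List[Alert]] = defaultdict(list)
    for a in alerts:
        groups[enrich(a).service].append(a)
    return dict(groups)


if __name__ == "__main__":
    alerts = [
        Alert("db-01", "cpu", 10_000),
        Alert("web-02", "http_5xx", 10_020),
        Alert("cache-09", "latency", 10_200),
    ]
    for service, group in correlate_by_service(alerts).items():
        print(service, [a.host for a in group], group[0].recent_changes)
```

The alerts from `db-01` and `web-02` share nothing at the host level, but after enrichment both map to the `checkout` service and sit next to the deployment that preceded them.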
Choose AIOps tools that provide built-in, scalable enrichment, and you’ll drive intelligence throughout your operations.
4 Automate your processes
Automation delivers many benefits, including consistency, time savings and fewer errors. When your AIOps platform automates ticketing, you can potentially reduce your mean time to acknowledge (MTTA) to just milliseconds!
Incorporating your runbooks into your ticketing system means that, when a specific alarm comes in, a specific workflow is triggered.
Runbook automation takes care of the technical steps that don’t require human judgment, such as checking the status of a network resource or grabbing information from a server or system. It carries the incident as far as possible before a human needs to step in, if one is needed at all, to identify and apply the necessary fix.
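The following Python sketch shows the general shape of alarm-triggered runbook automation: each alert type maps to an ordered list of automated steps, each step’s result is written to the ticket, and the ticket is escalated to a human only if no step resolves the issue. The alert types, step functions and `Ticket` class are hypothetical; real steps would call your own monitoring and ticketing tools.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A step inspects the alert and returns (note for the ticket, resolved?).
Step = Callable[[dict], Tuple[str, bool]]


def check_interface_status(alert: dict) -> Tuple[str, bool]:
    # Hypothetical check: if the interface is back up, treat it as self-healed.
    up = alert.get("interface_up", False)
    return (f"interface on {alert['host']} is {'up' if up else 'down'}", up)


def collect_server_diagnostics(alert: dict) -> Tuple[str, bool]:
    # Hypothetical data-gathering step; it never resolves the issue itself.
    return (f"collected load and disk stats from {alert['host']}", False)


# Runbooks: which automated steps run, in order, for each alert type.
RUNBOOKS: Dict[str, List[Step]] = {
    "link_down": [check_interface_status, collect_server_diagnostics],
    "high_load": [collect_server_diagnostics],
}


@dataclass
class Ticket:
    alert: dict
    notes: List[str] = field(default_factory=list)
    status: str = "open"


def handle(alert: dict) -> Ticket:
    """Open a ticket, run the matching runbook, and escalate only if needed."""
    ticket = Ticket(alert)
    for step in RUNBOOKS.get(alert["type"], []):
        note, resolved = step(alert)
        ticket.notes.append(note)
        if resolved:
            ticket.status = "auto-resolved"
            return ticket
    ticket.status = "escalated"  # a human picks it up, with notes already attached
    return ticket


if __name__ == "__main__":
    t = handle({"type": "link_down", "host": "sw-3"})
    print(t.status, t.notes)
```

Even when a ticket is escalated, the engineer who picks it up starts with the diagnostic notes already collected, which is where much of the time saving comes from.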
In addition to reducing your IT Ops teams’ workload and speeding up incident and outage resolution, automation frees your operations teams to focus on high-value, challenging work that drives innovation for your business and improves their productivity.
5 Drive continuous insights
The full value of an AIOps solution goes beyond improving the ad-hoc resolution of performance issues. It also drives continuous process improvement by letting you analyze every stage, from incident detection through investigation and root cause analysis to remediation and resolution. Knowing how long each stage takes, and where the delays and performance gaps are, shows you where to focus as you make your processes more efficient and further improve your team’s productivity.
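To make stage-by-stage analysis concrete, this Python sketch computes the mean duration of each stage across a set of incidents and reports the slowest one. The milestone names (`detected`, `acknowledged`, `root_cause_found`, `resolved`) are hypothetical fields; map them to whatever timestamps your ticketing or AIOps platform actually records. The mean of the first stage is the mean time to acknowledge (MTTA).

```python
from statistics import mean
from typing import Dict, List

# (stage name, start milestone, end milestone); milestone names are assumptions.
STAGES = [
    ("detection_to_ack", "detected", "acknowledged"),
    ("ack_to_root_cause", "acknowledged", "root_cause_found"),
    ("root_cause_to_resolved", "root_cause_found", "resolved"),
]


def stage_durations(incidents: List[Dict[str, float]]) -> Dict[str, float]:
    """Mean duration of each stage across all incidents (same units as input)."""
    return {
        name: mean(inc[end] - inc[start] for inc in incidents)
        for name, start, end in STAGES
    }


def slowest_stage(incidents: List[Dict[str, float]]) -> str:
    """Name of the stage with the longest mean duration."""
    durations = stage_durations(incidents)
    return max(durations, key=durations.get)


if __name__ == "__main__":
    # Hypothetical incidents, timestamps in minutes from detection.
    incidents = [
        {"detected": 0, "acknowledged": 2, "root_cause_found": 40, "resolved": 50},
        {"detected": 0, "acknowledged": 4, "root_cause_found": 64, "resolved": 70},
    ]
    print(stage_durations(incidents))
    print("focus on:", slowest_stage(incidents))
```

In this toy data, acknowledgment is already fast, and the time goes into finding the root cause, which points to enrichment or correlation tuning rather than more ticketing automation.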
As with any strategy, your IT Ops teams are critical partners in this process. Communicate with them to make sure AIOps is easing their workload, and not creating more work for them.
Perhaps some of your correlation patterns need to be updated or tuned, or would benefit from additional enrichment. Whatever the issue, work with your teams to identify and address pain points, and where things are going well, make sure they know it so they can get the most out of it.
The world of AIOps is evolving rapidly, which makes it challenging to chart a course and choose wisely among the many AIOps platforms on the market. By adopting the five strategies outlined above, you will find that an AIOps platform can deliver exceptional benefits and efficiencies that help transform your operations!
To find out more, we invite you to watch our recent webinar on the subject.
Gartner, Market Guide for AIOps, Pankaj Prasad, Padraig Byrne, Josh Chessman, April 6, 2021