If everyone is AIOps – which AIOps is right for you?
Date: May 19, 2021
Category: The CTO Perspective
Author: BigPanda
With so many IT vendors claiming they provide AIOps platforms, how do you understand the differences between them, and decide what flavor of AIOps to choose for your organization?
Join us in a CTO Perspective discussion with Elik Eizenberg, CTO and co-founder at BigPanda, to find the answer.
Read the skinny for a brief summary, then either lean back and watch the interview, or if you prefer to continue reading, take a few minutes to read the transcript. It’s been lightly edited to make it easier to read. Enjoy!
The Skinny
AIOps is THE buzzword these days in IT Ops, and suddenly every vendor is AIOps. This creates confusion, both in understanding what AIOps really is (if there is a definitive answer) and in understanding how to best leverage AIOps to prevent and resolve outages. In this talk, we create some order in this chaos.
We discuss the 3 main AIOps tool categories:
- Data generators
- Data analyzers
- Collaboration tools
We then explain what each one does, and provide a recommendation on how to choose the best way to adopt AIOps based on this framework and the current state of your organization. Interested? Read on.
The Interview
The Transcript
Yoram: Hello and welcome to the CTO Perspective, where we discuss unique perspectives about the most current issues in IT operations. And here today to speak with us is, once again, Elik Eizenberg, Chief Technology Officer and co-founder of BigPanda. Hi Elik.
Elik: Hey Yoram. Great to be here, as always.
Yoram: Always great talking to you. Today we’ll be talking about a unique angle, or a unique way of looking at AIOps, mainly trying to understand what AIOps is. Gartner coined the term AIOps a few years ago in very general terms. Basically, what it means today is implementing AI and ML into IT operations, but without going into specifics on how exactly to do that. AIOps has become a big and important buzzword. It’s driving a lot of business decisions. So, what’s currently happening is that most ITOps vendors are AIOps, or their platforms are AIOps-driven. And even though these vendors do completely different things and look at AIOps in completely different ways, they’re all AIOps. And that causes a bit of confusion, doesn’t it?
Elik: Yes, for sure. I think the premise of AI in IT operations is fantastic. We can accomplish a lot of things around automation, around insight, around data processing, so the pull towards AI is well understood.
I think that the challenge these days is that every vendor is essentially an AIOps vendor, everyone advertises they do AIOps. So how can we expect leaders to make decisions around what the next step in their AIOps strategy should be, when seemingly all ways lead to AIOps anyway?
Yoram: Can you provide an example? Which companies say they’re AIOps, and how different are they?
Elik: Absolutely. Let’s look at log management and log analysis tools. You have a system that essentially aggregates all your log data from different systems and provides a very good index and some dashboards, where you can easily search and visualize that log data. This has been around for at least 10, maybe 15 years. Now they have added an AIOps capability that automatically parses log data: it detects the different patterns in the logs and parses them into something that humans can read more easily.
Now let’s look at the APM category (application performance monitoring tools). They are great at instrumenting your different applications, whether Java, .NET, or anything else. They’ll tell you how long your transactions run. They’ll tell you about the error rates and things like that. Now they have an AIOps capability on top of that, which tells you when you have anomalies: when you have too many transaction errors, or a latency that is too long for a certain time of day.
As you can see, every category in our space has a flavor of AIOps that’s very different, and very specific to its part of the stack.
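To make the log-parsing idea concrete, here is a minimal Python sketch of one simple way to detect patterns in logs: mask the variable parts of each line (IP addresses, numbers) so that lines with the same shape collapse into one template. The masking rules, function names, and sample log lines are assumptions made for illustration; they are not taken from any vendor's product, and real tools typically use more sophisticated techniques.

```python
import re
from collections import Counter

# Hypothetical masking rules: replace variable-looking tokens with placeholders.
# Order matters: IPs are masked before plain numbers.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]

def to_template(line: str) -> str:
    """Turn a raw log line into a template by masking its variable parts."""
    for pattern, placeholder in MASKS:
        line = pattern.sub(placeholder, line)
    return line

def group_by_template(lines):
    """Count how many raw lines share each template."""
    return Counter(to_template(line) for line in lines)

# Made-up sample log lines.
logs = [
    "Connection from 10.0.0.5 timed out after 30 seconds",
    "Connection from 10.0.0.9 timed out after 45 seconds",
    "User 1042 logged in",
]
templates = group_by_template(logs)
for template, count in templates.most_common():
    print(count, template)
```

Here the two timeout lines collapse into a single template, which is the kind of "all these log lines look the same" grouping described in the interview.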
Yoram: So, is there a definitive answer to “what is AIOps?”, or are there several types of AIOps? What’s the right way of looking at it?
Elik: In my conversations with customers, I got to a point where I felt that the best way to think about AIOps is to group solutions into one of three categories. They’re either related to collecting and generating data; we’ll call these tools the “data generators”. Or they are “data analyzers”, the tools that turn data into insight and action. Or, finally, they are “collaboration tools”. These are used by teams to interact with each other on top of those insights and actions that are being generated. These are essentially the three categories of AIOps.
Yoram: So, three categories, three layers of different AIOps. Let’s discuss each layer, trying to understand what tools each layer includes, and what AIOps means for them. The first one you mentioned was the layer of tools that generate data. Those are probably the monitoring and observability tools?
Elik: Spot on. Essentially, any log management tool, any infrastructure monitoring tool, any application monitoring tool. What they do well, is instrument your infrastructure, instrument your applications. They collect data very effectively and they centralize it in one place and visualize it. These are the data collectors, or data generators. Normally you buy another data generator when you need more data, if you feel you have some gaps in visibility on a specific part of your infrastructure or application stack.
Yoram: When you talk about AIOps in this layer of observability and monitoring, what does AIOps mean?
Elik: Because these tools have access to all the raw data that they collect, they mostly focus their energies on identifying anomalies in this data. If a metric goes up or down in a way that’s different from normal for that time of day or time of year, they will essentially trigger an event or a warning saying: “hey, here’s an anomaly”. That’s one flavor. The other flavor, which we’re seeing especially around log management, is automatic processing of log data: being able to say, “All these log lines look the same, or here’s an exception, or here’s a specific hostname in the data”. These are the types of AIOps capabilities we are seeing for the data generators.
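As an illustration of "different from normal for that time of day", the following Python sketch builds a per-hour baseline from historical metric values and flags a new value whose z-score against that hour's baseline exceeds a threshold. The function names, the threshold of 3 standard deviations, and the sample numbers are all assumptions for the sake of the example; production tools generally use more robust, seasonality-aware models.

```python
from collections import defaultdict
from statistics import mean, stdev

def build_baseline(history):
    """history: iterable of (hour_of_day, value) pairs.

    Returns {hour: (mean, standard deviation)} for hours with at least
    two samples, since a standard deviation needs two or more points.
    """
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {h: (mean(v), stdev(v)) for h, v in by_hour.items() if len(v) >= 2}

def is_anomaly(baseline, hour, value, threshold=3.0):
    """Flag a value that deviates strongly from what is normal for that hour."""
    if hour not in baseline:
        return False  # no history for this hour, so nothing to compare against
    mu, sigma = baseline[hour]
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold

# Made-up request counts: quiet at 03:00, busy at 14:00.
history = [
    (3, 100), (3, 110), (3, 90), (3, 105), (3, 95),
    (14, 900), (14, 950), (14, 1000), (14, 920), (14, 980),
]
baseline = build_baseline(history)

print(is_anomaly(baseline, 14, 900))  # normal for mid-afternoon
print(is_anomaly(baseline, 3, 900))   # the same value is unusual at 03:00
```

The point of keying the baseline by hour is that the same value can be normal at one time of day and anomalous at another.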
Yoram: If I understand correctly, then for these vendors AIOps means implementing AI and ML into making sense of the data they collect: trying to find anomalies or similarities within the data that they’re collecting.
Elik: Yeah, in many ways I would say they find ways to generate even more data on top of what they already generate…
Yoram: All right… We’ll get back to that later… The second layer we discussed was that of tools that convert data into insight and action. What tools are we talking about here, and what does AIOps mean for them?
Elik: This is the layer that addresses the problem of “you have all these data generators, and you have a ton of data, but unless you can turn this data into something actionable, some deep insight or an action, it’s pretty useless”. By the way, BigPanda sits in this layer. We are an event correlation and automation platform. What we do is consume data from all the different data generators (whether you have 10, 15, or more), aggregate that data, normalize it, correlate it and produce root cause insights on top of it. And finally, we automate the response to incidents that are generated from that data.
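The "aggregate, normalize, correlate" flow can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a description of how BigPanda or any other product works: the `Event` schema, the per-source field names (`host`, `hostname`, `node`, `ts`, `msg`), and the rule "same host within five minutes" are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Event:
    source: str       # which monitoring tool produced the event
    host: str
    timestamp: float  # seconds
    message: str

def normalize(raw: dict, source: str) -> Event:
    """Map differently named fields from each tool onto one common schema."""
    host = raw.get("host") or raw.get("hostname") or raw.get("node") or "unknown"
    ts = raw.get("timestamp", raw.get("ts", 0))
    message = raw.get("message") or raw.get("msg") or ""
    return Event(source, host.lower(), float(ts), message)

def correlate(events, window=300.0):
    """Group events on the same host that arrive within `window` seconds
    of the previous event in the group."""
    incidents = []
    open_by_host = {}  # host -> most recent incident (list of events)
    for ev in sorted(events, key=lambda e: e.timestamp):
        current = open_by_host.get(ev.host)
        if current and ev.timestamp - current[-1].timestamp <= window:
            current.append(ev)
        else:
            current = [ev]
            open_by_host[ev.host] = current
            incidents.append(current)
    return incidents

# Made-up events from three hypothetical tools.
raw_events = [
    ("apm", {"hostname": "Web-01", "ts": 100, "msg": "latency high"}),
    ("logs", {"host": "web-01", "timestamp": 160, "message": "timeout exceptions"}),
    ("infra", {"node": "db-02", "timestamp": 170, "message": "disk full"}),
    ("infra", {"node": "web-01", "timestamp": 2000, "message": "cpu high"}),
]
events = [normalize(raw, src) for src, raw in raw_events]
incidents = correlate(events)
for incident in incidents:
    print(incident[0].host, [e.message for e in incident])
```

Four raw events from three tools become three incidents, with the two early `web-01` events merged into one. That reduction is the noise-lowering step; root cause analysis and automated response would build on top of it.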
Yoram: So, this layer uses AI and ML (AIOps) to lower the noise and gain insight from all this data: to understand what is actually wrong, what the root cause is, and how to automate parts of its resolution?
Elik: Yes, I would say to gain either a very meaningful piece of insight that drives a decision, or an actual action that resolves an incident.
Yoram: And then there’s the third layer, which uses that insight or uses those actions to facilitate teamwork.
Elik: Exactly. The vendors there are chat tools, on-call rotation tools, ticketing tools, ITSM tools. They’re very good, as you stated, at facilitating teamwork or the interaction between human beings, and the area where they normally use AI is around those interactions. If somebody enters data into the system, you need to parse that using natural language processing to make it more structured. That’s essentially the focus of these kinds of vendors when it comes to AI.
Yoram: It seems to me that if you’re looking at the complete incident management lifecycle, from the moment the data is generated to when it is acted and collaborated on, you need all three layers. So, let’s say we accept this framework. How does that help us to better adopt AIOps? How do we know which tools we need for our business?
Elik: Great question. I would start with this:
I feel like a lot of companies are in a “tail wagging the dog” situation, where they start with the solution, instead of starting with the main problem they need to solve. They have a vendor already in place, for example an ITSM solution, or an event correlation tool, or a log management solution. Now, that vendor says: “Hey, we also have these AIOps capabilities. Why don’t you use them?”. And what happens now is that they’re driving their entire AIOps strategy based on what the vendor provides rather than their needs.
So, what I encourage all our customers and people I talk to in the industry to do is to always start with this framework. You have these three layers. Try to understand where you’re strong, where you’re weak, where you have the biggest gap. Once you have that analysis, you can understand where you want to start your journey towards AIOps, where AIOps can move the needle the most for you. Only after you have identified the area you want to improve can you look into vendors and see who’s the best vendor to solve the problem for you.
Yoram: So: take a step backwards from the tools that you’re using, and don’t let the tools drive the AIOps strategy. Don’t say, “Because I’m using this vendor, let’s see what it can do for me in AIOps.” First try to understand what AIOps is, looking at your three layers: your observability and monitoring, your event correlation and automation, and your collaboration. Then, try to understand how mature you are in each of these layers, and then try to keep the same level of maturity across all three because you need all of them.
Elik: Yes. I think the best way to explain this is by giving you a couple of examples. I spoke to a customer who said: “You know what, we have good log management. It’s very mature, it’s in place, and it gives us good visibility into all the log data our applications and services are generating. We have very good infrastructure monitoring; we’ve invested a good five years in improving it. But we have some gaps in visibility on application performance. How can we best overcome them?” My answer would be: get an application performance monitoring tool, and leverage the fact that it has AIOps capabilities to generate good anomaly data on top of that. That’s the best thing you can do right now.
But suppose the question is: “We have all this visibility, we’ve invested in 15 different monitoring tools, and we get all this data. The challenge ahead of us is how to take all this beautiful data that is being generated and turn it into something actionable.” If that’s your problem, then obviously you should be investing in the middle layer, around event correlation and automation.
One last example: “We have terrific observability across the stack, and a very good data-to-insight or data-to-action tier that solves that problem. But a lot of our team members enter data that’s not very accurate or up to date, and we want to make sure that data is automatically scanned and improved.” That’s where you would look into AIOps modules in collaboration tools.
So that’s how I would approach it: first, evaluate the maturity of each layer, and then think about the best vendors or solutions for the specific layer where you have gaps.
Yoram: So first accept the fact that there are three layers, that you need to be strong on all of them, then try to understand the maturity of each of them, and then find the best vendor for where you want to develop the most.
Elik: Exactly.
Yoram: I think that then starts to pose some more challenges. How do you understand how mature you are? How do you go into actual use cases? I’m assuming that’s one of the biggest challenges customers have when they set out on their AIOps journey.
Elik: 100 percent. I think there is a very important foundational step of embracing the framework that we just discussed and thinking about your AIOps strategy the right way: instead of, again, the tail wagging the dog, have the dog wag the tail.
The next step, which is where the rubber hits the road, is how do you figure out the different use cases you have, and understand how to solve each one of them individually?
Yoram: And I think that opens up some great topics for conversation for next time, so I think we’ll end there. I want to thank you so much for talking to me today.
Elik: This was great, Yoram. Thank you.
Yoram: Thank you so much. And if you want to hear more CTO perspectives or learn more about AIOps or about the BigPanda platform, please visit us at BigPanda.io. See you next time.