The doctor is in: why domain agnostic AIOps is a necessity for diagnosis
Date: September 14, 2021
Category: The CTO Perspective
Author: BigPanda
Gartner recently identified two different high-level categories of AIOps: domain-centric and domain-agnostic.
In this CTO Perspective discussion, BigPanda CTO Elik Eizenberg explains the difference between the two, and why you would need the latter to gain an overall understanding of how your IT Ops is functioning.
Read the skinny for a brief summary, then either lean back and watch the interview, or if you prefer to continue reading, take a few minutes to read the transcript. It’s been lightly edited to make it easy for you to consume it. Enjoy!
The Skinny
In its latest market guide, Gartner discusses the difference between domain-agnostic AIOps and domain-centric AIOps, and why it believes the market is moving towards the former, as it allows enterprises to tackle complex IT Ops use cases. An analogy that helps clarify the difference between the two is medical diagnosis. Medical devices are getting smarter, using algorithms to improve the information they provide to medical professionals. But as any doctor will tell you, a true diagnosis can only be made by combining the information from several relevant devices, which together provide the big picture. AI in medical diagnosis requires ingesting and analyzing data from many sources and combining that data into contextual, insight-rich information that leads to an accurate diagnosis. AIOps is very similar: alert correlation, root cause analysis and automation can only be performed by ingesting and analyzing alerts and contextual data from all monitoring and operational tools in the IT stack, or in short, by domain-agnostic AIOps. What is the right strategy for doing so and what are the challenges? Watch the interview to find out.
The Interview
The Transcript
Yoram: Hello and welcome to the CTO Perspective, where we discuss unique perspectives about the most current issues in IT operations. Today we’ll be talking about domain-agnostic AIOps with Elik Eizenberg, Chief Technology Officer and co-founder at BigPanda. Hi Elik.
Elik: Hi, Yoram. Great to be here as always.
Yoram: Always great seeing and talking to you. So let’s talk about domain-agnostic AIOps. Gartner, in its recent Market Guide for AIOps, stated that there is no future for IT operations without AIOps, and then went on to discuss two general types of AIOps: domain-centric AIOps and domain-agnostic AIOps. You had the chance to participate in a webinar with one of the lead analysts who wrote this guide. Can you elaborate a little bit on what domain-agnostic AIOps is and what domain-centric AIOps is?
Elik: Let me start by saying that I think the most recent market guide from Gartner is the most mature articulation of what AIOps should be and is today. As you mentioned, Gartner said there are essentially two different high-level categories of AIOps: domain-centric and domain-agnostic.
Domain-centric is about monitoring and observability tools implementing some AI capabilities around their first party data. So essentially taking whatever data they already have and introducing some AI capabilities on top of that to solve specific use cases. Domain-agnostic AIOps is where you combine data points or data from various datasets, various tools. You connect to multiple monitoring tools, multiple change tools, topology tools, and you find ways to turn that combined data across those different systems into insight or action automatically.
Yoram: Ok, so maybe a small example of each.
Elik: Yeah, of course. On the domain-centric use case, let’s say you have an APM, an application performance monitoring tool that tracks your latency for a specific application. A domain-centric approach for AIOps would be: let’s identify anomalies. For example, let’s identify when that latency is higher than usual for that time of day and automatically create an alert or a warning based on that trend analysis for the APM metric. A domain-agnostic use case would be taking alerts coming in from your log management system, your APM tool, your network management system and your real user monitoring tool, normalizing them into a consistent data model, and then correlating them together into a logical incident across all of these different tools and data sources.
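To make the domain-centric example concrete, here is a minimal Python sketch of a time-of-day latency check. It is an illustration only, not how any particular APM product works: the function names, the sample data and the three-standard-deviation threshold are all assumptions chosen for this example.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_hourly_baseline(samples):
    """Build {hour: (mean, stdev)} from (hour_of_day, latency_ms) pairs.

    Hypothetical helper: hours with fewer than two samples are skipped,
    because a standard deviation needs at least two points.
    """
    by_hour = defaultdict(list)
    for hour, latency_ms in samples:
        by_hour[hour].append(latency_ms)
    return {h: (mean(v), stdev(v)) for h, v in by_hour.items() if len(v) >= 2}


def is_anomalous(baseline, hour, latency_ms, threshold=3.0):
    """Return True when latency is more than `threshold` standard
    deviations above the usual value for that hour of the day."""
    if hour not in baseline:
        return False  # no history for this hour, so nothing to compare with
    mu, sigma = baseline[hour]
    if sigma == 0:
        return latency_ms > mu
    return (latency_ms - mu) / sigma > threshold


# Made-up history: latency is normally higher at 09:00 than at 03:00.
history = [(9, 120), (9, 130), (9, 125), (9, 128),
           (3, 40), (3, 42), (3, 41), (3, 43)]
baseline = build_hourly_baseline(history)

# 120 ms is ordinary at 09:00 but far above the norm at 03:00.
morning_alert = is_anomalous(baseline, 9, 120)
night_alert = is_anomalous(baseline, 3, 120)
```

The same reading is judged against a different baseline depending on the hour, which is the "higher than usual for that time of day" idea. Note that everything here works on a single metric from a single tool.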
Yoram: Maybe to get a better feel and understand when to use each one and what the differences between them are: I heard this actually from you, a really smart analogy, you’ve compared it to diagnosis in the medical world.
Elik: 100 percent. I love that analogy because it’s simple and it really clarifies the strengths and weaknesses of each approach. When you think about diagnosis, we’re already seeing a lot of AI technology being embraced by the different health care vendors. Let’s say you have a thermometer. And that thermometer checks if you have a fever, measures your temperature. That thermometer can provide AI technology that identifies anomalies in your temperature. Maybe your temperature is too high for your age, or your sex, or a certain time of day. You might have another tool, let’s say a device measuring your blood pressure. That device might also have some AI technology that surfaces the trends in your blood pressure, whether they’re positive or negative trends. So obviously, these are very good AI capabilities, but they are limited to first-party datasets. For the thermometer use case, they can work only on the temperature data and for the blood pressure use case, only on the blood pressure data. That’s valuable, but it’s limited to only one type of dataset. Now, a domain-agnostic approach would combine different data points together from different tools to provide insight.
We all know that in the world of diagnosis, what is really important is combining different data points to give you the big picture rather than looking at a specific silo.
So, in the health care industry, I think it’s very clear that domain-agnostic would probably be the superior approach for AI.
Yoram: So, what you’re saying is: a doctor needs all this data from different data types and different datasets to understand what’s going on with your body. Similarly, if you’re talking about AIOps, what you’re saying is that domain-centric AIOps doesn’t provide you with the big picture. You’re missing the big picture. That would be the biggest challenge.
Elik: Yeah, exactly. I would say this: Domain-agnostic is about different datasets across your stack, across your infrastructure. So, it will help you consume data from both your legacy environment and modern environment, cloud environment and on-prem environment and also across your stack. This means network data, storage data, virtualization data, and also high-level application data and user data. It can handle all the different types of data sets, normalize them, and process them to generate insight.
Yoram: And couldn’t you achieve that by just having domain-centric AIOps in each one of your tools?
Elik: Not really. What you can do is go on this long journey of introducing AIOps into each one of your tools. So, if you have 15 or 20 tools, as most companies do, you’ll have to go on a big journey of configuring, optimizing AIOps for each one of those tools. And even then, you will still have a siloed picture. Each tool will know what’s happening with its own first party data, but there will not be any tool that will show you the big picture, how the different data points correlate, or make sense together.
Yoram: Ok, so I assume that’s what domain-agnostic AIOps solves: providing you with that big picture, with just one tool.
Elik: 100%. Essentially, what AIOps tools do is connect to your different data sources and collect data directly from your APM tools, log management tools, time series tools and user monitoring tools, as well as your change data sources, such as CI/CD tools and change management tools, and topology data sources like virtualization and cloud topology and service discovery. They take all these datasets, normalize them, enrich them and then leverage them to correlate data points together to provide good insight.
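The normalize, enrich and correlate steps described above can be sketched in a few lines of Python. This is a deliberately simple, rule-based illustration under stated assumptions, not BigPanda's implementation and not machine learning: the `Alert` model, the payload field names (`hostname`, `ts`, `node`, `time_ms`), the host-to-service topology map and the five-minute window are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """Common data model that every tool's alerts are mapped onto."""
    source: str
    host: str
    timestamp: float  # seconds
    message: str
    service: str = "unknown"


def normalize(source, raw):
    """Map a tool-specific payload onto the common Alert model.
    The payload shapes are invented for this example."""
    if source == "apm":
        return Alert(source, raw["hostname"], raw["ts"], raw["summary"])
    if source == "logs":
        return Alert(source, raw["node"], raw["time_ms"] / 1000, raw["line"])
    raise ValueError(f"no normalizer for source {source!r}")


def enrich(alert, topology):
    """Attach the owning service using a host -> service topology map."""
    alert.service = topology.get(alert.host, "unknown")
    return alert


def correlate(alerts, window_s=300):
    """Group alerts into incidents: same service, and each alert within
    `window_s` seconds of the previous alert in that incident."""
    incidents = []
    open_incident = {}  # service -> most recent incident for that service
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = open_incident.get(alert.service)
        if current and alert.timestamp - current[-1].timestamp <= window_s:
            current.append(alert)
        else:
            current = [alert]
            open_incident[alert.service] = current
            incidents.append(current)
    return incidents


topology = {"web-01": "checkout", "db-01": "checkout", "web-09": "search"}
raw_events = [
    ("apm", {"hostname": "web-01", "ts": 1000.0, "summary": "latency high"}),
    ("logs", {"node": "db-01", "time_ms": 1060000, "line": "pool exhausted"}),
    ("logs", {"node": "web-09", "time_ms": 1100000, "line": "slow rebuild"}),
    ("apm", {"hostname": "web-01", "ts": 5000.0, "summary": "latency high"}),
]
alerts = [enrich(normalize(src, raw), topology) for src, raw in raw_events]
incidents = correlate(alerts)
```

In this made-up data, the APM alert on `web-01` and the log alert on `db-01` land in the same incident because the topology map places both hosts under the `checkout` service and they occur a minute apart. Neither tool could make that connection from its own data alone.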
Yoram: Ok, but I’m sure domain-centric tool vendors would argue that they know their datasets better than any domain-agnostic tool could ever understand them.
Elik: Yeah, that’s a very valid point, actually. Nobody knows their data better than the vendor generating the data. So, if I’m a log management tool, I understand my log data best. Nobody can compete with that. In fact, I have access to more layers of that data than anyone else because I don’t necessarily expose the full dataset via API.
So, if you are looking for a deeper solution around a very particular use case, you might want to use the domain-centric solution. Take that latency example I provided earlier. If your main problem is around detecting outliers in your latency datasets, for sure, a domain-centric AIOps solution on top of your APM would be best. If your main problem is how to turn all the existing visibility you already have from all these different sources, from all these heterogeneous datasets, into insights, then a domain-centric tool would not be enough.
Yoram: And then you need a domain-agnostic tool to take that in and tie it into other data points.
Elik: Exactly.
Yoram: So even if they know their own data better, the ability to connect the dots, so to speak, and create a big picture, is a big advantage that they cannot provide.
Elik: Yeah, exactly. I think that every IT leader or any architect evaluating what tools to use right now should really ask the question: what is my most burning pain? If that pain is related to a very particular use case for a particular dataset, nothing would be better than a domain-centric solution. But if the problem is, again, how to take heterogeneous, high-volume data and turn that into insight, then domain-centric solutions will just not get you there.
Yoram: If I’m looking into introducing a domain-agnostic solution into my organization, wouldn’t that be a big step? If we’re saying that domain-centric is just one part of your IT stack, wouldn’t it be smarter to just start with this one piece, rather than go to a domain-agnostic approach, which seems like a big step?
Elik: I would argue the opposite, for two reasons. The first: in the end what you’re trying to do is move the needle for your business. Improve your operational metrics, improve your costs, improve MTTR.
To move the needle for your business using a domain-centric approach, you’ll now have to go to every single tool, one by one, and start deploying, optimizing, and configuring AIOps solutions for that particular tool. And that can be a very long journey.
If it takes you a few good months to get acquainted with one good AIOps solution for just your APM or just your log management, it could easily take you years to get through the entire journey of configuring domain-centric AIOps for all your tools. That is one reason why domain-agnostic solutions have better time to value. The second reason is that you must take into account the strength of AI. Machine learning technology has a sweet spot, and that sweet spot is essentially taking a lot of heterogeneous data, high-volume heterogeneous data, and making sense of it without a lot of human involvement. You don’t have to rely on humans to start generating that insight.
In a domain-agnostic use-case you actually have more data, you have more diverse data. And this is exactly where AIOps or exactly where machine learning shines and provides good time to value while depending less on your people.
Yoram: So you’re saying it’s the opposite, actually. The fact that you are going agnostic, ingesting data from all these tools, makes your ML better, quicker. It provides you value much quicker than if you had gone domain-centric. So it would be essentially easier.
Elik: From my conversations with people in the industry and with BigPanda customers, it is clear that in the end, when you’re measured by your ability to improve operational KPIs, the domain-agnostic approach provides better time to value and, of course, a higher return on investment. Because you deploy once, you get value across your entire stack, rather than with a domain-centric approach, where you get a lot of value but it’s limited to a very particular tool, dataset and part of your stack.
Yoram: As you said, a lot of that also has to do with AI and ML, and the data they need. I think that opens a completely different conversation, which we’ll save for another time. I really want to thank you for this conversation, it has been illuminating.
Elik: Thank you so much, Yoram.
Yoram: And if you want to hear more CTO perspectives or learn more about AIOps, IT operations, or the BigPanda platform, please visit BigPanda.io. See you next time.