Not all alert correlation platforms are created equal
Ask yourself these questions to find the right fit in an alert correlation platform.
To maintain operational visibility in modern IT environments, companies are abandoning monolithic monitoring solutions from legacy vendors in favor of a modern set of “best of breed” monitoring tools. Today’s average IT monitoring stack consists of about 6-8 tools, drawn from categories such as systems monitoring, end user monitoring, application performance monitoring (APM), error detection, log analytics, chat, and ticketing. When service disruptions occur, operations engineers face a flood of alerts across different layers of the IT stack, with no fast way to figure out what’s really going on. Customers are left stranded while IT professionals struggle to detect, triage, and remediate urgent issues. Downtime abounds, which hurts revenue, performance, and brand loyalty.
Alert Correlation Platforms are a proven solution to this challenge. They intelligently gather alerts from fragmented monitoring tools and deliver high-level insights that operations teams can quickly understand and act upon. Correlating alerts accelerates incident resolution, which leads to less downtime and happier customers. Investing in the right alert correlation platform may very well determine how much time your CEO spends apologizing on social channels to customers about unresolved outages.
Keep in mind, however, that not all Alert Correlation Platforms are created equal. When conducting a proof of concept (POC), there is a wide variety of questions to ask to determine whether the solution meets your needs, how long it will take to deploy, and what your total cost of ownership will be. We developed this Q&A to help you decide which questions to ask when evaluating an Alert Correlation Platform.
Time to Value
Time to value is a key concern when evaluating an Alert Correlation Platform. Some solutions can be deployed in days, while others may take many months to deploy. Before choosing any vendor, you should carefully consider a variety of questions to determine both how long a POC will take to show value and how long it will take to deploy the platform into production.
Is the alert correlation platform a SaaS solution or does it have to be installed and maintained on-premises?
SaaS solutions are generally easier and faster to deploy and maintain. Customers can often conduct an initial proof of value in an afternoon. Contrast that with on-premises software, which can take several weeks just to get started. A few things to keep in mind when comparing SaaS and on-premises solutions:
- On-premises solutions are expensive: you have to invest significant capital and resources (in hardware, software, licensing, bandwidth, storage, and headcount) to keep them running.
- Alert correlation entails deep integrations with your monitoring tools. Your platform needs to stay connected to monitoring tools whose data models keep changing, because monitoring vendors revise those models as they release new versions. Maintaining the integrations on an ongoing basis is a headache you would rather leave to your SaaS vendor than take on yourself with an on-premises solution.
Figure 1: Out-of-the-box integrations, full list at bigpanda.io/integrations
Can I start seeing correlations straight out-of-the-box? Or does the correlation platform require algorithms to be trained with sample datasets?
Monitoring tools don’t have a standard data format. Each tool sends alerts in its own unique way. A good alert correlation platform understands these data models immediately, without requiring months of machine learning to start bringing value. BigPanda, for example, normalizes information from fragmented monitoring tools into a common data model, so that alert correlation can start as soon as data starts flowing into the system.
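To make the idea of a common data model concrete, here is a minimal sketch in Python. It is not BigPanda's implementation: the two payload shapes, the tool names `tool_a` and `tool_b`, and the `Alert` fields are all hypothetical, chosen only to show how differently shaped alerts can be mapped onto one structure before any correlation runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical common data model shared by every source."""
    source: str
    host: str
    check: str
    status: str  # one of "ok", "warning", "critical"


def normalize_tool_a(payload: dict) -> Alert:
    # Hypothetical tool A sends flat fields with an upper-case state string.
    return Alert(
        source="tool_a",
        host=payload["hostname"],
        check=payload["service"],
        status=payload["state"].lower(),
    )


def normalize_tool_b(payload: dict) -> Alert:
    # Hypothetical tool B nests the host and uses a numeric severity.
    severity_names = {0: "ok", 1: "warning", 2: "critical"}
    return Alert(
        source="tool_b",
        host=payload["entity"]["name"],
        check=payload["metric"],
        status=severity_names[payload["severity"]],
    )


# One normalizer per integration; adding a tool means adding one entry here.
NORMALIZERS = {"tool_a": normalize_tool_a, "tool_b": normalize_tool_b}


def normalize(source: str, payload: dict) -> Alert:
    """Map a raw payload from a known source onto the common Alert model."""
    return NORMALIZERS[source](payload)
```

Once both payloads are expressed as `Alert` objects, the correlation step can compare `host`, `check`, and `status` directly without knowing which tool produced them.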
Other alert correlation platforms base their correlation on natural language processing. That takes time. You may have to teach the platform with a large volume of alerts for an extended period of time before the correlation engine understands how to deal with each type of alert. Your alert correlation engine could take months to learn data models. The question is: do you want to wait that long to even determine if a correlation platform produces results to address your needs?
So ask the vendor: how quickly will you have alerts flowing in and how quickly will your correlations work? Will they work right out of the box? The response to this question will tie directly to the time it takes to find value from your alert correlation platform.
How long will it take to connect my existing monitoring tools?
Alert correlation platforms have to integrate with a wide variety of monitoring tools and collaboration platforms, and they must keep those integrations up to date. Point-to-point integrations require constant upgrades and support, which can be a maintenance headache.
Ask your vendor which integrations they support right out-of-the-box. Consider how many of these integrations are to legacy tools and how many are to modern ones. If your IT organization is moving toward a modern stack, you don’t want your alert correlation platform tied to legacy tools.
The answers to these questions will help you understand whether you can realize fast time to value.
Digging into the Correlation Platform
When an alert correlation platform is explicitly aware of the data models of the monitoring solutions that it connects with, you get faster time to value. Let’s look in greater detail at some other questions you should ask.
Is the correlation logic transparent and customizable?
If a vendor’s alert correlation logic is an impenetrable black box, your IT engineers won’t understand or trust it. Additionally, if you can’t customize or extend the correlation logic to suit your specific needs, it is unlikely to meet the requirements of highly dynamic environments where systems, teams, and priorities change constantly. Find out from any potential alert correlation vendor:
- Is your correlation logic transparent, and will my IT engineers be able to easily comprehend it?
- What is required to customize or extend the correlation logic so that my team can take full ownership of it?
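As an illustration of what transparent, customizable logic can look like, the sketch below expresses a correlation rule as plain data that an engineer can read and edit. It is an assumption-laden example, not any vendor's engine: the `CorrelationRule` fields, the alert dictionary shape (`timestamp` in seconds, `tags`), and the windowing behavior are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrelationRule:
    """Hypothetical rule: alerts that share these tag values belong together."""
    name: str
    match_tags: tuple
    window_seconds: int


def correlate(alerts, rule):
    """Group alerts sharing the rule's tag values into incidents.

    An alert joins the open incident for its key if it arrives within
    `window_seconds` of that incident's most recent alert; otherwise it
    starts a new incident. Alerts missing a tag get None in that position.
    """
    incidents = []
    open_by_key = {}
    for alert in sorted(alerts, key=lambda a: a["timestamp"]):
        key = tuple(alert["tags"].get(tag) for tag in rule.match_tags)
        incident = open_by_key.get(key)
        expired = (
            incident is not None
            and alert["timestamp"] - incident["last_seen"] > rule.window_seconds
        )
        if incident is None or expired:
            incident = {"rule": rule.name, "key": key, "alerts": [], "last_seen": None}
            incidents.append(incident)
            open_by_key[key] = incident
        incident["alerts"].append(alert)
        incident["last_seen"] = alert["timestamp"]
    return incidents
```

Because the rule is just a name, a list of tags, and a time window, changing how alerts are grouped means editing one value rather than retraining or reverse-engineering a model.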
What does cross-source correlation mean?
Some vendors describe cross-source correlation as simply taking alerts from different monitoring tools and relating them to each other to create incidents.
However, to find the right problem fast, you need to go further. You should be able to correlate across all your incidents to identify when your critical environments are affected. BigPanda, for instance, enables you to correlate incidents across any kind of logical grouping – such as business services, applications, customers, business units, or really any logical grouping of incidents that helps you quickly detect and triage what matters most to you.
For example, a team that is responsible for monitoring an e-commerce application will want to see all incidents related to failed e-commerce transactions, including all monitoring alerts related to that application or business service. If your alert correlation platform doesn’t let you see what is happening across multiple incidents or “situations”, then important information can easily fall through the cracks.
Figure 2: Automatically group alerts for any app, micro-service, business service, team, customer, or any custom environment
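One simple way to picture logical groupings is as named filters over incident tags, so that a single incident can appear in several views at once. The sketch below assumes a hypothetical incident shape (a dictionary with a `tags` mapping) and hypothetical environment definitions; it does not describe how any particular product stores environments.

```python
def environments_for(incident, environments):
    """Return the names of every environment whose required tags all match."""
    return [
        name
        for name, required_tags in environments.items()
        if all(incident["tags"].get(k) == v for k, v in required_tags.items())
    ]


def incidents_by_environment(incidents, environments):
    """Build a view per environment; an incident may land in more than one."""
    views = {name: [] for name in environments}
    for incident in incidents:
        for name in environments_for(incident, environments):
            views[name].append(incident)
    return views


# Hypothetical environment definitions: name -> tags an incident must carry.
ENVIRONMENTS = {
    "e-commerce": {"service": "checkout"},
    "payments-team": {"team": "payments"},
}
```

With these definitions, an incident tagged with both `service=checkout` and `team=payments` shows up in the e-commerce view and in the payments team's view, while an incident for an unrelated service appears in neither.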
Do I need an external dependency model for correlation?
If your alert correlation engine depends on an external dependency model such as a configuration management database (CMDB) to tell you how to interpret alerts or correlation logic, then it is your responsibility to keep that external dependency model up to date. This can quickly become a maintenance nightmare. Avoid it. Modern applications change dynamically, and effective alert correlation platforms infer dependencies automatically based on alert metadata, without needing an external dependency model.
Maximizing Your Investment in Your Ticketing Systems
Alert correlation platforms should work seamlessly with your existing ticketing and collaboration systems. An alert correlation platform is the perfect complement to your ticketing system. The right correlation platform should keep your ticketing system free of machine-generated noise while minimizing changes to existing workflows, tools, and processes. What questions should you ask to find out how to get the most from your existing ticketing or notification systems?
Are my existing ticketing systems and collaboration tools supported?
You probably have an existing ticketing or collaboration tool in place, such as ServiceNow, JIRA, PagerDuty, or Slack. A robust alert correlation platform has to integrate with a wide variety of collaboration platforms, enrich tickets and notifications, and keep ticket clutter to a minimum, while allowing you to leverage your existing investment. If the alert correlation vendor expects you to replace your existing incident management and collaboration platform, find out how much that will cost and how much effort it will require from you and your team.
Can the alert correlation platform create and update incident workflows?
The point of an alert correlation platform is to make it easier to view and resolve incidents in your collaboration tools. Here are a few pointers on what enables this:
- Your alert correlation platform should correlate alerts and create enriched incidents that contain all the information required to resolve an issue before opening a ticket in any collaboration tool.
- Your alert correlation engine should give you the option of automatically or manually opening tickets in your collaboration tools. Manual ticketing is important in case you need to triage or apply a human workflow before pushing enriched tickets into the collaboration tool.
- The alert correlation platform should maintain two-way, real-time communications with your ticketing systems. This ensures that tickets remain updated with relevant information as additional alerts and events occur. Be sure to ask any Alert Correlation vendor about two-way synchronization with your ticketing system.
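The points above can be sketched as a small two-way sync loop. Everything here is hypothetical: `InMemoryTicketSystem` stands in for a real ticketing API, and the method names and status values are illustrative only. The sketch shows one ticket per incident, later alerts appended to that ticket, and ticket status flowing back to the incident.

```python
class InMemoryTicketSystem:
    """Hypothetical stand-in for a real ticketing tool's API."""

    def __init__(self):
        self.tickets = {}
        self._next_id = 1

    def create(self, summary, details):
        ticket_id = f"TCK-{self._next_id}"
        self._next_id += 1
        self.tickets[ticket_id] = {
            "summary": summary,
            "details": list(details),
            "status": "open",
        }
        return ticket_id

    def append(self, ticket_id, detail):
        self.tickets[ticket_id]["details"].append(detail)

    def status(self, ticket_id):
        return self.tickets[ticket_id]["status"]


class IncidentSync:
    """Keeps incidents and tickets aligned in both directions."""

    def __init__(self, ticket_system):
        self.ticket_system = ticket_system
        self.ticket_by_incident = {}
        self.incident_status = {}

    def push(self, incident_id, summary, alert_description):
        """Platform to ticketing: open one ticket per incident, then update it."""
        ticket_id = self.ticket_by_incident.get(incident_id)
        if ticket_id is None:
            ticket_id = self.ticket_system.create(summary, [alert_description])
            self.ticket_by_incident[incident_id] = ticket_id
            self.incident_status[incident_id] = "open"
        else:
            self.ticket_system.append(ticket_id, alert_description)
        return ticket_id

    def pull(self):
        """Ticketing to platform: copy each ticket's status onto its incident."""
        for incident_id, ticket_id in self.ticket_by_incident.items():
            self.incident_status[incident_id] = self.ticket_system.status(ticket_id)
```

A real integration would replace the in-memory class with calls to the ticketing tool and would likely react to webhooks rather than poll, but the shape of the exchange stays the same: new alerts update an existing ticket instead of opening a new one, and a ticket resolved by an engineer is reflected back on the incident.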
Will you keep the CMDB updated on incidents?
If you have a CMDB, you will likely want to keep it up to date on incidents so that you gain visibility at the level of your business services and can deliver superior customer service. Be sure that your alert correlation platform can automatically update your CMDB with incident information as incidents occur.
Does ServiceNow certify and endorse the alert correlation platform?
In this section, we look at ServiceNow in particular, as it has gained traction as a popular IT collaboration tool over the past decade. What additional questions should you ask to get the most from your existing ServiceNow implementation?
Figure 3: store.servicenow.com
If you are a ServiceNow customer, you want the guarantee of a ServiceNow certified application for integration with your alert correlation platform. This will ensure that your alert correlation vendor:
- Seamlessly integrates with the ServiceNow platform
- Leverages best practices around incident management and resolution
- Has the flexibility to customize ticket creation to suit your needs within the ServiceNow application
Picking the right alert correlation platform is an important decision that can make or break your IT modernization strategy. BigPanda delivers on all of the criteria defined above, and much more. And like any SaaS platform, we provide a free online trial to get you started in minutes. Visit us at bigpanda.io/signup.