How to enrich IT alerts and add context with Alert Intelligence

I see it daily in my role: IT organizations pay for best-of-breed monitoring tools but struggle to tie together the data from these siloed systems. The pain of these silos gets worse when incidents arise. Incidents are costly for many reasons, including wasted company resources, potential revenue loss, lower customer satisfaction, and employee burnout. This is exactly why BigPanda exists: to apply AI to the complex problems that IT operations, NOC, SRE, and DevOps teams face daily.

BigPanda Incident Intelligence and Automation, powered by AIOps, consists of three practice areas: Alert Intelligence, Incident Intelligence, and Workflow Automation. Alert Intelligence covers setting up monitoring integrations and then filtering, suppressing, normalizing, deduplicating, aggregating, and enriching the alerting events that those monitoring tools produce.

For this blog post, let’s focus on my favorite part of Alert Intelligence: event enrichment. Event enrichment is SO powerful. Monitoring tools are crucial to alert you about various events within your ecosystem; however, they are often lacking critical information. Event enrichment allows us to fill that gap by adding context, such as the application that the monitoring source is alerting you about. I’ll walk you through a real-life example—but first, let’s get an understanding of the concept.

The value of enrichment

High-quality alerts are fundamental to optimizing an ITOps organization to be proactive, efficient, and effective.

The first step in any alert-quality improvement initiative is to adopt a mentality of continual enhancement. Improving alert quality is not a one-time configuration but a sequence of steps. It starts with identifying low-quality, disruptive alerts that have no operational or business context, and then specifying the essential criteria for uniform alert-quality standards and service-level agreements (SLAs).

Enrichment is the primary factor in addressing informational gaps in alerts, minimizing noise, enhancing operator efficiency, and building a foundation for long-term success.

How enrichment works

BigPanda’s event enrichment adds contextual information to your alerts, such as business segment, relevant CI/CD elements, or operational data. This context helps you promptly detect, understand, and resolve incidents. Enriched events also facilitate event correlation and workflow automation, so you can identify and respond to issues more effectively.

During event ingestion, BigPanda ingests raw event data and converts it into key-value pairs called tags. By establishing new tag rules based on these tags, event enrichment helps you add metadata and context to incoming events in your organization’s system.
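To make the idea of tags concrete, here is a minimal Python sketch of turning a raw event into flat key-value pairs. It is only a local illustration, not BigPanda’s actual normalization logic: the `to_tags` helper, the field names, and the sample host name are all hypothetical.

```python
def to_tags(raw_event: dict) -> dict[str, str]:
    """Flatten a raw event into string key-value pairs ("tags").

    Hypothetical helper for illustration only; the real platform
    applies its own normalization rules during ingestion.
    """
    tags = {}
    for key, value in raw_event.items():
        # Skip fields that carry no information.
        if value is None or value == "":
            continue
        # Normalize the key and store the value as a trimmed string.
        tags[key.strip().lower()] = str(value).strip()
    return tags


# Hypothetical raw event from a monitoring tool.
raw_event = {
    "Host": "web01-checkout-us-east-1-prod",
    "Check": "CPU load",
    "Status": "critical",
    "Description": None,
}

tags = to_tags(raw_event)
print(tags)
```

Once an event is a flat set of tags like this, each enrichment rule can read existing tags and write new ones.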

Including contextual information like location, host, or affected services significantly improves alert quality. With enriched events, organizations can maximize the value provided by their monitoring and observability tools.

BigPanda Enrichment Process

There are three enrichment types that we will walk through together.

  • Extraction: extract values from an existing tag to create new custom tags.
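  • Composition: create a new tag by combining the values of existing tags, optionally with static text.
  • Mapping: look up values in an uploaded mapping table and add the results as tags. A mapping rule is added automatically to the list of tag rules when a map includes a result tag value with the same name as a tag.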

Example use cases for enrichment

I recently worked with a retailer that was looking to bring in an AIOps tool to solve a variety of pain points, including:

  • A few monitoring tools were owned by their MSP, which meant there was no visibility into alerting until an issue was critical and a ticket was created.
  • Network and Infrastructure teams were completely siloed, leading to friction during critical incidents.
  • There was no single, unifying way to tie components together when alerts were generated across multiple sources.
  • They needed a way to cut through the noise and prioritize high-quality, enriched alerts correlated into incidents.

Now let’s look at an example of an alert before BigPanda and how each enrichment methodology helped this client solve their pain points. Below is the sample alert payload.

Sample alert payload in Postman

Before enrichment

Before any type of enrichment, here’s how that alert would come into the BigPanda UI:

Before enrichment

Adding enrichment

Extraction

The team shared that they had established a consistent host-naming convention:

{HOSTNAME} – {SERVICE} – {AWS REGION} – {ENV}

With this information, we set up a few extractions to bring out critical pieces of information. Now, we can correlate off tags like service, keep a pulse on AWS regional issues, and ignore alerts coming from non-production environments.

Tag-extraction process

We’re going to utilize some simple regular expressions to capture segments of the host name and build out new tags as shown below.

Creating a service tag from the second segment of the host name

BigPanda’s preview feature showing us our new service tag

Creating an aws_region tag from the third segment of our host tag

BigPanda’s preview feature showing us our new aws_region tag

Creating an env (environment) tag from the last segment of our host tag

BigPanda’s preview feature showing us our new env tag
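For readers who want to try the idea outside the UI, the extractions above can be sketched in Python. This is only an approximation under stated assumptions: the segments are assumed to be separated by plain hyphens, the sample host names are made up, and the `extract_tags` and `is_production` helpers are hypothetical. Because an AWS region such as `us-east-1` contains hyphens itself, the pattern matches the region explicitly instead of simply splitting on hyphens.

```python
import re

# Assumed layout: {HOSTNAME}-{SERVICE}-{AWS REGION}-{ENV}, hyphen-separated.
HOST_PATTERN = re.compile(
    r"^(?P<hostname>[^-]+)-"
    r"(?P<service>[^-]+)-"
    # Regions like us-east-1 or us-gov-west-1 contain hyphens themselves.
    r"(?P<aws_region>[a-z]{2}(?:-[a-z]+)+-\d+)-"
    r"(?P<env>[^-]+)$"
)


def extract_tags(host: str) -> dict[str, str]:
    """Return the service, aws_region, and env tags parsed from a host name.

    Returns an empty dict when the host does not follow the convention.
    """
    match = HOST_PATTERN.match(host)
    if match is None:
        return {}
    return {name: match.group(name) for name in ("service", "aws_region", "env")}


def is_production(tags: dict[str, str]) -> bool:
    """Assumes production hosts use the env value 'prod'."""
    return tags.get("env") == "prod"


extracted = extract_tags("web01-checkout-us-east-1-prod")
print(extracted)
print(is_production(extracted))
```

A host name that does not follow the convention yields no new tags, which makes it easy to spot naming drift before it affects correlation.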

Composition

They also shared that every service has a knowledge-base article associated with it for basic troubleshooting.

Tag-composition process

We were able to create a composition tag that automatically populates the alert with that information.

Creating our new kb_article tag

BigPanda’s preview feature showing us our new kb_article tag
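Conceptually, a composition rule fills a template with the values of existing tags. The Python sketch below illustrates that idea only. The knowledge-base URL, the template, and the `compose_kb_article` helper are hypothetical stand-ins, since the client’s real article URLs are not shown here.

```python
from string import Template

# Hypothetical knowledge-base URL pattern, for illustration only.
KB_TEMPLATE = Template("https://kb.example.com/articles/${service}")


def compose_kb_article(tags: dict[str, str]) -> dict[str, str]:
    """Return a copy of tags with a kb_article tag built from the service tag.

    If there is no service tag, the tags are returned unchanged.
    """
    enriched = dict(tags)
    service = tags.get("service")
    if service:
        enriched["kb_article"] = KB_TEMPLATE.substitute(service=service)
    return enriched


composed = compose_kb_article({"service": "checkout", "env": "prod"})
print(composed["kb_article"])
```

Because the composed tag depends on the service tag, the order of rules matters in this sketch: the extraction has to run before the composition.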

Mapping

Lastly, they had a spreadsheet that mapped host-to-application relationships as well as which team supports each application. We used our mapping function to bring this data into BigPanda.

BigPanda mapping process

This mapping enrichment enables us to identify team ownership within the alert and to correlate at the application level. We took the data in CSV format and used our API to upload it to BigPanda.

Mapping enrichment
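The lookup that a mapping table performs can be sketched locally in Python. The CSV columns, the sample rows, and the `load_mapping` and `apply_mapping` helpers below are assumptions for illustration. The sketch deliberately does not call the BigPanda API, and it says nothing about the format the real upload expects.

```python
import csv
import io

# Hypothetical rows standing in for the client's spreadsheet.
MAPPING_CSV = """host,application,team
web01-checkout-us-east-1-prod,Checkout,Payments Engineering
db02-orders-us-east-1-prod,Order Management,Data Platform
"""


def load_mapping(csv_text: str) -> dict[str, dict[str, str]]:
    """Index the mapping rows by host so each alert needs one lookup."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["host"]: {"application": row["application"], "team": row["team"]}
        for row in reader
    }


def apply_mapping(
    tags: dict[str, str], mapping: dict[str, dict[str, str]]
) -> dict[str, str]:
    """Return a copy of tags with application and team added when the host is known."""
    enriched = dict(tags)
    enriched.update(mapping.get(tags.get("host", ""), {}))
    return enriched


mapping = load_mapping(MAPPING_CSV)
mapped = apply_mapping({"host": "web01-checkout-us-east-1-prod"}, mapping)
print(mapped)
```

Hosts that are missing from the table pass through unchanged in this sketch, so a gap in the spreadsheet shows up as an alert without team or application tags.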

The result

With a few simple steps, we were able to enrich the alerts and convert low-quality alerts into high-quality ones, without making any changes to the organization’s monitoring tools.

Convert low-quality alerts to high-quality alerts

Now that we’ve established a foundation for event enrichment via Alert Intelligence, future blog posts will discuss other aspects and features of Alert Intelligence. If you want to dive deeper, we recently launched an Alert Intelligence certification program within BigPanda University.

If you want to experience for yourself how BigPanda’s AIOps Incident Intelligence and Automation platform helps prevent and resolve IT outages, try our self-guided tour today.