Context-aware automation with BigPanda and Ansible®
Many ITOps organizations we speak with want to reach a state of self-healing systems that identify and resolve issues without human intervention. Thanks to progress in AI and ML, AIOps has advanced significantly in automating many of the steps involved in identifying and triaging incidents.
We ask ITOps leaders why they aren’t taking the next step with auto-remediating incident response workflows. The benefits of eliminating repetitive and manual tasks would clearly dwarf the initial cost and would help ITOps scale by focusing on high-impact work.
Yet, we receive a chorus of objections. Udo Strick, Principal Architect of Enterprise Systems Management at Waste Management, explains the problem perfectly during this road to AIOps webinar:
“Put bluntly, computers are dumb and lack creativity. They do only what they’re told, so we have to know exactly what we want them to do before they can do it. This is why only standardized processes can be automated successfully.”
Automating incident response slams to a halt in the face of disconnected processes and siloed tools and systems.
AIOps as a cornerstone to automation strategy
BigPanda’s Operational Intelligence and Automation Platform, powered by AIOps, simplifies a complex and ever-evolving IT landscape by turning IT noise into insights and automated actions. AIOps is a journey, and we guide our customers through three main milestones to building an event-driven automation practice that enables IT Operations to confidently automate manual incident response activities.
Standardize and automate how incidents are detected: Unify disconnected processes, best-of-breed tools, and fragmented data across products and services to improve alert quality and identify actionable, important alerts.
Standardize and automate how to triage incidents: Rich contextual information added to incidents, such as topology, CMDB, and change data, gives IT operations the technical insight needed to standardize, prioritize, and automate the incident triage process, including root cause analysis, across L2 and L3 resources and ITSM teams.
Today, we’re excited to announce new capabilities for customers to standardize and auto-remediate incident response workflows. A certified integration with Red Hat® Ansible® makes it easy to remediate incidents through Ansible® based on the incident conditions and business rules provided by BigPanda. This powerful combination gives both IT operations teams and automation tools the necessary intelligence to “know exactly what we want them [automation] to do before they can do it.”
Jumpstart your automation journey with BigPanda and Red Hat® Ansible®
Native integrations for both the Ansible® Automation Platform and event-driven automation are now generally available. We also created a Content Collection within the Ansible® Automation Hub that gives users pre-built automation scripts and modules, reducing the complexity, time, and developer resources needed to start automating repetitive tasks.
Imagine the following use case:
- A server has reached CPU capacity, which produces a single alert, or a server is down, which produces many alerts across different network, infrastructure, database, and application performance monitoring tools. In both cases, ITOps has decided that the first step is to restart the server. That logic is enriched into the alert or incident as a tag.
- The alert or incident payload is automatically sent to Ansible®’s event-driven automation through our native integration, which runs the remediation. In this case, that means restarting the server.
- Based on the success or failure of the automation, the alert or incident tag is updated in BigPanda.
- The outcome is recorded in BigPanda’s incident feed and documented in Unified Analytics for analysis and insight.
All of these actions are now available for use and can be found in the Ansible® Automation Hub. No scripting or custom code is required. A host of other automation libraries from observability vendors is available there as well. If restarting the server doesn’t resolve the incident and you know the second step you want to take, you can carry it out with your observability tool using Ansible®. This is the power of intelligent automation by BigPanda.
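The integration itself requires no custom code, but the flow in the use case above can be made concrete with a short sketch. The Python below is illustrative only and does not use the BigPanda or Ansible® APIs: the payload shape, the `remediation` and `remediation_status` tag names, and the `restart_server` function are all hypothetical stand-ins for the enriched tag, the status update, and the playbook run.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a playbook run; a real setup would hand this
# step to Ansible rather than execute it in Python.
def restart_server(host: str) -> bool:
    """Pretend to restart the host and report success."""
    print(f"restarting {host}")
    return True

# Hypothetical mapping from the enriched tag value to a remediation action.
ACTIONS: Dict[str, Callable[[str], bool]] = {"restart_server": restart_server}

def remediate(incident: dict) -> dict:
    """Run the action named in the incident's tag and record the outcome."""
    tags = incident.get("tags", {})
    action = ACTIONS.get(tags.get("remediation", ""))
    if action is None:
        status = "skipped"  # no agreed first step was enriched into the incident
    else:
        try:
            status = "succeeded" if action(incident["host"]) else "failed"
        except Exception:
            status = "failed"
    # Return a copy with the outcome written back as a tag (assumed tag name).
    updated = dict(incident)
    updated["tags"] = {**tags, "remediation_status": status}
    return updated

result = remediate({"host": "web-01", "tags": {"remediation": "restart_server"}})
```

The point of the sketch is the ordering: the decision about what to do lives in the incident tag, the automation only executes it, and the result flows back onto the same incident for later analysis.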
Single orchestration layer for incident automation
A capacity-related issue is resolved differently than a change-related issue, which in turn is resolved differently than a component failure. Yet all three may produce similar alerts that do not identify the cause of the incident.
Most BigPanda customers have between 15 and 25 monitoring and observability tools. When they focus on invoking automation directly from those tools, they can find themselves automating actions across multiple alerting systems without clarity about the conditions under which a playbook should be triggered. This creates significant overlap and confusion.
BigPanda acts as a system of record by integrating and correlating alerts and incidents from various monitoring and observability tools. It can identify when multiple alerts are related to a single underlying issue. By intelligently focusing on automating actions related to the cause, and not just the symptoms, BigPanda and Ansible® help customers minimize observability redundancy and automation overlap, which in turn reduces the number of automation scripts, actions and licensed nodes required.
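To illustrate why acting on correlated incidents reduces automation overlap, here is a minimal sketch in Python. It is not BigPanda's correlation logic: grouping by `host` is a deliberately simple assumption, and the alert fields (`host`, `source`, `remediation`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def correlate(alerts: List[dict]) -> Dict[str, List[dict]]:
    """Group alerts by host (an assumed, intentionally naive correlation key)."""
    incidents: Dict[str, List[dict]] = defaultdict(list)
    for alert in alerts:
        incidents[alert["host"]].append(alert)
    return dict(incidents)

def plan_actions(alerts: List[dict]) -> List[Tuple[str, str]]:
    """Return one (host, action) pair per incident instead of one per alert."""
    plans = []
    for host, group in correlate(alerts).items():
        # Hypothetical rule: use the first enriched remediation found in the group.
        action = next((a["remediation"] for a in group if a.get("remediation")), None)
        if action:
            plans.append((host, action))
    return plans

alerts = [
    {"host": "db-01", "source": "network-monitor"},
    {"host": "db-01", "source": "apm", "remediation": "restart_server"},
    {"host": "db-01", "source": "infra-monitor"},
]
plans = plan_actions(alerts)
```

Three alerts from three tools yield a single planned action for `db-01`, rather than each tool triggering its own playbook.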
Overcoming InfoSec hurdles for intelligent automation
Building a sustainable, intelligent automation pipeline requires transparency and trust that the right automation will be invoked at the right time. Part of building that trust is working with InfoSec teams, who must approve an open gateway that lets SaaS-based platforms invoke automation directly from the web. The data security risks and potential impact on the organization are daunting.
Security and compliance requirements for enterprise systems almost always call for an API gateway and an intermediary (e.g., Kafka) to receive specific instructions to invoke automation on IT systems. This leads to lengthy security reviews, costly software licenses, and integration challenges.
BigPanda’s native integration with Ansible® event-driven automation overcomes these security challenges. Here’s how:
- Ansible® event-driven automation removes the need for complex gateways, queues, and other API security requirements.
- BigPanda provides the logic to securely execute stateful automation on complex use cases containing multiple alerts and root causes.
- The BigPanda and Ansible® integration provides a single integration point, making it easy to invoke automation from BigPanda in a way that meets security standards.
Build trust as you evolve your automation maturity
Automation that is unbounded and has the potential to create, change, delete, or add data, users/assets, and connections could have harrowing effects on IT. A key challenge is identifying the right problem, one with a solution that can be automated, and then understanding how the decision was reached and how the automation arrived at its resolution. Essentially, it’s everything that Udo mentioned about how intelligent, context-based automation can overcome the challenge of ‘computers being dumb’.
Our recommendation is to start small. Use Unified Analytics within BigPanda to identify the proverbial “low-hanging fruit” such as alerts and incidents that repeat on a consistent basis. Document and record what the agreed response is to the incidents in the form of enrichment and tagging. Start automating the response using Ansible®, and leverage analytics within BigPanda to show the value you are delivering to your organization.
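As a rough illustration of what finding "low-hanging fruit" means in practice, the sketch below counts how often the same alert recurs in an exported list. It is not how Unified Analytics works internally; the `host` and `check` fields and the `min_count` threshold are assumptions chosen for the example.

```python
from collections import Counter
from typing import List, Tuple

def repeat_offenders(alerts: List[dict], min_count: int = 3) -> List[Tuple[tuple, int]]:
    """Return (host, check) pairs seen at least min_count times, most frequent first."""
    counts = Counter((a["host"], a["check"]) for a in alerts)
    return [(key, n) for key, n in counts.most_common() if n >= min_count]

# Hypothetical alert history exported for analysis.
history = [
    {"host": "web-01", "check": "cpu"},
    {"host": "web-01", "check": "cpu"},
    {"host": "db-01", "check": "disk"},
    {"host": "web-01", "check": "cpu"},
]
candidates = repeat_offenders(history)
```

Alerts that clear the threshold are reasonable first candidates for an agreed, documented response that can later be automated.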
Get started today
The BigPanda and Ansible® integration presents a holistic solution to the challenges faced by IT organizations in their quest for automation. By following the stages outlined in this article, you can lay a solid foundation for intelligent incident management and automation, ultimately enhancing efficiency and lightening the load on your IT teams.
For existing BigPanda customers, the Ansible® integration is generally available. You can deploy our certified integration to the Ansible® Automation Platform and event-driven automation by following the directions on the Ansible® Content Collections page.
If you are an Ansible® customer, connect with your account manager to set up a joint demo with BigPanda and see the potential of automating incident response steps and the advantages of more efficient, streamlined IT operations.