“Our NOC team is excellent at what they do, but we could never hire enough engineers to investigate every alert manually, particularly on peak traffic days when the business relies on us most. We added BigPanda to our operational tools suite to help us find the right alert before any customers are impacted. We evaluated many products and selected BigPanda because of its modern user interface, tight integration with our ServiceNow ticketing system, and native SaaS architecture.”
– Vismay Thakkar, Gap Senior IT Director
As the scope of Gap’s services and the complexity of its infrastructure have grown, the volume of monitoring alerts has steadily increased, and so has the impact of downtime. Peak periods of demand, like Cyber Monday, place significant load on infrastructure, which complicates capacity and problem management for engineering and ops teams. Unfortunately, what hasn’t increased is Gap’s IT headcount. To meet high service expectations, Gap needed to understand which alerts were actionable, triage operational incidents and identify their root cause, and spot trends to prevent the same issues from recurring. After evaluating many traditional and modern event management solutions, Gap selected BigPanda.
Gap deployed BigPanda to aggregate and correlate monitoring alerts from systems management tools like Nagios and log analytics tools like Splunk. Operational incidents are synchronized with ServiceNow tickets to ensure integration with Gap’s existing NOC collaboration process. Previously, alerts from Nagios created thousands of noisy ServiceNow tickets, making it difficult for NOC engineers to quickly identify critical issues. Using BigPanda’s native ServiceNow integration, Gap was able to dramatically reduce ticket volume in ServiceNow and keep the remaining tickets updated in real time.
Gap used BigPanda to correlate all alerts into meaningful incidents in ServiceNow. For example, if there is a problem on the network edge, BigPanda might examine dozens of alerts and determine that a single switch went offline. It then populates one ticket in ServiceNow that summarizes all related issues for that device. BigPanda keeps its incident records linked to the corresponding incidents in ServiceNow, so it knows when to update an existing incident rather than create a new one. This dramatically reduces the total number of incidents. It’s an example of the intelligence that BigPanda brings to alert monitoring, correlation, and incident analysis.
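The update-rather-than-create behavior described above can be illustrated with a minimal Python sketch. This is not BigPanda's correlation logic or the ServiceNow API; it assumes, purely for illustration, that alerts are grouped by a `device` field, and the `Incident` and `Correlator` names and the sample alert data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    """One open incident, standing in for a single ticket (hypothetical model)."""
    device: str
    alerts: list = field(default_factory=list)

    def summary(self) -> str:
        # Collapse repeated checks so the summary lists each check name once.
        checks = ", ".join(sorted({a["check"] for a in self.alerts}))
        return f"{self.device}: {len(self.alerts)} alert(s) [{checks}]"


class Correlator:
    """Groups alerts by device, an assumed and deliberately simple correlation key."""

    def __init__(self):
        self.open_incidents = {}  # device name -> Incident

    def ingest(self, alert: dict) -> str:
        """Attach the alert to an open incident, or open a new one.

        Returns "updated" or "created" so the caller can see which path ran.
        """
        device = alert["device"]
        incident = self.open_incidents.get(device)
        if incident is not None:
            incident.alerts.append(alert)
            return "updated"
        self.open_incidents[device] = Incident(device=device, alerts=[alert])
        return "created"


def demo():
    """Feed four made-up alerts through the correlator."""
    correlator = Correlator()
    alerts = [
        {"device": "edge-switch-01", "check": "ping"},
        {"device": "edge-switch-01", "check": "port-status"},
        {"device": "edge-switch-01", "check": "ping"},
        {"device": "web-07", "check": "http"},
    ]
    actions = [correlator.ingest(a) for a in alerts]
    return correlator, actions
```

In this sketch, four alerts produce only two incidents, because the second and third alerts for the same switch update the incident that the first one opened. A real correlation engine would likely use richer keys, such as topology or time windows, than a single device name.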