How Sony improves IT incident management with AIOps

IT Operations (ITOps) and Incident Management teams face ever-growing challenges. They have to maintain and modernize increasingly complex IT infrastructures while simultaneously managing extremely high alert volumes and ensuring seamless service delivery. This situation leaves these teams feeling overwhelmed, like firefighters tasked with extinguishing blazes while also being expected to design and build a better firetruck.
This was the situation facing Ben Narramore, the Director of Global Operations and Service Management at Sony PlayStation. In a recent webinar, Narramore described how his Network Operations Center (NOC) and Incident Management teams grappled with overwhelming alert volume and manual, reactive processes. These problems strained resources and hurt service quality, so Sony partnered with BigPanda.
“We were dealing with an overwhelming volume of alerts and were constantly escalating and having to wake everyone up at 2 am to get the whole team involved,” said Narramore. “BigPanda helped us turn down the noise and link our data together. My operations folks now have a single place where they can go to get answers and solutions.”
With BigPanda, Sony was able to accelerate incident response, improve operational efficiency, save time and money, and enhance the work-life balance of its IT teams.
Turning down IT alert noise with AIOps
One of Sony’s most pressing challenges was the overwhelming scale and velocity of alert data flooding into the NOC. The amount of data generated by Sony’s systems had grown beyond what people could process by hand. It overwhelmed the L1 NOC and Service Desk teams tasked with manually aggregating, analyzing, and summarizing this data. Gaining situational awareness of incident impact, priority, and assignment was a lengthy process that involved too many teams.
Sony began its AIOps journey with AI-powered alert correlation, which significantly reduced alert noise and made alerts more actionable through critical insights and workflow automation. By aggregating, synthesizing, and transforming siloed operational data into context-rich insights, Sony streamlined incident response and enhanced issue detection across its systems.
With context-rich, actionable alerts, Sony’s NOC teams can focus on resolving critical issues instead of manually sifting through mountains of alert data, drastically improving response times and overall efficiency.
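To make the idea of alert correlation concrete, here is a minimal sketch in Python. It is not BigPanda's algorithm, which the webinar does not describe; it assumes a simple rule in which alerts on the same host that arrive within a few minutes of each other belong to one incident. The `Alert` and `Incident` types, the field names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    source: str      # monitoring tool that raised the alert (hypothetical)
    host: str        # affected host or service
    check: str       # what was being checked
    timestamp: int   # seconds since epoch


@dataclass
class Incident:
    host: str
    alerts: list = field(default_factory=list)

    @property
    def last_seen(self) -> int:
        return max(a.timestamp for a in self.alerts)


def correlate(alerts, window_seconds=300):
    """Group alerts on the same host that arrive within window_seconds of each other."""
    incidents = []
    open_by_host = {}  # host -> most recent incident for that host
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = open_by_host.get(alert.host)
        if current is not None and alert.timestamp - current.last_seen <= window_seconds:
            current.alerts.append(alert)
        else:
            # Assumed rule: a quiet gap longer than the window starts a new incident.
            current = Incident(host=alert.host, alerts=[alert])
            incidents.append(current)
            open_by_host[alert.host] = current
    return incidents


alerts = [
    Alert("metrics", "db-01", "cpu_high", 1000),
    Alert("logs", "db-01", "slow_queries", 1060),
    Alert("synthetic", "web-02", "http_5xx", 1100),
    Alert("metrics", "db-01", "disk_full", 5000),
]
incidents = correlate(alerts)
print(len(alerts), "alerts ->", len(incidents), "incidents")
```

Even this toy rule folds the two `db-01` alerts raised a minute apart into one incident, so a responder sees one item with two pieces of evidence instead of two separate pages. A production system would likely correlate on more than host and time.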
“It’s about speed; depending on your business, seconds can mean thousands or even millions of dollars,” said Narramore. “BigPanda delivers correlated data from various data sources directly to our NOC teams so they can assign the right teams faster.”
These operational gains also improved the work-life balance of the operations teams. With fewer after-hours escalations and reduced alert fatigue, Sony’s IT staff experienced less burnout and stress.
Enhancing the value of ServiceNow with actionable tickets
Beyond reducing alert fatigue and delivering context for the NOC, Sony sought to improve its entire incident management process. A key aspect of this transformation was delivering those context-rich insights to Incident Management teams using the integration between BigPanda and ServiceNow.
Before adopting BigPanda, Sony’s Incident Management teams (who work in ServiceNow) were disconnected from the event and alert data within the NOC. BigPanda AIOps gave the NOC enhanced context and improved visibility into problems, allowing it to proactively detect and assign issues before they became critical incidents. However, a communication gap remained between the NOC and Incident Management teams. Incident tickets lacked essential context, such as the reason for ticket creation, priority, impact, and assignment. As a result, Incident Management teams struggled to validate problems, understand their causes, and identify solutions, which led to delays, unnecessary escalations, and extended resolution times.
To bridge this gap, Sony integrated BigPanda into ServiceNow, enriching incident tickets with the context that BigPanda delivers to the NOC. BigPanda automatically synchronizes incident data from various sources, giving these teams access to critical contextual information to triage and investigate incidents faster, directly within ServiceNow. This streamlines communication and enables faster decision-making, making it easier for responders to understand, prioritize, and resolve issues without unnecessary escalations.
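As a rough illustration of what an enriched ticket might carry, the sketch below builds a ticket payload from a correlated incident using plain Python dictionaries. It does not call the BigPanda or ServiceNow APIs, and every field name (`short_description`, `priority`, `impacted_hosts`, `assignment_group`, `reason`) is an assumption chosen to mirror the context the article says tickets used to lack.

```python
def build_ticket(incident: dict) -> dict:
    """Turn a correlated incident into a ticket payload that carries its context."""
    alerts = incident["alerts"]
    severities = ["critical", "warning", "info"]  # ordered highest first
    # The ticket inherits the worst severity seen across its alerts.
    worst = min((a["severity"] for a in alerts), key=severities.index)
    hosts = sorted({a["host"] for a in alerts})
    return {
        "short_description": f"{worst.upper()}: {len(alerts)} related alerts on {', '.join(hosts)}",
        "priority": severities.index(worst) + 1,  # 1 is the most urgent
        "impacted_hosts": hosts,
        "assignment_group": incident.get("owner", "unassigned"),
        "reason": "; ".join(f"{a['host']} {a['check']}" for a in alerts),
    }


# Hypothetical incident, as it might look after correlation in the NOC.
incident = {
    "owner": "database-team",
    "alerts": [
        {"host": "db-01", "check": "cpu_high", "severity": "warning"},
        {"host": "db-01", "check": "slow_queries", "severity": "critical"},
    ],
}
ticket = build_ticket(incident)
print(ticket["short_description"])
```

The point of the sketch is that the reason, priority, impact, and assignment travel with the ticket, so a responder does not have to go back to the NOC to ask why it was opened.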
Shifting from reactive to proactive incident response
Before adopting AIOps, Sony PlayStation’s IT operations were largely reactive. The operations teams found themselves constantly reacting to issues after they had already impacted services, rather than proactively preventing them. This firefighting approach wasn’t sustainable and led to increased operational costs and fatigued staff.
AI-powered Root Cause Analysis from BigPanda helped Sony PlayStation change this dynamic and move from reactive incident management to proactive investigation. By quickly identifying the underlying causes of incidents, Sony’s Incident Management teams can proactively investigate and address issues before they escalate into major outages.
“Seven years ago, we were just reacting to things and putting out fires constantly,” said Narramore. “Now, we’re investigators; we can see a little fire over there, put it out early, and stay ahead of situations. It’s a huge win for our operations teams.”
Similar Incidents gives these teams access to historical incident analysis, allowing them to recognize patterns, anticipate problems, and implement measures that keep issues from recurring.
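One simple way to picture matching against historical incidents is set overlap. The sketch below scores past incidents by Jaccard similarity over their tags. This is an assumed technique for illustration only; the article does not say how Similar Incidents computes its matches. The incident IDs, tags, and resolutions are made up.

```python
def jaccard(a: set, b: set) -> float:
    """Share of tags two incidents have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_incidents(current_tags, history, threshold=0.5):
    """Return (score, incident) pairs at or above threshold, best match first."""
    scored = [(jaccard(current_tags, past["tags"]), past) for past in history]
    matches = [(score, past) for score, past in scored if score >= threshold]
    return sorted(matches, key=lambda pair: pair[0], reverse=True)


# Hypothetical incident history with the fix that resolved each one.
history = [
    {"id": "INC-1", "tags": {"db-01", "cpu_high", "slow_queries"}, "resolution": "restarted replica"},
    {"id": "INC-2", "tags": {"web-02", "http_5xx"}, "resolution": "rolled back deploy"},
    {"id": "INC-3", "tags": {"db-01", "disk_full"}, "resolution": "expanded volume"},
]
current = {"db-01", "slow_queries", "cpu_high", "lock_wait"}
matches = similar_incidents(current, history)
for score, past in matches:
    print(past["id"], round(score, 2), past["resolution"])
```

Here only `INC-1` clears the threshold, which gives the responder a prior resolution to try first.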
This shift from reactive troubleshooting to proactive problem-solving allows Sony to significantly reduce escalations. Responders can resolve high-priority incidents faster, enhancing service reliability and operational efficiency. As a result, Sony’s teams can focus on strategic initiatives rather than being stuck in a cycle of firefighting.
Sony continuously improves incident management with AIOps
With AIOps from BigPanda, Sony uses data to drive process improvement and collaboration between its NOC and Incident Management teams, moving them from reactive to proactive incident management. Unified Analytics supports this approach, enabling Sony to enhance its processes, maintain ongoing efficiency, and ensure long-term operational stability.
To learn more, check out our full webinar, How Sony expanded AIOps insights to Incident Management teams.