IT Service Management.
The last ten years have brought enormous changes to production environments, driven by a best-of-breed approach to production infrastructure enabled by open source and cloud. This has been a boon for developers in terms of flexibility and productivity, but it’s also placed a new set of challenges and expectations on Ops.
Many alerts place an unnecessary burden on Ops teams instead of helping them to solve issues. The main problem is that most alerts are not actionable enough:
- They point to issues that don’t require a response
- They lack critical information, forcing you to spend time searching for more insights in order to gauge their urgency
In many ways, incident management for devops is similar to typical issue tracking processes: it facilitates coordination and collaboration of daily tasks. For this reason, tools such as Jira, Zendesk, and even email are often used as solutions for incident management. But incident management faces one unique challenge that makes it different from other issue tracking processes. In addition to human-operated workflows, incident management also relies heavily on machine-driven workflows. Unfortunately, traditional issue trackers and ticketing systems cannot accommodate for this with their current product mechanics.
Few things damage productivity as much as waiting. Waiting forces us to context switch, disrupts our creative momentum and eliminates our ability to experiment. Whether we are deploying a new service or troubleshooting a problem, waiting puts a heavy tax on efficient work.
Anomaly detection for monitoring has been a trending topic in recent years. And while the math behind it is fascinating, too much of the discussion has revolved around histograms, moving averages and standard deviations. More discussion needs to happen around its practical applications, and for that reason, this practical guide to anomaly detection will attempt to provide an actionable overview of current off-the-shelf anomaly detection tools.
Monitoring applications in production has never been easier. With only a few code lines, you'll have New Relic installed and monitoring your application from nearly every angle. When something goes wrong, New Relic will start sending alerts. But then what? (hint – New Relic and BigPanda together is the answer).
BigPanda is an incident management platform for modern IT, NOC and DevOps teams. With BigPanda, you will prioritize and route your incidents better and faster, while vastly improving your team’s collaboration and processes. This is part 3 in a series on Getting Started with BigPanda. This product introduction will help you to get up and running quickly so you can get back to hunting fail-whales and 404 errors.
BigPanda is an incident management platform for modern Ops environments. With BigPanda, you will prioritize and assign your incidents better and faster, while vastly improving your team’s collaboration and processes. This is part 4 in a series on Getting Started with BigPanda. This guide will help you get up and running quickly and maximize the value you get out of the platform.