This is part two of a two-part post about using event correlation to thwart DDoS attacks. Channeling Mark Twain: it would have been shorter if I had more time. In the last post I described why DDoS attacks for SaaS providers are no different from performance and availability issues experienced in other domains like healthcare, finance, or retail. In this post I’ll share a customer story about a security breach that never happened… thanks to a savvy DevOps team and data science.
If you work in tech, you’ve probably heard of the Pareto principle, or, as it’s more commonly called, the 80/20 rule. According to the 80/20 rule, for many events, 80 percent of the results are generated by 20 percent of the inputs.
A little background: back in the late 1800s the Italian economist Vilfredo Pareto noticed that approximately 80 percent of the land in Italy was owned by 20 percent of the population. Not long after, Pareto also observed that 20 percent of the peapods in his garden generated 80 percent of the crop’s yield – and thus the 80/20 principle was born.
At BigPanda, we're committed to giving you the tools and information you need to be successful. In keeping with this goal, we’re excited to announce BigPanda Docs, our revamped help documentation that features more content, better navigation, and more ways for you to give us feedback.
We’re adjusting to the new reality that DevOps is a compelling layover on the journey between legacy ops and self-healing infrastructure. Eliminating the cultural gap between developers and operations, the now-cliched state of IT nirvana called “DevOps”, is by no means the end goal. The goal is reliable system performance and availability without human intervention - the panacea called “NoOps”.
We’re proud to be unveiling a new concept we pioneered in the den that finally moves beyond dashboards as eye candy to a new place where IT analytics can be used to make better ops decisions. It’s called Service Health Analytics and it exposes all data from all monitoring sources in the form of configurable dashboards that can be customized, saved, and shared.
What is MTTR? Don’t answer with what it stands for or how you use it. The question is more philosophical than literal. For too long we’ve measured operational performance based on the number of minutes it takes to resolve an incident. The almighty trend line slopes down, and we gulp milk from the jug of IT-inflated ego like NASCAR drivers drunk on Nagios exhaust fumes.
Like the Zen riddle about one hand clapping, it’s important to first ask:
ITSM is evolving thanks to new capabilities that make it easy to visualize service health based on real-time CMDB updates fed via automated change management driven by smarter monitoring infrastructure. We’re nearing a time where machines will manage machines. At BigPanda, we’re doing our part to get there quickly.
In 1792, the New York Stock Exchange opened its doors on Wall Street with five stocks available for trade. Today, more than 2,800 companies list on the NYSE with a combined market value of more than $15 trillion. In 223 years, everything except the name has changed.
One of the first things we do right after installing Nagios is set up email notifications. Without that, how would you know when something went wrong?
In many ways, incident management for DevOps is similar to typical issue tracking processes: it facilitates coordination and collaboration of daily tasks. For this reason, tools such as Jira, Zendesk, and even email are often used as solutions for incident management. But incident management faces one unique challenge that makes it different from other issue tracking processes. In addition to human-operated workflows, incident management also relies heavily on machine-driven workflows. Unfortunately, traditional issue trackers and ticketing systems cannot accommodate this with their current product mechanics.
Many alerts place an unnecessary burden on Ops teams instead of helping them to solve issues. The main problem is that most alerts are not actionable enough:
Few things damage productivity as much as waiting. Waiting forces us to context switch, disrupts our creative momentum and eliminates our ability to experiment. Whether we are deploying a new service or troubleshooting a problem, waiting puts a heavy tax on efficient work.