To maintain operational visibility in modern IT environments, companies are abandoning monolithic monitoring solutions from legacy vendors in favor of a modern set of “best of breed” monitoring tools. Today’s average IT monitoring stack consists of about 6-8 tools, including at least one from each of the following categories: systems monitoring, end user monitoring, application performance monitoring (APM), error detection, log analytics, chat, and ticketing. When service disruptions occur, operations engineers face a flood of alerts across different layers of the IT stack, with no fast way to figure out what’s really going on. Customers are left stranded, while IT professionals struggle to detect, triage, and remediate urgent issues. Downtime abounds, which negatively impacts revenue, performance, and brand loyalty.
Given that the very development of Icinga arose from the need for additional functionalities in open source monitoring, it’s little surprise that the tool has become indispensable for so many IT professionals. Its configurability and flexibility allow for a sophisticated approach to monitoring, which is both scalable and extensible to large, complex environments.
Salesforce likely lost quite a bit of money last Tuesday. IDC estimates that the typical infrastructure failure costs organizations $100,000 per hour, while a critical application failure costs as much as $500,000 to $1 million per hour. Salesforce was down for over 20 hours and still continued to have service disruptions. This in turn translated to heavy financial losses for Salesforce customers worldwide, as they struggled to manage the lifeblood processes that depend on the SaaS giant. Salesforce’s reputation suffered, and CEO Marc Benioff issued public apologies on social channels.
In October 2015, BigPanda launched #FlyAboveTheNoise, a program aimed at helping IT & DevOps professionals rise above their IT alert noise. The premise was simple: Trade in your noisy IT alerts by taking a trial of BigPanda, and you get a drone.
“Fly Above the Noise was a huge success. 3,000+ respondents and millions of correlated alerts later, it’s clear that the world not only needs to be free from alert noise, but it needs drones to automate the event management process,” said BigPanda CEO Assaf Resnick.
We’re happy to announce that BigPanda now integrates with Catchpoint! Catchpoint is a popular cloud-based monitoring tool used by ops teams to measure availability and performance for synthetic transactions and real user web sessions. By integrating with BigPanda, Catchpoint customers can now aggregate all of their monitoring alerts in one place, intelligently clustering them to reduce alert noise and spot critical issues faster.
Software everywhere or software nowhere?
“Software is increasingly everywhere,” Kirkpatrick explained, “but it’s so seamless that you don’t even see it. You just enjoy new efficiencies and ways of getting things done.”
For many IT and Ops teams, Nagios is both a blessing and a curse. On the one hand, Nagios gives you near real-time visibility into the inner workings of your IT infrastructure. But on the other hand, Nagios can generate so many alerts that it’s impossible for any single person (or even any team) to keep up.
If you’re struggling with a flood of Nagios alerts, this two-part blog series is for you. We’ll take a close look at the complicated relationship that IT and Ops professionals have with the monitoring tool, explain why Nagios is so noisy, and discuss a simple way that you can take charge of your alerts and maximize the way Nagios works for you.
In between sessions at last weekend’s DevOpsDays Silicon Valley, scores of attendees filled the halls, flooding the Computer History Museum with chatter and turning it into something more akin to a high school cafeteria than a conference venue. As crowds formed to share their stories and insights with one another, a common theme quickly emerged: It just isn’t as easy as we thought it would be.
This is part two of a two-part post about using event correlation to thwart DDoS attacks. Channeling Mark Twain: it would have been shorter if I’d had more time. In the last post I described why DDoS attacks for SaaS providers are no different from performance and availability issues experienced in other domains like healthcare, finance, or retail. In this post I’ll share a customer story about a security breach that never happened… thanks to a savvy DevOps team and data science.
If you work in tech, you’ve probably heard of the Pareto principle, or, as it’s more commonly called, the 80/20 rule. According to the 80/20 rule, for many events, 80 percent of the results are generated by 20 percent of the inputs.
A little background: back in the late 1800s the Italian economist Vilfredo Pareto noticed that approximately 80 percent of the land in Italy was owned by 20 percent of the population. Not long after, Pareto also observed that 20 percent of the peapods in his garden generated 80 percent of the crop’s yield – and thus the 80/20 principle was born.
Every company’s a target, every customer’s at risk. But the now-cliched threat of data breaches from Distributed Denial of Service (DDoS) attacks obscures a bigger threat: outages that impact not just data integrity but also profitability, brand equity, and customer retention.
The volume of attacks is growing and so is the impact of downtime. According to Akamai’s most recent State of the Internet report, DDoS attacks are a bigger threat than ever before. “The number of DDoS attacks continued to increase substantially in Q2 2015, more than doubling the number observed in Q2 2014.”