Posts aimed at C-level, upper D-level or BDM audiences.
Enterprise application and computing environments have changed radically over the past fifteen years. Anyone who has spent even a day in an IT role can tell you that. What gets less attention, however, is how those changes undermine the ability of operations teams to do their jobs. The problem is that while computing and application environments have changed dramatically, workflows and org charts have not.
Data center growth over the last 15 years has created significant growing pains for data center management. Tasks that IT teams could once do manually have hit the limits of scalability, cost, and efficiency. Meeting these challenges comes down to one theme: automation.
It’s well known in IT operations that things don’t break on their own. Close to 80% of production outages occur because of changes made by developers or someone in IT. However, this fact often eludes us when it comes to actually resolving production issues.
The last ten years have brought enormous changes to production environments, driven by a best-of-breed approach to production infrastructure enabled by open source and cloud. This has been a boon for developers in terms of flexibility and productivity, but it’s also placed a new set of challenges and expectations on Ops.
What is MTTR? Don’t answer with what it stands for or how you use it. The question is more philosophical than literal. For too long we’ve measured operational performance by the number of minutes it takes to resolve an incident. The almighty trend line slopes down, and we gulp milk from the jug of IT-inflated ego like NASCAR drivers drunk on Nagios exhaust fumes.
Like the Zen riddle about one hand clapping, it’s important to first ask:
- What’s an incident?
- What does it mean to resolve one? …and (the ever-blasphemous)
- Is it unequivocally better to resolve them quickly?
We’re proud to be unveiling a new concept we pioneered in the den that finally moves beyond dashboards as eye candy to a new place where IT analytics can be used to make better ops decisions. It’s called Service Health Analytics and it exposes all data from all monitoring sources in the form of configurable dashboards that can be customized, saved, and shared.
We’re adjusting to the new reality that DevOps is a compelling layover on the journey between legacy ops and self-healing infrastructure. Eliminating the cultural gap between developers and operations, the now-cliched state of IT nirvana called “DevOps”, is by no means the end goal. The goal is reliable system performance and availability without human intervention - the panacea called “NoOps”.