
RESOLVE ’22: Guideposts and indicators in the Ops domain

It is difficult to define a single, solid maturity model for IT Operations. As moderator Jason Walker, BigPanda’s COO, said in our RESOLVE ’22 event Bit by bit, the maturity models found in “almost every other domain of IT” have not turned into a workable set of guideposts and indicators in the Ops domain.

We welcomed Insurity’s Lead Cloud Operations Performance & Monitoring Admin, Ronnel Vergara, to take the stage and talk over this high-level topic at our event. And as the conversation rolled on, Ronnel and Jason laid out several indicators by which IT Ops leaders can measure maturity in DevOps, AIOps and CloudOps.

NOCs are an early indicator of maturity

First up, Ronnel and Jason covered a division that is at the center of operations but doesn’t always receive a ton of attention: the network operations center (NOC), sometimes called the global operations center or simply the command center.

“It’s a centralized location where you’re going to have all your IT teams looking at the performance and health of systems, specifically,” Ronnel said. “One of the problems companies have is the amount of noise. You need a place to centralize all these monitoring tools, all the alerts you receive via emails, Teams, VictorOps or whatever tool you use.”

Ronnel explained that centralization is important because many less-mature companies “run lean” and don’t necessarily focus on monitoring as much as they should.

“Companies have their cloud operations team, system admins, DevOps… everyone’s just looking out to see what they’re doing—but they all have their day job, right?” he said. “Their goal is not to actually monitor, but to do other things.”

Automation is key to achieving that level of centralization. As both our panelist and moderator noted, simple monitoring isn’t enough. Companies also need tools that help them make sense of what they’re looking at and—perhaps more importantly—drill down on the specific problems they wish to solve.

“But in order to do all that automation, you need to be aware of more than the technology itself,” Ronnel said. “You need to be aware of the process you have to follow in order to automate the stuff.” He added that teams need to understand how they escalate and how each issue is to be treated: “Those are what will give you a good idea of how you want to drive that NOC.”

CMDBs, topology and contextual awareness

Configuration management databases (CMDBs) have come back into vogue with the emergence of AIOps. Because they give organizations a deeper view into contextual data, Ronnel said they can be quite important to companies trying to achieve a level of automated, centralized control.

Jason noted that a lot of the incident management processes and tools companies use “are not initially automated or tied together.” Instead, “they’re manually tied together by people in the NOC who look at alerts on the one hand, and then maybe a chat channel, paging system or ticketing system on the other.”

CMDBs are useful here because they provide greater context, Ronnel said. But companies must be careful to continually update those CMDBs in a way that keeps the data they provide valid.

“They go out of date really quickly,” he said. “So you need a process that allows you to keep them up to date” without a lot of manual intervention. “This is something that has to be ongoing on a daily basis. If you keep track of everything, you can build a picture, a topology, and go towards that maturity of topology we all strive for.”

Managing change velocity in modernizing companies

A little later in the conversation, Jason took viewers through a common sight in the NOC.

“In a slightly advanced company, you see a dashboard, and there are all these active changes that just took place within 24 hours, let’s say,” he said. “And the velocity of those changes over time has vastly increased. Change velocity now at big organizations is massive—almost a separate event stream.”

Jason then invited viewers to “tie that into your CMDB topology, and every little change I make… all those changes are happening at once, and they should update your topology.”

But because CMDBs are traditionally regarded as “very static platforms,” Jason said, that level of change is “actually a very rare thing to see” in businesses. That is true even in companies of some size and sophistication, he said.

“If you’re able to keep up with all the changes you have in the CMDB, it also passes by that stoplight that is the NOC,” Ronnel replied. Companies must “tie it up between the before and the after” so they are aware of how the system looks before and after the change.
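To make the "before and after" idea concrete, here is a minimal sketch of how a change event might update a topology while keeping a snapshot of both states. It is an illustration only: the `Change` record, its field names and the dictionary-based topology are assumptions made for this example, not the data model of any CMDB product discussed in the session.

```python
from copy import deepcopy
from dataclasses import dataclass


# Hypothetical change record; field names are illustrative only.
@dataclass(frozen=True)
class Change:
    action: str   # "add_link" or "remove_link"
    source: str   # upstream configuration item
    target: str   # downstream configuration item


def apply_change(topology: dict, change: Change) -> dict:
    """Apply one change in place and return a before/after record."""
    before = deepcopy(topology)  # snapshot taken before the change lands
    links = topology.setdefault(change.source, set())
    if change.action == "add_link":
        links.add(change.target)
    elif change.action == "remove_link":
        links.discard(change.target)
    else:
        raise ValueError(f"unknown action: {change.action}")
    return {"change": change, "before": before, "after": deepcopy(topology)}


# A tiny topology: each item maps to the items it depends on.
topology = {"web": {"api"}, "api": {"db"}}
record = apply_change(topology, Change("add_link", "api", "cache"))

print(sorted(record["before"]["api"]))  # ['db']
print(sorted(record["after"]["api"]))   # ['cache', 'db']
```

Keeping both snapshots alongside the change is one way to give the NOC the before-and-after picture Ronnel describes. A real system would more likely store diffs than full copies.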

Naturally, forward-thinking IT Operations teams also want to avoid overloading the NOC with events during a planned change, which is where alert suppression comes in. Here, Jason said “the broadness of the types of changes that get made, and the different types of maintenance by different teams” require companies to take a broad approach that combines process and technology.

To do suppression right, Ronnel said companies need what amounts to a committee. When a significant change is made, having a dedicated group that “signs off on different approvals from the different teams” can speed adoption of positive change without significant negative impact to ongoing operations.
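As a rough sketch of how sign-off and suppression could fit together, the example below mutes an alert only when it falls inside a maintenance window that every required team has approved. The team names, the `MaintenanceWindow` fields and the `is_suppressed` helper are all hypothetical, chosen for illustration rather than drawn from the panel or from any specific tool.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical team names; a real list would come from the approval process.
REQUIRED_APPROVERS = frozenset({"cloud-ops", "devops"})


@dataclass(frozen=True)
class MaintenanceWindow:
    service: str
    start: datetime
    end: datetime
    approved_by: frozenset  # teams that signed off on the change


def is_suppressed(service: str, fired_at: datetime, windows: list) -> bool:
    """Return True only if a fully approved window covers this alert."""
    return any(
        w.service == service
        and w.start <= fired_at <= w.end
        and REQUIRED_APPROVERS <= w.approved_by  # every required team signed off
        for w in windows
    )


windows = [
    MaintenanceWindow(
        "billing-api",
        datetime(2022, 6, 1, 1, 0),
        datetime(2022, 6, 1, 3, 0),
        frozenset({"cloud-ops", "devops"}),
    ),
    MaintenanceWindow(
        "search",
        datetime(2022, 6, 1, 1, 0),
        datetime(2022, 6, 1, 3, 0),
        frozenset({"devops"}),  # missing one sign-off
    ),
]

print(is_suppressed("billing-api", datetime(2022, 6, 1, 2, 0), windows))  # True
print(is_suppressed("billing-api", datetime(2022, 6, 1, 4, 0), windows))  # False
print(is_suppressed("search", datetime(2022, 6, 1, 2, 0), windows))       # False
```

Tying suppression to approvals means a window that has not been fully signed off never hides alerts, which keeps the process side and the technology side in step.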

“We need to be transparent,” Ronnel said. And a combination of transparency and alert standardization—as well as associated metrics like severity ranking systems—can have a major impact.
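One simple way to picture alert standardization is a shared severity scale that every source is mapped onto. The sketch below assumes three made-up sources ("email", "chat", "pager") and invented labels. The mapping table is hypothetical and would need to reflect whatever each real tool actually emits.

```python
from enum import IntEnum


class Severity(IntEnum):
    INFO = 1
    WARNING = 2
    CRITICAL = 3


# Hypothetical source names and labels, for illustration only.
SEVERITY_MAP = {
    ("email", "fyi"): Severity.INFO,
    ("email", "urgent"): Severity.CRITICAL,
    ("chat", "warn"): Severity.WARNING,
    ("pager", "p1"): Severity.CRITICAL,
    ("pager", "p3"): Severity.INFO,
}


def normalize(source: str, label: str) -> Severity:
    """Map a tool-specific label to the shared scale.

    Unknown labels default to WARNING so they are surfaced, not dropped.
    """
    return SEVERITY_MAP.get((source.lower(), label.lower()), Severity.WARNING)


raw_alerts = [("pager", "P3"), ("email", "urgent"), ("chat", "warn"), ("chat", "???")]
ranked = sorted((normalize(s, l) for s, l in raw_alerts), reverse=True)

print([sev.name for sev in ranked])  # ['CRITICAL', 'WARNING', 'WARNING', 'INFO']
```

Defaulting unknown labels to a middle severity is a design choice, not a rule. The point is that every team reads the same scale.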

Bit by bit in full

A one-on-one conversation with verified industry experts, Bit by bit covers more than any single blog post can contain. For the full panel and more talks with people leading the technology world, we invite readers to view the following link.