ITOM

IT Operations Management.

How a Culture of Sharing Transforms IT Incident Management

January 22nd, 2015 | Blog

Earlier this month at BigPanda we released our new Sharing feature, which allows NOC teams to quickly share active and critical incidents with the right teams and subject-matter experts. BigPanda already helps NOC teams by giving them instant visibility into incoming related alerts, so that they don’t have to sift through dozens of emails and web pages with every outage or disruption. They can also attach playbooks and time-series graphs directly to BigPanda, which means no more navigating around, combing through bookmarks, or hunting for the right wiki page for that memory issue or the right Graphite link for that misbehaving database host.

BigPanda Creates the 1000th Jenkins plugin

November 5th, 2014 | Blog

For those of you who are not familiar with Jenkins, it's a dead-simple open-source continuous integration solution that takes almost no time to set up. Jenkins has a vibrant ecosystem and community, and until recently it had only 999 plugins available...

Automating Incident Management

October 28th, 2014 | Blog

Data center growth over the last 15 years has created significant growing pains in data center management. Tasks that IT teams could once do manually have hit the limits of scalability, cost, and efficiency. Enabling IT to meet these challenges comes down to one theme: automation.

ansible-exec: ansible-playbook wrapper for executing playbooks

August 26th, 2014 | Blog

Ansible is a great automation tool. We use it for server provisioning, application deployments, and running maintenance scripts. One problem it does have, however, is how (in)convenient it is to run playbooks compared with regular shell scripts. Write and run enough Ansible playbooks, and eventually you’ll get tired of all the repetitive typing.

Is Change Visibility Your New Blind Spot?

October 2nd, 2013 | Blog

It’s well known in IT operations that things don’t break on their own. Close to 80% of production outages occur because of changes made by developers or someone in IT. However, this fact often eludes us when it comes to actually resolving production issues.

Naught: Zero Downtime for Node.js Applications

March 22nd, 2014 | Blog

Service downtime is harmful to most technology businesses, especially those that require their services to be constantly available. Downtime has many causes, such as hardware failures and network issues. In today’s web-scale world, application deployment is one of the main causes. This is particularly common in organizations practicing Continuous Delivery, in which developers deploy their code at an unprecedented speed. Since there is always a good chance that new code contains errors, frequent application changes carry a high risk of service malfunction.