Insight and analysis on the data center space from industry thought leaders.
Beefing Up Data Center Resilience
Here are five ways data center operators can increase the resilience of their facilities – and keep operations running smoothly – by deploying best-of-breed data center infrastructure management (DCIM) solutions.
February 1, 2016
Sev Onyshkevych is Chief Marketing Officer for FieldView Solutions.
A data center is very much like a car – it needs maintenance to run smoothly and not break down in the middle of your journey. How vulnerable your systems are to failure determines the resilience of your facility. Increase that resilience and you boost your uptime.
Data center resilience (or resiliency), as defined by TechTarget, is “the ability of a server, network, storage system, or an entire data center, to recover quickly and continue operating even when there has been an equipment failure, power outage or other disruption.”
Realize That Your Resilience Changes Constantly
Imagine that your car is running on four donut spares instead of full-size tires. The first step is to acknowledge that you’re riding on donuts, and to know that while you’re still moving, you’re not as safe as you might be. Knowing that you’ve got a single point of failure and are operating in a weaker (less resilient) environment should lead you to take corrective action: locating a garage quickly and replacing the donuts with new tires or, better yet, renting another car with real tires so you can drive without any failures.
Your system can be more or less resilient at any given moment, depending on such variables as the reliability of your power sources, the load, the time of day and any planned maintenance or unplanned outage under way. Constant monitoring of your resilience allows you to take proactive measures to improve it before a failure occurs. You have the option to fix things, or to shift load around to avoid disasters.
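To make the idea of moment-to-moment resilience concrete, here is a minimal sketch in Python. It is not taken from any DCIM product; the PowerUnit class, the spare_units function and the 500 kW figures are hypothetical names and values chosen for illustration. It simply counts how many in-service power units could fail, worst case first, before the remaining units can no longer carry the load.

```python
from dataclasses import dataclass


@dataclass
class PowerUnit:
    """Hypothetical power source (for example a UPS or generator)."""
    name: str
    capacity_kw: float
    in_service: bool = True  # False during planned maintenance or an outage


def spare_units(units, load_kw):
    """Count how many in-service units can fail before load exceeds capacity.

    Worst case is assumed: the largest units fail first.
    Returns -1 if the in-service units cannot carry the load even now.
    """
    available = sorted(
        (u.capacity_kw for u in units if u.in_service), reverse=True
    )
    remaining = sum(available)
    if remaining < load_kw:
        return -1
    spares = 0
    for capacity in available:
        remaining -= capacity
        if remaining < load_kw:
            break
        spares += 1
    return spares


# Illustrative values only: three 500 kW units carrying an 800 kW load.
units = [PowerUnit("UPS-A", 500), PowerUnit("UPS-B", 500), PowerUnit("UPS-C", 500)]
normal = spare_units(units, load_kw=800)

# Take one unit out for planned maintenance and check again.
units[2].in_service = False
during_maintenance = spare_units(units, load_kw=800)
```

In this made-up example the facility can lose one unit under normal conditions but none during maintenance, which is the data center equivalent of riding on donuts: still moving, with no margin left.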
Have a 'Dashboard'
What if you had no dashboard in your car? How would you know where you were going, how fast you were driving, when you might run out of gas, how many miles you had on the car or if anything was wrong with any of your systems? Would you feel safe with readings that had been taken last week?
Having a central place to view all the pertinent information about your data center infrastructure is as critical as your car’s dashboard. Maintaining a data center with a clipboard and a spreadsheet in hand is a thing of the past: it is cumbersome and time-consuming, and by the time you have gathered the critical information, it is obsolete. A real-time dashboard showing all the critical information in a single pane of glass allows you to proactively prevent failures and plan effectively and intelligently for the future.
Know Your Capacity
Just as you would use your dashboard to find out how much gas you’ve got in your tank (say, before getting on the highway for a long-distance trip), your data center management dashboard should offer real-time intelligence about how much space, energy, cooling and network capacity you’ve got left. More importantly, it should show you how to use all this capacity to its fullest. This could include information on whether you can delay expansion plans or eliminate the need for expensive facility construction altogether.
Not having all this information at your fingertips would be the same as driving without a gas gauge, never mind your temperature gauge, your oil level, etc. Forewarned is forearmed.
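As a rough illustration of what such a "gas gauge" computes, the Python sketch below reports remaining headroom for each resource and picks out the one closest to running out. The function names and all of the figures are invented for the example; a real dashboard would draw these values from live measurements.

```python
def headroom(used, total):
    """Return remaining capacity as (absolute amount, percent of total)."""
    if total <= 0:
        raise ValueError("total must be positive")
    remaining = total - used
    return remaining, 100.0 * remaining / total


def capacity_report(capacity):
    """Map each resource name to its (remaining, percent remaining) pair."""
    return {name: headroom(used, total) for name, (used, total) in capacity.items()}


def tightest_resource(capacity):
    """Name the resource with the smallest percentage of capacity left."""
    report = capacity_report(capacity)
    return min(report, key=lambda name: report[name][1])


# Invented sample figures, given as (used, total) for each resource.
capacity = {
    "space_racks": (84, 120),
    "power_kw": (640.0, 800.0),
    "cooling_kw": (610.0, 900.0),
    "network_ports": (410, 480),
}

for name, (left, pct) in capacity_report(capacity).items():
    print(f"{name}: {left} left ({pct:.1f}% of total)")
print("Tightest resource:", tightest_resource(capacity))
```

Looking at the tightest resource rather than the average matters because a room with plenty of floor space can still be out of capacity if it has run out of power, cooling or network ports.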
Run Failure Simulations
Before you buy a car, you read about the crash tests and other safety tests conducted by the manufacturers, the government and the likes of Consumer Reports, to know what your risk is in case of an accident, and to ensure that the brakes, the suspension and everything else in the car work properly.
Data centers are no different when it comes to testing the infrastructure. Running “What If?” analyses is the equivalent of crash testing a car – it helps you be aware of failure points and the necessary measures you need to take to avert disaster, as well as what the impact might be when a disaster strikes. The “What If?” analysis should help you answer such questions as:
What if something fails while you are doing maintenance on your equipment?
What if something fails after something else has failed already, and you’re operating in a less-resilient environment?
If your disaster scenario happens, where will the load go? What else may fail as a result? Will that failure be contained, or will it become a “cascading failure” (your multi-car collision scenario)?
So in effect, you’re testing how resilient your system is today, and how resilient it might be under varying circumstances.
The ability to test your system in simulation allows you to discover weak spots and make changes to strengthen your infrastructure. It allows you to be proactive about disaster avoidance, and know the appropriate corrective responses to avoid disasters – which in the data center world means costly downtime.
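A “What If?” analysis of this kind can be sketched in a few lines of Python. The model below is deliberately simplistic and rests on stated assumptions rather than on how any particular DCIM tool works: load from a failed unit is shared equally among the surviving units, and a unit trips as soon as its load exceeds its capacity. The function name simulate_failure and the unit names and figures are hypothetical.

```python
def simulate_failure(capacity_kw, load_kw, first_failure):
    """Return the names of the units that fail, in order, after one failure.

    Assumptions (illustrative only): the load of failed units is split
    equally among survivors, and any unit pushed above its capacity trips.
    """
    load = dict(load_kw)  # work on a copy so the caller's data is untouched
    failed = []
    to_fail = [first_failure]
    while to_fail:
        # Take the failing units out of service and collect their load.
        shed = 0.0
        for name in to_fail:
            shed += load.pop(name)
            failed.append(name)
        if not load:
            break  # nothing left to pick up the load: total outage
        share = shed / len(load)
        for name in load:
            load[name] += share
        # Any survivor now above its capacity fails in the next round.
        to_fail = [name for name in load if load[name] > capacity_kw[name]]
    return failed


capacity_kw = {"A": 100.0, "B": 100.0, "C": 100.0}

# Lightly loaded: the failure of A is absorbed by B and C.
contained = simulate_failure(capacity_kw, {"A": 60.0, "B": 60.0, "C": 60.0}, "A")

# More heavily loaded: losing A overloads B and C as well.
cascade = simulate_failure(capacity_kw, {"A": 70.0, "B": 70.0, "C": 70.0}, "A")
```

With these made-up numbers, the first scenario ends with only unit A down, while the second takes out all three units. The same equipment is resilient at one load level and fragile at another.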
Alarms and Alerts
You know those “idiot lights” in your car that let you know when something is wrong, hopefully before a total system breakdown? Or a system like OnStar that alerts the right people when something goes wrong – people who can help and make a difference?
Well, deploying a system that provides similar alarms and alerts in a data center can ensure smooth operations and decrease downtime. It can alert you when something has the potential to go wrong, leaving you enough time to correct it and avoid disaster. This could include an alarm that lets you know your temperature is too high or your power has switched to a backup system, or an alert that you’re nearing capacity.
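The alarms described above boil down to comparing readings against thresholds. Here is a minimal Python sketch of that idea; the Rule class, the metric names and every threshold value are assumptions made up for illustration, not recommended settings or any vendor's defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical alert rule: warn at one level, alarm at a higher one."""
    metric: str
    warn_at: float
    alarm_at: float


def evaluate(readings, rules):
    """Return (metric, level, value) for each reading that crosses a threshold."""
    events = []
    for rule in rules:
        value = readings.get(rule.metric)
        if value is None:
            continue  # no reading for this metric; nothing to compare
        if value >= rule.alarm_at:
            events.append((rule.metric, "ALARM", value))
        elif value >= rule.warn_at:
            events.append((rule.metric, "WARNING", value))
    return events


# Illustrative thresholds only.
rules = [
    Rule("inlet_temp_c", warn_at=27.0, alarm_at=32.0),
    Rule("power_used_pct", warn_at=80.0, alarm_at=90.0),
    Rule("on_backup_power", warn_at=1, alarm_at=1),  # 1 means running on backup
]

readings = {"inlet_temp_c": 29.0, "power_used_pct": 85.0, "on_backup_power": 0}
events = evaluate(readings, rules)
```

Using two levels per metric reflects the point made above: a warning arrives while there is still time to correct the problem, and the alarm is reserved for conditions that need immediate attention.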
If you think of your data center as being a lot like your car, then you know that you have the power to increase its resilience – and ensure its ability to keep running, even when something goes wrong. It’s simple. Maintain it and pay attention to the details, and it will run smoothly for you. Ignore it or let up on your vigilance, and you’re headed for a breakdown. Luckily, DCIM tools help data center operators protect uptime with real-time information that helps managers make critical business decisions and avoid disasters.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.