Eliminating Downtime: Six Key Considerations for Your Hosting Architecture
By making the right hosting architecture choices up front, companies can ensure that their site will always be available with no interruptions to the end user experience, writes Jeffrey Papen of Peak Hosting. To eliminate downtime, here are six architecture choices that should be considered.
December 1, 2014
Jeffrey Papen is CEO and founder of Peak Hosting, a managed hosting provider that has helped design, build, maintain and support some of the world’s largest Internet properties.
There was a time when conventional wisdom held that network downtime was unavoidable, and while it could be minimized, it was next to impossible to eliminate. However, for companies that rely on their network being up 24/7 in order for their business to run, any downtime, no matter how minimal, is unacceptable.
The good news is that eliminating downtime completely is possible. It starts with the physical infrastructure, but the software element is critical as well. By making the right hosting architecture choices up front, companies can ensure that their site will always be available with no interruptions to the end user experience. Even during scheduled maintenance the system will be up and running and available to customers.
Eliminating downtime comes from properly architecting your system. We all know that uptime doesn’t come cheap, but saying, “It’s OK for X to fail because I can always provision more in my cloud” doesn’t make it true from your customer’s perspective. When your systems fail, customers don’t become understanding; they become angry.
Six Architecture Choices to Eliminate Downtime
Design for a true 2N architecture. When designing a hosting environment, you should literally install two of everything. This means dual power supplies, dual hard drives, dual PDUs (Power Distribution Units), dual UPSs (Uninterruptible Power Supplies), dual generators, dual top-of-rack switches and dual NICs. The list goes on; just make sure there are two of whatever gets put in.
Although the cloud hosting industry promises to replace a failed hard drive within an hour, it still relies on a 1N architecture, meaning you will spend hours, or even days, getting your code, configuration and data back to their pre-crash state. The impact is a double whammy: not only is your service down for an extended period of time, but you also need to pull staff away from the core work that advances the business in order to redeploy your environment. It is inevitable that parts will eventually fail, but a 2N architecture ensures that component failure doesn’t have to equal service failure.
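To put rough numbers on the 2N argument above, here is a minimal Python sketch of the standard redundancy arithmetic. It assumes the two components fail independently, which is optimistic (shared power, firmware or operator error can take out both at once), and the 99% single-component availability is an illustrative figure, not one from this article. The function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def redundant_availability(single: float, copies: int = 2) -> float:
    """Availability of `copies` redundant components when any one of them
    is enough to keep the service up. Assumes independent failures."""
    return 1.0 - (1.0 - single) ** copies


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime per year at a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    one = 0.99                          # illustrative 1N availability
    two = redundant_availability(one)   # same component, deployed 2N
    print(f"1N: {downtime_hours_per_year(one):.2f} hours/year down")
    print(f"2N: {downtime_hours_per_year(two):.2f} hours/year down")
```

Under these assumptions, a component that would be down about 87.6 hours a year on its own drops to under one hour a year as a redundant pair. Real-world gains are smaller wherever failures are correlated.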
Leverage RAID. Carrying on with the theme of 2N and dual hard drives, you should implement some sort of RAID solution, with RAID 1 being the bare minimum. RAID 1 ensures that you have an exact mirrored copy of your hard drive ready to go immediately should a drive fail. Depending on your system’s performance needs and complexity, you should also consider RAID 6 or RAID 10. (Note: RAID 5 is a bad idea. It tolerates only a single drive failure, so a second failure during a rebuild loses the array.)
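The trade-off between the RAID levels mentioned above can be sketched with a small Python helper. This is an illustration built on the standard textbook definitions of each level, assuming equal-size drives and two-way mirrors for RAID 10; the function name is hypothetical, and real controllers add their own overheads and limits.

```python
from typing import Tuple


def raid_summary(level: int, drives: int, drive_tb: float) -> Tuple[float, int]:
    """Return (usable capacity in TB, drive failures guaranteed survivable).

    Assumes equal-size drives; RAID 10 is modeled as striped two-way mirrors.
    """
    if level == 1:
        if drives < 2:
            raise ValueError("RAID 1 needs at least 2 drives")
        return drive_tb, drives - 1            # every drive holds a full copy
    if level == 5:
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_tb, 1      # one drive's worth of parity
    if level == 6:
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return (drives - 2) * drive_tb, 2      # two drives' worth of parity
    if level == 10:
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, 4 or more")
        # Guaranteed to survive one failure; more only if they hit different mirrors.
        return (drives // 2) * drive_tb, 1
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    for level in (1, 5, 6, 10):
        n = 2 if level == 1 else 4
        usable, tolerated = raid_summary(level, n, 4.0)
        print(f"RAID {level:>2} with {n} x 4 TB: {usable} TB usable, "
              f"survives {tolerated} failure(s) guaranteed")
```

With four drives, RAID 6 and RAID 10 give the same usable capacity in this model; RAID 6 guarantees survival of any two failures, while RAID 10 typically rebuilds faster because it copies from a mirror instead of recomputing parity.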
Buy the best hardware. The old adage is true: you get what you pay for. If all your company cares about is buying the cheapest hardware, you’re going to have very high failure rates. Purchasing hardware from the best vendors with the best reputations will ensure that you’re starting with quality products.
Burn in your infrastructure for 72 hours. No amount of 2N architecture can prevent a motherboard, CPU, or RAM DIMM from failing. However, if you burn in your system for at least three full days before you put it into production, you can generally discover any hardware issues before they impact your service.
Design for ongoing maintenance. It’s not just component failure that you need to be prepared for; you also have to upgrade code and software. By designing in a 2N architecture throughout your entire environment, every moving part on a server can be maintained without taking the service offline.
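One way to picture maintenance on a redundant pair is a rolling procedure: take one node out of service, work on it, return it, and only then touch its peer. The Python sketch below is a toy model of that idea, not a description of any particular tool; the `Node` class, the `rolling_maintenance` function and the `upgrade` callback are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    name: str
    in_service: bool = True
    version: int = 1


def rolling_maintenance(nodes: List[Node],
                        upgrade: Callable[[Node], None]) -> List[int]:
    """Maintain nodes one at a time so at least one peer keeps serving.

    Returns the number of in-service nodes observed during each step.
    """
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to maintain without an outage")
    serving_during_step = []
    for node in nodes:
        node.in_service = False                   # drain this node
        serving = sum(n.in_service for n in nodes)
        if serving == 0:
            node.in_service = True                # put it back and stop
            raise RuntimeError("no healthy peer left; aborting maintenance")
        serving_during_step.append(serving)
        upgrade(node)                             # do the actual work
        node.in_service = True                    # return it before moving on
    return serving_during_step


if __name__ == "__main__":
    def bump_version(node: Node) -> None:
        node.version += 1

    pair = [Node("web-a"), Node("web-b")]
    print(rolling_maintenance(pair, bump_version))
    print(pair)
```

The guard that aborts when no peer is serving is the important part: the procedure refuses to trade an upgrade for an outage.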
Be prepared for a catastrophe. While it’s incredibly rare, a catastrophic event could take an entire site out. The software you choose will need to support the ability to fail over to another facility. Make sure you have a backup facility in place.
When it comes to hosting solutions, remember this – it doesn’t have to fail in the first place. Design accordingly and keep your service always up and your customers always happy.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.